Matthias Boehm created SYSTEMML-2169:
----------------------------------------

             Summary: Spark nary cbind/rbind with broadcasts
                 Key: SYSTEMML-2169
                 URL: https://issues.apache.org/jira/browse/SYSTEMML-2169
             Project: SystemML
          Issue Type: Task
            Reporter: Matthias Boehm


The introduction of nary cbind and rbind in SYSTEMML-1986 added support for 
operations like {{E = cbind(A,B,C,D)}}, which concatenates the matrices A, B, C, 
and D column-wise without the intermediates required by traditional 
binary cbind operations ({{cbind(cbind(cbind(A,B),C),D)}}). SystemML also 
provides rewrites to automatically collapse chains of cbind or rbind operations 
into their nary counterparts. 
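
To illustrate the collapsing rewrite, here is a minimal sketch on a toy 
expression tree. The {{Node}} class and the {{collapse}} method are hypothetical 
and only stand in for SystemML's actual HOP DAG and rewrite rules, which are 
not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy expression tree for illustration only; NOT SystemML's HOP classes.
final class CbindCollapse {
    static final class Node {
        final String op;          // "cbind" for a concatenation, otherwise a leaf name such as "A"
        final List<Node> inputs;  // empty for leaves

        Node(String op, Node... inputs) {
            this.op = op;
            this.inputs = Arrays.asList(inputs);
        }

        boolean isCbind() {
            return op.equals("cbind") && !inputs.isEmpty();
        }

        @Override
        public String toString() {
            if (inputs.isEmpty())
                return op;
            StringBuilder sb = new StringBuilder(op).append("(");
            for (int i = 0; i < inputs.size(); i++) {
                if (i > 0) sb.append(",");
                sb.append(inputs.get(i));
            }
            return sb.append(")").toString();
        }
    }

    // Flattens nested cbind nodes into one nary cbind, preserving the
    // left-to-right order of the leaves. Non-cbind nodes are returned unchanged.
    static Node collapse(Node n) {
        if (!n.isCbind())
            return n;
        List<Node> flat = new ArrayList<>();
        for (Node in : n.inputs) {
            Node c = collapse(in);
            if (c.isCbind())
                flat.addAll(c.inputs); // pull the child's inputs up one level
            else
                flat.add(c);
        }
        return new Node("cbind", flat.toArray(new Node[0]));
    }

    public static void main(String[] args) {
        Node a = new Node("A"), b = new Node("B"), c = new Node("C"), d = new Node("D");
        Node chain = new Node("cbind", new Node("cbind", new Node("cbind", a, b), c), d);
        System.out.println(chain + " -> " + collapse(chain));
    }
}
```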

However, for distributed Spark operations, the binary cbind is still much 
better optimized than the nary operation, which only provides a general-case 
operation based on repartition joins. 

This task aims to address this by extending {{BuiltinNarySPInstruction}} at 
runtime level. Given the unlimited number of inputs, this runtime approach 
seems more appropriate than dedicated physical operations at compiler level. In 
detail, we need to evaluate whether a subset of the inputs fits into the 
broadcast budget, and if so, provide an alternative code path for nary 
cbind/rbind operations with broadcast joins.
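
One possible shape of the budget check is sketched below. This is only an 
illustration under stated assumptions: the class {{BroadcastSelection}}, the 
method {{selectBroadcasts}}, the greedy smallest-first policy, and the rule of 
keeping at least one input distributed are all hypothetical and are not taken 
from {{BuiltinNarySPInstruction}} or {{SpoofSPInstruction}}.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical helper, not part of SystemML: decides which inputs of an
// nary cbind/rbind could be broadcast given a memory budget in bytes.
final class BroadcastSelection {

    // sizes[i] is the estimated in-memory size of input i in bytes;
    // a negative value means "unknown" and is never selected.
    // Returns a flag per input: true = broadcast, false = keep distributed.
    static boolean[] selectBroadcasts(long[] sizes, long budget) {
        Integer[] order = new Integer[sizes.length];
        for (int i = 0; i < order.length; i++)
            order[i] = i;
        // Assumed policy: consider the smallest inputs first.
        Arrays.sort(order, Comparator.comparingLong((Integer i) -> sizes[i]));

        boolean[] bc = new boolean[sizes.length];
        long used = 0;
        int numBc = 0;
        for (int i : order) {
            if (numBc == sizes.length - 1)
                break;      // assumption: at least one input stays distributed
            if (sizes[i] < 0)
                continue;   // unknown size, do not risk a broadcast
            if (sizes[i] > budget - used)
                break;      // sorted ascending, so no later input fits either
            bc[i] = true;
            used += sizes[i];
            numBc++;
        }
        return bc;
    }

    public static void main(String[] args) {
        long MB = 1024L * 1024L;
        long[] sizes = { 4096 * MB, 10 * MB, 200 * MB, 5 * MB };
        // With a 64 MB budget, only the 10 MB and 5 MB inputs are selected.
        System.out.println(Arrays.toString(selectBroadcasts(sizes, 64 * MB)));
        // prints [false, true, false, true]
    }
}
```

In such a scheme, the flagged inputs would be broadcast, while the remaining 
inputs would still go through the existing repartition-join path.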

Note that distributed codegen operations have similar characteristics of 
unlimited inputs and already leverage broadcast variables when possible. Hence, 
we can probably use an approach similar to the one in {{SpoofSPInstruction}}.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
