Matthias Boehm created SYSTEMML-1837:
----------------------------------------

             Summary: Unary aggregate w/ corrections output to large physical blocks
                 Key: SYSTEMML-1837
                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1837
             Project: SystemML
          Issue Type: Bug
            Reporter: Matthias Boehm


Many unary aggregate operations store corrections in additional columns or 
rows. For example, {{rowSums(X)}} uses a two-column output to store the sums and 
their corrections. In CP, we drop these corrections immediately after the operation, 
while in MR and Spark they are dropped after the final aggregation. 
The issue is that {{MatrixBlock::dropLastRowsOrColums}} does not actually 
drop the correction. Instead, it merely shifts all values into their correct 
starting positions without shrinking the underlying allocation. Hence, the 
physical output is larger than what the memory estimates assume. This leads to 
unnecessarily large memory consumption during subsequent operations and in the 
buffer pool, which can cause OOMs. This task aims to fix 
{{MatrixBlock::dropLastRowsOrColums}}. 
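
To illustrate the difference, here is a minimal, self-contained sketch. It is NOT SystemML's code: the {{DenseBlock}} class and the methods {{rowSumsWithCorrection}}, {{dropLastColumnInPlace}}, and {{dropLastColumnCompact}} are hypothetical, and the use of Kahan-style compensated summation to produce the correction column is an assumption for illustration. The in-place variant shifts the values but keeps the original backing array, so the logical size shrinks while the physical allocation does not. The compact variant copies into a right-sized array.

```java
// Hypothetical sketch; NOT SystemML's MatrixBlock API.
public class DropCorrectionSketch {

    /** Simplified row-major dense block: logical dims plus a backing array. */
    static final class DenseBlock {
        int rows;
        int cols;
        double[] values; // physical storage; may be longer than rows * cols

        DenseBlock(int rows, int cols) {
            this.rows = rows;
            this.cols = cols;
            this.values = new double[rows * cols];
        }

        double get(int r, int c) {
            return values[r * cols + c];
        }
    }

    /** Assumed Kahan-style rowSums: column 0 = sum, column 1 = correction. */
    static DenseBlock rowSumsWithCorrection(double[][] x) {
        DenseBlock out = new DenseBlock(x.length, 2);
        for (int r = 0; r < x.length; r++) {
            double sum = 0, corr = 0;
            for (double v : x[r]) {
                double y = v - corr;      // compensate with the running correction
                double t = sum + y;
                corr = (t - sum) - y;     // low-order bits lost in sum + y
                sum = t;
            }
            out.values[2 * r] = sum;
            out.values[2 * r + 1] = corr;
        }
        return out;
    }

    /** Problematic variant: shifts values but keeps the oversized array. */
    static void dropLastColumnInPlace(DenseBlock b) {
        int newCols = b.cols - 1;
        // Destination index never exceeds source index, so a forward copy is safe.
        for (int r = 0; r < b.rows; r++)
            for (int c = 0; c < newCols; c++)
                b.values[r * newCols + c] = b.values[r * b.cols + c];
        b.cols = newCols; // logical size shrinks; b.values.length is unchanged
    }

    /** Fixed variant: copies into a compact array matching the logical size. */
    static void dropLastColumnCompact(DenseBlock b) {
        int newCols = b.cols - 1;
        double[] compact = new double[b.rows * newCols];
        for (int r = 0; r < b.rows; r++)
            System.arraycopy(b.values, r * b.cols, compact, r * newCols, newCols);
        b.values = compact;
        b.cols = newCols;
    }

    public static void main(String[] args) {
        double[][] x = {{1, 2, 3}, {4, 5, 6}};
        DenseBlock a = rowSumsWithCorrection(x);
        dropLastColumnInPlace(a);
        DenseBlock b = rowSumsWithCorrection(x);
        dropLastColumnCompact(b);
        // prints "2x1 logical, 4 doubles allocated"
        System.out.println(a.rows + "x" + a.cols + " logical, " + a.values.length + " doubles allocated");
        // prints "2x1 logical, 2 doubles allocated"
        System.out.println(b.rows + "x" + b.cols + " logical, " + b.values.length + " doubles allocated");
    }
}
```

Both variants return the same logical values (6 and 15 here), but only the compact one releases the correction column's memory, which is what the memory estimates assume.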

In a subsequent task, we could also modify all unary aggregates to never 
allocate the multi-column/row output when executed in CP. However, this 
requires custom code paths for the different backends. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
