[ 
https://issues.apache.org/jira/browse/SYSTEMML-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm resolved SYSTEMML-1837.
--------------------------------------
       Resolution: Fixed
         Assignee: Matthias Boehm
    Fix Version/s: SystemML 1.0

> Unary aggregate w/ corrections output to large physical blocks
> --------------------------------------------------------------
>
>                 Key: SYSTEMML-1837
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1837
>             Project: SystemML
>          Issue Type: Bug
>            Reporter: Matthias Boehm
>            Assignee: Matthias Boehm
>             Fix For: SystemML 1.0
>
>
> Many unary aggregate operations store corrections in additional columns or 
> rows. For example, {{rowSums(X)}} uses a two-column output to store sums and 
> corrections. In CP, we drop these corrections immediately after the
> operation, while in MR and Spark they are dropped after the final
> aggregation. The issue is that {{MatrixBlock::dropLastRowsOrColums}} does
> not actually drop the corrections but simply shifts all values to their
> correct starting positions. Hence, the physical output is larger than what
> the memory estimates represent. This leads to unnecessarily large memory
> consumption during subsequent operations and in the buffer pool, which can
> lead to OOMs. This task aims to fix {{MatrixBlock::dropLastRowsOrColums}}.
> In a subsequent task, we could also modify all unary aggregates to never 
> allocate the multi-column/row output when executed in CP. However, this 
> requires custom code paths for the different backends. 
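
To illustrate the difference described above, here is a minimal sketch, assuming a dense row-major double[] layout. It is not SystemML's actual MatrixBlock code; the class and method names are hypothetical. The first method only shifts the kept values forward, so the backing array keeps its original length. The second copies the kept values into a right-sized array.

```java
// Hypothetical illustration only; not the SystemML implementation.
final class DropCorrectionSketch {

    // Shift-only variant: compacts the kept values to the front of the
    // same array. The logical matrix becomes rows x (cols-1), but the
    // backing array still has rows * cols entries.
    static double[] dropLastColumnShiftOnly(double[] a, int rows, int cols) {
        int ncols = cols - 1;
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < ncols; j++) {
                // The target index never exceeds the source index, so a
                // forward pass does not overwrite unread values.
                a[i * ncols + j] = a[i * cols + j];
            }
        }
        return a;
    }

    // Compacting variant: copies the kept values into a new array of
    // exactly rows * (cols-1) entries, so physical size matches logical size.
    static double[] dropLastColumnCompact(double[] a, int rows, int cols) {
        int ncols = cols - 1;
        double[] out = new double[rows * ncols];
        for (int i = 0; i < rows; i++) {
            System.arraycopy(a, i * cols, out, i * ncols, ncols);
        }
        return out;
    }
}
```

For a 3 x 2 block holding (sum, correction) pairs, the shift-only variant returns an array of length 6 whose first 3 entries are the sums, while the compacting variant returns an array of length 3.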



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
