Matthias Boehm created SYSTEMML-946:
---------------------------------------
             Summary: OOM on spark dataframe-matrix / csv-matrix conversion
                 Key: SYSTEMML-946
                 URL: https://issues.apache.org/jira/browse/SYSTEMML-946
             Project: SystemML
          Issue Type: Bug
          Components: Runtime
            Reporter: Matthias Boehm


The decision on dense/sparse block allocation in our dataframeToBinaryBlock and csvToBinaryBlock data converters is based purely on the sparsity. This works very well for the common case of tall-and-skinny matrices. However, in scenarios with dense data but a huge number of columns, a single partition will rarely contain the 1,000 rows needed to fill an entire row of blocks. This leads to unnecessary allocations and dense-sparse conversions, as well as potential out-of-memory errors, because the temporary memory requirement can be up to 1,000x larger than the input partition.


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
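The up-to-1,000x blow-up can be illustrated with a rough back-of-the-envelope calculation (a hypothetical sketch, not SystemML code; the block size of 1,000 matches SystemML's default binary-block dimension, while the one-row partition of a 1M-column dense matrix is an assumed worst-case example):

```java
public class BlockBlowUpSketch {
    public static void main(String[] args) {
        final long blen = 1000;        // SystemML default binary-block size (rows/cols per block)
        final long ncol = 1_000_000;   // assumed: very wide dense input
        final long partRows = 1;       // assumed: partition holds a single row

        // input size of the partition (dense doubles, 8 bytes each)
        long inputBytes = partRows * ncol * 8;

        // if every column block spanned by that row is allocated as a full
        // dense blen x blen block, the temporary allocation is:
        long numColBlocks = (ncol + blen - 1) / blen;
        long tempBytes = numColBlocks * blen * blen * 8;

        System.out.println("input partition: " + inputBytes + " bytes");
        System.out.println("temp allocation: " + tempBytes + " bytes");
        System.out.println("blow-up factor:  " + (tempBytes / inputBytes) + "x");
    }
}
```

Here an 8 MB input partition would temporarily require roughly 8 GB of dense block storage, i.e. the 1,000x factor mentioned above; with many rows per partition (the tall-and-skinny case) the ratio shrinks back toward 1x.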