Re: Passing a CoordinateMatrix to SystemML

2018-01-10 Thread Matthias Boehm
great - I'm glad to hear that. Thanks again for catching these issues Anthony.
Regards, Matthias
On Wed, Jan 10, 2018 at 11:09 AM, Anthony Thomas wrote:
> Hey Matthias,
>
> Just wanted to confirm that patch above works for me - I'm now able to pass
> a dataframe of

Re: Passing a CoordinateMatrix to SystemML

2017-12-25 Thread Matthias Boehm
ok that was very helpful - I just pushed two additional fixes which should resolve these issues. The underlying cause was an incorrect sparse row preallocation (to reduce GC overhead), which resulted in resizing issues for initial sizes of zero. These two patches fix the underlying issues,
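
To make the failure mode above concrete, here is a minimal Java sketch of how a growth policy can break when a sparse row starts with a capacity of zero. This is an illustration under assumptions, not SystemML's actual code: the class `SparseRowSketch` and the methods `naiveGrow` and `safeGrow` are hypothetical names, and the real patches may work differently. The point is only that doubling a capacity of 0 still yields 0, so the next write goes out of bounds unless the new capacity is clamped to what is actually needed.

```java
import java.util.Arrays;

// Hypothetical illustration only; this is NOT SystemML's sparse row class.
public class SparseRowSketch {
    private int[] indexes;    // column indexes of the non-zero cells
    private double[] values;  // values of the non-zero cells
    private int size = 0;     // number of cells currently stored

    public SparseRowSketch(int initialCapacity) {
        indexes = new int[initialCapacity];
        values = new double[initialCapacity];
    }

    // Naive policy: doubling a capacity of 0 returns 0, so the arrays never
    // grow and the following write throws ArrayIndexOutOfBoundsException.
    static int naiveGrow(int capacity) {
        return capacity * 2;
    }

    // Guarded policy: never return less than the number of slots needed.
    // The minimum of 4 is an arbitrary choice for this sketch.
    static int safeGrow(int capacity, int needed) {
        return Math.max(needed, Math.max(4, capacity * 2));
    }

    // Appends one non-zero cell, growing the backing arrays when full.
    public void append(int col, double v) {
        if (size == indexes.length) {
            int newCap = safeGrow(indexes.length, size + 1);
            indexes = Arrays.copyOf(indexes, newCap);
            values = Arrays.copyOf(values, newCap);
        }
        indexes[size] = col;
        values[size] = v;
        size++;
    }

    public int size() { return size; }

    public int capacity() { return indexes.length; }
}
```

With the guarded policy, `new SparseRowSketch(0)` accepts appends normally; swapping `safeGrow` for `naiveGrow` inside `append` would reproduce an out-of-bounds write on the first insert.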

Re: Passing a CoordinateMatrix to SystemML

2017-12-24 Thread Anthony Thomas
Thanks Matthias - unfortunately I'm still running into an ArrayIndexOutOfBounds exception both in reading the file as IJV and when calling dataFrameToBinaryBlock. Just to confirm: I downloaded and compiled the latest version using:
  git clone https://github.com/apache/systemml
  cd systemml
  mvn

Re: Passing a CoordinateMatrix to SystemML

2017-12-24 Thread Matthias Boehm
Hi Anthony, thanks for helping to debug this issue. There are no limits other than the dimensions and number of non-zeros being of type long. It sounds more like an issue with converting special cases of ultra-sparse matrices. I'll try to reproduce this issue and give an update as soon as I

Re: Passing a CoordinateMatrix to SystemML

2017-12-23 Thread Anthony Thomas
Okay, thanks for the suggestions - I upgraded to 1.0 and tried providing dimensions and blocksizes to dataFrameToBinaryBlock, both without success. I additionally wrote out the matrix to HDFS in IJV format and am still getting the same error when calling "read()" directly in the DML. However, I

Re: Passing a CoordinateMatrix to SystemML

2017-12-23 Thread Matthias Boehm
Given the line numbers from the stacktrace, it seems that you are using a rather old version of SystemML. Hence, I would recommend upgrading to SystemML 1.0 or at least 0.15 first. If the error persists or you're not able to upgrade, please try to call dataFrameToBinaryBlock with provided matrix

Re: Passing a CoordinateMatrix to SystemML

2017-12-22 Thread Anthony Thomas
Hi Matthias, Thanks for the help! In response to your questions: 1. Sorry - this was a typo: the correct schema is: [y: int, features: vector] - the column "features" was created using Spark's VectorAssembler and the underlying type is an org.apache.spark.ml.linalg.SparseVector.

Re: Passing a CoordinateMatrix to SystemML

2017-12-22 Thread Matthias Boehm
well, let's do the following to figure this out: 1) If the schema is indeed [label: Integer, features: SparseVector], please change the third line to val y = input_data.select("label"). 2) For debugging, I would recommend using a simple script like "print(sum(X));" and try converting X and

Passing a CoordinateMatrix to SystemML

2017-12-21 Thread Anthony Thomas
Hi SystemML folks, I'm trying to pass some data from Spark to a DML script via the MLContext API. The data is derived from a parquet file containing a dataframe with the schema: [label: Integer, features: SparseVector]. I am doing the following: val input_data =