Great, I'm glad to hear that. Thanks again for catching these issues,
Anthony.
Regards,
Matthias
On Wed, Jan 10, 2018 at 11:09 AM, Anthony Thomas wrote:
> Hey Matthias,
>
> Just wanted to confirm that patch above works for me - I'm now able to pass
> a dataframe of
Ok, that was very helpful. I just pushed two additional fixes, which
should resolve these issues. The underlying cause was an incorrect
sparse row preallocation (intended to reduce GC overhead), which led to
resizing issues for initial sizes of zero. These two patches fix the
underlying issues,
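The failure mode described above can be illustrated with a small self-contained sketch (this is a hypothetical illustration, not SystemML's actual code): a growth policy based on pure capacity doubling never escapes an initial capacity of zero, so the next write fails, while enforcing a minimum capacity of one fixes it.

```java
import java.util.Arrays;

// Hypothetical sketch of a preallocated sparse row. With pure doubling,
// newCapacity = 2 * 0 stays 0 forever and the first append would throw
// ArrayIndexOutOfBoundsException; clamping to a minimum of 1 fixes it.
public class SparseRowSketch {
    private double[] values;
    private int size;

    public SparseRowSketch(int estimatedNnz) {
        // Preallocation to reduce GC overhead; may legitimately be 0.
        values = new double[estimatedNnz];
    }

    // Buggy policy: 2 * 0 == 0, the array never grows.
    private void growBuggy() {
        values = Arrays.copyOf(values, 2 * values.length);
    }

    // Fixed policy: enforce a minimum capacity before doubling.
    private void growFixed() {
        values = Arrays.copyOf(values, Math.max(1, 2 * values.length));
    }

    public void append(double v) {
        if (size >= values.length)
            growFixed(); // growBuggy() would leave length 0 and throw below
        values[size++] = v;
    }

    public int size() {
        return size;
    }

    public static void main(String[] args) {
        SparseRowSketch row = new SparseRowSketch(0); // initial size zero
        for (int i = 0; i < 5; i++)
            row.append(i);
        System.out.println(row.size()); // 5
    }
}
```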
Thanks Matthias - unfortunately I'm still running into an
ArrayIndexOutOfBoundsException, both when reading the file as IJV and when
calling dataFrameToBinaryBlock. Just to confirm: I downloaded and compiled
the latest version using:
git clone https://github.com/apache/systemml
cd systemml
mvn
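For reference, the IJV format mentioned above is a plain-text coordinate (triplet) format: each line holds a 1-based row index, a 1-based column index, and the value, with only non-zero cells listed. A minimal parsing sketch (an illustration only, not SystemML's reader):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustration of the IJV triplet format: "row col value" per line,
// 1-based indices, only non-zeros listed.
public class IjvSketch {
    // Parse IJV lines into a map from "row,col" to value.
    public static Map<String, Double> parse(List<String> lines) {
        Map<String, Double> cells = new HashMap<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            long i = Long.parseLong(parts[0]);   // row index (1-based)
            long j = Long.parseLong(parts[1]);   // column index (1-based)
            double v = Double.parseDouble(parts[2]);
            cells.put(i + "," + j, v);
        }
        return cells;
    }

    public static void main(String[] args) {
        List<String> ijv = Arrays.asList("1 1 3.0", "2 3 -1.5");
        Map<String, Double> m = parse(ijv);
        System.out.println(m.get("2,3")); // -1.5
    }
}
```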
Hi Anthony,
thanks for helping to debug this issue. There are no limits other than
the dimensions and number of non-zeros being of type long. It sounds
more like an issue with converting special cases of ultra-sparse
matrices. I'll try to reproduce this issue and give an update as soon as
I
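The point about dimensions and non-zero counts being of type long matters in practice: the cell count of even a moderately sized matrix overflows 32-bit int arithmetic. A small sketch (illustrative numbers, not tied to Anthony's data):

```java
// Why matrix dimensions and nnz must be long: rows * cols overflows int
// for matrices as small as 100,000 x 100,000 (10^10 cells).
public class DimsSketch {
    // Total cell count; must use long arithmetic throughout.
    public static long numCells(long rows, long cols) {
        return rows * cols;
    }

    // Sparsity = fraction of cells that are non-zero; "ultra-sparse"
    // matrices have nnz far below the number of rows.
    public static double sparsity(long rows, long cols, long nnz) {
        return (double) nnz / numCells(rows, cols);
    }

    public static void main(String[] args) {
        long rows = 100_000L, cols = 100_000L, nnz = 1_000_000L;
        System.out.println(numCells(rows, cols));      // 10000000000
        System.out.println(100_000 * 100_000);         // int overflow: 1410065408
        System.out.println(sparsity(rows, cols, nnz)); // 1.0E-4
    }
}
```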
Okay, thanks for the suggestions. I upgraded to 1.0 and tried providing
dimensions and block sizes to dataFrameToBinaryBlock, both without success. I
additionally wrote out the matrix to hdfs in IJV format and am still
getting the same error when calling "read()" directly in the DML. However,
I
Given the line numbers in the stack trace, it seems that you are using a
rather old version of SystemML. Hence, I would recommend upgrading to
SystemML 1.0, or at least 0.15, first.
If the error persists or you're not able to upgrade, please try calling
dataFrameToBinaryBlock with provided matrix
Hi Matthias,
Thanks for the help! In response to your questions:
1. Sorry - this was a typo: the correct schema is [y: int, features:
vector]. The column "features" was created using Spark's VectorAssembler,
and the underlying type is an org.apache.spark.ml.linalg.SparseVector.
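For readers unfamiliar with that type: Spark's SparseVector stores a logical length plus two parallel arrays, the positions of the non-zeros and their values. The plain-Java mimic below (an illustration only, not Spark's class) shows that representation and a cheap sum over the stored values:

```java
// Plain-Java mimic of how org.apache.spark.ml.linalg.SparseVector
// represents data: a logical size plus parallel indices/values arrays.
// Illustration only; not the Spark class itself.
public class SparseVecSketch {
    final int size;        // logical vector length
    final int[] indices;   // positions of the non-zeros, strictly increasing
    final double[] values; // non-zero values, parallel to indices

    SparseVecSketch(int size, int[] indices, double[] values) {
        this.size = size;
        this.indices = indices;
        this.values = values;
    }

    // Dense lookup: scan the index array, default to 0.0 for absent cells.
    double apply(int i) {
        for (int k = 0; k < indices.length; k++)
            if (indices[k] == i)
                return values[k];
        return 0.0;
    }

    // Sum of the stored values; a cheap sanity check on a conversion.
    double sum() {
        double s = 0;
        for (double v : values)
            s += v;
        return s;
    }

    public static void main(String[] args) {
        SparseVecSketch v =
            new SparseVecSketch(5, new int[]{1, 3}, new double[]{2.0, 4.5});
        System.out.println(v.apply(3)); // 4.5
        System.out.println(v.sum());    // 6.5
    }
}
```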
Well, let's do the following to figure this out:
1) If the schema is indeed [label: Integer, features: SparseVector],
please change the third line to val y = input_data.select("label").
2) For debugging, I would recommend using a simple script like
"print(sum(X));" and try converting X and
Hi SystemML folks,
I'm trying to pass some data from Spark to a DML script via the MLContext
API. The data is derived from a parquet file containing a dataframe with
the schema: [label: Integer, features: SparseVector]. I am doing the
following:
val input_data =