Great - I'm glad to hear that. Thanks again for catching these issues,
Anthony.
Regards,
Matthias
On Wed, Jan 10, 2018 at 11:09 AM, Anthony Thomas wrote:
Hey Matthias,
Just wanted to confirm that patch above works for me - I'm now able to pass
a dataframe of sparse vectors to a DML script without issue. Sorry for the
slow confirmation on this - I've been out of the office for the last couple
weeks. Thanks for your help debugging this!
Best,
Antho
Ok, that was very helpful - I just pushed two additional fixes which
should resolve these issues. The underlying cause was an incorrect
sparse row preallocation (to reduce GC overhead), which resulted in
resizing issues for initial sizes of zero. These two patches fix the
underlying issues, make
Thanks Matthias - unfortunately I'm still running into an
ArrayIndexOutOfBoundsException, both when reading the file as IJV and when
calling dataFrameToBinaryBlock. Just to confirm: I downloaded and compiled
the latest version using:
git clone https://github.com/apache/systemml
cd systemml
mvn clean
Thanks again for catching this issue, Anthony - this IJV reblock issue
with large ultra-sparse matrices is now fixed in master. It likely did
not show up on the 1% sample because the data was small enough to be read
directly into the driver.
However, the dataFrameToBinaryBlock might be another
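For readers following the thread: IJV (SystemML's "text" format) stores one 1-based `row col value` triple per non-zero, which is why ultra-sparse data is compact in it. A minimal, illustrative Scala helper for emitting it (the function name and sample triples are hypothetical, not from the thread):

```scala
// Illustrative only: serialize sparse (row, col, value) triples to IJV text.
// IJV uses 1-based indices and one non-zero per line.
def toIJV(triples: Seq[(Long, Long, Double)]): String =
  triples.map { case (i, j, v) => s"$i $j $v" }.mkString("\n")

val lines = toIJV(Seq((1L, 3L, 2.5), (7L, 1L, -1.0)))
println(lines) // prints: 1 3 2.5  then  7 1 -1.0
```

When reading such a file back in DML, the dimensions and non-zero count are typically supplied via metadata, since they cannot be inferred cheaply from the triples alone.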
Hi Anthony,
thanks for helping to debug this issue. There are no limits other than
the dimensions and number of non-zeros being of type long. It sounds
more like an issue with converting special cases of ultra-sparse
matrices. I'll try to reproduce this issue and give an update as soon as
I kn
Okay, thanks for the suggestions - I upgraded to 1.0 and tried providing
dimensions and block sizes to dataFrameToBinaryBlock, both without success. I
additionally wrote out the matrix to hdfs in IJV format and am still
getting the same error when calling "read()" directly in the DML. However,
I creat
Given the line numbers from the stacktrace, it seems that you are using a
rather old version of SystemML. Hence, I would recommend upgrading to
SystemML 1.0, or at least 0.15, first.
If the error persists or you're not able to upgrade, please try to call
dataFrameToBinaryBlock with provided matrix ch
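The suggestion above (calling the converter with explicit matrix characteristics) might look roughly like this sketch. The class and method names follow SystemML's internal `RDDConverterUtils` API as I understand it, and `df`, `nRows`, `nCols`, and `nnz` are assumed placeholders, not values from the thread:

```scala
// Sketch, not a verified invocation: convert a DataFrame of vectors to
// SystemML binary blocks with dimensions and nnz provided up front, so the
// converter does not have to infer them for an ultra-sparse input.
import org.apache.spark.api.java.JavaSparkContext
import org.apache.sysml.runtime.instructions.spark.utils.RDDConverterUtils
import org.apache.sysml.runtime.matrix.MatrixCharacteristics

// nRows, nCols, nnz: known shape and non-zero count (placeholders).
val mc = new MatrixCharacteristics(nRows, nCols, 1000, 1000, nnz)
val blocks = RDDConverterUtils.dataFrameToBinaryBlock(
  new JavaSparkContext(spark.sparkContext),
  df, mc,
  false, // containsID: no explicit row-index column in df
  true)  // isVector: the data column holds ml.linalg vectors
```

Providing the characteristics up front avoids a separate pass over the data to compute them, which matters for large sparse inputs.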
Hi Matthias,
Thanks for the help! In response to your questions:
1. Sorry - this was a typo: the correct schema is: [y: int, features:
vector] - the column "features" was created using Spark's VectorAssembler
and the underlying type is an org.apache.spark.ml.linalg.SparseVector.
Calli
Well, let's do the following to figure this out:
1) If the schema is indeed [label: Integer, features: SparseVector],
please change the third line to val y = input_data.select("label").
2) For debugging, I would recommend to use a simple script like
"print(sum(X));" and try converting X and y
Hi SystemML folks,
I'm trying to pass some data from Spark to a DML script via the MLContext
API. The data is derived from a parquet file containing a dataframe with
the schema: [label: Integer, features: SparseVector]. I am doing the
following:
val input_data = spark.read.parquet(inputPa