[jira] [Created] (SYSTEMML-1279) EOFException in MinMaxMean example snippet

2017-02-16 Thread JIRA
Felix Schüler created SYSTEMML-1279:
---

 Summary: EOFException in MinMaxMean example snippet
 Key: SYSTEMML-1279
 URL: https://issues.apache.org/jira/browse/SYSTEMML-1279
 Project: SystemML
  Issue Type: Bug
Reporter: Felix Schüler
Priority: Minor


Our current documentation contains a snippet for a short DML script:

{code}
val numRows = 1
val numCols = 1000
val data = sc.parallelize(0 to numRows-1).map { _ => 
Row.fromSeq(Seq.fill(numCols)(Random.nextDouble)) }
val schema = StructType((0 to numCols-1).map { i => StructField("C" + i, 
DoubleType, true) } )
val df = spark.createDataFrame(data, schema)



val minMaxMean =
"""
minOut = min(Xin)
maxOut = max(Xin)
meanOut = mean(Xin)
"""
val mm = new MatrixMetadata(numRows, numCols)
val minMaxMeanScript = dml(minMaxMean).in("Xin", df, mm).out("minOut", 
"maxOut", "meanOut")
val (min, max, mean) = ml.execute(minMaxMeanScript).getTuple[Double, Double, 
Double]("minOut", "maxOut", "meanOut")
{code}
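
Running the snippet in the spark-shell also assumes the MLContext preamble from the surrounding documentation; a minimal sketch of that setup (an assumption, adjust to your SystemML version):

{code}
// Assumed spark-shell preamble for the snippet above (not part of the report):
import org.apache.sysml.api.mlcontext._                // MLContext, MatrixMetadata
import org.apache.sysml.api.mlcontext.ScriptFactory._  // dml(...)
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructType, StructField, DoubleType}
import scala.util.Random

val ml = new MLContext(sc)
{code}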

Execution of the line 
{code}
val minMaxMeanScript = dml(minMaxMean).in("Xin", df, mm).out("minOut", 
"maxOut", "meanOut")
{code}

in the spark-shell leads to the following error:

{code}
scala> val minMaxMeanScript = dml(minMaxMean).in("Xin", df, mm).out("minOut", 
"maxOut", "meanOut")
[Stage 0:>  (0 + 4) / 
4]17/02/16 13:37:10 WARN CodeGenerator: Error calculating stats of compiled 
class.
java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:197)
at java.io.DataInputStream.readFully(DataInputStream.java:169)
at org.codehaus.janino.util.ClassFile.loadAttribute(ClassFile.java:1509)
at org.codehaus.janino.util.ClassFile.loadAttributes(ClassFile.java:644)
at org.codehaus.janino.util.ClassFile.loadFields(ClassFile.java:623)
at org.codehaus.janino.util.ClassFile.<init>(ClassFile.java:280)
at 
org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anonfun$recordCompilationStats$1.apply(CodeGenerator.scala:967)
at 
org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anonfun$recordCompilationStats$1.apply(CodeGenerator.scala:964)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at 
org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.recordCompilationStats(CodeGenerator.scala:964)
at 
org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:936)
at 
org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:998)
at 
org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:995)
at 
org.spark_project.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
at 
org.spark_project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
at 
org.spark_project.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
at 
org.spark_project.guava.cache.LocalCache$Segment.get(LocalCache.java:2257)
at org.spark_project.guava.cache.LocalCache.get(LocalCache.java:4000)
at 
org.spark_project.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004)
at 
org.spark_project.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
at 
org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:890)
at 
org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection$.create(GenerateUnsafeProjection.scala:405)
at 
org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection$.create(GenerateUnsafeProjection.scala:359)
at 
org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection$.create(GenerateUnsafeProjection.scala:32)
at 
org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:874)
at 
org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.extractProjection$lzycompute(ExpressionEncoder.scala:266)
at 
org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.extractProjection(ExpressionEncoder.scala:266)
at 
org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:290)
at 
org.apache.spark.sql.SparkSession$$anonfun$3.apply(SparkSession.scala:547)
at 
org.apache.spark.sql.SparkSession$$anonfun$3.apply(SparkSession.scala:547)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at 

[jira] [Commented] (SYSTEMML-1238) Python test failing for LinearRegCG

2017-02-16 Thread Niketan Pansare (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15871102#comment-15871102
 ] 

Niketan Pansare commented on SYSTEMML-1238:
---

1. I have verified that the mllearn API in 0.12.0 produces correct results.
2. No changes have been introduced in the Python/Scala wrappers that would 
affect this. The only change I see in the algorithm since 0.12.0 is cbind, so 
the bug is likely a side-effect of some other change.
3. I verified that the Python wrappers are passing correct inputs to the DML 
script by writing the input X, y to file and comparing them with the original 
python data.

I tested LinRegDS:
A. commandline:
{code}
 ~/spark-2.1.0-bin-hadoop2.7/bin/spark-submit SystemML.jar -f LinearRegDS.dml 
-nvargs X=X.csv Y=y.csv B=B.csv fmt=csv icpt=1 tol=0.01 reg=1
Calling the Direct Solver...
Computing the statistics...
17/02/16 21:02:52 INFO MapPartitionsRDD: Removing RDD 17 from persistence list
17/02/16 21:02:52 INFO BlockManager: Removing RDD 17
AVG_TOT_Y,152.13348416289594
STDEV_TOT_Y,77.09300453299106
AVG_RES_Y,-2.935409582574532E-14
STDEV_RES_Y,66.48545020578437
DISPERSION,4420.315089065834
PLAIN_R2,0.2579428201690507
ADJUSTED_R2,0.2562563265785258
PLAIN_R2_NOBIAS,0.2579428201690507
ADJUSTED_R2_NOBIAS,0.2562563265785258
Writing the output matrix...
END LINEAR REGRESSION SCRIPT
{code}

B. mllearn:
{code}
Calling the Direct Solver...
Computing the statistics...
AVG_TOT_Y,153.36255924170615
STDEV_TOT_Y,77.21853383600028
AVG_RES_Y,4.8020565933360324E-14
STDEV_RES_Y,67.06389890324985
DISPERSION,4497.566536105316
PLAIN_R2,0.24750834362605834
ADJUSTED_R2,0.24571669682516795
PLAIN_R2_NOBIAS,0.24750834362605834
ADJUSTED_R2_NOBIAS,0.24571669682516795
Writing the output matrix...
END LINEAR REGRESSION SCRIPT
lr
{code}

> Python test failing for LinearRegCG
> ---
>
> Key: SYSTEMML-1238
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1238
> Project: SystemML
>  Issue Type: Bug
>  Components: Algorithms, APIs
>Affects Versions: SystemML 0.13
>Reporter: Imran Younus
>Assignee: Niketan Pansare
> Attachments: python_LinearReg_test_spark.1.6.log, 
> python_LinearReg_test_spark.2.1.log
>
>
> [~deron] discovered that one of the python tests ({{test_mllearn_df.py}}) 
> with spark 2.1.0 was failing because the test score from linear regression 
> was very low ({{~ 0.24}}). I did some investigation, and it turns out that 
> the model parameters computed by the dml script are incorrect. In SystemML 
> 0.12, the values of the betas from the linear regression model are 
> {{\[152.919, 938.237\]}}, which is what we expect from the normal equation 
> (I also tested this with sklearn). But the values of the betas from SystemML 
> 0.13 (with spark 2.1.0) come out to be {{\[153.146, 458.489\]}}. These are 
> not correct, and therefore the test score is much lower than expected. The 
> data going into the DML script is correct: I printed out the values of {{X}} 
> and {{Y}} in dml and didn't see any issue there.
> Attached are the log files for the two tests (SystemML 0.12 and 0.13), run 
> with the explain flag.
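
For reference, "what we expect from the normal equation" above is b = (X^T X)^-1 X^T y; a minimal Scala sketch of that reference check using Breeze (an illustration, not the SystemML implementation; it assumes X already carries an intercept column):

{code}
// Normal-equation sketch for cross-checking the betas reported above.
// Assumes X already includes an intercept column of ones.
import breeze.linalg.{DenseMatrix, DenseVector, inv}

def normalEquationBetas(x: DenseMatrix[Double], y: DenseVector[Double]): DenseVector[Double] =
  inv(x.t * x) * (x.t * y)
{code}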



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (SYSTEMML-1279) EOFException in MinMaxMean example snippet

2017-02-16 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15870769#comment-15870769
 ] 

Felix Schüler commented on SYSTEMML-1279:
-

Might be related to https://issues.apache.org/jira/browse/SPARK-17131

> EOFException in MinMaxMean example snippet
> --
>
> Key: SYSTEMML-1279
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1279
> Project: SystemML
>  Issue Type: Bug
>Reporter: Felix Schüler
>Priority: Minor
>
> Our current documentation contains a snippet for a short DML script:
> {code}
> val numRows = 1
> val numCols = 1000
> val data = sc.parallelize(0 to numRows-1).map { _ => 
> Row.fromSeq(Seq.fill(numCols)(Random.nextDouble)) }
> val schema = StructType((0 to numCols-1).map { i => StructField("C" + i, 
> DoubleType, true) } )
> val df = spark.createDataFrame(data, schema)
> val minMaxMean =
> """
> minOut = min(Xin)
> maxOut = max(Xin)
> meanOut = mean(Xin)
> """
> val mm = new MatrixMetadata(numRows, numCols)
> val minMaxMeanScript = dml(minMaxMean).in("Xin", df, mm).out("minOut", 
> "maxOut", "meanOut")
> val (min, max, mean) = ml.execute(minMaxMeanScript).getTuple[Double, Double, 
> Double]("minOut", "maxOut", "meanOut")
> {code}
> Execution of the line 
> {code}
> val minMaxMeanScript = dml(minMaxMean).in("Xin", df, mm).out("minOut", 
> "maxOut", "meanOut")
> {code}
> in the spark-shell leads to the following error:
> {code}
> scala> val minMaxMeanScript = dml(minMaxMean).in("Xin", df, mm).out("minOut", 
> "maxOut", "meanOut")
> [Stage 0:>  (0 + 4) / 
> 4]17/02/16 13:37:10 WARN CodeGenerator: Error calculating stats of compiled 
> class.
> java.io.EOFException
>   at java.io.DataInputStream.readFully(DataInputStream.java:197)
>   at java.io.DataInputStream.readFully(DataInputStream.java:169)
>   at org.codehaus.janino.util.ClassFile.loadAttribute(ClassFile.java:1509)
>   at org.codehaus.janino.util.ClassFile.loadAttributes(ClassFile.java:644)
>   at org.codehaus.janino.util.ClassFile.loadFields(ClassFile.java:623)
>   at org.codehaus.janino.util.ClassFile.<init>(ClassFile.java:280)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anonfun$recordCompilationStats$1.apply(CodeGenerator.scala:967)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anonfun$recordCompilationStats$1.apply(CodeGenerator.scala:964)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.recordCompilationStats(CodeGenerator.scala:964)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:936)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:998)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:995)
>   at 
> org.spark_project.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
>   at 
> org.spark_project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
>   at 
> org.spark_project.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
>   at 
> org.spark_project.guava.cache.LocalCache$Segment.get(LocalCache.java:2257)
>   at org.spark_project.guava.cache.LocalCache.get(LocalCache.java:4000)
>   at 
> org.spark_project.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004)
>   at 
> org.spark_project.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:890)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection$.create(GenerateUnsafeProjection.scala:405)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection$.create(GenerateUnsafeProjection.scala:359)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection$.create(GenerateUnsafeProjection.scala:32)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:874)
>   at 
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.extractProjection$lzycompute(ExpressionEncoder.scala:266)
>   at 
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.extractProjection(ExpressionEncoder.scala:266)
>   at 
> 

[jira] [Updated] (SYSTEMML-1280) Restore and deprecate SQLContext methods

2017-02-16 Thread Deron Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deron Eriksson updated SYSTEMML-1280:
-
Description: 
SYSTEMML-1194 replaced SQLContext with SparkSession in SystemML for Spark 2.1.0 
since SQLContext is deprecated.

Restore the old Java SQLContext method signatures in case any users are using 
SystemML methods and are unable to use SparkSessions (SparkSessions are 
generally easy to create, as described in 
https://databricks.com/blog/2016/08/15/how-to-use-sparksession-in-apache-spark-2-0.html)

Classes where this applies:
old MLContext class (whole class is deprecated)
old MLMatrix class (whole class is deprecated)
old MLOutput class (whole class is deprecated)
FrameRDDConverterUtils (this is a non-API class)
RDDConverterUtils (this is a non-API class)
RDDConverterUtilsExt (this is a non-API class)

In non-API classes, these SQLContext methods should be marked as deprecated and 
removed in a future version of SystemML (1.0) since SparkSessions should 
generally be used with Spark 2. As mentioned in the SQLContext documentation, "As 
of Spark 2.0, this is replaced by SparkSession. However, we are keeping the 
class here for backward compatibility."
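
A minimal sketch of this pattern in Scala (method and type names are illustrative, not the actual SystemML signatures):

{code}
import org.apache.spark.sql.{DataFrame, SQLContext, SparkSession}

object ConverterSketch {
  // New-style entry point that callers should migrate to.
  def convert(spark: SparkSession, df: DataFrame): DataFrame = df

  // Restored SQLContext overload: kept for backward compatibility, marked
  // deprecated, and delegating to the SparkSession variant.
  @deprecated("SQLContext is deprecated in Spark 2.x; use the SparkSession variant.", "0.13")
  def convert(sqlContext: SQLContext, df: DataFrame): DataFrame =
    convert(sqlContext.sparkSession, df)
}
{code}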



  was:
SYSTEMML-1194 replaced SQLContext with SparkSession in SystemML for Spark 2.1.0 
since SQLContext is deprecated.

Restore the old Java SQLContext method signatures in case any users are using 
SystemML methods and are unable to use SparkSessions (SparkSessions are 
generally easy to create, as described in 
https://databricks.com/blog/2016/08/15/how-to-use-sparksession-in-apache-spark-2-0.html)

Classes where this applies:
old MLContext class (whole class is deprecated)
old MLMatrix class (whole class is deprecated)
old MLOutput class (whole class is deprecated)
FrameRDDConverterUtils (this is a non-API class)
RDDConverterUtils (this is a non-API class)
RDDConverterUtilsExt (this is a non-API class)

In non-API classes, these SQLContext methods should be marked as deprecated and 
removed in a future version of SystemML (1.0) since SparkSessions should be 
used with Spark 2. See 
https://databricks.com/blog/2016/08/15/how-to-use-sparksession-in-apache-spark-2-0.html




> Restore and deprecate SQLContext methods
> 
>
> Key: SYSTEMML-1280
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1280
> Project: SystemML
>  Issue Type: Task
>  Components: APIs, Runtime
>Reporter: Deron Eriksson
>Assignee: Deron Eriksson
>
> SYSTEMML-1194 replaced SQLContext with SparkSession in SystemML for Spark 
> 2.1.0 since SQLContext is deprecated.
> Restore the old Java SQLContext method signatures in case any users are using 
> SystemML methods and are unable to use SparkSessions (SparkSessions are 
> generally easy to create, as described in 
> https://databricks.com/blog/2016/08/15/how-to-use-sparksession-in-apache-spark-2-0.html)
> Classes where this applies:
> old MLContext class (whole class is deprecated)
> old MLMatrix class (whole class is deprecated)
> old MLOutput class (whole class is deprecated)
> FrameRDDConverterUtils (this is a non-API class)
> RDDConverterUtils (this is a non-API class)
> RDDConverterUtilsExt (this is a non-API class)
> In non-API classes, these SQLContext methods should be marked as deprecated 
> and removed in a future version of SystemML (1.0) since SparkSessions should 
> generally be used with Spark 2. As mentioned in the SQLContext documentation, "As 
> of Spark 2.0, this is replaced by SparkSession. However, we are keeping the 
> class here for backward compatibility."



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (SYSTEMML-1281) OOM Error On Binary Write

2017-02-16 Thread Mike Dusenberry (JIRA)
Mike Dusenberry created SYSTEMML-1281:
-

 Summary: OOM Error On Binary Write
 Key: SYSTEMML-1281
 URL: https://issues.apache.org/jira/browse/SYSTEMML-1281
 Project: SystemML
  Issue Type: Bug
Affects Versions: SystemML 0.13
Reporter: Mike Dusenberry
Priority: Blocker


I'm running into the following heap space OOM error while attempting to save a 
large Spark DataFrame to a SystemML binary format via DML {{write}} statements.

{code}
tr_sample_filename = os.path.join("data", "train_{}{}.parquet".format(size, 
"_grayscale" if grayscale else ""))
val_sample_filename = os.path.join("data", "val_{}{}.parquet".format(size, 
"_grayscale" if grayscale else ""))
train_df = sqlContext.read.load(tr_sample_filename)
val_df = sqlContext.read.load(val_sample_filename)
train_df, val_df

# Note: Must use the row index column, or X may not
# necessarily correspond correctly to Y
X_df = train_df.select("__INDEX", "sample")
X_val_df = val_df.select("__INDEX", "sample")
y_df = train_df.select("__INDEX", "tumor_score")
y_val_df = val_df.select("__INDEX", "tumor_score")
X_df, X_val_df, y_df, y_val_df

script = """
# Scale images to [-1,1]
X = X / 255
X_val = X_val / 255
X = X * 2 - 1
X_val = X_val * 2 - 1

# One-hot encode the labels
num_tumor_classes = 3
n = nrow(y)
n_val = nrow(y_val)
Y = table(seq(1, n), y, n, num_tumor_classes)
Y_val = table(seq(1, n_val), y_val, n_val, num_tumor_classes)
"""
outputs = ("X", "X_val", "Y", "Y_val")
script = dml(script).input(X=X_df, X_val=X_val_df, y=y_df, 
y_val=y_val_df).output(*outputs)
X, X_val, Y, Y_val = ml.execute(script).get(*outputs)
X, X_val, Y, Y_val

script = """
write(X, "data/systemml/X_"+size+"_"+c+"_binary", format="binary")
write(Y, "data/systemml/Y_"+size+"_"+c+"_binary", format="binary")
write(X_val, "data/systemml/X_val_"+size+"_"+c+"_binary", format="binary")
write(Y_val, "data/systemml/Y_val_"+size+"_"+c+"_binary", format="binary")
"""
script = dml(script).input(X=X, X_val=X_val, Y=Y, Y_val=Y_val, size=size, c=c)
ml.execute(script)
{code}

{code}
Caused by: org.apache.sysml.api.mlcontext.MLContextException: Exception 
occurred while executing runtime program
at 
org.apache.sysml.api.mlcontext.ScriptExecutor.executeRuntimeProgram(ScriptExecutor.java:371)
at 
org.apache.sysml.api.mlcontext.ScriptExecutor.execute(ScriptExecutor.java:292)
at org.apache.sysml.api.mlcontext.MLContext.execute(MLContext.java:293)
... 12 more
Caused by: org.apache.sysml.runtime.DMLRuntimeException: 
org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in program 
block generated from statement block between lines 1 and 11 -- Error evaluating 
instruction: CP°mvvar°X°¶_Var49¶°binaryblock
at 
org.apache.sysml.runtime.controlprogram.Program.execute(Program.java:130)
at 
org.apache.sysml.api.mlcontext.ScriptExecutor.executeRuntimeProgram(ScriptExecutor.java:369)
... 14 more
Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error 
in program block generated from statement block between lines 1 and 11 -- Error 
evaluating instruction: CP°mvvar°X°¶_Var49¶°binaryblock
at 
org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:320)
at 
org.apache.sysml.runtime.controlprogram.ProgramBlock.executeInstructions(ProgramBlock.java:221)
at 
org.apache.sysml.runtime.controlprogram.ProgramBlock.execute(ProgramBlock.java:168)
at 
org.apache.sysml.runtime.controlprogram.Program.execute(Program.java:123)
... 15 more
Caused by: org.apache.sysml.runtime.controlprogram.caching.CacheException: Move 
to data/systemml/X_256_3_binary failed.
at 
org.apache.sysml.runtime.controlprogram.caching.CacheableData.moveData(CacheableData.java:1329)
at 
org.apache.sysml.runtime.instructions.cp.VariableCPInstruction.processMoveInstruction(VariableCPInstruction.java:706)
at 
org.apache.sysml.runtime.instructions.cp.VariableCPInstruction.processInstruction(VariableCPInstruction.java:511)
at 
org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:290)
... 18 more
Caused by: org.apache.sysml.runtime.controlprogram.caching.CacheException: 
Export to data/systemml/X_256_3_binary failed.
at 
org.apache.sysml.runtime.controlprogram.caching.CacheableData.exportData(CacheableData.java:800)
at 
org.apache.sysml.runtime.controlprogram.caching.CacheableData.exportData(CacheableData.java:688)
at 
org.apache.sysml.runtime.controlprogram.caching.CacheableData.moveData(CacheableData.java:1315)
... 21 more
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
Task 269 in stage 40.0 failed 4 times, most recent failure: Lost task 269.3 in 
stage 40.0 (TID 61177, 9.30.110.145, executor 10): ExecutorLostFailure 

[jira] [Resolved] (SYSTEMML-1280) Restore and deprecate SQLContext methods

2017-02-16 Thread Deron Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deron Eriksson resolved SYSTEMML-1280.
--
   Resolution: Fixed
Fix Version/s: SystemML 0.13

Fixed by [PR396|https://github.com/apache/incubator-systemml/pull/396].

> Restore and deprecate SQLContext methods
> 
>
> Key: SYSTEMML-1280
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1280
> Project: SystemML
>  Issue Type: Task
>  Components: APIs, Runtime
>Reporter: Deron Eriksson
>Assignee: Deron Eriksson
> Fix For: SystemML 0.13
>
>
> SYSTEMML-1194 replaced SQLContext with SparkSession in SystemML for Spark 
> 2.1.0 since SQLContext is deprecated.
> Restore the old Java SQLContext method signatures in case any users are using 
> SystemML methods and are unable to use SparkSessions (SparkSessions are 
> described in 
> https://databricks.com/blog/2016/08/15/how-to-use-sparksession-in-apache-spark-2-0.html)
> Classes where this applies:
> old MLContext class (whole class is deprecated)
> old MLMatrix class (whole class is deprecated)
> old MLOutput class (whole class is deprecated)
> FrameRDDConverterUtils (this is a non-API class)
> RDDConverterUtils (this is a non-API class)
> RDDConverterUtilsExt (this is a non-API class)
> In non-API classes, these SQLContext methods should be marked as deprecated 
> and removed in a future version of SystemML (1.0) since SparkSessions should 
> generally be used with Spark 2. As mentioned in the SQLContext documentation, "As 
> of Spark 2.0, this is replaced by SparkSession. However, we are keeping the 
> class here for backward compatibility."



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (SYSTEMML-1280) Restore and deprecate SQLContext methods

2017-02-16 Thread Deron Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deron Eriksson closed SYSTEMML-1280.


> Restore and deprecate SQLContext methods
> 
>
> Key: SYSTEMML-1280
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1280
> Project: SystemML
>  Issue Type: Task
>  Components: APIs, Runtime
>Reporter: Deron Eriksson
>Assignee: Deron Eriksson
> Fix For: SystemML 0.13
>
>
> SYSTEMML-1194 replaced SQLContext with SparkSession in SystemML for Spark 
> 2.1.0 since SQLContext is deprecated.
> Restore the old Java SQLContext method signatures in case any users are using 
> SystemML methods and are unable to use SparkSessions (SparkSessions are 
> described in 
> https://databricks.com/blog/2016/08/15/how-to-use-sparksession-in-apache-spark-2-0.html)
> Classes where this applies:
> old MLContext class (whole class is deprecated)
> old MLMatrix class (whole class is deprecated)
> old MLOutput class (whole class is deprecated)
> FrameRDDConverterUtils (this is a non-API class)
> RDDConverterUtils (this is a non-API class)
> RDDConverterUtilsExt (this is a non-API class)
> In non-API classes, these SQLContext methods should be marked as deprecated 
> and removed in a future version of SystemML (1.0) since SparkSessions should 
> generally be used with Spark 2. As mentioned in the SQLContext documentation, "As 
> of Spark 2.0, this is replaced by SparkSession. However, we are keeping the 
> class here for backward compatibility."



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (SYSTEMML-1277) DataFrames With `mllib.Vector` Columns Are No Longer Converted to Matrices.

2017-02-16 Thread Deron Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deron Eriksson reassigned SYSTEMML-1277:


Assignee: Deron Eriksson

> DataFrames With `mllib.Vector` Columns Are No Longer Converted to Matrices.
> ---
>
> Key: SYSTEMML-1277
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1277
> Project: SystemML
>  Issue Type: Bug
>Affects Versions: SystemML 0.13
>Reporter: Mike Dusenberry
>Assignee: Deron Eriksson
>Priority: Blocker
>
> Recently, we made the switch from the old {{mllib.Vector}} to the new 
> {{ml.Vector}} type.  Unfortunately, this leaves us with the issue of no 
> longer recognizing DataFrames with {{mllib.Vector}} columns during 
> conversion, and thus, we (1) do not correctly convert to SystemML {{Matrix}} 
> objects, (2) instead fall back on conversion to {{Frame}} objects, and then 
> (3) fail completely when the ensuing DML script expects to operate on 
> matrices.
> Given a Spark {{DataFrame}} {{X_df}} of type {{DataFrame\[__INDEX: int, 
> sample: vector\]}}, where {{vector}} is of type {{mllib.Vector}}, the 
> following script will now fail (it did not previously):
> {code}
> script = """
> # Scale images to [-1,1]
> X = X / 255
> X = X * 2 - 1
> """
> outputs = ("X")
> script = dml(script).input(X=X_df).output(*outputs)
> X = ml.execute(script).get(*outputs)
> X
> {code}
> {code}
> Caused by: org.apache.sysml.api.mlcontext.MLContextException: Exception 
> occurred while validating script
>   at 
> org.apache.sysml.api.mlcontext.ScriptExecutor.validateScript(ScriptExecutor.java:487)
>   at 
> org.apache.sysml.api.mlcontext.ScriptExecutor.execute(ScriptExecutor.java:280)
>   at org.apache.sysml.api.mlcontext.MLContext.execute(MLContext.java:293)
>   ... 12 more
> Caused by: org.apache.sysml.parser.LanguageException: Invalid Parameters : 
> ERROR: null -- line 4, column 4 -- Invalid Datatypes for operation FRAME 
> SCALAR
>   at 
> org.apache.sysml.parser.Expression.raiseValidateError(Expression.java:549)
>   at 
> org.apache.sysml.parser.Expression.computeDataType(Expression.java:415)
>   at 
> org.apache.sysml.parser.Expression.computeDataType(Expression.java:386)
>   at 
> org.apache.sysml.parser.BinaryExpression.validateExpression(BinaryExpression.java:130)
>   at 
> org.apache.sysml.parser.StatementBlock.validate(StatementBlock.java:567)
>   at 
> org.apache.sysml.parser.DMLTranslator.validateParseTree(DMLTranslator.java:140)
>   at 
> org.apache.sysml.api.mlcontext.ScriptExecutor.validateScript(ScriptExecutor.java:485)
>   ... 14 more
> {code}
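
Until the converters recognize {{mllib.Vector}} again, a possible user-side workaround (a sketch, not the fix for this issue) is to upgrade legacy vector columns to {{ml.Vector}} before handing the DataFrame to SystemML; Spark 2.x ships a helper for this (also available in PySpark as {{pyspark.mllib.util.MLUtils}}):

{code}
// Workaround sketch: convert the legacy mllib.Vector column ("sample" in the
// report above) to the new ml.Vector type so the Matrix conversion applies.
import org.apache.spark.mllib.util.MLUtils

val X_df_ml = MLUtils.convertVectorColumnsToML(X_df, "sample")
{code}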



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (SYSTEMML-1279) EOFException in MinMaxMean example snippet

2017-02-16 Thread Deron Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deron Eriksson reassigned SYSTEMML-1279:


Assignee: Felix Schüler

> EOFException in MinMaxMean example snippet
> --
>
> Key: SYSTEMML-1279
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1279
> Project: SystemML
>  Issue Type: Bug
>Reporter: Felix Schüler
>Assignee: Felix Schüler
>Priority: Minor
> Fix For: SystemML 0.13
>
>
> Our current documentation contains a snippet for a short DML script:
> {code}
> val numRows = 1
> val numCols = 1000
> val data = sc.parallelize(0 to numRows-1).map { _ => 
> Row.fromSeq(Seq.fill(numCols)(Random.nextDouble)) }
> val schema = StructType((0 to numCols-1).map { i => StructField("C" + i, 
> DoubleType, true) } )
> val df = spark.createDataFrame(data, schema)
> val minMaxMean =
> """
> minOut = min(Xin)
> maxOut = max(Xin)
> meanOut = mean(Xin)
> """
> val mm = new MatrixMetadata(numRows, numCols)
> val minMaxMeanScript = dml(minMaxMean).in("Xin", df, mm).out("minOut", 
> "maxOut", "meanOut")
> val (min, max, mean) = ml.execute(minMaxMeanScript).getTuple[Double, Double, 
> Double]("minOut", "maxOut", "meanOut")
> {code}
> Execution of the line 
> {code}
> val minMaxMeanScript = dml(minMaxMean).in("Xin", df, mm).out("minOut", 
> "maxOut", "meanOut")
> {code}
> in the spark-shell leads to the following error:
> {code}
> scala> val minMaxMeanScript = dml(minMaxMean).in("Xin", df, mm).out("minOut", 
> "maxOut", "meanOut")
> [Stage 0:>  (0 + 4) / 
> 4]17/02/16 13:37:10 WARN CodeGenerator: Error calculating stats of compiled 
> class.
> java.io.EOFException
>   at java.io.DataInputStream.readFully(DataInputStream.java:197)
>   at java.io.DataInputStream.readFully(DataInputStream.java:169)
>   at org.codehaus.janino.util.ClassFile.loadAttribute(ClassFile.java:1509)
>   at org.codehaus.janino.util.ClassFile.loadAttributes(ClassFile.java:644)
>   at org.codehaus.janino.util.ClassFile.loadFields(ClassFile.java:623)
>   at org.codehaus.janino.util.ClassFile.<init>(ClassFile.java:280)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anonfun$recordCompilationStats$1.apply(CodeGenerator.scala:967)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anonfun$recordCompilationStats$1.apply(CodeGenerator.scala:964)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.recordCompilationStats(CodeGenerator.scala:964)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:936)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:998)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:995)
>   at 
> org.spark_project.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
>   at 
> org.spark_project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
>   at 
> org.spark_project.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
>   at 
> org.spark_project.guava.cache.LocalCache$Segment.get(LocalCache.java:2257)
>   at org.spark_project.guava.cache.LocalCache.get(LocalCache.java:4000)
>   at 
> org.spark_project.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004)
>   at 
> org.spark_project.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:890)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection$.create(GenerateUnsafeProjection.scala:405)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection$.create(GenerateUnsafeProjection.scala:359)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection$.create(GenerateUnsafeProjection.scala:32)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:874)
>   at 
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.extractProjection$lzycompute(ExpressionEncoder.scala:266)
>   at 
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.extractProjection(ExpressionEncoder.scala:266)
>   at 

[jira] [Resolved] (SYSTEMML-1279) EOFException in MinMaxMean example snippet

2017-02-16 Thread Deron Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deron Eriksson resolved SYSTEMML-1279.
--
   Resolution: Fixed
Fix Version/s: SystemML 0.13

> EOFException in MinMaxMean example snippet
> --
>
> Key: SYSTEMML-1279
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1279
> Project: SystemML
>  Issue Type: Bug
>Reporter: Felix Schüler
>Priority: Minor
> Fix For: SystemML 0.13
>
>
> Our current documentation contains a snippet for a short DML script:
> {code}
> val numRows = 1
> val numCols = 1000
> val data = sc.parallelize(0 to numRows-1).map { _ => 
> Row.fromSeq(Seq.fill(numCols)(Random.nextDouble)) }
> val schema = StructType((0 to numCols-1).map { i => StructField("C" + i, 
> DoubleType, true) } )
> val df = spark.createDataFrame(data, schema)
> val minMaxMean =
> """
> minOut = min(Xin)
> maxOut = max(Xin)
> meanOut = mean(Xin)
> """
> val mm = new MatrixMetadata(numRows, numCols)
> val minMaxMeanScript = dml(minMaxMean).in("Xin", df, mm).out("minOut", 
> "maxOut", "meanOut")
> val (min, max, mean) = ml.execute(minMaxMeanScript).getTuple[Double, Double, 
> Double]("minOut", "maxOut", "meanOut")
> {code}
> Execution of the line 
> {code}
> val minMaxMeanScript = dml(minMaxMean).in("Xin", df, mm).out("minOut", 
> "maxOut", "meanOut")
> {code}
> in the spark-shell leads to the following error:
> {code}
> scala> val minMaxMeanScript = dml(minMaxMean).in("Xin", df, mm).out("minOut", 
> "maxOut", "meanOut")
> [Stage 0:>  (0 + 4) / 
> 4]17/02/16 13:37:10 WARN CodeGenerator: Error calculating stats of compiled 
> class.
> java.io.EOFException
>   at java.io.DataInputStream.readFully(DataInputStream.java:197)
>   at java.io.DataInputStream.readFully(DataInputStream.java:169)
>   at org.codehaus.janino.util.ClassFile.loadAttribute(ClassFile.java:1509)
>   at org.codehaus.janino.util.ClassFile.loadAttributes(ClassFile.java:644)
>   at org.codehaus.janino.util.ClassFile.loadFields(ClassFile.java:623)
>   at org.codehaus.janino.util.ClassFile.<init>(ClassFile.java:280)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anonfun$recordCompilationStats$1.apply(CodeGenerator.scala:967)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anonfun$recordCompilationStats$1.apply(CodeGenerator.scala:964)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.recordCompilationStats(CodeGenerator.scala:964)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:936)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:998)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:995)
>   at 
> org.spark_project.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
>   at 
> org.spark_project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
>   at 
> org.spark_project.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
>   at 
> org.spark_project.guava.cache.LocalCache$Segment.get(LocalCache.java:2257)
>   at org.spark_project.guava.cache.LocalCache.get(LocalCache.java:4000)
>   at 
> org.spark_project.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004)
>   at 
> org.spark_project.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:890)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection$.create(GenerateUnsafeProjection.scala:405)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection$.create(GenerateUnsafeProjection.scala:359)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection$.create(GenerateUnsafeProjection.scala:32)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:874)
>   at 
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.extractProjection$lzycompute(ExpressionEncoder.scala:266)
>   at 
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.extractProjection(ExpressionEncoder.scala:266)
>   at 
> 

[jira] [Closed] (SYSTEMML-1279) EOFException in MinMaxMean example snippet

2017-02-16 Thread Deron Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deron Eriksson closed SYSTEMML-1279.


> EOFException in MinMaxMean example snippet
> --
>
> Key: SYSTEMML-1279
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1279
> Project: SystemML
>  Issue Type: Bug
>Reporter: Felix Schüler
>Assignee: Felix Schüler
>Priority: Minor
> Fix For: SystemML 0.13
>
>
> Our current documentation contains a snippet for a short DML script:
> {code}
> val numRows = 1
> val numCols = 1000
> val data = sc.parallelize(0 to numRows-1).map { _ => 
> Row.fromSeq(Seq.fill(numCols)(Random.nextDouble)) }
> val schema = StructType((0 to numCols-1).map { i => StructField("C" + i, 
> DoubleType, true) } )
> val df = spark.createDataFrame(data, schema)
> val minMaxMean =
> """
> minOut = min(Xin)
> maxOut = max(Xin)
> meanOut = mean(Xin)
> """
> val mm = new MatrixMetadata(numRows, numCols)
> val minMaxMeanScript = dml(minMaxMean).in("Xin", df, mm).out("minOut", 
> "maxOut", "meanOut")
> val (min, max, mean) = ml.execute(minMaxMeanScript).getTuple[Double, Double, 
> Double]("minOut", "maxOut", "meanOut")
> {code}
> Execution of the line 
> {code}
> val minMaxMeanScript = dml(minMaxMean).in("Xin", df, mm).out("minOut", 
> "maxOut", "meanOut")
> {code}
> in the spark-shell leads to the following error:
> {code}
> scala> val minMaxMeanScript = dml(minMaxMean).in("Xin", df, mm).out("minOut", 
> "maxOut", "meanOut")
> [Stage 0:>  (0 + 4) / 
> 4]17/02/16 13:37:10 WARN CodeGenerator: Error calculating stats of compiled 
> class.
> java.io.EOFException
>   at java.io.DataInputStream.readFully(DataInputStream.java:197)
>   at java.io.DataInputStream.readFully(DataInputStream.java:169)
>   at org.codehaus.janino.util.ClassFile.loadAttribute(ClassFile.java:1509)
>   at org.codehaus.janino.util.ClassFile.loadAttributes(ClassFile.java:644)
>   at org.codehaus.janino.util.ClassFile.loadFields(ClassFile.java:623)
>   at org.codehaus.janino.util.ClassFile.<init>(ClassFile.java:280)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anonfun$recordCompilationStats$1.apply(CodeGenerator.scala:967)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anonfun$recordCompilationStats$1.apply(CodeGenerator.scala:964)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.recordCompilationStats(CodeGenerator.scala:964)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:936)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:998)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:995)
>   at 
> org.spark_project.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
>   at 
> org.spark_project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
>   at 
> org.spark_project.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
>   at 
> org.spark_project.guava.cache.LocalCache$Segment.get(LocalCache.java:2257)
>   at org.spark_project.guava.cache.LocalCache.get(LocalCache.java:4000)
>   at 
> org.spark_project.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004)
>   at 
> org.spark_project.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:890)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection$.create(GenerateUnsafeProjection.scala:405)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection$.create(GenerateUnsafeProjection.scala:359)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection$.create(GenerateUnsafeProjection.scala:32)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:874)
>   at 
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.extractProjection$lzycompute(ExpressionEncoder.scala:266)
>   at 
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.extractProjection(ExpressionEncoder.scala:266)
>   at 
> 

[jira] [Commented] (SYSTEMML-1279) EOFException in MinMaxMean example snippet

2017-02-16 Thread Deron Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15870934#comment-15870934
 ] 

Deron Eriksson commented on SYSTEMML-1279:
--

[~fschueler] I am going to resolve this since your 
[PR395|https://github.com/apache/incubator-systemml/pull/395] addressed the 
codegen warning generated by following the docs.

SYSTEMML-1267 is a duplicate of this issue, but I will keep it open for now in 
case [~mboehm7] or someone else can figure out a workaround.

> EOFException in MinMaxMean example snippet
> --
>
> Key: SYSTEMML-1279
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1279
> Project: SystemML
>  Issue Type: Bug
>Reporter: Felix Schüler
>Priority: Minor
>
> Our current documentation contains a snippet for a short DML script:
> {code}
> val numRows = 1
> val numCols = 1000
> val data = sc.parallelize(0 to numRows-1).map { _ => 
> Row.fromSeq(Seq.fill(numCols)(Random.nextDouble)) }
> val schema = StructType((0 to numCols-1).map { i => StructField("C" + i, 
> DoubleType, true) } )
> val df = spark.createDataFrame(data, schema)
> val minMaxMean =
> """
> minOut = min(Xin)
> maxOut = max(Xin)
> meanOut = mean(Xin)
> """
> val mm = new MatrixMetadata(numRows, numCols)
> val minMaxMeanScript = dml(minMaxMean).in("Xin", df, mm).out("minOut", 
> "maxOut", "meanOut")
> val (min, max, mean) = ml.execute(minMaxMeanScript).getTuple[Double, Double, 
> Double]("minOut", "maxOut", "meanOut")
> {code}
> Execution of the line 
> {code}
> val minMaxMeanScript = dml(minMaxMean).in("Xin", df, mm).out("minOut", 
> "maxOut", "meanOut")
> {code}
> in the spark-shell leads to the following error:
> {code}
> scala> val minMaxMeanScript = dml(minMaxMean).in("Xin", df, mm).out("minOut", 
> "maxOut", "meanOut")
> [Stage 0:>  (0 + 4) / 
> 4]17/02/16 13:37:10 WARN CodeGenerator: Error calculating stats of compiled 
> class.
> java.io.EOFException
>   at java.io.DataInputStream.readFully(DataInputStream.java:197)
>   at java.io.DataInputStream.readFully(DataInputStream.java:169)
>   at org.codehaus.janino.util.ClassFile.loadAttribute(ClassFile.java:1509)
>   at org.codehaus.janino.util.ClassFile.loadAttributes(ClassFile.java:644)
>   at org.codehaus.janino.util.ClassFile.loadFields(ClassFile.java:623)
>   at org.codehaus.janino.util.ClassFile.<init>(ClassFile.java:280)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anonfun$recordCompilationStats$1.apply(CodeGenerator.scala:967)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anonfun$recordCompilationStats$1.apply(CodeGenerator.scala:964)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.recordCompilationStats(CodeGenerator.scala:964)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:936)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:998)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:995)
>   at 
> org.spark_project.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
>   at 
> org.spark_project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
>   at 
> org.spark_project.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
>   at 
> org.spark_project.guava.cache.LocalCache$Segment.get(LocalCache.java:2257)
>   at org.spark_project.guava.cache.LocalCache.get(LocalCache.java:4000)
>   at 
> org.spark_project.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004)
>   at 
> org.spark_project.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:890)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection$.create(GenerateUnsafeProjection.scala:405)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection$.create(GenerateUnsafeProjection.scala:359)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection$.create(GenerateUnsafeProjection.scala:32)
>   at 
> 

[jira] [Commented] (SYSTEMML-1279) EOFException in MinMaxMean example snippet

2017-02-16 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15870940#comment-15870940
 ] 

Felix Schüler commented on SYSTEMML-1279:
-

Oh, didn't see that! Thanks!

> EOFException in MinMaxMean example snippet
> --
>
> Key: SYSTEMML-1279
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1279
> Project: SystemML
>  Issue Type: Bug
>Reporter: Felix Schüler
>Assignee: Felix Schüler
>Priority: Minor
> Fix For: SystemML 0.13
>
>
> Our current documentation contains a snippet for a short DML script:
> {code}
> val numRows = 1
> val numCols = 1000
> val data = sc.parallelize(0 to numRows-1).map { _ => 
> Row.fromSeq(Seq.fill(numCols)(Random.nextDouble)) }
> val schema = StructType((0 to numCols-1).map { i => StructField("C" + i, 
> DoubleType, true) } )
> val df = spark.createDataFrame(data, schema)
> val minMaxMean =
> """
> minOut = min(Xin)
> maxOut = max(Xin)
> meanOut = mean(Xin)
> """
> val mm = new MatrixMetadata(numRows, numCols)
> val minMaxMeanScript = dml(minMaxMean).in("Xin", df, mm).out("minOut", 
> "maxOut", "meanOut")
> val (min, max, mean) = ml.execute(minMaxMeanScript).getTuple[Double, Double, 
> Double]("minOut", "maxOut", "meanOut")
> {code}
> Execution of the line 
> {code}
> val minMaxMeanScript = dml(minMaxMean).in("Xin", df, mm).out("minOut", 
> "maxOut", "meanOut")
> {code}
> in the spark-shell leads to the following error:
> {code}
> scala> val minMaxMeanScript = dml(minMaxMean).in("Xin", df, mm).out("minOut", 
> "maxOut", "meanOut")
> [Stage 0:>  (0 + 4) / 
> 4]17/02/16 13:37:10 WARN CodeGenerator: Error calculating stats of compiled 
> class.
> java.io.EOFException
>   at java.io.DataInputStream.readFully(DataInputStream.java:197)
>   at java.io.DataInputStream.readFully(DataInputStream.java:169)
>   at org.codehaus.janino.util.ClassFile.loadAttribute(ClassFile.java:1509)
>   at org.codehaus.janino.util.ClassFile.loadAttributes(ClassFile.java:644)
>   at org.codehaus.janino.util.ClassFile.loadFields(ClassFile.java:623)
>   at org.codehaus.janino.util.ClassFile.<init>(ClassFile.java:280)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anonfun$recordCompilationStats$1.apply(CodeGenerator.scala:967)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anonfun$recordCompilationStats$1.apply(CodeGenerator.scala:964)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.recordCompilationStats(CodeGenerator.scala:964)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:936)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:998)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:995)
>   at 
> org.spark_project.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
>   at 
> org.spark_project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
>   at 
> org.spark_project.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
>   at 
> org.spark_project.guava.cache.LocalCache$Segment.get(LocalCache.java:2257)
>   at org.spark_project.guava.cache.LocalCache.get(LocalCache.java:4000)
>   at 
> org.spark_project.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004)
>   at 
> org.spark_project.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:890)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection$.create(GenerateUnsafeProjection.scala:405)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection$.create(GenerateUnsafeProjection.scala:359)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection$.create(GenerateUnsafeProjection.scala:32)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:874)
>   at 
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.extractProjection$lzycompute(ExpressionEncoder.scala:266)
>   at 
> 

[jira] [Commented] (SYSTEMML-1279) EOFException in MinMaxMean example snippet

2017-02-16 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15870771#comment-15870771
 ] 

Felix Schüler commented on SYSTEMML-1279:
-

For the sake of clean documentation, I suggest setting numCols to 100 for now, 
until this is fixed in Spark.
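
Concretely, that would change one line of the snippet:

{code}
val numCols = 100  // reduced from 1000 until SPARK-17131 is fixed upstream
{code}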

> EOFException in MinMaxMean example snippet
> --
>
> Key: SYSTEMML-1279
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1279
> Project: SystemML
>  Issue Type: Bug
>Reporter: Felix Schüler
>Priority: Minor
>
> Our current documentation contains a snippet for a short DML script:
> {code}
> val numRows = 1
> val numCols = 1000
> val data = sc.parallelize(0 to numRows-1).map { _ => 
> Row.fromSeq(Seq.fill(numCols)(Random.nextDouble)) }
> val schema = StructType((0 to numCols-1).map { i => StructField("C" + i, 
> DoubleType, true) } )
> val df = spark.createDataFrame(data, schema)
> val minMaxMean =
> """
> minOut = min(Xin)
> maxOut = max(Xin)
> meanOut = mean(Xin)
> """
> val mm = new MatrixMetadata(numRows, numCols)
> val minMaxMeanScript = dml(minMaxMean).in("Xin", df, mm).out("minOut", 
> "maxOut", "meanOut")
> val (min, max, mean) = ml.execute(minMaxMeanScript).getTuple[Double, Double, 
> Double]("minOut", "maxOut", "meanOut")
> {code}
> Execution of the line 
> {code}
> val minMaxMeanScript = dml(minMaxMean).in("Xin", df, mm).out("minOut", 
> "maxOut", "meanOut")
> {code}
> in the spark-shell leads to the following error:
> {code}
> scala> val minMaxMeanScript = dml(minMaxMean).in("Xin", df, mm).out("minOut", 
> "maxOut", "meanOut")
> [Stage 0:>  (0 + 4) / 
> 4]17/02/16 13:37:10 WARN CodeGenerator: Error calculating stats of compiled 
> class.
> java.io.EOFException
>   at java.io.DataInputStream.readFully(DataInputStream.java:197)
>   at java.io.DataInputStream.readFully(DataInputStream.java:169)
>   at org.codehaus.janino.util.ClassFile.loadAttribute(ClassFile.java:1509)
>   at org.codehaus.janino.util.ClassFile.loadAttributes(ClassFile.java:644)
>   at org.codehaus.janino.util.ClassFile.loadFields(ClassFile.java:623)
>   at org.codehaus.janino.util.ClassFile.<init>(ClassFile.java:280)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anonfun$recordCompilationStats$1.apply(CodeGenerator.scala:967)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anonfun$recordCompilationStats$1.apply(CodeGenerator.scala:964)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.recordCompilationStats(CodeGenerator.scala:964)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:936)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:998)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:995)
>   at 
> org.spark_project.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
>   at 
> org.spark_project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
>   at 
> org.spark_project.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
>   at 
> org.spark_project.guava.cache.LocalCache$Segment.get(LocalCache.java:2257)
>   at org.spark_project.guava.cache.LocalCache.get(LocalCache.java:4000)
>   at 
> org.spark_project.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004)
>   at 
> org.spark_project.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:890)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection$.create(GenerateUnsafeProjection.scala:405)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection$.create(GenerateUnsafeProjection.scala:359)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection$.create(GenerateUnsafeProjection.scala:32)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:874)
>   at 
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.extractProjection$lzycompute(ExpressionEncoder.scala:266)
>   at 
> 

[jira] [Commented] (SYSTEMML-1276) Resolve jersey class not found error with Spark2 and YARN

2017-02-16 Thread Matthias Boehm (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15871256#comment-15871256
 ] 

Matthias Boehm commented on SYSTEMML-1276:
--

It also fails when creating the Spark context on certain Hadoop distributions 
(see below).

{code}
Exception in thread "main" java.lang.NoClassDefFoundError: 
com/sun/jersey/api/client/config/ClientConfig
at 
org.apache.hadoop.yarn.client.api.TimelineClient.createTimelineClient(TimelineClient.java:55)
at 
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.createTimelineClient(YarnClientImpl.java:181)
at 
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:168)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at 
org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:151)
at 
org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
at 
org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:156)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
at 
org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at 
org.apache.sysml.runtime.controlprogram.context.SparkExecutionContext.initSparkContext(SparkExecutionContext.java:215)
at 
org.apache.sysml.runtime.controlprogram.context.SparkExecutionContext.getSparkContext(SparkExecutionContext.java:130)
at 
org.apache.sysml.runtime.controlprogram.context.SparkExecutionContext.getRDDHandleForMatrixObject(SparkExecutionContext.java:359)
at 
org.apache.sysml.runtime.controlprogram.context.SparkExecutionContext.getRDDHandleForVariable(SparkExecutionContext.java:304)
at 
org.apache.sysml.runtime.controlprogram.context.SparkExecutionContext.getBinaryBlockRDDHandleForVariable(SparkExecutionContext.java:279)
at 
org.apache.sysml.runtime.instructions.spark.MapmmSPInstruction.processInstruction(MapmmSPInstruction.java:117)
at 
org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:290)
at 
org.apache.sysml.runtime.controlprogram.ProgramBlock.executeInstructions(ProgramBlock.java:221)
at 
org.apache.sysml.runtime.controlprogram.ProgramBlock.execute(ProgramBlock.java:168)
at 
org.apache.sysml.runtime.controlprogram.Program.execute(Program.java:123)
at org.apache.sysml.api.DMLScript.execute(DMLScript.java:665)
at org.apache.sysml.api.DMLScript.executeScript(DMLScript.java:346)
at org.apache.sysml.api.DMLScript.main(DMLScript.java:207)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
at 
org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: 
com.sun.jersey.api.client.config.ClientConfig
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
{code}
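
A commonly cited mitigation for this SparkContext-creation failure (discussed 
in the linked SPARK-15343; not a SystemML change) is to disable the YARN 
timeline client so that TimelineClient is never instantiated. A minimal Scala 
sketch:

{code}
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// spark.hadoop.* properties are copied into the Hadoop configuration that
// Spark creates, so YarnClientImpl skips creating the timeline client.
val conf = new SparkConf()
  .set("spark.hadoop.yarn.timeline-service.enabled", "false")
val spark = SparkSession.builder.config(conf).getOrCreate()
{code}

The same property can be passed to spark-submit via --conf; note this only 
sidesteps the timeline-client path and does not bundle the missing jersey 
classes.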

> Resolve jersey class not found error with Spark2 and YARN
> -
>
> Key: SYSTEMML-1276
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1276
> Project: SystemML
>  Issue Type: Improvement
>  Components: Runtime
>Affects Versions: SystemML 0.13
> Environment: Spark 2.x, Hadoop 2.7.3
>Reporter: Glenn Weidner
>Assignee: Glenn Weidner
>
> This is a known issue as reported in [YARN-5271] and [SPARK-15343].  It was 
> observed during 0.13 performance testing and can be reproduced with the following 
> example:
> spark-submit --master yarn --deploy-mode client --class 
> org.apache.sysml.api.DMLScript ./systemml-0.13.0-incubating-SNAPSHOT.jar -f 
> ./scripts/utils/sample.dml -exec hybrid_spark -nvargs X=linRegData.csv 
> sv=perc.csv O=linRegDataParts ofmt=csv
> Exception in thread "main" java.lang.NoClassDefFoundError: 
> com/sun/jersey/api/client/config/ClientConfig
> at 
> org.apache.hadoop.yarn.client.api.TimelineClient.createTimelineClient(TimelineClient.java:55)
> at 
> 

[jira] [Resolved] (SYSTEMML-1271) Increment MLContext minimum Spark version to 2.1.0

2017-02-16 Thread Deron Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deron Eriksson resolved SYSTEMML-1271.
--
   Resolution: Fixed
Fix Version/s: SystemML 0.13

Fixed by [PR392|https://github.com/apache/incubator-systemml/pull/392].

> Increment MLContext minimum Spark version to 2.1.0
> --
>
> Key: SYSTEMML-1271
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1271
> Project: SystemML
>  Issue Type: Task
>  Components: APIs
>Affects Versions: SystemML 0.13
>Reporter: Deron Eriksson
>Assignee: Deron Eriksson
> Fix For: SystemML 0.13
>
>
> For SystemML 0.13, set MLContext SYSTEMML_MINIMUM_SPARK_VERSION to 2.1.0.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Closed] (SYSTEMML-1271) Increment MLContext minimum Spark version to 2.1.0

2017-02-16 Thread Deron Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deron Eriksson closed SYSTEMML-1271.


> Increment MLContext minimum Spark version to 2.1.0
> --
>
> Key: SYSTEMML-1271
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1271
> Project: SystemML
>  Issue Type: Task
>  Components: APIs
>Affects Versions: SystemML 0.13
>Reporter: Deron Eriksson
>Assignee: Deron Eriksson
> Fix For: SystemML 0.13
>
>
> For SystemML 0.13, set MLContext SYSTEMML_MINIMUM_SPARK_VERSION to 2.1.0.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (SYSTEMML-1238) Python test failing for LinearRegCG

2017-02-16 Thread Niketan Pansare (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15871170#comment-15871170
 ] 

Niketan Pansare edited comment on SYSTEMML-1238 at 2/17/17 5:33 AM:


I am able to reproduce this behavior (not sure if it is a bug) on the command 
line as well. Here is the output of GLM-predict (after running LinRegDS):
{code}
$ cat y_predicted.csv
189.09660701586185
133.3260601238074
157.3739106185465
132.8144037303023
135.88434209133283
154.81562865102103
194.2131709509127
136.3959984848379
125.13955782772601
137.41931127184807
178.35182275225503
123.60458864721075
152.7690030770007
141.0009060263837
116.95305553164462
161.46716176658717
144.58250078091928
144.58250078091928
170.67697684967874
117.4647119251497
{code}

Here is the output of Python mllearn:
{code}
>>> import numpy as np
>>> from pyspark.context import SparkContext
>>> from pyspark.ml import Pipeline
>>> from pyspark.ml.feature import HashingTF, Tokenizer
>>> from pyspark.sql import SparkSession
>>> from sklearn import datasets, metrics, neighbors
>>> from sklearn.datasets import fetch_20newsgroups
>>> from sklearn.feature_extraction.text import TfidfVectorizer
>>> from systemml.mllearn import LinearRegression, LogisticRegression, NaiveBayes, SVM
>>> diabetes = datasets.load_diabetes()
>>> diabetes_X = diabetes.data[:, np.newaxis, 2]
>>> diabetes_X_train = diabetes_X[:-20]
>>> diabetes_X_test = diabetes_X[-20:]
>>> diabetes_y_train = diabetes.target[:-20]
>>> diabetes_y_test = diabetes.target[-20:]
>>> sparkSession = SparkSession.builder.getOrCreate()
>>> regr = LinearRegression(sparkSession, solver="direct-solve")
>>> regr.fit(diabetes_X_train, diabetes_y_train)

Welcome to Apache SystemML!

17/02/16 22:39:21 WARN RewriteRemovePersistentReadWrite: Non-registered 
persistent write of variable 'X' (line 87).
17/02/16 22:39:21 WARN RewriteRemovePersistentReadWrite: Non-registered 
persistent write of variable 'y' (line 88).
BEGIN LINEAR REGRESSION SCRIPT
Reading X and Y...
Calling the Direct Solver...
Computing the statistics...
AVG_TOT_Y,153.36255924170615
STDEV_TOT_Y,77.21853383600028
AVG_RES_Y,4.8020565933360324E-14
STDEV_RES_Y,67.06389890324985
DISPERSION,4497.566536105316
PLAIN_R2,0.24750834362605834
ADJUSTED_R2,0.24571669682516795
PLAIN_R2_NOBIAS,0.24750834362605834
ADJUSTED_R2_NOBIAS,0.24571669682516795
Writing the output matrix...
END LINEAR REGRESSION SCRIPT
lr
>>> regr.predict(diabetes_X_test)
17/02/16 22:39:35 WARN Expression: WARNING: null -- line 149, column 4 -- Read 
input file does not exist on FS (local mode):
17/02/16 22:39:35 WARN Expression: Metadata file:  .mtd not provided
array([[ 188.84521284],
   [ 134.98127765],
   [ 158.20701117],
   [ 134.4871131 ],
   [ 137.45210036],
   [ 155.73618846],
   [ 193.78685827],
   [ 137.94626491],
   [ 127.07464496],
   [ 138.93459399],
   [ 178.46775744],
   [ 125.59215133],
   [ 153.75953028],
   [ 142.39374579],
   [ 119.16801227],
   [ 162.16032752],
   [ 145.8528976 ],
   [ 145.8528976 ],
   [ 171.05528929],
   [ 119.66217681]])
{code}

To reproduce the command-line output, first dump the test data to CSV:
{code}
import numpy as np
from sklearn import datasets
diabetes = datasets.load_diabetes()
diabetes_X = diabetes.data[:, np.newaxis, 2]
diabetes_X_train = diabetes_X[:-20]
diabetes_X_test = diabetes_X[-20:]
diabetes_y_train = diabetes.target[:-20]
diabetes_y_test = diabetes.target[-20:]
diabetes_X_test.tofile('X_test.csv', sep="\n")
diabetes_X.tofile('X.csv', sep="\n")
diabetes.target.tofile('y.csv', sep="\n")
{code}

Then execute the following commands (you may have to edit the DML script to add 
the format, or create a metadata file):
{code}
~/spark-2.1.0-bin-hadoop2.7/bin/spark-submit SystemML.jar -f LinearRegDS.dml 
-nvargs X=X.csv Y=y.csv B=B.csv fmt=csv icpt=1 tol=0.01 reg=1 
~/spark-2.1.0-bin-hadoop2.7/bin/spark-submit SystemML.jar -f GLM-predict.dml 
-nvargs X=X_test.csv M=y_predicted.csv B=B.csv fmt=csv icpt=1 tol=0.01 reg=1
{code}

I also tested using SystemML 0.12.0 and got the same predictions:
{code}
$ ~/spark-1.6.1-bin-hadoop2.6/bin/spark-submit systemml-0.12.0-incubating.jar 
-f LinearRegDS.dml -nvargs X=X.csv 

[jira] [Updated] (SYSTEMML-1255) New fused operator tack+* in CP and Spark

2017-02-16 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm updated SYSTEMML-1255:
-
Summary: New fused operator tack+* in CP and Spark  (was: New fused 
operator tack+* in CP)

> New fused operator tack+* in CP and Spark
> -
>
> Key: SYSTEMML-1255
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1255
> Project: SystemML
>  Issue Type: Sub-task
>  Components: Compiler
>Reporter: Matthias Boehm
>
> Similar to the existing tak+* operator, this new tack+* operator fuses two or 
> three binary multiply operations and the final column-wise aggregation 
> colSums(X*Y*Z) in order to avoid materializing the intermediates, which is 
> very expensive compared to the cheap multiply and sum operations.
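
To illustrate the fusion, here is a toy Scala sketch over in-memory arrays 
(not the actual SystemML operator, which works on its internal matrix-block 
representation):

{code}
// Computes colSums(X*Y*Z) in a single multiply-and-accumulate pass,
// never materializing the element-wise intermediates X*Y and X*Y*Z.
// Assumes non-empty inputs of identical dimensions.
def fusedColSums(x: Array[Array[Double]],
                 y: Array[Array[Double]],
                 z: Array[Array[Double]]): Array[Double] = {
  val out = new Array[Double](x(0).length)
  for (i <- x.indices; j <- out.indices)
    out(j) += x(i)(j) * y(i)(j) * z(i)(j)
  out
}
{code}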



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (SYSTEMML-1211) Verify dependencies for Spark 2

2017-02-16 Thread Deron Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15871175#comment-15871175
 ] 

Deron Eriksson commented on SYSTEMML-1211:
--

[PR394|https://github.com/apache/incubator-systemml/pull/394] addresses 
dependencies in the pom for Spark 2.1.0.


> Verify dependencies for Spark 2
> ---
>
> Key: SYSTEMML-1211
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1211
> Project: SystemML
>  Issue Type: Sub-task
>  Components: Build
>Reporter: Deron Eriksson
>Assignee: Deron Eriksson
>
> With the migration to Spark 2, we should verify that the artifact assemblies 
> are properly handling all dependencies.
> Also, we should verify that the artifact licenses properly include all 
> dependencies following the Spark 2 migration.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (SYSTEMML-1257) Univar-Stats scripts failing due to Unexpected ValueType in ArithmeticInstruction

2017-02-16 Thread Matthias Boehm (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15869639#comment-15869639
 ] 

Matthias Boehm commented on SYSTEMML-1257:
--

As it turned out, it was a bug in the special case where the matrix 
intermediate resulting from a relational comparison, i.e., (K>1)*maxs, was not 
bound to a target variable. For these cases, we created temporary targets 
during HOP construction, where the value type was mistakenly set to boolean. 
Later this surfaced as an unexpected value type, since certain operations are 
only supported over scalars. In comparison, ppred is only supported over 
matrices and hence worked correctly.

> Univar-Stats scripts failing due to Unexpected ValueType in 
> ArithmeticInstruction
> -
>
> Key: SYSTEMML-1257
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1257
> Project: SystemML
>  Issue Type: Bug
>Reporter: Arvind Surve
>Assignee: Matthias Boehm
>
> Running the release verification process 
> (http://apache.github.io/incubator-systemml/release-process.html), where 
> Univar-Stats.dml fails to execute.
> Trying to run the following example in a single-node Spark environment.
> $ tar -xvzf systemml-0.11.0-incubating.tgz
> $ cd systemml-0.11.0-incubating
> $ export SPARK_HOME=/Users/deroneriksson/spark-1.5.1-bin-hadoop2.6
> $ $SPARK_HOME/bin/spark-submit SystemML.jar -f 
> scripts/datagen/genRandData4Univariate.dml -exec hybrid_spark -args 100 
> 100 10 1 2 3 4 uni.mtx
> $ echo '1' > uni-types.csv
> $ echo '{"rows": 1, "cols": 1, "format": "csv"}' > uni-types.csv.mtd
> $ $SPARK_HOME/bin/spark-submit SystemML.jar -f 
> scripts/algorithms/Univar-Stats.dml -exec hybrid_spark -nvargs X=uni.mtx 
> TYPES=uni-types.csv STATS=uni-stats.txt CONSOLE_OUTPUT=TRUE
> The exception is the following:
> Exception in thread "main" org.apache.sysml.api.DMLException: 
> org.apache.sysml.lops.LopsException: ERROR: line 64, column 21 -- Problem 
> generating simple inst - 
> CP°*°_Var27·SCALAR·DOUBLE·false°_Var25·SCALAR·DOUBLE·false°_Var28·SCALAR·BOOLEAN
>   at org.apache.sysml.api.DMLScript.executeScript(DMLScript.java:374)
>   at org.apache.sysml.api.DMLScript.main(DMLScript.java:221)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
>   at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
>   at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: org.apache.sysml.lops.LopsException: ERROR: line 64, column 21 -- 
> Problem generating simple inst - 
> CP°*°_Var27·SCALAR·DOUBLE·false°_Var25·SCALAR·DOUBLE·false°_Var28·SCALAR·BOOLEAN
>   at 
> org.apache.sysml.lops.compile.Dag.generateControlProgramJobs(Dag.java:1572)
>   at org.apache.sysml.lops.compile.Dag.doGreedyGrouping(Dag.java:1212)
>   at org.apache.sysml.lops.compile.Dag.getJobs(Dag.java:267)
>   at 
> org.apache.sysml.parser.DMLProgram.createRuntimeProgramBlock(DMLProgram.java:531)
>   at 
> org.apache.sysml.parser.DMLProgram.getRuntimeProgram(DMLProgram.java:207)
>   at org.apache.sysml.api.DMLScript.execute(DMLScript.java:633)
>   at org.apache.sysml.api.DMLScript.executeScript(DMLScript.java:360)
>   ... 10 more
> Caused by: org.apache.sysml.runtime.DMLRuntimeException: Unexpected ValueType 
> in ArithmeticInstruction.
>   at 
> org.apache.sysml.runtime.instructions.cp.ArithmeticBinaryCPInstruction.parseInstruction(ArithmeticBinaryCPInstruction.java:80)
>   at 
> org.apache.sysml.runtime.instructions.CPInstructionParser.parseSingleInstruction(CPInstructionParser.java:321)
>   at 
> org.apache.sysml.runtime.instructions.InstructionParser.parseSingleInstruction(InstructionParser.java:45)
>   at 
> org.apache.sysml.lops.compile.Dag.generateControlProgramJobs(Dag.java:1559)
> The following line in Univar-Stats.dml causes that exception:
> maxDomainSize = max( (K > 1) * maxs );
> It is Boolean x Double, which causes the problem.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (SYSTEMML-1276) Resolve jersey class not found error with Spark2 and YARN

2017-02-16 Thread Glenn Weidner (JIRA)
Glenn Weidner created SYSTEMML-1276:
---

 Summary: Resolve jersey class not found error with Spark2 and YARN
 Key: SYSTEMML-1276
 URL: https://issues.apache.org/jira/browse/SYSTEMML-1276
 Project: SystemML
  Issue Type: Improvement
  Components: Runtime
Affects Versions: SystemML 0.13
 Environment: Spark 2.x, Hadoop 2.7.3
Reporter: Glenn Weidner
Assignee: Glenn Weidner


This is a known issue as reported in [YARN-5271] and [SPARK-15343].  It was 
observed during 0.13 performance testing and can be reproduced with the following 
example:

spark-submit --master yarn --deploy-mode client --class 
org.apache.sysml.api.DMLScript ./systemml-0.13.0-incubating-SNAPSHOT.jar -f 
./scripts/utils/sample.dml -exec hybrid_spark -nvargs X=linRegData.csv 
sv=perc.csv O=linRegDataParts ofmt=csv

Exception in thread "main" java.lang.NoClassDefFoundError: 
com/sun/jersey/api/client/config/ClientConfig
at 
org.apache.hadoop.yarn.client.api.TimelineClient.createTimelineClient(TimelineClient.java:55)
at 
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.createTimelineClient(YarnClientImpl.java:182)
at 
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:169)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at 
org.apache.hadoop.mapred.ResourceMgrDelegate.serviceInit(ResourceMgrDelegate.java:103)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at 
org.apache.hadoop.mapred.ResourceMgrDelegate.<init>(ResourceMgrDelegate.java:97)
at org.apache.hadoop.mapred.YARNRunner.<init>(YARNRunner.java:122)
at 
org.apache.hadoop.mapred.YarnClientProtocolProvider.create(YarnClientProtocolProvider.java:34)
at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:95)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:82)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:75)
at org.apache.hadoop.mapred.JobClient.init(JobClient.java:475)
at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:454)
at 
org.apache.sysml.runtime.controlprogram.parfor.stat.InfrastructureAnalyzer.analyzeHadoopCluster(InfrastructureAnalyzer.java:472)
at 
org.apache.sysml.runtime.controlprogram.parfor.stat.InfrastructureAnalyzer.getRemoteParallelMapTasks(InfrastructureAnalyzer.java:114)
at 
org.apache.sysml.runtime.controlprogram.parfor.stat.InfrastructureAnalyzer.getCkMaxMR(InfrastructureAnalyzer.java:298)
at 
org.apache.sysml.runtime.controlprogram.parfor.opt.OptimizationWrapper.optimize(OptimizationWrapper.java:168)
at 
org.apache.sysml.runtime.controlprogram.ParForProgramBlock.execute(ParForProgramBlock.java:550)
at 
org.apache.sysml.runtime.controlprogram.Program.execute(Program.java:145)
at org.apache.sysml.api.DMLScript.execute(DMLScript.java:674)
at org.apache.sysml.api.DMLScript.executeScript(DMLScript.java:354)
at org.apache.sysml.api.DMLScript.main(DMLScript.java:199)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
at 
org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: 
com.sun.jersey.api.client.config.ClientConfig
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 32 more







--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (SYSTEMML-1276) Resolve jersey class not found error with Spark2 and YARN

2017-02-16 Thread Glenn Weidner (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15870239#comment-15870239
 ] 

Glenn Weidner commented on SYSTEMML-1276:
-

[Submitted PR 393 | https://github.com/apache/incubator-systemml/pull/393].

> Resolve jersey class not found error with Spark2 and YARN
> -
>
> Key: SYSTEMML-1276
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1276
> Project: SystemML
>  Issue Type: Improvement
>  Components: Runtime
>Affects Versions: SystemML 0.13
> Environment: Spark 2.x, Hadoop 2.7.3
>Reporter: Glenn Weidner
>Assignee: Glenn Weidner
>
> This is a known issue as reported in [YARN-5271] and [SPARK-15343].  It was 
> observed during 0.13 performance testing and can be reproduced with the following 
> example:
> spark-submit --master yarn --deploy-mode client --class 
> org.apache.sysml.api.DMLScript ./systemml-0.13.0-incubating-SNAPSHOT.jar -f 
> ./scripts/utils/sample.dml -exec hybrid_spark -nvargs X=linRegData.csv 
> sv=perc.csv O=linRegDataParts ofmt=csv
> Exception in thread "main" java.lang.NoClassDefFoundError: 
> com/sun/jersey/api/client/config/ClientConfig
> at 
> org.apache.hadoop.yarn.client.api.TimelineClient.createTimelineClient(TimelineClient.java:55)
> at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.createTimelineClient(YarnClientImpl.java:182)
> at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:169)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.mapred.ResourceMgrDelegate.serviceInit(ResourceMgrDelegate.java:103)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.mapred.ResourceMgrDelegate.<init>(ResourceMgrDelegate.java:97)
> at org.apache.hadoop.mapred.YARNRunner.<init>(YARNRunner.java:122)
> at 
> org.apache.hadoop.mapred.YarnClientProtocolProvider.create(YarnClientProtocolProvider.java:34)
> at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:95)
> at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:82)
> at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:75)
> at org.apache.hadoop.mapred.JobClient.init(JobClient.java:475)
> at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:454)
> at 
> org.apache.sysml.runtime.controlprogram.parfor.stat.InfrastructureAnalyzer.analyzeHadoopCluster(InfrastructureAnalyzer.java:472)
> at 
> org.apache.sysml.runtime.controlprogram.parfor.stat.InfrastructureAnalyzer.getRemoteParallelMapTasks(InfrastructureAnalyzer.java:114)
> at 
> org.apache.sysml.runtime.controlprogram.parfor.stat.InfrastructureAnalyzer.getCkMaxMR(InfrastructureAnalyzer.java:298)
> at 
> org.apache.sysml.runtime.controlprogram.parfor.opt.OptimizationWrapper.optimize(OptimizationWrapper.java:168)
> at 
> org.apache.sysml.runtime.controlprogram.ParForProgramBlock.execute(ParForProgramBlock.java:550)
> at 
> org.apache.sysml.runtime.controlprogram.Program.execute(Program.java:145)
> at org.apache.sysml.api.DMLScript.execute(DMLScript.java:674)
> at org.apache.sysml.api.DMLScript.executeScript(DMLScript.java:354)
> at org.apache.sysml.api.DMLScript.main(DMLScript.java:199)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
> at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.lang.ClassNotFoundException: 
> com.sun.jersey.api.client.config.ClientConfig
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> ... 32 more



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (SYSTEMML-1246) Mismatched name in sparkDML.sh for main jar of -bin artifact

2017-02-16 Thread Glenn Weidner (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Glenn Weidner reassigned SYSTEMML-1246:
---

Assignee: Glenn Weidner

> Mismatched name in sparkDML.sh for main jar of -bin artifact
> 
>
> Key: SYSTEMML-1246
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1246
> Project: SystemML
>  Issue Type: Bug
>Reporter: Glenn Weidner
>Assignee: Glenn Weidner
>
> For distributed release artifacts systemml-[0.12.0 | 
> 0.11.0]-incubating-bin.[tgz | zip]:
> scripts/sparkDML.sh references
> {code}
> ${SYSTEMML_HOME}/SystemML.jar 
> {code}
> but the lib folder of the archive contains 
> systemml-[0.12.0 | 0.11.0]-incubating.jar.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (SYSTEMML-1250) Binary artifact missing antlr-runtime and wink-json4j classes

2017-02-16 Thread Glenn Weidner (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15870254#comment-15870254
 ] 

Glenn Weidner commented on SYSTEMML-1250:
-

A fix will be incorporated in 0.12.1 as mentioned in [dev mail thread | 
https://www.mail-archive.com/dev@systemml.incubator.apache.org/msg01399.html].

> Binary artifact missing antlr-runtime and wink-json4j classes
> -
>
> Key: SYSTEMML-1250
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1250
> Project: SystemML
>  Issue Type: Bug
>  Components: Build
>Reporter: Glenn Weidner
>
> The -bin artifacts (both 0.11 and 0.12) are missing org/antlr/v4/runtime and 
> org/apache/wink/json4j classes.  Since the -bin has a lib folder, the 
> corresponding jars can be included there.  For comparison, these classes are 
> included in systemml-0.12.0-incubating.jar at 
> https://repository.apache.org/content/repositories/releases/org/apache/systemml/systemml/0.12.0-incubating/,
>  and although there is a jar by that same name inside the -bin artifact, it 
> does not include the classes.  Similar content observed for 
> systemml-0.11.0-incubating.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (SYSTEMML-1181) Update documentation with changes related to Spark 2.1.0

2017-02-16 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15870446#comment-15870446
 ] 

Felix Schüler commented on SYSTEMML-1181:
-

[~deron] You double-checked the docs the other day, right? Can we resolve this 
issue?

> Update documentation with changes related to Spark 2.1.0
> 
>
> Key: SYSTEMML-1181
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1181
> Project: SystemML
>  Issue Type: Documentation
>Reporter: Arvind Surve
>Assignee: Felix Schüler
>
> Update web page for any changes related to SystemML on Spark 2.1.0



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (SYSTEMML-1181) Update documentation with changes related to Spark 2.1.0

2017-02-16 Thread Deron Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15870495#comment-15870495
 ] 

Deron Eriksson commented on SYSTEMML-1181:
--

[~fschueler] No, I have not double-checked the docs for 2.1.0. Feel free to 
review them and make any needed updates.

> Update documentation with changes related to Spark 2.1.0
> 
>
> Key: SYSTEMML-1181
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1181
> Project: SystemML
>  Issue Type: Documentation
>Reporter: Arvind Surve
>Assignee: Felix Schüler
>
> Update web page for any changes related to SystemML on Spark 2.1.0



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (SYSTEMML-1275) Remove workaround flags disable_sparse disable_caching

2017-02-16 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm resolved SYSTEMML-1275.
--
   Resolution: Fixed
 Assignee: Matthias Boehm
Fix Version/s: SystemML 0.13

> Remove workaround flags disable_sparse disable_caching
> --
>
> Key: SYSTEMML-1275
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1275
> Project: SystemML
>  Issue Type: Bug
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 0.13
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (SYSTEMML-1274) Unnecessary rdd computation for nnz maintenance on write

2017-02-16 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm resolved SYSTEMML-1274.
--
   Resolution: Done
 Assignee: Matthias Boehm
Fix Version/s: SystemML 0.13

> Unnecessary rdd computation for nnz maintenance on write
> 
>
> Key: SYSTEMML-1274
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1274
> Project: SystemML
>  Issue Type: Bug
>  Components: Runtime
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 0.13
>
>
> Our primitive for writing binary block RDDs to HDFS (as used in guarded 
> collect) first computes the number of non-zeros (nnz) and subsequently 
> writes out the data. This leads to redundant RDD computation, which can be 
> expensive for large DAGs of RDD operations. Explicitly computing the nnz is 
> unnecessary, as we could simply piggyback this computation onto the write via 
> an accumulator, as done in multiple other places in SystemML.
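
A minimal Scala sketch of the accumulator idea (toy pair RDD and output path, 
not SystemML's actual binary-block writer):

{code}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.getOrCreate()
val sc = spark.sparkContext
val nnz = sc.longAccumulator("nnz")

// Toy stand-in for a binary-block RDD: (block index, cell values).
val blocks = sc.parallelize(Seq(
  (1L, Array(0.0, 1.5, 2.0)),
  (2L, Array(0.0, 0.0, 3.0))))

// Count non-zeros as a side effect of the same pass that writes the data,
// instead of running a separate count job before the write.
blocks.map { case (ix, vals) =>
  nnz.add(vals.count(_ != 0).toLong)
  s"$ix ${vals.mkString(",")}"
}.saveAsTextFile("/tmp/blocks-out")

println(s"nnz = ${nnz.value}")  // exact here: the single action ran once
{code}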



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (SYSTEMML-1273) Performance spark right indexing w/o aggregation

2017-02-16 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm resolved SYSTEMML-1273.
--
   Resolution: Done
 Assignee: Matthias Boehm
Fix Version/s: SystemML 0.13

> Performance spark right indexing w/o aggregation
> 
>
> Key: SYSTEMML-1273
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1273
> Project: SystemML
>  Issue Type: Task
>  Components: Runtime
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 0.13
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (SYSTEMML-1257) Univar-Stats scripts failing due to Unexpected ValueType in ArithmeticInstruction

2017-02-16 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm resolved SYSTEMML-1257.
--
   Resolution: Fixed
Fix Version/s: SystemML 0.13

> Univar-Stats scripts failing due to Unexpected ValueType in 
> ArithmeticInstruction
> -
>
> Key: SYSTEMML-1257
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1257
> Project: SystemML
>  Issue Type: Bug
>Reporter: Arvind Surve
>Assignee: Matthias Boehm
> Fix For: SystemML 0.13
>
>
> Running the release verification process 
> (http://apache.github.io/incubator-systemml/release-process.html), where 
> Univar-Stats.dml fails to execute.
> Trying to run the following example in a single-node Spark environment.
> $ tar -xvzf systemml-0.11.0-incubating.tgz
> $ cd systemml-0.11.0-incubating
> $ export SPARK_HOME=/Users/deroneriksson/spark-1.5.1-bin-hadoop2.6
> $ $SPARK_HOME/bin/spark-submit SystemML.jar -f 
> scripts/datagen/genRandData4Univariate.dml -exec hybrid_spark -args 100 
> 100 10 1 2 3 4 uni.mtx
> $ echo '1' > uni-types.csv
> $ echo '{"rows": 1, "cols": 1, "format": "csv"}' > uni-types.csv.mtd
> $ $SPARK_HOME/bin/spark-submit SystemML.jar -f 
> scripts/algorithms/Univar-Stats.dml -exec hybrid_spark -nvargs X=uni.mtx 
> TYPES=uni-types.csv STATS=uni-stats.txt CONSOLE_OUTPUT=TRUE
> The exception is the following:
> Exception in thread "main" org.apache.sysml.api.DMLException: 
> org.apache.sysml.lops.LopsException: ERROR: line 64, column 21 -- Problem 
> generating simple inst - 
> CP°*°_Var27·SCALAR·DOUBLE·false°_Var25·SCALAR·DOUBLE·false°_Var28·SCALAR·BOOLEAN
>   at org.apache.sysml.api.DMLScript.executeScript(DMLScript.java:374)
>   at org.apache.sysml.api.DMLScript.main(DMLScript.java:221)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
>   at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
>   at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: org.apache.sysml.lops.LopsException: ERROR: line 64, column 21 -- 
> Problem generating simple inst - 
> CP°*°_Var27·SCALAR·DOUBLE·false°_Var25·SCALAR·DOUBLE·false°_Var28·SCALAR·BOOLEAN
>   at 
> org.apache.sysml.lops.compile.Dag.generateControlProgramJobs(Dag.java:1572)
>   at org.apache.sysml.lops.compile.Dag.doGreedyGrouping(Dag.java:1212)
>   at org.apache.sysml.lops.compile.Dag.getJobs(Dag.java:267)
>   at 
> org.apache.sysml.parser.DMLProgram.createRuntimeProgramBlock(DMLProgram.java:531)
>   at 
> org.apache.sysml.parser.DMLProgram.getRuntimeProgram(DMLProgram.java:207)
>   at org.apache.sysml.api.DMLScript.execute(DMLScript.java:633)
>   at org.apache.sysml.api.DMLScript.executeScript(DMLScript.java:360)
>   ... 10 more
> Caused by: org.apache.sysml.runtime.DMLRuntimeException: Unexpected ValueType 
> in ArithmeticInstruction.
>   at 
> org.apache.sysml.runtime.instructions.cp.ArithmeticBinaryCPInstruction.parseInstruction(ArithmeticBinaryCPInstruction.java:80)
>   at 
> org.apache.sysml.runtime.instructions.CPInstructionParser.parseSingleInstruction(CPInstructionParser.java:321)
>   at 
> org.apache.sysml.runtime.instructions.InstructionParser.parseSingleInstruction(InstructionParser.java:45)
>   at 
> org.apache.sysml.lops.compile.Dag.generateControlProgramJobs(Dag.java:1559)
> The following line in Univar-Stats.dml causes that exception:
> maxDomainSize = max( (K > 1) * maxs );
> It is Boolean x Double, which causes the problem.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (SYSTEMML-1277) DataFrames With `mllib.Vector` Columns Are No Longer Converted to Matrices.

2017-02-16 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15870565#comment-15870565
 ] 

Mike Dusenberry commented on SYSTEMML-1277:
---

Update: Here's the official word on DataFrame conversions from the old 
{{mllib.Vector}} to {{ml.Vector}}: 
https://spark.apache.org/docs/2.0.0/ml-guide.html#breaking-changes.

> DataFrames With `mllib.Vector` Columns Are No Longer Converted to Matrices.
> ---
>
> Key: SYSTEMML-1277
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1277
> Project: SystemML
>  Issue Type: Bug
>Affects Versions: SystemML 0.13
>Reporter: Mike Dusenberry
>Priority: Blocker
>
> Recently, we made the switch from the old {{mllib.Vector}} to the new 
> {{ml.Vector}} type.  Unfortunately, this leaves us with the issue of no 
> longer recognizing DataFrames with {{mllib.Vector}} columns during 
> conversion, and thus, we (1) do not correctly convert to SystemML {{Matrix}} 
> objects, (2) instead fall back on conversion to {{Frame}} objects, and then 
> (3) fail completely when the ensuing DML script is expecting to operate on 
> matrices.
> Given a Spark {{DataFrame}} {{X_df}} of type {{DataFrame\[__INDEX: int, 
> sample: vector\]}}, where {{vector}} is of type {{mllib.Vector}}, the 
> following script will now fail (did not previously):
> {code}
> script = """
> # Scale images to [-1,1]
> X = X / 255
> X = X * 2 - 1
> """
> outputs = ("X")
> script = dml(script).input(X=X_df).output(*outputs)
> X = ml.execute(script).get(*outputs)
> X
> {code}
> {code}
> Caused by: org.apache.sysml.api.mlcontext.MLContextException: Exception 
> occurred while validating script
>   at 
> org.apache.sysml.api.mlcontext.ScriptExecutor.validateScript(ScriptExecutor.java:487)
>   at 
> org.apache.sysml.api.mlcontext.ScriptExecutor.execute(ScriptExecutor.java:280)
>   at org.apache.sysml.api.mlcontext.MLContext.execute(MLContext.java:293)
>   ... 12 more
> Caused by: org.apache.sysml.parser.LanguageException: Invalid Parameters : 
> ERROR: null -- line 4, column 4 -- Invalid Datatypes for operation FRAME 
> SCALAR
>   at 
> org.apache.sysml.parser.Expression.raiseValidateError(Expression.java:549)
>   at 
> org.apache.sysml.parser.Expression.computeDataType(Expression.java:415)
>   at 
> org.apache.sysml.parser.Expression.computeDataType(Expression.java:386)
>   at 
> org.apache.sysml.parser.BinaryExpression.validateExpression(BinaryExpression.java:130)
>   at 
> org.apache.sysml.parser.StatementBlock.validate(StatementBlock.java:567)
>   at 
> org.apache.sysml.parser.DMLTranslator.validateParseTree(DMLTranslator.java:140)
>   at 
> org.apache.sysml.api.mlcontext.ScriptExecutor.validateScript(ScriptExecutor.java:485)
>   ... 14 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (SYSTEMML-1277) DataFrames With `mllib.Vector` Columns Are No Longer Converted to Matrices.

2017-02-16 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15870577#comment-15870577
 ] 

Mike Dusenberry commented on SYSTEMML-1277:
---

Adding the following fixes the issue, so we should just add similar wrappers 
at the Java MLContext layer (see the Scala sketch after the snippet).

{code}
# Convert DataFrame columns of type `mllib.Vector` to type `ml.Vector`
X_df = MLUtils.convertVectorColumnsToML(X_df)
{code}
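
A rough sketch of that analogous call at the Scala/Java layer ({{dfIn}} stands 
for an arbitrary input DataFrame; the exact hook point inside MLContext is 
left open):

{code}
import org.apache.spark.mllib.util.MLUtils

// Rewrites any mllib.Vector columns to the new ml.Vector type;
// other columns pass through unchanged.
val dfML = MLUtils.convertVectorColumnsToML(dfIn)
{code}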

> DataFrames With `mllib.Vector` Columns Are No Longer Converted to Matrices.
> ---
>
> Key: SYSTEMML-1277
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1277
> Project: SystemML
>  Issue Type: Bug
>Affects Versions: SystemML 0.13
>Reporter: Mike Dusenberry
>Priority: Blocker
>
> Recently, we made the switch from the old {{mllib.Vector}} to the new 
> {{ml.Vector}} type.  Unfortunately, this leaves us with the issue of no 
> longer recognizing DataFrames with {{mllib.Vector}} columns during 
> conversion, and thus, we (1) do not correctly convert to SystemML {{Matrix}} 
> objects, (2) instead fall back on conversion to {{Frame}} objects, and then 
> (3) fail completely when the ensuing DML script is expecting to operate on 
> matrices.
> Given a Spark {{DataFrame}} {{X_df}} of type {{DataFrame\[__INDEX: int, 
> sample: vector\]}}, where {{vector}} is of type {{mllib.Vector}}, the 
> following script will now fail (did not previously):
> {code}
> script = """
> # Scale images to [-1,1]
> X = X / 255
> X = X * 2 - 1
> """
> outputs = ("X")
> script = dml(script).input(X=X_df).output(*outputs)
> X = ml.execute(script).get(*outputs)
> X
> {code}
> {code}
> Caused by: org.apache.sysml.api.mlcontext.MLContextException: Exception 
> occurred while validating script
>   at 
> org.apache.sysml.api.mlcontext.ScriptExecutor.validateScript(ScriptExecutor.java:487)
>   at 
> org.apache.sysml.api.mlcontext.ScriptExecutor.execute(ScriptExecutor.java:280)
>   at org.apache.sysml.api.mlcontext.MLContext.execute(MLContext.java:293)
>   ... 12 more
> Caused by: org.apache.sysml.parser.LanguageException: Invalid Parameters : 
> ERROR: null -- line 4, column 4 -- Invalid Datatypes for operation FRAME 
> SCALAR
>   at 
> org.apache.sysml.parser.Expression.raiseValidateError(Expression.java:549)
>   at 
> org.apache.sysml.parser.Expression.computeDataType(Expression.java:415)
>   at 
> org.apache.sysml.parser.Expression.computeDataType(Expression.java:386)
>   at 
> org.apache.sysml.parser.BinaryExpression.validateExpression(BinaryExpression.java:130)
>   at 
> org.apache.sysml.parser.StatementBlock.validate(StatementBlock.java:567)
>   at 
> org.apache.sysml.parser.DMLTranslator.validateParseTree(DMLTranslator.java:140)
>   at 
> org.apache.sysml.api.mlcontext.ScriptExecutor.validateScript(ScriptExecutor.java:485)
>   ... 14 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (SYSTEMML-1277) DataFrames With `mllib.Vector` Columns Are No Longer Converted to Matrices.

2017-02-16 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry updated SYSTEMML-1277:
--
Priority: Blocker  (was: Major)

> DataFrames With `mllib.Vector` Columns Are No Longer Converted to Matrices.
> ---
>
> Key: SYSTEMML-1277
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1277
> Project: SystemML
>  Issue Type: Bug
>Affects Versions: SystemML 0.13
>Reporter: Mike Dusenberry
>Priority: Blocker
>
> Recently, we made the switch from the old {{mllib.Vector}} to the new 
> {{ml.Vector}} type.  Unfortunately, this leaves us with the issue of no 
> longer recognizing DataFrames with {{mllib.Vector}} columns during 
> conversion, and thus, we (1) do not correctly convert to SystemML {{Matrix}} 
> objects, (2) instead fall back on conversion to {{Frame}} objects, and then 
> (3) fail completely when the ensuing DML script is expecting to operate on 
> matrices.
> Given a Spark {{DataFrame}} {{X_df}} of type {{DataFrame\[__INDEX: int, 
> sample: vector\]}}, where {{vector}} is of type {{mllib.Vector}}, the 
> following script will now fail (did not previously):
> {code}
> script = """
> # Scale images to [-1,1]
> X = X / 255
> X = X * 2 - 1
> """
> outputs = ("X")
> script = dml(script).input(X=X_df).output(*outputs)
> X = ml.execute(script).get(*outputs)
> X
> {code}
> {code}
> Caused by: org.apache.sysml.api.mlcontext.MLContextException: Exception 
> occurred while validating script
>   at 
> org.apache.sysml.api.mlcontext.ScriptExecutor.validateScript(ScriptExecutor.java:487)
>   at 
> org.apache.sysml.api.mlcontext.ScriptExecutor.execute(ScriptExecutor.java:280)
>   at org.apache.sysml.api.mlcontext.MLContext.execute(MLContext.java:293)
>   ... 12 more
> Caused by: org.apache.sysml.parser.LanguageException: Invalid Parameters : 
> ERROR: null -- line 4, column 4 -- Invalid Datatypes for operation FRAME 
> SCALAR
>   at 
> org.apache.sysml.parser.Expression.raiseValidateError(Expression.java:549)
>   at 
> org.apache.sysml.parser.Expression.computeDataType(Expression.java:415)
>   at 
> org.apache.sysml.parser.Expression.computeDataType(Expression.java:386)
>   at 
> org.apache.sysml.parser.BinaryExpression.validateExpression(BinaryExpression.java:130)
>   at 
> org.apache.sysml.parser.StatementBlock.validate(StatementBlock.java:567)
>   at 
> org.apache.sysml.parser.DMLTranslator.validateParseTree(DMLTranslator.java:140)
>   at 
> org.apache.sysml.api.mlcontext.ScriptExecutor.validateScript(ScriptExecutor.java:485)
>   ... 14 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (SYSTEMML-1277) DataFrames With `mllib.Vector` Columns Are No Longer Converted to Matrices.

2017-02-16 Thread Mike Dusenberry (JIRA)
Mike Dusenberry created SYSTEMML-1277:
-

 Summary: DataFrames With `mllib.Vector` Columns Are No Longer 
Converted to Matrices.
 Key: SYSTEMML-1277
 URL: https://issues.apache.org/jira/browse/SYSTEMML-1277
 Project: SystemML
  Issue Type: Bug
Reporter: Mike Dusenberry


Recently, we made the switch from the old {{mllib.Vector}} to the new 
{{ml.Vector}} type.  Unfortunately, this leaves us with the issue of no longer 
recognizing DataFrames with {{mllib.Vector}} columns during conversion, and 
thus, we (1) do not correctly convert to SystemML {{Matrix}} objects, (2) 
instead fall back on conversion to {{Frame}} objects, and then (3) fail 
completely when the ensuing DML script is expecting to operate on matrices.

Given a Spark {{DataFrame}} {{X_df}} of type {{DataFrame\[__INDEX: int, sample: 
vector\]}}, where {{vector}} is of type {{mllib.Vector}}, the following script 
will now fail (did not previously):

{code}
script = """
# Scale images to [-1,1]
X = X / 255
X = X * 2 - 1
"""
outputs = ("X")
script = dml(script).input(X=X_df).output(*outputs)
X = ml.execute(script).get(*outputs)
X
{code}

{code}
Caused by: org.apache.sysml.api.mlcontext.MLContextException: Exception 
occurred while validating script
at 
org.apache.sysml.api.mlcontext.ScriptExecutor.validateScript(ScriptExecutor.java:487)
at 
org.apache.sysml.api.mlcontext.ScriptExecutor.execute(ScriptExecutor.java:280)
at org.apache.sysml.api.mlcontext.MLContext.execute(MLContext.java:293)
... 12 more
Caused by: org.apache.sysml.parser.LanguageException: Invalid Parameters : 
ERROR: null -- line 4, column 4 -- Invalid Datatypes for operation FRAME SCALAR
at 
org.apache.sysml.parser.Expression.raiseValidateError(Expression.java:549)
at 
org.apache.sysml.parser.Expression.computeDataType(Expression.java:415)
at 
org.apache.sysml.parser.Expression.computeDataType(Expression.java:386)
at 
org.apache.sysml.parser.BinaryExpression.validateExpression(BinaryExpression.java:130)
at 
org.apache.sysml.parser.StatementBlock.validate(StatementBlock.java:567)
at 
org.apache.sysml.parser.DMLTranslator.validateParseTree(DMLTranslator.java:140)
at 
org.apache.sysml.api.mlcontext.ScriptExecutor.validateScript(ScriptExecutor.java:485)
... 14 more
{code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (SYSTEMML-1277) DataFrames With `mllib.Vector` Columns Are No Longer Converted to Matrices.

2017-02-16 Thread Mike Dusenberry (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Dusenberry updated SYSTEMML-1277:
--
Affects Version/s: SystemML 0.13

> DataFrames With `mllib.Vector` Columns Are No Longer Converted to Matrices.
> ---
>
> Key: SYSTEMML-1277
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1277
> Project: SystemML
>  Issue Type: Bug
>Affects Versions: SystemML 0.13
>Reporter: Mike Dusenberry
>Priority: Blocker
>
> Recently, we made the switch from the old {{mllib.Vector}} to the new 
> {{ml.Vector}} type.  Unfortunately, this leaves us with the issue of no 
> longer recognizing DataFrames with {{mllib.Vector}} columns during 
> conversion, and thus, we (1) do not correctly convert to SystemML {{Matrix}} 
> objects, (2) instead fall back on conversion to {{Frame}} objects, and then 
> (3) fail completely when the ensuing DML script is expecting to operate on 
> matrices.
> Given a Spark {{DataFrame}} {{X_df}} of type {{DataFrame\[__INDEX: int, 
> sample: vector\]}}, where {{vector}} is of type {{mllib.Vector}}, the 
> following script will now fail (did not previously):
> {code}
> script = """
> # Scale images to [-1,1]
> X = X / 255
> X = X * 2 - 1
> """
> outputs = ("X")
> script = dml(script).input(X=X_df).output(*outputs)
> X = ml.execute(script).get(*outputs)
> X
> {code}
> {code}
> Caused by: org.apache.sysml.api.mlcontext.MLContextException: Exception 
> occurred while validating script
>   at 
> org.apache.sysml.api.mlcontext.ScriptExecutor.validateScript(ScriptExecutor.java:487)
>   at 
> org.apache.sysml.api.mlcontext.ScriptExecutor.execute(ScriptExecutor.java:280)
>   at org.apache.sysml.api.mlcontext.MLContext.execute(MLContext.java:293)
>   ... 12 more
> Caused by: org.apache.sysml.parser.LanguageException: Invalid Parameters : 
> ERROR: null -- line 4, column 4 -- Invalid Datatypes for operation FRAME 
> SCALAR
>   at 
> org.apache.sysml.parser.Expression.raiseValidateError(Expression.java:549)
>   at 
> org.apache.sysml.parser.Expression.computeDataType(Expression.java:415)
>   at 
> org.apache.sysml.parser.Expression.computeDataType(Expression.java:386)
>   at 
> org.apache.sysml.parser.BinaryExpression.validateExpression(BinaryExpression.java:130)
>   at 
> org.apache.sysml.parser.StatementBlock.validate(StatementBlock.java:567)
>   at 
> org.apache.sysml.parser.DMLTranslator.validateParseTree(DMLTranslator.java:140)
>   at 
> org.apache.sysml.api.mlcontext.ScriptExecutor.validateScript(ScriptExecutor.java:485)
>   ... 14 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (SYSTEMML-1277) DataFrames With `mllib.Vector` Columns Are No Longer Converted to Matrices.

2017-02-16 Thread Mike Dusenberry (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15870536#comment-15870536
 ] 

Mike Dusenberry commented on SYSTEMML-1277:
---

Also, just to follow up, the {{ml.Vector}} type should remain the standard 
default, as Spark is moving away from {{mllib.Vector}}.  However, since 
DataFrames created and saved with {{mllib.Vector}} types can still be used (and 
often without the user realizing that a saved DataFrame would maintain a 
distinct separation between the two types), it's plausible that a user will try 
to run the same SystemML code with the same DataFrame as before, and thus run 
into issues now.  We could just catch any {{mllib.Vector}} types and convert 
them to {{ml.Vector}} with {{mllib.Vector.asML}}, which does not make any copy 
of the data (see 
https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.linalg.Vector).
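
As a sketch of that per-value conversion (using the {{X_df}} / "sample" column 
from the issue description; this just extracts the vectors rather than 
rebuilding the full DataFrame):

{code}
import org.apache.spark.mllib.linalg.{Vector => OldVector}

// asML wraps the same underlying arrays in an ml.Vector; no data is copied.
val vecs = X_df.rdd.map { row =>
  row.getAs[OldVector]("sample").asML
}
{code}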

> DataFrames With `mllib.Vector` Columns Are No Longer Converted to Matrices.
> ---
>
> Key: SYSTEMML-1277
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1277
> Project: SystemML
>  Issue Type: Bug
>Affects Versions: SystemML 0.13
>Reporter: Mike Dusenberry
>Priority: Blocker
>
> Recently, we made the switch from the old {{mllib.Vector}} to the new 
> {{ml.Vector}} type.  Unfortunately, this leaves us with the issue of no 
> longer recognizing DataFrames with {{mllib.Vector}} columns during 
> conversion, and thus, we (1) do not correctly convert to SystemML {{Matrix}} 
> objects, (2) instead fall back on conversion to {{Frame}} objects, and then 
> (3) fail completely when the ensuing DML script is expecting to operate on 
> matrices.
> Given a Spark {{DataFrame}} {{X_df}} of type {{DataFrame\[__INDEX: int, 
> sample: vector\]}}, where {{vector}} is of type {{mllib.Vector}}, the 
> following script will now fail (did not previously):
> {code}
> script = """
> # Scale images to [-1,1]
> X = X / 255
> X = X * 2 - 1
> """
> outputs = ("X")
> script = dml(script).input(X=X_df).output(*outputs)
> X = ml.execute(script).get(*outputs)
> X
> {code}
> {code}
> Caused by: org.apache.sysml.api.mlcontext.MLContextException: Exception 
> occurred while validating script
>   at 
> org.apache.sysml.api.mlcontext.ScriptExecutor.validateScript(ScriptExecutor.java:487)
>   at 
> org.apache.sysml.api.mlcontext.ScriptExecutor.execute(ScriptExecutor.java:280)
>   at org.apache.sysml.api.mlcontext.MLContext.execute(MLContext.java:293)
>   ... 12 more
> Caused by: org.apache.sysml.parser.LanguageException: Invalid Parameters : 
> ERROR: null -- line 4, column 4 -- Invalid Datatypes for operation FRAME 
> SCALAR
>   at 
> org.apache.sysml.parser.Expression.raiseValidateError(Expression.java:549)
>   at 
> org.apache.sysml.parser.Expression.computeDataType(Expression.java:415)
>   at 
> org.apache.sysml.parser.Expression.computeDataType(Expression.java:386)
>   at 
> org.apache.sysml.parser.BinaryExpression.validateExpression(BinaryExpression.java:130)
>   at 
> org.apache.sysml.parser.StatementBlock.validate(StatementBlock.java:567)
>   at 
> org.apache.sysml.parser.DMLTranslator.validateParseTree(DMLTranslator.java:140)
>   at 
> org.apache.sysml.api.mlcontext.ScriptExecutor.validateScript(ScriptExecutor.java:485)
>   ... 14 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (SYSTEMML-1257) Univar-Stats scripts failing due to Unexpected ValueType in ArithmeticInstruction

2017-02-16 Thread Deron Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15870691#comment-15870691
 ] 

Deron Eriksson commented on SYSTEMML-1257:
--

Thanks for the quick fix Matthias!

> Univar-Stats scripts failing due to Unexpected ValueType in 
> ArithmeticInstruction
> -
>
> Key: SYSTEMML-1257
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1257
> Project: SystemML
>  Issue Type: Bug
>Reporter: Arvind Surve
>Assignee: Matthias Boehm
> Fix For: SystemML 0.13
>
>
> Running the release verification process 
> (http://apache.github.io/incubator-systemml/release-process.html), where 
> Univar-Stats.dml fails to execute.
> Trying to run the following example in a single-node Spark environment.
> $ tar -xvzf systemml-0.11.0-incubating.tgz
> $ cd systemml-0.11.0-incubating
> $ export SPARK_HOME=/Users/deroneriksson/spark-1.5.1-bin-hadoop2.6
> $ $SPARK_HOME/bin/spark-submit SystemML.jar -f 
> scripts/datagen/genRandData4Univariate.dml -exec hybrid_spark -args 100 
> 100 10 1 2 3 4 uni.mtx
> $ echo '1' > uni-types.csv
> $ echo '{"rows": 1, "cols": 1, "format": "csv"}' > uni-types.csv.mtd
> $ $SPARK_HOME/bin/spark-submit SystemML.jar -f 
> scripts/algorithms/Univar-Stats.dml -exec hybrid_spark -nvargs X=uni.mtx 
> TYPES=uni-types.csv STATS=uni-stats.txt CONSOLE_OUTPUT=TRUE
> The exception is the following:
> Exception in thread "main" org.apache.sysml.api.DMLException: 
> org.apache.sysml.lops.LopsException: ERROR: line 64, column 21 -- Problem 
> generating simple inst - 
> CP°*°_Var27·SCALAR·DOUBLE·false°_Var25·SCALAR·DOUBLE·false°_Var28·SCALAR·BOOLEAN
>   at org.apache.sysml.api.DMLScript.executeScript(DMLScript.java:374)
>   at org.apache.sysml.api.DMLScript.main(DMLScript.java:221)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
>   at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
>   at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: org.apache.sysml.lops.LopsException: ERROR: line 64, column 21 -- 
> Problem generating simple inst - 
> CP°*°_Var27·SCALAR·DOUBLE·false°_Var25·SCALAR·DOUBLE·false°_Var28·SCALAR·BOOLEAN
>   at 
> org.apache.sysml.lops.compile.Dag.generateControlProgramJobs(Dag.java:1572)
>   at org.apache.sysml.lops.compile.Dag.doGreedyGrouping(Dag.java:1212)
>   at org.apache.sysml.lops.compile.Dag.getJobs(Dag.java:267)
>   at 
> org.apache.sysml.parser.DMLProgram.createRuntimeProgramBlock(DMLProgram.java:531)
>   at 
> org.apache.sysml.parser.DMLProgram.getRuntimeProgram(DMLProgram.java:207)
>   at org.apache.sysml.api.DMLScript.execute(DMLScript.java:633)
>   at org.apache.sysml.api.DMLScript.executeScript(DMLScript.java:360)
>   ... 10 more
> Caused by: org.apache.sysml.runtime.DMLRuntimeException: Unexpected ValueType 
> in ArithmeticInstruction.
>   at 
> org.apache.sysml.runtime.instructions.cp.ArithmeticBinaryCPInstruction.parseInstruction(ArithmeticBinaryCPInstruction.java:80)
>   at 
> org.apache.sysml.runtime.instructions.CPInstructionParser.parseSingleInstruction(CPInstructionParser.java:321)
>   at 
> org.apache.sysml.runtime.instructions.InstructionParser.parseSingleInstruction(InstructionParser.java:45)
>   at 
> org.apache.sysml.lops.compile.Dag.generateControlProgramJobs(Dag.java:1559)
> The following line in Univar-Stats.dml causes the exception:
> maxDomainSize = max( (K > 1) * maxs );
> It is Boolean x Double, which causes the problem.
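
For illustration, here is a minimal spark-shell sketch of the pattern and a
possible workaround using the Scala MLContext API (the `as.double` cast and the
toy values of K and maxs are assumptions for demonstration purposes, not
necessarily the committed fix):
{code}
import org.apache.sysml.api.mlcontext._
import org.apache.sysml.api.mlcontext.ScriptFactory._

val ml = new MLContext(sc)
// (K > 1) evaluates to a BOOLEAN scalar; multiplying it directly with the
// DOUBLE matrix 'maxs' is the pattern that triggered "Unexpected ValueType
// in ArithmeticInstruction". Casting the comparison to DOUBLE first keeps
// both operands of the multiplication DOUBLE.
val script = dml(
  """
  K = 2
  maxs = matrix("1 5 3", rows=1, cols=3)
  maxDomainSize = max( as.double(K > 1) * maxs )
  """).out("maxDomainSize")
val maxDomainSize = ml.execute(script).getDouble("maxDomainSize")
{code}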



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (SYSTEMML-1238) Python test failing for LinearRegCG

2017-02-16 Thread Niketan Pansare (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15871170#comment-15871170
 ] 

Niketan Pansare commented on SYSTEMML-1238:
---

I am able to reproduce this bug from the command line as well. Here is the 
output of GLM-predict (after running LinRegDS):
{code}
$ cat y_predicted.csv
189.09660701586185
133.3260601238074
157.3739106185465
132.8144037303023
135.88434209133283
154.81562865102103
194.2131709509127
136.3959984848379
125.13955782772601
137.41931127184807
178.35182275225503
123.60458864721075
152.7690030770007
141.0009060263837
116.95305553164462
161.46716176658717
144.58250078091928
144.58250078091928
170.67697684967874
117.4647119251497
{code}

Here is the output of Python mllearn:
{code}
>>> import numpy as np
>>> from pyspark.context import SparkContext
>>> from pyspark.ml import Pipeline
>>> from pyspark.ml.feature import HashingTF, Tokenizer
>>> from pyspark.sql import SparkSession
>>> from sklearn import datasets, metrics, neighbors
>>> from sklearn.datasets import fetch_20newsgroups
>>> from sklearn.feature_extraction.text import TfidfVectorizer
>>> from systemml.mllearn import LinearRegression, LogisticRegression, NaiveBayes, SVM
>>> diabetes = datasets.load_diabetes()
>>> diabetes_X = diabetes.data[:, np.newaxis, 2]
>>> diabetes_X_train = diabetes_X[:-20]
>>> diabetes_X_test = diabetes_X[-20:]
>>> diabetes_y_train = diabetes.target[:-20]
>>> diabetes_y_test = diabetes.target[-20:]
>>> sparkSession = SparkSession.builder.getOrCreate()
>>> regr = LinearRegression(sparkSession, solver="direct-solve")
>>> regr.fit(diabetes_X_train, diabetes_y_train)

Welcome to Apache SystemML!

17/02/16 22:39:21 WARN RewriteRemovePersistentReadWrite: Non-registered 
persistent write of variable 'X' (line 87).
17/02/16 22:39:21 WARN RewriteRemovePersistentReadWrite: Non-registered 
persistent write of variable 'y' (line 88).
BEGIN LINEAR REGRESSION SCRIPT
Reading X and Y...
Calling the Direct Solver...
Computing the statistics...
AVG_TOT_Y,153.36255924170615
STDEV_TOT_Y,77.21853383600028
AVG_RES_Y,4.8020565933360324E-14
STDEV_RES_Y,67.06389890324985
DISPERSION,4497.566536105316
PLAIN_R2,0.24750834362605834
ADJUSTED_R2,0.24571669682516795
PLAIN_R2_NOBIAS,0.24750834362605834
ADJUSTED_R2_NOBIAS,0.24571669682516795
Writing the output matrix...
END LINEAR REGRESSION SCRIPT
lr
>>> regr.predict(diabetes_X_test)
17/02/16 22:39:35 WARN Expression: WARNING: null -- line 149, column 4 -- Read 
input file does not exist on FS (local mode):
17/02/16 22:39:35 WARN Expression: Metadata file:  .mtd not provided
array([[ 188.84521284],
   [ 134.98127765],
   [ 158.20701117],
   [ 134.4871131 ],
   [ 137.45210036],
   [ 155.73618846],
   [ 193.78685827],
   [ 137.94626491],
   [ 127.07464496],
   [ 138.93459399],
   [ 178.46775744],
   [ 125.59215133],
   [ 153.75953028],
   [ 142.39374579],
   [ 119.16801227],
   [ 162.16032752],
   [ 145.8528976 ],
   [ 145.8528976 ],
   [ 171.05528929],
   [ 119.66217681]])
{code}

To reproduce the command-line output, first dump the test data to CSV:
{code}
import numpy as np
from sklearn import datasets
diabetes = datasets.load_diabetes()
diabetes_X = diabetes.data[:, np.newaxis, 2]
diabetes_X_train = diabetes_X[:-20]
diabetes_X_test = diabetes_X[-20:]
diabetes_y_train = diabetes.target[:-20]
diabetes_y_test = diabetes.target[-20:]
diabetes_X_test.tofile('X_test.csv', sep="\n")
diabetes_X.tofile('X.csv', sep="\n")
diabetes.target.tofile('y.csv', sep="\n")
{code}

Then execute the following commands (you may have to edit the DML script to add 
the format, or create metadata files as sketched below):
{code}
~/spark-2.1.0-bin-hadoop2.7/bin/spark-submit SystemML.jar -f LinearRegDS.dml 
-nvargs X=X.csv Y=y.csv B=B.csv fmt=csv icpt=1 tol=0.01 reg=1 
~/spark-2.1.0-bin-hadoop2.7/bin/spark-submit SystemML.jar -f GLM-predict.dml 
-nvargs X=X_test.csv M=y_predicted.csv B=B.csv fmt=csv icpt=1 tol=0.01 reg=1
{code}
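
For the metadata files, a minimal Scala sketch (the row counts are assumptions 
based on the scikit-learn diabetes dataset, 442 samples with the last 20 held 
out, and the single-column dumps above):
{code}
import java.io.PrintWriter

// Write a minimal SystemML .mtd metadata file next to each CSV dump.
def writeMtd(file: String, rows: Int, cols: Int): Unit = {
  val mtd = s"""{"rows": $rows, "cols": $cols, "format": "csv"}"""
  val pw = new PrintWriter(file + ".mtd")
  try pw.write(mtd) finally pw.close()
}

writeMtd("X.csv", 442, 1)      // full feature column
writeMtd("y.csv", 442, 1)      // full target column
writeMtd("X_test.csv", 20, 1)  // last 20 rows held out for prediction
{code}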

> Python test failing for LinearRegCG
> ---
>
> Key: SYSTEMML-1238
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1238
> Project: SystemML
>  Issue Type: Bug
> 

[jira] [Comment Edited] (SYSTEMML-1238) Python test failing for LinearRegCG

2017-02-16 Thread Niketan Pansare (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15871170#comment-15871170
 ] 

Niketan Pansare edited comment on SYSTEMML-1238 at 2/17/17 5:36 AM:


I am able to reproduce this bug (not sure if it is one) from the command line 
as well; the GLM-predict output (after running LinRegDS) and the reproduction 
steps are the same as in my previous comment.

I also tested using SystemML 0.12.0 and got the same predictions:
{code}
$ ~/spark-1.6.1-bin-hadoop2.6/bin/spark-submit systemml-0.12.0-incubating.jar 
-f LinearRegDS.dml -nvargs X=X.csv 

[jira] [Updated] (SYSTEMML-1280) Restore and deprecate SQLContext methods

2017-02-16 Thread Deron Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deron Eriksson updated SYSTEMML-1280:
-
Description: 
SYSTEMML-1194 replaced SQLContext with SparkSession in SystemML for Spark 2.1.0 
since SQLContext is deprecated.

Restore the old Java SQLContext method signatures in case any users are using 
SystemML methods and are unable to use SparkSessions (SparkSessions are 
described in 
https://databricks.com/blog/2016/08/15/how-to-use-sparksession-in-apache-spark-2-0.html)

Classes where this applies:
old MLContext class (whole class is deprecated)
old MLMatrix class (whole class is deprecated)
old MLOutput class (whole class is deprecated)
FrameRDDConverterUtils (this is a non-API class)
RDDConverterUtils (this is a non-API class)
RDDConverterUtilsExt (this is a non-API class)

In non-API classes, these SQLContext methods should be marked as deprecated and 
removed in a future version of SystemML (1.0) since SparkSessions should 
generally be used with Spark 2. As mentioned in SQLContext documentation, "As 
of Spark 2.0, this is replaced by SparkSession. However, we are keeping the 
class here for backward compatibility."



  was:
SYSTEMML-1194 replaced SQLContext with SparkSession in SystemML for Spark 2.1.0 
since SQLContext is deprecated.

Restore the old Java SQLContext method signatures in case any users are using 
SystemML methods and are unable to use SparkSessions (SparkSessions are 
generally easy to create, as described in 
https://databricks.com/blog/2016/08/15/how-to-use-sparksession-in-apache-spark-2-0.html)

Classes where this applies:
old MLContext class (whole class is deprecated)
old MLMatrix class (whole class is deprecated)
old MLOutput class (whole class is deprecated)
FrameRDDConverterUtils (this is a non-API class)
RDDConverterUtils (this is a non-API class)
RDDConverterUtilsExt (this is a non-API class)

In non-API classes, these SQLContext methods should be marked as deprecated and 
removed in a future version of SystemML (1.0) since SparkSessions should 
generally be used with Spark 2. As mentioned in SQLContext documentation, "As 
of Spark 2.0, this is replaced by SparkSession. However, we are keeping the 
class here for backward compatibility."




> Restore and deprecate SQLContext methods
> 
>
> Key: SYSTEMML-1280
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1280
> Project: SystemML
>  Issue Type: Task
>  Components: APIs, Runtime
>Reporter: Deron Eriksson
>Assignee: Deron Eriksson
>
> SYSTEMML-1194 replaced SQLContext with SparkSession in SystemML for Spark 
> 2.1.0 since SQLContext is deprecated.
> Restore the old Java SQLContext method signatures in case any users are using 
> SystemML methods and are unable to use SparkSessions (SparkSessions are 
> described in 
> https://databricks.com/blog/2016/08/15/how-to-use-sparksession-in-apache-spark-2-0.html)
> Classes where this applies:
> old MLContext class (whole class is deprecated)
> old MLMatrix class (whole class is deprecated)
> old MLOutput class (whole class is deprecated)
> FrameRDDConverterUtils (this is a non-API class)
> RDDConverterUtils (this is a non-API class)
> RDDConverterUtilsExt (this is a non-API class)
> In non-API classes, these SQLContext methods should be marked as deprecated 
> and removed in a future version of SystemML (1.0) since SparkSessions should 
> generally be used with Spark 2. As mentioned in SQLContext documentation, "As 
> of Spark 2.0, this is replaced by SparkSession. However, we are keeping the 
> class here for backward compatibility."
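
As an illustration of the restore-and-deprecate pattern described above, here 
is a minimal Scala sketch (the method and type names are hypothetical, not the 
actual converter signatures):
{code}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SQLContext, SparkSession}

object ConverterShim {
  // New-style entry point: SparkSession based.
  def toDataFrame(spark: SparkSession, rdd: RDD[Double]): DataFrame = {
    import spark.implicits._
    rdd.toDF("value")
  }

  // Restored old signature: kept for callers that still hold an SQLContext,
  // marked deprecated, and delegating to the SparkSession variant.
  @deprecated("Use the SparkSession variant; SQLContext is deprecated in Spark 2.", "0.13")
  def toDataFrame(sqlContext: SQLContext, rdd: RDD[Double]): DataFrame =
    toDataFrame(sqlContext.sparkSession, rdd)
}
{code}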



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)