[jira] [Created] (SYSTEMML-1279) EOFException in MinMaxMean example snippet
Felix Schüler created SYSTEMML-1279:
---
Summary: EOFException in MinMaxMean example snippet
Key: SYSTEMML-1279
URL: https://issues.apache.org/jira/browse/SYSTEMML-1279
Project: SystemML
Issue Type: Bug
Reporter: Felix Schüler
Priority: Minor

Our current documentation contains a snippet for a short DML script:
{code}
val numRows = 1
val numCols = 1000
val data = sc.parallelize(0 to numRows-1).map { _ => Row.fromSeq(Seq.fill(numCols)(Random.nextDouble)) }
val schema = StructType((0 to numCols-1).map { i => StructField("C" + i, DoubleType, true) })
val df = spark.createDataFrame(data, schema)
val minMaxMean =
"""
minOut = min(Xin)
maxOut = max(Xin)
meanOut = mean(Xin)
"""
val mm = new MatrixMetadata(numRows, numCols)
val minMaxMeanScript = dml(minMaxMean).in("Xin", df, mm).out("minOut", "maxOut", "meanOut")
val (min, max, mean) = ml.execute(minMaxMeanScript).getTuple[Double, Double, Double]("minOut", "maxOut", "meanOut")
{code}
Execution of the line
{code}
val minMaxMeanScript = dml(minMaxMean).in("Xin", df, mm).out("minOut", "maxOut", "meanOut")
{code}
in the spark-shell leads to the following error:
{code}
scala> val minMaxMeanScript = dml(minMaxMean).in("Xin", df, mm).out("minOut", "maxOut", "meanOut")
[Stage 0:> (0 + 4) / 4]17/02/16 13:37:10 WARN CodeGenerator: Error calculating stats of compiled class.
java.io.EOFException
	at java.io.DataInputStream.readFully(DataInputStream.java:197)
	at java.io.DataInputStream.readFully(DataInputStream.java:169)
	at org.codehaus.janino.util.ClassFile.loadAttribute(ClassFile.java:1509)
	at org.codehaus.janino.util.ClassFile.loadAttributes(ClassFile.java:644)
	at org.codehaus.janino.util.ClassFile.loadFields(ClassFile.java:623)
	at org.codehaus.janino.util.ClassFile.<init>(ClassFile.java:280)
	at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anonfun$recordCompilationStats$1.apply(CodeGenerator.scala:967)
	at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anonfun$recordCompilationStats$1.apply(CodeGenerator.scala:964)
	at scala.collection.Iterator$class.foreach(Iterator.scala:893)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
	at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
	at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
	at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.recordCompilationStats(CodeGenerator.scala:964)
	at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:936)
	at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:998)
	at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:995)
	at org.spark_project.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
	at org.spark_project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
	at org.spark_project.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
	at org.spark_project.guava.cache.LocalCache$Segment.get(LocalCache.java:2257)
	at org.spark_project.guava.cache.LocalCache.get(LocalCache.java:4000)
	at org.spark_project.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004)
	at org.spark_project.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
	at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:890)
	at org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection$.create(GenerateUnsafeProjection.scala:405)
	at org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection$.create(GenerateUnsafeProjection.scala:359)
	at org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection$.create(GenerateUnsafeProjection.scala:32)
	at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:874)
	at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.extractProjection$lzycompute(ExpressionEncoder.scala:266)
	at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.extractProjection(ExpressionEncoder.scala:266)
	at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:290)
	at org.apache.spark.sql.SparkSession$$anonfun$3.apply(SparkSession.scala:547)
	at org.apache.spark.sql.SparkSession$$anonfun$3.apply(SparkSession.scala:547)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
	at
{code}
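For reference, the DML body in the snippet above computes three scalar aggregates over the input matrix. A minimal NumPy sketch of the same semantics (illustrative only; the actual snippet runs through SystemML's MLContext on Spark):

```python
import numpy as np

# Same aggregates as the DML body: minOut = min(Xin), maxOut = max(Xin),
# meanOut = mean(Xin), over a numRows x numCols random matrix as in the
# snippet (numRows = 1, numCols = 1000).
rng = np.random.RandomState(42)
Xin = rng.rand(1, 1000)

min_out = Xin.min()
max_out = Xin.max()
mean_out = Xin.mean()
```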
[jira] [Commented] (SYSTEMML-1238) Python test failing for LinearRegCG
[ https://issues.apache.org/jira/browse/SYSTEMML-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15871102#comment-15871102 ]

Niketan Pansare commented on SYSTEMML-1238:
---
1. I have verified that the mllearn API in 0.12.0 produces correct results.
2. No changes have been introduced in the Python/Scala wrappers that would affect this. The only change I see in the algorithm since 0.12.0 is cbind. The bug is likely a side-effect of some other change.
3. I verified that the Python wrappers are passing correct inputs to the DML script by writing the input X, y to file and comparing it with the original Python data.

I tested LinRegDS:

A. command line:
{code}
~/spark-2.1.0-bin-hadoop2.7/bin/spark-submit SystemML.jar -f LinearRegDS.dml -nvargs X=X.csv Y=y.csv B=B.csv fmt=csv icpt=1 tol=0.01 reg=1
Calling the Direct Solver...
Computing the statistics...
17/02/16 21:02:52 INFO MapPartitionsRDD: Removing RDD 17 from persistence list
17/02/16 21:02:52 INFO BlockManager: Removing RDD 17
AVG_TOT_Y,152.13348416289594
STDEV_TOT_Y,77.09300453299106
AVG_RES_Y,-2.935409582574532E-14
STDEV_RES_Y,66.48545020578437
DISPERSION,4420.315089065834
PLAIN_R2,0.2579428201690507
ADJUSTED_R2,0.2562563265785258
PLAIN_R2_NOBIAS,0.2579428201690507
ADJUSTED_R2_NOBIAS,0.2562563265785258
Writing the output matrix...
END LINEAR REGRESSION SCRIPT
{code}

B. mllearn:
{code}
Calling the Direct Solver...
Computing the statistics...
AVG_TOT_Y,153.36255924170615
STDEV_TOT_Y,77.21853383600028
AVG_RES_Y,4.8020565933360324E-14
STDEV_RES_Y,67.06389890324985
DISPERSION,4497.566536105316
PLAIN_R2,0.24750834362605834
ADJUSTED_R2,0.24571669682516795
PLAIN_R2_NOBIAS,0.24750834362605834
ADJUSTED_R2_NOBIAS,0.24571669682516795
Writing the output matrix...
END LINEAR REGRESSION SCRIPT
lr
{code}

> Python test failing for LinearRegCG
> ---
>
> Key: SYSTEMML-1238
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1238
> Project: SystemML
> Issue Type: Bug
> Components: Algorithms, APIs
> Affects Versions: SystemML 0.13
> Reporter: Imran Younus
> Assignee: Niketan Pansare
> Attachments: python_LinearReg_test_spark.1.6.log, python_LinearReg_test_spark.2.1.log
>
> [~deron] discovered that one of the Python tests ({{test_mllearn_df.py}}) with Spark 2.1.0 was failing because the test score from linear regression was very low ({{~ 0.24}}). I did some investigation, and it turns out that the model parameters computed by the DML script are incorrect. In SystemML 0.12, the values of the betas from the linear regression model are {{\[152.919, 938.237\]}}. This is what we expect from the normal equation. (I also tested this with sklearn.) But the values of the betas from SystemML 0.13 (with Spark 2.1.0) come out to be {{\[153.146, 458.489\]}}. These are not correct, and therefore the test score is much lower than expected. The data going into the DML script is correct. I printed out the values of {{X}} and {{Y}} in DML and I didn't see any issue there.
> Attached are the log files for the two different tests (SystemML 0.12 and 0.13) with the explain flag.

-- This message was sent by Atlassian JIRA (v6.3.15#6346)
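The statistics printed above follow the usual regression identities. As a sanity check on how they are derived, here is a small NumPy sketch (on synthetic data, not the actual test data) of an intercept-model OLS fit, recomputing AVG_RES_Y (residual mean, which is ~0 when an intercept is fit, matching the tiny E-14 values above) and PLAIN_R2 = 1 - SS_res / SS_tot:

```python
import numpy as np

# Synthetic illustration only: fit ordinary least squares with an
# intercept (icpt=1), as LinearRegDS's direct normal-equation solver
# does, then recompute two of the printed statistics.
rng = np.random.RandomState(0)
x = rng.rand(200)
y = 3.0 * x + 0.5 * rng.randn(200) + 152.0

A = np.column_stack([x, np.ones_like(x)])      # [feature, intercept]
beta, *_ = np.linalg.lstsq(A, y, rcond=None)   # betas, like B in the script
residuals = y - A @ beta

avg_res_y = residuals.mean()                               # AVG_RES_Y
plain_r2 = 1.0 - np.sum(residuals**2) / np.sum((y - y.mean())**2)  # PLAIN_R2
```

A wrong beta vector, as reported for 0.13, inflates SS_res and drives PLAIN_R2 down toward the observed ~0.24.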
[jira] [Commented] (SYSTEMML-1279) EOFException in MinMaxMean example snippet
[ https://issues.apache.org/jira/browse/SYSTEMML-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15870769#comment-15870769 ]

Felix Schüler commented on SYSTEMML-1279:
---
Might be related to https://issues.apache.org/jira/browse/SPARK-17131
[jira] [Updated] (SYSTEMML-1280) Restore and deprecate SQLContext methods
[ https://issues.apache.org/jira/browse/SYSTEMML-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Deron Eriksson updated SYSTEMML-1280:
---
Description:
SYSTEMML-1194 replaced SQLContext with SparkSession in SystemML for Spark 2.1.0, since SQLContext is deprecated.

Restore the old Java SQLContext method signatures in case any users are using SystemML methods and are unable to use SparkSessions (SparkSessions are generally easy to create, as described in https://databricks.com/blog/2016/08/15/how-to-use-sparksession-in-apache-spark-2-0.html).

Classes where this applies:
old MLContext class (whole class is deprecated)
old MLMatrix class (whole class is deprecated)
old MLOutput class (whole class is deprecated)
FrameRDDConverterUtils (this is a non-API class)
RDDConverterUtils (this is a non-API class)
RDDConverterUtilsExt (this is a non-API class)

In the non-API classes, these SQLContext methods should be marked as deprecated and removed in a future version of SystemML (1.0), since SparkSessions should generally be used with Spark 2. As mentioned in the SQLContext documentation, "As of Spark 2.0, this is replaced by SparkSession. However, we are keeping the class here for backward compatibility."

> Restore and deprecate SQLContext methods
> ---
>
> Key: SYSTEMML-1280
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1280
> Project: SystemML
> Issue Type: Task
> Components: APIs, Runtime
> Reporter: Deron Eriksson
> Assignee: Deron Eriksson
[jira] [Created] (SYSTEMML-1281) OOM Error On Binary Write
Mike Dusenberry created SYSTEMML-1281:
---
Summary: OOM Error On Binary Write
Key: SYSTEMML-1281
URL: https://issues.apache.org/jira/browse/SYSTEMML-1281
Project: SystemML
Issue Type: Bug
Affects Versions: SystemML 0.13
Reporter: Mike Dusenberry
Priority: Blocker

I'm running into the following heap-space OOM error while attempting to save a large Spark DataFrame to SystemML binary format via DML {{write}} statements.

{code}
tr_sample_filename = os.path.join("data", "train_{}{}.parquet".format(size, "_grayscale" if grayscale else ""))
val_sample_filename = os.path.join("data", "val_{}{}.parquet".format(size, "_grayscale" if grayscale else ""))
train_df = sqlContext.read.load(tr_sample_filename)
val_df = sqlContext.read.load(val_sample_filename)
train_df, val_df

# Note: Must use the row index column, or X may not
# necessarily correspond correctly to Y
X_df = train_df.select("__INDEX", "sample")
X_val_df = val_df.select("__INDEX", "sample")
y_df = train_df.select("__INDEX", "tumor_score")
y_val_df = val_df.select("__INDEX", "tumor_score")
X_df, X_val_df, y_df, y_val_df

script = """
# Scale images to [-1,1]
X = X / 255
X_val = X_val / 255
X = X * 2 - 1
X_val = X_val * 2 - 1

# One-hot encode the labels
num_tumor_classes = 3
n = nrow(y)
n_val = nrow(y_val)
Y = table(seq(1, n), y, n, num_tumor_classes)
Y_val = table(seq(1, n_val), y_val, n_val, num_tumor_classes)
"""
outputs = ("X", "X_val", "Y", "Y_val")
script = dml(script).input(X=X_df, X_val=X_val_df, y=y_df, y_val=y_val_df).output(*outputs)
X, X_val, Y, Y_val = ml.execute(script).get(*outputs)
X, X_val, Y, Y_val

script = """
write(X, "data/systemml/X_"+size+"_"+c+"_binary", format="binary")
write(Y, "data/systemml/Y_"+size+"_"+c+"_binary", format="binary")
write(X_val, "data/systemml/X_val_"+size+"_"+c+"_binary", format="binary")
write(Y_val, "data/systemml/Y_val_"+size+"_"+c+"_binary", format="binary")
"""
script = dml(script).input(X=X, X_val=X_val, Y=Y, Y_val=Y_val, size=size, c=c)
ml.execute(script)
{code}

{code}
Caused by: org.apache.sysml.api.mlcontext.MLContextException: Exception occurred while executing runtime program
	at org.apache.sysml.api.mlcontext.ScriptExecutor.executeRuntimeProgram(ScriptExecutor.java:371)
	at org.apache.sysml.api.mlcontext.ScriptExecutor.execute(ScriptExecutor.java:292)
	at org.apache.sysml.api.mlcontext.MLContext.execute(MLContext.java:293)
	... 12 more
Caused by: org.apache.sysml.runtime.DMLRuntimeException: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in program block generated from statement block between lines 1 and 11 -- Error evaluating instruction: CP°mvvar°X°¶_Var49¶°binaryblock
	at org.apache.sysml.runtime.controlprogram.Program.execute(Program.java:130)
	at org.apache.sysml.api.mlcontext.ScriptExecutor.executeRuntimeProgram(ScriptExecutor.java:369)
	... 14 more
Caused by: org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in program block generated from statement block between lines 1 and 11 -- Error evaluating instruction: CP°mvvar°X°¶_Var49¶°binaryblock
	at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:320)
	at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeInstructions(ProgramBlock.java:221)
	at org.apache.sysml.runtime.controlprogram.ProgramBlock.execute(ProgramBlock.java:168)
	at org.apache.sysml.runtime.controlprogram.Program.execute(Program.java:123)
	... 15 more
Caused by: org.apache.sysml.runtime.controlprogram.caching.CacheException: Move to data/systemml/X_256_3_binary failed.
	at org.apache.sysml.runtime.controlprogram.caching.CacheableData.moveData(CacheableData.java:1329)
	at org.apache.sysml.runtime.instructions.cp.VariableCPInstruction.processMoveInstruction(VariableCPInstruction.java:706)
	at org.apache.sysml.runtime.instructions.cp.VariableCPInstruction.processInstruction(VariableCPInstruction.java:511)
	at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:290)
	... 18 more
Caused by: org.apache.sysml.runtime.controlprogram.caching.CacheException: Export to data/systemml/X_256_3_binary failed.
	at org.apache.sysml.runtime.controlprogram.caching.CacheableData.exportData(CacheableData.java:800)
	at org.apache.sysml.runtime.controlprogram.caching.CacheableData.exportData(CacheableData.java:688)
	at org.apache.sysml.runtime.controlprogram.caching.CacheableData.moveData(CacheableData.java:1315)
	... 21 more
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 269 in stage 40.0 failed 4 times, most recent failure: Lost task 269.3 in stage 40.0 (TID 61177, 9.30.110.145, executor 10): ExecutorLostFailure
{code}
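The preprocessing portion of the script above (rescaling to [-1, 1] and one-hot encoding via {{table}}) can be sketched in plain NumPy. This only illustrates the semantics; the actual job runs distributed through MLContext:

```python
import numpy as np

# NumPy sketch of the DML preprocessing above.
# Rescale pixel values from [0, 255] to [-1, 1]:
X = np.array([[0.0, 127.5, 255.0]])
X = X / 255       # scale to [0, 1]
X = X * 2 - 1     # scale to [-1, 1]

# One-hot encode 1-based integer labels. DML's
# table(seq(1, n), y, n, num_tumor_classes) builds an n x num_classes
# contingency matrix; with distinct row indices seq(1, n), that is
# exactly a one-hot encoding of y.
y = np.array([1, 3, 2])        # 1-based tumor scores
num_tumor_classes = 3
n = len(y)
Y = np.zeros((n, num_tumor_classes))
Y[np.arange(n), y - 1] = 1.0
```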
[jira] [Resolved] (SYSTEMML-1280) Restore and deprecate SQLContext methods
[ https://issues.apache.org/jira/browse/SYSTEMML-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Deron Eriksson resolved SYSTEMML-1280.
---
Resolution: Fixed
Fix Version/s: SystemML 0.13

Fixed by [PR396|https://github.com/apache/incubator-systemml/pull/396].
[jira] [Closed] (SYSTEMML-1280) Restore and deprecate SQLContext methods
[ https://issues.apache.org/jira/browse/SYSTEMML-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Deron Eriksson closed SYSTEMML-1280.
[jira] [Assigned] (SYSTEMML-1277) DataFrames With `mllib.Vector` Columns Are No Longer Converted to Matrices.
[ https://issues.apache.org/jira/browse/SYSTEMML-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Deron Eriksson reassigned SYSTEMML-1277:
---
Assignee: Deron Eriksson

> DataFrames With `mllib.Vector` Columns Are No Longer Converted to Matrices.
> ---
>
> Key: SYSTEMML-1277
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1277
> Project: SystemML
> Issue Type: Bug
> Affects Versions: SystemML 0.13
> Reporter: Mike Dusenberry
> Assignee: Deron Eriksson
> Priority: Blocker
>
> Recently, we made the switch from the old {{mllib.Vector}} to the new {{ml.Vector}} type. Unfortunately, this leaves us with the issue of no longer recognizing DataFrames with {{mllib.Vector}} columns during conversion, and thus we (1) do not correctly convert to SystemML {{Matrix}} objects, (2) instead fall back on conversion to {{Frame}} objects, and then (3) fail completely when the ensuing DML script expects to operate on matrices.
> Given a Spark {{DataFrame}} {{X_df}} of type {{DataFrame\[__INDEX: int, sample: vector\]}}, where {{vector}} is of type {{mllib.Vector}}, the following script will now fail (it did not previously):
> {code}
> script = """
> # Scale images to [-1,1]
> X = X / 255
> X = X * 2 - 1
> """
> outputs = ("X")
> script = dml(script).input(X=X_df).output(*outputs)
> X = ml.execute(script).get(*outputs)
> X
> {code}
> {code}
> Caused by: org.apache.sysml.api.mlcontext.MLContextException: Exception occurred while validating script
> 	at org.apache.sysml.api.mlcontext.ScriptExecutor.validateScript(ScriptExecutor.java:487)
> 	at org.apache.sysml.api.mlcontext.ScriptExecutor.execute(ScriptExecutor.java:280)
> 	at org.apache.sysml.api.mlcontext.MLContext.execute(MLContext.java:293)
> 	... 12 more
> Caused by: org.apache.sysml.parser.LanguageException: Invalid Parameters : ERROR: null -- line 4, column 4 -- Invalid Datatypes for operation FRAME SCALAR
> 	at org.apache.sysml.parser.Expression.raiseValidateError(Expression.java:549)
> 	at org.apache.sysml.parser.Expression.computeDataType(Expression.java:415)
> 	at org.apache.sysml.parser.Expression.computeDataType(Expression.java:386)
> 	at org.apache.sysml.parser.BinaryExpression.validateExpression(BinaryExpression.java:130)
> 	at org.apache.sysml.parser.StatementBlock.validate(StatementBlock.java:567)
> 	at org.apache.sysml.parser.DMLTranslator.validateParseTree(DMLTranslator.java:140)
> 	at org.apache.sysml.api.mlcontext.ScriptExecutor.validateScript(ScriptExecutor.java:485)
> 	... 14 more
> {code}
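A hypothetical pure-Python sketch of what the matrix conversion has to do with a {{mllib.Vector}} column, and why the {{Frame}} fallback then breaks the script's arithmetic. The data and names here are made up for illustration; plain lists stand in for the vector payloads:

```python
import numpy as np

# Hypothetical stand-in data: each row plays the role of one row of a
# DataFrame[__INDEX: int, sample: vector].
rows = [(2, [255.0, 0.0, 127.5]),
        (1, [0.0, 127.5, 255.0])]

# Matrix conversion: order rows by the __INDEX column, then stack the
# vector payloads into a dense numeric matrix. If the converter does not
# recognize the vector column type, it falls back to a (string-typed)
# Frame instead, and matrix arithmetic on it fails with the
# "Invalid Datatypes for operation FRAME SCALAR" error shown above.
rows.sort(key=lambda r: r[0])
X = np.array([vec for _, vec in rows], dtype=np.float64)

# With a proper numeric matrix, the script's rescaling works elementwise:
X = X / 255 * 2 - 1
```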
[jira] [Assigned] (SYSTEMML-1279) EOFException in MinMaxMean example snippet
[ https://issues.apache.org/jira/browse/SYSTEMML-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Deron Eriksson reassigned SYSTEMML-1279:
---
Assignee: Felix Schüler
[jira] [Resolved] (SYSTEMML-1279) EOFException in MinMaxMean example snippet
[ https://issues.apache.org/jira/browse/SYSTEMML-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deron Eriksson resolved SYSTEMML-1279.
Resolution: Fixed
Fix Version/s: SystemML 0.13
[jira] [Closed] (SYSTEMML-1279) EOFException in MinMaxMean example snippet
[ https://issues.apache.org/jira/browse/SYSTEMML-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deron Eriksson closed SYSTEMML-1279.
[jira] [Commented] (SYSTEMML-1279) EOFException in MinMaxMean example snippet
[ https://issues.apache.org/jira/browse/SYSTEMML-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15870934#comment-15870934 ] Deron Eriksson commented on SYSTEMML-1279:
[~fschueler] I am going to resolve this since your [PR395|https://github.com/apache/incubator-systemml/pull/395] addressed the codegen warning generated by following the docs. SYSTEMML-1267 is a duplicate of this issue, but perhaps I will keep that open for now in case [~mboehm7] or someone else can figure out a workaround.
[jira] [Commented] (SYSTEMML-1279) EOFException in MinMaxMean example snippet
[ https://issues.apache.org/jira/browse/SYSTEMML-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15870940#comment-15870940 ] Felix Schüler commented on SYSTEMML-1279:
Oh, didn't see that! Thanks!
[jira] [Commented] (SYSTEMML-1279) EOFException in MinMaxMean example snippet
[ https://issues.apache.org/jira/browse/SYSTEMML-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15870771#comment-15870771 ] Felix Schüler commented on SYSTEMML-1279:
For the sake of having a clean documentation I suggest to set numCols to 100 for now until it is fixed in Spark.
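Following that suggestion, the documented spark-shell snippet with numCols reduced to 100 would look roughly as follows. This is a sketch only: it assumes the spark-shell provides `sc` and `spark`, and that an `ml` MLContext instance and the MLContext API imports have been set up as in the documentation.
{code}
import scala.util.Random
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{DoubleType, StructField, StructType}
import org.apache.sysml.api.mlcontext._
import org.apache.sysml.api.mlcontext.ScriptFactory._

val numRows = 1
val numCols = 100  // reduced from 1000 as a workaround for the codegen EOFException

val data = sc.parallelize(0 to numRows - 1).map { _ =>
  Row.fromSeq(Seq.fill(numCols)(Random.nextDouble))
}
val schema = StructType((0 to numCols - 1).map { i => StructField("C" + i, DoubleType, true) })
val df = spark.createDataFrame(data, schema)

val minMaxMean =
  """
  minOut = min(Xin)
  maxOut = max(Xin)
  meanOut = mean(Xin)
  """
val mm = new MatrixMetadata(numRows, numCols)
val minMaxMeanScript = dml(minMaxMean).in("Xin", df, mm).out("minOut", "maxOut", "meanOut")
val (min, max, mean) = ml.execute(minMaxMeanScript)
  .getTuple[Double, Double, Double]("minOut", "maxOut", "meanOut")
{code}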
[jira] [Commented] (SYSTEMML-1276) Resolve jersey class not found error with Spark2 and YARN
[ https://issues.apache.org/jira/browse/SYSTEMML-1276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15871256#comment-15871256 ] Matthias Boehm commented on SYSTEMML-1276:
It also fails on creating the Spark context on certain Hadoop distributions (see below).
{code}
Exception in thread "main" java.lang.NoClassDefFoundError: com/sun/jersey/api/client/config/ClientConfig
	at org.apache.hadoop.yarn.client.api.TimelineClient.createTimelineClient(TimelineClient.java:55)
	at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.createTimelineClient(YarnClientImpl.java:181)
	at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:168)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
	at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:151)
	at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
	at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:156)
	at org.apache.spark.SparkContext.(SparkContext.scala:509)
	at org.apache.spark.api.java.JavaSparkContext.(JavaSparkContext.scala:58)
	at org.apache.sysml.runtime.controlprogram.context.SparkExecutionContext.initSparkContext(SparkExecutionContext.java:215)
	at org.apache.sysml.runtime.controlprogram.context.SparkExecutionContext.getSparkContext(SparkExecutionContext.java:130)
	at org.apache.sysml.runtime.controlprogram.context.SparkExecutionContext.getRDDHandleForMatrixObject(SparkExecutionContext.java:359)
	at org.apache.sysml.runtime.controlprogram.context.SparkExecutionContext.getRDDHandleForVariable(SparkExecutionContext.java:304)
	at org.apache.sysml.runtime.controlprogram.context.SparkExecutionContext.getBinaryBlockRDDHandleForVariable(SparkExecutionContext.java:279)
	at org.apache.sysml.runtime.instructions.spark.MapmmSPInstruction.processInstruction(MapmmSPInstruction.java:117)
	at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:290)
	at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeInstructions(ProgramBlock.java:221)
	at org.apache.sysml.runtime.controlprogram.ProgramBlock.execute(ProgramBlock.java:168)
	at org.apache.sysml.runtime.controlprogram.Program.execute(Program.java:123)
	at org.apache.sysml.api.DMLScript.execute(DMLScript.java:665)
	at org.apache.sysml.api.DMLScript.executeScript(DMLScript.java:346)
	at org.apache.sysml.api.DMLScript.main(DMLScript.java:207)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: com.sun.jersey.api.client.config.ClientConfig
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
{code}
> Resolve jersey class not found error with Spark2 and YARN
> ---------------------------------------------------------
>
> Key: SYSTEMML-1276
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1276
> Project: SystemML
> Issue Type: Improvement
> Components: Runtime
> Affects Versions: SystemML 0.13
> Environment: Spark 2.x, Hadoop 2.7.3
> Reporter: Glenn Weidner
> Assignee: Glenn Weidner
>
> This is a known issue as reported in [YARN-5271] and [SPARK-15343]. It was observed during 0.13 performance testing and can be reproduced with the following example:
> spark-submit --master yarn --deploy-mode client --class org.apache.sysml.api.DMLScript ./systemml-0.13.0-incubating-SNAPSHOT.jar -f ./scripts/utils/sample.dml -exec hybrid_spark -nvargs X=linRegData.csv sv=perc.csv O=linRegDataParts ofmt=csv
> Exception in thread "main" java.lang.NoClassDefFoundError: com/sun/jersey/api/client/config/ClientConfig
> at org.apache.hadoop.yarn.client.api.TimelineClient.createTimelineClient(TimelineClient.java:55)
> at ...
[jira] [Resolved] (SYSTEMML-1271) Increment MLContext minimum Spark version to 2.1.0
[ https://issues.apache.org/jira/browse/SYSTEMML-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deron Eriksson resolved SYSTEMML-1271. -- Resolution: Fixed Fix Version/s: SystemML 0.13 Fixed by [PR392|https://github.com/apache/incubator-systemml/pull/392]. > Increment MLContext minimum Spark version to 2.1.0 > -- > > Key: SYSTEMML-1271 > URL: https://issues.apache.org/jira/browse/SYSTEMML-1271 > Project: SystemML > Issue Type: Task > Components: APIs >Affects Versions: SystemML 0.13 >Reporter: Deron Eriksson >Assignee: Deron Eriksson > Fix For: SystemML 0.13 > > > For SystemML 0.13, set MLContext SYSTEMML_MINIMUM_SPARK_VERSION to 2.1.0. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Closed] (SYSTEMML-1271) Increment MLContext minimum Spark version to 2.1.0
[ https://issues.apache.org/jira/browse/SYSTEMML-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deron Eriksson closed SYSTEMML-1271.
[jira] [Comment Edited] (SYSTEMML-1238) Python test failing for LinearRegCG
[ https://issues.apache.org/jira/browse/SYSTEMML-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15871170#comment-15871170 ] Niketan Pansare edited comment on SYSTEMML-1238 at 2/17/17 5:33 AM:
I am able to reproduce this bug (not sure if it is one) with the command line as well. Here is the output of GLM-predict (after running LinRegDS):
{code}
$ cat y_predicted.csv
189.09660701586185
133.3260601238074
157.3739106185465
132.8144037303023
135.88434209133283
154.81562865102103
194.2131709509127
136.3959984848379
125.13955782772601
137.41931127184807
178.35182275225503
123.60458864721075
152.7690030770007
141.0009060263837
116.95305553164462
161.46716176658717
144.58250078091928
144.58250078091928
170.67697684967874
117.4647119251497
{code}
Here is the output of Python mllearn:
{code}
>>> import numpy as np
>>> from pyspark.context import SparkContext
>>> from pyspark.ml import Pipeline
>>> from pyspark.ml.feature import HashingTF, Tokenizer
>>> from pyspark.sql import SparkSession
>>> from sklearn import datasets, metrics, neighbors
>>> from sklearn.datasets import fetch_20newsgroups
>>> from sklearn.feature_extraction.text import TfidfVectorizer
>>> from systemml.mllearn import LinearRegression, LogisticRegression, NaiveBayes, SVM
>>> diabetes = datasets.load_diabetes()
>>> diabetes_X = diabetes.data[:, np.newaxis, 2]
>>> diabetes_X_train = diabetes_X[:-20]
>>> diabetes_X_test = diabetes_X[-20:]
>>> diabetes_y_train = diabetes.target[:-20]
>>> diabetes_y_test = diabetes.target[-20:]
>>> sparkSession = SparkSession.builder.getOrCreate()
>>> regr = LinearRegression(sparkSession, solver="direct-solve")
>>> regr.fit(diabetes_X_train, diabetes_y_train)
Welcome to Apache SystemML!
17/02/16 22:39:21 WARN RewriteRemovePersistentReadWrite: Non-registered persistent write of variable 'X' (line 87).
17/02/16 22:39:21 WARN RewriteRemovePersistentReadWrite: Non-registered persistent write of variable 'y' (line 88).
BEGIN LINEAR REGRESSION SCRIPT
Reading X and Y...
Calling the Direct Solver...
Computing the statistics...
AVG_TOT_Y,153.36255924170615
STDEV_TOT_Y,77.21853383600028
AVG_RES_Y,4.8020565933360324E-14
STDEV_RES_Y,67.06389890324985
DISPERSION,4497.566536105316
PLAIN_R2,0.24750834362605834
ADJUSTED_R2,0.24571669682516795
PLAIN_R2_NOBIAS,0.24750834362605834
ADJUSTED_R2_NOBIAS,0.24571669682516795
Writing the output matrix...
END LINEAR REGRESSION SCRIPT
lr
>>> regr.predict(diabetes_X_test)
17/02/16 22:39:35 WARN Expression: WARNING: null -- line 149, column 4 -- Read input file does not exist on FS (local mode):
17/02/16 22:39:35 WARN Expression: Metadata file: .mtd not provided
array([[ 188.84521284], [ 134.98127765], [ 158.20701117], [ 134.4871131 ],
       [ 137.45210036], [ 155.73618846], [ 193.78685827], [ 137.94626491],
       [ 127.07464496], [ 138.93459399], [ 178.46775744], [ 125.59215133],
       [ 153.75953028], [ 142.39374579], [ 119.16801227], [ 162.16032752],
       [ 145.8528976 ], [ 145.8528976 ], [ 171.05528929], [ 119.66217681]])
{code}
To reproduce the command-line output, please dump the test data into csv:
{code}
import numpy as np
from sklearn import datasets

diabetes = datasets.load_diabetes()
diabetes_X = diabetes.data[:, np.newaxis, 2]
diabetes_X_train = diabetes_X[:-20]
diabetes_X_test = diabetes_X[-20:]
diabetes_y_train = diabetes.target[:-20]
diabetes_y_test = diabetes.target[-20:]
diabetes_X_test.tofile('X_test.csv', sep="\n")
diabetes_X.tofile('X.csv', sep="\n")
diabetes.target.tofile('y.csv', sep="\n")
{code}
And execute the following commands (you may have to edit the dml script to add a format or create a metadata file):
{code}
~/spark-2.1.0-bin-hadoop2.7/bin/spark-submit SystemML.jar -f LinearRegDS.dml -nvargs X=X.csv Y=y.csv B=B.csv fmt=csv icpt=1 tol=0.01 reg=1
~/spark-2.1.0-bin-hadoop2.7/bin/spark-submit SystemML.jar -f GLM-predict.dml -nvargs X=X_test.csv M=y_predicted.csv B=B.csv fmt=csv icpt=1 tol=0.01 reg=1
{code}
I also tested using SystemML 0.12.0 and got the same predictions:
{code}
$ ~/spark-1.6.1-bin-hadoop2.6/bin/spark-submit systemml-0.12.0-incubating.jar -f LinearRegDS.dml -nvargs X=X.csv
[jira] [Updated] (SYSTEMML-1255) New fused operator tack+* in CP and Spark
[ https://issues.apache.org/jira/browse/SYSTEMML-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Boehm updated SYSTEMML-1255: - Summary: New fused operator tack+* in CP and Spark (was: New fused operator tack+* in CP) > New fused operator tack+* in CP and Spark > - > > Key: SYSTEMML-1255 > URL: https://issues.apache.org/jira/browse/SYSTEMML-1255 > Project: SystemML > Issue Type: Sub-task > Components: Compiler >Reporter: Matthias Boehm > > Similar to the existing tak+* operator, this new tack+* operator fuses two or > three binary multiply operations and a final column-wise aggregation > colSums(X*Y*Z) in order to avoid materializing the intermediates, which is > very expensive compared to the cheap multiply and sum operations. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
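What the fusion buys can be sketched in NumPy: a naive evaluation of colSums(X*Y*Z) materializes two full intermediate matrices, while a fused contraction produces the column sums directly (an illustrative sketch of the idea, not the SystemML operator itself):

```python
import numpy as np

rng = np.random.default_rng(42)
X, Y, Z = (rng.random((1000, 8)) for _ in range(3))

# Naive: (X*Y) and then (X*Y)*Z are materialized before the column sum
naive = ((X * Y) * Z).sum(axis=0)

# Fused: einsum contracts the row dimension directly, avoiding the
# full-size intermediates -- the idea behind tack+*
fused = np.einsum('ij,ij,ij->j', X, Y, Z)
```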
[jira] [Commented] (SYSTEMML-1211) Verify dependencies for Spark 2
[ https://issues.apache.org/jira/browse/SYSTEMML-1211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15871175#comment-15871175 ] Deron Eriksson commented on SYSTEMML-1211: -- [PR394|https://github.com/apache/incubator-systemml/pull/394] addresses dependencies in the pom for Spark 2.1.0. > Verify dependencies for Spark 2 > --- > > Key: SYSTEMML-1211 > URL: https://issues.apache.org/jira/browse/SYSTEMML-1211 > Project: SystemML > Issue Type: Sub-task > Components: Build >Reporter: Deron Eriksson >Assignee: Deron Eriksson > > With the migration to Spark 2, we should verify that the artifact assemblies > are properly handling all dependencies. > Also, we should verify that the artifact licenses properly include all > dependencies following the Spark 2 migration. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (SYSTEMML-1257) Univar-Stats scripts failing due to Unexpected ValueType in ArithmeticInstruction
[ https://issues.apache.org/jira/browse/SYSTEMML-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15869639#comment-15869639 ] Matthias Boehm commented on SYSTEMML-1257: -- As it turned out, this was a bug in the special case where the matrix intermediate resulting from a relational comparison, i.e., (K>1)*maxs, was not bound to a target variable. For these cases, we created temporary targets during HOP construction, where the value type was mistakenly set to boolean. Later, this resulted in an incorrect data type, as certain operations are only supported over scalars. In comparison, ppred is only supported over matrices and hence worked correctly. > Univar-Stats scripts failing due to Unexpected ValueType in > ArithmeticInstruction > - > > Key: SYSTEMML-1257 > URL: https://issues.apache.org/jira/browse/SYSTEMML-1257 > Project: SystemML > Issue Type: Bug >Reporter: Arvind Surve >Assignee: Matthias Boehm > > Running Release verification process > (http://apache.github.io/incubator-systemml/release-process.html) where > Univar-Stats.dml failing to execute. > Trying to run following example on Single Node Spark environment. 
> $ tar -xvzf systemml-0.11.0-incubating.tgz > $ cd systemml-0.11.0-incubating > $ export SPARK_HOME=/Users/deroneriksson/spark-1.5.1-bin-hadoop2.6 > $ $SPARK_HOME/bin/spark-submit SystemML.jar -f > scripts/datagen/genRandData4Univariate.dml -exec hybrid_spark -args 100 > 100 10 1 2 3 4 uni.mtx > $ echo '1' > uni-types.csv > $ echo '{"rows": 1, "cols": 1, "format": "csv"}' > uni-types.csv.mtd > $ $SPARK_HOME/bin/spark-submit SystemML.jar -f > scripts/algorithms/Univar-Stats.dml -exec hybrid_spark -nvargs X=uni.mtx > TYPES=uni-types.csv STATS=uni-stats.txt CONSOLE_OUTPUT=TRUE > Exception get is following: > Exception in thread "main" org.apache.sysml.api.DMLException: > org.apache.sysml.lops.LopsException: ERROR: line 64, column 21 -- Problem > generating simple inst - > CP°*°_Var27·SCALAR·DOUBLE·false°_Var25·SCALAR·DOUBLE·false°_Var28·SCALAR·BOOLEAN > at org.apache.sysml.api.DMLScript.executeScript(DMLScript.java:374) > at org.apache.sysml.api.DMLScript.main(DMLScript.java:221) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731) > at > org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181) > at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: org.apache.sysml.lops.LopsException: ERROR: line 64, column 21 -- > Problem generating simple inst - > CP°*°_Var27·SCALAR·DOUBLE·false°_Var25·SCALAR·DOUBLE·false°_Var28·SCALAR·BOOLEAN > at > org.apache.sysml.lops.compile.Dag.generateControlProgramJobs(Dag.java:1572) > at 
org.apache.sysml.lops.compile.Dag.doGreedyGrouping(Dag.java:1212) > at org.apache.sysml.lops.compile.Dag.getJobs(Dag.java:267) > at > org.apache.sysml.parser.DMLProgram.createRuntimeProgramBlock(DMLProgram.java:531) > at > org.apache.sysml.parser.DMLProgram.getRuntimeProgram(DMLProgram.java:207) > at org.apache.sysml.api.DMLScript.execute(DMLScript.java:633) > at org.apache.sysml.api.DMLScript.executeScript(DMLScript.java:360) > ... 10 more > Caused by: org.apache.sysml.runtime.DMLRuntimeException: Unexpected ValueType > in ArithmeticInstruction. > at > org.apache.sysml.runtime.instructions.cp.ArithmeticBinaryCPInstruction.parseInstruction(ArithmeticBinaryCPInstruction.java:80) > at > org.apache.sysml.runtime.instructions.CPInstructionParser.parseSingleInstruction(CPInstructionParser.java:321) > at > org.apache.sysml.runtime.instructions.InstructionParser.parseSingleInstruction(InstructionParser.java:45) > at > org.apache.sysml.lops.compile.Dag.generateControlProgramJobs(Dag.java:1559) > Following line in Univar-stats dml causing that exception: > maxDomainSize = max( (K > 1) * maxs ); > Its Boolean x Double, causing problem. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
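The failing DML line, maxDomainSize = max( (K > 1) * maxs ), relies on the relational comparison producing a 0/1 matrix of value type double. A NumPy sketch of the intended semantics (the bug was that the intermediate kept a boolean value type instead of being promoted to double):

```python
import numpy as np

K = np.array([1.0, 3.0, 2.0, 1.0])      # per-column category counts
maxs = np.array([10.0, 7.0, 5.0, 9.0])  # per-column maxima

# (K > 1) should behave as a 0/1 double matrix, zeroing out
# the columns where the domain has at most one category
masked = (K > 1).astype(np.float64) * maxs
max_domain_size = masked.max()
```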
[jira] [Created] (SYSTEMML-1276) Resolve jersey class not found error with Spark2 and YARN
Glenn Weidner created SYSTEMML-1276: --- Summary: Resolve jersey class not found error with Spark2 and YARN Key: SYSTEMML-1276 URL: https://issues.apache.org/jira/browse/SYSTEMML-1276 Project: SystemML Issue Type: Improvement Components: Runtime Affects Versions: SystemML 0.13 Environment: Spark 2.x, Hadoop 2.7.3 Reporter: Glenn Weidner Assignee: Glenn Weidner This is a known issue as reported in [YARN-5271] and [SPARK-15343]. It was observed during 0.13 performance testing and can be reproduced with following example: spark-submit --master yarn --deploy-mode client --class org.apache.sysml.api.DMLScript ./systemml-0.13.0-incubating-SNAPSHOT.jar -f ./scripts/utils/sample.dml -exec hybrid_spark -nvargs X=linRegData.csv sv=perc.csv O=linRegDataParts ofmt=csv Exception in thread "main" java.lang.NoClassDefFoundError: com/sun/jersey/api/client/config/ClientConfig at org.apache.hadoop.yarn.client.api.TimelineClient.createTimelineClient(TimelineClient.java:55) at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.createTimelineClient(YarnClientImpl.java:182) at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:169) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.mapred.ResourceMgrDelegate.serviceInit(ResourceMgrDelegate.java:103) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.mapred.ResourceMgrDelegate.(ResourceMgrDelegate.java:97) at org.apache.hadoop.mapred.YARNRunner.(YARNRunner.java:122) at org.apache.hadoop.mapred.YarnClientProtocolProvider.create(YarnClientProtocolProvider.java:34) at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:95) at org.apache.hadoop.mapreduce.Cluster.(Cluster.java:82) at org.apache.hadoop.mapreduce.Cluster.(Cluster.java:75) at org.apache.hadoop.mapred.JobClient.init(JobClient.java:475) at org.apache.hadoop.mapred.JobClient.(JobClient.java:454) at 
org.apache.sysml.runtime.controlprogram.parfor.stat.InfrastructureAnalyzer.analyzeHadoopCluster(InfrastructureAnalyzer.java:472) at org.apache.sysml.runtime.controlprogram.parfor.stat.InfrastructureAnalyzer.getRemoteParallelMapTasks(InfrastructureAnalyzer.java:114) at org.apache.sysml.runtime.controlprogram.parfor.stat.InfrastructureAnalyzer.getCkMaxMR(InfrastructureAnalyzer.java:298) at org.apache.sysml.runtime.controlprogram.parfor.opt.OptimizationWrapper.optimize(OptimizationWrapper.java:168) at org.apache.sysml.runtime.controlprogram.ParForProgramBlock.execute(ParForProgramBlock.java:550) at org.apache.sysml.runtime.controlprogram.Program.execute(Program.java:145) at org.apache.sysml.api.DMLScript.execute(DMLScript.java:674) at org.apache.sysml.api.DMLScript.executeScript(DMLScript.java:354) at org.apache.sysml.api.DMLScript.main(DMLScript.java:199) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Caused by: java.lang.ClassNotFoundException: com.sun.jersey.api.client.config.ClientConfig at java.net.URLClassLoader.findClass(URLClassLoader.java:381) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ... 32 more -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (SYSTEMML-1276) Resolve jersey class not found error with Spark2 and YARN
[ https://issues.apache.org/jira/browse/SYSTEMML-1276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15870239#comment-15870239 ] Glenn Weidner commented on SYSTEMML-1276: - [Submitted PR 393 | https://github.com/apache/incubator-systemml/pull/393]. > Resolve jersey class not found error with Spark2 and YARN > - > > Key: SYSTEMML-1276 > URL: https://issues.apache.org/jira/browse/SYSTEMML-1276 > Project: SystemML > Issue Type: Improvement > Components: Runtime >Affects Versions: SystemML 0.13 > Environment: Spark 2.x, Hadoop 2.7.3 >Reporter: Glenn Weidner >Assignee: Glenn Weidner > > This is a known issue as reported in [YARN-5271] and [SPARK-15343]. It was > observed during 0.13 performance testing and can be reproduced with following > example: > spark-submit --master yarn --deploy-mode client --class > org.apache.sysml.api.DMLScript ./systemml-0.13.0-incubating-SNAPSHOT.jar -f > ./scripts/utils/sample.dml -exec hybrid_spark -nvargs X=linRegData.csv > sv=perc.csv O=linRegDataParts ofmt=csv > Exception in thread "main" java.lang.NoClassDefFoundError: > com/sun/jersey/api/client/config/ClientConfig > at > org.apache.hadoop.yarn.client.api.TimelineClient.createTimelineClient(TimelineClient.java:55) > at > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.createTimelineClient(YarnClientImpl.java:182) > at > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:169) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.mapred.ResourceMgrDelegate.serviceInit(ResourceMgrDelegate.java:103) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.mapred.ResourceMgrDelegate.(ResourceMgrDelegate.java:97) > at org.apache.hadoop.mapred.YARNRunner.(YARNRunner.java:122) > at > org.apache.hadoop.mapred.YarnClientProtocolProvider.create(YarnClientProtocolProvider.java:34) > at 
org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:95) > at org.apache.hadoop.mapreduce.Cluster.(Cluster.java:82) > at org.apache.hadoop.mapreduce.Cluster.(Cluster.java:75) > at org.apache.hadoop.mapred.JobClient.init(JobClient.java:475) > at org.apache.hadoop.mapred.JobClient.(JobClient.java:454) > at > org.apache.sysml.runtime.controlprogram.parfor.stat.InfrastructureAnalyzer.analyzeHadoopCluster(InfrastructureAnalyzer.java:472) > at > org.apache.sysml.runtime.controlprogram.parfor.stat.InfrastructureAnalyzer.getRemoteParallelMapTasks(InfrastructureAnalyzer.java:114) > at > org.apache.sysml.runtime.controlprogram.parfor.stat.InfrastructureAnalyzer.getCkMaxMR(InfrastructureAnalyzer.java:298) > at > org.apache.sysml.runtime.controlprogram.parfor.opt.OptimizationWrapper.optimize(OptimizationWrapper.java:168) > at > org.apache.sysml.runtime.controlprogram.ParForProgramBlock.execute(ParForProgramBlock.java:550) > at > org.apache.sysml.runtime.controlprogram.Program.execute(Program.java:145) > at org.apache.sysml.api.DMLScript.execute(DMLScript.java:674) > at org.apache.sysml.api.DMLScript.executeScript(DMLScript.java:354) > at org.apache.sysml.api.DMLScript.main(DMLScript.java:199) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738) > at > org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187) > at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: java.lang.ClassNotFoundException: > com.sun.jersey.api.client.config.ClientConfig > 
at java.net.URLClassLoader.findClass(URLClassLoader.java:381) > at java.lang.ClassLoader.loadClass(ClassLoader.java:424) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331) > at java.lang.ClassLoader.loadClass(ClassLoader.java:357) > ... 32 more -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (SYSTEMML-1246) Mismatched name in sparkDML.sh for main jar of -bin artifact
[ https://issues.apache.org/jira/browse/SYSTEMML-1246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Glenn Weidner reassigned SYSTEMML-1246: --- Assignee: Glenn Weidner > Mismatched name in sparkDML.sh for main jar of -bin artifact > > > Key: SYSTEMML-1246 > URL: https://issues.apache.org/jira/browse/SYSTEMML-1246 > Project: SystemML > Issue Type: Bug >Reporter: Glenn Weidner >Assignee: Glenn Weidner > > For distributed release artifacts systemml-[0.12.0 | > 0.11.0]-incubating-bin.[tgz | zip]: > scripts/sparkDML.sh references > {code} > ${SYSTEMML_HOME}/SystemML.jar > {code} > but lib folder of archive contains > systemml-[0.12.0 | 0.11.0]-incubating.jar. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (SYSTEMML-1250) Binary artifact missing antlr-runtime and wink-json4j classes
[ https://issues.apache.org/jira/browse/SYSTEMML-1250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15870254#comment-15870254 ] Glenn Weidner commented on SYSTEMML-1250: - A fix will be incorporated in 0.12.1 as mentioned in [dev mail thread | https://www.mail-archive.com/dev@systemml.incubator.apache.org/msg01399.html]. > Binary artifact missing antlr-runtime and wink-json4j classes > - > > Key: SYSTEMML-1250 > URL: https://issues.apache.org/jira/browse/SYSTEMML-1250 > Project: SystemML > Issue Type: Bug > Components: Build >Reporter: Glenn Weidner > > The -bin artifact (both 0.11 and 0.12) are missing org/antlr/v4/runtime and > org/apache/wink/json4j classes. Since the -bin has a lib folder, the > corresponding jars can be included there. For comparison, these classes are > included in systemml-0.12.0-incubating.jar at > https://repository.apache.org/content/repositories/releases/org/apache/systemml/systemml/0.12.0-incubating/, > and although there is a jar by that same name inside the -bin artifact, it > does not include the classes. Similar content observed for > systemml-0.11.0-incubating. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (SYSTEMML-1181) Update documentation with changes related to Spark 2.1.0
[ https://issues.apache.org/jira/browse/SYSTEMML-1181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15870446#comment-15870446 ] Felix Schüler commented on SYSTEMML-1181: - [~deron] you double checked the docs the other day, right? Can we resolve this issue? > Update documentation with changes related to Spark 2.1.0 > > > Key: SYSTEMML-1181 > URL: https://issues.apache.org/jira/browse/SYSTEMML-1181 > Project: SystemML > Issue Type: Documentation >Reporter: Arvind Surve >Assignee: Felix Schüler > > Update web page for any changes related to SystemML on Spark 2.1.0 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (SYSTEMML-1181) Update documentation with changes related to Spark 2.1.0
[ https://issues.apache.org/jira/browse/SYSTEMML-1181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15870495#comment-15870495 ] Deron Eriksson commented on SYSTEMML-1181: -- [~fschueler] No, I have not double-checked the docs for 2.1.0. Feel free to review them and make any needed updates. > Update documentation with changes related to Spark 2.1.0 > > > Key: SYSTEMML-1181 > URL: https://issues.apache.org/jira/browse/SYSTEMML-1181 > Project: SystemML > Issue Type: Documentation >Reporter: Arvind Surve >Assignee: Felix Schüler > > Update web page for any changes related to SystemML on Spark 2.1.0 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (SYSTEMML-1275) Remove workaround flags disable_sparse disable_caching
[ https://issues.apache.org/jira/browse/SYSTEMML-1275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Boehm resolved SYSTEMML-1275. -- Resolution: Fixed Assignee: Matthias Boehm Fix Version/s: SystemML 0.13 > Remove workaround flags disable_sparse disable_caching > -- > > Key: SYSTEMML-1275 > URL: https://issues.apache.org/jira/browse/SYSTEMML-1275 > Project: SystemML > Issue Type: Bug >Reporter: Matthias Boehm >Assignee: Matthias Boehm > Fix For: SystemML 0.13 > > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (SYSTEMML-1274) Unnecessary rdd computation for nnz maintenance on write
[ https://issues.apache.org/jira/browse/SYSTEMML-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Boehm resolved SYSTEMML-1274. -- Resolution: Done Assignee: Matthias Boehm Fix Version/s: SystemML 0.13 > Unnecessary rdd computation for nnz maintenance on write > > > Key: SYSTEMML-1274 > URL: https://issues.apache.org/jira/browse/SYSTEMML-1274 > Project: SystemML > Issue Type: Bug > Components: Runtime >Reporter: Matthias Boehm >Assignee: Matthias Boehm > Fix For: SystemML 0.13 > > > Our primitive for writing binary block RDDs to HDFS (as used in guarded > collect), first computes the number of non-zeros (nnz) and subsequently > writes out the data. This leads to redundant RDD computation, which can be > expensive for large DAGs of RDD operations. Explicitly computing the nnz is > unnecessary as we could simply piggyback this computation onto the write via > an accumulator as done in multiple other places in SystemML. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
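The proposed fix amounts to a single pass that counts non-zeros as a side effect of writing, rather than running a separate counting job first. A plain-Python sketch of the accumulator idea (hypothetical code for illustration, not SystemML's Spark accumulator implementation):

```python
def write_blocks(blocks, sink):
    """Write each block while accumulating the non-zero count,
    so no second pass over the data is needed."""
    nnz = 0
    for block in blocks:
        nnz += sum(1 for v in block if v != 0)  # accumulator update
        sink.append(block)                      # the actual "write"
    return nnz

sink = []
nnz = write_blocks([[0, 1, 2], [0, 0, 3]], sink)
```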
[jira] [Resolved] (SYSTEMML-1273) Performance spark right indexing w/o aggregation
[ https://issues.apache.org/jira/browse/SYSTEMML-1273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Boehm resolved SYSTEMML-1273. -- Resolution: Done Assignee: Matthias Boehm Fix Version/s: SystemML 0.13 > Performance spark right indexing w/o aggregation > > > Key: SYSTEMML-1273 > URL: https://issues.apache.org/jira/browse/SYSTEMML-1273 > Project: SystemML > Issue Type: Task > Components: Runtime >Reporter: Matthias Boehm >Assignee: Matthias Boehm > Fix For: SystemML 0.13 > > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (SYSTEMML-1257) Univar-Stats scripts failing due to Unexpected ValueType in ArithmeticInstruction
[ https://issues.apache.org/jira/browse/SYSTEMML-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Boehm resolved SYSTEMML-1257. -- Resolution: Fixed Fix Version/s: SystemML 0.13 > Univar-Stats scripts failing due to Unexpected ValueType in > ArithmeticInstruction > - > > Key: SYSTEMML-1257 > URL: https://issues.apache.org/jira/browse/SYSTEMML-1257 > Project: SystemML > Issue Type: Bug >Reporter: Arvind Surve >Assignee: Matthias Boehm > Fix For: SystemML 0.13 > > > Running Release verification process > (http://apache.github.io/incubator-systemml/release-process.html) where > Univar-Stats.dml failing to execute. > Trying to run following example on Single Node Spark environment. > $ tar -xvzf systemml-0.11.0-incubating.tgz > $ cd systemml-0.11.0-incubating > $ export SPARK_HOME=/Users/deroneriksson/spark-1.5.1-bin-hadoop2.6 > $ $SPARK_HOME/bin/spark-submit SystemML.jar -f > scripts/datagen/genRandData4Univariate.dml -exec hybrid_spark -args 100 > 100 10 1 2 3 4 uni.mtx > $ echo '1' > uni-types.csv > $ echo '{"rows": 1, "cols": 1, "format": "csv"}' > uni-types.csv.mtd > $ $SPARK_HOME/bin/spark-submit SystemML.jar -f > scripts/algorithms/Univar-Stats.dml -exec hybrid_spark -nvargs X=uni.mtx > TYPES=uni-types.csv STATS=uni-stats.txt CONSOLE_OUTPUT=TRUE > Exception get is following: > Exception in thread "main" org.apache.sysml.api.DMLException: > org.apache.sysml.lops.LopsException: ERROR: line 64, column 21 -- Problem > generating simple inst - > CP°*°_Var27·SCALAR·DOUBLE·false°_Var25·SCALAR·DOUBLE·false°_Var28·SCALAR·BOOLEAN > at org.apache.sysml.api.DMLScript.executeScript(DMLScript.java:374) > at org.apache.sysml.api.DMLScript.main(DMLScript.java:221) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at 
java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731) > at > org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181) > at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: org.apache.sysml.lops.LopsException: ERROR: line 64, column 21 -- > Problem generating simple inst - > CP°*°_Var27·SCALAR·DOUBLE·false°_Var25·SCALAR·DOUBLE·false°_Var28·SCALAR·BOOLEAN > at > org.apache.sysml.lops.compile.Dag.generateControlProgramJobs(Dag.java:1572) > at org.apache.sysml.lops.compile.Dag.doGreedyGrouping(Dag.java:1212) > at org.apache.sysml.lops.compile.Dag.getJobs(Dag.java:267) > at > org.apache.sysml.parser.DMLProgram.createRuntimeProgramBlock(DMLProgram.java:531) > at > org.apache.sysml.parser.DMLProgram.getRuntimeProgram(DMLProgram.java:207) > at org.apache.sysml.api.DMLScript.execute(DMLScript.java:633) > at org.apache.sysml.api.DMLScript.executeScript(DMLScript.java:360) > ... 10 more > Caused by: org.apache.sysml.runtime.DMLRuntimeException: Unexpected ValueType > in ArithmeticInstruction. > at > org.apache.sysml.runtime.instructions.cp.ArithmeticBinaryCPInstruction.parseInstruction(ArithmeticBinaryCPInstruction.java:80) > at > org.apache.sysml.runtime.instructions.CPInstructionParser.parseSingleInstruction(CPInstructionParser.java:321) > at > org.apache.sysml.runtime.instructions.InstructionParser.parseSingleInstruction(InstructionParser.java:45) > at > org.apache.sysml.lops.compile.Dag.generateControlProgramJobs(Dag.java:1559) > Following line in Univar-stats dml causing that exception: > maxDomainSize = max( (K > 1) * maxs ); > Its Boolean x Double, causing problem. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (SYSTEMML-1277) DataFrames With `mllib.Vector` Columns Are No Longer Converted to Matrices.
[ https://issues.apache.org/jira/browse/SYSTEMML-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15870565#comment-15870565 ] Mike Dusenberry commented on SYSTEMML-1277: --- Update: Here's the official word on DataFrame conversions from the old {{mllib.Vector}} to {{ml.Vector}}: https://spark.apache.org/docs/2.0.0/ml-guide.html#breaking-changes. > DataFrames With `mllib.Vector` Columns Are No Longer Converted to Matrices. > --- > > Key: SYSTEMML-1277 > URL: https://issues.apache.org/jira/browse/SYSTEMML-1277 > Project: SystemML > Issue Type: Bug >Affects Versions: SystemML 0.13 >Reporter: Mike Dusenberry >Priority: Blocker > > Recently, we made the switch from the old {{mllib.Vector}} to the new > {{ml.Vector}} type. Unfortunately, this leaves us with the issue of no > longer recognizing DataFrames with {{mllib.Vector}} columns during > conversion, and thus, we (1) do not correctly convert to SystemML {{Matrix}} > objects, (2) instead fall back on conversion to {{Frame}} objects, and then > (3) fail completely when the ensuing DML script is expecting to operated on > matrices. > Given a Spark {{DataFrame}} {{X_df}} of type {{DataFrame\[__INDEX: int, > sample: vector\]}}, where {{vector}} is of type {{mllib.Vector}}, the > following script will now fail (did not previously): > {code} > script = """ > # Scale images to [-1,1] > X = X / 255 > X = X * 2 - 1 > """ > outputs = ("X") > script = dml(script).input(X=X_df).output(*outputs) > X = ml.execute(script).get(*outputs) > X > {code} > {code} > Caused by: org.apache.sysml.api.mlcontext.MLContextException: Exception > occurred while validating script > at > org.apache.sysml.api.mlcontext.ScriptExecutor.validateScript(ScriptExecutor.java:487) > at > org.apache.sysml.api.mlcontext.ScriptExecutor.execute(ScriptExecutor.java:280) > at org.apache.sysml.api.mlcontext.MLContext.execute(MLContext.java:293) > ... 
12 more > Caused by: org.apache.sysml.parser.LanguageException: Invalid Parameters : > ERROR: null -- line 4, column 4 -- Invalid Datatypes for operation FRAME > SCALAR > at > org.apache.sysml.parser.Expression.raiseValidateError(Expression.java:549) > at > org.apache.sysml.parser.Expression.computeDataType(Expression.java:415) > at > org.apache.sysml.parser.Expression.computeDataType(Expression.java:386) > at > org.apache.sysml.parser.BinaryExpression.validateExpression(BinaryExpression.java:130) > at > org.apache.sysml.parser.StatementBlock.validate(StatementBlock.java:567) > at > org.apache.sysml.parser.DMLTranslator.validateParseTree(DMLTranslator.java:140) > at > org.apache.sysml.api.mlcontext.ScriptExecutor.validateScript(ScriptExecutor.java:485) > ... 14 more > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (SYSTEMML-1277) DataFrames With `mllib.Vector` Columns Are No Longer Converted to Matrices.
[ https://issues.apache.org/jira/browse/SYSTEMML-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15870577#comment-15870577 ] Mike Dusenberry commented on SYSTEMML-1277: --- Adding the following fixes the issue, so we should just add the similar wrappers at the Java MLContext layer. {code} # Convert DataFrame columns of type `mllib.Vector` to type `ml.Vector` X_df = MLUtils.convertVectorColumnsToML(X_df) {code} > DataFrames With `mllib.Vector` Columns Are No Longer Converted to Matrices. > --- > > Key: SYSTEMML-1277 > URL: https://issues.apache.org/jira/browse/SYSTEMML-1277 > Project: SystemML > Issue Type: Bug >Affects Versions: SystemML 0.13 >Reporter: Mike Dusenberry >Priority: Blocker > > Recently, we made the switch from the old {{mllib.Vector}} to the new > {{ml.Vector}} type. Unfortunately, this leaves us with the issue of no > longer recognizing DataFrames with {{mllib.Vector}} columns during > conversion, and thus, we (1) do not correctly convert to SystemML {{Matrix}} > objects, (2) instead fall back on conversion to {{Frame}} objects, and then > (3) fail completely when the ensuing DML script is expecting to operated on > matrices. > Given a Spark {{DataFrame}} {{X_df}} of type {{DataFrame\[__INDEX: int, > sample: vector\]}}, where {{vector}} is of type {{mllib.Vector}}, the > following script will now fail (did not previously): > {code} > script = """ > # Scale images to [-1,1] > X = X / 255 > X = X * 2 - 1 > """ > outputs = ("X") > script = dml(script).input(X=X_df).output(*outputs) > X = ml.execute(script).get(*outputs) > X > {code} > {code} > Caused by: org.apache.sysml.api.mlcontext.MLContextException: Exception > occurred while validating script > at > org.apache.sysml.api.mlcontext.ScriptExecutor.validateScript(ScriptExecutor.java:487) > at > org.apache.sysml.api.mlcontext.ScriptExecutor.execute(ScriptExecutor.java:280) > at org.apache.sysml.api.mlcontext.MLContext.execute(MLContext.java:293) > ... 
12 more > Caused by: org.apache.sysml.parser.LanguageException: Invalid Parameters : > ERROR: null -- line 4, column 4 -- Invalid Datatypes for operation FRAME > SCALAR > at > org.apache.sysml.parser.Expression.raiseValidateError(Expression.java:549) > at > org.apache.sysml.parser.Expression.computeDataType(Expression.java:415) > at > org.apache.sysml.parser.Expression.computeDataType(Expression.java:386) > at > org.apache.sysml.parser.BinaryExpression.validateExpression(BinaryExpression.java:130) > at > org.apache.sysml.parser.StatementBlock.validate(StatementBlock.java:567) > at > org.apache.sysml.parser.DMLTranslator.validateParseTree(DMLTranslator.java:140) > at > org.apache.sysml.api.mlcontext.ScriptExecutor.validateScript(ScriptExecutor.java:485) > ... 14 more > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (SYSTEMML-1277) DataFrames With `mllib.Vector` Columns Are No Longer Converted to Matrices.
[ https://issues.apache.org/jira/browse/SYSTEMML-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Dusenberry updated SYSTEMML-1277:
--------------------------------------
    Priority: Blocker  (was: Major)

> DataFrames With `mllib.Vector` Columns Are No Longer Converted to Matrices.
> ---
>
> Key: SYSTEMML-1277
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1277
> Project: SystemML
> Issue Type: Bug
> Affects Versions: SystemML 0.13
> Reporter: Mike Dusenberry
> Priority: Blocker
>
> Recently, we made the switch from the old {{mllib.Vector}} to the new {{ml.Vector}} type. Unfortunately, this leaves us with the issue of no longer recognizing DataFrames with {{mllib.Vector}} columns during conversion, and thus, we (1) do not correctly convert to SystemML {{Matrix}} objects, (2) instead fall back on conversion to {{Frame}} objects, and then (3) fail completely when the ensuing DML script is expecting to operate on matrices.
> Given a Spark {{DataFrame}} {{X_df}} of type {{DataFrame\[__INDEX: int, sample: vector\]}}, where {{vector}} is of type {{mllib.Vector}}, the following script will now fail (it did not previously):
> {code}
> script = """
> # Scale images to [-1,1]
> X = X / 255
> X = X * 2 - 1
> """
> outputs = ("X")
> script = dml(script).input(X=X_df).output(*outputs)
> X = ml.execute(script).get(*outputs)
> X
> {code}
> {code}
> Caused by: org.apache.sysml.api.mlcontext.MLContextException: Exception occurred while validating script
>   at org.apache.sysml.api.mlcontext.ScriptExecutor.validateScript(ScriptExecutor.java:487)
>   at org.apache.sysml.api.mlcontext.ScriptExecutor.execute(ScriptExecutor.java:280)
>   at org.apache.sysml.api.mlcontext.MLContext.execute(MLContext.java:293)
>   ... 12 more
> Caused by: org.apache.sysml.parser.LanguageException: Invalid Parameters : ERROR: null -- line 4, column 4 -- Invalid Datatypes for operation FRAME SCALAR
>   at org.apache.sysml.parser.Expression.raiseValidateError(Expression.java:549)
>   at org.apache.sysml.parser.Expression.computeDataType(Expression.java:415)
>   at org.apache.sysml.parser.Expression.computeDataType(Expression.java:386)
>   at org.apache.sysml.parser.BinaryExpression.validateExpression(BinaryExpression.java:130)
>   at org.apache.sysml.parser.StatementBlock.validate(StatementBlock.java:567)
>   at org.apache.sysml.parser.DMLTranslator.validateParseTree(DMLTranslator.java:140)
>   at org.apache.sysml.api.mlcontext.ScriptExecutor.validateScript(ScriptExecutor.java:485)
>   ... 14 more
> {code}
[jira] [Created] (SYSTEMML-1277) DataFrames With `mllib.Vector` Columns Are No Longer Converted to Matrices.
Mike Dusenberry created SYSTEMML-1277:
--------------------------------------
Summary: DataFrames With `mllib.Vector` Columns Are No Longer Converted to Matrices.
Key: SYSTEMML-1277
URL: https://issues.apache.org/jira/browse/SYSTEMML-1277
Project: SystemML
Issue Type: Bug
Reporter: Mike Dusenberry

Recently, we made the switch from the old {{mllib.Vector}} to the new {{ml.Vector}} type. Unfortunately, this leaves us with the issue of no longer recognizing DataFrames with {{mllib.Vector}} columns during conversion, and thus, we (1) do not correctly convert to SystemML {{Matrix}} objects, (2) instead fall back on conversion to {{Frame}} objects, and then (3) fail completely when the ensuing DML script is expecting to operate on matrices.

Given a Spark {{DataFrame}} {{X_df}} of type {{DataFrame\[__INDEX: int, sample: vector\]}}, where {{vector}} is of type {{mllib.Vector}}, the following script will now fail (it did not previously):

{code}
script = """
# Scale images to [-1,1]
X = X / 255
X = X * 2 - 1
"""
outputs = ("X")
script = dml(script).input(X=X_df).output(*outputs)
X = ml.execute(script).get(*outputs)
X
{code}

{code}
Caused by: org.apache.sysml.api.mlcontext.MLContextException: Exception occurred while validating script
  at org.apache.sysml.api.mlcontext.ScriptExecutor.validateScript(ScriptExecutor.java:487)
  at org.apache.sysml.api.mlcontext.ScriptExecutor.execute(ScriptExecutor.java:280)
  at org.apache.sysml.api.mlcontext.MLContext.execute(MLContext.java:293)
  ... 12 more
Caused by: org.apache.sysml.parser.LanguageException: Invalid Parameters : ERROR: null -- line 4, column 4 -- Invalid Datatypes for operation FRAME SCALAR
  at org.apache.sysml.parser.Expression.raiseValidateError(Expression.java:549)
  at org.apache.sysml.parser.Expression.computeDataType(Expression.java:415)
  at org.apache.sysml.parser.Expression.computeDataType(Expression.java:386)
  at org.apache.sysml.parser.BinaryExpression.validateExpression(BinaryExpression.java:130)
  at org.apache.sysml.parser.StatementBlock.validate(StatementBlock.java:567)
  at org.apache.sysml.parser.DMLTranslator.validateParseTree(DMLTranslator.java:140)
  at org.apache.sysml.api.mlcontext.ScriptExecutor.validateScript(ScriptExecutor.java:485)
  ... 14 more
{code}
[jira] [Updated] (SYSTEMML-1277) DataFrames With `mllib.Vector` Columns Are No Longer Converted to Matrices.
[ https://issues.apache.org/jira/browse/SYSTEMML-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Dusenberry updated SYSTEMML-1277:
--------------------------------------
    Affects Version/s: SystemML 0.13
[jira] [Commented] (SYSTEMML-1277) DataFrames With `mllib.Vector` Columns Are No Longer Converted to Matrices.
[ https://issues.apache.org/jira/browse/SYSTEMML-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15870536#comment-15870536 ]

Mike Dusenberry commented on SYSTEMML-1277:
-------------------------------------------

Also, just to follow up, the {{ml.Vector}} type should remain the standard default, as Spark is moving away from {{mllib.Vector}}. However, since DataFrames created and saved with {{mllib.Vector}} types can still be used (often without the user realizing that a saved DataFrame maintains a distinct separation between the two types), it is plausible that a user will try to run the same SystemML code with the same DataFrame as before, and thus run into issues now. We could simply catch any {{mllib.Vector}} types and convert them to {{ml.Vector}} with {{mllib.Vector.asML}}, which does not make any copy of the data: https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.linalg.Vector
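The catch-and-convert approach proposed in the comment above can be sketched as follows. This is an illustrative Python sketch, not SystemML's actual (Java/Scala) converter code; `LegacyVector` and `NewVector` are hypothetical stand-ins for {{mllib.Vector}} and {{ml.Vector}}, whose real `asML()` likewise wraps the same underlying data without copying it.

```python
class NewVector:                      # hypothetical stand-in for ml.Vector
    def __init__(self, values):
        self.values = values

class LegacyVector:                   # hypothetical stand-in for mllib.Vector
    def __init__(self, values):
        self.values = values

    def as_ml(self):
        # analogous to mllib.Vector.asML: reuse the same buffer, no copy
        return NewVector(self.values)

def to_matrix_row(cell):
    """Convert one DataFrame cell to a row of doubles for a Matrix."""
    if isinstance(cell, LegacyVector):
        # catch legacy vectors up front instead of silently falling
        # back to a Frame conversion
        cell = cell.as_ml()
    if isinstance(cell, NewVector):
        return list(cell.values)
    raise TypeError("unsupported column type: %r" % type(cell))
```

With this dispatch, `to_matrix_row(LegacyVector([1.0, 2.0]))` and `to_matrix_row(NewVector([1.0, 2.0]))` yield the same matrix row, which is the behavior the comment argues for.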
[jira] [Commented] (SYSTEMML-1257) Univar-Stats scripts failing due to Unexpected ValueType in ArithmeticInstruction
[ https://issues.apache.org/jira/browse/SYSTEMML-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15870691#comment-15870691 ]

Deron Eriksson commented on SYSTEMML-1257:
------------------------------------------

Thanks for the quick fix Matthias!

> Univar-Stats scripts failing due to Unexpected ValueType in ArithmeticInstruction
> ---
>
> Key: SYSTEMML-1257
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1257
> Project: SystemML
> Issue Type: Bug
> Reporter: Arvind Surve
> Assignee: Matthias Boehm
> Fix For: SystemML 0.13
>
> Running the release verification process (http://apache.github.io/incubator-systemml/release-process.html), Univar-Stats.dml fails to execute.
> Trying to run the following example in a single-node Spark environment:
> {code}
> $ tar -xvzf systemml-0.11.0-incubating.tgz
> $ cd systemml-0.11.0-incubating
> $ export SPARK_HOME=/Users/deroneriksson/spark-1.5.1-bin-hadoop2.6
> $ $SPARK_HOME/bin/spark-submit SystemML.jar -f scripts/datagen/genRandData4Univariate.dml -exec hybrid_spark -args 100 100 10 1 2 3 4 uni.mtx
> $ echo '1' > uni-types.csv
> $ echo '{"rows": 1, "cols": 1, "format": "csv"}' > uni-types.csv.mtd
> $ $SPARK_HOME/bin/spark-submit SystemML.jar -f scripts/algorithms/Univar-Stats.dml -exec hybrid_spark -nvargs X=uni.mtx TYPES=uni-types.csv STATS=uni-stats.txt CONSOLE_OUTPUT=TRUE
> {code}
> The exception we get is the following:
> {code}
> Exception in thread "main" org.apache.sysml.api.DMLException: org.apache.sysml.lops.LopsException: ERROR: line 64, column 21 -- Problem generating simple inst - CP°*°_Var27·SCALAR·DOUBLE·false°_Var25·SCALAR·DOUBLE·false°_Var28·SCALAR·BOOLEAN
>   at org.apache.sysml.api.DMLScript.executeScript(DMLScript.java:374)
>   at org.apache.sysml.api.DMLScript.main(DMLScript.java:221)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
>   at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
>   at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: org.apache.sysml.lops.LopsException: ERROR: line 64, column 21 -- Problem generating simple inst - CP°*°_Var27·SCALAR·DOUBLE·false°_Var25·SCALAR·DOUBLE·false°_Var28·SCALAR·BOOLEAN
>   at org.apache.sysml.lops.compile.Dag.generateControlProgramJobs(Dag.java:1572)
>   at org.apache.sysml.lops.compile.Dag.doGreedyGrouping(Dag.java:1212)
>   at org.apache.sysml.lops.compile.Dag.getJobs(Dag.java:267)
>   at org.apache.sysml.parser.DMLProgram.createRuntimeProgramBlock(DMLProgram.java:531)
>   at org.apache.sysml.parser.DMLProgram.getRuntimeProgram(DMLProgram.java:207)
>   at org.apache.sysml.api.DMLScript.execute(DMLScript.java:633)
>   at org.apache.sysml.api.DMLScript.executeScript(DMLScript.java:360)
>   ... 10 more
> Caused by: org.apache.sysml.runtime.DMLRuntimeException: Unexpected ValueType in ArithmeticInstruction.
>   at org.apache.sysml.runtime.instructions.cp.ArithmeticBinaryCPInstruction.parseInstruction(ArithmeticBinaryCPInstruction.java:80)
>   at org.apache.sysml.runtime.instructions.CPInstructionParser.parseSingleInstruction(CPInstructionParser.java:321)
>   at org.apache.sysml.runtime.instructions.InstructionParser.parseSingleInstruction(InstructionParser.java:45)
>   at org.apache.sysml.lops.compile.Dag.generateControlProgramJobs(Dag.java:1559)
> {code}
> The following line in Univar-Stats.dml causes that exception:
> {code}
> maxDomainSize = max( (K > 1) * maxs );
> {code}
> It is a Boolean x Double operation, which causes the problem.
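The failing expression is easy to see in a language that implicitly promotes booleans. Here is an element-wise Python equivalent of the DML line above; the values of `K` and `maxs` are made up for illustration.

```python
# Element-wise equivalent of: maxDomainSize = max( (K > 1) * maxs );
# In Python the boolean is promoted to 0/1 before the multiply, so the
# comparison acts as a mask; SystemML 0.11 rejected the Boolean x Double
# multiply in ArithmeticInstruction, hence the "Unexpected ValueType" error.
K = [3.0, 1.0, 2.0]        # illustrative values, not from the report
maxs = [10.0, 99.0, 5.0]   # illustrative values, not from the report

masked = [(k > 1) * m for k, m in zip(K, maxs)]  # bool * float -> float
max_domain_size = max(masked)
print(max_domain_size)  # 10.0 (the 99.0 entry is masked out, since its K == 1)
```

The mask pattern only works once the boolean operand is treated as a 0/1 numeric, which is exactly the promotion the instruction parser was missing.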
[jira] [Commented] (SYSTEMML-1238) Python test failing for LinearRegCG
[ https://issues.apache.org/jira/browse/SYSTEMML-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15871170#comment-15871170 ] Niketan Pansare commented on SYSTEMML-1238: --- I am able to reproduce this bug with command-line as well. Here is the output of GLM-predict (after running LinRegDS): {code} $ cat y_predicted.csv 189.09660701586185 133.3260601238074 157.3739106185465 132.8144037303023 135.88434209133283 154.81562865102103 194.2131709509127 136.3959984848379 125.13955782772601 137.41931127184807 178.35182275225503 123.60458864721075 152.7690030770007 141.0009060263837 116.95305553164462 161.46716176658717 144.58250078091928 144.58250078091928 170.67697684967874 117.4647119251497 {code} Here is the output of Python mllearn: {code} >>> import numpy as np >>> from pyspark.context import SparkContext >>> from pyspark.ml import Pipeline >>> from pyspark.ml.feature import HashingTF, Tokenizer from pyspark.sql import SparkSession from sklearn import datasets, metrics, neighbors >>> from pyspark.sql import SparkSession >>> from sklearn import datasets, metrics, neighbors from sklearn.datasets import fetch_20newsgroups from sklearn.feature_extraction.text import TfidfVectorizer from systemml.mllearn import LinearRegression, LogisticRegression, NaiveBayes, SVM diabetes = datasets.load_diabetes() diabetes_X = diabetes.data[:, np.newaxis, 2] diabetes_X_train = diabetes_X[:-20] diabetes_X_test = diabetes_X[-20:] diabetes_y_train = diabetes.target[:-20] diabetes_y_test = diabetes.target[-20:] sparkSession = SparkSession.builder.getOrCreate() regr = LinearRegression(sparkSession, solver="direct-solve") regr.fit(diabetes_X_train, diabetes_y_train)>>> from sklearn.datasets import fetch_20newsgroups >>> from sklearn.feature_extraction.text import TfidfVectorizer >>> >>> from systemml.mllearn import LinearRegression, LogisticRegression, >>> NaiveBayes, SVM >>> diabetes = datasets.load_diabetes() >>> diabetes_X = diabetes.data[:, np.newaxis, 2] >>> 
diabetes_X_train = diabetes_X[:-20] >>> diabetes_X_test = diabetes_X[-20:] >>> diabetes_y_train = diabetes.target[:-20] >>> diabetes_y_test = diabetes.target[-20:] >>> sparkSession = SparkSession.builder.getOrCreate() >>> regr = LinearRegression(sparkSession, solver="direct-solve") >>> regr.fit(diabetes_X_train, diabetes_y_train) Welcome to Apache SystemML! 17/02/16 22:39:21 WARN RewriteRemovePersistentReadWrite: Non-registered persistent write of variable 'X' (line 87). 17/02/16 22:39:21 WARN RewriteRemovePersistentReadWrite: Non-registered persistent write of variable 'y' (line 88). BEGIN LINEAR REGRESSION SCRIPT Reading X and Y... Calling the Direct Solver... Computing the statistics... AVG_TOT_Y,153.36255924170615 STDEV_TOT_Y,77.21853383600028 AVG_RES_Y,4.8020565933360324E-14 STDEV_RES_Y,67.06389890324985 DISPERSION,4497.566536105316 PLAIN_R2,0.24750834362605834 ADJUSTED_R2,0.24571669682516795 PLAIN_R2_NOBIAS,0.24750834362605834 ADJUSTED_R2_NOBIAS,0.24571669682516795 Writing the output matrix... 
END LINEAR REGRESSION SCRIPT lr >>> regr.predict(diabetes_X_test) 17/02/16 22:39:35 WARN Expression: WARNING: null -- line 149, column 4 -- Read input file does not exist on FS (local mode): 17/02/16 22:39:35 WARN Expression: Metadata file: .mtd not provided array([[ 188.84521284], [ 134.98127765], [ 158.20701117], [ 134.4871131 ], [ 137.45210036], [ 155.73618846], [ 193.78685827], [ 137.94626491], [ 127.07464496], [ 138.93459399], [ 178.46775744], [ 125.59215133], [ 153.75953028], [ 142.39374579], [ 119.16801227], [ 162.16032752], [ 145.8528976 ], [ 145.8528976 ], [ 171.05528929], [ 119.66217681]]) {code} To reproduce the command-line output, please dump the test data into csv: {code} import numpy as np from sklearn import datasets diabetes = datasets.load_diabetes() diabetes_X = diabetes.data[:, np.newaxis, 2] diabetes_X_train = diabetes_X[:-20] diabetes_X_test = diabetes_X[-20:] diabetes_y_train = diabetes.target[:-20] diabetes_y_test = diabetes.target[-20:] diabetes_X_test.tofile('X_test.csv', sep="\n") diabetes_X.tofile('X.csv', sep="\n") diabetes.target.tofile('y.csv', sep="\n") {code} And execute following commands (you may have to edit dml script to add format or create metadata file): {code} ~/spark-2.1.0-bin-hadoop2.7/bin/spark-submit SystemML.jar -f LinearRegDS.dml -nvargs X=X.csv Y=y.csv B=B.csv fmt=csv icpt=1 tol=0.01 reg=1 ~/spark-2.1.0-bin-hadoop2.7/bin/spark-submit SystemML.jar -f GLM-predict.dml -nvargs X=X_test.csv M=y_predicted.csv B=B.csv fmt=csv icpt=1 tol=0.01 reg=1 {code} > Python test failing for LinearRegCG > --- > > Key: SYSTEMML-1238 > URL: https://issues.apache.org/jira/browse/SYSTEMML-1238 > Project: SystemML > Issue Type: Bug >
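To quantify how far apart the two runs above are, one can compare the prediction listings directly. The values below are copied from the command-line GLM-predict output and the Python mllearn output above (only the first five pairs, for brevity).

```python
# First five predictions from each run above: command-line GLM-predict vs.
# Python mllearn for the same diabetes test rows.
cmdline = [189.09660701586185, 133.3260601238074, 157.3739106185465,
           132.8144037303023, 135.88434209133283]
mllearn = [188.84521284, 134.98127765, 158.20701117,
           134.4871131, 137.45210036]

# Largest relative deviation across the compared pairs.
max_rel_diff = max(abs(a - b) / abs(a) for a, b in zip(cmdline, mllearn))
# Roughly 1%: well beyond floating-point noise, consistent with the two
# code paths genuinely diverging somewhere.
```

This does not say which run is correct, only that the discrepancy is systematic rather than rounding error.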
[jira] [Comment Edited] (SYSTEMML-1238) Python test failing for LinearRegCG
[ https://issues.apache.org/jira/browse/SYSTEMML-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15871170#comment-15871170 ] Niketan Pansare edited comment on SYSTEMML-1238 at 2/17/17 5:36 AM: I am able to reproduce this bug (not sure if it is) with command-line as well. Here is the output of GLM-predict (after running LinRegDS): {code} $ cat y_predicted.csv 189.09660701586185 133.3260601238074 157.3739106185465 132.8144037303023 135.88434209133283 154.81562865102103 194.2131709509127 136.3959984848379 125.13955782772601 137.41931127184807 178.35182275225503 123.60458864721075 152.7690030770007 141.0009060263837 116.95305553164462 161.46716176658717 144.58250078091928 144.58250078091928 170.67697684967874 117.4647119251497 {code} Here is the output of Python mllearn: {code} >>> import numpy as np >>> from pyspark.context import SparkContext >>> from pyspark.ml import Pipeline >>> from pyspark.ml.feature import HashingTF, Tokenizer from pyspark.sql import SparkSession from sklearn import datasets, metrics, neighbors >>> from pyspark.sql import SparkSession >>> from sklearn import datasets, metrics, neighbors from sklearn.datasets import fetch_20newsgroups from sklearn.feature_extraction.text import TfidfVectorizer from systemml.mllearn import LinearRegression, LogisticRegression, NaiveBayes, SVM diabetes = datasets.load_diabetes() diabetes_X = diabetes.data[:, np.newaxis, 2] diabetes_X_train = diabetes_X[:-20] diabetes_X_test = diabetes_X[-20:] diabetes_y_train = diabetes.target[:-20] diabetes_y_test = diabetes.target[-20:] sparkSession = SparkSession.builder.getOrCreate() regr = LinearRegression(sparkSession, solver="direct-solve") regr.fit(diabetes_X_train, diabetes_y_train)>>> from sklearn.datasets import fetch_20newsgroups >>> from sklearn.feature_extraction.text import TfidfVectorizer >>> >>> from systemml.mllearn import LinearRegression, LogisticRegression, >>> NaiveBayes, SVM >>> diabetes = datasets.load_diabetes() >>> diabetes_X = 
diabetes.data[:, np.newaxis, 2] >>> diabetes_X_train = diabetes_X[:-20] >>> diabetes_X_test = diabetes_X[-20:] >>> diabetes_y_train = diabetes.target[:-20] >>> diabetes_y_test = diabetes.target[-20:] >>> sparkSession = SparkSession.builder.getOrCreate() >>> regr = LinearRegression(sparkSession, solver="direct-solve") >>> regr.fit(diabetes_X_train, diabetes_y_train) Welcome to Apache SystemML! 17/02/16 22:39:21 WARN RewriteRemovePersistentReadWrite: Non-registered persistent write of variable 'X' (line 87). 17/02/16 22:39:21 WARN RewriteRemovePersistentReadWrite: Non-registered persistent write of variable 'y' (line 88). BEGIN LINEAR REGRESSION SCRIPT Reading X and Y... Calling the Direct Solver... Computing the statistics... AVG_TOT_Y,153.36255924170615 STDEV_TOT_Y,77.21853383600028 AVG_RES_Y,4.8020565933360324E-14 STDEV_RES_Y,67.06389890324985 DISPERSION,4497.566536105316 PLAIN_R2,0.24750834362605834 ADJUSTED_R2,0.24571669682516795 PLAIN_R2_NOBIAS,0.24750834362605834 ADJUSTED_R2_NOBIAS,0.24571669682516795 Writing the output matrix... 
END LINEAR REGRESSION SCRIPT lr >>> regr.predict(diabetes_X_test) 17/02/16 22:39:35 WARN Expression: WARNING: null -- line 149, column 4 -- Read input file does not exist on FS (local mode): 17/02/16 22:39:35 WARN Expression: Metadata file: .mtd not provided array([[ 188.84521284], [ 134.98127765], [ 158.20701117], [ 134.4871131 ], [ 137.45210036], [ 155.73618846], [ 193.78685827], [ 137.94626491], [ 127.07464496], [ 138.93459399], [ 178.46775744], [ 125.59215133], [ 153.75953028], [ 142.39374579], [ 119.16801227], [ 162.16032752], [ 145.8528976 ], [ 145.8528976 ], [ 171.05528929], [ 119.66217681]]) {code} To reproduce the command-line output, please dump the test data into csv: {code} import numpy as np from sklearn import datasets diabetes = datasets.load_diabetes() diabetes_X = diabetes.data[:, np.newaxis, 2] diabetes_X_train = diabetes_X[:-20] diabetes_X_test = diabetes_X[-20:] diabetes_y_train = diabetes.target[:-20] diabetes_y_test = diabetes.target[-20:] diabetes_X_test.tofile('X_test.csv', sep="\n") diabetes_X.tofile('X.csv', sep="\n") diabetes.target.tofile('y.csv', sep="\n") {code} And execute following commands (you may have to edit dml script to add format or create metadata file): {code} ~/spark-2.1.0-bin-hadoop2.7/bin/spark-submit SystemML.jar -f LinearRegDS.dml -nvargs X=X.csv Y=y.csv B=B.csv fmt=csv icpt=1 tol=0.01 reg=1 ~/spark-2.1.0-bin-hadoop2.7/bin/spark-submit SystemML.jar -f GLM-predict.dml -nvargs X=X_test.csv M=y_predicted.csv B=B.csv fmt=csv icpt=1 tol=0.01 reg=1 {code} I also tested using SystemML 0.12.0 and got the same predictions: {code} $ ~/spark-1.6.1-bin-hadoop2.6/bin/spark-submit systemml-0.12.0-incubating.jar -f LinearRegDS.dml -nvargs X=X.csv
[jira] [Updated] (SYSTEMML-1280) Restore and deprecate SQLContext methods
[ https://issues.apache.org/jira/browse/SYSTEMML-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Deron Eriksson updated SYSTEMML-1280:
-------------------------------------
    Description:
SYSTEMML-1194 replaced SQLContext with SparkSession in SystemML for Spark 2.1.0, since SQLContext is deprecated.

Restore the old Java SQLContext method signatures in case any users are using SystemML methods and are unable to use SparkSessions (SparkSessions are described in https://databricks.com/blog/2016/08/15/how-to-use-sparksession-in-apache-spark-2-0.html).

Classes where this applies:
old MLContext class (whole class is deprecated)
old MLMatrix class (whole class is deprecated)
old MLOutput class (whole class is deprecated)
FrameRDDConverterUtils (this is a non-API class)
RDDConverterUtils (this is a non-API class)
RDDConverterUtilsExt (this is a non-API class)

In non-API classes, these SQLContext methods should be marked as deprecated and removed in a future version of SystemML (1.0), since SparkSessions should generally be used with Spark 2. As mentioned in the SQLContext documentation, "As of Spark 2.0, this is replaced by SparkSession. However, we are keeping the class here for backward compatibility."

    was:
SYSTEMML-1194 replaced SQLContext with SparkSession in SystemML for Spark 2.1.0, since SQLContext is deprecated.

Restore the old Java SQLContext method signatures in case any users are using SystemML methods and are unable to use SparkSessions (SparkSessions are generally easy to create, as described in https://databricks.com/blog/2016/08/15/how-to-use-sparksession-in-apache-spark-2-0.html).

Classes where this applies:
old MLContext class (whole class is deprecated)
old MLMatrix class (whole class is deprecated)
old MLOutput class (whole class is deprecated)
FrameRDDConverterUtils (this is a non-API class)
RDDConverterUtils (this is a non-API class)
RDDConverterUtilsExt (this is a non-API class)

In non-API classes, these SQLContext methods should be marked as deprecated and removed in a future version of SystemML (1.0), since SparkSessions should generally be used with Spark 2. As mentioned in the SQLContext documentation, "As of Spark 2.0, this is replaced by SparkSession. However, we are keeping the class here for backward compatibility."

> Restore and deprecate SQLContext methods
> ---
>
> Key: SYSTEMML-1280
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1280
> Project: SystemML
> Issue Type: Task
> Components: APIs, Runtime
> Reporter: Deron Eriksson
> Assignee: Deron Eriksson
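The restore-and-deprecate pattern this issue describes (keep the old SQLContext-style signature, emit a deprecation warning, and delegate to the SparkSession-based path) can be sketched as follows. This is a hypothetical Python analogue with made-up class and function names; the actual SystemML converter classes are Java, and the deprecation there would use the `@Deprecated` annotation instead.

```python
import warnings

class SparkSessionLike:
    """Hypothetical stand-in for SparkSession."""
    pass

class SQLContextLike:
    """Hypothetical stand-in for the deprecated SQLContext; like Spark's,
    it is kept for backward compatibility and exposes the owning session."""
    def __init__(self, spark_session):
        self.sparkSession = spark_session

def convert(spark_session, rows):
    # new SparkSession-based entry point: the supported path going forward
    return [list(map(float, r)) for r in rows]

def convert_legacy(sql_context, rows):
    # restored SQLContext-style signature: deprecated, delegates to the
    # SparkSession variant, and slated for removal in a future release
    warnings.warn("SQLContext variants are deprecated; pass a SparkSession",
                  DeprecationWarning, stacklevel=2)
    return convert(sql_context.sparkSession, rows)
```

Callers with only a SQLContext keep working and get a warning; the conversion logic itself lives solely behind the SparkSession signature, so removing the legacy overloads later is a one-line deletion.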