[ https://issues.apache.org/jira/browse/SYSTEMML-869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15501892#comment-15501892 ]

Matthias Boehm edited comment on SYSTEMML-869 at 9/19/16 12:42 AM:
-------------------------------------------------------------------

Somehow this slipped through - I just saw it while going over all open bugs.
I'll have a look into it tomorrow, but it needs some more time since I'd like
to create a reproducible testcase in our testsuite first.

The initial guess is that the second script cleans up the input variables
passed in from the first script once they are no longer needed, since they
appear to the runtime to be intermediates. Hence, the third script fails.
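
To make the hypothesis concrete, here is a minimal sketch of the suspected
pattern, assuming the new MLContext Python API as used in the notebook (the
scripts below are hypothetical placeholders, not the actual LeNet code):

{code}
from systemml import MLContext, dml

ml = MLContext(sc)  # sc: the notebook's SparkContext

# Script 1: produces a matrix X that is handed back to the caller.
X = ml.execute(dml("X = rand(rows=10, cols=10)").output("X")).get("X")

# Script 2: consumes X. To this script's runtime, X looks like an
# intermediate, so the scratch-space file backing it may be cleaned up
# once the script no longer needs it.
Y = ml.execute(dml("Y = X + 1").input(X=X).output("Y")).get("Y")

# Any later use of X (e.g., a third script, or X.toDF()) would then fail
# with "Input path does not exist" because the backing file is gone.
X.toDF()
{code}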


was (Author: mboehm7):
Somehow this slipped through - I just saw it while going over all open bugs.
I'll have a look into it tomorrow.

> Error converting Matrix to Spark DataFrame with MLContext After Subsequent Executions
> --------------------------------------------------------------------------------------
>
>                 Key: SYSTEMML-869
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-869
>             Project: SystemML
>          Issue Type: Bug
>          Components: APIs
>            Reporter: Mike Dusenberry
>            Assignee: Matthias Boehm
>            Priority: Blocker
>             Fix For: SystemML 0.11
>
>
> Running the LeNet deep learning example notebook with the new {{MLContext}} 
> API in Python results in the below error when converting the resulting 
> {{Matrix}} to a Spark {{DataFrame}} via the {{toDF()}} call.  This only 
> occurs with the large LeNet example, and not for the similar "Softmax 
> Classifier" example that has a smaller model. 
> {code}
> Py4JJavaError: An error occurred while calling o34.asDataFrame.
> : org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/Users/mwdusenb/Documents/Code/systemML/deep_learning/examples/scratch_space/_p85157_9.31.116.142/_t0/temp816_133
>     at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:251)
>     at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:45)
>     at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:270)
>     at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>     at scala.Option.getOrElse(Option.scala:120)
>     at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>     at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>     at scala.Option.getOrElse(Option.scala:120)
>     at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>     at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>     at scala.Option.getOrElse(Option.scala:120)
>     at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>     at org.apache.spark.Partitioner$.defaultPartitioner(Partitioner.scala:65)
>     at org.apache.spark.rdd.PairRDDFunctions$$anonfun$groupByKey$3.apply(PairRDDFunctions.scala:642)
>     at org.apache.spark.rdd.PairRDDFunctions$$anonfun$groupByKey$3.apply(PairRDDFunctions.scala:642)
>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
>     at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
>     at org.apache.spark.rdd.PairRDDFunctions.groupByKey(PairRDDFunctions.scala:641)
>     at org.apache.spark.api.java.JavaPairRDD.groupByKey(JavaPairRDD.scala:538)
>     at org.apache.sysml.runtime.instructions.spark.utils.RDDConverterUtilsExt.binaryBlockToDataFrame(RDDConverterUtilsExt.java:502)
>     at org.apache.sysml.api.mlcontext.MLContextConversionUtil.matrixObjectToDataFrame(MLContextConversionUtil.java:762)
>     at org.apache.sysml.api.mlcontext.Matrix.asDataFrame(Matrix.java:111)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:497)
>     at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
>     at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
>     at py4j.Gateway.invoke(Gateway.java:259)
>     at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
>     at py4j.commands.CallCommand.execute(CallCommand.java:79)
>     at py4j.GatewayConnection.run(GatewayConnection.java:209)
>     at java.lang.Thread.run(Thread.java:745)
> {code}
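> For reference, the failing conversion is of the form below (a minimal sketch with hypothetical variable names, not the exact notebook code):
> {code}
> results = ml.execute(script)  # script trains the LeNet model
> W = results.get("W")          # returned as a SystemML Matrix
> W_df = W.toDF()               # raises the Py4JJavaError above
> {code}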
> To set up, I used the instructions [here | https://github.com/dusenberrymw/systemml-nn/tree/master/examples], running the {{Example - MNIST LeNet.ipynb}} notebook.  Additionally, to speed up the actual training time, I modified [lines 84 & 85 of mnist_lenet.dml | https://github.com/dusenberrymw/systemml-nn/blob/master/examples/mnist_lenet.dml#L84] to set {{epochs = 1}} and {{iters = 1}}, as shown below.
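> The modification amounts to the following (values as stated above; the surrounding lines of the script are omitted):
> {code}
> # mnist_lenet.dml, lines 84-85, reduced to shorten training
> epochs = 1
> iters = 1
> {code}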


