[jira] [Commented] (SYSTEMML-650) Error while trying to load data as a DataFrame in PySpark

Mike Dusenberry (JIRA) Tue, 26 Apr 2016 10:12:33 -0700

    [ https://issues.apache.org/jira/browse/SYSTEMML-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15258478#comment-15258478 ]


Mike Dusenberry commented on SYSTEMML-650:
------------------------------------------

Great, thanks [~kartikkanna...@gmail.com].

> Error while trying to load data as a DataFrame in PySpark
> ---------------------------------------------------------
>
>                 Key: SYSTEMML-650
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-650
>             Project: SystemML
>          Issue Type: Bug
>    Affects Versions: SystemML 0.9
>         Environment: Cloudera Distribution CDH 5.5.0
> Hadoop 2.6.0
> Spark 1.5.0
> SystemML 0.9.0
> Python 2.7.6
>            Reporter: Kartik Kannapur
>              Labels: documentation, newbie
>             Fix For: SystemML 0.10
>
>
> I tried to run the sample code for the "Jupyter (PySpark) Notebook Example - 
> Poisson Nonnegative Matrix Factorization" provided in the documentation.
> The code fails at the line that runs the PNMF script on SystemML with Spark:
> {code:xml}
> outputs = ml.executeScript(pnmf, {"X": X_train, "maxiter": 100, "rank": 10}, ["W", "H", "losses"])
> {code}
> The call seems to fail right away, at the point where *X_train* is passed 
> as a DataFrame into the variable *X*.
> The error message is below:
> {code:xml}
> /tmp/spark-e7974be5-4438-44b2-ae83-574b2c2bad21/userFiles-5a3c99c5-9bb7-46fe-af83-5119f9358e0f/SystemML.py
>  in executeScript(self, dmlScript, nargs, outputs, configFilePath)
>     126 
>     127             # Execute script
> --> 128             jml_out = self.ml.executeScript(dmlScript, nargs, configFilePath)
>     129             ml_out = MLOutput(jml_out, self.sc)
>     130             return ml_out
> /opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.8/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py
>  in __call__(self, *args)
>     536         answer = self.gateway_client.send_command(command)
>     537         return_value = get_return_value(answer, self.gateway_client,
> --> 538                 self.target_id, self.name)
>     539 
>     540         for temp_arg in temp_args:
> /opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.8/lib/spark/python/pyspark/sql/utils.pyc
>  in deco(*a, **kw)
>      34     def deco(*a, **kw):
>      35         try:
> ---> 36             return f(*a, **kw)
>      37         except py4j.protocol.Py4JJavaError as e:
>      38             s = e.java_exception.toString()
> /opt/cloudera/parcels/CDH-5.5.0-1.cdh5.5.0.p0.8/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py
>  in get_return_value(answer, gateway_client, target_id, name)
>     302                 raise Py4JError(
>     303                     'An error occurred while calling {0}{1}{2}. Trace:\n{3}\n'.
> --> 304                     format(target_id, '.', name, value))
>     305         else:
>     306             raise Py4JError(
> Py4JError: An error occurred while calling o79.executeScript. Trace:
> py4j.Py4JException: Method executeScript([class java.lang.String, class java.util.HashMap, null]) does not exist
>       at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:333)
>       at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:342)
>       at py4j.Gateway.invoke(Gateway.java:252)
>       at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
>       at py4j.commands.CallCommand.execute(CallCommand.java:79)
>       at py4j.GatewayConnection.run(GatewayConnection.java:207)
>       at java.lang.Thread.run(Thread.java:745)
> {code}
> Is there any workaround for this?
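
As an aid to reading the trace above, the signature that py4j reports can be split into the argument types it tried to match. The following is a minimal sketch in plain Python, not part of SystemML or py4j: the helper name `parse_missing_method` is hypothetical, and the message is copied from the trace with its wrapped line joined. The third entry, `null`, lines up with the `configFilePath` argument that the wrapper forwards, which is presumably Python `None` here. This only shows what was passed to the JVM; it does not by itself identify the cause or a fix.

```python
import re

# Error text copied from the trace above, joined onto one line.
ERROR = (
    "py4j.Py4JException: Method executeScript([class java.lang.String, "
    "class java.util.HashMap, null]) does not exist"
)


def parse_missing_method(message):
    """Return (method_name, [argument types]) from a py4j 'does not exist' message.

    Hypothetical helper for illustration; returns None if the message
    does not have the expected shape.
    """
    match = re.search(r"Method (\w+)\(\[(.*?)\]\) does not exist", message)
    if match is None:
        return None
    name, raw_args = match.groups()
    # Drop the leading "class " that py4j prints before each Java type name.
    args = [
        part.strip().replace("class ", "", 1)
        for part in raw_args.split(",")
        if part.strip()
    ]
    return name, args


name, args = parse_missing_method(ERROR)
print(name)
for position, arg_type in enumerate(args):
    print(position, arg_type)
```

Comparing this list with the `executeScript` overloads available in the installed SystemML version would be one way to check whether the documented example matches that release.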



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
