I am using PySpark (Spark 1.4.1) and I want to test the `sql` function, but I get the Java tree error below. Any ideas?
```python
iwaggDF.registerTempTable('iwagg')
hierDF.registerTempTable('hier')
res3 = sqlc.sql('select name, sum(amount) as amount '
                'from iwagg a left join hier b on a.segm = b.segm '
                'group by name order by sum(amount) desc').collect()
```

This is the traceback:

```
  File "/home/dirk/spark-1.4.1-bin-hadoop2.6/python/pyspark/sql/dataframe.py", line 281, in collect
    port = self._sc._jvm.PythonRDD.collectAndServe(self._jdf.javaToPython().rdd())
  File "/home/dirk/spark-1.4.1-bin-hadoop2.6/python/py4j/java_gateway.py", line 538, in __call__
    self.target_id, self.name)
  File "/home/dirk/spark-1.4.1-bin-hadoop2.6/python/py4j/protocol.py", line 300, in get_return_value
    format(target_id, '.', name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o75.javaToPython.
: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: sort, tree:
Sort [SUM(amount#326) DESC], true
 Exchange (RangePartitioning 200)
  Aggregate false, [name#6], [name#6,CombineSum(PartialSum#328) AS amount#326]
   Exchange (HashPartitioning 200)
    Aggregate true, [name#6], [name#6,SUM(CAST(amount#3, DoubleType)) AS PartialSum#328]
     Project [name#6,amount#3]
      HashOuterJoin [segm#1], [segm#5], LeftOuter, None
       Exchange (HashPartitioning 200)
        Project [amount#3,segm#1]
         PhysicalRDD [cust#0,segm#1,wsd#2,amount#3,trips#4], MapPartitionsRDD[7] at applySchemaToPythonRDD at NativeMethodAccessorImpl.java:-2
       Exchange (HashPartitioning 200)
        PhysicalRDD [segm#5,name#6], MapPartitionsRDD[15] at applySchemaToPythonRDD at NativeMethodAccessorImpl.java:-2
```

Dirk
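One thing I have been meaning to try, in case it helps anyone reproduce this: the failing plan node is the `Sort` over `SUM(amount#326)`, so the repeated aggregate expression in `ORDER BY` may be what trips the planner. A minimal sketch of a rewritten query that orders by the column alias instead (this is an untested guess, not a confirmed fix; `sqlc`, `iwagg`, and `hier` are the objects from my snippet above):

```python
# Hypothetical workaround: reference the alias "amount" in ORDER BY
# instead of repeating the aggregate sum(amount), so the Sort node
# operates on an already-resolved output column of the Aggregate.
query = (
    'select name, sum(amount) as amount '
    'from iwagg a left join hier b on a.segm = b.segm '
    'group by name '
    'order by amount desc'          # was: order by sum(amount) desc
)
# res3 = sqlc.sql(query).collect()  # would run against the same temp tables
```

If that still fails, the same result should be reachable through the DataFrame API (`groupBy(...).sum(...).orderBy(...)`), which skips the SQL parser entirely.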