Hi,
The code we're executing constructs pyspark.ml.Pipeline objects
concurrently in separate Python threads.
We observe that the stages fed to the Pipeline object get corrupted, i.e.,
the stages supplied to a Pipeline object in one thread appear inside a
different Pipeline object constructed in
)
at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown
Source)
Regards,
Vinayak Joshi
From: Vinayak Joshi5/India/IBM@IBMIN
To: "user.spark" <user@spark.apache.org>
Date: 01/12/2016 10:53 PM
Subject: Spark 2.x Pyspark Spark SQL createDataframe Error
With a local Spark instance built with Hive support (-Pyarn -Phadoop-2.6
-Dhadoop.version=2.6.0 -Phive -Phive-thriftserver), the following
script/sequence works in PySpark without any error against 1.6.x, but
fails with 2.x.
people = sc.parallelize(["Michael,30", "Andy,12", "Justin,19"])
Thanks Michal.
I have submitted a Spark issue and PR based on my understanding of why
this changed in Spark 2.0. If interested you can follow it on
https://issues.apache.org/jira/browse/SPARK-18687
Regards,
Vinayak.
From: Michal Šenkýř <bina...@gmail.com>
To: Vinayak Joshi5/Ind