With a local Spark instance built with Hive support (-Pyarn -Phadoop-2.6
-Dhadoop.version=2.6.0 -Phive -Phive-thriftserver), the following script
runs in PySpark without any error against 1.6.x, but fails with 2.x:
# sc and sqlContext are the ones provided by the pyspark shell
import pyspark
from pyspark.sql import SQLContext

people = sc.parallelize(["Michael,30", "Andy,12", "Justin,19"])
peoplePartsRDD = people.map(lambda p: p.split(","))
peopleRDD = peoplePartsRDD.map(lambda p: pyspark.sql.Row(name=p[0], age=int(p[1])))
peopleDF = sqlContext.createDataFrame(peopleRDD)      # works

sqlContext2 = SQLContext(sc)
people2 = sc.parallelize(["Abcd,40", "Efgh,14", "Ijkl,16"])
peoplePartsRDD2 = people2.map(lambda l: l.split(","))
peopleRDD2 = peoplePartsRDD2.map(lambda p: pyspark.sql.Row(fname=p[0], age=int(p[1])))
peopleDF2 = sqlContext2.createDataFrame(peopleRDD2)   # <==== error here
The error goes away if sqlContext2 is replaced with sqlContext on the failing
line. Is this a regression, or has something changed that makes this the
expected behavior in Spark 2.x?