Am I overlooking something? This doesn't seem right:

x = sc.parallelize([dict(k=1, v="Evert"), dict(k=2, v="Erik")]).toDF()
y = sc.parallelize([dict(k=1, v="Ruud"), dict(k=3, v="Vincent")]).toDF()
x.registerTempTable('x')
y.registerTempTable('y')
sqlContext.sql("select y.v, x.v FROM x INNER JOIN y ON x.k=y.k").collect()

Out[26]: [Row(v=u'Evert', v=u'Evert')]
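For comparison, here is a plain-Python sketch (no Spark) of what the inner join should produce for this data; the row values are taken from the example above, and the list-comprehension join is just an illustration of the expected semantics, not Spark's implementation:

```python
# The same two tables as plain lists of dicts.
x = [dict(k=1, v="Evert"), dict(k=2, v="Erik")]
y = [dict(k=1, v="Ruud"), dict(k=3, v="Vincent")]

# "select y.v, x.v FROM x INNER JOIN y ON x.k = y.k" done by hand:
# pair up rows with matching k, keeping both v columns separately.
result = [(ry["v"], rx["v"]) for rx in x for ry in y if rx["k"] == ry["k"]]

print(result)  # [('Ruud', 'Evert')]
```

So the expected result is one row with the two distinct v values, not the duplicated `[Row(v=u'Evert', v=u'Evert')]` shown above.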

This may just be because my build is behind; I'm on:

Spark 1.5.0-SNAPSHOT (git revision 27ef854) built for Hadoop 2.6.0 Build
flags: -Pyarn -Psparkr -Phadoop-2.6 -Dhadoop.version=2.6.0 -Phive
-Phive-thriftserver -DskipTests

Can somebody check whether the above code works on the latest release?

Thanks!
Evert
