I’ve tested this out and found two issues.

First, per the createDataFrame documentation
(http://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=createdataframe#pyspark.sql.SQLContext.createDataFrame),
the code should be changed to the following – it does not work in the pyspark CLI otherwise:

    from pyspark.sql import Row

    rdd = sc.parallelize(["1", "2", "3"])
    Data = Row('first')
    df = sqlContext.createDataFrame(rdd.map(lambda d: Data(d)))

Secondly, z.show() doesn’t seem to work properly in Python – I see the error
"AttributeError: 'DataFrame' object has no attribute '_get_object_id'" shown below:

    # Python/PySpark – doesn’t work
    from pyspark.sql import Row

    rdd = sc.parallelize(["1", "2", "3"])
    Data = Row('first')
    df = sqlContext.createDataFrame(rdd.map(lambda d: Data(d)))
    print df
    print df.collect()
    z.show(df)

    AttributeError: 'DataFrame' object has no attribute '_get_object_id'

    # Scala – this works
    val a = sc.parallelize(List("1", "2", "3"))
    val df = a.toDF()
    z.show(df)

Created JIRA: https://issues.apache.org/jira/browse/ZEPPELIN-185

On Thu, Jul 23, 2015 at 5:35 AM -0700, "IT CTO" <goi....@gmail.com> wrote:

> I am trying the simple thing in pyspark:
>
> %pyspark
> rdd = sc.parallelize(["1","2","3"])
> print(rdd.collect())
> z.show(sqlContext.createDataFrame(rdd))
>
> and I keep getting this error:
>
> Traceback (most recent call last):
>   File "/tmp/zeppelin_pyspark.py", line 116, in <module>
>     eval(compiledCode)
>   File "<string>", line 3, in <module>
>   File "/home/cto/Downloads/incubator-zeppelin/interpreter/spark/pyspark/pyspark.zip/pyspark/sql/context.py", line 339, in createDataFrame
>     _verify_type(row, schema)
>   File "/home/cto/Downloads/incubator-zeppelin/interpreter/spark/pyspark/pyspark.zip/pyspark/sql/types.py", line 1013, in _verify_type
>     % (dataType, type(obj)))
> TypeError: StructType(List(StructField(Number,StringType,true))) can not accept object in type <type 'str'>
>
> This should be easy...
> Eran
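For reference, here is a minimal sketch of an alternative to the Row wrapper above, assuming Spark 1.3+ where createDataFrame also accepts an RDD of tuples plus a list of column names (the column name "first" is only illustrative):

    # Sketch: build the DataFrame from one-element tuples with explicit column names,
    # so the plain-string RDD never hits the StructType/str type check.
    rdd = sc.parallelize(["1", "2", "3"])
    df = sqlContext.createDataFrame(rdd.map(lambda x: (x,)), ["first"])
    print df.collect()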
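Also, until ZEPPELIN-185 is resolved, a possible workaround sketch on the pyspark side is to skip z.show() and print the collected rows through Zeppelin's %table display output; this assumes the DataFrame is small enough to collect to the driver:

    from pyspark.sql import Row

    rdd = sc.parallelize(["1", "2", "3"])
    Data = Row('first')
    df = sqlContext.createDataFrame(rdd.map(lambda d: Data(d)))

    # Render via Zeppelin's %table output: header row, then one tab-separated line per row
    header = "\t".join(df.columns)
    rows = "\n".join("\t".join(str(v) for v in row) for row in df.collect())
    print("%table " + header + "\n" + rows)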