I’ve tested this out and found two issues. Firstly,


http://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=createdataframe#pyspark.sql.SQLContext.createDataFrame


# The code should be changed to the following; otherwise it does not work in the pyspark CLI


from pyspark.sql import Row

rdd = sc.parallelize(["1","2","3"])


Data = Row('first')


df = sqlContext.createDataFrame(rdd.map(lambda d: Data(d)))
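
For reference, the docs page above also says createDataFrame accepts an RDD of tuples plus a list of column names, so an equivalent fix would be along these lines (just a sketch; the column name "first" is only an example):

# Alternative: wrap each string in a one-element tuple and name the column explicitly
df2 = sqlContext.createDataFrame(rdd.map(lambda d: (d,)), ["first"])
print(df2.collect())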



Secondly,


z.show() doesn’t seem to work properly in Python – I see the error below every time. (My reading: _get_object_id is the attribute py4j expects on Java-side objects, so z.show() appears to be handing the pure-Python DataFrame wrapper straight to the JVM.)
"AttributeError: 'DataFrame' object has no attribute '_get_object_id'"


#Python/PySpark – doesn’t work


from pyspark.sql import Row

rdd = sc.parallelize(["1","2","3"])


Data = Row('first')


df = sqlContext.createDataFrame(rdd.map(lambda d: Data(d)))


print(df)


print(df.collect())


z.show(df)


AttributeError: 'DataFrame' object has no attribute '_get_object_id'



#Scala – this works


val a = sc.parallelize(List("1", "2", "3"))


val df = a.toDF()


z.show(df)
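
Until z.show() handles PySpark DataFrames, the plain DataFrame API at least lets you inspect the data from Python (standard PySpark output only, not Zeppelin’s table rendering):

# Interim workaround in pyspark: print instead of rendering a Zeppelin table
df.show()             # ASCII-formatted preview of the DataFrame
print(df.collect())   # or dump the Row objects directly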



Created JIRA: https://issues.apache.org/jira/browse/ZEPPELIN-185



On Thu, Jul 23, 2015 at 5:35 AM -0700, "IT CTO" <goi....@gmail.com> wrote:
I am trying the simple thing in pyspark:
%pyspark

rdd = sc.parallelize(["1","2","3"])
print(rdd.collect())
z.show(sqlContext.createDataFrame(rdd))

AND keep getting error:
Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark.py", line 116, in <module>
    eval(compiledCode)
  File "<string>", line 3, in <module>
  File "/home/cto/Downloads/incubator-zeppelin/interpreter/spark/pyspark/pyspark.zip/pyspark/sql/context.py", line 339, in createDataFrame
    _verify_type(row, schema)
  File "/home/cto/Downloads/incubator-zeppelin/interpreter/spark/pyspark/pyspark.zip/pyspark/sql/types.py", line 1013, in _verify_type
    % (dataType, type(obj)))
TypeError: StructType(List(StructField(Number,StringType,true))) can not accept object in type <type 'str'>

This should be easy...
Eran
