Thanks for the help. Fixing z.show() in PySpark will help my users a lot :-)
Eran
On Sat, Jul 25, 2015 at 10:25 PM <felixcheun...@hotmail.com> wrote:

> I've tested this out and found these issues.
>
> Firstly, per
> http://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=createdataframe#pyspark.sql.SQLContext.createDataFrame
> the code should be changed to this – it does not work in the pyspark CLI
> otherwise:
>
> rdd = sc.parallelize(["1", "2", "3"])
> Data = Row('first')
> df = sqlContext.createDataFrame(rdd.map(lambda d: Data(d)))
>
> Secondly, z.show() doesn't seem to work properly in Python – I see the
> same error below: "AttributeError: 'DataFrame' object has no attribute
> '_get_object_id'"
>
> # Python/PySpark – doesn't work
> rdd = sc.parallelize(["1", "2", "3"])
> Data = Row('first')
> df = sqlContext.createDataFrame(rdd.map(lambda d: Data(d)))
> print df
> print df.collect()
> z.show(df)
>
> AttributeError: 'DataFrame' object has no attribute '_get_object_id'
>
> // Scala – this works
> val a = sc.parallelize(List("1", "2", "3"))
> val df = a.toDF()
> z.show(df)
>
> Created JIRA https://issues.apache.org/jira/browse/ZEPPELIN-185
>
> On Thu, Jul 23, 2015 at 5:35 AM -0700, "IT CTO" <goi....@gmail.com> wrote:
>
>> I am trying the simple thing in pyspark:
>>
>> %pyspark
>> rdd = sc.parallelize(["1","2","3"])
>> print(rdd.collect())
>> z.show(sqlContext.createDataFrame(rdd))
>>
>> and keep getting this error:
>>
>> Traceback (most recent call last):
>>   File "/tmp/zeppelin_pyspark.py", line 116, in <module>
>>     eval(compiledCode)
>>   File "<string>", line 3, in <module>
>>   File "/home/cto/Downloads/incubator-zeppelin/interpreter/spark/pyspark/pyspark.zip/pyspark/sql/context.py", line 339, in createDataFrame
>>     _verify_type(row, schema)
>>   File "/home/cto/Downloads/incubator-zeppelin/interpreter/spark/pyspark/pyspark.zip/pyspark/sql/types.py", line 1013, in _verify_type
>>     % (dataType, type(obj)))
>> TypeError: StructType(List(StructField(Number,StringType,true))) can not
>> accept object in type <type 'str'>
>>
>> This should be easy...
>> Eran
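For readers who hit the same TypeError outside Zeppelin: the traceback above comes from Spark's schema verification rejecting bare strings as rows. Below is a minimal pure-Python sketch of that behavior and of the Row-wrapping fix from the thread. It runs without Spark; `verify_record` is a hypothetical stand-in for pyspark's `_verify_type`, and the `namedtuple` stands in for `pyspark.sql.Row('first')` (which also builds tuple-like records), so this is an illustration of the idea, not Spark's actual code.

```python
from collections import namedtuple

def verify_record(obj):
    """Hypothetical stand-in for pyspark.sql.types._verify_type: for a
    StructType schema, each record must be tuple-like (a Row), not a str."""
    if not isinstance(obj, tuple):  # pyspark's Row is a tuple subclass
        raise TypeError(
            "StructType can not accept object in type %s" % type(obj))

# Stand-in for Data = Row('first') in the thread.
Data = namedtuple("Data", ["first"])

records = ["1", "2", "3"]

# Bare strings fail the check, as in the reported traceback.
try:
    for r in records:
        verify_record(r)
except TypeError as e:
    print("rejected:", e)

# Wrapping each value in a Row-like tuple passes, matching the
# rdd.map(lambda d: Data(d)) fix suggested above.
wrapped = [Data(r) for r in records]
for w in wrapped:
    verify_record(w)
print("accepted:", wrapped)
```

The same reasoning explains why `sqlContext.createDataFrame(rdd)` failed for an RDD of plain strings while `rdd.map(lambda d: Data(d))` works.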