Thanks for the help. Fixing z.show() in PySpark will help my users a lot :-)
Eran
On Sat, Jul 25, 2015 at 10:25 PM <felixcheun...@hotmail.com> wrote:

> I've tested this out and found these issues.
>
> Firstly, per
> http://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=createdataframe#pyspark.sql.SQLContext.createDataFrame
> the code should be changed to this – it does not work in the pyspark CLI
> otherwise:
>
> rdd = sc.parallelize(["1", "2", "3"])
> Data = Row('first')
> df = sqlContext.createDataFrame(rdd.map(lambda d: Data(d)))
>
> Secondly, z.show() doesn't seem to work properly in Python – I see the
> same error below: "AttributeError: 'DataFrame' object has no attribute
> '_get_object_id'"
>
> # Python/PySpark – doesn't work
> rdd = sc.parallelize(["1", "2", "3"])
> Data = Row('first')
> df = sqlContext.createDataFrame(rdd.map(lambda d: Data(d)))
> print df
> print df.collect()
> z.show(df)
>
> AttributeError: 'DataFrame' object has no attribute '_get_object_id'
>
> // Scala – this works
> val a = sc.parallelize(List("1", "2", "3"))
> val df = a.toDF()
> z.show(df)
>
> Created JIRA https://issues.apache.org/jira/browse/ZEPPELIN-185
>
> On Thu, Jul 23, 2015 at 5:35 AM -0700, "IT CTO" <goi....@gmail.com> wrote:
>
>> I am trying the simple thing in pyspark:
>>
>> %pyspark
>> rdd = sc.parallelize(["1","2","3"])
>> print(rdd.collect())
>> z.show(sqlContext.createDataFrame(rdd))
>>
>> and keep getting this error:
>>
>> Traceback (most recent call last):
>>   File "/tmp/zeppelin_pyspark.py", line 116, in <module>
>>     eval(compiledCode)
>>   File "<string>", line 3, in <module>
>>   File "/home/cto/Downloads/incubator-zeppelin/interpreter/spark/pyspark/pyspark.zip/pyspark/sql/context.py", line 339, in createDataFrame
>>     _verify_type(row, schema)
>>   File "/home/cto/Downloads/incubator-zeppelin/interpreter/spark/pyspark/pyspark.zip/pyspark/sql/types.py", line 1013, in _verify_type
>>     % (dataType, type(obj)))
>> TypeError: StructType(List(StructField(Number,StringType,true))) can not
>> accept object in type <type 'str'>
>>
>> This should be easy...
>> Eran
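For readers who hit the same TypeError outside Zeppelin: the traceback above comes from Spark's schema verification rejecting bare strings as rows. Below is a minimal pure-Python sketch of that behavior and of the Row-wrapping fix from the thread. It runs without Spark; `verify_record` is a hypothetical stand-in for pyspark's `_verify_type`, and the `namedtuple` stands in for `pyspark.sql.Row('first')` (which also builds tuple-like records), so this is an illustration of the idea, not Spark's actual code.

```python
from collections import namedtuple

def verify_record(obj):
    """Hypothetical stand-in for pyspark.sql.types._verify_type: for a
    StructType schema, each record must be tuple-like (a Row), not a str."""
    if not isinstance(obj, tuple):  # pyspark's Row is a tuple subclass
        raise TypeError(
            "StructType can not accept object in type %s" % type(obj))

# Stand-in for Data = Row('first') in the thread.
Data = namedtuple("Data", ["first"])

records = ["1", "2", "3"]

# Bare strings fail the check, as in the reported traceback.
try:
    for r in records:
        verify_record(r)
except TypeError as e:
    print("rejected:", e)

# Wrapping each value in a Row-like tuple passes, matching the
# rdd.map(lambda d: Data(d)) fix suggested above.
wrapped = [Data(r) for r in records]
for w in wrapped:
    verify_record(w)
print("accepted:", wrapped)
```

The same reasoning explains why `sqlContext.createDataFrame(rdd)` failed for an RDD of plain strings while `rdd.map(lambda d: Data(d))` works.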