Btw, it should work better in Python if you first convert each record to a Row, as in the example from the documentation (http://spark.apache.org/docs/latest/sql-programming-guide.html#inferring-the-schema-using-reflection), and then use sqlContext.createDataFrame():

    from pyspark.sql import Row

    lines = sc.textFile("examples/src/main/resources/people.txt")
    parts = lines.map(lambda l: l.split(","))
    people = parts.map(lambda p: Row(name=p[0], age=int(p[1])))

    # Infer the schema
    schemaPeople = sqlContext.createDataFrame(people)

From: felixcheun...@hotmail.com
To: users@zeppelin.incubator.apache.org
Subject: RE: Print RDD as table
Date: Mon, 20 Jul 2015 04:14:36 -0700

Just a thought, try this instead?

    wordcount = sc.textFile("some path to file")
    wcDF = wordcount.toDF()
    z.show(wcDF)

From: goi....@gmail.com
Date: Mon, 20 Jul 2015 08:54:44 +0000
Subject: Re: Print RDD as table
To: users@zeppelin.incubator.apache.org

Here is the code. The first paragraph is in pySpark and fails; the second is in Scala and works.

    %pyspark
    # This paragraph fails
    wordcount = (sc.textFile("some path to file"))
    wcDF = wordcount.toDF()  # here is where the code fails
    z.show(wcDF)

Btw, the same code works in Scala:

    // This paragraph works well
    val wordcount = (sc.textFile("some path to file"))
    val wcDF = wordcount.toDF()
    z.show(wcDF)

On Mon, Jul 20, 2015 at 10:34 AM <felixcheun...@hotmail.com> wrote:

Could you post more of your code leading to that?

On Sun, Jul 19, 2015 at 10:19 PM -0700, "IT CTO" <goi....@gmail.com> wrote:

I am trying to convert the Python RDD to a DataFrame, but I am getting an error:

    myRDD_DF = myRDD.toDF()

    error: AttributeError("'list' object has no attribute '_get_object_id'",)

From what I have read, this has something to do with the Python-to-Java conversion, but I am not sure. Any help?

On Mon, Jul 20, 2015 at 4:21 AM <felixcheun...@hotmail.com> wrote:

You should try to convert the RDD into a DataFrame. Zeppelin can then display it as a table automatically.

On Sun, Jul 19, 2015 at 1:55 AM -0700, "IT CTO" <goi....@gmail.com> wrote:

Hi,
I am using pySpark with Zeppelin and would like to print an RDD as a table so that it renders in the display system. I know how to loop through the records, generate the %table string, and print it, but I am looking for a more elegant way. I tried z.show(MyRdd) but it failed:

    ... 'PipelinedRDD' object has no attribute '_get_object_id'

Any help?
Eran
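
For reference, here is a minimal sketch of the manual approach mentioned in the original question: loop over the records and build the %table string yourself. It assumes Zeppelin's display system renders printed output that starts with %table as a table, with tab-separated columns and newline-separated rows. The helper name to_zeppelin_table and the sample records are hypothetical and only serve as illustration.

```python
def to_zeppelin_table(header, rows):
    """Build a Zeppelin %table string from column names and row tuples.

    Hypothetical helper for illustration; not part of Zeppelin or PySpark.
    """
    lines = ["\t".join(str(col) for col in header)]
    for row in rows:
        # Tabs and newlines inside a cell would break the layout, so replace them.
        cells = [str(cell).replace("\t", " ").replace("\n", " ") for cell in row]
        lines.append("\t".join(cells))
    return "%table " + "\n".join(lines)


# Made-up records standing in for the result of myRDD.take(n) or myRDD.collect().
sample_rows = [("spark", 3), ("zeppelin", 2)]
table_text = to_zeppelin_table(["word", "count"], sample_rows)
print(table_text)
```

In a %pyspark paragraph the rows would come from something like myRDD.take(100), so that only a bounded number of records is pulled to the driver before printing.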