Hi,
I tried as suggested, but apparently sqlContext is not recognized in a pySpark paragraph; there is no problem accessing it in a Spark paragraph.
When I try to import SQLContext and create one from sc:

sqlContext = SQLContext(sc)
wordcount = (sc.textFile("some path to file"))
wcDF = sqlContext.createDataFrame(wordcount)
z.show(wcDF)

I am back to the original error.

Eran

On Mon, Jul 20, 2015 at 2:24 PM Felix Cheung <felixcheun...@hotmail.com> wrote:

> btw, it should work better in Python if you first convert it to Row, as in the
> example from the documentation (
> http://spark.apache.org/docs/latest/sql-programming-guide.html#inferring-the-schema-using-reflection),
> and then use sqlContext.createDataFrame():
>
> lines = sc.textFile("examples/src/main/resources/people.txt")
> parts = lines.map(lambda l: l.split(","))
> people = parts.map(lambda p: Row(name=p[0], age=int(p[1])))
>
> # Infer the schema
> schemaPeople = sqlContext.createDataFrame(people)
>
> ------------------------------
> From: felixcheun...@hotmail.com
> To: users@zeppelin.incubator.apache.org
> Subject: RE: Print RDD as table
> Date: Mon, 20 Jul 2015 04:14:36 -0700
>
> Just a thought, try this instead?
>
> wordcount = sc.textFile("some path to file")
> wcDF = wordcount.toDF()
> z.show(wcDF)
>
> ------------------------------
> From: goi....@gmail.com
> Date: Mon, 20 Jul 2015 08:54:44 +0000
> Subject: Re: Print RDD as table
> To: users@zeppelin.incubator.apache.org
>
> Here is the code: the first is a paragraph in pySpark which fails, and the
> second is one in Scala which works.
>
> %pyspark
> # This paragraph fails
> wordcount = (sc.textFile("some path to file"))
> wcDF = wordcount.toDF()  # here is where the code fails
> z.show(wcDF)
>
> btw, the same code works in Scala:
>
> // This paragraph works well
> val wordcount = (sc.textFile("some path to file"))
> val wcDF = wordcount.toDF()
> z.show(wcDF)
>
> On Mon, Jul 20, 2015 at 10:34 AM <felixcheun...@hotmail.com> wrote:
>
> Could you post more of your code leading to that?
>
> On Sun, Jul 19, 2015 at 10:19 PM -0700, "IT CTO" <goi....@gmail.com> wrote:
>
> I am trying to convert the Python RDD to a DF but I am getting an error:
>
> myRDD_DF = myRDD.toDF()
>
> error: AttributeError("'list' object has no attribute '_get_object_id'",)
>
> From what I have read this has something to do with Python-to-Java
> conversion, but I am not sure.
> Any help?
>
> On Mon, Jul 20, 2015 at 4:21 AM <felixcheun...@hotmail.com> wrote:
>
> You should try to convert the RDD into a DataFrame. Zeppelin can then
> display it as a table automatically.
>
> On Sun, Jul 19, 2015 at 1:55 AM -0700, "IT CTO" <goi....@gmail.com> wrote:
>
> Hi,
> I am using pySpark with Zeppelin and would like to print an RDD as a
> table so it can be rendered by the display system.
> I know how to loop through the records, generate the %table string, and
> print it, but I am looking for a more elegant way.
> I tried z.show(MyRdd) but it failed:
> ... 'PipelinedRDD' object has no attribute '_get_object_id'
>
> any help?
> Eran
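For reference, here is a minimal sketch of the Row-based pattern Felix points to, applied to a plain text file rather than the people.txt example. It assumes sqlContext is available in the %pyspark paragraph and wraps each line in a single-column Row (the column name "line" is a placeholder, as is the file path), since createDataFrame() and toDF() cannot infer a schema from an RDD of bare strings:

%pyspark
from pyspark.sql import Row

# Each element of the RDD is a plain Python string (one line of the file).
wordcount = sc.textFile("some path to file")

# Wrap each line in a Row with a named field so a schema can be inferred.
rows = wordcount.map(lambda line: Row(line=line))

# Build the DataFrame and let Zeppelin render it as a table.
wcDF = sqlContext.createDataFrame(rows)
z.show(wcDF)

This mirrors why the Scala paragraph works but the pySpark one fails: z.show() expects a DataFrame, and passing a PipelinedRDD (or an RDD of un-structured strings) is what produces the '_get_object_id' error reported earlier in the thread.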