Thanks, Nihal, for the help. I think I can upgrade to Spark 1.4 next week; hopefully that works!
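For reference, a rough sketch of what the wiki snippet quoted below might look like on the Spark 1.4 DataFrame API. This is untested and assumes the same space-separated wiki.csv layout as below; note that the type classes moved to org.apache.spark.sql.types in 1.3, and createDataFrame supersedes the applySchema call used in the original notebook (sc and sqlContext are the instances Zeppelin injects):

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructType, StructField, StringType}

val wiki = sc.textFile("data/wiki.csv")

// Build the schema from the space-separated column names.
val schema = StructType("date language title pagecounts".split(" ")
  .map(name => StructField(name, StringType, nullable = true)))

val rowRDD = wiki.map(_.split(" "))
  .map(l => Row(l(0).substring(0, 8), l(1), l(2), l(3)))

// createDataFrame replaces the applySchema call deprecated in 1.3.
val wikiDF = sqlContext.createDataFrame(rowRDD, schema)
wikiDF.registerTempTable("wiki")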
On Wed, Jun 17, 2015 at 9:11 PM, Nihal Bhagchandani <nihal_bhagchand...@yahoo.com> wrote:

> Hi Su,
>
> I have already switched to Spark 1.4.0. From Spark 1.3.0 the concept of the DataFrame was introduced, which gives more flexibility to manage data in different formats. What is the possibility that you move your Zeppelin to Spark 1.4.0? You can build your Zeppelin by running the following command:
>
> $ sudo mvn clean package -Pspark-1.4 -Dhadoop.version=2.2.0 -Phadoop-2.2 -DskipTests
>
> For more details: https://github.com/apache/incubator-zeppelin
>
> Regards,
> Nihal
>
> On Wednesday, 17 June 2015 1:41 PM, Su She <suhsheka...@gmail.com> wrote:
>
> A couple of clarifications:
>
> 1) I was able to use sqlContext.sql when "programmatically specifying the schema" as in this documentation: https://spark.apache.org/docs/1.2.0/sql-programming-guide.html
>
> 2) Here is the notebook I ran using this; I was able to run SQL commands, but not the %sql commands:
>
> import sys.process._
> import org.apache.spark.sql._
>
> // sc is an existing SparkContext.
> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
>
> val wiki = sc.textFile("data/wiki.csv")
>
> val schemaString = "date language title pagecounts"
>
> val schema = StructType(schemaString.split(" ").map(fieldName => StructField(fieldName, StringType, true)))
>
> val rowRDD = wiki.map(_.split(" ")).map(line => Row(line(0).substring(0, 8), line(1), line(2), line(3)))
>
> val wikiSchemaRDD = sqlContext.applySchema(rowRDD, schema)
>
> wikiSchemaRDD.registerTempTable("people")
>
> val results = sqlContext.sql("SELECT * FROM people")
>
> results.take(10)
>
> So results returns the correct rows. However, when I try:
>
> %sql select date from people
>
> java.lang.reflect.InvocationTargetException
>
> Hope this adds clarity to my issues, thank you.
>
> Best,
> Su
>
> On Wed, Jun 17, 2015 at 12:47 AM, Su She <suhsheka...@gmail.com> wrote:
>
> Hello Nihal,
>
> This is what I got:
>
> sc.version: 1.2.1
>
> I couldn't get the names of the tables. I tried it both with this line in the code and with it commented out:
>
> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
>
> error: value tableNames is not a member of org.apache.spark.sql.SQLContext
> sqlContext.tableNames().foreach(println)
>
> However, I don't think the table is registered with sqlContext. For example, if you check the Zeppelin tutorial, you cannot run:
>
> val results = sqlContext.sql("select * from bank") // error: table bank not found
>
> but you can run %sql select * from bank.
>
> When I followed https://spark.apache.org/docs/1.2.0/sql-programming-guide.html, I was able to use sqlContext.sql to query results, but I couldn't use %sql in that case :(
>
> Thanks again for the help, and please let me know how I can proceed :)
>
> Thanks,
> Su
>
> On Wed, Jun 17, 2015 at 12:10 AM, Nihal Bhagchandani <nihal_bhagchand...@yahoo.com> wrote:
>
> Hi Su,
>
> Could you please check if your bank1 gets registered as a table?
>
> -Nihal
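A side note on that check: SQLContext.tableNames() only arrived in Spark 1.3, which is why the sqlContext.tableNames() call reported above fails on 1.2.1. A crude way to probe for the table on 1.2.x, sketched here under the assumption that sqlContext is the instance Zeppelin injects, is to force a trivial query:

import scala.util.Try

// collect() forces analysis and execution; this succeeds only if "bank1"
// is registered with this particular sqlContext instance. A table
// registered on a different SQLContext will not be found.
val registered = Try(sqlContext.sql("SELECT * FROM bank1 LIMIT 1").collect()).isSuccess
println("bank1 visible to this sqlContext: " + registered)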
> On Wednesday, 17 June 2015 11:54 AM, Su She <suhsheka...@gmail.com> wrote:
>
> Thanks Nihal for the suggestion; I think I realized what the problem is: Zeppelin will use the HiveContext unless it is set to false. So I set it to false in env.sh, and the 3 charts at the bottom of the tutorial now work, as SQLContext then becomes the default instead of HiveContext.
>
> However, I am having trouble running my own version of this notebook.
>
> As I was having problems with the notebook, I copy/pasted the code from the tutorial and used wiki.csv instead of bank-full.csv. I followed the same format as the tutorial and kept getting errors. I kept trying to simplify the code, and this is where I ended up:
>
> PARA 1:
>
> val wiki = bankText.map(s => s.split(" ")).map(
>   s => Bank(s(3).toInt,
>     "secondary",
>     "third",
>     "fourth",
>     s(4).toInt
>   )
> )
> wiki.registerTempTable("bank1")
>
> PARA 2:
>
> wiki.take(10)
>
> Result:
>
> res213: Array[Bank] = Array(Bank(2,secondary,third,fourth,9980), Bank(1,secondary,third,fourth,465), Bank(1,secondary,third,fourth,16086), ...
>
> Compare this to bank.take(10) from the tutorial:
>
> res188: Array[Bank] = Array(Bank(58,management,married,tertiary,2143), Bank(44,technician,single,secondary,29), Bank(33,entrepreneur,married,secondary,2), Bank(47,blue-collar,married,unknown,1506), Bank(33,unknown,single,unknown,1), ...
>
> PARA 3:
>
> %sql
> select age, count(1) value
> from bank1
> where age < 33
> group by age
> order by age
>
> java.lang.reflect.InvocationTargetException
>
> I'm not sure what I'm doing wrong. The new array has the same data format, just different values, and there don't seem to be any extra spaces or the like.
>
> On Tue, Jun 16, 2015 at 10:55 PM, Nihal Bhagchandani <nihal_bhagchand...@yahoo.com> wrote:
>
> Hi Su,
>
> It seems like your table is not getting registered. Can you try the following?
>
> If you have used the line
>
> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
>
> I would suggest commenting it out, as Zeppelin creates sqlContext by default.
>
> If you didn't have the above line, write the following at the end of the paragraph and run it:
>
> sqlContext.tableNames().foreach(println) // should print all tables registered with the current sqlContext in the output section
>
> You can also check your Spark version by running:
>
> sc.version
>
> -Nihal
>
> On Wednesday, 17 June 2015 10:01 AM, Su She <suhsheka...@gmail.com> wrote:
>
> Hello,
>
> Excited to get Zeppelin up and running!
>
> 1) I was not able to get through the Zeppelin tutorial notebook. I did remove toDF, which made that paragraph work, but the 3 graphs at the bottom all returned the InvocationTargetException.
>
> 2) From a couple of other threads on the archive, it seems like this error means Zeppelin isn't connected to Spark:
>
> a) I am running it locally.
>
> b) I created a new notebook and was able to run Spark commands and create a table using sqlContext and query it, so this means it is connected to Spark, right?
>
> c) I am able to do:
>
> val results = sqlContext.sql("SELECT * FROM wiki")
>
> but I can't do:
>
> %sql select pagecounts, count(1) from wiki
>
> 3) I am a bit confused about how to get the visualizations. I understand the %table command, but do I use %table when running Spark jobs, or do I use %sql?
>
> Thanks!
>
> -Su
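Putting the thread together: %sql runs against the context Zeppelin itself creates, so a table registered on a hand-built new SQLContext(sc) is invisible to it. A minimal sketch of the working pattern on Spark 1.2.x, assuming the injected sc and sqlContext, with the interpreter left on plain SQLContext (the zeppelin.spark.useHiveContext property set to false, the switch Su flips in env.sh), and with the Wiki field types guessed from the messages above:

// Do not create "val sqlContext = new SQLContext(sc)"; reuse the
// sqlContext Zeppelin injects, since that is the context %sql queries.
case class Wiki(date: String, language: String, title: String, pagecounts: Int)

val wiki = sc.textFile("data/wiki.csv")
  .map(_.split(" "))
  .map(l => Wiki(l(0).substring(0, 8), l(1), l(2), l(3).toInt))

// Spark 1.2: turn the case-class RDD into a SchemaRDD on the injected
// context and register it, so that "%sql select * from wiki" can see it.
sqlContext.createSchemaRDD(wiki).registerTempTable("wiki")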