Hey Moon/All, sorry for the late reply.
This is the problem I'm encountering when trying to register a Hive table as a temp table. It seems that it cannot find the table; I have bolded the relevant line in the error message I've copy/pasted below. Please let me know if this is the best way of doing this. My end goal is to execute: *z.show(hc.sql("select * from test1"))*. Thank you for the help!

*//Code:*

import sys.process._
import org.apache.spark.sql.hive._

val hc = new HiveContext(sc)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)

hc.sql("CREATE EXTERNAL TABLE IF NOT EXISTS test1(x string, y string, time string, z int, v int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION 'hdfs://.us-west-1.compute.internal:8020/user/flume/'").registerTempTable("test2")

val results = hc.sql("select * from test2 limit 100") // have also tried test1
// everything works fine up to here, but due to lazy evaluation, I guess that doesn't mean much
results.map(t => "Name: " + t(0)).collect().foreach(println)

*//Output:*

results: org.apache.spark.sql.SchemaRDD = SchemaRDD[41] at RDD at SchemaRDD.scala:108
== Query Plan ==
== Physical Plan ==
Limit 100
 !Project [result#105]
  NativeCommand CREATE EXTERNAL TABLE IF NOT EXISTS test1(date int, date_time string, time string, sensor int, value int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION 'hdfs://ip-10-0-2-216.us-west-1.compute.internal:8020/user/flume/', [result#112]

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 7.0 failed 1 times, most recent failure: Lost task 0.0 in stage 7.0 (TID 4, localhost): org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding attribute, tree: result#105
    at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:47)
    at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:47)
    at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:46)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:144)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:135)
    at org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:46)
    at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection$$anonfun$$init$$2.apply(Projection.scala:54)
    at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection$$anonfun$$init$$2.apply(Projection.scala:54)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
    at scala.collection.AbstractTraversable.map(Traversable.scala:105)
    at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.<init>(Projection.scala:54)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$newMutableProjection$1.apply(SparkPlan.scala:105)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$newMutableProjection$1.apply(SparkPlan.scala:105)
    at org.apache.spark.sql.execution.Project$$anonfun$1.apply(basicOperators.scala:44)
    at org.apache.spark.sql.execution.Project$$anonfun$1.apply(basicOperators.scala:43)
    at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:618)
    at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:618)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:56)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
*Caused by: java.lang.RuntimeException: Couldn't find result#105 in [result#112]*
    at scala.sys.package$.error(package.scala:27)
    at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1$$anonfun$applyOrElse$1.apply(BoundAttribute.scala:53)
    at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1$$anonfun$applyOrElse$1.apply(BoundAttribute.scala:47)
    at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:46)
    ... 33 more
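In case it helps show what I'm aiming for, this is roughly what I was thinking of trying instead. This is just a guess on my part: running the CREATE EXTERNAL TABLE statement on its own, then calling registerTempTable on the result of a select rather than on the DDL command (the temp table name test1_temp is just something I made up). Does this look closer to the right approach?

// run the DDL by itself -- its return value is only the command's result,
// which is why I suspect registering *that* as a temp table was the problem
hc.sql("CREATE EXTERNAL TABLE IF NOT EXISTS test1(x string, y string, time string, z int, v int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION 'hdfs://.us-west-1.compute.internal:8020/user/flume/'")

// register the temp table from the query result instead
val test1Rows = hc.sql("select * from test1")
test1Rows.registerTempTable("test1_temp")

// then this should (hopefully) work
z.show(hc.sql("select * from test1_temp limit 100"))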
Thank you!

On Thu, Jun 25, 2015 at 11:51 AM, moon soo Lee <m...@apache.org> wrote:

> Hi,
>
> Yes, the %sql function is only for tables that have been registered.
> Using DataFrame is basically similar to what you're currently doing. It
> needs registerTempTable.
>
> Could you share a little bit about your problem when registering tables?
>
> And I really appreciate you reporting a bug!
>
> Thanks,
> moon
>
> On Wed, Jun 24, 2015 at 11:28 PM Corneau Damien <cornead...@apache.org>
> wrote:
>
>> Yes, you can change the number of records. The default value is 1000.
>>
>> On Thu, Jun 25, 2015 at 2:32 PM, Nihal Bhagchandani <
>> nihal_bhagchand...@yahoo.com> wrote:
>>
>>> Hi Su,
>>>
>>> As per my understanding, you can change the limit of 1000 records from
>>> the interpreter section by setting the value of the variable
>>> "zeppelin.spark.maxResult".
>>> Moon, could you please confirm my understanding?
>>>
>>> Regards,
>>> Nihal
>>>
>>>
>>> On Thursday, 25 June 2015 10:00 AM, Su She <suhsheka...@gmail.com>
>>> wrote:
>>>
>>> Hello Everyone,
>>>
>>> Excited to be making progress, and thanks to the community for providing
>>> help along the way. This stuff is all really cool.
>>>
>>> *Questions:*
>>>
>>> *1)* I noticed that the limit for the visual representation is 1000
>>> results. Are there any short-term plans to expand the limit? It seems a
>>> little on the low side, since one of the main reasons for working with
>>> spark/hadoop is to work with large datasets.
>>>
>>> *2)* When can I use the %sql function? Is it only on tables that have
>>> been registered? I have been having trouble registering tables unless I do:
>>>
>>> // Apply the schema to the RDD.
>>> val peopleSchemaRDD = sqlContext.applySchema(rowRDD, schema)
>>> // Register the SchemaRDD as a table.
>>> peopleSchemaRDD.registerTempTable("people")
>>>
>>> I am having lots of trouble registering tables through HiveContext or
>>> even duplicating the Zeppelin tutorial. Is this issue mitigated by using
>>> DataFrames (I am planning to move to 1.3 very soon)?
>>>
>>> *Bug:*
>>>
>>> When I do this:
>>>
>>> z.show(sqlContext.sql("select * from sensortable limit 100"))
>>>
>>> I get the table, but I also get text results at the bottom; please see
>>> the attached image. In case the image doesn't go through: I basically get
>>> the table and everything works well, but the select statement also returns
>>> text (regardless of whether it is 100 results or all of them).
>>>
>>> Thank you!
>>>
>>> Best,
>>>
>>> Su
>>