Hey Moon/All, sorry for the late reply.
This is the problem I'm encountering when trying to register a Hive table as a temp table. It seems that it cannot find the table; I have bolded the relevant line in the error message I've copy/pasted below. Please let me know if this is the best way of doing this. My end goal is to execute: *z.show(hc.sql("select * from test1"))*. Thank you for the help!

*//Code:*

import sys.process._
import org.apache.spark.sql.hive._

val hc = new HiveContext(sc)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)

hc.sql("CREATE EXTERNAL TABLE IF NOT EXISTS test1(x string, y string, time string, z int, v int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION 'hdfs://.us-west-1.compute.internal:8020/user/flume/'").registerTempTable("test2")

val results = hc.sql("select * from test2 limit 100") // have also tried test1
// everything works fine up to here, but due to lazy evaluation, I guess that doesn't mean much
results.map(t => "Name: " + t(0)).collect().foreach(println)

*//Output:*

results: org.apache.spark.sql.SchemaRDD = SchemaRDD[41] at RDD at SchemaRDD.scala:108
== Query Plan ==
== Physical Plan ==
Limit 100
 !Project [result#105]
  NativeCommand CREATE EXTERNAL TABLE IF NOT EXISTS test1(date int, date_time string, time string, sensor int, value int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION 'hdfs://ip-10-0-2-216.us-west-1.compute.internal:8020/user/flume/', [result#112]

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 7.0 failed 1 times, most recent failure: Lost task 0.0 in stage 7.0 (TID 4, localhost): org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding attribute, tree: result#105
    at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:47)
    at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:47)
    at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:46)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:144)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:135)
    at org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:46)
    at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection$$anonfun$$init$$2.apply(Projection.scala:54)
    at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection$$anonfun$$init$$2.apply(Projection.scala:54)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
    at scala.collection.AbstractTraversable.map(Traversable.scala:105)
    at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.<init>(Projection.scala:54)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$newMutableProjection$1.apply(SparkPlan.scala:105)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$newMutableProjection$1.apply(SparkPlan.scala:105)
    at org.apache.spark.sql.execution.Project$$anonfun$1.apply(basicOperators.scala:44)
    at org.apache.spark.sql.execution.Project$$anonfun$1.apply(basicOperators.scala:43)
    at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:618)
    at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:618)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:56)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
*Caused by: java.lang.RuntimeException: Couldn't find result#105 in [result#112]*
    at scala.sys.package$.error(package.scala:27)
    at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1$$anonfun$applyOrElse$1.apply(BoundAttribute.scala:53)
    at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1$$anonfun$applyOrElse$1.apply(BoundAttribute.scala:47)
    at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:46)
    ... 33 more
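In case it helps show what I'm aiming for, this is roughly what I was thinking of trying instead. This is just a guess on my part: running the CREATE EXTERNAL TABLE statement on its own, then calling registerTempTable on the result of a select rather than on the DDL command (the temp table name test1_temp is just something I made up). Does this look closer to the right approach?

// run the DDL by itself -- its return value is only the command's result,
// which is why I suspect registering *that* as a temp table was the problem
hc.sql("CREATE EXTERNAL TABLE IF NOT EXISTS test1(x string, y string, time string, z int, v int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION 'hdfs://.us-west-1.compute.internal:8020/user/flume/'")

// register the temp table from the query result instead
val test1Rows = hc.sql("select * from test1")
test1Rows.registerTempTable("test1_temp")

// then this should (hopefully) work
z.show(hc.sql("select * from test1_temp limit 100"))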
Thank you!

On Thu, Jun 25, 2015 at 11:51 AM, moon soo Lee <m...@apache.org> wrote:

> Hi,
>
> Yes, the %sql function is only for tables that have been registered.
> Using DataFrame is basically similar to what you're currently doing. It
> needs registerTempTable.
>
> Could you share a little bit about your problem when registering tables?
>
> And I really appreciate you reporting a bug!
>
> Thanks,
> moon
>
> On Wed, Jun 24, 2015 at 11:28 PM Corneau Damien <cornead...@apache.org>
> wrote:
>
>> Yes, you can change the number of records. The default value is 1000.
>>
>> On Thu, Jun 25, 2015 at 2:32 PM, Nihal Bhagchandani <
>> nihal_bhagchand...@yahoo.com> wrote:
>>
>>> Hi Su,
>>>
>>> As per my understanding, you can change the limit of 1000 records from
>>> the interpreter section by setting the value of the variable
>>> "zeppelin.spark.maxResult".
>>> Moon, could you please confirm my understanding?
>>>
>>> Regards,
>>> Nihal
>>>
>>>
>>> On Thursday, 25 June 2015 10:00 AM, Su She <suhsheka...@gmail.com>
>>> wrote:
>>>
>>> Hello Everyone,
>>>
>>> Excited to be making progress, and thanks to the community for providing
>>> help along the way. This stuff is all really cool.
>>>
>>> *Questions:*
>>>
>>> *1)* I noticed that the limit for the visual representation is 1000
>>> results. Are there any short-term plans to expand the limit? It seems a
>>> little on the low side, since one of the main reasons for working with
>>> spark/hadoop is to work with large datasets.
>>>
>>> *2)* When can I use the %sql function? Is it only on tables that have
>>> been registered? I have been having trouble registering tables unless I do:
>>>
>>> // Apply the schema to the RDD.
>>> val peopleSchemaRDD = sqlContext.applySchema(rowRDD, schema)
>>> // Register the SchemaRDD as a table.
>>> peopleSchemaRDD.registerTempTable("people")
>>>
>>> I am having lots of trouble registering tables through HiveContext or
>>> even duplicating the Zeppelin tutorial. Is this issue mitigated by using
>>> DataFrames (I am planning to move to 1.3 very soon)?
>>>
>>> *Bug:*
>>>
>>> When I do this:
>>>
>>> z.show(sqlContext.sql("select * from sensortable limit 100"))
>>>
>>> I get the table, but I also get text results at the bottom; please see
>>> the attached image. In case the image doesn't go through: I basically get
>>> the table and everything works well, but the select statement also returns
>>> text (regardless of whether it is 100 results or all of them).
>>>
>>> Thank you!
>>>
>>> Best,
>>>
>>> Su
>>