Hi,

Please try not to create hc or sqlContext manually; use the sqlContext that Zeppelin creates for you. After you run

    sqlContext.sql("CREATE EXTERNAL TABLE IF NOT EXISTS test1(x string, y string, time string, z int, v int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION 'hdfs://.us-west-1.compute.internal:8020/user/flume/'")

your queries should be able to access table 'test1' directly. You can also use registerTempTable("test2") to access the data as table 'test2', but it has to be called on a valid DataFrame, not on the result of a DDL statement. So not

    sqlContext.sql("CREATE EXTERNAL TABLE .... ").registerTempTable("test2")

but like this:

    sqlContext.sql("select * from test1").registerTempTable("test2")
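Putting those two points together, the whole flow in one Zeppelin paragraph would look something like the sketch below (a sketch only: the schema and the elided HDFS host are copied unchanged from your snippet, and it assumes the Zeppelin-provided sqlContext is Hive-enabled):

    // Use the sqlContext Zeppelin injects; no manual HiveContext needed.
    // The DDL returns no rows, so never call registerTempTable on its result.
    sqlContext.sql("CREATE EXTERNAL TABLE IF NOT EXISTS test1(x string, y string, time string, z int, v int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION 'hdfs://.us-west-1.compute.internal:8020/user/flume/'")

    // The Hive table is now queryable by name.
    val results = sqlContext.sql("select * from test1 limit 100")
    results.map(t => "Name: " + t(0)).collect().foreach(println)

    // Optional: expose the data under a second name by registering a temp
    // table from a valid query result, never from the DDL statement itself.
    sqlContext.sql("select * from test1").registerTempTable("test2")

After that, both 'test1' and 'test2' should be queryable from %sql paragraphs as well.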
sqlContext.sql("CREATE EXTERNAL TABLE IF NOT EXISTS test1(x string, y string, time string, z int, v int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION 'hdfs://.us-west-1.compute.internal:8020/user/flume/'") it supposed to access table 'test1' in your query. You can also do registerTempTable("test2"), for accessing table 'test2', but it supposed from valid dataframe. So not sqlContext("CREATE EXTERNAL TABLE .... ").registerTempTable("test2") but like this sqlContext("select * from test1").registerTempTable("test1") Tell me if it helps. Best, moon On Mon, Jun 29, 2015 at 12:05 PM Su She <suhsheka...@gmail.com> wrote: > Hey Moon/All, > > sorry for the late reply. > > This is the problem I'm encountering when trying to register Hive as a > temptable. It seems that it cannot find a table, I have bolded this in the > error message that I've c/p below. Please let me know if this is the best > way for doing this. My end goal is to execute: > > *z.show(hc.sql("select * from test1"))* > > Thank you for the help! > > *//Code:* > import sys.process._ > import org.apache.spark.sql.hive._ > val hc = new HiveContext(sc) > val sqlContext = new org.apache.spark.sql.SQLContext(sc) > > hc.sql("CREATE EXTERNAL TABLE IF NOT EXISTS test1(x string, y string, time > string, z int, v int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' > LOCATION > 'hdfs://.us-west-1.compute.internal:8020/user/flume/'").registerTempTable("test2") > > val results = hc.sql("select * from test2 limit 100") //have also tried > test1 > > *//everything works fine upto here, but due to lazy evaluation, i guess > that doesn't mean much* > results.map(t => "Name: " + t(0)).collect().foreach(println) > > results: org.apache.spark.sql.SchemaRDD = SchemaRDD[41] at RDD at > SchemaRDD.scala:108 == Query Plan == == Physical Plan == Limit 100 !Project > [result#105] NativeCommand CREATE EXTERNAL TABLE IF NOT EXISTS test1(date > int, date_time string, time string, sensor int, value int) ROW FORMAT > DELIMITED FIELDS TERMINATED BY ',' LOCATION > 'hdfs://ip-10-0-2-216.us-west-1.compute.internal:8020/user/flume/', > [result#112] org.apache.spark.SparkException: Job aborted due to stage > failure: Task 0 in stage 7.0 failed 1 times, most recent failure: Lost task > 0.0 in stage 7.0 (TID 4, localhost): > org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding > attribute, tree: result#105 at > org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:47) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:47) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:46) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:144) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:135) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:46) > at > org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection$$anonfun$$init$$2.apply(Projection.scala:54) > at > org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection$$anonfun$$init$$2.apply(Projection.scala:54) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at 
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>   at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.<init>(Projection.scala:54)
>   at org.apache.spark.sql.execution.SparkPlan$$anonfun$newMutableProjection$1.apply(SparkPlan.scala:105)
>   at org.apache.spark.sql.execution.SparkPlan$$anonfun$newMutableProjection$1.apply(SparkPlan.scala:105)
>   at org.apache.spark.sql.execution.Project$$anonfun$1.apply(basicOperators.scala:44)
>   at org.apache.spark.sql.execution.Project$$anonfun$1.apply(basicOperators.scala:43)
>   at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:618)
>   at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:618)
>   at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
>   at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
>   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>   at org.apache.spark.scheduler.Task.run(Task.scala:56)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> *Caused by: java.lang.RuntimeException: Couldn't find result#105 in [result#112]*
>   at scala.sys.package$.error(package.scala:27)
>   at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1$$anonfun$applyOrElse$1.apply(BoundAttribute.scala:53)
>   at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1$$anonfun$applyOrElse$1.apply(BoundAttribute.scala:47)
>   at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:46)
>   ... 33 more
>
> Thank you!
>
> On Thu, Jun 25, 2015 at 11:51 AM, moon soo Lee <m...@apache.org> wrote:
>
>> Hi,
>>
>> Yes, the %sql function works only on tables that have been registered.
>> Using a DataFrame is basically similar to what you're currently doing;
>> it still needs registerTempTable.
>>
>> Could you share a little bit about the problem you hit when registering
>> tables?
>>
>> And I really appreciate you reporting the bug!
>>
>> Thanks,
>> moon
>>
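(A minimal illustration of the point above, added as a sketch: it assumes Zeppelin's built-in sqlContext and the same 'test1' Hive table from earlier in this thread; the temp table name 'test1_top' is made up for the example. Any table registered this way becomes visible to %sql.)

    // Scala paragraph: register a query result so %sql can see it by name.
    sqlContext.sql("select * from test1").registerTempTable("test1_top")

Then, in a separate paragraph:

    %sql
    select * from test1_top limit 100
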
>> On Wed, Jun 24, 2015 at 11:28 PM Corneau Damien <cornead...@apache.org>
>> wrote:
>>
>>> Yes, you can change the number of records. The default value is 1000.
>>>
>>> On Thu, Jun 25, 2015 at 2:32 PM, Nihal Bhagchandani <
>>> nihal_bhagchand...@yahoo.com> wrote:
>>>
>>>> Hi Su,
>>>>
>>>> As per my understanding, you can change the 1000-record limit from the
>>>> interpreter section by setting the value of the variable
>>>> "zeppelin.spark.maxResult".
>>>> Moon, could you please confirm my understanding?
>>>>
>>>> Regards,
>>>> Nihal
>>>>
>>>> On Thursday, 25 June 2015 10:00 AM, Su She <suhsheka...@gmail.com>
>>>> wrote:
>>>>
>>>> Hello Everyone,
>>>>
>>>> Excited to be making progress, and thanks to the community for
>>>> providing help along the way. This stuff is all really cool.
>>>>
>>>> *Questions:*
>>>>
>>>> *1) *I noticed that the limit for the visual representation is 1000
>>>> results. Are there any short-term plans to expand the limit? It seems
>>>> a little on the low side, since one of the main reasons for working
>>>> with spark/hadoop is to work with large datasets.
>>>>
>>>> *2) *When can I use the %sql function? Is it only on tables that have
>>>> been registered? I have been having trouble registering tables unless
>>>> I do:
>>>>
>>>> // Apply the schema to the RDD.
>>>> val peopleSchemaRDD = sqlContext.applySchema(rowRDD, schema)
>>>> // Register the SchemaRDD as a table.
>>>> peopleSchemaRDD.registerTempTable("people")
>>>>
>>>> I am having lots of trouble registering tables through HiveContext, or
>>>> even duplicating the Zeppelin tutorial. Is this issue mitigated by
>>>> using DataFrames (I am planning to move to 1.3 very soon)?
>>>>
>>>> *Bug:*
>>>>
>>>> When I do this:
>>>> z.show(sqlContext.sql("select * from sensortable limit 100"))
>>>>
>>>> I get the table, but I also get text results at the bottom; please see
>>>> the attached image. In case the image doesn't go through: I basically
>>>> get the table, and everything works well, but the select statement
>>>> also returns text (regardless of whether it's 100 results or all of
>>>> them).
>>>>
>>>> Thank you!
>>>>
>>>> Best,
>>>>
>>>> Su