Cool, I'll try that. One more hole in my understanding, if you've got the patience for it: if the SQLContext.sql(SQLContext.scala:725) error were thrown from the separate SparkSubmit process, would the JVM stitch together the stack traces across processes? The org.apache.zeppelin.spark.SparkSqlInterpreter.interpret(SparkSqlInterpreter.java:137) call deeper in the stack seems suspicious to me. Wouldn't that code only be loaded in the Zeppelin process?
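To make the question concrete, here's my mental model as a minimal Scala sketch. `TraceCapture` is a hypothetical helper of mine, not anything from Zeppelin or Spark; the point is that, as far as I know, a stack trace only ever describes frames inside the JVM that threw it, so for a trace to show up in another process, framework code has to render it to text and ship it over the wire itself:

```scala
import java.io.{PrintWriter, StringWriter}

// Hypothetical helper, not Zeppelin code: render a Throwable's stack
// trace to a String. The trace describes only frames in this JVM; to
// surface it in another process, a framework would have to send this
// text across the process boundary itself.
object TraceCapture {
  def render(t: Throwable): String = {
    val sw = new StringWriter()
    t.printStackTrace(new PrintWriter(sw, true))
    sw.toString // this string is what would cross the wire
  }

  def main(args: Array[String]): Unit = {
    try throw new RuntimeException("boom")
    catch { case e: Exception => print(render(e)) }
  }
}
```

If that model is right, then seeing the SparkSqlInterpreter frames and the SQLContext frames in one contiguous trace would suggest both classes are loaded into the same JVM, rather than the JVM stitching two processes' traces together. Am I off base?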
On Fri, Mar 11, 2016 at 5:32 PM, Felix Cheung <felixcheun...@hotmail.com> wrote:

> Not from the stack. I think the best way is to run
>
> jps -v
>
> You should see a SparkSubmit process if it is running the one from your
> spark home.
>
> _____________________________
> From: Adam Hull <a...@goodeggs.com>
> Sent: Friday, March 11, 2016 2:06 PM
> Subject: Re: Spark-sql USING from Zeppelin?
> To: <users@zeppelin.incubator.apache.org>
>
> Thanks Felix! I've been struggling to wrap my head around *which*
> SQLContext.scala file is being executed in this stack. I've set
> SPARK_HOME="/usr/local/opt/apache-spark/libexec" in my zeppelin-env.sh
> file, but I also see ./spark in my Zeppelin install directory. I'd imagine
> that if Zeppelin is actually using the Spark libs in
> /usr/local/opt/apache-spark/libexec (installed by Homebrew), then it
> should parse the same as /usr/local/opt/apache-spark/libexec/bin/spark-sql.
>
> Any ideas which Spark jar or installation is being executed in this stack
> trace?
>
> On Fri, Mar 11, 2016 at 1:55 PM, Felix Cheung <felixcheun...@hotmail.com>
> wrote:
>
>> As you can see in the stack below, it's just calling SQLContext.sql():
>>
>> org.apache.spark.sql.SQLContext.sql(SQLContext.scala:725)
>>
>> It is possible this is caused by some issue with line parsing. I will try
>> to take a look.
>>
>> _____________________________
>> From: Adam Hull <a...@goodeggs.com>
>> Sent: Friday, March 11, 2016 1:47 PM
>> Subject: Spark-sql USING from Zeppelin?
>> To: <users@zeppelin.incubator.apache.org>
>>
>> Hi! This whole ecosystem is pretty new to me.
>>
>> I'd like to pull JSON files from S3 via the spark-sql interpreter. I've
>> got code that works when I run `spark-sql foo.sql` directly, but it
>> fails from a Zeppelin notebook.
>> Here's the code:
>>
>> ```
>> %sql
>>
>> CREATE TEMPORARY TABLE data
>> USING org.apache.spark.sql.json
>> OPTIONS (
>>   path "s3a://some-bucket/data.json.gz"
>> );
>>
>> SELECT * FROM data;
>> ```
>>
>> And here's the Zeppelin error:
>>
>> cannot recognize input near 'data' 'USING' 'org' in table name; line 2 pos 0
>>   at org.apache.spark.sql.hive.HiveQl$.createPlan(HiveQl.scala:297)
>>   at org.apache.spark.sql.hive.ExtendedHiveQlParser$$anonfun$hiveQl$1.apply(ExtendedHiveQlParser.scala:41)
>>   at org.apache.spark.sql.hive.ExtendedHiveQlParser$$anonfun$hiveQl$1.apply(ExtendedHiveQlParser.scala:40)
>>   at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:136)
>>   at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:135)
>>   at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
>>   at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
>>   at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
>>   at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:254)
>>   at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:254)
>>   at scala.util.parsing.combinator.Parsers$Failure.append(Parsers.scala:202)
>>   at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
>>   at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
>>   at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
>>   at scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:891)
>>   at scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:891)
>>   at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
>>   at scala.util.parsing.combinator.Parsers$$anon$2.apply(Parsers.scala:890)
>>   at scala.util.parsing.combinator.PackratParsers$$anon$1.apply(PackratParsers.scala:110)
>>   at org.apache.spark.sql.catalyst.AbstractSparkSQLParser.parse(AbstractSparkSQLParser.scala:34)
>>   at org.apache.spark.sql.hive.HiveQl$.parseSql(HiveQl.scala:277)
>>   at org.apache.spark.sql.hive.HiveQLDialect.parse(HiveContext.scala:62)
>>   at org.apache.spark.sql.SQLContext$$anonfun$3.apply(SQLContext.scala:175)
>>   at org.apache.spark.sql.SQLContext$$anonfun$3.apply(SQLContext.scala:175)
>>   at org.apache.spark.sql.SparkSQLParser$$anonfun$org$apache$spark$sql$SparkSQLParser$$others$1.apply(SparkSQLParser.scala:115)
>>   at org.apache.spark.sql.SparkSQLParser$$anonfun$org$apache$spark$sql$SparkSQLParser$$others$1.apply(SparkSQLParser.scala:114)
>>   at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:136)
>>   at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:135)
>>   at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
>>   at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
>>   at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
>>   at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:254)
>>   at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:254)
>>   at scala.util.parsing.combinator.Parsers$Failure.append(Parsers.scala:202)
>>   at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
>>   at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
>>   at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
>>   at scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:891)
>>   at scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:891)
>>   at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
>>   at scala.util.parsing.combinator.Parsers$$anon$2.apply(Parsers.scala:890)
>>   at scala.util.parsing.combinator.PackratParsers$$anon$1.apply(PackratParsers.scala:110)
>>   at org.apache.spark.sql.catalyst.AbstractSparkSQLParser.parse(AbstractSparkSQLParser.scala:34)
>>   at org.apache.spark.sql.SQLContext$$anonfun$2.apply(SQLContext.scala:172)
>>   at org.apache.spark.sql.SQLContext$$anonfun$2.apply(SQLContext.scala:172)
>>   at org.apache.spark.sql.execution.datasources.DDLParser.parse(DDLParser.scala:42)
>>   at org.apache.spark.sql.SQLContext.parseSql(SQLContext.scala:195)
>>   at org.apache.spark.sql.hive.HiveContext.parseSql(HiveContext.scala:279)
>>   at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:725)
>>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>   at java.lang.reflect.Method.invoke(Method.java:497)
>>   at org.apache.zeppelin.spark.SparkSqlInterpreter.interpret(SparkSqlInterpreter.java:137)
>>   at org.apache.zeppelin.interpreter.ClassloaderInterpreter.interpret(ClassloaderInterpreter.java:57)
>>   at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
>>   at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:331)
>>   at org.apache.zeppelin.scheduler.Job.run(Job.java:171)
>>   at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
>>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>>   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>   at java.lang.Thread.run(Thread.java:745)
>>
>> Seems to me that Zeppelin is using a different Spark SQL parser. I've
>> checked via the Spark UI that both `spark-sql` and Zeppelin are using
>> Spark 1.5.1 and Hadoop 2.6.0. I'm using Zeppelin 0.6.
>>
>> Any suggestions where to look next? I see Hive in that stack trace...
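P.S. In case a concrete fallback is useful while we chase down the parser question: my understanding (untested, so treat it as a sketch) is that the same temp table can be built from a %spark paragraph with the DataFrame reader, which avoids the SQL `USING` clause and the HiveQl/DDL parsing path in the trace above entirely. `sqlContext` is the SQLContext Zeppelin injects, and the path is the same placeholder as in my %sql paragraph:

```scala
// Possible workaround (a sketch, not verified): register the temp table
// via the DataFrame API so the CREATE ... USING statement never has to
// go through the SQL parser that is failing above.
val data = sqlContext.read
  .format("json")                          // same source as org.apache.spark.sql.json
  .load("s3a://some-bucket/data.json.gz")  // same placeholder path as above

// After this, `SELECT * FROM data` should work from a %sql paragraph.
data.registerTempTable("data")
```

If that fails too, it would point away from the parser and toward the S3/Hadoop configuration instead.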