Zeppelin always launches the interpreter group in a separate process, so
that part would be the same either way.
As for the root cause of your issue, try putting everything onto a single line,
though I suspect that isn't it. It might be something like a library import
issue.
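For example, here is a sketch of what that suggestion might look like, using the
CREATE TEMPORARY TABLE statement from the notebook further down the thread
collapsed onto a single line per statement:

```
%sql
CREATE TEMPORARY TABLE data USING org.apache.spark.sql.json OPTIONS (path "s3a://some-bucket/data.json.gz");
SELECT * FROM data;
```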
_____________________________
From: Adam Hull <[email protected]>
Sent: Friday, March 11, 2016 6:19 PM
Subject: Re: Spark-sql USING from Zeppelin?
To: <[email protected]>
Cool, I'll try that.
One more hole in my understanding if you've got the patience for it:
If the SQLContext.sql(SQLContext.scala:725) error were thrown from the
separate SparkSubmit process, would the JVM stitch together the stack traces
across processes? The
org.apache.zeppelin.spark.SparkSqlInterpreter.interpret(SparkSqlInterpreter.java:137)
call deeper in the stack seems suspicious to me. Would that code only be
loaded in the Zeppelin process?
On Fri, Mar 11, 2016 at 5:32 PM, Felix Cheung
<[email protected]> wrote:
Not from the stack. I think the best way is to run:
jps -v
You should see a SparkSubmit process if it is running the one from your Spark
home.
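To illustrate what to look for, here is a sketch of filtering that output; the
sample line below is invented for illustration (the PID and flags will differ
on a real machine):

```shell
# Stand-in for real `jps -v` output; the PID and JVM flags here are
# fabricated purely to show the shape of the line you are looking for.
jps_output='12345 SparkSubmit -Dscala.usejavacp=true -Xmx1g
67890 Jps -Dapplication.home=/usr/lib/jvm/jdk'

# Keep only the SparkSubmit entry; on a live system you would pipe
# `jps -v` itself into grep instead of this sample variable.
printf '%s\n' "$jps_output" | grep SparkSubmit
```

On a real system, the paths in the surviving line tell you which Spark
installation actually launched the process.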
_____________________________
From: Adam Hull <[email protected]>
Sent: Friday, March 11, 2016 2:06 PM
Subject: Re: Spark-sql USING from Zeppelin?
To: < [email protected]>
Thanks Felix! I've been struggling to wrap my head around which
SQLContext.scala file is being executed in this stack.
I've set SPARK_HOME="/usr/local/opt/apache-spark/libexec" in my zeppelin-env.sh
file, but I also see ./spark in my Zeppelin install directory. I'd imagine if
Zeppelin is actually using the Spark libs in
/usr/local/opt/apache-spark/libexec (installed by Homebrew), then it should
parse the same as /usr/local/opt/apache-spark/libexec/bin/spark-sql.
Any ideas which spark jar or installation is being executed
in this stack trace?
On Fri, Mar 11, 2016 at 1:55 PM, Felix Cheung
<[email protected]> wrote:
As you can see in the stack below, it's just calling SQLContext.sql():
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:725)
It is possible this is caused by some issue with line parsing. I will try to
take a look.
_____________________________
From: Adam Hull < [email protected]>
Sent: Friday, March 11, 2016 1:47 PM
Subject: Spark-sql USING from Zeppelin?
To: < [email protected]>
Hi! This whole ecosystem is pretty new to me.
I'd like to pull JSON files from S3
via the spark-sql interpreter. I've got code that's working when I run
`spark-sql foo.sql` directly, but it fails from a Zeppelin
notebook. Here's the code:
```
%sql
CREATE TEMPORARY TABLE data
USING org.apache.spark.sql.json
OPTIONS (path "s3a://some-bucket/data.json.gz");
SELECT * FROM data;
```
And here's the Zeppelin error:
cannot recognize input near 'data' 'USING' 'org' in table name; line 2 pos 0
at org.apache.spark.sql.hive.HiveQl$.createPlan(HiveQl.scala:297)
at org.apache.spark.sql.hive.ExtendedHiveQlParser$$anonfun$hiveQl$1.apply(ExtendedHiveQlParser.scala:41)
at org.apache.spark.sql.hive.ExtendedHiveQlParser$$anonfun$hiveQl$1.apply(ExtendedHiveQlParser.scala:40)
at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:136)
at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:135)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:254)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:254)
at scala.util.parsing.combinator.Parsers$Failure.append(Parsers.scala:202)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
at scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:891)
at scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:891)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
at scala.util.parsing.combinator.Parsers$$anon$2.apply(Parsers.scala:890)
at scala.util.parsing.combinator.PackratParsers$$anon$1.apply(PackratParsers.scala:110)
at org.apache.spark.sql.catalyst.AbstractSparkSQLParser.parse(AbstractSparkSQLParser.scala:34)
at org.apache.spark.sql.hive.HiveQl$.parseSql(HiveQl.scala:277)
at org.apache.spark.sql.hive.HiveQLDialect.parse(HiveContext.scala:62)
at org.apache.spark.sql.SQLContext$$anonfun$3.apply(SQLContext.scala:175)
at org.apache.spark.sql.SQLContext$$anonfun$3.apply(SQLContext.scala:175)
at org.apache.spark.sql.SparkSQLParser$$anonfun$org$apache$spark$sql$SparkSQLParser$$others$1.apply(SparkSQLParser.scala:115)
at org.apache.spark.sql.SparkSQLParser$$anonfun$org$apache$spark$sql$SparkSQLParser$$others$1.apply(SparkSQLParser.scala:114)
at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:136)
at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:135)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:254)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:254)
at scala.util.parsing.combinator.Parsers$Failure.append(Parsers.scala:202)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
at scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:891)
at scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:891)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
at scala.util.parsing.combinator.Parsers$$anon$2.apply(Parsers.scala:890)
at scala.util.parsing.combinator.PackratParsers$$anon$1.apply(PackratParsers.scala:110)
at org.apache.spark.sql.catalyst.AbstractSparkSQLParser.parse(AbstractSparkSQLParser.scala:34)
at org.apache.spark.sql.SQLContext$$anonfun$2.apply(SQLContext.scala:172)
at org.apache.spark.sql.SQLContext$$anonfun$2.apply(SQLContext.scala:172)
at org.apache.spark.sql.execution.datasources.DDLParser.parse(DDLParser.scala:42)
at org.apache.spark.sql.SQLContext.parseSql(SQLContext.scala:195)
at org.apache.spark.sql.hive.HiveContext.parseSql(HiveContext.scala:279)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:725)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.zeppelin.spark.SparkSqlInterpreter.interpret(SparkSqlInterpreter.java:137)
at org.apache.zeppelin.interpreter.ClassloaderInterpreter.interpret(ClassloaderInterpreter.java:57)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:331)
at org.apache.zeppelin.scheduler.Job.run(Job.java:171)
at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Seems to me that Zeppelin is using a different Spark SQL parser. I've checked
via the Spark UI that both `spark-sql` and Zeppelin are using Spark 1.5.1 and
Hadoop 2.6.0. I'm using Zeppelin 0.6.
Any suggestions where to look next?
I see hive in that stack trace...