Zeppelin always launches the interpreter group in a separate process, so
that part would be the same either way.
As for the root cause of your issue, try putting everything onto a single line,
though I suspect that isn't it. It might be something like a library import
issue.
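For example, here is a sketch of what that suggestion might look like, using the
CREATE TEMPORARY TABLE statement from the notebook further down the thread
collapsed onto a single line per statement:

```
%sql
CREATE TEMPORARY TABLE data USING org.apache.spark.sql.json OPTIONS (path "s3a://some-bucket/data.json.gz");
SELECT * FROM data;
```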
_____________________________
From: Adam Hull <[email protected]>
Sent: Friday, March 11, 2016 6:19 PM
Subject: Re: Spark-sql USING from Zeppelin?
To: <[email protected]>
Cool, I'll try that.
One more hole in my understanding if you've got the patience for it:
If the SQLContext.sql(SQLContext.scala:725) error were thrown from the
separate SparkSubmit process, would the JVM stitch together the stack traces
across processes? The
org.apache.zeppelin.spark.SparkSqlInterpreter.interpret(SparkSqlInterpreter.java:137)
call deeper in the stack seems suspicious to me. Would that code only be
loaded in the Zeppelin process?
On Fri, Mar 11, 2016 at 5:32 PM, Felix Cheung
<[email protected]> wrote:
Not from the stack. I think the best way is to run:
jps -v
You should see a SparkSubmit process if it is running the one from your Spark
home.
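To illustrate what to look for, here is a sketch of filtering that output; the
sample line below is invented for illustration (the PID and flags will differ
on a real machine):

```shell
# Stand-in for real `jps -v` output; the PID and JVM flags here are
# fabricated purely to show the shape of the line you are looking for.
jps_output='12345 SparkSubmit -Dscala.usejavacp=true -Xmx1g
67890 Jps -Dapplication.home=/usr/lib/jvm/jdk'

# Keep only the SparkSubmit entry; on a live system you would pipe
# `jps -v` itself into grep instead of this sample variable.
printf '%s\n' "$jps_output" | grep SparkSubmit
```

On a real system, the paths in the surviving line tell you which Spark
installation actually launched the process.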
_____________________________
From: Adam Hull <[email protected]>
Sent: Friday, March 11, 2016 2:06 PM
Subject: Re: Spark-sql USING from Zeppelin?
To: < [email protected]>
Thanks Felix! I've been struggling to wrap my head around which
SQLContext.scala file is being executed in this stack.
I've set SPARK_HOME="/usr/local/opt/apache-spark/libexec" in my zeppelin-env.sh
file, but I also see ./spark in my Zeppelin install directory. I'd imagine if
Zeppelin is actually using the Spark libs in
/usr/local/opt/apache-spark/libexec (installed by Homebrew), then it should
parse the same as /usr/local/opt/apache-spark/libexec/bin/spark-sql.
Any ideas which spark jar or installation is being executed
in this stack trace?
On Fri, Mar 11, 2016 at 1:55 PM, Felix Cheung
<[email protected]> wrote:
As you can see in the stack below, it's just calling SQLContext.sql():
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:725)
It is possible this is caused by some issue with line parsing. I will try to
take a look.
_____________________________
From: Adam Hull < [email protected]>
Sent: Friday, March 11, 2016 1:47 PM
Subject: Spark-sql USING from Zeppelin?
To: < [email protected]>
Hi! This whole ecosystem is pretty new to me.
I'd like to pull JSON files from S3
via the spark-sql interpreter. I've got code that's working when I run
`spark-sql foo.sql` directly, but it fails from a Zeppelin
notebook. Here's the code:
```
%sql
CREATE TEMPORARY TABLE data
USING org.apache.spark.sql.json
OPTIONS (path "s3a://some-bucket/data.json.gz");
SELECT * FROM data;
```
And here's the Zeppelin error:
cannot recognize input near 'data' 'USING' 'org' in table name; line 2 pos 0
at org.apache.spark.sql.hive.HiveQl$.createPlan(HiveQl.scala:297)
at org.apache.spark.sql.hive.ExtendedHiveQlParser$$anonfun$hiveQl$1.apply(ExtendedHiveQlParser.scala:41)
at org.apache.spark.sql.hive.ExtendedHiveQlParser$$anonfun$hiveQl$1.apply(ExtendedHiveQlParser.scala:40)
at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:136)
at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:135)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:254)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:254)
at scala.util.parsing.combinator.Parsers$Failure.append(Parsers.scala:202)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
at scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:891)
at scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:891)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
at scala.util.parsing.combinator.Parsers$$anon$2.apply(Parsers.scala:890)
at scala.util.parsing.combinator.PackratParsers$$anon$1.apply(PackratParsers.scala:110)
at org.apache.spark.sql.catalyst.AbstractSparkSQLParser.parse(AbstractSparkSQLParser.scala:34)
at org.apache.spark.sql.hive.HiveQl$.parseSql(HiveQl.scala:277)
at org.apache.spark.sql.hive.HiveQLDialect.parse(HiveContext.scala:62)
at org.apache.spark.sql.SQLContext$$anonfun$3.apply(SQLContext.scala:175)
at org.apache.spark.sql.SQLContext$$anonfun$3.apply(SQLContext.scala:175)
at org.apache.spark.sql.SparkSQLParser$$anonfun$org$apache$spark$sql$SparkSQLParser$$others$1.apply(SparkSQLParser.scala:115)
at org.apache.spark.sql.SparkSQLParser$$anonfun$org$apache$spark$sql$SparkSQLParser$$others$1.apply(SparkSQLParser.scala:114)
at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:136)
at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:135)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$map$1.apply(Parsers.scala:242)
at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:254)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1$$anonfun$apply$2.apply(Parsers.scala:254)
at scala.util.parsing.combinator.Parsers$Failure.append(Parsers.scala:202)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
at scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:891)
at scala.util.parsing.combinator.Parsers$$anon$2$$anonfun$apply$14.apply(Parsers.scala:891)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
at scala.util.parsing.combinator.Parsers$$anon$2.apply(Parsers.scala:890)
at scala.util.parsing.combinator.PackratParsers$$anon$1.apply(PackratParsers.scala:110)
at org.apache.spark.sql.catalyst.AbstractSparkSQLParser.parse(AbstractSparkSQLParser.scala:34)
at org.apache.spark.sql.SQLContext$$anonfun$2.apply(SQLContext.scala:172)
at org.apache.spark.sql.SQLContext$$anonfun$2.apply(SQLContext.scala:172)
at org.apache.spark.sql.execution.datasources.DDLParser.parse(DDLParser.scala:42)
at org.apache.spark.sql.SQLContext.parseSql(SQLContext.scala:195)
at org.apache.spark.sql.hive.HiveContext.parseSql(HiveContext.scala:279)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:725)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.zeppelin.spark.SparkSqlInterpreter.interpret(SparkSqlInterpreter.java:137)
at org.apache.zeppelin.interpreter.ClassloaderInterpreter.interpret(ClassloaderInterpreter.java:57)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:331)
at org.apache.zeppelin.scheduler.Job.run(Job.java:171)
at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Seems to me that Zeppelin is using a different Spark SQL parser. I've checked
via the Spark UI that both `spark-sql` and Zeppelin are using Spark 1.5.1 and
Hadoop 2.6.0. I'm using Zeppelin 0.6.
Any suggestions where to look next?
I see hive in that stack trace...