First you need to figure out where the table lives. Is it a table registered in Spark SQL code, or a Hive table? If it is a Hive table, check that hive-site.xml is on the classpath and that the metastore URI in hive-site.xml is configured correctly. Look at the interpreter log to see which metastore the interpreter is actually using.
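As a quick sanity check, a PySpark sketch along these lines (reusing the database and table names from your mail below) shows the difference: in Spark 1.6 a plain SQLContext only has an in-memory catalog, while HiveContext reads tables from the Hive metastore.

    from pyspark.sql import SQLContext, HiveContext

    sqlCtx = SQLContext(sc)    # in-memory catalog: sees only tables registered
                               # in this session, never Hive metastore tables
    hiveCtx = HiveContext(sc)  # reads the Hive metastore configured in hive-site.xml

    print(sqlCtx.tableNames())               # 'spend_dim' will be missing here
    print(hiveCtx.tableNames('marketview'))  # should list 'spend_dim' if the
                                             # metastore URI is correct

If hiveCtx.tableNames('marketview') comes back empty, the interpreter is most likely talking to the wrong metastore (e.g. a local Derby one created when hive-site.xml is not on the classpath). The relevant entry in hive-site.xml looks roughly like this (the host below is a placeholder for your metastore):

    <property>
      <name>hive.metastore.uris</name>
      <value>thrift://your-metastore-host:9083</value>
    </property>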
Ruslan Dautkhanov <[email protected]> wrote on Fri, Nov 25, 2016 at 4:04 AM:
> Problem 1, with sqlContext)
> Spark 1.6
> CDH 5.8.3
> Zeppelin 0.6.2
>
> Running
>
> sqlCtx = SQLContext(sc)
> sqlCtx.sql('select * from marketview.spend_dim')
>
> shows the exception "Table not found".
> The same runs fine when using hiveContext.
> See the full stack trace in [1].
> The same stack trace is in the log file [2].
>
> I probably wouldn't send this message seeking your help,
> but using hiveContext brings its own problems.
> Any ideas why sqlContext would not see that table?
>
> Problem 2, with HiveContext)
> The other problem, with hiveContext, is brought up in another email
> chain. We're getting:
> You must *build Spark with Hive*. Export 'SPARK_HIVE=true'
> The weird part of this hiveContext problem is that it only happens
> the second time we try to run a paragraph (and on any consecutive runs).
> The first time Zeppelin starts, I can see the same paragraph run fine.
> Does Zeppelin somehow corrupt its internal state after the first run?
>
> We use Jupyter notebooks without these problems in the same environment.
> Might it be something in how Zeppelin was compiled?
>
> This is how Zeppelin was built:
> /opt/maven/maven-latest/bin/mvn clean package -DskipTests -Pspark-1.6
> -Ppyspark -Dhadoop.version=2.6.0-cdh5.8.3 -Phadoop-2.6 -Pyarn -Pvendor-repo
> -Pscala-2.10 -e
>
> Any help will be greatly appreciated.
> You see, I'm sending this message on Thanksgiving, so it's an important
> problem :-)
> Happy Thanksgiving, everyone! (if you celebrate it)
>
>
> [1]
>
> Traceback (most recent call last):
>   File "/tmp/zeppelin_pyspark-8000586427786928449.py", line 267, in <module>
>     raise Exception(traceback.format_exc())
> Exception: Traceback (most recent call last):
>   File "/tmp/zeppelin_pyspark-8000586427786928449.py", line 265, in <module>
>     exec(code)
>   File "<stdin>", line 2, in <module>
>   File "/opt/cloudera/parcels/CDH/lib/spark/python/pyspark/sql/context.py", line 580, in sql
>     return DataFrame(self._ssql_ctx.sql(sqlQuery), self)
>   File "/opt/cloudera/parcels/CDH/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
>     answer, self.gateway_client, self.target_id, self.name)
>   File "/opt/cloudera/parcels/CDH/lib/spark/python/pyspark/sql/utils.py", line 51, in deco
>     raise AnalysisException(s.split(': ', 1)[1], stackTrace)
> AnalysisException: u'Table not found: `marketview`.`spend_dim`;'
>
>
> [2]
>
> ERROR [2016-11-24 00:18:34,579] ({pool-2-thread-5} SparkSqlInterpreter.java[interpret]:120) - Invocation target exception
> java.lang.reflect.InvocationTargetException
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.zeppelin.spark.SparkSqlInterpreter.interpret(SparkSqlInterpreter.java:115)
>         at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:94)
>         at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:341)
>         at org.apache.zeppelin.scheduler.Job.run(Job.java:176)
>         at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.spark.sql.AnalysisException: Table not found: `marketview`.`mv_update_2016q1`;
>         at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>         at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:54)
>         at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:50)
>         at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:121)
>         at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:120)
>         at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:120)
>         at scala.collection.immutable.List.foreach(List.scala:318)
>         at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:120)
>         at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:50)
>         at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:44)
>         at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:34)
>         at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:133)
>         at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:52)
>         at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:817)
>         ... 16 more
> INFO [2016-11-24 00:18:34,581] ({pool-2-thread-5} SchedulerFactory.java[jobFinished]:137) - Job remoteInterpretJob_1479971914506 finished by scheduler org.apache.zeppelin.spark.SparkInterpreter866606804
>
>
> Thank you,
> Ruslan
