Hi! The Hive table is an external table, which I created like this:
CREATE EXTERNAL TABLE MyHiveTable ( id int, data string )
STORED BY 'org.apache.hadoop.hive.cassandra.cql.CqlStorageHandler'
TBLPROPERTIES (
    "cassandra.host" = "10.194.30.2",
    "cassandra.ks.name" = "test",
    "cassandra.cf.name" = "mytable",
    "cassandra.ks.repfactor" = "1",
    "cassandra.ks.strategy" = "org.apache.cassandra.locator.SimpleStrategy"
);

Here is the output from spark-sql for the different commands:

spark-sql> show tables;
14/11/20 09:50:32 INFO parse.ParseDriver: Parsing command: show tables
14/11/20 09:50:32 INFO parse.ParseDriver: Parse Completed
14/11/20 09:50:32 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
14/11/20 09:50:32 INFO ql.Driver: <PERFLOG method=Driver.run>
14/11/20 09:50:32 INFO ql.Driver: <PERFLOG method=TimeToSubmit>
14/11/20 09:50:32 INFO ql.Driver: <PERFLOG method=compile>
14/11/20 09:50:32 INFO ql.Driver: <PERFLOG method=parse>
14/11/20 09:50:32 INFO parse.ParseDriver: Parsing command: show tables
14/11/20 09:50:32 INFO parse.ParseDriver: Parse Completed
14/11/20 09:50:32 INFO ql.Driver: </PERFLOG method=parse start=1416473432290 end=1416473432290 duration=0>
14/11/20 09:50:32 INFO ql.Driver: <PERFLOG method=semanticAnalyze>
14/11/20 09:50:32 INFO ql.Driver: Semantic Analysis Completed
14/11/20 09:50:32 INFO ql.Driver: </PERFLOG method=semanticAnalyze start=1416473432290 end=1416473432295 duration=5>
14/11/20 09:50:32 INFO exec.ListSinkOperator: Initializing Self 0 OP
14/11/20 09:50:32 INFO exec.ListSinkOperator: Operator 0 OP initialized
14/11/20 09:50:32 INFO exec.ListSinkOperator: Initialization Done 0 OP
14/11/20 09:50:32 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:tab_name, type:string, comment:from deserializer)], properties:null)
14/11/20 09:50:32 INFO ql.Driver: </PERFLOG method=compile start=1416473432289 end=1416473432298 duration=9>
14/11/20 09:50:32 INFO ql.Driver: <PERFLOG method=Driver.execute>
14/11/20 09:50:32 INFO ql.Driver: Starting command: show tables
14/11/20 09:50:32 INFO ql.Driver: </PERFLOG method=TimeToSubmit start=1416473432289 end=1416473432298 duration=9>
14/11/20 09:50:32 INFO ql.Driver: <PERFLOG method=runTasks>
14/11/20 09:50:32 INFO ql.Driver: <PERFLOG method=task.DDL.Stage-0>
14/11/20 09:50:32 INFO ql.Driver: </PERFLOG method=task.DDL.Stage-0 start=1416473432298 end=1416473432314 duration=16>
14/11/20 09:50:32 INFO ql.Driver: </PERFLOG method=runTasks start=1416473432298 end=1416473432314 duration=16>
14/11/20 09:50:32 INFO ql.Driver: </PERFLOG method=Driver.execute start=1416473432298 end=1416473432314 duration=16>
OK
14/11/20 09:50:32 INFO ql.Driver: OK
14/11/20 09:50:32 INFO ql.Driver: <PERFLOG method=releaseLocks>
14/11/20 09:50:32 INFO ql.Driver: </PERFLOG method=releaseLocks start=1416473432314 end=1416473432315 duration=1>
14/11/20 09:50:32 INFO ql.Driver: </PERFLOG method=Driver.run start=1416473432289 end=1416473432315 duration=26>
14/11/20 09:50:32 INFO mapred.FileInputFormat: Total input paths to process : 1
14/11/20 09:50:32 INFO ql.Driver: <PERFLOG method=releaseLocks>
14/11/20 09:50:32 INFO ql.Driver: </PERFLOG method=releaseLocks start=1416473432319 end=1416473432319 duration=0>
myhivetable
Time taken: 0.088 seconds
14/11/20 09:50:32 INFO CliDriver: Time taken: 0.088 seconds
14/11/20 09:50:32 INFO ql.Driver: <PERFLOG method=releaseLocks>
14/11/20 09:50:32 INFO ql.Driver: </PERFLOG method=releaseLocks start=1416473432325 end=1416473432325 duration=0>

spark-sql> describe myhivetable;
14/11/20 09:50:35 INFO parse.ParseDriver: Parsing command: describe myhivetable
14/11/20 09:50:35 INFO parse.ParseDriver: Parse Completed
id      int     from deserializer
data    string  from deserializer
Time taken: 0.226 seconds
14/11/20 09:50:35 INFO CliDriver: Time taken: 0.226 seconds

spark-sql> select * from myhivetable;
14/11/20 09:50:39 INFO parse.ParseDriver: Parsing command: select * from myhivetable
14/11/20 09:50:39 INFO parse.ParseDriver: Parse Completed
14/11/20 09:50:39 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
14/11/20 09:50:39 INFO storage.MemoryStore: ensureFreeSpace(420085) called with curMem=0, maxMem=278302556
14/11/20 09:50:39 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 410.2 KB, free 265.0 MB)
14/11/20 09:50:39 INFO storage.MemoryStore: ensureFreeSpace(30564) called with curMem=420085, maxMem=278302556
14/11/20 09:50:39 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 29.8 KB, free 265.0 MB)
14/11/20 09:50:39 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.194.30.2:57707 (size: 29.8 KB, free: 265.4 MB)
14/11/20 09:50:39 INFO storage.BlockManagerMaster: Updated info of block broadcast_0_piece0
14/11/20 09:50:39 ERROR thriftserver.SparkSQLDriver: Failed in [select * from myhivetable]
java.lang.InstantiationError: org.apache.hadoop.mapreduce.JobContext
    at org.apache.hadoop.hive.cassandra.input.cql.HiveCqlInputFormat.getSplits(HiveCqlInputFormat.java:166)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:179)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
    at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
    at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1135)
    at org.apache.spark.rdd.RDD.collect(RDD.scala:774)
    at org.apache.spark.sql.hive.HiveContext$QueryExecution.stringResult(HiveContext.scala:415)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:59)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:291)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:226)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:328)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
14/11/20 09:50:39 ERROR CliDriver: java.lang.InstantiationError: org.apache.hadoop.mapreduce.JobContext (same stack trace as above)

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/tableau-spark-sql-cassandra-tp19282p19356.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
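For what it's worth, a java.lang.InstantiationError on org.apache.hadoop.mapreduce.JobContext usually points to a Hadoop 1 vs. Hadoop 2 binary incompatibility: JobContext was a concrete class in Hadoop 1 but became an interface in Hadoop 2, so an input format compiled against one and run against the other blows up the moment it touches JobContext (here, inside HiveCqlInputFormat.getSplits). A quick reflection check can tell which flavor the runtime classpath carries. The snippet below is only a sketch; java.util.List stands in for JobContext so it runs without the Hadoop jars, and on a real cluster you would pass "org.apache.hadoop.mapreduce.JobContext" instead:

```java
// Hedged sketch: reports whether a named type on the current classpath is an
// interface (as JobContext is in Hadoop 2) or a concrete class (Hadoop 1).
public class JobContextCheck {

    // Loads the named type and returns "interface" or "class".
    static String flavor(String className) throws ClassNotFoundException {
        Class<?> c = Class.forName(className);
        return c.isInterface() ? "interface" : "class";
    }

    public static void main(String[] args) throws Exception {
        // On a cluster, pass "org.apache.hadoop.mapreduce.JobContext" as args[0];
        // java.util.List is used here only so the snippet runs anywhere.
        String name = args.length > 0 ? args[0] : "java.util.List";
        System.out.println(name + " -> " + flavor(name));
    }
}
```

If this prints "interface" for JobContext while the Hive Cassandra handler was built against Hadoop 1, a handler rebuilt for Hadoop 2 (or a matching Hadoop version) would be needed; that diagnosis is an assumption based on the error, not something confirmed here.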