I followed your instructions to try to load data in Parquet format through HiveContext, but it failed. Do you happen to know what I did wrong in the following steps?
The steps I followed are:

1. Download "parquet-hive-bundle-1.5.0.jar".

2. Revise hive-site.xml to include this:

    <property>
      <name>hive.jar.directory</name>
      <value>/home/hduser/hive/lib/parquet-hive-bundle-1.5.0.jar</value>
      <description>
        This is the location hive in tez mode will look for to find a site
        wide installed hive instance. If not set, the directory under
        hive.user.install.directory corresponding to current user name will
        be used.
      </description>
    </property>

3. Copy hive-site.xml to all nodes.

4. Start spark-shell, then try to create the table:

    val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
    import hiveContext._
    hql("""create table part (
             P_PARTKEY INT, P_NAME STRING, P_MFGR STRING, P_BRAND STRING,
             P_TYPE STRING, P_SIZE INT, P_CONTAINER STRING,
             P_RETAILPRICE DOUBLE, P_COMMENT STRING)
           STORED AS PARQUET""")

Then I got this error:

14/08/18 19:09:00 ERROR Driver: FAILED: SemanticException Unrecognized file format in STORED AS clause: PARQUET
org.apache.hadoop.hive.ql.parse.SemanticException: Unrecognized file format in STORED AS clause: PARQUET
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.handleGenericFileFormat(BaseSemanticAnalyzer.java:569)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:8968)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:8313)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:284)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:441)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:342)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:977)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:888)
    at org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:186)
    at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:160)
    at org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd$lzycompute(HiveContext.scala:250)
    at org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd(HiveContext.scala:247)
    at org.apache.spark.sql.hive.HiveContext.hiveql(HiveContext.scala:85)
    at org.apache.spark.sql.hive.HiveContext.hql(HiveContext.scala:90)
    at $line44.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:18)
    at $line44.$read$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:23)
    at $line44.$read$$iwC$$iwC$$iwC$$iwC.<init>(<console>:25)
    at $line44.$read$$iwC$$iwC$$iwC.<init>(<console>:27)
    at $line44.$read$$iwC$$iwC.<init>(<console>:29)
    at $line44.$read$$iwC.<init>(<console>:31)
    at $line44.$read.<init>(<console>:33)
    at $line44.$read$.<init>(<console>:37)
    at $line44.$read$.<clinit>(<console>)
    at $line44.$eval$.<init>(<console>:7)
    at $line44.$eval$.<clinit>(<console>)
    at $line44.$eval.$print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:788)
    at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1056)
    at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:614)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:645)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:609)
    at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:796)
    at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:841)
    at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:753)
    at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:601)
    at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:608)
    at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:611)
    at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:936)
    at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884)
    at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884)
    at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
    at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:884)
    at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:982)
    at org.apache.spark.repl.Main$.main(Main.scala:31)
    at org.apache.spark.repl.Main.main(Main.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:292)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Many thanks for help!
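P.S. One thing I am not sure about: from what I have read, the "STORED AS PARQUET" shorthand is only recognized by Hive 0.13 and later, so with an older bundled Hive the SerDe and input/output format classes may have to be spelled out instead. Here is a sketch of what I was planning to try next, using the class names that I believe ship in parquet-hive-bundle-1.5.0.jar (please correct me if they are wrong):

    val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
    import hiveContext._

    // Same table as above, but naming the Parquet SerDe and the
    // (pre-Hive-0.13) input/output format classes explicitly instead
    // of relying on the STORED AS PARQUET shorthand.
    hql("""create table part (
             P_PARTKEY INT, P_NAME STRING, P_MFGR STRING, P_BRAND STRING,
             P_TYPE STRING, P_SIZE INT, P_CONTAINER STRING,
             P_RETAILPRICE DOUBLE, P_COMMENT STRING)
           ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
           STORED AS
             INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
             OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat'""")

I also wonder whether setting hive.jar.directory is the right way to make the jar visible at all (its description sounds tez-specific), or whether the jar instead needs to be on the Spark classpath, e.g. by launching the shell with --jars /home/hduser/hive/lib/parquet-hive-bundle-1.5.0.jar.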