Hi Arpit,

I didn't build it; I'm using the prebuilt version described here:
http://www.abcn.net/2014/04/install-shark-on-cdh5-hadoop2-spark.html
including adding e.g. the mentioned jar.
br, Gerd

On 17 April 2014 15:49, Arpit Tak <arpi...@mobipulse.in> wrote:
> Just out of curiosity, since you are using Cloudera Manager Hadoop and Spark:
> how did you build Shark for it?
>
> Are you able to read any file from HDFS? Did you try that out?
>
> Regards,
> Arpit Tak
>
> On Thu, Apr 17, 2014 at 7:07 PM, ge ko <koenig....@gmail.com> wrote:
>> Hi,
>>
>> the error java.lang.ClassNotFoundException:
>> org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat has been
>> resolved by adding parquet-hive-bundle-1.4.1.jar to Shark's lib folder.
>> Now the Hive metastore can be read successfully (including the
>> Parquet-based table).
>>
>> But if I try to select from that table, I receive:
>>
>> org.apache.spark.SparkException: Job aborted: Task 0.0:0 failed 4 times
>> (most recent failure: Exception failure: java.lang.ClassNotFoundException:
>> org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe)
>>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1020)
>>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1018)
>>
>> This is really strange, since the class
>> org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe is included in
>> parquet-hive-bundle-1.4.1.jar.
>> ...getting more and more confused ;)
>>
>> Any help?
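[Editor's note: one plausible cause, not confirmed in this thread: the first ClassNotFoundException occurred at compile time on the driver, while the ParquetHiveSerDe failure happens inside a Spark task, so the bundle jar may be visible to the Shark driver but missing from the worker classpath. A minimal sketch, assuming a shark-env.sh based setup and that the jar has been copied to the same path on every node; the paths are assumptions, not from the thread:]

```shell
# conf/shark-env.sh (path and jar location assumed; first copy the jar to
# the same location on every worker node). SPARK_CLASSPATH was the standard
# way to extend executor classpaths in the Spark 0.9.x era.
export SPARK_CLASSPATH=/opt/shark-0.9.1/lib/parquet-hive-bundle-1.4.1.jar:$SPARK_CLASSPATH
```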
>>
>> Regards, Gerd
>>
>> On 17 April 2014 11:55, ge ko <koenig....@gmail.com> wrote:
>>> Hi,
>>>
>>> I want to select from a Parquet-based table in Shark, but receive the
>>> following error:
>>>
>>> shark> select * from wl_parquet;
>>> 14/04/17 11:33:49 INFO shark.SharkCliDriver: Execution Mode: shark
>>> 14/04/17 11:33:49 INFO ql.Driver: <PERFLOG method=Driver.run>
>>> 14/04/17 11:33:49 INFO ql.Driver: <PERFLOG method=TimeToSubmit>
>>> 14/04/17 11:33:49 INFO ql.Driver: <PERFLOG method=compile>
>>> 14/04/17 11:33:49 INFO parse.ParseDriver: Parsing command: select * from wl_parquet
>>> 14/04/17 11:33:49 INFO parse.ParseDriver: Parse Completed
>>> 14/04/17 11:33:49 INFO parse.SharkSemanticAnalyzer: Get metadata for source tables
>>> FAILED: Hive Internal Error: java.lang.RuntimeException(java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat)
>>> 14/04/17 11:33:50 ERROR shark.SharkDriver: FAILED: Hive Internal Error: java.lang.RuntimeException(java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat)
>>> java.lang.RuntimeException: java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
>>>     at org.apache.hadoop.hive.ql.metadata.Table.getInputFormatClass(Table.java:306)
>>>     at org.apache.hadoop.hive.ql.metadata.Table.<init>(Table.java:99)
>>>     at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:988)
>>>     at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:891)
>>>     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1083)
>>>     at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1059)
>>>     at shark.parse.SharkSemanticAnalyzer.analyzeInternal(SharkSemanticAnalyzer.scala:137)
>>>     at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:279)
>>>     at shark.SharkDriver.compile(SharkDriver.scala:215)
>>>     at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337)
>>>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:909)
>>>     at shark.SharkCliDriver.processCmd(SharkCliDriver.scala:338)
>>>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
>>>     at shark.SharkCliDriver$.main(SharkCliDriver.scala:235)
>>>     at shark.SharkCliDriver.main(SharkCliDriver.scala)
>>> Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>>>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>>>     at java.lang.Class.forName0(Native Method)
>>>     at java.lang.Class.forName(Class.java:270)
>>>     at org.apache.hadoop.hive.ql.metadata.Table.getInputFormatClass(Table.java:302)
>>>     ... 14 more
>>>
>>> I can successfully select from that table with Hive and Impala, but
>>> Shark doesn't work. I am using CDH5 incl. the Spark parcel, and Shark 0.9.1.
>>>
>>> In what jar is this class "hidden", and how can I get rid of this
>>> exception?
>>>
>>> The lib folder of Shark contains:
>>> [root@hadoop-pg-9 shark-0.9.1]# ll lib
>>> total 180
>>> lrwxrwxrwx 1 root root    67 16. Apr 14:17 hive-serdes-1.0-SNAPSHOT.jar -> /opt/cloudera/parcels/CDH/lib/hive/lib/hive-serdes-1.0-SNAPSHOT.jar
>>> -rwxrwxr-x 1 root root 23086  9. Apr 10:57 JavaEWAH-0.4.2.jar
>>> lrwxrwxrwx 1 root root    53 14. Apr 21:46 parquet-avro.jar -> /opt/cloudera/parcels/CDH/lib/hadoop/parquet-avro.jar
>>> lrwxrwxrwx 1 root root    58 14. Apr 21:46 parquet-cascading.jar -> /opt/cloudera/parcels/CDH/lib/hadoop/parquet-cascading.jar
>>> lrwxrwxrwx 1 root root    55 14. Apr 21:46 parquet-column.jar -> /opt/cloudera/parcels/CDH/lib/hadoop/parquet-column.jar
>>> lrwxrwxrwx 1 root root    55 14. Apr 21:46 parquet-common.jar -> /opt/cloudera/parcels/CDH/lib/hadoop/parquet-common.jar
>>> lrwxrwxrwx 1 root root    57 14. Apr 21:46 parquet-encoding.jar -> /opt/cloudera/parcels/CDH/lib/hadoop/parquet-encoding.jar
>>> lrwxrwxrwx 1 root root    55 14. Apr 21:46 parquet-format.jar -> /opt/cloudera/parcels/CDH/lib/hadoop/parquet-format.jar
>>> lrwxrwxrwx 1 root root    58 14. Apr 21:46 parquet-generator.jar -> /opt/cloudera/parcels/CDH/lib/hadoop/parquet-generator.jar
>>> lrwxrwxrwx 1 root root    62 14. Apr 21:46 parquet-hadoop-bundle.jar -> /opt/cloudera/parcels/CDH/lib/hadoop/parquet-hadoop-bundle.jar
>>> lrwxrwxrwx 1 root root    55 14. Apr 21:46 parquet-hadoop.jar -> /opt/cloudera/parcels/CDH/lib/hadoop/parquet-hadoop.jar
>>> -rw-r--r-- 1 root root 70103 27. Nov 21:24 parquet-hive-1.2.8.jar
>>> lrwxrwxrwx 1 root root    56 14. Apr 21:46 parquet-scrooge.jar -> /opt/cloudera/parcels/CDH/lib/hadoop/parquet-scrooge.jar
>>> lrwxrwxrwx 1 root root    55 14. Apr 21:46 parquet-thrift.jar -> /opt/cloudera/parcels/CDH/lib/hadoop/parquet-thrift.jar
>>> -rw-rw-r-- 1 root root 76220  9. Apr 10:57 pyrolite.jar
>>>
>>> Thanks in advance, Gerd
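[Editor's note: for the "in what jar is this class hidden?" question, a small shell sketch for scanning a lib directory. The directory path and function name are assumptions for illustration, not from the thread. Jar (zip) files store entry names as plain bytes, so grep can spot a class entry without unpacking; `unzip -l "$jar" | grep` works equally well.]

```shell
#!/bin/sh
# find_class_jar FQCN [DIR]: print every jar under DIR (default: .)
# containing the given fully-qualified class.
find_class_jar() {
  # org.foo.Bar -> org/foo/Bar.class, the entry name inside the jar
  cls_path="$(printf '%s' "$1" | tr . /).class"
  for jar in "${2:-.}"/*.jar; do
    # Entry names are stored uncompressed, so a byte-level grep suffices.
    grep -q "$cls_path" "$jar" 2>/dev/null && echo "$jar"
  done
}

# Example usage (path assumed):
# find_class_jar org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe \
#     /opt/shark-0.9.1/lib
```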