Hi there,

My production environment is AWS EMR with Hadoop 2.4.0 and Spark 1.0.2. I moved the Spark configuration from SPARK_CLASSPATH into spark-defaults.conf, and after that the HiveContext stopped working. I also found this warning:

WARN DataNucleus.General: Plugin (Bundle) "org.datanucleus.store.rdbms" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/home/hadoop/.versions/spark-1.0.2-bin-hadoop2/lib/datanucleus-rdbms-3.2.1.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/home/hadoop/spark/lib/datanucleus-rdbms-3.2.1.jar."

But I cannot tell where this duplicate registration comes from.
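My guess is that the duplicate registration comes from the same DataNucleus jars being visible at two paths (for example, if /home/hadoop/spark is a symlink to, or a copy of, /home/hadoop/.versions/spark-1.0.2-bin-hadoop2), rather than from anything I register explicitly. Two checks I could run to confirm this (the symlink theory is only an assumption):

    # Is /home/hadoop/spark a symlink to, or a copy of, the versioned install?
    ls -ld /home/hadoop/spark
    # List every DataNucleus jar visible under /home/hadoop
    find /home/hadoop -name 'datanucleus-*.jar' 2>/dev/null

If both URLs resolve to identical jars, I assume the warning itself is harmless noise and the real problem is the error further down.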
Contents of spark-env.sh (where SPARK_CLASSPATH was set):

export SPARK_MASTER_IP=10.187.25.107
export SCALA_HOME=/home/hadoop/.versions/scala-2.10.3
export SPARK_LOCAL_DIRS=/mnt/spark/
export SPARK_CLASSPATH="/usr/share/aws/emr/emr-fs/lib/*:/usr/share/aws/emr/lib/*:/home/hadoop/share/hadoop/common/lib/*:/home/hadoop/.versions/2.4.0/share/hadoop/common/lib/hadoop-lzo.jar"
export SPARK_DAEMON_JAVA_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"

Contents of spark-defaults.conf:

spark.master                     spark://10.187.25.107:7077
spark.eventLog.enabled           true
# spark.eventLog.dir             hdfs://namenode:8021/directory
spark.serializer                 org.apache.spark.serializer.KryoSerializer
spark.local.dir                  /mnt/spark/
spark.executor.memory            10g
spark.executor.extraLibraryPath  "/home/hadoop/.versions/2.4.0/share/hadoop/common/lib/hadoop-lzo.jar"
# spark.executor.extraClassPath  "/usr/share/aws/emr/emr-fs/lib/*:/usr/share/aws/emr/lib/*:/home/hadoop/share/hadoop/common/lib/*:/home/hadoop/.versions/2.4.0/share/hadoop/common/lib/hadoop-lzo.jar"
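As far as I understand, SPARK_CLASSPATH is deprecated in Spark 1.0+ in favor of spark.driver.extraClassPath and spark.executor.extraClassPath, and spark.executor.extraLibraryPath is meant for native library directories rather than jars, so pointing it at hadoop-lzo.jar probably does not put the codec class on any classpath. Here is the mapping I think the migration should have used, reusing the exact paths from SPARK_CLASSPATH (a sketch only; I have not verified it on my cluster):

    spark.driver.extraClassPath   /usr/share/aws/emr/emr-fs/lib/*:/usr/share/aws/emr/lib/*:/home/hadoop/share/hadoop/common/lib/*:/home/hadoop/.versions/2.4.0/share/hadoop/common/lib/hadoop-lzo.jar
    spark.executor.extraClassPath /usr/share/aws/emr/emr-fs/lib/*:/usr/share/aws/emr/lib/*:/home/hadoop/share/hadoop/common/lib/*:/home/hadoop/.versions/2.4.0/share/hadoop/common/lib/hadoop-lzo.jar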
The error log:

14/09/18 02:28:45 INFO parse.ParseDriver: Parsing command: show tables
14/09/18 02:28:45 INFO parse.ParseDriver: Parse Completed
14/09/18 02:28:45 INFO analysis.Analyzer: Max iterations (2) reached for batch MultiInstanceRelations
14/09/18 02:28:45 INFO analysis.Analyzer: Max iterations (2) reached for batch CaseInsensitiveAttributeReferences
14/09/18 02:28:45 INFO analysis.Analyzer: Max iterations (2) reached for batch Check Analysis
14/09/18 02:28:45 INFO sql.SQLContext$$anon$1: Max iterations (2) reached for batch Add exchange
14/09/18 02:28:45 INFO sql.SQLContext$$anon$1: Max iterations (2) reached for batch Prepare Expressions
14/09/18 02:28:45 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
14/09/18 02:28:45 INFO ql.Driver: <PERFLOG method=Driver.run>
14/09/18 02:28:45 INFO ql.Driver: <PERFLOG method=TimeToSubmit>
14/09/18 02:28:45 INFO ql.Driver: <PERFLOG method=compile>
14/09/18 02:28:45 INFO ql.Driver: <PERFLOG method=parse>
14/09/18 02:28:45 INFO parse.ParseDriver: Parsing command: show tables
14/09/18 02:28:45 INFO parse.ParseDriver: Parse Completed
14/09/18 02:28:45 INFO ql.Driver: </PERFLOG method=parse start=1411007325561 end=1411007325561 duration=0>
14/09/18 02:28:45 INFO ql.Driver: <PERFLOG method=semanticAnalyze>
14/09/18 02:28:45 INFO ql.Driver: Semantic Analysis Completed
14/09/18 02:28:45 INFO ql.Driver: </PERFLOG method=semanticAnalyze start=1411007325561 end=1411007325611 duration=50>
14/09/18 02:28:45 INFO exec.ListSinkOperator: Initializing Self 0 OP
14/09/18 02:28:45 INFO exec.ListSinkOperator: Operator 0 OP initialized
14/09/18 02:28:45 INFO exec.ListSinkOperator: Initialization Done 0 OP
14/09/18 02:28:45 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:tab_name, type:string, comment:from deserializer)], properties:null)
14/09/18 02:28:45 INFO ql.Driver: </PERFLOG method=compile start=1411007325538 end=1411007325677 duration=139>
14/09/18 02:28:45 INFO ql.Driver: <PERFLOG method=Driver.execute>
14/09/18 02:28:45 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
14/09/18 02:28:45 INFO ql.Driver: Starting command: show tables
14/09/18 02:28:45 INFO ql.Driver: </PERFLOG method=TimeToSubmit start=1411007325538 end=1411007325692 duration=154>
14/09/18 02:28:45 INFO ql.Driver: <PERFLOG method=runTasks>
14/09/18 02:28:45 INFO ql.Driver: <PERFLOG method=task.DDL.Stage-0>
14/09/18 02:28:45 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
14/09/18 02:28:45 INFO metastore.ObjectStore: ObjectStore, initialize called
14/09/18 02:28:45 WARN DataNucleus.General: Plugin (Bundle) "org.datanucleus.store.rdbms" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/home/hadoop/.versions/spark-1.0.2-bin-hadoop2/lib/datanucleus-rdbms-3.2.1.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/home/hadoop/spark/lib/datanucleus-rdbms-3.2.1.jar."
14/09/18 02:28:45 WARN DataNucleus.General: Plugin (Bundle) "org.datanucleus" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/home/hadoop/.versions/spark-1.0.2-bin-hadoop2/lib/datanucleus-core-3.2.2.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/home/hadoop/spark/lib/datanucleus-core-3.2.2.jar."
14/09/18 02:28:45 WARN DataNucleus.General: Plugin (Bundle) "org.datanucleus.api.jdo" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/home/hadoop/spark/lib/datanucleus-api-jdo-3.2.1.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/home/hadoop/.versions/spark-1.0.2-bin-hadoop2/lib/datanucleus-api-jdo-3.2.1.jar."
14/09/18 02:28:46 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
14/09/18 02:28:46 WARN bonecp.BoneCPConfig: Max Connections < 1. Setting to 20
14/09/18 02:28:46 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
14/09/18 02:28:46 INFO metastore.ObjectStore: Initialized ObjectStore
14/09/18 02:28:47 WARN bonecp.BoneCPConfig: Max Connections < 1. Setting to 20
14/09/18 02:28:47 INFO metastore.HiveMetaStore: 0: get_database: default
14/09/18 02:28:47 INFO HiveMetaStore.audit: ugi=hadoop ip=unknown-ip-addr cmd=get_database: default
14/09/18 02:28:47 INFO metastore.HiveMetaStore: 0: get_tables: db=default pat=.*
14/09/18 02:28:47 INFO HiveMetaStore.audit: ugi=hadoop ip=unknown-ip-addr cmd=get_tables: db=default pat=.*
14/09/18 02:28:47 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
14/09/18 02:28:47 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
14/09/18 02:28:48 INFO ql.Driver: </PERFLOG method=task.DDL.Stage-0 start=1411007325692 end=1411007328020 duration=2328>
14/09/18 02:28:48 INFO ql.Driver: </PERFLOG method=runTasks start=1411007325692 end=1411007328020 duration=2328>
14/09/18 02:28:48 INFO ql.Driver: </PERFLOG method=Driver.execute start=1411007325677 end=1411007328020 duration=2343>
14/09/18 02:28:48 INFO ql.Driver: OK
14/09/18 02:28:48 INFO ql.Driver: <PERFLOG method=releaseLocks>
14/09/18 02:28:48 INFO ql.Driver: </PERFLOG method=releaseLocks start=1411007328024 end=1411007328024 duration=0>
14/09/18 02:28:48 INFO ql.Driver: </PERFLOG method=Driver.run start=1411007325538 end=1411007328024 duration=2486>
14/09/18 02:28:48 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
14/09/18 02:28:48 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
14/09/18 02:28:48 ERROR hive.HiveContext:
======================
HIVE FAILURE OUTPUT
======================
OK
======================
END HIVE FAILURE OUTPUT
======================
java.io.IOException: java.io.IOException: Cannot create an instance of InputFormat class org.apache.hadoop.mapred.TextInputFormat as specified in mapredWork!
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:551)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:489)
    at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:136)
    at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1471)
    at org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:196)
    at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:163)
    at org.apache.spark.sql.hive.execution.NativeCommand.sideEffectResult$lzycompute(NativeCommand.scala:35)
    at org.apache.spark.sql.hive.execution.NativeCommand.sideEffectResult(NativeCommand.scala:35)
    at org.apache.spark.sql.hive.execution.NativeCommand.execute(NativeCommand.scala:38)
    at org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd$lzycompute(HiveContext.scala:250)
    at org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd(HiveContext.scala:250)
    at org.apache.spark.sql.SchemaRDDLike$class.$init$(SchemaRDDLike.scala:58)
    at org.apache.spark.sql.SchemaRDD.<init>(SchemaRDD.scala:104)
    at org.apache.spark.sql.hive.HiveContext.hiveql(HiveContext.scala:75)
    at org.apache.spark.sql.hive.HiveContext.hql(HiveContext.scala:78)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:18)
    at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:23)
    at $iwC$$iwC$$iwC$$iwC.<init>(<console>:25)
    at $iwC$$iwC$$iwC.<init>(<console>:27)
    at $iwC$$iwC.<init>(<console>:29)
    at $iwC.<init>(<console>:31)
    at <init>(<console>:33)
    at .<init>(<console>:37)
    at .<clinit>(<console>)
    at .<init>(<console>:7)
    at .<clinit>(<console>)
    at $print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:788)
    at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1056)
    at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:614)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:645)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:609)
    at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:796)
    at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:841)
    at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:753)
    at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:601)
    at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:608)
    at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:611)
    at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:936)
    at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884)
    at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884)
    at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
    at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:884)
    at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:982)
    at org.apache.spark.repl.Main$.main(Main.scala:31)
    at org.apache.spark.repl.Main.main(Main.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:303)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.IOException: Cannot create an instance of InputFormat class org.apache.hadoop.mapred.TextInputFormat as specified in mapredWork!
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getInputFormatFromCache(FetchOperator.java:223)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:379)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:515)
    ... 56 more
Caused by: java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getInputFormatFromCache(FetchOperator.java:219)
    ... 58 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
    ... 61 more
Caused by: java.lang.IllegalArgumentException: Compression codec com.hadoop.compression.lzo.LzoCodec not found.
    at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:135)
    at org.apache.hadoop.io.compress.CompressionCodecFactory.<init>(CompressionCodecFactory.java:175)
    at org.apache.hadoop.mapred.TextInputFormat.configure(TextInputFormat.java:45)
    ... 66 more
Caused by: java.lang.ClassNotFoundException: Class com.hadoop.compression.lzo.LzoCodec not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1801)
    at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:128)
    ... 68 more
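The last two "Caused by" entries suggest the direct failure is that com.hadoop.compression.lzo.LzoCodec is no longer on the driver's classpath, which would be explained by the commented-out spark.executor.extraClassPath line and the missing driver-side equivalent. A sanity check I can run first (assuming the jar path from my config is correct):

    # Confirm the LZO codec class is actually inside the hadoop-lzo jar
    unzip -l /home/hadoop/.versions/2.4.0/share/hadoop/common/lib/hadoop-lzo.jar | grep -i lzocodec

If the class is there, does uncommenting spark.executor.extraClassPath and adding the matching spark.driver.extraClassPath (as sketched above) look like the right fix, or is there something else I should carry over from SPARK_CLASSPATH?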
--
Zhun Shen
Data Mining at LightInTheBox.com
Email: shenzhunal...@gmail.com | shenz...@yahoo.com
Phone: 186 0627 7769
GitHub: https://github.com/shenzhun
LinkedIn: http://www.linkedin.com/in/shenzhun