Hi All,

Thanks for the suggestions. What I tried is -
hiveContext.sql("add jar ....")
That lets the "create temporary function" statement complete, but when
I actually use the function I get a ClassNotFoundException for the
class that implements it. The same class is present in the jar that was
added.
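
For reference, this is roughly the sequence I ran in spark-shell (the
jar and class are the ones from my earlier mail; the table and column
in the last statement are just placeholders from my test, not the real
names):

scala> hiveContext.sql("add jar hdfs:///users/ravindra/customUDF2.jar")
scala> hiveContext.sql("CREATE TEMPORARY FUNCTION sample_to_upper AS 'com.abc.api.udf.MyUpper'")
scala> hiveContext.sql("SELECT sample_to_upper(name) FROM test_table").collect()

The first two statements complete fine; the third is where I get the
ClassNotFoundException for com.abc.api.udf.MyUpper.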

Please note that the same works fine from the Hive Shell.

Is there an issue with how Spark distributes jars across the workers?
Maybe that is what is causing the problem. Also, could you please
suggest a manual way of copying the jars to the workers? I just want to
verify my assumption.
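
For reference, these are the two ways I know of to ship a jar to the
workers, in case one of them is the manual route you have in mind (the
local path below is only an illustration, not my actual path):

./bin/spark-shell --jars /local/path/customUDF2.jar
scala> sc.addJar("hdfs:///users/ravindra/customUDF2.jar")

The first distributes the jar to the executors when the shell starts;
the second (as suggested earlier in this thread) adds the jar for tasks
submitted after the call.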

Thanks,
Ravi

On Sun, May 10, 2015 at 1:40 AM Michael Armbrust <mich...@databricks.com>
wrote:

> That code path is entirely delegated to Hive. Does Hive support this?
> You might try instead using sparkContext.addJar.
>
> On Sat, May 9, 2015 at 12:32 PM, Ravindra <ravindra.baj...@gmail.com>
> wrote:
>
>> Hi All,
>>
>> I am trying to create custom UDFs with hiveContext, as shown below -
>> scala> hiveContext.sql ("CREATE TEMPORARY FUNCTION sample_to_upper AS
>> 'com.abc.api.udf.MyUpper' USING JAR
>> 'hdfs:///users/ravindra/customUDF2.jar'")
>>
>> I have put the UDF jar in HDFS at the path given above. The same
>> command works well in the Hive shell but fails here in the Spark
>> shell, with the error below -
>> 15/05/10 00:41:51 ERROR Task: FAILED: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to load JAR hdfs:///users/ravindra/customUDF2.jar
>> 15/05/10 00:41:51 INFO FunctionTask: create function: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to load JAR hdfs:///users/ravindra/customUDF2.jar
>> at org.apache.hadoop.hive.ql.exec.FunctionTask.addFunctionResources(FunctionTask.java:305)
>> at org.apache.hadoop.hive.ql.exec.FunctionTask.createTemporaryFunction(FunctionTask.java:179)
>> at org.apache.hadoop.hive.ql.exec.FunctionTask.execute(FunctionTask.java:81)
>> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
>> at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
>> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1503)
>> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1270)
>> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1088)
>> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
>> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
>> at org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:305)
>> at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:276)
>> at org.apache.spark.sql.hive.execution.NativeCommand.sideEffectResult$lzycompute(NativeCommand.scala:35)
>> at org.apache.spark.sql.hive.execution.NativeCommand.sideEffectResult(NativeCommand.scala:35)
>> at org.apache.spark.sql.execution.Command$class.execute(commands.scala:46)
>> at org.apache.spark.sql.hive.execution.NativeCommand.execute(NativeCommand.scala:30)
>> at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:425)
>> at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:425)
>> at org.apache.spark.sql.SchemaRDDLike$class.$init$(SchemaRDDLike.scala:58)
>> at org.apache.spark.sql.SchemaRDD.<init>(SchemaRDD.scala:108)
>> at org.apache.spark.sql.hive.HiveContext.sql(HiveContext.scala:94)
>> at $line17.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:18)
>> at $line17.$read$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:23)
>> at $line17.$read$$iwC$$iwC$$iwC$$iwC.<init>(<console>:25)
>> at $line17.$read$$iwC$$iwC$$iwC.<init>(<console>:27)
>> at $line17.$read$$iwC$$iwC.<init>(<console>:29)
>> at $line17.$read$$iwC.<init>(<console>:31)
>> at $line17.$read.<init>(<console>:33)
>> at $line17.$read$.<init>(<console>:37)
>> at $line17.$read$.<clinit>(<console>)
>> at $line17.$eval$.<init>(<console>:7)
>> at $line17.$eval$.<clinit>(<console>)
>> at $line17.$eval.$print(<console>)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:606)
>> at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:852)
>> at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1125)
>> at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:674)
>> at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:705)
>> at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:669)
>> at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:828)
>> at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:873)
>> at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:785)
>> at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:628)
>> at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:636)
>> at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:641)
>> at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:968)
>> at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:916)
>> at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:916)
>> at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
>> at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:916)
>> at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1011)
>> at org.apache.spark.repl.Main$.main(Main.scala:31)
>> at org.apache.spark.repl.Main.main(Main.scala)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:606)
>> at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
>> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
>> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>
>> 15/05/10 00:41:51 ERROR Driver: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask
>> 15/05/10 00:41:51 INFO PerfLogger: </PERFLOG method=Driver.execute start=1431198710959 end=1431198711073 duration=114 from=org.apache.hadoop.hive.ql.Driver>
>> 15/05/10 00:41:51 INFO PerfLogger: <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver>
>> 15/05/10 00:41:51 INFO PerfLogger: </PERFLOG method=releaseLocks start=1431198711074 end=1431198711074 duration=0 from=org.apache.hadoop.hive.ql.Driver>
>> 15/05/10 00:41:51 ERROR HiveContext:
>> ======================
>> HIVE FAILURE OUTPUT
>> ======================
>> converting to local hdfs:///users/ravindra/customUDF2.jar
>> Failed to read external resource hdfs:///users/ravindra/customUDF2.jar
>> FAILED: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to load JAR hdfs:///users/ravindra/customUDF2.jar
>> FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask
>>
>> ======================
>> END HIVE FAILURE OUTPUT
>> ======================
>>
>> org.apache.spark.sql.execution.QueryExecutionException: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask
>> at org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:309)
>> at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:276)
>>
>>
