Re: Hive query starts own session for LLAP

2017-09-26 Thread Lefty Leverenz
Thanks for the explanations of "all" and "only", Sergey.  I've added them to
the wiki, with minor edits:  hive.llap.execution.mode.

Now we need an explanation of "map" -- can you supply it?

-- Lefty


On Mon, Sep 25, 2017 at 11:33 AM, Sergey Shelukhin 
wrote:

> Hello.
> Hive would create a new Tez AM to coordinate the query (or use an existing
> one if HS2 session pool is used). However, the YARN app for Tez should
> only have a single container. Is this not the case?
> If it’s running additional containers, what is hive.llap.execution.mode
> set to? It should be set to all or only by default (“all” means run
> everything in LLAP if at all possible; “only” is the same with fallback to
> containers disabled - so the query would fail if it cannot run in LLAP).
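>
> For instance, you can check or override the mode per session from the Hive
> CLI (a minimal sketch; per the docs the remaining values are "map", "none"
> and "auto"):
>
>     set hive.llap.execution.mode;        -- print the current value
>     set hive.llap.execution.mode=all;    -- prefer LLAP, fall back to containers
>     set hive.llap.execution.mode=only;   -- fail instead of falling back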
>
> From:  Rajesh Narayanan  on behalf of
> Rajesh Narayanan 
> Reply-To:  "user@hive.apache.org" 
> Date:  Friday, September 22, 2017 at 11:59
> To:  "user@hive.apache.org" 
> Subject:  Hive query starts own session for LLAP
>
>
> Hi All,
> When I execute a Hive query, it starts its own session and creates new
> YARN jobs rather than using the LLAP-enabled job.
> Can you please provide some suggestions?
>
> Thanks
> Rajesh
>
>


Re: hive on spark - why is it so hard?

2017-09-26 Thread Stephen Sprague
i _seem_ to be getting closer.  Maybe it's just wishful thinking.   Here's
where i'm at now.

2017-09-26T21:10:38,892  INFO [stderr-redir-1] client.SparkClientImpl:
17/09/26 21:10:38 INFO rest.RestSubmissionClient: Server responded with
CreateSubmissionResponse:
2017-09-26T21:10:38,892  INFO [stderr-redir-1] client.SparkClientImpl: {
2017-09-26T21:10:38,892  INFO [stderr-redir-1] client.SparkClientImpl:
"action" : "CreateSubmissionResponse",
2017-09-26T21:10:38,892  INFO [stderr-redir-1] client.SparkClientImpl:
"message" : "Driver successfully submitted as driver-20170926211038-0003",
2017-09-26T21:10:38,892  INFO [stderr-redir-1] client.SparkClientImpl:
"serverSparkVersion" : "2.2.0",
2017-09-26T21:10:38,892  INFO [stderr-redir-1] client.SparkClientImpl:
"submissionId" : "driver-20170926211038-0003",
2017-09-26T21:10:38,892  INFO [stderr-redir-1] client.SparkClientImpl:
"success" : true
2017-09-26T21:10:38,892  INFO [stderr-redir-1] client.SparkClientImpl: }
2017-09-26T21:10:45,701 DEBUG [IPC Client (425015667) connection to
dwrdevnn1.sv2.trulia.com/172.19.73.136:8020 from dwr] ipc.Client: IPC
Client (425015667) connection to dwrdevnn1.sv2.trulia.com/172.19.73.136:8020
from dwr: closed
2017-09-26T21:10:45,702 DEBUG [IPC Client (425015667) connection to
dwrdevnn1.sv2.trulia.com/172.19.73.136:8020 from dwr] ipc.Client: IPC
Client (425015667) connection to dwrdevnn1.sv2.trulia.com/172.19.73.136:8020
from dwr: stopped, remaining connections 0
2017-09-26T21:12:06,719 ERROR [2337b36e-86ca-47cd-b1ae-f0b32571b97e main]
client.SparkClientImpl: Timed out waiting for client to connect.
Possible reasons include network issues, errors in remote driver or the
cluster has no available resources, etc.
Please check YARN or Spark driver's logs for further information.
java.util.concurrent.ExecutionException:
java.util.concurrent.TimeoutException: Timed out waiting for client
connection.
at
io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:37)
~[netty-all-4.0.29.Final.jar:4.0.29.Final]
at
org.apache.hive.spark.client.SparkClientImpl.(SparkClientImpl.java:108)
[hive-exec-2.3.0.jar:2.3.0]
at
org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:80)
[hive-exec-2.3.0.jar:2.3.0]
at
org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.createRemoteClient(RemoteHiveSparkClient.java:101)
[hive-exec-2.3.0.jar:2.3.0]
at
org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.(RemoteHiveSparkClient.java:97)
[hive-exec-2.3.0.jar:2.3.0]
at
org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.createHiveSparkClient(HiveSparkClientFactory.java:73)
[hive-exec-2.3.0.jar:2.3.0]
at
org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:62)
[hive-exec-2.3.0.jar:2.3.0]
at
org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionManagerImpl.getSession(SparkSessionManagerImpl.java:115)
[hive-exec-2.3.0.jar:2.3.0]
at
org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.getSparkSession(SparkUtilities.java:126)
[hive-exec-2.3.0.jar:2.3.0]
at
org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerParallelism.getSparkMemoryAndCores(SetSparkReducerParallelism.java:236)
[hive-exec-2.3.0.jar:2.3.0]


i'll dig some more tomorrow.
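
for what it's worth, if the job did go through YARN, the logs that error
points at can be pulled like this (a sketch; the application id is whatever
YARN assigned to the run):

    yarn application -list -appStates ALL | grep -i spark
    yarn logs -applicationId application_XXXXXXXXXXXXX_XXXX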

On Tue, Sep 26, 2017 at 8:23 PM, Stephen Sprague  wrote:

> oh. i missed Gopal's reply.  oy... that sounds foreboding.  I'll keep you
> posted on my progress.
>
> On Tue, Sep 26, 2017 at 4:40 PM, Gopal Vijayaraghavan 
> wrote:
>
>> Hi,
>>
>> > org.apache.hadoop.hive.ql.parse.SemanticException: Failed to get a
>> spark session: org.apache.hadoop.hive.ql.metadata.HiveException: Failed
>> to create spark client.
>>
>> I get inexplicable errors with Hive-on-Spark unless I do a three step
>> build.
>>
>> Build Hive first, use that version to build Spark, use that Spark version
>> to rebuild Hive.
>>
>> I have to do this to make it work because Spark contains Hive jars and
>> Hive contains Spark jars in the class-path.
>>
>> And specifically I have to edit the pom.xml files, instead of passing in
>> params with -Dspark.version, because the installed pom files don't get
>> replacements from the build args.
>>
>> Cheers,
>> Gopal
>>
>>
>>
>


Re: hive on spark - why is it so hard?

2017-09-26 Thread Stephen Sprague
oh. i missed Gopal's reply.  oy... that sounds foreboding.  I'll keep you
posted on my progress.

On Tue, Sep 26, 2017 at 4:40 PM, Gopal Vijayaraghavan 
wrote:

> Hi,
>
> > org.apache.hadoop.hive.ql.parse.SemanticException: Failed to get a
> spark session: org.apache.hadoop.hive.ql.metadata.HiveException: Failed
> to create spark client.
>
> I get inexplicable errors with Hive-on-Spark unless I do a three step
> build.
>
> Build Hive first, use that version to build Spark, use that Spark version
> to rebuild Hive.
>
> I have to do this to make it work because Spark contains Hive jars and
> Hive contains Spark jars in the class-path.
>
> And specifically I have to edit the pom.xml files, instead of passing in
> params with -Dspark.version, because the installed pom files don't get
> replacements from the build args.
>
> Cheers,
> Gopal
>
>
>


Re: hive on spark - why is it so hard?

2017-09-26 Thread Stephen Sprague
well this is the spark-submit line from above:

   2017-09-26T14:04:45,678  INFO [4cb82b6d-9568-4518-8e00-f0cf7ac58cd3
main] client.SparkClientImpl: Running client driver with argv:
/usr/lib/spark-2.2.0-bin-hadoop2.6/bin/spark-submit

and that's pretty clearly v2.2


I do have other versions of spark on the namenode so lemme remove those and
see what happens


A-HA! dang it!

$ echo $SPARK_HOME
/usr/local/spark

well that clearly needs to be: /usr/lib/spark-2.2.0-bin-hadoop2.6

how did i miss that? unbelievable.
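
presumably the fix is just to repoint it before launching hive (a minimal
sketch):

    export SPARK_HOME=/usr/lib/spark-2.2.0-bin-hadoop2.6
    export PATH="$SPARK_HOME/bin:$PATH"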


Thank you Sahil!   Let's see what happens next!

Cheers,
Stephen


On Tue, Sep 26, 2017 at 4:12 PM, Sahil Takiar 
wrote:

> Are you sure you are using Spark 2.2.0? Based on the stack-trace it looks
> like your call to spark-submit is using an older version of Spark (looks
> like some early 1.x version). Do you have SPARK_HOME set locally? Do you
> have older versions of Spark installed locally?
>
> --Sahil
>
> On Tue, Sep 26, 2017 at 3:33 PM, Stephen Sprague 
> wrote:
>
>> thanks Sahil.  here it is.
>>
>> Exception in thread "main" java.lang.NoClassDefFoundError:
>> org/apache/spark/scheduler/SparkListenerInterface
>> at java.lang.Class.forName0(Native Method)
>> at java.lang.Class.forName(Class.java:344)
>> at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.
>> scala:318)
>> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:
>> 75)
>> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>> Caused by: java.lang.ClassNotFoundException:
>> org.apache.spark.scheduler.SparkListenerInterface
>> at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
>> at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>> ... 5 more
>>
>> at 
>> org.apache.hive.spark.client.rpc.RpcServer.cancelClient(RpcServer.java:212)
>> ~[hive-exec-2.3.0.jar:2.3.0]
>> at 
>> org.apache.hive.spark.client.SparkClientImpl$3.run(SparkClientImpl.java:500)
>> ~[hive-exec-2.3.0.jar:2.3.0]
>> at java.lang.Thread.run(Thread.java:745) ~[?:1.8.0_25]
>> FAILED: SemanticException Failed to get a spark session:
>> org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create spark
>> client.
>> 2017-09-26T14:04:46,470 ERROR [4cb82b6d-9568-4518-8e00-f0cf7ac58cd3
>> main] ql.Driver: FAILED: SemanticException Failed to get a spark session:
>> org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create spark
>> client.
>> org.apache.hadoop.hive.ql.parse.SemanticException: Failed to get a spark
>> session: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to
>> create spark client.
>> at org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerPar
>> allelism.getSparkMemoryAndCores(SetSparkReducerParallelism.java:240)
>> at org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerPar
>> allelism.process(SetSparkReducerParallelism.java:173)
>> at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch
>> (DefaultRuleDispatcher.java:90)
>> at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAnd
>> Return(DefaultGraphWalker.java:105)
>> at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(De
>> faultGraphWalker.java:89)
>> at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWa
>> lker.java:56)
>> at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWa
>> lker.java:61)
>> at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWa
>> lker.java:61)
>> at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWa
>> lker.java:61)
>> at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalkin
>> g(DefaultGraphWalker.java:120)
>> at org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.runSetRe
>> ducerParallelism(SparkCompiler.java:288)
>> at org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.optimize
>> OperatorPlan(SparkCompiler.java:122)
>> at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCom
>> piler.java:140)
>> at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInte
>> rnal(SemanticAnalyzer.java:11253)
>> at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeIntern
>> al(CalcitePlanner.java:286)
>> at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze
>> (BaseSemanticAnalyzer.java:258)
>> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:511)
>> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java
>> :1316)
>> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1456)
>> at 

Re: hive on spark - why is it so hard?

2017-09-26 Thread Gopal Vijayaraghavan
Hi,

> org.apache.hadoop.hive.ql.parse.SemanticException: Failed to get a spark 
> session: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create 
> spark client.
 
I get inexplicable errors with Hive-on-Spark unless I do a three step build.

Build Hive first, use that version to build Spark, use that Spark version to 
rebuild Hive.

I have to do this to make it work because Spark contains Hive jars and Hive 
contains Spark jars in the class-path.

And specifically I have to edit the pom.xml files, instead of passing in params 
with -Dspark.version, because the installed pom files don't get replacements 
from the build args.
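
Roughly, the dance looks like this (a sketch; the profile names and paths are
illustrative, adjust to your tree):

    cd hive && mvn clean install -DskipTests              # 1. build Hive
    cd ../spark && ./dev/make-distribution.sh \
        --name hadoop2.6 -Phadoop-2.6 -Pyarn               # 2. build Spark
    # 3. edit <spark.version> in Hive's pom.xml by hand, then rebuild:
    cd ../hive && mvn clean install -DskipTests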

Cheers,
Gopal




Re: hive on spark - why is it so hard?

2017-09-26 Thread Sahil Takiar
Are you sure you are using Spark 2.2.0? Based on the stack-trace it looks
like your call to spark-submit is using an older version of Spark (looks
like some early 1.x version). Do you have SPARK_HOME set locally? Do you
have older versions of Spark installed locally?
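
A couple of quick checks (a sketch; the paths are just common install spots):

    echo $SPARK_HOME         # where Hive resolves spark-submit from first
    spark-submit --version   # which Spark actually answers on your PATH
    ls -d /usr/local/spark* /usr/lib/spark* 2>/dev/null   # stray installs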

--Sahil

On Tue, Sep 26, 2017 at 3:33 PM, Stephen Sprague  wrote:

> thanks Sahil.  here it is.
>
> Exception in thread "main" java.lang.NoClassDefFoundError:
> org/apache/spark/scheduler/SparkListenerInterface
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:344)
> at org.apache.spark.deploy.SparkSubmit$.launch(
> SparkSubmit.scala:318)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.lang.ClassNotFoundException: org.apache.spark.scheduler.
> SparkListenerInterface
> at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> ... 5 more
>
> at 
> org.apache.hive.spark.client.rpc.RpcServer.cancelClient(RpcServer.java:212)
> ~[hive-exec-2.3.0.jar:2.3.0]
> at 
> org.apache.hive.spark.client.SparkClientImpl$3.run(SparkClientImpl.java:500)
> ~[hive-exec-2.3.0.jar:2.3.0]
> at java.lang.Thread.run(Thread.java:745) ~[?:1.8.0_25]
> FAILED: SemanticException Failed to get a spark session:
> org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create spark
> client.
> 2017-09-26T14:04:46,470 ERROR [4cb82b6d-9568-4518-8e00-f0cf7ac58cd3 main]
> ql.Driver: FAILED: SemanticException Failed to get a spark session:
> org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create spark
> client.
> org.apache.hadoop.hive.ql.parse.SemanticException: Failed to get a spark
> session: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to
> create spark client.
> at org.apache.hadoop.hive.ql.optimizer.spark.
> SetSparkReducerParallelism.getSparkMemoryAndCores(
> SetSparkReducerParallelism.java:240)
> at org.apache.hadoop.hive.ql.optimizer.spark.
> SetSparkReducerParallelism.process(SetSparkReducerParallelism.java:173)
> at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(
> DefaultRuleDispatcher.java:90)
> at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.
> dispatchAndReturn(DefaultGraphWalker.java:105)
> at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(
> DefaultGraphWalker.java:89)
> at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(
> PreOrderWalker.java:56)
> at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(
> PreOrderWalker.java:61)
> at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(
> PreOrderWalker.java:61)
> at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(
> PreOrderWalker.java:61)
> at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(
> DefaultGraphWalker.java:120)
> at org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.
> runSetReducerParallelism(SparkCompiler.java:288)
> at org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.
> optimizeOperatorPlan(SparkCompiler.java:122)
> at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(
> TaskCompiler.java:140)
> at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.
> analyzeInternal(SemanticAnalyzer.java:11253)
> at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(
> CalcitePlanner.java:286)
> at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.
> analyze(BaseSemanticAnalyzer.java:258)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:511)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.
> java:1316)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1456)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1236)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1226)
> at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(
> CliDriver.java:233)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(
> CliDriver.java:184)
> at org.apache.hadoop.hive.cli.CliDriver.processLine(
> CliDriver.java:403)
> at org.apache.hadoop.hive.cli.CliDriver.processLine(
> CliDriver.java:336)
> at org.apache.hadoop.hive.cli.CliDriver.executeDriver(
> CliDriver.java:787)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 

Re: hive on spark - why is it so hard?

2017-09-26 Thread Stephen Sprague
thanks Sahil.  here it is.

Exception in thread "main" java.lang.NoClassDefFoundError:
org/apache/spark/scheduler/SparkListenerInterface
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:344)
at
org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:318)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException:
org.apache.spark.scheduler.SparkListenerInterface
at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 5 more

at
org.apache.hive.spark.client.rpc.RpcServer.cancelClient(RpcServer.java:212)
~[hive-exec-2.3.0.jar:2.3.0]
at
org.apache.hive.spark.client.SparkClientImpl$3.run(SparkClientImpl.java:500)
~[hive-exec-2.3.0.jar:2.3.0]
at java.lang.Thread.run(Thread.java:745) ~[?:1.8.0_25]
FAILED: SemanticException Failed to get a spark session:
org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create spark
client.
2017-09-26T14:04:46,470 ERROR [4cb82b6d-9568-4518-8e00-f0cf7ac58cd3 main]
ql.Driver: FAILED: SemanticException Failed to get a spark session:
org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create spark
client.
org.apache.hadoop.hive.ql.parse.SemanticException: Failed to get a spark
session: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create
spark client.
at
org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerParallelism.getSparkMemoryAndCores(SetSparkReducerParallelism.java:240)
at
org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerParallelism.process(SetSparkReducerParallelism.java:173)
at
org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
at
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
at
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
at
org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:56)
at
org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
at
org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
at
org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:61)
at
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
at
org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.runSetReducerParallelism(SparkCompiler.java:288)
at
org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.optimizeOperatorPlan(SparkCompiler.java:122)
at
org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:140)
at
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11253)
at
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:286)
at
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:511)
at
org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1316)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1456)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1236)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1226)
at
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
at
org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
at
org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
at
org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336)
at
org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:787)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)


It bugs me that that class is in spark-core_2.11-2.2.0.jar yet is so seemingly
out of reach. :(
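
one sanity check (a sketch) is to confirm the class really is in that jar:

    unzip -l /usr/lib/spark-2.2.0-bin-hadoop2.6/jars/spark-core_2.11-2.2.0.jar \
        | grep SparkListenerInterface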



On Tue, Sep 26, 2017 at 2:44 PM, Sahil Takiar 
wrote:

> Hey Stephen,
>
> Can you send the full stack 

Re: hive on spark - why is it so hard?

2017-09-26 Thread Sahil Takiar
Hey Stephen,

Can you send the full stack trace for the NoClassDefFoundError? For Hive
2.3.0, we only support Spark 2.0.0. Hive may work with more recent versions
of Spark, but we only test with Spark 2.0.0.
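
If you have the Hive source tree handy, the Spark version a release was built
and tested against is recorded in its pom (a sketch; the path is illustrative):

    grep '<spark.version>' apache-hive-2.3.0-src/pom.xml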

--Sahil

On Tue, Sep 26, 2017 at 2:35 PM, Stephen Sprague  wrote:

> * i've installed hive 2.3 and spark 2.2
>
> * i've read this doc plenty of times -> https://cwiki.apache.org/
> confluence/display/Hive/Hive+on+Spark%3A+Getting+Started
>
> * i run this query:
>
>hive --hiveconf hive.root.logger=DEBUG,console -e 'set
> hive.execution.engine=spark; select date_key, count(*) from
> fe_inventory.merged_properties_hist group by 1 order by 1;'
>
>
> * i get this error:
>
>    Exception in thread "main" java.lang.NoClassDefFoundError:
> org/apache/spark/scheduler/SparkListenerInterface
>
>
> * this class is in:
>   /usr/lib/spark-2.2.0-bin-hadoop2.6/jars/spark-core_2.11-2.2.0.jar
>
> * i have copied all the spark jars to hdfs://dwrdevnn1/spark-2.2-jars
>
> * i have updated hive-site.xml to set spark.yarn.jars to it.
>
> * i see this on the console:
>
> 2017-09-26T13:34:15,505  INFO [334aa7db-ad0c-48c3-9ada-467aaf05cff3 main]
> spark.HiveSparkClientFactory: load spark property from hive configuration
> (spark.yarn.jars -> hdfs://dwrdevnn1.sv2.trulia.com:8020/spark-2.2-jars/*
> ).
>
> * i see this on the console
>
> 2017-09-26T14:04:45,678  INFO [4cb82b6d-9568-4518-8e00-f0cf7ac58cd3 main]
> client.SparkClientImpl: Running client driver with argv:
> /usr/lib/spark-2.2.0-bin-hadoop2.6/bin/spark-submit --properties-file
> /tmp/spark-submit.6105784757200912217.properties --class
> org.apache.hive.spark.client.RemoteDriver 
> /usr/lib/apache-hive-2.3.0-bin/lib/hive-exec-2.3.0.jar
> --remote-host dwrdevnn1.sv2.trulia.com --remote-port 53393 --conf
> hive.spark.client.connect.timeout=1000 --conf 
> hive.spark.client.server.connect.timeout=9
> --conf hive.spark.client.channel.log.level=null --conf
> hive.spark.client.rpc.max.size=52428800 --conf
> hive.spark.client.rpc.threads=8 --conf hive.spark.client.secret.bits=256
> --conf hive.spark.client.rpc.server.address=null
>
> * i even print out CLASSPATH in this script: /usr/lib/spark-2.2.0-bin-
> hadoop2.6/bin/spark-submit
>
> and /usr/lib/spark-2.2.0-bin-hadoop2.6/jars/spark-core_2.11-2.2.0.jar is
> in it.
>
> so i ask... what am i missing?
>
> thanks,
> Stephen
>
>
>
>
>
>


-- 
Sahil Takiar
Software Engineer at Cloudera
takiar.sa...@gmail.com | (510) 673-0309


hive on spark - why is it so hard?

2017-09-26 Thread Stephen Sprague
* i've installed hive 2.3 and spark 2.2

* i've read this doc plenty of times ->
https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started

* i run this query:

   hive --hiveconf hive.root.logger=DEBUG,console -e 'set
hive.execution.engine=spark; select date_key, count(*) from
fe_inventory.merged_properties_hist group by 1 order by 1;'


* i get this error:

   Exception in thread "main" java.lang.NoClassDefFoundError:
org/apache/spark/scheduler/SparkListenerInterface


* this class is in:
  /usr/lib/spark-2.2.0-bin-hadoop2.6/jars/spark-core_2.11-2.2.0.jar

* i have copied all the spark jars to hdfs://dwrdevnn1/spark-2.2-jars

* i have updated hive-site.xml to set spark.yarn.jars to it (see the sketch
at the end of these notes).

* i see this on the console:

2017-09-26T13:34:15,505  INFO [334aa7db-ad0c-48c3-9ada-467aaf05cff3 main]
spark.HiveSparkClientFactory: load spark property from hive configuration
(spark.yarn.jars -> hdfs://dwrdevnn1.sv2.trulia.com:8020/spark-2.2-jars/*).

* i see this on the console

2017-09-26T14:04:45,678  INFO [4cb82b6d-9568-4518-8e00-f0cf7ac58cd3 main]
client.SparkClientImpl: Running client driver with argv:
/usr/lib/spark-2.2.0-bin-hadoop2.6/bin/spark-submit --properties-file
/tmp/spark-submit.6105784757200912217.properties --class
org.apache.hive.spark.client.RemoteDriver
/usr/lib/apache-hive-2.3.0-bin/lib/hive-exec-2.3.0.jar --remote-host
dwrdevnn1.sv2.trulia.com --remote-port 53393 --conf
hive.spark.client.connect.timeout=1000 --conf
hive.spark.client.server.connect.timeout=9 --conf
hive.spark.client.channel.log.level=null --conf
hive.spark.client.rpc.max.size=52428800 --conf
hive.spark.client.rpc.threads=8 --conf hive.spark.client.secret.bits=256
--conf hive.spark.client.rpc.server.address=null

* i even print out CLASSPATH in this script:
/usr/lib/spark-2.2.0-bin-hadoop2.6/bin/spark-submit

and /usr/lib/spark-2.2.0-bin-hadoop2.6/jars/spark-core_2.11-2.2.0.jar is in
it.
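
a minimal sketch of those two staging steps, for the record (host and paths
are mine, adjust to taste):

    # stage the Spark 2.2 jars on HDFS
    hadoop fs -mkdir -p /spark-2.2-jars
    hadoop fs -put /usr/lib/spark-2.2.0-bin-hadoop2.6/jars/*.jar /spark-2.2-jars/

    # then point hive-site.xml at them:
    #   <property>
    #     <name>spark.yarn.jars</name>
    #     <value>hdfs://dwrdevnn1.sv2.trulia.com:8020/spark-2.2-jars/*</value>
    #   </property>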

so i ask... what am i missing?

thanks,
Stephen