Re: Add a function to support Google's Word2Vec

2015-11-17 Thread Robin East (hotmail)
Have a look at SPARK-9484; the JIRA is already there. A pull request would be good.

Robin


> On 17 Nov 2015, at 12:10, yuming wang  wrote:
> 
> Hi:
> 
>  
> 
> I have a function that loads the binary model file generated by Google's 
> Word2Vec so that Spark can use this model. If it is convenient, I'm going 
> to open a JIRA and a pull request.
> 
>  
> 
> My code is:
> 
> https://github.com/wangyum/spark/commit/7c80d311722d67ed4b9746537e0a21c2dc1a9670
> 
>  
> 
> Thanks
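
For reference, Google's word2vec binary format is a plain-text header line
"vocabSize vectorSize" followed, for each word, by the word itself and
vectorSize raw 4-byte floats. A minimal reader sketch in Scala (a
hypothetical helper, not the code from the linked commit; it assumes
little-endian floats, which is what word2vec writes on x86):

    import java.io.{BufferedInputStream, DataInputStream, FileInputStream}
    import java.nio.{ByteBuffer, ByteOrder}

    def loadGoogleWord2Vec(path: String): Map[String, Array[Float]] = {
      val in = new DataInputStream(new BufferedInputStream(new FileInputStream(path)))
      try {
        // Read one whitespace-delimited token, skipping leading separators.
        def readToken(): String = {
          val sb = new StringBuilder
          var c = in.read()
          while (c == ' '.toInt || c == '\n'.toInt) c = in.read()
          while (c != -1 && c != ' '.toInt && c != '\n'.toInt) {
            sb.append(c.toChar); c = in.read()
          }
          sb.toString
        }
        val vocabSize = readToken().toInt
        val vectorSize = readToken().toInt
        val bytes = new Array[Byte](vectorSize * 4)
        (1 to vocabSize).map { _ =>
          val word = readToken()
          in.readFully(bytes) // vectorSize raw C floats follow each word
          val buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN)
          word -> Array.fill(vectorSize)(buf.getFloat())
        }.toMap
      } finally in.close()
    }

Whether the resulting map can be handed to MLlib's Word2VecModel directly
depends on the constructor's visibility in the Spark version at hand, which
is presumably part of what the proposed function addresses.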


Re: Mesos cluster dispatcher doesn't respect most args from the submit req

2015-11-17 Thread Iulian Dragoș
Hi Jo,

I agree that there's something fishy with the cluster dispatcher; I've seen
some issues like that.

I think it actually tries to send all properties as part of
`SPARK_EXECUTOR_OPTS`, which may not be everything that's needed:

https://github.com/jayv/spark/blob/mesos_cluster_params/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala#L375-L377
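
If I read those lines right, the effect is roughly the following (an
illustrative paraphrase, not the actual source; the property names are just
examples):

    // Per-job Spark properties get flattened into one environment variable
    // of -D options for the driver.
    val schedulerProperties = Map(
      "spark.executor.memory" -> "4g",
      "spark.ui.port"         -> "4048")
    val sparkExecutorOpts = schedulerProperties
      .map { case (k, v) => s"-D$k=$v" }
      .mkString(" ")
    // => "-Dspark.executor.memory=4g -Dspark.ui.port=4048"

Anything that has to be passed as a spark-submit argument rather than a
system property would be lost in that translation.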

Can you please open a JIRA ticket and also describe the symptoms? This
might be related, or even the same issue: SPARK-11280 and SPARK-11327.


thanks,
iulian




On Tue, Nov 17, 2015 at 2:46 AM, Jo Voordeckers 
wrote:

>
> Hi all,
>
> I'm running the Mesos cluster dispatcher; however, when I submit jobs,
> things like JVM args, classpath order, and the UI port aren't added to the
> command line executed by the Mesos scheduler. In fact, it only cares about
> the class, the jar, and the cores/memory settings.
>
>
> https://github.com/jayv/spark/blob/mesos_cluster_params/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala#L412-L424
>
> I've made an attempt at adding a few of the args that I believe are useful
> to the MesosClusterScheduler class, which seems to solve my problem.
>
> Please have a look:
>
> https://github.com/apache/spark/pull/9752
>
> Thanks
>
> - Jo Voordeckers
>
>


-- 

--
Iulian Dragos

--
Reactive Apps on the JVM
www.typesafe.com


Re: Mesos cluster dispatcher doesn't respect most args from the submit req

2015-11-17 Thread Jo Voordeckers
On Tue, Nov 17, 2015 at 5:16 AM, Iulian Dragoș 
wrote:

> I think it actually tries to send all properties as part of
> `SPARK_EXECUTOR_OPTS`, which may not be everything that's needed:
>
>
> https://github.com/jayv/spark/blob/mesos_cluster_params/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala#L375-L377
>
>
Aha, that's interesting; I overlooked that line. I'll debug some more today,
because I know for sure that those options didn't make it onto the command
line when I ran it in my debugger.


> Can you please open a JIRA ticket and also describe the symptoms? This
> might be related, or even the same issue: SPARK-11280 and SPARK-11327.
>

SPARK-11327 is exactly
my problem, but I don't run Docker.

 - Jo

On Tue, Nov 17, 2015 at 2:46 AM, Jo Voordeckers 
> wrote:
>
>>
>> Hi all,
>>
>> I'm running the Mesos cluster dispatcher; however, when I submit jobs,
>> things like JVM args, classpath order, and the UI port aren't added to the
>> command line executed by the Mesos scheduler. In fact, it only cares about
>> the class, the jar, and the cores/memory settings.
>>
>>
>> https://github.com/jayv/spark/blob/mesos_cluster_params/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala#L412-L424
>>
>> I've made an attempt at adding a few of the args that I believe are
>> useful to the MesosClusterScheduler class, which seems to solve my problem.
>>
>> Please have a look:
>>
>> https://github.com/apache/spark/pull/9752
>>
>> Thanks
>>
>> - Jo Voordeckers
>>
>>
>
>
> --
>
> --
> Iulian Dragos
>
> --
> Reactive Apps on the JVM
> www.typesafe.com
>
>


Re: Mesos cluster dispatcher doesn't respect most args from the submit req

2015-11-17 Thread Jo Voordeckers
Hi Tim,

I've done more forensics on this bug; see my comment here:

https://issues.apache.org/jira/browse/SPARK-11327?focusedCommentId=15009843&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15009843


- Jo Voordeckers


On Tue, Nov 17, 2015 at 12:01 PM, Timothy Chen  wrote:

> Hi Jo,
>
> Thanks for the links. I would have expected the properties to be in the
> scheduler properties, but I need to double-check.
>
> I'll be looking into these problems this week.
>
> Tim
>
> On Tue, Nov 17, 2015 at 10:28 AM, Jo Voordeckers
>  wrote:
> > On Tue, Nov 17, 2015 at 5:16 AM, Iulian Dragoș <
> iulian.dra...@typesafe.com>
> > wrote:
> >>
> >> I think it actually tries to send all properties as part of
> >> `SPARK_EXECUTOR_OPTS`, which may not be everything that's needed:
> >>
> >>
> >>
> https://github.com/jayv/spark/blob/mesos_cluster_params/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala#L375-L377
> >>
> >
> > Aha, that's interesting; I overlooked that line. I'll debug some more
> > today, because I know for sure that those options didn't make it onto the
> > command line when I ran it in my debugger.
> >
> >>
> >> Can you please open a JIRA ticket and also describe the symptoms? This
> >> might be related, or even the same issue: SPARK-11280 and SPARK-11327.
> >
> >
> > SPARK-11327 is exactly my problem, but I don't run Docker.
> >
> >  - Jo
> >
> >> On Tue, Nov 17, 2015 at 2:46 AM, Jo Voordeckers <
> jo.voordeck...@gmail.com>
> >> wrote:
> >>>
> >>>
> >>> Hi all,
> >>>
> >>> I'm running the Mesos cluster dispatcher; however, when I submit jobs,
> >>> things like JVM args, classpath order, and the UI port aren't added to
> >>> the command line executed by the Mesos scheduler. In fact, it only cares
> >>> about the class, the jar, and the cores/memory settings.
> >>>
> >>>
> >>>
> https://github.com/jayv/spark/blob/mesos_cluster_params/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala#L412-L424
> >>>
> >>> I've made an attempt at adding a few of the args that I believe are
> >>> useful to the MesosClusterScheduler class, which seems to solve my
> problem.
> >>>
> >>> Please have a look:
> >>>
> >>> https://github.com/apache/spark/pull/9752
> >>>
> >>> Thanks
> >>>
> >>> - Jo Voordeckers
> >>>
> >>
> >>
> >>
> >> --
> >>
> >> --
> >> Iulian Dragos
> >>
> >> --
> >> Reactive Apps on the JVM
> >> www.typesafe.com
> >>
> >
>


Re: Does anyone meet the issue that jars under lib_managed is never downloaded ?

2015-11-17 Thread Jeff Zhang
Sure, the Hive profile is enabled.

On Wed, Nov 18, 2015 at 6:12 AM, Josh Rosen 
wrote:

> Is the Hive profile enabled? I think it may need to be turned on in order
> for those JARs to be deployed.
>
> On Tue, Nov 17, 2015 at 2:27 AM Jeff Zhang  wrote:
>
>> BTW, after I reverted SPARK-7841, I can see all the jars under
>> lib_managed/jars.
>>
>> On Tue, Nov 17, 2015 at 2:46 PM, Jeff Zhang  wrote:
>>
>>> Hi Josh,
>>>
>>> I noticed the comments in https://github.com/apache/spark/pull/9575 saying
>>> that the Datanucleus-related jars will still be copied to lib_managed/jars,
>>> but I don't see any jars under lib_managed/jars. The weird thing is that I
>>> see the jars on another machine but not on my laptop, even after I delete
>>> the whole Spark project and start from scratch. Could it be related to the
>>> environment? I tried adding the following code to SparkBuild.scala to track
>>> the issue; it shows that the jars list is empty. Any thoughts on that?
>>>
>>>
>>> deployDatanucleusJars := {
>>>   val jars: Seq[File] = (fullClasspath in assembly).value.map(_.data)
>>> .filter(_.getPath.contains("org.datanucleus"))
>>>   // this is what I added
>>>   println("*")
>>>   println("fullClasspath:"+fullClasspath)
>>>   println("assembly:"+assembly)
>>>   println("jars:"+jars.map(_.getAbsolutePath()).mkString(","))
>>>   //
>>>
>>>
>>> On Mon, Nov 16, 2015 at 4:51 PM, Jeff Zhang  wrote:
>>>
 This is the exception I got

 15/11/16 16:50:48 WARN metastore.HiveMetaStore: Retrying creating
 default database after error: Class
 org.datanucleus.api.jdo.JDOPersistenceManagerFactory was not found.
 javax.jdo.JDOFatalUserException: Class
 org.datanucleus.api.jdo.JDOPersistenceManagerFactory was not found.
 at
 javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1175)
 at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808)
 at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701)
 at
 org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:365)
 at
 org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:394)
 at
 org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:291)
 at
 org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:258)
 at
 org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
 at
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
 at
 org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:57)
 at
 org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:66)
 at
 org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:593)
 at
 org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:571)
 at
 org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:620)
 at
 org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:461)
 at
 org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:66)
 at
 org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:72)
 at
 org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5762)
 at
 org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:199)
 at
 org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:74)
 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
 at
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
 at
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
 at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
 at
 org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1521)
 at
 org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:86)

 On Mon, Nov 16, 2015 at 4:47 PM, Jeff Zhang  wrote:

> It's about the Datanucleus-related jars, which are needed by Spark SQL.
> Without these jars, I could not call the DataFrame-related API (I have
> HiveContext enabled).
>
>
>
> On Mon, Nov 16, 2015 at 4:10 PM, Josh Rosen 
> wrote:
>
>> As of https://github.com/apache/spark/pull/9575, Spark's build will
>> no longer place every dependency JAR into lib_managed. Can you say more
>> about how this affected spark-shell for you (maybe share a stacktrace)?
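
For reference, the deployDatanucleusJars snippet quoted above is truncated;
from memory, the remainder of that task in SparkBuild.scala copies the
filtered jars into lib_managed/jars, roughly like this (a paraphrase, not
the verbatim source):

    deployDatanucleusJars := {
      val jars: Seq[File] = (fullClasspath in assembly).value.map(_.data)
        .filter(_.getPath.contains("org.datanucleus"))
      val libManagedJars = new File(BuildCommons.sparkHome, "lib_managed/jars")
      libManagedJars.mkdirs()
      jars.foreach { jar =>
        val dest = new File(libManagedJars, jar.getName)
        if (!dest.exists()) {
          java.nio.file.Files.copy(jar.toPath, dest.toPath)
        }
      }
    }

So if `jars` prints as empty, the filter found no Datanucleus entries on the
assembly classpath, which is exactly what the Hive profile would normally
contribute.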

Re: Mesos cluster dispatcher doesn't respect most args from the submit req

2015-11-17 Thread Timothy Chen
Hi Jo,

Thanks for the links. I would have expected the properties to be in the
scheduler properties, but I need to double-check.

I'll be looking into these problems this week.

Tim

On Tue, Nov 17, 2015 at 10:28 AM, Jo Voordeckers
 wrote:
> On Tue, Nov 17, 2015 at 5:16 AM, Iulian Dragoș 
> wrote:
>>
>> I think it actually tries to send all properties as part of
>> `SPARK_EXECUTOR_OPTS`, which may not be everything that's needed:
>>
>>
>> https://github.com/jayv/spark/blob/mesos_cluster_params/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala#L375-L377
>>
>
> Aha, that's interesting; I overlooked that line. I'll debug some more
> today, because I know for sure that those options didn't make it onto the
> command line when I ran it in my debugger.
>
>>
>> Can you please open a JIRA ticket and also describe the symptoms? This
>> might be related, or even the same issue: SPARK-11280 and SPARK-11327.
>
>
> SPARK-11327 is exactly my problem, but I don't run Docker.
>
>  - Jo
>
>> On Tue, Nov 17, 2015 at 2:46 AM, Jo Voordeckers 
>> wrote:
>>>
>>>
>>> Hi all,
>>>
>>> I'm running the Mesos cluster dispatcher; however, when I submit jobs,
>>> things like JVM args, classpath order, and the UI port aren't added to the
>>> command line executed by the Mesos scheduler. In fact, it only cares about
>>> the class, the jar, and the cores/memory settings.
>>>
>>>
>>> https://github.com/jayv/spark/blob/mesos_cluster_params/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala#L412-L424
>>>
>>> I've made an attempt at adding a few of the args that I believe are
>>> useful to the MesosClusterScheduler class, which seems to solve my problem.
>>>
>>> Please have a look:
>>>
>>> https://github.com/apache/spark/pull/9752
>>>
>>> Thanks
>>>
>>> - Jo Voordeckers
>>>
>>
>>
>>
>> --
>>
>> --
>> Iulian Dragos
>>
>> --
>> Reactive Apps on the JVM
>> www.typesafe.com
>>
>

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Does anyone meet the issue that jars under lib_managed is never downloaded ?

2015-11-17 Thread Josh Rosen
Can you file a JIRA issue to help me triage this further? Thanks!

On Tue, Nov 17, 2015 at 4:08 PM Jeff Zhang  wrote:

> Sure, the Hive profile is enabled.
>
> On Wed, Nov 18, 2015 at 6:12 AM, Josh Rosen 
> wrote:
>
>> Is the Hive profile enabled? I think it may need to be turned on in order
>> for those JARs to be deployed.
>>
>> On Tue, Nov 17, 2015 at 2:27 AM Jeff Zhang  wrote:
>>
>>> BTW, after I reverted SPARK-7841, I can see all the jars under
>>> lib_managed/jars.
>>>
>>> On Tue, Nov 17, 2015 at 2:46 PM, Jeff Zhang  wrote:
>>>
 Hi Josh,

 I noticed the comments in https://github.com/apache/spark/pull/9575 saying
 that the Datanucleus-related jars will still be copied to
 lib_managed/jars, but I don't see any jars under lib_managed/jars.
 The weird thing is that I see the jars on another machine but not on my
 laptop, even after I delete the whole Spark project and start from
 scratch. Could it be related to the environment? I tried adding the
 following code to SparkBuild.scala to track the issue; it shows that the
 jars list is empty. Any thoughts on that?


 deployDatanucleusJars := {
   val jars: Seq[File] = (fullClasspath in
 assembly).value.map(_.data)
 .filter(_.getPath.contains("org.datanucleus"))
   // this is what I added
   println("*")
   println("fullClasspath:"+fullClasspath)
   println("assembly:"+assembly)
   println("jars:"+jars.map(_.getAbsolutePath()).mkString(","))
   //


 On Mon, Nov 16, 2015 at 4:51 PM, Jeff Zhang  wrote:

> This is the exception I got
>
> 15/11/16 16:50:48 WARN metastore.HiveMetaStore: Retrying creating
> default database after error: Class
> org.datanucleus.api.jdo.JDOPersistenceManagerFactory was not found.
> javax.jdo.JDOFatalUserException: Class
> org.datanucleus.api.jdo.JDOPersistenceManagerFactory was not found.
> at
> javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1175)
> at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808)
> at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701)
> at
> org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:365)
> at
> org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:394)
> at
> org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:291)
> at
> org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:258)
> at
> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
> at
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
> at
> org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:57)
> at
> org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:66)
> at
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:593)
> at
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:571)
> at
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:620)
> at
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:461)
> at
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:66)
> at
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:72)
> at
> org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5762)
> at
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:199)
> at
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:74)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
> Method)
> at
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
> at
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
> at
> org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1521)
> at
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:86)
>
> On Mon, Nov 16, 2015 at 4:47 PM, Jeff Zhang  wrote:
>
>> It's about the Datanucleus-related jars, which are needed by Spark SQL.
>> Without these jars, I could not call the DataFrame-related API (I have
>> HiveContext enabled).
>>
>>
>>
>> On Mon, Nov 16, 2015 at 4:10 PM, Josh Rosen 

Re: Does anyone meet the issue that jars under lib_managed is never downloaded ?

2015-11-17 Thread Jeff Zhang
Created https://issues.apache.org/jira/browse/SPARK-11798



On Wed, Nov 18, 2015 at 9:42 AM, Josh Rosen 
wrote:

> Can you file a JIRA issue to help me triage this further? Thanks!
>
> On Tue, Nov 17, 2015 at 4:08 PM Jeff Zhang  wrote:
>
>> Sure, the Hive profile is enabled.
>>
>> On Wed, Nov 18, 2015 at 6:12 AM, Josh Rosen 
>> wrote:
>>
>>> Is the Hive profile enabled? I think it may need to be turned on in
>>> order for those JARs to be deployed.
>>>
>>> On Tue, Nov 17, 2015 at 2:27 AM Jeff Zhang  wrote:
>>>
 BTW, after I reverted SPARK-7841, I can see all the jars under
 lib_managed/jars.

 On Tue, Nov 17, 2015 at 2:46 PM, Jeff Zhang  wrote:

> Hi Josh,
>
> I noticed the comments in https://github.com/apache/spark/pull/9575 saying
> that the Datanucleus-related jars will still be copied to
> lib_managed/jars, but I don't see any jars under lib_managed/jars.
> The weird thing is that I see the jars on another machine but not on my
> laptop, even after I delete the whole Spark project and start from
> scratch. Could it be related to the environment? I tried adding the
> following code to SparkBuild.scala to track the issue; it shows that the
> jars list is empty. Any thoughts on that?
>
>
> deployDatanucleusJars := {
>   val jars: Seq[File] = (fullClasspath in
> assembly).value.map(_.data)
> .filter(_.getPath.contains("org.datanucleus"))
>   // this is what I added
>   println("*")
>   println("fullClasspath:"+fullClasspath)
>   println("assembly:"+assembly)
>   println("jars:"+jars.map(_.getAbsolutePath()).mkString(","))
>   //
>
>
> On Mon, Nov 16, 2015 at 4:51 PM, Jeff Zhang  wrote:
>
>> This is the exception I got
>>
>> 15/11/16 16:50:48 WARN metastore.HiveMetaStore: Retrying creating
>> default database after error: Class
>> org.datanucleus.api.jdo.JDOPersistenceManagerFactory was not found.
>> javax.jdo.JDOFatalUserException: Class
>> org.datanucleus.api.jdo.JDOPersistenceManagerFactory was not found.
>> at
>> javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1175)
>> at
>> javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808)
>> at
>> javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701)
>> at
>> org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:365)
>> at
>> org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:394)
>> at
>> org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:291)
>> at
>> org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:258)
>> at
>> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
>> at
>> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
>> at
>> org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:57)
>> at
>> org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:66)
>> at
>> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:593)
>> at
>> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:571)
>> at
>> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:620)
>> at
>> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:461)
>> at
>> org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:66)
>> at
>> org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:72)
>> at
>> org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5762)
>> at
>> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:199)
>> at
>> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:74)
>> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
>> Method)
>> at
>> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>> at
>> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>> at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>> at
>> org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1521)
>> at
>> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:86)
>>
>> On Mon, Nov 16, 2015 at 4:47 PM, Jeff Zhang  wrote:
>>
>>> 

Fwd: zeppelin (or spark-shell) with HBase fails on executor level

2015-11-17 Thread 임정택
HI,

First of all, I'm sorry if you have received this mail before (on the Apache
Spark user mailing list).
I posted this mail to the user mailing list, but I didn't receive any
information about resolving the issue, so I'd like to post it to the dev
mailing list.

The link below points to the thread I'm forwarding, so if you find it more
convenient to refer to the mail archive, please use it:

https://mail-archives.apache.org/mod_mbox/spark-user/201511.mbox/%3CCAF5108jMXyOjiGmCgr%3Ds%2BNvTMcyKWMBVM1GsrH7Pz4xUj48LfA%40mail.gmail.com%3E

This behavior seems a bit odd to me, so I'd like to get any hints on how to
resolve it, or to report a bug if it is one.

Thanks!
Jungtaek Lim (HeartSaVioR)


-- Forwarded message --
From: 임정택 
Date: 2015-11-17 18:01 GMT+09:00
Subject: zeppelin (or spark-shell) with HBase fails on executor level
To: u...@spark.apache.org


Hi all,

I'm evaluating Zeppelin to run a driver which interacts with HBase.
I use a fat jar to include the HBase dependencies, and I see failures at the
executor level.
I thought it was a Zeppelin issue, but it fails in spark-shell, too.

I loaded the fat jar via the --jars option:

> ./bin/spark-shell --jars hbase-included-assembled.jar

and ran the driver code using the provided SparkContext instance, and I see
failures in the spark-shell console and the executor logs.
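
For reference, a minimal sketch of the kind of driver code involved
(inferred from the stack traces below; the table name is an assumption):

    import org.apache.hadoop.hbase.HBaseConfiguration
    import org.apache.hadoop.hbase.client.Result
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable
    import org.apache.hadoop.hbase.mapreduce.TableInputFormat

    val conf = HBaseConfiguration.create()
    conf.set(TableInputFormat.INPUT_TABLE, "my_table") // assumed table name
    // sc is the SparkContext provided by spark-shell / zeppelin
    val rdd = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
      classOf[ImmutableBytesWritable], classOf[Result])
    println(rdd.count())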

Below are the stack traces:

org.apache.spark.SparkException: Job aborted due to stage failure:
Task 55 in stage 0.0 failed 4 times, most recent failure: Lost task
55.3 in stage 0.0 (TID 281, ):
java.lang.NoClassDefFoundError: Could not initialize class
org.apache.hadoop.hbase.client.HConnectionManager
at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:197)
at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:159)
at 
org.apache.hadoop.hbase.mapreduce.TableInputFormat.setConf(TableInputFormat.java:101)
at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:128)
at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:104)
at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:66)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:70)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1273)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1264)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1263)
at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at 
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1263)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
at scala.Option.foreach(Option.scala:236)
at 
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:730)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1457)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1418)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)


15/11/16 18:59:57 ERROR Executor: Exception in task 14.0 in stage 0.0 (TID 14)
java.lang.ExceptionInInitializerError
at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:197)
at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:159)
at 
org.apache.hadoop.hbase.mapreduce.TableInputFormat.setConf(TableInputFormat.java:101)
at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:128)
at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:104)
at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:66)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at 
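
Since "Could not initialize class" means the class's static initializer
already failed once, one way to surface the root cause is to force-load the
class on the executors and then check the executor logs for the original
ExceptionInInitializerError. A debugging sketch (assuming the spark-shell
SparkContext `sc` and the same --jars setup):

    sc.parallelize(1 to 100, 100).foreachPartition { _ =>
      try {
        // Triggers HConnectionManager's static initializer on the executor.
        Class.forName("org.apache.hadoop.hbase.client.HConnectionManager")
      } catch {
        case t: Throwable => t.printStackTrace() // root cause lands in executor stderr
      }
    }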

Re: Does anyone meet the issue that jars under lib_managed is never downloaded ?

2015-11-17 Thread Jeff Zhang
BTW, after I reverted SPARK-7841, I can see all the jars under
lib_managed/jars.

On Tue, Nov 17, 2015 at 2:46 PM, Jeff Zhang  wrote:

> Hi Josh,
>
> I noticed the comments in https://github.com/apache/spark/pull/9575 saying
> that the Datanucleus-related jars will still be copied to lib_managed/jars,
> but I don't see any jars under lib_managed/jars. The weird thing is that I
> see the jars on another machine but not on my laptop, even after I delete
> the whole Spark project and start from scratch. Could it be related to the
> environment? I tried adding the following code to SparkBuild.scala to track
> the issue; it shows that the jars list is empty. Any thoughts on that?
>
>
> deployDatanucleusJars := {
>   val jars: Seq[File] = (fullClasspath in assembly).value.map(_.data)
> .filter(_.getPath.contains("org.datanucleus"))
>   // this is what I added
>   println("*")
>   println("fullClasspath:"+fullClasspath)
>   println("assembly:"+assembly)
>   println("jars:"+jars.map(_.getAbsolutePath()).mkString(","))
>   //
>
>
> On Mon, Nov 16, 2015 at 4:51 PM, Jeff Zhang  wrote:
>
>> This is the exception I got
>>
>> 15/11/16 16:50:48 WARN metastore.HiveMetaStore: Retrying creating default
>> database after error: Class
>> org.datanucleus.api.jdo.JDOPersistenceManagerFactory was not found.
>> javax.jdo.JDOFatalUserException: Class
>> org.datanucleus.api.jdo.JDOPersistenceManagerFactory was not found.
>> at
>> javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1175)
>> at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808)
>> at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701)
>> at
>> org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:365)
>> at
>> org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:394)
>> at
>> org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:291)
>> at
>> org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:258)
>> at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
>> at
>> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
>> at
>> org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:57)
>> at
>> org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:66)
>> at
>> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:593)
>> at
>> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:571)
>> at
>> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:620)
>> at
>> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:461)
>> at
>> org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:66)
>> at
>> org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:72)
>> at
>> org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5762)
>> at
>> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:199)
>> at
>> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:74)
>> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>> at
>> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>> at
>> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>> at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>> at
>> org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1521)
>> at
>> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:86)
>>
>> On Mon, Nov 16, 2015 at 4:47 PM, Jeff Zhang  wrote:
>>
>>> It's about the Datanucleus-related jars, which are needed by Spark SQL.
>>> Without these jars, I could not call the DataFrame-related API (I have
>>> HiveContext enabled).
>>>
>>>
>>>
>>> On Mon, Nov 16, 2015 at 4:10 PM, Josh Rosen 
>>> wrote:
>>>
 As of https://github.com/apache/spark/pull/9575, Spark's build will no
 longer place every dependency JAR into lib_managed. Can you say more about
 how this affected spark-shell for you (maybe share a stacktrace)?

 On Mon, Nov 16, 2015 at 12:03 AM, Jeff Zhang  wrote:

>
> Sometimes, the jars under lib_managed are missing. And after I rebuild
> Spark, the jars under lib_managed are still not downloaded. This causes
> spark-shell to fail due to the missing jars. Has anyone hit this weird
> issue?
>
>
>
> --
> Best Regards
>
> Jeff Zhang
>


>>>
>>>
>>> --
>>> Best Regards
>>>
>>> Jeff Zhang
>>>
>>
>>
>>
>> --
>> 

Re: Does anyone meet the issue that jars under lib_managed is never downloaded ?

2015-11-17 Thread Josh Rosen
Is the Hive profile enabled? I think it may need to be turned on in order
for those JARs to be deployed.
On Tue, Nov 17, 2015 at 2:27 AM Jeff Zhang  wrote:

> BTW, after I reverted SPARK-7841, I can see all the jars under
> lib_managed/jars.
>
> On Tue, Nov 17, 2015 at 2:46 PM, Jeff Zhang  wrote:
>
>> Hi Josh,
>>
>> I noticed the comments in https://github.com/apache/spark/pull/9575 saying
>> that the Datanucleus-related jars will still be copied to lib_managed/jars,
>> but I don't see any jars under lib_managed/jars. The weird thing is that I
>> see the jars on another machine but not on my laptop, even after I delete
>> the whole Spark project and start from scratch. Could it be related to the
>> environment? I tried adding the following code to SparkBuild.scala to track
>> the issue; it shows that the jars list is empty. Any thoughts on that?
>>
>>
>> deployDatanucleusJars := {
>>   val jars: Seq[File] = (fullClasspath in assembly).value.map(_.data)
>> .filter(_.getPath.contains("org.datanucleus"))
>>   // this is what I added
>>   println("*")
>>   println("fullClasspath:"+fullClasspath)
>>   println("assembly:"+assembly)
>>   println("jars:"+jars.map(_.getAbsolutePath()).mkString(","))
>>   //
>>
>>
>> On Mon, Nov 16, 2015 at 4:51 PM, Jeff Zhang  wrote:
>>
>>> This is the exception I got
>>>
>>> 15/11/16 16:50:48 WARN metastore.HiveMetaStore: Retrying creating
>>> default database after error: Class
>>> org.datanucleus.api.jdo.JDOPersistenceManagerFactory was not found.
>>> javax.jdo.JDOFatalUserException: Class
>>> org.datanucleus.api.jdo.JDOPersistenceManagerFactory was not found.
>>> at
>>> javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1175)
>>> at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808)
>>> at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701)
>>> at
>>> org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:365)
>>> at
>>> org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:394)
>>> at
>>> org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:291)
>>> at
>>> org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:258)
>>> at
>>> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
>>> at
>>> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
>>> at
>>> org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:57)
>>> at
>>> org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:66)
>>> at
>>> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:593)
>>> at
>>> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:571)
>>> at
>>> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:620)
>>> at
>>> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:461)
>>> at
>>> org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:66)
>>> at
>>> org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:72)
>>> at
>>> org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5762)
>>> at
>>> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:199)
>>> at
>>> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:74)
>>> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>> at
>>> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>>> at
>>> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>> at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>>> at
>>> org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1521)
>>> at
>>> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:86)
>>>
>>> On Mon, Nov 16, 2015 at 4:47 PM, Jeff Zhang  wrote:
>>>
 It's about the Datanucleus-related jars, which are needed by Spark SQL.
 Without these jars, I could not call the DataFrame-related API (I have
 HiveContext enabled).



 On Mon, Nov 16, 2015 at 4:10 PM, Josh Rosen 
 wrote:

> As of https://github.com/apache/spark/pull/9575, Spark's build will
> no longer place every dependency JAR into lib_managed. Can you say more
> about how this affected spark-shell for you (maybe share a stacktrace)?
>
> On Mon, Nov 16, 2015 at 12:03 AM, Jeff Zhang  wrote:
>
>>
>> Sometimes, the jars under lib_managed are missing. And after I rebuild
>> Spark, the jars under lib_managed are