Re: Adding abstraction in MLlib

2014-09-17 Thread Xiangrui Meng
Hi Egor,

I posted the design doc for pipelines and parameters on the JIRA. Now
I'm working out some details of ML datasets, which I will post later
this week. Your feedback is welcome!

Best,
Xiangrui

On Mon, Sep 15, 2014 at 12:44 AM, Reynold Xin r...@databricks.com wrote:
 Hi Egor,

 Thanks for the suggestion. It is definitely our intention and practice to
 post design docs as soon as they are ready and to keep iteration cycles
 short. As a matter of fact, we encourage design docs for major features to
 be posted before implementation starts, and WIP pull requests for large
 features before they are fully baked.

 That said, no, not 100% of a committer's time is spent on a specific
 ticket. There are lots of tickets that are open for a long time before
 somebody starts actively working on them. So no, it is not true that all
 this time was active development. Xiangrui should post the design doc as
 soon as it is ready for feedback.



 On Sun, Sep 14, 2014 at 11:26 PM, Egor Pahomov pahomov.e...@gmail.com
 wrote:

 It's good that Databricks is working on this issue! However, the current
 process is not very clear to outsiders.

 The last update on this ticket was August 5. If all this time has been
 active development, I'm concerned that without community feedback for such
 a long time the development can head in the wrong direction.
 Even if it ends up as one great big patch, introducing the new interfaces
 to the community as soon as possible would allow us to start working on
 our pipeline code. It would allow us to write algorithms in the new
 paradigm instead of in the absence of any paradigm, as before, and it
 would allow us to help you migrate old code to the new paradigm.

 My main point: shorter iterations with more transparency.

 I think it would be a good idea to create a pull request with the code you
 have so far, even if it doesn't pass tests, just so we can comment on it
 before it is formalized in a design doc.


 2014-09-13 0:00 GMT+04:00 Patrick Wendell pwend...@gmail.com:

 We typically post design docs on JIRAs before major work starts. For
 instance, I'm pretty sure SPARK-1856 will have a design doc posted
 shortly.

 On Fri, Sep 12, 2014 at 12:10 PM, Erik Erlandson e...@redhat.com wrote:
 
  Are interface designs being captured anywhere as documents that the
  community can follow along with as the proposals evolve?
 
  I've worked on other open source projects where design docs were
  published as living documents (e.g. on Google Docs or Etherpad; the
  particular mechanism isn't crucial). FWIW, I found that to be a good way
  to work in a community environment.
 
 
  - Original Message -
  Hi Egor,
 
  Thanks for the feedback! We are aware of some of the issues you
  mentioned and there are JIRAs created for them. Specifically, I'm
  pushing out the design on pipeline features and algorithm/model
  parameters this week. We can move our discussion to
  https://issues.apache.org/jira/browse/SPARK-1856 .
 
  It would be nice to write tests against the interfaces, but it definitely
  needs more discussion before making PRs. For example, we discussed the
  learning interfaces in Christoph's PR
  (https://github.com/apache/spark/pull/2137/), but it takes time to
  reach a consensus, especially on interfaces. Hopefully all of us can
  benefit from the discussion. The best practice is to break the proposal
  down into small independent pieces and discuss them on the JIRA before
  submitting PRs.
 
  For performance tests, there is the spark-perf package
  (https://github.com/databricks/spark-perf), and we added performance
  tests for MLlib in v1.1. But definitely more work needs to be done.
 
  The dev list may not be a good place for design discussion. Could you
  create a JIRA for each of the issues you pointed out, so we can track
  the discussion there? Thanks!
 
  Best,
  Xiangrui
 
  On Fri, Sep 12, 2014 at 10:45 AM, Reynold Xin r...@databricks.com
  wrote:
   Xiangrui can comment more, but I believe he and Joseph are actually
   working on a standardized interface and the pipeline feature for the
   1.2 release.
  
   On Fri, Sep 12, 2014 at 8:20 AM, Egor Pahomov
   pahomov.e...@gmail.com
   wrote:
  
   Some architecture suggestions on this matter -
   https://github.com/apache/spark/pull/2371
  
   2014-09-12 16:38 GMT+04:00 Egor Pahomov pahomov.e...@gmail.com:
  
Sorry, I miswrote - I meant the learners part of the framework; the
models already exist.
   
2014-09-12 15:53 GMT+04:00 Christoph Sawade christoph.saw...@googlemail.com:

I totally agree, and we also discovered some drawbacks with the
classification model implementations that are based on GLMs:

- There is no distinction between predicting scores, classes, and
calibrated scores (probabilities). For these models it is common to have
access to all of them, and the prediction function ``predict`` should be
consistent and stateless. Currently, the score is only available after
removing the threshold from the
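
A minimal sketch of the kind of interface being described here - the trait
and method names are hypothetical, not MLlib's actual API:

    import org.apache.spark.mllib.linalg.Vector

    trait ClassificationModel {
      // Raw, uncalibrated score, e.g. the margin w . x + b
      def predictScore(features: Vector): Double

      // Calibrated score: a probability in [0, 1]
      def predictProbability(features: Vector): Double

      // Hard class label; the threshold is an explicit argument so the
      // model itself stays stateless and the three views stay consistent
      def predictClass(features: Vector, threshold: Double = 0.5): Double
    }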

Re: Network Communication - Akka or more?

2014-09-17 Thread Reynold Xin
I'm not familiar with InfiniBand, but I can chime in on the Spark part.

There are two kinds of communications in Spark: control plane and data
plane.  Task scheduling / dispatching is control, whereas fetching a block
(e.g. shuffle) is data.


On Tue, Sep 16, 2014 at 4:22 PM, Trident cw...@vip.qq.com wrote:

 Thank you for reading this mail.

 I'm trying to change the underlying network connection system of Spark to
 support InfiniBand.

 1. I'm not sure whether ConnectionManager and the Netty code are still
 under construction. It seems that they are not commonly used.


They are used for data plane communication. Broadcast, shuffle, all use
them.



 2. How much of the communication payload is carried by Akka?


Akka is mainly responsible for control, i.e. dispatching tasks, reporting
to the driver that a block was put into memory, etc.



 3. When running ./bin/run-example SparkPi, I noticed that the jar file is
 sent from the server to the client. That is scary because the jar is big.
 Is this common?


How are you going to distribute the jar file if you don't send it? The
workers need the bytecode for the classes you are going to execute.
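
For reference, the application jar is what carries that bytecode, and it is
normally shipped either by spark-submit or when constructing the context. A
minimal sketch (the jar path is a placeholder):

    import org.apache.spark.{SparkConf, SparkContext}

    // Ship the application jar to the cluster when building the context,
    // so executors can fetch the classes they are asked to run
    val conf = new SparkConf()
      .setAppName("SparkPi")
      .setJars(Seq("target/my-app-assembly.jar")) // placeholder path
    val sc = new SparkContext(conf)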


Re: Workflow Scheduler for Spark

2014-09-17 Thread Mark Hamstra
See https://issues.apache.org/jira/browse/SPARK-3530 and this doc,
referenced in that JIRA:

https://docs.google.com/document/d/1rVwXRjWKfIb-7PI6b86ipytwbUH7irSNLF1_6dLmh8o/edit?usp=sharing

On Wed, Sep 17, 2014 at 2:00 AM, Egor Pahomov pahomov.e...@gmail.com
wrote:

 I have problems using Oozie. For example, it doesn't keep a Spark context
 alive the way the Ooyala job server does. Outside of GUI interfaces like
 Hue it's hard to work with - scoozie stopped development a year ago (I
 spoke with the creator) and Oozie XML is very hard to write.
 Oozie still has all its documentation and code in the MapReduce model
 rather than the YARN model, and based on its current pace of development I
 can't expect radical changes in the near future. There is no Databricks
 for Oozie that would have people on salary to drive that kind of radical
 change. It's a dinosaur.

 Reynold, can you help me find this doc? Do you mean just pipelining Spark
 code, or additional logic such as task persistence, a job server, task
 retries, data availability, and so on?


 2014-09-17 11:21 GMT+04:00 Reynold Xin r...@databricks.com:

  Hi Egor,
 
  I think the design doc for the pipeline feature has been posted.
 
  For the workflow, I believe Oozie actually works fine with Spark if you
  want some external workflow system. Do you have any trouble using that?
 
 
  On Tue, Sep 16, 2014 at 11:45 PM, Egor Pahomov pahomov.e...@gmail.com
  wrote:
 
  There are two things we (Yandex) miss in Spark: good MLlib abstractions
  and a good workflow job scheduler. From the threads Adding abstraction
  in MLlib and [mllib] State of Multi-Model training I got the idea that
  Databricks is working on the former and that we should wait for the
  first posted doc to guide us.
  What about a workflow scheduler? Is anyone already working on one? Does
  anyone have a plan to do so?
 
  P.S. We thought that MLlib abstractions for running multiple algorithms
  over the same data would need such a scheduler, one that reruns an
  algorithm in case of failure. I understand that Spark provides fault
  tolerance out of the box, but we found an Oozie-like scheduler more
  reliable for such long-living workflows.
 
  --
 
 
 
  Sincerely yours,
  Egor Pakhomov
  Scala Developer, Yandex
 
 
 


 --



 Sincerely yours,
 Egor Pakhomov
 Scala Developer, Yandex



Re: Workflow Scheduler for Spark

2014-09-17 Thread Egor Pahomov
That doc is about the MLlib pipeline functionality. What about an
Oozie-like workflow scheduler?

2014-09-17 13:08 GMT+04:00 Mark Hamstra m...@clearstorydata.com:

 See https://issues.apache.org/jira/browse/SPARK-3530 and this doc,
 referenced in that JIRA:


 https://docs.google.com/document/d/1rVwXRjWKfIb-7PI6b86ipytwbUH7irSNLF1_6dLmh8o/edit?usp=sharing






-- 



Sincerely yours,
Egor Pakhomov
Scala Developer, Yandex


network.ConnectionManager error

2014-09-17 Thread wyphao.2007
Hi, when I run a Spark job on YARN the job finishes successfully, but I
found some error logs in the logfile, as follows (the ERROR lines were
originally highlighted in red):


14/09/17 18:25:03 INFO ui.SparkUI: Stopped Spark web UI at 
http://sparkserver2.cn:63937
14/09/17 18:25:03 INFO scheduler.DAGScheduler: Stopping DAGScheduler
14/09/17 18:25:03 INFO cluster.YarnClusterSchedulerBackend: Shutting down all 
executors
14/09/17 18:25:03 INFO cluster.YarnClusterSchedulerBackend: Asking each 
executor to shut down
14/09/17 18:25:03 INFO network.ConnectionManager: Removing SendingConnection to 
ConnectionManagerId(sparkserver2.cn,9072)
14/09/17 18:25:03 INFO network.ConnectionManager: Removing ReceivingConnection 
to ConnectionManagerId(sparkserver2.cn,9072)
14/09/17 18:25:03 ERROR network.ConnectionManager: Corresponding 
SendingConnection to ConnectionManagerId(sparkserver2.cn,9072) not found
14/09/17 18:25:03 INFO network.ConnectionManager: Removing ReceivingConnection 
to ConnectionManagerId(sparkserver2.cn,14474)
14/09/17 18:25:03 INFO network.ConnectionManager: Removing SendingConnection to 
ConnectionManagerId(sparkserver2.cn,14474)
14/09/17 18:25:03 INFO network.ConnectionManager: Removing SendingConnection to 
ConnectionManagerId(sparkserver2.cn,14474)
14/09/17 18:25:04 INFO spark.MapOutputTrackerMasterActor: MapOutputTrackerActor 
stopped!
14/09/17 18:25:04 INFO network.ConnectionManager: Selector thread was 
interrupted!
14/09/17 18:25:04 INFO network.ConnectionManager: Removing SendingConnection to 
ConnectionManagerId(sparkserver2.cn,9072)
14/09/17 18:25:04 INFO network.ConnectionManager: Removing SendingConnection to 
ConnectionManagerId(sparkserver2.cn,14474)
14/09/17 18:25:04 INFO network.ConnectionManager: Removing ReceivingConnection 
to ConnectionManagerId(sparkserver2.cn,9072)
14/09/17 18:25:04 ERROR network.ConnectionManager: Corresponding 
SendingConnection to ConnectionManagerId(sparkserver2.cn,9072) not found
14/09/17 18:25:04 INFO network.ConnectionManager: Removing ReceivingConnection 
to ConnectionManagerId(sparkserver2.cn,14474)
14/09/17 18:25:04 ERROR network.ConnectionManager: Corresponding 
SendingConnection to ConnectionManagerId(sparkserver2.cn,14474) not found
14/09/17 18:25:04 WARN network.ConnectionManager: All connections not cleaned up
14/09/17 18:25:04 INFO network.ConnectionManager: ConnectionManager stopped
14/09/17 18:25:04 INFO storage.MemoryStore: MemoryStore cleared
14/09/17 18:25:04 INFO storage.BlockManager: BlockManager stopped
14/09/17 18:25:04 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
14/09/17 18:25:04 INFO spark.SparkContext: Successfully stopped SparkContext
14/09/17 18:25:04 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster 
with SUCCEEDED
14/09/17 18:25:04 INFO remote.RemoteActorRefProvider$RemotingTerminator: 
Shutting down remote daemon.
14/09/17 18:25:04 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote 
daemon shut down; proceeding with flushing remote transports.
14/09/17 18:25:04 INFO impl.AMRMClientImpl: Waiting for application to be 
successfully unregistered.
14/09/17 18:25:04 INFO Remoting: Remoting shut down
14/09/17 18:25:04 INFO remote.RemoteActorRefProvider$RemotingTerminator: 
Remoting shut down.


What is the cause of this error? My Spark version is 1.1.0 and my Hadoop
version is 2.2.0.
Thank you.

Re: network.ConnectionManager error

2014-09-17 Thread Christian Chua
I see the same thing. 

A workaround is to put a Thread.sleep(5000) statement before sc.stop().

Let us know how it goes. 
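
In code, the workaround looks roughly like this - a stopgap sketch, not a
real fix, with an illustrative app name and job body:

    import org.apache.spark.{SparkConf, SparkContext}

    object MyJob {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("MyJob"))
        try {
          // ... job logic ...
        } finally {
          // Give in-flight connection teardown a moment to finish
          Thread.sleep(5000)
          sc.stop()
        }
      }
    }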



 On Sep 17, 2014, at 3:43 AM, wyphao.2007 wyphao.2...@163.com wrote:
 
 Hi, when I run a Spark job on YARN the job finishes successfully, but I
 found some error logs in the logfile [full log snipped; identical to the
 message above].




Re: [mllib] State of Multi-Model training

2014-09-17 Thread Kyle Ellrott
This sounds like a pretty major rewrite of the system. Is it going to live
in a different repo during development? Or will we be able to track
progress in the main Spark repo?

Kyle

On Tue, Sep 16, 2014 at 10:22 PM, Burak Yavuz bya...@stanford.edu wrote:

 Hi Kyle,

 Thank you for the code examples. We may be able to use some of the ideas
 there. I think the initial goal is to have the optimizers ready (SGD,
 LBFGS), and then the evaluation metrics will come next. It might take some
 time, however, as MLlib is going to have a significant API face-lift (e.g.
 https://issues.apache.org/jira/browse/SPARK-3530). Evaluation metrics
 will be significant in the new pipelines, and the ability to evaluate
 multiple models efficiently is very important. We encourage you to read
 through the design docs, and we would appreciate any feedback from you and
 the rest of the community!

 Best,
 Burak

 - Original Message -
 From: Kyle Ellrott kellr...@soe.ucsc.edu
 To: Burak Yavuz bya...@stanford.edu
 Cc: dev@spark.apache.org
 Sent: Tuesday, September 16, 2014 9:41:45 PM
 Subject: Re: [mllib] State of Multi-Model training

 I'd be interested in helping to test your code as soon as it's available.
 The version I wrote used a paired RDD and combined by key; it worked best
 with a custom partitioner that put all of a model's samples in the same
 place. Running things in batched matrices would probably speed things up
 greatly. You probably won't need my training code, but I did write some
 code related to calculating binary classification metrics
 (https://github.com/apache/spark/pull/1292/files#diff-6) and AUC
 (https://github.com/apache/spark/pull/1292/files#diff-5) for multiple
 models that you might be able to use.

 Kyle
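
The custom-partitioner idea Kyle describes looks roughly like this - a
sketch of the approach only, not the code from that PR, and the
(modelId, sampleId) key layout is an assumption:

    import org.apache.spark.Partitioner

    // Route every record belonging to one model to the same partition,
    // so each model's samples can be combined locally
    class ModelPartitioner(val numPartitions: Int) extends Partitioner {
      def getPartition(key: Any): Int = key match {
        case (modelId: Int, _) => math.abs(modelId) % numPartitions
        case other             => math.abs(other.hashCode) % numPartitions
      }
      override def equals(other: Any): Boolean = other match {
        case p: ModelPartitioner => p.numPartitions == numPartitions
        case _                   => false
      }
      override def hashCode: Int = numPartitions
    }

An RDD keyed that way would pass this partitioner to combineByKey alongside
the combiner functions.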


 On Tue, Sep 16, 2014 at 4:09 PM, Burak Yavuz bya...@stanford.edu wrote:

  Hi Kyle,
 
  I'm actively working on it now. It's pretty close to completion; I'm just
  trying to figure out bottlenecks and optimize as much as possible.
  As Phase 1, I implemented multi-model training for Gradient Descent.
  Instead of performing vector-vector operations on the rows (examples) and
  weights, I've batched them into matrices so that we can use Level 3 BLAS
  to speed things up. I've also added support for sparse matrices
  (https://github.com/apache/spark/pull/2294), as making use of sparsity
  will allow you to train more models at once.
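
  The batching idea, in rough form: stack the k weight vectors into a d x k
  matrix W and a batch of n examples into an n x d matrix X; a single X * W
  product (one Level 3 BLAS GEMM call) then yields the margins of every
  model on every example, instead of n * k separate dot products. A toy
  illustration with Breeze, not the actual implementation:

      import breeze.linalg.DenseMatrix

      val n = 1000 // examples in the batch
      val d = 10   // features
      val k = 5    // models trained simultaneously

      val X = DenseMatrix.rand(n, d) // one example per row
      val W = DenseMatrix.rand(d, k) // one model's weights per column

      // One GEMM computes all n * k margins at once
      val margins: DenseMatrix[Double] = X * W // n x k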
 
  Best,
  Burak
 
  - Original Message -
  From: Kyle Ellrott kellr...@soe.ucsc.edu
  To: dev@spark.apache.org
  Sent: Tuesday, September 16, 2014 3:21:53 PM
  Subject: [mllib] State of Multi-Model training
 
  I'm curious about the state of development of multi-model learning in
  MLlib (training sets of models during the same training session, rather
  than one at a time). The JIRA lists it as in progress, targeting Spark
  1.2.0 (https://issues.apache.org/jira/browse/SPARK-1486), but there
  haven't been any notes on it in over a month.
  I submitted a pull request with a possible method for doing this work a
  little over two months ago (https://github.com/apache/spark/pull/1292),
  but haven't received any feedback on the patch yet.
  Is anybody else working on multi-model training?
 
  Kyle
 
 




Re: network.ConnectionManager error

2014-09-17 Thread Reynold Xin
This is during shutdown, right? It looks OK to me, since connections are
being closed. We could handle this more gracefully, but the logs look
harmless.

On Wednesday, September 17, 2014, wyphao.2007 wyphao.2...@163.com wrote:

 Hi, when I run a Spark job on YARN the job finishes successfully, but I
 found some error logs in the logfile [full log snipped; identical to the
 original message].





Re: problem with HiveContext inside Actor

2014-09-17 Thread Michael Armbrust
- dev

Is it possible that you are constructing more than one HiveContext in a
single JVM? Due to global state in the Hive code, this is not allowed.

Michael
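
A pattern that fits both constraints is to construct a single HiveContext
on the main thread, initialize it eagerly there (see Hao's note on
sessionState below), and hand that one instance to the actor. A sketch
under those assumptions - DbActor and the queries are illustrative:

    import akka.actor.{Actor, ActorSystem, Props}
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    // The actor receives the shared HiveContext; it never builds its own
    class DbActor(hive: HiveContext) extends Actor {
      def receive = {
        case query: String => sender ! hive.sql(query).collect()
      }
    }

    object Main {
      def main(args: Array[String]): Unit = {
        val sc   = new SparkContext(new SparkConf().setAppName("hive-actor"))
        val hive = new HiveContext(sc) // the only HiveContext in this JVM

        // Touch the context eagerly on the main thread so Hive's
        // SessionState/HiveConf are initialized before the actor runs
        hive.sql("SHOW TABLES")

        val system  = ActorSystem("db")
        val dbActor = system.actorOf(Props(new DbActor(hive)))
        dbActor ! "CREATE DATABASE IF NOT EXISTS demo"
      }
    }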

On Wed, Sep 17, 2014 at 7:21 PM, Cheng, Hao hao.ch...@intel.com wrote:

 Hi Du,

 I am not sure what you mean by “triggers the HiveContext to create a
 database” - did you create a subclass of HiveContext? Just be sure you
 call “HiveContext.sessionState” eagerly, since that sets the proper
 “hiveconf” into the SessionState; otherwise the HiveDriver will always get
 a null value when retrieving the HiveConf.



 Cheng Hao



 *From:* Du Li [mailto:l...@yahoo-inc.com.INVALID]
 *Sent:* Thursday, September 18, 2014 7:51 AM
 *To:* u...@spark.apache.org; dev@spark.apache.org
 *Subject:* problem with HiveContext inside Actor



 Hi,



 I wonder if anybody has had a similar experience or has any suggestions.



 I have an Akka Actor that processes database requests as high-level
 messages. Inside this Actor, it creates a HiveContext object that does the
 actual db work. The main thread creates the needed SparkContext and passes
 it in to the Actor, which uses it to create the HiveContext.



 When a message is sent to the Actor, it is processed properly except that,
 when the message triggers the HiveContext to create a database, it throws
 a NullPointerException in hive.ql.Driver.java, which suggests that its
 conf variable is not initialized.



 Ironically, it works fine if my main thread directly calls
 actor.hiveContext to create the database. The Spark version is 1.1.0.



 Thanks,

 Du