Re: Spark client reconnect to driver in yarn-cluster deployment mode

2015-01-20 Thread Andrew Or
Hi Preeze,

 Is there any designed way that the client connects back to the driver
(still running in YARN) for collecting results at a later stage?

No, there is no support built into Spark for this. For it to happen
seamlessly, the driver would have to start a server (pull model) or send the
results to some other server once the jobs complete (push model), both of
which add complexity to the driver. Alternatively, you can just poll the
output files that your application produces; e.g. you can have your driver
write the result of a count to a file and poll on that file. Something
like that.
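[Editor's note: the poll-on-file pattern described above could be sketched as
below. This is a minimal illustration using plain local-file IO in place of
HDFS; the result path, function names, and timeout values are all made up for
the example, not anything Spark provides.]

```python
# Sketch of the poll model: the driver writes its result to a well-known
# output path, and the detached client polls until the file appears.
# RESULT_PATH is a hypothetical location both sides agree on.
import os
import tempfile
import time

RESULT_PATH = "/tmp/spark-job-result.txt"

def driver_write_result(count: int, path: str = RESULT_PATH) -> None:
    """Driver side: write to a temp file, then rename into place, so the
    client never observes a half-written result (rename is atomic on POSIX)."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path))
    with os.fdopen(fd, "w") as f:
        f.write(str(count))
    os.replace(tmp, path)

def client_poll_result(path: str = RESULT_PATH,
                       interval: float = 0.1,
                       timeout: float = 30.0) -> int:
    """Client side: poll for the result file until it shows up or we give up."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.exists(path):
            with open(path) as f:
                return int(f.read())
        time.sleep(interval)
    raise TimeoutError(f"no result at {path} after {timeout}s")
```

In a real yarn-cluster job the driver would write to HDFS (or another shared
store) rather than a local path, but the shape of the pattern is the same.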

-Andrew

2015-01-19 5:59 GMT-08:00 Romi Kuntsman r...@totango.com:

 in yarn-client mode it only controls the environment of the executor
 launcher

 So either you use yarn-client mode, in which case your app keeps running
 and controls the process; or you use yarn-cluster mode, in which case you
 send a jar to YARN, and that jar should contain code to report the result
 back to you

 *Romi Kuntsman*, *Big Data Engineer*
  http://www.totango.com
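[Editor's note: the "report the result back to you" push model could be
sketched as below. A throwaway local HTTP listener stands in for a server the
client would actually run; the handler, helper names, and JSON payload shape
are invented for illustration and are not part of any Spark API.]

```python
# Sketch of the push model: code bundled in the submitted jar pushes the
# result to a server the client controls once the job finishes.
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

received = {}  # results the client-side listener has collected

class ResultHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        received.update(json.loads(body))
        self.send_response(200)
        self.end_headers()

    def log_message(self, *args):  # keep the sketch quiet
        pass

def driver_report_result(count: int, url: str) -> None:
    """Driver side (would live in the jar): POST the result when done."""
    data = json.dumps({"count": count}).encode()
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"})
    urllib.request.urlopen(req)

# Client side: start the listener on an ephemeral port, then (in real life)
# submit the job and wait; here we call the driver function directly.
server = HTTPServer(("127.0.0.1", 0), ResultHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
driver_report_result(42, f"http://127.0.0.1:{server.server_address[1]}/")
server.shutdown()
```

The trade-off, as Andrew notes above, is that this moves complexity into the
driver and requires a reachable endpoint on the client side.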

 On Thu, Jan 15, 2015 at 1:52 PM, preeze etan...@gmail.com wrote:

  From the official Spark documentation
  (http://spark.apache.org/docs/1.2.0/running-on-yarn.html):
 
  In yarn-cluster mode, the Spark driver runs inside an application master
  process which is managed by YARN on the cluster, and the client can go
 away
  after initiating the application.
 
  Is there any designed way that the client connects back to the driver
  (still
  running in YARN) for collecting results at a later stage?
 
 
 
  --
  View this message in context:
 
 http://apache-spark-developers-list.1001551.n3.nabble.com/Spark-client-reconnect-to-driver-in-yarn-cluster-deployment-mode-tp10122.html
  Sent from the Apache Spark Developers List mailing list archive at
  Nabble.com.
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
  For additional commands, e-mail: dev-h...@spark.apache.org
 
 


