RE: Spark Tasks Progress

2019-09-23 Thread Ilya Matiach
@Sultan Alamro<mailto:sultan.ala...@gmail.com> great question. I had a similar 
scenario, where workers needed to aggregate host:port information to 
initialize an MPI ring, and I used direct socket communication between the 
workers and the driver.

This is where the driver accepts sockets from workers:
https://github.com/Azure/mmlspark/blob/master/src/main/scala/com/microsoft/ml/spark/lightgbm/LightGBMUtils.scala#L125

And this is where the workers send information to the driver:
https://github.com/Azure/mmlspark/blob/master/src/main/scala/com/microsoft/ml/spark/lightgbm/TrainUtils.scala#L347

You could probably do something similar and send the partitionId or some other 
ID to the driver.
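
As a rough illustration of that pattern (this is not the mmlspark code itself; the object and variable names below are made up), the driver can open a ServerSocket on an ephemeral port, accept connections on a background thread, and have each task connect back and write its partition ID:

import java.io.{BufferedReader, BufferedWriter, InputStreamReader, OutputStreamWriter}
import java.net.{InetAddress, ServerSocket, Socket}
import org.apache.spark.TaskContext
import org.apache.spark.sql.SparkSession

// Illustrative sketch only: a driver-side listener plus worker-side senders.
object ProgressSocketSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("progress-socket-sketch").getOrCreate()
    val sc = spark.sparkContext

    // Driver side: listen on an ephemeral port for worker connections.
    val server = new ServerSocket(0)
    // In a real cluster, use an address the executors can actually reach.
    val driverHost = InetAddress.getLocalHost.getHostAddress
    val driverPort = server.getLocalPort

    // Background thread on the driver that reads one line per connection.
    val listener = new Thread(new Runnable {
      override def run(): Unit = {
        try {
          while (!server.isClosed) {
            val socket = server.accept()
            val reader = new BufferedReader(new InputStreamReader(socket.getInputStream))
            println(s"driver received: ${reader.readLine()}") // e.g. "3:started"
            socket.close()
          }
        } catch {
          case _: java.net.SocketException => () // server closed, stop listening
        }
      }
    })
    listener.setDaemon(true)
    listener.start()

    // Worker side: each task connects back to the driver and reports its partition id.
    sc.parallelize(1 to 100, 4).foreachPartition { _ =>
      val partitionId = TaskContext.get().partitionId()
      val socket = new Socket(driverHost, driverPort)
      val writer = new BufferedWriter(new OutputStreamWriter(socket.getOutputStream))
      writer.write(s"$partitionId:started\n")
      writer.flush()
      socket.close()
    }

    server.close()
    spark.stop()
  }
}

Opening the server with new ServerSocket(0) avoids port clashes, and the host/port pair is captured by the task closure, so every task knows where to connect. The linked mmlspark code is a production version of the same idea.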

Hope this helps!
Thank you, Ilya


From: Sultan Alamro 
Sent: Friday, September 20, 2019 8:26 PM
To: dev@spark.apache.org
Subject: Spark Tasks Progress

Hi all,

I am trying to perform some actions on the Driver side in Spark while an application 
is running. The Driver needs to know the tasks' progress before making any 
decision. I know that task progress can be accessed within each executor or 
task from the RecordReader class by calling getProgress().

The question is, how can I let the Driver call, or have access to, the 
getProgress() method of each task? I thought about using broadcast and 
accumulator variables, but I don't know how the Driver would distinguish 
between different tasks.

Note that I am not looking for the results displayed in the Spark UI.

Any help is appreciated!
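
Regarding the accumulator idea raised in the question: one way to tell tasks apart is to tag every update with TaskContext.get().partitionId() (or taskAttemptId()). Below is a minimal, illustrative sketch (names are made up), assuming coarse per-partition counts are enough as a progress signal:

import org.apache.spark.TaskContext
import org.apache.spark.sql.SparkSession
import scala.collection.JavaConverters._

// Sketch: collect (partitionId, recordsProcessed) pairs so the driver can
// distinguish the progress updates of different tasks.
object AccumulatorProgressSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("acc-progress-sketch").getOrCreate()
    val sc = spark.sparkContext

    // CollectionAccumulator gathers (partitionId, count) pairs on the driver.
    val progress = sc.collectionAccumulator[(Int, Long)]("taskProgress")

    sc.parallelize(1 to 1000, 8).foreachPartition { iter =>
      val pid = TaskContext.get().partitionId()
      var count = 0L
      iter.foreach { _ =>
        count += 1
        if (count % 100 == 0) progress.add((pid, count)) // periodic progress tick
      }
      progress.add((pid, count)) // final count for this partition
    }

    // Group the updates by partition id and keep the latest count per task.
    val latestByTask = progress.value.asScala
      .groupBy(_._1)
      .map { case (pid, updates) => pid -> updates.map(_._2).max }
    println(latestByTask)

    spark.stop()
  }
}

One caveat: accumulator values observed on the driver while a job is still running are best-effort, and the exactly-once, guaranteed values are only available after the job completes, so for fine-grained real-time progress the socket approach above is more direct.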

