Github user chesterxgchen commented on the pull request:

    https://github.com/apache/spark/pull/2786#issuecomment-62060857
  
    Tom,
    
     thanks for reviewing. 
    
    I am still working on the second PR, which I haven't submitted yet. The 
code is currently used in our application, and I am pulling it out of our 
codebase to make a PR for it. The current code only uses Akka for the 
communication; I would like to add Netty support as well before I submit the 
pull request, which is why I haven't submitted it yet. 
    
    The following are the use cases in our application, which show how the 
new APIs are used. I assume other applications will have similar use cases. 
    
    Our application doesn't use the spark-submit command line to run Spark. 
We submit both Hadoop and Spark jobs directly from our servlet application 
(Jetty). We are deploying in Yarn cluster mode, and we invoke the Spark 
Client (in the yarn module) directly. The Client can't call System.exit, 
which would shut down the Jetty JVM. 
    
    Our application submits and stops Spark jobs, monitors Spark job 
progress, gets state from the Spark jobs (for example, bad-data counters), 
and collects logging and exceptions. So far the communication is one-way 
after the job is submitted; we will move to two-way communication soon (for 
example, a long-running Spark context with different short Spark actions: 
distinct counts, sampling, filters, transformations, etc.; a pipeline of 
actions, but with feedback needed on each action, e.g. for visualization). 
    
    This particular pull request addresses only very limited requirements; 
the next PR will address the rest of the communication mentioned above. 
    
    1) Get Yarn container capacities before submitting Yarn applications 
        In our Spark job, we can use this callback in two ways: 
         A) We cap the requested memory if it is too large. For example, if 
the spark.executor.memory supplied by the client is larger than the Yarn max 
container memory, we reset spark.executor.memory to the Yarn max container 
memory minus overhead and send a message to the application (a UI message) 
telling the user that we reset the memory. Or we could simply throw an 
exception without submitting the job. 
          Users might also use the information about virtual cores for other 
validation. 
    
       B) We can dynamically estimate the executor memory based on the data 
size (if you have that information from previous processing steps) and the 
max memory available, rather than directly using a fixed memory size and 
potentially getting killed if it is too large. 
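    To make the capping rule in A) concrete, here is a minimal sketch of the arithmetic, assuming for illustration a 7% memory overhead with a 384 MB floor (roughly Spark's usual Yarn overhead defaults); the MemoryCap class and capExecutorMemory method are made-up names, not part of this PR:

```java
// Illustrative sketch only: cap the requested executor memory at the Yarn
// max container size minus an overhead allowance, instead of letting Yarn
// reject (or kill) an oversized container request.
public class MemoryCap {
    // Assumed overhead model for this sketch: max(384 MB, 7% of the request).
    static final int MIN_OVERHEAD_MB = 384;
    static final double OVERHEAD_FRACTION = 0.07;

    static int capExecutorMemory(int requestedMb, int yarnMaxContainerMb) {
        int overheadMb = Math.max(MIN_OVERHEAD_MB,
                (int) (requestedMb * OVERHEAD_FRACTION));
        // Largest executor heap that still fits in a container with overhead.
        int largestThatFitsMb = yarnMaxContainerMb - overheadMb;
        return Math.min(requestedMb, largestThatFitsMb);
    }

    public static void main(String[] args) {
        // User asks for 16 GB but Yarn containers max out at 8 GB: capped.
        System.out.println(capExecutorMemory(16384, 8192));  // prints 7046
        // A request that already fits is left unchanged.
        System.out.println(capExecutorMemory(4096, 8192));   // prints 4096
    }
}
```

    In the capped case the application would also surface a UI message (or throw, per choice A above) so the user knows the requested value was not honored.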
    
    2) Add callbacks via a listener to monitor Yarn application progress 
       We use the listener callbacks to show progress (limited information): 
        1) We get the tracking URL from Yarn, which can bring us directly to 
the Hadoop cluster application management page. 
          As soon as the Yarn container is created and the job is submitted, 
we have the tracking URL from Yarn (we need to watch out for invalid URLs); 
at this point you can put the URL in the UI, even though the Spark job has 
not started yet. 
        2) We display a progress bar in the UI with the callback. 
        On CDH5, we only get 0%, 10%, and 100%, which is not very useful but 
still gives the customer some early feedback. 
        3) We get the Yarn application ID when the job starts, which can be 
used for tracking progress or killing the app. 
     (With the next PR, we will be able to use the tracking URL directly to 
open the Spark UI page, show Spark job iterations, Spark-specific progress, 
etc.; currently all of the above is implemented in our application.) 
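    The callbacks in 1)-3) above could look roughly like the following sketch. The names (YarnAppListener, YarnAppMonitor) and the simulated event sequence are made up for illustration and are not the PR's actual API:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative listener interface for the three kinds of feedback described
// above: tracking URL, coarse progress, and the Yarn application id.
interface YarnAppListener {
    void onTrackingUrl(String url);      // available right after submission; may be invalid early
    void onProgress(int percent);        // coarse on CDH5: roughly 0, 10, 100
    void onApplicationId(String appId);  // usable later for tracking or killing the app
}

// Toy monitor that replays the lifecycle events a submitter might observe.
class YarnAppMonitor {
    private final List<YarnAppListener> listeners = new ArrayList<>();

    void addListener(YarnAppListener l) { listeners.add(l); }

    void simulateLifecycle() {
        for (YarnAppListener l : listeners) {
            // Hypothetical values, in the order the events typically arrive.
            l.onTrackingUrl("http://rm-host:8088/proxy/application_1_0001/");
            l.onApplicationId("application_1_0001");
            l.onProgress(0);
            l.onProgress(10);
            l.onProgress(100);
        }
    }
}

public class ListenerDemo {
    public static void main(String[] args) {
        YarnAppMonitor monitor = new YarnAppMonitor();
        monitor.addListener(new YarnAppListener() {
            public void onTrackingUrl(String url) { System.out.println("tracking URL: " + url); }
            public void onProgress(int percent) { System.out.println("progress: " + percent + "%"); }
            public void onApplicationId(String appId) { System.out.println("app id: " + appId); }
        });
        monitor.simulateLifecycle();
    }
}
```

    The application would wire its UI (progress bar, link to the Yarn page) to a listener like this instead of polling.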
    
    3) Expose the Yarn kill-application API 
    
        Yes, you can invoke it directly from the command line with 
          yarn kill -applicationId appid
    
        But since we need to call it from our application, we need an API to 
do this. In our application, if a client starts a job and then decides to 
stop it (running too long, changed parameters, etc.), we have to use the kill 
API to kill it, as the stop API doesn't stop it. 
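    The kind of programmatic kill we need could be sketched like this: a client-side handle that remembers the application id reported at submit time and exposes kill(), instead of shelling out to the yarn command line. YarnBackend and SparkJobHandle are illustrative names, not the PR's actual API:

```java
// Illustrative sketch of a programmatic kill API. The backend interface
// stands in for whatever actually talks to the ResourceManager (e.g.
// Hadoop's YarnClient); it is a placeholder here, not a real binding.
interface YarnBackend {
    void killApplication(String appId);
}

class SparkJobHandle {
    private final YarnBackend backend;
    private String appId;          // set once Yarn reports the application id
    private boolean killed = false;

    SparkJobHandle(YarnBackend backend) { this.backend = backend; }

    // Called from the submission listener when the application id arrives.
    void onApplicationId(String appId) { this.appId = appId; }

    // Kill the running application; a plain stop() does not terminate the
    // Yarn app. Returns false if there is nothing to kill.
    boolean kill() {
        if (appId == null || killed) return false;
        backend.killApplication(appId);
        killed = true;
        return true;
    }
}

public class KillDemo {
    public static void main(String[] args) {
        SparkJobHandle handle = new SparkJobHandle(appId ->
                System.out.println("killing " + appId));
        handle.onApplicationId("application_1_0001");  // hypothetical id
        handle.kill();  // prints: killing application_1_0001
    }
}
```

    Guarding on a missing or already-killed application id matters here because, as noted above, the id only becomes available some time after submission.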
    
    Hope this gives you a better picture of why this PR is important to us. 
    
    I will move faster with the next PR mentioned. 
    
    Thanks
    Chester