akshatb1 opened a new pull request #327:
URL: https://github.com/apache/incubator-livy/pull/327


   ## What changes were proposed in this pull request?
   Currently, Livy queries Yarn for applications by applicationType: SPARK. This 
puts a heavy load on Yarn clusters when there are thousands or more Spark 
applications across all states (running, finished, failed, queued, etc.).
   A better approach is to query applications by tag in addition to application 
type, since Livy only needs to track applications carrying certain application 
tags. However, YarnClient does not expose any API to query applications by tags.
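
   As an illustration, here is a minimal sketch of the tag-scoped query this 
change enables, built with Hadoop's `GetApplicationsRequest` record (the tag 
value is the one that appears in the logs below; it is otherwise arbitrary):

   ```scala
   import scala.collection.JavaConverters._

   import org.apache.hadoop.yarn.api.protocolrecords.GetApplicationsRequest

   // Ask the ResourceManager only for SPARK applications carrying the
   // Livy-generated tag, instead of every SPARK application on the cluster.
   val request = GetApplicationsRequest.newInstance()
   request.setApplicationTypes(Set("SPARK").asJava)
   request.setApplicationTags(Set("livy-batch-5-osefkl7m").asJava)
   ```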
   
   As part of this implementation, YarnClientImpl is extended with a 
getApplications method that takes a GetApplicationsRequest as a parameter. 
Instead of querying all SPARK applications, Livy queries only the SPARK 
applications with the required tags, avoiding load on both the Yarn and Livy 
servers.
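
   A minimal sketch of that extension, matching the YarnClientExt trace line 
below (the exact shape in the patch may differ; `rmClient` is the protected 
ResourceManager proxy field that YarnClientImpl exposes to subclasses):

   ```scala
   import java.util.{List => JList}

   import org.apache.hadoop.yarn.api.protocolrecords.GetApplicationsRequest
   import org.apache.hadoop.yarn.api.records.ApplicationReport
   import org.apache.hadoop.yarn.client.api.impl.YarnClientImpl

   class YarnClientExt extends YarnClientImpl {
     // Forward the fully parameterized request straight to the RM proxy, so
     // filtering by applicationTags happens on the ResourceManager side
     // rather than by fetching and filtering all SPARK applications in Livy.
     def getApplications(request: GetApplicationsRequest): JList[ApplicationReport] =
       rmClient.getApplications(request).getApplicationList
   }
   ```

   With this in place, callers pass a request like the one sketched above and 
get back reports only for the tagged applications.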
   
   JIRA: https://issues.apache.org/jira/browse/LIVY-866
   
   ## How was this patch tested?
   
   Verified in a local Yarn cluster. Checked in the trace logs that the request 
is sent with the applicationTags and that the response returns the application 
report. Please see the logs below.
   
   Verified that other calls to the Yarn client, such as 
getApplicationAttemptReport and getContainerReport, are successful.
   Updated existing tests to use the new YarnClientExt.
   
   ```
   21/09/07 15:38:50 TRACE YarnClientExt: getApplications called in YarnClientExt with GetApplicationsRequest, calling rmClient to get Applications
   21/09/07 15:38:50 TRACE ProtobufRpcEngine: 75: Call -> 0.0.0.0/0.0.0.0:8032: getApplications {application_types: "SPARK" applicationTags: "livy-batch-5-osefkl7m"}
   21/09/07 15:38:50 DEBUG Client: IPC Client (72154307) connection to 0.0.0.0/0.0.0.0:8032 from Administrator sending #28
   21/09/07 15:38:50 DEBUG Client: IPC Client (72154307) connection to 0.0.0.0/0.0.0.0:8032 from Administrator got value #28
   21/09/07 15:38:50 DEBUG ProtobufRpcEngine: Call: getApplications took 8ms
   21/09/07 15:38:50 TRACE ProtobufRpcEngine: 75: Response <- 0.0.0.0/0.0.0.0:8032: getApplications {applications { applicationId { id: 2 cluster_timestamp: 1631009244715 } user: "Administrator" queue: "default" name: "SparkBatchJobTest-8" host: "N/A" rpc_port: -1 yarn_application_state: ACCEPTED trackingUrl: "http://MININT-AHVKP1D:8088/proxy/application_1631009244715_0002/" diagnostics: "[Tue Sep 07 15:38:50 +0530 2021] Application is Activated, waiting for resources to be assigned for AM.  Details : AM Partition = <DEFAULT_PARTITION> ; Partition Resource = <memory:14336, vCores:8> ; Queue\'s Absolute capacity = 100.0 % ; Queue\'s Absolute used capacity = 0.0 % ; Queue\'s Absolute max capacity = 100.0 % ; " startTime: 1631009330477 finishTime: 0 final_application_status: APP_UNDEFINED app_resource_Usage { num_used_containers: 0 num_reserved_containers: 0 used_resources { memory: 0 virtual_cores: 0 3: "\n\tmemory-mb\020\000\032\002Mi \000" 3: "\n\006vcores\020\000\032\000 \000" } reserved_resources { memory: 0 virtual_cores: 0 3: "\n\tmemory-mb\020\000\032\002Mi \000" 3: "\n\006vcores\020\000\032\000 \000" } needed_resources { memory: 0 virtual_cores: 0 3: "\n\tmemory-mb\020\000\032\002Mi \000" 3: "\n\006vcores\020\000\032\000 \000" } memory_seconds: 0 vcore_seconds: 0 8: 0x00000000 9: 0x00000000 10: 0 11: 0 12: "\n\tmemory-mb\020\000" 12: "\n\006vcores\020\000" 13: "\n\tmemory-mb\020\000" 13: "\n\006vcores\020\000" } originalTrackingUrl: "N/A" currentApplicationAttemptId { application_id { id: 2 cluster_timestamp: 1631009244715 } attemptId: 1 } progress: 0.0 applicationType: "SPARK" applicationTags: "livy-batch-5-osefkl7m" 21: 1 22: 0 23: "\b\000" 24: "<Not set>" 25: "<DEFAULT_PARTITION>" 26: "\b\001\022\030\b\001\022\tUNLIMITED\030\377\377\377\377\377\377\377\377\377\001" 27: 0 }}
   ```
   

