cxzl25 opened a new pull request #30725:
URL: https://github.com/apache/spark/pull/30725


   ### What changes were proposed in this pull request?
   Change the `hadoopJobMetadata` cache from `softValues` to `weakValues`.
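   
   A minimal sketch of the change, assuming the cache is built with Guava's `CacheBuilder` as in `SparkEnv.hadoopJobMetadata` (the exact builder call in the codebase may differ slightly):
   
   ```scala
   import com.google.common.cache.CacheBuilder
   
   // Before: values are held by soft references, which the JVM only clears
   // when the heap is nearly exhausted, so stale JobConfs can pile up.
   val softCache = CacheBuilder.newBuilder()
     .softValues()
     .build[String, AnyRef]()
     .asMap()
   
   // After: values are held by weak references, which become collectible at
   // the next GC once nothing else strongly references the JobConf.
   val weakCache = CacheBuilder.newBuilder()
     .weakValues()
     .build[String, AnyRef]()
     .asMap()
   ```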
   
   ### Why are the changes needed?
   Reduce driver memory pressure, GC frequency and pause times, and overall job execution time.
   
   `HadoopRDD` uses a soft-reference map to cache `JobConf` objects (keyed by `rdd_id`). When the driver reads a large number of Hive partitions, `HadoopRDD.getPartitions` creates many `JobConf`s and adds them to the cache. Each executor also creates a `JobConf`, adds it to its local cache, and shares it among the tasks running on that executor.
   
   The growing number of `JobConf`s in the driver cache increases memory pressure. When the driver memory configuration is low, full GCs become very frequent, even though these `JobConf`s are rarely reused. See the reference-semantics sketch below.
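   
   For intuition: the JVM clears soft references only when the heap is nearly exhausted, while weak references become collectible as soon as the last strong reference goes away. A self-contained sketch of that difference (plain JVM references, not Spark code; `RefDemo` is a hypothetical name):
   
   ```scala
   import java.lang.ref.{SoftReference, WeakReference}
   
   object RefDemo extends App {
     var payload: Array[Byte] = new Array[Byte](1 << 20) // 1 MiB object
   
     val soft = new SoftReference(payload)
     val weak = new WeakReference(payload)
   
     payload = null // drop the only strong reference
     System.gc()
   
     // Typically prints: soft cleared = false, weak cleared = true.
     // A soft reference survives an ordinary GC and is only cleared under
     // memory pressure, which is why softly held JobConfs can linger and
     // drive repeated full GCs on a memory-constrained driver.
     println(s"soft cleared = ${soft.get() == null}")
     println(s"weak cleared = ${weak.get() == null}")
   }
   ```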
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Existing unit tests
   Manual test
   

