kevin85421 opened a new pull request, #37409:
URL: https://github.com/apache/spark/pull/37409

   ### What changes were proposed in this pull request?
   This PR implements a ThrottledLogger, a logger backed by RateLimiters, to 
prevent log message flooding caused by network issues. Users can register 
prefixes with the ThrottledLogger. Each prefix has its own RateLimiter and its 
own rate limit strategy, set through the value of `throttlingSeconds`. Each 
RateLimiter creates one permit per second, and printing a message requires 
acquiring `throttlingSeconds` permits. For example, 
   
   ```java
   ThrottledLogger tlogger = new ThrottledLogger("ThrottledLogger");
   // Printing a message with the prefix "msg" consumes 2 permits.
   tlogger.registerPrefix("msg", 2);
   
   // will be printed
   tlogger.info("msg1"); 
   
   // The RateLimiter of the prefix "msg" does not have enough permits.
   // => will not be printed
   tlogger.info("msg2"); 
   
   // A message that does not match any registered prefix will always be
   // printed.
   tlogger.info("abc"); 
   
   // Sleep two seconds, and the RateLimiter will get 2 permits.
   Thread.sleep(2000);
   
   // The RateLimiter of the prefix "msg" has enough permits. => will be printed
   tlogger.info("msg3");
   ```
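
The permit accounting described above can be illustrated with a small standalone sketch. This is not the class added by this PR: `ThrottledLoggerSketch`, its `Bucket` helper, the `shouldLog` method, and the injectable nanosecond clock are hypothetical names introduced only for illustration. The sketch also assumes that each bucket starts full, holds at most `throttlingSeconds` permits, and that the first registered prefix that matches wins.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** Hypothetical sketch of per-prefix throttling; not the implementation in this PR. */
class ThrottledLoggerSketch {

  /** Token bucket: gains one permit per second and holds at most `cost` permits. */
  private static final class Bucket {
    final int cost;        // permits needed per message (throttlingSeconds)
    double permits;        // permits currently available
    long lastRefillNanos;  // time of the last refill

    Bucket(int cost, long nowNanos) {
      this.cost = cost;
      this.permits = cost; // assumption: start full so the first message is printed
      this.lastRefillNanos = nowNanos;
    }

    boolean tryAcquire(long nowNanos) {
      double elapsedSeconds = (nowNanos - lastRefillNanos) / 1e9;
      permits = Math.min(cost, permits + elapsedSeconds);
      lastRefillNanos = nowNanos;
      if (permits < cost) {
        return false;      // not enough permits: drop the message
      }
      permits -= cost;
      return true;
    }
  }

  // Insertion-ordered so that the first registered matching prefix wins.
  private final Map<String, Bucket> buckets = new LinkedHashMap<>();
  private final LongSupplier nanoClock;

  ThrottledLoggerSketch(LongSupplier nanoClock) {
    this.nanoClock = nanoClock;
  }

  synchronized void registerPrefix(String prefix, int throttlingSeconds) {
    if (throttlingSeconds <= 0) {
      throw new IllegalArgumentException("throttlingSeconds must be positive");
    }
    buckets.put(prefix, new Bucket(throttlingSeconds, nanoClock.getAsLong()));
  }

  /** Returns true if the message may be printed right now. */
  synchronized boolean shouldLog(String msg) {
    for (Map.Entry<String, Bucket> e : buckets.entrySet()) {
      if (msg.startsWith(e.getKey())) {
        return e.getValue().tryAcquire(nanoClock.getAsLong());
      }
    }
    return true; // no registered prefix matches: always print
  }

  void info(String msg) {
    if (shouldLog(msg)) {
      System.out.println("INFO " + msg);
    }
  }

  public static void main(String[] args) {
    long[] now = {0L}; // fake clock, so the demo does not need to sleep
    ThrottledLoggerSketch tlogger = new ThrottledLoggerSketch(() -> now[0]);
    tlogger.registerPrefix("msg", 2);
    tlogger.info("msg1");      // printed
    tlogger.info("msg2");      // dropped: no permits left
    tlogger.info("abc");       // printed: no prefix matches
    now[0] += 2_000_000_000L;  // advance the fake clock by two seconds
    tlogger.info("msg3");      // printed: two permits have accrued
  }
}
```

Passing `System::nanoTime` as the clock gives real-time behavior; the fake clock is used here only so that the example is deterministic.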
   
   
   ### Why are the changes needed?
   When a transient network error occurs, Spark may write out a large volume of 
error logs related to it. The excessive logging can create further problems 
downstream, e.g. blowing up our log storage. By combining / batching network 
errors that arrive in a burst, we print only periodic summaries of errors that 
repeat within a short time window.
   
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   ```
   build/sbt "network-common/testOnly *ThrottledLoggerSuite"
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

