[GitHub] [spark] HyukjinKwon commented on a change in pull request #27085: [SPARK-29779][CORE] Compact old event log files and cleanup
HyukjinKwon commented on a change in pull request #27085: [SPARK-29779][CORE] Compact old event log files and cleanup
URL: https://github.com/apache/spark/pull/27085#discussion_r373345690

## File path: core/src/main/resources/META-INF/services/org.apache.spark.deploy.history.EventFilterBuilder ##
@@ -0,0 +1 @@
+org.apache.spark.deploy.history.BasicEventFilterBuilder

Review comment:
Okay, thanks. At least it's consistent.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards,
Apache Git Services

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a change in pull request #27085: [SPARK-29779][CORE] Compact old event log files and cleanup
HyukjinKwon commented on a change in pull request #27085: [SPARK-29779][CORE] Compact old event log files and cleanup
URL: https://github.com/apache/spark/pull/27085#discussion_r373307820

## File path: core/src/main/resources/META-INF/services/org.apache.spark.deploy.history.EventFilterBuilder ##
@@ -0,0 +1 @@
+org.apache.spark.deploy.history.BasicEventFilterBuilder

Review comment:
I see. I think that's possible by simply using reflection, which I think makes the code easier to read. I think we're already doing this in a few places, such as `FileCommitProtocol.instantiate`.

It seems a bit odd to use a service loader for internal classes.
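
The reflection-based alternative suggested in the comment above could look roughly like the following. This is a minimal sketch, not the code in the pull request: the `EventFilterBuilder` trait and `BasicEventFilterBuilder` class here are simplified stand-ins for the Spark-internal types, the `name` method and the `ReflectiveLoading` object are hypothetical, and the actual `FileCommitProtocol.instantiate` in Spark may be structured differently.

```scala
// Simplified stand-in for the Spark-internal trait named in the review.
trait EventFilterBuilder {
  // Hypothetical method, present only so the example has something to call.
  def name: String
}

// Simplified stand-in for the implementation listed in the service file.
class BasicEventFilterBuilder extends EventFilterBuilder {
  override def name: String = "basic"
}

object ReflectiveLoading {
  // Load the class by its fully qualified name and call its no-arg
  // constructor, instead of discovering it through java.util.ServiceLoader.
  def instantiate(className: String): EventFilterBuilder = {
    val clazz = Class.forName(className)
    clazz.getDeclaredConstructor().newInstance().asInstanceOf[EventFilterBuilder]
  }
}
```

With this approach the implementation class name would come from code or configuration rather than from a `META-INF/services` entry, which is the readability trade-off the reviewer points at.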
[GitHub] [spark] HyukjinKwon commented on a change in pull request #27085: [SPARK-29779][CORE] Compact old event log files and cleanup
HyukjinKwon commented on a change in pull request #27085: [SPARK-29779][CORE] Compact old event log files and cleanup
URL: https://github.com/apache/spark/pull/27085#discussion_r373306885

## File path: core/src/main/scala/org/apache/spark/internal/config/package.scala ##
@@ -195,6 +195,24 @@ package object config {
         "configured to be at least 10 MiB.")
       .createWithDefaultString("128m")
 
+  private[spark] val EVENT_LOG_ROLLING_MAX_FILES_TO_RETAIN =
+    ConfigBuilder("spark.eventLog.rolling.maxFilesToRetain")
+      // TODO: remove this when integrating compactor with FsHistoryProvider
+      .internal()
+      .doc("The maximum number of event log files which will be retained as non-compacted. " +
+        "By default, all event log files will be retained. Please set the configuration " +
+        s"and ${EVENT_LOG_ROLLING_MAX_FILE_SIZE.key} accordingly if you want to control " +
+        "the overall size of event log files.")
+      .intConf
+      .checkValue(_ > 0, "Max event log files to retain should be higher than 0.")
+      .createWithDefault(Integer.MAX_VALUE)

Review comment:
Why didn't we make it optional, or default it to -1, to express "all event log files will be retained"?
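
The two alternatives raised in the comment above (an optional value, or a -1 sentinel) can be sketched without any Spark dependency. Everything below is hypothetical illustration: `RetentionSetting`, `fromSentinel`, and `filesToRetain` are names invented here, not part of the pull request. In Spark itself, an optional entry would presumably be declared with `ConfigBuilder`'s `createOptional` rather than `createWithDefault`.

```scala
object RetentionSetting {
  // Hypothetical helper: interpret a raw setting where -1 means
  // "retain all event log files" and any positive value is a limit.
  def fromSentinel(raw: Int): Option[Int] = raw match {
    case -1 => None
    case n if n > 0 => Some(n)
    case other =>
      throw new IllegalArgumentException(
        s"Max event log files to retain should be -1 or higher than 0, got $other")
  }

  // Number of files left uncompacted: with no limit, all of them;
  // otherwise the limit, capped at the number of files that exist.
  def filesToRetain(limit: Option[Int], totalFiles: Int): Int =
    limit.fold(totalFiles)(math.min(_, totalFiles))
}
```

Either form makes "no limit" explicit in the type or the value, whereas `Integer.MAX_VALUE` expresses it only indirectly as a very large limit.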
[GitHub] [spark] HyukjinKwon commented on a change in pull request #27085: [SPARK-29779][CORE] Compact old event log files and cleanup
HyukjinKwon commented on a change in pull request #27085: [SPARK-29779][CORE] Compact old event log files and cleanup
URL: https://github.com/apache/spark/pull/27085#discussion_r373304218

## File path: core/src/main/scala/org/apache/spark/internal/config/package.scala ##
@@ -195,6 +195,24 @@ package object config {
         "configured to be at least 10 MiB.")
       .createWithDefaultString("128m")
 
+  private[spark] val EVENT_LOG_ROLLING_MAX_FILES_TO_RETAIN =
+    ConfigBuilder("spark.eventLog.rolling.maxFilesToRetain")
+      // TODO: remove this when integrating compactor with FsHistoryProvider
+      .internal()
+      .doc("The maximum number of event log files which will be retained as non-compacted. " +
+        "By default, all event log files will be retained. Please set the configuration " +
+        s"and ${EVENT_LOG_ROLLING_MAX_FILE_SIZE.key} accordingly if you want to control " +
+        "the overall size of event log files.")
+      .intConf
+      .checkValue(_ > 0, "Max event log files to retain should be higher than 0.")
+      .createWithDefault(Integer.MAX_VALUE)
+
+  private[spark] val EVENT_LOG_COMPACTION_SCORE_THRESHOLD =
+    ConfigBuilder("spark.eventLog.rolling.compaction.score.threshold")
+      .internal()

Review comment:
I think we should have added some docs here too.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #27085: [SPARK-29779][CORE] Compact old event log files and cleanup
HyukjinKwon commented on a change in pull request #27085: [SPARK-29779][CORE] Compact old event log files and cleanup
URL: https://github.com/apache/spark/pull/27085#discussion_r373302332

## File path: core/src/main/resources/META-INF/services/org.apache.spark.deploy.history.EventFilterBuilder ##
@@ -0,0 +1 @@
+org.apache.spark.deploy.history.BasicEventFilterBuilder

Review comment:
`EventFilterBuilder` is private in Spark. Do you mind if I ask why we use a service loader?