sarutak commented on PR #54575: URL: https://github.com/apache/spark/pull/54575#issuecomment-3987607977
@dongjoon-hyun Thank you for your interest.

> What happens when there exists a conflict among the log directories? For example, a user want to abuse this as a kind of multi-tier log managements like the following and copy from shorterm to longterm? Of course, the sync operation is non-atomic.
>
> hdfs://spark-events/shorterm
> hdfs://spark-events/longterm

Each event log file is tracked in `LogInfo` with its full path as the key, so if the same application's event log exists in both directories, the two copies are treated as separate entries. I didn't anticipate this kind of usage, but during a non-atomic copy the incomplete log file in the destination directory may fail to parse or may temporarily show incomplete information. However, on the next scan cycle, `shouldReloadLog`, invoked through `checkForLogs`, [detects](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala#L553) the file size change and re-parses the file, so the entry self-corrects once the copy completes.

> What is the semantic on the ordering in the config value? Especially, when we have [SPARK-52914](https://issues.apache.org/jira/browse/SPARK-52914) ?

The ordering of directories in the config value has no semantic meaning. All directories are scanned equally in each polling cycle (`checkForLogs` iterates over all `logDirs`), and the order does not affect priority.

On-demand loading operates per log file within `checkForLogsInDir`, which is called independently for each directory. There is no cross-directory interaction, so I believe multi-directory support and on-demand loading are orthogonal and work together without issues.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
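For readers following the thread, the behavior described in the comment above (entries keyed by full path, re-parse when the file size changes, every directory scanned independently) can be illustrated with a small model. This is only a sketch under stated assumptions, not Spark's code: the `LogInfo` class here is a simplified stand-in, `check_for_logs` is a hypothetical function that merely borrows the name, directory contents are simulated with plain dictionaries, and "size differs" is a simplification of whatever the real `shouldReloadLog` checks.

```python
from dataclasses import dataclass


@dataclass
class LogInfo:
    """Simplified stand-in for a listing entry (hypothetical, not Spark's class)."""
    path: str
    file_size: int


def check_for_logs(listing, log_dirs):
    """Scan every directory once and return the paths (re)parsed in this cycle.

    listing:  dict mapping full path -> LogInfo (the tracked entries)
    log_dirs: dict mapping directory -> {file name: current size in bytes}
    """
    reparsed = []
    # Every directory is visited in each cycle; order carries no priority.
    for directory, files in log_dirs.items():
        for name, size in files.items():
            path = f"{directory}/{name}"  # the full path is the key
            known = listing.get(path)
            # Assumption: a new file or a changed size triggers a (re)parse.
            if known is None or known.file_size != size:
                listing[path] = LogInfo(path, size)
                reparsed.append(path)
    return reparsed


short = "hdfs://spark-events/shorterm"
long_ = "hdfs://spark-events/longterm"
listing = {}

# Cycle 1: the copy into longterm is still in progress (40 of 100 bytes).
first = check_for_logs(listing, {short: {"app-1": 100}, long_: {"app-1": 40}})
# Cycle 2: the copy has completed, so only the longterm entry is re-parsed.
second = check_for_logs(listing, {short: {"app-1": 100}, long_: {"app-1": 100}})
# Cycle 3: nothing changed, so nothing is re-parsed.
third = check_for_logs(listing, {short: {"app-1": 100}, long_: {"app-1": 100}})
```

In this model the same application log ends up as two separate entries, one per directory, and the partially copied one is corrected on the cycle after the copy finishes.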
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
