sarutak commented on PR #54575:
URL: https://github.com/apache/spark/pull/54575#issuecomment-3987607977

   @dongjoon-hyun Thank you for your interest.
   
   > What happens when there exists a conflict among the log directories? For 
example, a user wants to abuse this as a kind of multi-tier log management like 
the following and copy from shorterm to longterm? Of course, the sync operation 
is non-atomic.
   >
   > hdfs://spark-events/shorterm
   > hdfs://spark-events/longterm
   
   Each event log file is tracked by its full path as the key in `LogInfo`. So 
if the same application's event log exists in both directories, they are 
treated as separate entries.
   I didn't anticipate this kind of usage, but during a non-atomic copy, the 
incomplete log file in the destination directory may fail to parse or show 
incomplete information temporarily. However, on the next scan cycle, 
`shouldReloadLog` invoked through `checkForLogs` 
[detects](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala#L553)
 the file size change and re-parses it, so the entry self-corrects once the 
copy completes.
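
   As a rough illustration of the behavior described above, here is a simplified sketch. It is not the actual `FsHistoryProvider` code: `TrackedLog`, `needsReload`, and `scan` are hypothetical names, the listing is an in-memory map, and the reload check is reduced to "the size differs from the last scan", whereas the real `shouldReloadLog` handles more cases.

```scala
// Simplified, hypothetical model of path-keyed tracking; not Spark's real code.
object PathKeyedListing {
  // Hypothetical stand-in for LogInfo: only the fields this sketch needs.
  final case class TrackedLog(logPath: String, fileSize: Long)

  // The listing is keyed by the full path of each event log file.
  type Listing = Map[String, TrackedLog]

  // A file is (re)parsed if it is new or its size differs from the last scan.
  def needsReload(listing: Listing, path: String, currentSize: Long): Boolean =
    listing.get(path).forall(_.fileSize != currentSize)

  // One scan cycle: returns the updated listing and the paths that were (re)parsed.
  def scan(listing: Listing, observed: Map[String, Long]): (Listing, Set[String]) = {
    val reloaded = observed.collect {
      case (path, size) if needsReload(listing, path, size) => path
    }.toSet
    val updated = listing ++ observed.map { case (p, s) => p -> TrackedLog(p, s) }
    (updated, reloaded)
  }
}
```

   In this model, a log that exists under both `hdfs://spark-events/shorterm` and `hdfs://spark-events/longterm` yields two separate entries because the keys differ, and a copy that is still growing is picked up again on the following scan until its size stops changing.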
   
   > What is the semantic on the ordering in the config value? Especially, when 
we have [SPARK-52914](https://issues.apache.org/jira/browse/SPARK-52914) ?
   
   The ordering of directories in the config value has no semantic meaning. All 
directories are scanned equally in each polling cycle (`checkForLogs` iterates 
over all `logDirs`). The order does not affect priority.
   
   On-demand loading operates per log file within `checkForLogsInDir`, which is 
called independently for each directory. There is no cross-directory 
interaction, so I believe multiple-directory support and on-demand loading 
are orthogonal and work together without issues.
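
   The independence of the per-directory scans can be sketched as follows. This is only a toy model under stated assumptions: the `Fs` type is a hypothetical in-memory stand-in for the file system, and the signatures of `checkForLogs` and `checkForLogsInDir` here are invented for illustration and do not match the real methods.

```scala
// Toy model of scanning several log directories; signatures are hypothetical.
object MultiDirScan {
  // Hypothetical in-memory file system: directory -> (file name -> size).
  type Fs = Map[String, Map[String, Long]]

  // Scans one directory. It reads only that directory's files and
  // shares no state with the scans of other directories.
  def checkForLogsInDir(fs: Fs, dir: String): Map[String, Long] =
    fs.getOrElse(dir, Map.empty[String, Long]).map { case (name, size) =>
      s"$dir/$name" -> size
    }

  // Scans every configured directory and merges the results by full path.
  // Full paths are unique per directory, so the merge never overwrites an entry.
  def checkForLogs(fs: Fs, logDirs: Seq[String]): Map[String, Long] =
    logDirs.map(checkForLogsInDir(fs, _)).foldLeft(Map.empty[String, Long])(_ ++ _)
}
```

   Because each directory contributes entries under its own path prefix, listing the directories in a different order produces the same merged result in this model.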


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

