Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/20575
  
    True. Still, to be able to do that, you're hardcoding YARN-isms into the 
code, e.g., how application IDs look, so that you can create a "fake" 
application entry that will, hopefully, eventually match the actual contents of 
the log file.
    
    What you're trying here is a stop-gap fix for SPARK-6951. I was hoping we 
could have an actual solution to that problem. I thought about skipping data 
(instead of the current approach, which still reads all the data and just 
doesn't process events it doesn't care about), but couldn't figure out how to 
make that work with compression on.
    
    There have been suggestions thrown around, like having Spark write a 
summary file side-by-side with the event log, for the SHS to consume. But that 
doesn't help existing event logs.
    
    If you'd like to go down this path, I'd suggest forgetting about the whole 
app ID parsing approach and instead creating real entries for these logs that 
clearly indicate they're fake and temporary, then cleaning them up once the log 
file is parsed. You could do that by creating the fake entry (if the app's 
entry doesn't exist yet) and handing it to the parsing task, so that once 
parsing is done the task removes the temp entry before writing the real one.
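    A minimal sketch of that lifecycle (not the actual SHS code; the class and 
method names here are hypothetical, and a plain map stands in for the real 
listing store):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: a temporary "in progress" entry is installed before
// parsing, then replaced by the real entry once the log file is parsed,
// or removed again if parsing fails.
class HistoryListing {
    final Map<String, String> apps = new ConcurrentHashMap<>();

    // Register a temporary entry only if no entry exists yet for this log.
    boolean addPlaceholder(String logFile) {
        return apps.putIfAbsent(logFile, "<parsing in progress>") == null;
    }

    // Parsing task: replaces the placeholder with the parsed entry; on
    // failure it cleans up the placeholder it installed.
    void parse(String logFile) {
        boolean installed = addPlaceholder(logFile);
        try {
            // Stand-in for actually replaying the event log.
            String realEntry = "app entry parsed from " + logFile;
            apps.put(logFile, realEntry);
        } catch (RuntimeException e) {
            if (installed) {
                apps.remove(logFile);  // clean up the temp entry
            }
            throw e;
        }
    }
}
```

The point of the `putIfAbsent` guard is that a placeholder never clobbers an 
entry that's already there, so re-scans of the same log are harmless.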

