GitHub user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/4821#issuecomment-76780546
  
    @pwendell the header is needed because it contains potentially useful 
information for the code parsing the logs. For example, now it contains the 
Spark version, which might be needed to tell the parsing code which properties 
to expect in the logs.
    
    The original version of the change (the one that got rid of the directories 
and used a single file) encoded all metadata in the file name. The feedback was 
that it was ugly (long, cryptic file names) and brittle, since if you change 
the file name, you lose that information. I agree with that and thus the header 
was born.
    
    Now we're back to encoding metadata in the file name. A simple extension is 
not too bad, though, especially since you can probably figure out the compression 
codec by looking at the first few bytes of the file. But the header still 
provides useful information.
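
    To make the "first few bytes" idea concrete, here is a rough sketch in 
Python. The magic prefixes are my assumptions about what the lz4, snappy and 
lzf stream writers emit first; nothing in Spark guarantees them, and 
`sniff_codec` is a hypothetical helper, not an existing API.

```python
# Sketch only: these prefixes are assumptions about the first bytes each
# codec's output stream writes. They are not part of any Spark contract.
MAGIC_PREFIXES = [
    (b"LZ4Block", "lz4"),            # assumed block-stream magic
    (b"\x82SNAPPY\x00", "snappy"),   # assumed stream header
    (b"ZV", "lzf"),                  # assumed chunk signature
]


def sniff_codec(first_bytes):
    """Guess a codec name from a file's leading bytes, or None if unknown."""
    for prefix, name in MAGIC_PREFIXES:
        if first_bytes.startswith(prefix):
            return name
    # No known prefix: treat the log as uncompressed (or an unknown codec).
    return None


def sniff_codec_of_file(path):
    """Read just enough of the file at `path` to run sniff_codec on it."""
    with open(path, "rb") as f:
        return sniff_codec(f.read(8))
```

    Note that this only recovers the codec. It says nothing about the Spark 
version that wrote the log, which is the part the header was carrying.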
    
    So I'm a little worried that the latest patch removes the metadata 
completely. Especially since it's common for the first event of the log to 
*not* be the one that contains the Spark version 
(`SparkListenerEnvironmentUpdate`?), and instead be 
`SparkListenerBlockManagerAdded`.


