GitHub user pwendell commented on the pull request:
https://github.com/apache/spark/pull/4821#issuecomment-76641955

I took a pass on this and left some feedback. Overall, it would be good to
really minimize the scope of the changes, since this is so late in the game.
There is some clean-up, renaming, etc. that would be best left out of this
patch.

The main thing I'm wondering is why we need this header at all. Our own
replay never even uses it; we just ignore it. It seems it was added to convey
the compression codec so that a reader can bootstrap replaying the file.
However, simply using a file extension seems like a better and much more
standard way of doing that. The only argument I see for the header is that it
could be used in the future to encode things that are necessary for proper
replay of the logs. But in that case, I don't see why we can't just add it
later, if and when those things come up.

I guess I don't see a good argument against the straw man of simply not
having the header. Curious to hear thoughts from @andrewor14 and @vanzin.