Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/1222#issuecomment-67576790
  
    The event log is not human-readable when it's compressed... and the header 
is mostly human-readable now, even with the tiny binary ints encoded in it.
    
    The problem with `readLine()` is buffering. Imagine reading this with a `BufferedReader`:
    
        === LOG_HEADER_END ===\nsomebinarydatapartoftheactuallog
                              [1]          [2]
    
    Here:
    [1] is the end of the header line
    [2] is the current file pointer in the underlying FileInputStream after `readLine()` returns
    
    When you detect the header, you can't just wrap the underlying input stream 
with the compression codec stream, because you need that buffered data (the 
data between [1] and [2]) for the event data to make sense. So you'd need some 
way of keeping track of where the actual start of the event data is.
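    To make the over-read concrete, here's a small sketch (the header marker and payload strings are made up for illustration, not taken from the patch) showing that `BufferedReader` drains the underlying stream well past the end of the line it returns:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

public class OverReadDemo {
    public static void main(String[] args) throws IOException {
        byte[] data = "=== LOG_HEADER_END ===\nbinary-event-data".getBytes("UTF-8");
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, "UTF-8"));

        // readLine() returns only the header line...
        String header = reader.readLine();
        System.out.println(header); // prints "=== LOG_HEADER_END ==="

        // ...but the reader's internal buffer has already consumed the
        // bytes after the newline, so the underlying stream is at EOF.
        // Wrapping `in` with a codec stream here would miss that data.
        System.out.println(in.available()); // prints 0
    }
}
```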
    
    That could be done, but I haven't really pursued it. I did consider using 
`mark()`/`reset()`, but that seems ugly and isn't guaranteed to be supported 
by all stream implementations. The approach I took avoids all of that, since 
the input stream pointer never advances past the end of the header.
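    A minimal sketch of that idea (the helper name and header marker are illustrative, not the actual patch code): read the header one byte at a time, so the stream position stops exactly after the line terminator and the remainder can be handed to the codec stream untouched:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class HeaderDemo {
    // Read a single '\n'-terminated line one byte at a time. Unlike
    // BufferedReader, this never reads ahead, so the stream position
    // ends up exactly on the first byte after the line terminator.
    static String readLineNoReadAhead(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int b;
        while ((b = in.read()) != -1 && b != '\n') {
            sb.append((char) b);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "=== LOG_HEADER_END ===\nbinary-event-data".getBytes("UTF-8");
        InputStream in = new ByteArrayInputStream(data);

        String header = readLineNoReadAhead(in);
        System.out.println(header); // prints "=== LOG_HEADER_END ==="

        // The stream now points at the first event byte, so it could be
        // wrapped with the compression codec's input stream directly.
        byte[] rest = new byte[64];
        int n = in.read(rest);
        System.out.println(new String(rest, 0, n, "UTF-8")); // prints "binary-event-data"
    }
}
```

    The trade-off is one `read()` call per header byte, which is fine here because the header is tiny and read only once per log file.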

