Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/1222#issuecomment-67709747
  
    Ah yes, that won't work straight out of the box when there's compression. 
However, I still think it makes sense to pass it an iterator rather than an 
input stream. What we could do is read lines manually without buffering, e.g. 
something like the following
    ```
    def readLine(in: java.io.InputStream): String = {
      val line = new StringBuilder
      var x = in.read()
      // Read unbuffered, one byte at a time, until '\n' (10) or EOF (-1),
      // so we never consume bytes past the end of the header
      while (x != 10 && x != -1) {
        line += x.toChar
        x = in.read()
      }
      line.toString
    }
    ```
    Then when the return value equals `=== HEADER END MARKER ===`, we wrap the 
stream in a buffered stream and then a compressed stream (as we already do in 
`ReplayListenerBus`) and extract the lines from this into an iterator, e.g.
    ```
    val fstream = // file stream after reading in the header end marker
    val bstream = new BufferedInputStream(fstream)
    val cstream = codec.compressedInputStream(bstream)
    val jsonEvents: Iterator[String] = scala.io.Source.fromInputStream(cstream).getLines
    ```
    I tried this locally and it does what I expect. This doesn't seem super 
complicated to me.
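
    For reference, a self-contained sketch of the whole flow. It assumes the
header is plain ASCII terminated by a marker line and uses GZIP as a
stand-in for the codec from `ReplayListenerBus`; the object name, file name,
and marker value are made up for the example:
    ```
    import java.io._
    import java.util.zip.{GZIPInputStream, GZIPOutputStream}

    object HeaderThenCompressedBody {
      // Hypothetical marker; the real value would come from the log format
      val EndMarker = "=== HEADER END MARKER ==="

      // Read one line unbuffered so no bytes past the header are consumed
      def readLine(in: InputStream): String = {
        val line = new StringBuilder
        var x = in.read()
        while (x != 10 && x != -1) {
          line += x.toChar
          x = in.read()
        }
        line.toString
      }

      def main(args: Array[String]): Unit = {
        // Build a test file: plain-text header, marker, then a GZIP body
        val file = File.createTempFile("event-log", ".tmp")
        val out = new FileOutputStream(file)
        out.write(s"Some header line\n$EndMarker\n".getBytes("UTF-8"))
        val gz = new GZIPOutputStream(out)
        gz.write("{\"event\":1}\n{\"event\":2}\n".getBytes("UTF-8"))
        gz.close()

        // Read header lines unbuffered until the marker, then wrap the
        // remainder in a buffered stream and a decompressing stream
        val fstream = new FileInputStream(file)
        var line = readLine(fstream)
        while (line != EndMarker && line.nonEmpty) {
          println(s"header: $line")
          line = readLine(fstream)
        }
        val bstream = new BufferedInputStream(fstream)
        val cstream = new GZIPInputStream(bstream)
        val jsonEvents: Iterator[String] =
          scala.io.Source.fromInputStream(cstream).getLines()
        jsonEvents.foreach(println)
        file.delete()
      }
    }
    ```
    Because `readLine` never buffers, the file position after the marker is
exactly the first byte of the compressed body, so the wrapped stream decodes
cleanly.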

