Github user andrewor14 commented on the pull request:
https://github.com/apache/spark/pull/1222#issuecomment-67709747
Ah yes, that won't work straight out of the box when there's compression.
However, I still think it makes sense to pass it an iterator rather than an
input stream. What we could do is read the header lines manually, one byte at
a time without buffering (a buffered reader would read ahead past the newline
and consume bytes that belong to the compressed payload), e.g. something like
the following
```
// Read a single line from the raw stream, one byte at a time,
// so we never consume bytes past the terminating newline
def readLine(in: java.io.InputStream): String = {
  val line = new StringBuilder
  var x = in.read()
  while (x != '\n' && x != -1) { // stop at newline or EOF
    line.append(x.toChar)
    x = in.read()
  }
  line.toString
}
```
Then, once `readLine` returns `=== HEADER END MARKER ===`, we wrap the
stream in a buffered stream and then a compressed stream (as we already do in
`ReplayListenerBus`) and read the lines from that into an iterator, e.g.
```
val fstream = ... // raw file stream, positioned just past the header end marker
val bstream = new BufferedInputStream(fstream) // safe to buffer from here on
val cstream = codec.compressedInputStream(bstream)
val jsonEvents: Iterator[String] =
  scala.io.Source.fromInputStream(cstream).getLines()
```
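For concreteness, here is roughly how the two pieces could fit together. This
is only a sketch: `HEADER_END_MARKER`, `openEventIterator`, and the way the
codec is passed in are placeholders for illustration, not actual constants or
signatures in the code, and it assumes the marker line is always present.
```
import java.io.{BufferedInputStream, FileInputStream, InputStream}
import org.apache.spark.io.CompressionCodec

// Hypothetical constant; the real marker lives wherever the header is written
val HEADER_END_MARKER = "=== HEADER END MARKER ==="

def openEventIterator(path: String, codec: CompressionCodec): Iterator[String] = {
  val fstream: InputStream = new FileInputStream(path)
  // Consume the uncompressed header with the unbuffered readLine above,
  // stopping at the end marker (assumed present) so no compressed bytes are lost
  while (readLine(fstream) != HEADER_END_MARKER) {}
  // Everything past the marker is compressed JSON, so buffering is safe now
  val bstream = new BufferedInputStream(fstream)
  val cstream = codec.compressedInputStream(bstream)
  scala.io.Source.fromInputStream(cstream).getLines()
}
```
The caller then just consumes the returned `Iterator[String]` of JSON events,
which is the shape we'd want to pass around instead of a raw input stream.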
I tried this locally and it does what I expect. This doesn't seem super
complicated to me.