Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23241#discussion_r239509724

    --- Diff: core/src/main/scala/org/apache/spark/io/CompressionCodec.scala ---
    @@ -197,4 +201,8 @@ class ZStdCompressionCodec(conf: SparkConf) extends CompressionCodec {
         // avoid the excessive overhead of JNI calls while trying to uncompress small amounts of data.
         new BufferedInputStream(new ZstdInputStream(s), bufferSize)
       }
    +
    +  override def zstdEventLogCompressedInputStream(s: InputStream): InputStream = {
    +    new BufferedInputStream(new ZstdInputStream(s).setContinuous(true), bufferSize)
    --- End diff --

    That's what I'm wondering about. Is it actually desirable not to fail on a partial frame? I'm not sure. We *shouldn't* encounter one elsewhere. This changes a developer API, but it may not even be a breaking change, since there is a default implementation. We can take breaking changes in Spark 3, though. In the end, I think I agree with your approach here.
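For context, the behavior under discussion can be sketched as follows. This is a minimal illustration, assuming zstd-jni's `ZstdInputStream` (as used in the diff above); the helper name and constant here are illustrative, not part of the patch:

```scala
import java.io.{BufferedInputStream, InputStream}
import com.github.luben.zstd.ZstdInputStream

object EventLogZstd {
  // Illustrative buffer size; Spark derives its value from configuration.
  private val bufferSize = 32 * 1024

  // Open a zstd-compressed event log that may end mid-frame, because the
  // application writing it is still running. setContinuous(true) tells the
  // decompressor to return the bytes decoded so far when the underlying
  // stream ends inside a frame, instead of failing on the partial frame.
  def openEventLog(s: InputStream): InputStream =
    new BufferedInputStream(new ZstdInputStream(s).setContinuous(true), bufferSize)
}
```

The trade-off raised in the comment is that continuous mode also masks genuine truncation: a stream cut off by corruption looks the same as one that is simply still being written.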