GitHub user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/22528

> Another concern here is, we have another place to control the compression codec (where we usually delegate to HDFS libraries).

I considered using the Compressor API, but its streaming nature conflicts with the structure of a zip archive: the meta-information is located at the end of the file, so the archive cannot be read and uncompressed sequentially, block by block.

> It just sounds like a bandaid fix to allow one zipped file case in multi line mode.

I believe it is better to return a correct result where a wrong result is currently returned (when trying to read a zipped CSV) rather than forcing users to fall back on this workaround of reading zip archives via the RDD API: https://docs.databricks.com/spark/latest/data-sources/zip-files.html#zip-files . Especially for compressed, non-splittable CSV, there is little difference between reading it with multiLine enabled or disabled.
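To illustrate the point about zip meta-information sitting at the end of the file, here is a minimal Python sketch (not Spark code, and not part of the PR). It builds a small archive in memory and locates the end-of-central-directory record, which a reader has to find before it knows where the entry index starts. The helper names `build_zip` and `locate_central_directory` and the sample file `data.csv` are hypothetical, and the sketch assumes a small archive with no archive comment and no zip64 extensions.

```python
import io
import struct
import zipfile

EOCD_SIGNATURE = b"PK\x05\x06"  # end-of-central-directory record
CDH_SIGNATURE = b"PK\x01\x02"   # central directory file header


def build_zip(name, payload):
    """Return the bytes of a zip archive holding a single deflated entry."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(name, payload)
    return buf.getvalue()


def locate_central_directory(archive):
    """Return (eocd_offset, central_directory_offset) for a small archive.

    Assumes no zip64 extensions. The search runs backwards from the end
    of the data, which is why a purely forward, block-by-block reader
    cannot discover the entry index up front.
    """
    eocd = archive.rfind(EOCD_SIGNATURE)
    if eocd < 0:
        raise ValueError("end-of-central-directory record not found")
    # The 4-byte little-endian central directory offset sits 16 bytes
    # into the end-of-central-directory record.
    (cd_offset,) = struct.unpack_from("<I", archive, eocd + 16)
    return eocd, cd_offset


payload = b"id,name\n1,a\n2,b\n"
archive = build_zip("data.csv", payload)
eocd_offset, cd_offset = locate_central_directory(archive)
print("archive size:", len(archive))
print("central directory offset:", cd_offset)
print("end-of-central-directory offset:", eocd_offset)
```

In this sketch the entry index comes after the compressed data, so a reader needs random access (or the whole file in memory) before it can list entries reliably.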