GitHub user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22528#discussion_r219686326
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/CodecStreams.scala
---
@@ -41,7 +42,12 @@ object CodecStreams {
     getDecompressionCodec(config, file)
       .map(codec => codec.createInputStream(inputStream))
-      .getOrElse(inputStream)
+      .orElse {
+        if (file.getName.toLowerCase.endsWith(".zip")) {
+          val zip = new ZipArchiveInputStream(inputStream)
+          if (zip.getNextEntry != null) Some(zip) else None
+        } else None
+      }.getOrElse(inputStream)
--- End diff --
@MaxGekk, I understand that we can support zipped files, but wouldn't it be
difficult to extend this support to non-multiline modes as well? Deflate is
basically the same codec, so I wonder whether we should really allow zip
specifically, and only in multiline mode for CSV / JSON, with a clear
restriction (single file). Please correct me if I misunderstood.
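
To make the restriction concrete, here is a minimal sketch of the fallback in
the diff. It is an illustration under assumptions, not the PR's code: it uses
the JDK's java.util.zip.ZipInputStream in place of ZipArchiveInputStream so
that it runs without extra dependencies, and the names ZipFallbackSketch,
openMaybeZipped, zipSingleEntry and readAll are hypothetical helpers.

```scala
import java.io.{ByteArrayOutputStream, InputStream}
import java.nio.charset.StandardCharsets.UTF_8
import java.util.zip.{ZipEntry, ZipInputStream, ZipOutputStream}

object ZipFallbackSketch {
  // Hypothetical helper mirroring the diff: wrap the stream only when the
  // file name ends with ".zip" and the archive has at least one entry.
  // Assumption: ZipInputStream (JDK) stands in for ZipArchiveInputStream.
  def openMaybeZipped(fileName: String, in: InputStream): InputStream = {
    if (fileName.toLowerCase.endsWith(".zip")) {
      val zip = new ZipInputStream(in)
      // getNextEntry positions the stream at the first entry's data.
      // Only that first entry is ever read (the "single file" restriction).
      // If there is no entry, the raw stream is returned as in the diff,
      // although it may already have been partly consumed.
      if (zip.getNextEntry != null) zip else in
    } else in
  }

  // Builds an in-memory zip archive holding exactly one entry.
  def zipSingleEntry(name: String, content: String): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new ZipOutputStream(bytes)
    out.putNextEntry(new ZipEntry(name))
    out.write(content.getBytes(UTF_8))
    out.closeEntry()
    out.close()
    bytes.toByteArray
  }

  // Reads the stream sequentially until end of data (end of the current
  // entry for a zip stream) and decodes it as UTF-8.
  def readAll(in: InputStream): String = {
    val buf = new ByteArrayOutputStream()
    val chunk = new Array[Byte](4096)
    var n = in.read(chunk)
    while (n != -1) {
      buf.write(chunk, 0, n)
      n = in.read(chunk)
    }
    new String(buf.toByteArray, UTF_8)
  }
}
```

The entry can only be reached by reading the stream from the start, which is
why this shape fits whole-file (multiline) reads more naturally than a
line-based reader that starts at an arbitrary offset.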
---