[GitHub] spark pull request #22528: [SPARK-25513][SQL] Read zipped CSV and JSON

HyukjinKwon Sat, 22 Sep 2018 18:59:11 -0700

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22528#discussion_r219686326
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/CodecStreams.scala ---
    @@ -41,7 +42,12 @@ object CodecStreams {
     
         getDecompressionCodec(config, file)
           .map(codec => codec.createInputStream(inputStream))
    -      .getOrElse(inputStream)
    +      .orElse {
    +        if (file.getName.toLowerCase.endsWith(".zip")) {
    +          val zip = new ZipArchiveInputStream(inputStream)
    +          if (zip.getNextEntry != null) Some(zip) else None
    +        } else None
    +      }.getOrElse(inputStream)
    --- End diff --
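Below is a minimal sketch of the fallback logic in the diff above. It runs on a plain JVM, and it makes some assumptions. It swaps Commons Compress's ZipArchiveInputStream for the JDK's java.util.zip.ZipInputStream so that no extra dependency is needed. It also lowercases with Locale.ROOT. The names openMaybeZipped, makeZip and readAll are hypothetical helpers, not Spark APIs. The sketch only illustrates the idea of positioning the stream at the first archive entry; it is not the code in CodecStreams.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, InputStream}
import java.nio.charset.StandardCharsets
import java.util.Locale
import java.util.zip.{ZipEntry, ZipInputStream, ZipOutputStream}

object ZipFallbackSketch {
  // Hypothetical helper mirroring the diff: if the file name ends with ".zip"
  // and the archive has at least one entry, return a stream positioned at the
  // first entry's content; otherwise return the raw stream unchanged.
  // Uses the JDK ZipInputStream instead of Commons Compress (an assumption).
  def openMaybeZipped(fileName: String, in: InputStream): InputStream = {
    if (fileName.toLowerCase(Locale.ROOT).endsWith(".zip")) {
      val zip = new ZipInputStream(in)
      if (zip.getNextEntry != null) zip else in
    } else {
      in
    }
  }

  // Builds an in-memory zip archive with one entry per (name, content) pair.
  def makeZip(entries: (String, String)*): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new ZipOutputStream(bytes)
    entries.foreach { case (name, content) =>
      out.putNextEntry(new ZipEntry(name))
      out.write(content.getBytes(StandardCharsets.UTF_8))
      out.closeEntry()
    }
    out.close()
    bytes.toByteArray
  }

  // Reads a stream to exhaustion and decodes it as UTF-8. For a ZipInputStream
  // this stops at the end of the current entry.
  def readAll(in: InputStream): String = {
    val buf = new ByteArrayOutputStream()
    val chunk = new Array[Byte](4096)
    var n = in.read(chunk)
    while (n != -1) {
      buf.write(chunk, 0, n)
      n = in.read(chunk)
    }
    new String(buf.toByteArray, StandardCharsets.UTF_8)
  }
}
```

For example, readAll(openMaybeZipped("data.csv.zip", new ByteArrayInputStream(makeZip("data.csv" -> "a,b\n1,2\n")))) yields the CSV text of the single entry. A file name without the ".zip" suffix passes through untouched.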
    
    @MaxGekk, I get that we can support zipped files, but isn't it difficult
    to extend this support to non-multiline modes as well? Basically, deflate
    is the same codec, and I wonder whether we really should allow zip
    specifically, only in multiline mode for CSV / JSON, with a clear
    restriction (a single file). Please correct me if I misunderstood.
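The following sketch makes the "single file" restriction concrete. A reader that calls getNextEntry only once, as the diff does, sees the first entry and never reaches the rest. The sketch also checks that ZipOutputStream stores entries with the DEFLATE method by default. DEFLATE is the same compression algorithm that gzip uses. The sketch relies on some assumptions: it uses the JDK's java.util.zip rather than Commons Compress, and the names makeZip, readFirstEntry and listEntries are hypothetical.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.nio.charset.StandardCharsets
import java.util.zip.{ZipEntry, ZipInputStream, ZipOutputStream}

object ZipSingleEntrySketch {
  // Builds an in-memory zip archive; ZipOutputStream defaults to DEFLATED entries.
  def makeZip(entries: (String, String)*): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new ZipOutputStream(bytes)
    entries.foreach { case (name, content) =>
      out.putNextEntry(new ZipEntry(name))
      out.write(content.getBytes(StandardCharsets.UTF_8))
      out.closeEntry()
    }
    out.close()
    bytes.toByteArray
  }

  // Reads only the first entry, like a reader that calls getNextEntry exactly
  // once; any later entries in the archive are never visited.
  def readFirstEntry(bytes: Array[Byte]): Option[String] = {
    val zip = new ZipInputStream(new ByteArrayInputStream(bytes))
    try {
      Option(zip.getNextEntry).map { _ =>
        val buf = new ByteArrayOutputStream()
        val chunk = new Array[Byte](4096)
        var n = zip.read(chunk)
        while (n != -1) {
          buf.write(chunk, 0, n)
          n = zip.read(chunk)
        }
        new String(buf.toByteArray, StandardCharsets.UTF_8)
      }
    } finally zip.close()
  }

  // Lists every entry name with its compression method (ZipEntry.DEFLATED == 8),
  // to show what a first-entry-only reader skips.
  def listEntries(bytes: Array[Byte]): List[(String, Int)] = {
    val zip = new ZipInputStream(new ByteArrayInputStream(bytes))
    try {
      Iterator.continually(zip.getNextEntry)
        .takeWhile(_ != null)
        .map(e => (e.getName, e.getMethod))
        .toList
    } finally zip.close()
  }
}
```

With an archive holding part1.json and part2.json, readFirstEntry returns only the content of part1.json, even though listEntries reports both entries.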