[GitHub] spark issue #22528: [SPARK-25513][SQL] Read zipped CSV and JSON

MaxGekk Thu, 27 Sep 2018 06:06:25 -0700

Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/22528
  
    > Another concern here is, we have another place to control the compression 
codec (where we usually delegate to HDFS libraries).
    
    I was considering using the Compressor API, but its streaming nature 
conflicts with the structure of a zip archive: the meta-info is located at the 
end of the file, so you cannot read/uncompress it sequentially block-by-block.
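    To illustrate the point about where zip meta-info lives, here is a minimal 
Python sketch (stdlib only, not Spark code). It builds a small archive in memory 
and locates the central directory via the End of Central Directory (EOCD) record, 
which sits at the very end of the file. The helper names `build_zip` and 
`central_directory_offset` are hypothetical, and the parser assumes a 
single-disk, non-zip64 archive.

```python
import io
import struct
import zipfile

# Signature of the End of Central Directory record defined by the zip format.
EOCD_SIGNATURE = b"PK\x05\x06"


def build_zip(files):
    """Hypothetical helper: create an in-memory zip from a {name: bytes} dict."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


def central_directory_offset(archive):
    """Hypothetical helper: find the central directory by scanning from the end.

    Assumes a single-disk, non-zip64 archive. EOCD layout (little-endian):
    signature(4) disk(2) cd_disk(2) entries_on_disk(2) total_entries(2)
    cd_size(4) cd_offset(4) comment_len(2)
    """
    eocd = archive.rfind(EOCD_SIGNATURE)
    if eocd < 0:
        raise ValueError("not a zip archive: EOCD record not found")
    cd_size, cd_offset = struct.unpack_from("<II", archive, eocd + 12)
    return eocd, cd_offset, cd_size


archive = build_zip({"data.csv": b"a,b\n1,2\n3,4\n"})
eocd, cd_offset, cd_size = central_directory_offset(archive)
# The index of entries (central directory) ends right where the EOCD begins,
# and the EOCD itself is the last 22 bytes of an archive without a comment.
print(len(archive), cd_offset, cd_size, eocd)
```

    A reader that sees the archive only as a forward stream of compressed 
blocks reaches this index last, which is why a block-by-block decompressor is a 
poor fit for zip.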
    
    > It just sounds like a bandaid fix to allow one zipped file case in multi 
line mode.
    
    I believe it is better to return the correct result in cases where a wrong 
result is returned now (try reading a zipped CSV) than to force users to fall 
back on this workaround for reading zip archives via the RDD API: 
https://docs.databricks.com/spark/latest/data-sources/zip-files.html#zip-files 
. Especially for a compressed, non-splittable CSV, there is not much 
difference between reading it with multiLine enabled or disabled.
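    As a rough illustration of why multiLine makes little difference here, the 
sketch below (plain Python, not Spark, and not the code from the linked page) 
parses every CSV entry of a whole in-memory zip archive. Because the entire 
archive has to be handed to one reader anyway, a quoted field that spans lines 
costs nothing extra. The function name `read_zipped_csv` is hypothetical; 
wiring it into PySpark, e.g. by mapping it over the `(path, bytes)` pairs from 
`sc.binaryFiles(...)`, is an assumption and is not shown.

```python
import csv
import io
import zipfile


def read_zipped_csv(archive_bytes, encoding="utf-8"):
    """Hypothetical helper: parse all *.csv entries of a zip held in memory.

    zipfile needs random access (it seeks to the central directory at the
    end), so the whole archive is wrapped in a BytesIO rather than streamed.
    """
    rows = []
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        for info in zf.infolist():
            if info.is_dir() or not info.filename.endswith(".csv"):
                continue
            with zf.open(info) as raw:
                # newline="" lets the csv module handle quoted embedded newlines.
                text = io.TextIOWrapper(raw, encoding=encoding, newline="")
                rows.extend(csv.reader(text))
    return rows


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("data.csv", 'id,text\n1,"multi\nline"\n2,plain\n')
print(read_zipped_csv(buf.getvalue()))
```

    The record `1,"multi\nline"` is parsed as a single row with no special 
handling, since the decompressed entry is read in one pass by one reader.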