Abhishek Rawat has posted comments on this change. ( http://gerrit.cloudera.org:8080/15304 )

Change subject: IMPALA-9389: [DOCS] Support reading zstd text files
......................................................................


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/15304/3/docs/topics/impala_file_formats.xml
File docs/topics/impala_file_formats.xml:

http://gerrit.cloudera.org:8080/#/c/15304/3/docs/topics/impala_file_formats.xml@315
PS3, Line 315: <dd>For Parquet and text files only. Impala can read zstd-encoded text files written by Hive

"Impala can read zstd-encoded text files written by Hive (streaming) or compressed by the zStandard library (block)."

I think this is slightly misleading since the above is entirely true for zstd compressed text files. Also, streaming/block are internal details which we don't necessarily have to put in the documentation.

For zstd compressed Parquet files, we support both reading and writing. This statement would be misleading since it suggests we only support reading. Also, compressing Parquet files requires page level compression, so if someone uses the zstd lib to compress a Parquet file (without doing page level compression) Impala cannot read/uncompress it. IMPALA-9201 is a related JIRA.

I think I am happy with just having the following here:
"For Parquet and text files only"

In other parts of the documentation we already cover the fact that Impala can only read compressed text files, and this is no different for the new zstd support. And it can read/write compressed Parquet files.


http://gerrit.cloudera.org:8080/#/c/15304/3/docs/topics/impala_txtfile.xml
File docs/topics/impala_txtfile.xml:

http://gerrit.cloudera.org:8080/#/c/15304/3/docs/topics/impala_txtfile.xml@650
PS3, Line 650: capability. Impala can read zstd-encoded text files written by Hive (streaming) or compressed

I don't think it is necessary to document details such as streaming/block. It doesn't help the documentation and only raises more questions. Also, I am not sure this is only true for zstd. I would think this is true for other "text" compression formats also. If that is the case, we should probably just add a generic statement, something like this:
"Impala can read compressed text files written by Hive or compressed by the standard library implementation"

@Xiaomeng could you please confirm this? I think this is true for all supported text compression codecs.


http://gerrit.cloudera.org:8080/#/c/15304/3/docs/topics/impala_txtfile.xml@676
PS3, Line 676: <codeph>.gz</codeph>, <codeph>.snappy</codeph>, or <codeph>zstd</codeph>. The extensions

I think the extension is '.zst'.


-- 
To view, visit http://gerrit.cloudera.org:8080/15304
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic83137bd2c3a49398fb60cf1901f8b74ed111fce
Gerrit-Change-Number: 15304
Gerrit-PatchSet: 3
Gerrit-Owner: Anonymous Coward <[email protected]>
Gerrit-Reviewer: Abhishek Rawat <[email protected]>
Gerrit-Reviewer: Andrew Sherman <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Xiaomeng Zhang <[email protected]>
Gerrit-Comment-Date: Thu, 27 Feb 2020 19:43:51 +0000
Gerrit-HasComments: Yes
