Abhishek Rawat has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15304 )

Change subject: IMPALA-9389: [DOCS] Support reading zstd text files
......................................................................


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/15304/3/docs/topics/impala_file_formats.xml
File docs/topics/impala_file_formats.xml:

http://gerrit.cloudera.org:8080/#/c/15304/3/docs/topics/impala_file_formats.xml@315
PS3, Line 315:         <dd>For Parquet and text files only. Impala can read 
zstd-encoded text files written by Hive
"Impala can read zstd-encoded text files written by Hive
          (streaming) or compressed by the zStandard library (block)."

I think this is slightly misleading, since the statement above applies only to 
zstd-compressed text files. Also, streaming/block are internal details that we 
don't necessarily have to put in the documentation.

For zstd-compressed Parquet files, we support both reading and writing. This 
statement would be misleading because it suggests we only support reading. Also, 
compressing Parquet files requires page-level compression, so if someone uses 
the zstd library to compress a Parquet file (without doing page-level 
compression), Impala cannot read/uncompress it. IMPALA-9201 is a related JIRA.

I would be happy with just the following here:
"For Parquet and text files only"

Other parts of the documentation already cover the fact that Impala can only 
read compressed text files, and this is no different for the new zstd support. 
And it can read and write compressed Parquet files.


http://gerrit.cloudera.org:8080/#/c/15304/3/docs/topics/impala_txtfile.xml
File docs/topics/impala_txtfile.xml:

http://gerrit.cloudera.org:8080/#/c/15304/3/docs/topics/impala_txtfile.xml@650
PS3, Line 650:         capability. Impala can read zstd-encoded text files 
written by Hive (streaming) or compressed
I don't think it is necessary to document details such as streaming/block. 
They don't help the documentation and only raise more questions.

Also, I am not sure this is true only for zstd. I would think it is also true 
for other "text" compression formats. If that is the case, we should probably 
just add a generic statement, something like this:

"Impala can read compressed text files written by Hive or compressed by the 
standard library implementation"

@Xiaomeng could you please confirm this? I think this is true for all supported 
text compression codecs.


http://gerrit.cloudera.org:8080/#/c/15304/3/docs/topics/impala_txtfile.xml@676
PS3, Line 676:           <codeph>.gz</codeph>, <codeph>.snappy</codeph>, or 
<codeph>zstd</codeph>. The extensions
I think the extension is '.zst', not 'zstd'.



--
To view, visit http://gerrit.cloudera.org:8080/15304
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic83137bd2c3a49398fb60cf1901f8b74ed111fce
Gerrit-Change-Number: 15304
Gerrit-PatchSet: 3
Gerrit-Owner: Anonymous Coward <[email protected]>
Gerrit-Reviewer: Abhishek Rawat <[email protected]>
Gerrit-Reviewer: Andrew Sherman <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Xiaomeng Zhang <[email protected]>
Gerrit-Comment-Date: Thu, 27 Feb 2020 19:43:51 +0000
Gerrit-HasComments: Yes
