[ 
https://issues.apache.org/jira/browse/IMPALA-8549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan resolved IMPALA-8549.
---------------------------
       Resolution: Fixed
    Fix Version/s: Impala 3.3.0

> Add support for scanning DEFLATE text files
> -------------------------------------------
>
>                 Key: IMPALA-8549
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8549
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>            Reporter: Sahil Takiar
>            Assignee: Ethan
>            Priority: Minor
>              Labels: ramp-up
>             Fix For: Impala 3.3.0
>
>
> Several Hadoop tools (e.g. Hive and MapReduce) support reading and writing 
> text files stored using zlib / deflate (results in files such as 
> {{000000_0.deflate}}). Impala currently does not support reading {{.deflate}} 
> text files and returns errors such as: {{ERROR: Scanner plugin 'DEFLATE' is 
> not one of the enabled plugins: 'LZO'}}.
> Moreover, the default compression codec in Hadoop is zlib / deflate (see 
> {{o.a.h.io.compress.DefaultCodec}}). So when writing to a text table in Hive, 
> if users set {{hive.exec.compress.output}} to true, then {{.deflate}} files 
> will be written by default.
> Impala does support zlib / deflate with other file formats though: Avro, 
> RCFiles, SequenceFiles (see 
> [https://impala.apache.org/docs/build/html/topics/impala_file_formats.html]).
> Currently, the frontend assigns a compression type to a file depending on its 
> extension. For instance, the functional_text_def database is stored as a file 
> with a .deflate extension and is assigned the compression type DEFLATE. The 
> HdfsTextScanner class receives this value and uses it directly to create a 
> decompressor. The functional_{avro,seq,rc}_databases are stored as files 
> without extensions, so the frontend interprets their compression type as 
> NONE. However, in the backend, each of the corresponding scanners implements 
> its own custom logic to read file headers and override the NONE 
> compression type assigned to those files with a new value, such as DEFAULT or 
> DEFLATE, so that the appropriate decompressor can be instantiated.
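The extension-based assignment described in the quoted text can be sketched roughly as follows. This is an illustrative Python sketch only, not Impala's frontend or HdfsTextScanner code: the Compression enum, the suffix table, and the functions compression_from_filename and read_text are hypothetical names. It also assumes that a .deflate file is a zlib-wrapped stream, which Python's zlib.decompress can read.

```python
import gzip
import zlib
from enum import Enum
from pathlib import PurePosixPath


class Compression(Enum):
    """Hypothetical compression types; the names are illustrative only."""
    NONE = "none"
    DEFLATE = "deflate"
    GZIP = "gzip"


# Hypothetical suffix table: the "frontend" step looks only at the file name.
_SUFFIX_TO_COMPRESSION = {
    ".deflate": Compression.DEFLATE,
    ".gz": Compression.GZIP,
}


def compression_from_filename(path: str) -> Compression:
    """Pick a compression type from the file extension alone."""
    suffix = PurePosixPath(path).suffix.lower()
    # A file without a known extension is treated as uncompressed (NONE).
    return _SUFFIX_TO_COMPRESSION.get(suffix, Compression.NONE)


def read_text(path: str, raw: bytes) -> str:
    """The "scanner" step: use the assigned type directly to pick a decompressor."""
    codec = compression_from_filename(path)
    if codec is Compression.DEFLATE:
        # Assumption: the .deflate payload is a zlib-wrapped stream.
        return zlib.decompress(raw).decode("utf-8")
    if codec is Compression.GZIP:
        return gzip.decompress(raw).decode("utf-8")
    return raw.decode("utf-8")


if __name__ == "__main__":
    payload = zlib.compress(b"1,alice\n2,bob\n")
    print(compression_from_filename("/warehouse/t/000000_0.deflate"))
    print(read_text("/warehouse/t/000000_0.deflate", payload), end="")
```

A file with no extension maps to NONE in this sketch, which is why a format with its own header (Avro, RCFile, SequenceFile) would need a later step that reads the header and overrides that value.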



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
