[ https://issues.apache.org/jira/browse/IMPALA-8549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan resolved IMPALA-8549.
---------------------------
    Resolution: Fixed
 Fix Version/s: Impala 3.3.0

> Add support for scanning DEFLATE text files
> -------------------------------------------
>
>                 Key: IMPALA-8549
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8549
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>            Reporter: Sahil Takiar
>            Assignee: Ethan
>            Priority: Minor
>              Labels: ramp-up
>             Fix For: Impala 3.3.0
>
>
> Several Hadoop tools (e.g. Hive, MapReduce) support reading and writing
> text files compressed with zlib / deflate (producing files such as
> {{000000_0.deflate}}). Impala currently does not support reading {{.deflate}}
> text files and returns errors such as:
> {{ERROR: Scanner plugin 'DEFLATE' is not one of the enabled plugins: 'LZO'}}.
>
> Moreover, the default compression codec in Hadoop is zlib / deflate (see
> {{o.a.h.io.compress.DefaultCodec}}). So if users set
> {{hive.exec.compress.output}} to true when writing to a text table in Hive,
> {{.deflate}} files are written by default.
>
> Impala does support zlib / deflate with other file formats, though: Avro,
> RCFiles, and SequenceFiles (see
> [https://impala.apache.org/docs/build/html/topics/impala_file_formats.html]).
>
> Currently, the frontend assigns a compression type to a file based on its
> extension. For instance, the data for the functional_text_def database is
> stored in files with a {{.deflate}} extension, so the frontend assigns them
> the compression type DEFLATE. The HdfsTextScanner class receives this value
> and uses it directly to create a decompressor. The
> functional_\{avro,seq,rc}_databases are stored as files without extensions,
> so the frontend interprets their compression type as NONE.
> However, in the backend, each of the corresponding scanners implements its
> own logic to read the file header and override the NONE compression type
> assigned by the frontend with a new value, such as DEFAULT or DEFLATE, so
> that the appropriate decompressor can be instantiated.
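The two-step flow described in the issue (the frontend picks a compression type from the file extension; container-format scanners override NONE using codec information from the file header) can be sketched roughly as follows. This is a hypothetical C++ illustration, not Impala's actual code: the `Compression` enum is a simplified stand-in for Impala's compression type, `CompressionFromExtension` and `OverrideFromHeader` are invented names, and the extension table and header codec names (Hadoop codec class names as recorded in SequenceFile / RCFile headers, plus Avro `avro.codec` values) are illustrative assumptions.

```cpp
#include <map>
#include <string>

// Hypothetical, simplified stand-in for Impala's compression type enum.
enum class Compression { NONE, DEFAULT, GZIP, DEFLATE, BZIP2, SNAPPY };

// Frontend-style step (illustrative): choose a compression type purely from
// the file extension. Files without a recognized extension map to NONE.
Compression CompressionFromExtension(const std::string& path) {
  static const std::map<std::string, Compression> kByExtension = {
      {"deflate", Compression::DEFLATE},
      {"gz", Compression::GZIP},
      {"bz2", Compression::BZIP2},
      {"snappy", Compression::SNAPPY},
  };
  const auto slash = path.find_last_of('/');
  const auto dot = path.find_last_of('.');
  // No dot at all, or the last dot belongs to a directory name: no extension.
  if (dot == std::string::npos || (slash != std::string::npos && dot < slash)) {
    return Compression::NONE;
  }
  const auto it = kByExtension.find(path.substr(dot + 1));
  return it == kByExtension.end() ? Compression::NONE : it->second;
}

// Backend-style step for self-describing container formats (illustrative):
// only when the frontend assigned NONE, look up the codec name read from the
// file header. Unknown names (e.g. Avro's "null") leave the type as NONE.
Compression OverrideFromHeader(Compression assigned,
                               const std::string& header_codec) {
  if (assigned != Compression::NONE) return assigned;
  static const std::map<std::string, Compression> kByCodecName = {
      // Hadoop codec class names, as stored in SequenceFile / RCFile headers.
      {"org.apache.hadoop.io.compress.DefaultCodec", Compression::DEFAULT},
      {"org.apache.hadoop.io.compress.GzipCodec", Compression::GZIP},
      {"org.apache.hadoop.io.compress.BZip2Codec", Compression::BZIP2},
      {"org.apache.hadoop.io.compress.SnappyCodec", Compression::SNAPPY},
      // Avro "avro.codec" metadata values.
      {"deflate", Compression::DEFLATE},
      {"snappy", Compression::SNAPPY},
  };
  const auto it = kByCodecName.find(header_codec);
  return it == kByCodecName.end() ? Compression::NONE : it->second;
}
```

With this split, a text file such as {{000000_0.deflate}} gets DEFLATE from its extension alone, while an extensionless SequenceFile starts as NONE and becomes DEFAULT once its header names {{DefaultCodec}}. The sketch only selects a compression type; it does not construct a decompressor.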