[ https://issues.apache.org/jira/browse/HIVE-16870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16044789#comment-16044789 ]
Ashutosh Chauhan commented on HIVE-16870:
-----------------------------------------

Dupe of HIVE-13040?

> Give Hive the ability to suppress output of empty files
> -------------------------------------------------------
>
>                 Key: HIVE-16870
>                 URL: https://issues.apache.org/jira/browse/HIVE-16870
>             Project: Hive
>          Issue Type: Improvement
>          Components: StorageHandler
>            Reporter: Stephen Measmer
>
> Today some Hive queries that use joins can output zero-byte files,
> particularly on large joins. This can have a negative effect on HDFS because
> it can lead to too many small files [1].
>
> A Cloudera Community thread [2] suggests using LazyOutputFormat as the
> OutputFormat, because MapReduce can then be set to suppress the generation
> of empty (zero-byte) files.
>
> However, it is not possible to create a table in Hive whose OutputFormat is
> just LazyOutputFormat. Below is what we found when testing:
>
> create table mytable (fip int, state string, zip string, level int) STORED AS
> INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT
> 'org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat';
> ------------
> Error: Error while compiling statement: FAILED: SemanticException
> [Error 10055]: Output Format must implement HiveOutputFormat, otherwise it
> should be either IgnoreKeyTextOutputFormat or SequenceFileOutputFormat
> (state=42000,code=10055)
>
> [1] http://blog.cloudera.com/blog/2009/02/the-small-files-problem/
> [2] https://community.cloudera.com/t5/Batch-Processing-and-Workflow/how-to-suppress-mapper-output-files-if-the-output-file-does-not/td-p/29540
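The workaround from the Cloudera thread relies on lazy output creation: Hadoop's LazyOutputFormat wraps another OutputFormat and only creates the underlying record writer when the first record is written, so a task that emits nothing leaves no zero-byte file. In a plain MapReduce driver this is typically enabled with `LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class)`. The stdlib-only Java sketch below illustrates the same idea outside Hadoop, assuming tab-separated key/value lines; `LazyFileWriter` and the `part-0000x` file names are hypothetical and this is not Hadoop's or Hive's actual implementation.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical illustration of lazy output creation: the target file is
 * created only when the first record is written, so a writer that receives
 * no records leaves no zero-byte file behind.
 */
public class LazyFileWriter implements AutoCloseable {
    private final Path target;
    private Writer out; // stays null until the first write()

    public LazyFileWriter(Path target) {
        this.target = target;
    }

    /** Writes one "key<TAB>value" line, creating the file on first use. */
    public void write(String key, String value) throws IOException {
        if (out == null) {
            // First record: only now is the file actually created.
            out = Files.newBufferedWriter(target, StandardCharsets.UTF_8);
        }
        out.write(key);
        out.write('\t');
        out.write(value);
        out.write('\n');
    }

    @Override
    public void close() throws IOException {
        // If nothing was written, out is still null and no file exists.
        if (out != null) {
            out.close();
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("lazy-demo");
        Path empty = dir.resolve("part-00000");
        Path nonEmpty = dir.resolve("part-00001");

        try (LazyFileWriter w = new LazyFileWriter(empty)) {
            // This simulated task produces no output.
        }
        try (LazyFileWriter w = new LazyFileWriter(nonEmpty)) {
            w.write("06", "CA");
        }

        System.out.println(Files.exists(empty));    // false
        System.out.println(Files.exists(nonEmpty)); // true
    }
}
```

The design choice is simply to defer the side effect (file creation) until there is data; the cost is one null check per record.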