Hive 0.13 vs LZO index vs hive.hadoop.supports.splittable.combineinputformat issue

Nathalie Blais Wed, 07 Jan 2015 12:27:07 -0800

Hello Hive support team,

Happy new year to you!


Quick question regarding combining small LZO files in Hive.  As some of our 
HDFS files are indexed (not all, but there are always a few .lzo.index files in 
the directory structure), we are experiencing the problematic behavior 
described in JIRA MAPREDUCE-5537 
(https://issues.apache.org/jira/browse/MAPREDUCE-5537); the case is 100% 
reproducible.

We have a separate aggregation process that runs on the cluster to take care of 
the “small files issue”.  However, between runs, in order to reduce the 
number of mappers (and busy containers), we would like to set 
hive.hadoop.supports.splittable.combineinputformat to true and let Hive 
combine small files by itself.
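
For illustration, the session-level settings in question would look roughly 
like the sketch below.  The split size is an arbitrary placeholder, not a 
value taken from our cluster, and CombineHiveInputFormat is shown explicitly 
only for clarity.

```sql
-- Sketch only; the split size below is a hypothetical placeholder value.
SET hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
SET hive.hadoop.supports.splittable.combineinputformat=true;
-- Upper bound, in bytes, on a combined split (256 MB here, assumed).
SET mapreduce.input.fileinputformat.split.maxsize=268435456;
```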

We are using Cloudera distro CDH 5.2.0 and ideally we would avoid building 
hadoop-core manually.  Do you know if the patch on JIRA MAPREDUCE-5537 has ever 
been included in any official release?

I look forward to hearing from you.

Thank you very much,

Nathalie Blais
Ubisoft Montreal

Nathalie Blais
BI Developer - DNA <http://technologygroup/dna>
Technology Group Online – Ubisoft Montreal
