Hello Hive support team,

Happy New Year to you!
I have a quick question about combining small LZO files in Hive. Because some of our HDFS files are indexed (not all, but there are always a few .lzo.index files in the directory structure), we are experiencing the problematic behavior described in MAPREDUCE-5537 (https://issues.apache.org/jira/browse/MAPREDUCE-5537); the case is 100% reproducible.

We have a separate aggregation process that runs on the cluster to take care of the “small files issue”. However, between runs, in order to reduce the number of mappers (and busy containers), we would like to set hive.hadoop.supports.splittable.combineinputformat to true and let Hive combine small files by itself.

We are using the Cloudera distribution CDH 5.2.0, and ideally we would avoid building hadoop-core manually. Do you know whether the patch attached to MAPREDUCE-5537 has ever been included in an official release?

I look forward to hearing from you.

Thank you very much,

Nathalie Blais
BI Developer - DNA
Technology Group Online, Ubisoft Montreal