Hi,

We often have map-only queries that produce a large number of small output files (in the thousands). This doesn't affect the CLI, but when users try to view or download the query result in Hue, Hue times out trying to read all these small files. We tried setting the following properties, which are supposed to make Hive launch an extra MR job to merge these files when the average file size is below some threshold, but it's not working:

1. hive.merge.mapfiles = true
2. hive.merge.mapredfiles = true
3. hive.merge.smallfiles.avgsize = 32000000 (Default is 16000000)
4. In Hive 10, we used to have hive.mergejob.maponly set to true, but this property does not exist in Hive 11 and 12. What's the story behind this?

For example, in the following select-from-where query on a partitioned table in RCFile, there would be two root stages - one doing a scan with filter and the other doing a fetch.

*Query*:

select data_date as date, ID, if(col_10=1, "yes","no") as answer
from table_1
where arr[4] <> "0"
  and lookup("table_1", x,"action_id")=20519251
  and data_date>=20131014

*Query Plan:*

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        table_1
          TableScan
            alias: table_1
            Filter Operator
              predicate:
                  expr: ((arr[4] <> '0') and (dim_lookup('table_1', x, 'action_id') = 20519251))
                  type: boolean
              Select Operator
                expressions:
                      expr: data_date
                      type: string
                      expr: ID
                      type: string
                      expr: if((col_10= 1), 'yes', 'no')
                      type: string
                outputColumnNames: _col0, _col1, _col2
                File Output Operator
                  compressed: true
                  GlobalTableId: 0
                  table:
                      input format: org.apache.hadoop.mapred.TextInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat

  Stage: Stage-0
    Fetch Operator
      limit: -1

The query leads to 6253 output files, and the total size is 86427 bytes. Many of the files have 8 bytes and the ones that have more than 8 bytes usually have ~30 bytes. With the aforementioned settings, I'd expect an extra MR job to merge the files, but that didn't happen.

If anyone has some insights please let me know.

Thanks,
Eric
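
For reference, here is a minimal sketch of how the settings above might be applied at the session level before running the query. It assumes the properties are set with SET statements in the same session (they could equally live in hive-site.xml); the values are the ones listed above, and the arithmetic in the comments uses the file count and total size reported for this query.

```sql
-- Merge settings as listed above, applied per session (assumption:
-- set via SET rather than hive-site.xml).
SET hive.merge.mapfiles=true;
SET hive.merge.mapredfiles=true;
SET hive.merge.smallfiles.avgsize=32000000;

-- Observed output for the example query:
--   6253 files, 86427 bytes in total
--   average file size = 86427 / 6253, roughly 13.8 bytes
-- That average is far below the 32000000-byte threshold, which is why
-- a merge job was expected here.

-- SET with only the property name prints its current value, which can
-- confirm the session actually picked the setting up.
SET hive.merge.smallfiles.avgsize;
```

Running EXPLAIN on the query after these SET statements shows whether any stage beyond Stage-1 and Stage-0 is added to the plan.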