Re: Potential bug around hive merging of small files
This does look like a bug. Shrijeet, mind opening a JIRA and attaching your patch there?

Thanks,
Ashutosh

On Mon, Mar 12, 2012 at 16:29, Shrijeet Paliwal shrij...@rocketfuel.com wrote:

I had a typo in my last email. The settings are as follows:

hive> set mapred.min.split.size.per.node=10;
hive> set mapred.min.split.size.per.rack=10;
hive> set mapred.max.split.size=10;
hive> set hive.merge.size.per.task=10;
hive> set hive.merge.smallfiles.avgsize=10;
hive> set hive.merge.size.smallfiles.avgsize=10;
*hive> set hive.merge.mapfiles=true;*
hive> set hive.merge.mapredfiles=true;
*hive> set hive.mergejob.maponly=false;*
Re: Potential bug around hive merging of small files
I have opened https://issues.apache.org/jira/browse/HIVE-2869

On Tue, Mar 13, 2012 at 8:37 AM, Ashutosh Chauhan hashut...@apache.org wrote:

This does look like a bug. Shrijeet, mind opening a JIRA and attaching your patch there?

Thanks,
Ashutosh
Potential bug around hive merging of small files
Hive version: Hive 0.8 (last commit SHA b581a6192b8d4c544092679d05f45b2e50d42b45)
Hadoop version: cdh3u0

I am trying to use the Hive feature for merging small files by setting all the necessary parameters. I am disabling the use of CombineHiveInputFormat since my input is compressed text.

hive> set mapred.min.split.size.per.node=10;
hive> set mapred.min.split.size.per.rack=10;
hive> set mapred.max.split.size=10;
hive> set hive.merge.size.per.task=10;
hive> set hive.merge.smallfiles.avgsize=10;
hive> set hive.merge.size.smallfiles.avgsize=10;
hive> set hive.merge.mapfiles=false;
hive> set hive.merge.mapredfiles=true;

The plan decides to launch two MR jobs, but after the first job succeeds I get a runtime error:

java.lang.RuntimeException: Plan invalid, Reason: Reducers == 0 but reduce operator specified

I think the problem can be fixed by this patch I came up with: https://gist.github.com/2025303

Of course, my understanding, and hence this patch, could be totally wrong. Please provide feedback.
Re: Potential bug around hive merging of small files
I had a typo in my last email. The settings are as follows:

hive> set mapred.min.split.size.per.node=10;
hive> set mapred.min.split.size.per.rack=10;
hive> set mapred.max.split.size=10;
hive> set hive.merge.size.per.task=10;
hive> set hive.merge.smallfiles.avgsize=10;
hive> set hive.merge.size.smallfiles.avgsize=10;
*hive> set hive.merge.mapfiles=true;*
hive> set hive.merge.mapredfiles=true;
*hive> set hive.mergejob.maponly=false;*

On Mon, Mar 12, 2012 at 4:27 PM, Shrijeet Paliwal shrij...@rocketfuel.com wrote:

Hive version: Hive 0.8 (last commit SHA b581a6192b8d4c544092679d05f45b2e50d42b45)
Hadoop version: cdh3u0

I am trying to use the Hive feature for merging small files by setting all the necessary parameters. I am disabling the use of CombineHiveInputFormat since my input is compressed text.

hive> set mapred.min.split.size.per.node=10;
hive> set mapred.min.split.size.per.rack=10;
hive> set mapred.max.split.size=10;
hive> set hive.merge.size.per.task=10;
hive> set hive.merge.smallfiles.avgsize=10;
hive> set hive.merge.size.smallfiles.avgsize=10;
hive> set hive.merge.mapfiles=false;
hive> set hive.merge.mapredfiles=true;

The plan decides to launch two MR jobs, but after the first job succeeds I get a runtime error:

java.lang.RuntimeException: Plan invalid, Reason: Reducers == 0 but reduce operator specified

I think the problem can be fixed by this patch I came up with: https://gist.github.com/2025303

Of course, my understanding, and hence this patch, could be totally wrong. Please provide feedback.
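The error text reported in this thread suggests that a plan validation step rejects a stage that carries a reduce operator while being configured with zero reduce tasks. The sketch below only illustrates that kind of consistency check. It is not Hive's code: `Stage`, `invalid_reason`, and `check_stage` are hypothetical names, and the only thing taken from the thread is the wording of the error message.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Stage:
    """Hypothetical stand-in for one MapReduce stage of a query plan."""
    num_reduce_tasks: int
    has_reduce_operator: bool


def invalid_reason(stage: Stage) -> Optional[str]:
    """Return a reason string if the stage is inconsistent, else None."""
    # A reduce operator cannot run if the job is configured with no reducers.
    if stage.num_reduce_tasks == 0 and stage.has_reduce_operator:
        return "Reducers == 0 but reduce operator specified"
    return None


def check_stage(stage: Stage) -> None:
    """Raise if the stage fails the consistency check above."""
    reason = invalid_reason(stage)
    if reason is not None:
        raise RuntimeError("Plan invalid, Reason: " + reason)


# Assumed shape of the failing case: a merge stage that is not map-only
# (it has a reduce operator) but is submitted with zero reduce tasks.
merge_stage = Stage(num_reduce_tasks=0, has_reduce_operator=True)
try:
    check_stage(merge_stage)
except RuntimeError as err:
    print(err)
```

Under this reading, a fix would either give the merge stage at least one reduce task or drop the reduce operator from it; which of the two the linked patch does is not stated in the thread.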