Re: Potential bug around hive merging of small files

2012-03-13 Thread Ashutosh Chauhan
This does look like a bug. Shrijeet, mind opening a JIRA and attaching your
patch there?

Thanks,
Ashutosh



Re: Potential bug around hive merging of small files

2012-03-13 Thread Shrijeet Paliwal
I have opened https://issues.apache.org/jira/browse/HIVE-2869




Potential bug around hive merging of small files

2012-03-12 Thread Shrijeet Paliwal
Hive version: Hive 0.8 (last commit SHA
b581a6192b8d4c544092679d05f45b2e50d42b45)

Hadoop version: cdh3u0

I am trying to use Hive's small-file merge feature by setting all the
necessary parameters.
I am disabling the use of CombineHiveInputFormat since my input is compressed
text.

hive> set mapred.min.split.size.per.node=10;
hive> set mapred.min.split.size.per.rack=10;
hive> set mapred.max.split.size=10;
hive> set hive.merge.size.per.task=10;
hive> set hive.merge.smallfiles.avgsize=10;
hive> set hive.merge.size.smallfiles.avgsize=10;
hive> set hive.merge.mapfiles=false;
hive> set hive.merge.mapredfiles=true;


The plan decides to launch two MR jobs, but after the first job succeeds I get
a runtime error:

java.lang.RuntimeException: Plan invalid, Reason: Reducers == 0 but reduce
operator specified

I think the problem can be fixed by using this patch I came up with:
https://gist.github.com/2025303

Of course, my understanding, and hence this patch, may be totally wrong.
Please provide feedback.
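
For readers trying to place the error quoted above: its wording suggests a
sanity check that compares the number of reduce tasks in a job with whether
the plan still contains a reduce-side operator. The following is only a
minimal, self-contained sketch of a check of that shape. The class name, the
method names, and the first reason string are hypothetical, chosen for
illustration; only the second reason string is taken from the report. It is
not Hive's actual source and not the patch in the gist.

```java
// Hypothetical sketch of a plan consistency check; not Hive's actual code.
final class PlanCheckSketch {

    /** Returns a reason string if the plan is inconsistent, or null if it looks valid. */
    static String invalidReason(int numReduceTasks, boolean hasReduceOperator) {
        if (numReduceTasks >= 1 && !hasReduceOperator) {
            // Assumed wording for the symmetric case (not from the report).
            return "Reducers > 0 but no reduce operator";
        }
        if (numReduceTasks == 0 && hasReduceOperator) {
            // Wording taken from the error message in the report.
            return "Reducers == 0 but reduce operator specified";
        }
        return null;
    }

    /** Throws a RuntimeException when the plan is inconsistent. */
    static void validate(int numReduceTasks, boolean hasReduceOperator) {
        String reason = invalidReason(numReduceTasks, hasReduceOperator);
        if (reason != null) {
            throw new RuntimeException("Plan invalid, Reason: " + reason);
        }
    }

    public static void main(String[] args) {
        validate(1, true); // consistent plan: no exception
        try {
            validate(0, true); // zero reducers but a reduce operator is present
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under this reading, the failure would occur when a job is configured with
zero reducers while its plan still carries a reduce operator. Whether that is
what happens in the second (merge) job here is for the JIRA discussion to
confirm.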


Re: Potential bug around hive merging of small files

2012-03-12 Thread Shrijeet Paliwal
I had a typo in my last email. The settings are as follows (the changed lines
are hive.merge.mapfiles and hive.mergejob.maponly):

hive> set mapred.min.split.size.per.node=10;
hive> set mapred.min.split.size.per.rack=10;
hive> set mapred.max.split.size=10;
hive> set hive.merge.size.per.task=10;
hive> set hive.merge.smallfiles.avgsize=10;
hive> set hive.merge.size.smallfiles.avgsize=10;
hive> set hive.merge.mapfiles=true;
hive> set hive.merge.mapredfiles=true;
hive> set hive.mergejob.maponly=false;
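
As background on how the merge settings interact: hive.merge.mapfiles and
hive.merge.mapredfiles enable merging for map-only and map-reduce jobs
respectively, and hive.merge.smallfiles.avgsize is the average output file
size below which an extra merge job is started. A rough sketch of that
decision follows. The class, method, and parameter names are hypothetical,
and the zero-file guard is an assumption added to keep the example safe; this
is an illustration of the documented behaviour, not Hive's implementation.

```java
// Hypothetical sketch of the small-file merge decision; not Hive's actual code.
final class MergeDecisionSketch {

    /**
     * Decides whether an extra merge job would be started.
     *
     * @param totalBytes       total size of the job's output files
     * @param numFiles         number of output files
     * @param avgSizeThreshold value of hive.merge.smallfiles.avgsize
     * @param mapOnlyJob       true if the producing job had no reduce phase
     * @param mergeMapFiles    value of hive.merge.mapfiles
     * @param mergeMapredFiles value of hive.merge.mapredfiles
     */
    static boolean shouldMerge(long totalBytes, int numFiles, long avgSizeThreshold,
                               boolean mapOnlyJob, boolean mergeMapFiles,
                               boolean mergeMapredFiles) {
        // Pick the switch that applies to this kind of job.
        boolean enabled = mapOnlyJob ? mergeMapFiles : mergeMapredFiles;
        if (!enabled || numFiles == 0) {
            return false; // merging disabled, or no output (assumed guard)
        }
        long averageSize = totalBytes / numFiles;
        return averageSize < avgSizeThreshold;
    }

    public static void main(String[] args) {
        // Map-reduce job, 8 files totalling 40 bytes, threshold 10, mapredfiles=true.
        System.out.println(shouldMerge(40, 8, 10, false, false, true));
        // Same output from a map-only job with mapfiles=false: merging is off.
        System.out.println(shouldMerge(40, 8, 10, true, false, true));
    }
}
```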



