I've upgraded to Pig 0.8 and am still not able to correctly set the input split size. It still defaults to the DFS block size.
Here are the params I set via the cmd line:

-Dmapred.min.split.size=512MB -Dpig.maxCombinedSplitSize=512MB -Dpig.splitCombination=false

I'm starting to wonder if the ChukwaLoader isn't respecting the splits. Has anyone actually got this working?

On Thu, Aug 5, 2010 at 9:34 PM, Corbin Hoenes <[email protected]> wrote:

> Thanks guys, this is the issue. Need to move to Pig 0.7 and, while I'm at it,
> upgrade to the latest Chukwa.
>
> On Aug 5, 2010, at 6:38 PM, Richard Ding wrote:
>
> > Pig 0.6 implements its own splits (called slices) with size equal to the
> > block size, so this explains why the setting doesn't work.
> >
> > Thanks,
> > -Richard
> >
> > -----Original Message-----
> > From: Bill Graham [mailto:[email protected]]
> > Sent: Thursday, August 05, 2010 5:06 PM
> > To: [email protected]
> > Subject: Re: mapred.min.split.size
> >
> > FYI, Chukwa support for Pig 0.7.0 was just committed last week:
> >
> > https://issues.apache.org/jira/browse/CHUKWA-495
> >
> > The patch was built on Chukwa 0.4.0, but you could try applying the patch
> > against Chukwa 0.3.0. I don't think the relevant code changed much between
> > 0.3 and 0.4.
> >
> > On Thu, Aug 5, 2010 at 4:40 PM, Richard Ding <[email protected]> wrote:
> >
> >> What version of Pig are you on? The ChukwaStorage loader for Pig 0.7 uses
> >> Hadoop FileInputFormat to generate splits, so the mapred.min.split.size
> >> property should work.
> >>
> >> But from the release date, Chukwa 0.3 seems not to be on Pig 0.7.
> >>
> >> Thanks,
> >> -Richard
> >>
> >> -----Original Message-----
> >> From: Corbin Hoenes [mailto:[email protected]]
> >> Sent: Thursday, August 05, 2010 3:50 PM
> >> To: [email protected]
> >> Subject: Re: mapred.min.split.size
> >>
> >> I am using the ChukwaStorage loader from Chukwa 0.3. Is it the loader's
> >> responsibility to deal with input splits?
> >>
> >> On Aug 5, 2010, at 4:14 PM, Richard Ding wrote:
> >>
> >>> I misunderstood your earlier question. If you have one large file,
> >>> setting the mapred.min.split.size property will help to increase the
> >>> file split size. Pig will pass system properties to Hadoop. What loader
> >>> are you using?
> >>>
> >>> Thanks,
> >>> -Richard
> >>>
> >>> -----Original Message-----
> >>> From: Corbin Hoenes [mailto:[email protected]]
> >>> Sent: Thursday, August 05, 2010 1:22 PM
> >>> To: [email protected]
> >>> Subject: Re: mapred.min.split.size
> >>>
> >>> So what does Pig do when I have a 5 gig file? Does it simply hardcode
> >>> the split size to the block size? Is there no way to tell it to just
> >>> operate on a larger split size?
> >>>
> >>> On Jul 27, 2010, at 3:41 PM, Richard Ding wrote:
> >>>
> >>>> For Pig loaders, each split can have at most one file, no matter what
> >>>> the split size is.
> >>>>
> >>>> You can concatenate the input files before loading them.
> >>>>
> >>>> Thanks,
> >>>> -Richard
> >>>>
> >>>> -----Original Message-----
> >>>> From: Corbin Hoenes [mailto:[email protected]]
> >>>> Sent: Tuesday, July 27, 2010 2:09 PM
> >>>> To: [email protected]
> >>>> Subject: mapred.min.split.size
> >>>>
> >>>> Is there a way to set the mapred.min.split.size property in Pig? I set
> >>>> it, but it doesn't seem to have changed the mappers' HDFS_BYTES_READ
> >>>> counter. My mappers are finishing in ~10 secs, and I have ~20,000 of
> >>>> them.
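For anyone hitting this later, it may help to see why mapred.min.split.size can raise the split size at all when the loader delegates to FileInputFormat. In the old-API FileInputFormat of this era the per-split size is, roughly, max(minSize, min(goalSize, blockSize)), where goalSize is the total input size divided by the requested number of maps; details vary by Hadoop version, and a loader that computes its own splits (like Pig 0.6's slices) bypasses this entirely. A minimal self-contained sketch of that arithmetic, using a 5 GB file and a 64 MB block purely as illustrative numbers:

```java
// Sketch of the old-API FileInputFormat split-size rule (not the real Hadoop
// class; the formula is reproduced here so the numbers can be checked).
public class SplitSizeSketch {

    // splitSize = max(minSize, min(goalSize, blockSize))
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024;        // 64 MB DFS block
        long fileSize = 5L * 1024 * 1024 * 1024;   // one 5 GB file

        // Default minSize of 1: splits stay at the block size (~80 maps).
        System.out.println(computeSplitSize(fileSize, 1L, blockSize));

        // minSize raised to 512 MB: splits grow to 512 MB (~10 maps).
        System.out.println(computeSplitSize(fileSize, 512L * 1024 * 1024, blockSize));
    }
}
```

Raising minSize above the block size is what pushes splits larger; that only works if the loader actually hands split generation to FileInputFormat, which matches Richard's explanation of why Pig 0.6 ignores the property.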
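One more thing worth double-checking in the command line above (an assumption on my part, not something confirmed in this thread): mapred.min.split.size and pig.maxCombinedSplitSize are plain long-valued byte counts, and Hadoop's Configuration reads longs with Long.parseLong, so a suffixed value like 512MB would not parse; 536870912 is the equivalent byte value. A sketch of that parsing behavior, with a fallback-to-default added here only to illustrate the outcome:

```java
// Sketch of how a long-valued Hadoop property behaves when given "512MB".
// Assumption: Configuration.getLong parses with Long.parseLong. Real Hadoop
// would throw NumberFormatException; this sketch falls back to the default
// purely to show that the suffixed value never becomes 536870912.
public class PropertyValueSketch {

    static long getLongLikeConfiguration(String value, long defaultValue) {
        if (value == null) {
            return defaultValue;
        }
        try {
            return Long.parseLong(value.trim());
        } catch (NumberFormatException e) {
            return defaultValue; // "512MB" lands here
        }
    }

    public static void main(String[] args) {
        long blockSize = 67108864L; // 64 MB default DFS block size
        System.out.println(getLongLikeConfiguration("512MB", blockSize));
        System.out.println(getLongLikeConfiguration("536870912", blockSize));
    }
}
```

If that assumption holds, -Dmapred.min.split.size=536870912 would be worth trying before blaming the loader.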
