Re: getWrappedSplit() is incorrectly returning the first split

Alex Rovner Mon, 09 Jan 2012 21:11:08 -0800

I have already created the patch and tested it with some of my jobs. I also
ran into unit test failures, though. I can still attach the patch to
Jira tomorrow so it can be applied once things are straightened out.


Alex R

On Mon, Jan 9, 2012 at 8:07 PM, Jonathan Coveney <[email protected]> wrote:

> If it is affecting production jobs, I see no reason why we can't put the
> fix into 0.9.2, though I sense that a vote will be coming soon for a 0.9.2
> release, so a fix would have to come soon. The issues running the tests
> brought up in Bill's thread will have to be fixed before we can, though. I
> have a patch that's completely stopped because I can't develop any new tests,
> and so on.
>
> 2012/1/9 Prashant Kommireddi <[email protected]>
>
> > Is this critical enough to make it back into 0.9.1?
> >
> > -Prashant
> >
> > On Mon, Jan 9, 2012 at 4:44 PM, Aniket Mokashi <[email protected]> wrote:
> >
> > > Thanks so much for finding this out.
> > >
> > > I was using
> > >
> > > @Override
> > > public void prepareToRead(@SuppressWarnings("rawtypes") RecordReader reader, PigSplit split)
> > >         throws IOException {
> > >     this.in = reader;
> > >     partValues = ((DataovenSplit) split.getWrappedSplit()).getPartitionInfo().getPartitionValues();
> > >
> > >
> > > in my loader, which behaves like HCatalog for delimited text in Hive. That
> > > returns the same partValues for all the values. I hacked around it with
> > > something else, but I think I must have hit this case. I will confirm.
> > > Thanks again for reporting this.
> > >
> > > Thanks,
> > >
> > > Aniket
> > >
> > > On Mon, Jan 9, 2012 at 11:06 AM, Daniel Dai <[email protected]> wrote:
> > >
> > > > Yes, please. Thanks!
> > > >
> > > > On Mon, Jan 9, 2012 at 10:48 AM, Alex Rovner <[email protected]> wrote:
> > > >
> > > > > Jira opened.
> > > > >
> > > > > I can attempt to submit a patch, as this seems like a fairly
> > > > > straightforward fix.
> > > > >
> > > > > https://issues.apache.org/jira/browse/PIG-2462
> > > > >
> > > > >
> > > > > Thanks
> > > > > Alex R
> > > > >
> > > > > On Sat, Jan 7, 2012 at 6:14 PM, Daniel Dai <[email protected]> wrote:
> > > > >
> > > > > > Sounds like a bug. I guess no one has ever relied on specific split
> > > > > > info before. Please open a Jira.
> > > > > >
> > > > > > Daniel
> > > > > >
> > > > > > On Fri, Jan 6, 2012 at 10:21 PM, Alex Rovner <[email protected]> wrote:
> > > > > >
> > > > > > > Additionally, it looks like PigRecordReader is not incrementing the
> > > > > > > index in the PigSplit when dealing with CombinedInputFormat, thus the
> > > > > > > index will be incorrect in either case.
> > > > > > >
> > > > > > > On Fri, Jan 6, 2012 at 4:50 PM, Alex Rovner <[email protected]> wrote:
> > > > > > >
> > > > > > > > Ran into this today. Using trunk (0.11)
> > > > > > > >
> > > > > > > > If you are using a custom loader and are trying to get input split
> > > > > > > > information in prepareToRead(), getWrappedSplit() is providing the
> > > > > > > > first split instead of the current one.
> > > > > > > >
> > > > > > > > Checking the code confirms the suspicion:
> > > > > > > >
> > > > > > > > PigSplit.java:
> > > > > > > >
> > > > > > > >     public InputSplit getWrappedSplit() {
> > > > > > > >         return wrappedSplits[0];
> > > > > > > >     }
> > > > > > > >
> > > > > > > > Should be:
> > > > > > > >     public InputSplit getWrappedSplit() {
> > > > > > > >         return wrappedSplits[splitIndex];
> > > > > > > >     }
> > > > > > > >
> > > > > > > >
> > > > > > > > The side effect is that if you are trying to retrieve the current
> > > > > > > > split when Pig is using CombinedInputFormat, it incorrectly always
> > > > > > > > returns the first file in the list instead of the one it is currently
> > > > > > > > reading. I have also confirmed it by outputting a log statement in
> > > > > > > > prepareToRead():
> > > > > > > >
> > > > > > > >     @Override
> > > > > > > >     public void prepareToRead(@SuppressWarnings("rawtypes") RecordReader reader, PigSplit split)
> > > > > > > >             throws IOException {
> > > > > > > >         String path = ((FileSplit) split.getWrappedSplit(split.getSplitIndex())).getPath().toString();
> > > > > > > >         partitions = getPartitions(table, path);
> > > > > > > >         log.info("Preparing to read: " + path);
> > > > > > > >         this.reader = reader;
> > > > > > > >     }
> > > > > > > >
> > > > > > > > 2012-01-06 16:27:24,165 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader: Current split being processed hdfs://tuamotu:9000/user/hive/warehouse/cobra_client_consumer_cag/client_tid=3/cag_tid=150/150-r-00005:0+6187085
> > > > > > > > 2012-01-06 16:27:24,180 INFO com.hadoop.compression.lzo.GPLNativeCodeLoader: Loaded native gpl library
> > > > > > > > 2012-01-06 16:27:24,183 INFO com.hadoop.compression.lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev 2dd49ec41018ba4141b20edf28dbb43c0c07f373]
> > > > > > > > 2012-01-06 16:27:24,189 INFO com.proclivitysystems.etl.pig.udf.loaders.HiveLoader: Preparing to read: hdfs://tuamotu:9000/user/hive/warehouse/cobra_client_consumer_cag/client_tid=3/cag_tid=150/150-r-00005
> > > > > > > > 2012-01-06 16:27:28,053 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader: Current split being processed hdfs://tuamotu:9000/user/hive/warehouse/cobra_client_consumer_cag/client_tid=3/cag_tid=150/150-r-00006:0+6181475
> > > > > > > > 2012-01-06 16:27:28,056 INFO com.proclivitysystems.etl.pig.udf.loaders.HiveLoader: Preparing to read: hdfs://tuamotu:9000/user/hive/warehouse/cobra_client_consumer_cag/client_tid=3/cag_tid=150/150-r-00005
> > > > > > > >
> > > > > > > >
> > > > > > > > Notice how Pig is correctly reporting the split, but my "info"
> > > > > > > > statement is always reporting the first input split instead of the
> > > > > > > > current one.
> > > > > > > >
> > > > > > > > Bug? Jira? Patch?
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > > Alex R
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > "...:::Aniket:::... Quetzalco@tl"
> > >
> >
>
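
For anyone following the thread, here is a minimal, self-contained sketch of
the behaviour being discussed. It deliberately does not use Pig's classes:
CombinedSplitSketch, getWrappedSplitBuggy(), getWrappedSplitFixed(),
advance() and pathsSeen() are hypothetical names made up for illustration,
and plain strings stand in for the wrapped InputSplits. It only models the
two points raised above: the accessor should use the current index rather
than 0, and the reader has to advance that index as it moves through a
combined split. The real fix is whatever ends up attached to PIG-2462.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for a split that wraps several underlying splits.
// This is NOT Pig's PigSplit; all names here are illustrative assumptions.
final class CombinedSplitSketch {
    private final String[] wrappedSplits; // stand-ins for the wrapped InputSplits
    private int splitIndex = 0;           // index of the wrapped split being read

    CombinedSplitSketch(String... wrappedSplits) {
        this.wrappedSplits = wrappedSplits.clone();
    }

    // Behaviour reported in the thread: always the first wrapped split.
    String getWrappedSplitBuggy() {
        return wrappedSplits[0];
    }

    // Behaviour proposed in the thread: the split at the current index.
    String getWrappedSplitFixed() {
        return wrappedSplits[splitIndex];
    }

    int size() {
        return wrappedSplits.length;
    }

    // The reader must call this when it moves to the next wrapped split;
    // without it, even the index-based accessor keeps returning the first one.
    void advance() {
        splitIndex++;
    }

    // Walks the wrapped splits the way a combining reader would, recording
    // what a loader would see on each step.
    static List<String> pathsSeen(CombinedSplitSketch split, boolean useFix) {
        List<String> seen = new ArrayList<>();
        for (int i = 0; i < split.size(); i++) {
            seen.add(useFix ? split.getWrappedSplitFixed()
                            : split.getWrappedSplitBuggy());
            if (i + 1 < split.size()) {
                split.advance();
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // Reported behaviour: the first file is seen twice.
        System.out.println(pathsSeen(
                new CombinedSplitSketch("150-r-00005", "150-r-00006"), false));
        // Index-based accessor plus an advancing reader: each file is seen once.
        System.out.println(pathsSeen(
                new CombinedSplitSketch("150-r-00005", "150-r-00006"), true));
    }
}
```

With the index-0 accessor the loader sees 150-r-00005 for both wrapped
splits, which matches the log output quoted above; with the index-based
accessor and an advancing reader it sees each file in turn.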
