I have already created the patch and tested it with some of my jobs. I also ran into unit test failures, though. I can attach the patch to the Jira tomorrow anyway, so it can be applied once things are straightened out.
Alex R

On Mon, Jan 9, 2012 at 8:07 PM, Jonathan Coveney <[email protected]> wrote:

> If it is affecting production jobs, I see no reason why we can't put the
> fix into 0.9.2, though I sense that a vote will be coming soon for a 0.9.2
> release, so a fix would have to come soon. The issues with running the tests
> brought up in Bill's thread will have to be fixed before we can, though. I
> have a patch that's completely stopped because I can't develop any new tests,
> and so on.
>
> 2012/1/9 Prashant Kommireddi <[email protected]>
>
> > Is this critical enough to make it back into 0.9.1?
> >
> > -Prashant
> >
> > On Mon, Jan 9, 2012 at 4:44 PM, Aniket Mokashi <[email protected]> wrote:
> >
> > > Thanks so much for finding this out.
> > >
> > > I was using
> > >
> > > @Override
> > > public void prepareToRead(@SuppressWarnings("rawtypes") RecordReader reader, PigSplit split)
> > >         throws IOException {
> > >     this.in = reader;
> > >     partValues = ((DataovenSplit)split.getWrappedSplit()).getPartitionInfo().getPartitionValues();
> > >
> > > in my loader that behaves like hcatalog for delimited text in hive. That
> > > returns me the same partvalues for all the values. I hacked it with
> > > something else. But, I think I must have hit this case. I will confirm.
> > > Thanks again for reporting this.
> > >
> > > Thanks,
> > >
> > > Aniket
> > >
> > > On Mon, Jan 9, 2012 at 11:06 AM, Daniel Dai <[email protected]> wrote:
> > >
> > > > Yes, please. Thanks!
> > > >
> > > > On Mon, Jan 9, 2012 at 10:48 AM, Alex Rovner <[email protected]> wrote:
> > > >
> > > > > Jira opened.
> > > > >
> > > > > I can attempt to submit a patch as this seems like a fairly
> > > > > straightforward fix.
> > > > >
> > > > > https://issues.apache.org/jira/browse/PIG-2462
> > > > >
> > > > > Thanks
> > > > > Alex R
> > > > >
> > > > > On Sat, Jan 7, 2012 at 6:14 PM, Daniel Dai <[email protected]> wrote:
> > > > >
> > > > > > Sounds like a bug. I guess no one ever relied on specific split info
> > > > > > before. Please open a Jira.
> > > > > >
> > > > > > Daniel
> > > > > >
> > > > > > On Fri, Jan 6, 2012 at 10:21 PM, Alex Rovner <[email protected]> wrote:
> > > > > >
> > > > > > > Additionally, it looks like PigRecordReader is not incrementing the
> > > > > > > index in the PigSplit when dealing with CombinedInputFormat, thus the
> > > > > > > index will be incorrect in either case.
> > > > > > >
> > > > > > > On Fri, Jan 6, 2012 at 4:50 PM, Alex Rovner <[email protected]> wrote:
> > > > > > >
> > > > > > > > Ran into this today. Using trunk (0.11)
> > > > > > > >
> > > > > > > > If you are using a custom loader and are trying to get input split
> > > > > > > > information in prepareToRead(), getWrappedSplit() is providing the
> > > > > > > > first split instead of the current one.
> > > > > > > >
> > > > > > > > Checking the code confirms the suspicion:
> > > > > > > >
> > > > > > > > PigSplit.java:
> > > > > > > >
> > > > > > > > public InputSplit getWrappedSplit() {
> > > > > > > >     return wrappedSplits[0];
> > > > > > > > }
> > > > > > > >
> > > > > > > > Should be:
> > > > > > > >
> > > > > > > > public InputSplit getWrappedSplit() {
> > > > > > > >     return wrappedSplits[splitIndex];
> > > > > > > > }
> > > > > > > >
> > > > > > > > The side effect is that if you are trying to retrieve the current
> > > > > > > > split when Pig is using CombinedInputFormat, it incorrectly always
> > > > > > > > returns the first file in the list instead of the one it is
> > > > > > > > currently reading.
> > > > > > > > I have also confirmed it by outputting a log statement in
> > > > > > > > prepareToRead():
> > > > > > > >
> > > > > > > > @Override
> > > > > > > > public void prepareToRead(@SuppressWarnings("rawtypes") RecordReader reader, PigSplit split)
> > > > > > > >         throws IOException {
> > > > > > > >     String path = ((FileSplit)split.getWrappedSplit(split.getSplitIndex())).getPath().toString();
> > > > > > > >     partitions = getPartitions(table, path);
> > > > > > > >     log.info("Preparing to read: " + path);
> > > > > > > >     this.reader = reader;
> > > > > > > > }
> > > > > > > >
> > > > > > > > 2012-01-06 16:27:24,165 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader: Current split being processed hdfs://tuamotu:9000/user/hive/warehouse/cobra_client_consumer_cag/client_tid=3/cag_tid=150/150-r-00005:0+6187085
> > > > > > > > 2012-01-06 16:27:24,180 INFO com.hadoop.compression.lzo.GPLNativeCodeLoader: Loaded native gpl library
> > > > > > > > 2012-01-06 16:27:24,183 INFO com.hadoop.compression.lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev 2dd49ec41018ba4141b20edf28dbb43c0c07f373]
> > > > > > > > 2012-01-06 16:27:24,189 INFO com.proclivitysystems.etl.pig.udf.loaders.HiveLoader: Preparing to read: hdfs://tuamotu:9000/user/hive/warehouse/cobra_client_consumer_cag/client_tid=3/cag_tid=150/150-r-00005
> > > > > > > > 2012-01-06 16:27:28,053 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader: Current split being processed hdfs://tuamotu:9000/user/hive/warehouse/cobra_client_consumer_cag/client_tid=3/cag_tid=150/150-r-00006:0+6181475
> > > > > > > > 2012-01-06 16:27:28,056 INFO com.proclivitysystems.etl.pig.udf.loaders.HiveLoader: Preparing to read: hdfs://tuamotu:9000/user/hive/warehouse/cobra_client_consumer_cag/client_tid=3/cag_tid=150/150-r-00005
> > > > > > > >
> > > > > > > > Notice how Pig is correctly reporting the split but my "info"
> > > > > > > > statement is always reporting the first input split instead of the
> > > > > > > > current one.
> > > > > > > >
> > > > > > > > Bug? Jira? Patch?
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > > Alex R
> > >
> > > --
> > > "...:::Aniket:::... Quetzalco@tl"
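
For anyone reading the thread later, the behavior reported above can be reproduced without Pig or Hadoop on the classpath. The following is only a minimal sketch, not the patch attached to PIG-2462 and not the real PigSplit class: CombinedSplitSketch, its method names, and the use of plain strings in place of InputSplit objects are all hypothetical stand-ins. It assumes a combined split that wraps several underlying splits and tracks a current index, and compares an accessor that always returns element 0 with one that returns the element at the current index.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified stand-in for a combined split. This is NOT org.apache.pig's
 * PigSplit; the class and method names are hypothetical, and plain strings
 * stand in for the wrapped InputSplit objects.
 */
public class CombinedSplitSketch {
    private final String[] wrappedSplits;
    private int splitIndex = 0;

    public CombinedSplitSketch(String... wrappedSplits) {
        this.wrappedSplits = wrappedSplits.clone();
    }

    /** Behavior described in the report: always the first wrapped split. */
    public String getFirstWrappedSplit() {
        return wrappedSplits[0];
    }

    /** Behavior proposed in the report: the wrapped split at the current index. */
    public String getCurrentWrappedSplit() {
        return wrappedSplits[splitIndex];
    }

    public int size() {
        return wrappedSplits.length;
    }

    /** The reader must advance this index, or the indexed accessor is wrong too. */
    public void setSplitIndex(int index) {
        this.splitIndex = index;
    }

    /**
     * Walks the wrapped splits the way a record reader would, advancing the
     * index before each one, and records what the chosen accessor reports.
     */
    public static List<String> trace(CombinedSplitSketch split, boolean useCurrent) {
        List<String> seen = new ArrayList<>();
        for (int i = 0; i < split.size(); i++) {
            split.setSplitIndex(i);
            seen.add(useCurrent ? split.getCurrentWrappedSplit()
                                : split.getFirstWrappedSplit());
        }
        return seen;
    }

    public static void main(String[] args) {
        CombinedSplitSketch split = new CombinedSplitSketch("150-r-00005", "150-r-00006");
        System.out.println("first-only: " + trace(split, false));
        System.out.println("indexed:    " + trace(split, true));
    }
}
```

With the two file names from the log excerpt, the first-only accessor reports 150-r-00005 twice, which matches the "Preparing to read" lines in the report, while the indexed accessor reports 150-r-00005 and then 150-r-00006. The sketch also shows why the follow-up about the index matters: if the reader never calls setSplitIndex, the indexed accessor returns the first split as well.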
