[jira] Commented: (PIG-1292) Interface Refinements
[ https://issues.apache.org/jira/browse/PIG-1292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845336#action_12845336 ]

Hadoop QA commented on PIG-1292:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12438638/pig-1292.patch
against trunk revision 923043.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 6 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
-1 release audit. The applied patch generated 531 release audit warnings (more than the trunk's current 530 warnings).
-1 core tests. The patch failed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/237/testReport/
Release audit warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/237/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/237/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/237/console

This message is automatically generated.

Interface Refinements

Key: PIG-1292
URL: https://issues.apache.org/jira/browse/PIG-1292
Project: Pig
Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
Fix For: 0.7.0
Attachments: pig-1292.patch, pig-interfaces.patch

A loader can't implement both OrderedLoadFunc and IndexableLoadFunc, as both are abstract classes instead of being interfaces.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
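The constraint behind this refinement is a Java language rule: a class may extend only one class, but can implement any number of interfaces. A minimal sketch, using hypothetical stand-in types (not Pig's actual OrderedLoadFunc/IndexableLoadFunc signatures), of why interfaces compose where abstract classes cannot:

```java
// Stand-in interfaces, NOT Pig's real OrderedLoadFunc/IndexableLoadFunc APIs;
// they only illustrate that one class can implement several interfaces.
interface OrderedLoad { int splitComparator(); }
interface IndexableLoad { void seekNear(String key); }

// Legal: a single loader implements both contracts at once.
class DualLoader implements OrderedLoad, IndexableLoad {
    public int splitComparator() { return 0; }
    public void seekNear(String key) { /* position the reader near 'key' */ }
}

// With abstract classes this is impossible:
//   class DualLoader extends OrderedLoadFunc, IndexableLoadFunc  // does not compile
public class InterfaceDemo {
    public static void main(String[] args) {
        DualLoader l = new DualLoader();
        System.out.println(l instanceof OrderedLoad && l instanceof IndexableLoad); // prints "true"
    }
}
```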
Broken build
Hi guys,

Trunk has been broken for a while. A bunch of tests in the test-commit target fail, mostly with "The import org.apache.pig.experimental.logical.optimizer.PlanPrinter cannot be resolved". Could someone check in the missing file?

-D
[jira] Updated: (PIG-1296) Skewed join fail due to negative partition index
[ https://issues.apache.org/jira/browse/PIG-1296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-1296:

Status: Patch Available (was: Open)

Skewed join fail due to negative partition index

Key: PIG-1296
URL: https://issues.apache.org/jira/browse/PIG-1296
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Daniel Dai
Fix For: 0.7.0
Attachments: PIG-1296-1.patch

Skewed join throws this stack:

java.io.IOException: Illegal partition for Partition: -1 Null: false index: 0 (fc52di95l6m3j,20100210) (-3648)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:904)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:541)
    at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$MapWithPartitionIndex.collect(PigMapReduce.java:187)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$MapWithPartitionIndex.runPipeline(PigMapReduce.java:206)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:227)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:52)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:159)

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
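The "Illegal partition ... -1" in the stack above is the classic symptom of an unmasked hash: Java's % operator keeps the sign of its left operand, so a negative value yields a negative partition index. A sketch of the failure mode and the usual masking fix (my illustration, not Pig's actual skewed-join partitioner code):

```java
public class PartitionDemo {
    // Naive partitioning: % preserves the sign, so a negative key value
    // produces a negative (illegal) partition index.
    static int naivePartition(int hash, int numReduces) {
        return hash % numReduces;
    }

    // Common fix: clear the sign bit before taking the remainder,
    // guaranteeing a result in [0, numReduces).
    static int safePartition(int hash, int numReduces) {
        return (hash & Integer.MAX_VALUE) % numReduces;
    }

    public static void main(String[] args) {
        System.out.println(naivePartition(-3648, 10)); // prints "-8" (illegal partition)
        System.out.println(safePartition(-3648, 10));  // prints "0" (legal)
    }
}
```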
[jira] Commented: (PIG-1285) Allow SingleTupleBag to be serialized
[ https://issues.apache.org/jira/browse/PIG-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845458#action_12845458 ]

Pradeep Kamath commented on PIG-1285:

A couple of comments:

* I think the implementation of write() should be inlined into SingleTupleBag.write() instead of the code below (DefaultDataBag.write() and SingleTupleBag.write() could call a common method to implement write()).
{noformat}
+DataBag bag = bagFactory.newDefaultBag();
+bag.addAll(this);
+bag.write(out)
{noformat}
The reason is that bagFactory.newDefaultBag() registers the bag with the SpillableMemoryManager, which in turn puts a weak reference to the bag on a linked list. In the past we have seen this list grow in size and cause memory issues, which was one of the main motivations for creating SingleTupleBag.

* There is an implementation for write() but not read(). Reading through the code, I guess this is because during deserialization SingleTupleBag.read() will not be called; DefaultDataBag.read() would be called instead. I am wondering if leaving SingleTupleBag.read() as-is is confusing, since it throws an exception with the message "SingleTupleBag should never be serialized or deserialized".

Allow SingleTupleBag to be serialized

Key: PIG-1285
URL: https://issues.apache.org/jira/browse/PIG-1285
Project: Pig
Issue Type: Improvement
Reporter: Dmitriy V. Ryaboy
Assignee: Dmitriy V. Ryaboy
Fix For: 0.7.0
Attachments: PIG-1285.patch

Currently, Pig uses a SingleTupleBag for efficiency when a full-blown spillable bag implementation is not needed in the Combiner optimization. Unfortunately this can create problems. The Initial.exec() code below fails at run-time with the message that a SingleTupleBag cannot be serialized:
{code}
@Override
public Tuple exec(Tuple in) throws IOException {
    // single record. just copy.
    if (in == null) return null;
    try {
        Tuple resTuple = tupleFactory_.newTuple(in.size());
        for (int i = 0; i < in.size(); i++) {
            resTuple.set(i, in.get(i));
        }
        return resTuple;
    } catch (IOException e) {
        log.warn(e);
        return null;
    }
}
{code}
The code below can fix the problem in the UDF, but it seems like something that should be handled transparently, not requiring UDF authors to know about SingleTupleBags.
{code}
@Override
public Tuple exec(Tuple in) throws IOException {
    // single record. just copy.
    if (in == null) return null;
    /*
     * Unfortunately SingleTupleBags are not serializable. We cache whether a given index contains a bag
     * in the map below, and copy all bags into DefaultBags before returning to avoid serialization exceptions.
     */
    Map<Integer, Boolean> isBagAtIndex = Maps.newHashMap();
    try {
        Tuple resTuple = tupleFactory_.newTuple(in.size());
        for (int i = 0; i < in.size(); i++) {
            Object obj = in.get(i);
            if (!isBagAtIndex.containsKey(i)) {
                isBagAtIndex.put(i, obj instanceof SingleTupleBag);
            }
            if (isBagAtIndex.get(i)) {
                DataBag newBag = bagFactory_.newDefaultBag();
                newBag.addAll((DataBag) obj);
                obj = newBag;
            }
            resTuple.set(i, obj);
        }
        return resTuple;
    } catch (IOException e) {
        log.warn(e);
        return null;
    }
}
{code}

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
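The first suggestion above (share one serialization routine instead of copying the tuple into a freshly allocated DefaultBag) can be sketched as follows. This is my simplified stand-in, using strings in place of Pig Tuples and no real DataBag types: the point is that the single-tuple path serializes directly, with no DefaultBag allocation and hence no SpillableMemoryManager registration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Collections;
import java.util.List;

class BagWriteDemo {
    // Common helper both bag flavors could call: write the size, then each element.
    static void writeBag(DataOutput out, List<String> tuples) throws IOException {
        out.writeLong(tuples.size());
        for (String t : tuples) out.writeUTF(t);
    }

    // SingleTupleBag-style write: serialize the one tuple directly through the
    // shared helper, never allocating a registered DefaultBag as a middleman.
    static byte[] writeSingle(String tuple) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            writeBag(new DataOutputStream(bos), Collections.singletonList(tuple));
            return bos.toByteArray();
        } catch (IOException e) {
            throw new AssertionError(e); // cannot happen with an in-memory stream
        }
    }
}
```

Since the on-wire layout is the same as the multi-tuple path, the existing DefaultDataBag-style read() can deserialize the result, which matches the observation that SingleTupleBag.read() is never invoked.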
[jira] Created: (PIG-1297) algebraic interface of udf does not get used if the foreach with udf projects column within group
algebraic interface of udf does not get used if the foreach with udf projects column within group

Key: PIG-1297
URL: https://issues.apache.org/jira/browse/PIG-1297
Project: Pig
Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Thejas M Nair

grunt> l = load 'file' as (a,b,c);
grunt> g = group l by (a,b);
grunt> f = foreach g generate SUM(l.c), group.a;
grunt> explain f;
...
...
#--------------------------------------------------
# Map Reduce Plan
#--------------------------------------------------
MapReduce node 1-752
Map Plan
Local Rearrange[tuple]{tuple}(false) - 1-742
|   |
|   Project[bytearray][0] - 1-743
|   |
|   Project[bytearray][1] - 1-744
|
|---Load(file:///Users/tejas/pig/trunk/file:org.apache.pig.builtin.PigStorage) - 1-739
Reduce Plan
Store(fakefile:org.apache.pig.builtin.PigStorage) - 1-751
|
|---New For Each(false,false)[bag] - 1-750
    |   |
    |   POUserFunc(org.apache.pig.builtin.SUM)[double] - 1-747
    |   |
    |   |---Project[bag][2] - 1-746
    |       |
    |       |---Project[bag][1] - 1-745
    |   |
    |   Project[bytearray][0] - 1-749
    |   |
    |   |---Project[tuple][0] - 1-748
    |
    |---Package[tuple]{tuple} - 1-741
Global sort: false

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-1298) Restore file traveral behavior to Pig loaders
Restore file traveral behavior to Pig loaders

Key: PIG-1298
URL: https://issues.apache.org/jira/browse/PIG-1298
Project: Pig
Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Richard Ding
Assignee: Richard Ding
Fix For: 0.7.0

Given a location, a Pig loader is expected to recursively load all the files under that location (i.e., all the files returned by the ls -R command). However, after the transition to the Hadoop 20 API, only files returned by the ls command are loaded.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1298) Restore file traversal behavior to Pig loaders
[ https://issues.apache.org/jira/browse/PIG-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Ding updated PIG-1298:

Summary: Restore file traversal behavior to Pig loaders (was: Restore file traveral behavior to Pig loaders)

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
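The ls vs. ls -R distinction at issue in PIG-1298 can be sketched over an in-memory directory tree (a hedged illustration only; the actual fix would go through Hadoop's FileSystem/FileStatus listing API, and the names below are mine):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

class Traversal {
    // 'dirs' maps a directory path to its child entries; a child that itself
    // appears as a key is a directory, anything else is a file.
    static List<String> listRecursive(Map<String, List<String>> dirs, String path) {
        List<String> files = new ArrayList<>();
        for (String child : dirs.getOrDefault(path, Collections.emptyList())) {
            if (dirs.containsKey(child)) {
                files.addAll(listRecursive(dirs, child)); // descend, like ls -R
            } else {
                files.add(child); // plain file at this level, like ls
            }
        }
        return files;
    }
}
```

A non-recursive listing of "/data" would miss any part files nested under subdirectories such as date-partitioned output, which is exactly the regression described.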
[jira] Created: (PIG-1299) Implement Pig counter to track number of output rows for each output files
Implement Pig counter to track number of output rows for each output files

Key: PIG-1299
URL: https://issues.apache.org/jira/browse/PIG-1299
Project: Pig
Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Richard Ding
Assignee: Richard Ding
Fix For: 0.7.0

When running a multi-store query, the Hadoop job tracker often displays only 0 for the "Reduce output records" or "Map output records" counters. This is incorrect and misleading. Pig should implement an output records counter for each output file in the query.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
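The per-output counter idea above amounts to keying a record counter by store location rather than sharing one global counter. A minimal sketch with an ordinary map as a stand-in for Hadoop's counter groups (names are illustrative, not Pig's actual counter names):

```java
import java.util.HashMap;
import java.util.Map;

class MultiStoreCounters {
    // One record counter per output location; in a real job these would be
    // Hadoop counters in a dedicated group, incremented from the store path.
    private final Map<String, Long> recordsWritten = new HashMap<>();

    // Called once per record routed to the given output file.
    void recordWritten(String outputFile) {
        recordsWritten.merge(outputFile, 1L, Long::sum);
    }

    long count(String outputFile) {
        return recordsWritten.getOrDefault(outputFile, 0L);
    }
}
```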
[jira] Assigned: (PIG-1279) Make sample loaders interchangeable
[ https://issues.apache.org/jira/browse/PIG-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Ding reassigned PIG-1279:

Assignee: Richard Ding

Make sample loaders interchangeable

Key: PIG-1279
URL: https://issues.apache.org/jira/browse/PIG-1279
Project: Pig
Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Richard Ding
Assignee: Richard Ding

In Pig 0.6 one can use the random sample loader in place of the Poisson sample loader for skewed join, but this isn't the case in trunk (PIG-1264). In general, the sample loaders should be interchangeable (though their sampling characteristics differ).

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-200) Pig Performance Benchmarks
[ https://issues.apache.org/jira/browse/PIG-200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-200:

Attachment: perf-0.6.patch

Hi Duncan, perf.patch is a little bit old; I have attached the newer perf-0.6.patch. The instructions to generate input data for PigMix are:
1. apply perf-0.6.patch on the Pig 0.6 release
2. ant jar compile-test
3. export PIG_HOME=.
4. test/utils/pigmix/datagen/generate_data.sh

Pig Performance Benchmarks

Key: PIG-200
URL: https://issues.apache.org/jira/browse/PIG-200
Project: Pig
Issue Type: Task
Reporter: Amir Youssefi
Assignee: Alan Gates
Attachments: generate_data.pl, perf-0.6.patch, perf.hadoop.patch, perf.patch

To benchmark Pig performance, we need a TPC-H-like large data set plus a script collection. This is used to compare different Pig releases, and Pig against other systems (e.g., Pig + Hadoop vs. Hadoop only). Here is the wiki for the small tests: http://wiki.apache.org/pig/PigPerformance I am currently running long-running Pig scripts over data sets on the order of tens of TBs. The next step is hundreds of TBs. We need an open large data set (open-source scripts which generate the data set) and detailed scripts for important operations such as ORDER, AGGREGATION, etc. We can call those the Pig Workouts: Cardio (short processing), Marathon (long-running scripts), and Triathlon (mix). I will update this JIRA with more details of current activities soon.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1296) Skewed join fail due to negative partition index
[ https://issues.apache.org/jira/browse/PIG-1296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845584#action_12845584 ]

Hadoop QA commented on PIG-1296:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12438842/PIG-1296-1.patch
against trunk revision 923043.

+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no tests are needed for this patch.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/238/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/238/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/238/console

This message is automatically generated.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1285) Allow SingleTupleBag to be serialized
[ https://issues.apache.org/jira/browse/PIG-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845606#action_12845606 ]

Pradeep Kamath commented on PIG-1285:

SingleTupleBag did not go the route of extending DefaultAbstractBag for a couple of reasons:
1) The object would have a few more members (like the mMemSize* fields, mSize, etc., which are present in DefaultAbstractBag). This would make the object bigger in memory, and SingleTupleBag was designed to be used in the map/combine phase with minimal memory overhead.
2) The first point in my previous comment: we don't want this bag to register with the SpillableMemoryManager, which in turn puts a weak reference to the bag on a linked list. In the past we have seen this list grow in size and itself cause memory issues.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1257) PigStorage per the new load-store redesign should support splitting of bzip files
[ https://issues.apache.org/jira/browse/PIG-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-1257:

Status: Open (was: Patch Available)

PigStorage per the new load-store redesign should support splitting of bzip files

Key: PIG-1257
URL: https://issues.apache.org/jira/browse/PIG-1257
Project: Pig
Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
Fix For: 0.7.0
Attachments: PIG-1257-2.patch, PIG-1257.patch

PigStorage implemented per the new load-store redesign (PIG-966) is based on TextInputFormat for reading data. TextInputFormat has support for reading bzip data, but without support for splitting bzip files. In Pig 0.6, splitting was enabled for bzip files; we should attempt to enable that feature.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1257) PigStorage per the new load-store redesign should support splitting of bzip files
[ https://issues.apache.org/jira/browse/PIG-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-1257:

Attachment: blockHeaderEndsAt136500.txt.bz2
            blockEndingInCR.txt.bz2
            PIG-1257-3.patch

Since the last patch, I uncovered some issues with the code while testing some boundary conditions. I have fixed those in the new patch PIG-1257-3.patch and included those boundary conditions in test cases in TestBZip.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1257) PigStorage per the new load-store redesign should support splitting of bzip files
[ https://issues.apache.org/jira/browse/PIG-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-1257:

Status: Patch Available (was: Open)

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1257) PigStorage per the new load-store redesign should support splitting of bzip files
[ https://issues.apache.org/jira/browse/PIG-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-1257:

Attachment: recordLossblockHeaderEndsAt136500.txt.bz2

The .bz2 files attached to this issue should be put in test/org/apache/pig/test/data for this patch to pass unit tests.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
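For background on why splitting bzip2 (the subject of PIG-1257) is tricky: a reader starting at an arbitrary split offset must locate the next compressed-block boundary, marked by the 48-bit block magic 0x314159265359. A rough sketch of the scan (my illustration only, not Pig's implementation; real bzip2 blocks are not byte-aligned, so a true scanner must work at the bit level):

```java
class BzipMagicScan {
    // bzip2 compressed-block header magic: the BCD digits of pi.
    static final long BLOCK_MAGIC = 0x314159265359L;

    // Byte-aligned scan for the magic starting at 'from'; returns the offset
    // of the next block header, or -1 if none is found in the buffer.
    static int findBlockStart(byte[] buf, int from) {
        for (int i = from; i + 6 <= buf.length; i++) {
            long v = 0;
            for (int j = 0; j < 6; j++) {
                v = (v << 8) | (buf[i + j] & 0xFF); // assemble 48 bits big-endian
            }
            if (v == BLOCK_MAGIC) return i;
        }
        return -1;
    }
}
```

Once a block boundary is found, a split reader can decompress from there and hand off at the next boundary past its split end, which is what makes per-block splitting possible at all.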
[jira] Commented: (PIG-1257) PigStorage per the new load-store redesign should support splitting of bzip files
[ https://issues.apache.org/jira/browse/PIG-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845627#action_12845627 ]

Hadoop QA commented on PIG-1257:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12438883/recordLossblockHeaderEndsAt136500.txt.bz2
against trunk revision 923043.

+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no tests are needed for this patch.
-1 patch. The patch command could not apply the patch.

Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/239/console

This message is automatically generated.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-1300) PigStorage does not load tuples with large #s.
PigStorage does not load tuples with large #s.

Key: PIG-1300
URL: https://issues.apache.org/jira/browse/PIG-1300
Project: Pig
Issue Type: Bug
Components: data
Reporter: Brian Donaldson

Say I have a file 'a' with the following entry:

(30010401402)

grunt> A = LOAD 'a' AS (t:tuple(a:chararray));
grunt> DUMP A;
2010-03-15 17:37:23,333 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger - org.apache.pig.builtin.PigStorage: Unable to interpret value [...@353c375 in field being converted to type tuple, caught Exception For input string: 30010401402 field discarded
2010-03-15 17:37:23,335 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Successfully stored result in: file:/tmp/temp-1345435162/tmp-308780808
2010-03-15 17:37:23,335 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Records written : 1
2010-03-15 17:37:23,335 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Bytes written : 0
2010-03-15 17:37:23,335 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
2010-03-15 17:37:23,336 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
()

If I have another file 'b' with the following entry:

(30010401402L)

grunt> B = LOAD 'b' AS (t:tuple(a:chararray));
grunt> DUMP B;
2010-03-15 17:39:10,051 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Successfully stored result in: file:/tmp/temp-1630850555/tmp1316256240
2010-03-15 17:39:10,051 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Records written : 1
2010-03-15 17:39:10,051 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Bytes written : 0
2010-03-15 17:39:10,051 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
2010-03-15 17:39:10,052 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
((30010401402L))

Is there a way to get the load in the first example to work? Or do I need to start affixing an L to all my #s?

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
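A guess at the failure mode, consistent with the warning's "For input string: 30010401402": the value overflows a 32-bit int, so an int parse throws while a long parse succeeds, which is why the explicit L suffix sidesteps the problem. A sketch (my illustration of the arithmetic, not PigStorage's actual conversion code):

```java
public class ParseDemo {
    // Returns whether the string fits in a 32-bit int.
    static boolean fitsInt(String s) {
        try {
            Integer.parseInt(s);
            return true;
        } catch (NumberFormatException e) {
            return false; // overflows int, as 30010401402 > Integer.MAX_VALUE
        }
    }

    public static void main(String[] args) {
        System.out.println(fitsInt("30010401402"));        // prints "false"
        System.out.println(Long.parseLong("30010401402")); // prints "30010401402"
    }
}
```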
[jira] Commented: (PIG-1300) PigStorage does not load tuples with large #s.
[ https://issues.apache.org/jira/browse/PIG-1300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845628#action_12845628 ]

Daniel Dai commented on PIG-1300:

Which version of Pig are you using? Can you try it on trunk? It looks like it should have been fixed by PIG-613.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1300) PigStorage does not load tuples with large #s.
[ https://issues.apache.org/jira/browse/PIG-1300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12845636#action_12845636 ] Brian Donaldson commented on PIG-1300: -- This is with version 0.5+11.1 (cloudera), and with the recently released 0.6. PigStorage does not load tuples with large #s. -- Key: PIG-1300 URL: https://issues.apache.org/jira/browse/PIG-1300 Project: Pig Issue Type: Bug Components: data Reporter: Brian Donaldson Say I have a file 'a' with the following entry: (30010401402) grunt A = LOAD 'a' AS (t:tuple(a:chararray)); grunt DUMP A; 2010-03-15 17:37:23,333 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger - org.apache.pig.builtin.PigStorage: Unable to interpret value [...@353c375 in field being converted to type tuple, caught Exception For input string: 30010401402 field discarded 2010-03-15 17:37:23,335 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Successfully stored result in: file:/tmp/temp-1345435162/tmp-308780808 2010-03-15 17:37:23,335 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Records written : 1 2010-03-15 17:37:23,335 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Bytes written : 0 2010-03-15 17:37:23,335 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete! 2010-03-15 17:37:23,336 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!! 
()
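The root cause is arithmetic range, not file contents: 30010401402 is larger than Integer.MAX_VALUE (2147483647), so any attempt to parse it as a 32-bit int throws NumberFormatException, while a 64-bit long holds it easily; the trailing L tells Pig to treat the literal as a long. A minimal Java sketch of this failure mode (plain JDK parsing calls, not Pig's actual PigStorage code):

```java
public class LargeNumberParseDemo {
    public static void main(String[] args) {
        // Same value as in the bug report; exceeds Integer.MAX_VALUE (2147483647).
        String value = "30010401402";
        try {
            int asInt = Integer.parseInt(value); // overflows the 32-bit int range
            System.out.println("parsed as int: " + asInt);
        } catch (NumberFormatException e) {
            // Same class of failure PigStorage logged:
            // "caught Exception For input string: 30010401402 field discarded"
            System.out.println("int parse failed: " + e.getMessage());
        }
        long asLong = Long.parseLong(value); // fits comfortably in 64 bits
        System.out.println("parsed as long: " + asLong);
    }
}
```

This is why the file with the plain number triggers "field discarded" while the L-suffixed file loads: the suffix steers the parse to the 64-bit path.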
[jira] Commented: (PIG-1300) PigStorage does not load tuples with large #s.
[ https://issues.apache.org/jira/browse/PIG-1300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845641#action_12845641 ]

Daniel Dai commented on PIG-1300:

I just tried, and it works in trunk. The fix will come with the next release (0.7).

PigStorage does not load tuples with large #s.

Key: PIG-1300
URL: https://issues.apache.org/jira/browse/PIG-1300
Project: Pig
Issue Type: Bug
Components: data
Reporter: Brian Donaldson
Fix For: 0.7.0
[jira] Resolved: (PIG-1300) PigStorage does not load tuples with large #s.
[ https://issues.apache.org/jira/browse/PIG-1300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai resolved PIG-1300.

Resolution: Fixed
Fix Version/s: 0.7.0

PigStorage does not load tuples with large #s.

Key: PIG-1300
URL: https://issues.apache.org/jira/browse/PIG-1300
Project: Pig
Issue Type: Bug
Components: data
Reporter: Brian Donaldson
Fix For: 0.7.0
[jira] Commented: (PIG-1292) Interface Refinements
[ https://issues.apache.org/jira/browse/PIG-1292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845717#action_12845717 ]

Pradeep Kamath commented on PIG-1292:

As Xuefu mentioned, we can get rid of the splitIdx argument in public WritableComparable<?> getSplitComparable(InputSplit split, int splitIdx). Otherwise the changes look good; +1 for commit with the above change.

Interface Refinements

Key: PIG-1292
URL: https://issues.apache.org/jira/browse/PIG-1292
Project: Pig
Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
Fix For: 0.7.0
Attachments: pig-1292.patch, pig-interfaces.patch

A loader can't implement both OrderedLoadFunc and IndexableLoadFunc, as both are abstract classes instead of being interfaces.
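The refinement described in PIG-1292 turns on Java's single-inheritance rule: a class may extend at most one abstract class, but it may implement any number of interfaces. The following sketch uses toy stand-ins whose names mirror the Pig types but whose signatures are simplified and hypothetical (the real Pig 0.7 API uses Hadoop's WritableComparable and InputSplit):

```java
// Toy stand-in for OrderedLoadFunc: yields a comparable key for a split
// (e.g. its byte offset), so splits can be globally ordered.
interface OrderedLoadFunc {
    // Per the comment above, the splitIdx argument is dropped from the
    // real method; this simplified version takes only the split.
    Comparable<?> getSplitComparable(Object split);
}

// Toy stand-in for IndexableLoadFunc: lets a reader seek near a key.
interface IndexableLoadFunc {
    void seekNear(Object keys);
}

// With both types as interfaces, a single loader can implement both.
// Had each remained an abstract class, this declaration would be
// impossible, since Java allows only one superclass.
class MyLoader implements OrderedLoadFunc, IndexableLoadFunc {
    public Comparable<?> getSplitComparable(Object split) {
        return 0L; // illustrative: e.g. the split's start offset
    }
    public void seekNear(Object keys) {
        // illustrative: position the underlying reader near the key
    }
}
```

Converting the abstract classes to interfaces therefore costs nothing in expressiveness here (neither type carried shared state) and removes the artificial either/or restriction on loaders.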