Re: High(er) res Pig logo?
I have a couple of higher resolution pigs in overalls and a pig on the Hadoop elephant. I've checked them into src/docs/src/documentation/resources/images/ so all can use them. Also, we're working on cleaning up the Pig with Y! logo issue. Alan. On Sep 27, 2009, at 9:59 AM, Dmitriy Ryaboy wrote: Where can one find the Pig logo in a size/resolution suitable for presentations? Also, I went on the website and noticed that the Y! reappeared on Pig's chest. -D
Re: High(er) res Pig logo?
Thanks Alan, got 'em. Much appreciated. -D On Mon, Sep 28, 2009 at 3:44 PM, Alan Gates ga...@yahoo-inc.com wrote: I have a couple of higher resolution pigs in overalls and a pig on the Hadoop elephant. I've checked them into src/docs/src/documentation/resources/images/ so all can use them. Also, we're working on cleaning up the Pig with Y! logo issue. Alan. On Sep 27, 2009, at 9:59 AM, Dmitriy Ryaboy wrote: Where can one find the Pig logo in a size/resolution suitable for presentations? Also, I went on the website and noticed that the Y! reappeared on Pig's chest. -D
RE: High(er) res Pig logo?
I have cleaned up the logo. Olga -Original Message- From: Dmitriy Ryaboy [mailto:dvrya...@gmail.com] Sent: Sunday, September 27, 2009 10:00 AM To: pig-dev@hadoop.apache.org Subject: High(er) res Pig logo? Where can one find the Pig logo in a size/resolution suitable for presentations? Also, I went on the website and noticed that the Y! reappeared on Pig's chest. -D
[jira] Commented: (PIG-979) Acummulator Interface for UDFs
[ https://issues.apache.org/jira/browse/PIG-979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12760389#action_12760389 ] Alan Gates commented on PIG-979: Jeff, thanks for the paper. I looked over it and I'm not certain it directly applies. They are measuring both the aggregation time (sort or hash) and how it is passed to the user defined aggregate (iterate or accumulate). Being in Hadoop we already have the aggregation done. So it's just a question of the fastest way to make the data available to the UDF. As I said above, we want to test the performance of this and prove its worth before we add it. As a general complaint, they used a fairly old revision of Pig code in their paper, even though it appears it was published in the last few months. Acummulator Interface for UDFs -- Key: PIG-979 URL: https://issues.apache.org/jira/browse/PIG-979 Project: Pig Issue Type: New Feature Reporter: Alan Gates Assignee: Ying He Add an accumulator interface for UDFs that would allow them to take a set number of records at a time instead of the entire bag. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-981) Merge join should restrict join key expressions to simple projects
Merge join should restrict join key expressions to simple projects -- Key: PIG-981 URL: https://issues.apache.org/jira/browse/PIG-981 Project: Pig Issue Type: Bug Affects Versions: 0.4.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Currently merge join allows join key expressions to be arbitrary expressions with the assumption that the expressions keep the sort order. Since currently only ascending sort order is supported, the code checks at run time for sort order and catches the case where sort order is broken because the join key expression is not order preserving. However, there is a reason we should restrict the join keys to projection of columns only: PIG-953 will enable merge join in pig to work with loaders and store functions which can internally index sorted data. These store functions can only create an index (and hence look up on the index) on raw data columns (and not expressions on the columns). Hopefully this does not degrade the usability of merge join much, since the expressions can always be applied post-join on the join columns, and since the expressions are order preserving they do not affect the outcome of the join. -- This message is automatically generated by JIRA.
[jira] Commented: (PIG-979) Acummulator Interface for UDFs
[ https://issues.apache.org/jira/browse/PIG-979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12760396#action_12760396 ] Alan Gates commented on PIG-979: Ciemo, In your comment above, you indicate you'd like functions like cumulative sum to be able to emit a value each time a record is added. But how does that work with something like:
{code}
A = load 'bla';
B = group A by $0;
C = foreach B generate {
    D = order A by $1;
    generate CUMULATIVE_SUM(D.$2), SUM(D.$2);
}
{code}
SUM can't output a value until it's seen everything, but CUMULATIVE_SUM will have an output on every record. The way Pig's data model handles this is with bags. The other possibility I can see is that Pig handles this as having an implicit flatten, so the output from above would look like:
1 10
3 10
6 10
10 10
Are you proposing that we create a way to streamline output of these types of functions to STORE (or DUMP) so that the bag never need be materialized? Or do you want a UDF type that takes a bag and produces multiple outputs along with an implicit flatten? Or are you suggesting a change in the data model? Acummulator Interface for UDFs -- Key: PIG-979 URL: https://issues.apache.org/jira/browse/PIG-979
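The distinction Alan draws between SUM and CUMULATIVE_SUM can be sketched in plain Java. This is an illustration only, not Pig's API at the time of this thread: the interface name Accumulator and the methods accumulate/getValue are hypothetical placeholders for the proposed contract. The point is that a plain aggregate has one value after all records, while a cumulative function has a running output per record (the 1, 3, 6, 10 column in the example above).

```java
import java.util.ArrayList;
import java.util.List;

public class AccumulatorSketch {
    // Hypothetical accumulator-style contract: records are fed in
    // incrementally instead of as one materialized bag.
    interface Accumulator<T> {
        void accumulate(long value);   // called once per record (or batch)
        T getValue();                  // result requested after the records
    }

    // SUM: cannot emit anything meaningful until every record is seen.
    static class Sum implements Accumulator<Long> {
        private long total = 0;
        public void accumulate(long value) { total += value; }
        public Long getValue() { return total; }
    }

    // CUMULATIVE_SUM: has an output available after every record,
    // which is exactly the behavior the thread asks how to model.
    static class CumulativeSum implements Accumulator<List<Long>> {
        private long running = 0;
        private final List<Long> outputs = new ArrayList<>();
        public void accumulate(long value) { running += value; outputs.add(running); }
        public List<Long> getValue() { return outputs; }
    }

    public static void main(String[] args) {
        long[] records = {1, 2, 3, 4};   // reproduces the 1/3/6/10 vs 10 example
        Sum sum = new Sum();
        CumulativeSum csum = new CumulativeSum();
        for (long r : records) { sum.accumulate(r); csum.accumulate(r); }
        System.out.println(csum.getValue() + " " + sum.getValue()); // [1, 3, 6, 10] 10
    }
}
```

With an implicit flatten, each element of CumulativeSum's output list would pair with the single Sum value, giving the four-row output shown in the comment.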
[jira] Commented: (PIG-981) Merge join should restrict join key expressions to simple projects
[ https://issues.apache.org/jira/browse/PIG-981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12760406#action_12760406 ] Ashutosh Chauhan commented on PIG-981: -- The default merge join implementation can handle order-preserving join expressions, that is, when merge join itself builds the index and doesn't rely on the underlying storage for the index. When merge join doesn't build the index itself, this can't be guaranteed, but we don't have to limit all possible uses of merge join for this reason. Rather, we should check whether merge join is building indexes of its own: if it is, allow order-preserving expressions; if it is not, only *then* restrict expressions to projections. Merge join should restrict join key expressions to simple projects -- Key: PIG-981 URL: https://issues.apache.org/jira/browse/PIG-981
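The reason an externally built index forces simple projections can be seen in a toy sketch of the join itself. This is not Pig's implementation; it is a minimal merge of two inputs sorted ascending on column 0 (duplicate join keys are deliberately not handled). The key extraction is a simple project of $0, and that is precisely what a storage-built index can look up: raw column values, not the result of an expression over them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MergeJoinSketch {
    // Both inputs are sorted ascending on the join key (column 0).
    // Rows are long[] {key, payload}; output rows are {key, leftPayload, rightPayload}.
    static List<long[]> mergeJoin(List<long[]> left, List<long[]> right) {
        List<long[]> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < left.size() && j < right.size()) {
            long lk = left.get(i)[0];   // simple projection of $0 --
            long rk = right.get(j)[0];  // the only key form an external index can seek on
            if (lk < rk) i++;
            else if (lk > rk) j++;
            else {
                out.add(new long[]{lk, left.get(i)[1], right.get(j)[1]});
                i++; j++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<long[]> l = Arrays.asList(new long[]{1, 10}, new long[]{2, 20}, new long[]{4, 40});
        List<long[]> r = Arrays.asList(new long[]{2, 200}, new long[]{3, 300}, new long[]{4, 400});
        for (long[] row : mergeJoin(l, r)) System.out.println(Arrays.toString(row));
        // prints [2, 20, 200] then [4, 40, 400]
    }
}
```

If the key were an expression such as $0 % 10, the inputs could still happen to stay sorted, but an index built by the store function over raw column values could no longer be used to seek to a given expression value, which is the restriction PIG-981 proposes.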
[jira] Commented: (PIG-953) Enable merge join in pig to work with loaders and store functions which can internally index sorted data
[ https://issues.apache.org/jira/browse/PIG-953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12760423#action_12760423 ] Pradeep Kamath commented on PIG-953: Here is a proposal for dealing with sort column information in SortInfo. Rather than giving an ArrayList of column names and a separate ArrayList of asc/desc flags, it would be good to have a unified structure containing both pieces of information per sort column. Also there are use cases for providing column names (zebra) and for them being optional, with column positions provided instead, which some other loader/optimizer might find useful. The type of the column might also be useful if available. Hence, the proposal is to have a SortColumn class with the following attributes: column name, column position (zero-based index), column type, asc/desc flag. Then in SortInfo there would be a List<SortColumn> which would be available through a getter. This should address both the concerns above. Callers will need to explicitly check for null column names and UNKNOWN column type since these two scenarios may occur if schema is not available for pig runtime to provide the information. Thoughts? Enable merge join in pig to work with loaders and store functions which can internally index sorted data - Key: PIG-953 URL: https://issues.apache.org/jira/browse/PIG-953 Project: Pig Issue Type: Improvement Affects Versions: 0.3.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Attachments: PIG-953-2.patch, PIG-953.patch Currently merge join implementation in pig includes construction of an index on sorted data and use of that index to seek into the right input to efficiently perform the join operation. Some loaders (notably the zebra loader) internally implement an index on sorted data and can perform this seek efficiently using their index. So the use of the index needs to be abstracted in such a way that when the loader supports indexing, pig uses it (indirectly through the loader) and does not construct an index.
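The proposed shape can be sketched as plain Java classes. The names follow the wording of the comment above (SortColumn, SortInfo, getter over a List<SortColumn>); the actual committed Pig API may differ in detail, and TYPE_UNKNOWN here is a stand-in for whatever type constant Pig uses when no schema is available.

```java
import java.util.Arrays;
import java.util.List;

public class SortInfoSketch {
    enum Order { ASCENDING, DESCENDING }

    static final byte TYPE_UNKNOWN = 0; // stand-in for an "unknown type" constant

    // One unified record per sort column, as proposed, instead of
    // parallel lists of names and asc/desc flags.
    static class SortColumn {
        final String name;    // may be null when no schema is available
        final int position;   // zero-based column index
        final byte type;      // may be TYPE_UNKNOWN when no schema is available
        final Order order;
        SortColumn(String name, int position, byte type, Order order) {
            this.name = name;
            this.position = position;
            this.type = type;
            this.order = order;
        }
    }

    static class SortInfo {
        private final List<SortColumn> sortColumns;
        SortInfo(List<SortColumn> sortColumns) { this.sortColumns = sortColumns; }
        List<SortColumn> getSortColumns() { return sortColumns; }
    }

    public static void main(String[] args) {
        // A caller (e.g. a loader building an index) must explicitly handle
        // null names and unknown types, as the proposal notes.
        SortColumn first = new SortColumn(null, 0, TYPE_UNKNOWN, Order.ASCENDING);
        SortInfo info = new SortInfo(Arrays.asList(first));
        for (SortColumn c : info.getSortColumns()) {
            String label = (c.name != null) ? c.name : ("$" + c.position);
            System.out.println(label + " " + c.order); // $0 ASCENDING
        }
    }
}
```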
[jira] Updated: (PIG-960) Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage
[ https://issues.apache.org/jira/browse/PIG-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Modi updated PIG-960: --- Attachment: pig_rlr.patch Added a new patch with Apache license and SVN trunk revision 819662 Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage --- Key: PIG-960 URL: https://issues.apache.org/jira/browse/PIG-960 Project: Pig Issue Type: Improvement Components: impl Reporter: Ankit Modi Attachments: pig_rlr.patch PigStorage's reading of Tuples (lines) can be optimized using Hadoop's {{LineRecordReader}}. This can help in the following areas:
- Improving the performance of reading Tuples (lines) in {{PigStorage}}
- Any future improvement in line reading done in Hadoop's {{LineRecordReader}} is automatically carried over to Pig
Issues that are handled by this patch:
- BZip uses internal buffers and positioning for determining the number of bytes read, hence the buffering done by {{LineRecordReader}} has to be turned off
- The current implementation of {{LocalSeekableInputStream}} does not implement the {{available}} method; this method has to be implemented.
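The BZip caveat above comes down to a general property of buffered readers, which a small self-contained demo (not Hadoop's actual LineRecordReader) can show: a buffering layer drains bytes from the underlying stream ahead of the records it has handed back, so a byte count taken from the stream no longer corresponds to the last record returned. A codec that reports progress from its internal positioning therefore needs that extra buffering turned off.

```java
import java.io.*;

public class BufferedPositionDemo {
    // Counts how many bytes have actually been consumed from the wrapped stream.
    static class CountingInputStream extends FilterInputStream {
        long consumed = 0;
        CountingInputStream(InputStream in) { super(in); }
        @Override public int read() throws IOException {
            int b = in.read();
            if (b >= 0) consumed++;
            return b;
        }
        @Override public int read(byte[] b, int off, int len) throws IOException {
            int n = in.read(b, off, len);
            if (n > 0) consumed += n;
            return n;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "one\ntwo\nthree\n".getBytes("UTF-8");
        CountingInputStream counting =
            new CountingInputStream(new ByteArrayInputStream(data));
        BufferedReader reader =
            new BufferedReader(new InputStreamReader(counting, "UTF-8"));
        String first = reader.readLine();
        // Only the first record has been returned, but the buffered layers
        // have already drained all 14 bytes underneath it.
        System.out.println(first + " " + counting.consumed); // one 14
    }
}
```

Split handling that relies on "bytes read so far" would place the reader far past the record it last delivered, which is why the patch disables the reader's own buffering for BZip input.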
[jira] Updated: (PIG-975) Need a databag that does not register with SpillableMemoryManager and spill data pro-actively
[ https://issues.apache.org/jira/browse/PIG-975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-975: --- Status: Open (was: Patch Available) Need a databag that does not register with SpillableMemoryManager and spill data pro-actively - Key: PIG-975 URL: https://issues.apache.org/jira/browse/PIG-975 Project: Pig Issue Type: Improvement Affects Versions: 0.2.0 Reporter: Ying He Assignee: Ying He Fix For: 0.2.0 Attachments: internalbag.xls, PIG-975.patch, PIG-975.patch2, PIG-975.patch3, PIG-975.patch4 POPackage uses DefaultDataBag during the reduce process to hold data. It is registered with SpillableMemoryManager and prone to OutOfMemoryException. It's better to pro-actively manage the usage of the memory. The bag fills memory to a specified amount and dumps the rest to disk. The amount of memory used to hold tuples is configurable. This can avoid out of memory errors.
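The strategy described in the issue can be sketched in a few lines. This toy class is not Pig's InternalCachedBag or DefaultDataBag; it only illustrates the pro-active policy: hold at most a fixed number of tuples in memory and write any overflow straight to disk, rather than registering with a memory manager and spilling reactively when memory runs low.

```java
import java.io.*;
import java.util.ArrayList;
import java.util.List;

public class ProactiveBagSketch {
    private final int memLimit;               // max tuples kept in memory
    private final List<String> inMemory = new ArrayList<>();
    private final File spillFile;
    private PrintWriter spill;
    private int spilled = 0;

    ProactiveBagSketch(int memLimit) throws IOException {
        this.memLimit = memLimit;
        this.spillFile = File.createTempFile("bag", ".spill");
        this.spillFile.deleteOnExit();
    }

    void add(String tuple) throws IOException {
        if (inMemory.size() < memLimit) {
            inMemory.add(tuple);              // under the limit: keep in memory
        } else {
            if (spill == null) spill = new PrintWriter(new FileWriter(spillFile));
            spill.println(tuple);             // overflow goes straight to disk
            spilled++;
        }
    }

    int inMemorySize() { return inMemory.size(); }
    int spilledSize() { return spilled; }

    public static void main(String[] args) throws IOException {
        ProactiveBagSketch bag = new ProactiveBagSketch(2);
        for (int i = 0; i < 5; i++) bag.add("t" + i);
        System.out.println(bag.inMemorySize() + " " + bag.spilledSize()); // 2 3
    }
}
```

Because the memory ceiling is enforced at insert time, the bag can never be asked to give back memory it has already committed, which is what makes the OutOfMemoryException path of the reactive design unreachable.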
[jira] Updated: (PIG-975) Need a databag that does not register with SpillableMemoryManager and spill data pro-actively
[ https://issues.apache.org/jira/browse/PIG-975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-975: --- Fix Version/s: (was: 0.2.0) 0.6.0 Affects Version/s: (was: 0.2.0) 0.4.0 Status: Patch Available (was: Open) Need a databag that does not register with SpillableMemoryManager and spill data pro-actively - Key: PIG-975 URL: https://issues.apache.org/jira/browse/PIG-975
[jira] Updated: (PIG-752) local mode doesn't read bzip2 and gzip compressed data files
[ https://issues.apache.org/jira/browse/PIG-752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-752: --- Fix Version/s: (was: 0.4.0) 0.6.0 local mode doesn't read bzip2 and gzip compressed data files Key: PIG-752 URL: https://issues.apache.org/jira/browse/PIG-752 Project: Pig Issue Type: Bug Affects Versions: 0.4.0 Reporter: David Ciemiewicz Assignee: Jeff Zhang Fix For: 0.6.0 Attachments: Pig_752.Patch Problem 1) use of the .bz2 file extension does not store results bzip2 compressed in local mode (-exectype local). If I use the .bz2 filename extension in a STORE statement on HDFS, the results are stored with bzip2 compression. If I use the .bz2 filename extension in a STORE statement on the local file system, the results are NOT stored with bzip2 compression. compact.bz2.pig:
{code}
A = load 'events.test' using PigStorage();
store A into 'events.test.bz2' using PigStorage();
C = load 'events.test.bz2' using PigStorage();
C = limit C 10;
dump C;
{code}
{code}
-bash-3.00$ pig -exectype local compact.bz2.pig
-bash-3.00$ file events.test
events.test: ASCII English text, with very long lines
-bash-3.00$ file events.test.bz2
events.test.bz2: ASCII English text, with very long lines
-bash-3.00$ cat events.test | bzip2 > events.test.bz2
-bash-3.00$ file events.test.bz2
events.test.bz2: bzip2 compressed data, block size = 900k
{code}
The output format in local mode is definitely not bzip2, but it should be. Problem 2) pig in local mode does not decompress bzip2 compressed files, but should, to be consistent with HDFS. read.bz2.pig:
{code}
A = load 'events.test.bz2' using PigStorage();
A = limit A 10;
dump A;
{code}
The output should be human readable but is instead garbage, indicating no decompression took place during the load:
{code}
-bash-3.00$ pig -exectype local read.bz2.pig
USING: /grid/0/gs/pig/current
2009-04-03 18:26:30,455 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
2009-04-03 18:26:30,456 [main] INFO org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!! (BZh91AYsyoz?u?...@{x_?d?|u-??mK???;??4?C??) ((R? 6?*mg, ?6?Zj?k,???0?QT?d???hY?#mJ?[j???z?m?t?u?K)??K5+??)?m?E7j?X?8a?? ??U?p@@MT?$?B?P??N??=???(z}gk...@c$\??i]?g:?J) a(R?,?u?v???...@?i@??J??!D?)???A?PP?IY??m? (mP(i?4,#F[?I)@?...@??|7^?}U??wwg,?u?$?T???((Q!D?=`*?}hP??_|??=?(??2???m=?xG?(?rC?B?(33??:4?N???t|??T?*??k??NT?x???=?fyv?wf??4z???4t?) (?oou?t???Kwl?3?nCM?WS?;l???P?s?x a???e)B??9? ?44 ((?...@4?) (f) (?...@+?d?0@?U) (Q?SR) -bash-3.00$ {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
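The behavior the report expects -- pick the compression codec from the file extension at store time, in local mode exactly as on HDFS -- can be sketched with the JDK alone. This is not Pig's code; gzip (which ships with the JDK) stands in for bzip2, and the method name openForStore is hypothetical.

```java
import java.io.*;
import java.util.zip.GZIPOutputStream;

public class ExtensionCodecSketch {
    // Choose an output codec from the target file's extension, mirroring
    // what the HDFS store path does and the local path (per PIG-752) does not.
    static OutputStream openForStore(File f) throws IOException {
        OutputStream raw = new FileOutputStream(f);
        if (f.getName().endsWith(".gz")) return new GZIPOutputStream(raw);
        return raw; // a ".bz2" branch would select a bzip2 codec here
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("events", ".gz");
        f.deleteOnExit();
        try (OutputStream out = openForStore(f)) {
            out.write("hello\n".getBytes("UTF-8"));
        }
        // The gzip magic bytes 0x1f 0x8b prove the codec was actually applied,
        // which is what the `file` check in the report verifies for bzip2.
        try (InputStream in = new FileInputStream(f)) {
            System.out.println(in.read() == 0x1f && in.read() == 0x8b); // true
        }
    }
}
```

The symmetric fix for Problem 2 is the same dispatch on load: wrap the input stream in the matching decompressor before handing lines to PigStorage.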
[jira] Resolved: (PIG-660) Integration with Hadoop 0.20
[ https://issues.apache.org/jira/browse/PIG-660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich resolved PIG-660. Resolution: Fixed patch was committed a while back Integration with Hadoop 0.20 Key: PIG-660 URL: https://issues.apache.org/jira/browse/PIG-660 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.4.0 Environment: Hadoop 0.20 Reporter: Santhosh Srinivasan Assignee: Santhosh Srinivasan Fix For: 0.5.0 Attachments: hadoop20.jar.gz, PIG-660-for-branch-0.3.patch, PIG-660.patch, PIG-660_1.patch, PIG-660_2.patch, PIG-660_3.patch, PIG-660_4.patch, PIG-660_5.patch, PIG-660_trunk.patch, PIG-660_trunk_2.patch, pig_660_shims.patch, pig_660_shims_2.patch, pig_660_shims_3.patch With Hadoop 0.20, it will be possible to query the status of each map and reduce in a map reduce job. This will allow better error reporting. Some of the other items that could be on Hadoop's feature requests/bugs are documented here for tracking. 1. Hadoop should return objects instead of strings when exceptions are thrown 2. The JobControl should handle all exceptions and report them appropriately. For example, when the JobControl fails to launch jobs, it should handle exceptions appropriately and should support APIs that query this state, i.e., failure to launch jobs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-592) schema inferred incorrectly
[ https://issues.apache.org/jira/browse/PIG-592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-592: --- Fix Version/s: (was: 0.5.0) 0.6.0 schema inferred incorrectly --- Key: PIG-592 URL: https://issues.apache.org/jira/browse/PIG-592 Project: Pig Issue Type: Bug Affects Versions: 0.4.0 Reporter: Christopher Olston Fix For: 0.6.0 Attachments: PIG-592-1.patch A simple pig script, that never introduces any schema information: A = load 'foo'; B = foreach (group A by $8) generate group, COUNT($1); C = load 'bar'; // ('bar' has two columns) D = join B by $0, C by $0; E = foreach D generate $0, $1, $3; Fails, complaining that $3 does not exist: java.io.IOException: Out of bound access. Trying to access non-existent column: 3. Schema {B::group: bytearray,long,bytearray} has 3 column(s). Apparently Pig gets confused, and thinks it knows the schema for C (a single bytearray column). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-956) Reduce patch testing time
[ https://issues.apache.org/jira/browse/PIG-956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-956: --- Attachment: PIG-956.patch Reduce patch testing time - Key: PIG-956 URL: https://issues.apache.org/jira/browse/PIG-956 Project: Pig Issue Type: Improvement Affects Versions: 0.4.0 Reporter: Olga Natkovich Assignee: Olga Natkovich Fix For: 0.6.0 Attachments: PIG-956.patch The proposal is to split the tests into 2 groups: (1) Ten-minute tests - a set of tests that run with every patch submission and take approximately 10 minutes (2) All tests - these include all tests and will run nightly This is similar to work done in Hadoop: http://issues.apache.org/jira/browse/HDFS-458
[jira] Updated: (PIG-956) Reduce patch testing time
[ https://issues.apache.org/jira/browse/PIG-956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-956: --- Status: Patch Available (was: Open) Reduce patch testing time - Key: PIG-956 URL: https://issues.apache.org/jira/browse/PIG-956
[jira] Commented: (PIG-956) Reduce patch testing time
[ https://issues.apache.org/jira/browse/PIG-956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12760454#action_12760454 ] Olga Natkovich commented on PIG-956: The test-commit target runs 7-8 minutes and has coverage of 53% (compared to 70% for the entire set of tests). Reduce patch testing time - Key: PIG-956 URL: https://issues.apache.org/jira/browse/PIG-956
[jira] Commented: (PIG-960) Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage
[ https://issues.apache.org/jira/browse/PIG-960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12760484#action_12760484 ] Hadoop QA commented on PIG-960: --- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12420748/pig_rlr.patch against trunk revision 819691. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The applied patch generated 406 javac compiler warnings (more than the trunk's current 403 warnings). +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/48/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/48/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/48/console This message is automatically generated. Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage --- Key: PIG-960 URL: https://issues.apache.org/jira/browse/PIG-960
[jira] Commented: (PIG-975) Need a databag that does not register with SpillableMemoryManager and spill data pro-actively
[ https://issues.apache.org/jira/browse/PIG-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12760485#action_12760485 ] Hadoop QA commented on PIG-975: --- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12420603/PIG-975.patch4 against trunk revision 819691. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The applied patch generated 406 javac compiler warnings (more than the trunk's current 403 warnings). -1 findbugs. The patch appears to introduce 1 new Findbugs warning. -1 release audit. The applied patch generated 278 release audit warnings (more than the trunk's current 277 warnings). +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/10/testReport/ Release audit warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/10/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/10/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/10/console This message is automatically generated. Need a databag that does not register with SpillableMemoryManager and spill data pro-actively - Key: PIG-975 URL: https://issues.apache.org/jira/browse/PIG-975