[jira] Commented: (PIG-1330) Move pruned schema tracking logic from LoadFunc to core code
[ https://issues.apache.org/jira/browse/PIG-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852269#action_12852269 ] Hadoop QA commented on PIG-1330: -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12440308/PIG-1330-1.patch against trunk revision 929737. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no tests are needed for this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/274/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/274/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/274/console This message is automatically generated. Move pruned schema tracking logic from LoadFunc to core code Key: PIG-1330 URL: https://issues.apache.org/jira/browse/PIG-1330 Project: Pig Issue Type: Improvement Components: impl Affects Versions: 0.7.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.7.0 Attachments: PIG-1330-1.patch Currently, LoadFunc.getSchema requires a schema after column pruning. The good side of this is that LoadFunc.getSchema matches the data it actually loads. This gives a sense of consistency. However, by doing this, every LoadFunc needs to keep track of the columns pruned.
This is an unnecessary burden on the LoadFunc writer, and it is very error-prone. This issue is to move this logic from LoadFunc to Pig core. LoadFunc.getSchema then only needs to return the original schema, even after pruning. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
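For illustration only, the division of labor described in PIG-1330 might look roughly like the sketch below: the loader reports its full, original schema, and core code derives the pruned view from the list of required column indexes. The class name PrunedSchemaTracker and the use of plain field-name strings are assumptions made for this example; they are not Pig's schema classes and not the contents of PIG-1330-1.patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; these are not Pig's actual classes.
final class PrunedSchemaTracker {
    private final List<String> originalSchema;

    PrunedSchemaTracker(List<String> originalSchema) {
        // Defensive copy of the schema the loader reported before any pruning.
        this.originalSchema = new ArrayList<>(originalSchema);
    }

    // Core code, not the loader, maps required column indexes onto the original schema.
    List<String> prune(int[] requiredColumns) {
        List<String> pruned = new ArrayList<>();
        for (int index : requiredColumns) {
            pruned.add(originalSchema.get(index));
        }
        return pruned;
    }
}
```

With this split, a loader written against such an arrangement would never need to remember which columns were pruned; it would only ever return the original schema.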
[jira] Commented: (PIG-1341) BinStorage cannot convert DataByteArray to Chararray and results in FIELD_DISCARDED_TYPE_CONVERSION_FAILED
[ https://issues.apache.org/jira/browse/PIG-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852308#action_12852308 ] Hadoop QA commented on PIG-1341: +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12440415/PIG-1341.patch against trunk revision 929737. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/266/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/266/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/266/console This message is automatically generated. BinStorage cannot convert DataByteArray to Chararray and results in FIELD_DISCARDED_TYPE_CONVERSION_FAILED -- Key: PIG-1341 URL: https://issues.apache.org/jira/browse/PIG-1341 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Viraj Bhat Assignee: Richard Ding Fix For: 0.7.0 Attachments: PIG-1341.patch The script reads in BinStorage data and tries to convert a column that is a DataByteArray to Chararray.
{code}
raw = load 'sampledata' using BinStorage() as (col1,col2, col3);
--filter out null columns
A = filter raw by col1#'bcookie' is not null;
B = foreach A generate col1#'bcookie' as reqcolumn;
describe B;
--B: {reqcolumn: bytearray}
X = limit B 5;
dump X;
B = foreach A generate (chararray)col1#'bcookie' as convertedcol;
describe B;
--B: {convertedcol: chararray}
X = limit B 5;
dump X;
{code}
The first dump produces:
(36co9b55onr8s)
(36co9b55onr8s)
(36hilul5oo1q1)
(36hilul5oo1q1)
(36l4cj15ooa8a)
The second dump produces:
()
()
()
()
()
It also throws an error message: FIELD_DISCARDED_TYPE_CONVERSION_FAILED 5 time(s). Viraj
[jira] Commented: (PIG-1337) Need a way to pass distributed cache configuration information to hadoop backend in Pig's LoadFunc
[ https://issues.apache.org/jira/browse/PIG-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852445#action_12852445 ] Chao Wang commented on PIG-1337: It's OK for us not to use getSchema() for this purpose, since it's a pure getter method. What we need is simply a setter method in LoadFunc through which we can set up the distributed cache. Pig needs to ensure that this information is indeed in the job configuration variable that's being passed to the hadoop backend. Also, this setter method should be invoked only at Pig's frontend. In the case of one m/r job containing multiple LoadFunc instances, Pig may need to combine distributed cache configuration information from all instances. Also, we note that using the UDFContext to convey information from frontend to backend does not work for this. We need the job configuration variable to already contain all the distributed cache related information when it's passed to the hadoop backend. Need a way to pass distributed cache configuration information to hadoop backend in Pig's LoadFunc -- Key: PIG-1337 URL: https://issues.apache.org/jira/browse/PIG-1337 Project: Pig Issue Type: Improvement Affects Versions: 0.6.0 Reporter: Chao Wang Fix For: 0.8.0 The Zebra storage layer needs to use the distributed cache to reduce name node load during job runs. To do this, Zebra needs to set up distributed cache related configuration information in TableLoader (which extends Pig's LoadFunc). It is doing this within getSchema(conf). The problem is that the conf object here is not the one that is being serialized to the map/reduce backend. As such, the distributed cache is not set up properly. To work around this problem, we need Pig to provide a way in its LoadFunc to set up distributed cache information in a conf object, and this conf object must be the one used by the map/reduce backend.
[jira] Updated: (PIG-1313) PigServer leaks memory over time
[ https://issues.apache.org/jira/browse/PIG-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bill Graham updated PIG-1313: - Attachment: PIG-1313-4.patch # Attaching PIG-1313-4.patch with additional Javadocs on PigServer and PigServer.shutdown(). # triggerDeleteOnFail is still called by TestMultQueryLocal.executePlan, but you are correct in that no one is (or was) calling registerDeleteOnFail, which is the only entry point to push something onto the deleteOnFail stack. I will gladly remove deleteOnFail and all calls to it as part of this JIRA, or we can handle it in another one if that's cleaner w.r.t. issue tracking. Let me know. PigServer leaks memory over time Key: PIG-1313 URL: https://issues.apache.org/jira/browse/PIG-1313 Project: Pig Issue Type: Bug Reporter: Bill Graham Assignee: Bill Graham Attachments: PIG-1313-0.4.0-1.patch, PIG-1313-1.patch, PIG-1313-1.patch, PIG-1313-2.patch, PIG-1313-3.patch, PIG-1313-4.patch, Pig1313Reproducer.java When {{PigServer}} runs it creates temporary files using the {{FileLocalizer.getTemporaryPath(..)}}. This static method creates and returns a handle to a temporary file (as an instance of {{ElementDescriptor}}). The {{ElementDescriptors}} returned by this method are kept on a static {{Stack}} named {{toDelete}}. The items on {{toDelete}} get removed by the {{FileLocalizer.deleteTempFile()}} method. The only place in the code where I see {{FileLocalizer.deleteTempFile()}} called is in the Main class. {{PigServer}} does not call that method though, so a long-running VM that repeatedly uses instances of {{PigServer}} to run jobs will leak memory via {{toDelete}}. One suggested fix is to have {{PigServer.shutdown()}} call {{FileLocalizer.deleteTempFile()}}, but this would cause problems in a multi-threaded environment, since it seems {{ElementDescriptors}} are pushed onto the {{toDelete}} stack before they're used, not once they're done with. 
With this approach, running multiple instances of {{PigServer}} in separate threads could cause one completed job to clobber the other's still-in-use temp files. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
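One way to picture an alternative to the shared static {{toDelete}} stack is sketched below: each owner keeps its own list of temporary paths and releases only those on shutdown, so a finished job cannot touch another job's files and nothing accumulates for the life of the VM. This is a hedged illustration with hypothetical names (TempFileTracker, register, pending); it is not the code in the attached PIG-1313 patches, and it tracks plain strings rather than {{ElementDescriptor}} instances.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch, not the code in the attached patches.
final class TempFileTracker {
    // Per-instance rather than static, so one job cannot release another job's files.
    private final Deque<String> toDelete = new ArrayDeque<>();

    synchronized String register(String path) {
        toDelete.push(path);
        return path;
    }

    // Called from the owner's shutdown; returns how many entries were released.
    synchronized int shutdown() {
        int released = 0;
        while (!toDelete.isEmpty()) {
            toDelete.pop(); // a real implementation would also delete the file here
            released++;
        }
        return released;
    }

    synchronized int pending() {
        return toDelete.size();
    }
}
```

Two trackers used from separate threads would each drain only their own entries, which is the property the multi-threaded concern in the description is about.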
[jira] Updated: (PIG-1313) PigServer leaks memory over time
[ https://issues.apache.org/jira/browse/PIG-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1313: Component/s: impl Affects Version/s: 0.7.0 Fix Version/s: 0.8.0 PigServer leaks memory over time Key: PIG-1313 URL: https://issues.apache.org/jira/browse/PIG-1313 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Bill Graham Assignee: Bill Graham Fix For: 0.8.0 Attachments: PIG-1313-0.4.0-1.patch, PIG-1313-1.patch, PIG-1313-1.patch, PIG-1313-2.patch, PIG-1313-3.patch, PIG-1313-4.patch, Pig1313Reproducer.java
[jira] Commented: (PIG-1313) PigServer leaks memory over time
[ https://issues.apache.org/jira/browse/PIG-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852463#action_12852463 ] Daniel Dai commented on PIG-1313: - Thanks, Bill. Let's leave triggerDeleteOnFail. This is the thing we want to fix. I've opened another Jira [PIG-1347|https://issues.apache.org/jira/browse/PIG-1347] for that. This patch is good to go and I will commit it shortly. PigServer leaks memory over time Key: PIG-1313 URL: https://issues.apache.org/jira/browse/PIG-1313 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Bill Graham Assignee: Bill Graham Fix For: 0.8.0 Attachments: PIG-1313-0.4.0-1.patch, PIG-1313-1.patch, PIG-1313-1.patch, PIG-1313-2.patch, PIG-1313-3.patch, PIG-1313-4.patch, Pig1313Reproducer.java
[jira] Assigned: (PIG-1333) API interface to Pig
[ https://issues.apache.org/jira/browse/PIG-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding reassigned PIG-1333: - Assignee: Richard Ding API interface to Pig Key: PIG-1333 URL: https://issues.apache.org/jira/browse/PIG-1333 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Richard Ding Fix For: 0.8.0 It would be nice to make Pig more friendly for applications, such as workflow systems, that execute pig scripts on the user's behalf. Currently, they would have to use the pig command line to execute the code; however, this limits the kind of output that can be delivered. For instance, it is hard to produce error information that is easy to use programmatically, or to collect statistics. The proposal is to create a class that mimics the behavior of Main but gives users a status object back. The main code of pig would look something like:
    public static void main(String[] args) {
        PigStatus ps = PigMain.exec(args);
        System.exit(ps.rc);
    }
We need to define the following:
- Content of PigStatus. It should at least include
  * return code
  * error string
  * exception
  * statistics
- A way to propagate the status class through pig code
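As a rough sketch of what the status object listed in PIG-1333 could contain, the class below carries the four items named in the proposal: return code, error string, exception, and statistics. Everything here is an assumption for illustration: the class name PigStatusSketch, the field names, the factory methods, and the choice of a string-to-long map for statistics are not taken from any patch or from Pig itself.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical shape for the status object; all names are assumptions.
final class PigStatusSketch {
    final int returnCode;
    final String errorMessage;          // null on success
    final Throwable exception;          // null on success
    final Map<String, Long> statistics; // read-only copy of whatever counters were collected

    PigStatusSketch(int returnCode, String errorMessage, Throwable exception,
                    Map<String, Long> statistics) {
        this.returnCode = returnCode;
        this.errorMessage = errorMessage;
        this.exception = exception;
        this.statistics = Collections.unmodifiableMap(new LinkedHashMap<>(statistics));
    }

    static PigStatusSketch success(Map<String, Long> statistics) {
        return new PigStatusSketch(0, null, null, statistics);
    }

    static PigStatusSketch failure(int returnCode, Throwable cause) {
        return new PigStatusSketch(returnCode, cause.getMessage(), cause,
                Collections.<String, Long>emptyMap());
    }

    boolean succeeded() {
        return returnCode == 0;
    }
}
```

A calling application could then branch on succeeded() and read errorMessage or statistics directly instead of parsing command-line output.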
[jira] Updated: (PIG-1338) Pig should exclude hadoop conf in local mode
[ https://issues.apache.org/jira/browse/PIG-1338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1338: Attachment: PIG-1338-5.patch Pig should exclude hadoop conf in local mode Key: PIG-1338 URL: https://issues.apache.org/jira/browse/PIG-1338 Project: Pig Issue Type: Improvement Components: impl Affects Versions: 0.7.0 Reporter: Daniel Dai Assignee: Daniel Dai Attachments: PIG-1338-1.patch, PIG-1338-2.patch, PIG-1338-3.patch, PIG-1338-4.patch, PIG-1338-5.patch
Currently, the behavior for hadoop conf lookup is:
* in local mode, if there is a hadoop conf, bail out; if there is no hadoop conf, launch local mode
* in hadoop mode, if there is a hadoop conf, use this conf to launch Pig; if not, still launch without warning, but much functionality will go wrong
We should make this more intuitive:
* in local mode, always launch Pig in local mode
* in hadoop mode, if there is a hadoop conf, use this conf to launch Pig; if not, bail out with a meaningful message
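The proposed rules for PIG-1338 reduce to a small decision table, sketched below. The enum and method names (ConfLookupSketch, Mode, Action, decide) are hypothetical and exist only for this example; the sketch does not reflect how Pig's launcher code is actually organized or what the attached patches do.

```java
// Hypothetical sketch of the proposed lookup rules; not Pig's actual launcher code.
final class ConfLookupSketch {
    enum Mode { LOCAL, HADOOP }

    enum Action { LAUNCH_LOCAL, LAUNCH_WITH_HADOOP_CONF, FAIL_WITH_MESSAGE }

    static Action decide(Mode mode, boolean hadoopConfPresent) {
        if (mode == Mode.LOCAL) {
            // Proposed behavior: a hadoop conf, if present, is simply ignored.
            return Action.LAUNCH_LOCAL;
        }
        // Hadoop mode: use the conf if found, otherwise stop with a clear message.
        return hadoopConfPresent ? Action.LAUNCH_WITH_HADOOP_CONF : Action.FAIL_WITH_MESSAGE;
    }
}
```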
[jira] Updated: (PIG-1338) Pig should exclude hadoop conf in local mode
[ https://issues.apache.org/jira/browse/PIG-1338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1338: Status: Patch Available (was: Open) Pig should exclude hadoop conf in local mode Key: PIG-1338 URL: https://issues.apache.org/jira/browse/PIG-1338 Project: Pig Issue Type: Improvement Components: impl Affects Versions: 0.7.0 Reporter: Daniel Dai Assignee: Daniel Dai Attachments: PIG-1338-1.patch, PIG-1338-2.patch, PIG-1338-3.patch, PIG-1338-4.patch, PIG-1338-5.patch
[jira] Updated: (PIG-1338) Pig should exclude hadoop conf in local mode
[ https://issues.apache.org/jira/browse/PIG-1338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1338: Status: Open (was: Patch Available) Pig should exclude hadoop conf in local mode Key: PIG-1338 URL: https://issues.apache.org/jira/browse/PIG-1338 Project: Pig Issue Type: Improvement Components: impl Affects Versions: 0.7.0 Reporter: Daniel Dai Assignee: Daniel Dai Attachments: PIG-1338-1.patch, PIG-1338-2.patch, PIG-1338-3.patch, PIG-1338-4.patch, PIG-1338-5.patch
[jira] Commented: (PIG-1337) Need a way to pass distributed cache configuration information to hadoop backend in Pig's LoadFunc
[ https://issues.apache.org/jira/browse/PIG-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852485#action_12852485 ] Pradeep Kamath commented on PIG-1337: - We may need to add a new method, addToDistributedCache(), on LoadFunc. Notice this is an adder, not a setter, since there is only one key for the distributed cache in hadoop's Job (the Configuration in the Job). So implementations of LoadFunc will have to use the DistributedCache.add*() methods. Need a way to pass distributed cache configuration information to hadoop backend in Pig's LoadFunc -- Key: PIG-1337 URL: https://issues.apache.org/jira/browse/PIG-1337 Project: Pig Issue Type: Improvement Affects Versions: 0.6.0 Reporter: Chao Wang Fix For: 0.8.0
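The adder-versus-setter point can be illustrated with a plain map standing in for the job configuration. In this hedged sketch, several loaders contribute cache entries under one shared key, and each call appends rather than overwrites. The class CacheConfSketch, the key name "example.cache.files", and the comma-separated encoding are all assumptions for the example; the sketch does not call Hadoop's DistributedCache API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a job configuration; the key name is an assumption.
final class CacheConfSketch {
    static final String CACHE_KEY = "example.cache.files";

    private final Map<String, String> conf = new HashMap<>();

    // "Adder" semantics: append to the single shared key instead of overwriting it.
    void addCacheFile(String uri) {
        String existing = conf.get(CACHE_KEY);
        conf.put(CACHE_KEY, existing == null ? uri : existing + "," + uri);
    }

    String get(String key) {
        return conf.get(key);
    }
}
```

With a setter, the second loader in the same m/r job would silently replace the first loader's entry; with an adder, both entries survive in the one configuration that reaches the backend.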
[jira] Updated: (PIG-1313) PigServer leaks memory over time
[ https://issues.apache.org/jira/browse/PIG-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bill Graham updated PIG-1313: - Attachment: PIG-1313-0.4.0-4.patch Here's the same patch for 0.4.0 if anyone wants it. PigServer leaks memory over time Key: PIG-1313 URL: https://issues.apache.org/jira/browse/PIG-1313 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Bill Graham Assignee: Bill Graham Fix For: 0.8.0 Attachments: PIG-1313-0.4.0-1.patch, PIG-1313-0.4.0-4.patch, PIG-1313-1.patch, PIG-1313-1.patch, PIG-1313-2.patch, PIG-1313-3.patch, PIG-1313-4.patch, Pig1313Reproducer.java
[jira] Commented: (PIG-1346) In unit tests Util.executeShellCommand relies on java commands being in the path and does not consider JAVA_HOME
[ https://issues.apache.org/jira/browse/PIG-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852492#action_12852492 ] Daniel Dai commented on PIG-1346: - +1, patch looks good. In unit tests Util.executeShellCommand relies on java commands being in the path and does not consider JAVA_HOME Key: PIG-1346 URL: https://issues.apache.org/jira/browse/PIG-1346 Project: Pig Issue Type: Bug Affects Versions: 0.6.0, 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Attachments: PIG-1346.patch Util.executeShellCommand is currently used in unit tests to execute java related binaries like java, javac, jar - this method should check if JAVA_HOME is set and use $JAVA_HOME/bin/java etc. If JAVA_HOME is not set, the method can try and execute the command as-is.
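The lookup described for PIG-1346 could be sketched as below: prefer the binary under JAVA_HOME's bin directory when JAVA_HOME is set, and otherwise return the command unchanged so it is resolved from the PATH. The helper name JavaCommandResolver and its signature are assumptions for illustration and are not taken from PIG-1346.patch; the JAVA_HOME value is passed in as a parameter so the logic can be exercised without touching the environment.

```java
import java.io.File;

// Hypothetical helper; the name and signature are assumptions, not the patch's code.
final class JavaCommandResolver {
    // javaHome is passed in (for example System.getenv("JAVA_HOME")) to keep this testable.
    static String resolve(String command, String javaHome) {
        if (javaHome == null || javaHome.isEmpty()) {
            return command; // fall back to whatever is on the PATH
        }
        return javaHome + File.separator + "bin" + File.separator + command;
    }
}
```

A caller would typically write JavaCommandResolver.resolve("jar", System.getenv("JAVA_HOME")) before handing the command to the shell.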
[jira] Created: (PIG-1348) InternalCachedBag running out of memory
InternalCachedBag running out of memory --- Key: PIG-1348 URL: https://issues.apache.org/jira/browse/PIG-1348 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Ashutosh Chauhan Assignee: Richard Ding InternalCachedBag estimates the memory available to the VM by using Runtime.getRuntime().maxMemory(). It then takes 10% of this memory (the default, though configurable) and divides it among the number of bags. It keeps track of the memory used by the bags and proactively spills if their memory usage gets close to these limits. Given all this, in theory InternalCachedBag should not run out of memory when presented with more data than it can handle. But in practice we find OOM happening.
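The budget arithmetic described above can be written out as a small sketch: take a fraction of the VM's maximum memory and split it evenly across the bags. The class and method names (BagMemoryBudget, perBagLimit) are hypothetical, and the even split is an assumption drawn from the description; this is not InternalCachedBag's actual code.

```java
// Hypothetical arithmetic sketch; names are assumptions, not InternalCachedBag's code.
final class BagMemoryBudget {
    // fraction is the share of max memory reserved for cached bags (0.1 for the 10% default).
    static long perBagLimit(long maxMemoryBytes, double fraction, int numberOfBags) {
        if (numberOfBags <= 0) {
            throw new IllegalArgumentException("numberOfBags must be positive");
        }
        long totalBudget = (long) (maxMemoryBytes * fraction);
        return totalBudget / numberOfBags;
    }
}
```

A call such as BagMemoryBudget.perBagLimit(Runtime.getRuntime().maxMemory(), 0.1, bags) gives the threshold at which a single bag would be expected to start spilling. Note that such a limit only bounds what the bags themselves track; memory held elsewhere in the VM is outside this estimate.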
[jira] Commented: (PIG-1348) InternalCachedBag running out of memory
[ https://issues.apache.org/jira/browse/PIG-1348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852511#action_12852511 ] Ashutosh Chauhan commented on PIG-1348: --- To reproduce, cogroup page_views (from PigMix's dataset) with page_views on user, and this exception should occur. Apart from making InternalCachedBag more robust, the important thing to figure out here is where the other 90% of available memory is being used. Also, a related fix went in recently, PIG-1307; this might be related to that issue. InternalCachedBag running out of memory --- Key: PIG-1348 URL: https://issues.apache.org/jira/browse/PIG-1348 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Ashutosh Chauhan Assignee: Richard Ding
[jira] Commented: (PIG-1341) BinStorage cannot convert DataByteArray to Chararray and results in FIELD_DISCARDED_TYPE_CONVERSION_FAILED
[ https://issues.apache.org/jira/browse/PIG-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852515#action_12852515 ] Daniel Dai commented on PIG-1341: - After a discussion with Alan and Richard, we felt that a caster for BinStorage does not make sense. We don't know how to cast the bytearray datatype for BinStorage. In the intermediate storage case, we will find the original loader and use the lineage for that loader to convert the bytearray. But if the user uses BinStorage directly, we have no idea what the bytearray means. So the suggestion is that we don't give a caster to BinStorage. The implication is that if the user wants to use BinStorage as a temporary store, in some cases it will fail. Here is a sample script which will be broken if we make this change:
script 1:
{code}
a = load '1.txt';
b = order a by $0;
store b into 'temp.out' using BinStorage(); -- store in BinStorage format with the datatype bytearray
{code}
script 2:
{code}
a = load 'temp.out' using BinStorage();
b = foreach a generate $0+$1; -- here we will need a caster, but BinStorage does not have it, we will fail
{code}
BinStorage cannot convert DataByteArray to Chararray and results in FIELD_DISCARDED_TYPE_CONVERSION_FAILED -- Key: PIG-1341 URL: https://issues.apache.org/jira/browse/PIG-1341 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Viraj Bhat Assignee: Richard Ding Fix For: 0.7.0 Attachments: PIG-1341.patch
[jira] Commented: (PIG-1341) BinStorage cannot convert DataByteArray to Chararray and results in FIELD_DISCARDED_TYPE_CONVERSION_FAILED
[ https://issues.apache.org/jira/browse/PIG-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852521#action_12852521 ] Daniel Dai commented on PIG-1341: - Also, if we do that, BinStorage is not a reliable way to dump data and load it without having to explicitly list all the fields and figure out their parts (though it already is not). I think we should provide some way to support this use case. BinStorage cannot convert DataByteArray to Chararray and results in FIELD_DISCARDED_TYPE_CONVERSION_FAILED -- Key: PIG-1341 URL: https://issues.apache.org/jira/browse/PIG-1341 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Viraj Bhat Assignee: Richard Ding Fix For: 0.7.0 Attachments: PIG-1341.patch
[jira] Commented: (PIG-1341) BinStorage cannot convert DataByteArray to Chararray and results in FIELD_DISCARDED_TYPE_CONVERSION_FAILED
[ https://issues.apache.org/jira/browse/PIG-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852565#action_12852565 ] Daniel Dai commented on PIG-1341: - I am thinking about encoding lineage info into the BinStorage file header, so even after we dump and load, we still maintain the lineage information. BinStorage cannot convert DataByteArray to Chararray and results in FIELD_DISCARDED_TYPE_CONVERSION_FAILED -- Key: PIG-1341 URL: https://issues.apache.org/jira/browse/PIG-1341 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Viraj Bhat Assignee: Richard Ding Fix For: 0.7.0 Attachments: PIG-1341.patch
[jira] Created: (PIG-1349) [Zebra] Hubson test failure in test case TestBasicUnion
[Zebra] Hubson test failure in test case TestBasicUnion --- Key: PIG-1349 URL: https://issues.apache.org/jira/browse/PIG-1349 Project: Pig Issue Type: Test Reporter: Xuefu Zhang Fix For: 0.7.0
junit.framework.AssertionFailedError: expected:0_01 but was:0_00
at org.apache.hadoop.zebra.pig.TestBasicUnion.__CLR2_5_168gq2gqpe(TestBasicUnion.java:690)
at org.apache.hadoop.zebra.pig.TestBasicUnion.testReader6(TestBasicUnion.java:672)
[jira] Updated: (PIG-1309) Map-side Cogroup
[ https://issues.apache.org/jira/browse/PIG-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated PIG-1309:
----------------------------------

Attachment: pig-1309_2.patch

Updated the patch to fix test failures and javac warnings and to add more comments.

Result of test-patch on latest patch:
{noformat}
[exec] +1 overall.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] +1 tests included. The patch appears to include 9 new or modified tests.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec]
[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
[exec]
[exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
[exec]
{noformat}

Result of test-commit:
{noformat}
test-commit:
[mkdir] Created dir: /homes/chauhana/scratch/latest/build/test/logs
[junit] Running org.apache.pig.test.TestAdd
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.036 sec
:
:
[junit] Running org.apache.pig.test.TestTypeCheckingValidatorNoSchema
[junit] Tests run: 13, Failures: 0, Errors: 0, Time elapsed: 0.165 sec
BUILD SUCCESSFUL
{noformat}

Patch checked in to trunk.

Map-side Cogroup
----------------

Key: PIG-1309
URL: https://issues.apache.org/jira/browse/PIG-1309
Project: Pig
Issue Type: Bug
Components: impl
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
Attachments: mapsideCogrp.patch, pig-1309_1.patch, pig-1309_2.patch

In our never-ending quest to make Pig go faster, we want to parallelize as many relational operations as possible. It's already possible to do Group-by (PIG-984) and Joins (PIG-845, PIG-554) purely map-side in Pig. This jira is to add a map-side implementation of Cogroup in Pig. Details to follow.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1349) [Zebra] Hubson test failure in test case TestBasicUnion
[ https://issues.apache.org/jira/browse/PIG-1349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated PIG-1349: - Attachment: owl.0401 [Zebra] Hubson test failure in test case TestBasicUnion --- Key: PIG-1349 URL: https://issues.apache.org/jira/browse/PIG-1349 Project: Pig Issue Type: Test Reporter: Xuefu Zhang Fix For: 0.7.0 Attachments: owl.0401 junit.framework.AssertionFailedError: expected:0_01 but was:0_00 at org.apache.hadoop.zebra.pig.TestBasicUnion.__CLR2_5_168gq2gqpe(TestBasicUnion.java:690) at org.apache.hadoop.zebra.pig.TestBasicUnion.testReader6(TestBasicUnion.java:672) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-1349) [Zebra] Hubson test failure in test case TestBasicUnion
[ https://issues.apache.org/jira/browse/PIG-1349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang reassigned PIG-1349: Assignee: Xuefu Zhang [Zebra] Hubson test failure in test case TestBasicUnion --- Key: PIG-1349 URL: https://issues.apache.org/jira/browse/PIG-1349 Project: Pig Issue Type: Test Reporter: Xuefu Zhang Assignee: Xuefu Zhang Fix For: 0.7.0 Attachments: owl.0401 junit.framework.AssertionFailedError: expected:0_01 but was:0_00 at org.apache.hadoop.zebra.pig.TestBasicUnion.__CLR2_5_168gq2gqpe(TestBasicUnion.java:690) at org.apache.hadoop.zebra.pig.TestBasicUnion.testReader6(TestBasicUnion.java:672) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1349) [Zebra] Hubson test failure in test case TestBasicUnion
[ https://issues.apache.org/jira/browse/PIG-1349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated PIG-1349: - Status: Patch Available (was: Open) [Zebra] Hubson test failure in test case TestBasicUnion --- Key: PIG-1349 URL: https://issues.apache.org/jira/browse/PIG-1349 Project: Pig Issue Type: Test Reporter: Xuefu Zhang Fix For: 0.7.0 Attachments: owl.0401 junit.framework.AssertionFailedError: expected:0_01 but was:0_00 at org.apache.hadoop.zebra.pig.TestBasicUnion.__CLR2_5_168gq2gqpe(TestBasicUnion.java:690) at org.apache.hadoop.zebra.pig.TestBasicUnion.testReader6(TestBasicUnion.java:672) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1349) [Zebra] Hubson test failure in test case TestBasicUnion
[ https://issues.apache.org/jira/browse/PIG-1349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12852570#action_12852570 ] Hadoop QA commented on PIG-1349: -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12440544/owl.0401 against trunk revision 930108. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 9 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/276/console This message is automatically generated. [Zebra] Hubson test failure in test case TestBasicUnion --- Key: PIG-1349 URL: https://issues.apache.org/jira/browse/PIG-1349 Project: Pig Issue Type: Test Reporter: Xuefu Zhang Assignee: Xuefu Zhang Fix For: 0.7.0 Attachments: owl.0401 junit.framework.AssertionFailedError: expected:0_01 but was:0_00 at org.apache.hadoop.zebra.pig.TestBasicUnion.__CLR2_5_168gq2gqpe(TestBasicUnion.java:690) at org.apache.hadoop.zebra.pig.TestBasicUnion.testReader6(TestBasicUnion.java:672) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1349) [Zebra] Hubson test failure in test case TestBasicUnion
[ https://issues.apache.org/jira/browse/PIG-1349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated PIG-1349: - Attachment: zebra.0401 [Zebra] Hubson test failure in test case TestBasicUnion --- Key: PIG-1349 URL: https://issues.apache.org/jira/browse/PIG-1349 Project: Pig Issue Type: Test Reporter: Xuefu Zhang Assignee: Xuefu Zhang Fix For: 0.7.0 Attachments: zebra.0401 junit.framework.AssertionFailedError: expected:0_01 but was:0_00 at org.apache.hadoop.zebra.pig.TestBasicUnion.__CLR2_5_168gq2gqpe(TestBasicUnion.java:690) at org.apache.hadoop.zebra.pig.TestBasicUnion.testReader6(TestBasicUnion.java:672) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1349) [Zebra] Hubson test failure in test case TestBasicUnion
[ https://issues.apache.org/jira/browse/PIG-1349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated PIG-1349: - Attachment: (was: owl.0401) [Zebra] Hubson test failure in test case TestBasicUnion --- Key: PIG-1349 URL: https://issues.apache.org/jira/browse/PIG-1349 Project: Pig Issue Type: Test Reporter: Xuefu Zhang Assignee: Xuefu Zhang Fix For: 0.7.0 Attachments: zebra.0401 junit.framework.AssertionFailedError: expected:0_01 but was:0_00 at org.apache.hadoop.zebra.pig.TestBasicUnion.__CLR2_5_168gq2gqpe(TestBasicUnion.java:690) at org.apache.hadoop.zebra.pig.TestBasicUnion.testReader6(TestBasicUnion.java:672) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1338) Pig should exclude hadoop conf in local mode
[ https://issues.apache.org/jira/browse/PIG-1338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12852584#action_12852584 ] Hadoop QA commented on PIG-1338: +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12440527/PIG-1338-5.patch against trunk revision 929737. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 79 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/267/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/267/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/267/console This message is automatically generated. 
Pig should exclude hadoop conf in local mode
--------------------------------------------

Key: PIG-1338
URL: https://issues.apache.org/jira/browse/PIG-1338
Project: Pig
Issue Type: Improvement
Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Daniel Dai
Attachments: PIG-1338-1.patch, PIG-1338-2.patch, PIG-1338-3.patch, PIG-1338-4.patch, PIG-1338-5.patch

Currently, the behavior for hadoop conf lookup is:
* in local mode, if there is a hadoop conf, bail out; if there is no hadoop conf, launch local mode
* in hadoop mode, if there is a hadoop conf, use this conf to launch Pig; if not, still launch without warning, but much functionality will go wrong

We should make this more intuitive:
* in local mode, always launch Pig in local mode
* in hadoop mode, if there is a hadoop conf, use this conf to launch Pig; if not, bail out with a meaningful message

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
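The proposed lookup rules for PIG-1338 can be sketched as a small decision function. This is only an illustration of the behavior described in the issue, not the code in the attached patches: the class `ConfLookupSketch`, the `ExecMode` enum, the `resolve` method and the returned strings are all hypothetical names.

```java
import java.util.Optional;

/** Illustrative only: every name here is hypothetical, not Pig's launcher code. */
final class ConfLookupSketch {
    enum ExecMode { LOCAL, HADOOP }

    /**
     * Applies the proposed rules: local mode never consults the hadoop conf,
     * and hadoop mode fails fast when no conf is available.
     */
    static String resolve(ExecMode mode, Optional<String> hadoopConfDir) {
        if (mode == ExecMode.LOCAL) {
            // Proposed: always launch locally, whether or not a hadoop conf exists.
            return "local";
        }
        // Proposed: hadoop mode needs a conf; bail out with a meaningful message otherwise.
        return hadoopConfDir
                .map(dir -> "hadoop:" + dir)
                .orElseThrow(() -> new IllegalStateException(
                        "Hadoop mode requested but no hadoop configuration was found"));
    }

    public static void main(String[] args) {
        // The conf directory below is a made-up example path.
        System.out.println(resolve(ExecMode.LOCAL, Optional.of("/etc/hadoop/conf")));
        System.out.println(resolve(ExecMode.HADOOP, Optional.of("/etc/hadoop/conf")));
    }
}
```

Compared with the current behavior, the only inputs that change outcome are "local mode with a conf present" (now launches instead of bailing out) and "hadoop mode without a conf" (now fails with a message instead of launching silently).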
[jira] Updated: (PIG-1349) [Zebra] Hubson test failure in test case TestBasicUnion
[ https://issues.apache.org/jira/browse/PIG-1349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated PIG-1349: - Status: Open (was: Patch Available) [Zebra] Hubson test failure in test case TestBasicUnion --- Key: PIG-1349 URL: https://issues.apache.org/jira/browse/PIG-1349 Project: Pig Issue Type: Test Reporter: Xuefu Zhang Assignee: Xuefu Zhang Fix For: 0.7.0 Attachments: zebra.0401 junit.framework.AssertionFailedError: expected:0_01 but was:0_00 at org.apache.hadoop.zebra.pig.TestBasicUnion.__CLR2_5_168gq2gqpe(TestBasicUnion.java:690) at org.apache.hadoop.zebra.pig.TestBasicUnion.testReader6(TestBasicUnion.java:672) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1349) [Zebra] Hubson test failure in test case TestBasicUnion
[ https://issues.apache.org/jira/browse/PIG-1349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated PIG-1349: - Status: Patch Available (was: Open) [Zebra] Hubson test failure in test case TestBasicUnion --- Key: PIG-1349 URL: https://issues.apache.org/jira/browse/PIG-1349 Project: Pig Issue Type: Test Reporter: Xuefu Zhang Assignee: Xuefu Zhang Fix For: 0.7.0 Attachments: zebra.0401 junit.framework.AssertionFailedError: expected:0_01 but was:0_00 at org.apache.hadoop.zebra.pig.TestBasicUnion.__CLR2_5_168gq2gqpe(TestBasicUnion.java:690) at org.apache.hadoop.zebra.pig.TestBasicUnion.testReader6(TestBasicUnion.java:672) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-1350) [Zebra] Zebra column names cannot have leading _
[Zebra] Zebra column names cannot have leading _ -- Key: PIG-1350 URL: https://issues.apache.org/jira/browse/PIG-1350 Project: Pig Issue Type: Improvement Reporter: Xuefu Zhang Assignee: Xuefu Zhang Fix For: 0.7.0 Disallowing '_' as leading character in column names in Zebra schema is too restrictive, which should be lifted. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1350) [Zebra] Zebra column names cannot have leading _
[ https://issues.apache.org/jira/browse/PIG-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated PIG-1350: - Attachment: pig-1350.patch [Zebra] Zebra column names cannot have leading _ -- Key: PIG-1350 URL: https://issues.apache.org/jira/browse/PIG-1350 Project: Pig Issue Type: Improvement Reporter: Xuefu Zhang Assignee: Xuefu Zhang Fix For: 0.7.0 Attachments: pig-1350.patch Disallowing '_' as leading character in column names in Zebra schema is too restrictive, which should be lifted. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1313) PigServer leaks memory over time
[ https://issues.apache.org/jira/browse/PIG-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1313: Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) PIG-1313-4.patch committed to trunk. Will come with Pig 0.8 release. This issue is about memory leak and it is hard to write a unit test for it. Bill tested it manually and it works. Thanks Bill for contributing! PigServer leaks memory over time Key: PIG-1313 URL: https://issues.apache.org/jira/browse/PIG-1313 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Bill Graham Assignee: Bill Graham Fix For: 0.8.0 Attachments: PIG-1313-0.4.0-1.patch, PIG-1313-0.4.0-4.patch, PIG-1313-1.patch, PIG-1313-1.patch, PIG-1313-2.patch, PIG-1313-3.patch, PIG-1313-4.patch, Pig1313Reproducer.java When {{PigServer}} runs it creates temporary files using the {{FileLocalizer.getTemporaryPath(..)}}. This static method creates and returns a handle to a temporary file (as an instance of {{ElementDescriptor}}). The {{ElementDescriptors}} returned by this method are kept on a static {{Stack}} named {{toDelete}}. The items on {{toDelete}} get removed by the {{FileLocalizer.deleteTempFile()}} method. The only place in the code where I see {{FileLocalizer.deleteTempFile()}} called is in the Main class. {{PigServer}} does not call that method though, so a long-running VM that repeatedly uses instances of {{PigServer}} to run jobs will leak memory via {{toDelete}}. One suggested fix is to have {{PigServer.shutdown()}} call {{FileLocalizer.deleteTempFile()}}, but this would cause problems in a multi-threaded environment, since it seems {{ElementDescriptors}} are pushed onto the {{toDelete}} stack before they're used, not once they're done with. With this approach, running multiple instances of {{PigServer}} in separate threads could cause one completed job to clobber the other's still-in-use temp files. 
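The leak pattern described in PIG-1313 can be reproduced in miniature. The sketch below is a stand-in, not Pig's `FileLocalizer`: `TempFileLeakSketch`, `ServerSketch`, `pending()` and the path format are hypothetical, and it does not show the fix that was committed. It only shows why a static collection that is pushed to on every job, and drained only by a caller that long-running code never invokes, grows without bound.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Illustrative stand-in for a static "to delete" stack; all names are hypothetical. */
final class TempFileLeakSketch {
    // Shared by every caller in the JVM, like the static toDelete stack in the issue.
    private static final Deque<String> TO_DELETE = new ArrayDeque<>();
    private static int counter = 0;

    /** Registers a temp path for later deletion before the caller has even used it. */
    static synchronized String getTemporaryPath() {
        String path = "/tmp/temp-" + counter++;
        TO_DELETE.push(path);
        return path;
    }

    /** Drains everything, including entries that other threads may still be using. */
    static synchronized void deleteTempFiles() {
        TO_DELETE.clear();
    }

    static synchronized int pending() {
        return TO_DELETE.size();
    }

    /** A server-like object that creates temp paths but never drains the shared stack. */
    static final class ServerSketch {
        void runJob() {
            getTemporaryPath();
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            new ServerSketch().runJob();
        }
        // Every job left one entry behind, even though each ServerSketch is garbage.
        System.out.println("entries retained: " + pending());
    }
}
```

The same sketch also shows the hazard of the suggested fix: `deleteTempFiles()` clears entries for all callers at once, so calling it from one server's shutdown would discard paths another thread registered but has not finished with.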
[jira] Updated: (PIG-1335) UDFFinder should find LoadFunc used by POCast
[ https://issues.apache.org/jira/browse/PIG-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-1335:
---------------------------

Resolution: Fixed
Hadoop Flags: [Reviewed]
Status: Resolved (was: Patch Available)

Patch committed to both trunk and 0.7 branch.

UDFFinder should find LoadFunc used by POCast
---------------------------------------------

Key: PIG-1335
URL: https://issues.apache.org/jira/browse/PIG-1335
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Daniel Dai
Attachments: PIG-1335-1.patch

UDFFinder doesn't look into POCast so it will miss LoadFunc used by POCast for lineage. We could see class not found exception in some cases. Here is a sample script:

{code}
a = load '1.txt' using CustomLoader() as (a0, a1, a2);
b = group a by a0;
c = foreach b generate flatten(a);
d = order c by a0;
e = foreach d generate(a1+a2); -- use lineage
dump e;
{code}

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1335) UDFFinder should find LoadFunc used by POCast
[ https://issues.apache.org/jira/browse/PIG-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-1335:
---------------------------

Fix Version/s: 0.7.0

UDFFinder should find LoadFunc used by POCast
---------------------------------------------

Key: PIG-1335
URL: https://issues.apache.org/jira/browse/PIG-1335
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Daniel Dai
Fix For: 0.7.0
Attachments: PIG-1335-1.patch

UDFFinder doesn't look into POCast so it will miss LoadFunc used by POCast for lineage. We could see class not found exception in some cases. Here is a sample script:

{code}
a = load '1.txt' using CustomLoader() as (a0, a1, a2);
b = group a by a0;
c = foreach b generate flatten(a);
d = order c by a0;
e = foreach d generate(a1+a2); -- use lineage
dump e;
{code}

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1336) Optimize POStore serialized into JobConf
[ https://issues.apache.org/jira/browse/PIG-1336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-1336:
---------------------------

Resolution: Fixed
Fix Version/s: 0.7.0
Hadoop Flags: [Reviewed]
Status: Resolved (was: Patch Available)

This issue is an optimization, and it is hard to write a unit test for it. I tested it manually and it works. Patch committed to both trunk and 0.7 branch.

Optimize POStore serialized into JobConf
----------------------------------------

Key: PIG-1336
URL: https://issues.apache.org/jira/browse/PIG-1336
Project: Pig
Issue Type: Improvement
Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Daniel Dai
Fix For: 0.7.0
Attachments: PIG-1336-1.patch, PIG-1336-2.patch, PIG-1336-3.patch, PIG-1336-4.patch

We serialize POStore too early in the JobControlCompiler. At that time, the storeFunc has unconstrained links to other operators; in the worst case, it will chain in the whole physical plan. Also, in the multi-store case, POStore has a link to its data source, which is not needed and increases the footprint of the serialized POStore.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1350) [Zebra] Zebra column names cannot have leading _
[ https://issues.apache.org/jira/browse/PIG-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated PIG-1350: - Status: Patch Available (was: Open) [Zebra] Zebra column names cannot have leading _ -- Key: PIG-1350 URL: https://issues.apache.org/jira/browse/PIG-1350 Project: Pig Issue Type: Improvement Reporter: Xuefu Zhang Assignee: Xuefu Zhang Fix For: 0.7.0 Attachments: pig-1350.patch Disallowing '_' as leading character in column names in Zebra schema is too restrictive, which should be lifted. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1350) [Zebra] Zebra column names cannot have leading _
[ https://issues.apache.org/jira/browse/PIG-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated PIG-1350: - Status: Open (was: Patch Available) [Zebra] Zebra column names cannot have leading _ -- Key: PIG-1350 URL: https://issues.apache.org/jira/browse/PIG-1350 Project: Pig Issue Type: Improvement Reporter: Xuefu Zhang Assignee: Xuefu Zhang Fix For: 0.7.0 Attachments: pig-1350.patch Disallowing '_' as leading character in column names in Zebra schema is too restrictive, which should be lifted. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1346) In unit tests Util.executeShellCommand relies on java commands being in the path and does not consider JAVA_HOME
[ https://issues.apache.org/jira/browse/PIG-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1346: Status: Open (was: Patch Available) In unit tests Util.executeShellCommand relies on java commands being in the path and does not consider JAVA_HOME Key: PIG-1346 URL: https://issues.apache.org/jira/browse/PIG-1346 Project: Pig Issue Type: Bug Affects Versions: 0.6.0, 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Attachments: PIG-1346.patch Util.executeShellCommand is currently used in unit tests to execute java related binaries like java, javac, jar - this method should check if JAVA_HOME is set and use $JAVA_HOME/bin/java etc. If JAVA_HOME is not set, the method can try and execute the command as-is. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1346) In unit tests Util.executeShellCommand relies on java commands being in the path and does not consider JAVA_HOME
[ https://issues.apache.org/jira/browse/PIG-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-1346:
-------------------------------

Attachment: PIG-1346-2.patch

The earlier patch was using System.getProperty("java.home"). Apparently ant sometimes appends jre to $JAVA_HOME as the value of the java.home property, which causes failures since $JAVA_HOME/jre/bin/ does not contain javac. I have changed this code to use System.getenv("JAVA_HOME") instead.

In unit tests Util.executeShellCommand relies on java commands being in the path and does not consider JAVA_HOME
--

Key: PIG-1346
URL: https://issues.apache.org/jira/browse/PIG-1346
Project: Pig
Issue Type: Bug
Affects Versions: 0.6.0, 0.7.0
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
Attachments: PIG-1346-2.patch, PIG-1346.patch

Util.executeShellCommand is currently used in unit tests to execute java related binaries like java, javac, jar - this method should check if JAVA_HOME is set and use $JAVA_HOME/bin/java etc. If JAVA_HOME is not set, the method can try and execute the command as-is.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
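The lookup described in PIG-1346 might look roughly like the sketch below. It is not the code from PIG-1346-2.patch: `JavaBinarySketch` and `resolveTool` are hypothetical names, and the environment is passed in as a map only so that the function can be exercised without changing the real environment. The two JDK calls it relies on, `System.getenv()` and `System.getProperty("java.home")`, are standard.

```java
import java.io.File;
import java.util.Map;

/** Illustrative only: hypothetical helper, not Pig's Util.executeShellCommand. */
final class JavaBinarySketch {
    /**
     * Returns $JAVA_HOME/bin/<tool> when JAVA_HOME is set in the given environment;
     * otherwise returns the bare tool name so the shell resolves it via PATH.
     */
    static String resolveTool(String tool, Map<String, String> env) {
        String javaHome = env.get("JAVA_HOME");
        if (javaHome == null || javaHome.isEmpty()) {
            return tool; // as-is fallback described in the issue
        }
        return javaHome + File.separator + "bin" + File.separator + tool;
    }

    public static void main(String[] args) {
        // Environment variable: normally the JDK root, whose bin/ contains javac.
        System.out.println(resolveTool("javac", System.getenv()));
        // System property: the running JRE's directory, which on older JDK layouts
        // is the jre/ subdirectory and therefore has no javac under bin/.
        System.out.println(System.getProperty("java.home"));
    }
}
```

Reading the environment variable rather than the system property keeps the result tied to what the user configured, at the cost of needing the PATH fallback when JAVA_HOME is unset.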
[jira] Updated: (PIG-1346) In unit tests Util.executeShellCommand relies on java commands being in the path and does not consider JAVA_HOME
[ https://issues.apache.org/jira/browse/PIG-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1346: Status: Patch Available (was: Open) In unit tests Util.executeShellCommand relies on java commands being in the path and does not consider JAVA_HOME Key: PIG-1346 URL: https://issues.apache.org/jira/browse/PIG-1346 Project: Pig Issue Type: Bug Affects Versions: 0.6.0, 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Attachments: PIG-1346-2.patch, PIG-1346.patch Util.executeShellCommand is currently used in unit tests to execute java related binaries like java, javac, jar - this method should check if JAVA_HOME is set and use $JAVA_HOME/bin/java etc. If JAVA_HOME is not set, the method can try and execute the command as-is. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1346) In unit tests Util.executeShellCommand relies on java commands being in the path and does not consider JAVA_HOME
[ https://issues.apache.org/jira/browse/PIG-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12852599#action_12852599 ] Daniel Dai commented on PIG-1346: - +1 In unit tests Util.executeShellCommand relies on java commands being in the path and does not consider JAVA_HOME Key: PIG-1346 URL: https://issues.apache.org/jira/browse/PIG-1346 Project: Pig Issue Type: Bug Affects Versions: 0.6.0, 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Attachments: PIG-1346-2.patch, PIG-1346.patch Util.executeShellCommand is currently used in unit tests to execute java related binaries like java, javac, jar - this method should check if JAVA_HOME is set and use $JAVA_HOME/bin/java etc. If JAVA_HOME is not set, the method can try and execute the command as-is. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1350) [Zebra] Zebra column names cannot have leading _
[ https://issues.apache.org/jira/browse/PIG-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated PIG-1350: - Attachment: pig-1350.patch [Zebra] Zebra column names cannot have leading _ -- Key: PIG-1350 URL: https://issues.apache.org/jira/browse/PIG-1350 Project: Pig Issue Type: Improvement Reporter: Xuefu Zhang Assignee: Xuefu Zhang Fix For: 0.7.0 Attachments: pig-1350.patch, pig-1350.patch Disallowing '_' as leading character in column names in Zebra schema is too restrictive, which should be lifted. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1350) [Zebra] Zebra column names cannot have leading _
[ https://issues.apache.org/jira/browse/PIG-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated PIG-1350: - Attachment: (was: pig-1350.patch) [Zebra] Zebra column names cannot have leading _ -- Key: PIG-1350 URL: https://issues.apache.org/jira/browse/PIG-1350 Project: Pig Issue Type: Improvement Reporter: Xuefu Zhang Assignee: Xuefu Zhang Fix For: 0.7.0 Attachments: pig-1350.patch Disallowing '_' as leading character in column names in Zebra schema is too restrictive, which should be lifted. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1349) [Zebra] Hubson test failure in test case TestBasicUnion
[ https://issues.apache.org/jira/browse/PIG-1349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12852688#action_12852688 ] Hadoop QA commented on PIG-1349: +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12440547/zebra.0401 against trunk revision 930123. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 9 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/268/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/268/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/268/console This message is automatically generated. [Zebra] Hubson test failure in test case TestBasicUnion --- Key: PIG-1349 URL: https://issues.apache.org/jira/browse/PIG-1349 Project: Pig Issue Type: Test Reporter: Xuefu Zhang Assignee: Xuefu Zhang Fix For: 0.7.0 Attachments: zebra.0401 junit.framework.AssertionFailedError: expected:0_01 but was:0_00 at org.apache.hadoop.zebra.pig.TestBasicUnion.__CLR2_5_168gq2gqpe(TestBasicUnion.java:690) at org.apache.hadoop.zebra.pig.TestBasicUnion.testReader6(TestBasicUnion.java:672) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1349) [Zebra] Hubson test failure in test case TestBasicUnion
[ https://issues.apache.org/jira/browse/PIG-1349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1349: -- Resolution: Fixed Fix Version/s: 0.8.0 Status: Resolved (was: Patch Available) Committed to the trunk and the 0.7 branch. [Zebra] Hubson test failure in test case TestBasicUnion --- Key: PIG-1349 URL: https://issues.apache.org/jira/browse/PIG-1349 Project: Pig Issue Type: Test Reporter: Xuefu Zhang Assignee: Xuefu Zhang Fix For: 0.7.0, 0.8.0 Attachments: zebra.0401 junit.framework.AssertionFailedError: expected:0_01 but was:0_00 at org.apache.hadoop.zebra.pig.TestBasicUnion.__CLR2_5_168gq2gqpe(TestBasicUnion.java:690) at org.apache.hadoop.zebra.pig.TestBasicUnion.testReader6(TestBasicUnion.java:672) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1346) In unit tests Util.executeShellCommand relies on java commands being in the path and does not consider JAVA_HOME
[ https://issues.apache.org/jira/browse/PIG-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12852695#action_12852695 ] Hadoop QA commented on PIG-1346: +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12440552/PIG-1346-2.patch against trunk revision 930123. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 12 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/278/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/278/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/278/console This message is automatically generated. In unit tests Util.executeShellCommand relies on java commands being in the path and does not consider JAVA_HOME Key: PIG-1346 URL: https://issues.apache.org/jira/browse/PIG-1346 Project: Pig Issue Type: Bug Affects Versions: 0.6.0, 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Attachments: PIG-1346-2.patch, PIG-1346.patch Util.executeShellCommand is currently used in unit tests to execute java related binaries like java, javac, jar - this method should check if JAVA_HOME is set and use $JAVA_HOME/bin/java etc. If JAVA_HOME is not set, the method can try and execute the command as-is. -- This message is automatically generated by JIRA. 