[jira] Commented: (PIG-1252) Diamond splitter does not generate correct results when using Multi-query optimization
[ https://issues.apache.org/jira/browse/PIG-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12843760#action_12843760 ]

Hadoop QA commented on PIG-1252:

+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12438263/PIG-1252-2.patch
against trunk revision 921185.

    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 3 new or modified tests.
    +1 javadoc. The javadoc tool did not generate any warning messages.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    +1 findbugs. The patch does not introduce any new Findbugs warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    +1 core tests. The patch passed core unit tests.
    +1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/232/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/232/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/232/console

This message is automatically generated.

Diamond splitter does not generate correct results when using Multi-query optimization
--

Key: PIG-1252
URL: https://issues.apache.org/jira/browse/PIG-1252
Project: Pig
Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Viraj Bhat
Assignee: Richard Ding
Fix For: 0.7.0
Attachments: PIG-1252-2.patch, PIG-1252.patch

I have a script which uses split but somehow does not use one of the split branches. The skeleton of the script is as follows:

{code}
loadData = load '/user/viraj/zebradata' using org.apache.hadoop.zebra.pig.TableLoader('col1,col2, col3, col4, col5, col6, col7');

prjData = FOREACH loadData GENERATE
    (chararray) col1,
    (chararray) col2,
    (chararray) col3,
    (chararray) ((col4 is not null and col4 != '') ? col4 : ((col5 is not null) ? col5 : '')) as splitcond,
    (chararray) (col6 == 'c' ? 1 : IS_VALID ('200', '0', '0', 'input.txt')) as validRec;

SPLIT prjData INTO
    trueDataTmp IF (validRec == '1' AND splitcond != ''),
    falseDataTmp IF (validRec == '1' AND splitcond == '');

grpData = GROUP trueDataTmp BY splitcond;

finalData = FOREACH grpData {
    orderedData = ORDER trueDataTmp BY col1,col2;
    GENERATE FLATTEN ( MYUDF (orderedData, 60, 1800, 'input.txt', 'input.dat','20100222','5', 'debug_on')) as (s,m,l);
}

dump finalData;
{code}

You can see that falseDataTmp is untouched.

When I run this script with the no-Multiquery (-M) option I get the right result. This could be the result of complex BinConds in the POLoad. We can get rid of this error by using FILTER instead of SPLIT.

Viraj

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
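For reference, the FILTER-based workaround mentioned in the description might look like the following. This is only a sketch, assuming the aliases from the script skeleton above; it has not been run against the reporter's data, and IS_VALID and MYUDF are the reporter's own functions. The file name `script.pig` in the note after the block is hypothetical.

```pig
-- Sketch of the workaround: replace the two-way SPLIT with a FILTER
-- for the only branch that is consumed downstream.
-- Assumes prjData is defined exactly as in the script skeleton.
trueDataTmp = FILTER prjData BY (validRec == '1' AND splitcond != '');

-- falseDataTmp was never used in the original script; if it is needed,
-- it could be written as a second FILTER over the same input:
-- falseDataTmp = FILTER prjData BY (validRec == '1' AND splitcond == '');

grpData = GROUP trueDataTmp BY splitcond;

finalData = FOREACH grpData {
    orderedData = ORDER trueDataTmp BY col1, col2;
    GENERATE FLATTEN (MYUDF (orderedData, 60, 1800, 'input.txt', 'input.dat', '20100222', '5', 'debug_on')) as (s, m, l);
}

dump finalData;
```

The other workaround from the description leaves the script unchanged and disables multi-query optimization at launch, for example `pig -M script.pig`.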
[jira] Commented: (PIG-1252) Diamond splitter does not generate correct results when using Multi-query optimization
[ https://issues.apache.org/jira/browse/PIG-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12843764#action_12843764 ]

Richard Ding commented on PIG-1252:
---

+1 for Daniel's patch
[jira] Commented: (PIG-1252) Diamond splitter does not generate correct results when using Multi-query optimization
[ https://issues.apache.org/jira/browse/PIG-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12840889#action_12840889 ]

Richard Ding commented on PIG-1252:
---

The secondary key optimization is documented in PIG-1038.

Attachments: PIG-1252.patch
[jira] Commented: (PIG-1252) Diamond splitter does not generate correct results when using Multi-query optimization
[ https://issues.apache.org/jira/browse/PIG-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841010#action_12841010 ]

Hadoop QA commented on PIG-1252:

+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/1243/PIG-1252.patch
against trunk revision 917827.

    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 3 new or modified tests.
    +1 javadoc. The javadoc tool did not generate any warning messages.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    +1 findbugs. The patch does not introduce any new Findbugs warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    +1 core tests. The patch passed core unit tests.
    +1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/232/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/232/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/232/console

This message is automatically generated.

Attachments: PIG-1252.patch
[jira] Commented: (PIG-1252) Diamond splitter does not generate correct results when using Multi-query optimization
[ https://issues.apache.org/jira/browse/PIG-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12840339#action_12840339 ]

Viraj Bhat commented on PIG-1252:
-

A modified version of the script works. Does this have to do with the nested foreach?

{code}
loadData = load '/user/viraj/zebradata' using org.apache.hadoop.zebra.pig.TableLoader('col1,col2, col3, col4, col5, col6, col7');

prjData = FOREACH loadData GENERATE
    (chararray) col1,
    (chararray) col2,
    (chararray) col3,
    (chararray) ((col4 is not null and col4 != '') ? col4 : ((col5 is not null) ? col5 : '')) as splitcond,
    (chararray) (col6 == 'c' ? 1 : IS_VALID ('200', '0', '0', 'input.txt')) as validRec;

SPLIT prjData INTO
    trueDataTmp IF (validRec == '1' AND splitcond != ''),
    falseDataTmp IF (validRec == '1' AND splitcond == '');

grpData = GROUP trueDataTmp BY splitcond;

finalData = FOREACH grpData GENERATE FLATTEN ( MYUDF (orderedData, 60, 1800, 'input.txt', 'input.dat','20100222','5', 'debug_on')) as (s,m,l);

dump finalData;
{code}