[jira] Commented: (PIG-1252) Diamond splitter does not generate correct results when using Multi-query optimization

2010-03-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12843760#action_12843760
 ] 

Hadoop QA commented on PIG-1252:


+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12438263/PIG-1252-2.patch
  against trunk revision 921185.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/232/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/232/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/232/console

This message is automatically generated.

 Diamond splitter does not generate correct results when using Multi-query 
 optimization
 --

 Key: PIG-1252
 URL: https://issues.apache.org/jira/browse/PIG-1252
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Viraj Bhat
Assignee: Richard Ding
 Fix For: 0.7.0

 Attachments: PIG-1252-2.patch, PIG-1252.patch


 I have a script which uses SPLIT but never uses one of the split 
 branches. The skeleton of the script is as follows:
 {code}
 -- MYUDF and IS_VALID are user-defined functions; their REGISTER/DEFINE statements are not shown in this skeleton.
 loadData = load '/user/viraj/zebradata' using
     org.apache.hadoop.zebra.pig.TableLoader('col1,col2, col3, col4, col5, col6, col7');
 prjData = FOREACH loadData GENERATE (chararray) col1, (chararray) col2, (chararray) col3,
     (chararray) ((col4 is not null and col4 != '') ? col4 : ((col5 is not null) ? col5 : '')) as splitcond,
     (chararray) (col6 == 'c' ? 1 : IS_VALID ('200', '0', '0', 'input.txt')) as validRec;
 -- falseDataTmp is never consumed by any later statement.
 SPLIT prjData INTO trueDataTmp IF (validRec == '1' AND splitcond != ''),
     falseDataTmp IF (validRec == '1' AND splitcond == '');
 grpData = GROUP trueDataTmp BY splitcond;
 finalData = FOREACH grpData {
     orderedData = ORDER trueDataTmp BY col1,col2;
     GENERATE FLATTEN ( MYUDF (orderedData, 60, 1800, 'input.txt', 'input.dat','20100222','5', 'debug_on')) as (s,m,l);
 }
 dump finalData;
 {code}
 You can see that falseDataTmp is never used.
 When I run this script with the no-multiquery (-M) option, I get the correct 
 result. This could be caused by the complex BinConds in the POLoad. We can 
 avoid this error by using FILTER instead of SPLIT.
 Viraj
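
 The FILTER-based workaround can be sketched as follows. This is a minimal, untested sketch that assumes the same input path and Zebra loader as the report, and assumes the reporter's user-defined functions MYUDF and IS_VALID are already registered (those statements are not part of the report). Each SPLIT branch simply becomes an independent FILTER over prjData; everything else is copied from the skeleton above.

 {code}
 -- Sketch only: MYUDF and IS_VALID are the reporter's UDFs and must be
 -- registered/defined before this runs (not shown in the report).
 loadData = LOAD '/user/viraj/zebradata'
     USING org.apache.hadoop.zebra.pig.TableLoader('col1,col2, col3, col4, col5, col6, col7');

 prjData = FOREACH loadData GENERATE
     (chararray) col1, (chararray) col2, (chararray) col3,
     (chararray) ((col4 is not null and col4 != '') ? col4 : ((col5 is not null) ? col5 : '')) AS splitcond,
     (chararray) (col6 == 'c' ? 1 : IS_VALID('200', '0', '0', 'input.txt')) AS validRec;

 -- Each former SPLIT branch is now an independent FILTER over prjData.
 trueDataTmp  = FILTER prjData BY (validRec == '1' AND splitcond != '');
 -- Unused downstream, exactly like the original falseDataTmp branch.
 falseDataTmp = FILTER prjData BY (validRec == '1' AND splitcond == '');

 grpData = GROUP trueDataTmp BY splitcond;
 finalData = FOREACH grpData {
     orderedData = ORDER trueDataTmp BY col1, col2;
     GENERATE FLATTEN(MYUDF(orderedData, 60, 1800, 'input.txt', 'input.dat', '20100222', '5', 'debug_on')) AS (s, m, l);
 }
 DUMP finalData;
 {code}

 Because the two FILTERs share no SPLIT operator, the plan no longer contains the split whose unused branch triggers the problem, which is why the report describes this as a way around the error.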

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1252) Diamond splitter does not generate correct results when using Multi-query optimization

2010-03-10 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12843764#action_12843764
 ] 

Richard Ding commented on PIG-1252:
---

+1 for Daniel's patch




[jira] Commented: (PIG-1252) Diamond splitter does not generate correct results when using Multi-query optimization

2010-03-03 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12840889#action_12840889
 ] 

Richard Ding commented on PIG-1252:
---

The secondary key optimization is documented in PIG-1038.   




[jira] Commented: (PIG-1252) Diamond splitter does not generate correct results when using Multi-query optimization

2010-03-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12841010#action_12841010
 ] 

Hadoop QA commented on PIG-1252:


+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/1243/PIG-1252.patch
  against trunk revision 917827.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/232/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/232/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/232/console




[jira] Commented: (PIG-1252) Diamond splitter does not generate correct results when using Multi-query optimization

2010-03-02 Thread Viraj Bhat (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12840339#action_12840339
 ] 

Viraj Bhat commented on PIG-1252:
-

A modified version of the script, without the nested FOREACH block, works. 
Does this have to do with the nested FOREACH?

{code}
loadData = load '/user/viraj/zebradata' using
    org.apache.hadoop.zebra.pig.TableLoader('col1,col2, col3, col4, col5, col6, col7');

prjData = FOREACH loadData GENERATE (chararray) col1, (chararray) col2, (chararray) col3,
    (chararray) ((col4 is not null and col4 != '') ? col4 : ((col5 is not null) ? col5 : '')) as splitcond,
    (chararray) (col6 == 'c' ? 1 : IS_VALID ('200', '0', '0', 'input.txt')) as validRec;

SPLIT prjData INTO trueDataTmp IF (validRec == '1' AND splitcond != ''),
    falseDataTmp IF (validRec == '1' AND splitcond == '');

grpData = GROUP trueDataTmp BY splitcond;

-- Without the nested block there is no orderedData; the grouped bag is trueDataTmp.
finalData = FOREACH grpData GENERATE FLATTEN ( MYUDF (trueDataTmp, 60, 1800,
    'input.txt', 'input.dat','20100222','5', 'debug_on')) as (s,m,l);

dump finalData;
{code}
