[jira] Created: (PIG-1221) Filter equality does not work for tuples

2010-02-04 Thread Neil Blue (JIRA)
Filter equality does not work for tuples


 Key: PIG-1221
 URL: https://issues.apache.org/jira/browse/PIG-1221
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.5.0
 Environment: Windows and Linux. Java 1.6 hadoop 0.20.1
Reporter: Neil Blue


From the documentation I understand that it should be possible to filter a 
relation based on the equality of tuples:
http://wiki.apache.org/pig/PigTypesFunctionalSpec , 
http://hadoop.apache.org/pig/docs/r0.5.0/piglatin_reference.html#deref:

 However, with this data file:

-- indext.txt:
(1,one) (1,ONE)
(2,two) (22, twentytwo)
(3,three)   (3,three)

I run this pig script:
A = LOAD 'indext.txt' AS (t1:(a:int, b:chararray), t2:(a:int, b:chararray));
B = FILTER A BY t1==t2;
DUMP B;

I expect the output:
((3,three),(3,three))

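The comparison expected here can be sketched in plain Java. This is a hypothetical illustration, not Pig's EqualToExpr: tuples are modeled as java.util.List<Object>, and TupleEqualityFilter, tupleEquals and filterEqual are made-up names. Two tuples count as equal when they have the same arity and every pair of corresponding fields is equal, recursing into nested tuples:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

public class TupleEqualityFilter {

    // Hypothetical helper: same arity, and each pair of fields is equal.
    // Nested tuples (modeled as Lists) are compared recursively.
    static boolean tupleEquals(List<?> a, List<?> b) {
        if (a.size() != b.size()) {
            return false;
        }
        for (int i = 0; i < a.size(); i++) {
            Object x = a.get(i);
            Object y = b.get(i);
            if (x instanceof List && y instanceof List) {
                if (!tupleEquals((List<?>) x, (List<?>) y)) {
                    return false;
                }
            } else if (!Objects.equals(x, y)) {
                return false;
            }
        }
        return true;
    }

    // Mirrors B = FILTER A BY t1==t2: keep rows whose two tuple fields are equal.
    static List<List<List<Object>>> filterEqual(List<List<List<Object>>> rows) {
        List<List<List<Object>>> kept = new ArrayList<>();
        for (List<List<Object>> row : rows) {
            if (tupleEquals(row.get(0), row.get(1))) {
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<List<List<Object>>> rows = List.of(
            List.of(List.of(1, "one"), List.of(1, "ONE")),
            List.of(List.of(2, "two"), List.of(22, "twentytwo")),
            List.of(List.of(3, "three"), List.of(3, "three")));
        System.out.println(filterEqual(rows)); // prints [[[3, three], [3, three]]]
    }
}
```

Under this model only the third row passes; (1,one) and (1,ONE) differ because string comparison is case-sensitive.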
However, I get an error:
2010-02-03 09:05:20,523 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2067: EqualToExpr does not know how to handle type: tuple
 Pig Stack Trace
 ---
 ERROR 2067: EqualToExpr does not know how to handle type: tuple
 org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias B
    at org.apache.pig.PigServer.openIterator(PigServer.java:475)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:532)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:190)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:142)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
    at org.apache.pig.Main.main(Main.java:397)
 Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias B
    at org.apache.pig.PigServer.store(PigServer.java:530)
    at org.apache.pig.PigServer.openIterator(PigServer.java:458)
    ... 6 more
 Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2067: EqualToExpr does not know how to handle type: tuple
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.EqualToExpr.getNext(EqualToExpr.java:108)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNext(POFilter.java:148)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
    at org.apache.pig.backend.local.executionengine.physicalLayer.counters.POCounter.getNext(POCounter.java:71)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getNext(POStore.java:117)
    at org.apache.pig.backend.local.executionengine.LocalPigLauncher.runPipeline(LocalPigLauncher.java:146)
    at org.apache.pig.backend.local.executionengine.LocalPigLauncher.launchPig(LocalPigLauncher.java:109)
    at org.apache.pig.backend.local.executionengine.LocalExecutionEngine.execute(LocalExecutionEngine.java:165)

Thanks
Neil


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-259) allow store to overwrite existing directroy

2010-02-04 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated PIG-259:
---

Attachment: Pig_259.patch

I chose the keyword overwrite to indicate that the user wants to overwrite the file.

The following is the implementation details:
1. Add a variable isOverWrite to LOStore.
2. In InputOutputFileValidator, delete the destination file first if the 
overwrite keyword is used.



 allow store to overwrite existing directroy
 ---

 Key: PIG-259
 URL: https://issues.apache.org/jira/browse/PIG-259
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Olga Natkovich
Assignee: Jeff Zhang
 Fix For: 0.8.0

 Attachments: Pig_259.patch


 We have users who are asking for a flag to overwrite an existing directory.




[jira] Commented: (PIG-1217) [piggybank] evaluation.util.Top is broken

2010-02-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12829557#action_12829557
 ] 

Hadoop QA commented on PIG-1217:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12434746/fix_top_udf.diff
  against trunk revision 906326.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/198/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/198/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/198/console

This message is automatically generated.

 [piggybank] evaluation.util.Top is broken
 -

 Key: PIG-1217
 URL: https://issues.apache.org/jira/browse/PIG-1217
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.3.0, 0.3.1, 0.4.0, site, 0.5.0, 0.6.0, 0.7.0
Reporter: Dmitriy V. Ryaboy
Assignee: Dmitriy V. Ryaboy
Priority: Minor
 Fix For: 0.7.0

 Attachments: fix_top_udf.diff


 The Top udf has been broken for a while, due to an incorrect implementation 
 of getArgToFuncMapping.




[jira] Commented: (PIG-1219) Extra listStatus call to the namenode in WeightedRangePartitioner

2010-02-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12829575#action_12829575
 ] 

Hadoop QA commented on PIG-1219:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12434747/PIG-1219-2.patch
  against trunk revision 906326.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no tests are needed for this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/191/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/191/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/191/console


 Extra listStatus call to the namenode in WeightedRangePartitioner
 -

 Key: PIG-1219
 URL: https://issues.apache.org/jira/browse/PIG-1219
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.6.0

 Attachments: PIG-1219-1.patch, PIG-1219-2.patch


 We call FileLocalizer.openDFSFile in WeightedRangePartitioner to open the 
 quantile file. openDFSFile internally checks whether the quantile file exists, 
 which adds load on the HDFS namenode. We should remove this extra check.
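The idea of dropping the existence check can be illustrated with java.nio on a local file system. This is a hypothetical sketch, not FileLocalizer's code: QuantileFileOpener, openWithCheck and openDirect are made-up names. Checking first and then opening costs two metadata operations; opening directly costs one, and a missing file still surfaces as an exception:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

public class QuantileFileOpener {

    // Check-then-open: one metadata lookup for exists() plus the open itself.
    static InputStream openWithCheck(Path quantiles) throws IOException {
        if (!Files.exists(quantiles)) {
            throw new IOException("Quantile file not found: " + quantiles);
        }
        return Files.newInputStream(quantiles);
    }

    // Open directly: the open fails on its own when the file is missing,
    // so the separate existence check is redundant.
    static InputStream openDirect(Path quantiles) throws IOException {
        try {
            return Files.newInputStream(quantiles);
        } catch (NoSuchFileException e) {
            throw new IOException("Quantile file not found: " + quantiles, e);
        }
    }
}
```

Both methods report a missing file the same way; only the number of file-system calls differs. On HDFS the same reasoning would apply to namenode requests, though the exact calls involved depend on the FileSystem implementation.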




[jira] Commented: (PIG-1217) [piggybank] evaluation.util.Top is broken

2010-02-04 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12829589#action_12829589
 ] 

Dmitriy V. Ryaboy commented on PIG-1217:


The test failures appear to be unrelated to this change. Please review.

 [piggybank] evaluation.util.Top is broken
 -

 Key: PIG-1217
 URL: https://issues.apache.org/jira/browse/PIG-1217
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.3.0, 0.3.1, 0.4.0, site, 0.5.0, 0.6.0, 0.7.0
Reporter: Dmitriy V. Ryaboy
Assignee: Dmitriy V. Ryaboy
Priority: Minor
 Fix For: 0.7.0

 Attachments: fix_top_udf.diff


 The Top udf has been broken for a while, due to an incorrect implementation 
 of getArgToFuncMapping.




[jira] Commented: (PIG-1046) join algorithm specification is within double quotes

2010-02-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12829608#action_12829608
 ] 

Hadoop QA commented on PIG-1046:


+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12434775/pig-1046_3.patch
  against trunk revision 906326.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/199/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/199/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/199/console


 join algorithm specification is within double quotes
 

 Key: PIG-1046
 URL: https://issues.apache.org/jira/browse/PIG-1046
 Project: Pig
  Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Ashutosh Chauhan
 Fix For: 0.7.0

 Attachments: pig-1046.patch, pig-1046_1.patch, pig-1046_2.patch, 
 pig-1046_3.patch


 This fails -
 j = join l1 by $0, l2 by $0 using 'skewed';
 This works -
 j = join l1 by $0, l2 by $0 using "skewed";
 String constants are single-quoted in pig-latin. If the algorithm 
 specification is supposed to be a string, specifying it within single quotes 
 should be supported.
 Alternatively, we should be using identifiers here; since these are 
 pre-defined in Pig, users will not be specifying arbitrary values that might 
 not be valid identifiers.




[jira] Commented: (PIG-1221) Filter equality does not work for tuples

2010-02-04 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12829618#action_12829618
 ] 

Ashutosh Chauhan commented on PIG-1221:
---

Looking at the code, it seems we don't support equality on maps either, although 
the specification says we should.
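Extending structural equality to maps can be sketched in plain Java as well. This is a hypothetical illustration, not Pig code: ComplexEquality and deepEquals are made-up names, tuples are modeled as List, maps as Map, and Pig's null semantics are deliberately not modeled. Two maps are treated as equal when they have the same key set and structurally equal values for every key:

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;

public class ComplexEquality {

    // Hypothetical structural equality over maps, tuples (Lists) and scalars.
    static boolean deepEquals(Object x, Object y) {
        if (x instanceof Map && y instanceof Map) {
            Map<?, ?> a = (Map<?, ?>) x;
            Map<?, ?> b = (Map<?, ?>) y;
            if (!a.keySet().equals(b.keySet())) {
                return false; // different keys, so the maps cannot be equal
            }
            for (Object key : a.keySet()) {
                if (!deepEquals(a.get(key), b.get(key))) {
                    return false;
                }
            }
            return true;
        }
        if (x instanceof List && y instanceof List) {
            List<?> a = (List<?>) x;
            List<?> b = (List<?>) y;
            if (a.size() != b.size()) {
                return false;
            }
            for (int i = 0; i < a.size(); i++) {
                if (!deepEquals(a.get(i), b.get(i))) {
                    return false;
                }
            }
            return true;
        }
        return Objects.equals(x, y); // scalars
    }

    public static void main(String[] args) {
        Map<String, Object> m1 = Map.of("k", List.of(1, "one"));
        Map<String, Object> m2 = Map.of("k", List.of(1, "one"));
        System.out.println(deepEquals(m1, m2)); // prints true
        System.out.println(deepEquals(m1, Map.of("k", List.of(1, "ONE")))); // prints false
    }
}
```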

 Filter equality does not work for tuples
 

 Key: PIG-1221
 URL: https://issues.apache.org/jira/browse/PIG-1221
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.5.0
 Environment: Windows and Linux. Java 1.6 hadoop 0.20.1
Reporter: Neil Blue





[jira] Updated: (PIG-1217) [piggybank] evaluation.util.Top is broken

2010-02-04 Thread Dmitriy V. Ryaboy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy updated PIG-1217:
---

Status: Open  (was: Patch Available)

Huh. I wonder what Hudson tested -- I accidentally attached an old version of 
the unit test, which doesn't even compile with the new Top.  But Hudson passed 
contrib tests, and managed to fail on core tests.

 [piggybank] evaluation.util.Top is broken
 -

 Key: PIG-1217
 URL: https://issues.apache.org/jira/browse/PIG-1217
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.3.0, 0.3.1, 0.4.0, site, 0.5.0, 0.6.0, 0.7.0
Reporter: Dmitriy V. Ryaboy
Assignee: Dmitriy V. Ryaboy
Priority: Minor
 Fix For: 0.7.0

 Attachments: fix_top_udf.diff


 The Top udf has been broken for a while, due to an incorrect implementation 
 of getArgToFuncMapping.




[jira] Updated: (PIG-1217) [piggybank] evaluation.util.Top is broken

2010-02-04 Thread Dmitriy V. Ryaboy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy updated PIG-1217:
---

Attachment: fix_top_udf.diff

 [piggybank] evaluation.util.Top is broken
 -

 Key: PIG-1217
 URL: https://issues.apache.org/jira/browse/PIG-1217
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.3.0, 0.3.1, 0.4.0, site, 0.5.0, 0.6.0, 0.7.0
Reporter: Dmitriy V. Ryaboy
Assignee: Dmitriy V. Ryaboy
Priority: Minor
 Fix For: 0.7.0

 Attachments: fix_top_udf.diff, fix_top_udf.diff


 The Top udf has been broken for a while, due to an incorrect implementation 
 of getArgToFuncMapping.




[jira] Updated: (PIG-1217) [piggybank] evaluation.util.Top is broken

2010-02-04 Thread Dmitriy V. Ryaboy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy updated PIG-1217:
---

Status: Patch Available  (was: Open)

 [piggybank] evaluation.util.Top is broken
 -

 Key: PIG-1217
 URL: https://issues.apache.org/jira/browse/PIG-1217
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.3.0, 0.3.1, 0.4.0, site, 0.5.0, 0.6.0, 0.7.0
Reporter: Dmitriy V. Ryaboy
Assignee: Dmitriy V. Ryaboy
Priority: Minor
 Fix For: 0.7.0

 Attachments: fix_top_udf.diff, fix_top_udf.diff


 The Top udf has been broken for a while, due to an incorrect implementation 
 of getArgToFuncMapping.




[jira] Updated: (PIG-1219) Extra listStatus call to the namenode in WeightedRangePartitioner

2010-02-04 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1219:


Status: Open  (was: Patch Available)

 Extra listStatus call to the namenode in WeightedRangePartitioner
 -

 Key: PIG-1219
 URL: https://issues.apache.org/jira/browse/PIG-1219
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.6.0

 Attachments: PIG-1219-1.patch, PIG-1219-2.patch


 We call FileLocalizer.openDFSFile in WeightedRangePartitioner to open the 
 quantile file. openDFSFile internally checks whether the quantile file exists, 
 which adds load on the HDFS namenode. We should remove this extra check.




[jira] Updated: (PIG-1219) Extra listStatus call to the namenode in WeightedRangePartitioner

2010-02-04 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1219:


Attachment: PIG-1219-3.patch

The test failure is caused by the way we test it, not by the core code. We now 
require the quantile file to be created before we run JobControlCompiler. In 
our test case, we invoke the methods of JobControlCompiler directly without 
actually running the job, so the quantile file does not exist when we get into 
JobControlCompiler. I changed the test case to force creation of the quantile file.

 Extra listStatus call to the namenode in WeightedRangePartitioner
 -

 Key: PIG-1219
 URL: https://issues.apache.org/jira/browse/PIG-1219
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.6.0

 Attachments: PIG-1219-1.patch, PIG-1219-2.patch, PIG-1219-3.patch


 We call FileLocalizer.openDFSFile in WeightedRangePartitioner to open the 
 quantile file. openDFSFile internally checks whether the quantile file exists, 
 which adds load on the HDFS namenode. We should remove this extra check.




[jira] Commented: (PIG-1217) [piggybank] evaluation.util.Top is broken

2010-02-04 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12829688#action_12829688
 ] 

Alan Gates commented on PIG-1217:
-

In general, this looks good. One comment on Top.Initial: if you do something like

B = group A ...
C = foreach B generate myudf(A);

and myudf is algebraic, you are guaranteed to get only one record at a time in 
the Initial function, because Pig doesn't do any collecting of the keys. That 
is, even if ten records in a row have the same key, Pig won't detect that and 
collate them into the bag before calling Initial. We take advantage of that in 
a number of the built-in functions (e.g. COUNT) to make the processing of Initial 
easier. You may want to do the same here.
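The pattern described above can be sketched in plain Java, independent of Pig's Algebraic interface. TopNSketch, initial and merge are hypothetical names, and values are modeled as long rather than Pig tuples. Because Initial sees exactly one record, it only wraps that record; the work of keeping the n largest values happens when partial results are merged:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

public class TopNSketch {

    // Initial step: the input holds a single record, so just wrap it.
    static List<Long> initial(long record) {
        return List.of(record);
    }

    // Intermediate/final step: merge partial results, keeping the n largest values.
    static List<Long> merge(List<List<Long>> partials, int n) {
        PriorityQueue<Long> heap = new PriorityQueue<>(); // min-heap holding the current top n
        for (List<Long> partial : partials) {
            for (Long v : partial) {
                heap.offer(v);
                if (heap.size() > n) {
                    heap.poll(); // evict the smallest value
                }
            }
        }
        List<Long> result = new ArrayList<>(heap);
        result.sort(Collections.reverseOrder());
        return result;
    }

    public static void main(String[] args) {
        List<List<Long>> partials = new ArrayList<>();
        for (long v : new long[] {5, 1, 9, 3, 7}) {
            partials.add(initial(v));
        }
        System.out.println(merge(partials, 3)); // prints [9, 7, 5]
    }
}
```

A bounded min-heap keeps each merge at O(n) memory regardless of how many partial results arrive.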

As for getting it into the 0.6 release, I think Olga was trying to roll the 
package today or tomorrow, so we may be out of time.

 [piggybank] evaluation.util.Top is broken
 -

 Key: PIG-1217
 URL: https://issues.apache.org/jira/browse/PIG-1217
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.3.0, 0.3.1, 0.4.0, site, 0.5.0, 0.6.0, 0.7.0
Reporter: Dmitriy V. Ryaboy
Assignee: Dmitriy V. Ryaboy
Priority: Minor
 Fix For: 0.7.0

 Attachments: fix_top_udf.diff, fix_top_udf.diff


 The Top udf has been broken for a while, due to an incorrect implementation 
 of getArgToFuncMapping.




[jira] Commented: (PIG-1219) Extra listStatus call to the namenode in WeightedRangePartitioner

2010-02-04 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12829690#action_12829690
 ] 

Olga Natkovich commented on PIG-1219:
-

I asked Pradeep to review the code as well, just to have another set of eyes, 
since this change comes so late in the game and is not straightforward.

 Extra listStatus call to the namenode in WeightedRangePartitioner
 -

 Key: PIG-1219
 URL: https://issues.apache.org/jira/browse/PIG-1219
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.6.0

 Attachments: PIG-1219-1.patch, PIG-1219-2.patch, PIG-1219-3.patch


 We call FileLocalizer.openDFSFile in WeightedRangePartitioner to open the 
 quantile file. openDFSFile internally checks whether the quantile file exists, 
 which adds load on the HDFS namenode. We should remove this extra check.




[jira] Commented: (PIG-1209) Port POJoinPackage to proactively spill

2010-02-04 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12829691#action_12829691
 ] 

Olga Natkovich commented on PIG-1209:
-

+1. Changes look good

 Port POJoinPackage to proactively spill
 ---

 Key: PIG-1209
 URL: https://issues.apache.org/jira/browse/PIG-1209
 Project: Pig
  Issue Type: Bug
Reporter: Sriranjan Manjunath
Assignee: Ashutosh Chauhan
 Fix For: 0.7.0

 Attachments: pig-1209.patch


 POPackage proactively spills the bag, whereas POJoinPackage still relies on the 
 SpillableMemoryManager. We should port POJoinPackage to use InternalCacheBag, 
 which spills proactively.
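The difference between the two strategies can be illustrated with a small, hypothetical bag that spills proactively: rather than waiting for a memory manager to ask it to free memory, it writes its in-memory records to a temporary file as soon as a fixed limit is reached. This plain-Java sketch assumes a record-count limit, Java serialization, and one temp file per spill; it is not Pig's InternalCacheBag:

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class ProactiveSpillBag<T extends Serializable> {
    private final int memoryLimit; // hypothetical: max records held in memory
    private final List<T> inMemory = new ArrayList<>();
    private final List<Path> spillFiles = new ArrayList<>();

    public ProactiveSpillBag(int memoryLimit) {
        this.memoryLimit = memoryLimit;
    }

    public void add(T item) throws IOException {
        inMemory.add(item);
        if (inMemory.size() >= memoryLimit) {
            spill(); // spill as soon as the limit is hit, not on memory pressure
        }
    }

    private void spill() throws IOException {
        Path file = Files.createTempFile("bag-spill", ".bin");
        file.toFile().deleteOnExit();
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file)))) {
            out.writeInt(inMemory.size()); // record count header
            for (T item : inMemory) {
                out.writeObject(item);
            }
        }
        spillFiles.add(file);
        inMemory.clear();
    }

    // Reads spilled records back (oldest spill first), then the in-memory tail.
    @SuppressWarnings("unchecked")
    public List<T> readAll() throws IOException, ClassNotFoundException {
        List<T> all = new ArrayList<>();
        for (Path file : spillFiles) {
            try (ObjectInputStream in = new ObjectInputStream(
                    new BufferedInputStream(Files.newInputStream(file)))) {
                int count = in.readInt();
                for (int i = 0; i < count; i++) {
                    all.add((T) in.readObject());
                }
            }
        }
        all.addAll(inMemory);
        return all;
    }

    public int spillCount() {
        return spillFiles.size();
    }

    public static void main(String[] args) throws Exception {
        ProactiveSpillBag<Integer> bag = new ProactiveSpillBag<>(3);
        for (int i = 0; i < 10; i++) {
            bag.add(i);
        }
        System.out.println(bag.spillCount()); // prints 3
        System.out.println(bag.readAll());    // prints [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    }
}
```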




[jira] Commented: (PIG-1217) [piggybank] evaluation.util.Top is broken

2010-02-04 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12829694#action_12829694
 ] 

Dmitriy V. Ryaboy commented on PIG-1217:


I see, thanks for the tip. How does this work with tuple reuse: can I just 
return the input tuple, or do I need to copy the contents to a new tuple in 
Top.Initial()?

No worries about 0.6, I'd rather it finally go out than try to get something 
like this in at the last moment.

 [piggybank] evaluation.util.Top is broken
 -

 Key: PIG-1217
 URL: https://issues.apache.org/jira/browse/PIG-1217
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.3.0, 0.3.1, 0.4.0, site, 0.5.0, 0.6.0, 0.7.0
Reporter: Dmitriy V. Ryaboy
Assignee: Dmitriy V. Ryaboy
Priority: Minor
 Fix For: 0.7.0

 Attachments: fix_top_udf.diff, fix_top_udf.diff


 The Top udf has been broken for a while, due to an incorrect implementation 
 of getArgToFuncMapping.




[jira] Commented: (PIG-259) allow store to overwrite existing directroy

2010-02-04 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12829697#action_12829697
 ] 

Alan Gates commented on PIG-259:


A few comments and questions on this:

1) We should make this work against the load/store branch instead of trunk.  
We're hoping to merge load/store into trunk in a week or two, so it makes more 
sense to put it there.  This will also have implications for load/store.  One, 
it will need to communicate to the new validate function that it's ok if the 
file (or whatever is being overwritten) exists.  Two, load implementations will 
need to handle removing the file (or whatever) if necessary.  For example, 
PigStorage will need to handle removing the file so MR doesn't complain.

2) Should overwrite be a keyword (as originally proposed and in the patch) or 
a string, like the hints in join? I don't have a strong opinion one way or 
another, but I think it's worth considering which we want.

3) Are the semantics of overwrite that it saves whether or not the file is 
there, or that it is an error if the file is not there to overwrite? Writing 
whether or not the file exists makes more sense to me, but I wanted to make 
sure we all agree on it.

4) What happens when a user requests overwrite and the job fails before it 
runs? In the current implementation the file is removed up front, so any 
planning errors will still result in the file being removed. Also, the file 
is removed up front even if the job remains in Hadoop's queue for a long 
time waiting to run. At the very least, I think Pig should delay removing the 
file until it is ready to launch the job, so that type-checking errors and the 
like don't result in the file being removed when the job is not run.
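Deferring the delete until launch could look roughly like the following. The sketch is hypothetical: the class StoreJob, its methods, and the use of the local file system through java.nio are all assumptions, and a real implementation would go through Hadoop's FileSystem. Validation only checks what is allowed; the existing output is removed immediately before the job runs, so an error during planning leaves the old output untouched:

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class StoreJob {
    private final Path output;
    private final boolean overwrite;

    public StoreJob(Path output, boolean overwrite) {
        this.output = output;
        this.overwrite = overwrite;
    }

    // Planning/validation: never deletes anything, only checks what is allowed.
    public void validate() throws IOException {
        if (Files.exists(output) && !overwrite) {
            throw new FileAlreadyExistsException(output.toString());
        }
    }

    // Launch: only now is the existing output removed, right before the job runs.
    public void launch(Runnable job) throws IOException {
        validate();
        if (overwrite && Files.exists(output)) {
            try (Stream<Path> paths = Files.walk(output)) {
                // Reverse order puts children before their parent directories.
                List<Path> all = paths.sorted(Comparator.reverseOrder())
                                      .collect(Collectors.toList());
                for (Path p : all) {
                    Files.delete(p);
                }
            }
        }
        job.run();
    }
}
```

With this split, any failure between validate() and launch() (type checking, plan compilation, waiting in the queue) leaves the previous output in place.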


 allow store to overwrite existing directroy
 ---

 Key: PIG-259
 URL: https://issues.apache.org/jira/browse/PIG-259
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Olga Natkovich
Assignee: Jeff Zhang
 Fix For: 0.8.0

 Attachments: Pig_259.patch


 We have users who are asking for a flag to overwrite an existing directory.




[jira] Commented: (PIG-259) allow store to overwrite existing directroy

2010-02-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12829728#action_12829728
 ] 

Hadoop QA commented on PIG-259:
---

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12434801/Pig_259.patch
  against trunk revision 906326.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/192/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/192/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/192/console


 allow store to overwrite existing directroy
 ---

 Key: PIG-259
 URL: https://issues.apache.org/jira/browse/PIG-259
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Olga Natkovich
Assignee: Jeff Zhang
 Fix For: 0.8.0

 Attachments: Pig_259.patch


 We have users who are asking for a flag to overwrite an existing directory.




[jira] Commented: (PIG-1046) join algorithm specification is within double quotes

2010-02-04 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12829731#action_12829731
 ] 

Olga Natkovich commented on PIG-1046:
-

(1) I think the error message for an invalid cogroup modifier should be made a 
little clearer, something like:

Only COLLECTED or REGULAR are valid GROUP modifiers.

(2) There seems to be some code duplication to support double quotes. It would be 
better to just emit a deprecation warning and keep the rest of the code in 
one place.

(3) Similar comments for the join part.
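Points (1) and (2) can be sketched together: strip either quote style, emit a deprecation warning only for the double-quoted form, and then run a single validation path that produces the suggested error message. This is hypothetical Java, not the parser in the patch; GroupModifierParser, parseGroupModifier and the warnings list are assumptions, as is case-insensitive matching, and only the two GROUP modifiers named above are handled:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class GroupModifierParser {
    private static final Set<String> VALID = Set.of("COLLECTED", "REGULAR");

    // Normalize the quoting first, then validate in exactly one place.
    static String parseGroupModifier(String raw, List<String> warnings) {
        String value = raw.trim();
        if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
            warnings.add("Double-quoted GROUP modifiers are deprecated; use single quotes.");
            value = value.substring(1, value.length() - 1);
        } else if (value.length() >= 2 && value.startsWith("'") && value.endsWith("'")) {
            value = value.substring(1, value.length() - 1);
        }
        // Assumption: modifiers are matched case-insensitively.
        String upper = value.toUpperCase(Locale.ROOT);
        if (!VALID.contains(upper)) {
            throw new IllegalArgumentException("Only COLLECTED or REGULAR are valid GROUP modifiers.");
        }
        return upper;
    }

    public static void main(String[] args) {
        List<String> warnings = new ArrayList<>();
        System.out.println(parseGroupModifier("'collected'", warnings)); // prints COLLECTED
        System.out.println(parseGroupModifier("\"regular\"", warnings)); // prints REGULAR
        System.out.println(warnings.size());                             // prints 1
    }
}
```

The same shape would apply to the join algorithm hint, with its own set of valid names.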

 join algorithm specification is within double quotes
 

 Key: PIG-1046
 URL: https://issues.apache.org/jira/browse/PIG-1046
 Project: Pig
  Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Ashutosh Chauhan
 Fix For: 0.7.0

 Attachments: pig-1046.patch, pig-1046_1.patch, pig-1046_2.patch, 
 pig-1046_3.patch


 This fails -
 j = join l1 by $0, l2 by $0 using 'skewed';
 This works -
 j = join l1 by $0, l2 by $0 using "skewed";
 String constants are single-quoted in pig-latin. If the algorithm 
 specification is supposed to be a string, specifying it within single quotes 
 should be supported.
 Alternatively, we should be using identifiers here; since these are 
 pre-defined in Pig, users will not be specifying arbitrary values that might 
 not be valid identifiers.




[jira] Commented: (PIG-1131) Pig simple join does not work when it contains empty lines

2010-02-04 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12829736#action_12829736
 ] 

Ashutosh Chauhan commented on PIG-1131:
---

I can't reproduce this on trunk. PIG-1194 touched the same piece of code and 
was recently checked in, so it may have fixed this one too. Viraj, can you 
please confirm whether you can still reproduce it, or some variant of it?

 Pig simple join does not work when it contains empty lines
 --

 Key: PIG-1131
 URL: https://issues.apache.org/jira/browse/PIG-1131
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.7.0
Reporter: Viraj Bhat
Assignee: Ashutosh Chauhan
 Fix For: 0.7.0

 Attachments: junk1.txt, junk2.txt, simplejoinscript.pig


 I have a simple script, which does a JOIN.
 {code}
 input1 = load '/user/viraj/junk1.txt' using PigStorage(' ');
 describe input1;
 input2 = load '/user/viraj/junk2.txt' using PigStorage('\u0001');
 describe input2;
 joineddata = JOIN input1 by $0, input2 by $0;
 describe joineddata;
 store joineddata into 'result';
 {code}
 The input data contains empty lines.  
 The join fails in the map phase with the following error in 
 POLocalRearrange.java:
 java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
   at java.util.ArrayList.RangeCheck(ArrayList.java:547)
   at java.util.ArrayList.get(ArrayList.java:322)
   at org.apache.pig.data.DefaultTuple.get(DefaultTuple.java:143)
   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.constructLROutput(POLocalRearrange.java:464)
   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:360)
   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POUnion.getNext(POUnion.java:162)
   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:253)
   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:244)
   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:94)
   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
   at org.apache.hadoop.mapred.Child.main(Child.java:159)
 I am surprised that the test cases did not detect this error. Could we add 
 this data, which contains empty lines, to the test cases?
 Viraj
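One plausible way a one-field record like the one in the trace (Index: 1, Size: 1) can arise is splitting an empty line: the result is a single empty field, so asking for field 1 throws. The Java sketch below illustrates this with a hypothetical splitter and a defensive accessor that returns null for a missing field; it is an assumption about the failure mode, not PigStorage's actual parsing code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration, not PigStorage: splitting an empty line gives one field. */
final class DelimitedLine {
    private DelimitedLine() {}

    static List<String> split(String line, char delim) {
        List<String> fields = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) == delim) {
                fields.add(line.substring(start, i));
                start = i + 1;
            }
        }
        // An empty line still produces exactly one (empty) field.
        fields.add(line.substring(start));
        return fields;
    }

    /** Returns null instead of throwing IndexOutOfBoundsException for a short record. */
    static String fieldOrNull(List<String> fields, int index) {
        return index >= 0 && index < fields.size() ? fields.get(index) : null;
    }
}
```

Calling `fields.get(1)` directly on the result of `split("", ' ')` reproduces the same `IndexOutOfBoundsException` as in the trace, while `fieldOrNull` returns null.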




[jira] Created: (PIG-1222) cast ends up with NULL value

2010-02-04 Thread Ying He (JIRA)
cast ends up with NULL value


 Key: PIG-1222
 URL: https://issues.apache.org/jira/browse/PIG-1222
 Project: Pig
  Issue Type: Bug
Reporter: Ying He


I want to generate data with bags, so I did the following.

Take a simple text file, b.txt:

100  apple
200  orange
300  pear
400  apple

Then run this query:

a = load 'b.txt' as (id, f);
b = group a by id;
store b into 'g' using BinStorage();

Then run another query to load the data generated in the previous step:

a = load 'g/part*' using BinStorage() as (id, d:bag{t:(v, s)});
b = foreach a generate (double)id, flatten(d);
dump b;

I got the following result:

(,100,apple)
(,100,apple)
(,200,orange)
(,200,apple)
(,300,strawberry)
(,300,pear)
(,400,pear)

The value of id is gone (the first field is empty). Without the cast, the result is correct.





[jira] Commented: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with

2010-02-04 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12829759#action_12829759
 ] 

Alan Gates commented on PIG-1178:
-

Comments that came out of the Pig team's review of the twiki doc:

1) In OperatorPlan, the use of roots and leaves in the graph was considered 
confusing.  Some people view roots as sources and some as sinks.  It was 
recommended that we switch roots to sources and leaves to sinks to avoid 
confusion.

2) The new OperatorPlan does not include mergeSharedPlan, which was used by 
multi-query functionality in the old plan.  After further investigation I found 
that merge is currently only used by multi-query for physical plans.  While 
ideally we would like to use this infrastructure for physical plans too, I feel 
it is reasonable to put off adding merge until at least the initial prototyping 
phase is done.  After briefly looking at it, I see no reason why it should not 
work, though we may need a more precise way to decide when two nodes are the 
same and should be merged.

3) A point was raised that perhaps the optimizer should reset the annotations 
on the nodes after a transform and all the attached listeners have been run.  
With further thought, I don't think so, as there may be annotations we want to 
last across transforms.  For example, a rule that could match an infinite 
number of times may want to mark a node to note that it has already been there, 
so that it does not fire on that node again.  The easiest way to do this marking 
would be with the annotations.  However, I can see that there would be a desire 
to clear certain annotations so that each pass of the optimizer has a fresh 
state.  To accomplish this I was wondering if we should allow developers to add 
visitors that would be run after all the listeners run.  So PlanOptimizer would 
change to have a new method:

{code}
public void addStatusResettingVisitor(Visitor v) {
    resetters.add(v);
}
{code}

and in the optimize loop

{code}
for (OperatorPlan m : matches) {
    if (transformer.check(m)) {
        sawMatch = true;
        transformer.transform(m);
        for (PlanTransformListener l : listeners) {
            l.transformed(plan, transformer.reportChanges());
        }
    }
}
{code}

would change to be:

{code}
for (OperatorPlan m : matches) {
    if (transformer.check(m)) {
        sawMatch = true;
        transformer.transform(m);
        for (PlanTransformListener l : listeners) {
            l.transformed(plan, transformer.reportChanges());
        }
        for (Visitor v : resetters) {
            v.visit();
        }
    }
}
{code}

Thoughts?

4) It is not clear how column pruning will work in the new optimizer.  
Will it be represented by a rule?  If so, how, since the new optimizer only allows 
matching on specific operators, not on any operator?  Would it be better 
instead to have it use the Transformers but not the PlanOptimizer 
infrastructure, since it isn't clear that we would want the column pruning rule 
to be triggered more than once?  To answer these I think we should prototype 
the column pruning soon.  It was one of the hardest parts of the existing 
infrastructure.  We want to make sure it can be done well in this new approach 
before committing to the approach.

5) The comment was made that while the examples in the document appear to show 
that the proposal will work for nested plans (that is, inner plans in foreach) 
they do not show that it will work for operators not yet nestable in foreach 
(e.g. group, foreach).  Since a stated goal of Pig Latin is to someday allow 
arbitrary nesting, we should validate that the proposal will support nesting 
these additional operators in foreach.
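For point 3, a simplified, self-contained Java model of the proposed loop makes the ordering explicit: for each applied transform, the listeners run first and the status-resetting visitors run afterwards. The interfaces here (Transformer, Listener, Resetter) are hypothetical stand-ins for Pig's classes, and Resetter plays the role of the Visitor passed to addStatusResettingVisitor.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the proposed optimize loop; not Pig's PlanOptimizer API. */
final class ResettingOptimizer<P> {
    interface Transformer<T> {
        List<T> match(T plan);      // candidate sub-plans
        boolean check(T match);
        void transform(T match);
    }

    interface Listener<T> {
        void transformed(T plan);
    }

    interface Resetter<T> {
        void reset(T plan);         // e.g. clears per-pass annotations
    }

    private final List<Listener<P>> listeners = new ArrayList<>();
    private final List<Resetter<P>> resetters = new ArrayList<>();

    void addListener(Listener<P> l) { listeners.add(l); }

    void addStatusResettingVisitor(Resetter<P> r) { resetters.add(r); }

    /** Applies one transformer to every match; returns true if any match fired. */
    boolean runOnce(P plan, Transformer<P> transformer) {
        boolean sawMatch = false;
        for (P m : transformer.match(plan)) {
            if (transformer.check(m)) {
                sawMatch = true;
                transformer.transform(m);
                for (Listener<P> l : listeners) {
                    l.transformed(plan);
                }
                // Proposed addition: reset state only after all listeners have run.
                for (Resetter<P> r : resetters) {
                    r.reset(plan);
                }
            }
        }
        return sawMatch;
    }
}
```

Annotations that should survive across transforms (such as the "already visited" mark in point 3) are simply left alone by every registered resetter.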


 LogicalPlan and Optimizer are too complex and hard to work with
 ---

 Key: PIG-1178
 URL: https://issues.apache.org/jira/browse/PIG-1178
 Project: Pig
  Issue Type: Improvement
Reporter: Alan Gates
Assignee: Ying He
 Attachments: expressions-2.patch, expressions.patch, lp.patch, 
 lp.patch, pig_1178.patch, PIG_1178.patch


 The current implementation of the logical plan and the logical optimizer in 
 Pig has proven not to be easily extensible. Developer feedback has indicated 
 that adding new rules to the optimizer is quite burdensome. In addition, the 
 logical plan has been an area of numerous bugs, many of which have been 
 difficult to fix. Developers also feel that the logical plan is difficult to 
 understand and maintain. The root cause of these issues is that a number of 
 design decisions made as part of the 0.2 rewrite of the front end 
 have now proven to be sub-optimal. The heart of this proposal is to revisit a 
 number of those decisions and rebuild the logical plan with a simpler design 
 that will make it much easier 

[jira] Updated: (PIG-1209) Port POJoinPackage to proactively spill

2010-02-04 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated PIG-1209:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Patch checked-in.

 Port POJoinPackage to proactively spill
 ---

 Key: PIG-1209
 URL: https://issues.apache.org/jira/browse/PIG-1209
 Project: Pig
  Issue Type: Bug
Reporter: Sriranjan Manjunath
Assignee: Ashutosh Chauhan
 Fix For: 0.7.0

 Attachments: pig-1209.patch


 POPackage proactively spills the bag whereas POJoinPackage still uses the 
 SpillableMemoryManager. We should port this to use InternalCacheBag which 
 proactively spills.
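A minimal Java sketch of what "proactively spills" means here: the bag writes its in-memory items to a temporary file as soon as it exceeds a fixed item limit, instead of waiting for a global memory manager to ask it to spill. The class name, the count-based limit and the one-item-per-line file format are assumptions for illustration; this is not the InternalCacheBag implementation.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical bag that spills itself once it holds more than memLimit items. */
final class ProactiveSpillBag {
    private final int memLimit;                      // max items held in memory
    private final List<String> inMemory = new ArrayList<>();
    private Path spillFile;                          // created on first spill
    private long spilledCount;

    ProactiveSpillBag(int memLimit) {
        this.memLimit = memLimit;
    }

    /** Adds an item (assumed to contain no line breaks), spilling if over the limit. */
    void add(String item) {
        inMemory.add(item);
        if (inMemory.size() > memLimit) {
            spill();
        }
    }

    private void spill() {
        try {
            if (spillFile == null) {
                spillFile = Files.createTempFile("bag", ".spill");
                spillFile.toFile().deleteOnExit();
            }
            try (BufferedWriter w = Files.newBufferedWriter(
                    spillFile, StandardCharsets.UTF_8, StandardOpenOption.APPEND)) {
                for (String s : inMemory) {
                    w.write(s);
                    w.newLine();
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        spilledCount += inMemory.size();
        inMemory.clear();
    }

    long size() { return spilledCount + inMemory.size(); }

    long spilledCount() { return spilledCount; }

    /** All items in insertion order: spilled ones first, then those still in memory. */
    List<String> readAll() {
        List<String> all = new ArrayList<>();
        if (spillFile != null) {
            try {
                all.addAll(Files.readAllLines(spillFile, StandardCharsets.UTF_8));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        all.addAll(inMemory);
        return all;
    }
}
```

With a limit of 2, adding `t0` through `t4` spills the first three items when `t2` arrives and keeps `t3` and `t4` in memory; `readAll()` still returns all five in insertion order.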




[jira] Commented: (PIG-1217) [piggybank] evaluation.util.Top is broken

2010-02-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12829811#action_12829811
 ] 

Hadoop QA commented on PIG-1217:


+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12434839/fix_top_udf.diff
  against trunk revision 906326.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/200/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/200/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/200/console

This message is automatically generated.

 [piggybank] evaluation.util.Top is broken
 -

 Key: PIG-1217
 URL: https://issues.apache.org/jira/browse/PIG-1217
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.3.0, 0.3.1, 0.4.0, site, 0.5.0, 0.6.0, 0.7.0
Reporter: Dmitriy V. Ryaboy
Assignee: Dmitriy V. Ryaboy
Priority: Minor
 Fix For: 0.7.0

 Attachments: fix_top_udf.diff, fix_top_udf.diff


 The Top udf has been broken for a while, due to an incorrect implementation 
 of getArgToFuncMapping.




[jira] Updated: (PIG-1046) join algorithm specification is within double quotes

2010-02-04 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated PIG-1046:
--

Status: Open  (was: Patch Available)




[jira] Updated: (PIG-1046) join algorithm specification is within double quotes

2010-02-04 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated PIG-1046:
--

Attachment: pig-1046_4.patch

Updated patch incorporating Olga's comments.

 join algorithm specification is within double quotes
 

 Key: PIG-1046
 URL: https://issues.apache.org/jira/browse/PIG-1046
 Project: Pig
  Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Ashutosh Chauhan
 Fix For: 0.7.0

 Attachments: pig-1046.patch, pig-1046_1.patch, pig-1046_2.patch, 
 pig-1046_3.patch, pig-1046_4.patch


 This fails -
 j = join l1 by $0, l2 by $0 using 'skewed';
 This works -
 j = join l1 by $0, l2 by $0 using "skewed";
 String constants are single-quoted in Pig Latin. If the algorithm 
 specification is supposed to be a string, specifying it within single quotes 
 should be supported.
 Alternatively, we should be using identifiers here: since these values are 
 predefined in Pig, users will not be specifying arbitrary values that might 
 not be valid identifiers. 




[jira] Updated: (PIG-1046) join algorithm specification is within double quotes

2010-02-04 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated PIG-1046:
--

Status: Patch Available  (was: Open)




[jira] Created: (PIG-1223) [zebra] Add cli to help admin zebra

2010-02-04 Thread He Yongqiang (JIRA)
[zebra] Add cli to help admin zebra
---

 Key: PIG-1223
 URL: https://issues.apache.org/jira/browse/PIG-1223
 Project: Pig
  Issue Type: Wish
Reporter: He Yongqiang







[jira] Commented: (PIG-1219) Extra listStatus call to the namenode in WeightedRangePartitioner

2010-02-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12829865#action_12829865
 ] 

Hadoop QA commented on PIG-1219:


+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12434850/PIG-1219-3.patch
  against trunk revision 906326.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/193/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/193/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/193/console

This message is automatically generated.

 Extra listStatus call to the namenode in WeightedRangePartitioner
 -

 Key: PIG-1219
 URL: https://issues.apache.org/jira/browse/PIG-1219
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.6.0

 Attachments: PIG-1219-1.patch, PIG-1219-2.patch, PIG-1219-3.patch


 We call FileLocalizer.openDFSFile in WeightedRangePartitioner to open the 
 quantile file. Internally, openDFSFile checks whether the quantile file exists, 
 which adds load on the HDFS namenode. We should remove this extra 
 check.
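A toy Java model of the cost described above: a fake file system counts metadata calls, so checking existence before opening costs two namenode round trips per file, while opening directly and treating a missing file as an exception costs one. CountingFs and its methods are hypothetical; in real Hadoop code the corresponding pattern would be along the lines of FileSystem.exists followed by FileSystem.open.

```java
import java.io.ByteArrayInputStream;
import java.io.FileNotFoundException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical file system that only counts calls that would hit the namenode. */
final class CountingFs {
    private final Map<String, byte[]> files = new HashMap<>();
    int metadataCalls;

    void put(String path, String content) {
        files.put(path, content.getBytes(StandardCharsets.UTF_8));
    }

    boolean exists(String path) {
        metadataCalls++;
        return files.containsKey(path);
    }

    InputStream open(String path) throws FileNotFoundException {
        metadataCalls++;                     // open must also resolve the path
        byte[] data = files.get(path);
        if (data == null) {
            throw new FileNotFoundException(path);
        }
        return new ByteArrayInputStream(data);
    }

    /** Check-then-open: two metadata calls per file. */
    InputStream openWithPrecheck(String path) throws FileNotFoundException {
        if (!exists(path)) {
            throw new FileNotFoundException(path);
        }
        return open(path);
    }

    /** Open directly; a missing file still surfaces as FileNotFoundException. */
    InputStream openDirect(String path) throws FileNotFoundException {
        return open(path);
    }
}
```

Both methods report a missing file the same way, so dropping the pre-check changes only the number of round trips: `openWithPrecheck` adds 2 to `metadataCalls`, `openDirect` adds 1.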




[jira] Updated: (PIG-834) incorrect plan when algebraic functions are nested

2010-02-04 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated PIG-834:
-

Attachment: pig-834.patch

In this patch, I look for a pattern of a POUserFunc followed by another 
POUserFunc in the inner plan of a ForEach, and if that's found I flag the combiner 
optimizer not to fire. This disables the combiner for this particular query 
(test case included). Is this fix sufficient for this bug?
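A minimal Java sketch of the check described in the comment, over a hypothetical expression tree rather than Pig's physical-plan classes: it reports whether any UDF directly consumes another UDF's output, and the combiner is allowed only if no inner-plan output contains such a pair.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of a foreach inner plan; not Pig's POUserFunc/POProject classes. */
final class NestedUdfCheck {
    static final class Expr {
        final String kind;                   // "udf" or "project"
        final String name;
        final List<Expr> inputs = new ArrayList<>();

        Expr(String kind, String name, Expr... in) {
            this.kind = kind;
            this.name = name;
            inputs.addAll(Arrays.asList(in));
        }
    }

    private NestedUdfCheck() {}

    /** True if some UDF takes another UDF's output directly as input. */
    static boolean hasNestedUdf(Expr e) {
        for (Expr child : e.inputs) {
            if (e.kind.equals("udf") && child.kind.equals("udf")) {
                return true;
            }
            if (hasNestedUdf(child)) {
                return true;
            }
        }
        return false;
    }

    /** The combiner optimizer would skip a foreach whose inner plan trips the check. */
    static boolean combinerAllowed(List<Expr> innerPlanOutputs) {
        for (Expr e : innerPlanOutputs) {
            if (hasNestedUdf(e)) {
                return false;
            }
        }
        return true;
    }
}
```

For the query in this issue, `COUNT(Distinct($1.$2))` is modeled as a `udf` node whose input is another `udf` node, so `combinerAllowed` returns false; a plain `COUNT($1)` leaves the combiner enabled.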

 incorrect plan when algebraic functions are nested
 --

 Key: PIG-834
 URL: https://issues.apache.org/jira/browse/PIG-834
 Project: Pig
  Issue Type: Bug
  Components: impl
Reporter: Thejas M Nair
Assignee: Ashutosh Chauhan
 Fix For: 0.7.0

 Attachments: pig-834.patch


 a = load 'students.txt' as (c1,c2,c3,c4); 
 c = group a by c2;  
 f = foreach c generate COUNT(org.apache.pig.builtin.Distinct($1.$2));
 Notice that the Distinct UDF is missing from the combine and reduce stages. As a 
 result, distinct does not take effect, and incorrect results are produced.
 Distinct should have been evaluated in all 3 stages, and the output of Distinct 
 should be given to COUNT in the reduce stage.
 {code}
 # Map Reduce Plan  
 #--
 MapReduce node 1-122
 Map Plan
 Local Rearrange[tuple]{bytearray}(false) - 1-139
 |   |
 |   Project[bytearray][1] - 1-140
 |
 |---New For Each(false,false)[bag] - 1-127
 |   |
 |   POUserFunc(org.apache.pig.builtin.COUNT$Initial)[tuple] - 1-125
 |   |
 |   |---POUserFunc(org.apache.pig.builtin.Distinct)[bag] - 1-126
 |   |
 |   |---Project[bag][2] - 1-123
 |   |
 |   |---Project[bag][1] - 1-124
 |   |
 |   Project[bytearray][0] - 1-133
 |
 |---Pre Combiner Local Rearrange[tuple]{Unknown} - 1-141
 |
 |---Load(hdfs://wilbur11.labs.corp.sp1.yahoo.com/user/tejas/students.txt:org.apache.pig.builtin.PigStorage) - 1-111
 Combine Plan
 Local Rearrange[tuple]{bytearray}(false) - 1-143
 |   |
 |   Project[bytearray][1] - 1-144
 |
 |---New For Each(false,false)[bag] - 1-132
 |   |
 |   POUserFunc(org.apache.pig.builtin.COUNT$Intermediate)[tuple] - 1-130
 |   |
 |   |---Project[bag][0] - 1-135
 |   |
 |   Project[bytearray][1] - 1-134
 |
 |---POCombinerPackage[tuple]{bytearray} - 1-137
 Reduce Plan
 Store(fakefile:org.apache.pig.builtin.PigStorage) - 1-121
 |
 |---New For Each(false)[bag] - 1-120
 |   |
 |   POUserFunc(org.apache.pig.builtin.COUNT$Final)[long] - 1-119
 |   |
 |   |---Project[bag][0] - 1-136
 |
 |---POCombinerPackage[tuple]{bytearray} - 1-145
 Global sort: false
 {code}
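Why the plan above gives wrong results: Distinct runs only in the map, before COUNT$Initial, and the combine and reduce stages just add up partial counts. The toy Java model below (hypothetical class, made-up values) shows the over-count when the same value reaches one group from two map tasks.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of COUNT(Distinct(x)) for one group whose values come from several map tasks. */
final class DistinctCountDemo {
    private DistinctCountDemo() {}

    /** Correct: gather all values for the group, take distinct, then count. */
    static long correct(List<List<String>> mapOutputs) {
        Set<String> all = new HashSet<>();
        for (List<String> part : mapOutputs) {
            all.addAll(part);
        }
        return all.size();
    }

    /** Plan in this bug: Distinct + COUNT$Initial per map task, partial counts summed later. */
    static long asPlanned(List<List<String>> mapOutputs) {
        long total = 0;
        for (List<String> part : mapOutputs) {
            total += new HashSet<>(part).size();
        }
        return total;
    }
}
```

With map outputs `[a, a, b]` and `[b, c]`, `correct` returns 3 while `asPlanned` returns 4, because `b` is counted once in each map task.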




What's Zebra?

2010-02-04 Thread jian yi
What's the architecture of Zebra? Does it depend on Pig and HDFS? Please help.
I hope it is something like the following:

 DFS is better for unstructured data, but DTS (not Bigtable) is better for
structured data. A data warehouse is structured, so I think a table is better than
a file. DTS works as follows:

 1. Break a logical big table into many small physical tables
 2. Blocks do not need to be the same size
 3. Blocks do not need to be ordered
 4. Only structured data is stored
 5. Block indexes are supported
 6. Deleting and updating are supported
 7. The interface is SQL, but only within a single block
 8. Splitting a table horizontally and vertically at the same time is supported
 9. ...


Re: What's Zebra?

2010-02-04 Thread Jeff Zhang
You can refer to this page:
http://wiki.apache.org/pig/zebra







-- 
Best Regards

Jeff Zhang


[jira] Commented: (PIG-1046) join algorithm specification is within double quotes

2010-02-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12829962#action_12829962
 ] 

Hadoop QA commented on PIG-1046:


+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12434884/pig-1046_4.patch
  against trunk revision 906657.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/201/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/201/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/201/console

This message is automatically generated.




[jira] Updated: (PIG-259) allow store to overwrite existing directory

2010-02-04 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated PIG-259:
---

Status: Open  (was: Patch Available)

 allow store to overwrite existing directory
 ---

 Key: PIG-259
 URL: https://issues.apache.org/jira/browse/PIG-259
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Olga Natkovich
Assignee: Jeff Zhang
 Fix For: 0.8.0

 Attachments: Pig_259.patch, Pig_259_2.patch


 We have users who are asking for a flag to overwrite an existing directory.




[jira] Updated: (PIG-259) allow store to overwrite existing directory

2010-02-04 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated PIG-259:
---

Attachment: Pig_259_2.patch




[jira] Commented: (PIG-259) allow store to overwrite existing directory

2010-02-04 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12829981#action_12829981
 ] 

Jeff Zhang commented on PIG-259:


In response to Alan's comments:

1. I put the logic for deleting the output file in JobControlCompiler, which makes 
it easy to delay the deletion until the dependent job is done.

2. I prefer using a keyword rather than a string, because with a string the 
following statement: {code} store a into 'output' 'overwrite'; {code} has two 
consecutive strings, which looks a little weird in my opinion.

3. I think the semantics of overwrite are the same as in a file system: when we 
overwrite a file using the Java API, it won't complain even if the file does not 
exist.
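A minimal sketch of the semantics in point 3, using java.nio on a local file system rather than HDFS: the output directory is deleted recursively if it exists, and a missing directory is not an error. OverwriteOutput is a hypothetical name, and the sketch does not model when JobControlCompiler performs the deletion (point 1).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

/** Hypothetical helper: "overwrite" = remove existing output, silently accept absence. */
final class OverwriteOutput {
    private OverwriteOutput() {}

    /** Returns true if an existing output was deleted, false if there was nothing to delete. */
    static boolean prepare(Path output) {
        if (!Files.exists(output)) {
            return false;                         // nothing to overwrite: not an error
        }
        try (Stream<Path> walk = Files.walk(output)) {
            // Reverse path order visits children before their parent directories.
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.deleteIfExists(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return true;
    }
}
```

Calling `prepare` twice on the same path deletes the directory the first time and simply returns false the second time, matching the "no complaint if absent" behavior described above.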






[jira] Commented: (PIG-259) allow store to overwrite existing directory

2010-02-04 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12829982#action_12829982
 ] 

Jeff Zhang commented on PIG-259:


Alan,

Should I create a new subtask under PIG-966, or is there a way to move this 
task under PIG-966?
