[jira] Commented: (PIG-1009) FINDBUGS: OS_OPEN_STREAM: Method may fail to close stream

2009-10-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12765933#action_12765933
 ] 

Hadoop QA commented on PIG-1009:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12422166/PIG-1009.patch
  against trunk revision 825375.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no tests are needed for this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/81/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/81/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/81/console

This message is automatically generated.

> FINDBUGS: OS_OPEN_STREAM: Method may fail to close stream
> -
>
> Key: PIG-1009
> URL: https://issues.apache.org/jira/browse/PIG-1009
> Project: Pig
> Issue Type: Bug
> Reporter: Olga Natkovich
> Attachments: PIG-1009.patch
>
>
> OS  org.apache.pig.impl.io.FileLocalizer.parseCygPath(String, int) may fail to close stream
> OS  org.apache.pig.impl.logicalLayer.parser.QueryParser.which(String) may fail to close stream
> OS  org.apache.pig.impl.util.PropertiesUtil.loadPropertiesFromFile(Properties) may fail to close stream
> OS  org.apache.pig.Main.configureLog4J(Properties, PigContext) may fail to close stream
> OS  org.apache.pig.tools.parameters.PreprocessorContext.executeShellCommand(String) may fail to close stream
> OS  org.apache.pig.tools.parameters.PreprocessorContext.executeShellCommand(String) may fail to close stream
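
For readers unfamiliar with this warning category: FindBugs raises OS_OPEN_STREAM when a method opens a stream and some exit path, typically an exception, leaves the method without closing it. The sketch below shows one common remedy. It is not taken from PIG-1009.patch; the class and method names (StreamCloseSketch, loadProperties) are hypothetical, and it assumes Java 7 or later for try-with-resources. On the JVMs current in 2009 the same guarantee would have required closing the stream in a finally block.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical illustration; not the code from PIG-1009.patch.
final class StreamCloseSketch {

    // Loads properties from a file. The try-with-resources statement closes
    // the stream on every exit path, including when Properties.load throws.
    static Properties loadProperties(Path file) throws IOException {
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(file)) {
            props.load(in);
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("pig-sketch", ".properties");
        try {
            Files.write(tmp, "log4j.rootLogger=INFO\n".getBytes(StandardCharsets.UTF_8));
            System.out.println(loadProperties(tmp).getProperty("log4j.rootLogger"));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

The point of the pattern is that the close no longer depends on the method body completing normally, which is the condition the warning is about.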

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1008) FINDBUGS: NP_TOSTRING_COULD_RETURN_NULL

2009-10-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12765901#action_12765901
 ] 

Hadoop QA commented on PIG-1008:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12422155/PIG-1008.patch
  against trunk revision 825308.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no tests are needed for this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/27/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/27/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/27/console

This message is automatically generated.

> FINDBUGS: NP_TOSTRING_COULD_RETURN_NULL
> ---
>
> Key: PIG-1008
> URL: https://issues.apache.org/jira/browse/PIG-1008
> Project: Pig
> Issue Type: Bug
> Reporter: Olga Natkovich
> Attachments: PIG-1008.patch
>
>
> NP  org.apache.pig.data.DataByteArray.toString() may return null
> NP  org.apache.pig.impl.streaming.StreamingCommand$HandleSpec.equals(Object) does not check for null argument
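
The two warnings above describe general Java contract problems: toString() should not return null, and equals(Object) should return false for a null argument instead of dereferencing it. A minimal sketch of both fixes follows. ByteHolder is a hypothetical stand-in class, not Pig's DataByteArray or HandleSpec, and the choice of an empty string for missing data is an assumption, not necessarily what PIG-1008.patch does.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in class; not Pig's DataByteArray or HandleSpec.
final class ByteHolder {
    private final byte[] data; // may be null

    ByteHolder(byte[] data) {
        this.data = data;
    }

    // Returns an empty string rather than null when there is no data.
    @Override
    public String toString() {
        return data == null ? "" : new String(data, StandardCharsets.UTF_8);
    }

    // Rejects null and foreign types before touching the argument's fields.
    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof ByteHolder)) {
            return false; // instanceof is false for null, so null is covered
        }
        return Arrays.equals(data, ((ByteHolder) other).data);
    }

    // Kept consistent with equals.
    @Override
    public int hashCode() {
        return Arrays.hashCode(data);
    }

    public static void main(String[] args) {
        ByteHolder empty = new ByteHolder(null);
        System.out.println(empty.toString().isEmpty());
        System.out.println(empty.equals(null));
    }
}
```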




[jira] Commented: (PIG-1018) FINDBUGS: NM_FIELD_NAMING_CONVENTION: Field names should start with a lower case letter

2009-10-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12765896#action_12765896
 ] 

Hadoop QA commented on PIG-1018:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12422153/PIG-1018.patch
  against trunk revision 825308.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no tests are needed for this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

-1 release audit.  The applied patch generated 315 release audit warnings 
(more than the trunk's current 309 warnings).

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/80/testReport/
Release audit warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/80/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/80/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/80/console

This message is automatically generated.

> FINDBUGS: NM_FIELD_NAMING_CONVENTION: Field names should start with a lower 
> case letter
> ---
>
> Key: PIG-1018
> URL: https://issues.apache.org/jira/browse/PIG-1018
> Project: Pig
> Issue Type: Bug
> Reporter: Olga Natkovich
> Attachments: PIG-1018.patch
>
>
> Nm  The field name org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogToPhyTranslationVisitor.LogToPhyMap doesn't start with a lower case letter
> Nm  The method name org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.CreateTuple(Object[]) doesn't start with a lower case letter
> Nm  The class name org.apache.pig.backend.hadoop.executionengine.physicalLayer.util.operatorHelper doesn't start with an upper case letter
> Nm  Class org.apache.pig.impl.util.WrappedIOException is not derived from an Exception, even though it is named as such
> Nm  The method name org.apache.pig.pen.EquivalenceClasses.GetEquivalenceClasses(LogicalOperator, Map) doesn't start with a lower case letter
> Nm  The field name org.apache.pig.pen.util.DisplayExamples.Result doesn't start with a lower case letter
> Nm  The method name org.apache.pig.pen.util.DisplayExamples.PrintSimple(LogicalOperator, Map) doesn't start with a lower case letter
> Nm  The method name org.apache.pig.pen.util.DisplayExamples.PrintTabular(LogicalPlan, Map) doesn't start with a lower case letter
> Nm  The method name org.apache.pig.tools.parameters.TokenMgrError.LexicalError(boolean, int, int, int, String, char) doesn't start with a lower case letter




[jira] Updated: (PIG-921) Strange use case for Join which produces different results in local and map reduce mode

2009-10-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-921:
---

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Patch committed.

> Strange use case for Join which produces different results in local and map 
> reduce mode
> ---
>
> Key: PIG-921
> URL: https://issues.apache.org/jira/browse/PIG-921
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.4.0
> Environment: Hadoop 18 and Hadoop 20
> Reporter: Viraj Bhat
> Assignee: Daniel Dai
> Fix For: 0.6.0
>
> Attachments: A.txt, B.txt, joinusecase.pig, PIG-921-1.patch
>
>
> I have a script of the following form, which loads from two files, A.txt and B.txt:
> {code}
> A = LOAD 'A.txt' as (a:tuple(a1:int, a2:chararray));
> B = LOAD 'B.txt' as (b:tuple(b1:int, b2:chararray));
> C = JOIN A by a.a1, B by b.b1;
> DESCRIBE C;
> DUMP C;
> {code}
> A.txt contains the following lines:
> {code}
> (1,a)
> (2,aa)
> {code}
> B.txt contains the following lines:
> {code}
> (1,b)
> (2,bb)
> {code}
> Running the above script in local and map reduce mode on Hadoop 18 and 
> Hadoop 20 produces the following:
> Hadoop 18
> =
> (1,1)
> (2,2)
> =
> Hadoop 20
> =
> (1,1)
> (2,2)
> =
> Local Mode: Pig with Hadoop 18 jar release 
> =
> 2009-08-13 17:15:13,473 [main] INFO  org.apache.pig.Main - Logging error 
> messages to: /homes/viraj/pig-svn/trunk/pigscripts/pig_1250208913472.log
> 09/08/13 17:15:13 INFO pig.Main: Logging error messages to: 
> /homes/viraj/pig-svn/trunk/pigscripts/pig_1250208913472.log
> C: {a: (a1: int,a2: chararray),b: (b1: int,b2: chararray)}
> 2009-08-13 17:15:13,932 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 1002: Unable to store alias C
> 09/08/13 17:15:13 ERROR grunt.Grunt: ERROR 1002: Unable to store alias C
> Details at logfile: 
> /homes/viraj/pig-svn/trunk/pigscripts/pig_1250208913472.log
> =
> Caused by: java.lang.NullPointerException
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POPackage.getNext(POPackage.java:206)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:191)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
>     at org.apache.pig.backend.local.executionengine.physicalLayer.counters.POCounter.getNext(POCounter.java:71)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getNext(POStore.java:117)
>     at org.apache.pig.backend.local.executionengine.LocalPigLauncher.runPipeline(LocalPigLauncher.java:146)
>     at org.apache.pig.backend.local.executionengine.LocalPigLauncher.launchPig(LocalPigLauncher.java:109)
>     at org.apache.pig.backend.local.executionengine.LocalExecutionEngine.execute(LocalExecutionEngine.java:165)
>     ... 9 more
> =
> Local Mode: Pig with Hadoop 20 jar release
> =
> ((1,a),(1,b))
> ((2,aa),(2,bb))
> =
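
To make the discrepancy concrete: the DESCRIBE output in the report gives C the schema {a: (a1, a2), b: (b1, b2)}, so each joined row should carry both full tuples, as in the Hadoop 20 local-mode output, and not only the join keys. The sketch below reproduces that expected result for the two sample files. It is a plain nested-loop join written for illustration, assuming the Hadoop 20 local-mode output is the intended one; NestedJoinSketch is a hypothetical name and this is not how Pig implements JOIN.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the join semantics; not Pig's implementation.
final class NestedJoinSketch {

    // Inner-joins two relations of (int, String) tuples on the first field
    // and renders each match in Pig's tuple notation.
    static List<String> join(List<Object[]> a, List<Object[]> b) {
        List<String> out = new ArrayList<>();
        for (Object[] left : a) {
            for (Object[] right : b) {
                if (left[0].equals(right[0])) {
                    out.add("((" + left[0] + "," + left[1] + "),("
                            + right[0] + "," + right[1] + "))");
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Contents of A.txt and B.txt from the report.
        List<Object[]> a = Arrays.asList(new Object[]{1, "a"}, new Object[]{2, "aa"});
        List<Object[]> b = Arrays.asList(new Object[]{1, "b"}, new Object[]{2, "bb"});
        join(a, b).forEach(System.out::println);
    }
}
```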




[jira] Commented: (PIG-921) Strange use case for Join which produces different results in local and map reduce mode

2009-10-14 Thread Pradeep Kamath (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12765864#action_12765864
 ] 

Pradeep Kamath commented on PIG-921:


+1. Minor comment: we can probably remove preds==null || preds.get(0)==null 
from the if(), since the project should always have a predecessor; if it does 
not, execution would fail somewhere else.




[jira] Commented: (PIG-928) UDFs in scripting languages

2009-10-14 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12765860#action_12765860
 ] 

Alan Gates commented on PIG-928:


Questions that we need to answer to get this patch ready for commit:

1) How do we do type conversion?  The current patch assumes a single string 
input and output.  We'll want to be able to do conversions from scripting 
languages to pig types that make sense.  How this can be done is tied up with 
#2 below.

2) Do we do this using the Bean Scripting Framework or with specific bindings 
for each language?  This patch shows how to do the specific bindings for 
Groovy.  It can be done for Jython, and I'm reasonably sure it can be done for 
JRuby.  The obvious advantage of using the BSF is we get all the languages they 
support for free.  We need to understand the performance costs of each choice.  
We should be able to use the existing patch to test the difference between 
using the BSF and direct Groovy bindings.  Also, it seems like type conversions 
will be much easier to do if we use specific bindings, as we can do explicit 
type mappings for each language.  Perhaps this is possible with BSF, but I'm 
not sure how.

3) Grammar for how to declare these.  I propose that we allow two options:
inlined in define and file referenced in define.  So these would roughly look 
like:

define myudf ScriptUDF('groovy', 'return input.get(0).split();');
define myudf ScriptUDF('python', 'myudf.py');

We could also support inlining in the Pig Latin itself, something like:

B = foreach A generate {'groovy', 'return input.get(0).split();'};

I'm not a fan of this type of inlining, as I think it makes the code hard to 
read.
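
Questions 1 and 2 above both turn on how a value returned by a script gets mapped to a Pig type. As a rough illustration of what an explicit mapping could look like, the sketch below assumes the script's result reaches Pig as an ordinary Java object. ScriptTypeMapper and pigTypeOf are hypothetical names, not part of the attached patch or of Pig, and treating an ordered list as a tuple is only one possible convention.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical helper; not part of the PIG-928 patch or of Pig itself.
final class ScriptTypeMapper {

    // Maps a Java value returned by a script to the name of a Pig type.
    static String pigTypeOf(Object value) {
        Objects.requireNonNull(value, "value");
        if (value instanceof Integer) return "int";
        if (value instanceof Long) return "long";
        if (value instanceof Float) return "float";
        if (value instanceof Double) return "double";
        if (value instanceof CharSequence) return "chararray";
        if (value instanceof byte[]) return "bytearray";
        if (value instanceof Map) return "map";
        if (value instanceof List) return "tuple"; // assumption: ordered list -> tuple
        throw new IllegalArgumentException(
                "No Pig type mapping for " + value.getClass().getName());
    }

    public static void main(String[] args) {
        System.out.println(pigTypeOf(42));
        System.out.println(pigTypeOf("a b c"));
        System.out.println(pigTypeOf(Arrays.asList("a", "b", "c")));
    }
}
```

With per-language bindings, a table like this could be specialised for each language's native types; whether the same can be done behind the BSF is the open question raised in point 2.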


> UDFs in scripting languages
> ---
>
> Key: PIG-928
> URL: https://issues.apache.org/jira/browse/PIG-928
> Project: Pig
> Issue Type: New Feature
> Reporter: Alan Gates
> Attachments: package.zip
>
>
> It should be possible to write UDFs in scripting languages such as python, 
> ruby, etc.  This frees users from needing to compile Java, generate a jar, 
> etc.  It also opens Pig to programmers who prefer scripting languages over 
> Java.




[jira] Commented: (PIG-1024) Script contains nested limit fail due to "LOLimit does not support multiple outputs"

2009-10-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12765853#action_12765853
 ] 

Hadoop QA commented on PIG-1024:


+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12422154/PIG-1024-1.patch
  against trunk revision 825308.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/26/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/26/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/26/console

This message is automatically generated.

> Script contains nested limit fail due to "LOLimit does not support multiple 
> outputs"
> 
>
> Key: PIG-1024
> URL: https://issues.apache.org/jira/browse/PIG-1024
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.4.0
> Reporter: Daniel Dai
> Assignee: Daniel Dai
> Fix For: 0.6.0
>
> Attachments: PIG-1024-1.patch
>
>
> The following script fails:
> a = load '1.txt' as (a0:int, a1:int, a2:int);
> b = group a by a0;
> c = foreach b { c1 = limit a 10;
> c2 = (c1.a0/c1.a1);
> c3 = (c1.a0/c1.a2);
> generate c2, c3;}
> Error message:
> ERROR org.apache.pig.impl.plan.OperatorPlan - Attempt to give operator of type
> org.apache.pig.impl.logicalLayer.LOLimit multiple outputs.  This operator 
> does not support multiple outputs.
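
Reading the error message alongside the script: c1 (the nested limit) is consumed by both c2 and c3, so the plan would need to give the limit operator two successors. The sketch below is a simplified model of that kind of check, written only to illustrate the failure mode. PlanSketch and its methods are hypothetical; this is not Pig's OperatorPlan code, and it says nothing about how PIG-1024-1.patch resolves the problem.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of a plan with single-output operators;
// not Pig's OperatorPlan.
final class PlanSketch {
    private final Map<String, List<String>> successors = new HashMap<>();
    private final Set<String> singleOutputOps = new HashSet<>();

    void addOperator(String name, boolean supportsMultipleOutputs) {
        successors.put(name, new ArrayList<>());
        if (!supportsMultipleOutputs) {
            singleOutputOps.add(name);
        }
    }

    // Adds an edge from -> to, refusing a second edge out of an
    // operator that was registered as single-output.
    void connect(String from, String to) {
        List<String> outs = successors.get(from);
        if (outs == null) {
            throw new IllegalArgumentException("Unknown operator: " + from);
        }
        if (singleOutputOps.contains(from) && !outs.isEmpty()) {
            throw new IllegalStateException(
                    "Attempt to give operator " + from + " multiple outputs");
        }
        outs.add(to);
    }

    public static void main(String[] args) {
        PlanSketch plan = new PlanSketch();
        plan.addOperator("c1", false); // the nested limit
        plan.addOperator("c2", true);
        plan.addOperator("c3", true);
        plan.connect("c1", "c2");
        try {
            plan.connect("c1", "c3"); // second consumer of the limit
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```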




[jira] Updated: (PIG-1009) FINDBUGS: OS_OPEN_STREAM: Method may fail to close stream

2009-10-14 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1009:


Attachment: PIG-1009.patch




[jira] Updated: (PIG-1009) FINDBUGS: OS_OPEN_STREAM: Method may fail to close stream

2009-10-14 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1009:


Status: Patch Available  (was: Open)




[jira] Updated: (PIG-1008) FINDBUGS: NP_TOSTRING_COULD_RETURN_NULL

2009-10-14 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1008:


Attachment: PIG-1008.patch




[jira] Updated: (PIG-1008) FINDBUGS: NP_TOSTRING_COULD_RETURN_NULL

2009-10-14 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1008:


Status: Patch Available  (was: Open)




[jira] Updated: (PIG-1024) Script contains nested limit fail due to "LOLimit does not support multiple outputs"

2009-10-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1024:


Status: Patch Available  (was: Open)




[jira] Assigned: (PIG-1024) Script contains nested limit fail due to "LOLimit does not support multiple outputs"

2009-10-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai reassigned PIG-1024:
---

Assignee: Daniel Dai




[jira] Updated: (PIG-1024) Script contains nested limit fail due to "LOLimit does not support multiple outputs"

2009-10-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1024:


Attachment: PIG-1024-1.patch

Patch included. Thanks to Pradeep for the diagnosis.

> Script contains nested limit fail due to "LOLimit does not support multiple 
> outputs"
> 
>
> Key: PIG-1024
> URL: https://issues.apache.org/jira/browse/PIG-1024
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.4.0
> Reporter: Daniel Dai
> Fix For: 0.6.0
>
> Attachments: PIG-1024-1.patch
>
>
> The following script fails:
> a = load '1.txt' as (a0:int, a1:int, a2:int);
> b = group a by a0;
> c = foreach b { c1 = limit a 10;
> c2 = (c1.a0/c1.a1);
> c3 = (c1.a0/c1.a2);
> generate c2, c3;}
> Error message:
> ERROR org.apache.pig.impl.plan.OperatorPlan - Attempt to give operator of type
> org.apache.pig.impl.logicalLayer.LOLimit multiple outputs.  This operator 
> does not support multiple outputs.




[jira] Created: (PIG-1024) Script contains nested limit fail due to "LOLimit does not support multiple outputs"

2009-10-14 Thread Daniel Dai (JIRA)
Script contains nested limit fail due to "LOLimit does not support multiple 
outputs"


 Key: PIG-1024
 URL: https://issues.apache.org/jira/browse/PIG-1024
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.4.0
Reporter: Daniel Dai
 Fix For: 0.6.0


The following script fails:

a = load '1.txt' as (a0:int, a1:int, a2:int);
b = group a by a0;
c = foreach b { c1 = limit a 10;
c2 = (c1.a0/c1.a1);
c3 = (c1.a0/c1.a2);
generate c2, c3;}

Error message:

ERROR org.apache.pig.impl.plan.OperatorPlan - Attempt to give operator of type
org.apache.pig.impl.logicalLayer.LOLimit multiple outputs.  This operator does 
not support multiple outputs.




[jira] Updated: (PIG-1018) FINDBUGS: NM_FIELD_NAMING_CONVENTION: Field names should start with a lower case letter

2009-10-14 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1018:


Status: Patch Available  (was: Open)



[jira] Updated: (PIG-1018) FINDBUGS: NM_FIELD_NAMING_CONVENTION: Field names should start with a lower case letter

2009-10-14 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1018:


Attachment: PIG-1018.patch



[jira] Updated: (PIG-1020) Include an ant target to build pig.jar without hadoop libraries

2009-10-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1020:


   Resolution: Fixed
Fix Version/s: 0.5.0
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

No unit test included since it only changes build.xml. Patch committed to both 
trunk and 0.5 branch. 

New target for pig.jar without hadoop libs is "jar-withouthadoop".

> Include an ant target to build pig.jar without hadoop libraries
> ---
>
> Key: PIG-1020
> URL: https://issues.apache.org/jira/browse/PIG-1020
> Project: Pig
>  Issue Type: New Feature
>  Components: build
>Affects Versions: 0.4.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
>Priority: Minor
> Fix For: 0.5.0, 0.6.0
>
> Attachments: PIG-1020-1.patch, PIG-1020-2.patch
>
>
> Provide an ant target to build pig.jar without all hadoop related libraries. 
> User will provide external hadoop jars in classpath before invoking pig.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1014) Pig should convert COUNT(relation) to COUNT_STAR(relation) so that all records are counted without considering nullness of the fields in the records

2009-10-14 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12765779#action_12765779
 ] 

Santhosh Srinivasan commented on PIG-1014:
--

Another option is to change the implementation of COUNT to reflect the proposed 
semantics. If the underlying UDF is changed then the user should be notified 
via an information message. If the user checks the explain output then (s)he 
will notice COUNT_STAR and will be confused.
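
To make the difference under discussion concrete, here is a minimal Java sketch. 
It is not Pig's implementation: the class and method names are hypothetical, and 
it assumes that COUNT skips records whose first field is null while COUNT_STAR 
counts every record.

```java
import java.util.Arrays;
import java.util.List;

class CountSemanticsSketch {
    // Hypothetical stand-in for COUNT: assumes records with a null first field are skipped.
    static long count(List<Object[]> records) {
        long n = 0;
        for (Object[] record : records) {
            if (record.length > 0 && record[0] != null) {
                n++;
            }
        }
        return n;
    }

    // Hypothetical stand-in for COUNT_STAR: every record is counted, nulls or not.
    static long countStar(List<Object[]> records) {
        return records.size();
    }

    // Three sample records; the second has a null first field.
    static List<Object[]> sample() {
        return Arrays.asList(
            new Object[] {"alice", 1},
            new Object[] {null, 2},
            new Object[] {"carol", null});
    }

    public static void main(String[] args) {
        List<Object[]> records = sample();
        System.out.println("COUNT      = " + count(records));
        System.out.println("COUNT_STAR = " + countStar(records));
    }
}
```

With the sample data the two functions disagree by one record, which is the 
kind of discrepancy a user would see between COUNT and COUNT_STAR.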

> Pig should convert COUNT(relation) to COUNT_STAR(relation) so that all 
> records are counted without considering nullness of the fields in the records
> 
>
> Key: PIG-1014
> URL: https://issues.apache.org/jira/browse/PIG-1014
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Pradeep Kamath
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1000) InternalCachedBag.java generates javac warning and findbug warning

2009-10-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1000:


  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

This patch addresses javac and findbugs warnings, so no unit test is needed. 
Patch committed. Thanks Ying!

> InternalCachedBag.java generates javac warning and findbug warning
> --
>
> Key: PIG-1000
> URL: https://issues.apache.org/jira/browse/PIG-1000
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.4.0
>Reporter: Ying He
>Assignee: Ying He
> Fix For: 0.6.0
>
> Attachments: PIG-1000.patch
>
>
> patch submitted by PIG-975 generates javac warning and findbug warning

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (PIG-1023) FINDBUGS: exclude CN_IDIOM_NO_SUPER_CALL

2009-10-14 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-1023.
-

Resolution: Fixed

patch committed

> FINDBUGS: exclude CN_IDIOM_NO_SUPER_CALL
> 
>
> Key: PIG-1023
> URL: https://issues.apache.org/jira/browse/PIG-1023
> Project: Pig
>  Issue Type: Bug
>Reporter: Olga Natkovich
>Assignee: Olga Natkovich
> Attachments: PIG-1023.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-944) Zebra schema is taken from Pig through TableStorer's construct

2009-10-14 Thread Yan Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12765767#action_12765767
 ] 

Yan Zhou commented on PIG-944:
--

There was a typo in my earlier comment of 02/Oct/09 10:33 PM. Instead of 

This patch must be applied after the patch for Jira PIG-933 has been applied. 

it should have read

This patch must be applied after the patch for Jira PIG-993 has been applied. 

> Zebra schema is taken from Pig through TableStorer's construct
> --
>
> Key: PIG-944
> URL: https://issues.apache.org/jira/browse/PIG-944
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Yan Zhou
>Assignee: Yan Zhou
> Fix For: 0.6.0
>
> Attachments: SchemaConversion.patch, SchemaConversion.patch
>
>
> It should be from StoreConfig in TableOutputFormat.checkOutputSpecs method 
> because the information is dynamic in Pig's execution engine and should not 
> be taking a static argument to the constructor.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1023) FINDBUGS: exclude CN_IDIOM_NO_SUPER_CALL

2009-10-14 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12765751#action_12765751
 ] 

Daniel Dai commented on PIG-1023:
-

+1. Target findbugs warnings suppressed. Findbugs generates 37 fewer warnings.

> FINDBUGS: exclude CN_IDIOM_NO_SUPER_CALL
> 
>
> Key: PIG-1023
> URL: https://issues.apache.org/jira/browse/PIG-1023
> Project: Pig
>  Issue Type: Bug
>Reporter: Olga Natkovich
>Assignee: Olga Natkovich
> Attachments: PIG-1023.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1000) InternalCachedBag.java generates javac warning and findbug warning

2009-10-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12765744#action_12765744
 ] 

Hadoop QA commented on PIG-1000:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12421679/PIG-1000.patch
  against trunk revision 824980.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no tests are needed for this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/79/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/79/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/79/console

This message is automatically generated.

> InternalCachedBag.java generates javac warning and findbug warning
> --
>
> Key: PIG-1000
> URL: https://issues.apache.org/jira/browse/PIG-1000
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.4.0
>Reporter: Ying He
>Assignee: Ying He
> Fix For: 0.6.0
>
> Attachments: PIG-1000.patch
>
>
> patch submitted by PIG-975 generates javac warning and findbug warning

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-858) Order By followed by "replicated" join fails while compiling MR-plan from physical plan

2009-10-14 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12765735#action_12765735
 ] 

Ashutosh Chauhan commented on PIG-858:
--

It's been a while since I did that patch, so here is a bit more clarification. 
We are interested in finding the PO that corresponds to the "fragment" PO input 
of POFRJoin. This PO is already compiled and is in one of the MROpers. Earlier, 
we would iterate through the compiledInputs array, trying to match this PO with 
the PO contained in each MROperator. This fails, as discussed in previous 
comments. With this change, since we keep track of the MR operator for each 
physical operator, the compiler no longer needs to do that; it can simply look 
up the MROper corresponding to the "fragment" PO in phyToMROpMap.
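
The contrast between scanning compiledInputs and looking up phyToMROpMap can be 
sketched as follows. This is only an illustration: PhysOp and MROp are 
hypothetical stand-ins for Pig's operator classes, and the scenario (an MR 
operator whose leaf is no longer the fragment operator) is an assumed example 
of how a scan can come up empty, not a reproduction of the actual bug.

```java
import java.util.HashMap;
import java.util.Map;

class FragmentLookupSketch {
    // Hypothetical stand-in for a physical operator, identified by an operator key.
    static final class PhysOp {
        final String key;
        PhysOp(String key) { this.key = key; }
    }

    // Hypothetical stand-in for an MR operator; only its current leaf is modelled.
    static final class MROp {
        final String name;
        final PhysOp leaf;
        MROp(String name, PhysOp leaf) { this.name = name; this.leaf = leaf; }
    }

    // Scan style: compare each compiled input's leaf with the fragment.
    // Returns -1 when nothing matches; using that as an array index would throw.
    static int scanForFragment(MROp[] compiledInputs, PhysOp fragment) {
        for (int i = 0; i < compiledInputs.length; i++) {
            if (compiledInputs[i].leaf.key.equals(fragment.key)) {
                return i;
            }
        }
        return -1;
    }

    // Map style: every physical operator was recorded with its MR operator when it
    // was compiled, so the lookup does not depend on what the leaf currently is.
    static MROp lookupFragment(Map<PhysOp, MROp> phyToMROpMap, PhysOp fragment) {
        return phyToMROpMap.get(fragment);
    }

    public static void main(String[] args) {
        PhysOp load = new PhysOp("load-A");
        PhysOp sort = new PhysOp("sort-B");
        MROp job = new MROp("job-1", sort); // load-A was compiled here, but the leaf is now sort-B

        Map<PhysOp, MROp> phyToMROpMap = new HashMap<>();
        phyToMROpMap.put(load, job);
        phyToMROpMap.put(sort, job);

        System.out.println("scan index: " + scanForFragment(new MROp[] {job}, load));
        System.out.println("map lookup: " + lookupFragment(phyToMROpMap, load).name);
    }
}
```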

> Order By followed by "replicated" join fails while compiling MR-plan from 
> physical plan
> ---
>
> Key: PIG-858
> URL: https://issues.apache.org/jira/browse/PIG-858
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.4.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Fix For: 0.6.0
>
> Attachments: pig-858.patch
>
>
> Consider the query:
> {code}
> A = load 'a';
> B = order A by $0;
> C = join A by $0, B by $0;
> explain C;
> {code}
> works. But if replicated join is used instead
> {code}
> A = load 'a';
> B = order A by $0;
> C = join A by $0, B by $0 using "replicated";
> explain C;
> {code}
> this fails with ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2034: Error 
> compiling operator POFRJoin
> relevant stacktrace:
> {code}
> Caused by: java.lang.RuntimeException: 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompilerException:
>  ERROR 2034: Error compiling operator POFRJoin
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.explain(HExecutionEngine.java:306)
> at org.apache.pig.PigServer.explain(PigServer.java:574)
> ... 8 more
> Caused by: 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompilerException:
>  ERROR 2034: Error compiling operator POFRJoin
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.visitFRJoin(MRCompiler.java:942)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFRJoin.visit(POFRJoin.java:173)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:342)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:327)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:233)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.compile(MapReduceLauncher.java:301)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.explain(MapReduceLauncher.java:278)
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.explain(HExecutionEngine.java:303)
> ... 9 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.visitFRJoin(MRCompiler.java:901)
> ... 16 more
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1020) Include an ant target to build pig.jar without hadoop libraries

2009-10-14 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12765734#action_12765734
 ] 

Olga Natkovich commented on PIG-1020:
-

+1, please, commit to trunk and 0.5.0 branch

> Include an ant target to build pig.jar without hadoop libraries
> ---
>
> Key: PIG-1020
> URL: https://issues.apache.org/jira/browse/PIG-1020
> Project: Pig
>  Issue Type: New Feature
>  Components: build
>Affects Versions: 0.4.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
>Priority: Minor
> Fix For: 0.6.0
>
> Attachments: PIG-1020-1.patch, PIG-1020-2.patch
>
>
> Provide an ant target to build pig.jar without all hadoop related libraries. 
> User will provide external hadoop jars in classpath before invoking pig.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-858) Order By followed by "replicated" join fails while compiling MR-plan from physical plan

2009-10-14 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12765720#action_12765720
 ] 

Ashutosh Chauhan commented on PIG-858:
--

visitUnion has the same changes as the other visit functions; that is, it adds 
the MR operator corresponding to POUnion to the phyToMROpMap map. The real 
changes are in visitFRJoin. Earlier, visitFRJoin looked through the 
compiledInputs array of MROpers one by one, trying to match each MROper's leaf 
PO with POFRJoin using the operator key. Now it doesn't need to do that; it can 
simply look it up in phyToMROpMap.

> Order By followed by "replicated" join fails while compiling MR-plan from 
> physical plan
> ---
>
> Key: PIG-858
> URL: https://issues.apache.org/jira/browse/PIG-858
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.4.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Fix For: 0.6.0
>
> Attachments: pig-858.patch
>
>
> Consider the query:
> {code}
> A = load 'a';
> B = order A by $0;
> C = join A by $0, B by $0;
> explain C;
> {code}
> works. But if replicated join is used instead
> {code}
> A = load 'a';
> B = order A by $0;
> C = join A by $0, B by $0 using "replicated";
> explain C;
> {code}
> this fails with ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2034: Error 
> compiling operator POFRJoin
> relevant stacktrace:
> {code}
> Caused by: java.lang.RuntimeException: 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompilerException:
>  ERROR 2034: Error compiling operator POFRJoin
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.explain(HExecutionEngine.java:306)
> at org.apache.pig.PigServer.explain(PigServer.java:574)
> ... 8 more
> Caused by: 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompilerException:
>  ERROR 2034: Error compiling operator POFRJoin
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.visitFRJoin(MRCompiler.java:942)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFRJoin.visit(POFRJoin.java:173)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:342)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:327)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:233)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.compile(MapReduceLauncher.java:301)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.explain(MapReduceLauncher.java:278)
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.explain(HExecutionEngine.java:303)
> ... 9 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.visitFRJoin(MRCompiler.java:901)
> ... 16 more
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1023) FINDBUGS: exclude CN_IDIOM_NO_SUPER_CALL

2009-10-14 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1023:


Attachment: PIG-1023.patch

> FINDBUGS: exclude CN_IDIOM_NO_SUPER_CALL
> 
>
> Key: PIG-1023
> URL: https://issues.apache.org/jira/browse/PIG-1023
> Project: Pig
>  Issue Type: Bug
>Reporter: Olga Natkovich
>Assignee: Olga Natkovich
> Attachments: PIG-1023.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1023) FINDBUGS: exclude CN_IDIOM_NO_SUPER_CALL

2009-10-14 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12765718#action_12765718
 ] 

Olga Natkovich commented on PIG-1023:
-

This does not have to go through the patch test process. Could one of the 
committers please review?

> FINDBUGS: exclude CN_IDIOM_NO_SUPER_CALL
> 
>
> Key: PIG-1023
> URL: https://issues.apache.org/jira/browse/PIG-1023
> Project: Pig
>  Issue Type: Bug
>Reporter: Olga Natkovich
>Assignee: Olga Natkovich
> Attachments: PIG-1023.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-1023) FINDBUGS: exclude CN_IDIOM_NO_SUPER_CALL

2009-10-14 Thread Olga Natkovich (JIRA)
FINDBUGS: exclude CN_IDIOM_NO_SUPER_CALL


 Key: PIG-1023
 URL: https://issues.apache.org/jira/browse/PIG-1023
 Project: Pig
  Issue Type: Bug
Reporter: Olga Natkovich
Assignee: Olga Natkovich




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1005) FINDBUGS: CN_IDIOM_NO_SUPER_CALL in plans

2009-10-14 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12765714#action_12765714
 ] 

Olga Natkovich commented on PIG-1005:
-

Adding to exclude list for now

>  FINDBUGS: CN_IDIOM_NO_SUPER_CALL in  plans
> ---
>
> Key: PIG-1005
> URL: https://issues.apache.org/jira/browse/PIG-1005
> Project: Pig
>  Issue Type: Bug
>Reporter: Olga Natkovich
>
> CN   org.apache.pig.impl.logicalLayer.LogicalPlan.clone() does not call 
> super.clone()
> CN   
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.plans.PhysicalPlan.clone()
>  does not call super.clone()

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1004) FINDBUGS: CN_IDIOM_NO_SUPER_CALL in org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators

2009-10-14 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12765712#action_12765712
 ] 

Olga Natkovich commented on PIG-1004:
-

Added to exclude file for now

> FINDBUGS: CN_IDIOM_NO_SUPER_CALL in   
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators
> 
>
> Key: PIG-1004
> URL: https://issues.apache.org/jira/browse/PIG-1004
> Project: Pig
>  Issue Type: Bug
>Reporter: Olga Natkovich
>
> Will address this during next cleanup:
> CN
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PODistinct.clone()
>  does not call super.clone()
> CN
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.clone()
>  does not call super.clone()
> CN
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLimit.clone()
>  does not call super.clone()
> CN
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.clone()
>  does not call super.clone()
> CN
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrangeForIllustrate.clone()
>  does not call super.clone()
> CN
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POOptimizedForEach.clone()
>  does not call super.clone()
> CN
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSort.clone()
>  does not call super.clone()

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1003) FINDBUGS: CN_IDIOM_NO_SUPER_CALL in org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators

2009-10-14 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12765711#action_12765711
 ] 

Olga Natkovich commented on PIG-1003:
-

Added to exclude file for now

> FINDBUGS: CN_IDIOM_NO_SUPER_CALL in   
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators
> 
>
> Key: PIG-1003
> URL: https://issues.apache.org/jira/browse/PIG-1003
> Project: Pig
>  Issue Type: Bug
>Reporter: Olga Natkovich
>
> All physical expression operators have this issue. In the clone method, they 
> instantiate a new object rather than call super.clone.
> This is a major change and for now I am planning to exclude this warning. We 
> will address it once we work on the frontend rewrite.
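
For readers unfamiliar with the warning, the two clone() styles can be sketched 
as below. The classes are hypothetical and unrelated to Pig's operators; the 
sketch only shows why constructing a new object in clone() behaves differently 
for subclasses than delegating to super.clone().

```java
class CloneIdiomSketch {
    // The flagged pattern (CN_IDIOM_NO_SUPER_CALL): clone() constructs the copy itself.
    static class ManualClone implements Cloneable {
        int value;
        ManualClone(int value) { this.value = value; }

        @Override
        public ManualClone clone() {
            // A subclass that inherits this method gets a ManualClone back, not its own type.
            return new ManualClone(value);
        }
    }

    static class ManualChild extends ManualClone {
        ManualChild(int value) { super(value); }
    }

    // The idiom FindBugs expects: delegate to super.clone(), which preserves the runtime class.
    static class SuperClone implements Cloneable {
        int value;
        SuperClone(int value) { this.value = value; }

        @Override
        public SuperClone clone() {
            try {
                return (SuperClone) super.clone();
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e); // not expected: the class implements Cloneable
            }
        }
    }

    static class SuperChild extends SuperClone {
        SuperChild(int value) { super(value); }
    }

    public static void main(String[] args) {
        System.out.println(new ManualChild(1).clone().getClass().getSimpleName());
        System.out.println(new SuperChild(2).clone().getClass().getSimpleName());
    }
}
```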

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1016) Reading in map data seems broken

2009-10-14 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12765695#action_12765695
 ] 

Santhosh Srinivasan commented on PIG-1016:
--

The fix proposed in this JIRA reverts the changes made as part of PIG-880. Can 
you explain in more detail the issue that you are currently facing? 
Specifically, can you provide a test case that reproduces this bug?

> Reading in map data seems broken
> 
>
> Key: PIG-1016
> URL: https://issues.apache.org/jira/browse/PIG-1016
> Project: Pig
>  Issue Type: Improvement
>  Components: data
>Affects Versions: 0.4.0
>Reporter: hc busy
> Attachments: PIG-1016.patch
>
>
> Hi, I'm trying to load a map that has a tuple for a value. The read fails in 
> 0.4.0 because of a misconfiguration in the parser, whereas almost all 
> documentation states that the value of a map can be any type.
> I've attached a patch that allows us to read in complex objects as values, as 
> documented. I've done simple verification of loading in maps with tuple/map 
> values and writing them back out using LOAD and STORE. All seems to work fine.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1022) optimizer pushes filter before the foreach that generates column used by filter

2009-10-14 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12765682#action_12765682
 ] 

Daniel Dai commented on PIG-1022:
-

Actually, we cannot push the filter even before f2. Since we do not keep track 
of the source of data inside a tuple, gid should be treated as a generated 
field of f2. However, the projection map of f2 gives us the wrong result that 
gid is a directly mapped field of group (which is a tuple (name, gid)), and 
this triggers all the subsequent problems. The fix for this problem is to 
modify the projection map generation logic for mapped fields. 

Santhosh, do you have any comment?
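
The rule described above can be sketched in a few lines of Java. This is not 
Pig's ProjectionMap API: the map layout (output column to source column, with 
null meaning "generated") and all names are assumptions made for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

class FilterPushdownSketch {
    // Hypothetical projection map for one foreach: each output column maps to the
    // input column it is copied from unchanged, or to null if the foreach generates it.
    static boolean canPushAbove(Map<String, String> projectionMap, Set<String> filterColumns) {
        for (String column : filterColumns) {
            if (projectionMap.get(column) == null) {
                return false; // generated (or unknown) column: the filter must stay below
            }
        }
        return true;
    }

    // Models the first foreach in the report: name is passed through, gid is the constant '200'.
    static Map<String, String> firstForeach() {
        Map<String, String> map = new HashMap<>();
        map.put("name", "name");
        map.put("gender", "gender");
        map.put("age", "age");
        map.put("score", "score");
        map.put("gid", null);
        return map;
    }

    public static void main(String[] args) {
        System.out.println("filter on gid:  "
            + canPushAbove(firstForeach(), Collections.singleton("gid")));
        System.out.println("filter on name: "
            + canPushAbove(firstForeach(), Collections.singleton("name")));
    }
}
```

If the map wrongly recorded gid as a mapped column, the same check would allow 
the push, which matches the behaviour reported in this issue.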

> optimizer pushes filter before the foreach that generates column used by 
> filter
> ---
>
> Key: PIG-1022
> URL: https://issues.apache.org/jira/browse/PIG-1022
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Thejas M Nair
>Assignee: Daniel Dai
>
> grunt> l = load 'students.txt' using PigStorage() as (name:chararray, 
> gender:chararray, age:chararray, score:chararray);
> grunt> f = foreach l generate name, gender, age,score, '200'  as 
> gid:chararray;
> grunt> g = group f by (name, gid);
> grunt> f2 = foreach g generate group.name as name: chararray, group.gid as 
> gid: chararray;
> grunt> filt = filter f2 by gid == '200';
> grunt> explain filt;
> In the plan generated filt is pushed up after the load and before the first 
> foreach, even though the filter is on gid which is generated in first foreach.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1000) InternalCachedBag.java generates javac warning and findbug warning

2009-10-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1000:


Status: Patch Available  (was: Open)

> InternalCachedBag.java generates javac warning and findbug warning
> --
>
> Key: PIG-1000
> URL: https://issues.apache.org/jira/browse/PIG-1000
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.4.0
>Reporter: Ying He
>Assignee: Ying He
> Fix For: 0.6.0
>
> Attachments: PIG-1000.patch
>
>
> patch submitted by PIG-975 generates javac warning and findbug warning

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-966) Proposed rework for LoadFunc, StoreFunc, and Slice/r interfaces

2009-10-14 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12765663#action_12765663
 ] 

Dmitriy V. Ryaboy commented on PIG-966:
---

Regarding histogram representation:

I took a look at how Postgres does it, and they simply store 3 arrays:

* An array of "Most Common Values", which contains exactly what it sounds like, 
ordered in decreasing frequency
* A matching array of frequencies, expressed as a fraction of the total row 
count in the relation.
* an array of sorted values chosen in such a way that the number of rows with 
values between A[i] and A[i+1] is roughly the same for all i.  An interesting 
optimization they perform is that if the most common values array described 
above is defined for this field, then the values in that array are not included 
when calculating the boundaries for the histogram. They say that's called a 
"compressed histogram", if someone wants to dig up some papers on this.

Any objections to this design?
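
To make the three arrays concrete, here is a rough Java sketch of how they 
could be computed for an integer column. It is an illustration under stated 
assumptions, not Postgres's algorithm or a proposed Pig API: the class and 
field names are hypothetical, the whole column is held in memory, and bucket 
bounds are picked by simple index arithmetic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CompressedHistogramSketch {
    final List<Integer> mostCommonValues = new ArrayList<>(); // decreasing frequency
    final List<Double> frequencies = new ArrayList<>();       // fraction of total rows
    final List<Integer> bounds = new ArrayList<>();           // equi-depth bucket boundaries

    static CompressedHistogramSketch build(List<Integer> column, int mcvCount, int buckets) {
        if (buckets < 1) {
            throw new IllegalArgumentException("buckets must be at least 1");
        }
        CompressedHistogramSketch h = new CompressedHistogramSketch();

        Map<Integer, Integer> counts = new HashMap<>();
        for (Integer v : column) {
            counts.merge(v, 1, Integer::sum);
        }

        // Order distinct values by decreasing count; ties are broken by value for determinism.
        List<Map.Entry<Integer, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : Integer.compare(a.getKey(), b.getKey());
        });
        for (int i = 0; i < Math.min(mcvCount, entries.size()); i++) {
            h.mostCommonValues.add(entries.get(i).getKey());
            h.frequencies.add(entries.get(i).getValue() / (double) column.size());
        }

        // "Compressed" part: rows holding a most common value are left out of the histogram.
        List<Integer> rest = new ArrayList<>();
        for (Integer v : column) {
            if (!h.mostCommonValues.contains(v)) {
                rest.add(v);
            }
        }
        Collections.sort(rest);
        if (!rest.isEmpty()) {
            for (int b = 0; b <= buckets; b++) {
                int index = (int) ((long) b * (rest.size() - 1) / buckets);
                h.bounds.add(rest.get(index));
            }
        }
        return h;
    }

    public static void main(String[] args) {
        List<Integer> column = Arrays.asList(7, 7, 7, 7, 1, 2, 3, 4, 5, 6, 8, 9);
        CompressedHistogramSketch h = build(column, 1, 2);
        System.out.println("most common values: " + h.mostCommonValues);
        System.out.println("frequencies:        " + h.frequencies);
        System.out.println("bounds:             " + h.bounds);
    }
}
```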



> Proposed rework for LoadFunc, StoreFunc, and Slice/r interfaces
> ---
>
> Key: PIG-966
> URL: https://issues.apache.org/jira/browse/PIG-966
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: Alan Gates
>Assignee: Alan Gates
>
> I propose that we rework the LoadFunc, StoreFunc, and Slice/r interfaces 
> significantly.  See http://wiki.apache.org/pig/LoadStoreRedesignProposal for 
> full details

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1000) InternalCachedBag.java generates javac warning and findbug warning

2009-10-14 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12765660#action_12765660
 ] 

Daniel Dai commented on PIG-1000:
-

+1

> InternalCachedBag.java generates javac warning and findbug warning
> --
>
> Key: PIG-1000
> URL: https://issues.apache.org/jira/browse/PIG-1000
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.4.0
>Reporter: Ying He
>Assignee: Ying He
> Fix For: 0.6.0
>
> Attachments: PIG-1000.patch
>
>
> patch submitted by PIG-975 generates javac warning and findbug warning

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1014) Pig should convert COUNT(relation) to COUNT_STAR(relation) so that all records are counted without considering nullness of the fields in the records

2009-10-14 Thread Pradeep Kamath (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12765658#action_12765658
 ] 

Pradeep Kamath commented on PIG-1014:
-

To achieve 1. above, we would translate COUNT( A ) to COUNT_STAR( A ) during 
job compilation. Since 3. above has multiple options and does not seem to be a 
prevalent use case (SQL does not support it), another option is to disable it - 
thoughts?

> Pig should convert COUNT(relation) to COUNT_STAR(relation) so that all 
> records are counted without considering nullness of the fields in the records
> 
>
> Key: PIG-1014
> URL: https://issues.apache.org/jira/browse/PIG-1014
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Pradeep Kamath
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (PIG-1022) optimizer pushes filter before the foreach that generates column used by filter

2009-10-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai reassigned PIG-1022:
---

Assignee: Daniel Dai

> optimizer pushes filter before the foreach that generates column used by 
> filter
> ---
>
> Key: PIG-1022
> URL: https://issues.apache.org/jira/browse/PIG-1022
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Thejas M Nair
>Assignee: Daniel Dai
>
> grunt> l = load 'students.txt' using PigStorage() as (name:chararray, 
> gender:chararray, age:chararray, score:chararray);
> grunt> f = foreach l generate name, gender, age,score, '200'  as 
> gid:chararray;
> grunt> g = group f by (name, gid);
> grunt> f2 = foreach g generate group.name as name: chararray, group.gid as 
> gid: chararray;
> grunt> filt = filter f2 by gid == '200';
> grunt> explain filt;
> In the plan generated filt is pushed up after the load and before the first 
> foreach, even though the filter is on gid which is generated in first foreach.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-760) Serialize schemas for PigStorage() and other storage types.

2009-10-14 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12765626#action_12765626
 ] 

Dmitriy V. Ryaboy commented on PIG-760:
---

This would be a nice proof-of-concept task for the new Load/StoreMetadata 
interfaces, as it removes the complexity of dealing with something like Owl.

> Serialize schemas for PigStorage() and other storage types.
> ---
>
> Key: PIG-760
> URL: https://issues.apache.org/jira/browse/PIG-760
> Project: Pig
>  Issue Type: New Feature
>Reporter: David Ciemiewicz
>
> I'm finding PigStorage() really convenient for storage and data interchange 
> because it compresses well and imports into Excel and other analysis 
> environments well.
> However, it is a pain when it comes to maintenance because the columns are in 
> fixed locations and I'd like to add columns in some cases.
> It would be great if load PigStorage() could read a default schema from a 
> .schema file stored with the data and if store PigStorage() could store a 
> .schema file with the data.
> I have tested this out and both Hadoop HDFS and Pig in -exectype local mode 
> will ignore a file called .schema in a directory of part files.
> So, for example, if I have a chain of Pig scripts I execute such as:
> A = load 'data-1' using PigStorage() as ( a: int , b: int );
> store A into 'data-2' using PigStorage();
> B = load 'data-2' using PigStorage();
> describe B;
> describe B should output something like { a: int, b: int }
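The sidecar-file idea in the description above could be sketched roughly as follows. This is only an illustration on the local filesystem using plain java.nio; the class SchemaSidecar and its methods are hypothetical names, not part of Pig or of the Load/StoreMetadata interfaces, and the schema is kept as the raw text that describe would print.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not a Pig class: keeps a schema string in a ".schema"
// file that sits next to the part files of a stored relation.
class SchemaSidecar {
    static final String SCHEMA_FILE = ".schema";

    // Called on store: write the schema text into <dataDir>/.schema.
    static void writeSchema(Path dataDir, String schema) {
        try {
            Files.write(dataDir.resolve(SCHEMA_FILE),
                        schema.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Called on load: return the stored schema text, or null if there is none.
    static String readSchema(Path dataDir) {
        Path file = dataDir.resolve(SCHEMA_FILE);
        if (!Files.exists(file)) {
            return null;
        }
        try {
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Store into a fresh temporary directory, then load from it again.
    static String roundTrip(String schema) {
        try {
            Path dir = Files.createTempDirectory("data-2");
            writeSchema(dir, schema);
            return readSchema(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("{ " + roundTrip("a: int, b: int") + " }");
    }
}
```

A real implementation would presumably go through the Hadoop FileSystem API rather than java.nio, and would need to settle on a schema serialization format.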




[jira] Commented: (PIG-1022) optimizer pushes filter before the foreach that generates column used by filter

2009-10-14 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12765612#action_12765612
 ] 

Thejas M Nair commented on PIG-1022:


{code}
grunt> explain filt;
#---
# Logical Plan:
#---

Store 1-1162 Schema: {name: chararray,gid: chararray} Type: Unknown
|
|---ForEach 1-1148 Schema: {name: chararray,gid: chararray} Type: bag
|   |
|   Project 1-1144 Projections: [0] Overloaded: false FieldSchema: name: chararray Type: chararray
|   Input: Project 1-1145 Projections: [0] Overloaded: false|
|   |---Project 1-1145 Projections: [0] Overloaded: false FieldSchema: group: tuple({name: chararray,gid: chararray}) Type: tuple
|   Input: CoGroup 1-1138
|   |
|   Project 1-1146 Projections: [1] Overloaded: false FieldSchema: gid: chararray Type: chararray
|   Input: Project 1-1147 Projections: [0] Overloaded: false|
|   |---Project 1-1147 Projections: [0] Overloaded: false FieldSchema: group: tuple({name: chararray,gid: chararray}) Type: tuple
|   Input: CoGroup 1-1138
|
|---CoGroup 1-1138 Schema: {group: (name: chararray,gid: chararray),f: {name: chararray,gender: chararray,age: chararray,score: chararray,gid: chararray}} Type: bag
|   |
|   Project 1-1136 Projections: [0] Overloaded: false FieldSchema: name: chararray Type: chararray
|   Input: ForEach 1-1135
|   |
|   Project 1-1137 Projections: [4] Overloaded: false FieldSchema: gid: chararray Type: chararray
|   Input: ForEach 1-1135
|
|---ForEach 1-1135 Schema: {name: chararray,gender: chararray,age: chararray,score: chararray,gid: chararray} Type: bag
|   |
|   Project 1-1130 Projections: [0] Overloaded: false FieldSchema: name: chararray Type: chararray
|   Input: Filter 1-1152
|   |
|   Project 1-1131 Projections: [1] Overloaded: false FieldSchema: gender: chararray Type: chararray
|   Input: Filter 1-1152
|   |
|   Project 1-1132 Projections: [2] Overloaded: false FieldSchema: age: chararray Type: chararray
|   Input: Filter 1-1152
|   |
|   Project 1-1133 Projections: [3] Overloaded: false FieldSchema: score: chararray Type: chararray
|   Input: Filter 1-1152
|   |
|   Const 1-1134( 200 ) FieldSchema: chararray Type: chararray
|
|---Filter 1-1152 Schema: {name: chararray,gender: chararray,age: chararray,score: chararray} Type: bag
|   |
|   Equal 1-1151 FieldSchema: boolean Type: boolean
|   |
|   |---Project 1-1149 Projections: [0] Overloaded: false FieldSchema: name: chararray Type: chararray
|   |   Input: ForEach 1-1161
|   |
|   |---Const 1-1150( 200 ) FieldSchema: chararray Type: chararray
|
|---ForEach 1-1161 Schema: {name: chararray,gender: chararray,age: chararray,score: chararray} Type: bag
|   |
|   Cast 1-1154 FieldSchema: name: chararray Type: chararray
|   |
|   |---Project 1-1153 Projections: [0] Overloaded: false FieldSchema: name: bytearray Type: bytearray
|   Input: Load 1-1123
|   |
|   Cast 1-1156 FieldSchema: gender: chararray Type: chararray
|   |
|   |---Project 1-1155 Projections: [1] Overloaded: false FieldSchema: gender: bytearray Type: bytearray
|   Input: Load 1-1123
|   |
|   Cast 1-1158 FieldSchema: age: chararray Type: chararray
|   |
|   |---Project 1-1157 Projections: [2] Overloaded: false FieldSchema: age: bytearray Type: bytearray
|   Input: Load 1-1123
|   |
|   Cast 1-1160 FieldSchema: score: chararray Type: chararray
|   |
|   |---Project 1-1159 Projections: [3] Overloaded: false FieldSchema: score: bytearray Type: bytearray
|   Input: Load 1-1123
|
|---Load 1-1123 Schema: {name: bytearray,gender: bytearray,age: bytearray,score: bytearray} Type: bag

{code}

> optimizer pushes filter before the foreach that generates column used by 
> filter
> ---
>
> Key: PIG-1022
> URL: https://issues.apache.org/jira/browse/PIG-1022
> Project: Pig
>  Issue Type: Bug
>  Components: i

[jira] Created: (PIG-1022) optimizer pushes filter before the foreach that generates column used by filter

2009-10-14 Thread Thejas M Nair (JIRA)
optimizer pushes filter before the foreach that generates column used by filter
---

 Key: PIG-1022
 URL: https://issues.apache.org/jira/browse/PIG-1022
 Project: Pig
  Issue Type: Bug
  Components: impl
Reporter: Thejas M Nair


grunt> l = load 'students.txt' using PigStorage() as (name:chararray, 
gender:chararray, age:chararray, score:chararray);
grunt> f = foreach l generate name, gender, age,score, '200'  as gid:chararray;
grunt> g = group f by (name, gid);
grunt> f2 = foreach g generate group.name as name: chararray, group.gid as gid: 
chararray;
grunt> filt = filter f2 by gid == '200';
grunt> explain filt;

In the generated plan, filt is pushed up after the load and before the first 
foreach, even though the filter is on gid, which is generated in the first foreach.
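The condition that is violated here can be written down as a small check. The sketch below is an assumption-laden illustration, not Pig's optimizer code: FilterPushdownCheck and canPushBefore are hypothetical names, schemas are reduced to lists of column names, and the only rule modelled is that a filter may move before an operator if every column it references already exists in that operator's input.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical illustration, not part of Pig.
class FilterPushdownCheck {
    // A filter can be evaluated before an operator only if all columns it
    // references are already present in that operator's input schema.
    static boolean canPushBefore(Set<String> filterColumns, List<String> inputSchema) {
        return inputSchema.containsAll(filterColumns);
    }

    public static void main(String[] args) {
        // Schema produced by the load, i.e. the input of the first foreach.
        List<String> loadSchema = Arrays.asList("name", "gender", "age", "score");

        // filt references gid, which only exists after the first foreach.
        System.out.println(canPushBefore(Collections.singleton("gid"), loadSchema));

        // A filter on name alone would satisfy this particular condition.
        System.out.println(canPushBefore(Collections.singleton("name"), loadSchema));
    }
}
```

Under this rule the filter on gid would be rejected for the position right after the load, which is where the plan places it.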




[jira] Commented: (PIG-760) Serialize schemas for PigStorage() and other storage types.

2009-10-14 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12765603#action_12765603
 ] 

Alan Gates commented on PIG-760:


At this point no one has contributed a PigStorageSchema as suggested above.  We 
remain open to such a contribution if someone has the time.  

> Serialize schemas for PigStorage() and other storage types.
> ---
>
> Key: PIG-760
> URL: https://issues.apache.org/jira/browse/PIG-760
> Project: Pig
>  Issue Type: New Feature
>Reporter: David Ciemiewicz
>
> I'm finding PigStorage() really convenient for storage and data interchange 
> because it compresses well and imports into Excel and other analysis 
> environments well.
> However, it is a pain when it comes to maintenance because the columns are in 
> fixed locations and I'd like to add columns in some cases.
> It would be great if load PigStorage() could read a default schema from a 
> .schema file stored with the data and if store PigStorage() could store a 
> .schema file with the data.
> I have tested this out and both Hadoop HDFS and Pig in -exectype local mode 
> will ignore a file called .schema in a directory of part files.
> So, for example, if I have a chain of Pig scripts I execute such as:
> A = load 'data-1' using PigStorage() as ( a: int , b: int );
> store A into 'data-2' using PigStorage();
> B = load 'data-2' using PigStorage();
> describe B;
> describe B should output something like { a: int, b: int }




[jira] Commented: (PIG-858) Order By followed by "replicated" join fails while compiling MR-plan from physical plan

2009-10-14 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12765601#action_12765601
 ] 

Alan Gates commented on PIG-858:


Mostly looks straightforward and passes all the tests.  You made a number of 
changes in MRCompiler.visitUnion.  I don't understand what exactly you were 
changing there.  Could you give a brief overview of those changes?

> Order By followed by "replicated" join fails while compiling MR-plan from 
> physical plan
> ---
>
> Key: PIG-858
> URL: https://issues.apache.org/jira/browse/PIG-858
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.4.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Fix For: 0.6.0
>
> Attachments: pig-858.patch
>
>
> Consider the query:
> {code}
> A = load 'a';
> B = order A by $0;
> C = join A by $0, B by $0;
> explain C;
> {code}
> works. But if replicated join is used instead
> {code}
> A = load 'a';
> B = order A by $0;
> C = join A by $0, B by $0 using "replicated";
> explain C;
> {code}
> this fails with ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2034: Error 
> compiling operator POFRJoin
> relevant stacktrace:
> {code}
> Caused by: java.lang.RuntimeException: 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompilerException:
>  ERROR 2034: Error compiling operator POFRJoin
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.explain(HExecutionEngine.java:306)
> at org.apache.pig.PigServer.explain(PigServer.java:574)
> ... 8 more
> Caused by: 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompilerException:
>  ERROR 2034: Error compiling operator POFRJoin
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.visitFRJoin(MRCompiler.java:942)
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFRJoin.visit(POFRJoin.java:173)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:342)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:327)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:233)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.compile(MapReduceLauncher.java:301)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.explain(MapReduceLauncher.java:278)
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.explain(HExecutionEngine.java:303)
> ... 9 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.visitFRJoin(MRCompiler.java:901)
> ... 16 more
> {code}




[jira] Commented: (PIG-1017) Converts strings to text in Pig

2009-10-14 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12765598#action_12765598
 ] 

Olga Natkovich commented on PIG-1017:
-

L9 is still slower - any idea why?

Also, before we can commit these changes, we need to make sure that all 
piggybank functions are converted if they use string.

We also need to provide an update to the UDF manual covering these changes.

> Converts strings to text in Pig
> ---
>
> Key: PIG-1017
> URL: https://issues.apache.org/jira/browse/PIG-1017
> Project: Pig
>  Issue Type: Improvement
>Reporter: Sriranjan Manjunath
>
> Strings in Java are UTF-16 and take at least 2 bytes per character. Text 
> (org.apache.hadoop.io.Text) stores the data in UTF-8 and could show 
> significant reductions in memory.
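The size difference can be seen by counting encoded bytes. The sketch below uses only the JDK (java.nio.charset.StandardCharsets, which assumes Java 7 or later); EncodingSize is a hypothetical class name. It measures the encoded payload only and ignores the object overhead of String and Text, so it is an indication rather than a memory benchmark.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for comparing encoded sizes; not part of Pig or Hadoop.
class EncodingSize {
    // Bytes needed for s in UTF-16 (big-endian, without a byte-order mark).
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    // Bytes needed for s in UTF-8, the encoding used by org.apache.hadoop.io.Text.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String ascii = "students.txt";   // 12 ASCII characters
        System.out.println("UTF-16: " + utf16Bytes(ascii) + " bytes");
        System.out.println("UTF-8:  " + utf8Bytes(ascii) + " bytes");
    }
}
```

For ASCII-only data UTF-8 needs half the bytes; for text dominated by non-ASCII characters the saving shrinks or disappears.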
