date:20091013


 [ 
https://issues.apache.org/jira/browse/PIG-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1001:


Status: Patch Available  (was: Open)

 Generate more meaningful error message when one input file does not exist
 -

 Key: PIG-1001
 URL: https://issues.apache.org/jira/browse/PIG-1001
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.4.0
Reporter: Daniel Dai
 Fix For: 0.6.0

 Attachments: PIG-1001-1.patch


 In the following query, if 2.txt does not exist, 
 a = load '1.txt';
 b = order a by $0;
 c = load '2.txt';
 d = order c by $0;
 e = join b by $0, d by $0;
 dump e;
 Pig throws the error message "ERROR 2100: file:/tmp/temp155054664/tmp1144108421
 does not exist." Pig should instead report "Input file 2.txt does not exist"
 rather than this confusing message.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (PIG-1020) Include an ant target to build pig.jar without hadoop libraries

[
https://issues.apache.org/jira/browse/PIG-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12764997#action_12764997
]

Hadoop QA commented on PIG-1020:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12421951/PIG-1020-1.patch
against trunk revision 824446.

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified
tests.
Please justify why no tests are needed for this patch.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac
compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

-1 core tests. The patch failed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results:
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/75/testReport/
Findbugs warnings:
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/75/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output:
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/75/console

This message is automatically generated.

Include an ant target to build pig.jar without hadoop libraries
---

Key: PIG-1020
URL: https://issues.apache.org/jira/browse/PIG-1020
Project: Pig
Issue Type: New Feature
Components: build
Affects Versions: 0.4.0
Reporter: Daniel Dai
Assignee: Daniel Dai
Priority: Minor
Fix For: 0.6.0

Attachments: PIG-1020-1.patch

Provide an ant target to build pig.jar without any of the Hadoop-related libraries.
Users will provide the external Hadoop jars on the classpath before invoking Pig.


[jira] Commented: (PIG-1001) Generate more meaningful error message when one input file does not exist

[
https://issues.apache.org/jira/browse/PIG-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12765016#action_12765016
]

Hadoop QA commented on PIG-1001:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12421956/PIG-1001-1.patch
against trunk revision 824446.

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified
tests.
Please justify why no tests are needed for this patch.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac
compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

+1 core tests. The patch passed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results:
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/21/testReport/
Findbugs warnings:
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/21/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output:
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/21/console

This message is automatically generated.


[jira] Commented: (PIG-1019) FINDBUGS: add exclude file


[ 
https://issues.apache.org/jira/browse/PIG-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12765116#action_12765116
 ] 

Olga Natkovich commented on PIG-1019:
-

-1 on tests is OK since this is not a code-related patch.
-1 on release audit is also OK; it is due to the exclude file not having a header.

Can one of the committers review the patch, please?

 FINDBUGS: add exclude file
 --

 Key: PIG-1019
 URL: https://issues.apache.org/jira/browse/PIG-1019
 Project: Pig
  Issue Type: Bug
Reporter: Olga Natkovich
Assignee: Olga Natkovich
 Attachments: PIG-1019.patch





[jira] Updated: (PIG-1016) Reading in map data seems broken


 [ 
https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1016:


Status: Patch Available  (was: Open)

 Reading in map data seems broken
 

 Key: PIG-1016
 URL: https://issues.apache.org/jira/browse/PIG-1016
 Project: Pig
  Issue Type: Improvement
  Components: data
Affects Versions: 0.4.0
Reporter: hc busy
 Attachments: PIG-1016.patch


 Hi, I'm trying to load a map that has a tuple for a value. The read fails in
 0.4.0 because of a misconfiguration in the parser, whereas almost all the
 documentation states that the value of a map can be any type.
 I've attached a patch that allows us to read in complex objects as values, as
 documented. I've done simple verification of loading in maps with tuple/map
 values and writing them back out using LOAD and STORE. All seems to work fine.


[jira] Resolved: (PIG-16) setting parallel from grunt via set command

[
https://issues.apache.org/jira/browse/PIG-16?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Olga Natkovich resolved PIG-16.
---

Resolution: Fixed

setting parallel from grunt via set command
---

Key: PIG-16
URL: https://issues.apache.org/jira/browse/PIG-16
Project: Pig
Issue Type: Improvement
Components: grunt
Reporter: Olga Natkovich
Priority: Minor

I'd like to propose a different model which uses the grunt set option
and/or a command line option to make reduce parallelism automatic:
set reduce_parallelism TRUE
set reduce_parallelism FALSE [Default - BTW, why is this the default?]
This way I won't have to update my script every single time I try playing
with -Dhod=-m N; parallelism for reduce statements will default,
appropriately, to 2*(N-1).
Alternatively, could I just specify PARALLEL with no value, or PARALLEL
DEFAULT? Any time I needed to force the reduce to be a single job, I could
write PARALLEL 1.
Basically, this whole thing tripped me up for a long time and I just haven't
understood if there is a really good reason not to make parallelism
automatic.
I guess there might be if you have aggregation functions that do not
parallelize. If this is the case, then it seems to me that this should be
detectable automagically based on whether the function is a vanilla
EvalFunction or an AlgebraicFunction.


[jira] Commented: (PIG-1019) FINDBUGS: add exclude file


[ 
https://issues.apache.org/jira/browse/PIG-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12765131#action_12765131
 ] 

Alan Gates commented on PIG-1019:
-

+1


[jira] Commented: (PIG-1014) Pig should convert COUNT(relation) to COUNT_STAR(relation) so that all records are counted without considering nullness of the fields in the records


[ 
https://issues.apache.org/jira/browse/PIG-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12765136#action_12765136
 ] 

Alan Gates commented on PIG-1014:
-

I think I agree with Santhosh here.  While it may be unfortunate that our 
syntax makes it difficult to match the rather strange semantics of COUNT(x) vs 
COUNT(*) in SQL, I'm not sure trying to make a distinction between COUNT(A) and 
COUNT(A.$0) is the right solution.  This will not be obvious at all to users.  
If anything, the right way to do this would be COUNT(A.*), but I'm not sure 
even about that.

 Pig should convert COUNT(relation) to COUNT_STAR(relation) so that all 
 records are counted without considering nullness of the fields in the records
 

 Key: PIG-1014
 URL: https://issues.apache.org/jira/browse/PIG-1014
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.4.0
Reporter: Pradeep Kamath




[jira] Updated: (PIG-1019) FINDBUGS: add exclude file


 [ 
https://issues.apache.org/jira/browse/PIG-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1019:


Resolution: Fixed
Status: Resolved  (was: Patch Available)


[jira] Updated: (PIG-976) Multi-query optimization throws ClassCastException

2009-10-13 Thread Richard Ding (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-976:
-

Status: Open  (was: Patch Available)

 Multi-query optimization throws ClassCastException
 --

 Key: PIG-976
 URL: https://issues.apache.org/jira/browse/PIG-976
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.4.0
Reporter: Ankur
Assignee: Richard Ding
 Attachments: PIG-976.patch, PIG-976.patch, PIG-976.patch, 
 PIG-976.patch


 Multi-query optimization fails to merge 2 branches when 1 is a result of 
 Group By ALL and another is a result of Group By field1 where field 1 is of 
 type long. Here is the script that fails with multi-query on.
 data = LOAD 'test' USING PigStorage('\t') AS (a:long, b:double, c:double); 
 A = GROUP data ALL;
 B = FOREACH A GENERATE SUM(data.b) AS sum1, SUM(data.c) AS sum2;
 C = FOREACH B GENERATE (sum1/sum2) AS rate; 
 STORE C INTO 'result1';
 D = GROUP data BY a; 
 E = FOREACH D GENERATE group AS a, SUM(data.b), SUM(data.c);
 STORE E into 'result2';
  
 Here is the exception from the logs
 java.lang.ClassCastException: org.apache.pig.data.DefaultTuple cannot be cast to org.apache.pig.data.DataBag
   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:399)
   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:180)
   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.processInput(POUserFunc.java:145)
   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:197)
   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:235)
   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:254)
   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:204)
   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:240)
   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PODemux.runPipeline(PODemux.java:264)
   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PODemux.getNext(PODemux.java:254)
   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:196)
   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:174)
   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:63)
   at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.combineAndSpill(MapTask.java:906)
   at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:786)
   at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:698)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:228)
   at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2206)


[jira] Updated: (PIG-976) Multi-query optimization throws ClassCastException

2009-10-13 Thread Richard Ding (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-976:
-

Status: Patch Available  (was: Open)


LocalRearrange out of bounds exception - tips for debugging?

2009-10-13 Thread Dmitriy Ryaboy

We ran into what looks like some edge case bug in Pig, which causes it
to throw an IndexOutOfBoundsException (stack trace below).  The script
just joins two relations; it looks like our data was generated
incorrectly, and the join is empty, which may be what's causing the
failure. It also appears to only happen when at least one of the
inputs is on the large side (at least a few hundred megs).  Any ideas
on what could be happening and how to zoom in on the underlying cause?
 We are running off unmodified trunk.

Script:

register datagen.jar;
E =  load 'Employee' using
org.apache.pig.test.utils.datagen.PigPerformanceLoader() as
(id,name,cc,dc);
D =  load 'Department' using
org.apache.pig.test.utils.datagen.PigPerformanceLoader() as
(dept_id,dept_nm);
P =  load 'Project' using
org.apache.pig.test.utils.datagen.PigPerformanceLoader() as
(id,emp_id,role);
R1 = JOIN E by dc, D by dept_id;
R2 = JOIN R1 by E::id, P by emp_id;
store R2 into 'TestCase2Output';

R2 join fails with the stack trace below. It also fails if we
pre-calculate R1, store it, and load it directly (so, load R1, load P,
join R1 by $0, P by emp_id). We've verified that the records in R1 and
R2 have the expected fields, etc.


Stack Trace:

java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
at java.util.ArrayList.RangeCheck(ArrayList.java:547)
at java.util.ArrayList.get(ArrayList.java:322)
at org.apache.pig.data.DefaultTuple.get(DefaultTuple.java:143)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:148)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:226)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:260)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POUnion.getNext(POUnion.java:162)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:249)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:240)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:93)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170)

[jira] Commented: (PIG-966) Proposed rework for LoadFunc, StoreFunc, and Slice/r interfaces

2009-10-13 Thread Dmitriy V. Ryaboy (JIRA)

[
https://issues.apache.org/jira/browse/PIG-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12765172#action_12765172
]

Dmitriy V. Ryaboy commented on PIG-966:
---

Alan, thanks for the explanation on the kinds of pushdowns you are envisioning.
This makes sense, although I have a feeling that if we get this complex with
pushdowns, it may be more appropriate to start thinking of interfaces that
expose different access paths, rather than pushdownable operations.

Starting to think perhaps you are right in wanting to make this a single
interface instead of multiple ones like I suggested.

A couple more thoughts on the LoadPushdown interface.

getFeatures() should probably return a Set, not a List, as duplicates don't
really make sense and we want fast contains() calls on the returned object.

The new idea is just a small tweak on your design that aims to avoid the
OperatorPlan issue.

Maintain a Set of LogicalOperator classes (as in, LOProject.class) to indicate
acceptable operators, and provide a pushOperator(LogicalOperator op) method,
which can be called multiple times. If the order of operators matters, it
should be up to whoever is calling this method to do so in the right order.

This does force LoadFunc implementations to understand Pig operator classes,
and in the case of Filter it does have to deal with an inner LogicalPlan, but I
think those classes are mostly ok. If someone is advanced enough to want to
implement pushdowns, they can handle those interfaces. There is the danger of
the interfaces changing, of course, but, well, that consideration hasn't
stopped Hadoop... and we are setting a precedent by breaking the LoadFunc
interface right now anyway :-).
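
To make the idea concrete, here is a minimal model of it, written in Python
rather than Java purely for brevity. Everything in it is hypothetical: the
stand-in LogicalOperator classes, the ExampleLoader class, and the
push_operator name are illustrations of the proposal, not Pig's actual API.

```python
class LogicalOperator:
    """Stand-in for Pig's logical operator base class (hypothetical model)."""


class LOProject(LogicalOperator):
    pass


class LOFilter(LogicalOperator):
    pass


class ExampleLoader:
    """Hypothetical loader that only accepts projection pushdown."""

    # The set of operator classes this loader is willing to take over.
    accepted_operators = frozenset({LOProject})

    def __init__(self):
        self.pushed = []

    def push_operator(self, op):
        """Record op if its class is accepted; return whether it was pushed."""
        if type(op) not in self.accepted_operators:
            return False
        self.pushed.append(op)
        return True


loader = ExampleLoader()
plan = [LOProject(), LOFilter(), LOProject()]
# The caller decides the order in which operators are offered to the loader.
remaining = [op for op in plan if not loader.push_operator(op)]
print(len(loader.pushed), len(remaining))  # 2 1
```

A real caller would also have to decide whether operators that follow a
rejected one may still be pushed; this sketch offers every operator in turn.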

Too simple?

Proposed rework for LoadFunc, StoreFunc, and Slice/r interfaces
---

Key: PIG-966
URL: https://issues.apache.org/jira/browse/PIG-966
Project: Pig
Issue Type: Improvement
Components: impl
Reporter: Alan Gates
Assignee: Alan Gates

I propose that we rework the LoadFunc, StoreFunc, and Slice/r interfaces
significantly. See http://wiki.apache.org/pig/LoadStoreRedesignProposal for
full details


[jira] Commented: (PIG-1016) Reading in map data seems broken

[
https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12765183#action_12765183
]

Hadoop QA commented on PIG-1016:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12421949/PIG-1016.patch
against trunk revision 824446.

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 3 new or modified tests.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac
compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

-1 core tests. The patch failed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results:
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/76/testReport/
Findbugs warnings:
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/76/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output:
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/76/console

This message is automatically generated.


[jira] Commented: (PIG-1014) Pig should convert COUNT(relation) to COUNT_STAR(relation) so that all records are counted without considering nullness of the fields in the records

2009-10-13 Thread Pradeep Kamath (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12765192#action_12765192
 ] 

Pradeep Kamath commented on PIG-1014:
-

The issue I see is with the implementation of COUNT today. It looks at only the 
first field in the bag and counts only non null values towards the result. This 
can lead to mysterious results. Consider a relation (A) with two fields with 
the following contents:
{noformat}
1 2
3 4
null 6
7 null
null null
{noformat}

If we have the following snippet:
{code}
B = group A all;
C = foreach B generate COUNT(A);
{code}

The answer is 3, which was arrived at by considering only record 1, record 2 and
record 4, since the other records have null in the first position. Ironically,
although record 4 has null in the second position, that does not prevent it from
being counted. So basing the result on the null-ness of just the first
field seems somewhat arbitrary. My concern is that most users would not know
that the result was arrived at *after* dropping records which had null in the
first field, even though they did not specify COUNT(A.$0).  The status quo means we
equate COUNT(A) with COUNT(A.$0), which is also not apparent to users.
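
The difference can be reproduced outside Pig with a small model. The sketch
below is plain Python, not Pig code: None stands in for null, and the two
helper functions are hypothetical names that mimic the current COUNT behaviour
described above and the proposed COUNT_STAR behaviour.

```python
# Relation A from the example above; None stands in for Pig's null.
A = [
    (1, 2),
    (3, 4),
    (None, 6),
    (7, None),
    (None, None),
]


def count_first_field_non_null(bag):
    """Mimic the COUNT behaviour described above: only field $0 is inspected."""
    return sum(1 for record in bag if record[0] is not None)


def count_star(bag):
    """Mimic the proposed COUNT_STAR: every record counts, nulls or not."""
    return len(bag)


print(count_first_field_non_null(A))  # 3
print(count_star(A))                  # 5
```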


[jira] Commented: (PIG-1014) Pig should convert COUNT(relation) to COUNT_STAR(relation) so that all records are counted without considering nullness of the fields in the records

2009-10-13 Thread Santhosh Srinivasan (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12765194#action_12765194
 ] 

Santhosh Srinivasan commented on PIG-1014:
--

Essentially, Pradeep is pointing out an issue in the implementation of COUNT. 
If that is the case, then COUNT has to be fixed or the semantics of COUNT have to 
be documented to explain the current implementation. I would vote for fixing 
COUNT to have the correct semantics.


Re: LocalRearrange out of bounds exception - tips for debugging?

2009-10-13 Thread Alan Gates

Have you checked that each record in your input data has at least the
number of fields you specify?  Have you checked that the field
separator in your data matches the default for PigPerformanceLoader
(^A I think)?
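
Both checks can be run outside Pig with a short script. The sketch below
assumes the data is plain text with one record per line and Ctrl-A (\x01) as
the field separator; the helper name and the in-memory sample are
hypothetical stand-ins for the real 'Project' input.

```python
import io


def find_short_records(lines, min_fields, sep="\x01"):
    """Return (line_number, field_count) for records with fewer than min_fields fields."""
    short = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split(sep)
        if len(fields) < min_fields:
            short.append((lineno, len(fields)))
    return short


# Tiny sample with three fields per record (id, emp_id, role).
# The second record uses tabs instead of Ctrl-A, so it splits into one field.
sample = io.StringIO("1\x0110\x01dev\n2\t11\tqa\n3\x0112\x01ops\n")
print(find_short_records(sample, min_fields=3))  # [(2, 1)]
```

A record that splits into a single field would be consistent with the
"Index: 1, Size: 1" in the stack trace, though that is only a guess.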


Alan.

On Oct 13, 2009, at 10:28 AM, Dmitriy Ryaboy wrote:


We ran into what looks like some edge case bug in Pig, which causes it
to throw an IndexOutOfBoundsException (stack trace below).  The script
just joins two relations; it looks like our data was generated
incorrectly, and the join is empty, which may be what's causing the
failure. It also appears to only happen when at least one of the
inputs is on the large side (at least a few hundred megs).  Any ideas
on what could be happening and how to zoom in on the underlying cause?
We are running off unmodified trunk.

Script:

register datagen.jar;
E =  load 'Employee' using
org.apache.pig.test.utils.datagen.PigPerformanceLoader() as
(id,name,cc,dc);
D =  load 'Department' using
org.apache.pig.test.utils.datagen.PigPerformanceLoader() as
(dept_id,dept_nm);
P =  load 'Project' using
org.apache.pig.test.utils.datagen.PigPerformanceLoader() as
(id,emp_id,role);
R1 = JOIN E by dc, D by dept_id;
R2 = JOIN R1 by E::id, P by emp_id;
store R2 into 'TestCase2Output';

R2 join fails with the stack trace below. It also fails if we
pre-calculate R1, store it, and load it directly (so, load R1, load P,
join R1 by $0, P by emp_id). We've verified that the records in R1 and
R2 have the expected fields, etc.


Stack Trace:

java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
   at java.util.ArrayList.RangeCheck(ArrayList.java:547)
   at java.util.ArrayList.get(ArrayList.java:322)
   at org.apache.pig.data.DefaultTuple.get(DefaultTuple.java:143)
   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:148)
   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:226)
   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:260)
   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POUnion.getNext(POUnion.java:162)
   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:249)
   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:240)
   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:93)
   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
   at org.apache.hadoop.mapred.Child.main(Child.java:170)

[jira] Commented: (PIG-976) Multi-query optimization throws ClassCastException


[ 
https://issues.apache.org/jira/browse/PIG-976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12765216#action_12765216
 ] 

Hadoop QA commented on PIG-976:
---

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12422000/PIG-976.patch
  against trunk revision 824838.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 8 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 3 new Findbugs warnings.

-1 release audit.  The applied patch generated 295 release audit warnings 
(more than the trunk's current 292 warnings).

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/22/testReport/
Release audit warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/22/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/22/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/22/console

This message is automatically generated.

 Multi-query optimization throws ClassCastException
 --

 Key: PIG-976
 URL: https://issues.apache.org/jira/browse/PIG-976
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.4.0
Reporter: Ankur
Assignee: Richard Ding
 Attachments: PIG-976.patch, PIG-976.patch, PIG-976.patch, 
 PIG-976.patch


 Multi-query optimization fails to merge two branches when one is the result of 
 GROUP ALL and the other is the result of GROUP BY field1, where field1 is of 
 type long. Here is the script that fails with multi-query on:
 data = LOAD 'test' USING PigStorage('\t') AS (a:long, b:double, c:double); 
 A = GROUP data ALL;
 B = FOREACH A GENERATE SUM(data.b) AS sum1, SUM(data.c) AS sum2;
 C = FOREACH B GENERATE (sum1/sum2) AS rate; 
 STORE C INTO 'result1';
 D = GROUP data BY a; 
 E = FOREACH D GENERATE group AS a, SUM(data.b), SUM(data.c);
 STORE E into 'result2';
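
 For reference, the results the two branches of the script are meant to
 produce can be sketched in plain Python. This is only an illustration of the
 intended semantics, not Pig code; the function names and the sample rows are
 made up for the example.

```python
from collections import defaultdict


def global_rate(rows):
    """Branch 1: GROUP ALL, then SUM(b) / SUM(c) over every row."""
    sum_b = sum(b for _, b, _ in rows)
    sum_c = sum(c for _, _, c in rows)
    return sum_b / sum_c


def sums_by_key(rows):
    """Branch 2: GROUP BY a, then SUM(b) and SUM(c) per key."""
    totals = defaultdict(lambda: [0.0, 0.0])
    for a, b, c in rows:
        totals[a][0] += b
        totals[a][1] += c
    return {a: tuple(t) for a, t in totals.items()}


# Hypothetical (a:long, b:double, c:double) rows.
rows = [(1, 2.0, 4.0), (1, 1.0, 2.0), (2, 3.0, 6.0)]
print(global_rate(rows))   # 0.5
print(sums_by_key(rows))   # {1: (3.0, 6.0), 2: (3.0, 6.0)}
```

 Both branches read the same input, which is why multi-query optimization
 tries to merge them into a single job.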
  
 Here is the exception from the logs
 java.lang.ClassCastException: org.apache.pig.data.DefaultTuple cannot be cast 
 to org.apache.pig.data.DataBag
   at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:399)
   at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:180)
   at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.processInput(POUserFunc.java:145)
   at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:197)
   at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:235)
   at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:254)
   at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:204)
   at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
   at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:240)
   at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PODemux.runPipeline(PODemux.java:264)
   at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PODemux.getNext(PODemux.java:254)
   at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:196)
   at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:174)
   at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:63)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.combineAndSpill(MapTask.java:906)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:786)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:698)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:228)
   at

[jira] Updated: (PIG-1020) Include an ant target to build pig.jar without hadoop libraries


 [ 
https://issues.apache.org/jira/browse/PIG-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1020:


Status: Open  (was: Patch Available)

 Include an ant target to build pig.jar without hadoop libraries
 ---

 Key: PIG-1020
 URL: https://issues.apache.org/jira/browse/PIG-1020
 Project: Pig
  Issue Type: New Feature
  Components: build
Affects Versions: 0.4.0
Reporter: Daniel Dai
Assignee: Daniel Dai
Priority: Minor
 Fix For: 0.6.0

 Attachments: PIG-1020-1.patch, PIG-1020-2.patch


 Provide an ant target to build pig.jar without all hadoop related libraries. 
 User will provide external hadoop jars in classpath before invoking pig.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (PIG-1020) Include an ant target to build pig.jar without hadoop libraries


 [ 
https://issues.apache.org/jira/browse/PIG-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1020:


Attachment: PIG-1020-2.patch

Change the test target to depend on jar-withouthadoop rather than jar


[jira] Work started: (PIG-1020) Include an ant target to build pig.jar without hadoop libraries


 [ 
https://issues.apache.org/jira/browse/PIG-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on PIG-1020 started by Daniel Dai.


[jira] Updated: (PIG-1020) Include an ant target to build pig.jar without hadoop libraries


 [ 
https://issues.apache.org/jira/browse/PIG-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1020:


Status: Patch Available  (was: In Progress)


[jira] Commented: (PIG-921) Strange use case for Join which produces different results in local and map reduce mode


[ 
https://issues.apache.org/jira/browse/PIG-921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12765275#action_12765275
 ] 

Daniel Dai commented on PIG-921:


The result should be ((1,a),(1,b)), ((2,aa),(2,bb)). Map-reduce mode produces 
the wrong result.

 Strange use case for Join which produces different results in local and map 
 reduce mode
 ---

 Key: PIG-921
 URL: https://issues.apache.org/jira/browse/PIG-921
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.3.0
 Environment: Hadoop 18 and Hadoop 20
Reporter: Viraj Bhat
 Attachments: A.txt, B.txt, joinusecase.pig


 I have a script of the following form, which loads from two files, A.txt and B.txt:
 {code}
 A = LOAD 'A.txt' as (a:tuple(a1:int, a2:chararray));
 B = LOAD 'B.txt' as (b:tuple(b1:int, b2:chararray));
 C = JOIN A by a.a1, B by b.b1;
 DESCRIBE C;
 DUMP C;
 {code}
 A.txt contains the following lines:
 {code}
 (1,a)
 (2,aa)
 {code}
 B.txt contains the following lines:
 {code}
 (1,b)
 (2,bb)
 {code}
 Running the above script in local and map-reduce mode on Hadoop 18 and 
 Hadoop 20 produces the following:
 Hadoop 18
 =
 (1,1)
 (2,2)
 =
 Hadoop 20
 =
 (1,1)
 (2,2)
 =
 Local Mode: Pig with Hadoop 18 jar release 
 =
 2009-08-13 17:15:13,473 [main] INFO  org.apache.pig.Main - Logging error 
 messages to: /homes/viraj/pig-svn/trunk/pigscripts/pig_1250208913472.log
 09/08/13 17:15:13 INFO pig.Main: Logging error messages to: 
 /homes/viraj/pig-svn/trunk/pigscripts/pig_1250208913472.log
 C: {a: (a1: int,a2: chararray),b: (b1: int,b2: chararray)}
 2009-08-13 17:15:13,932 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 1002: Unable to store alias C
 09/08/13 17:15:13 ERROR grunt.Grunt: ERROR 1002: Unable to store alias C
 Details at logfile: 
 /homes/viraj/pig-svn/trunk/pigscripts/pig_1250208913472.log
 =
 Caused by: java.lang.NullPointerException
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POPackage.getNext(POPackage.java:206)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:191)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
 at 
 org.apache.pig.backend.local.executionengine.physicalLayer.counters.POCounter.getNext(POCounter.java:71)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getNext(POStore.java:117)
 at 
 org.apache.pig.backend.local.executionengine.LocalPigLauncher.runPipeline(LocalPigLauncher.java:146)
 at 
 org.apache.pig.backend.local.executionengine.LocalPigLauncher.launchPig(LocalPigLauncher.java:109)
 at 
 org.apache.pig.backend.local.executionengine.LocalExecutionEngine.execute(LocalExecutionEngine.java:165)
 ... 9 more
 =
 Local Mode: Pig with Hadoop 20 jar release
 =
 ((1,a),(1,b))
 ((2,aa),(2,bb))
 =
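
 To make the expected result concrete, here is a small Python sketch of an
 inner join keyed on one field of a nested tuple, using the contents of A.txt
 and B.txt from the report. It models the semantics only; the function name is
 invented for the example and nothing here is Pig's implementation.

```python
def join_on_nested_field(left, right, field=0):
    """Inner-join single-column records on one field of their nested tuple."""
    index = {}
    for (b,) in right:
        index.setdefault(b[field], []).append(b)
    # Each output record keeps both full tuples, not just the join key.
    return [(a, b) for (a,) in left for b in index.get(a[field], [])]


# Each record has one column, which is itself a tuple (as in A.txt and B.txt).
A = [((1, "a"),), ((2, "aa"),)]
B = [((1, "b"),), ((2, "bb"),)]
print(join_on_nested_field(A, B))
# [((1, 'a'), (1, 'b')), ((2, 'aa'), (2, 'bb'))]
```

 The output keeps the whole tuple from each side, which is what the
 Hadoop 20 local-mode run prints and what the map-reduce runs do not.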


[jira] Assigned: (PIG-921) Strange use case for Join which produces different results in local and map reduce mode


 [ 
https://issues.apache.org/jira/browse/PIG-921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai reassigned PIG-921:
--

Assignee: Daniel Dai


[jira] Updated: (PIG-1020) Include an ant target to build pig.jar without hadoop libraries


 [ 
https://issues.apache.org/jira/browse/PIG-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1020:


Status: Open  (was: Patch Available)


[jira] Updated: (PIG-921) Strange use case for Join which produces different results in local and map reduce mode


 [ 
https://issues.apache.org/jira/browse/PIG-921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-921:
---

Attachment: PIG-921-1.patch

The problem is in POLocalRearrange: we skip the entire tuple in the value if 
we use one field of the tuple as the join key.
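
A toy model of that behaviour, in Python, may help show why the map-reduce
output is (1,1) instead of ((1,a),(1,b)). It assumes that the rearrange step
strips key columns from the value and that a later package step puts the key
back; every function here is hypothetical and is not Pig's actual code.

```python
def rearrange_buggy(record, col, field):
    """Split a record into (key, value), dropping the whole column holding the key."""
    key = record[col][field]
    value = tuple(v for i, v in enumerate(record) if i != col)
    return key, value


def package_buggy(key, value, col):
    """Rebuild the record by putting the bare key back in the dropped column."""
    return value[:col] + (key,) + value[col:]


def rearrange_fixed(record, col, field):
    """Keep the full record in the value when the key is only part of a column."""
    return record[col][field], record


def package_fixed(key, value, col):
    """Nothing was dropped, so the value already is the record."""
    return value


a = ((1, "a"),)  # one column holding the tuple (1,a)
print(package_buggy(*rearrange_buggy(a, 0, 0), 0))  # (1,)
print(package_fixed(*rearrange_fixed(a, 0, 0), 0))  # ((1, 'a'),)
```

In the buggy path the column comes back as just the key, so joining the two
sides gives (1,1); in the fixed path the full tuple survives.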


[jira] Updated: (PIG-1020) Include an ant target to build pig.jar without hadoop libraries


 [ 
https://issues.apache.org/jira/browse/PIG-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1020:


Status: Patch Available  (was: Open)


[jira] Updated: (PIG-921) Strange use case for Join which produces different results in local and map reduce mode


 [ 
https://issues.apache.org/jira/browse/PIG-921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-921:
---

Fix Version/s: 0.6.0
Affects Version/s: 0.4.0  (was: 0.3.0)
Status: Patch Available  (was: Open)


[jira] Updated: (PIG-1016) Reading in map data seems broken


 [ 
https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hc busy updated PIG-1016:
-

Attachment: (was: PIG-1016.patch)

 Reading in map data seems broken
 

 Key: PIG-1016
 URL: https://issues.apache.org/jira/browse/PIG-1016
 Project: Pig
  Issue Type: Improvement
  Components: data
Affects Versions: 0.4.0
Reporter: hc busy

 Hi, I'm trying to load a map that has a tuple as its value. The read fails in 
 0.4.0 because of a misconfiguration in the parser, whereas almost all of the 
 documentation states that the value of a map can be of any type.
 I've attached a patch that allows us to read in complex objects as values, as 
 documented. I've done simple verification by loading maps with tuple/map 
 values and writing them back out using LOAD and STORE. All seems to work fine.
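
 As an illustration of the intended behaviour, the following Python sketch
 parses a map literal whose values may themselves be tuples or maps. It
 assumes the usual text form ([key#value,...] for maps, (a,b) for tuples),
 leaves scalars as strings, and is a toy recursive-descent parser rather than
 the parser touched by the patch.

```python
def parse_value(text, pos=0):
    """Parse one value starting at text[pos]; return (value, next_pos)."""
    if text[pos] == "[":
        return parse_map(text, pos)
    if text[pos] == "(":
        return parse_tuple(text, pos)
    end = pos
    while end < len(text) and text[end] not in ",)]":
        end += 1
    return text[pos:end], end  # scalars are kept as untyped strings


def parse_tuple(text, pos):
    """Parse "(v1,v2,...)" starting at the opening parenthesis."""
    items = []
    pos += 1  # skip "("
    while text[pos] != ")":
        item, pos = parse_value(text, pos)
        items.append(item)
        if text[pos] == ",":
            pos += 1
    return tuple(items), pos + 1


def parse_map(text, pos):
    """Parse "[k1#v1,k2#v2,...]" starting at the opening bracket."""
    result = {}
    pos += 1  # skip "["
    while text[pos] != "]":
        sep = text.index("#", pos)
        key = text[pos:sep]
        value, pos = parse_value(text, sep + 1)  # value may be any type
        result[key] = value
        if text[pos] == ",":
            pos += 1
    return result, pos + 1


print(parse_value("[k#(1,a),m#[x#y]]")[0])
# {'k': ('1', 'a'), 'm': {'x': 'y'}}
```

 The point of the sketch is that the map-value position calls back into the
 general value parser, so tuples and nested maps are accepted there.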


[jira] Updated: (PIG-1016) Reading in map data seems broken


 [ 
https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hc busy updated PIG-1016:
-

Status: Open  (was: Patch Available)

Didn't pass a few other affected unit tests


[jira] Updated: (PIG-1016) Reading in map data seems broken


 [ 
https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hc busy updated PIG-1016:
-

Status: Patch Available  (was: Open)


[jira] Updated: (PIG-1016) Reading in map data seems broken


 [ 
https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hc busy updated PIG-1016:
-

Attachment: PIG-1016.patch

Sorry, first-time contributor. This submission includes the fix and also fixes 
several unit tests that failed.


[jira] Commented: (PIG-1016) Reading in map data seems broken

2009-10-13 Thread Dmitriy V. Ryaboy (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12765302#action_12765302
 ] 

Dmitriy V. Ryaboy commented on PIG-1016:


No worries, we are used to Jira sending us a never-ending stream of updates :-).
Looks good to me (assuming this passes Hudson).


[jira] Commented: (PIG-1020) Include an ant target to build pig.jar without hadoop libraries

[
https://issues.apache.org/jira/browse/PIG-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12765311#action_12765311
]

Hadoop QA commented on PIG-1020:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12422019/PIG-1020-2.patch
against trunk revision 824838.

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified
tests.
Please justify why no tests are needed for this patch.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac
compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

+1 core tests. The patch passed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results:
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/23/testReport/
Findbugs warnings:
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/23/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output:
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/23/console

This message is automatically generated.


[jira] Resolved: (PIG-968) findContainingJar fails when there's a + in the path


 [ 
https://issues.apache.org/jira/browse/PIG-968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates resolved PIG-968.


Resolution: Fixed

Patch checked in.  Thanks Todd.

 findContainingJar fails when there's a + in the path
 

 Key: PIG-968
 URL: https://issues.apache.org/jira/browse/PIG-968
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.4.0, 0.5.0
Reporter: Todd Lipcon
 Attachments: pig-968.txt


 This is the same bug as in MAPREDUCE-714. Please see discussion there.


[jira] Commented: (PIG-858) Order By followed by replicated join fails while compiling MR-plan from physical plan


[ 
https://issues.apache.org/jira/browse/PIG-858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12765334#action_12765334
 ] 

Alan Gates commented on PIG-858:


I'm reviewing this patch.

 Order By followed by replicated join fails while compiling MR-plan from 
 physical plan
 ---

 Key: PIG-858
 URL: https://issues.apache.org/jira/browse/PIG-858
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.4.0
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Fix For: 0.6.0

 Attachments: pig-858.patch


 Consider the query:
 {code}
 A = load 'a';
 B = order A by $0;
 C = join A by $0, B by $0;
 explain C;
 {code}
 This works. But if a replicated join is used instead:
 {code}
 A = load 'a';
 B = order A by $0;
 C = join A by $0, B by $0 using replicated;
 explain C;
 {code}
 it fails with ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2034: Error 
 compiling operator POFRJoin
 Relevant stack trace:
 {code}
 Caused by: java.lang.RuntimeException: 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompilerException:
  ERROR 2034: Error compiling operator POFRJoin
 at 
 org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.explain(HExecutionEngine.java:306)
 at org.apache.pig.PigServer.explain(PigServer.java:574)
 ... 8 more
 Caused by: 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompilerException:
  ERROR 2034: Error compiling operator POFRJoin
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.visitFRJoin(MRCompiler.java:942)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFRJoin.visit(POFRJoin.java:173)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:342)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:327)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:233)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.compile(MapReduceLauncher.java:301)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.explain(MapReduceLauncher.java:278)
 at 
 org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.explain(HExecutionEngine.java:303)
 ... 9 more
 Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.visitFRJoin(MRCompiler.java:901)
 ... 16 more
 {code}


[jira] Commented: (PIG-1014) Pig should convert COUNT(relation) to COUNT_STAR(relation) so that all records are counted without considering nullness of the fields in the records

2009-10-13 Thread Santhosh Srinivasan (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12765357#action_12765357
 ] 

Santhosh Srinivasan commented on PIG-1014:
--

After a discussion with Pradeep, who also graciously ran SQL queries to verify 
the semantics, we have the following proposal:

The semantics of COUNT could be defined as:

1. COUNT( A ) is equivalent to COUNT( A.* ) and the result of COUNT( A ) will 
count null tuples in the relation
2. COUNT( A.$0) will not count null tuples in the relation

3. COUNT(A.($0, $1)) is equivalent to COUNT( A1.* ) where A1 is the relation 
containing tuples with two columns and will exhibit the behavior of statement 1

OR 

3. COUNT(A.($0, $1)) is equivalent to COUNT( A1.* ) where A1 is the relation 
containing tuples with two columns and will exhibit the behavior of statement 2

Point 3 needs more discussion.

Comments/thoughts/suggestions/anything else welcome.
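
To make statements 1 and 2 concrete, here is a minimal Java model of the proposed semantics. It is only a sketch under assumptions: a relation is modelled as a list of tuples, a tuple as a list of fields, and "null tuples" in statement 2 is read as tuples that are null or whose $0 is null. The class and method names are hypothetical and are not Pig APIs. Point 3 is deliberately left out since it is still open.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the proposed COUNT semantics; not Pig code.
class CountSemanticsSketch {

    // Statement 1: COUNT(A) counts every tuple, including null tuples.
    static long countAll(List<List<Object>> relation) {
        return relation.size();
    }

    // Statement 2: COUNT(A.$col) skips tuples that are null or whose
    // field at position col is null.
    static long countColumn(List<List<Object>> relation, int col) {
        long n = 0;
        for (List<Object> tuple : relation) {
            if (tuple != null && col < tuple.size() && tuple.get(col) != null) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        List<List<Object>> a = Arrays.asList(
                Arrays.<Object>asList(1, "x"),
                Arrays.<Object>asList(null, "y"),
                null);
        System.out.println(countAll(a));       // counts all three tuples
        System.out.println(countColumn(a, 0)); // counts only the first tuple
    }
}
```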


 Pig should convert COUNT(relation) to COUNT_STAR(relation) so that all 
 records are counted without considering nullness of the fields in the records
 

 Key: PIG-1014
 URL: https://issues.apache.org/jira/browse/PIG-1014
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.4.0
Reporter: Pradeep Kamath




[jira] Commented: (PIG-976) Multi-query optimization throws ClassCastException

2009-10-13 Thread Pradeep Kamath (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12765360#action_12765360
 ] 

Pradeep Kamath commented on PIG-976:


+1 changes look good - please address the findbugs and release audit warnings 
if appropriate.

 Multi-query optimization throws ClassCastException
 --

 Key: PIG-976
 URL: https://issues.apache.org/jira/browse/PIG-976
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.4.0
Reporter: Ankur
Assignee: Richard Ding
 Attachments: PIG-976.patch, PIG-976.patch, PIG-976.patch, 
 PIG-976.patch


 Multi-query optimization fails to merge two branches when one is the result of 
 GROUP BY ALL and the other is the result of GROUP BY field1, where field1 is of 
 type long. Here is the script that fails with multi-query on:
 data = LOAD 'test' USING PigStorage('\t') AS (a:long, b:double, c:double); 
 A = GROUP data ALL;
 B = FOREACH A GENERATE SUM(data.b) AS sum1, SUM(data.c) AS sum2;
 C = FOREACH B GENERATE (sum1/sum2) AS rate; 
 STORE C INTO 'result1';
 D = GROUP data BY a; 
 E = FOREACH D GENERATE group AS a, SUM(data.b), SUM(data.c);
 STORE E into 'result2';
  
 Here is the exception from the logs:
 java.lang.ClassCastException: org.apache.pig.data.DefaultTuple cannot be cast 
 to org.apache.pig.data.DataBag
   at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:399)
   at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:180)
   at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.processInput(POUserFunc.java:145)
   at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:197)
   at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:235)
   at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:254)
   at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:204)
   at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
   at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:240)
   at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PODemux.runPipeline(PODemux.java:264)
   at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PODemux.getNext(PODemux.java:254)
   at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:196)
   at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:174)
   at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:63)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.combineAndSpill(MapTask.java:906)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:786)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:698)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:228)
   at 
 org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2206)


[jira] Commented: (PIG-1020) Include an ant target to build pig.jar without hadoop libraries

[
https://issues.apache.org/jira/browse/PIG-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12765366#action_12765366
]

Hadoop QA commented on PIG-1020:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12422019/PIG-1020-2.patch
against trunk revision 824838.

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified
tests.
Please justify why no tests are needed for this patch.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac
compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

+1 core tests. The patch passed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results:
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/24/testReport/
Findbugs warnings:
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/24/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output:
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/24/console

This message is automatically generated.

Include an ant target to build pig.jar without hadoop libraries
---

Attachments: PIG-1020-1.patch, PIG-1020-2.patch

Provide an ant target to build pig.jar without any Hadoop-related libraries.
Users will supply external Hadoop jars on the classpath before invoking Pig.


[jira] Commented: (PIG-1017) Converts strings to text in Pig

2009-10-13 Thread Sriranjan Manjunath (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12765380#action_12765380
 ] 

Sriranjan Manjunath commented on PIG-1017:
--

Pigmix results before and after converting strings to text:

||Pigmix query||Trunk||Modified code||
|L1| 3:2|2:24|
|L2| 2:6|1:23|
|L3| 3:36|3:49|
|L4| 1:42|1:49|
|L5| 1:49|1:49|
|L6| 1:47|3:3|
|L7| 1:44|1:49|
|L8| 1:19|1:18|
|L9| 4:6|5:35|
|L10| 8:52|7:56|
|L11| 2:26|1:34|
|L12| 1:57|1:54|


 Converts strings to text in Pig
 ---

 Key: PIG-1017
 URL: https://issues.apache.org/jira/browse/PIG-1017
 Project: Pig
  Issue Type: Improvement
Reporter: Sriranjan Manjunath

 Strings in Java are UTF-16 and take at least 2 bytes per character. Text 
 (org.apache.hadoop.io.Text) stores the data in UTF-8 and could show 
 significant reductions in memory.
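
A rough way to see the size difference, using only the JDK (Hadoop's Text class is not used here, and the class and method names are hypothetical). The sketch compares encoded payload sizes only and ignores JVM object overhead. It also shows why the saving is a "could": ASCII data halves in size, while some non-ASCII text is larger in UTF-8.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of encoded sizes; not Pig or Hadoop code.
class StringEncodingSizeSketch {

    // Bytes needed for the UTF-16 code units (no byte-order mark).
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    // Bytes needed for the same text encoded as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String ascii = "pigmix";
        String cjk = "\u65e5\u672c\u8a9e"; // three CJK characters
        System.out.println(ascii + ": UTF-16=" + utf16Bytes(ascii)
                + " UTF-8=" + utf8Bytes(ascii));
        System.out.println("cjk: UTF-16=" + utf16Bytes(cjk)
                + " UTF-8=" + utf8Bytes(cjk));
    }
}
```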


[jira] Commented: (PIG-1017) Converts strings to text in Pig

2009-10-13 Thread Sriranjan Manjunath (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12765381#action_12765381
 ] 

Sriranjan Manjunath commented on PIG-1017:
--

Something fishy is going on. I ran L6 a couple more times with the modified 
code and it completed in 1:8

 Converts strings to text in Pig
 ---

 Key: PIG-1017
 URL: https://issues.apache.org/jira/browse/PIG-1017
 Project: Pig
  Issue Type: Improvement
Reporter: Sriranjan Manjunath

 Strings in Java are UTF-16 and take at least 2 bytes per character. Text 
 (org.apache.hadoop.io.Text) stores the data in UTF-8 and could show 
 significant reductions in memory.


[jira] Commented: (PIG-921) Strange use case for Join which produces different results in local and map reduce mode


[ 
https://issues.apache.org/jira/browse/PIG-921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12765385#action_12765385
 ] 

Hadoop QA commented on PIG-921:
---

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12422030/PIG-921-1.patch
  against trunk revision 824980.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/78/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/78/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/78/console

This message is automatically generated.

 Strange use case for Join which produces different results in local and map 
 reduce mode
 ---

 Key: PIG-921
 URL: https://issues.apache.org/jira/browse/PIG-921
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.4.0
 Environment: Hadoop 18 and Hadoop 20
Reporter: Viraj Bhat
Assignee: Daniel Dai
 Fix For: 0.6.0

 Attachments: A.txt, B.txt, joinusecase.pig, PIG-921-1.patch


 I have a script of the following form, which loads from two files, A.txt and B.txt:
 {code}
 A = LOAD 'A.txt' as (a:tuple(a1:int, a2:chararray));
 B = LOAD 'B.txt' as (b:tuple(b1:int, b2:chararray));
 C = JOIN A by a.a1, B by b.b1;
 DESCRIBE C;
 DUMP C;
 {code}
 A.txt contains the following lines:
 {code}
 (1,a)
 (2,aa)
 {code}
 B.txt contains the following lines:
 {code}
 (1,b)
 (2,bb)
 {code}
 Running the above script in local mode and in map reduce mode on Hadoop 18 and 
 Hadoop 20 produces the following:
 Hadoop 18
 =
 (1,1)
 (2,2)
 =
 Hadoop 20
 =
 (1,1)
 (2,2)
 =
 Local Mode: Pig with Hadoop 18 jar release 
 =
 2009-08-13 17:15:13,473 [main] INFO  org.apache.pig.Main - Logging error 
 messages to: /homes/viraj/pig-svn/trunk/pigscripts/pig_1250208913472.log
 09/08/13 17:15:13 INFO pig.Main: Logging error messages to: 
 /homes/viraj/pig-svn/trunk/pigscripts/pig_1250208913472.log
 C: {a: (a1: int,a2: chararray),b: (b1: int,b2: chararray)}
 2009-08-13 17:15:13,932 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 1002: Unable to store alias C
 09/08/13 17:15:13 ERROR grunt.Grunt: ERROR 1002: Unable to store alias C
 Details at logfile: 
 /homes/viraj/pig-svn/trunk/pigscripts/pig_1250208913472.log
 =
 Caused by: java.lang.NullPointerException
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POPackage.getNext(POPackage.java:206)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:191)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
 at 
 org.apache.pig.backend.local.executionengine.physicalLayer.counters.POCounter.getNext(POCounter.java:71)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getNext(POStore.java:117)
 at 
 org.apache.pig.backend.local.executionengine.LocalPigLauncher.runPipeline(LocalPigLauncher.java:146)
 at 
 org.apache.pig.backend.local.executionengine.LocalPigLauncher.launchPig(LocalPigLauncher.java:109)
 at 
 org.apache.pig.backend.local.executionengine.LocalExecutionEngine.execute(LocalExecutionEngine.java:165)
 ... 9 more

[jira] Created: (PIG-1021) Cast of nested types does work as expected

Cast of nested types does work as expected
--

 Key: PIG-1021
 URL: https://issues.apache.org/jira/browse/PIG-1021
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.4.0
Reporter: Daniel Dai
 Fix For: 0.6.0


The following script does not work as expected:

1.txt:
(0.2,0.3)

a = load '1.txt';
b = foreach a generate (tuple(int, int))$0;

describe b;
b: {(int,int)}

dump b;
((0.2,0.3))

The expected result is ((0, 0))
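
For illustration, the expected behaviour can be modelled in plain Java as casting each field of the nested tuple individually. This is only a sketch: the helper name is hypothetical, it is not Pig's cast implementation, and truncation toward zero is an assumption chosen because it matches the expected ((0, 0)) above.

```java
import java.util.Arrays;

// Hypothetical model of a per-field cast to (int, int); not Pig code.
class NestedCastSketch {

    // Parses each field as a number and truncates it toward zero.
    static int[] castFieldsToInt(String[] fields) {
        int[] out = new int[fields.length];
        for (int i = 0; i < fields.length; i++) {
            out[i] = (int) Double.parseDouble(fields[i].trim());
        }
        return out;
    }

    public static void main(String[] args) {
        // The fields of the tuple (0.2,0.3) from 1.txt.
        int[] result = castFieldsToInt(new String[] {"0.2", "0.3"});
        System.out.println(Arrays.toString(result));
    }
}
```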



[jira] Resolved: (PIG-1021) Cast of nested types does work as expected