[jira] Created: (PIG-994) Provide 'append' keyword to allow appending to different dataset on pig 2.3 as it is now on hadoop 0.20 (which has append feature)

2009-10-05 Thread Rekha (JIRA)
Provide 'append' keyword to allow appending to different dataset on pig 2.3 as 
it is now on hadoop 0.20 (which has append feature)


 Key: PIG-994
 URL: https://issues.apache.org/jira/browse/PIG-994
 Project: Pig
  Issue Type: New Feature
  Components: impl
Affects Versions: 0.4.0
 Environment: Grid clusters
Reporter: Rekha
Priority: Minor


Provide 'append' keyword to allow appending to different dataset on pig 2.3 as 
it is now on hadoop 0.20 (which has append feature)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-994) Provide 'append' keyword to allow appending to different dataset on pig 2.3 as it is now on hadoop 0.20 (which has append feature)

2009-10-05 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12762239#action_12762239
 ] 

Olga Natkovich commented on PIG-994:


Hadoop 0.20 does not have append. It is coming in Hadoop 0.21.

 Provide 'append' keyword to allow appending to different dataset on pig 2.3 as 
 it is now on hadoop 0.20 (which has append feature)
 

 Key: PIG-994
 URL: https://issues.apache.org/jira/browse/PIG-994
 Project: Pig
  Issue Type: New Feature
  Components: impl
Affects Versions: 0.4.0
 Environment: Grid clusters
Reporter: Rekha
Priority: Minor

 Provide 'append' keyword to allow appending to different dataset on pig 0.5.0 
 as it is now on hadoop 0.20 (which has append feature)




[jira] Updated: (PIG-994) Provide 'append' keyword to allow appending to different dataset on pig 2.3 as it is now on hadoop 0.20 (which has append feature)

2009-10-05 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-994:
---

Description: Provide 'append' keyword to allow appending to different 
dataset on pig 0.5.0 as it is now on hadoop 0.20 (which has append feature)  
(was: Provide 'append' keyword to allow appending to different dataset on pig 
2.3 as it is now on hadoop 0.20 (which has append feature))

 Provide 'append' keyword to allow appending to different dataset on pig 2.3 as 
 it is now on hadoop 0.20 (which has append feature)
 

 Key: PIG-994
 URL: https://issues.apache.org/jira/browse/PIG-994
 Project: Pig
  Issue Type: New Feature
  Components: impl
Affects Versions: 0.4.0
 Environment: Grid clusters
Reporter: Rekha
Priority: Minor

 Provide 'append' keyword to allow appending to different dataset on pig 0.5.0 
 as it is now on hadoop 0.20 (which has append feature)




[jira] Updated: (PIG-994) Provide 'append' keyword to allow appending to different dataset once the feature is available in Hadoop

2009-10-05 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-994:
---

Summary: Provide 'append' keyword to allow appending to different dataset 
once the feature is available in Hadoop  (was: Provide 'append' keyword to 
allow appending to different dataset on pig 2.3 as it is now on hadoop 
0.20 (which has append feature))

 Provide 'append' keyword to allow appending to different dataset once the 
 feature is available in Hadoop
 ---

 Key: PIG-994
 URL: https://issues.apache.org/jira/browse/PIG-994
 Project: Pig
  Issue Type: New Feature
  Components: impl
Affects Versions: 0.4.0
 Environment: Grid clusters
Reporter: Rekha
Priority: Minor

 Provide 'append' keyword to allow appending to different dataset on pig 0.5.0 
 as it is now on hadoop 0.20 (which has append feature)




[jira] Updated: (PIG-922) Logical optimizer: push up project

2009-10-05 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-922:
---

Status: Open  (was: Patch Available)

 Logical optimizer: push up project
 --

 Key: PIG-922
 URL: https://issues.apache.org/jira/browse/PIG-922
 Project: Pig
  Issue Type: New Feature
  Components: impl
Affects Versions: 0.3.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.6.0

 Attachments: PIG-922-p1_0.patch, PIG-922-p1_1.patch, 
 PIG-922-p1_2.patch, PIG-922-p1_3.patch, PIG-922-p1_4.patch, 
 PIG-922-p2_preview.patch, PIG-922-p2_preview2.patch, PIG-922-p3_1.patch, 
 PIG-922-p3_2.patch, PIG-922-p3_3.patch, PIG-922-p3_4.patch, PIG-922-p3_5.patch


 This is a continuation of 
 [PIG-697|https://issues.apache.org/jira/browse/PIG-697]. We need to add 
 another rule to the logical optimizer: push up project, i.e., prune columns as 
 early as possible.
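
The idea of pruning columns early can be pictured with a small Python sketch. This is only a toy model, not Pig's optimizer: the row layout, the function names `late_projection` and `early_projection`, and the sort-then-project plan are all assumptions made for illustration. It shows that projecting before the sort gives the same result while the sort handles narrower rows.

```python
from typing import Iterable, List, Sequence, Tuple

Row = Tuple[object, ...]


def project(rows: Iterable[Row], columns: Sequence[int]) -> List[Row]:
    """Keep only the listed column positions, in the given order."""
    return [tuple(row[c] for c in columns) for row in rows]


def late_projection(rows: List[Row]) -> List[Row]:
    # Unoptimized plan: sort full-width rows by column 1,
    # then keep columns 0 and 1.
    ordered = sorted(rows, key=lambda r: r[1])
    return project(ordered, [0, 1])


def early_projection(rows: List[Row]) -> List[Row]:
    # "Push up project": drop the unused column 2 first,
    # then sort the narrower rows by column 1.
    narrow = project(rows, [0, 1])
    return sorted(narrow, key=lambda r: r[1])


rows = [("x", 3, "unused-a"), ("y", 1, "unused-b"), ("z", 2, "unused-c")]
assert late_projection(rows) == early_projection(rows)
```

Both plans return the same rows; the second one never carries the third column through the sort, which is the saving the rule is after.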




[jira] Updated: (PIG-922) Logical optimizer: push up project

2009-10-05 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-922:
---

Attachment: PIG-922-p3_5.patch

Resync with trunk

 Logical optimizer: push up project
 --

 Key: PIG-922
 URL: https://issues.apache.org/jira/browse/PIG-922
 Project: Pig
  Issue Type: New Feature
  Components: impl
Affects Versions: 0.3.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.6.0

 Attachments: PIG-922-p1_0.patch, PIG-922-p1_1.patch, 
 PIG-922-p1_2.patch, PIG-922-p1_3.patch, PIG-922-p1_4.patch, 
 PIG-922-p2_preview.patch, PIG-922-p2_preview2.patch, PIG-922-p3_1.patch, 
 PIG-922-p3_2.patch, PIG-922-p3_3.patch, PIG-922-p3_4.patch, PIG-922-p3_5.patch


 This is a continuation of 
 [PIG-697|https://issues.apache.org/jira/browse/PIG-697]. We need to add 
 another rule to the logical optimizer: push up project, i.e., prune columns as 
 early as possible.




[jira] Updated: (PIG-922) Logical optimizer: push up project

2009-10-05 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-922:
---

Status: Patch Available  (was: Open)

 Logical optimizer: push up project
 --

 Key: PIG-922
 URL: https://issues.apache.org/jira/browse/PIG-922
 Project: Pig
  Issue Type: New Feature
  Components: impl
Affects Versions: 0.3.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.6.0

 Attachments: PIG-922-p1_0.patch, PIG-922-p1_1.patch, 
 PIG-922-p1_2.patch, PIG-922-p1_3.patch, PIG-922-p1_4.patch, 
 PIG-922-p2_preview.patch, PIG-922-p2_preview2.patch, PIG-922-p3_1.patch, 
 PIG-922-p3_2.patch, PIG-922-p3_3.patch, PIG-922-p3_4.patch, PIG-922-p3_5.patch


 This is a continuation of 
 [PIG-697|https://issues.apache.org/jira/browse/PIG-697]. We need to add 
 another rule to the logical optimizer: push up project, i.e., prune columns as 
 early as possible.




[jira] Created: (PIG-995) Limit Optimizer throws exception ERROR 2156: Error while fixing projections

2009-10-05 Thread Daniel Dai (JIRA)
Limit Optimizer throws exception ERROR 2156: Error while fixing projections


 Key: PIG-995
 URL: https://issues.apache.org/jira/browse/PIG-995
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.3.0
Reporter: Daniel Dai
 Fix For: 0.6.0


The following script fails:

A = load '1.txt' AS (a0, a1, a2);
B = order A by a1;
C = limit B 10;
D = foreach C generate $0;
dump D;

Error log:
Caused by: org.apache.pig.impl.plan.VisitorException: ERROR 2156: Error while fixing projections. Projection map of node to be replaced is null.
at org.apache.pig.impl.logicalLayer.ProjectFixerUpper.visit(ProjectFixerUpper.java:138)
at org.apache.pig.impl.logicalLayer.LOProject.visit(LOProject.java:408)
at org.apache.pig.impl.logicalLayer.LOProject.visit(LOProject.java:58)
at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:65)
at org.apache.pig.impl.plan.DepthFirstWalker.walk(DepthFirstWalker.java:50)
at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
at org.apache.pig.impl.logicalLayer.LOForEach.rewire(LOForEach.java:761)




[jira] Assigned: (PIG-995) Limit Optimizer throws exception ERROR 2156: Error while fixing projections

2009-10-05 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai reassigned PIG-995:
--

Assignee: Daniel Dai

 Limit Optimizer throws exception ERROR 2156: Error while fixing projections
 

 Key: PIG-995
 URL: https://issues.apache.org/jira/browse/PIG-995
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.3.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.6.0


 The following script fails:
 A = load '1.txt' AS (a0, a1, a2);
 B = order A by a1;
 C = limit B 10;
 D = foreach C generate $0;
 dump D;
 Error log:
 Caused by: org.apache.pig.impl.plan.VisitorException: ERROR 2156: Error while fixing projections. Projection map of node to be replaced is null.
 at org.apache.pig.impl.logicalLayer.ProjectFixerUpper.visit(ProjectFixerUpper.java:138)
 at org.apache.pig.impl.logicalLayer.LOProject.visit(LOProject.java:408)
 at org.apache.pig.impl.logicalLayer.LOProject.visit(LOProject.java:58)
 at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:65)
 at org.apache.pig.impl.plan.DepthFirstWalker.walk(DepthFirstWalker.java:50)
 at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
 at org.apache.pig.impl.logicalLayer.LOForEach.rewire(LOForEach.java:761)




[jira] Updated: (PIG-993) [zebra] Ability to drop a column group in a table

2009-10-05 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-993:
---

  Description: 
A Zebra table is stored as multiple sub tables each containing a set of columns 
called column group (CG). The user specifies how these columns are grouped 
while creating a table through the _storage hint_.

For some of the large tables, it might be necessary for users to remove a set 
of columns and retain the rest. This jira provides a way for users to delete an 
entire column group. 

The following comments will have more details on API and the semantics. 

  was:

A Zebra table is stored as multiple sub tables each containing a set of columns 
called column group (CG). The user specifies how these columns are grouped 
while creating a table through the _storage hint_.

For some of the large tables, it might be necessary for users to remove a set 
of columns and retain the rest. This jira provides a way for users to delete an 
entire column group. 

The following comments will have more details on API and the semantics. 

Fix Version/s: (was: 0.5.0)

 [zebra] Ability to drop a column group in a table
 --

 Key: PIG-993
 URL: https://issues.apache.org/jira/browse/PIG-993
 Project: Pig
  Issue Type: Bug
Reporter: Raghu Angadi
Assignee: Raghu Angadi
 Attachments: DropColumnGroupExample.java, zebra-drop-cg.patch, 
 zebra-drop-cg.patch


 A Zebra table is stored as multiple sub tables each containing a set of 
 columns called column group (CG). The user specifies how these columns are 
 grouped while creating a table through the _storage hint_.
 For some of the large tables, it might be necessary for users to remove a set 
 of columns and retain the rest. This jira provides a way for users to delete 
 an entire column group. 
 The following comments will have more details on API and the semantics. 




[jira] Assigned: (PIG-920) optimizing diamond queries

2009-10-05 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding reassigned PIG-920:


Assignee: Richard Ding

 optimizing diamond queries
 --

 Key: PIG-920
 URL: https://issues.apache.org/jira/browse/PIG-920
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Richard Ding

 The following query
 A = load 'foo';
 B = filter A by $01;
 C = filter A by $1 == 'foo';
 D = COGROUP C by $0, B by $0;
 ..
 is not executed efficiently. Currently, it runs a map-only job that 
 basically reads and writes the same data before doing the query processing.
 A query where the data is loaded twice actually executes more efficiently.
 This is not an uncommon query and we should fix this issue.
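
The shape of such a diamond query can be sketched in Python. This is a toy model and not how Pig plans or executes the query: the names `diamond` and `cogroup` are hypothetical, and the first predicate (`$0 > 1`) is an assumption, since the condition on B is not readable in the query as quoted. The sketch reads the input once, feeds both filter branches in that single pass, and then cogroups the two branches on column 0.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Row = Tuple[object, ...]
Groups = Dict[object, Tuple[List[Row], List[Row]]]


def cogroup(left: Iterable[Row], right: Iterable[Row], key: int = 0) -> Groups:
    """Group two inputs by the same key column, keeping each side separate."""
    groups: Groups = defaultdict(lambda: ([], []))
    for row in left:
        groups[row[key]][0].append(row)
    for row in right:
        groups[row[key]][1].append(row)
    return dict(groups)


def diamond(rows: Iterable[Row],
            pred_b: Callable[[Row], bool],
            pred_c: Callable[[Row], bool]) -> Groups:
    """One pass over the input feeds both filter branches (B and C)."""
    b: List[Row] = []
    c: List[Row] = []
    for row in rows:
        if pred_b(row):
            b.append(row)
        if pred_c(row):
            c.append(row)
    # Mirrors "COGROUP C by $0, B by $0": C is the left input, B the right.
    return cogroup(c, b)


data = [(1, "foo"), (2, "foo"), (2, "bar"), (3, "baz")]
result = diamond(
    data,
    pred_b=lambda r: r[0] > 1,      # assumed predicate for B
    pred_c=lambda r: r[1] == "foo", # predicate for C as given in the query
)
```

The point of the sketch is only that both branches can be fed from one read of the input, without an intermediate job that copies the data.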




[jira] Commented: (PIG-922) Logical optimizer: push up project

2009-10-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12762362#action_12762362
 ] 

Hadoop QA commented on PIG-922:
---

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12421307/PIG-922-p3_5.patch
  against trunk revision 821101.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 27 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 2 new Findbugs warnings.

-1 release audit.  The applied patch generated 305 release audit warnings 
(more than the trunk's current 298 warnings).

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/60/testReport/
Release audit warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/60/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/60/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/60/console

This message is automatically generated.

 Logical optimizer: push up project
 --

 Key: PIG-922
 URL: https://issues.apache.org/jira/browse/PIG-922
 Project: Pig
  Issue Type: New Feature
  Components: impl
Affects Versions: 0.3.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.6.0

 Attachments: PIG-922-p1_0.patch, PIG-922-p1_1.patch, 
 PIG-922-p1_2.patch, PIG-922-p1_3.patch, PIG-922-p1_4.patch, 
 PIG-922-p2_preview.patch, PIG-922-p2_preview2.patch, PIG-922-p3_1.patch, 
 PIG-922-p3_2.patch, PIG-922-p3_3.patch, PIG-922-p3_4.patch, PIG-922-p3_5.patch


 This is a continuation of 
 [PIG-697|https://issues.apache.org/jira/browse/PIG-697]. We need to add 
 another rule to the logical optimizer: push up project, i.e., prune columns as 
 early as possible.




[jira] Updated: (PIG-986) [zebra] Zebra Column Group Naming Support

2009-10-05 Thread Chao Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Wang updated PIG-986:
--


Patch reviewed. +1

 [zebra] Zebra Column Group Naming Support
 -

 Key: PIG-986
 URL: https://issues.apache.org/jira/browse/PIG-986
 Project: Pig
  Issue Type: New Feature
  Components: impl
Affects Versions: 0.4.0
Reporter: Chao Wang
Assignee: Chao Wang
 Fix For: 0.6.0

 Attachments: ColumnGroupName.patch


 We introduce column group names to Zebra and make them first-class citizens in 
 Zebra. This can ease management of column groups.
 We plan to introduce an AS clause for column group names in Zebra's syntax.
 Functional Specifications:
 1) Column group names are optional. For column groups that do not have a 
 user-provided name, Zebra will internally assign default column group names 
 that are unique for that table: CG0, CG1, CG2, ... Note: If a name CGx is 
 used by the user, it cannot be used as an internal name.
 2) We introduce an AS clause in Zebra's syntax for column group names. If 
 it occurs, it has to immediately follow [ ]. For example, [a1, a2] as PI 
 secure by user:joe group:secure perm:640; [a3, a4] as General compress by 
 lzo. Note that the keyword AS is case insensitive.
 3) Column group names are unique within one table and are case sensitive, 
 i.e., c1 and C1 are different.
 4) Column group names will be used as the physical column group directory 
 path names.
 5) Zebra V2 will support dropColumnGroup by column group names (will 
 integrate with Raghu's A29 drop column work).
 6) Zebra V2 can support backward compatibility (if there are Zebra V1-created 
 tables in production when V2 is released). More specifically, this means that 
 Zebra V2 can load from V1-created tables and do dropColumnGroup on them.
 7) Renaming is NOT supported.
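
Rules 1 and 3 above can be sketched as a small naming routine. This is an illustrative Python model, not Zebra's code: the function name `assign_column_group_names` and the use of `None` for an unnamed column group are assumptions. It fills in CG0, CG1, ... for unnamed groups, skips any CGx the user already took, and treats names as case sensitive.

```python
from typing import List, Optional


def assign_column_group_names(user_names: List[Optional[str]]) -> List[str]:
    """Return one name per column group, filling gaps with CG0, CG1, ..."""
    taken = set()
    for name in user_names:
        if name is None:
            continue
        # Names are case sensitive, so "c1" and "C1" do not collide.
        if name in taken:
            raise ValueError(f"duplicate column group name: {name}")
        taken.add(name)

    result: List[str] = []
    counter = 0
    for name in user_names:
        if name is None:
            # Skip default names the user has already claimed.
            while f"CG{counter}" in taken:
                counter += 1
            name = f"CG{counter}"
            taken.add(name)
        result.append(name)
    return result


names = assign_column_group_names(["PI", None, "CG0", None])
```

Here the user took `CG0` for the third group, so the two unnamed groups receive `CG1` and `CG2`.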




[jira] Updated: (PIG-975) Need a databag that does not register with SpillableMemoryManager and spills data proactively

2009-10-05 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-975:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Patch committed. Thanks, Ying.

 Need a databag that does not register with SpillableMemoryManager and spills 
 data proactively
 -

 Key: PIG-975
 URL: https://issues.apache.org/jira/browse/PIG-975
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.4.0
Reporter: Ying He
Assignee: Ying He
 Fix For: 0.6.0

 Attachments: internalbag.xls, PIG-975.patch, PIG-975.patch2, 
 PIG-975.patch3, PIG-975.patch4


 POPackage uses DefaultDataBag during the reduce phase to hold data. It is 
 registered with SpillableMemoryManager and prone to OutOfMemoryException.  
 It's better to proactively manage the memory usage. The bag fills 
 memory up to a specified amount and dumps the rest to disk. The amount of 
 memory used to hold tuples is configurable. This can avoid out-of-memory errors.
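
The behaviour described above can be sketched in Python. This is a minimal model under stated assumptions, not the committed patch: the class name `ProactiveSpillBag` is hypothetical, and the memory budget is expressed as a tuple count (`max_in_memory`) rather than bytes to keep the example short. Tuples are kept in a list until the limit is reached, and everything after that is written straight to a temporary file.

```python
import os
import pickle
import tempfile
from typing import Iterator, List


class ProactiveSpillBag:
    """Holds up to max_in_memory tuples in memory; the rest go to disk."""

    def __init__(self, max_in_memory: int) -> None:
        self.max_in_memory = max_in_memory  # assumed budget, in tuples
        self._memory: List[tuple] = []
        self._spill = None                  # temp file, created on first spill
        self._spilled = 0

    def add(self, item: tuple) -> None:
        if len(self._memory) < self.max_in_memory:
            self._memory.append(item)
            return
        if self._spill is None:
            self._spill = tempfile.TemporaryFile()
        # Always append at the end, even if an earlier read moved the cursor.
        self._spill.seek(0, os.SEEK_END)
        pickle.dump(item, self._spill)
        self._spilled += 1

    def __len__(self) -> int:
        return len(self._memory) + self._spilled

    def __iter__(self) -> Iterator[tuple]:
        # In-memory tuples first, then the spilled ones in insertion order.
        # Adding to the bag while iterating is not supported in this sketch.
        yield from self._memory
        if self._spill is not None:
            self._spill.seek(0)
            for _ in range(self._spilled):
                yield pickle.load(self._spill)


bag = ProactiveSpillBag(max_in_memory=2)
items = [(i, f"value-{i}") for i in range(5)]
for item in items:
    bag.add(item)
```

Because the bag decides for itself when to spill, it never depends on a memory manager callback arriving in time, which is the failure mode the issue describes.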
