[jira] Created: (PIG-994) Provide 'append' keyword to allow appending to diferent dataset on pig 2.3 as it is now on hadoop 0.20(which has append feature)
Provide 'append' keyword to allow appending to diferent dataset on pig 2.3 as it is now on hadoop 0.20(which has append feature)
--
Key: PIG-994
URL: https://issues.apache.org/jira/browse/PIG-994
Project: Pig
Issue Type: New Feature
Components: impl
Affects Versions: 0.4.0
Environment: Grid clusters
Reporter: Rekha
Priority: Minor

Provide 'append' keyword to allow appending to diferent dataset on pig 2.3 as it is now on hadoop 0.20(which has append feature)

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-994) Provide 'append' keyword to allow appending to diferent dataset on pig 2.3 as it is now on hadoop 0.20(which has append feature)
[ https://issues.apache.org/jira/browse/PIG-994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12762239#action_12762239 ]

Olga Natkovich commented on PIG-994:
---
Hadoop 0.20 does not have append. It is coming in Hadoop 0.21.
[jira] Updated: (PIG-994) Provide 'append' keyword to allow appending to diferent dataset on pig 2.3 as it is now on hadoop 0.20(which has append feature)
[ https://issues.apache.org/jira/browse/PIG-994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-994:
---
Description: Provide 'append' keyword to allow appending to diferent dataset on pig 0.5.0 as it is now on hadoop 0.20(which has append feature)
(was: Provide 'append' keyword to allow appending to diferent dataset on pig 2.3 as it is now on hadoop 0.20(which has append feature))
[jira] Updated: (PIG-994) Provide 'append' keyword to allow appending to diferent dataset once the feature is available in Hadoop
[ https://issues.apache.org/jira/browse/PIG-994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-994:
---
Summary: Provide 'append' keyword to allow appending to diferent dataset once the feature is available in Hadoop
(was: Provide 'append' keyword to allow appending to diferent dataset on pig 2.3 as it is now on hadoop 0.20(which has append feature))
[jira] Updated: (PIG-922) Logical optimizer: push up project
[ https://issues.apache.org/jira/browse/PIG-922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-922:
---
Status: Open (was: Patch Available)

Logical optimizer: push up project
--
Key: PIG-922
URL: https://issues.apache.org/jira/browse/PIG-922
Project: Pig
Issue Type: New Feature
Components: impl
Affects Versions: 0.3.0
Reporter: Daniel Dai
Assignee: Daniel Dai
Fix For: 0.6.0
Attachments: PIG-922-p1_0.patch, PIG-922-p1_1.patch, PIG-922-p1_2.patch, PIG-922-p1_3.patch, PIG-922-p1_4.patch, PIG-922-p2_preview.patch, PIG-922-p2_preview2.patch, PIG-922-p3_1.patch, PIG-922-p3_2.patch, PIG-922-p3_3.patch, PIG-922-p3_4.patch, PIG-922-p3_5.patch

This is a continuation of [PIG-697|https://issues.apache.org/jira/browse/PIG-697]. We need to add another rule to the logical optimizer: push up project, i.e., prune columns as early as possible.
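The idea behind "push up project" in PIG-922 can be illustrated with a toy model. The sketch below is not Pig's logical optimizer and does not reflect the attached patches; the plan representation (a list of dicts) and the function names `columns_needed_from_load` and `push_up_project` are hypothetical, chosen only to show how walking a plan backwards reveals which loaded columns can be pruned right after the load.

```python
from typing import Dict, List, Optional, Set


def columns_needed_from_load(plan: List[Dict]) -> List[str]:
    """Walk the plan from the last operator back to the load and
    return the loaded columns that some later operator still needs."""
    needed: Optional[Set[str]] = None  # None means "every column"
    for op in reversed(plan):
        kind = op["op"]
        if kind == "foreach":
            # A simple projection: only its output columns survive.
            produced = set(op["columns"])
            needed = produced if needed is None else needed & produced
        elif kind in ("filter", "order"):
            # These operators read extra columns but drop none.
            if needed is not None:
                needed = needed | set(op["uses"])
        elif kind == "load":
            return [c for c in op["schema"] if needed is None or c in needed]
        # Any other operator (e.g. limit) neither reads nor drops columns.
    raise ValueError("plan has no load operator")


def push_up_project(plan: List[Dict]) -> List[Dict]:
    """Return a new plan with a pruning projection placed right after the load."""
    if not plan or plan[0]["op"] != "load":
        raise ValueError("this toy model expects the load to come first")
    keep = columns_needed_from_load(plan)
    load, rest = plan[0], plan[1:]
    if keep == load["schema"]:
        return list(plan)  # nothing to prune
    return [load, {"op": "foreach", "columns": keep}] + rest


# Toy version of: load (a0, a1, a2) -> order by a1 -> limit 10 -> generate a0
plan = [
    {"op": "load", "schema": ["a0", "a1", "a2"]},
    {"op": "order", "uses": ["a1"]},
    {"op": "limit", "n": 10},
    {"op": "foreach", "columns": ["a0"]},
]
optimized = push_up_project(plan)
```

In this toy plan, column a2 is never used after the load, so the inserted projection keeps only a0 and a1 before the sort runs.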
[jira] Updated: (PIG-922) Logical optimizer: push up project
[ https://issues.apache.org/jira/browse/PIG-922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-922:
---
Attachment: PIG-922-p3_5.patch

Resync with trunk
[jira] Updated: (PIG-922) Logical optimizer: push up project
[ https://issues.apache.org/jira/browse/PIG-922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-922:
---
Status: Patch Available (was: Open)
[jira] Created: (PIG-995) Limit Optimizer throw exception ERROR 2156: Error while fixing projections
Limit Optimizer throw exception ERROR 2156: Error while fixing projections
--
Key: PIG-995
URL: https://issues.apache.org/jira/browse/PIG-995
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.3.0
Reporter: Daniel Dai
Fix For: 0.6.0

The following script fails:

A = load '1.txt' AS (a0, a1, a2);
B = order A by a1;
C = limit B 10;
D = foreach C generate $0;
dump D;

Error log:

Caused by: org.apache.pig.impl.plan.VisitorException: ERROR 2156: Error while fixing projections. Projection map of node to be replaced is null.
        at org.apache.pig.impl.logicalLayer.ProjectFixerUpper.visit(ProjectFixerUpper.java:138)
        at org.apache.pig.impl.logicalLayer.LOProject.visit(LOProject.java:408)
        at org.apache.pig.impl.logicalLayer.LOProject.visit(LOProject.java:58)
        at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:65)
        at org.apache.pig.impl.plan.DepthFirstWalker.walk(DepthFirstWalker.java:50)
        at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
        at org.apache.pig.impl.logicalLayer.LOForEach.rewire(LOForEach.java:761)
[jira] Assigned: (PIG-995) Limit Optimizer throw exception ERROR 2156: Error while fixing projections
[ https://issues.apache.org/jira/browse/PIG-995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai reassigned PIG-995:
--
Assignee: Daniel Dai
[jira] Updated: (PIG-993) [zebra] Abitlity to drop a column group in a table
[ https://issues.apache.org/jira/browse/PIG-993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-993:
---
Description: A Zebra table is stored as multiple sub-tables, each containing a set of columns called a column group (CG). The user specifies how the columns are grouped through the _storage hint_ when creating a table. For some large tables, users may need to remove a set of columns and retain the rest. This jira provides a way for users to delete an entire column group. The following comments will have more details on the API and the semantics.
Fix Version/s: (was: 0.5.0)

[zebra] Abitlity to drop a column group in a table
--
Key: PIG-993
URL: https://issues.apache.org/jira/browse/PIG-993
Project: Pig
Issue Type: Bug
Reporter: Raghu Angadi
Assignee: Raghu Angadi
Attachments: DropColumnGroupExample.java, zebra-drop-cg.patch, zebra-drop-cg.patch
[jira] Assigned: (PIG-920) optimizing diamond queries
[ https://issues.apache.org/jira/browse/PIG-920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Ding reassigned PIG-920:
---
Assignee: Richard Ding

optimizing diamond queries
--
Key: PIG-920
URL: https://issues.apache.org/jira/browse/PIG-920
Project: Pig
Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Richard Ding

The following query

A = load 'foo';
B = filter A by $01;
C = filter A by $1 == 'foo';
D = COGROUP C by $0, B by $0;
..

is not executed efficiently. Currently, it runs a map-only job that basically reads and writes the same data before doing the query processing. A query in which the data is loaded twice is actually executed more efficiently. This is not an uncommon query, and we should fix this issue.
[jira] Commented: (PIG-922) Logical optimizer: push up project
[ https://issues.apache.org/jira/browse/PIG-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12762362#action_12762362 ]

Hadoop QA commented on PIG-922:
---
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12421307/PIG-922-p3_5.patch
against trunk revision 821101.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 27 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to introduce 2 new Findbugs warnings.
-1 release audit. The applied patch generated 305 release audit warnings (more than the trunk's current 298 warnings).
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/60/testReport/
Release audit warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/60/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/60/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/60/console

This message is automatically generated.
[jira] Updated: (PIG-986) [zebra] Zebra Column Group Naming Support
[ https://issues.apache.org/jira/browse/PIG-986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chao Wang updated PIG-986:
--
Patch reviewed. +1

[zebra] Zebra Column Group Naming Support
-
Key: PIG-986
URL: https://issues.apache.org/jira/browse/PIG-986
Project: Pig
Issue Type: New Feature
Components: impl
Affects Versions: 0.4.0
Reporter: Chao Wang
Assignee: Chao Wang
Fix For: 0.6.0
Attachments: ColumnGroupName.patch

We introduce column group names to Zebra and make them first-class citizens in Zebra. This can ease management of column groups. We plan to introduce an AS clause for column group names in Zebra's syntax.

Functional Specifications:

1) Column group names are optional. For column groups that do not have a user-provided name, Zebra will internally assign default column group names that are unique within that table: CG0, CG1, CG2 ... Note: If CGx is used by the user, then it cannot be used as an internal name.
2) We introduce an AS clause in Zebra's syntax for column group names. If it occurs, it has to immediately follow [ ]. For example: [a1, a2] as PI secure by user:joe group:secure perm:640; [a3, a4] as General compress by lzo. Note that the keyword AS is case insensitive.
3) Column group names are unique within one table and are case sensitive, i.e., c1 and C1 are different.
4) Column group names will be used as the physical column group directory path names.
5) Zebra V2 will support dropColumnGroup by column group name (will integrate with Raghu's A29 drop column work).
6) Zebra V2 can support backward compatibility (if there are Zebra V1-created tables in production when V2 is released). More specifically, this means that Zebra V2 can load from V1-created tables and do dropColumnGroup on them.
7) Renaming is NOT supported.
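The naming rules in items 1) and 3) above can be sketched as follows. This is only an illustration in Python, not Zebra's implementation: the function name `assign_column_group_names` is hypothetical, and the choice to skip to the next free CGn index when the user has already taken a CGx name is an assumption, since the specification only says such a name cannot be reused internally.

```python
from typing import List, Optional


def assign_column_group_names(user_names: List[Optional[str]]) -> List[str]:
    """Give every column group a name.

    user_names holds one entry per column group: the user-provided name,
    or None where the storage hint had no AS clause.
    """
    taken = set()
    for name in user_names:
        if name is None:
            continue
        # Names are case sensitive, so "c1" and "C1" do not collide.
        if name in taken:
            raise ValueError(f"duplicate column group name: {name}")
        taken.add(name)

    result = []
    counter = 0
    for name in user_names:
        if name is None:
            # Assumption: skip any CGn that the user already claimed.
            while f"CG{counter}" in taken:
                counter += 1
            name = f"CG{counter}"
            taken.add(name)
        result.append(name)
    return result


names = assign_column_group_names([None, "CG1", None])
```

Here the two unnamed groups receive CG0 and CG2, because CG1 is already used by the user.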
[jira] Updated: (PIG-975) Need a databag that does not register with SpillableMemoryManager and spill data pro-actively
[ https://issues.apache.org/jira/browse/PIG-975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-975:
---
Resolution: Fixed
Status: Resolved (was: Patch Available)

patch committed. Thanks, Ying

Need a databag that does not register with SpillableMemoryManager and spill data pro-actively
-
Key: PIG-975
URL: https://issues.apache.org/jira/browse/PIG-975
Project: Pig
Issue Type: Improvement
Affects Versions: 0.4.0
Reporter: Ying He
Assignee: Ying He
Fix For: 0.6.0
Attachments: internalbag.xls, PIG-975.patch, PIG-975.patch2, PIG-975.patch3, PIG-975.patch4

POPackage uses DefaultDataBag during the reduce process to hold data. It is registered with SpillableMemoryManager and is prone to OutOfMemoryException. It is better to manage memory usage proactively: the bag fills memory up to a specified amount and dumps the rest to disk. The amount of memory used to hold tuples is configurable. This can avoid out-of-memory errors.
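The behaviour described in PIG-975 (keep tuples in memory up to a configured amount, write the rest to disk) can be sketched in a few lines. This is a Python illustration, not the committed Pig patch: the class name `ProactiveSpillBag` is hypothetical, and the limit is expressed as a tuple count as a simplifying assumption, whereas the issue speaks of a configurable amount of memory.

```python
import os
import pickle
import tempfile


class ProactiveSpillBag:
    """Holds up to max_in_memory tuples in a list and spills the rest to disk."""

    def __init__(self, max_in_memory: int):
        self.max_in_memory = max_in_memory
        self._memory = []
        self._spill = None      # temporary file, created on first spill
        self._spilled = 0       # number of tuples written to the file

    def add(self, tup):
        if len(self._memory) < self.max_in_memory:
            self._memory.append(tup)
            return
        if self._spill is None:
            self._spill = tempfile.TemporaryFile()
        # Always append at the end, even if a reader moved the file position.
        self._spill.seek(0, os.SEEK_END)
        pickle.dump(tup, self._spill)
        self._spilled += 1

    def __len__(self):
        return len(self._memory) + self._spilled

    def __iter__(self):
        # In-memory tuples first, then the spilled ones in insertion order.
        yield from self._memory
        if self._spill is not None:
            self._spill.seek(0)
            for _ in range(self._spilled):
                yield pickle.load(self._spill)


bag = ProactiveSpillBag(max_in_memory=2)
for i in range(5):
    bag.add((i, f"value-{i}"))
contents = list(bag)
```

Because the bag decides by itself when to spill, it never has to wait for a memory manager to react to a low-memory notification, which is the point the issue makes about avoiding out-of-memory errors.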