[jira] Commented: (PIG-1053) Consider moving to Hadoop for local mode
[ https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12770336#action_12770336 ] Raghu Angadi commented on PIG-1053: --- A big +1. It is understandable, from a Pig developer's point of view, to be annoyed by beginners complaining about run time with toy local inputs. Maybe a clear heads-up in the tutorial would reduce those. Consider moving to Hadoop for local mode Key: PIG-1053 URL: https://issues.apache.org/jira/browse/PIG-1053 Project: Pig Issue Type: Improvement Reporter: Alan Gates We need to consider moving Pig to use Hadoop's local mode instead of its own. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-986) [zebra] Zebra Column Group Naming Support
[ https://issues.apache.org/jira/browse/PIG-986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raghu Angadi updated PIG-986: - Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) I just committed this. Thanks Yan. [zebra] Zebra Column Group Naming Support - Key: PIG-986 URL: https://issues.apache.org/jira/browse/PIG-986 Project: Pig Issue Type: New Feature Components: impl Affects Versions: 0.4.0 Reporter: Chao Wang Assignee: Chao Wang Fix For: 0.6.0 Attachments: ColumnGroupName.patch, ColumnGroupName.patch, ColumnGroupName.patch We introduce column group name to Zebra and make it a first-class citizen in Zebra. This can ease management of column groups. We plan to introduce an as clause for column group name in Zebra's syntax. Functional Specifications: 1) Column group names are optional. For column groups which do not have a user-provided name, Zebra will assign some default column group names internally that is unique for that table - CG0, CG1, CG2 ... Note: If CGx is used by user, then it can not be used for internal names. 2) We introduce an AS clause in Zebra's syntax for column group names. If it occurs, it has to immediately follow [ ]. For example, [a1, a2] as PI secure by user:joe group:secure perm:640; [a3, a4] as General compress by lzo. Note that keyword AS is case insensitive. 3) Column group names are unique within one table and are case sensitive, i.e., c1 and C1 are different. 4) Column group names will be used as the physical column group directory path names. 5) Zebra V2 will support dropColumnGroup by column group names (will integrate with Raghu's A29 drop column work). 6) Zebra V2 can support backward compatibility (If there are Zebra V1 created tables in production when V2 is released). More specifically, this means that Zebra V2 can load from V1-created tables and do dropColumnGroup on it. 7) Does NOT support renaming. -- This message is automatically generated by JIRA. 
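A minimal sketch of rules 1) and 3) of the PIG-986 specification above, in plain Java. The class ColumnGroupNamer and the method assignNames are hypothetical and not part of Zebra. How Zebra actually picks the numbers for CG0, CG1, ... is not stated in the issue, so the sketch assumes a simple counter that skips any name the user has already taken.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Zebra: illustrates rules 1) and 3) only.
class ColumnGroupNamer {
    /**
     * Returns one name per column group. A null entry means the user gave no name.
     * Assumption: defaults are handed out as CG0, CG1, ... by a counter that
     * skips names the user already used.
     */
    static List<String> assignNames(List<String> userNames) {
        Set<String> taken = new HashSet<>();
        for (String name : userNames) {
            // Names are case sensitive, so "c1" and "C1" do not collide.
            if (name != null && !taken.add(name)) {
                throw new IllegalArgumentException("Duplicate column group name: " + name);
            }
        }
        List<String> result = new ArrayList<>();
        int next = 0;
        for (String name : userNames) {
            if (name == null) {
                String candidate;
                do {
                    candidate = "CG" + next++;
                } while (taken.contains(candidate));
                taken.add(candidate);
                name = candidate;
            }
            result.add(name);
        }
        return result;
    }

    public static void main(String[] args) {
        // The user named the first and third groups; the third name looks like a default.
        System.out.println(assignNames(Arrays.asList("PI", null, "CG0", null)));
    }
}
```

With the input shown in main, the unnamed groups cannot receive CG0 because the user already used it, so they get the next free default names instead.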
[jira] Commented: (PIG-993) [zebra] Abitlity to drop a column group in a table
[ https://issues.apache.org/jira/browse/PIG-993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12764552#action_12764552 ] Raghu Angadi commented on PIG-993: -- This patch depends on PIG-992. It is not a functional dependency and can be removed if required. [zebra] Abitlity to drop a column group in a table -- Key: PIG-993 URL: https://issues.apache.org/jira/browse/PIG-993 Project: Pig Issue Type: Bug Reporter: Raghu Angadi Assignee: Raghu Angadi Fix For: 0.6.0 Attachments: DropColumnGroupExample.java, zebra-drop-cg.patch, zebra-drop-cg.patch A Zebra table is stored as multiple sub tables each containing a set of columns called column group (CG). The user specifies how these columns are grouped while creating a table through the _storage hint_. For some of the large tables, it might be necessary for users to remove a set of columns and retain the rest. This jira provides a way for users to delete an entire column group. The following comments will have more details on API and the semantics. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-986) [zebra] Zebra Column Group Naming Support
[ https://issues.apache.org/jira/browse/PIG-986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raghu Angadi updated PIG-986: - Status: Open (was: Patch Available) [zebra] Zebra Column Group Naming Support - Key: PIG-986 URL: https://issues.apache.org/jira/browse/PIG-986 Project: Pig Issue Type: New Feature Components: impl Affects Versions: 0.4.0 Reporter: Chao Wang Assignee: Chao Wang Fix For: 0.6.0 Attachments: ColumnGroupName.patch, ColumnGroupName.patch, ColumnGroupName.patch We introduce column group name to Zebra and make it a first-class citizen in Zebra. This can ease management of column groups. We plan to introduce an as clause for column group name in Zebra's syntax. Functional Specifications: 1) Column group names are optional. For column groups which do not have a user-provided name, Zebra will assign some default column group names internally that is unique for that table - CG0, CG1, CG2 ... Note: If CGx is used by user, then it can not be used for internal names. 2) We introduce an AS clause in Zebra's syntax for column group names. If it occurs, it has to immediately follow [ ]. For example, [a1, a2] as PI secure by user:joe group:secure perm:640; [a3, a4] as General compress by lzo. Note that keyword AS is case insensitive. 3) Column group names are unique within one table and are case sensitive, i.e., c1 and C1 are different. 4) Column group names will be used as the physical column group directory path names. 5) Zebra V2 will support dropColumnGroup by column group names (will integrate with Raghu's A29 drop column work). 6) Zebra V2 can support backward compatibility (If there are Zebra V1 created tables in production when V2 is released). More specifically, this means that Zebra V2 can load from V1-created tables and do dropColumnGroup on it. 7) Does NOT support renaming. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-986) [zebra] Zebra Column Group Naming Support
[ https://issues.apache.org/jira/browse/PIG-986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raghu Angadi updated PIG-986: - Status: Patch Available (was: Open) [zebra] Zebra Column Group Naming Support - Key: PIG-986 URL: https://issues.apache.org/jira/browse/PIG-986 Project: Pig Issue Type: New Feature Components: impl Affects Versions: 0.4.0 Reporter: Chao Wang Assignee: Chao Wang Fix For: 0.6.0 Attachments: ColumnGroupName.patch, ColumnGroupName.patch, ColumnGroupName.patch We introduce column group name to Zebra and make it a first-class citizen in Zebra. This can ease management of column groups. We plan to introduce an as clause for column group name in Zebra's syntax. Functional Specifications: 1) Column group names are optional. For column groups which do not have a user-provided name, Zebra will assign some default column group names internally that is unique for that table - CG0, CG1, CG2 ... Note: If CGx is used by user, then it can not be used for internal names. 2) We introduce an AS clause in Zebra's syntax for column group names. If it occurs, it has to immediately follow [ ]. For example, [a1, a2] as PI secure by user:joe group:secure perm:640; [a3, a4] as General compress by lzo. Note that keyword AS is case insensitive. 3) Column group names are unique within one table and are case sensitive, i.e., c1 and C1 are different. 4) Column group names will be used as the physical column group directory path names. 5) Zebra V2 will support dropColumnGroup by column group names (will integrate with Raghu's A29 drop column work). 6) Zebra V2 can support backward compatibility (If there are Zebra V1 created tables in production when V2 is released). More specifically, this means that Zebra V2 can load from V1-created tables and do dropColumnGroup on it. 7) Does NOT support renaming. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-987) [zebra] Zebra Column Group Access Control
[ https://issues.apache.org/jira/browse/PIG-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12763836#action_12763836 ] Raghu Angadi commented on PIG-987: -- Thanks Yan. It might be better to remove gauravj also since it is ignored anyway. This implies column access control is not tested in this patch, right? [zebra] Zebra Column Group Access Control - Key: PIG-987 URL: https://issues.apache.org/jira/browse/PIG-987 Project: Pig Issue Type: New Feature Affects Versions: 0.6.0 Reporter: Yan Zhou Assignee: Yan Zhou Attachments: ColumnGroupSecurity.patch, ColumnGroupSecurity.patch, ColumnGroupSecurity.patch, TEST-org.apache.hadoop.zebra.io.TestCheckin.txt, TEST-org.apache.hadoop.zebra.mapred.TestCheckin.txt, tmp-987-plus-991.patch Access Control: when processes try to read from the column groups, Zebra should be able to handle allowed vs. disallowed user/application accesses. The security is eventually granted by corresponding HDFS security of the data stored. Expected behavior when column group permissions are set: When a user selects only columns that they do not have permissions to access, Zebra should return error with message Error #: Permission denied for accessing column column name or names Access control applies to an entire column group, so all columns in a column group have the same permissions. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-991) [zebra] A few minor bugs as described in the Description section
[ https://issues.apache.org/jira/browse/PIG-991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raghu Angadi updated PIG-991: - Attachment: Bugs-2.patch I am committing a slightly modified patch. I removed the following lines that modified build.xml at the top level. Please ask one of the PIG committers to commit that change. The part that is removed : {noformat} @@ -940,4 +942,13 @@ target name=published depends=ivy-publish-local, maven-artifacts/ +target name=pig-test +jar + jarfile=${build.dir}/pig-test-${version}.jar + basedir=${build.dir}/test/classes + excludes=**/Test*.class + +/jar +/target + /project {noformat} [zebra] A few minor bugs as described in the Description section Key: PIG-991 URL: https://issues.apache.org/jira/browse/PIG-991 Project: Pig Issue Type: Bug Affects Versions: 0.4.0 Reporter: Yan Zhou Assignee: Yan Zhou Priority: Minor Fix For: 0.6.0 Attachments: Bugs-2.patch, Bugs.patch 1) lzo2 was used as the compressor name for the LZO compression algorithm; it should be lzo instead; 2) the default compression is changed from lzo to gz for gzip; 3) In JAVACC file SchemaParser.jjt, the package name was wrong using the old package org.apache.pig.table.types; 4) in build.xml, two new javacc targets are added to generate TableSchemaParser and TableStorageParser java codes; 5) Support of column group security ( https://issues.apache.org/jira/browse/PIG-987 ) lacked support of the dumpinfo method: the groups and permissions were not displayed. Note that as a consequence, the patch herein must be applied after that of JIRA987. 6) and 7) a couple of issues reported in Jira917. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-987) [zebra] Zebra Column Group Access Control
[ https://issues.apache.org/jira/browse/PIG-987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raghu Angadi updated PIG-987: - Resolution: Fixed Fix Version/s: 0.6.0 Status: Resolved (was: Patch Available) I just committed this. Thanks Yan! [zebra] Zebra Column Group Access Control - Key: PIG-987 URL: https://issues.apache.org/jira/browse/PIG-987 Project: Pig Issue Type: New Feature Affects Versions: 0.6.0 Reporter: Yan Zhou Assignee: Yan Zhou Fix For: 0.6.0 Attachments: ColumnGroupSecurity.patch, ColumnGroupSecurity.patch, ColumnGroupSecurity.patch, TEST-org.apache.hadoop.zebra.io.TestCheckin.txt, TEST-org.apache.hadoop.zebra.mapred.TestCheckin.txt, tmp-987-plus-991.patch Access Control: when processes try to read from the column groups, Zebra should be able to handle allowed vs. disallowed user/application accesses. The security is eventuallt granted by corresponding HDFS security of the data stored. Expected behavior when column group permissions are set: When user selects only columns that they do not have permissions to access, Zebra should return error with message Error #: Permission denied for accessing column column name or names Access control applies to an entire column group, so all columns in a column group have same permissions. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-991) [zebra] A few minor bugs as described in the Description section
[ https://issues.apache.org/jira/browse/PIG-991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raghu Angadi updated PIG-991: - Resolution: Fixed Status: Resolved (was: Patch Available) I just committed this. Thanks Yan. [zebra] A few minor bugs as described in the Description section Key: PIG-991 URL: https://issues.apache.org/jira/browse/PIG-991 Project: Pig Issue Type: Bug Affects Versions: 0.4.0 Reporter: Yan Zhou Assignee: Yan Zhou Priority: Minor Fix For: 0.6.0 Attachments: Bugs-2.patch, Bugs.patch 1) lzo2 was used as the compressor name for the LZO compression algorithm; it should be lzo instead; 2) the default compression is changed from lzo to gz for gzip; 3) In JAVACC file SchemaParser.jjt, the package name was wrong using the old package org.apache.pig.table.types; 4) in build.xml, two new javacc targets are added to generate TableSchemaParser and TableStorageParser java codes; 5) Support of column group security ( https://issues.apache.org/jira/browse/PIG-987 ) lacked support of the dumpinfo method: the groups and permissions were not displayed. Note that as a consequence, the patch herein must be applied after that of JIRA987. 6) and 7) a couple of issues reported in Jira917. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-987) [zebra] Zebra Column Group Access Control
[ https://issues.apache.org/jira/browse/PIG-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12763346#action_12763346 ] Raghu Angadi commented on PIG-987: -- I finally got some time to look into this. Yes, I think it should be fixed in the tests. TestColumnGroup.java says: {noformat} ColumnGroup.Writer writer = new ColumnGroup.Writer(path, strSchema, sorted, "pig", "gz", "gauravj", "users", (short) Short.parseShort("755", 8), false, conf); {noformat} using the local FS. How can we expect users to have a user name gauravj on their machines and run as superusers :)? It just cannot be done. If the test wants to run with these permissions we should: a) use HDFS (MiniDFSCluster) rather than the local filesystem. The tester has all the permissions on a MiniDFS. b) minor: use a more generic name than gauravj. [zebra] Zebra Column Group Access Control - Key: PIG-987 URL: https://issues.apache.org/jira/browse/PIG-987 Project: Pig Issue Type: New Feature Affects Versions: 0.6.0 Reporter: Yan Zhou Assignee: Yan Zhou Attachments: ColumnGroupSecurity.patch, ColumnGroupSecurity.patch, TEST-org.apache.hadoop.zebra.io.TestCheckin.txt, TEST-org.apache.hadoop.zebra.mapred.TestCheckin.txt, tmp-987-plus-991.patch Access Control: when processes try to read from the column groups, Zebra should be able to handle allowed vs. disallowed user/application accesses. The security is eventually granted by corresponding HDFS security of the data stored. Expected behavior when column group permissions are set: When a user selects only columns that they do not have permissions to access, Zebra should return error with message Error #: Permission denied for accessing column column name or names Access control applies to an entire column group, so all columns in a column group have the same permissions. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
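The permission argument in the test snippet quoted above is a Unix-style mode parsed as base 8. A small standalone sketch of that one call follows; the class OctalPermission and its parse method are hypothetical names used for illustration, while Short.parseShort(String, int) and Integer.toOctalString(int) are standard JDK methods.

```java
// Hypothetical wrapper class; only the two JDK calls are real API.
class OctalPermission {
    /** Parses a Unix-style permission string such as "755" as a base-8 number. */
    static short parse(String octal) {
        // parseShort takes the digits as text plus the radix to read them in.
        return Short.parseShort(octal, 8);
    }

    public static void main(String[] args) {
        short perm = parse("755");
        System.out.println(perm);                        // prints 493 (decimal value of octal 755)
        System.out.println(Integer.toOctalString(perm)); // prints 755
    }
}
```

The first argument has to be a string: Short.parseShort has no overload that accepts an int for the digits.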
[jira] Commented: (PIG-987) [zebra] Zebra Column Group Access Control
[ https://issues.apache.org/jira/browse/PIG-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12762812#action_12762812 ] Raghu Angadi commented on PIG-987: -- I tried to commit this patch. 'ant test' says all the tests fail, whereas only one or two tests fail without the patch. Does Hudson actually run Zebra tests? [zebra] Zebra Column Group Access Control - Key: PIG-987 URL: https://issues.apache.org/jira/browse/PIG-987 Project: Pig Issue Type: New Feature Affects Versions: 0.6.0 Reporter: Yan Zhou Assignee: Yan Zhou Attachments: ColumnGroupSecurity.patch Access Control: when processes try to read from the column groups, Zebra should be able to handle allowed vs. disallowed user/application accesses. The security is eventually granted by corresponding HDFS security of the data stored. Expected behavior when column group permissions are set: When a user selects only columns that they do not have permissions to access, Zebra should return error with message Error #: Permission denied for accessing column column name or names Access control applies to an entire column group, so all columns in a column group have the same permissions. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-991) [zebra] A few minor bugs as described in the Description section
[ https://issues.apache.org/jira/browse/PIG-991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raghu Angadi updated PIG-991: - Release Note: (was: Patch should be applied after that of Jira987.) bq. Patch should be applied after that of Jira987. [moved above comment from 'Release Notes' to this comment]. [zebra] A few minor bugs as described in the Description section Key: PIG-991 URL: https://issues.apache.org/jira/browse/PIG-991 Project: Pig Issue Type: Bug Affects Versions: 0.4.0 Reporter: Yan Zhou Assignee: Yan Zhou Priority: Minor Fix For: 0.6.0 Attachments: Bugs.patch 1) lzo2 was used as the compressor name for the LZO compression algorithm; it should be lzo instead; 2) the default compression is changed from lzo to gz for gzip; 3) In JAVACC file SchemaParser.jjt, the package name was wrong using the old package org.apache.pig.table.types; 4) in build.xml, two new javacc targets are added to generate TableSchemaParser and TableStorageParser java codes; 5) Support of column group security ( https://issues.apache.org/jira/browse/PIG-987 ) lacked support of the dumpinfo method: the groups and permissions were not displayed. Note that as a consequence, the patch herein must be applied after that of JIRA987. 6) and 7) a couple of issues reported in Jira917. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-987) [zebra] Zebra Column Group Access Control
[ https://issues.apache.org/jira/browse/PIG-987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raghu Angadi updated PIG-987: - Attachment: TEST-org.apache.hadoop.zebra.mapred.TestCheckin.txt I am attaching {{mapred.TestCheckin.txt}} that passes without the patch. Btw, not all tests pass even without the patch. What is the environment required? I did a fresh checkout and ran 'ant test'. I guess the test failures on trunk are related to lzo. But I didn't expect more failures with the patch. Looks like PIG-991 removes the lzo dependency. I will try with that patch included. [zebra] Zebra Column Group Access Control - Key: PIG-987 URL: https://issues.apache.org/jira/browse/PIG-987 Project: Pig Issue Type: New Feature Affects Versions: 0.6.0 Reporter: Yan Zhou Assignee: Yan Zhou Attachments: ColumnGroupSecurity.patch, TEST-org.apache.hadoop.zebra.mapred.TestCheckin.txt Access Control: when processes try to read from the column groups, Zebra should be able to handle allowed vs. disallowed user/application accesses. The security is eventually granted by corresponding HDFS security of the data stored. Expected behavior when column group permissions are set: When a user selects only columns that they do not have permissions to access, Zebra should return error with message Error #: Permission denied for accessing column column name or names Access control applies to an entire column group, so all columns in a column group have the same permissions. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-987) [zebra] Zebra Column Group Access Control
[ https://issues.apache.org/jira/browse/PIG-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12762829#action_12762829 ] Raghu Angadi commented on PIG-987: -- Not sure if this is related to PIG. When I applied PIG-991 over this, the tests passed (except the ones that fail on trunk). [zebra] Zebra Column Group Access Control - Key: PIG-987 URL: https://issues.apache.org/jira/browse/PIG-987 Project: Pig Issue Type: New Feature Affects Versions: 0.6.0 Reporter: Yan Zhou Assignee: Yan Zhou Attachments: ColumnGroupSecurity.patch, TEST-org.apache.hadoop.zebra.mapred.TestCheckin.txt Access Control: when processes try to read from the column groups, Zebra should be able to handle allowed vs. disallowed user/application accesses. The security is eventuallt granted by corresponding HDFS security of the data stored. Expected behavior when column group permissions are set: When user selects only columns that they do not have permissions to access, Zebra should return error with message Error #: Permission denied for accessing column column name or names Access control applies to an entire column group, so all columns in a column group have same permissions. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-993) [zebra] Abitlity to drop a column group in a table
[ https://issues.apache.org/jira/browse/PIG-993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raghu Angadi updated PIG-993: - Fix Version/s: 0.6.0 [zebra] Abitlity to drop a column group in a table -- Key: PIG-993 URL: https://issues.apache.org/jira/browse/PIG-993 Project: Pig Issue Type: Bug Reporter: Raghu Angadi Assignee: Raghu Angadi Fix For: 0.6.0 Attachments: DropColumnGroupExample.java, zebra-drop-cg.patch, zebra-drop-cg.patch A Zebra table is stored as multiple sub tables each containing a set of columns called column group (CG). The user specifies how these columns are grouped while creating a table through the _storage hint_. For some of the large tables, it might be necessary for users to remove a set of columns and retain the rest. This jira provides a way for users to delete an entire column group. The following comments will have more details on API and the semantics. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-987) [zebra] Zebra Column Group Access Control
[ https://issues.apache.org/jira/browse/PIG-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12762871#action_12762871 ] Raghu Angadi commented on PIG-987: -- Even with PIG-991 included, I am seeing lzo related failures. Could you run tests on a clean checkout? If you didn't see the errors before then you probably have lzo set up in your environment, which is not a requirement. [zebra] Zebra Column Group Access Control - Key: PIG-987 URL: https://issues.apache.org/jira/browse/PIG-987 Project: Pig Issue Type: New Feature Affects Versions: 0.6.0 Reporter: Yan Zhou Assignee: Yan Zhou Attachments: ColumnGroupSecurity.patch, TEST-org.apache.hadoop.zebra.mapred.TestCheckin.txt Access Control: when processes try to read from the column groups, Zebra should be able to handle allowed vs. disallowed user/application accesses. The security is eventuallt granted by corresponding HDFS security of the data stored. Expected behavior when column group permissions are set: When user selects only columns that they do not have permissions to access, Zebra should return error with message Error #: Permission denied for accessing column column name or names Access control applies to an entire column group, so all columns in a column group have same permissions. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-993) [zebra] Abitlity to drop a column group in a table
[zebra] Abitlity to drop a column group in a table -- Key: PIG-993 URL: https://issues.apache.org/jira/browse/PIG-993 Project: Pig Issue Type: Bug Reporter: Raghu Angadi Assignee: Raghu Angadi Fix For: 0.5.0 A Zebra table is stored as multiple sub tables each containing a set of columns called column group (CG). The user specifies how these columns are grouped while creating a table through the _storage hint_. For some of the large tables, it might be necessary for users to remove a set of columns and retain the rest. This jira provides a way for users to delete an entire column group. The following comments will have more details on API and the semantics. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-993) [zebra] Abitlity to drop a column group in a table
[ https://issues.apache.org/jira/browse/PIG-993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12761766#action_12761766 ] Raghu Angadi commented on PIG-993: -- The API is pretty simple: {code} class org.apache.hadoop.zebra.BasicTable { /** see the patch for JavaDoc and attached example for usage */ public static void dropColumnGroup(Path path, Configuration conf, String cgName) throws IOException { ... } } {code} * The table schema is not modified. * This API takes a name for a column group. PIG-986 adds explicit names for CGs. * Once a CG is deleted, NULL is returned for the fields that were stored in the CG. ** This is the main difference between just manually deleting a directory on the filesystem and 'properly' deleting a CG. ** Many changes made in other parts of Zebra are related to handling the missing CGs. [zebra] Abitlity to drop a column group in a table -- Key: PIG-993 URL: https://issues.apache.org/jira/browse/PIG-993 Project: Pig Issue Type: Bug Reporter: Raghu Angadi Assignee: Raghu Angadi Fix For: 0.5.0 A Zebra table is stored as multiple sub tables each containing a set of columns called column group (CG). The user specifies how these columns are grouped while creating a table through the _storage hint_. For some of the large tables, it might be necessary for users to remove a set of columns and retain the rest. This jira provides a way for users to delete an entire column group. The following comments will have more details on API and the semantics. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
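The semantics listed in the comment above (schema unchanged, NULL for fields of a dropped CG) can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions: ToyTable and all of its methods are hypothetical, it holds a single row, and it does not call or imitate the real BasicTable.dropColumnGroup implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy single-row model; none of these names come from the Zebra code base.
class ToyTable {
    private final Map<String, List<String>> columnGroups = new LinkedHashMap<>();
    private final Map<String, Object> row = new LinkedHashMap<>();
    private final Set<String> dropped = new HashSet<>();

    void addColumnGroup(String cgName, List<String> columns) {
        columnGroups.put(cgName, columns);
    }

    void put(String column, Object value) {
        row.put(column, value);
    }

    /** The schema lists every column, including those of dropped groups. */
    List<String> schema() {
        List<String> columns = new ArrayList<>();
        for (List<String> cg : columnGroups.values()) {
            columns.addAll(cg);
        }
        return columns;
    }

    /** Marks a column group as dropped without touching the schema. */
    void dropColumnGroup(String cgName) {
        if (!columnGroups.containsKey(cgName)) {
            throw new IllegalArgumentException("No such column group: " + cgName);
        }
        dropped.add(cgName);
    }

    /** Returns null for any column whose group has been dropped. */
    Object get(String column) {
        for (Map.Entry<String, List<String>> cg : columnGroups.entrySet()) {
            if (cg.getValue().contains(column)) {
                return dropped.contains(cg.getKey()) ? null : row.get(column);
            }
        }
        throw new IllegalArgumentException("Unknown column: " + column);
    }

    public static void main(String[] args) {
        ToyTable table = new ToyTable();
        table.addColumnGroup("PI", Arrays.asList("a1", "a2"));
        table.addColumnGroup("General", Arrays.asList("a3", "a4"));
        table.put("a1", "secret");
        table.put("a3", "public");
        table.dropColumnGroup("PI");
        System.out.println(table.schema());  // all four columns are still listed
        System.out.println(table.get("a1")); // null: its group was dropped
        System.out.println(table.get("a3")); // unaffected column
    }
}
```

The point of the model is the contrast drawn in the comment: a reader that knows a group was dropped can answer with NULL, whereas a directory that simply vanished from the filesystem would look like a corrupt table.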
[jira] Updated: (PIG-993) [zebra] Abitlity to drop a column group in a table
[ https://issues.apache.org/jira/browse/PIG-993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raghu Angadi updated PIG-993: - Attachment: zebra-drop-cg.patch DropColumnGroupExample.java Attachments: DropColumnGroupExample.java : a simple example to illustrate the functionality. zebra-drop-cg.patch : This patch would apply only after a patch for PIG-896. Some of the tests included there were written by Jing Huang. Jing also helped with testing the patch on real clusters with various errors. Yan Zhou helped with correctly handling missing column groups. [zebra] Abitlity to drop a column group in a table -- Key: PIG-993 URL: https://issues.apache.org/jira/browse/PIG-993 Project: Pig Issue Type: Bug Reporter: Raghu Angadi Assignee: Raghu Angadi Fix For: 0.5.0 Attachments: DropColumnGroupExample.java, zebra-drop-cg.patch A Zebra table is stored as multiple sub tables each containing a set of columns called column group (CG). The user specifies how these columns are grouped while creating a table through the _storage hint_. For some of the large tables, it might be necessary for users to remove a set of columns and retain the rest. This jira provides a way for users to delete an entire column group. The following comments will have more details on API and the semantics. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-993) [zebra] Abitlity to drop a column group in a table
[ https://issues.apache.org/jira/browse/PIG-993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12761769#action_12761769 ] Raghu Angadi commented on PIG-993: -- zebra-drop-cg.patch : This patch would apply only after a patch for PIG-896. I meant to say PIG-986. [zebra] Abitlity to drop a column group in a table -- Key: PIG-993 URL: https://issues.apache.org/jira/browse/PIG-993 Project: Pig Issue Type: Bug Reporter: Raghu Angadi Assignee: Raghu Angadi Fix For: 0.5.0 Attachments: DropColumnGroupExample.java, zebra-drop-cg.patch A Zebra table is stored as multiple sub tables each containing a set of columns called column group (CG). The user specifies how these columns are grouped while creating a table through the _storage hint_. For some of the large tables, it might be necessary for users to remove a set of columns and retain the rest. This jira provides a way for users to delete an entire column group. The following comments will have more details on API and the semantics. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-985) [zebra] Make necessary changes to build scripts to accommodate new zebra features plus other improvement.
[ https://issues.apache.org/jira/browse/PIG-985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12761045#action_12761045 ] Raghu Angadi commented on PIG-985: -- 5) drop column group change (Raghu Angadi) 6) schema package separation change (Yan Zhou) Just to clarify, this patch does not contain the above two features. It only contains couple of minor changes made in build.xml as part of these changes. Separate jiras will be filed for these two and other features soon. [zebra] Make necessary changes to build scripts to accommodate new zebra features plus other improvement. - Key: PIG-985 URL: https://issues.apache.org/jira/browse/PIG-985 Project: Pig Issue Type: Task Components: build Reporter: Chao Wang Assignee: Chao Wang Attachments: patch The whole task consists of a series of steps as follows: 1) nightly test change - prevent checkin tests from running twice in nightly (Chao Wang) 2) row based block splits for tables change (Raghu Angadi) 3) add clover target (Jing Huang) 4) add findbugs target (Chao Wang) 5) drop column group change (Raghu Angadi) 6) schema package separation change (Yan Zhou) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-949) Zebra Bug: splitting map into multiple column group using storage hint causes unexpected behaviour
[ https://issues.apache.org/jira/browse/PIG-949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raghu Angadi updated PIG-949: - Resolution: Fixed Fix Version/s: (was: 0.4.0) Status: Resolved (was: Patch Available) Zebra Bug: splitting map into multiple column group using storage hint causes unexpected behaviour -- Key: PIG-949 URL: https://issues.apache.org/jira/browse/PIG-949 Project: Pig Issue Type: Bug Affects Versions: 0.4.0 Environment: linux Reporter: Alok Singh Assignee: Yan Zhou Fix For: 0.5.0 Attachments: Pig_949.patch, Pig_949.patch, Pig_949.patch Hi, The storage hint specification plays an important part in whether the output table is readable or not. Say we have the map 'map'. One can split the map into a column group using [map#{k1}, map#{k2}...]; however, the remaining map fields will automatically be added to the default group. If the user tries to create a new column group for the remaining fields as follows, [map#{k1}, map#{k2}, ..][map], i.e. create a separate column group, the table writer will create the table. However, if one tries to load the created table via Pig or via MapReduce using TableInputFormat, then the reader has a problem reading the map. We get the following stack trace: 09/09/09 00:09:45 INFO mapred.JobClient: Task Id : attempt_200908191538_33939_m_21_2, Status : FAILED java.io.IOException: getValue() failed: null at org.apache.hadoop.zebra.io.BasicTable$Reader$BTScanner.getValue(BasicTable.java:775) at org.apache.hadoop.zebra.mapred.TableRecordReader.next(TableInputFormat.java:717) at org.apache.hadoop.zebra.mapred.TableRecordReader.next(TableInputFormat.java:651) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:191) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:175) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:356) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305) at org.apache.hadoop.mapred.Child.main(Child.java:170) Alok -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-949) Zebra Bug: splitting map into multiple column group using storage hint causes unexpected behaviour
[ https://issues.apache.org/jira/browse/PIG-949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12759789#action_12759789 ] Raghu Angadi commented on PIG-949: -- I just committed this. Thanks Yan for the fix and Jing for the test! Zebra Bug: splitting map into multiple column group using storage hint causes unexpected behaviour -- Key: PIG-949 URL: https://issues.apache.org/jira/browse/PIG-949 Project: Pig Issue Type: Bug Affects Versions: 0.4.0 Environment: linux Reporter: Alok Singh Assignee: Yan Zhou Fix For: 0.5.0 Attachments: Pig_949.patch, Pig_949.patch, Pig_949.patch Hi, The storage hint specification plays an important part in whether the output table is readable. Say we have the map 'map'. One can split the map into a column group using [map#{k1}, map#{k2}...]; the remaining map fields will automatically be added to the default group. If the user tries to create a new column group for the remaining fields as follows, [map#{k1}, map#{k2}, ..][map], i.e. create a separate column group, the table writer will create the table. 
However, if one tries to load the created table via Pig or via MapReduce using TableInputFormat, the reader has a problem reading the map. We get the following stack trace: 09/09/09 00:09:45 INFO mapred.JobClient: Task Id : attempt_200908191538_33939_m_21_2, Status : FAILED java.io.IOException: getValue() failed: null at org.apache.hadoop.zebra.io.BasicTable$Reader$BTScanner.getValue(BasicTable.java:775) at org.apache.hadoop.zebra.mapred.TableRecordReader.next(TableInputFormat.java:717) at org.apache.hadoop.zebra.mapred.TableRecordReader.next(TableInputFormat.java:651) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:191) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:175) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:356) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305) at org.apache.hadoop.mapred.Child.main(Child.java:170) Alok -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-949) Zebra Bug: splitting map into multiple column group using storage hint causes unexpected behaviour
[ https://issues.apache.org/jira/browse/PIG-949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raghu Angadi updated PIG-949: - Status: Open (was: Patch Available) Zebra Bug: splitting map into multiple column group using storage hint causes unexpected behaviour -- Key: PIG-949 URL: https://issues.apache.org/jira/browse/PIG-949 Project: Pig Issue Type: Bug Affects Versions: 0.4.0 Environment: linux Reporter: Alok Singh Assignee: Yan Zhou Attachments: Pig_949.patch, Pig_949.patch, Pig_949.patch Hi, The storage hint specification plays an important part in whether the output table is readable. Say we have the map 'map'. One can split the map into a column group using [map#{k1}, map#{k2}...]; the remaining map fields will automatically be added to the default group. If the user tries to create a new column group for the remaining fields as follows, [map#{k1}, map#{k2}, ..][map], i.e. create a separate column group, the table writer will create the table. However, if one tries to load the created table via Pig or via MapReduce using TableInputFormat, the reader has a problem reading the map. We get the following stack trace: 09/09/09 00:09:45 INFO mapred.JobClient: Task Id : attempt_200908191538_33939_m_21_2, Status : FAILED java.io.IOException: getValue() failed: null at org.apache.hadoop.zebra.io.BasicTable$Reader$BTScanner.getValue(BasicTable.java:775) at org.apache.hadoop.zebra.mapred.TableRecordReader.next(TableInputFormat.java:717) at org.apache.hadoop.zebra.mapred.TableRecordReader.next(TableInputFormat.java:651) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:191) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:175) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:356) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305) at org.apache.hadoop.mapred.Child.main(Child.java:170) Alok -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-949) Zebra Bug: splitting map into multiple column group using storage hint causes unexpected behaviour
[ https://issues.apache.org/jira/browse/PIG-949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raghu Angadi updated PIG-949: - Fix Version/s: 0.5.0 0.4.0 Status: Patch Available (was: Open) Zebra Bug: splitting map into multiple column group using storage hint causes unexpected behaviour -- Key: PIG-949 URL: https://issues.apache.org/jira/browse/PIG-949 Project: Pig Issue Type: Bug Affects Versions: 0.4.0 Environment: linux Reporter: Alok Singh Assignee: Yan Zhou Fix For: 0.4.0, 0.5.0 Attachments: Pig_949.patch, Pig_949.patch, Pig_949.patch Hi, The storage hint specification plays an important part in whether the output table is readable. Say we have the map 'map'. One can split the map into a column group using [map#{k1}, map#{k2}...]; the remaining map fields will automatically be added to the default group. If the user tries to create a new column group for the remaining fields as follows, [map#{k1}, map#{k2}, ..][map], i.e. create a separate column group, the table writer will create the table. 
However, if one tries to load the created table via Pig or via MapReduce using TableInputFormat, the reader has a problem reading the map. We get the following stack trace: 09/09/09 00:09:45 INFO mapred.JobClient: Task Id : attempt_200908191538_33939_m_21_2, Status : FAILED java.io.IOException: getValue() failed: null at org.apache.hadoop.zebra.io.BasicTable$Reader$BTScanner.getValue(BasicTable.java:775) at org.apache.hadoop.zebra.mapred.TableRecordReader.next(TableInputFormat.java:717) at org.apache.hadoop.zebra.mapred.TableRecordReader.next(TableInputFormat.java:651) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:191) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:175) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:356) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305) at org.apache.hadoop.mapred.Child.main(Child.java:170) Alok -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-918) [zebra] LOAD call will hang if only the first column group is queried
[ https://issues.apache.org/jira/browse/PIG-918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raghu Angadi reassigned PIG-918: Assignee: Yan Zhou [zebra] LOAD call will hang if only the first column group is queried - Key: PIG-918 URL: https://issues.apache.org/jira/browse/PIG-918 Project: Pig Issue Type: Bug Affects Versions: 0.4.0 Reporter: Yan Zhou Assignee: Yan Zhou Fix For: 0.4.0 Attachments: pig-zebra.patch, pig-zebra.patch Zebra's LOAD call with projections that only include column(s) in the first column group will hang because an improper range of random numbers for the index into the array of column groups always skips the first element, so that if none of the other column groups are used, the loop keeps running without a chance to break. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
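The failure mode described in PIG-918 can be illustrated with a small sketch. The Zebra code itself is not quoted in this thread, so the class and method names below (ColumnGroupPick, pickBuggy, pickFixed) and the maxTries bound are hypothetical; the bound exists only so that the sketch terminates, whereas the loop described in the issue had no way to break.

```java
import java.util.Random;

// Hypothetical illustration of the PIG-918 hang; not the actual Zebra code.
final class ColumnGroupPick {

    // Buggy variant: the random index is drawn from [1, n), so index 0 is
    // never chosen. Requires used.length >= 2. Returns -1 after maxTries
    // failed attempts (the bound is an assumption added for this sketch).
    static int pickBuggy(boolean[] used, Random rng, int maxTries) {
        for (int i = 0; i < maxTries; i++) {
            int idx = 1 + rng.nextInt(used.length - 1);
            if (used[idx]) {
                return idx;
            }
        }
        return -1;
    }

    // Fixed variant: the index is drawn from [0, n), so the first column
    // group can be selected as well.
    static int pickFixed(boolean[] used, Random rng, int maxTries) {
        for (int i = 0; i < maxTries; i++) {
            int idx = rng.nextInt(used.length);
            if (used[idx]) {
                return idx;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Only the first column group is referenced by the projection.
        boolean[] used = {true, false, false};
        System.out.println("buggy: " + pickBuggy(used, new Random(42), 1000));
        System.out.println("fixed: " + pickFixed(used, new Random(42), 1000));
    }
}
```

With only the first column group in use, the buggy variant can never return a valid index, which corresponds to the endless loop reported in the issue.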
[jira] Updated: (PIG-918) [zebra] LOAD call will hang if only the first column group is queried
[ https://issues.apache.org/jira/browse/PIG-918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raghu Angadi updated PIG-918: - Attachment: pig-zebra.patch When you generate a patch with 'git diff', please use 'git diff --no-prefix' so that the patch applies with the 'patch -p0' command. I am updating the attached patch with this change. [zebra] LOAD call will hang if only the first column group is queried - Key: PIG-918 URL: https://issues.apache.org/jira/browse/PIG-918 Project: Pig Issue Type: Bug Affects Versions: 0.3.0 Reporter: Yan Zhou Fix For: 0.4.0 Attachments: pig-zebra.patch, pig-zebra.patch Zebra's LOAD call with projections that only include column(s) in the first column group will hang because an improper range of random numbers for the index into the array of column groups always skips the first element, so that if none of the other column groups are used, the loop keeps running without a chance to break. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-918) [zebra] LOAD call will hang if only the first column group is queried
[ https://issues.apache.org/jira/browse/PIG-918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raghu Angadi updated PIG-918: - Affects Version/s: (was: 0.3.0) 0.4.0 [zebra] LOAD call will hang if only the first column group is queried - Key: PIG-918 URL: https://issues.apache.org/jira/browse/PIG-918 Project: Pig Issue Type: Bug Affects Versions: 0.4.0 Reporter: Yan Zhou Fix For: 0.4.0 Attachments: pig-zebra.patch, pig-zebra.patch Zebra's LOAD call with projections that only include column(s) in the first column group will hang because an improper range of random numbers for the index into the array of column groups always skips the first element, so that if none of the other column groups are used, the loop keeps running without a chance to break. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-918) [zebra] LOAD call will hang if only the first column group is queried
[ https://issues.apache.org/jira/browse/PIG-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12750055#action_12750055 ] Raghu Angadi commented on PIG-918: -- I just committed this. Thanks Yan. [zebra] LOAD call will hang if only the first column group is queried - Key: PIG-918 URL: https://issues.apache.org/jira/browse/PIG-918 Project: Pig Issue Type: Bug Affects Versions: 0.4.0 Reporter: Yan Zhou Fix For: 0.4.0 Attachments: pig-zebra.patch, pig-zebra.patch Zebra's LOAD call with projections that only include column(s) in the first column group will hang because an improper range of random numbers for the index into the array of column groups always skips the first element, so that if none of the other column groups are used, the loop keeps running without a chance to break. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-833) Storage access layer
[ https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12745219#action_12745219 ] Raghu Angadi commented on PIG-833: -- Thanks Jing. There are some PIG examples listed at the bottom of the Zebra wiki: http://wiki.apache.org/pig/zebra (the wiki is still under construction). Just listing the Java strings from Jing's comment without Jira formatting: {noformat} final static String STR_SCHEMA = "s1:bool, s2:int, s3:long, s4:float, s5:string, s6:bytes, " + "r1:record(f1:int, f2:long), r2:record(r3:record(f3:float, f4)), " + "m1:map(string),m2:map(map(int)), c:collection(f13:double, f14:float, f15:bytes)"; final static String STR_STORAGE = "[s1, s2]; [m1#{a}]; [r1.f1]; [s3, s4, r2.r3.f3]; [s5, s6, m2#{x|y}]; " + "[r1.f2, m1#{b}]; [r2.r3.f4, m2#{z}]"; {noformat} Storage access layer Key: PIG-833 URL: https://issues.apache.org/jira/browse/PIG-833 Project: Pig Issue Type: New Feature Reporter: Jay Tang Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, PIG-833-zebra.patch.bz2, PIG-833-zebra.patch.bz2, TEST-org.apache.hadoop.zebra.pig.TestCheckin1.txt, test.out, zebra-javadoc.tgz A layer is needed to provide a high level data access abstraction and a tabular view of data in Hadoop, and could free Pig users from implementing their own data storage/retrieval code. This layer should also include a columnar storage format in order to provide fast data projection, CPU/space-efficient data serialization, and a schema language to manage physical storage metadata. Eventually it could also support predicate pushdown for further performance improvement. Initially, this layer could be a contrib project in Pig and become a hadoop subproject later on. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
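For readers who want to experiment with the schema and storage strings in the comment above, here is a minimal sketch that declares them as Java constants and splits the storage hint into its column groups. The class StorageHintDemo and the helper columnGroups are hypothetical and simply split on ';'. This is not Zebra's parser, and the placement of the string quotes is an assumption, since the archive does not preserve them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for inspecting a Zebra storage hint; not Zebra's parser.
final class StorageHintDemo {

    // Schema string from the comment (quote placement assumed).
    static final String STR_SCHEMA =
        "s1:bool, s2:int, s3:long, s4:float, s5:string, s6:bytes, "
        + "r1:record(f1:int, f2:long), r2:record(r3:record(f3:float, f4)), "
        + "m1:map(string),m2:map(map(int)), "
        + "c:collection(f13:double, f14:float, f15:bytes)";

    // Storage hint from the comment: one bracketed list per column group.
    static final String STR_STORAGE =
        "[s1, s2]; [m1#{a}]; [r1.f1]; [s3, s4, r2.r3.f3]; [s5, s6, m2#{x|y}]; "
        + "[r1.f2, m1#{b}]; [r2.r3.f4, m2#{z}]";

    // Splits the hint on ';' and drops empty pieces, so a trailing ';'
    // is tolerated.
    static List<String> columnGroups(String storageHint) {
        List<String> groups = new ArrayList<>();
        for (String part : storageHint.split(";")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                groups.add(trimmed);
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        // Print each column group on its own line.
        for (String group : columnGroups(STR_STORAGE)) {
            System.out.println(group);
        }
    }
}
```

Each printed entry corresponds to one column group; map keys such as m1#{a} and m1#{b} show how a single map column is spread across groups.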
[jira] Updated: (PIG-833) Storage access layer
[ https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raghu Angadi updated PIG-833: - Attachment: PIG-833-zebra.patch.bz2 Updated patch. Only change is that ant prints a descriptive error to user if hadoop20.jar does not exist in top level lib directory. It lists basic steps to get this built until PIG-660 is committed. Storage access layer Key: PIG-833 URL: https://issues.apache.org/jira/browse/PIG-833 Project: Pig Issue Type: New Feature Reporter: Jay Tang Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, PIG-833-zebra.patch.bz2, zebra-javadoc.tgz A layer is needed to provide a high level data access abstraction and a tabular view of data in Hadoop, and could free Pig users from implementing their own data storage/retrieval code. This layer should also include a columnar storage format in order to provide fast data projection, CPU/space-efficient data serialization, and a schema language to manage physical storage metadata. Eventually it could also support predicate pushdown for further performance improvement. Initially, this layer could be a contrib project in Pig and become a hadoop subproject later on. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-833) Storage access layer
[ https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raghu Angadi updated PIG-833: - Attachment: PIG-833-zebra.patch.bz2 Storage access layer Key: PIG-833 URL: https://issues.apache.org/jira/browse/PIG-833 Project: Pig Issue Type: New Feature Reporter: Jay Tang Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, PIG-833-zebra.patch.bz2, PIG-833-zebra.patch.bz2, zebra-javadoc.tgz A layer is needed to provide a high level data access abstraction and a tabular view of data in Hadoop, and could free Pig users from implementing their own data storage/retrieval code. This layer should also include a columnar storage format in order to provide fast data projection, CPU/space-efficient data serialization, and a schema language to manage physical storage metadata. Eventually it could also support predicate pushdown for further performance improvement. Initially, this layer could be a contrib project in Pig and become a hadoop subproject later on. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-833) Storage access layer
[ https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12742069#action_12742069 ] Raghu Angadi commented on PIG-833: -- Alan, in order to run unit tests you need to build pig test-core. As mentioned in the instructions above please run {{'ant -Dtestcase=none test-core'}} under top level directory before running 'ant test' under contrib/zebra. Storage access layer Key: PIG-833 URL: https://issues.apache.org/jira/browse/PIG-833 Project: Pig Issue Type: New Feature Reporter: Jay Tang Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, PIG-833-zebra.patch.bz2, PIG-833-zebra.patch.bz2, test.out, zebra-javadoc.tgz A layer is needed to provide a high level data access abstraction and a tabular view of data in Hadoop, and could free Pig users from implementing their own data storage/retrieval code. This layer should also include a columnar storage format in order to provide fast data projection, CPU/space-efficient data serialization, and a schema language to manage physical storage metadata. Eventually it could also support predicate pushdown for further performance improvement. Initially, this layer could be a contrib project in Pig and become a hadoop subproject later on. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-833) Storage access layer
[ https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12736998#action_12736998 ] Raghu Angadi commented on PIG-833: -- There will be benchmark results either attached to this jira or to a subsequent jira. I would like to compare to SequenceFiles and the new format in Hive. We should see on-par performance. Major performance benefits come from commonly used projections (through column groups) and map-side joins of sorted tables. An important part of the motivation is features such as column security and the ability to delete entire columns. We are running some larger-scale benchmarks internally, but these run on Yahoo's internal data sources. Storage access layer Key: PIG-833 URL: https://issues.apache.org/jira/browse/PIG-833 Project: Pig Issue Type: New Feature Reporter: Jay Tang Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, zebra-javadoc.tgz A layer is needed to provide a high level data access abstraction and a tabular view of data in Hadoop, and could free Pig users from implementing their own data storage/retrieval code. This layer should also include a columnar storage format in order to provide fast data projection, CPU/space-efficient data serialization, and a schema language to manage physical storage metadata. Eventually it could also support predicate pushdown for further performance improvement. Initially, this layer could be a contrib project in Pig and become a hadoop subproject later on. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-660) Integration with Hadoop 0.20
[ https://issues.apache.org/jira/browse/PIG-660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12736264#action_12736264 ] Raghu Angadi commented on PIG-660: -- Currently, hadoop jar for 0.18 under lib/ is called hadoop18.jar. Should we change build.xml to use hadoop20.jar instead of hadoop18.jar? I can file a jira to commit hadoop20.jar. This might be replaced by updated jar when this jira is committed. Integration with Hadoop 0.20 Key: PIG-660 URL: https://issues.apache.org/jira/browse/PIG-660 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.2.0 Environment: Hadoop 0.20 Reporter: Santhosh Srinivasan Assignee: Santhosh Srinivasan Fix For: 0.4.0 Attachments: PIG-660.patch, PIG-660_1.patch, PIG-660_2.patch, PIG-660_3.patch, PIG-660_4.patch, PIG-660_5.patch With Hadoop 0.20, it will be possible to query the status of each map and reduce in a map reduce job. This will allow better error reporting. Some of the other items that could be on Hadoop's feature requests/bugs are documented here for tracking. 1. Hadoop should return objects instead of strings when exceptions are thrown 2. The JobControl should handle all exceptions and report them appropriately. For example, when the JobControl fails to launch jobs, it should handle exceptions appropriately and should support APIs that query this state, i.e., failure to launch jobs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-660) Integration with Hadoop 0.20
[ https://issues.apache.org/jira/browse/PIG-660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12736297#action_12736297 ] Raghu Angadi commented on PIG-660: -- Thanks Olga and Santhosh. The build.xml change is already in the patch. Thanks. I will attach a hadoop20.jar that works with PIG. This is useful for anyone who wants to try out the patch. This will also be used by zebra (PIG-833). Please commit the jar file to PIG trunk. It could be updated with a later version of the hadoop-0.20 branch. Integration with Hadoop 0.20 Key: PIG-660 URL: https://issues.apache.org/jira/browse/PIG-660 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.2.0 Environment: Hadoop 0.20 Reporter: Santhosh Srinivasan Assignee: Santhosh Srinivasan Fix For: 0.4.0 Attachments: PIG-660.patch, PIG-660_1.patch, PIG-660_2.patch, PIG-660_3.patch, PIG-660_4.patch, PIG-660_5.patch With Hadoop 0.20, it will be possible to query the status of each map and reduce in a map reduce job. This will allow better error reporting. Some of the other items that could be on Hadoop's feature requests/bugs are documented here for tracking. 1. Hadoop should return objects instead of strings when exceptions are thrown 2. The JobControl should handle all exceptions and report them appropriately. For example, when the JobControl fails to launch jobs, it should handle exceptions appropriately and should support APIs that query this state, i.e., failure to launch jobs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-660) Integration with Hadoop 0.20
[ https://issues.apache.org/jira/browse/PIG-660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raghu Angadi updated PIG-660: - Attachment: PIG-660_6.patch Updated patch fixes two minor conflicts with the current pig trunk. Integration with Hadoop 0.20 Key: PIG-660 URL: https://issues.apache.org/jira/browse/PIG-660 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.2.0 Environment: Hadoop 0.20 Reporter: Santhosh Srinivasan Assignee: Santhosh Srinivasan Fix For: 0.4.0 Attachments: PIG-660.patch, PIG-660_1.patch, PIG-660_2.patch, PIG-660_3.patch, PIG-660_4.patch, PIG-660_5.patch, PIG-660_6.patch With Hadoop 0.20, it will be possible to query the status of each map and reduce in a map reduce job. This will allow better error reporting. Some of the other items that could be on Hadoop's feature requests/bugs are documented here for tracking. 1. Hadoop should return objects instead of strings when exceptions are thrown 2. The JobControl should handle all exceptions and report them appropriately. For example, when the JobControl fails to launch jobs, it should handle exceptions appropriately and should support APIs that query this state, i.e., failure to launch jobs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-833) Storage access layer
[ https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raghu Angadi updated PIG-833: - Attachment: hadoop20.jar.bz2 Attaching hadoop20.jar, which needs to be placed under the lib/ directory of the top-level PIG directory. I will include specific instructions later in the jira. Storage access layer Key: PIG-833 URL: https://issues.apache.org/jira/browse/PIG-833 Project: Pig Issue Type: New Feature Reporter: Jay Tang Attachments: hadoop20.jar.bz2 A layer is needed to provide a high level data access abstraction and a tabular view of data in Hadoop, and could free Pig users from implementing their own data storage/retrieval code. This layer should also include a columnar storage format in order to provide fast data projection, CPU/space-efficient data serialization, and a schema language to manage physical storage metadata. Eventually it could also support predicate pushdown for further performance improvement. Initially, this layer could be a contrib project in Pig and become a hadoop subproject later on. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-833) Storage access layer
[ https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12736424#action_12736424 ] Raghu Angadi commented on PIG-833: -- I will surely look at Hive's storage layer and SerDe. I will be able to comment better on specifics once I get a better handle on them. In the meanwhile I will attach the work that has already been done on Zebra. This is currently a contrib in PIG. Based on these experiences we could probably provide a common storage layer more widely suitable for multiple Hadoop-related projects. Storage access layer Key: PIG-833 URL: https://issues.apache.org/jira/browse/PIG-833 Project: Pig Issue Type: New Feature Reporter: Jay Tang Attachments: hadoop20.jar.bz2 A layer is needed to provide a high level data access abstraction and a tabular view of data in Hadoop, and could free Pig users from implementing their own data storage/retrieval code. This layer should also include a columnar storage format in order to provide fast data projection, CPU/space-efficient data serialization, and a schema language to manage physical storage metadata. Eventually it could also support predicate pushdown for further performance improvement. Initially, this layer could be a contrib project in Pig and become a hadoop subproject later on. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-833) Storage access layer
[ https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raghu Angadi updated PIG-833: - Attachment: PIG-833-zebra.patch The first cut of contrib/zebra. The patch is very large, and subsequent versions of it should probably be compressed. More documentation on design and usage will be added to the jira. How to compile : -- * check out latest PIG trunk * Apply the latest patch from PIG-660 * copy attached hadoop20.jar to ./lib * run '{{ant jar}}' (and {{'ant -Dtestcase=none test-core'}} for zebra tests). * cd contrib/zebra * ant jar * ant test (for tests). Currently there are compile-time deprecation warnings related to use of the deprecated mapred API (JobConf). These will be fixed later. Storage access layer Key: PIG-833 URL: https://issues.apache.org/jira/browse/PIG-833 Project: Pig Issue Type: New Feature Reporter: Jay Tang Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch A layer is needed to provide a high level data access abstraction and a tabular view of data in Hadoop, and could free Pig users from implementing their own data storage/retrieval code. This layer should also include a columnar storage format in order to provide fast data projection, CPU/space-efficient data serialization, and a schema language to manage physical storage metadata. Eventually it could also support predicate pushdown for further performance improvement. Initially, this layer could be a contrib project in Pig and become a hadoop subproject later on. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-833) Storage access layer
[ https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raghu Angadi updated PIG-833: - Attachment: zebra-javadoc.tgz Storage access layer Key: PIG-833 URL: https://issues.apache.org/jira/browse/PIG-833 Project: Pig Issue Type: New Feature Reporter: Jay Tang Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, zebra-javadoc.tgz A layer is needed to provide a high level data access abstraction and a tabular view of data in Hadoop, and could free Pig users from implementing their own data storage/retrieval code. This layer should also include a columnar storage format in order to provide fast data projection, CPU/space-efficient data serialization, and a schema language to manage physical storage metadata. Eventually it could also support predicate pushdown for further performance improvement. Initially, this layer could be a contrib project in Pig and become a hadoop subproject later on. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.