[jira] Commented: (PIG-1053) Consider moving to Hadoop for local mode

2009-10-26 Thread Raghu Angadi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12770336#action_12770336
 ] 

Raghu Angadi commented on PIG-1053:
---

a big +1.

From a Pig developer's point of view, it is understandable to be annoyed by 
beginners complaining about run time with toy local inputs. Maybe a clear 
heads-up in the tutorial would reduce those complaints.

 Consider moving to Hadoop for local mode
 

 Key: PIG-1053
 URL: https://issues.apache.org/jira/browse/PIG-1053
 Project: Pig
  Issue Type: Improvement
Reporter: Alan Gates

 We need to consider moving Pig to use Hadoop's local mode instead of its own.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-986) [zebra] Zebra Column Group Naming Support

2009-10-11 Thread Raghu Angadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-986:
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

I just committed this. Thanks Yan.

 [zebra] Zebra Column Group Naming Support
 -

 Key: PIG-986
 URL: https://issues.apache.org/jira/browse/PIG-986
 Project: Pig
  Issue Type: New Feature
  Components: impl
Affects Versions: 0.4.0
Reporter: Chao Wang
Assignee: Chao Wang
 Fix For: 0.6.0

 Attachments: ColumnGroupName.patch, ColumnGroupName.patch, 
 ColumnGroupName.patch


 We introduce column group names to Zebra and make them first-class citizens in 
 Zebra. This can ease management of column groups.
 We plan to introduce an AS clause for column group names in Zebra's syntax.
 Functional Specifications:
 1) Column group names are optional. For column groups that do not have a 
 user-provided name, Zebra will internally assign default column group names 
 that are unique within that table: CG0, CG1, CG2 ... Note: If CGx is already 
 used by the user, it cannot be used as an internal name.
 2) We introduce an AS clause in Zebra's syntax for column group names. If 
 it occurs, it has to immediately follow [ ]. For example, [a1, a2] as PI 
 secure by user:joe group:secure perm:640; [a3, a4] as General compress by 
 lzo. Note that the keyword AS is case insensitive.
 3) Column group names are unique within one table and are case sensitive, 
 i.e., c1 and C1 are different.
 4) Column group names will be used as the physical column group directory 
 path names.
 5) Zebra V2 will support dropColumnGroup by column group name (will 
 integrate with Raghu's A29 drop column work).
 6) Zebra V2 can support backward compatibility (if there are Zebra V1-created 
 tables in production when V2 is released). More specifically, this means that 
 Zebra V2 can load from V1-created tables and do dropColumnGroup on them.
 7) Renaming is NOT supported.
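
 The naming rules in items 1) and 3) can be sketched roughly as follows. This 
 is only an illustration: the class ColumnGroupNamer and its method are 
 hypothetical, and picking the next free CGx index (rather than, say, the 
 positional index of the column group) is an assumption, since the 
 specification only says that default names must be unique and must not 
 collide with user-provided names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration; not part of the Zebra code base.
final class ColumnGroupNamer {
    /**
     * Fills in default names for unnamed column groups.
     * A null entry means the user gave no name for that column group.
     */
    static List<String> assignNames(List<String> userNames) {
        Set<String> used = new HashSet<>();
        for (String name : userNames) {
            if (name == null) {
                continue;
            }
            // Names are case sensitive, so "c1" and "C1" are distinct entries.
            if (!used.add(name)) {
                throw new IllegalArgumentException("Duplicate column group name: " + name);
            }
        }
        List<String> result = new ArrayList<>();
        int next = 0;
        for (String name : userNames) {
            if (name == null) {
                // Skip any CGx that the user has already taken (assumed policy).
                while (used.contains("CG" + next)) {
                    next++;
                }
                name = "CG" + next;
                used.add(name);
                next++;
            }
            result.add(name);
        }
        return result;
    }

    public static void main(String[] args) {
        // Two named groups (one of them deliberately called CG0) and two unnamed ones.
        System.out.println(assignNames(Arrays.asList("PI", null, "CG0", null)));
    }
}
```

 Duplicate detection uses a plain HashSet of strings, which gives the 
 case-sensitive comparison that item 3) asks for without extra work.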




[jira] Commented: (PIG-993) [zebra] Ability to drop a column group in a table

2009-10-11 Thread Raghu Angadi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12764552#action_12764552
 ] 

Raghu Angadi commented on PIG-993:
--

This patch depends on PIG-992. It is not a functional dependency and can be 
removed if required.

 [zebra] Ability to drop a column group in a table
 --

 Key: PIG-993
 URL: https://issues.apache.org/jira/browse/PIG-993
 Project: Pig
  Issue Type: Bug
Reporter: Raghu Angadi
Assignee: Raghu Angadi
 Fix For: 0.6.0

 Attachments: DropColumnGroupExample.java, zebra-drop-cg.patch, 
 zebra-drop-cg.patch


 A Zebra table is stored as multiple sub tables, each containing a set of 
 columns called a column group (CG). The user specifies how these columns are 
 grouped while creating a table through the _storage hint_.
 For some of the large tables, it might be necessary for users to remove a set 
 of columns and retain the rest. This jira provides a way for users to delete 
 an entire column group. 
 The following comments will have more details on API and the semantics. 




[jira] Updated: (PIG-986) [zebra] Zebra Column Group Naming Support

2009-10-10 Thread Raghu Angadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-986:
-

Status: Open  (was: Patch Available)

 [zebra] Zebra Column Group Naming Support
 -

 Key: PIG-986
 URL: https://issues.apache.org/jira/browse/PIG-986
 Project: Pig
  Issue Type: New Feature
  Components: impl
Affects Versions: 0.4.0
Reporter: Chao Wang
Assignee: Chao Wang
 Fix For: 0.6.0

 Attachments: ColumnGroupName.patch, ColumnGroupName.patch, 
 ColumnGroupName.patch





[jira] Updated: (PIG-986) [zebra] Zebra Column Group Naming Support

2009-10-10 Thread Raghu Angadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-986:
-

Status: Patch Available  (was: Open)

 [zebra] Zebra Column Group Naming Support
 -

 Key: PIG-986
 URL: https://issues.apache.org/jira/browse/PIG-986
 Project: Pig
  Issue Type: New Feature
  Components: impl
Affects Versions: 0.4.0
Reporter: Chao Wang
Assignee: Chao Wang
 Fix For: 0.6.0

 Attachments: ColumnGroupName.patch, ColumnGroupName.patch, 
 ColumnGroupName.patch





[jira] Commented: (PIG-987) [zebra] Zebra Column Group Access Control

2009-10-08 Thread Raghu Angadi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12763836#action_12763836
 ] 

Raghu Angadi commented on PIG-987:
--

Thanks Yan. It might be better to remove gauravj also since it is ignored 
anyway. 

This implies column access control is not tested in this patch, right?

 [zebra] Zebra Column Group Access Control
 -

 Key: PIG-987
 URL: https://issues.apache.org/jira/browse/PIG-987
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.6.0
Reporter: Yan Zhou
Assignee: Yan Zhou
 Attachments: ColumnGroupSecurity.patch, ColumnGroupSecurity.patch, 
 ColumnGroupSecurity.patch, TEST-org.apache.hadoop.zebra.io.TestCheckin.txt, 
 TEST-org.apache.hadoop.zebra.mapred.TestCheckin.txt, tmp-987-plus-991.patch


 Access Control: when processes try to read from the column groups, Zebra 
 should be able to handle allowed vs. disallowed user/application accesses. 
 The security is eventually granted by the corresponding HDFS security of the 
 data stored.
 Expected behavior when column group permissions are set:
 When a user selects only columns that they do not have permission to 
 access, Zebra should return an error with the message "Error #: Permission 
 denied for accessing column <column name or names>".
 Access control applies to an entire column group, so all columns in a column 
 group have the same permissions.
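
 The expected behavior can be modeled with a small self-contained sketch. 
 Everything here is hypothetical (the class ColumnGroupAccessCheck, its 
 parameters, and the in-memory maps standing in for HDFS permissions). The 
 description only covers the case where every selected column is 
 inaccessible; letting a mixed selection pass without an error is an 
 assumption of this sketch, not something the description states.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the rule described above; not Zebra's implementation.
final class ColumnGroupAccessCheck {
    /**
     * @param selected       columns named in the projection
     * @param columnToGroup  column name -> column group name
     * @param readableGroups column groups the current user may read
     * @throws IOException if none of the selected columns is readable
     */
    static void check(List<String> selected,
                      Map<String, String> columnToGroup,
                      Set<String> readableGroups) throws IOException {
        List<String> denied = new ArrayList<>();
        for (String column : selected) {
            // Permissions are per column group, so resolve the group first.
            String group = columnToGroup.get(column);
            if (group == null || !readableGroups.contains(group)) {
                denied.add(column);
            }
        }
        // Only the "all selected columns are denied" case is specified.
        if (!selected.isEmpty() && denied.size() == selected.size()) {
            throw new IOException("Permission denied for accessing column "
                    + String.join(", ", denied));
        }
    }

    public static void main(String[] args) {
        Map<String, String> columnToGroup = new HashMap<>();
        columnToGroup.put("a1", "PI");
        columnToGroup.put("a3", "General");
        Set<String> readable = Collections.singleton("General");
        try {
            check(Arrays.asList("a1"), columnToGroup, readable);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```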




[jira] Updated: (PIG-991) [zebra] A few minor bugs as described in the Description section

2009-10-08 Thread Raghu Angadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-991:
-

Attachment: Bugs-2.patch

I am committing a slightly modified patch. I removed the following lines that 
modified build.xml at the top level. Please ask one of the PIG committers to 
commit that change.

The part that was removed:
{noformat}
@@ -940,4 +942,13 @@

   <target name="published" depends="ivy-publish-local, maven-artifacts"/>

+<target name="pig-test">
+<jar
+  jarfile="${build.dir}/pig-test-${version}.jar"
+  basedir="${build.dir}/test/classes"
+  excludes="**/Test*.class"
+>
+</jar>
+</target>
+
 </project>
{noformat}

 [zebra] A few minor bugs as described in the Description section
 

 Key: PIG-991
 URL: https://issues.apache.org/jira/browse/PIG-991
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.4.0
Reporter: Yan Zhou
Assignee: Yan Zhou
Priority: Minor
 Fix For: 0.6.0

 Attachments: Bugs-2.patch, Bugs.patch


 1) lzo2 was used as the compressor name for the LZO compression algorithm; 
 it should be lzo instead;
 2) the default compression is changed from lzo to gz for gzip;
 3) in the JavaCC file SchemaParser.jjt, the package name was wrong: it used 
 the old package org.apache.pig.table.types;
 4) in build.xml, two new javacc targets are added to generate the 
 TableSchemaParser and TableStorageParser Java code;
 5) Support of column group security ( 
 https://issues.apache.org/jira/browse/PIG-987 ) lacked support of the 
 dumpinfo method: the groups and permissions were not displayed. Note that as 
 a consequence, the patch herein must be applied after that of JIRA987.
 6) and 7) a couple of issues reported in Jira917.




[jira] Updated: (PIG-987) [zebra] Zebra Column Group Access Control

2009-10-08 Thread Raghu Angadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-987:
-

   Resolution: Fixed
Fix Version/s: 0.6.0
   Status: Resolved  (was: Patch Available)

I just committed this. Thanks Yan!

 [zebra] Zebra Column Group Access Control
 -

 Key: PIG-987
 URL: https://issues.apache.org/jira/browse/PIG-987
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.6.0
Reporter: Yan Zhou
Assignee: Yan Zhou
 Fix For: 0.6.0

 Attachments: ColumnGroupSecurity.patch, ColumnGroupSecurity.patch, 
 ColumnGroupSecurity.patch, TEST-org.apache.hadoop.zebra.io.TestCheckin.txt, 
 TEST-org.apache.hadoop.zebra.mapred.TestCheckin.txt, tmp-987-plus-991.patch





[jira] Updated: (PIG-991) [zebra] A few minor bugs as described in the Description section

2009-10-08 Thread Raghu Angadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-991:
-

Resolution: Fixed
Status: Resolved  (was: Patch Available)

I just committed this. Thanks Yan.

 [zebra] A few minor bugs as described in the Description section
 

 Key: PIG-991
 URL: https://issues.apache.org/jira/browse/PIG-991
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.4.0
Reporter: Yan Zhou
Assignee: Yan Zhou
Priority: Minor
 Fix For: 0.6.0

 Attachments: Bugs-2.patch, Bugs.patch





[jira] Commented: (PIG-987) [zebra] Zebra Column Group Access Control

2009-10-07 Thread Raghu Angadi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12763346#action_12763346
 ] 

Raghu Angadi commented on PIG-987:
--

I finally got some time to look into this. Yes, I think it should be fixed in 
the tests. TestColumnGroup.java says:
{noformat}
ColumnGroup.Writer writer = new ColumnGroup.Writer(path, strSchema, sorted,
    "pig", "gz", "gauravj", "users", (short) Short.parseShort("755", 8),
    false, conf);
{noformat}

using the local FS. How can we expect users to have a user named gauravj on 
their machines and run as superusers :)? It just cannot be done.

If the test wants to run with these permissions, we should:
 a) use HDFS (MiniDFSCluster) rather than the local filesystem. The tester has 
all the permissions on a MiniDFS.
 b) minor: use a more generic name than gauravj.
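
For reference, the permission argument in the quoted test call is parsed in 
radix 8. A minimal sketch, assuming the argument is the octal string "755" 
and that the test could take the owner from the account running it instead of 
a hard-coded name; the class and method names below are hypothetical and are 
not taken from the Zebra tests:

```java
// Illustrative only; names here are hypothetical, not from TestColumnGroup.java.
final class TestPermissionArgs {
    /** Parses an octal permission string such as "755" in radix 8. */
    static short parseOctalPerm(String perm) {
        return Short.parseShort(perm, 8);
    }

    /** Returns the account running the test, avoiding a hard-coded owner name. */
    static String currentUser() {
        return System.getProperty("user.name");
    }

    public static void main(String[] args) {
        short perm = parseOctalPerm("755");
        // Print the owner that would be used and the permission back in octal.
        System.out.println(currentUser() + " " + Integer.toOctalString(perm));
    }
}
```

Octal 755 is decimal 493, which fits comfortably in a short, so the extra 
(short) cast in the quoted call is redundant but harmless.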


 [zebra] Zebra Column Group Access Control
 -

 Key: PIG-987
 URL: https://issues.apache.org/jira/browse/PIG-987
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.6.0
Reporter: Yan Zhou
Assignee: Yan Zhou
 Attachments: ColumnGroupSecurity.patch, ColumnGroupSecurity.patch, 
 TEST-org.apache.hadoop.zebra.io.TestCheckin.txt, 
 TEST-org.apache.hadoop.zebra.mapred.TestCheckin.txt, tmp-987-plus-991.patch





[jira] Commented: (PIG-987) [zebra] Zebra Column Group Access Control

2009-10-06 Thread Raghu Angadi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12762812#action_12762812
 ] 

Raghu Angadi commented on PIG-987:
--

I tried to commit this patch. 'ant test' says all the tests fail, whereas only 
one or two tests fail without the patch.

Does Hudson actually run the Zebra tests?


 [zebra] Zebra Column Group Access Control
 -

 Key: PIG-987
 URL: https://issues.apache.org/jira/browse/PIG-987
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.6.0
Reporter: Yan Zhou
Assignee: Yan Zhou
 Attachments: ColumnGroupSecurity.patch





[jira] Updated: (PIG-991) [zebra] A few minor bugs as described in the Description section

2009-10-06 Thread Raghu Angadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-991:
-

Release Note:   (was: Patch should be applied after that of Jira987.)

bq. Patch should be applied after that of Jira987.

[moved above comment from 'Release Notes' to this comment].

 [zebra] A few minor bugs as described in the Description section
 

 Key: PIG-991
 URL: https://issues.apache.org/jira/browse/PIG-991
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.4.0
Reporter: Yan Zhou
Assignee: Yan Zhou
Priority: Minor
 Fix For: 0.6.0

 Attachments: Bugs.patch





[jira] Updated: (PIG-987) [zebra] Zebra Column Group Access Control

2009-10-06 Thread Raghu Angadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-987:
-

Attachment: TEST-org.apache.hadoop.zebra.mapred.TestCheckin.txt

I am attaching {{mapred.TestCheckin.txt}} that passes without the patch.

By the way, not all tests pass even without the patch. What environment is 
required? I did a fresh checkout and ran 'ant test'.

I guess the test failures on trunk are related to lzo. But I didn't expect 
more failures with the patch.

Looks like PIG-991 removes the lzo dependency. I will try with that patch 
included.

 [zebra] Zebra Column Group Access Control
 -

 Key: PIG-987
 URL: https://issues.apache.org/jira/browse/PIG-987
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.6.0
Reporter: Yan Zhou
Assignee: Yan Zhou
 Attachments: ColumnGroupSecurity.patch, 
 TEST-org.apache.hadoop.zebra.mapred.TestCheckin.txt





[jira] Commented: (PIG-987) [zebra] Zebra Column Group Access Control

2009-10-06 Thread Raghu Angadi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12762829#action_12762829
 ] 

Raghu Angadi commented on PIG-987:
--

Not sure if this is related to PIG. When I applied PIG-991 over this, the tests 
passed (except the ones that fail on trunk).


 [zebra] Zebra Column Group Access Control
 -

 Key: PIG-987
 URL: https://issues.apache.org/jira/browse/PIG-987
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.6.0
Reporter: Yan Zhou
Assignee: Yan Zhou
 Attachments: ColumnGroupSecurity.patch, 
 TEST-org.apache.hadoop.zebra.mapred.TestCheckin.txt





[jira] Updated: (PIG-993) [zebra] Ability to drop a column group in a table

2009-10-06 Thread Raghu Angadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-993:
-

Fix Version/s: 0.6.0

 [zebra] Ability to drop a column group in a table
 --

 Key: PIG-993
 URL: https://issues.apache.org/jira/browse/PIG-993
 Project: Pig
  Issue Type: Bug
Reporter: Raghu Angadi
Assignee: Raghu Angadi
 Fix For: 0.6.0

 Attachments: DropColumnGroupExample.java, zebra-drop-cg.patch, 
 zebra-drop-cg.patch





[jira] Commented: (PIG-987) [zebra] Zebra Column Group Access Control

2009-10-06 Thread Raghu Angadi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12762871#action_12762871
 ] 

Raghu Angadi commented on PIG-987:
--

Even with PIG-991 included, I am seeing lzo related failures. Could you run 
tests on a clean checkout? If you didn't see the errors before then you 
probably have lzo set up in your environment, which is not a requirement. 



 [zebra] Zebra Column Group Access Control
 -

 Key: PIG-987
 URL: https://issues.apache.org/jira/browse/PIG-987
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.6.0
Reporter: Yan Zhou
Assignee: Yan Zhou
 Attachments: ColumnGroupSecurity.patch, 
 TEST-org.apache.hadoop.zebra.mapred.TestCheckin.txt





[jira] Created: (PIG-993) [zebra] Ability to drop a column group in a table

2009-10-02 Thread Raghu Angadi (JIRA)
[zebra] Ability to drop a column group in a table
--

 Key: PIG-993
 URL: https://issues.apache.org/jira/browse/PIG-993
 Project: Pig
  Issue Type: Bug
Reporter: Raghu Angadi
Assignee: Raghu Angadi
 Fix For: 0.5.0



A Zebra table is stored as multiple sub tables, each containing a set of columns 
called a column group (CG). The user specifies how these columns are grouped 
while creating a table through the _storage hint_.

For some of the large tables, it might be necessary for users to remove a set 
of columns and retain the rest. This jira provides a way for users to delete an 
entire column group. 

The following comments will have more details on API and the semantics. 




[jira] Commented: (PIG-993) [zebra] Ability to drop a column group in a table

2009-10-02 Thread Raghu Angadi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12761766#action_12761766
 ] 

Raghu Angadi commented on PIG-993:
--


The API is pretty simple: {code}
class org.apache.hadoop.zebra.BasicTable {
  /** see the patch for JavaDoc and attached example for usage */

  public static void dropColumnGroup(Path path,
                                     Configuration conf,
                                     String cgName)
      throws IOException { ... }
}
{code}

  * The table schema is not modified.
  * This API takes the name of a column group. PIG-986 adds explicit names for 
CGs.
  * Once a CG is deleted, NULL is returned for the fields that were stored in 
that CG. 
 ** This is the main difference between just manually deleting a directory 
on the filesystem and 'properly' deleting a CG.
 ** Many changes made in other parts of Zebra are related to handling the 
missing CGs.
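
The semantics listed above (schema untouched, NULL for fields of a dropped 
CG) can be illustrated with a toy in-memory model. This is a sketch only: 
ToyTable and its methods are hypothetical stand-ins and do not call the real 
BasicTable API, which needs a Hadoop Path and Configuration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory model for illustration; names are hypothetical, not Zebra's.
final class ToyTable {
    private final List<String> schema; // never modified by a drop
    private final Map<String, Map<String, Object>> groups = new HashMap<>(); // CG name -> column values

    ToyTable(List<String> schema) {
        this.schema = schema;
    }

    void putColumnGroup(String cgName, Map<String, ?> columns) {
        groups.put(cgName, new HashMap<String, Object>(columns));
    }

    /** Drops a column group by name; the schema keeps listing its columns. */
    void dropColumnGroup(String cgName) {
        if (groups.remove(cgName) == null) {
            throw new IllegalArgumentException("No such column group: " + cgName);
        }
    }

    /** Reads one row; fields whose column group was dropped come back as null. */
    Map<String, Object> readRow() {
        Map<String, Object> row = new LinkedHashMap<>();
        for (String column : schema) {
            Object value = null;
            for (Map<String, Object> cg : groups.values()) {
                if (cg.containsKey(column)) {
                    value = cg.get(column);
                }
            }
            row.put(column, value);
        }
        return row;
    }

    List<String> schema() {
        return schema;
    }

    public static void main(String[] args) {
        ToyTable table = new ToyTable(Arrays.asList("a1", "a2"));
        Map<String, Object> pi = new HashMap<>();
        pi.put("a1", "x");
        table.putColumnGroup("PI", pi);
        Map<String, Object> general = new HashMap<>();
        general.put("a2", "y");
        table.putColumnGroup("General", general);
        table.dropColumnGroup("PI");
        System.out.println(table.readRow());
    }
}
```

Simply deleting the directory by hand would correspond to removing the map 
entry without the reader knowing how to substitute null, which is why the 
comment calls out the handling of missing CGs elsewhere in Zebra.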


 [zebra] Ability to drop a column group in a table
 --

 Key: PIG-993
 URL: https://issues.apache.org/jira/browse/PIG-993
 Project: Pig
  Issue Type: Bug
Reporter: Raghu Angadi
Assignee: Raghu Angadi
 Fix For: 0.5.0





[jira] Updated: (PIG-993) [zebra] Ability to drop a column group in a table

2009-10-02 Thread Raghu Angadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-993:
-

Attachment: zebra-drop-cg.patch
DropColumnGroupExample.java

Attachments:

  DropColumnGroupExample.java : a simple example to illustrate the 
functionality.

  zebra-drop-cg.patch : This patch would apply only after a patch for PIG-896.

  Some of the tests included there were written by Jing Huang. Jing also helped 
with testing the patch on real clusters with various errors. Yan Zhou helped 
with correctly handling missing column groups.



 [zebra] Ability to drop a column group in a table
 --

 Key: PIG-993
 URL: https://issues.apache.org/jira/browse/PIG-993
 Project: Pig
  Issue Type: Bug
Reporter: Raghu Angadi
Assignee: Raghu Angadi
 Fix For: 0.5.0

 Attachments: DropColumnGroupExample.java, zebra-drop-cg.patch





[jira] Commented: (PIG-993) [zebra] Ability to drop a column group in a table

2009-10-02 Thread Raghu Angadi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12761769#action_12761769
 ] 

Raghu Angadi commented on PIG-993:
--

 zebra-drop-cg.patch : This patch would apply only after a patch for PIG-896.
I meant to say PIG-986.


 [zebra] Ability to drop a column group in a table
 --

 Key: PIG-993
 URL: https://issues.apache.org/jira/browse/PIG-993
 Project: Pig
  Issue Type: Bug
Reporter: Raghu Angadi
Assignee: Raghu Angadi
 Fix For: 0.5.0

 Attachments: DropColumnGroupExample.java, zebra-drop-cg.patch





[jira] Commented: (PIG-985) [zebra] Make necessary changes to build scripts to accommodate new zebra features plus other improvement.

2009-09-30 Thread Raghu Angadi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12761045#action_12761045
 ] 

Raghu Angadi commented on PIG-985:
--

 5) drop column group change (Raghu Angadi)
 6) schema package separation change (Yan Zhou)

Just to clarify, this patch does not contain the above two features. It only 
contains a couple of minor changes made in build.xml as part of these changes. 
Separate jiras will be filed for these two and other features soon. 


 [zebra] Make necessary changes to build scripts to accommodate new zebra 
 features plus other improvement.
 -

 Key: PIG-985
 URL: https://issues.apache.org/jira/browse/PIG-985
 Project: Pig
  Issue Type: Task
  Components: build
Reporter: Chao Wang
Assignee: Chao Wang
 Attachments: patch


 The whole task consists of a series of steps as follows:
 1) nightly test change - prevent checkin tests from running twice in nightly 
 (Chao Wang)
 2) row based block splits for tables change (Raghu Angadi)
 3) add clover target (Jing Huang)
 4) add findbugs target (Chao Wang)
 5) drop column group change (Raghu Angadi) 
 6) schema package separation change (Yan Zhou)




[jira] Updated: (PIG-949) Zebra Bug: splitting map into multiple column group using storage hint causes unexpected behaviour

2009-09-25 Thread Raghu Angadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-949:
-

   Resolution: Fixed
Fix Version/s: (was: 0.4.0)
   Status: Resolved  (was: Patch Available)

 Zebra Bug: splitting map into multiple column group using storage hint causes 
 unexpected behaviour
 --

 Key: PIG-949
 URL: https://issues.apache.org/jira/browse/PIG-949
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.4.0
 Environment: linux
Reporter: Alok Singh
Assignee: Yan Zhou
 Fix For: 0.5.0

 Attachments: Pig_949.patch, Pig_949.patch, Pig_949.patch


 Hi,
 The storage hint specification plays an important part in whether the output 
 table is readable or not.
 Say we have a map named 'map'.
 One can split the map into a column group using [map#{k1}, map#{k2}...]; 
 the remaining map fields will then automatically be added to the default 
 group.
 If the user tries to create a new column group for the remaining fields, as in 
 [map#{k1}, map#{k2}, ..][map], i.e. a separate column group, 
 the table writer will create the table.
 However, if one then tries to load the created table via Pig or via MapReduce 
 using TableInputFormat, the reader has a problem reading the map.
 We get the following stack trace:
 09/09/09 00:09:45 INFO mapred.JobClient: Task Id : attempt_200908191538_33939_m_21_2, Status : FAILED
 java.io.IOException: getValue() failed: null
 at org.apache.hadoop.zebra.io.BasicTable$Reader$BTScanner.getValue(BasicTable.java:775)
 at org.apache.hadoop.zebra.mapred.TableRecordReader.next(TableInputFormat.java:717)
 at org.apache.hadoop.zebra.mapred.TableRecordReader.next(TableInputFormat.java:651)
 at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:191)
 at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:175)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:356)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
 at org.apache.hadoop.mapred.Child.main(Child.java:170)
 Alok
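
To make the two storage hints in the report concrete, here is a minimal Java sketch that splits a hint into its bracketed column groups and flags the pattern described: a map column that is split by key in one group and also listed whole in another. The class and method names (StorageHintSketch, columnGroups, splitAndWhole) are made up for illustration and are not Zebra APIs, and the parsing is a simplification that assumes there are no commas inside a #{...} key list.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: a hypothetical helper, not part of Zebra. */
class StorageHintSketch {
    /** Returns the text inside each [...] group of the hint, in order. */
    static List<String> columnGroups(String hint) {
        List<String> groups = new ArrayList<>();
        int open = hint.indexOf('[');
        while (open >= 0) {
            int close = hint.indexOf(']', open);
            if (close < 0) {
                break; // unterminated group: ignore the rest
            }
            groups.add(hint.substring(open + 1, close).trim());
            open = hint.indexOf('[', close);
        }
        return groups;
    }

    /** True if column `name` is split by key (name#{...}) and also listed as a whole column. */
    static boolean splitAndWhole(String hint, String name) {
        boolean split = false;
        boolean whole = false;
        for (String group : columnGroups(hint)) {
            for (String col : group.split(",")) {
                String c = col.trim();
                if (c.startsWith(name + "#")) {
                    split = true;
                } else if (c.equals(name)) {
                    whole = true;
                }
            }
        }
        return split && whole;
    }

    public static void main(String[] args) {
        // Keys split out; the rest of the map goes to the default group.
        System.out.println(splitAndWhole("[map#{k1}, map#{k2}]", "map"));
        // Keys split out and the remaining map placed in its own group (the reported case).
        System.out.println(splitAndWhole("[map#{k1}, map#{k2}][map]", "map"));
    }
}
```

A check like this only identifies the hint shape that triggered the report; it says nothing about where the reader actually fails.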




[jira] Commented: (PIG-949) Zebra Bug: splitting map into multiple column group using storage hint causes unexpected behaviour

2009-09-25 Thread Raghu Angadi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12759789#action_12759789
 ] 

Raghu Angadi commented on PIG-949:
--

I just committed this. Thanks Yan for the fix and Jing for the test!

 Zebra Bug: splitting map into multiple column group using storage hint causes 
 unexpected behaviour
 --

 Key: PIG-949
 URL: https://issues.apache.org/jira/browse/PIG-949
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.4.0
 Environment: linux
Reporter: Alok Singh
Assignee: Yan Zhou
 Fix For: 0.5.0

 Attachments: Pig_949.patch, Pig_949.patch, Pig_949.patch





[jira] Updated: (PIG-949) Zebra Bug: splitting map into multiple column group using storage hint causes unexpected behaviour

2009-09-22 Thread Raghu Angadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-949:
-

Status: Open  (was: Patch Available)

 Zebra Bug: splitting map into multiple column group using storage hint causes 
 unexpected behaviour
 --

 Key: PIG-949
 URL: https://issues.apache.org/jira/browse/PIG-949
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.4.0
 Environment: linux
Reporter: Alok Singh
Assignee: Yan Zhou
 Attachments: Pig_949.patch, Pig_949.patch, Pig_949.patch





[jira] Updated: (PIG-949) Zebra Bug: splitting map into multiple column group using storage hint causes unexpected behaviour

2009-09-22 Thread Raghu Angadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-949:
-

Fix Version/s: 0.5.0
   0.4.0
   Status: Patch Available  (was: Open)

 Zebra Bug: splitting map into multiple column group using storage hint causes 
 unexpected behaviour
 --

 Key: PIG-949
 URL: https://issues.apache.org/jira/browse/PIG-949
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.4.0
 Environment: linux
Reporter: Alok Singh
Assignee: Yan Zhou
 Fix For: 0.4.0, 0.5.0

 Attachments: Pig_949.patch, Pig_949.patch, Pig_949.patch





[jira] Assigned: (PIG-918) [zebra] LOAD call will hang if only the first column group is queried

2009-09-03 Thread Raghu Angadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi reassigned PIG-918:


Assignee: Yan Zhou

 [zebra] LOAD call will hang if only the first column group is queried
 -

 Key: PIG-918
 URL: https://issues.apache.org/jira/browse/PIG-918
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.4.0
Reporter: Yan Zhou
Assignee: Yan Zhou
 Fix For: 0.4.0

 Attachments: pig-zebra.patch, pig-zebra.patch


 Zebra's LOAD call with projections that only include column(s) in the first 
 column group will hang: the random index into the array of column groups is 
 drawn from an improper range that always skips the first element, so if none 
 of the other column groups are used, the loop keeps running without a chance 
 to break.  
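
The failure mode described above can be sketched in a few lines of Java. This is not Zebra's code: the class name, the method pickUsed, and the `used` array are assumptions made for illustration, and a retry cap is added so that the bad range shows up as a -1 result instead of an actual hang.

```java
import java.util.Random;

/** Illustrative only: names and structure are assumptions, not Zebra's implementation. */
class ColumnGroupPickSketch {
    /**
     * Draws random indexes from [from, used.length) until one with used[i] == true is found.
     * Gives up after maxTries draws and returns -1; without that cap the loop would never end
     * when no index in the range is in use.
     */
    static int pickUsed(boolean[] used, int from, Random rng, int maxTries) {
        for (int tries = 0; tries < maxTries; tries++) {
            int i = from + rng.nextInt(used.length - from);
            if (used[i]) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        boolean[] used = {true, false, false}; // only the first column group is projected
        Random rng = new Random();
        // Range [1, 3): index 0 can never be drawn, so every try misses.
        System.out.println(pickUsed(used, 1, rng, 1000));
        // Range [0, 3): index 0 is part of the range, so a draw can hit it.
        System.out.println(pickUsed(used, 0, rng, 1000));
    }
}
```

With the range starting at 1, the first call can only ever return -1, which corresponds to the endless loop in the report; starting the range at 0 lets the first column group be selected.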




[jira] Updated: (PIG-918) [zebra] LOAD call will hang if only the first column group is queried

2009-09-01 Thread Raghu Angadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-918:
-

Attachment: pig-zebra.patch

When you generate a patch with 'git diff', please use 'git diff --no-prefix' so 
that the patch applies with the 'patch -p0' command. I am updating the attached 
patch with this change.


 [zebra] LOAD call will hang if only the first column group is queried
 -

 Key: PIG-918
 URL: https://issues.apache.org/jira/browse/PIG-918
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.3.0
Reporter: Yan Zhou
 Fix For: 0.4.0

 Attachments: pig-zebra.patch, pig-zebra.patch





[jira] Updated: (PIG-918) [zebra] LOAD call will hang if only the first column group is queried

2009-09-01 Thread Raghu Angadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-918:
-

Affects Version/s: (was: 0.3.0)
   0.4.0

 [zebra] LOAD call will hang if only the first column group is queried
 -

 Key: PIG-918
 URL: https://issues.apache.org/jira/browse/PIG-918
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.4.0
Reporter: Yan Zhou
 Fix For: 0.4.0

 Attachments: pig-zebra.patch, pig-zebra.patch





[jira] Commented: (PIG-918) [zebra] LOAD call will hang if only the first column group is queried

2009-09-01 Thread Raghu Angadi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12750055#action_12750055
 ] 

Raghu Angadi commented on PIG-918:
--

I just committed this. Thanks Yan.

 [zebra] LOAD call will hang if only the first column group is queried
 -

 Key: PIG-918
 URL: https://issues.apache.org/jira/browse/PIG-918
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.4.0
Reporter: Yan Zhou
 Fix For: 0.4.0

 Attachments: pig-zebra.patch, pig-zebra.patch





[jira] Commented: (PIG-833) Storage access layer

2009-08-19 Thread Raghu Angadi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12745219#action_12745219
 ] 

Raghu Angadi commented on PIG-833:
--

Thanks Jing. There are some PIG examples listed at the bottom of the Zebra wiki: 
http://wiki.apache.org/pig/zebra (the wiki is still under construction).

Just listing the Java strings in Jing's comment without Jira formatting:

{noformat}
final static String STR_SCHEMA = 
  "s1:bool, s2:int, s3:long, s4:float, s5:string, s6:bytes, " +
  "r1:record(f1:int, f2:long), r2:record(r3:record(f3:float, f4)), " +
  "m1:map(string),m2:map(map(int)), c:collection(f13:double, f14:float, f15:bytes)";

final static String STR_STORAGE = 
  "[s1, s2]; [m1#{a}]; [r1.f1]; [s3, s4, r2.r3.f3]; [s5, s6, m2#{x|y}]; " +
  "[r1.f2, m1#{b}]; [r2.r3.f4, m2#{z}]";
{noformat}

 Storage access layer
 

 Key: PIG-833
 URL: https://issues.apache.org/jira/browse/PIG-833
 Project: Pig
  Issue Type: New Feature
Reporter: Jay Tang
 Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, 
 PIG-833-zebra.patch.bz2, PIG-833-zebra.patch.bz2, 
 TEST-org.apache.hadoop.zebra.pig.TestCheckin1.txt, test.out, zebra-javadoc.tgz


 A layer is needed to provide a high level data access abstraction and a 
 tabular view of data in Hadoop, and could free Pig users from implementing 
 their own data storage/retrieval code.  This layer should also include a 
 columnar storage format in order to provide fast data projection, 
 CPU/space-efficient data serialization, and a schema language to manage 
 physical storage metadata.  Eventually it could also support predicate 
 pushdown for further performance improvement.  Initially, this layer could be 
 a contrib project in Pig and become a hadoop subproject later on.




[jira] Updated: (PIG-833) Storage access layer

2009-08-11 Thread Raghu Angadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-833:
-

Attachment: PIG-833-zebra.patch.bz2

Updated patch. The only change is that ant prints a descriptive error to the user 
if hadoop20.jar does not exist in the top-level lib directory. It lists the basic 
steps to get this built until PIG-660 is committed.


 Storage access layer
 

 Key: PIG-833
 URL: https://issues.apache.org/jira/browse/PIG-833
 Project: Pig
  Issue Type: New Feature
Reporter: Jay Tang
 Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, 
 PIG-833-zebra.patch.bz2, zebra-javadoc.tgz





[jira] Updated: (PIG-833) Storage access layer

2009-08-11 Thread Raghu Angadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-833:
-

Attachment: PIG-833-zebra.patch.bz2

 Storage access layer
 

 Key: PIG-833
 URL: https://issues.apache.org/jira/browse/PIG-833
 Project: Pig
  Issue Type: New Feature
Reporter: Jay Tang
 Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, 
 PIG-833-zebra.patch.bz2, PIG-833-zebra.patch.bz2, zebra-javadoc.tgz





[jira] Commented: (PIG-833) Storage access layer

2009-08-11 Thread Raghu Angadi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12742069#action_12742069
 ] 

Raghu Angadi commented on PIG-833:
--

Alan, in order to run the unit tests you need to build pig test-core.

As mentioned in the instructions above, please run 
{{'ant -Dtestcase=none test-core'}} under the top-level directory before running 
'ant test' under contrib/zebra.


 Storage access layer
 

 Key: PIG-833
 URL: https://issues.apache.org/jira/browse/PIG-833
 Project: Pig
  Issue Type: New Feature
Reporter: Jay Tang
 Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, 
 PIG-833-zebra.patch.bz2, PIG-833-zebra.patch.bz2, test.out, zebra-javadoc.tgz





[jira] Commented: (PIG-833) Storage access layer

2009-07-29 Thread Raghu Angadi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12736998#action_12736998
 ] 

Raghu Angadi commented on PIG-833:
--

There will be benchmark results either attached to this jira or to a subsequent 
jira.

I would like to compare against SequenceFiles and the new format in Hive. We 
should see on-par performance.

Major performance benefits come from commonly used projections (through column 
groups) and map-side joins of sorted tables. An important part of the motivation 
is features like column security and the ability to delete entire columns. 

We are running some larger-scale benchmarks internally, but these run on 
Yahoo's internal data sources.


 Storage access layer
 

 Key: PIG-833
 URL: https://issues.apache.org/jira/browse/PIG-833
 Project: Pig
  Issue Type: New Feature
Reporter: Jay Tang
 Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, zebra-javadoc.tgz





[jira] Commented: (PIG-660) Integration with Hadoop 0.20

2009-07-28 Thread Raghu Angadi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12736264#action_12736264
 ] 

Raghu Angadi commented on PIG-660:
--

Currently, the hadoop jar for 0.18 under lib/ is called hadoop18.jar. Should we 
change build.xml to use hadoop20.jar instead of hadoop18.jar?

I can file a jira to commit hadoop20.jar. This might be replaced by an updated 
jar when this jira is committed.

 Integration with Hadoop 0.20
 

 Key: PIG-660
 URL: https://issues.apache.org/jira/browse/PIG-660
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.2.0
 Environment: Hadoop 0.20
Reporter: Santhosh Srinivasan
Assignee: Santhosh Srinivasan
 Fix For: 0.4.0

 Attachments: PIG-660.patch, PIG-660_1.patch, PIG-660_2.patch, 
 PIG-660_3.patch, PIG-660_4.patch, PIG-660_5.patch


 With Hadoop 0.20, it will be possible to query the status of each map and 
 reduce in a map reduce job. This will allow better error reporting. Some of 
 the other items that could be on Hadoop's feature requests/bugs are 
 documented here for tracking.
 1. Hadoop should return objects instead of strings when exceptions are thrown
 2. The JobControl should handle all exceptions and report them appropriately. 
 For example, when the JobControl fails to launch jobs, it should handle 
 exceptions appropriately and should support APIs that query this state, i.e., 
 failure to launch jobs.




[jira] Commented: (PIG-660) Integration with Hadoop 0.20

2009-07-28 Thread Raghu Angadi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12736297#action_12736297
 ] 

Raghu Angadi commented on PIG-660:
--

Thanks Olga and Santosh.

build.xml change is already in the patch. Thanks.

I will attach a hadoop20.jar that works with PIG. This is useful for anyone who 
wants to try out the patch. It will also be used by zebra (PIG-833). Please commit 
the jar file to PIG trunk. It could be updated with a later version of the 
hadoop-0.20 branch.

 Integration with Hadoop 0.20
 

 Key: PIG-660
 URL: https://issues.apache.org/jira/browse/PIG-660
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.2.0
 Environment: Hadoop 0.20
Reporter: Santhosh Srinivasan
Assignee: Santhosh Srinivasan
 Fix For: 0.4.0

 Attachments: PIG-660.patch, PIG-660_1.patch, PIG-660_2.patch, 
 PIG-660_3.patch, PIG-660_4.patch, PIG-660_5.patch





[jira] Updated: (PIG-660) Integration with Hadoop 0.20

2009-07-28 Thread Raghu Angadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-660:
-

Attachment: PIG-660_6.patch

Updated patch fixes two minor conflicts with the current pig trunk.

 Integration with Hadoop 0.20
 

 Key: PIG-660
 URL: https://issues.apache.org/jira/browse/PIG-660
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.2.0
 Environment: Hadoop 0.20
Reporter: Santhosh Srinivasan
Assignee: Santhosh Srinivasan
 Fix For: 0.4.0

 Attachments: PIG-660.patch, PIG-660_1.patch, PIG-660_2.patch, 
 PIG-660_3.patch, PIG-660_4.patch, PIG-660_5.patch, PIG-660_6.patch





[jira] Updated: (PIG-833) Storage access layer

2009-07-28 Thread Raghu Angadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-833:
-

Attachment: hadoop20.jar.bz2

Attaching hadoop20.jar, which needs to be placed under the lib/ directory under 
the top-level PIG directory. I will include specific instructions later in the jira.

 Storage access layer
 

 Key: PIG-833
 URL: https://issues.apache.org/jira/browse/PIG-833
 Project: Pig
  Issue Type: New Feature
Reporter: Jay Tang
 Attachments: hadoop20.jar.bz2





[jira] Commented: (PIG-833) Storage access layer

2009-07-28 Thread Raghu Angadi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12736424#action_12736424
 ] 

Raghu Angadi commented on PIG-833:
--


I will surely look at Hive's storage layer and SerDe. I will be able to comment 
better on specifics once I get a better handle on them. In the meantime I will 
attach the work that has already been done on Zebra. 

This is currently a contrib in PIG. Based on these experiences we could 
probably provide a common storage layer more widely suitable for multiple 
Hadoop-related projects.

 Storage access layer
 

 Key: PIG-833
 URL: https://issues.apache.org/jira/browse/PIG-833
 Project: Pig
  Issue Type: New Feature
Reporter: Jay Tang
 Attachments: hadoop20.jar.bz2





[jira] Updated: (PIG-833) Storage access layer

2009-07-28 Thread Raghu Angadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-833:
-

Attachment: PIG-833-zebra.patch

The first cut of contrib/zebra. The patch is very large; we should probably 
compress subsequent versions of it.

More documentation on design and usage will be added to the jira.

How to compile :
--
 * check out latest PIG trunk
 * Apply the latest patch from PIG-660
 * copy attached hadoop20.jar to ./lib
 * run {{ant jar}} (and {{ant -Dtestcase=none test-core}} for zebra tests).
 * cd contrib/zebra
 * ant jar
 * ant test (for tests).
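
The steps above can be sketched as a small Python driver script. This is only 
an illustration, not part of the patch. It assumes an existing checkout of PIG 
trunk, that the PIG-660 patch was saved under the hypothetical name 
PIG-660.patch and applies with -p0, and that the hadoop20.jar.bz2 attachment 
has already been decompressed to hadoop20.jar. The function names are made up 
for this example.

```python
import subprocess
from pathlib import Path


def zebra_build_steps(pig_trunk, pig660_patch, hadoop_jar):
    """Return the build steps as (working_directory, argv) pairs, in order.

    pig660_patch and hadoop_jar should be absolute paths, or paths relative
    to the Pig checkout, because each command runs from its working directory.
    """
    trunk = Path(pig_trunk)
    zebra = trunk / "contrib" / "zebra"
    return [
        # Assumption: the patch applies with -p0 from the top of the checkout.
        (trunk, ["patch", "-p0", "-i", str(pig660_patch)]),
        # Assumption: hadoop20.jar.bz2 was already decompressed to hadoop20.jar.
        (trunk, ["cp", str(hadoop_jar), "lib/"]),
        (trunk, ["ant", "jar"]),
        # Needed before the zebra tests, per the steps above.
        (trunk, ["ant", "-Dtestcase=none", "test-core"]),
        (zebra, ["ant", "jar"]),
        (zebra, ["ant", "test"]),
    ]


def run_steps(steps, dry_run=True):
    """Print each step; execute it only when dry_run is False."""
    for cwd, argv in steps:
        print(f"(cd {cwd} && {' '.join(argv)})")
        if not dry_run:
            # check=True stops at the first failing command.
            subprocess.run(argv, cwd=cwd, check=True)


if __name__ == "__main__":
    # Dry run by default: only prints the commands.
    run_steps(zebra_build_steps("pig-trunk", "PIG-660.patch", "hadoop20.jar"))
```

By default the script only prints the commands; pass dry_run=False to 
run_steps to actually execute them.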

Currently there are compile-time deprecation warnings related to use of the 
deprecated mapred API (JobConf). These will be fixed later.


 Storage access layer
 

 Key: PIG-833
 URL: https://issues.apache.org/jira/browse/PIG-833
 Project: Pig
  Issue Type: New Feature
Reporter: Jay Tang
 Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch






[jira] Updated: (PIG-833) Storage access layer

2009-07-28 Thread Raghu Angadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-833:
-

Attachment: zebra-javadoc.tgz

 Storage access layer
 

 Key: PIG-833
 URL: https://issues.apache.org/jira/browse/PIG-833
 Project: Pig
  Issue Type: New Feature
Reporter: Jay Tang
 Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, zebra-javadoc.tgz


