from: "Raghu Angadi (JIRA)"

[jira] Commented: (PIG-660) Integration with Hadoop 0.20

2009-07-28 Thread Raghu Angadi (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736264#action_12736264
 ] 

Raghu Angadi commented on PIG-660:
--

Currently, the Hadoop 0.18 jar under lib/ is called hadoop18.jar. Should we 
change build.xml to use hadoop20.jar instead of hadoop18.jar?

I can file a jira to commit hadoop20.jar. It might be replaced by an updated jar 
when this jira is committed.

> Integration with Hadoop 0.20
> 
>
> Key: PIG-660
> URL: https://issues.apache.org/jira/browse/PIG-660
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.2.0
> Environment: Hadoop 0.20
>Reporter: Santhosh Srinivasan
>Assignee: Santhosh Srinivasan
> Fix For: 0.4.0
>
> Attachments: PIG-660.patch, PIG-660_1.patch, PIG-660_2.patch, 
> PIG-660_3.patch, PIG-660_4.patch, PIG-660_5.patch
>
>
> With Hadoop 0.20, it will be possible to query the status of each map and 
> reduce in a map reduce job. This will allow better error reporting. Some of 
> the other items that could be on Hadoop's feature requests/bugs are 
> documented here for tracking.
> 1. Hadoop should return objects instead of strings when exceptions are thrown
> 2. The JobControl should handle all exceptions and report them appropriately. 
> For example, when the JobControl fails to launch jobs, it should handle 
> exceptions appropriately and should support APIs that query this state, i.e., 
> failure to launch jobs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (PIG-660) Integration with Hadoop 0.20

2009-07-28 Thread Raghu Angadi (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736297#action_12736297
 ] 

Raghu Angadi commented on PIG-660:
--

Thanks Olga and Santhosh.

build.xml change is already in the patch. Thanks.

I will attach a hadoop20.jar that works with PIG. This is useful for anyone who 
wants to try out the patch. It will also be used by zebra (PIG-833). Please 
commit the jar file to PIG trunk. It could be updated later with a newer version 
from the hadoop-0.20 branch.

> Integration with Hadoop 0.20
> 
>
> Key: PIG-660
> URL: https://issues.apache.org/jira/browse/PIG-660
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.2.0
> Environment: Hadoop 0.20
>Reporter: Santhosh Srinivasan
>Assignee: Santhosh Srinivasan
> Fix For: 0.4.0
>
> Attachments: PIG-660.patch, PIG-660_1.patch, PIG-660_2.patch, 
> PIG-660_3.patch, PIG-660_4.patch, PIG-660_5.patch
>
>
> With Hadoop 0.20, it will be possible to query the status of each map and 
> reduce in a map reduce job. This will allow better error reporting. Some of 
> the other items that could be on Hadoop's feature requests/bugs are 
> documented here for tracking.
> 1. Hadoop should return objects instead of strings when exceptions are thrown
> 2. The JobControl should handle all exceptions and report them appropriately. 
> For example, when the JobControl fails to launch jobs, it should handle 
> exceptions appropriately and should support APIs that query this state, i.e., 
> failure to launch jobs.


[jira] Updated: (PIG-660) Integration with Hadoop 0.20

2009-07-28 Thread Raghu Angadi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-660:
-

Attachment: PIG-660_6.patch

Updated patch fixes two minor conflicts with the current pig trunk.

> Integration with Hadoop 0.20
> 
>
> Key: PIG-660
> URL: https://issues.apache.org/jira/browse/PIG-660
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.2.0
> Environment: Hadoop 0.20
>Reporter: Santhosh Srinivasan
>Assignee: Santhosh Srinivasan
> Fix For: 0.4.0
>
> Attachments: PIG-660.patch, PIG-660_1.patch, PIG-660_2.patch, 
> PIG-660_3.patch, PIG-660_4.patch, PIG-660_5.patch, PIG-660_6.patch
>
>
> With Hadoop 0.20, it will be possible to query the status of each map and 
> reduce in a map reduce job. This will allow better error reporting. Some of 
> the other items that could be on Hadoop's feature requests/bugs are 
> documented here for tracking.
> 1. Hadoop should return objects instead of strings when exceptions are thrown
> 2. The JobControl should handle all exceptions and report them appropriately. 
> For example, when the JobControl fails to launch jobs, it should handle 
> exceptions appropriately and should support APIs that query this state, i.e., 
> failure to launch jobs.


[jira] Updated: (PIG-833) Storage access layer

2009-07-28 Thread Raghu Angadi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-833:
-

Attachment: hadoop20.jar.bz2

Attaching hadoop20.jar, which needs to be placed in the lib/ directory under the 
top-level PIG directory. I will include specific instructions later in the jira.

> Storage access layer
> 
>
> Key: PIG-833
> URL: https://issues.apache.org/jira/browse/PIG-833
> Project: Pig
>  Issue Type: New Feature
>Reporter: Jay Tang
> Attachments: hadoop20.jar.bz2
>
>
> A layer is needed to provide a high level data access abstraction and a 
> tabular view of data in Hadoop, and could free Pig users from implementing 
> their own data storage/retrieval code.  This layer should also include a 
> columnar storage format in order to provide fast data projection, 
> CPU/space-efficient data serialization, and a schema language to manage 
> physical storage metadata.  Eventually it could also support predicate 
> pushdown for further performance improvement.  Initially, this layer could be 
> a contrib project in Pig and become a hadoop subproject later on.


[jira] Commented: (PIG-833) Storage access layer

2009-07-28 Thread Raghu Angadi (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736424#action_12736424
 ] 

Raghu Angadi commented on PIG-833:
--


I will certainly look at Hive's storage layer and SerDe. I will be able to 
comment better on specifics once I have a better handle on them. In the meantime 
I will attach the work that has already been done on Zebra. 

This is currently a contrib in PIG. Based on these experiences we could 
probably provide a common storage layer more widely suitable for multiple 
Hadoop-related projects.

> Storage access layer
> 
>
> Key: PIG-833
> URL: https://issues.apache.org/jira/browse/PIG-833
> Project: Pig
>  Issue Type: New Feature
>Reporter: Jay Tang
> Attachments: hadoop20.jar.bz2
>
>
> A layer is needed to provide a high level data access abstraction and a 
> tabular view of data in Hadoop, and could free Pig users from implementing 
> their own data storage/retrieval code.  This layer should also include a 
> columnar storage format in order to provide fast data projection, 
> CPU/space-efficient data serialization, and a schema language to manage 
> physical storage metadata.  Eventually it could also support predicate 
> pushdown for further performance improvement.  Initially, this layer could be 
> a contrib project in Pig and become a hadoop subproject later on.


[jira] Updated: (PIG-833) Storage access layer

2009-07-28 Thread Raghu Angadi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-833:
-

Attachment: PIG-833-zebra.patch

The first cut of contrib/zebra. The patch is very large, so subsequent versions 
of it should probably be compressed.

More documentation on design and usage will be added to the jira.

How to compile :
--
 * check out latest PIG trunk
 * Apply the latest patch from PIG-660
 * copy attached hadoop20.jar to ./lib
 * run '{{ant jar}}' (and {{'ant -Dtestcase=none test-core'}} for zebra tests).
 * cd contrib/zebra
 * ant jar
 * ant test (for tests).

Currently there are compile-time deprecation warnings related to use of the 
deprecated mapred API (JobConf). These will be fixed later.


> Storage access layer
> 
>
> Key: PIG-833
> URL: https://issues.apache.org/jira/browse/PIG-833
> Project: Pig
>  Issue Type: New Feature
>Reporter: Jay Tang
> Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch
>
>
> A layer is needed to provide a high level data access abstraction and a 
> tabular view of data in Hadoop, and could free Pig users from implementing 
> their own data storage/retrieval code.  This layer should also include a 
> columnar storage format in order to provide fast data projection, 
> CPU/space-efficient data serialization, and a schema language to manage 
> physical storage metadata.  Eventually it could also support predicate 
> pushdown for further performance improvement.  Initially, this layer could be 
> a contrib project in Pig and become a hadoop subproject later on.


[jira] Updated: (PIG-833) Storage access layer

2009-07-28 Thread Raghu Angadi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-833:
-

Attachment: zebra-javadoc.tgz

> Storage access layer
> 
>
> Key: PIG-833
> URL: https://issues.apache.org/jira/browse/PIG-833
> Project: Pig
>  Issue Type: New Feature
>Reporter: Jay Tang
> Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, zebra-javadoc.tgz
>
>
> A layer is needed to provide a high level data access abstraction and a 
> tabular view of data in Hadoop, and could free Pig users from implementing 
> their own data storage/retrieval code.  This layer should also include a 
> columnar storage format in order to provide fast data projection, 
> CPU/space-efficient data serialization, and a schema language to manage 
> physical storage metadata.  Eventually it could also support predicate 
> pushdown for further performance improvement.  Initially, this layer could be 
> a contrib project in Pig and become a hadoop subproject later on.


[jira] Commented: (PIG-833) Storage access layer

2009-07-29 Thread Raghu Angadi (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736998#action_12736998
 ] 

Raghu Angadi commented on PIG-833:
--

Benchmark results will be attached either to this jira or to a subsequent 
jira.

I would like to compare against SequenceFiles and the new format in Hive. We 
should see on-par performance.

The major performance benefits come from commonly used projections (through 
column groups) and map-side joins of sorted tables. An important part of the 
motivation is features such as column security and the ability to delete entire 
columns. 

We are running some larger-scale benchmarks internally, but these run on 
Yahoo's internal data sources.


> Storage access layer
> 
>
> Key: PIG-833
> URL: https://issues.apache.org/jira/browse/PIG-833
> Project: Pig
>  Issue Type: New Feature
>Reporter: Jay Tang
> Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, zebra-javadoc.tgz
>
>
> A layer is needed to provide a high level data access abstraction and a 
> tabular view of data in Hadoop, and could free Pig users from implementing 
> their own data storage/retrieval code.  This layer should also include a 
> columnar storage format in order to provide fast data projection, 
> CPU/space-efficient data serialization, and a schema language to manage 
> physical storage metadata.  Eventually it could also support predicate 
> pushdown for further performance improvement.  Initially, this layer could be 
> a contrib project in Pig and become a hadoop subproject later on.


[jira] Updated: (PIG-833) Storage access layer

2009-08-11 Thread Raghu Angadi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-833:
-

Attachment: PIG-833-zebra.patch.bz2

Updated patch. The only change is that ant now prints a descriptive error to the 
user if hadoop20.jar does not exist in the top-level lib directory. The error 
lists the basic steps needed to build until PIG-660 is committed.


> Storage access layer
> 
>
> Key: PIG-833
> URL: https://issues.apache.org/jira/browse/PIG-833
> Project: Pig
>  Issue Type: New Feature
>Reporter: Jay Tang
> Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, 
> PIG-833-zebra.patch.bz2, zebra-javadoc.tgz
>
>
> A layer is needed to provide a high level data access abstraction and a 
> tabular view of data in Hadoop, and could free Pig users from implementing 
> their own data storage/retrieval code.  This layer should also include a 
> columnar storage format in order to provide fast data projection, 
> CPU/space-efficient data serialization, and a schema language to manage 
> physical storage metadata.  Eventually it could also support predicate 
> pushdown for further performance improvement.  Initially, this layer could be 
> a contrib project in Pig and become a hadoop subproject later on.


[jira] Updated: (PIG-833) Storage access layer

2009-08-11 Thread Raghu Angadi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-833:
-

Attachment: PIG-833-zebra.patch.bz2

> Storage access layer
> 
>
> Key: PIG-833
> URL: https://issues.apache.org/jira/browse/PIG-833
> Project: Pig
>  Issue Type: New Feature
>Reporter: Jay Tang
> Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, 
> PIG-833-zebra.patch.bz2, PIG-833-zebra.patch.bz2, zebra-javadoc.tgz
>
>
> A layer is needed to provide a high level data access abstraction and a 
> tabular view of data in Hadoop, and could free Pig users from implementing 
> their own data storage/retrieval code.  This layer should also include a 
> columnar storage format in order to provide fast data projection, 
> CPU/space-efficient data serialization, and a schema language to manage 
> physical storage metadata.  Eventually it could also support predicate 
> pushdown for further performance improvement.  Initially, this layer could be 
> a contrib project in Pig and become a hadoop subproject later on.


[jira] Commented: (PIG-833) Storage access layer

2009-08-11 Thread Raghu Angadi (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742069#action_12742069
 ] 

Raghu Angadi commented on PIG-833:
--

Alan, in order to run the unit tests you need to build pig test-core.

As mentioned in the instructions above, please run 
{{'ant -Dtestcase=none test-core'}} in the top-level directory before running 
'ant test' under contrib/zebra.


> Storage access layer
> 
>
> Key: PIG-833
> URL: https://issues.apache.org/jira/browse/PIG-833
> Project: Pig
>  Issue Type: New Feature
>Reporter: Jay Tang
> Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, 
> PIG-833-zebra.patch.bz2, PIG-833-zebra.patch.bz2, test.out, zebra-javadoc.tgz
>
>
> A layer is needed to provide a high level data access abstraction and a 
> tabular view of data in Hadoop, and could free Pig users from implementing 
> their own data storage/retrieval code.  This layer should also include a 
> columnar storage format in order to provide fast data projection, 
> CPU/space-efficient data serialization, and a schema language to manage 
> physical storage metadata.  Eventually it could also support predicate 
> pushdown for further performance improvement.  Initially, this layer could be 
> a contrib project in Pig and become a hadoop subproject later on.


[jira] Commented: (PIG-833) Storage access layer

2009-08-12 Thread Raghu Angadi (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742435#action_12742435
 ] 

Raghu Angadi commented on PIG-833:
--

>  this means Pig contrib/ is no longer compatible with Hadoop 18.

This is not desirable and is expected to be temporary until PIG-660 is committed. 
PIG-660 has other dependencies and a different schedule. We thought committing 
zebra now would make zebra builds and subsequent patches easier. 

As such, PIG does not build contrib from the top level ('ant test-contrib' is a 
no-op), so each contrib project needs to be built explicitly anyway. This is 
different from the Hadoop build. Thus this patch should not fail existing 
automated builds.

> Storage access layer
> 
>
> Key: PIG-833
> URL: https://issues.apache.org/jira/browse/PIG-833
> Project: Pig
>  Issue Type: New Feature
>Reporter: Jay Tang
> Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, 
> PIG-833-zebra.patch.bz2, PIG-833-zebra.patch.bz2, 
> TEST-org.apache.hadoop.zebra.pig.TestCheckin1.txt, test.out, zebra-javadoc.tgz
>
>
> A layer is needed to provide a high level data access abstraction and a 
> tabular view of data in Hadoop, and could free Pig users from implementing 
> their own data storage/retrieval code.  This layer should also include a 
> columnar storage format in order to provide fast data projection, 
> CPU/space-efficient data serialization, and a schema language to manage 
> physical storage metadata.  Eventually it could also support predicate 
> pushdown for further performance improvement.  Initially, this layer could be 
> a contrib project in Pig and become a hadoop subproject later on.


[jira] Commented: (PIG-833) Storage access layer

2009-08-17 Thread Raghu Angadi (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744361#action_12744361
 ] 

Raghu Angadi commented on PIG-833:
--


I will try to get some initial docs attached to this jira asap. I think the 
current plan is to have proper wiki pages (also attached here). This is part of 
the reason why we would like to keep this jira open.

The bulk initial dump is certainly not desirable but has been fairly common for 
many contrib projects in Hadoop. The rush to get this committed to contrib is in 
part to avoid such large changes happening again. The longer we delay, the 
larger the patch is going to get. We want to move the subsequent patches and 
discussions to the public jira asap, and we are already doing that.

I would like to clarify that this is not a PIG feature but rather a contrib 
project. We would not want this commit to be treated as a precedent for PIG 
commits. All the responsibility is with the Zebra team. This patch is the 
initial version. It does include many tests. 

> Storage access layer
> 
>
> Key: PIG-833
> URL: https://issues.apache.org/jira/browse/PIG-833
> Project: Pig
>  Issue Type: New Feature
>Reporter: Jay Tang
> Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, 
> PIG-833-zebra.patch.bz2, PIG-833-zebra.patch.bz2, 
> TEST-org.apache.hadoop.zebra.pig.TestCheckin1.txt, test.out, zebra-javadoc.tgz
>
>
> A layer is needed to provide a high level data access abstraction and a 
> tabular view of data in Hadoop, and could free Pig users from implementing 
> their own data storage/retrieval code.  This layer should also include a 
> columnar storage format in order to provide fast data projection, 
> CPU/space-efficient data serialization, and a schema language to manage 
> physical storage metadata.  Eventually it could also support predicate 
> pushdown for further performance improvement.  Initially, this layer could be 
> a contrib project in Pig and become a hadoop subproject later on.


[jira] Commented: (PIG-833) Storage access layer

2009-08-19 Thread Raghu Angadi (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745219#action_12745219
 ] 

Raghu Angadi commented on PIG-833:
--

Thanks Jing. There are some PIG examples listed at the bottom of the Zebra wiki: 
http://wiki.apache.org/pig/zebra (the wiki is still under construction).

Just listing the Java strings from Jing's comment without Jira formatting:

{noformat}
final static String STR_SCHEMA = 
 "s1:bool, s2:int, s3:long, s4:float, s5:string, s6:bytes, " +
 "r1:record(f1:int, f2:long), r2:record(r3:record(f3:float, f4)), " +
 "m1:map(string),m2:map(map(int)), " +
 "c:collection(f13:double, f14:float, f15:bytes)";

final static String STR_STORAGE = 
  "[s1, s2]; [m1#{a}]; [r1.f1]; [s3, s4, r2.r3.f3]; [s5, s6, m2#{x|y}];  " +
  "[r1.f2, m1#{b}]; [r2.r3.f4, m2#{z}]";
{noformat}
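
As a rough illustration of how the storage hint above groups columns, the following sketch splits the STR_STORAGE value into its bracketed column groups. It is not Zebra's parser: the class StorageHintSketch and the method parseColumnGroups are hypothetical names, and the assumption that groups are separated by ';' and columns by ',' is read off the string above only.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: a hypothetical helper, not part of Zebra. */
final class StorageHintSketch {

    // Same value as STR_STORAGE in the comment above.
    static final String STR_STORAGE =
        "[s1, s2]; [m1#{a}]; [r1.f1]; [s3, s4, r2.r3.f3]; [s5, s6, m2#{x|y}];  " +
        "[r1.f2, m1#{b}]; [r2.r3.f4, m2#{z}]";

    /** Returns one list of column references per bracketed column group. */
    static List<List<String>> parseColumnGroups(String hint) {
        List<List<String>> groups = new ArrayList<>();
        for (String part : hint.split(";")) {
            String group = part.trim();
            if (group.isEmpty()) {
                continue; // tolerate a trailing ';'
            }
            if (!group.startsWith("[") || !group.endsWith("]")) {
                throw new IllegalArgumentException("Expected [..] but got: " + group);
            }
            List<String> columns = new ArrayList<>();
            // Assumption: column references never contain ',' themselves.
            for (String column : group.substring(1, group.length() - 1).split(",")) {
                columns.add(column.trim());
            }
            groups.add(columns);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<List<String>> groups = parseColumnGroups(STR_STORAGE);
        for (int i = 0; i < groups.size(); i++) {
            System.out.println("column group " + i + ": " + groups.get(i));
        }
    }
}
```

Running main lists seven column groups. Entries such as m1#{a} and m1#{b} show the same map column split by key across different groups.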

> Storage access layer
> 
>
> Key: PIG-833
> URL: https://issues.apache.org/jira/browse/PIG-833
> Project: Pig
>  Issue Type: New Feature
>Reporter: Jay Tang
> Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, 
> PIG-833-zebra.patch.bz2, PIG-833-zebra.patch.bz2, 
> TEST-org.apache.hadoop.zebra.pig.TestCheckin1.txt, test.out, zebra-javadoc.tgz
>
>
> A layer is needed to provide a high level data access abstraction and a 
> tabular view of data in Hadoop, and could free Pig users from implementing 
> their own data storage/retrieval code.  This layer should also include a 
> columnar storage format in order to provide fast data projection, 
> CPU/space-efficient data serialization, and a schema language to manage 
> physical storage metadata.  Eventually it could also support predicate 
> pushdown for further performance improvement.  Initially, this layer could be 
> a contrib project in Pig and become a hadoop subproject later on.


[jira] Updated: (PIG-918) [zebra] LOAD call will hang if only the first column group is queried

2009-09-01 Thread Raghu Angadi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-918:
-

Attachment: pig-zebra.patch

When you generate a patch with 'git diff' please use 'git diff --no-prefix' so 
that patch applies with 'patch -p0' command. I am updating the attached patch 
with this change.


> [zebra] LOAD call will hang if only the first column group is queried
> -
>
> Key: PIG-918
> URL: https://issues.apache.org/jira/browse/PIG-918
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.3.0
>Reporter: Yan Zhou
> Fix For: 0.4.0
>
> Attachments: pig-zebra.patch, pig-zebra.patch
>
>
> Zebra's LOAD call with projections that only include column(s) in the first 
> column group will hang because an improper range of random numbers for index 
> to the array of column groups always skips the first element so that if all 
> other column groups are not used, the looping keeps running without a chance 
> to break.  
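
The failure mode described above can be sketched as follows. This is not the Zebra source: the class ColumnGroupPickSketch, the method pickUsedGroup, and the maxTries cap are assumptions made for illustration. The cap exists only so that the sketch terminates. The loop described in the issue had no such limit, which is why it hung.

```java
import java.util.Random;

/** Illustrative only: hypothetical names, not the Zebra source. */
final class ColumnGroupPickSketch {

    /**
     * Draws random indexes from [lowerBound, used.length) until one with
     * used[index] == true is found. Gives up after maxTries draws and
     * returns -1, so that this sketch always terminates.
     */
    static int pickUsedGroup(boolean[] used, int lowerBound, Random random, int maxTries) {
        for (int attempt = 0; attempt < maxTries; attempt++) {
            int index = lowerBound + random.nextInt(used.length - lowerBound);
            if (used[index]) {
                return index;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // The projection touches only the first column group.
        boolean[] used = {true, false, false};
        Random random = new Random(42);

        // Range [1, 3): index 0 is never drawn, so no used group can be found.
        System.out.println("range skipping index 0:  " + pickUsedGroup(used, 1, random, 1000));

        // Range [0, 3): index 0 can be drawn, so the search can succeed.
        System.out.println("range including index 0: " + pickUsedGroup(used, 0, random, 1000));
    }
}
```

With the range starting at 1 the search can only exhaust its tries. Starting the range at 0 makes the first column group reachable.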


[jira] Updated: (PIG-918) [zebra] LOAD call will hang if only the first column group is queried

2009-09-01 Thread Raghu Angadi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-918:
-

Affects Version/s: (was: 0.3.0)
   0.4.0

> [zebra] LOAD call will hang if only the first column group is queried
> -
>
> Key: PIG-918
> URL: https://issues.apache.org/jira/browse/PIG-918
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Yan Zhou
> Fix For: 0.4.0
>
> Attachments: pig-zebra.patch, pig-zebra.patch
>
>
> Zebra's LOAD call with projections that only include column(s) in the first 
> column group will hang because an improper range of random numbers for index 
> to the array of column groups always skips the first element so that if all 
> other column groups are not used, the looping keeps running without a chance 
> to break.  


[jira] Commented: (PIG-918) [zebra] LOAD call will hang if only the first column group is queried

2009-09-01 Thread Raghu Angadi (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12750055#action_12750055
 ] 

Raghu Angadi commented on PIG-918:
--

I just committed this. Thanks Yan.

> [zebra] LOAD call will hang if only the first column group is queried
> -
>
> Key: PIG-918
> URL: https://issues.apache.org/jira/browse/PIG-918
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Yan Zhou
> Fix For: 0.4.0
>
> Attachments: pig-zebra.patch, pig-zebra.patch
>
>
> Zebra's LOAD call with projections that only include column(s) in the first 
> column group will hang because an improper range of random numbers for index 
> to the array of column groups always skips the first element so that if all 
> other column groups are not used, the looping keeps running without a chance 
> to break.  


[jira] Resolved: (PIG-918) [zebra] LOAD call will hang if only the first column group is queried

2009-09-03 Thread Raghu Angadi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi resolved PIG-918.
--

Resolution: Fixed

> [zebra] LOAD call will hang if only the first column group is queried
> -
>
> Key: PIG-918
> URL: https://issues.apache.org/jira/browse/PIG-918
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Yan Zhou
>Assignee: Yan Zhou
> Fix For: 0.4.0
>
> Attachments: pig-zebra.patch, pig-zebra.patch
>
>
> Zebra's LOAD call with projections that only include column(s) in the first 
> column group will hang because an improper range of random numbers for index 
> to the array of column groups always skips the first element so that if all 
> other column groups are not used, the looping keeps running without a chance 
> to break.  


[jira] Assigned: (PIG-918) [zebra] LOAD call will hang if only the first column group is queried

2009-09-03 Thread Raghu Angadi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi reassigned PIG-918:


Assignee: Yan Zhou

> [zebra] LOAD call will hang if only the first column group is queried
> -
>
> Key: PIG-918
> URL: https://issues.apache.org/jira/browse/PIG-918
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Yan Zhou
>Assignee: Yan Zhou
> Fix For: 0.4.0
>
> Attachments: pig-zebra.patch, pig-zebra.patch
>
>
> Zebra's LOAD call with projections that only include column(s) in the first 
> column group will hang because an improper range of random numbers for index 
> to the array of column groups always skips the first element so that if all 
> other column groups are not used, the looping keeps running without a chance 
> to break.  


[jira] Commented: (PIG-949) Zebra Bug: splitting map into multiple column group using storage hint causes unexpected behaviour

2009-09-22 Thread Raghu Angadi (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758328#action_12758328
 ] 

Raghu Angadi commented on PIG-949:
--

Yan, please include the test case in the patch. 

Also, I would suggest a regular name for the test case file, something like 
'TestMapAcrossMultipleCGs.java' or something shorter. Inside the file you could 
mention the JIRA number in a comment.

Raghu.

> Zebra Bug: splitting map into multiple column group using storage hint causes 
> unexpected behaviour
> --
>
> Key: PIG-949
> URL: https://issues.apache.org/jira/browse/PIG-949
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
> Environment: linux
>Reporter: Alok Singh
>Assignee: Yan Zhou
> Attachments: Pig_949.patch
>
>
> Hi,
> The storage hint specification plays an important part in whether the output 
> table is readable or not. Say we have a map column named 'map'.
> One can split the map into a column group using [map#{k1}, map#{k2}...]; 
> the remaining map fields are then automatically added to the default group.
> If the user instead tries to create a new column group for the remaining 
> fields, as in [map#{k1}, map#{k2}, ..][map], i.e. a separate column group, 
> the table writer will still create the table.
> However, if one then tries to load the created table via Pig, or via 
> MapReduce using TableInputFormat, the reader has a problem reading the map.
> We get the following stack trace:
> 09/09/09 00:09:45 INFO mapred.JobClient: Task Id : attempt_200908191538_33939_m_21_2, Status : FAILED
> java.io.IOException: getValue() failed: null
>     at org.apache.hadoop.zebra.io.BasicTable$Reader$BTScanner.getValue(BasicTable.java:775)
>     at org.apache.hadoop.zebra.mapred.TableRecordReader.next(TableInputFormat.java:717)
>     at org.apache.hadoop.zebra.mapred.TableRecordReader.next(TableInputFormat.java:651)
>     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:191)
>     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:175)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:356)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
> Alok


[jira] Updated: (PIG-949) Zebra Bug: splitting map into multiple column group using storage hint causes unexpected behaviour

2009-09-22 Thread Raghu Angadi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-949:
-

Status: Open  (was: Patch Available)

> Zebra Bug: splitting map into multiple column group using storage hint causes 
> unexpected behaviour
> --
>
> Key: PIG-949
> URL: https://issues.apache.org/jira/browse/PIG-949
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
> Environment: linux
>Reporter: Alok Singh
>Assignee: Yan Zhou
> Attachments: Pig_949.patch, Pig_949.patch, Pig_949.patch
>
>
> Hi,
> The storage hint specification plays an important part in whether the output 
> table is readable or not.
> Say we have the map 'map'.
> One can split the map into a column group using [map#{k1}, map#{k2}...]; 
> however, the remaining map fields will automatically be added to the default 
> group.
> If the user tries to create a new column group for the remaining fields as follows:
> [map#{k1}, map#{k2}, ..][map], i.e. creates a separate column group,
> the table writer will create the table.
> However, if one tries to load the created table via Pig or via MapReduce 
> using TableInputFormat,
> then the reader has a problem reading the map.
> We get the following stack trace:
> 09/09/09 00:09:45 INFO mapred.JobClient: Task Id : attempt_200908191538_33939_m_21_2, Status : FAILED
> java.io.IOException: getValue() failed: null
> at org.apache.hadoop.zebra.io.BasicTable$Reader$BTScanner.getValue(BasicTable.java:775)
> at org.apache.hadoop.zebra.mapred.TableRecordReader.next(TableInputFormat.java:717)
> at org.apache.hadoop.zebra.mapred.TableRecordReader.next(TableInputFormat.java:651)
> at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:191)
> at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:175)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:356)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> at org.apache.hadoop.mapred.Child.main(Child.java:170)
> Alok


[jira] Updated: (PIG-949) Zebra Bug: splitting map into multiple column group using storage hint causes unexpected behaviour

2009-09-22 Thread Raghu Angadi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-949:
-

Fix Version/s: 0.5.0, 0.4.0
       Status: Patch Available  (was: Open)


[jira] Updated: (PIG-949) Zebra Bug: splitting map into multiple column group using storage hint causes unexpected behaviour

2009-09-25 Thread Raghu Angadi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-949:
-

   Resolution: Fixed
Fix Version/s: (was: 0.4.0)
       Status: Resolved  (was: Patch Available)


[jira] Commented: (PIG-949) Zebra Bug: splitting map into multiple column group using storage hint causes unexpected behaviour

2009-09-25 Thread Raghu Angadi (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759789#action_12759789
 ] 

Raghu Angadi commented on PIG-949:
--

I just committed this. Thanks Yan for the fix and Jing for the test!


[jira] Commented: (PIG-985) [zebra] Make necessary changes to build scripts to accommodate new zebra features plus other improvement.

2009-09-30 Thread Raghu Angadi (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12761045#action_12761045
 ] 

Raghu Angadi commented on PIG-985:
--

> 5) drop column group change (Raghu Angadi)
> 6) schema package separation change (Yan Zhou)

Just to clarify, this patch does not contain the above two features. It only 
contains a couple of minor changes made in build.xml as part of these changes. 
Separate jiras will be filed for these two and other features soon. 


> [zebra] Make necessary changes to build scripts to accommodate new zebra 
> features plus other improvement.
> -
>
> Key: PIG-985
> URL: https://issues.apache.org/jira/browse/PIG-985
> Project: Pig
>  Issue Type: Task
>  Components: build
>Reporter: Chao Wang
>Assignee: Chao Wang
> Attachments: patch
>
>
> The whole task consists of a series of steps as follows:
> 1) nightly test change - prevent checkin tests from running twice in nightly 
> (Chao Wang)
> 2) row based block splits for tables change (Raghu Angadi)
> 3) add clover target (Jing Huang)
> 4) add findbugs target (Chao Wang)
> 5) drop column group change (Raghu Angadi) 
> 6) schema package separation change (Yan Zhou)


[jira] Created: (PIG-993) [zebra] Ability to drop a column group in a table

2009-10-02 Thread Raghu Angadi (JIRA)

[zebra] Ability to drop a column group in a table
--

 Key: PIG-993
 URL: https://issues.apache.org/jira/browse/PIG-993
 Project: Pig
  Issue Type: Bug
Reporter: Raghu Angadi
Assignee: Raghu Angadi
 Fix For: 0.5.0



A Zebra table is stored as multiple sub-tables, each containing a set of columns 
called a column group (CG). The user specifies how these columns are grouped 
while creating a table through the _storage hint_.

For some of the large tables, it might be necessary for users to remove a set 
of columns and retain the rest. This jira provides a way for users to delete an 
entire column group. 

The following comments will have more details on API and the semantics. 


[jira] Commented: (PIG-993) [zebra] Ability to drop a column group in a table

2009-10-02 Thread Raghu Angadi (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12761766#action_12761766
 ] 

Raghu Angadi commented on PIG-993:
--


The API is pretty simple: {code}
class org.apache.hadoop.zebra.io.BasicTable {
  /** See the patch for JavaDoc and the attached example for usage. */
  public static void dropColumnGroup(Path path, Configuration conf, String cgName)
      throws IOException { ... }
}
{code}

  * The table schema is not modified.
  * This API takes the name of a column group. PIG-986 adds explicit names for 
CGs.
  * Once a CG is deleted, NULL is returned for the fields that were stored in 
the CG. 
 ** This is the main difference between manually deleting a directory 
on the filesystem and 'properly' deleting a CG.
 ** Many changes made in other parts of Zebra are related to handling the 
missing CGs.
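
To illustrate the NULL semantics described above, here is a minimal toy model in plain Java. It is only a sketch and not Zebra code: the class DroppedCgReadSketch and its methods addField, drop and read are hypothetical names invented for this illustration, and a row is modelled as a simple map. The point is that the schema (the field-to-CG mapping) stays intact while reads of fields in a dropped CG yield null.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model (hypothetical, not Zebra code) of reads after a CG is dropped. */
final class DroppedCgReadSketch {
    // Schema information: which column group stores each field.
    private final Map<String, String> fieldToCg = new HashMap<>();
    // Names of column groups that have been marked deleted.
    private final Set<String> droppedCgs = new HashSet<>();

    void addField(String field, String cgName) {
        fieldToCg.put(field, cgName);
    }

    /** Marks a column group as dropped; the schema itself is left untouched. */
    void drop(String cgName) {
        droppedCgs.add(cgName);
    }

    /** Returns the stored value, or null when the field's CG was dropped. */
    Object read(Map<String, Object> storedRow, String field) {
        String cg = fieldToCg.get(field);
        if (cg == null) {
            throw new IllegalArgumentException("Unknown field: " + field);
        }
        if (droppedCgs.contains(cg)) {
            return null; // field is still in the schema, but its data is gone
        }
        return storedRow.get(field);
    }
}
```

A field that was never part of the schema is still an error in this sketch, which is what distinguishes a dropped CG from an unknown column.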



[jira] Commented: (PIG-993) [zebra] Ability to drop a column group in a table

2009-10-02 Thread Raghu Angadi (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12761767#action_12761767
 ] 

Raghu Angadi commented on PIG-993:
--

Deletion procedure: 

   # Check if a column group with the given name exists and throw an error if 
there is no such group.
   # If the column group is already deleted, return normally.
  ** If a column group is already marked deleted and the corresponding 
physical directory still exists, try to remove the column group data again. 
An earlier attempt might not have removed the directory.
   # Create an empty file ".deleted-CGNAME" in the top-level directory. 
   # If the creation fails, check if the file already exists. This can happen 
when two users concurrently try to delete the same column group. If the CG is 
marked deleted after this, return success. An exception is thrown for any 
other error.
   # Delete the column group directory. 
   # An exception is thrown if deletion fails. Note that the column group is 
already marked deleted even though the deletion of the directory failed. A 
subsequent deletion of such a column group will again try to delete the 
directory.
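
The steps above can be sketched roughly as follows. This is not the code from the patch: it uses java.nio.file on a local directory instead of the Hadoop FileSystem API, and the class DropColumnGroupSketch, the helper names and the schemaCgNames parameter are assumptions made for illustration. Only the ".deleted-CGNAME" marker name comes from the description above.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Local-filesystem sketch of the marker-file procedure; not Zebra's code. */
final class DropColumnGroupSketch {

    /** Hypothetical helper: path of the marker file for a column group. */
    static Path marker(Path tableDir, String cgName) {
        return tableDir.resolve(".deleted-" + cgName);
    }

    static void dropColumnGroup(Path tableDir, Set<String> schemaCgNames, String cgName)
            throws IOException {
        // Step 1: the name must refer to a column group of this table.
        if (!schemaCgNames.contains(cgName)) {
            throw new IOException("No such column group: " + cgName);
        }
        Path cgDir = tableDir.resolve(cgName);
        Path marker = marker(tableDir, cgName);

        // Step 2: already marked deleted, so only retry a leftover directory.
        if (Files.exists(marker)) {
            deleteRecursively(cgDir);
            return;
        }

        // Steps 3-4: mark the group deleted by creating an empty file.
        try {
            Files.createFile(marker);
        } catch (FileAlreadyExistsException e) {
            // A concurrent caller created the marker first: report success.
            return;
        }

        // Steps 5-6: remove the data. A failure propagates as an exception,
        // but the marker stays, so a later call retries the removal.
        deleteRecursively(cgDir);
    }

    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order lists children before their parent directories.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }
}
```

Creating the marker before removing any data is what makes the operation safe to repeat: readers can treat the group as gone as soon as the marker exists, and an interrupted removal is simply retried by the next call.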


[jira] Updated: (PIG-993) [zebra] Ability to drop a column group in a table

2009-10-02 Thread Raghu Angadi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-993:
-

Attachment: zebra-drop-cg.patch
DropColumnGroupExample.java

Attachments: 

  DropColumnGroupExample.java : a simple example to illustrate the 
functionality.

  zebra-drop-cg.patch : This patch would apply only after a patch for PIG-896.

  Some of the tests included there were written by Jing Huang. Jing also helped 
with testing the patch on real clusters with various errors. Yan Zhou helped 
with correctly handling missing column groups.




[jira] Commented: (PIG-993) [zebra] Ability to drop a column group in a table

2009-10-02 Thread Raghu Angadi (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12761769#action_12761769
 ] 

Raghu Angadi commented on PIG-993:
--

> zebra-drop-cg.patch : This patch would apply only after a patch for PIG-896.
I meant to say PIG-986.



[jira] Commented: (PIG-987) [zebra] Zebra Column Group Access Control

2009-10-06 Thread Raghu Angadi (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12762812#action_12762812
 ] 

Raghu Angadi commented on PIG-987:
--

I tried to commit this patch. 'ant test' says all the tests fail, whereas only 
one or two tests fail without the patch.

Does Hudson actually run Zebra tests?


> [zebra] Zebra Column Group Access Control
> -
>
> Key: PIG-987
> URL: https://issues.apache.org/jira/browse/PIG-987
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.6.0
>Reporter: Yan Zhou
>Assignee: Yan Zhou
> Attachments: ColumnGroupSecurity.patch
>
>
> Access Control: when processes try to read from the column groups, Zebra 
> should be able to handle allowed vs. disallowed user/application accesses.  
> The security is eventually granted by the corresponding HDFS security of the 
> data stored.
> Expected behavior when column group permissions are set:
> When a user selects only columns that they do not have permission to 
> access, Zebra should return an error with message "Error #: Permission denied 
> for accessing column  
> Access control applies to an entire column group, so all columns in a column 
> group have the same permissions. 


[jira] Updated: (PIG-991) [zebra] A few minor bugs as described in the Description section

2009-10-06 Thread Raghu Angadi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-991:
-

Release Note:   (was: Patch should be applied after that of Jira987.)

bq. Patch should be applied after that of Jira987.

[moved above comment from 'Release Notes' to this comment].

> [zebra] A few minor bugs as described in the Description section
> 
>
> Key: PIG-991
> URL: https://issues.apache.org/jira/browse/PIG-991
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Yan Zhou
>Assignee: Yan Zhou
>Priority: Minor
> Fix For: 0.6.0
>
> Attachments: Bugs.patch
>
>
> 1) "lzo2" was used as the compressor name for the LZO compression algorithm; 
> it should be "lzo" instead;
> 2) the default compression is changed from "lzo" to "gz" for gzip;
> 3) In JAVACC file SchemaParser.jjt, the package name was wrong using the old 
> "package org.apache.pig.table.types";
> 4) in build.xml, two new javacc targets are added to generate 
> TableSchemaParser and TableStorageParser java codes;
> 5) Support of column group security ( 
> https://issues.apache.org/jira/browse/PIG-987 ) lacked support of the 
> dumpinfo method: the groups and permissions were not displayed. Note that as 
> a consequence, the patch herein must be applied after that of JIRA987.
> 6) and 7) a couple of issues reported in Jira917.


[jira] Updated: (PIG-987) [zebra] Zebra Column Group Access Control

2009-10-06 Thread Raghu Angadi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-987:
-

Attachment: TEST-org.apache.hadoop.zebra.mapred.TestCheckin.txt

I am attaching {{mapred.TestCheckin.txt}} that passes without the patch.

Btw, not all tests pass even without the patch. What environment is 
required? I did a fresh checkout and ran 'ant test'.

I guess the test failures on trunk are related to lzo. But I didn't expect 
more failures with the patch.

Looks like PIG-991 removes the lzo dependency. I will try with that patch 
included.


[jira] Commented: (PIG-987) [zebra] Zebra Column Group Access Control

2009-10-06 Thread Raghu Angadi (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12762829#action_12762829
 ] 

Raghu Angadi commented on PIG-987:
--

Not sure if this is related to PIG. When I applied PIG-991 over this, the tests 
passed (except the ones that fail on trunk).



[jira] Updated: (PIG-993) [zebra] Ability to drop a column group in a table

2009-10-06 Thread Raghu Angadi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-993:
-

Fix Version/s: 0.6.0


[jira] Commented: (PIG-987) [zebra] Zebra Column Group Access Control

2009-10-06 Thread Raghu Angadi (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12762871#action_12762871
 ] 

Raghu Angadi commented on PIG-987:
--

Even with PIG-991 included, I am seeing lzo related failures. Could you run 
tests on a clean checkout? If you didn't see the errors before then you 
probably have lzo set up in your environment, which is not a requirement. 




[jira] Updated: (PIG-987) [zebra] Zebra Column Group Access Control

2009-10-07 Thread Raghu Angadi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-987:
-

Attachment: tmp-987-plus-991.patch
TEST-org.apache.hadoop.zebra.io.TestCheckin.txt

Attachments :
   # tmp-987-plus-991.patch : latest patch here + patch for PIG-991
   # TEST-org.apache.hadoop.zebra.io.TestCheckin.txt : output of the failed 
tests.

Yan,  looks like lzo related errors are fixed with the combined patch. But 
there are still some failures. I think some of these failures exist on trunk as 
well.


[jira] Commented: (PIG-987) [zebra] Zebra Column Group Access Control

2009-10-07 Thread Raghu Angadi (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763346#action_12763346
 ] 

Raghu Angadi commented on PIG-987:
--

I finally got some time to look into this. Yes, I think it should be fixed in 
the tests. TestColumnGroup.java says:  
{noformat}
ColumnGroup.Writer writer = new ColumnGroup.Writer(path, strSchema, sorted,
    "pig", "gz", "gauravj", "users", (short) Short.parseShort("755", 8), 
    false, conf);
{noformat}

using the local FS. How can we expect users to have a user named "gauravj" on 
their machines and run as superusers :)? It just cannot be done.

If the test wants to run with these permissions, we should:
 a) use HDFS (MiniDFSCluster) rather than the local filesystem. The tester has all 
the permissions on a MiniDFS.
 b) minor: use a more generic name than gauravj.
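
As a small illustration of point b), a test could take the owner from the account running the JVM instead of hard-coding one. The sketch below is plain Java and makes no claim about the actual Zebra test code: the class TestOwnerSketch and its two methods are hypothetical names, and it does not cover the MiniDFSCluster setup from point a).

```java
/** Hypothetical helpers for a test that should not depend on a fixed account. */
final class TestOwnerSketch {

    /** Owner name taken from the user running the JVM, not a hard-coded one. */
    static String currentUser() {
        return System.getProperty("user.name");
    }

    /** Parses an octal permission string such as "755" into a short. */
    static short parsePermission(String octal) {
        // Short.parseShort already returns a short, so no cast is needed.
        return Short.parseShort(octal, 8);
    }
}
```

The octal string "755" parses to the decimal value 493, which is the same value the quoted test passes with its redundant (short) cast.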



[jira] Commented: (PIG-987) [zebra] Zebra Column Group Access Control

2009-10-08 Thread Raghu Angadi (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763516#action_12763516
 ] 

Raghu Angadi commented on PIG-987:
--

> Can you chgrp a local FS file to a group called "users" on your box?
No.

It's the same problem. I don't have a group called "users", and I don't think 
we can require others to have it.

I didn't know the owner is ignored. Is it still allowed by the storage hint?

> [zebra] Zebra Column Group Access Control
> -
>
> Key: PIG-987
> URL: https://issues.apache.org/jira/browse/PIG-987
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.6.0
>Reporter: Yan Zhou
>Assignee: Yan Zhou
> Attachments: ColumnGroupSecurity.patch, ColumnGroupSecurity.patch, 
> TEST-org.apache.hadoop.zebra.io.TestCheckin.txt, 
> TEST-org.apache.hadoop.zebra.mapred.TestCheckin.txt, tmp-987-plus-991.patch
>
>
> Access Control: when processes try to read from the column groups, Zebra 
> should be able to handle allowed vs. disallowed user/application accesses. 
> The security is eventually granted by the corresponding HDFS security of the 
> data stored.
> Expected behavior when column group permissions are set:
> When a user selects only columns that they do not have permission to 
> access, Zebra should return an error with the message "Error #: Permission denied 
> for accessing column  
> Access control applies to an entire column group, so all columns in a column 
> group have the same permissions. 


[jira] Commented: (PIG-987) [zebra] Zebra Column Group Access Control

2009-10-08 Thread Raghu Angadi (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763836#action_12763836
 ] 

Raghu Angadi commented on PIG-987:
--

Thanks Yan. It might be better to remove gauravj also since it is ignored 
anyway. 

This implies column access control is not tested in this patch, right?

> [zebra] Zebra Column Group Access Control
> -
>
> Key: PIG-987
> URL: https://issues.apache.org/jira/browse/PIG-987
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.6.0
>Reporter: Yan Zhou
>Assignee: Yan Zhou
> Attachments: ColumnGroupSecurity.patch, ColumnGroupSecurity.patch, 
> ColumnGroupSecurity.patch, TEST-org.apache.hadoop.zebra.io.TestCheckin.txt, 
> TEST-org.apache.hadoop.zebra.mapred.TestCheckin.txt, tmp-987-plus-991.patch
>
>
> Access Control: when processes try to read from the column groups, Zebra 
> should be able to handle allowed vs. disallowed user/application accesses. 
> The security is eventually granted by the corresponding HDFS security of the 
> data stored.
> Expected behavior when column group permissions are set:
> When a user selects only columns that they do not have permission to 
> access, Zebra should return an error with the message "Error #: Permission denied 
> for accessing column  
> Access control applies to an entire column group, so all columns in a column 
> group have the same permissions. 


[jira] Updated: (PIG-991) [zebra] A few minor bugs as described in the Description section

2009-10-08 Thread Raghu Angadi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-991:
-

Attachment: Bugs-2.patch

I am committing a slightly modified patch. I removed the following lines, which 
modified the top-level build.xml. Please ask one of the Pig committers to 
commit that change.

The part that was removed:
{noformat}
@@ -940,4 +942,13 @@

  

+
+
+
+
+
 
{noformat}

> [zebra] A few minor bugs as described in the Description section
> 
>
> Key: PIG-991
> URL: https://issues.apache.org/jira/browse/PIG-991
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Yan Zhou
>Assignee: Yan Zhou
>Priority: Minor
> Fix For: 0.6.0
>
> Attachments: Bugs-2.patch, Bugs.patch
>
>
> 1) "lzo2" was used as the compressor name for the LZO compression algorithm; 
> it should be "lzo" instead;
> 2) the default compression is changed from "lzo" to "gz" for gzip;
> 3) In the JavaCC file SchemaParser.jjt, the package name was wrong: it used the old 
> "package org.apache.pig.table.types";
> 4) in build.xml, two new javacc targets are added to generate the 
> TableSchemaParser and TableStorageParser Java code;
> 5) Support of column group security ( 
> https://issues.apache.org/jira/browse/PIG-987 ) lacked support of the 
> dumpinfo method: the groups and permissions were not displayed. Note that as 
> a consequence, the patch herein must be applied after that of JIRA987.
> 6) and 7) a couple of issues reported in Jira917.


[jira] Updated: (PIG-987) [zebra] Zebra Column Group Access Control

2009-10-08 Thread Raghu Angadi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-987:
-

   Resolution: Fixed
Fix Version/s: 0.6.0
   Status: Resolved  (was: Patch Available)

I just committed this. Thanks Yan!

> [zebra] Zebra Column Group Access Control
> -
>
> Key: PIG-987
> URL: https://issues.apache.org/jira/browse/PIG-987
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.6.0
>Reporter: Yan Zhou
>Assignee: Yan Zhou
> Fix For: 0.6.0
>
> Attachments: ColumnGroupSecurity.patch, ColumnGroupSecurity.patch, 
> ColumnGroupSecurity.patch, TEST-org.apache.hadoop.zebra.io.TestCheckin.txt, 
> TEST-org.apache.hadoop.zebra.mapred.TestCheckin.txt, tmp-987-plus-991.patch
>
>
> Access Control: when processes try to read from the column groups, Zebra 
> should be able to handle allowed vs. disallowed user/application accesses. 
> The security is eventually granted by the corresponding HDFS security of the 
> data stored.
> Expected behavior when column group permissions are set:
> When a user selects only columns that they do not have permission to 
> access, Zebra should return an error with the message "Error #: Permission denied 
> for accessing column  
> Access control applies to an entire column group, so all columns in a column 
> group have the same permissions. 


[jira] Updated: (PIG-991) [zebra] A few minor bugs as described in the Description section

2009-10-08 Thread Raghu Angadi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-991:
-

Resolution: Fixed
Status: Resolved  (was: Patch Available)

I just committed this. Thanks Yan.

> [zebra] A few minor bugs as described in the Description section
> 
>
> Key: PIG-991
> URL: https://issues.apache.org/jira/browse/PIG-991
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Yan Zhou
>Assignee: Yan Zhou
>Priority: Minor
> Fix For: 0.6.0
>
> Attachments: Bugs-2.patch, Bugs.patch
>
>
> 1) "lzo2" was used as the compressor name for the LZO compression algorithm; 
> it should be "lzo" instead;
> 2) the default compression is changed from "lzo" to "gz" for gzip;
> 3) In the JavaCC file SchemaParser.jjt, the package name was wrong: it used the old 
> "package org.apache.pig.table.types";
> 4) in build.xml, two new javacc targets are added to generate the 
> TableSchemaParser and TableStorageParser Java code;
> 5) Support of column group security ( 
> https://issues.apache.org/jira/browse/PIG-987 ) lacked support of the 
> dumpinfo method: the groups and permissions were not displayed. Note that as 
> a consequence, the patch herein must be applied after that of JIRA987.
> 6) and 7) a couple of issues reported in Jira917.


[jira] Updated: (PIG-986) [zebra] Zebra Column Group Naming Support

2009-10-10 Thread Raghu Angadi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-986:
-

Status: Open  (was: Patch Available)

> [zebra] Zebra Column Group Naming Support
> -
>
> Key: PIG-986
> URL: https://issues.apache.org/jira/browse/PIG-986
> Project: Pig
>  Issue Type: New Feature
>  Components: impl
>Affects Versions: 0.4.0
>Reporter: Chao Wang
>Assignee: Chao Wang
> Fix For: 0.6.0
>
> Attachments: ColumnGroupName.patch, ColumnGroupName.patch, 
> ColumnGroupName.patch
>
>
> We introduce column group name to Zebra and make it a first-class citizen in 
> Zebra. This can ease management of column groups.
> We plan to introduce an "as" clause for column group name in Zebra's syntax.
> Functional Specifications:
> 1) Column group names are optional. For column groups which do not have a 
> user-provided name, Zebra will internally assign default column group names 
> that are unique within that table: CG0, CG1, CG2 ... Note: if CGx is 
> used by the user, then it cannot be used as an internal name.
> 2) We introduce an "AS" clause in Zebra's syntax for column group names. If 
> it occurs, it has to immediately follow [ ]. For example, "[a1, a2] as PI 
> secure by user:joe group:secure perm:640; [a3, a4] as General compress by 
> lzo". Note that keyword "AS" is case insensitive.
> 3) Column group names are unique within one table and are case sensitive, 
> i.e., c1 and C1 are different.
> 4) Column group names will be used as the physical column group directory 
> path names.
> 5) Zebra V2 will support dropColumnGroup by column group names (will 
> integrate with Raghu's A29 drop column work).
> 6) Zebra V2 can support backward compatibility (if there are Zebra V1-created 
> tables in production when V2 is released). More specifically, this means that 
> Zebra V2 can load from V1-created tables and do dropColumnGroup on them.
> 7) Does NOT support renaming.
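
Items 1 and 3 of the quoted specification (default names CG0, CG1, CG2 ..., skipping any name the user already took, with case-sensitive uniqueness) could be sketched as follows. The class ColumnGroupNamer and its method are hypothetical and only illustrate the rule; the actual Zebra implementation may differ.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of default column group naming.
final class ColumnGroupNamer {
    /**
     * Takes one entry per column group: the user-provided name, or null when
     * the user gave none. Returns the final names in the same order.
     */
    static List<String> assignNames(List<String> userNames) {
        // Collect user-provided names first; they are case sensitive and must be unique.
        Set<String> taken = new HashSet<>();
        for (String name : userNames) {
            if (name != null && !taken.add(name)) {
                throw new IllegalArgumentException("duplicate column group name: " + name);
            }
        }
        List<String> result = new ArrayList<>();
        int next = 0;
        for (String name : userNames) {
            if (name != null) {
                result.add(name);
                continue;
            }
            // Skip any CGx that the user already claimed.
            String candidate = "CG" + next++;
            while (taken.contains(candidate)) {
                candidate = "CG" + next++;
            }
            taken.add(candidate);
            result.add(candidate);
        }
        return result;
    }
}
```

For example, with four column groups where the second is named "CG0" by the user, the fourth is named "PI", and the others are unnamed, the sketch yields CG1, CG0, CG2, PI.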


[jira] Updated: (PIG-986) [zebra] Zebra Column Group Naming Support

2009-10-10 Thread Raghu Angadi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-986:
-

Status: Patch Available  (was: Open)

> [zebra] Zebra Column Group Naming Support
> -
>
> Key: PIG-986
> URL: https://issues.apache.org/jira/browse/PIG-986
> Project: Pig
>  Issue Type: New Feature
>  Components: impl
>Affects Versions: 0.4.0
>Reporter: Chao Wang
>Assignee: Chao Wang
> Fix For: 0.6.0
>
> Attachments: ColumnGroupName.patch, ColumnGroupName.patch, 
> ColumnGroupName.patch
>
>
> We introduce column group name to Zebra and make it a first-class citizen in 
> Zebra. This can ease management of column groups.
> We plan to introduce an "as" clause for column group name in Zebra's syntax.
> Functional Specifications:
> 1) Column group names are optional. For column groups which do not have a 
> user-provided name, Zebra will internally assign default column group names 
> that are unique within that table: CG0, CG1, CG2 ... Note: if CGx is 
> used by the user, then it cannot be used as an internal name.
> 2) We introduce an "AS" clause in Zebra's syntax for column group names. If 
> it occurs, it has to immediately follow [ ]. For example, "[a1, a2] as PI 
> secure by user:joe group:secure perm:640; [a3, a4] as General compress by 
> lzo". Note that keyword "AS" is case insensitive.
> 3) Column group names are unique within one table and are case sensitive, 
> i.e., c1 and C1 are different.
> 4) Column group names will be used as the physical column group directory 
> path names.
> 5) Zebra V2 will support dropColumnGroup by column group names (will 
> integrate with Raghu's A29 drop column work).
> 6) Zebra V2 can support backward compatibility (if there are Zebra V1-created 
> tables in production when V2 is released). More specifically, this means that 
> Zebra V2 can load from V1-created tables and do dropColumnGroup on them.
> 7) Does NOT support renaming.
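
The "AS" clause from item 2 of the quoted specification can be illustrated with a simplified parser. This is a regex-based sketch, not Zebra's JavaCC grammar; the class StorageHintSketch and its methods are hypothetical, and the sketch assumes clauses are separated by ";" and that perm values are three octal digits as in the example string.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified reader for the storage hint example in the spec.
final class StorageHintSketch {
    // "[cols]" optionally followed by "as NAME"; the keyword is case insensitive.
    private static final Pattern GROUP =
        Pattern.compile("\\[([^\\]]*)\\]\\s*(?:as\\s+(\\w+))?", Pattern.CASE_INSENSITIVE);
    private static final Pattern PERM = Pattern.compile("perm:([0-7]{3})");

    /** Maps each user-provided column group name to its columns, in hint order. */
    static Map<String, List<String>> namedGroups(String hint) {
        Map<String, List<String>> named = new LinkedHashMap<>();
        for (String clause : hint.split(";")) {
            Matcher m = GROUP.matcher(clause.trim());
            if (!m.lookingAt() || m.group(2) == null) {
                continue; // not a column group clause, or no AS name given
            }
            List<String> columns = new ArrayList<>();
            for (String col : m.group(1).split(",")) {
                if (!col.trim().isEmpty()) {
                    columns.add(col.trim());
                }
            }
            named.put(m.group(2), columns);
        }
        return named;
    }

    /** Returns the first perm value in the text parsed as octal, or -1 if absent. */
    static int permBits(String text) {
        Matcher m = PERM.matcher(text);
        return m.find() ? Integer.parseInt(m.group(1), 8) : -1;
    }
}
```

The name captured after "as" keeps its original case, consistent with item 3, while the keyword itself matches in any case.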


[jira] Updated: (PIG-986) [zebra] Zebra Column Group Naming Support

2009-10-11 Thread Raghu Angadi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIG-986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi updated PIG-986:
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

I just committed this. Thanks Yan.

> [zebra] Zebra Column Group Naming Support
> -
>
> Key: PIG-986
> URL: https://issues.apache.org/jira/browse/PIG-986
> Project: Pig
>  Issue Type: New Feature
>  Components: impl
>Affects Versions: 0.4.0
>Reporter: Chao Wang
>Assignee: Chao Wang
> Fix For: 0.6.0
>
> Attachments: ColumnGroupName.patch, ColumnGroupName.patch, 
> ColumnGroupName.patch
>
>
> We introduce column group name to Zebra and make it a first-class citizen in 
> Zebra. This can ease management of column groups.
> We plan to introduce an "as" clause for column group name in Zebra's syntax.
> Functional Specifications:
> 1) Column group names are optional. For column groups which do not have a 
> user-provided name, Zebra will internally assign default column group names 
> that are unique within that table: CG0, CG1, CG2 ... Note: if CGx is 
> used by the user, then it cannot be used as an internal name.
> 2) We introduce an "AS" clause in Zebra's syntax for column group names. If 
> it occurs, it has to immediately follow [ ]. For example, "[a1, a2] as PI 
> secure by user:joe group:secure perm:640; [a3, a4] as General compress by 
> lzo". Note that keyword "AS" is case insensitive.
> 3) Column group names are unique within one table and are case sensitive, 
> i.e., c1 and C1 are different.
> 4) Column group names will be used as the physical column group directory 
> path names.
> 5) Zebra V2 will support dropColumnGroup by column group names (will 
> integrate with Raghu's A29 drop column work).
> 6) Zebra V2 can support backward compatibility (if there are Zebra V1-created 
> tables in production when V2 is released). More specifically, this means that 
> Zebra V2 can load from V1-created tables and do dropColumnGroup on them.
> 7) Does NOT support renaming.


[jira] Commented: (PIG-993) [zebra] Ability to drop a column group in a table

2009-10-11 Thread Raghu Angadi (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764552#action_12764552
 ] 

Raghu Angadi commented on PIG-993:
--

This patch depends on PIG-992. It is not a functional dependency and can be 
removed if required.

> [zebra] Ability to drop a column group in a table
> --
>
> Key: PIG-993
> URL: https://issues.apache.org/jira/browse/PIG-993
> Project: Pig
>  Issue Type: Bug
>Reporter: Raghu Angadi
>Assignee: Raghu Angadi
> Fix For: 0.6.0
>
> Attachments: DropColumnGroupExample.java, zebra-drop-cg.patch, 
> zebra-drop-cg.patch
>
>
> A Zebra table is stored as multiple sub tables each containing a set of 
> columns called column group (CG). The user specifies how these columns are 
> grouped while creating a table through the _storage hint_.
> For some of the large tables, it might be necessary for users to remove a set 
> of columns and retain the rest. This jira provides a way for users to delete 
> an entire column group. 
> The following comments will have more details on API and the semantics. 


[jira] Commented: (PIG-993) [zebra] Ability to drop a column group in a table

2009-10-16 Thread Raghu Angadi (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766841#action_12766841
 ] 

Raghu Angadi commented on PIG-993:
--


I think the test needs to be fixed. It deletes 6 column groups from 6 
different threads. The spec explicitly states that read accesses and parallel 
deletions are expected to fail, but the table is always left in a consistent 
state. The rationale for this is that in practice these tables are accessed from 
different machines, and it would add unnecessary complication to coordinate 
all the readers and the writers (especially with no locking support on HDFS). 
Zebra tables have no state outside the directory. This applies to writing as 
well.

One option I see is to have each thread make multiple attempts in case of 
errors. 
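
The retry idea could look roughly like the sketch below. It does not use the Zebra API: the Drop interface, the RetryingDropSketch class, and the simulated "concurrent drop in progress" failure are all hypothetical stand-ins for a drop call that may fail when another thread is deleting at the same time.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: each thread retries its drop until it succeeds.
final class RetryingDropSketch {
    /** Stand-in for a drop call that may fail under concurrent access. */
    interface Drop { void run() throws Exception; }

    /** Runs the drop up to maxAttempts times; returns the attempt that succeeded. */
    static int dropWithRetries(Drop drop, int maxAttempts, long backoffMillis) {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                drop.run();
                return attempt;
            } catch (Exception e) {
                last = e; // remember the failure and try again after a pause
            }
            try {
                Thread.sleep(backoffMillis);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        throw new IllegalStateException("drop failed after retries", last);
    }

    /** Simulates six threads each dropping one column group; returns what is left. */
    static Set<String> simulate() {
        Set<String> groups = ConcurrentHashMap.newKeySet();
        for (int i = 0; i < 6; i++) {
            groups.add("CG" + i);
        }
        AtomicBoolean busy = new AtomicBoolean(false);
        Thread[] threads = new Thread[6];
        for (int i = 0; i < 6; i++) {
            String name = "CG" + i;
            threads[i] = new Thread(() -> dropWithRetries(() -> {
                // Simulated conflict: only one drop may be in progress at a time.
                if (!busy.compareAndSet(false, true)) {
                    throw new IllegalStateException("concurrent drop in progress");
                }
                try {
                    groups.remove(name);
                } finally {
                    busy.set(false);
                }
            }, 1000, 1));
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return groups;
    }
}
```

A failed attempt leaves the shared set untouched, mirroring the requirement that a failed deletion leaves the table in a consistent state, so retrying is safe.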
  

> [zebra] Ability to drop a column group in a table
> --
>
> Key: PIG-993
> URL: https://issues.apache.org/jira/browse/PIG-993
> Project: Pig
>  Issue Type: Bug
>Reporter: Raghu Angadi
>Assignee: Raghu Angadi
> Fix For: 0.6.0
>
> Attachments: DropColumnGroupExample.java, 
> TEST-org.apache.hadoop.zebra.io.TestCheckin.txt, zebra-drop-cg.patch, 
> zebra-drop-cg.patch, zebra-drop-cg.patch
>
>
> A Zebra table is stored as multiple sub tables each containing a set of 
> columns called column group (CG). The user specifies how these columns are 
> grouped while creating a table through the _storage hint_.
> For some of the large tables, it might be necessary for users to remove a set 
> of columns and retain the rest. This jira provides a way for users to delete 
> an entire column group. 
> The following comments will have more details on API and the semantics. 


[jira] Commented: (PIG-1053) Consider moving to Hadoop for local mode

2009-10-26 Thread Raghu Angadi (JIRA)


[ 
https://issues.apache.org/jira/browse/PIG-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12770336#action_12770336
 ] 

Raghu Angadi commented on PIG-1053:
---

A big +1.

It is understandable, from a Pig developer's point of view, to be annoyed by 
beginners complaining about run time with toy local inputs. Maybe a clear 
heads-up in the tutorial would reduce those complaints.

> Consider moving to Hadoop for local mode
> 
>
> Key: PIG-1053
> URL: https://issues.apache.org/jira/browse/PIG-1053
> Project: Pig
>  Issue Type: Improvement
>Reporter: Alan Gates
>
> We need to consider moving Pig to use Hadoop's local mode instead of its own.
