[jira] Created: (PIG-833) Storage access layer

2009-06-04 Thread Jay Tang (JIRA)
Storage access layer


 Key: PIG-833
 URL: https://issues.apache.org/jira/browse/PIG-833
 Project: Pig
  Issue Type: New Feature
Reporter: Jay Tang


A layer is needed to provide a high-level data access abstraction and a tabular 
view of data in Hadoop, freeing Pig users from implementing their own data 
storage/retrieval code.  This layer should also include a columnar storage 
format in order to provide fast data projection and CPU/space-efficient data 
serialization, as well as a schema language to manage physical storage metadata.  
Eventually it could also support predicate pushdown for further performance 
improvement.  Initially, this layer could be a contrib project in Pig and 
become a Hadoop subproject later on.
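
To illustrate why a columnar layout makes projection cheap, here is a minimal Python sketch. It is not the proposed storage format or its API: ColumnTable and its methods are hypothetical names, and the table lives in memory rather than in Hadoop.

```python
from typing import Dict, List, Sequence, Tuple


class ColumnTable:
    """Toy column-oriented table: one list of values per column (hypothetical)."""

    def __init__(self, schema: Sequence[str]) -> None:
        self.schema = list(schema)
        self.columns: Dict[str, List[object]] = {name: [] for name in self.schema}

    def append(self, row: Sequence[object]) -> None:
        """Split an incoming row and store each value in its own column."""
        if len(row) != len(self.schema):
            raise ValueError("row does not match schema")
        for name, value in zip(self.schema, row):
            self.columns[name].append(value)

    def project(self, names: Sequence[str]) -> List[Tuple[object, ...]]:
        """Read only the requested columns; the other columns are never touched."""
        selected = [self.columns[name] for name in names]
        return list(zip(*selected))


table = ColumnTable(["user", "url", "bytes"])
table.append(("alice", "/index.html", 512))
table.append(("bob", "/about.html", 2048))
print(table.project(["user", "bytes"]))
```

Projecting "user" and "bytes" never reads the "url" column, which is the property a columnar format is expected to provide on disk as well.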


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-833) Storage access layer

2009-08-11 Thread Jay Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742201#action_12742201
 ] 

Jay Tang commented on PIG-833:
--

Zebra has a dependency on TFile, which is available in Hadoop 0.20; that's why 
the compilation instructions are more complicated.  A new wiki at 
http://wiki.apache.org/pig/zebra will provide more information on Zebra.

 Storage access layer
 

 Key: PIG-833
 URL: https://issues.apache.org/jira/browse/PIG-833
 Project: Pig
  Issue Type: New Feature
Reporter: Jay Tang
 Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, 
 PIG-833-zebra.patch.bz2, PIG-833-zebra.patch.bz2, 
 TEST-org.apache.hadoop.zebra.pig.TestCheckin1.txt, test.out, zebra-javadoc.tgz





[jira] Assigned: (PIG-1140) [zebra] Use of Hadoop 2.0 APIs

2010-02-09 Thread Jay Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Tang reassigned PIG-1140:
-

Assignee: Xuefu Zhang

 [zebra] Use of Hadoop 2.0 APIs  
 

 Key: PIG-1140
 URL: https://issues.apache.org/jira/browse/PIG-1140
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.6.0
Reporter: Yan Zhou
Assignee: Xuefu Zhang
 Fix For: 0.7.0

 Attachments: zebra.0209


 Currently, Zebra still uses the deprecated Hadoop 0.18 APIs. It needs to be 
 upgraded to the 0.20 APIs.




[jira] Assigned: (PIG-1137) [zebra] get* methods of Zebra Map/Reduce APIs need improvements

2010-02-09 Thread Jay Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Tang reassigned PIG-1137:
-

Assignee: Yan Zhou

 [zebra] get* methods of Zebra Map/Reduce APIs need improvements
 ---

 Key: PIG-1137
 URL: https://issues.apache.org/jira/browse/PIG-1137
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.6.0
Reporter: Yan Zhou
Assignee: Yan Zhou
 Fix For: 0.7.0


 Currently the set* methods take external Zebra objects, namely objects of 
 ZebraStorageHint, ZebraSchema, ZebraSortInfo or ZebraProjection. 
 Correspondingly, the get* methods should return such objects instead of 
 Strings or Zebra-internal objects like Schema.
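
The requested symmetry can be sketched in a few lines of Python. This is only an illustration of the idea, not the Zebra Map/Reduce API: SortInfoSketch, SORT_KEY, set_sort_info and get_sort_info are hypothetical names, and a plain dict stands in for the job configuration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class SortInfoSketch:
    """Hypothetical external wrapper, standing in for a type like ZebraSortInfo."""
    sort_columns: str


SORT_KEY = "sketch.sort.columns"  # made-up configuration key


def set_sort_info(conf: Dict[str, str], info: SortInfoSketch) -> None:
    """The setter accepts the wrapper and stores its string form internally."""
    conf[SORT_KEY] = info.sort_columns


def get_sort_info(conf: Dict[str, str]) -> Optional[SortInfoSketch]:
    """The getter returns the same wrapper type, not the raw stored string."""
    raw = conf.get(SORT_KEY)
    return None if raw is None else SortInfoSketch(raw)


conf: Dict[str, str] = {}
set_sort_info(conf, SortInfoSketch("a,b"))
print(get_sort_info(conf) == SortInfoSketch("a,b"))
```

With this shape, callers never see the internal string encoding, so it can change without breaking them.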




[jira] Updated: (PIG-1139) [zebra] Encapsulation of check of ZebraSortInfo by a Zebra reader; the check by a writer could be better encapsulated

2010-02-09 Thread Jay Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Tang updated PIG-1139:
--

Fix Version/s: (was: 0.7.0)
   0.8.0

 [zebra] Encapsulation of check of ZebraSortInfo by a Zebra reader; the check 
 by a writer could be better encapsulated
 -

 Key: PIG-1139
 URL: https://issues.apache.org/jira/browse/PIG-1139
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.6.0
Reporter: Yan Zhou
Priority: Minor
 Fix For: 0.8.0


 Currently, the user's ZebraSortInfo passed to the Map/Reduce writer, namely 
 BasicTableOutputFormat.setStorageInfo, is sanity checked by 
 SortInfo.parse(), although the check could be performed entirely in that 
 method, which takes a ZebraSortInfo object.
 On the reader side, however, the sanity check is left entirely to the caller 
 of the TableInputFormat.requireSortedTable method; it would be better 
 encapsulated in a new SortInfo method.




[jira] Updated: (PIG-1137) [zebra] get* methods of Zebra Map/Reduce APIs need improvements

2010-03-22 Thread Jay Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Tang updated PIG-1137:
--

Fix Version/s: (was: 0.7.0)
   0.8.0

 [zebra] get* methods of Zebra Map/Reduce APIs need improvements
 ---

 Key: PIG-1137
 URL: https://issues.apache.org/jira/browse/PIG-1137
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.6.0
Reporter: Yan Zhou
Assignee: Yan Zhou
 Fix For: 0.8.0





[jira] Updated: (PIG-1120) [zebra] should support using org.apache.hadoop.zebra.pig.TableStorer() if user does not want to specify storage hint

2010-03-22 Thread Jay Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Tang updated PIG-1120:
--

Fix Version/s: (was: 0.7.0)
   0.8.0

 [zebra] should support  using org.apache.hadoop.zebra.pig.TableStorer() if 
 user does not want to specify storage hint
 -

 Key: PIG-1120
 URL: https://issues.apache.org/jira/browse/PIG-1120
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Jing Huang
 Fix For: 0.8.0


 If the user doesn't want to specify a storage hint, the current Zebra 
 implementation only supports using 
 org.apache.hadoop.zebra.pig.TableStorer('') (note the empty string argument).
 We should also support using org.apache.hadoop.zebra.pig.TableStorer(), 
 as we already do for org.apache.hadoop.zebra.pig.TableLoader().
 Sample Pig script:
 register /grid/0/dev/hadoopqa/jars/zebra.jar;
 a = load '1.txt' as (a:int, 
 b:float,c:long,d:double,e:chararray,f:bytearray,r1(f1:chararray,f2:chararray),m1:map[]);
 b = load '2.txt' as (a:int, 
 b:float,c:long,d:double,e:chararray,f:bytearray,r1(f1:chararray,f2:chararray),m1:map[]);
 c = join a by a, b by a;
 d = foreach c generate a::a, a::b, b::c;
 describe d;
 dump d;
 store d into 'join3' using org.apache.hadoop.zebra.pig.TableStorer('');
 --this will fail
 --store d into 'join3' using org.apache.hadoop.zebra.pig.TableStorer( );




[jira] Resolved: (PIG-1138) [zebra] Support of PIG's new Load/Store Interfaces

2010-03-22 Thread Jay Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Tang resolved PIG-1138.
---

   Resolution: Duplicate
Fix Version/s: 0.7.0

Duplicate of 1140

 [zebra] Support of  PIG's new Load/Store Interfaces
 ---

 Key: PIG-1138
 URL: https://issues.apache.org/jira/browse/PIG-1138
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.7.0
Reporter: Yan Zhou
 Fix For: 0.7.0







[jira] Commented: (PIG-1223) [zebra] Add cli to help admin zebra

2010-03-22 Thread Jay Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848329#action_12848329
 ] 

Jay Tang commented on PIG-1223:
---

Yongqiang, could you comment on what kind of admin features you're looking for?

 [zebra] Add cli to help admin zebra
 ---

 Key: PIG-1223
 URL: https://issues.apache.org/jira/browse/PIG-1223
 Project: Pig
  Issue Type: Wish
Reporter: He Yongqiang






[jira] Updated: (PIG-1306) [zebra] Support of locally sorted input splits

2010-03-22 Thread Jay Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Tang updated PIG-1306:
--

Fix Version/s: 0.7.0

 [zebra] Support of locally sorted input splits
 --

 Key: PIG-1306
 URL: https://issues.apache.org/jira/browse/PIG-1306
 Project: Pig
  Issue Type: Improvement
Reporter: Yan Zhou
Assignee: Yan Zhou
 Fix For: 0.7.0


 Currently, Zebra supports sorted or unsorted input splits on sorted tables or 
 sorted table unions. The sorted input splits are based on key ranges that 
 do not overlap, so the splits are in effect globally sorted: each split is 
 locally sorted, and their key ranges do not overlap.
 The biggest problem with key-range splits is the performance hit suffered when 
 data skew is present, particularly when a key range consists solely of one 
 duplicated key. That makes the data chunk holding the duplicate keys virtually 
 unsplittable regardless of how many mappers are available: it has to be 
 processed by a single mapper.
 On the other hand, there are scenarios where globally sorted splits are 
 overkill and locally sorted splits are good enough. One example is the 
 use of Zebra sorted tables as the probe table in a map-side merge inner join.
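
The difference between the two kinds of splits can be shown with a small Python sketch. It is not Zebra's split-generation code: both functions are hypothetical, work on an in-memory list of keys, and assume the input is already sorted.

```python
from itertools import groupby
from typing import List


def key_range_splits(sorted_keys: List[int], num_splits: int) -> List[List[int]]:
    """Globally sorted splits: all rows with the same key stay in one split,
    so the key ranges of different splits never overlap."""
    target = max(1, len(sorted_keys) // num_splits)
    splits: List[List[int]] = [[]]
    for _, group in groupby(sorted_keys):
        # Start a new split only at a key boundary, never inside a run of duplicates.
        if len(splits[-1]) >= target and len(splits) < num_splits:
            splits.append([])
        splits[-1].extend(group)
    return splits


def locally_sorted_splits(sorted_keys: List[int], num_splits: int) -> List[List[int]]:
    """Locally sorted splits: cut by row count only. Each split is sorted,
    but the same key may appear in more than one split."""
    size = max(1, -(-len(sorted_keys) // num_splits))  # ceiling division
    return [sorted_keys[i:i + size] for i in range(0, len(sorted_keys), size)]


keys = [1] + [7] * 10 + [9]  # heavily skewed: key 7 dominates
print([len(s) for s in key_range_splits(keys, 4)])
print([len(s) for s in locally_sorted_splits(keys, 4)])
```

With four splits requested, the key-range variant produces only two splits of sizes 11 and 1, because the run of key 7 cannot be divided, while the locally sorted variant produces four splits of 3 rows each.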




[jira] Commented: (PIG-1331) Owl Hadoop Table Management Service

2010-03-26 Thread Jay Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850347#action_12850347
 ] 

Jay Tang commented on PIG-1331:
---

Owl has an internal metastore whose relational table and partition model is 
similar to that of Hive's metastore.  Owl goes beyond this and provides a 
uniform data access mechanism on top of multiple storage formats.  This 
interface can be leveraged by Pig and MapReduce applications.  There is room 
for collaboration between Owl and Hive so that we could eventually converge on 
a common metastore for Hadoop.

 Owl Hadoop Table Management Service
 ---

 Key: PIG-1331
 URL: https://issues.apache.org/jira/browse/PIG-1331
 Project: Pig
  Issue Type: New Feature
Reporter: Jay Tang

 This JIRA is a proposal to create a Hadoop table management service: Owl. 
 Today, MapReduce and Pig applications interact directly with HDFS 
 directories and files and must deal with low-level data management issues 
 such as storage format, serialization/compression schemes, data layout, and 
 efficient data access, often with different solutions. Owl aims to 
 provide a standard way to address these issues and to abstract away the 
 complexities of reading/writing huge amounts of data from/to HDFS.
 Owl has a data access API that is modeled after the traditional Hadoop 
 InputFormat and a management API to manipulate Owl objects.  This JIRA is 
 related to PIG-823 (Hadoop Metadata Service) as Owl has an internal metadata 
 store.  Owl integrates with different storage modules like Zebra through a 
 pluggable architecture.
 Initially, the proposal is to submit Owl as a Pig contrib project.  Over 
 time, it makes sense to move it to a Hadoop subproject.




[jira] Commented: (PIG-1331) Owl Hadoop Table Management Service

2010-03-27 Thread Jay Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850638#action_12850638
 ] 

Jay Tang commented on PIG-1331:
---

Owl's data access API, OwlInputFormat, provides a uniform API to access data 
stored in different storage formats like Zebra, RCFile, SequenceFile, etc.  It's 
a single data access abstraction on top of disparate data.
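
As a rough illustration of one access API over several storage formats, consider the Python sketch below. It does not show OwlInputFormat itself: the driver registry and the read_table function are hypothetical, and CSV and JSON-lines text stand in for real formats such as Zebra, RCFile or SequenceFile.

```python
import csv
import io
import json
from typing import Callable, Dict, Iterator, List

Row = Dict[str, str]
Reader = Callable[[str], Iterator[Row]]


def read_csv(payload: str) -> Iterator[Row]:
    """Stand-in storage driver: CSV text with a header line."""
    yield from csv.DictReader(io.StringIO(payload))


def read_json_lines(payload: str) -> Iterator[Row]:
    """Stand-in storage driver: one JSON object per line."""
    for line in payload.splitlines():
        if line.strip():
            yield {key: str(value) for key, value in json.loads(line).items()}


# Pluggable registry: supporting a new format means registering one more reader.
DRIVERS: Dict[str, Reader] = {"csv": read_csv, "jsonl": read_json_lines}


def read_table(storage_format: str, payload: str) -> List[Row]:
    """Single entry point; callers never deal with format-specific code."""
    try:
        driver = DRIVERS[storage_format]
    except KeyError:
        raise ValueError(f"no driver registered for {storage_format!r}") from None
    return list(driver(payload))


csv_rows = read_table("csv", "name,visits\nalice,3\n")
json_rows = read_table("jsonl", '{"name": "alice", "visits": 3}\n')
print(csv_rows == json_rows)
```

The caller receives the same rows regardless of how the data was stored, which is the point of a single abstraction over disparate formats.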

 Owl Hadoop Table Management Service
 ---

 Key: PIG-1331
 URL: https://issues.apache.org/jira/browse/PIG-1331
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.8.0
Reporter: Jay Tang
 Attachments: owl.contrib.3.tgz





[jira] Commented: (PIG-1331) Owl Hadoop Table Management Service

2010-03-28 Thread Jay Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850688#action_12850688
 ] 

Jay Tang commented on PIG-1331:
---

Carl, from a serialization/deserialization perspective, the functionality 
appears similar.  Owl also handles other storage layer interactions like data 
pruning.  Owl supports partition and column pruning; we plan to support row 
pruning via predicate pushdown.  The goal is to push data filtering work down.  
If a storage layer does not support a certain filter capability, Owl would 
provide an implementation.  
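
The pushdown-with-fallback behaviour described above might look roughly like the following Python sketch. It is an assumption-laden illustration, not Owl code: PlainStorage, FilteringStorage, the supports_pushdown flag and the read function are all hypothetical.

```python
from typing import Callable, Dict, Iterator, List, Optional, Sequence

Row = Dict[str, int]
Predicate = Callable[[Row], bool]


class PlainStorage:
    """Hypothetical storage layer without filter support: scan() returns every row."""

    supports_pushdown = False

    def __init__(self, rows: List[Row]) -> None:
        self.rows = rows

    def scan(self, predicate: Optional[Predicate] = None) -> Iterator[Row]:
        return iter(self.rows)  # the predicate is ignored here


class FilteringStorage(PlainStorage):
    """Hypothetical storage layer that evaluates the predicate while reading."""

    supports_pushdown = True

    def scan(self, predicate: Optional[Predicate] = None) -> Iterator[Row]:
        return (row for row in self.rows if predicate is None or predicate(row))


def read(storage: PlainStorage, predicate: Predicate, columns: Sequence[str]) -> List[Row]:
    """Push the filter down when possible; otherwise filter above the storage layer."""
    if storage.supports_pushdown:
        rows = storage.scan(predicate)
    else:
        rows = (row for row in storage.scan() if predicate(row))
    # Column pruning: keep only the requested fields.
    return [{name: row[name] for name in columns} for row in rows]


data = [{"id": 1, "size": 10}, {"id": 2, "size": 500}, {"id": 3, "size": 900}]
is_large = lambda row: row["size"] > 100
print(read(PlainStorage(data), is_large, ["id"]))
print(read(FilteringStorage(data), is_large, ["id"]))
```

Both calls return the same rows; the only difference is where the filtering work happens.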

 Owl Hadoop Table Management Service
 ---

 Key: PIG-1331
 URL: https://issues.apache.org/jira/browse/PIG-1331
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.8.0
Reporter: Jay Tang
 Attachments: owl.contrib.3.tgz





[jira] Commented: (PIG-1331) Owl Hadoop Table Management Service

2010-03-29 Thread Jay Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851230#action_12851230
 ] 

Jay Tang commented on PIG-1331:
---

Ashish, the goal of Owl is to provide a table-like abstraction to manage 
Hadoop data.  The design would allow any custom MapReduce application, Pig 
Latin, and even the Hive query language to consume data via Owl's interface.  
Our vision is to build a full data life cycle management stack that encompasses 
data creation, notification, consumption, retention, and security management.  
Owl would make things easier for a MapReduce application writer or for 
someone building another query processing language on top of it.  We will 
update the Owl wiki page with more detailed information.

 Owl Hadoop Table Management Service
 ---

 Key: PIG-1331
 URL: https://issues.apache.org/jira/browse/PIG-1331
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.8.0
Reporter: Jay Tang
 Attachments: owl.contrib.3.tgz





[jira] Commented: (PIG-1331) Owl Hadoop Table Management Service

2010-03-31 Thread Jay Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851998#action_12851998
 ] 

Jay Tang commented on PIG-1331:
---

There seems to be an issue with the Maven repo.  We'll attach jar files and 
update the build scripts.

 Owl Hadoop Table Management Service
 ---

 Key: PIG-1331
 URL: https://issues.apache.org/jira/browse/PIG-1331
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.8.0
Reporter: Jay Tang
 Attachments: build.log, owl.contrib.3.tgz





[jira] Updated: (PIG-1367) [zebra] Map-side Cogroup Test case is needed on 0.7 if the feature is supported in 0.7

2010-04-16 Thread Jay Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Tang updated PIG-1367:
--

Fix Version/s: site
   (was: 0.7.0)

 [zebra] Map-side Cogroup Test case is needed on 0.7 if the feature is 
 supported in 0.7
 --

 Key: PIG-1367
 URL: https://issues.apache.org/jira/browse/PIG-1367
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.7.0
Reporter: Yan Zhou
 Fix For: site


 PIG-1315 contains the Zebra support for this feature and for map-side 
 group-by. It also contains the test case for map-side COGROUP, while the test 
 case for map-side GROUP-BY is in PIG-1357.
 However, PIG-1315 was committed to the trunk as a whole but was committed to 
 the 0.7 branch without the map-side group-by test case, because PIG has yet to 
 decide whether the feature will be in the 0.7 release.
 This JIRA is created for tracking purposes in case PIG decides to support 
 map-side COGROUP in 0.7. If not, it should eventually be marked invalid.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Updated: (PIG-1367) [zebra] Map-side Cogroup Test case is needed on 0.7 if the feature is supported in 0.7

2010-04-16 Thread Jay Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Tang updated PIG-1367:
--

Fix Version/s: 0.8.0
   (was: site)

 [zebra] Map-side Cogroup Test case is needed on 0.7 if the feature is 
 supported in 0.7
 --

 Key: PIG-1367
 URL: https://issues.apache.org/jira/browse/PIG-1367
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.7.0
Reporter: Yan Zhou
 Fix For: 0.8.0






[jira] Updated: (PIG-1350) [Zebra] Zebra column names cannot have leading _

2010-04-16 Thread Jay Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Tang updated PIG-1350:
--

Fix Version/s: 0.8.0
   (was: 0.7.0)

 [Zebra] Zebra column names cannot have leading _
 --

 Key: PIG-1350
 URL: https://issues.apache.org/jira/browse/PIG-1350
 Project: Pig
  Issue Type: Improvement
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Fix For: 0.8.0

 Attachments: pig-1350.patch, pig-1350.patch


 Disallowing '_' as the leading character of column names in a Zebra schema is 
 too restrictive; the restriction should be lifted.
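
A relaxed naming rule could be checked as in the Python sketch below. The exact grammar of Zebra column names is not given in this issue, so the pattern is an assumption chosen for illustration, and is_valid_column_name is a hypothetical helper.

```python
import re

# Assumed rule, for illustration only: a letter or underscore first,
# then any mix of letters, digits, and underscores.
COLUMN_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_column_name(name: str) -> bool:
    """Return True when the whole name matches the assumed identifier rule."""
    return COLUMN_NAME.fullmatch(name) is not None


for candidate in ["_id", "user_name", "9lives", ""]:
    print(repr(candidate), is_valid_column_name(candidate))
```

Under this assumed rule a leading underscore is accepted, while a leading digit and the empty string are still rejected.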





[jira] Commented: (PIG-1331) Owl Hadoop Table Management Service

2010-05-04 Thread Jay Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12863835#action_12863835
 ] 

Jay Tang commented on PIG-1331:
---

Yes, Jeff.  Owl, as a table management service, has a metadata module. Please 
see http://wiki.apache.org/pig/owl for more information.

 Owl Hadoop Table Management Service
 ---

 Key: PIG-1331
 URL: https://issues.apache.org/jira/browse/PIG-1331
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.8.0
Reporter: Jay Tang
Assignee: Ajay Kidave
 Fix For: 0.8.0

 Attachments: anttestoutput.tgz, build.log, ivy_version.patch, 
 owl.contrib.3.tgz, owl.contrib.4.tar.gz

