[jira] Commented: (PIG-823) Hadoop Metadata Service

2009-08-17 Thread Jeff Hammerbacher (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744326#action_12744326
 ] 

Jeff Hammerbacher commented on PIG-823:
---

Hey,

Great to see Owl source! I've filed a ticket over on the Hive project 
(https://issues.apache.org/jira/browse/HIVE-762) to see if we can find some 
common ground between Pig and Hive's metadata needs; it would be great to have 
a single metadata service for all of Hadoop's structured data manipulation 
tools. If you're interested, please chime in there (or open a ticket here? 
Whatever seems sane to you).

Thanks,
Jeff

 Hadoop Metadata Service
 ---

 Key: PIG-823
 URL: https://issues.apache.org/jira/browse/PIG-823
 Project: Pig
  Issue Type: New Feature
Reporter: Olga Natkovich
 Attachments: owl.filelist, owl.patch.gz, owl_libdeps.tgz, 
 owl_otherdeps.tgz


 This JIRA is created to track development of a metadata system for Hadoop.
 The goal of the system is to allow users and applications to register data
 stored on HDFS, search for the data available on HDFS, and associate metadata
 such as schema, statistics, etc. with a particular data unit or a data set
 stored on HDFS. The initial goal is to provide a fairly generic, low-level
 abstraction that any user or application on HDFS can use to store and retrieve
 metadata. Over time, higher-level abstractions closely tied to particular
 applications or tools can be developed.
 Over time, it would make sense for the metadata service to become a 
 subproject within Hadoop. For now, the proposal is to make it a contrib to 
 Pig since Pig SQL is likely to be the first user of the system.
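
As a rough illustration only, the three operations named in the description (register data, associate metadata with it, search for it) could be sketched as below. This is not the Owl API or any design from the attached patch: the class, method, and key names are hypothetical, and the in-memory dictionary merely stands in for whatever store a real service would use.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class DataUnit:
    """One registered data unit, identified by its HDFS path (hypothetical model)."""
    path: str
    metadata: Dict[str, Any] = field(default_factory=dict)


class MetadataRegistry:
    """Hypothetical in-memory stand-in for the proposed metadata service."""

    def __init__(self) -> None:
        self._units: Dict[str, DataUnit] = {}

    def register(self, path: str) -> DataUnit:
        # Registering a path that is already known returns the existing entry.
        return self._units.setdefault(path, DataUnit(path))

    def associate(self, path: str, key: str, value: Any) -> None:
        # Attach one metadata item (schema, statistics, ...) to a registered unit.
        if path not in self._units:
            raise KeyError(f"{path} is not registered")
        self._units[path].metadata[key] = value

    def search(self, prefix: str = "", **criteria: Any) -> List[str]:
        # Return the sorted paths under `prefix` whose metadata equals every criterion.
        return sorted(
            unit.path
            for unit in self._units.values()
            if unit.path.startswith(prefix)
            and all(unit.metadata.get(k) == v for k, v in criteria.items())
        )


# Example usage with made-up paths and metadata values.
registry = MetadataRegistry()
registry.register("/data/clicks/2009-06-01")
registry.associate("/data/clicks/2009-06-01", "schema", "user:chararray, url:chararray")
registry.associate("/data/clicks/2009-06-01", "format", "tsv")
print(registry.search(prefix="/data/clicks", format="tsv"))
```

Keeping the metadata as free-form key/value pairs reflects the "fairly generic, low-level abstraction" wording above; a table- or partition-oriented layer would sit on top of something like this rather than replace it.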

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-823) Hadoop Metadata Service

2009-06-11 Thread Amr Awadallah (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718313#action_12718313
 ] 

Amr Awadallah commented on PIG-823:
---

sounds good, thanks for elaborating.




[jira] Commented: (PIG-823) Hadoop Metadata Service

2009-06-11 Thread Matei Zaharia (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718374#action_12718374
 ] 

Matei Zaharia commented on PIG-823:
---

That sounds great. If this is sufficiently extensible, it might even be useful
for data that is not in HDFS, such as HBase tables (though we should avoid
making the system overly complex).




[jira] Commented: (PIG-823) Hadoop Metadata Service

2009-06-10 Thread Amr Awadallah (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12717933#action_12717933
 ] 

Amr Awadallah commented on PIG-823:
---

+1 to unified meta-data service.

-- amr





[jira] Commented: (PIG-823) Hadoop Metadata Service

2009-06-10 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718063#action_12718063
 ] 

Alan Gates commented on PIG-823:


In response to Matei's comment:

The intent is not that this is Pig metadata, but that it be grid-wide metadata.
We don't want to put it directly in HDFS by extending the namenode, since the
namenode is already heavily loaded and a central contention point in the
system. We also want it to remain optional, as many users will not need it.

The vision is that this will be a separate module that Hadoop users can choose
to install and use with their system, along with other modules they use, such
as Pig, Hive, Chukwa, etc.

The Pig team is volunteering to put it in our contrib for now because Pig is 
interested in it and willing to devote the resources to help it get started.





[jira] Commented: (PIG-823) Hadoop Metadata Service

2009-06-09 Thread Matei Zaharia (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12717876#action_12717876
 ] 

Matei Zaharia commented on PIG-823:
---

I agree with Jeff that it might be better to make this service a feature
of HDFS rather than a component of Pig. A metadata service might be useful to
people who don't use Pig at all, e.g. who just load data and process it with
MapReduce (which is a use case you cover on the Wiki page). Having a single,
standard metadata service would allow unrelated tools for loading data,
processing it, browsing it, etc. to interoperate.




[jira] Commented: (PIG-823) Hadoop Metadata Service

2009-06-05 Thread Jeff Hammerbacher (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716722#action_12716722
 ] 

Jeff Hammerbacher commented on PIG-823:
---

Hey Alan,

Thanks for the additional detail. I suppose I can wait for the document to be 
released to the public, but it sounds as if you're creating a separate 
extended attributes service to host non-core file and directory metadata 
separately from the NN. It's not clear to me that this is a positive 
development for Hadoop. Perhaps we should spend the engineering effort on a 
single, partitioned, available metadata service for all file and directory 
attributes? The project has a larger scope and requires more effort, but is
potentially a cleaner solution for the long term.

Later,
Jeff




[jira] Commented: (PIG-823) Hadoop Metadata Service

2009-06-04 Thread Jeff Hammerbacher (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716463#action_12716463
 ] 

Jeff Hammerbacher commented on PIG-823:
---

Hey Olga,

Really looking forward to seeing more discussion on this issue. The NameNode 
already contains file metadata like ctime, mtime, the block list, permissions, 
etc. Will the proposed metadata service subsume those attributes as well? 
Curious to see the proposed design.

Thanks,
Jeff




[jira] Commented: (PIG-823) Hadoop Metadata Service

2009-05-29 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714543#action_12714543
 ] 

Olga Natkovich commented on PIG-823:


We looked at metadata in Hive, and it is really focused on a higher level of
abstraction, such as tables and partitions. We would like to have something
lower level, more generic, and closer to HDFS. We see a wider use for this
system than just support for SQL, though SQL for Pig might be the first user.





[jira] Commented: (PIG-823) Hadoop Metadata Service

2009-05-29 Thread Jeff Hammerbacher (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714547#action_12714547
 ] 

Jeff Hammerbacher commented on PIG-823:
---

It's an open source project and easily extensible. There are many extensions to 
the service within Facebook to support more general information. Why not try to 
add them to the existing service, since it's already got pluggable backends and 
a server implementation already defined?
