[jira] Commented: (PIG-823) Hadoop Metadata Service
[ https://issues.apache.org/jira/browse/PIG-823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744326#action_12744326 ]

Jeff Hammerbacher commented on PIG-823:
---

Hey,

Great to see Owl source! I've filed a ticket over on the Hive project (https://issues.apache.org/jira/browse/HIVE-762) to see if we can find some common ground between Pig and Hive's metadata needs; it would be great to have a single metadata service for all of Hadoop's structured data manipulation tools. If you're interested, please chime in there (or open a ticket here? Whatever seems sane to you).

Thanks,
Jeff

Hadoop Metadata Service
---
Key: PIG-823
URL: https://issues.apache.org/jira/browse/PIG-823
Project: Pig
Issue Type: New Feature
Reporter: Olga Natkovich
Attachments: owl.filelist, owl.patch.gz, owl_libdeps.tgz, owl_otherdeps.tgz

This JIRA is created to track development of a metadata system for Hadoop. The goal of the system is to allow users and applications to register data stored on HDFS, search for the data available on HDFS, and associate metadata such as schema, statistics, etc. with a particular data unit or a data set stored on HDFS.

The initial goal is to provide a fairly generic, low-level abstraction that any user or application on HDFS can use to store and retrieve metadata. Over time, higher-level abstractions closely tied to particular applications or tools can be developed.

Over time, it would make sense for the metadata service to become a subproject within Hadoop. For now, the proposal is to make it a contrib to Pig, since Pig SQL is likely to be the first user of the system.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
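The three operations named in the issue description (register data, search for it, and associate metadata such as schema or statistics with it) can be pictured with a minimal in-memory sketch. Everything here is an assumption made for illustration: the class names `DataUnit` and `MetadataRegistry`, the method names, and the example paths are hypothetical and are not taken from the Owl patch or from any Hadoop API.

```python
from dataclasses import dataclass, field


@dataclass
class DataUnit:
    """One registered HDFS path plus free-form metadata (hypothetical model)."""
    path: str
    metadata: dict = field(default_factory=dict)


class MetadataRegistry:
    """In-memory stand-in for the proposed service; not Owl's actual API."""

    def __init__(self):
        self._units = {}

    def register(self, path):
        # Registering the same path twice returns the existing entry.
        return self._units.setdefault(path, DataUnit(path))

    def associate(self, path, key, value):
        # Attach one metadata item (e.g. schema, statistics) to a registered path.
        if path not in self._units:
            raise KeyError(f"{path} is not registered")
        self._units[path].metadata[key] = value

    def search(self, **criteria):
        # Return the paths whose metadata matches every key/value pair given.
        return sorted(
            unit.path
            for unit in self._units.values()
            if all(unit.metadata.get(k) == v for k, v in criteria.items())
        )


# Example paths and metadata values are made up for this sketch.
registry = MetadataRegistry()
registry.register("/data/clicks/2009-06-01")
registry.register("/data/clicks/2009-06-02")
registry.associate("/data/clicks/2009-06-01", "schema", "user:chararray, url:chararray")
registry.associate("/data/clicks/2009-06-01", "format", "tsv")
registry.associate("/data/clicks/2009-06-02", "format", "tsv")
print(registry.search(format="tsv"))
```

Keeping the metadata as an untyped key/value map is one way to read "fairly generic, low-level abstraction"; a table or partition model of the kind discussed elsewhere in this thread would sit on top of such a layer.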
[jira] Commented: (PIG-823) Hadoop Metadata Service
[ https://issues.apache.org/jira/browse/PIG-823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718313#action_12718313 ]

Amr Awadallah commented on PIG-823:
---

sounds good, thanks for elaborating.
[jira] Commented: (PIG-823) Hadoop Metadata Service
[ https://issues.apache.org/jira/browse/PIG-823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718374#action_12718374 ]

Matei Zaharia commented on PIG-823:
---

That sounds great. If this is sufficiently extensible, it might potentially even be useful for data that is not in HDFS, such as HBase tables (though we should avoid making the system overly complex).
[jira] Commented: (PIG-823) Hadoop Metadata Service
[ https://issues.apache.org/jira/browse/PIG-823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12717933#action_12717933 ]

Amr Awadallah commented on PIG-823:
---

+1 to unified meta-data service.

-- amr
[jira] Commented: (PIG-823) Hadoop Metadata Service
[ https://issues.apache.org/jira/browse/PIG-823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718063#action_12718063 ]

Alan Gates commented on PIG-823:

In response to Matei's comment: The intent is not that this is Pig metadata, but that it be grid-wide metadata. We don't want to put it directly in HDFS by extending the namenode, since the namenode is already heavily loaded and a central contention point in the system. We also want it to remain optional, as many users will not need it. The vision is that this will be a separate module that Hadoop users can choose to install and use with their system, along with other modules they use, such as Pig, Hive, Chukwa, etc. The Pig team is volunteering to put it in our contrib for now because Pig is interested in it and willing to devote the resources to help it get started.
[jira] Commented: (PIG-823) Hadoop Metadata Service
[ https://issues.apache.org/jira/browse/PIG-823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12717876#action_12717876 ]

Matei Zaharia commented on PIG-823:
---

I agree with Jeff that it might be better to make this service a feature of HDFS rather than a component of Pig. A metadata service might be useful to people who don't use Pig at all, e.g. those who just load data and process it with MapReduce (which is a use case you cover on the Wiki page). Having a single, standard metadata service would allow unrelated tools for loading data, processing it, browsing it, etc. to interoperate.
[jira] Commented: (PIG-823) Hadoop Metadata Service
[ https://issues.apache.org/jira/browse/PIG-823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716722#action_12716722 ]

Jeff Hammerbacher commented on PIG-823:
---

Hey Alan,

Thanks for the additional detail. I suppose I can wait for the document to be released to the public, but it sounds as if you're creating a separate extended attributes service to host non-core file and directory metadata separately from the NN. It's not clear to me that this is a positive development for Hadoop. Perhaps we should spend the engineering effort on a single, partitioned, available metadata service for all file and directory attributes? The project has a larger scope and requires more effort, but is potentially a cleaner solution for the long term.

Later,
Jeff
[jira] Commented: (PIG-823) Hadoop Metadata Service
[ https://issues.apache.org/jira/browse/PIG-823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716463#action_12716463 ]

Jeff Hammerbacher commented on PIG-823:
---

Hey Olga,

Really looking forward to seeing more discussion on this issue. The NameNode already contains file metadata like ctime, mtime, the block list, permissions, etc. Will the proposed metadata service subsume those attributes as well? Curious to see the proposed design.

Thanks,
Jeff
[jira] Commented: (PIG-823) Hadoop Metadata Service
[ https://issues.apache.org/jira/browse/PIG-823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714543#action_12714543 ]

Olga Natkovich commented on PIG-823:

We looked at metadata in Hive, and it is really focused on a higher level of abstraction, such as tables, partitions, etc. We would like to have something lower-level, more generic, and closer to HDFS. We see a wider use for this system than just support for SQL, though SQL for Pig might be the first user.
[jira] Commented: (PIG-823) Hadoop Metadata Service
[ https://issues.apache.org/jira/browse/PIG-823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714547#action_12714547 ]

Jeff Hammerbacher commented on PIG-823:
---

It's an open source project and easily extensible. There are many extensions to the service within Facebook to support more general information. Why not try to add them to the existing service, since it's already got pluggable backends and a server implementation already defined?