Re: Proposal: File based metastore

Owen O'Malley Mon, 29 Jan 2018 09:11:35 -0800

You should really look at what the Netflix guys are doing on Iceberg.

https://github.com/Netflix/iceberg


They have put a lot of thought into how to efficiently handle tabular data
in S3. They put all of the metadata in S3 except for a single link to the
name of the table's root metadata file.

Other advantages of their design:

   - Efficient atomic addition and removal of files in S3.
   - Consistent schema evolution across formats.
   - More flexible partitioning and bucketing.
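A minimal sketch of the single-pointer idea, assuming a local directory stands in for S3 and a lock stands in for whatever compare-and-swap the pointer's home (for example, a catalog row) would provide. `PointerTable` and the metadata file naming are hypothetical and are not Iceberg's actual API or file layout. Every commit writes a new immutable metadata file and swaps the pointer only if nobody else committed first, which is what makes adding and removing files atomic:

```python
import json
import os
import tempfile
import threading
import uuid


class PointerTable:
    """Hypothetical table: immutable metadata files plus one mutable pointer."""

    def __init__(self, root):
        self.root = root
        self._lock = threading.Lock()  # stands in for an atomic compare-and-swap
        self._pointer = self._write_metadata({"version": 0, "files": []})

    def _write_metadata(self, meta):
        # Unique name per attempt, so racing writers never clobber each other.
        name = f"v{meta['version']}-{uuid.uuid4().hex}.metadata.json"
        path = os.path.join(self.root, name)
        with open(path, "x") as f:  # "x" refuses to overwrite an existing file
            json.dump(meta, f)
        return path

    @staticmethod
    def _read(path):
        with open(path) as f:
            return json.load(f)

    def current(self):
        return self._read(self._pointer)

    def commit(self, add=(), remove=()):
        """Atomically add and remove data files, retrying if another commit wins."""
        while True:
            base_pointer = self._pointer
            base = self._read(base_pointer)
            files = sorted((set(base["files"]) | set(add)) - set(remove))
            new_pointer = self._write_metadata(
                {"version": base["version"] + 1, "files": files})
            with self._lock:
                if self._pointer == base_pointer:  # nobody committed meanwhile
                    self._pointer = new_pointer
                    return self._read(new_pointer)
            # Lost the race: loop and rebuild the change on the newer metadata.


with tempfile.TemporaryDirectory() as warehouse:
    table = PointerTable(warehouse)
    table.commit(add=["a.parquet", "b.parquet"])
    snapshot = table.commit(add=["c.parquet"], remove=["a.parquet"])
    print(snapshot)  # {'version': 2, 'files': ['b.parquet', 'c.parquet']}
```

Readers following the pointer always see either the old or the new file list, never a mix, and old metadata files stay readable until something cleans them up.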


.. Owen

On Sun, Jan 28, 2018 at 12:02 PM, Edward Capriolo <edlinuxg...@gmail.com>
wrote:

> All,
>
> I have been bouncing around the earth for a while and have had the
> privilege of working at 4-5 places. On arrival, each place was at a
> different stage of its Hadoop journey.
>
> One large company that I was at had a ~200 TB Hadoop cluster. They
> actually ran Pig, and their ops group REFUSED to support Hive, even though
> they had written thousands of lines of Pig macros to deal with selecting
> from a partition, or a Pig script file you would import so you would know
> what the columns of the data at location /x/y/z are.
>
> In another lifetime I was at a shop that used Scalding. Again, lots
> of custom effort went into Avro and Parquet, all to do things that Hive
> would do out of the box. Again, the biggest challenge is the Thrift service
> and the metastore.
>
> In the cloud, many people will use a bootstrap script
> https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hadoop-script.html
> or run 'msck repair'.
>
> The "rise of the cloud" has changed all of us, and a metastore that is a
> database is a hard paradigm to support. Imagine, for example, that I wrote
> data to an S3 bucket with Hive, and another group in my company needs
> read-only access to this data for an ephemeral request. Sharing the data is
> easy, since S3 access can be granted; sharing the metastore and Thrift
> services is much more complicated.
>
> So let's think outside the box:
>
> https://www.datastax.com/2011/03/brisk-is-here-hadoop-and-cassandra-together-at-last
>
> DataStax was able to build a platform where the filesystem and the
> metastore were backed by Cassandra. Even though an HBase user would not
> want that, the novel thing about that approach is that the metastore was
> not "some extra thing in a database" that you had to deal with.
>
> What I am thinking is that for users of S3, the metastore should be in
> S3, probably in hidden files inside the warehouse/table directories.
>
> Think of it as msck repair "on the fly":
> https://www.ibm.com/support/knowledgecenter/SSPT3X_4.2.5/com.ibm.swg.im.infosphere.biginsights.commsql.doc/doc/biga_msckrep.html
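The core of `msck repair` is scanning a table directory for Hive-style `key=value` partition folders and registering any it finds; an "on the fly" version would repeat that scan at lookup time. A minimal sketch over a local directory standing in for S3. `discover_partitions` is a hypothetical helper, it treats only directories that contain data files as partitions, and it skips hidden `.`/`_` entries, which is where a file-based metastore could keep its own files:

```python
import os
import tempfile


def discover_partitions(table_dir):
    """Return {partition_spec: path} for Hive-style key=value directories.

    Values are taken verbatim; Hive escapes special characters in partition
    directory names, which this sketch does not undo.
    """
    found = {}
    for dirpath, dirnames, filenames in os.walk(table_dir):
        # Prune hidden/metadata directories so os.walk never descends into them.
        dirnames[:] = sorted(d for d in dirnames if not d.startswith((".", "_")))
        rel = os.path.relpath(dirpath, table_dir)
        if rel == ".":
            continue  # the table root itself is not a partition
        parts = rel.split(os.sep)
        if filenames and all("=" in p for p in parts):
            spec = tuple(tuple(p.split("=", 1)) for p in parts)
            found[spec] = dirpath
    return found


with tempfile.TemporaryDirectory() as table:
    for ds, hr in [("2018-01-28", "00"), ("2018-01-28", "01"), ("2018-01-29", "00")]:
        part = os.path.join(table, f"ds={ds}", f"hr={hr}")
        os.makedirs(part)
        open(os.path.join(part, "000000_0"), "w").close()  # a fake data file
    os.makedirs(os.path.join(table, "_metadata"))  # hidden, so it is ignored
    specs = sorted(discover_partitions(table))
    print(specs[0])  # (('ds', '2018-01-28'), ('hr', '00'))
```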
>
> The implementation could be something like this:
>
> On startup, read hive.warehouse.dir and look for "_warehouse". That would
> help us locate the databases; in the databases we can locate tables, and
> with the tables we can locate partitions.
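A minimal sketch of that startup walk, assuming a local directory stands in for the S3 warehouse. The `_warehouse` marker comes from the proposal and the `<name>.db` database folders follow Hive's usual layout; the per-table `_schema.json` file and `load_catalog` are hypothetical. Partition discovery is left out to keep it short, and tables of the `default` database (which Hive keeps directly under the warehouse) are ignored:

```python
import json
import os
import tempfile

WAREHOUSE_MARKER = "_warehouse"  # marker name taken from the proposal
TABLE_METADATA = "_schema.json"  # hypothetical hidden per-table metadata file


def load_catalog(warehouse_dir):
    """Build {database: {table: schema}} purely from files under the warehouse."""
    if not os.path.exists(os.path.join(warehouse_dir, WAREHOUSE_MARKER)):
        raise FileNotFoundError(f"no {WAREHOUSE_MARKER} marker in {warehouse_dir}")
    catalog = {}
    for db_entry in sorted(os.listdir(warehouse_dir)):
        db_path = os.path.join(warehouse_dir, db_entry)
        # Non-default Hive databases live in "<name>.db" directories.
        if not (os.path.isdir(db_path) and db_entry.endswith(".db")):
            continue
        tables = {}
        for tbl in sorted(os.listdir(db_path)):
            meta = os.path.join(db_path, tbl, TABLE_METADATA)
            if os.path.isfile(meta):  # only folders carrying metadata are tables
                with open(meta) as f:
                    tables[tbl] = json.load(f)
        catalog[db_entry[: -len(".db")]] = tables
    return catalog


with tempfile.TemporaryDirectory() as wh:
    open(os.path.join(wh, WAREHOUSE_MARKER), "w").close()
    os.makedirs(os.path.join(wh, "sales.db", "orders"))
    with open(os.path.join(wh, "sales.db", "orders", TABLE_METADATA), "w") as f:
        json.dump({"columns": [["id", "bigint"], ["ds", "string"]]}, f)
    catalog = load_catalog(wh)
    print(catalog)
```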
>
> This will of course scale horribly for tables with 90,000,000 partitions,
> but that would not be our use case. All the people with "msck repair" in
> their bootstrap would have a much cleaner way of using Hive.
>
> The implementations could even be "stacked": files first, metastore lookup
> second.
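Stacking amounts to a chain of lookups where the first source that knows a table wins. A minimal sketch; `StackedSource` and `DictSource` are hypothetical names, and `DictSource` stands in for both a file-based catalog and a real metastore client:

```python
from typing import Optional, Protocol


class SchemaSource(Protocol):
    def get_table(self, db: str, table: str) -> Optional[dict]: ...


class DictSource:
    """In-memory stand-in for either backend (file scan or metastore client)."""

    def __init__(self, tables: dict):
        self.tables = tables

    def get_table(self, db: str, table: str) -> Optional[dict]:
        return self.tables.get((db, table))


class StackedSource:
    """Ask each source in order; the first one that knows the table wins."""

    def __init__(self, *sources: SchemaSource):
        self.sources = sources

    def get_table(self, db: str, table: str) -> Optional[dict]:
        for source in self.sources:
            found = source.get_table(db, table)
            if found is not None:
                return found
        return None  # no source knows this table


files = DictSource({("sales", "orders"): {"from": "files"}})
metastore = DictSource({("sales", "orders"): {"from": "metastore"},
                        ("hr", "people"): {"from": "metastore"}})
stacked = StackedSource(files, metastore)
print(stacked.get_table("sales", "orders"))  # {'from': 'files'}
print(stacked.get_table("hr", "people"))     # {'from': 'metastore'}
print(stacked.get_table("hr", "missing"))    # None
```

Putting files first means metadata written next to the data takes precedence, while tables that exist only in the metastore keep working.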
>
> It would also be wise to have a tool available in the CLI, "metastore
> <table> toJson", making it drop-dead simple to export the schema
> definitions.
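A `toJson` command would mostly be serialization of a table definition. A minimal sketch of the output side; `table_to_json` and its field names are hypothetical, not an existing Hive format, and a CLI would only need to look the table up and print the result. Sorting keys keeps the output stable, so exported schemas diff cleanly:

```python
import json


def table_to_json(db, table, columns, partition_keys=()):
    """Render one table definition as stable, diff-friendly JSON."""
    doc = {
        "database": db,
        "table": table,
        "columns": [{"name": n, "type": t} for n, t in columns],
        "partitionKeys": [{"name": n, "type": t} for n, t in partition_keys],
    }
    return json.dumps(doc, indent=2, sort_keys=True)


out = table_to_json("sales", "orders",
                    columns=[("id", "bigint"), ("amount", "double")],
                    partition_keys=[("ds", "string")])
print(out)
```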
>
> Thoughts?
>
>
>
