You should really look at what the Netflix guys are doing on Iceberg. https://github.com/Netflix/iceberg
They have put a lot of thought into how to efficiently handle tabular data in S3. They put all of the metadata in S3 except for a single link to the name of the table's root metadata file. Other advantages of their design: - Efficient atomic addition and removal of files in S3. - Consistent schema evolution across formats - More flexible partitioning and bucketing. .. Owen On Sun, Jan 28, 2018 at 12:02 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote: > All, > > I have been bouncing around the earth for a while and have had the > privilege of working at 4-5 places. On arrival each place was in a variety > of states in their hadoop journey. > > One large company that I was at had a ~200 TB hadoop cluster. They > actually ran PIG and there ops group REFUSED to support hive, even though > they had written thousands of lines of pig macros to deal with selecting > from a partition, or a pig script file you would import so you would know > what the columns of the data at location /x/y/z is. > > In another lifetime I have been at a shop that used SCALDING. Again lots > of custom effort there with avro and parquet, all to do things that hive > would do our of the box. Again the biggest challenge is the thrift service > and metastore. > > In the cloud many people will use a bootstrap script > https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hadoop-script.html > or 'msck repair' > > The "rise of the cloud" has changed us all the metastore is being a > database is a hard paradigm to support. Imagine for example I created data > to an s3 bucket with hive, and another group in my company requires read > only access to this data for an ephemeral request. Sharing the data is > easy, S3 access can be granted, sharing the metastore and thrift services > are much more complicated. > > So lets think out of the box: > > https://www.datastax.com/2011/03/brisk-is-here-hadoop-and- > cassandra-together-at-last > > Datastax was able to build a platform where the filesystem and the > metastore were backed into Cassandra. Even though a HBase user would not > want that, the novel thing about that approach is that the metastore was > not "some extra thing in a database" that you had to deal with. > > What I am thinking is that for the user of s3, the metastore should be in > s3. Probably in hidden files inside the warehouse/table directory(ies). > > Think of it as msck repair "on the fly" "https://www.ibm.com/support/ > knowledgecenter/SSPT3X_4.2.5/com.ibm.swg.im.infosphere. > biginsights.commsql.doc/doc/biga_msckrep.html" > > The implementation could be something like this: > > On startup read hive.warehouse.dir look for "_warehouse" That would help > us locate the databases and in the databases we can locate tables, with the > tables we can locate partitions. > > This will of course scale horribly across tables with 90000000 partitions > but that would not be our use case. For all the people with "msck repair" > in the bootstrap they have a much cleaner way of using hive. > > The implementations could even be "Stacked" files first metastore lookback > second. > > It would be also wise to have a tool available in the CLI "metastore > <table> toJson" making it drop dead simple to export the schema > definitions. > > Thoughts? > > >