Lyft recently open-sourced a data discovery tool called Amundsen that can
serve many data catalog needs.

https://eng.lyft.com/amundsen-lyfts-data-discovery-metadata-engine-62d27254fbb9
https://github.com/lyft/amundsenmetadatalibrary

You still need the Hive Metastore (HMS) to store the data schema, though.



On Thu, Jun 20, 2019 at 4:47 AM James Cotrotsios <jamescotrots...@gmail.com>
wrote:

> Is there a plan to have a business catalog component for the Data Lake? If
> not, how would someone propose creating an open source project for that? I
> would be interested in building out an open source data catalog that would
> use the Hive metadata store as a baseline for technical metadata.
>
>
> On Wed, Jun 19, 2019 at 3:04 PM Liwen Sun <liwen....@databricks.com>
> wrote:
>
>> We are delighted to announce the availability of Delta Lake 0.2.0!
>>
>> To try out Delta Lake 0.2.0, please follow the Delta Lake Quickstart:
>> https://docs.delta.io/0.2.0/quick-start.html
>>
>> To view the release notes:
>> https://github.com/delta-io/delta/releases/tag/v0.2.0
>>
>> This release introduces two main features:
>>
>> *Cloud storage support*
>> In addition to HDFS, you can now configure Delta Lake to read and write
>> data on cloud storage services such as Amazon S3 and Azure Blob Storage.
>> For configuration instructions, please see:
>> https://docs.delta.io/0.2.0/delta-storage.html
>>
>> *Improved concurrency*
>> Delta Lake now allows concurrent append-only writes while still ensuring
>> serializability. For concurrency control in Delta Lake, please see:
>> https://docs.delta.io/0.2.0/delta-concurrency.html
>>
>> We have also greatly expanded the test coverage as part of this release.
>>
>> We would like to acknowledge all community members for contributing to
>> this release.
>>
>> Best regards,
>> Liwen Sun
>>
>>
