Lyft recently open-sourced a data discovery tool called Amundsen that can serve many data catalog needs.
https://eng.lyft.com/amundsen-lyfts-data-discovery-metadata-engine-62d27254fbb9
https://github.com/lyft/amundsenmetadatalibrary

You still need HMS to store the data schema though.

On Thu, Jun 20, 2019 at 4:47 AM James Cotrotsios <jamescotrots...@gmail.com> wrote:

> Is there a plan to have a business catalog component for the Data Lake? If
> not how would someone make a proposal to create an open source project
> related to that. I would be interested in building out an open source data
> catalog that would use the Hive metadata store as a baseline for technical
> metadata.
>
>
> On Wed, Jun 19, 2019 at 3:04 PM Liwen Sun <liwen....@databricks.com>
> wrote:
>
>> We are delighted to announce the availability of Delta Lake 0.2.0!
>>
>> To try out Delta Lake 0.2.0, please follow the Delta Lake Quickstart:
>> https://docs.delta.io/0.2.0/quick-start.html
>>
>> To view the release notes:
>> https://github.com/delta-io/delta/releases/tag/v0.2.0
>>
>> This release introduces two main features:
>>
>> *Cloud storage support*
>> In addition to HDFS, you can now configure Delta Lake to read and write
>> data on cloud storage services such as Amazon S3 and Azure Blob Storage.
>> For configuration instructions, please see:
>> https://docs.delta.io/0.2.0/delta-storage.html
>>
>> *Improved concurrency*
>> Delta Lake now allows concurrent append-only writes while still ensuring
>> serializability. For concurrency control in Delta Lake, please see:
>> https://docs.delta.io/0.2.0/delta-concurrency.html
>>
>> We have also greatly expanded the test coverage as part of this release.
>>
>> We would like to acknowledge all community members for contributing to
>> this release.
>>
>> Best regards,
>> Liwen Sun