I had a similar issue this summer while prototyping Spark on K8s. I ended up sticking with Hive Metastore 2 to stay on schedule. Not sure if I was using it correctly, but I only needed the Hadoop + Hive JARs on the classpath; I did not need to run HDFS, YARN, or any other Hadoop services. Pointing Spark at the metastore with an s3a warehouse.dir path seemed to work fine (rough config sketch below).
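For reference, the wiring looked roughly like this (a sketch only -- the thrift URI and bucket are placeholders for whatever you deploy, and s3a credentials are assumed to be configured elsewhere, e.g. via instance profile or Hadoop confs):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("spark-with-standalone-hms")
      // talk to the remote metastore service instead of spinning up
      // an embedded Derby-backed one
      .config("hive.metastore.uris", "thrift://hive-metastore:9083")
      // keep table data on S3 so no HDFS cluster is needed
      .config("spark.sql.warehouse.dir", "s3a://my-bucket/warehouse")
      .enableHiveSupport()
      .getOrCreate()

    // smoke test: the metadata lands in the metastore, the files land
    // under the s3a warehouse dir
    spark.sql("CREATE DATABASE IF NOT EXISTS demo")
    spark.sql("CREATE TABLE IF NOT EXISTS demo.t (id INT) USING parquet")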
When Spark supports Hive Metastore 3.0, things should get a bit easier, as HMS 3 has clearer instructions for standalone deployments:
https://cwiki.apache.org/confluence/display/Hive/AdminManual+Metastore+3.0+Administration

If you have more time and truly need to move away from everything Hadoop, you can also implement ExternalCatalog yourself (rough skeleton in the P.S. below):
https://github.com/apache/spark/blob/5264164a67df498b73facae207eda12ee133be7d/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala

See https://jira.apache.org/jira/browse/SPARK-23443 for ongoing progress on a Glue ExternalCatalog implementation. If you are using EMR, you can also check out:
https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-glue.html

On Mon, Oct 14, 2019 at 12:24 PM xweb <ashish8...@gmail.com> wrote:
>
> Is it possible to use our own metastore instead of Hive Metastore with
> Spark SQL?
>
> Can you please point me to some docs or code I can look at to get it done?
>
> We are moving away from everything Hadoop.
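P.S. Here is the very rough ExternalCatalog skeleton I mentioned. GlueExternalCatalog is just a made-up name for illustration, the trait has many more methods than the two shown, and the signatures are taken from the Spark sources linked above -- double-check them against your Spark version:

    import org.apache.spark.sql.catalyst.catalog.{CatalogTable, ExternalCatalog}

    // left abstract because only two of the trait's methods are sketched
    // here; a real implementation has to cover all of them
    abstract class GlueExternalCatalog extends ExternalCatalog {

      override def databaseExists(db: String): Boolean = {
        // e.g. call Glue's GetDatabase API and translate "not found" to false
        ???
      }

      override def getTable(db: String, table: String): CatalogTable = {
        // e.g. fetch the Glue table definition and map it onto CatalogTable
        ???
      }
    }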