Looking a bit deeper, I think this is because listTables actually retrieves
all the metadata about each table, not just the table names, and metadata
can be a fair amount of data.

Seems like querying just table names via SQL is reasonably fast:

spark.sql("show tables like 'pattern'")

as it only returns table name and isTemporary.
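
If you still need full regex matching like the old tableNames() one-liner, here is a minimal sketch of that idea: pull only the names with SHOW TABLES and filter them on the driver. The TableSearch object and its matching/findTables helpers are hypothetical names made up for illustration, and the sketch assumes the result has a tableName column as described above. I have not benchmarked it against a large metastore.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper object, not part of Spark.
object TableSearch {
  // Pure helper: keep only the names that fully match the regex.
  def matching(names: Seq[String], regex: String): Seq[String] =
    names.filter(_.matches(regex))

  // Fetch just the table names via SHOW TABLES (no per-table metadata),
  // then apply the regex on the driver.
  // Assumes the result exposes a "tableName" column.
  def findTables(spark: SparkSession, database: String, regex: String): Seq[String] = {
    val names = spark.sql(s"SHOW TABLES IN $database")
      .select("tableName")
      .collect()
      .map(_.getString(0))
      .toSeq
    matching(names, regex)
  }
}
```

In the spark-shell this would be used as TableSearch.findTables(spark, "default", "some regex").foreach(println). Filtering on the driver keeps the regex semantics of String.matches, which may differ from how the LIKE pattern in SHOW TABLES is interpreted.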

On Wed, Jan 4, 2017 at 11:26 AM, Everett Anderson <ever...@nuna.com> wrote:

> Hi,
>
> In Spark 1.6.2, we were able to very quickly -- nearly instantly -- search
> through the list of (many) table names in our Hive metastore with
>
> sqlContext.tableNames().filter(_.matches("some regex")).foreach { println }
>
> In Spark 2.0.2, however, this takes forever. Similarly, queries with
> Catalog that should return a Dataset like
>
> spark.catalog.listTables("default")
>
> take forever.  Setting the log level to DEBUG in the spark-shell, I can
> see the above command is scrolling through every table name in the
> metastore.
>
> Does anyone have a better way to quickly search through the metastore for
> table names matching a regexp in Spark 2?
>
> Thanks!
>
> - Everett
