Re: Hive TPC-DS metastore dumps in Postgres

2020-07-31 Thread Stamatis Zampetakis
There is now a PR [1] with various improvements over the last update. Feel
free to check it out and let me know what you think.

Best,
Stamatis

[1] https://github.com/apache/hive/pull/1347



Hive TPC-DS metastore dumps in Postgres

2020-06-22 Thread Stamatis Zampetakis
Hey guys,

I put up a small project on GitHub [1] with Hive metastore dumps from
tpcds10tb/tpcds30tb (+partitioning) and some scripts to quickly spin up a
dockerized Postgres with those loaded.

Personally, I find it useful for checking the plans of TPC-DS queries with
the usual qtest mechanism (no external tools and no need to tap into a real
cluster) while having beefy stats + partitioning info at hand. The driver
and other changes needed to run these tests are located in [2].

I am sharing it here in case it might be of use to somebody else.

The two main commands that you will need if you wanna try this out:
  docker build --tag postgres-tpcds-metastore:1.0 .
  mvn test -Dtest=TestTezPerfDBCliDriver -Dtest.output.overwrite=true \
    -Dtest.metastore.db=postgres.tpcds

Small caveat: currently in [2] the dockerized Postgres is restarted for
every query, which makes things slow. This will be fixed later on.
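
A minimal sketch of how the two commands above could be chained in one
script. The directory variables and the DRY_RUN switch are assumptions made
for illustration and are not part of [1] or [2]; in particular, the
directory from which the mvn command should be run is a guess and may
differ in the actual branch. By default the script only prints what it
would run.

```shell
#!/bin/sh
# Sketch only: chains the two commands from the message.
# Assumed (hypothetical) names: METASTORE_REPO_DIR, HIVE_QTEST_DIR, DRY_RUN.
set -eu

# Hypothetical local checkout of [1] (contains the Dockerfile).
METASTORE_REPO_DIR="${METASTORE_REPO_DIR:-hive-postgres-metastore}"
# Hypothetical module directory inside a checkout of [2].
HIVE_QTEST_DIR="${HIVE_QTEST_DIR:-hive/itests/qtest}"
# DRY_RUN=1 prints the commands; DRY_RUN=0 executes them.
DRY_RUN="${DRY_RUN:-1}"

# Run (or just print) a command inside the given directory.
run_in() {
  dir="$1"; shift
  if [ "$DRY_RUN" = "1" ]; then
    echo "(cd $dir && $*)"
  else
    (cd "$dir" && "$@")
  fi
}

# Step 1: build the Postgres image with the metastore dumps loaded.
build_image() {
  run_in "$METASTORE_REPO_DIR" docker build --tag postgres-tpcds-metastore:1.0 .
}

# Step 2: run the TPC-DS plan tests against that metastore.
run_plan_tests() {
  run_in "$HIVE_QTEST_DIR" mvn test -Dtest=TestTezPerfDBCliDriver \
    -Dtest.output.overwrite=true -Dtest.metastore.db=postgres.tpcds
}

build_image
run_plan_tests
```

Setting DRY_RUN=0 executes the commands instead of printing them; with the
caveat above, expect the test run to be slow.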

Best,
Stamatis

[1] https://github.com/zabetak/hive-postgres-metastore
[2] https://github.com/zabetak/hive/tree/qtest_postgres_driver