Hi,

A benchmark like this is not a trivial undertaking, and there is a lot to consider while doing it. More senior members of the Kudu team could probably suggest something else, but right away I can suggest the following:

1. Consider using real hardware machines for the benchmark, not VMs. Make sure the databases store their data on the same media when doing the comparison.

2. Make sure your benchmark schema is supported by both Kudu and PostgreSQL. To perform the benchmark you will probably need to tweak your existing schema a little bit. Kudu supports a subset of the types available in PostgreSQL. Also, pay attention to primary keys/indices and partitions if you are running read/scan comparisons. In this context it is worth reading this document first: https://kudu.apache.org/docs/schema_design.html

3. Kudu is supposed to shine when working with huge amounts of data spread across multiple machines in a cluster. Are you going to use a clustered setup for PostgreSQL as well? It may be worth trying one.

4. While creating Kudu tables, use just a single replica -- additional replicas add some latency to write operations because a write is considered successful only when it is acknowledged by a majority of the existing replicas. Also, since I didn't see

5. Consider placing the WAL for both Kudu and PostgreSQL on an SSD -- this lowers latencies for DML operations. I know that is so at least for Kudu, and I would expect it to be true for PostgreSQL as well.

6. Pay some attention to the run-time resource limits in effect while running those benchmarks:
   https://www.postgresql.org/docs/9.6/static/runtime-config-resource.html
   https://kudu.apache.org/docs/configuration_reference.html (search for flags containing 'memory' and 'cache_size' in their names)

As for inserting your existing data into Kudu, consider using Impala: https://kudu.apache.org/docs/kudu_impala_integration.html

Best regards,

Alexey

On Tue, Mar 14, 2017 at 8:01 AM, paulo faria <[email protected]> wrote:
> Hi
>
> I'm doing a benchmark of Kudu (and other time-series DBs) versus PostgreSQL
> 9.6.
> Done your VM demo tutorial already.
>
> But now I would like to compare those 2.
> I already have the PostgreSQL environment set up (with some tables + data
> (1 GB per table to test)) on a remote server.
> 1) What is your advice for a query (read) performance comparison?
> 2) Is there any way to convert (or migrate) the PostgreSQL structure to Kudu?
> I have my database on HUE Impala, so I can query over there and also
> download the data from there.
>
> Any tips are appreciated.
>
> Best Regards
>
> Paulo Faria
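
A side note on the read comparison from question 1: a small timing harness along the lines of the sketch below may help keep the measurements comparable. It is only an illustration. `time_query` and `run_query` are hypothetical names, not part of Kudu, Impala, or PostgreSQL; the callable you pass in would wrap whatever client call you use to run one query and fetch all of its rows. The warm-up runs are there so that both systems are measured with warm caches.

```python
import statistics
import time


def time_query(run_query, repeats=10, warmup=2):
    """Time a zero-argument callable that runs one query and fetches all rows.

    `run_query` is a hypothetical placeholder: wrap your own client call in it.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")

    # Warm-up runs are executed but not measured.
    for _ in range(warmup):
        run_query()

    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append(time.perf_counter() - start)

    # Report the spread as well as the median; single runs are noisy.
    return {
        "runs": len(samples),
        "min_s": min(samples),
        "median_s": statistics.median(samples),
        "max_s": max(samples),
    }


if __name__ == "__main__":
    # Stand-in workload so the sketch runs without a database.
    print(time_query(lambda: sum(range(100_000)), repeats=5, warmup=1))
```

Running the same harness with the same query text, row counts, and number of repeats against both systems, and comparing medians rather than single runs, should give a fairer picture.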

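
A side note on question 2 and the single-replica suggestion: one way to load existing data through Impala is a CREATE TABLE ... AS SELECT statement that stores the result in Kudu with the replica count set to 1. The helper below only builds the statement text. `build_kudu_ctas` and all table and column names are made up for illustration, the choice of hash partitioning on the key columns is an assumption rather than a recommendation for your schema, and the statement should be checked against the Impala integration page for the version you run.

```python
def build_kudu_ctas(target, source, key_columns, other_columns,
                    hash_partitions=4, replicas=1):
    """Build an Impala CREATE TABLE ... AS SELECT statement for a Kudu table.

    Names are interpolated as-is, so pass trusted identifiers only.
    """
    if not key_columns:
        raise ValueError("Kudu tables need at least one primary key column")

    keys = ", ".join(key_columns)
    # Primary key columns are listed first in the SELECT list.
    select_list = ", ".join(list(key_columns) + list(other_columns))

    return (
        f"CREATE TABLE {target}\n"
        f"PRIMARY KEY ({keys})\n"
        f"PARTITION BY HASH ({keys}) PARTITIONS {hash_partitions}\n"
        f"STORED AS KUDU\n"
        f"TBLPROPERTIES ('kudu.num_tablet_replicas' = '{replicas}')\n"
        f"AS SELECT {select_list} FROM {source}"
    )


if __name__ == "__main__":
    # Hypothetical table and column names.
    print(build_kudu_ctas("metrics_kudu", "metrics_src", ["host", "ts"], ["value"]))
```

The printed statement can be pasted into impala-shell or the HUE Impala editor once the source table is visible to Impala.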