On Tue, Mar 14, 2017 at 11:25 AM, Alexey Serbin <aser...@cloudera.com> wrote:
> Hi,
>
> It seems that sort of benchmark is not a trivial undertaking. I'm sure
> there is a lot to consider while doing that sort of benchmark. Probably,
> more senior members of the Kudu team could suggest something else, but
> right away I can suggest the following:
>
> 1. Consider using real hardware machines while doing the benchmark, not
> VMs. Make sure the databases store their data on the same media when
> doing the comparison.
>
> 2. Make sure your benchmark schema is supported by both Kudu and
> PostgreSQL. Probably, to perform the benchmark you would need to tweak
> your existing schema a little bit. Kudu supports a subset of the types
> available in PostgreSQL. Also, pay attention to primary keys/indices and
> partitions if you are running read/scan comparisons. Overall, in this
> context it's worth reading this document first:
> https://kudu.apache.org/docs/schema_design.html
>
> 3. Kudu is supposed to shine when working with huge amounts of data
> spread across multiple machines in a cluster. Are you about to use a
> clustered setup for PostgreSQL as well? It may be worth trying a
> clustered setup for PostgreSQL too.
>
> 4. While creating Kudu tables, use just a single replica -- additional
> replicas add some latency for write operations because the write operation
> is considered successful only when by majority of existing replicas. Also,
> since I didn't see

Oops, something happened with those words. I meant: ... only when
acknowledged by the majority of existing replicas. I'm suggesting using
just a single replica since I didn't see anything mentioned about
replication for PostgreSQL.

> 5. Consider placing the WAL for both Kudu and PostgreSQL on an SSD --
> this lowers latencies for DML operations. I know that's so at least for
> Kudu, and I would expect that's true for PostgreSQL as well.
>
> 6. Pay some attention to the run-time resource limits in effect while
> running those benchmarks:
> https://www.postgresql.org/docs/9.6/static/runtime-config-resource.html
> https://kudu.apache.org/docs/configuration_reference.html (search for
> flags containing 'memory' and 'cache_size' in their names)
>
> As for inserting your existing data into Kudu, consider using Impala:
> https://kudu.apache.org/docs/kudu_impala_integration.html
>
> Best regards,
>
> Alexey
>
> On Tue, Mar 14, 2017 at 8:01 AM, paulo faria <ziko...@hotmail.com> wrote:
>
>> Hi,
>>
>> I'm doing a benchmark of Kudu (and other time-series DBs) versus
>> PostgreSQL 9.6. I have already done your VM demo tutorial.
>>
>> But now I would like to compare those two. I already have the PostgreSQL
>> environment set up (with some tables + data, 1 GB per table for testing)
>> on a remote server.
>> 1) What is your advice for a query (read) performance comparison?
>> 2) Is there any way to convert (or migrate) the Postgres structure to
>> Kudu? I have my database in HUE Impala, so I can query it there and
>> also download the data from there.
>>
>> Any tips are appreciated.
>>
>> Best Regards
>>
>> Paulo Faria
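
For the read-performance comparison asked about in the original message, here is a minimal sketch of a timing harness. It is only an illustration and not something from the Kudu or PostgreSQL documentation. The name `time_query` and the `warmups`/`repeats` parameters are made up for this example. The idea is to wrap each query in a callable that runs it and fetches all rows (for instance through a Python DB-API cursor for whichever driver you use), discard a few warm-up runs so caches are in a comparable state, and then compare medians rather than single measurements.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_query(run_query: Callable[[], object],
               warmups: int = 2,
               repeats: int = 5) -> Dict[str, float]:
    """Run `run_query` several times and summarize wall-clock seconds.

    `run_query` is a hypothetical zero-argument callable; in a real
    benchmark it would execute one SQL statement and fetch every row,
    so that the full scan is included in the measurement.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")

    # Warm-up runs are executed but not measured.
    for _ in range(warmups):
        run_query()

    samples: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append(time.perf_counter() - start)

    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }


def fake_query() -> int:
    """Stand-in workload so the sketch runs without a database."""
    return sum(range(10_000))


if __name__ == "__main__":
    print(time_query(fake_query))
```

To use it against real systems, you would pass one callable per database (same logical query, same data) and compare the reported medians, keeping the hardware and resource limits from points 1 and 6 equal on both sides.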