I figured out that the problem was that I ran this program on my Windows development machine. There seems to be a performance issue with java.net.NetworkInterface.getByInetAddress on Windows (the only confirmation I have found so far is http://stackoverflow.com/questions/35541870/java-networkinterface-getbyinetaddress-takes-way-too-long). See the profiler screenshot http://pasteboard.co/8uHil3I5H.png (kudu-client v1.3.1): every call takes 53 ms (!) on average. Also, could you recheck the logic: why is this function called 88 times in 12 seconds for such a small program?
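To check whether a given machine shows the same slowdown, the call can be timed in isolation, outside of Kudu. The following is only a minimal sketch: the class name NetworkInterfaceTiming, the helper averageMillisPerCall, the use of the loopback address, and the count of 20 calls are all assumptions made for illustration and are not taken from the kudu-client code.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;

// Hypothetical helper class, not part of kudu-client.
class NetworkInterfaceTiming {

    // Returns the average wall-clock time, in milliseconds, of one
    // NetworkInterface.getByInetAddress call over `calls` iterations.
    static double averageMillisPerCall(InetAddress addr, int calls) throws SocketException {
        long start = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            // The result may be null if no interface is bound to addr;
            // only the time spent in the lookup matters here.
            NetworkInterface.getByInetAddress(addr);
        }
        long elapsedNanos = System.nanoTime() - start;
        return elapsedNanos / 1e6 / calls;
    }

    public static void main(String[] args) throws Exception {
        // Assumption: the loopback address is used so that no DNS lookup is involved.
        InetAddress addr = InetAddress.getLoopbackAddress();
        int calls = 20;
        double avg = averageMillisPerCall(addr, calls);
        System.out.printf("getByInetAddress: %.3f ms per call over %d calls%n", avg, calls);
    }
}
```

Running this on both the Windows machine and a Linux host should show whether the per-call cost differs by platform. For scale, 88 calls at 53 ms each come to about 4.7 seconds of the roughly 12 second run.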

2017-04-24 22:29 GMT+03:00 Todd Lipcon <[email protected]>:

> I tried to reproduce this locally using your code and couldn't. I get
> around 100K inserts/second for 1.0, 1.1, 1.2, and 1.3 clients (against a
> 1.4-SNAPSHOT cluster)
>
> Is it always reproducible for you? eg if you switch back to the earlier
> client and try another set of runs, do you get the same results?
>
> -Todd
>
> On Mon, Apr 24, 2017 at 10:56 AM, Todd Lipcon <[email protected]> wrote:
>
>> I vaguely recall some bug in earlier versions of the Java client where
>> 'shutdown' wouldn't properly block on the data being flushed. So it's
>> possible in 1.0.x and below, you're not actually measuring the full amount
>> of time to write all the data, whereas when the bug is fixed, you are.
>>
>> I'll see if I can repro this locally as well using your code.
>>
>> -Todd
>>
>> On Mon, Apr 24, 2017 at 10:49 AM, David Alves <[email protected]>
>> wrote:
>>
>>> Hi Pavel
>>>
>>> Interesting, Thanks for sharing those numbers.
>>> I assume you weren't using AUTOFLUSH_BACKGROUND for the first versions
>>> you tested (don't think it was available then iirc).
>>> Could you try without in the last version and see how the numbers
>>> compare?
>>> We'd be happy to help track down the reason for this perf regression.
>>>
>>> Best
>>> David
>>>
>>> On Mon, Apr 24, 2017 at 4:58 AM, Pavel Martynov <[email protected]>
>>> wrote:
>>>
>>>> Hi, I ran into the fact that I can not achieve high insertion speed and
>>>> I start to experiment with
>>>> https://github.com/cloudera/kudu-examples/tree/master/java/insert-loadgen.
>>>> My slightly modified code (recreation of table on startup + duration
>>>> measuring):
>>>> https://gist.github.com/xkrt/9405a2eeb98a56288b7c5a7d817097b4.
>>>> On every run I change kudu-client version, results:
>>>>
>>>> kudu-client-ver   perf
>>>> 0.10              Duration: 626 ms, 79872/sec
>>>> 1.0.0             Duration: 622 ms, 80385 inserts/sec
>>>> 1.0.1             Duration: 630 ms, 79365 inserts/sec
>>>> 1.1.0             Duration: 11703 ms, 4272 inserts/sec
>>>> 1.3.1             Duration: 12317 ms, 4059 inserts/sec
>>>>
>>>> As can you see there was a great degradation between 1.0.1 and 1.1.0
>>>> (about a ~20 times!).
>>>> What could be a problem, how can I fix it? (actually I interested in
>>>> kudu-spark, so probably using of kudu-client 1.0.1 is not right solution?).
>>>>
>>>> My test cluster: 3 hosts with master and tserver on each (3 masters and
>>>> 3 tservers overall).
>>>> No extra settings, flags used:
>>>> fs_wal_dir
>>>> fs_data_dirs
>>>> master_addresses
>>>> tserver_master_addrs
>>>>
>>>>
>>>> --
>>>> with best regards, Pavel Martynov
>>>>
>>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>

--
with best regards, Pavel Martynov
