Maybe you could increase the number of consumers? In my opinion, inserting from more threads should give better throughput.
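
Here is a rough sketch of what I mean by inserting from several threads. It is not real Kudu code: `BatchSink` and `CountingSink` are hypothetical stand-ins for a per-thread `KuduSession` in manual flush mode, so the example runs without a cluster. The thread count and batch size are arbitrary assumptions, and a real version would call `kuduSession.apply(...)` and `kuduSession.flush()` where the comments indicate. As far as I know a session should not be shared between threads, which is why each worker creates its own sink.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class ParallelBatchWriter {
    /** Hypothetical stand-in for a per-thread KuduSession in manual flush mode. */
    interface BatchSink {
        void apply(String record); // buffer one row, e.g. kuduSession.apply(upsert)
        void flush();              // send buffered rows, e.g. kuduSession.flush()
    }

    /** Counts rows and non-empty flushes instead of talking to a cluster. */
    static final class CountingSink implements BatchSink {
        private final AtomicLong rows;
        private final AtomicLong flushes;
        private int buffered = 0;

        CountingSink(AtomicLong rows, AtomicLong flushes) {
            this.rows = rows;
            this.flushes = flushes;
        }

        @Override public void apply(String record) { buffered++; }

        @Override public void flush() {
            if (buffered > 0) {
                rows.addAndGet(buffered);
                flushes.incrementAndGet();
                buffered = 0;
            }
        }
    }

    // End-of-input marker; compared by reference, so no real record can match it.
    private static final String POISON = new String("END");

    /** Returns {rowsWritten, flushCount}. */
    public static long[] run(List<String> records, int threads, int batchSize)
            throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        AtomicLong rows = new AtomicLong();
        AtomicLong flushes = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(threads);

        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                BatchSink sink = new CountingSink(rows, flushes); // one sink per thread
                int pending = 0;
                try {
                    while (true) {
                        String rec = queue.take();
                        if (rec == POISON) break;
                        sink.apply(rec);
                        if (++pending >= batchSize) { // flush once per full batch
                            sink.flush();
                            pending = 0;
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    sink.flush(); // flush the final partial batch
                }
            });
        }

        for (String r : records) queue.put(r);                 // stands in for the Kafka poll loop
        for (int t = 0; t < threads; t++) queue.put(POISON);   // one marker per worker
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return new long[] { rows.get(), flushes.get() };
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> records = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) records.add("row-" + i);
        long[] result = run(records, 4, 1000);
        System.out.println("rows=" + result[0] + " flushes=" + result[1]);
    }
}
```

With Kafka you could get the same effect by running several consumers in one consumer group, each with its own session, instead of a shared queue.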

2017-10-31 15:07 GMT+08:00 Chao Sun <[email protected]>:

> OK. Thanks! I changed to manual flush mode and it increased to ~15K / sec.
> :)
>
> Is there any other tuning I can do to further improve this? and also, how
> much would SSD help in this case (only upsert)?
>
> Thanks again,
> Chao
>
> On Mon, Oct 30, 2017 at 11:42 PM, Todd Lipcon <[email protected]> wrote:
>
>> If you want to manage batching yourself you can use the manual flush
>> mode. Easiest would be the auto flush background mode.
>>
>> Todd
>>
>> On Oct 30, 2017 11:10 PM, "Chao Sun" <[email protected]> wrote:
>>
>>> Hi Todd,
>>>
>>> Thanks for the reply! I used a single Kafka consumer to pull the data.
>>> For Kudu, I was doing something very simple that basically just follow
>>> the example here
>>> <https://github.com/cloudera/kudu-examples/blob/master/java/java-sample/src/main/java/org/kududb/examples/sample/Sample.java>
>>> .
>>> In specific:
>>>
>>> loop {
>>>   Insert insert = kuduTable.newInsert();
>>>   PartialRow row = insert.getRow();
>>>   // fill the columns
>>>   kuduSession.apply(insert)
>>> }
>>>
>>> I didn't specify the flushing mode, so it will pick up the
>>> AUTO_FLUSH_SYNC as default?
>>> should I use MANUAL_FLUSH?
>>>
>>> Thanks,
>>> Chao
>>>
>>> On Mon, Oct 30, 2017 at 10:39 PM, Todd Lipcon <[email protected]> wrote:
>>>
>>>> Hey Chao,
>>>>
>>>> Nice to hear you are checking out Kudu.
>>>>
>>>> What are you using to consume from Kafka and write to Kudu? Is it
>>>> possible that it is Java code and you are using the SYNC flush mode? That
>>>> would result in a separate round trip for each record and thus very low
>>>> throughput.
>>>>
>>>> Todd
>>>>
>>>> On Oct 30, 2017 10:23 PM, "Chao Sun" <[email protected]> wrote:
>>>>
>>>> Hi,
>>>>
>>>> We are evaluating Kudu (version kudu 1.3.0-cdh5.11.1, revision
>>>> af02f3ea6d9a1807dcac0ec75bfbca79a01a5cab) on a 8-node cluster.
>>>> The data are coming from Kafka at a rate of around 30K / sec, and hash
>>>> partitioned into 128 buckets. However, with default settings, Kudu can only
>>>> consume the topics at a rate of around 1.5K / second. This is a direct
>>>> ingest with no transformation on the data.
>>>>
>>>> Could this because I was using the default configurations? also we are
>>>> using Kudu on HDD - could that also be related?
>>>>
>>>> Any help would be appreciated. Thanks.
>>>>
>>>> Best,
>>>> Chao
