If you want to manage batching yourself, you can use the manual flush mode (MANUAL_FLUSH). The easiest option would be the auto flush background mode (AUTO_FLUSH_BACKGROUND).
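As a rough illustration of why batching matters here, the sketch below works through the numbers quoted in this thread (about 1.5K rows/sec observed, about 30K rows/sec arriving from Kafka). It assumes that per-record round trips are the bottleneck and that the cost of a round trip stays roughly constant no matter how many rows it carries. Both are simplifications. `FlushModeEstimate`, `roundTrips`, and `minBatchSize` are hypothetical names for this example only and are not part of the Kudu client.

```java
// Back-of-the-envelope model only: it does not talk to Kudu.
final class FlushModeEstimate {

    // Number of round trips needed to send `records` rows when each
    // round trip carries up to `batchSize` rows (ceiling division).
    static long roundTrips(long records, int batchSize) {
        return (records + batchSize - 1) / batchSize;
    }

    // Smallest number of rows per round trip needed to lift throughput from
    // `observedPerSec` (one row per round trip) to `targetPerSec`,
    // ASSUMING the round-trip time does not grow with batch size.
    static long minBatchSize(long observedPerSec, long targetPerSec) {
        return (targetPerSec + observedPerSec - 1) / observedPerSec;
    }

    public static void main(String[] args) {
        long observed = 1_500;   // rows/sec reported in the thread
        long target = 30_000;    // rows/sec arriving from Kafka

        long batch = minBatchSize(observed, target);
        System.out.println("rows per round trip needed: " + batch);
        System.out.println("round trips/sec, one row each: " + roundTrips(target, 1));
        System.out.println("round trips/sec, batched:      " + roundTrips(target, (int) batch));
    }
}
```

In the Java client the flush mode is set on the session, for example `kuduSession.setFlushMode(SessionConfiguration.FlushMode.AUTO_FLUSH_BACKGROUND)`, before the loop that calls `kuduSession.apply(insert)`.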
Todd

On Oct 30, 2017 11:10 PM, "Chao Sun" <[email protected]> wrote:

> Hi Todd,
>
> Thanks for the reply! I used a single Kafka consumer to pull the data.
> For Kudu, I was doing something very simple that basically just follow the
> example here
> <https://github.com/cloudera/kudu-examples/blob/master/java/java-sample/src/main/java/org/kududb/examples/sample/Sample.java>
> .
> In specific:
>
> loop {
>   Insert insert = kuduTable.newInsert();
>   PartialRow row = insert.getRow();
>   // fill the columns
>   kuduSession.apply(insert)
> }
>
> I didn't specify the flushing mode, so it will pick up the AUTO_FLUSH_SYNC
> as default?
> should I use MANUAL_FLUSH?
>
> Thanks,
> Chao
>
> On Mon, Oct 30, 2017 at 10:39 PM, Todd Lipcon <[email protected]> wrote:
>
>> Hey Chao,
>>
>> Nice to hear you are checking out Kudu.
>>
>> What are you using to consume from Kafka and write to Kudu? Is it
>> possible that it is Java code and you are using the SYNC flush mode? That
>> would result in a separate round trip for each record and thus very low
>> throughput.
>>
>> Todd
>>
>> On Oct 30, 2017 10:23 PM, "Chao Sun" <[email protected]> wrote:
>>
>> Hi,
>>
>> We are evaluating Kudu (version kudu 1.3.0-cdh5.11.1, revision
>> af02f3ea6d9a1807dcac0ec75bfbca79a01a5cab) on a 8-node cluster.
>> The data are coming from Kafka at a rate of around 30K / sec, and hash
>> partitioned into 128 buckets. However, with default settings, Kudu can only
>> consume the topics at a rate of around 1.5K / second. This is a direct
>> ingest with no transformation on the data.
>>
>> Could this because I was using the default configurations? also we are
>> using Kudu on HDD - could that also be related?
>>
>> Any help would be appreciated. Thanks.
>>
>> Best,
>> Chao
