Hi Boris,

Kudu clients (both the Java and C++ ones) send write operations to the corresponding tablet servers in batches when using the AUTO_FLUSH_BACKGROUND and MANUAL_FLUSH modes. When a tablet server receives a Write RPC (its parameter is of type WriteRequestPB), it decodes the operations from the batch:

  https://github.com/apache/kudu/blob/master/src/kudu/tablet/local_tablet_writer.h#L97

While decoding the operations in a batch, the tablet server checks various constraints, one of which is that non-nullable columns have values. If any row in the batch violates that constraint, the whole batch is rejected. That's exactly what happened in your example: the batch sent to one tablet consisted of 3 rows (keys 1, 2, and 6), and the key==2 row had no value for the non-nullable dt_tm column, so all 3 operations in that batch were rejected.

You can experiment with different partition schemes. For example, with 10 hash partitions it might happen that only 2 operations are rejected, and with 30 partitions only the single key==2 row might be rejected.

BTW, the same can happen in MANUAL_FLUSH mode. In AUTO_FLUSH_SYNC mode, however, the client sends operations in batches of size 1, so one bad row cannot cause other rows to be rejected.

Kind regards,

Alexey

On Fri, Nov 16, 2018 at 7:24 PM Boris Tyukin <[email protected]> wrote:

> Hi Todd,
>
> We are on Kudu 1.5 still and I used Kudu client 1.7
>
> Thanks,
> Boris
>
> On Fri, Nov 16, 2018, 17:07 Todd Lipcon <[email protected]> wrote:
>
>> Hi Boris,
>>
>> This is interesting. Just so we're looking at the same code, what version
>> of the kudu-client dependency have you specified, and what version of the
>> server?
>>
>> -Todd
>>
>> On Fri, Nov 16, 2018 at 1:12 PM Boris Tyukin <[email protected]>
>> wrote:
>>
>>> Hey guys,
>>>
>>> I am playing with Kudu Java client (wow it is fast), using mostly code
>>> from Kudu Java example.
>>>
>>> While learning about exceptions during row inserts, I stumbled upon
>>> something I could not explain.
>>>
>>> If I insert 10 rows into a brand new Kudu table (AUTO_FLUSH_BACKGROUND
>>> mode) and I make one row to be "bad" intentionally (one column cannot be
>>> NULL), I actually get 3 rows that cannot be inserted into Kudu, not 1 as
>>> I expected.
>>>
>>> But if I do session.flush() after every single insert, I get only one
>>> error row (but this ruins the purpose of AUTO_FLUSH_BACKGROUND mode).
>>>
>>> Any ideas? We cannot afford losing data and need to track all rows
>>> which cannot be inserted.
>>>
>>> AUTO_FLUSH_SYNC mode works much better and I do not have an issue like
>>> above, but then it is way slower than AUTO_FLUSH_BACKGROUND.
>>>
>>> My code is below. It is in Groovy, but I think you will get an idea :)
>>> https://gist.github.com/boristyukin/8703d2c6ec55d6787843aa133920bf01
>>>
>>> Here is output from my test code that hopefully illustrates my confusion
>>> - out of 10 rows inserted, 9 should be good and 1 bad, but it turns out
>>> Kudu flagged 3 as bad:
>>>
>>> Created table kudu_groovy_example
>>> Inserting 10 rows in AUTO_FLUSH_BACKGROUND flush mode ...
>>> (int32 key=1, string value="value 1", unixtime_micros
>>> dt_tm=2018-11-16T20:57:03.469000Z)
>>> (int32 key=2, string value=NULL) BAD ROW
>>> (int32 key=3, string value="value 3", unixtime_micros
>>> dt_tm=2018-11-16T20:57:03.595000Z)
>>> (int32 key=4, string value=NULL, unixtime_micros
>>> dt_tm=2018-11-16T20:57:03.596000Z)
>>> (int32 key=5, string value="value 5", unixtime_micros
>>> dt_tm=2018-11-16T20:57:03.597000Z)
>>> (int32 key=6, string value=NULL, unixtime_micros
>>> dt_tm=2018-11-16T20:57:03.597000Z)
>>> (int32 key=7, string value="value 7", unixtime_micros
>>> dt_tm=2018-11-16T20:57:03.598000Z)
>>> (int32 key=8, string value=NULL, unixtime_micros
>>> dt_tm=2018-11-16T20:57:03.602000Z)
>>> (int32 key=9, string value="value 9", unixtime_micros
>>> dt_tm=2018-11-16T20:57:03.603000Z)
>>> (int32 key=10, string value=NULL, unixtime_micros
>>> dt_tm=2018-11-16T20:57:03.603000Z)
>>> 3 errors inserting rows - why 3???? only 1 expected to be bad...
>>> there were errors inserting rows to Kudu
>>> the first few errors follow:
>>> ??? key 1 and 6 supposed to be fine!
>>> Row error for primary key=[-128, 0, 0, 1], tablet=null, server=null,
>>> status=Invalid argument: No value provided for required column:
>>> dt_tm[unixtime_micros NOT NULL] (error 0)
>>> Row error for primary key=[-128, 0, 0, 2], tablet=null, server=null,
>>> status=Invalid argument: No value provided for required column:
>>> dt_tm[unixtime_micros NOT NULL] (error 0)
>>> Row error for primary key=[-128, 0, 0, 6], tablet=null, server=null,
>>> status=Invalid argument: No value provided for required column:
>>> dt_tm[unixtime_micros NOT NULL] (error 0)
>>> Rows counted in 485 ms
>>> Table has 7 rows - ??? supposed to be 9!
>>> INT32 key=4, STRING value=NULL, UNIXTIME_MICROS
>>> dt_tm=2018-11-16T20:57:03.596000Z
>>> INT32 key=8, STRING value=NULL, UNIXTIME_MICROS
>>> dt_tm=2018-11-16T20:57:03.602000Z
>>> INT32 key=9, STRING value=value 9, UNIXTIME_MICROS
>>> dt_tm=2018-11-16T20:57:03.603000Z
>>> INT32 key=3, STRING value=value 3, UNIXTIME_MICROS
>>> dt_tm=2018-11-16T20:57:03.595000Z
>>> INT32 key=10, STRING value=NULL, UNIXTIME_MICROS
>>> dt_tm=2018-11-16T20:57:03.603000Z
>>> INT32 key=5, STRING value=value 5, UNIXTIME_MICROS
>>> dt_tm=2018-11-16T20:57:03.597000Z
>>> INT32 key=7, STRING value=value 7, UNIXTIME_MICROS
>>> dt_tm=2018-11-16T20:57:03.598000Z
>>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>>
>
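
P.S. To make the per-tablet batching concrete, here is a toy Java model of the all-or-nothing rejection. It is only a sketch and does not use the Kudu client at all: the class name, the Row type, toyBucket() (a plain modulo, not Kudu's real hash function), and the placeholder timestamp are all made up for illustration. It also assumes that every row bound for a given tablet travels in a single batch, which a real client does not guarantee.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class BatchRejectionSketch {

    /** Hypothetical row: dtTm == null models a missing value for the NOT NULL dt_tm column. */
    static final class Row {
        final int key;
        final Long dtTm;

        Row(int key, Long dtTm) {
            this.key = key;
            this.dtTm = dtTm;
        }
    }

    /** Toy stand-in for hash partitioning. This is NOT the hash function Kudu uses. */
    static int toyBucket(int key, int numBuckets) {
        return Math.floorMod(key, numBuckets);
    }

    /**
     * Groups rows into one batch per bucket (assumption: one batch per tablet)
     * and rejects every row of any batch that contains a row with a null dtTm.
     */
    static List<Integer> rejectedKeys(List<Row> rows, int numBuckets) {
        Map<Integer, List<Row>> batches = new TreeMap<>();
        for (Row row : rows) {
            batches.computeIfAbsent(toyBucket(row.key, numBuckets), b -> new ArrayList<>()).add(row);
        }
        List<Integer> rejected = new ArrayList<>();
        for (List<Row> batch : batches.values()) {
            boolean hasBadRow = false;
            for (Row row : batch) {
                if (row.dtTm == null) {
                    hasBadRow = true;
                }
            }
            if (hasBadRow) {
                // One bad row takes the whole batch down with it.
                for (Row row : batch) {
                    rejected.add(row.key);
                }
            }
        }
        return rejected;
    }

    /** Ten rows with keys 1..10; only key 2 lacks dt_tm, as in the test from the thread. */
    static List<Row> sampleRows() {
        List<Row> rows = new ArrayList<>();
        for (int key = 1; key <= 10; key++) {
            // Placeholder microsecond timestamp; the exact value is irrelevant here.
            rows.add(new Row(key, key == 2 ? null : Long.valueOf(1_000_000L * key)));
        }
        return rows;
    }

    public static void main(String[] args) {
        for (int buckets : new int[] {4, 8, 16}) {
            System.out.println(buckets + " buckets -> rejected keys "
                    + rejectedKeys(sampleRows(), buckets));
        }
    }
}
```

With this toy hash, 4 buckets reject keys 2, 6, and 10, 8 buckets reject keys 2 and 10, and 16 buckets reject only key 2. The exact keys will differ on a real cluster, but the trend is the same: the fewer rows that share a batch with the bad row, the fewer rows get rejected along with it.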
