https://issues.apache.org/jira/browse/KUDU-2625 is the JIRA to track this issue. Feel free to add details, comments, etc.
Thanks,
Alexey

On Sat, Nov 17, 2018 at 7:13 AM Alexey Serbin <aser...@cloudera.com> wrote:

> Hey Todd,
>
> Yes, that behavior is a bit strange, especially given that the behavior
> differs for duplicate rows and other errors that happen at later stages
> of applying write operations on the server side.
>
> I'll open a JIRA item about this issue. If anyone disagrees, we can
> resolve the JIRA item as needed.
>
> Thanks,
>
> Alexey
>
> On Sat, Nov 17, 2018 at 12:01 AM Todd Lipcon <t...@cloudera.com> wrote:
>
>> Hey Alexey,
>>
>> I think your explanation makes sense from an implementation perspective.
>> But I think we should treat this behavior as a bug. From the user's
>> perspective, such an error is a per-row data issue and should affect only
>> the row with the problem, not some arbitrary subset of rows in the batch
>> that happened to share a partition.
>>
>> Does anyone disagree?
>>
>> Todd
>>
>> On Fri, Nov 16, 2018, 9:28 PM Alexey Serbin <aser...@cloudera.com> wrote:
>>
>>> Hi Boris,
>>>
>>> Kudu clients (both the Java and C++ ones) send write operations to the
>>> corresponding tablet servers in batches when using the
>>> AUTO_FLUSH_BACKGROUND and MANUAL_FLUSH modes. When a tablet server
>>> receives a Write RPC (WriteRequestPB is the corresponding parameter
>>> type), it decodes the operations from the batch:
>>> https://github.com/apache/kudu/blob/master/src/kudu/tablet/local_tablet_writer.h#L97
>>>
>>> While decoding operations from a batch, various constraints are checked.
>>> One of them is the check for nulls in non-nullable columns. If any row
>>> in the batch violates a non-nullable constraint, the whole batch is
>>> rejected.
>>>
>>> That's exactly what happened in your example: a batch to one tablet
>>> consisted of 3 rows, one of which violated the non-nullable constraint
>>> on the dt_tm column, so the whole batch of 3 operations was rejected.
>>> You can play with different partition schemes: e.g., with 10 hash
>>> partitions it might happen that only 2 operations are rejected, and
>>> with 30 partitions just the single key==2 row might be rejected.
>>>
>>> BTW, the same can happen when using the MANUAL_FLUSH mode. With the
>>> AUTO_FLUSH_SYNC mode, however, the client sends operations in batches
>>> of size 1.
>>>
>>> Kind regards,
>>>
>>> Alexey
>>>
>>> On Fri, Nov 16, 2018 at 7:24 PM Boris Tyukin <bo...@boristyukin.com>
>>> wrote:
>>>
>>>> Hi Todd,
>>>>
>>>> We are still on Kudu 1.5, and I used Kudu client 1.7.
>>>>
>>>> Thanks,
>>>> Boris
>>>>
>>>> On Fri, Nov 16, 2018, 17:07 Todd Lipcon <t...@cloudera.com> wrote:
>>>>
>>>>> Hi Boris,
>>>>>
>>>>> This is interesting. Just so we're looking at the same code, what
>>>>> version of the kudu-client dependency have you specified, and what
>>>>> version of the server?
>>>>>
>>>>> -Todd
>>>>>
>>>>> On Fri, Nov 16, 2018 at 1:12 PM Boris Tyukin <bo...@boristyukin.com>
>>>>> wrote:
>>>>>
>>>>>> Hey guys,
>>>>>>
>>>>>> I am playing with the Kudu Java client (wow, it is fast), using
>>>>>> mostly code from the Kudu Java example.
>>>>>>
>>>>>> While learning about exceptions during row inserts, I stumbled upon
>>>>>> something I could not explain.
>>>>>>
>>>>>> If I insert 10 rows into a brand new Kudu table
>>>>>> (AUTO_FLUSH_BACKGROUND mode) and intentionally make one row "bad"
>>>>>> (one column cannot be NULL), I actually get 3 rows that cannot be
>>>>>> inserted into Kudu, not 1 as I expected.
>>>>>>
>>>>>> But if I do session.flush() after every single insert, I get only
>>>>>> one error row (though this defeats the purpose of the
>>>>>> AUTO_FLUSH_BACKGROUND mode).
>>>>>>
>>>>>> Any ideas on this? We cannot afford to lose data and need to track
>>>>>> all rows that cannot be inserted.
>>>>>>
>>>>>> AUTO_FLUSH mode works much better and I do not see the issue above,
>>>>>> but then it is way slower than AUTO_FLUSH_BACKGROUND.
>>>>>>
>>>>>> My code is below. It is in Groovy, but I think you will get the idea :)
>>>>>> https://gist.github.com/boristyukin/8703d2c6ec55d6787843aa133920bf01
>>>>>>
>>>>>> Here is the output from my test code that hopefully illustrates my
>>>>>> confusion - out of the 10 rows inserted, 9 should be good and 1 bad,
>>>>>> but it turns out Kudu flagged 3 as bad:
>>>>>>
>>>>>> Created table kudu_groovy_example
>>>>>> Inserting 10 rows in AUTO_FLUSH_BACKGROUND flush mode ...
>>>>>> (int32 key=1, string value="value 1", unixtime_micros dt_tm=2018-11-16T20:57:03.469000Z)
>>>>>> (int32 key=2, string value=NULL) BAD ROW
>>>>>> (int32 key=3, string value="value 3", unixtime_micros dt_tm=2018-11-16T20:57:03.595000Z)
>>>>>> (int32 key=4, string value=NULL, unixtime_micros dt_tm=2018-11-16T20:57:03.596000Z)
>>>>>> (int32 key=5, string value="value 5", unixtime_micros dt_tm=2018-11-16T20:57:03.597000Z)
>>>>>> (int32 key=6, string value=NULL, unixtime_micros dt_tm=2018-11-16T20:57:03.597000Z)
>>>>>> (int32 key=7, string value="value 7", unixtime_micros dt_tm=2018-11-16T20:57:03.598000Z)
>>>>>> (int32 key=8, string value=NULL, unixtime_micros dt_tm=2018-11-16T20:57:03.602000Z)
>>>>>> (int32 key=9, string value="value 9", unixtime_micros dt_tm=2018-11-16T20:57:03.603000Z)
>>>>>> (int32 key=10, string value=NULL, unixtime_micros dt_tm=2018-11-16T20:57:03.603000Z)
>>>>>> 3 errors inserting rows - why 3? Only 1 was expected to be bad...
>>>>>> there were errors inserting rows to Kudu
>>>>>> the first few errors follow:
>>>>>> ??? keys 1 and 6 are supposed to be fine!
>>>>>> Row error for primary key=[-128, 0, 0, 1], tablet=null, server=null,
>>>>>> status=Invalid argument: No value provided for required column:
>>>>>> dt_tm[unixtime_micros NOT NULL] (error 0)
>>>>>> Row error for primary key=[-128, 0, 0, 2], tablet=null, server=null,
>>>>>> status=Invalid argument: No value provided for required column:
>>>>>> dt_tm[unixtime_micros NOT NULL] (error 0)
>>>>>> Row error for primary key=[-128, 0, 0, 6], tablet=null, server=null,
>>>>>> status=Invalid argument: No value provided for required column:
>>>>>> dt_tm[unixtime_micros NOT NULL] (error 0)
>>>>>> Rows counted in 485 ms
>>>>>> Table has 7 rows - ??? supposed to be 9!
>>>>>> INT32 key=4, STRING value=NULL, UNIXTIME_MICROS dt_tm=2018-11-16T20:57:03.596000Z
>>>>>> INT32 key=8, STRING value=NULL, UNIXTIME_MICROS dt_tm=2018-11-16T20:57:03.602000Z
>>>>>> INT32 key=9, STRING value=value 9, UNIXTIME_MICROS dt_tm=2018-11-16T20:57:03.603000Z
>>>>>> INT32 key=3, STRING value=value 3, UNIXTIME_MICROS dt_tm=2018-11-16T20:57:03.595000Z
>>>>>> INT32 key=10, STRING value=NULL, UNIXTIME_MICROS dt_tm=2018-11-16T20:57:03.603000Z
>>>>>> INT32 key=5, STRING value=value 5, UNIXTIME_MICROS dt_tm=2018-11-16T20:57:03.597000Z
>>>>>> INT32 key=7, STRING value=value 7, UNIXTIME_MICROS dt_tm=2018-11-16T20:57:03.598000Z
>>>>>
>>>>> --
>>>>> Todd Lipcon
>>>>> Software Engineer, Cloudera
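[Editor's note] The whole-batch-rejection behavior Alexey describes in the thread above can be sketched with a small toy simulation. This is NOT Kudu code: the `bucket()` function below is a plain-modulo stand-in for Kudu's Murmur-based hash partitioning, so the exact groupings differ from the real run (where keys 1, 2, and 6 happened to share a tablet), but the mechanism is the same: the client groups buffered operations into per-tablet batches, and a whole batch fails if any row in it violates a constraint.

```python
# Toy model of per-tablet batching with whole-batch rejection.
# NOT Kudu code: bucket() is a modulo stand-in for Kudu's hash
# partitioning, so the groupings differ from the real 1/2/6 case.

def bucket(key, num_partitions):
    """Stand-in for Kudu's hash-partition assignment."""
    return key % num_partitions

def flush(rows, num_partitions):
    """Group rows into per-tablet batches; return the sorted keys of all
    rejected rows. If any row in a batch has NULL in the non-nullable
    dt_tm column, the whole batch fails."""
    batches = {}
    for key, dt_tm in rows:
        batches.setdefault(bucket(key, num_partitions), []).append((key, dt_tm))
    rejected = []
    for batch in batches.values():
        if any(dt_tm is None for _, dt_tm in batch):
            rejected.extend(key for key, _ in batch)  # good rows fail too
    return sorted(rejected)

# 10 rows; only key == 2 is "bad" (dt_tm is NULL), as in Boris's test.
rows = [(k, None if k == 2 else "2018-11-16T20:57:03Z") for k in range(1, 11)]

print(flush(rows, 1))   # [1, 2, ..., 10]: one batch, every row rejected
print(flush(rows, 5))   # [2, 7]: key 7 shares key 2's batch and fails too
print(flush(rows, 10))  # [2]: key 2 is alone in its batch, only it fails
```

The more partitions (and hence batches), the fewer innocent rows share a batch with the bad one, which matches Alexey's point that 10 or 30 hash partitions change how many operations get rejected. In the real Java client, the rejected rows surface through the session's pending-errors mechanism (getPendingErrors()) rather than a return value.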