Wang Xixu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19949 )

Change subject: KUDU-3483 Fix flushing data in batch when table schema changed
......................................................................


Patch Set 16:

(1 comment)

> Patch Set 16:
>
> This patch batches all buffered operations into their respective tablet and 
> schema and then sends them to tablet servers. Given each tablet has only one 
> schema, some operations will fail due to inconsistent schema with that of the 
> tablet.
>
> It's a good way to avoid discarding all buffered operations. But it also 
> sends extra write requests to the tablet server since we already know the 
> inconsistency of the batched operations.
>
> So is it possible to identify operations with outdated schema from the client 
> and return row errors for these operations?

The difference between your solution and my solution is your solution finds the 
different schema in the same batch, return the errors directly. My solution 
splits the rows into multiple batches, and tries to flush the batches, maybe 
part of rows will succeed. Your solution is pessimistic, my solution is 
optimism.

Besides, your solution return the error row, and the user will try again, then 
this errors rows will be distributed into the same batch mostly, and the error 
will happen again. Only if the user opens the table and reconstructs the error 
row using the newest schema, it will succeed.

I think flushing schema-changed rows is a low probability case, it will not 
case much pressure to the Kudu server.

What's your opinion? @Alexy Serbin @Yinchun Lai

http://gerrit.cloudera.org:8080/#/c/19949/15/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduSession.java
File 
java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduSession.java:

http://gerrit.cloudera.org:8080/#/c/19949/15/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduSession.java@418
PS15, Line 418: // Compare with the last schema in the list, because the last 
operations
              :         // have the same schemas with it most likely.
> We allow concurrent flushes from the client via the same KuduSession for th
Yes, it not always true. But what is the cost if found a row with different row 
with the last rows in the batch? It only needs to add an extra space. This core 
of this algorithm is using extra space to reduce time complexity.



--
To view, visit http://gerrit.cloudera.org:8080/19949
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie6501962b32814d121f180b2942999c402d927db
Gerrit-Change-Number: 19949
Gerrit-PatchSet: 16
Gerrit-Owner: Wang Xixu <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Attila Bukor <[email protected]>
Gerrit-Reviewer: KeDeng <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Wang Xixu <[email protected]>
Gerrit-Reviewer: Yifan Zhang <[email protected]>
Gerrit-Reviewer: Yingchun Lai <[email protected]>
Gerrit-Reviewer: Yuqi Du <[email protected]>
Gerrit-Comment-Date: Tue, 27 Jun 2023 05:52:34 +0000
Gerrit-HasComments: Yes

Reply via email to