Wang Xixu has posted comments on this change. ( http://gerrit.cloudera.org:8080/19949 )
Change subject: KUDU-3483 Fix flushing data in batch when table schema changed ...................................................................... Patch Set 16: (1 comment) > Patch Set 16: > > This patch batches all buffered operations into their respective tablet and > schema and then sends them to tablet servers. Given each tablet has only one > schema, some operations will fail due to inconsistent schema with that of the > tablet. > > It's a good way to avoid discarding all buffered operations. But it also > sends extra write requests to the tablet server since we already know the > inconsistency of the batched operations. > > So is it possible to identify operations with outdated schema from the client > and return row errors for these operations? The difference between your solution and my solution is your solution finds the different schema in the same batch, return the errors directly. My solution splits the rows into multiple batches, and tries to flush the batches, maybe part of rows will succeed. Your solution is pessimistic, my solution is optimism. Besides, your solution return the error row, and the user will try again, then this errors rows will be distributed into the same batch mostly, and the error will happen again. Only if the user opens the table and reconstructs the error row using the newest schema, it will succeed. I think flushing schema-changed rows is a low probability case, it will not case much pressure to the Kudu server. What's your opinion? @Alexy Serbin @Yinchun Lai http://gerrit.cloudera.org:8080/#/c/19949/15/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduSession.java File java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduSession.java: http://gerrit.cloudera.org:8080/#/c/19949/15/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduSession.java@418 PS15, Line 418: // Compare with the last schema in the list, because the last operations : // have the same schemas with it most likely. > We allow concurrent flushes from the client via the same KuduSession for th Yes, it not always true. But what is the cost if found a row with different row with the last rows in the batch? It only needs to add an extra space. This core of this algorithm is using extra space to reduce time complexity. -- To view, visit http://gerrit.cloudera.org:8080/19949 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie6501962b32814d121f180b2942999c402d927db Gerrit-Change-Number: 19949 Gerrit-PatchSet: 16 Gerrit-Owner: Wang Xixu <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Attila Bukor <[email protected]> Gerrit-Reviewer: KeDeng <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Wang Xixu <[email protected]> Gerrit-Reviewer: Yifan Zhang <[email protected]> Gerrit-Reviewer: Yingchun Lai <[email protected]> Gerrit-Reviewer: Yuqi Du <[email protected]> Gerrit-Comment-Date: Tue, 27 Jun 2023 05:52:34 +0000 Gerrit-HasComments: Yes
