Hello Dan Hecht, I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/5288 to look at the new patch set (#3).

Change subject: IMPALA-3788: Add flag for Kudu read-your-writes
......................................................................

IMPALA-3788: Add flag for Kudu read-your-writes

The previous attempt to support Kudu 'read-your-writes' consistency
successfully captured the latest observed timestamp from the Kudu client
after a write and propagated it to future Kudu clients within the same
session. That alone made writes within a session linearizable, but it did
not fully provide 'read-your-writes' semantics, because the Kudu client
in the KuduScanner needed further configuration.

The Kudu client exposes an option to set the 'ReadMode', which can be
either READ_LATEST or READ_AT_SNAPSHOT. READ_LATEST is the default: it
allows the client to read the latest known value for every row, and
there is no consistency among the versions of the rows read within a
scan.

When READ_AT_SNAPSHOT is enabled, the client picks a timestamp that is
after the latest observed session timestamp (propagated and set with
SetLatestObservedTimestamp() by the previous commit for IMPALA-3788) and
performs a snapshot read at that time. This timestamp is still chosen
per client, so the query as a whole does not perform a snapshot read at
a single timestamp; doing that requires further work in Kudu and another
change in Impala as well.

That said, this behavior is sufficient to satisfy 'read-your-writes'
consistency in all cases _except_ when a DML statement reads and writes
the same table, e.g.

  INSERT INTO foo SELECT ... FROM foo

In this case, the query may read rows that were inserted by a different
node of the same query. This case will be handled once a global snapshot
timestamp is supported and configured by Impala.

Because this performs a snapshot read, some rows may be read from
lagging replicas, which will have to wait before returning rows.
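To make the READ_LATEST vs. READ_AT_SNAPSHOT difference concrete, here is a minimal toy model under stated assumptions; it is not Kudu's or Impala's implementation. `ReadMode`, `ParseReadMode`, `Replica`, `ToyCluster` and `ScanFollower` are hypothetical names, the clock is a plain counter standing in for Kudu's hybrid time, and a lagging replica "waits" by synchronously catching up. In the real Kudu C++ client the mode is set with `KuduScanner::SetReadMode()` and the session timestamp with `KuduClient::SetLatestObservedTimestamp()`.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <limits>
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-in for the two Kudu scan read modes (NOT the Kudu type).
enum class ReadMode { kReadLatest, kReadAtSnapshot };

// Hypothetical mapping from a --kudu_read_mode flag value to a read mode.
std::optional<ReadMode> ParseReadMode(const std::string& flag_value) {
  if (flag_value == "READ_LATEST") return ReadMode::kReadLatest;
  if (flag_value == "READ_AT_SNAPSHOT") return ReadMode::kReadAtSnapshot;
  return std::nullopt;  // unknown value
}

// Toy multi-version replica: key -> (timestamp -> value).
struct Replica {
  std::map<std::string, std::map<uint64_t, std::string>> versions;
  uint64_t safe_ts = 0;  // every write with ts <= safe_ts has been applied here

  // Returns the newest version of `key` with timestamp <= read_ts, if any.
  std::optional<std::string> ReadAt(const std::string& key, uint64_t read_ts) const {
    auto row = versions.find(key);
    if (row == versions.end()) return std::nullopt;
    auto next = row->second.upper_bound(read_ts);  // first version newer than read_ts
    if (next == row->second.begin()) return std::nullopt;
    return std::prev(next)->second;
  }
};

// Toy cluster: writes go to the leader; the follower lags until CatchUp().
struct ToyCluster {
  uint64_t clock = 0;  // stand-in for hybrid time; strictly increasing
  Replica leader;
  Replica follower;

  // Applies a write on the leader and returns its timestamp, which the
  // session would record as its latest observed timestamp.
  uint64_t Write(const std::string& key, const std::string& value) {
    const uint64_t ts = ++clock;
    leader.versions[key][ts] = value;
    leader.safe_ts = ts;
    return ts;
  }

  // Copies every leader version with ts <= up_to to the follower.
  void CatchUp(uint64_t up_to) {
    for (const auto& [key, row] : leader.versions)
      for (const auto& [ts, value] : row)
        if (ts <= up_to) follower.versions[key][ts] = value;
    follower.safe_ts = std::max(follower.safe_ts, up_to);
  }
};

// Scans `key` on the (possibly lagging) follower in the given mode.
std::optional<std::string> ScanFollower(ToyCluster& c, const std::string& key,
                                        ReadMode mode, uint64_t latest_observed_ts) {
  if (mode == ReadMode::kReadLatest) {
    // Whatever the follower has applied right now; may miss the session's write.
    return c.follower.ReadAt(key, std::numeric_limits<uint64_t>::max());
  }
  // Snapshot read at a timestamp after the latest observed session timestamp.
  const uint64_t snapshot_ts = latest_observed_ts + 1;
  c.clock = std::max(c.clock, snapshot_ts);  // later writes get ts > snapshot_ts
  // Toy "wait": a lagging follower catches up before serving the snapshot.
  if (c.follower.safe_ts < snapshot_ts) c.CatchUp(snapshot_ts);
  return c.follower.ReadAt(key, snapshot_ts);
}
```

In this model, a session that writes "k" and then scans the follower with READ_LATEST gets no row, while READ_AT_SNAPSHOT at a timestamp past the observed write timestamp returns the written value. The `CatchUp()` call stands in for the wait that a lagging replica incurs before it can serve a snapshot read.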
This has implications for query execution behavior (e.g. queries may be
more likely to time out, and the number of queries that can be run may
be affected), so the behavior is not yet enabled by default. It can be
enabled with the flag

  --kudu_read_mode READ_AT_SNAPSHOT

The goal is to make this the default behavior after sufficient testing.

Change-Id: I003aba410548bc9158d1e11abbdcf710c31a82ff
---
M be/src/exec/kudu-scanner.cc
1 file changed, 11 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/88/5288/3
--
To view, visit http://gerrit.cloudera.org:8080/5288

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I003aba410548bc9158d1e11abbdcf710c31a82ff
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs <m...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dral...@apache.org>
Gerrit-Reviewer: Matthew Jacobs <m...@cloudera.com>