I was wondering whether the Kudu scanner guarantees that the rows returned
from a single tablet scan always come back in the same order, based on the
following assumptions:
- The underlying Kudu tablet does not change within the given scan range
while the same scan token is read multiple times
- I am using the Java client
- I am using Kudu version 1.4.0
- The client code is using the KuduScanTokenBuilder API to plan the set of
scans that can be performed for a given query.
- The client calls nextRows() and then walks the returned iterator using its
hasNext() and next() methods.
- During a debug session I noticed a variable called orderMode in the
asyncScanner, but this property does not appear to be exposed through the
public API yet. Its default value seems to be unordered.
Perhaps the answer is no, given the last point above, but I would like
confirmation from the community.
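
To make the question concrete, the property I am after can be written as a check over two runs of the same scan token. This is only a minimal sketch: ScanOrderCheck and its Result enum are hypothetical names of my own, not part of the Kudu or Apex API, and it assumes each row can be reduced to a string key (for example its primary key) collected in the order the scanner returned it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper for illustration; not part of the Kudu or Apex API. */
final class ScanOrderCheck {

    enum Result { SAME_ORDER, SAME_ROWS_DIFFERENT_ORDER, DIFFERENT_ROWS }

    /**
     * Compares the row keys collected from two runs of the same scan token.
     * Each list holds one key per row, in the order the scanner returned it.
     */
    static Result compare(List<String> firstRun, List<String> secondRun) {
        // Element-by-element equality: same rows in the same positions.
        if (firstRun.equals(secondRun)) {
            return Result.SAME_ORDER;
        }
        // Sort copies (inputs stay untouched) to compare contents only.
        List<String> a = new ArrayList<>(firstRun);
        List<String> b = new ArrayList<>(secondRun);
        Collections.sort(a);
        Collections.sort(b);
        return a.equals(b)
                ? Result.SAME_ROWS_DIFFERENT_ORDER
                : Result.DIFFERENT_ROWS;
    }
}
```

What I would like to know is whether, under the assumptions listed above, such a check would always yield SAME_ORDER, or whether SAME_ROWS_DIFFERENT_ORDER is a possible outcome.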

I am integrating Apache Apex with Apache Kudu and am using the scan token
builder API to plan the scans in a distributed way. As part of this, I would
like to offer Apache Apex end users a configurable way to get a consistent
scan ordering. Since it is almost impossible to achieve such an ordering
across downstream compute nodes in a truly distributed setting, the aim is to
provide consistent ordering within a single Apex partition. The Apex
integration with Kudu will provide configuration options to map one tablet to
one or more Apex partitions. For scans under either mapping style, I would
like to provide further ordering guarantees. However, I am not sure whether
Apache Kudu returns rows in a consistent order for the same scan, provided the
above assumptions hold.
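
To illustrate what I mean by a consistent ordering within a single partition: without a server-side guarantee, the only option I can see on the Apex side is to buffer the rows of a scan and sort them by key before emitting them. The sketch below is just that, a sketch under assumptions: PartitionOrdering and Row are hypothetical stand-ins (not Kudu or Apex classes), and the key is assumed to be a single long column.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical illustration; not part of the Kudu or Apex API. */
final class PartitionOrdering {

    /** Stand-in for a scanned row: an assumed single-column long key plus a payload. */
    static final class Row {
        final long key;
        final String payload;

        Row(long key, String payload) {
            this.key = key;
            this.payload = payload;
        }
    }

    /**
     * Returns the rows of one partition's scan sorted by key, so that repeated
     * runs emit them in the same order regardless of the order they arrived in.
     * The whole result is buffered in memory; the input list is not modified.
     */
    static List<Row> orderByKey(List<Row> scanned) {
        List<Row> ordered = new ArrayList<>(scanned);
        ordered.sort(Comparator.comparingLong((Row r) -> r.key));
        return ordered;
    }
}
```

Buffering a whole scan like this costs memory and delays the first emitted row, which is why I would prefer to rely on an ordering guarantee from Kudu itself if one exists.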

Could you please advise on the ordering of scanned rows for a single tablet
across multiple launches of the same scan token?