Hello all,

I was wondering whether the Kudu scanner guarantees that the rows returned from
a single tablet scan are always in the same order, given the following
assumptions:

- The underlying Kudu tablet does not change within the given scan range while
the reads are performed multiple times for the same scan token.
- I am using the Java client.
- I am using Kudu version 1.4.0.
- The client code uses the KuduScanTokenBuilder API to plan the set of scans
for a given query.
- The client calls nextRows() and then iterates over the returned iterator
with hasNext() and next().
- During a debug session I noticed a variable called orderMode in the
asyncScanner, but this property does not appear to be exposed as a public API
yet. Its default value appears to be unordered.
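
To compare two launches of the same scan token empirically, one could record
the primary key of every row returned by each run and check where the two
sequences first diverge. The sketch below assumes the keys have already been
collected as plain strings (in the real client they would be gathered inside
the nextRows()/hasNext()/next() loop); the class and method names are
hypothetical. Note that matching runs only show observed behaviour, not a
guarantee from Kudu.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: compares the primary keys seen in two runs of the same scan token.
public class ScanOrderCheck {
    /**
     * Returns the index of the first row at which the two runs disagree,
     * or -1 if both runs produced the same keys in the same order.
     * If one run is a prefix of the other, the index of the first extra row is returned.
     */
    public static int firstMismatch(List<String> runA, List<String> runB) {
        int common = Math.min(runA.size(), runB.size());
        for (int i = 0; i < common; i++) {
            if (!runA.get(i).equals(runB.get(i))) {
                return i;
            }
        }
        // Same prefix: the runs match only if they also have the same length.
        return runA.size() == runB.size() ? -1 : common;
    }

    public static void main(String[] args) {
        // Assumed example keys; real keys would come from RowResult objects.
        List<String> run1 = Arrays.asList("k1", "k2", "k3");
        List<String> run2 = Arrays.asList("k1", "k3", "k2");
        System.out.println(firstMismatch(run1, run1));
        System.out.println(firstMismatch(run1, run2));
    }
}
```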

Perhaps the answer is no, given the last point above, but I would like
confirmation from the community.

I am integrating Apache Apex with Apache Kudu and am using the scan token
builder API to plan the scans in a distributed way. While doing so, I would
like to give Apache Apex end users a configurable option for consistent scan
ordering. Since it is almost impossible to achieve this ordering in a truly
distributed fashion for downstream compute nodes, the aim is to provide
consistent ordering within a single Apex partition. The Apache Apex integration
with Kudu will provide configurations to map one tablet to one or multiple Apex
partitions. While scanning in either of these mapping styles, I would like to
provide further ordering guarantees. However, I am not sure whether Apache Kudu
provides a consistent ordering for the same scan, provided the above
assumptions hold.
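
If Kudu turns out not to guarantee the order, one possible fallback (an
assumption, not something Kudu or Apex provides) is for each Apex partition to
buffer the rows it receives and emit them sorted by primary key. The sketch
below uses a hypothetical Row type whose primary key is a single string; a
composite key would need its own comparator. The trade-off is memory: the
partition must hold all buffered rows before emitting them.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical helper: buffers the rows one Apex partition receives and
 * re-emits them sorted by primary key, so the output order no longer depends
 * on the order in which the Kudu scanner returned them.
 */
public class PartitionRowSorter {
    /** Minimal stand-in for a Kudu row: an assumed string primary key plus a payload. */
    public static final class Row {
        final String primaryKey;
        final String payload;

        Row(String primaryKey, String payload) {
            this.primaryKey = primaryKey;
            this.payload = payload;
        }
    }

    private final List<Row> buffer = new ArrayList<>();

    /** Called for every row read from the scanner, e.g. inside the next() loop. */
    public void add(Row row) {
        buffer.add(row);
    }

    /** Returns the buffered rows ordered by primary key and clears the buffer. */
    public List<Row> drainSorted() {
        List<Row> sorted = new ArrayList<>(buffer);
        // Lexicographic String order; replace with a key-aware comparator for composite keys.
        sorted.sort(Comparator.comparing((Row r) -> r.primaryKey));
        buffer.clear();
        return sorted;
    }
}
```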

Could you please advise on the ordering of scan rows for a single tablet
across multiple launches of the same scan token?

