Hi Ananth,

We "hid" the ordered scans API when we added hash partitioning. With hash partitioning, it would no longer return a fully ordered scan across tablet servers, and we didn't want to confuse users. If all you want is a scan that always returns the same ordering (though not fully ordered rows), you can get that by making the scan fault-tolerant (https://kudu.apache.org/apidocs/org/apache/kudu/client/AbstractKuduScannerBuilder.html#setFaultTolerant-boolean-). Note that these kinds of scans may carry a performance penalty.
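The Java below is a minimal, self-contained model of the point above. It has no Kudu dependency and does not use the real client. It assumes that a fault-tolerant scan of one tablet returns that tablet's rows in primary-key order. The `hashPartition` helper and its `Math.floorMod` hash are hypothetical stand-ins for Kudu's real hash partitioning. The model shows two things. First, reading the tablets one after another in a fixed token order gives the same sequence on every run, but that sequence is not globally sorted. Second, a k-way merge on the key restores full key order for a consumer that reads several tablets. That merge is a client-side technique, not a Kudu API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class TabletOrderSketch {

    // Hypothetical model: split keys into "tablets" by a hash of the key.
    // Math.floorMod(key, numTablets) is only a stand-in for Kudu's real hash function.
    public static List<List<Integer>> hashPartition(List<Integer> keys, int numTablets) {
        List<List<Integer>> tablets = new ArrayList<>();
        for (int i = 0; i < numTablets; i++) {
            tablets.add(new ArrayList<>());
        }
        for (int key : keys) {
            tablets.get(Math.floorMod(key, numTablets)).add(key);
        }
        // Assumption: an ordered (fault-tolerant) scan of one tablet yields its rows
        // in primary-key order, so each modeled tablet is sorted by key.
        for (List<Integer> tablet : tablets) {
            tablet.sort(null);
        }
        return tablets;
    }

    // Read tablets one after another in a fixed order: the result is the same on
    // every run, but it is only sorted within each tablet, not across tablets.
    public static List<Integer> concatenate(List<List<Integer>> tablets) {
        List<Integer> out = new ArrayList<>();
        for (List<Integer> tablet : tablets) {
            out.addAll(tablet);
        }
        return out;
    }

    // k-way merge of the per-tablet sorted streams, yielding global key order.
    public static List<Integer> mergeByKey(List<List<Integer>> tablets) {
        // Heap entries are {key, tabletIndex, positionInTablet}, ordered by key.
        PriorityQueue<int[]> heap = new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        for (int t = 0; t < tablets.size(); t++) {
            if (!tablets.get(t).isEmpty()) {
                heap.add(new int[] {tablets.get(t).get(0), t, 0});
            }
        }
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            out.add(top[0]);
            int t = top[1];
            int next = top[2] + 1;
            // Advance within the tablet whose key was just emitted.
            if (next < tablets.get(t).size()) {
                heap.add(new int[] {tablets.get(t).get(next), t, next});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> keys = new ArrayList<>();
        for (int k = 1; k <= 10; k++) {
            keys.add(k);
        }
        List<List<Integer>> tablets = hashPartition(keys, 3);
        System.out.println(concatenate(tablets)); // [3, 6, 9, 1, 4, 7, 10, 2, 5, 8]
        System.out.println(mergeByKey(tablets));  // [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    }
}
```

With the real Java client, the per-tablet ordering would come from calling `setFaultTolerant(true)` on the `KuduScanTokenBuilder` (inherited from `AbstractKuduScannerBuilder`) before `build()`. Whether Kudu 1.4.0 carries that setting through serialized scan tokens is worth confirming against that release's documentation.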
HTH
-David

> On Aug 12, 2017, at 7:36 PM, Ananth Gundabattula <[email protected]> wrote:
>
> Hello All,
>
> I was wondering if there is any guarantee from the Kudu scanner that the rows
> returned from a single tablet scan are always in the same order, based on the
> following assumptions:
>
> - There was no change in the underlying Kudu tablet for the given scan range
>   while the reads are being performed multiple times for the same scan token.
> - I am using the Java client.
> - I am using Kudu version 1.4.0.
> - The client code is using the KuduScanTokenBuilder API to plan the set of
>   scans that can be performed for a given query.
> - The client is using nextRows(), followed by hasNext() and next() on the
>   corresponding iterators.
> - During a debug session I noticed a variable called orderMode in the
>   asyncScanner, but it looks like this property is not yet exposed as a public
>   API. Its default value seems to be unordered.
>
> Perhaps the answer is no, per the last point above, but I would like
> confirmation from the community.
>
> I am integrating Apache Apex with Apache Kudu and am using the scan token
> builder API to plan the scans in a distributed way. While doing so, I would
> like to give end users of Apache Apex a configurable way to get a consistent
> scan ordering. Since it is almost impossible to achieve this ordering in a truly
> distributed fashion for downstream compute nodes, the aim is to provide
> consistent ordering within a single Apex partition. The Apache Apex integration
> with Kudu will provide configurations to map one tablet to one or multiple Apex
> partitions. In either of these mapping styles, I would like to provide further
> ordering guarantees. However, I am not sure whether Apache Kudu provides a
> consistent ordering for the same scan if the above assumptions hold.
>
> Could you please advise regarding the ordering of scan rows for a single
> tablet across multiple launches of the same scan token?
>
> Regards,
> Ananth
