Hi Sergey,

The optimization you're looking for is essentially to realize that an IN-list on a primary key prefix can be converted as follows:
scan(PK IN (1, 2, 3))
  -> scan(PK = 1 OR PK = 2 OR PK = 3)
  -> scan(PK = 1) UNION ALL scan(PK = 2) UNION ALL scan(PK = 3)

Currently, the tserver doesn't support converting a single user-facing scan into multiple internal scan ranges in the general case. Doing so would require some surgery on the tablet server so that it understands a scan as having a set of disjoint PK ranges rather than a single associated range. I filed a JIRA to support this here:
https://issues.apache.org/jira/browse/KUDU-2875

That said, there's a separate optimization that is simpler to implement: notice, within a given DiskRowSet (a small chunk of rows), when only a single value in the IN-list can be present. In that case the IN-list can be converted, locally, to an equality predicate, which may be satisfied by a range scan or skipped entirely. I added this note to:
https://issues.apache.org/jira/browse/KUDU-1644

Thanks
Todd

On Tue, Jun 25, 2019 at 9:24 PM Sergey Olontsev <[email protected]> wrote:

> Could anyone help me find out more about how InList predicates work?
>
> I have a bunch of values (sometimes just a couple, or a few, but
> potentially tens or hundreds), and I have a table with a primary
> key on the column holding the values I search for, with hash partitioning.
>
> I've noticed that several separate searches by primary key with a
> Comparison predicate usually run faster than one search with an InList predicate.
> Looking at the Scanners information in the GUI, I see that with a
> Comparison predicate my app reads only 1 block and it takes
> milliseconds, but with an InList predicate it reads ~1.6 blocks several times
> (scanning with a batch of 1 million rows) and each scanner takes about
> 1-1.5 seconds to complete.
>
> So, I really need more information about how exactly InList predicates are
> implemented and behave. Could anyone provide any links? Unfortunately, I
> was unable to find any information, only a few JIRA tasks, and those didn't
> help.
>
> https://issues.apache.org/jira/browse/KUDU-2853
> https://issues.apache.org/jira/browse/KUDU-1644
>
> Best regards,
> Sergey.

--
Todd Lipcon
Software Engineer, Cloudera
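For readers following the thread, the two ideas in Todd's reply (rewriting an IN-list into a union of disjoint point ranges, and pruning the IN-list per DiskRowSet using its key bounds) can be illustrated with a minimal Python sketch. It is a toy in-memory model under stated assumptions, not Kudu's tablet server code or client API: `RowSet`, `in_list_to_point_ranges`, `scan_point_ranges` and `scan_rowset` are hypothetical names, and a rowset is assumed to be a non-empty, sorted list of unique integer primary keys whose min/max bounds are known.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Hypothetical toy model; NOT Kudu's actual tablet server code or client API.


@dataclass
class RowSet:
    """Stand-in for a DiskRowSet: a non-empty, sorted list of unique PKs."""
    keys: List[int]

    @property
    def min_key(self) -> int:
        return self.keys[0]

    @property
    def max_key(self) -> int:
        return self.keys[-1]


def in_list_to_point_ranges(values: Sequence[int]) -> List[Tuple[int, int]]:
    """PK IN (v1, v2, ...) -> sorted, disjoint inclusive ranges [v, v],
    i.e. scan(PK = v1) UNION ALL scan(PK = v2) UNION ALL ..."""
    return [(v, v) for v in sorted(set(values))]


def scan_point_ranges(ranges: Sequence[Tuple[int, int]], rs: RowSet) -> List[int]:
    """Multi-range scan: seek to each disjoint range instead of reading all rows."""
    out = []
    for lo, hi in ranges:
        i = bisect_left(rs.keys, lo)  # seek to the first key >= lo
        while i < len(rs.keys) and rs.keys[i] <= hi:
            out.append(rs.keys[i])
            i += 1
    return out


def scan_rowset(values: Sequence[int], rs: RowSet) -> Tuple[str, List[int]]:
    """Per-rowset pruning: keep only IN-list values inside [min_key, max_key]."""
    vals = sorted(set(values))
    candidates = vals[bisect_left(vals, rs.min_key):bisect_right(vals, rs.max_key)]
    if not candidates:
        return ("skip", [])  # no IN-list value can be present in this rowset
    if len(candidates) == 1:
        # The IN-list collapses locally to PK = v: one binary-search lookup.
        v = candidates[0]
        i = bisect_left(rs.keys, v)
        hit = [v] if i < len(rs.keys) and rs.keys[i] == v else []
        return ("equality", hit)
    # Otherwise evaluate the remaining IN-list values against every row.
    wanted = set(candidates)
    return ("full_scan", [k for k in rs.keys if k in wanted])


if __name__ == "__main__":
    in_list = [2, 5, 30, 40]
    for rs in [RowSet([1, 2, 5]), RowSet([10, 11, 12]), RowSet([20, 30])]:
        print(scan_rowset(in_list, rs))
    # ('full_scan', [2, 5])
    # ('skip', [])
    # ('equality', [30])
    print(scan_point_ranges(in_list_to_point_ranges(in_list), RowSet([1, 2, 5])))
    # [2, 5]
```

In this model the per-rowset pruning only needs each rowset's min/max keys, which is why it is the simpler change, while the multi-range scan requires the scan itself to carry a list of ranges rather than a single one.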
