Hi Sergey,

The optimization you're looking for essentially comes down to recognizing
that an IN-list predicate on a primary key prefix can be rewritten as follows:

scan(PK in (1,2,3)) ->
scan(PK = 1 OR PK = 2 OR PK = 3) ->
scan(PK = 1) union all scan(PK = 2) union all scan(PK = 3)
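
To make the rewrite concrete, here is a minimal Python sketch of the idea. It
is not Kudu code: the table is modeled as a plain dict keyed by PK, and
scan_eq and scan_in_list are hypothetical names used only for illustration.

```python
from typing import Dict, Iterable, Iterator, Tuple

Row = Tuple[int, str]


def scan_eq(table: Dict[int, str], pk: int) -> Iterator[Row]:
    """Point scan: yields at most one row for an equality predicate on the PK."""
    if pk in table:
        yield (pk, table[pk])


def scan_in_list(table: Dict[int, str], values: Iterable[int]) -> Iterator[Row]:
    """Evaluate PK IN (values) as a UNION ALL of independent point scans."""
    # Deduplicate first: with UNION ALL semantics a repeated IN-list value
    # would otherwise return the same row twice. Sorting keeps PK order.
    for pk in sorted(set(values)):
        yield from scan_eq(table, pk)
```

Each value in the list touches only the data for its own key, which is why
several separate equality scans can beat one scan over the whole range that
encloses the list.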

Currently, the tserver doesn't support converting a single user-facing
scan into multiple internal scan ranges in the general case. Doing so would
require a bit of surgery on the tablet server so that it understands that
a scan can have a set of disjoint PK ranges rather than a single associated
range. I filed a JIRA to support this here:
https://issues.apache.org/jira/browse/KUDU-2875

That said, there's a separate optimization which is simpler to implement,
which is to notice within a given DiskRowSet (a small chunk of rows) that
only a single value in the IN-list can be present. In that case the IN-list
can be converted, locally, to an equality predicate, which may be satisfied
by a range scan or skipped entirely. I added this note to
https://issues.apache.org/jira/browse/KUDU-1644
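
A rough Python sketch of that local narrowing, assuming each DiskRowSet knows
the minimum and maximum primary key it contains and that the IN-list is kept
sorted. The function name narrow_in_list and the returned labels are
hypothetical and do not correspond to anything in the Kudu codebase.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple


def narrow_in_list(sorted_values: List[int], min_key: int,
                   max_key: int) -> Tuple[str, List[int]]:
    """Restrict a sorted IN-list to the values that can exist in one rowset.

    min_key and max_key are the (assumed) inclusive PK bounds of the rowset.
    """
    lo = bisect_left(sorted_values, min_key)    # first value >= min_key
    hi = bisect_right(sorted_values, max_key)   # one past last value <= max_key
    candidates = sorted_values[lo:hi]
    if not candidates:
        return ("skip", [])              # no listed value can be in this rowset
    if len(candidates) == 1:
        return ("equality", candidates)  # IN-list collapses to PK = value
    return ("in_list", candidates)       # still several values, but fewer


# Example: a rowset covering keys 40..60 only needs to look for PK = 50.
print(narrow_in_list([1, 50, 900], 40, 60))
```

The "skip" case is what lets most rowsets be ignored entirely when the listed
keys are sparse relative to the table.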

Thanks
Todd

On Tue, Jun 25, 2019 at 9:24 PM Sergey Olontsev <[email protected]> wrote:

> Could anyone help me find out more about how InList predicates work?
>
> I have a bunch of values (sometimes just a couple or a few, but
> potentially tens or hundreds), and I have a table with a primary key on
> the column I search by, with hash partitioning.
>
> And I've noticed that several separate searches by primary key with a
> Comparison predicate usually work faster than one search with an InList
> predicate. I'm looking at the Scanners information in the GUI and see that
> with a Comparison predicate my app reads only 1 block and it takes
> milliseconds, but with an InList predicate it reads ~1.6 blocks several
> times (scanning with a batch of 1 million rows) and each scanner takes
> about 1-1.5 seconds to complete.
>
> So I really need more information about how exactly InList predicates are
> implemented and behave. Could anyone provide any links? Unfortunately, I
> was unable to find any information, only a few JIRA tasks, and they
> didn't help.
>
> https://issues.apache.org/jira/browse/KUDU-2853
> https://issues.apache.org/jira/browse/KUDU-1644
>
> Best regards, Sergey.
>


-- 
Todd Lipcon
Software Engineer, Cloudera
