On Fri, Mar 3, 2017 at 7:52 PM, James Taylor <jamestay...@apache.org> wrote: > Hi Marcin, > There's a few things going on here: > 1) An index created on a view over HBase data won't be maintained by > Phoenix. You'd need to maintain it yourself through some other external > means. If you create a table that maps to your HBase data, then it will be > maintained.
UPSERTING data into the view will also work, right? > 2) An index would only be used if you match against a constant on the > right-hand side (while you're matching against the "number" column). For > example, the following query would use the index and limit the scan to only > rows in which "number" starts with 'Queen': > > 0: jdbc:phoenix:> explain select * from "traces" where > regexp_substr("number", 'Q.*') = 'Queen'; > +------------------------------------------------------------------------------+ > | PLAN > | > +------------------------------------------------------------------------------+ > | CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN RANGE SCAN OVER traces > [1,'Queen'] | > | SERVER FILTER BY FIRST KEY ONLY AND REGEXP_SUBSTR("number", 'Q.*', 1) > = | > +------------------------------------------------------------------------------+ > > Note that with local indexes, interpreting when the index is used is a bit > subtle, but you'll see a [1, ...] after the RANGE SCAN as an indication. The > index may be used if it's a full table scan (since the data contained in the > index table may be smaller than that in the data table), but that won't > buying you very much. > 3) The index would only be used if your REGEXP_SUBSTR has a constant string > before any wildcard matches in the second argument. You could also > potentially use a function index [1], but it'd only use the index if the > REGEXP_SUBSTR makes the same exact call (i.e. same pattern argument) as was > used when the functional index was created. > > HTH. Thanks, > > James > > [1] https://phoenix.apache.org/secondary_indexing.html#Functional_Indexes So, if I understand correctly, my best bet would be to use some heuristics to hopefully extract a coarse prefix range from the regex? That way I can do a preliminary range scan on the index and refine the filtering on the returned data. I assume I could implement this as a UDF with an appropriate newKeyPart method? > > On Fri, Mar 3, 2017 at 12:59 AM, Marcin Januszkiewicz > <katamaran...@gmail.com> wrote: >> >> Hi, >> >> I have a table in hbase and created a view of it in phoenix, along >> with a local index: >> >> create view "traces" (rowkey VARCHAR PRIMARY KEY, "cf"."time" VARCHAR, >> "cf"."number" VARCHAR, "cf"."class" VARCHAR); >> create local index idxf on "dmstab_vehicle_traces" ("cf"."number", >> "cf"."class", rowkey); >> >> I need to filter rows based on a regex condition on the "number" >> column, so I use queries like these: >> >> explain select * from "traces" where regexp_substr("number", 'Q.*') = >> "number"; >> >> CLIENT 2-CHUNK PARALLEL 2-WAY ROUND ROBIN RANGE SCAN OVER traces [2] >> SERVER FILTER BY FIRST KEY ONLY AND REGEXP_SUBSTR("number", 'Q.*', >> 1) = "number" >> >> >> It's a little ugly and less efficient than using LIKE, but the >> performance is still relatively acceptable thanks to the index. >> >> However, if I want to range of rowkeys to include, Phoenix stops using >> the index, which slows down the query significantly: >> >> explain select * from "traces" where regexp_substr("number", 'Q.*') = >> "number" and rowkey < 'BY'; >> >> CLIENT 4-CHUNK 687413 ROWS 629145681 BYTES PARALLEL 1-WAY ROUND ROBIN >> RANGE SCAN OVER traces [*] - ['BY'] >> SERVER FILTER BY REGEXP_SUBSTR(cf."number", 'Q.*', 1) = cf."number" >> >> Using an index hint doesn't change anything. Is there a way to make >> this work, and is this a bug? > >