On Fri, Mar 3, 2017 at 7:52 PM, James Taylor <jamestay...@apache.org> wrote:
> Hi Marcin,
> There's a few things going on here:
> 1) An index created on a view over HBase data won't be maintained by
> Phoenix. You'd need to maintain it yourself through some other external
> means. If you create a table that maps to your HBase data, then it will be
> maintained.

UPSERTING data into the view will also work, right?

> 2) An index would only be used if you match against a constant on the
> right-hand side (while you're matching against the "number" column). For
> example, the following query would use the index and limit the scan to only
> rows in which "number" starts with 'Queen':
>
> 0: jdbc:phoenix:> explain select * from "traces" where
> regexp_substr("number", 'Q.*') = 'Queen';
> +------------------------------------------------------------------------------+
> |                                            PLAN
> |
> +------------------------------------------------------------------------------+
> | CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN RANGE SCAN OVER traces
> [1,'Queen'] |
> |     SERVER FILTER BY FIRST KEY ONLY AND REGEXP_SUBSTR("number", 'Q.*', 1)
> =  |
> +------------------------------------------------------------------------------+
>
> Note that with local indexes, interpreting when the index is used is a bit
> subtle, but you'll see a [1, ...] after the RANGE SCAN as an indication. The
> index may be used if it's a full table scan (since the data contained in the
> index table may be smaller than that in the data table), but that won't
> buying you very much.
> 3) The index would only be used if your REGEXP_SUBSTR has a constant string
> before any wildcard matches in the second argument. You could also
> potentially use a function index [1], but it'd only use the index if the
> REGEXP_SUBSTR makes the same exact call (i.e. same pattern argument) as was
> used when the functional index was created.
>
> HTH.  Thanks,
>
>     James
>
> [1] https://phoenix.apache.org/secondary_indexing.html#Functional_Indexes


So, if I understand correctly, my best bet would be to use some
heuristics to hopefully extract a coarse prefix range from the regex?
That way I can do a preliminary range scan on the index and refine the
filtering on the returned data. I assume I could implement this as a
UDF with an appropriate newKeyPart method?



>
> On Fri, Mar 3, 2017 at 12:59 AM, Marcin Januszkiewicz
> <katamaran...@gmail.com> wrote:
>>
>> Hi,
>>
>> I have a table in hbase and created a view of it in phoenix, along
>> with a local index:
>>
>> create view "traces" (rowkey VARCHAR PRIMARY KEY, "cf"."time" VARCHAR,
>> "cf"."number" VARCHAR, "cf"."class" VARCHAR);
>> create local index idxf on "dmstab_vehicle_traces" ("cf"."number",
>> "cf"."class", rowkey);
>>
>> I need to filter rows based on a regex condition on the "number"
>> column, so I use queries like these:
>>
>> explain select * from "traces" where regexp_substr("number", 'Q.*') =
>> "number";
>>
>> CLIENT 2-CHUNK PARALLEL 2-WAY ROUND ROBIN RANGE SCAN OVER traces [2]
>>     SERVER FILTER BY FIRST KEY ONLY AND REGEXP_SUBSTR("number", 'Q.*',
>> 1) = "number"
>>
>>
>> It's a little ugly and less efficient than using LIKE, but the
>> performance is still relatively acceptable thanks to the index.
>>
>> However, if I want to range of rowkeys to include, Phoenix stops using
>> the index, which slows down the query significantly:
>>
>> explain select * from "traces" where regexp_substr("number", 'Q.*') =
>> "number" and rowkey < 'BY';
>>
>> CLIENT 4-CHUNK 687413 ROWS 629145681 BYTES PARALLEL 1-WAY ROUND ROBIN
>> RANGE SCAN OVER traces [*] - ['BY']
>>     SERVER FILTER BY REGEXP_SUBSTR(cf."number", 'Q.*', 1) = cf."number"
>>
>> Using an index hint doesn't change anything. Is there a way to make
>> this work, and is this a bug?
>
>

Reply via email to