Re: Query on partial Row Key

James Taylor Fri, 05 Jun 2015 07:45:28 -0700

Hi Vijay,
You've got a couple of options:
1) Force the query to do a skip scan (the Phoenix equivalent of the
FuzzyRowKeyFilter) by adding a hint like this:


    select /*+ SKIP_SCAN */ * from my_table where cid = ? and tid = ?

By default, Phoenix won't do a skip scan when there's gaps in the pk
columns filtered on by your where clause, but the hint will force it to be
used.

2) Add a secondary index on cid, tid. I suspect it wasn't being used
because you weren't including all the columns from your data table in your
index. Try creating a secondary index like this:

    create index cid_tid_idx on my_table(cid, tid) include (col1, col2, ...)

You could alternatively force Phoenix to use your secondary index by using
the INDEX hint on your select statement. For more detail and other options
for secondary indexes, see http://phoenix.apache.org/secondary_indexing.html

Thanks,
James

On Fri, Jun 5, 2015 at 7:18 AM, Hemal Parekh <[email protected]> wrote:

> Vijay,
>
> You can try this.
>
> select * from my_table where <primary key>  like '?%' and tid = ?
> (argument ? is customerid)
>
> This will do a range scan and a filter on server side. Other option is to
> change primary key design to (customerId int, transactionId varchar, timestamp
> BigInt). This will allow to query on primary key directly resulting in
> range scan.
>
> select * from my_table where <primary key>  like '?%'  (your argument
> should include both customerid and transactionid)
>
> Above will return all timestamps versioned records for a matching
> combination of customerid and transactionid.
>
>
> Thanks,
> Hemal
>
>
>
> On Fri, Jun 5, 2015 at 10:00 AM, Vijay Kukkala <[email protected]> wrote:
>
>> Cluster configuration:  Phoenix 4.0.2 with Hbase 0.98 HDP 2.1
>>
>> One of our table has a primary key (customerId int, timestamp BigInt,
>> transactionId varchar).
>>  One of our use cases is to retrieve records by customerId and
>> transactionID.
>>
>> select * from my_table where cid = ? and tid = ?
>>
>> looking at the explain plan, it resulted in a range scan on customerID
>> with a filter on transactionID.
>>
>> I tried to create an index on (CustomerID, transactionID) but didn't
>> result any change in the explain plan.
>>
>> Trying to understand if there is a better way to handle this. In Hbase,
>> there is a fuzzyRowKeyFilter which allows to query on parts on the key.
>>
>> Would appreciate if someone can point me in the right direction.
>> thanks,
>> Vijay
>>
>>
>
>
> --
>
> Hemal Parekh
> Senior Data Warehouse Architect
>  m. 240.449.4396
> [image: Bitscopic Inc] <http://bitscopic.com>
>
>

Re: Query on partial Row Key

Reply via email to