Re: Query on partial Row Key

Vijay Kukkala Fri, 05 Jun 2015 10:07:54 -0700

Thanks James,

I tried both and that helped.



On Fri, Jun 5, 2015 at 9:44 AM James Taylor <[email protected]> wrote:

> Hi Vijay,
> You've got a couple of options:
> 1) Force the query to do a skip scan (the Phoenix equivalent of the
> FuzzyRowKeyFilter) by adding a hint like this:
>
>     select /*+ SKIP_SCAN */ * from my_table where cid = ? and tid = ?
>
> By default, Phoenix won't do a skip scan when there's gaps in the pk
> columns filtered on by your where clause, but the hint will force it to be
> used.
>
> 2) Add a secondary index on cid, tid. I suspect it wasn't being used
> because you weren't including all the columns from your data table in your
> index. Try creating a secondary index like this:
>
>     create index cid_tid_idx on my_table(cid, tid) include (col1, col2,
> ...)
>
> You could alternatively force Phoenix to use your secondary index by using
> the INDEX hint on your select statement. For more detail and other options
> for secondary indexes, see
> http://phoenix.apache.org/secondary_indexing.html
>
> Thanks,
> James
>
> On Fri, Jun 5, 2015 at 7:18 AM, Hemal Parekh <[email protected]> wrote:
>
>> Vijay,
>>
>> You can try this.
>>
>> select * from my_table where <primary key>  like '?%' and tid = ?
>> (argument ? is customerid)
>>
>> This will do a range scan and a filter on server side. Other option is to
>> change primary key design to (customerId int, transactionId varchar, 
>> timestamp
>> BigInt). This will allow to query on primary key directly resulting in
>> range scan.
>>
>> select * from my_table where <primary key>  like '?%'  (your argument
>> should include both customerid and transactionid)
>>
>> Above will return all timestamps versioned records for a matching
>> combination of customerid and transactionid.
>>
>>
>> Thanks,
>> Hemal
>>
>>
>>
>> On Fri, Jun 5, 2015 at 10:00 AM, Vijay Kukkala <[email protected]>
>> wrote:
>>
>>> Cluster configuration:  Phoenix 4.0.2 with Hbase 0.98 HDP 2.1
>>>
>>> One of our table has a primary key (customerId int, timestamp BigInt,
>>> transactionId varchar).
>>>  One of our use cases is to retrieve records by customerId and
>>> transactionID.
>>>
>>> select * from my_table where cid = ? and tid = ?
>>>
>>> looking at the explain plan, it resulted in a range scan on customerID
>>> with a filter on transactionID.
>>>
>>> I tried to create an index on (CustomerID, transactionID) but didn't
>>> result any change in the explain plan.
>>>
>>> Trying to understand if there is a better way to handle this. In Hbase,
>>> there is a fuzzyRowKeyFilter which allows to query on parts on the key.
>>>
>>> Would appreciate if someone can point me in the right direction.
>>> thanks,
>>> Vijay
>>>
>>>
>>
>>
>> --
>>
>> Hemal Parekh
>> Senior Data Warehouse Architect
>>  m. 240.449.4396
>> [image: Bitscopic Inc] <http://bitscopic.com>
>>
>>
>

Re: Query on partial Row Key

Reply via email to