Thanks a lot for that explanation, Gabriel.

From: Gabriel Reid [mailto:[email protected]]
Sent: Thursday, July 23, 2015 10:39 AM
To: [email protected]
Subject: Re: Understanding keys

Filtering a query on the leading columns of the primary key (i.e. [A], [A,B], 
or [A,B,C]) will give optimal performance. This is because the records are in 
sorted order based on the combination of [A,B,C], so filtering on a leading 
subset of the primary key is basically the same as filtering on the full 
primary key. The set of records that pass the query filter are all within a 
single defined range of records (not considering salt buckets and multi-tenant 
tables).

Filtering on [B] or [B,C] may offer relatively decent performance, depending on 
the cardinality and data type of primary key column A. This is due to the way 
that Phoenix makes use of skip scans [1].

The farther you get away from using the leading column(s) in a filter, the more 
rows that Phoenix will need to check in order to return a result, which equals 
slower query performance.

The difference between having a primary key of (A,B,C) compared to a primary 
key of (A, B) and a secondary index on C is that the secondary index on C will 
allow much faster lookups if you're searching for a row with a given value for 
C. I expect that the performance of a query where you specify a filter on A, B, 
and C in this case will also be slower than if you have a primary key on (A, B, 
C), as again Phoenix will have to scan over more rows than if the primary key 
(A, B, C) were defined.

- Gabriel

1. http://phoenix.apache.org/skip_scan.html


On Thu, Jul 23, 2015 at 11:45 AM Riesland, Zack 
<[email protected]<mailto:[email protected]>> wrote:
This is probably a silly question… please humor me: I’m a Java/JS developer 
learning about databases as I go.

Suppose I have a table with columns A-Z, and declare the primary key to be (A, 
B, C).

I understand that that forces each row to have a unique A, B, C combination.

But what does it mean for querying? Can I very quickly query again column A 
and/or B and/or C, or just the combination of the 3?

What is the difference between primary key (A, B, C) and primary key (A, B) 
with a secondary index on C?

Thanks!

Reply via email to