Hi James, I have filed a JIRA https://issues.apache.org/jira/browse/PHOENIX-3129 for using global index for such queries without hint. Feel free to watch and comment on this issue.
Thanks, Maryann On Wed, Jul 27, 2016 at 12:29 PM, James Taylor <jamestay...@apache.org> wrote: > On Wed, Jul 27, 2016 at 8:07 AM, Heather, James (ELS) < > james.heat...@elsevier.com> wrote: > >> - select * from documents where profile_id = ? (multiple rows returned, >> doing a full scan) --- 5 sec >> >> See https://phoenix.apache.org/secondary_indexing.html#Index_Usage for >> when an index is used/not used, but by default a global index won't be used >> if not all columns are contained in the index (i.e. covered index). >> >> >> If the index on profile_id isn't covered, it's much quicker, it seems, to >> do >> >> SELECT * FROM documents WHERE id IN (SELECT id FROM documents WHERE >> profile_id = ?) >> >> where 'id' is the primary key. In other words, do it in two stages: >> retrieve the primary key using the global index, and then look up the rows >> using the primary key. >> >> Is there any reason that Phoenix doesn't do this by default, to avoid the >> full scan? >> > > In general, Phoenix doesn't know how many rows will be returned for a > given profile_id, so rather than performing a broadcast join (and > potentially failing because too many rows are returned) it takes the > conservative approach. You can hint the query to force the index to be > used, though. > > We could definitely be smarter about this when there's an index in place, > as we have a good estimate (with stats) on how rows will be scanned in that > case. In fact we have an optimization already in place that'll do a kind of > batched lookup of PKs as the join in being processed (kind of like a nested > loop join). Definitely worth filing a JIRA. > > >> - select profile_id from documents_test order by added desc limit 1 >> (index on added) --- 5.5 sec >> >> >> If PROFILE_ID is not in the primary key constraint of the table, make >> sure to include it in your index on the ADDED column. >> >> >> I don't think there's a similar trick with an ORDER BY clause, is there? >> > > Yes, you can hint the query to use the index (see link I sent before). > >