select count(*) causing issues, use select count on a specific field for better results

Andries Engelbrecht Thu, 12 Feb 2015 08:32:51 -0800

I noticed that for many data sets (especially those with much larger volumes) 
select count(*) can cause various issues.


Since this is typically used to get a record count, it is normally better to 
just count a specific field. With JSON it is good to select an identifier 
field that will always be present (same with HBase), and for Hive to pick a 
single column.

So instead of select count(*), use select count(object.id).
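
One caveat is worth checking before making the swap: in standard SQL, count(field) skips rows where the field is NULL, so it only matches count(*) when the chosen field is always present. The sketch below illustrates this with Python's built-in sqlite3 module as a stand-in. It is not Drill, the table and column names are hypothetical, and it assumes only that the NULL handling of count is the same.

```python
import sqlite3

# In-memory SQLite database as a stand-in for Drill (assumption: only the
# standard SQL NULL handling of COUNT is being illustrated here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, payload TEXT)")  # hypothetical table
conn.executemany(
    "INSERT INTO events (id, payload) VALUES (?, ?)",
    [(1, "a"), (2, None), (3, "c")],  # payload is missing in one record
)

# COUNT(*) counts every row; COUNT(col) counts only rows where col is not NULL.
count_star = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
count_id = conn.execute("SELECT COUNT(id) FROM events").fetchone()[0]
count_payload = conn.execute("SELECT COUNT(payload) FROM events").fetchone()[0]
conn.close()

print(count_star, count_id, count_payload)
```

Here the always-present id column gives the same result as count(*), while the sparsely populated payload column undercounts, which is why an identifier field is the safer choice.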


Is it possible to have Drill do this by default? A lot of DBMSs cheat by just 
grabbing the statistics to give the user a count for queries like this. In the 
case of Drill this may not be possible, but just applying the count to a single 
field will reduce the work and the number of issues (especially with complex 
data sets).


—Andries
