I have noticed that for many data sets (especially very large ones), 
select count(*) can cause various issues. 

Since this is typically used to get a record count, it is usually better to 
count a specific field instead. With JSON, select an identifier field that 
will always be present (the same applies to HBase); with Hive, pick a single 
column.

So instead of select count(*), use select count(object.id).
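
The reason the field needs to be one that is always present: under standard SQL semantics count(field) skips rows where the field is NULL or missing, so the two forms only agree when every record has the field. Here is a rough sketch of that behaviour. It uses SQLite from Python as a stand-in, not Drill itself, and the table name, column name, and compare_counts helper are made up for illustration.

```python
import sqlite3

def compare_counts(ids):
    """Return (count(*), count(id)) for a list of ids.

    None stands in for a record where the identifier field is missing.
    The 'records' table and 'id' column are hypothetical names.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (id INTEGER)")
    conn.executemany("INSERT INTO records (id) VALUES (?)", [(i,) for i in ids])
    star, by_id = conn.execute(
        "SELECT count(*), count(id) FROM records"
    ).fetchone()
    conn.close()
    return star, by_id

# Field always present: both forms return the same record count.
print(compare_counts([1, 2, 3]))     # (3, 3)

# Field missing on one record: count(id) undercounts.
print(compare_counts([1, None, 3]))  # (3, 2)
```

So the substitution is only safe when the chosen field really is populated on every record.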


Is it possible to have Drill do this by default? Many DBMSs cheat by just 
grabbing the statistics to give the user a count for queries like this. In 
Drill's case that may not be possible, but just applying the count to a single 
field would reduce the work and the number of issues (especially with complex 
data sets).


—Andries
