I have noticed that for many data sets (especially larger ones), select count(*) can cause various issues.
Since this is typically used to get a record count, it is normally better to count a specific field. With JSON it is good to select an identifier field that will always be present (the same goes for HBase), and for Hive to pick a single column. So instead of select count(*), use select count(object.id). Is it possible to have Drill do this by default? Many DBMSs cheat by just reading the table statistics to answer queries like this. In the case of Drill this may not be possible, but applying the count to a single field will reduce the work and the number of issues (especially with complex data sets).
- Andries
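To make the suggested rewrite concrete, here is a minimal sketch of the two forms in Drill SQL. The file path dfs.`/data/objects.json` and the alias t are hypothetical, and the sketch assumes every record carries object.id. COUNT on a column skips NULL or missing values, so the two queries only agree when that assumption holds.

```sql
-- Original form: counts every record in the file.
SELECT COUNT(*) AS record_count
FROM dfs.`/data/objects.json`;

-- Suggested form: counts a single identifier field instead.
-- The table alias (t) is used to reach the nested field; `object` is
-- backtick-quoted as a precaution against keyword clashes.
-- Assumption: object.id is present in every record, otherwise this
-- returns a smaller number than COUNT(*).
SELECT COUNT(t.`object`.id) AS record_count
FROM dfs.`/data/objects.json` t;
```

The same pattern would apply to an HBase row key or a single Hive column, as long as the chosen field is never NULL.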
