Currently Drill tries to infer schemas from data that doesn't come with one, such as JSON, CSV, and MongoDB. This doesn't work well when the first N rows are missing values for some fields: Drill assigns an arbitrary type to fields that are only ever null, and no type at all to fields that are missing completely, then rejects values for those fields when it finds them later.
What if you could instead query in a mode where each row is presented as a single string, and you use JSON functions to extract fields and cast them to the appropriate types? For JSON in particular, functions that extract data from a JSON string column are common these days; BigQuery and Postgres are two good examples. I think in many cases a driver could still inspect these JSON functions and use them for filter pushdown. Anyway, just an idea I had for approaching the Mongo schema problem that's a bit different from trying to specify the schema up front. I think this approach offers more flexibility to the user, at the cost of more verbose syntax and harder-to-optimize queries.
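To make the idea concrete, here's a minimal Python sketch (purely hypothetical, not Drill's actual API) of what "rows as strings" plus extract-and-cast looks like. The `json_extract` helper stands in for something like Postgres's `->>` operator or BigQuery's `JSON_EXTRACT_SCALAR`; the point is that the user, not the engine, decides each field's type at extraction time, so null or missing values in early rows never poison the schema:

```python
import json

# Hypothetical "rows as strings" mode: every row arrives as a raw JSON
# string, with no schema inferred up front.
rows = [
    '{"name": "a"}',               # "age" missing entirely
    '{"name": "b", "age": null}',  # "age" present but null
    '{"name": "c", "age": "42"}',  # "age" arrives late, as a string
]

def json_extract(row: str, field: str):
    """Stand-in for a SQL JSON extraction function (e.g. Postgres ->>,
    BigQuery JSON_EXTRACT_SCALAR). Returns None for missing/null fields."""
    return json.loads(row).get(field)

# The cast is explicit and per-query, so a field that is null or absent
# in the first rows causes no type conflict when a value finally appears.
ages = [int(v) if (v := json_extract(r, "age")) is not None else None
        for r in rows]
# ages == [None, None, 42]
```

A driver could recognize a predicate like `CAST(json_extract(row, 'age') AS INT) > 18` and still push the filter down to the source, which is why I think this doesn't have to give up pushdown entirely.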
