I'm still trying out some things, but I have one observation regarding
'DontKnow'. It matters because, depending on the name order of the files in
the directory, the same query can currently return different results.
> A partial solution is the one that Ted suggested: have readers create a
> "DontKnow" column type, then modify each of a dozen operators to merge
> columns of type X/X, X/DontKnow and DontKnow/DontKnow. Might work, but we'd
> need a volunteer to implement such a sweeping change; it is a non-trivial
> exercise.
If you have multiple files, and the field only occurs in the second file:
file1.json: { }
file2.json: { "a": "foo" }
file3.json: { }
Then
select sqlTypeOf(a) from ....
will return
NULL
VARCHAR
VARCHAR
If you rename the files, the result is different.
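To make the order dependence concrete, here is a toy model in Python. It is
not Drill code: I am only assuming that the scan visits files in name order
and reuses the last type it saw for a column when a later file lacks that
column. The function name and the "renamed" file names are made up for
illustration.

```python
# Toy model, not Drill code: the scan remembers the last type it saw for a
# column and reuses it for later files that do not contain the column.
def observed_types(files, column):
    """Return the type reported for `column` per file, in name order."""
    known = None  # type carried over from earlier files, if any
    result = []
    for _name, record in sorted(files.items()):
        if column in record:
            known = "VARCHAR" if isinstance(record[column], str) else "OTHER"
        result.append(known if known is not None else "NULL")
    return result


original = {"file1.json": {}, "file2.json": {"a": "foo"}, "file3.json": {}}
# Same contents, but the file holding "a" now sorts first.
renamed = {"file0.json": {"a": "foo"}, "file1.json": {}, "file3.json": {}}

print(observed_types(original, "a"))  # ['NULL', 'VARCHAR', 'VARCHAR']
print(observed_types(renamed, "a"))   # ['VARCHAR', 'VARCHAR', 'VARCHAR']
```

In this model the data is identical in both cases; only the file names, and
therefore the scan order, differ.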
This will make the implementation of DontKnow tricky, because the query should
return:
DONTKNOW
VARCHAR
DONTKNOW
and not
DONTKNOW
VARCHAR
VARCHAR
Otherwise, query results depend on the names of the files, and for more
complex where clauses the optimizer might incorrectly throw out necessary
checks (e.g., when first checking for not DONTKNOW and then checking for a
specific value).
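
As a rough sketch of the behaviour I would expect, assuming each file's type
is derived only from that file's own contents (the helper names here are
hypothetical, and this is Python rather than anything in Drill):

```python
# Hypothetical sketch, not Drill code: a file's reported type depends only
# on that file's own contents, never on files scanned before it.
def type_of(record, column):
    if column not in record:
        return "DONTKNOW"
    return "VARCHAR" if isinstance(record[column], str) else "OTHER"


def per_file_types(files, column):
    """Return the type of `column` for each file, in name order."""
    return [type_of(rec, column) for _name, rec in sorted(files.items())]


files = {"file1.json": {}, "file2.json": {"a": "foo"}, "file3.json": {}}
# Same contents under different (made-up) names, so a different scan order.
renamed = {"a.json": {"a": "foo"}, "b.json": {}, "c.json": {}}

print(per_file_types(files, "a"))  # ['DONTKNOW', 'VARCHAR', 'DONTKNOW']
# Renaming changes the order of the rows, but not which types are reported.
print(sorted(per_file_types(files, "a")) == sorted(per_file_types(renamed, "a")))  # True
```

With this rule a file without the column is always DONTKNOW, so a check for
not DONTKNOW cannot be dropped just because an earlier file had a VARCHAR.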
Sebastian