I'm still trying out some things, but here is one observation regarding the 
'DontKnow' type. It matters because, depending on the name order of the files 
in the directory, the same query can currently return different results.


> A partial solution is the one that Ted suggested: have readers create a 
> "DontKnow" column type, then modify each of a dozen operators to merge 
> columns of type X/X, X/DontKnow and DontKnow/DontKnow. Might work, but we'd 
> need a volunteer to implement such a sweeping change; it is a non-trivial 
> exercise.

Suppose you have multiple files, and the tag only occurs in the second file:

file1.json: { }
file2.json: { "a": "foo" }
file3.json: { }

Then

select sqlTypeOf(a) from ....

will return

NULL
VARCHAR
VARCHAR

If you rename the files so that they are read in a different order, the 
result is different.
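
To make the effect concrete, here is a toy model in Python (not Drill's 
reader code). It assumes, as a guess at the mechanism, that the scan reuses 
the last type it saw for a column when a later file lacks that column. The 
function name and the name-sorted read order are hypothetical.

```python
def carried_forward_types(files, column):
    """Toy model: report one type per file, reusing the last type seen.

    `files` maps file name -> parsed JSON object. Files are visited in
    name order (an assumption), which is what makes the result
    depend on how the files are named.
    """
    last_seen = None  # nothing known until a file actually has the column
    result = []
    for name in sorted(files):
        value = files[name].get(column)
        if isinstance(value, str):
            last_seen = "VARCHAR"
        result.append(last_seen if last_seen is not None else "NULL")
    return result


original = {"file1.json": {}, "file2.json": {"a": "foo"}, "file3.json": {}}
# Same contents, but the file holding "a" now sorts first.
renamed = {"file1.json": {"a": "foo"}, "file2.json": {}, "file3.json": {}}

print(carried_forward_types(original, "a"))  # ['NULL', 'VARCHAR', 'VARCHAR']
print(carried_forward_types(renamed, "a"))   # ['VARCHAR', 'VARCHAR', 'VARCHAR']
```

The data is identical in both cases; only the read order differs.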

This will make the implementation of DontKnow tricky, because the query should 
return:

DONTKNOW
VARCHAR
DONTKNOW

and not

DONTKNOW
VARCHAR
VARCHAR

Otherwise, query results depend on the names of the files, and for more 
complex WHERE clauses the optimizer might incorrectly throw out necessary 
checks (e.g., when first checking for not DONTKNOW and then checking for a 
specific value).
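
The order-independent behaviour I am asking for can be sketched in Python 
as follows. This is only an illustration under the assumption that each file 
reports the type it can see by itself; `per_file_types` is a hypothetical 
helper, not a Drill API.

```python
def per_file_types(files, column):
    """Toy model: each file reports only what it can see for `column`.

    No state is shared between files, so the answer for a given file
    cannot depend on which files were read before it.
    """
    result = {}
    for name, record in files.items():
        value = record.get(column)
        result[name] = "VARCHAR" if isinstance(value, str) else "DONTKNOW"
    return result


files = {"file1.json": {}, "file2.json": {"a": "foo"}, "file3.json": {}}
types = per_file_types(files, "a")
print([types[name] for name in sorted(files)])
# ['DONTKNOW', 'VARCHAR', 'DONTKNOW']
```

Renaming the files only changes which row carries VARCHAR, not how many rows 
do.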

  Sebastian
