Hi Boaz,
As noted earlier, it would be wonderful if Drill could handle schema changes on
the fly, using only the information in the files as they are read, and with
only a few code changes. Alas, such is not the case.
Question: is the goal to have schema changes somewhat less often (but they
Hi Paul,
(_a_) Having a "schema file" sounds like a contradiction to calling Drill
"schema free"; maybe we could "sweep it under the mat" by creating a new
convention for scanners, such that if a scanner has multiple files to
read (e.g. f1.csv, f2.csv, ...), then if there's some file named
Hi Aman,
I would completely agree with the analysis -- except for the fact that we can't
create a general solution, only a patchwork of incomplete ad-hoc solutions. The
question is not whether it would be useful to have a general solution (it
would), rather whether it is technically possible
Hi Paul,
Thanks for the feedback! I am in complete favor of doing the schema
discovery and schema hinting. But even on this list in the past we have
discussed other use cases such as IoT devices where the schema-on-read is
needed (I think it was in the context of the 'death of schema-on-read'
Hi Aman,
Thanks much for the write-up. My two cents, FWIW.
As the history of this list has shown, I've fought with the schema change issue
multiple times: in sort, in JSON, in the row set loader framework, and in
writing the "Data Engineering" chapter in the Learning Drill book.
What I have
Hi all,
While we continue to enhance the schema provision and metastore aspects in
Drill, we also should explore what it means to be truly schema-less such
that we can better handle {semi, un}structured data, data sitting in DBs
that store JSON documents (e.g., Mongo, MapR-DB).
The blocking