Hey all, when looking at the Drill options, and specifically as I was
trying to understand the Parquet options, I realized that the naming of the
options raised a question as I looked at them. What do I mean?
Consider:

+--------------------------------------------+
|                    name                    |
+--------------------------------------------+
| store.parquet.block-size                   |
| store.parquet.compression                  |
| store.parquet.dictionary.page-size         |
| store.parquet.enable_dictionary_encoding   |
| store.parquet.page-size                    |
| store.parquet.use_new_reader               |
| store.parquet.vector_fill_check_threshold  |
| store.parquet.vector_fill_threshold        |
+--------------------------------------------+



So I will remove "store.parquet" as I refer to them here:


use_new_reader - This seems fairly obviously an "on read" option and
(maybe?) does not affect the Parquet writer, yet "enable_dictionary_encoding"
is likely ONLY an on-write option... correct? I mean, if the Parquet file
was written somewhere else, and written with dictionary encoding, Drill
will still read it OK, regardless of this setting. Compression as well: if
the Parquet file was created with gzip, and this setting is snappy, Drill
will still read it. Same goes for block size. Thus, those seem to be "writer"
settings, rather than reader settings.


So what about the vector settings? Write or read (or both)? For JSON there
is this setting: store.json.writer.uglify, which is obviously writer
focused, but for other settings, knowing what the setting applies to
(on write, on read, neither, or both) could be very useful for
troubleshooting and knowing which settings to play with.


Now, renaming these settings as they exist today is not something I'd
recommend. Even in my test clusters, I have scripts that alter them for
specific ETLs, and I would hate to have things break. But how hard would it
be to add a string column to sys.options, something like "applies_to", with
write, read, both, neither, n/a as options? I think this could be valuable
for users and administrators of Drill.


One other note: in addition to applies_to, would it be horrifically
difficult to add a "description" field for options? Self-documenting
settings sure would be handy... :)


John
