Looks like some work has been done here. Any chance we can move this along?
https://issues.apache.org/jira/browse/DRILL-4699

Thanks!

On Tue, May 31, 2016 at 12:51 PM, John Omernik <[email protected]> wrote:

> I added a JIRA related to this:
>
> https://issues.apache.org/jira/browse/DRILL-4699
>
> On Sun, May 29, 2016 at 6:55 AM, John Omernik <[email protected]> wrote:
>
>> Hey all, when looking at the drill options, and specifically as I was
>> trying to understand the parquet options, I realized that the naming of the
>> options was forming a "question" as I looked at them. What do I mean?
>> Consider:
>>
>> +--------------------------------------------+
>> | name                                       |
>> +--------------------------------------------+
>> | store.parquet.block-size                   |
>> | store.parquet.compression                  |
>> | store.parquet.dictionary.page-size         |
>> | store.parquet.enable_dictionary_encoding   |
>> | store.parquet.page-size                    |
>> | store.parquet.use_new_reader               |
>> | store.parquet.vector_fill_check_threshold  |
>> | store.parquet.vector_fill_threshold        |
>> +--------------------------------------------+
>>
>> So I will remove "store.parquet" as I refer to them here:
>>
>> use_new_reader - This seems fairly obviously an "on read" option and
>> (maybe?) does not affect the Parquet writer, yet "enable_dictionary_encoding"
>> is likely ONLY an on write option.... correct? I mean, if the Parquet file
>> was written somewhere else, and written with Dictionary encoding, Drill
>> will still read it ok, regardless of this setting. Compression as well: if
>> the Parquet file was created with gzip, and this setting is snappy, it will
>> still read it. Same goes for block size. Thus, those seem to be "writer"
>> settings, rather than reader settings.
>>
>> So what about the vector settings? Write or Read (or both?) For json
>> there is this setting: store.json.writer.uglify, which seems to be
>> writer focused and obviously writer, but for other settings, knowing what
>> the setting applies to (on write, on read, neither, or both) could be very
>> useful for troubleshooting and knowing which settings to play with.
>>
>> Now, changing these settings as they are is not recommended; even in my
>> test clusters I have scripts that alter them for specific ETLs, and I
>> would hate to have things break. But how hard would it be to add a string
>> column to sys.options, something like "applies_to", with write, read, both,
>> neither, n/a as options? I think this could be valuable for users and
>> administrators of Drill.
>>
>> One other note: in addition to the applies_to, would it be horrifically
>> difficult to add a "description" field for options? Self documenting
>> settings sure would be handy.... :)
>>
>> John
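To make the request a bit more concrete, here is a rough sketch of what rows with the two proposed columns could look like and how an admin script might filter on them. This is plain Python, not Drill code: `OptionRow`, `options_for`, and `unclassified` are hypothetical names, and the `applies_to` values and descriptions are only the guesses from the quoted message, not confirmed Drill behaviour. The vector options are left unclassified because the thread does not answer that question.

```python
from dataclasses import dataclass
from typing import List, Optional

# Values proposed in the quoted message for a hypothetical "applies_to" column.
APPLIES_TO_VALUES = {"write", "read", "both", "neither", "n/a"}


@dataclass(frozen=True)
class OptionRow:
    """One hypothetical sys.options row with the two proposed extra columns."""
    name: str
    applies_to: Optional[str]  # None means "not classified yet" (my placeholder)
    description: str

    def __post_init__(self) -> None:
        # Reject anything outside the proposed vocabulary.
        if self.applies_to is not None and self.applies_to not in APPLIES_TO_VALUES:
            raise ValueError(f"unknown applies_to value: {self.applies_to!r}")


# Classifications below are guesses taken from the thread, not verified facts.
OPTIONS = [
    OptionRow("store.parquet.block-size", "write", "Guess: writer setting."),
    OptionRow("store.parquet.compression", "write", "Guess: writer setting."),
    OptionRow("store.parquet.enable_dictionary_encoding", "write", "Guess: writer setting."),
    OptionRow("store.parquet.use_new_reader", "read", "Guess: reader setting."),
    OptionRow("store.parquet.vector_fill_check_threshold", None, "Open question in the thread."),
    OptionRow("store.parquet.vector_fill_threshold", None, "Open question in the thread."),
    OptionRow("store.json.writer.uglify", "write", "Guess: writer setting."),
]


def options_for(rows: List[OptionRow], phase: str) -> List[str]:
    """Names of options relevant to a phase ('read' or 'write'), counting 'both'."""
    return [r.name for r in rows if r.applies_to in (phase, "both")]


def unclassified(rows: List[OptionRow]) -> List[str]:
    """Names of options that still need an applies_to value."""
    return [r.name for r in rows if r.applies_to is None]
```

With something like this, an ETL script that only writes Parquet could look at `options_for(OPTIONS, "write")` and ignore the rest, and `unclassified(OPTIONS)` shows which options still need an answer.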
