Hi Charles,

it was indeed a naming conflict. After renaming my storage plugin to a distinct name, it worked.
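
For illustration, a renamed entry in storage-plugins-override.conf might look like the sketch below. The plugin name dfs_conf is hypothetical (the thread does not say which name was actually used); the workspace and the two formats are copied from the configuration quoted further down. The assumption, taken from the discussion below, is that a plugin in the override file with the same name as one Drill already has stored (e.g. dfs) ends up with the stored format list.

```hocon
"storage": {
  # Hypothetical name: anything that does not collide with a plugin
  # Drill already has stored, such as "dfs".
  dfs_conf: {
    type: "file",
    connection: "file:///",
    workspaces: {
      "home": {
        "location": "/Users/stefan",
        "writable": true,
        "defaultInputFormat": null,
        "allowAccessOutsideWorkspace": false
      }
    },
    # Only the formats that should be queryable through this plugin.
    formats: {
      "parquet": { "type": "parquet" },
      "json": { "type": "json", "extensions": [ "json" ] }
    },
    enabled: true
  }
}
```

With a layout like this, queries would go through dfs_conf.home rather than dfs.home, and the stored dfs plugin could be disabled if it is not needed.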
regards
Stefan

On Wed, Jul 12, 2023 at 6:48 PM Stefan Ziegler <stefan.ziegler...@gmail.com> wrote:

> Thanks. Yes. I'm going to try the renaming approach.
>
> Not a rant, but isn't the whole point of a "storage-plugins-override.conf"
> to override storage plugin configuration?
>
> Btw: I'm in embedded mode. So I guess I can also use the config files from
> /tmp/drill after "fixing" the format configuration in the UI and use them
> e.g. in a Docker image.
>
> regards
> Stefan
>
> On Wed, Jul 12, 2023 at 6:04 PM Charles Givre <cgi...@gmail.com> wrote:
>
>> My sense of what is happening in your use case is that the configs that
>> exist in the UI are overriding the conf file. What it seems like you want
>> is the opposite order of precedence. I've never used the conf files for
>> this, so I don't have a lot of experience with that, but it would seem that
>> the best way to get your Drill cluster configured to do what you want is to
>> delete or disable the configs in the UI and only use the ones in the config
>> file.
>>
>> By conflicting I meant this: let's say that you have a plugin called dfs
>> that has the json format enabled. If you put a configuration for a plugin
>> also called dfs in the conf file, what I think is happening is that since
>> you have two plugins with the same name, Drill will read the ones from the
>> UI. (FYSA, they aren't actually stored in the UI. If you are using Drill
>> in distributed mode, those configurations are stored in ZooKeeper. If you
>> are in embedded mode, they are stored on your drive somewhere.)
>>
>> Anyway, IMHO, the best thing to do would be to make sure that the
>> plugins in your conf file do not have the same names as the plugins that
>> appear in the UI. That's what I was getting at. Does that make sense?
>> Best,
>> -- C
>>
>> > On Jul 12, 2023, at 11:57 AM, Stefan Ziegler <stefan.ziegler...@gmail.com> wrote:
>> >
>> > Hi Charles
>> >
>> > not sure if I understand you correctly: what do you mean by “not
>> > conflicting”? My aim is to not use the UI at all to configure storages.
>> > I thought this could be achieved by overriding the default storages with the
>> > “override” file. This seems to work except for the strange behaviour with the
>> > formats.
>> >
>> > regards
>> > Stefan
>> >
>> > ________________________________
>> > From: Charles Givre <cgi...@gmail.com>
>> > Sent: Wednesday, July 12, 2023 5:04 PM
>> > To: user <user@drill.apache.org>
>> > Subject: Re: Respecting formats restriction when using storage-plugins-override.conf
>> >
>> > Hi Stefan,
>> > My biggest piece of advice here would just be to make sure the plugins
>> > specified in the override file do not conflict with the UI-based configs.
>> > It may make sense to have completely different configs in each location,
>> > e.g.:
>> >
>> > dfs-conf and (plain) dfs.
>> >
>> > I think that should solve all issues. In theory if you remove a config
>> > from the "formats" section, Drill should not be able to parse the file in
>> > question. So for example if you don't have the 'csv' format or 'excel'
>> > then Drill will not be able to parse those formats.
>> >
>> > Best,
>> > -- C
>> >
>> >
>> >> On Jul 11, 2023, at 2:42 AM, Stefan Ziegler <stefan.ziegler...@gmail.com> wrote:
>> >>
>> >> The config for dfs in the UI looks like this:
>> >>
>> >> {
>> >>   "type": "file",
>> >>   "connection": "file:///",
>> >>   "workspaces": {
>> >>     "tmp": {
>> >>       "location": "/tmp",
>> >>       "writable": true,
>> >>       "defaultInputFormat": null,
>> >>       "allowAccessOutsideWorkspace": false
>> >>     },
>> >>     "root": {
>> >>       "location": "/",
>> >>       "writable": false,
>> >>       "defaultInputFormat": null,
>> >>       "allowAccessOutsideWorkspace": false
>> >>     },
>> >>     "home": {
>> >>       "location": "/Users/stefan",
>> >>       "writable": true,
>> >>       "defaultInputFormat": null,
>> >>       "allowAccessOutsideWorkspace": false
>> >>     }
>> >>   },
>> >>   "formats": {
>> >>     "parquet": {
>> >>       "type": "parquet"
>> >>     },
>> >>     "json": {
>> >>       "type": "json",
>> >>       "extensions": [
>> >>         "json"
>> >>       ]
>> >>     },
>> >>     "excel": {
>> >>       "type": "excel",
>> >>       "extensions": [
>> >>         "xlsx"
>> >>       ],
>> >>       "lastRow": 1048576,
>> >>       "ignoreErrors": true,
>> >>       "maxArraySize": -1,
>> >>       "thresholdBytesForTempFiles": -1
>> >>     },
>> >>     "spss": {
>> >>       "type": "spss",
>> >>       "extensions": [
>> >>         "sav"
>> >>       ]
>> >>     },
>> >>     "iceberg": {
>> >>       "type": "iceberg",
>> >>       "properties": null,
>> >>       "caseSensitive": null,
>> >>       "includeColumnStats": null,
>> >>       "ignoreResiduals": null,
>> >>       "snapshotId": null,
>> >>       "snapshotAsOfTime": null,
>> >>       "fromSnapshotId": null,
>> >>       "toSnapshotId": null
>> >>     },
>> >>     "httpd": {
>> >>       "type": "httpd",
>> >>       "extensions": [
>> >>         "httpd"
>> >>       ],
>> >>       "logFormat": "common\ncombined"
>> >>     },
>> >>     "xml": {
>> >>       "type": "xml",
>> >>       "extensions": [
>> >>         "xml"
>> >>       ],
>> >>       "dataLevel": 1
>> >>     },
>> >>     "syslog": {
>> >>       "type": "syslog",
>> >>       "extensions": [
>> >>         "syslog"
>> >>       ],
>> >>       "maxErrors": 10
>> >>     },
>> >>     "msaccess": {
>> >>       "type": "msaccess",
>> >>       "extensions": [
>> >>         "mdb",
>> >>         "accdb"
>> >>       ]
>> >>     },
>> >>     "hdf5": {
>> >>       "type": "hdf5",
>> >>       "extensions": [
>> >>         "h5"
>> >>       ],
>> >>       "defaultPath": null
>> >>     },
>> >>     "ltsv": {
>> >>       "type": "ltsv",
>> >>       "extensions": [
>> >>         "ltsv"
>> >>       ],
>> >>       "parseMode": "lenient",
>> >>       "escapeCharacter": null,
>> >>       "kvDelimiter": null,
>> >>       "entryDelimiter": null,
>> >>       "lineEnding": null,
>> >>       "quoteChar": null
>> >>     },
>> >>     "delta": {
>> >>       "type": "delta",
>> >>       "version": null,
>> >>       "timestamp": null
>> >>     },
>> >>     "shp": {
>> >>       "type": "shp",
>> >>       "extensions": [
>> >>         "shp"
>> >>       ]
>> >>     },
>> >>     "image": {
>> >>       "type": "image",
>> >>       "extensions": [
>> >>         "jpg",
>> >>         "jpeg",
>> >>         "jpe",
>> >>         "tif",
>> >>         "tiff",
>> >>         "dng",
>> >>         "psd",
>> >>         "png",
>> >>         "bmp",
>> >>         "gif",
>> >>         "ico",
>> >>         "pcx",
>> >>         "wav",
>> >>         "wave",
>> >>         "avi",
>> >>         "webp",
>> >>         "mov",
>> >>         "mp4",
>> >>         "m4a",
>> >>         "m4p",
>> >>         "m4b",
>> >>         "m4r",
>> >>         "m4v",
>> >>         "3gp",
>> >>         "3g2",
>> >>         "eps",
>> >>         "epsf",
>> >>         "epsi",
>> >>         "ai",
>> >>         "arw",
>> >>         "crw",
>> >>         "cr2",
>> >>         "nef",
>> >>         "orf",
>> >>         "raf",
>> >>         "rw2",
>> >>         "rwl",
>> >>         "srw",
>> >>         "x3f"
>> >>       ],
>> >>       "fileSystemMetadata": true,
>> >>       "descriptive": true
>> >>     },
>> >>     "pdf": {
>> >>       "type": "pdf",
>> >>       "extensions": [
>> >>         "pdf"
>> >>       ],
>> >>       "extractHeaders": true,
>> >>       "extractionAlgorithm": "basic"
>> >>     },
>> >>     "sas": {
>> >>       "type": "sas",
>> >>       "extensions": [
>> >>         "sas7bdat"
>> >>       ]
>> >>     },
>> >>     "pcap": {
>> >>       "type": "pcap",
>> >>       "extensions": [
>> >>         "pcap",
>> >>         "pcapng"
>> >>       ]
>> >>     }
>> >>   },
>> >>   "authMode": "SHARED_USER",
>> >>   "enabled": true
>> >> }
>> >>
>> >> I'm now able to query some XML data: "SELECT * FROM
>> >> dfs.home.`ch.so.afu.abbaustellen.xml`;", which I actually don't want to be
>> >> able to do (see formats in the "storage-plugins-override.conf" file).
>> >> If I remove the xml format section in the config in the UI, I'm not able to
>> >> query the xml anymore: "Error: VALIDATION ERROR: From line 1, column 15 to
>> >> line 1, column 51: Object 'ch.so.afu.abbaustellen.xml' not found within
>> >> 'dfs.home'".
>> >>
>> >> regards
>> >> Stefan
>> >>
>> >>
>> >> On Mon, Jul 10, 2023 at 9:15 PM Charles Givre <cgi...@gmail.com> wrote:
>> >>
>> >>> Hi Stefan,
>> >>> What's in the config in the UI? Can you also please clarify what queries
>> >>> are running which indicate that your configs aren't working?
>> >>> Best,
>> >>> -- C
>> >>>
>> >>>> On Jul 10, 2023, at 1:11 PM, Stefan Ziegler <stefan.ziegler...@gmail.com> wrote:
>> >>>>
>> >>>> "storage": {
>> >>>>   cp: {
>> >>>>     type: "file",
>> >>>>     connection: "classpath:///",
>> >>>>     formats: {
>> >>>>       "csv" : {
>> >>>>         type: "text",
>> >>>>         extensions: [ "csv" ],
>> >>>>         delimiter: ","
>> >>>>       }
>> >>>>     }
>> >>>>     enabled: true
>> >>>>   }
>> >>>> }
>> >>>> "storage": {
>> >>>>   dfs: {
>> >>>>     type: "file",
>> >>>>     connection: "file:///",
>> >>>>     workspaces: {
>> >>>>       "tmp": {
>> >>>>         "location": "/tmp",
>> >>>>         "writable": true,
>> >>>>         "defaultInputFormat": null,
>> >>>>         "allowAccessOutsideWorkspace": false
>> >>>>       },
>> >>>>       "home": {
>> >>>>         "location": "/Users/stefan",
>> >>>>         "writable": true,
>> >>>>         "defaultInputFormat": null,
>> >>>>         "allowAccessOutsideWorkspace": false
>> >>>>       },
>> >>>>       "root": {
>> >>>>         "location": "/",
>> >>>>         "writable": false,
>> >>>>         "defaultInputFormat": null,
>> >>>>         "allowAccessOutsideWorkspace": false
>> >>>>       }
>> >>>>     },
>> >>>>     formats: {
>> >>>>       "parquet": {
>> >>>>         "type": "parquet"
>> >>>>       },
>> >>>>       "json": {
>> >>>>         "type": "json",
>> >>>>         "extensions": [
>> >>>>           "json"
>> >>>>         ]
>> >>>>       }
>> >>>>     },
>> >>>>     enabled: true
>> >>>>   }
>> >>>> }
>> >>>> "storage": {
>> >>>>   s3: {
>> >>>>     type: "file",
>> >>>>     connection: "s3a://<my-bucket-name>",
>> >>>>     config: {
>> >>>>       "fs.s3a.aws.credentials.provider":
>> >>>>         "org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider",
>> >>>>       "fs.s3a.endpoint": "s3.eu-central-1.amazonaws.com",
>> >>>>       "fs.s3a.impl.disable.cache": "false"
>> >>>>     },
>> >>>>     workspaces: {
>> >>>>       "root": {
>> >>>>         "location": "/",
>> >>>>         "writable": false,
>> >>>>         "defaultInputFormat": "parquet",
>> >>>>         "allowAccessOutsideWorkspace": false
>> >>>>       }
>> >>>>     },
>> >>>>     "formats": {
>> >>>>       "parquet": {
>> >>>>         "type": "parquet"
>> >>>>       }
>> >>>>     },
>> >>>>     enabled: true
>> >>>>   }
>> >>>> }
>> >>>>
>> >>>> On Mon, Jul 10, 2023 at 6:40 PM Charles Givre <cgi...@gmail.com> wrote:
>> >>>>
>> >>>>> Can you share your configs with any sensitive info redacted? The lists
>> >>>>> don't support images, so please just cut/paste the json.
>> >>>>> I had another idea...
>> >>>>> -- C
>> >>>>>
>> >>>>>> On Jul 10, 2023, at 12:28 PM, Stefan Ziegler <stefan.ziegler...@gmail.com> wrote:
>> >>>>>>
>> >>>>>> Yes, I think I'm following these instructions. And the file is not
>> >>>>>> completely ignored. It creates additional format definitions. Let's say I
>> >>>>>> whitelist some formats in my storage configuration and Drill adds more
>> >>>>>> formats (which I don't want). Is there another way to start a "vanilla"
>> >>>>>> Drill installation with my own configurations?
>> >>>>>>
>> >>>>>> Stefan
>> >>>>>>
>> >>>>>> On Mon, Jul 10, 2023 at 6:17 PM Charles Givre <cgi...@gmail.com> wrote:
>> >>>>>>
>> >>>>>>> Hi Stefan,
>> >>>>>>> My apologies. OK, so the issue is that the storage-plugins-override.conf
>> >>>>>>> is being ignored. I've never actually used this feature, so I wasn't
>> >>>>>>> familiar with it, but are you following the instructions here [1] with
>> >>>>>>> respect to configuration and restarting Drill?
>> >>>>>>> My suggestion would be to
>> >>>>>>> remove all the plugins in the UI and only specify them in the .conf file.
>> >>>>>>> Drill has an order of precedence and I suspect what is happening is that
>> >>>>>>> the UI versions have a higher priority than the .conf versions. Does that
>> >>>>>>> make sense?
>> >>>>>>>
>> >>>>>>> -- C
>> >>>>>>>
>> >>>>>>> [1]:
>> >>>>>>> https://drill.apache.org/docs/configuring-storage-plugins/#configuring-storage-plugins-with-the-storage-plugins-overrideconf-file
>> >>>>>>>
>> >>>>>>>> On Jul 10, 2023, at 12:06 PM, Stefan Ziegler <stefan.ziegler...@gmail.com> wrote:
>> >>>>>>>>
>> >>>>>>>> Hi Charles
>> >>>>>>>>
>> >>>>>>>> I use a "storage-plugins-override.conf" file. My aim is to have the
>> >>>>>>>> configuration for my storages in a single file that Drill can pick up
>> >>>>>>>> on startup. I put "storage-plugins-override.conf" in the conf
>> >>>>>>>> directory and Drill creates the storages on startup but (and that is my
>> >>>>>>>> problem) also creates all formats for every storage defined in my config
>> >>>>>>>> file. E.g. I have a (local) file type storage and I define two formats
>> >>>>>>>> (parquet and json) in it. Drill does not respect my restriction to two
>> >>>>>>>> formats in the config file but creates all formats known to Drill (like
>> >>>>>>>> iceberg, xml etc.).
>> >>>>>>>>
>> >>>>>>>> regards
>> >>>>>>>> Stefan
>> >>>>>>>>
>> >>>>>>>> On Mon, Jul 10, 2023 at 5:30 PM Charles Givre <cgi...@gmail.com> wrote:
>> >>>>>>>>
>> >>>>>>>>> Hi Stefan,
>> >>>>>>>>> Thanks for your interest in Drill. You have to define the format config
>> >>>>>>>>> for each storage plugin. Otherwise Drill doesn't know what extension to
>> >>>>>>>>> associate with what format plugin.
>> >>>>>>>>> Out of curiosity, why are you using the
>> >>>>>>>>> .conf files for this?
>> >>>>>>>>> -- C
>> >>>>>>>>>
>> >>>>>>>>>> On Jul 9, 2023, at 12:03 PM, Stefan Ziegler <stefan.ziegler...@gmail.com> wrote:
>> >>>>>>>>>>
>> >>>>>>>>>> Not defining a format seems to prevent the user from querying the specific
>> >>>>>>>>>> format. E.g. after deleting the xml format definition in the web GUI, I'm
>> >>>>>>>>>> not able to query xml files anymore. So I guess my assumption was right.
>> >>>>>>>>>>
>> >>>>>>>>>> Stefan
>> >>>>>>>>>>
>> >>>>>>>>>> On Sun, Jul 9, 2023 at 5:41 PM Stefan Ziegler <stefan.ziegler...@gmail.com> wrote:
>> >>>>>>>>>>
>> >>>>>>>>>>> Btw: I assumed that the list of formats acts as a restriction. Probably I'm
>> >>>>>>>>>>> wrong.
>> >>>>>>>>>>>
>> >>>>>>>>>>> Stefan
>> >>>>>>>>>>>
>> >>>>>>>>>>> On Sun, Jul 9, 2023 at 5:27 PM Stefan Ziegler <stefan.ziegler...@gmail.com> wrote:
>> >>>>>>>>>>>
>> >>>>>>>>>>>> Hi
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> I'm using storage-plugins-override.conf to configure the storage plugins
>> >>>>>>>>>>>> on startup. My storage configurations contain only one or two formats
>> >>>>>>>>>>>> (parquet, json, csv). Checking the storages in the web GUI I noticed that
>> >>>>>>>>>>>> for all the storages all formats are enabled, e.g. msaccess, iceberg etc.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Is this on purpose or did I do something wrong?
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Example configuration:
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> "storage": {
>> >>>>>>>>>>>>   dfs: {
>> >>>>>>>>>>>>     type: "file",
>> >>>>>>>>>>>>     connection: "file:///",
>> >>>>>>>>>>>>     workspaces: {
>> >>>>>>>>>>>>       "tmp": {
>> >>>>>>>>>>>>         "location": "/tmp",
>> >>>>>>>>>>>>         "writable": true,
>> >>>>>>>>>>>>         "defaultInputFormat": null,
>> >>>>>>>>>>>>         "allowAccessOutsideWorkspace": false
>> >>>>>>>>>>>>       },
>> >>>>>>>>>>>>       "root": {
>> >>>>>>>>>>>>         "location": "/",
>> >>>>>>>>>>>>         "writable": false,
>> >>>>>>>>>>>>         "defaultInputFormat": null,
>> >>>>>>>>>>>>         "allowAccessOutsideWorkspace": false
>> >>>>>>>>>>>>       }
>> >>>>>>>>>>>>     },
>> >>>>>>>>>>>>     formats: {
>> >>>>>>>>>>>>       "parquet": {
>> >>>>>>>>>>>>         "type": "parquet"
>> >>>>>>>>>>>>       },
>> >>>>>>>>>>>>       "json": {
>> >>>>>>>>>>>>         "type": "json",
>> >>>>>>>>>>>>         "extensions": [
>> >>>>>>>>>>>>           "json"
>> >>>>>>>>>>>>         ]
>> >>>>>>>>>>>>       }
>> >>>>>>>>>>>>     },
>> >>>>>>>>>>>>     enabled: true
>> >>>>>>>>>>>>   }
>> >>>>>>>>>>>> }
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> regards
>> >>>>>>>>>>>> Stefan