My sense of what is happening in your use case is that the configs that exist 
in the UI are overriding the conf file. What it seems like you want is the 
opposite order of precedence. I've never used the conf files for this, so I 
don't have a lot of experience with them, but it would seem that the best way 
to get your Drill cluster configured to do what you want is to delete or 
disable the configs in the UI and only use the ones in the conf file. 

By "conflicting" I meant this: let's say that you have a plugin called dfs that 
has the json format enabled. If you put a configuration for a plugin also 
called dfs in the conf file, what I think is happening is that since you have 
two plugins with the same name, Drill will read the one from the UI. (FYSA, 
they aren't actually stored in the UI. If you are using Drill in distributed 
mode, those configurations are stored in ZooKeeper. If you are in embedded 
mode, they are stored on your drive somewhere.) 

Anyway, IMHO, the best thing to do would be to make sure that the plugins in 
your conf file do not have the same names as the plugins that appear in the UI. 
That's what I was getting at. Does that make sense?
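
To illustrate, here is a rough, untested sketch of an override-file entry that 
uses a non-conflicting name. The name dfs_conf is just a placeholder I made up; 
everything else is copied from your dfs config and trimmed to one workspace. 
Whether the formats list is then respected is an assumption on my part that 
you would need to verify.

```
# storage-plugins-override.conf (sketch only, not tested)
"storage": {
  # "dfs_conf" is a hypothetical name, chosen so that it does not
  # collide with the "dfs" plugin that already exists in the UI.
  dfs_conf: {
    type: "file",
    connection: "file:///",
    workspaces: {
      "home": {
        "location": "/Users/stefan",
        "writable": true,
        "defaultInputFormat": null,
        "allowAccessOutsideWorkspace": false
      }
    },
    # Only the two formats you want to allow.
    formats: {
      "parquet": {
        "type": "parquet"
      },
      "json": {
        "type": "json",
        "extensions": [
          "json"
        ]
      }
    },
    enabled: true
  }
}
```

If that works as I expect, a query against dfs_conf.home for an xml file 
should fail with the same "not found" validation error you saw after removing 
the xml format in the UI.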
Best,
-- C

> On Jul 12, 2023, at 11:57 AM, Stefan Ziegler <stefan.ziegler...@gmail.com> 
> wrote:
> 
> Hi Charles
> 
> not sure if I understand you correctly: what do you mean by “not 
> conflicting”? My aim is to not use the UI at all to configure storages. I 
> thought this could be achieved by overriding the default storages with the 
> “override” file. This seems to work except for the strange behaviour with the 
> formats.
> 
> regards
> Stefan
> 
> ________________________________
> From: Charles Givre <cgi...@gmail.com>
> Sent: Wednesday, July 12, 2023 5:04 PM
> To: user <user@drill.apache.org>
> Subject: Re: Respecting formats restriction when using 
> storage-plugins-override.conf
> 
> Hi Stefan,
> My biggest piece of advice here would just be to make sure the plugins 
> specified in the override file do not conflict with the UI-based configs.   
> It may make sense to have completely different configs in each location, e.g.:
> 
> dfs-conf and (plain) dfs.
> 
> I think that should solve all issues.  In theory if you remove a config from 
> the "formats" section, Drill should not be able to parse the file in 
> question.  So for example if you don't have the 'csv' format or 'excel' then 
> Drill will not be able to parse those formats.
> 
> Best,
> -- C
> 
> 
>> On Jul 11, 2023, at 2:42 AM, Stefan Ziegler <stefan.ziegler...@gmail.com> 
>> wrote:
>> 
>> The config for dfs in the UI looks like this:
>> 
>> {
>> "type": "file",
>> "connection": "file:///",
>> "workspaces": {
>>   "tmp": {
>>     "location": "/tmp",
>>     "writable": true,
>>     "defaultInputFormat": null,
>>     "allowAccessOutsideWorkspace": false
>>   },
>>   "root": {
>>     "location": "/",
>>     "writable": false,
>>     "defaultInputFormat": null,
>>     "allowAccessOutsideWorkspace": false
>>   },
>>   "home": {
>>     "location": "/Users/stefan",
>>     "writable": true,
>>     "defaultInputFormat": null,
>>     "allowAccessOutsideWorkspace": false
>>   }
>> },
>> "formats": {
>>   "parquet": {
>>     "type": "parquet"
>>   },
>>   "json": {
>>     "type": "json",
>>     "extensions": [
>>       "json"
>>     ]
>>   },
>>   "excel": {
>>     "type": "excel",
>>     "extensions": [
>>       "xlsx"
>>     ],
>>     "lastRow": 1048576,
>>     "ignoreErrors": true,
>>     "maxArraySize": -1,
>>     "thresholdBytesForTempFiles": -1
>>   },
>>   "spss": {
>>     "type": "spss",
>>     "extensions": [
>>       "sav"
>>     ]
>>   },
>>   "iceberg": {
>>     "type": "iceberg",
>>     "properties": null,
>>     "caseSensitive": null,
>>     "includeColumnStats": null,
>>     "ignoreResiduals": null,
>>     "snapshotId": null,
>>     "snapshotAsOfTime": null,
>>     "fromSnapshotId": null,
>>     "toSnapshotId": null
>>   },
>>   "httpd": {
>>     "type": "httpd",
>>     "extensions": [
>>       "httpd"
>>     ],
>>     "logFormat": "common\ncombined"
>>   },
>>   "xml": {
>>     "type": "xml",
>>     "extensions": [
>>       "xml"
>>     ],
>>     "dataLevel": 1
>>   },
>>   "syslog": {
>>     "type": "syslog",
>>     "extensions": [
>>       "syslog"
>>     ],
>>     "maxErrors": 10
>>   },
>>   "msaccess": {
>>     "type": "msaccess",
>>     "extensions": [
>>       "mdb",
>>       "accdb"
>>     ]
>>   },
>>   "hdf5": {
>>     "type": "hdf5",
>>     "extensions": [
>>       "h5"
>>     ],
>>     "defaultPath": null
>>   },
>>   "ltsv": {
>>     "type": "ltsv",
>>     "extensions": [
>>       "ltsv"
>>     ],
>>     "parseMode": "lenient",
>>     "escapeCharacter": null,
>>     "kvDelimiter": null,
>>     "entryDelimiter": null,
>>     "lineEnding": null,
>>     "quoteChar": null
>>   },
>>   "delta": {
>>     "type": "delta",
>>     "version": null,
>>     "timestamp": null
>>   },
>>   "shp": {
>>     "type": "shp",
>>     "extensions": [
>>       "shp"
>>     ]
>>   },
>>   "image": {
>>     "type": "image",
>>     "extensions": [
>>       "jpg",
>>       "jpeg",
>>       "jpe",
>>       "tif",
>>       "tiff",
>>       "dng",
>>       "psd",
>>       "png",
>>       "bmp",
>>       "gif",
>>       "ico",
>>       "pcx",
>>       "wav",
>>       "wave",
>>       "avi",
>>       "webp",
>>       "mov",
>>       "mp4",
>>       "m4a",
>>       "m4p",
>>       "m4b",
>>       "m4r",
>>       "m4v",
>>       "3gp",
>>       "3g2",
>>       "eps",
>>       "epsf",
>>       "epsi",
>>       "ai",
>>       "arw",
>>       "crw",
>>       "cr2",
>>       "nef",
>>       "orf",
>>       "raf",
>>       "rw2",
>>       "rwl",
>>       "srw",
>>       "x3f"
>>     ],
>>     "fileSystemMetadata": true,
>>     "descriptive": true
>>   },
>>   "pdf": {
>>     "type": "pdf",
>>     "extensions": [
>>       "pdf"
>>     ],
>>     "extractHeaders": true,
>>     "extractionAlgorithm": "basic"
>>   },
>>   "sas": {
>>     "type": "sas",
>>     "extensions": [
>>       "sas7bdat"
>>     ]
>>   },
>>   "pcap": {
>>     "type": "pcap",
>>     "extensions": [
>>       "pcap",
>>       "pcapng"
>>     ]
>>   }
>> },
>> "authMode": "SHARED_USER",
>> "enabled": true
>> }
>> 
>> I'm now able to query some XML data: "SELECT * FROM
>> dfs.home.`ch.so.afu.abbaustellen.xml`;" Which I actually don't want to be
>> able to (see formats in the "storage-plugins-override.conf" file). If I
>> remove the xml format section in the config in the UI, I'm not able to
>> query the xml anymore: "Error: VALIDATION ERROR: From line 1, column 15 to
>> line 1, column 51: Object 'ch.so.afu.abbaustellen.xml' not found within
>> 'dfs.home'".
>> 
>> regards
>> Stefan
>> 
>> 
>> On Mon, Jul 10, 2023 at 9:15 PM Charles Givre <cgi...@gmail.com> wrote:
>> 
>>> Hi Stefan,
>>> What's in the config in the UI?  Can you also please clarify what queries
>>> are running which indicate that your configs aren't working?
>>> Best,
>>> -- C
>>> 
>>> 
>>> 
>>>> On Jul 10, 2023, at 1:11 PM, Stefan Ziegler <stefan.ziegler...@gmail.com> wrote:
>>>> 
>>>> "storage": {
>>>> cp: {
>>>>  type: "file",
>>>>  connection: "classpath:///",
>>>>  formats: {
>>>>    "csv" : {
>>>>      type: "text",
>>>>      extensions: [ "csv" ],
>>>>      delimiter: ","
>>>>    }
>>>>  }
>>>>  enabled: true
>>>> }
>>>> }
>>>> "storage": {
>>>> dfs: {
>>>>  type: "file",
>>>>  connection: "file:///",
>>>>  workspaces: {
>>>>    "tmp": {
>>>>      "location": "/tmp",
>>>>      "writable": true,
>>>>      "defaultInputFormat": null,
>>>>      "allowAccessOutsideWorkspace": false
>>>>    },
>>>>    "home": {
>>>>      "location": "/Users/stefan",
>>>>      "writable": true,
>>>>      "defaultInputFormat": null,
>>>>      "allowAccessOutsideWorkspace": false
>>>>    },
>>>>    "root": {
>>>>      "location": "/",
>>>>      "writable": false,
>>>>      "defaultInputFormat": null,
>>>>      "allowAccessOutsideWorkspace": false
>>>>    }
>>>>  },
>>>>  formats: {
>>>>    "parquet": {
>>>>      "type": "parquet"
>>>>    },
>>>>    "json": {
>>>>      "type": "json",
>>>>      "extensions": [
>>>>        "json"
>>>>      ]
>>>>    }
>>>>  },
>>>>  enabled: true
>>>> }
>>>> }
>>>> "storage": {
>>>> s3: {
>>>>  type: "file",
>>>>  connection: "s3a://<my-bucket-name>",
>>>>  config: {
>>>>    "fs.s3a.aws.credentials.provider":
>>>> "org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider",
>>>>    "fs.s3a.endpoint": "s3.eu-central-1.amazonaws.com",
>>>>    "fs.s3a.impl.disable.cache": "false"
>>>>  },
>>>>  workspaces: {
>>>>    "root": {
>>>>      "location": "/",
>>>>      "writable": false,
>>>>      "defaultInputFormat": "parquet",
>>>>      "allowAccessOutsideWorkspace": false
>>>>    }
>>>>  },
>>>>  "formats": {
>>>>    "parquet": {
>>>>      "type": "parquet"
>>>>    }
>>>>  },
>>>>  enabled: true
>>>> }
>>>> }
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On Mon, Jul 10, 2023 at 6:40 PM Charles Givre <cgi...@gmail.com> wrote:
>>>> 
>>>>> Can you share your configs with any sensitive info redacted?  The lists
>>>>> don't support images, so please just cut/paste the json.
>>>>> I had another idea...
>>>>> -- C
>>>>> 
>>>>> 
>>>>>> On Jul 10, 2023, at 12:28 PM, Stefan Ziegler <stefan.ziegler...@gmail.com> wrote:
>>>>>> 
>>>>>> Yes, I think I'm following these instructions. And the file is not
>>>>>> completely ignored. It creates additional format definitions. Let's say
>>>>>> I white list some formats in my storage configuration and Drill adds more
>>>>>> formats (which I don't want). Is there another way to start a "vanilla"
>>>>>> Drill installation with my own configurations?
>>>>>> 
>>>>>> Stefan
>>>>>> 
>>>>>> On Mon, Jul 10, 2023 at 6:17 PM Charles Givre <cgi...@gmail.com> wrote:
>>>>>> 
>>>>>>> Hi Stefan,
>>>>>>> My apologies. OK, so the issue is that the storage-plugins-override.conf
>>>>>>> is being ignored. I've never actually used this feature, so I wasn't
>>>>>>> familiar with it, but are you following the instructions here [1] with
>>>>>>> respect to configuration and restarting Drill? My suggestion would be to
>>>>>>> remove all the plugins in the UI and only specify them in the .conf file.
>>>>>>> Drill has an order of precedence and I suspect what is happening is that
>>>>>>> the UI versions have a higher priority than the .conf versions. Does that
>>>>>>> make sense?
>>>>>>> 
>>>>>>> -- C
>>>>>>> 
>>>>>>> [1]: https://drill.apache.org/docs/configuring-storage-plugins/#configuring-storage-plugins-with-the-storage-plugins-overrideconf-file
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> On Jul 10, 2023, at 12:06 PM, Stefan Ziegler <stefan.ziegler...@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> Hi Charles
>>>>>>>> 
>>>>>>>> I use a "storage-plugins-override.conf" file. My attempt is to have the
>>>>>>>> configuration for my storages in a single file and Drill can pick up
>>>>>>>> the configuration on startup. I put "storage-plugins-override.conf" in
>>>>>>>> the conf directory and Drill creates the storages on startup but (and
>>>>>>>> that is my problem) also creates all formats for every storage defined
>>>>>>>> in my config file. E.g. I have a (local) file type storage and I define
>>>>>>>> two formats (parquet and json) in it. Drill does not respect my
>>>>>>>> restriction to two formats in the config file but creates all formats
>>>>>>>> known to Drill (like iceberg, xml etc.).
>>>>>>>> 
>>>>>>>> regards
>>>>>>>> Stefan
>>>>>>>> 
>>>>>>>> On Mon, Jul 10, 2023 at 5:30 PM Charles Givre <cgi...@gmail.com> wrote:
>>>>>>>> 
>>>>>>>>> Hi Stefan,
>>>>>>>>> Thanks for your interest in Drill. You have to define the format
>>>>>>>>> config for each storage plugin. Otherwise Drill doesn't know what
>>>>>>>>> extension to associate with what format plugin. Out of curiosity, why
>>>>>>>>> are you using the .conf files for this?
>>>>>>>>> -- C
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On Jul 9, 2023, at 12:03 PM, Stefan Ziegler <stefan.ziegler...@gmail.com> wrote:
>>>>>>>>>> 
>>>>>>>>>> Not defining a format seems to prevent the user from querying the
>>>>>>>>>> specific format. E.g. after deleting the xml format definition in the
>>>>>>>>>> web gui, I'm not able to query xml files anymore. So I guess my
>>>>>>>>>> assumption was right.
>>>>>>>>>> 
>>>>>>>>>> Stefan
>>>>>>>>>> 
>>>>>>>>>> On Sun, Jul 9, 2023 at 5:41 PM Stefan Ziegler <stefan.ziegler...@gmail.com> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Btw: I assumed that the list of formats acts as a restriction.
>>>>>>>>>>> Probably I'm wrong.
>>>>>>>>>>> 
>>>>>>>>>>> Stefan
>>>>>>>>>>> 
>>>>>>>>>>> On Sun, Jul 9, 2023 at 5:27 PM Stefan Ziegler <stefan.ziegler...@gmail.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Hi
>>>>>>>>>>>> 
>>>>>>>>>>>> I'm using storage-plugins-override.conf to configure the storage
>>>>>>>>>>>> plugins on startup. My storage configurations contain only one or two
>>>>>>>>>>>> formats (parquet, json, csv). Checking the storages in the web gui I
>>>>>>>>>>>> noticed that for all the storages all formats are enabled, e.g.
>>>>>>>>>>>> msaccess, iceberg etc.
>>>>>>>>>>>> 
>>>>>>>>>>>> Is this on purpose or did I do something wrong?
>>>>>>>>>>>> 
>>>>>>>>>>>> Example configuration:
>>>>>>>>>>>> 
>>>>>>>>>>>> "storage": {
>>>>>>>>>>>> dfs: {
>>>>>>>>>>>> type: "file",
>>>>>>>>>>>> connection: "file:///",
>>>>>>>>>>>> workspaces: {
>>>>>>>>>>>> "tmp": {
>>>>>>>>>>>>   "location": "/tmp",
>>>>>>>>>>>>   "writable": true,
>>>>>>>>>>>>   "defaultInputFormat": null,
>>>>>>>>>>>>   "allowAccessOutsideWorkspace": false
>>>>>>>>>>>> },
>>>>>>>>>>>> "root": {
>>>>>>>>>>>>   "location": "/",
>>>>>>>>>>>>   "writable": false,
>>>>>>>>>>>>   "defaultInputFormat": null,
>>>>>>>>>>>>   "allowAccessOutsideWorkspace": false
>>>>>>>>>>>> }
>>>>>>>>>>>> },
>>>>>>>>>>>> formats: {
>>>>>>>>>>>> "parquet": {
>>>>>>>>>>>>   "type": "parquet"
>>>>>>>>>>>> },
>>>>>>>>>>>> "json": {
>>>>>>>>>>>>   "type": "json",
>>>>>>>>>>>>   "extensions": [
>>>>>>>>>>>>     "json"
>>>>>>>>>>>>   ]
>>>>>>>>>>>> }
>>>>>>>>>>>> },
>>>>>>>>>>>> enabled: true
>>>>>>>>>>>> }
>>>>>>>>>>>> }
>>>>>>>>>>>> 
>>>>>>>>>>>> regards
>>>>>>>>>>>> Stefan
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>> 
>>>>> 
>>> 
>>> 
> 
