Hi Charles

It was indeed a naming conflict. After renaming my storage plugin to a distinct
name, it worked.
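
A minimal sketch of such a renamed entry in storage-plugins-override.conf,
assuming the illustrative name "dfs-conf" (the name suggested earlier in the
thread; the name actually used is not stated here) and the two formats from
the original configuration:

```
"storage": {
  # "dfs-conf" is an assumed, illustrative name. The point is only that it
  # differs from the "dfs" plugin Drill already has stored.
  "dfs-conf": {
    type: "file",
    connection: "file:///",
    workspaces: {
      "home": {
        "location": "/Users/stefan",
        "writable": true,
        "defaultInputFormat": null,
        "allowAccessOutsideWorkspace": false
      }
    },
    # Only the formats that should be queryable through this plugin.
    formats: {
      "parquet": { "type": "parquet" },
      "json": { "type": "json", "extensions": [ "json" ] }
    },
    enabled: true
  }
}
```

Because the name contains a hyphen, it presumably has to be quoted with
backticks when referenced in a query.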

regards
Stefan

On Wed, Jul 12, 2023 at 6:48 PM Stefan Ziegler <stefan.ziegler...@gmail.com>
wrote:

> Thanks. Yes. I'm going to try the renaming approach.
>
> Not a rant but isn't the whole point of a "storage-plugins-override.conf"
> to override storage plugin configuration?
>
> Btw: I'm in embedded mode. So I guess I can also use the config files from
> /tmp/drill after "fixing" the format configuration in the ui and use them
> e.g. in a docker image.
>
> regards
> Stefan
>
> On Wed, Jul 12, 2023 at 6:04 PM Charles Givre <cgi...@gmail.com> wrote:
>
>> My sense of what is happening in your use case is that the configs that
>> exist in the UI are overriding the conf file.   What it seems like you want
>> is the opposite order of precedence.  I've never used the conf files for
>> this, so I don't have a lot of experience with that, but it would seem that
>> the best way to get your Drill cluster configured to do what you want is to
>> delete or disable the configs in the UI and only use the ones in the config
>> file.
>>
>> By conflicting I meant that let's say that you have a plugin called dfs
>> that has the json format enabled.  If you put a configuration for a plugin
>> also called dfs in the conf file, what I think is happening is that since
>> you have two plugins with the same name, Drill will read the ones from the
>> UI.  (FYSA, they aren't actually stored in the UI.  If you are using Drill
>> in distributed mode, those configurations are stored in zookeeper.  If you
>> are in embedded mode, they are stored on your drive somewhere.)
>>
>> Anyway,  IMHO, the best thing to do would be to make sure that the
>> plugins in your conf file do not have the same names as the plugins that
>> appear in the UI.  That's what I was getting at.  Does that make sense?
>> Best,
>> -- C
>>
>> > On Jul 12, 2023, at 11:57 AM, Stefan Ziegler <
>> stefan.ziegler...@gmail.com> wrote:
>> >
>> > Hi Charles
>> >
>> > Not sure if I understand you correctly: what do you mean by “not
>> > conflicting”? My aim is to not use the UI at all to configure storages.
>> > I thought this could be achieved by overriding the default storages with the
>> > “override” file. This seems to work, except for the strange behaviour with the
>> > formats.
>> >
>> > regards
>> > Stefan
>> >
>> > ________________________________
>> > From: Charles Givre <cgi...@gmail.com>
>> > Sent: Wednesday, July 12, 2023 5:04 PM
>> > To: user <user@drill.apache.org>
>> > Subject: Re: Respecting formats restriction when using
>> storage-plugins-override.conf
>> >
>> > Hi Stefan,
>> > My biggest piece of advice here would just be to make sure the plugins
>> > specified in the override file do not conflict with the UI-based configs.
>> > It may make sense to have completely different configs in each location,
>> > i.e.:
>> >
>> > dfs-conf and (plain) dfs.
>> >
>> > I think that should solve all issues.  In theory, if you remove a config
>> > from the "formats" section, Drill should not be able to parse the file in
>> > question.  So, for example, if you don't have the 'csv' or 'excel' format,
>> > then Drill will not be able to parse those formats.
>> >
>> > Best,
>> > -- C
>> >
>> >
>> >> On Jul 11, 2023, at 2:42 AM, Stefan Ziegler <
>> stefan.ziegler...@gmail.com> wrote:
>> >>
>> >> The config for dfs in the UI looks like this:
>> >>
>> >> {
>> >> "type": "file",
>> >> "connection": "file:///",
>> >> "workspaces": {
>> >>   "tmp": {
>> >>     "location": "/tmp",
>> >>     "writable": true,
>> >>     "defaultInputFormat": null,
>> >>     "allowAccessOutsideWorkspace": false
>> >>   },
>> >>   "root": {
>> >>     "location": "/",
>> >>     "writable": false,
>> >>     "defaultInputFormat": null,
>> >>     "allowAccessOutsideWorkspace": false
>> >>   },
>> >>   "home": {
>> >>     "location": "/Users/stefan",
>> >>     "writable": true,
>> >>     "defaultInputFormat": null,
>> >>     "allowAccessOutsideWorkspace": false
>> >>   }
>> >> },
>> >> "formats": {
>> >>   "parquet": {
>> >>     "type": "parquet"
>> >>   },
>> >>   "json": {
>> >>     "type": "json",
>> >>     "extensions": [
>> >>       "json"
>> >>     ]
>> >>   },
>> >>   "excel": {
>> >>     "type": "excel",
>> >>     "extensions": [
>> >>       "xlsx"
>> >>     ],
>> >>     "lastRow": 1048576,
>> >>     "ignoreErrors": true,
>> >>     "maxArraySize": -1,
>> >>     "thresholdBytesForTempFiles": -1
>> >>   },
>> >>   "spss": {
>> >>     "type": "spss",
>> >>     "extensions": [
>> >>       "sav"
>> >>     ]
>> >>   },
>> >>   "iceberg": {
>> >>     "type": "iceberg",
>> >>     "properties": null,
>> >>     "caseSensitive": null,
>> >>     "includeColumnStats": null,
>> >>     "ignoreResiduals": null,
>> >>     "snapshotId": null,
>> >>     "snapshotAsOfTime": null,
>> >>     "fromSnapshotId": null,
>> >>     "toSnapshotId": null
>> >>   },
>> >>   "httpd": {
>> >>     "type": "httpd",
>> >>     "extensions": [
>> >>       "httpd"
>> >>     ],
>> >>     "logFormat": "common\ncombined"
>> >>   },
>> >>   "xml": {
>> >>     "type": "xml",
>> >>     "extensions": [
>> >>       "xml"
>> >>     ],
>> >>     "dataLevel": 1
>> >>   },
>> >>   "syslog": {
>> >>     "type": "syslog",
>> >>     "extensions": [
>> >>       "syslog"
>> >>     ],
>> >>     "maxErrors": 10
>> >>   },
>> >>   "msaccess": {
>> >>     "type": "msaccess",
>> >>     "extensions": [
>> >>       "mdb",
>> >>       "accdb"
>> >>     ]
>> >>   },
>> >>   "hdf5": {
>> >>     "type": "hdf5",
>> >>     "extensions": [
>> >>       "h5"
>> >>     ],
>> >>     "defaultPath": null
>> >>   },
>> >>   "ltsv": {
>> >>     "type": "ltsv",
>> >>     "extensions": [
>> >>       "ltsv"
>> >>     ],
>> >>     "parseMode": "lenient",
>> >>     "escapeCharacter": null,
>> >>     "kvDelimiter": null,
>> >>     "entryDelimiter": null,
>> >>     "lineEnding": null,
>> >>     "quoteChar": null
>> >>   },
>> >>   "delta": {
>> >>     "type": "delta",
>> >>     "version": null,
>> >>     "timestamp": null
>> >>   },
>> >>   "shp": {
>> >>     "type": "shp",
>> >>     "extensions": [
>> >>       "shp"
>> >>     ]
>> >>   },
>> >>   "image": {
>> >>     "type": "image",
>> >>     "extensions": [
>> >>       "jpg",
>> >>       "jpeg",
>> >>       "jpe",
>> >>       "tif",
>> >>       "tiff",
>> >>       "dng",
>> >>       "psd",
>> >>       "png",
>> >>       "bmp",
>> >>       "gif",
>> >>       "ico",
>> >>       "pcx",
>> >>       "wav",
>> >>       "wave",
>> >>       "avi",
>> >>       "webp",
>> >>       "mov",
>> >>       "mp4",
>> >>       "m4a",
>> >>       "m4p",
>> >>       "m4b",
>> >>       "m4r",
>> >>       "m4v",
>> >>       "3gp",
>> >>       "3g2",
>> >>       "eps",
>> >>       "epsf",
>> >>       "epsi",
>> >>       "ai",
>> >>       "arw",
>> >>       "crw",
>> >>       "cr2",
>> >>       "nef",
>> >>       "orf",
>> >>       "raf",
>> >>       "rw2",
>> >>       "rwl",
>> >>       "srw",
>> >>       "x3f"
>> >>     ],
>> >>     "fileSystemMetadata": true,
>> >>     "descriptive": true
>> >>   },
>> >>   "pdf": {
>> >>     "type": "pdf",
>> >>     "extensions": [
>> >>       "pdf"
>> >>     ],
>> >>     "extractHeaders": true,
>> >>     "extractionAlgorithm": "basic"
>> >>   },
>> >>   "sas": {
>> >>     "type": "sas",
>> >>     "extensions": [
>> >>       "sas7bdat"
>> >>     ]
>> >>   },
>> >>   "pcap": {
>> >>     "type": "pcap",
>> >>     "extensions": [
>> >>       "pcap",
>> >>       "pcapng"
>> >>     ]
>> >>   }
>> >> },
>> >> "authMode": "SHARED_USER",
>> >> "enabled": true
>> >> }
>> >>
>> >> I'm now able to query some XML data: "SELECT * FROM
>> >> dfs.home.`ch.so.afu.abbaustellen.xml`;" Which I actually don't want to
>> be
>> >> able to (see formats in the "storage-plugins-override.conf" file). If I
>> >> remove the xml format section in the config in the UI, I'm not able to
>> >> query the xml anymore: "Error: VALIDATION ERROR: From line 1, column
>> 15 to
>> >> line 1, column 51: Object 'ch.so.afu.abbaustellen.xml' not found within
>> >> 'dfs.home'".
>> >>
>> >> regards
>> >> Stefan
>> >>
>> >>
>> >> On Mon, Jul 10, 2023 at 9:15 PM Charles Givre <cgi...@gmail.com>
>> wrote:
>> >>
>> >>> Hi Stefan,
>> >>> What's in the config in the UI?  Can you also please clarify what
>> queries
>> >>> are running which indicate that your configs aren't working?
>> >>> Best,
>> >>> -- C
>> >>>
>> >>>
>> >>>
>> >>>> On Jul 10, 2023, at 1:11 PM, Stefan Ziegler <
>> stefan.ziegler...@gmail.com>
>> >>> wrote:
>> >>>>
>> >>>> "storage": {
>> >>>> cp: {
>> >>>>  type: "file",
>> >>>>  connection: "classpath:///",
>> >>>>  formats: {
>> >>>>    "csv" : {
>> >>>>      type: "text",
>> >>>>      extensions: [ "csv" ],
>> >>>>      delimiter: ","
>> >>>>    }
>> >>>>  }
>> >>>>  enabled: true
>> >>>> }
>> >>>> }
>> >>>> "storage": {
>> >>>> dfs: {
>> >>>>  type: "file",
>> >>>>  connection: "file:///",
>> >>>>  workspaces: {
>> >>>>    "tmp": {
>> >>>>      "location": "/tmp",
>> >>>>      "writable": true,
>> >>>>      "defaultInputFormat": null,
>> >>>>      "allowAccessOutsideWorkspace": false
>> >>>>    },
>> >>>>    "home": {
>> >>>>      "location": "/Users/stefan",
>> >>>>      "writable": true,
>> >>>>      "defaultInputFormat": null,
>> >>>>      "allowAccessOutsideWorkspace": false
>> >>>>    },
>> >>>>    "root": {
>> >>>>      "location": "/",
>> >>>>      "writable": false,
>> >>>>      "defaultInputFormat": null,
>> >>>>      "allowAccessOutsideWorkspace": false
>> >>>>    }
>> >>>>  },
>> >>>>  formats: {
>> >>>>    "parquet": {
>> >>>>      "type": "parquet"
>> >>>>    },
>> >>>>    "json": {
>> >>>>      "type": "json",
>> >>>>      "extensions": [
>> >>>>        "json"
>> >>>>      ]
>> >>>>    }
>> >>>>  },
>> >>>>  enabled: true
>> >>>> }
>> >>>> }
>> >>>> "storage": {
>> >>>> s3: {
>> >>>>  type: "file",
>> >>>>  connection: "s3a://<my-bucket-name>",
>> >>>>  config: {
>> >>>>    "fs.s3a.aws.credentials.provider":
>> >>>> "org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider",
>> >>>>    "fs.s3a.endpoint": "s3.eu-central-1.amazonaws.com",
>> >>>>    "fs.s3a.impl.disable.cache": "false"
>> >>>>  },
>> >>>>  workspaces: {
>> >>>>    "root": {
>> >>>>      "location": "/",
>> >>>>      "writable": false,
>> >>>>      "defaultInputFormat": "parquet",
>> >>>>      "allowAccessOutsideWorkspace": false
>> >>>>    }
>> >>>>  },
>> >>>>  "formats": {
>> >>>>    "parquet": {
>> >>>>      "type": "parquet"
>> >>>>    }
>> >>>>  },
>> >>>>  enabled: true
>> >>>> }
>> >>>> }
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> On Mon, Jul 10, 2023 at 6:40 PM Charles Givre <cgi...@gmail.com>
>> wrote:
>> >>>>
>> >>>>> Can you share your configs with any sensitive info redacted?  The
>> lists
>> >>>>> don't support images, so please just cut/paste the json.
>> >>>>> I had another idea...
>> >>>>> -- C
>> >>>>>
>> >>>>>
>> >>>>>> On Jul 10, 2023, at 12:28 PM, Stefan Ziegler <
>> >>>>> stefan.ziegler...@gmail.com> wrote:
>> >>>>>>
>> >>>>>> Yes, I think I'm following these instructions. And the file is not
>> >>>>>> completely ignored. It creates additional format definitions. Let's
>> >>> say I
>> >>>>>> white list some formats in my storage configuration and Drill adds
>> more
>> >>>>>> formats (which I don't want). Is there another way to start a
>> "vanilla"
>> >>>>>> Drill installation with my own configurations?
>> >>>>>>
>> >>>>>> Stefan
>> >>>>>>
>> >>>>>> On Mon, Jul 10, 2023 at 6:17 PM Charles Givre <cgi...@gmail.com>
>> >>> wrote:
>> >>>>>>
>> >>>>>>> Hi Stefan,
>> >>>>>>> My apologies.. Ok.. so the issue is that the
>> >>>>> storage-plugins-override.conf
>> >>>>>>> is being ignored.  I've never actually used this feature, so I
>> wasn't
>> >>>>>>> familiar with it, but are you following the instructions here [1]
>> >>> with
>> >>>>>>> respect to configuration and restarting Drill?  My suggestion
>> would be
>> >>>>> to
>> >>>>>>> remove all the plugins in the UI and only specify them in the
>> .conf
>> >>>>> file.
>> >>>>>>> Drill has an order of precedence and I suspect what is happening
>> is
>> >>> that
>> >>>>>>> the UI versions have a higher priority than the .conf versions.
>>  Does
>> >>>>> that
>> >>>>>>> make sense?
>> >>>>>>>
>> >>>>>>> -- C
>> >>>>>>>
>> >>>>>>> [1]:
>> >>>>>>>
>> >>>>>
>> >>>
>> https://drill.apache.org/docs/configuring-storage-plugins/#configuring-storage-plugins-with-the-storage-plugins-overrideconf-file
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>> On Jul 10, 2023, at 12:06 PM, Stefan Ziegler <
>> >>>>>>> stefan.ziegler...@gmail.com> wrote:
>> >>>>>>>>
>> >>>>>>>> Hi Charles
>> >>>>>>>>
>> >>>>>>>> I use a "storage-plugins-override.conf" file. My attempt is to
>> have
>> >>> the
>> >>>>>>>> configuration for my storages in a single file and Drill can
>> pick up
>> >>>>> the
>> >>>>>>>> configuration on startup. I put "storage-plugins-override.conf"
>> in
>> >>> the
>> >>>>>>> conf
>> >>>>>>>> directory and Drill creates the storages on startup but (and
>> that is
>> >>> my
>> >>>>>>>> problem) also creates all formats for every storage defined in my
>> >>>>> config
>> >>>>>>>> file. E.g. I have a (local) file type storage and I define two
>> >>> formats
>> >>>>>>>> (parquet and json) in it. Drill does not respect my restriction
>> to
>> >>> two
>> >>>>>>>> formats in the config file but creates all formats known to Drill
>> >>> (like
>> >>>>>>>> iceberg, xml etc.).
>> >>>>>>>>
>> >>>>>>>> regards
>> >>>>>>>> Stefan
>> >>>>>>>>
>> >>>>>>>> On Mon, Jul 10, 2023 at 5:30 PM Charles Givre <cgi...@gmail.com>
>> >>>>> wrote:
>> >>>>>>>>
>> >>>>>>>>> Hi Stefan,
>> >>>>>>>>> Thanks for your interest in Drill.  You have to define the
>> format
>> >>>>> config
>> >>>>>>>>> for each storage plugin.  Otherwise Drill doesn't know what
>> >>> extension
>> >>>>> to
>> >>>>>>>>> associate with what format plugin.  Out of curiosity, why are
>> you
>> >>>>> using
>> >>>>>>> the
>> >>>>>>>>> .conf files for this?
>> >>>>>>>>> -- C
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>>> On Jul 9, 2023, at 12:03 PM, Stefan Ziegler <
>> >>>>>>> stefan.ziegler...@gmail.com>
>> >>>>>>>>> wrote:
>> >>>>>>>>>>
>> >>>>>>>>>> Not defining a format seems to prevent the user from querying
>> the
>> >>>>>>>>> specific
>> >>>>>>>>>> format. E.g. after deleting the xml format definition in the
>> web
>> >>> gui,
>> >>>>>>> I'm
>> >>>>>>>>>> not able to query xml files anymore. So I guess my assumption
>> was
>> >>>>>>> right.
>> >>>>>>>>>>
>> >>>>>>>>>> Stefan
>> >>>>>>>>>>
>> >>>>>>>>>> On Sun, Jul 9, 2023 at 5:41 PM Stefan Ziegler <
>> >>>>>>>>> stefan.ziegler...@gmail.com>
>> >>>>>>>>>> wrote:
>> >>>>>>>>>>
>> >>>>>>>>>>> Btw: I assumed that the list of formats acts as a restriction.
>> >>>>> Probably
>> >>>>>>>>> I'm
>> >>>>>>>>>>> wrong.
>> >>>>>>>>>>>
>> >>>>>>>>>>> Stefan
>> >>>>>>>>>>>
>> >>>>>>>>>>> On Sun, Jul 9, 2023 at 5:27 PM Stefan Ziegler <
>> >>>>>>>>> stefan.ziegler...@gmail.com>
>> >>>>>>>>>>> wrote:
>> >>>>>>>>>>>
>> >>>>>>>>>>>> Hi
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> I'm using storage-plugins-override.conf to configure the
>> storage
>> >>>>>>>>> plugins
>> >>>>>>>>>>>> on startup. My storage configurations contain only one or two
>> >>>>> formats
>> >>>>>>>>>>>> (parquet, json, csv). Checking the storages in the web gui I
>> >>>>> noticed
>> >>>>>>>>> that
>> >>>>>>>>>>>> for all the storages all formats are enabled, e.g. msaccess,
>> >>>>> iceberg
>> >>>>>>>>> etc.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Is this on purpose or did I do something wrong?
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Example configuration:
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> "storage": {
>> >>>>>>>>>>>> dfs: {
>> >>>>>>>>>>>> type: "file",
>> >>>>>>>>>>>> connection: "file:///",
>> >>>>>>>>>>>> workspaces: {
>> >>>>>>>>>>>> "tmp": {
>> >>>>>>>>>>>>   "location": "/tmp",
>> >>>>>>>>>>>>   "writable": true,
>> >>>>>>>>>>>>   "defaultInputFormat": null,
>> >>>>>>>>>>>>   "allowAccessOutsideWorkspace": false
>> >>>>>>>>>>>> },
>> >>>>>>>>>>>> "root": {
>> >>>>>>>>>>>>   "location": "/",
>> >>>>>>>>>>>>   "writable": false,
>> >>>>>>>>>>>>   "defaultInputFormat": null,
>> >>>>>>>>>>>>   "allowAccessOutsideWorkspace": false
>> >>>>>>>>>>>> }
>> >>>>>>>>>>>> },
>> >>>>>>>>>>>> formats: {
>> >>>>>>>>>>>> "parquet": {
>> >>>>>>>>>>>>   "type": "parquet"
>> >>>>>>>>>>>> },
>> >>>>>>>>>>>> "json": {
>> >>>>>>>>>>>>   "type": "json",
>> >>>>>>>>>>>>   "extensions": [
>> >>>>>>>>>>>>     "json"
>> >>>>>>>>>>>>   ]
>> >>>>>>>>>>>> }
>> >>>>>>>>>>>> },
>> >>>>>>>>>>>> enabled: true
>> >>>>>>>>>>>> }
>> >>>>>>>>>>>> }
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> regards
>> >>>>>>>>>>>> Stefan
>> >>>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>
>> >>>>>
>> >>>
>> >>>
>> >
>>
>>
