Re: [gdal-dev] Passing open options along dataset name in a string ?

2020-11-17 Thread Sean Gillies
Hi Jukka,

On Tue, Nov 17, 2020 at 2:57 AM jratike80 <jukka.rahko...@maanmittauslaitos.fi> wrote:

> Hi,
>
> I have done some helpdesk work within the GDAL community and I know well
> that the open options and config options are confusing. I also know that
> they exist for a reason, but a simplified and uniform way to use them would
> be nice.
>

I'm glad to read that you're interested! I believe there is tension between
simple and uniform and we'll need wide input to find a good balance.

I've been using QGIS today for the first time in a while and I think that
application could benefit from this as well. A "URL" for datasets could
remove some of the complexity of QGIS dialog boxes for creating or
configuring connections.


> Some comments on comments:
>
> >> gdalinfo my.tif -oo GEOREF_SOURCES=WORLDFILE,PAM
> >>
>
> > Ideally this would be baked into the format, but, yes, I think we've got a
> > bead on dataset open options.
>
> I don't know how it could be baked into the format. The option gives the
> user a way to override wrong GeoTIFF georeferencing with a worldfile, for
> example.
>

Yes.


> >> gdalinfo BAG:"data/test_vr.bag":supergrid:0:1
> >>
>
> > DRIVER:"file":something
>
> > Right. This will require some work because of multiple colons. Though I've
> > never seen BAG driver data in the wild. Is this a real live format?
>
> As far as I know, BAG is the HDF of bathymetry and is widely used in that
> context.
>
> >> gdalinfo data/test_vr.bag -oo MODE=RESAMPLED_GRID -oo SUPERGRIDS_MASK=YES
> >> gdalinfo HDF5:"d:\foo.he5"://HDFEOS/SWATHS/foo/bar
> >>
>
> > HDF5 driver, filename using Windows drive, and UNC path within it. This is
> > marginal, right?
>
> The part beginning with // is not a UNC path but the name of the subdataset
> within the HDF5 file, see https://gdal.org/drivers/raster/hdf5.html. It is no
> more marginal than HDF5 itself.
>
>
Thank you for explaining.


> >> ogrinfo OCI:warmerda/password@.dreadfest
>
> >Wat?
>
> The text has just been formatted into an email link because of the @ sign
> that belongs to the Oracle connection string "username" / "password" @ "the
> name of the Oracle database as it appears in the tnsnames.ora file". Let's
> see if formatting happens again when I send this from Nabble:
>
> OCI:warmerda/passw...@gdal800.dreadfest.com abc.shp
>

Thanks.


> -Jukka Rahkonen-
>
>
>
>
> Sean Gillies-3 wrote
> > Hi Even,
> >
> > On Wed, Nov 4, 2020 at 9:01 AM Even Rouault wrote:
> >
> >> > > Another particularity we have in GDAL is that the dataset name might
> >> > > be almost anything. Most of the time, it is a regular file path, or
> >> > > some /vsi path. But sometimes, it can be JSON content (the GeoJSON
> >> > > driver accepts the content to be directly provided as the dataset
> >> > > name), or XML (VRT, WMS drivers).
> >> > > We also have the subdataset syntax "HDF5:foo.hdf:my_variable"
> >> >
> >> > Could VRT XML and JSON be exempted? We already have a way to embed
> >> > open options in the XML.
> >>
> >> If the gdn: mechanism is a new possibility offered that doesn't exclude
> >> existing ones (otherwise that would be a pretty big breaking change), we
> >> could possibly exempt the odd cases I mentioned (or have some
> >> quoting/escaping rules to enable that payload to be seen as a file),
> >> which generally don't need a "permanent" way of referring to the dataset
> >> like gdn: would offer, since this is content often generated
> >> programmatically or retrieved dynamically.
> >>
> >> Covering subdatasets would be a more important use case. Something that
> >> would have to be decided is whether the way we express subdatasets would
> >> be somehow standardized or whether it would be a black-box string for the
> >> gdn: encapsulation.
> >> For a black-box approach, we would have to define some escaping/quoting
> >> rules to avoid any potential issue with separators of the gdn syntax. If
> >> we decide that the subdataset syntax is part of what is standardized by
> >> GDN, that would be a more challenging exercise, because the subdataset
> >> syntax varies from driver to driver.
> >>
> >
> > The variation of subdataset syntax among drivers is a bug, let's try to
> > fix this.
> >
> > It seems to me that the internet way to address subdatasets would be to
> > use a # URL fragment. But since most of our formats and the servers that
> > serve files of these formats are not aware, we may have to come up with
> > something different. We may need to consider making subdatasets a layer
> > opening option?
> >
> >> Depending on how we design things, that might impact:
> >> - just GDALOpen() generic code if GDALOpen() decodes the gdn: string to
> >> decompose it into 'classic' dataset names and open options
> >> - all drivers if the gdn: string would be passed to each
> >> GDALDriver::pfnOpen() implementation
> >> - intermediate situation if we decide to drop (at least for 

Re: [gdal-dev] Passing open options along dataset name in a string ?

2020-11-17 Thread Even Rouault
> The variation of subdataset syntax among drivers is a bug, let's try to fix
> this.
> 
> It seems to me that the internet way to address subdatasets would be to use
> a # URL fragment. But since most of our formats and the servers that serve
> files of these formats are not aware, we may have to come up with something
> different. We may need to consider making subdatasets a layer opening
> option?

Hum, I'm a bit confused. Isn't the purpose to have a single string covering
subdataset specification and open options?

Because you could potentially have use cases where you open a "container"
dataset with its name and open options (not selecting a particular subdataset),
and GetMetadata("SUBDATASETS") should then return, for each subdataset, a GDN
that has the same open options plus additions specific to that subdataset.

Let's say "gdalinfo my.hdf5 -oo FOO=BAR" would return a list of subdatasets:
gdn:HDF5:my.hdf5+encoding_FOO=BAR+encoding_VARIABLE=temperature
gdn:HDF5:my.hdf5+encoding_FOO=BAR+encoding_VARIABLE=pressure
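A rough Python sketch of that inheritance: each subdataset name repeats the container's open options and adds its own. The gdn: prefix, the ?= separator (borrowed from Sean's URN proposal in this thread) and the VARIABLE option name are assumptions for illustration only, not an existing GDAL syntax; the two example names above were partly mangled by the archive, so this does not try to reproduce them exactly.

```python
from urllib.parse import urlencode

def subdataset_gdns(driver, filename, open_options, variables):
    """Build one hypothetical GDN per subdataset.

    Each name inherits the container's open options and adds a
    per-subdataset VARIABLE option. The "gdn:" prefix and the "?="
    separator are illustrative assumptions, not a GDAL syntax.
    """
    names = []
    for var in variables:
        opts = dict(open_options)   # inherit the container's options
        opts["VARIABLE"] = var      # add the subdataset-specific one
        # urlencode() escapes any '&', '=' or '?' inside option values
        names.append(f"gdn:{driver}:{filename}?={urlencode(opts)}")
    return names
```

For instance, subdataset_gdns("HDF5", "my.hdf5", {"FOO": "BAR"}, ["temperature", "pressure"]) gives gdn:HDF5:my.hdf5?=FOO=BAR&VARIABLE=temperature and the matching pressure name.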


Some additions to Jukka's answer:

> > gdalinfo GTIFF_DIR:0:d:\my.tif
> 
> WTF is this? :)

https://gdal.org/drivers/raster/gtiff.html :
"""
Multi-page TIFF files are exposed as subdatasets. On opening, a subdataset 
name is GTIFF_DIR:{index}:filename.tif, where {index} starts at 1.
"""
(ok, so my example was wrong :-) it should have been GTIFF_DIR:1:d:\my.tif)
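The quoted rule is easy to mirror, and the one subtlety is the Windows drive letter: splitting on every colon would cut d:\my.tif in two. A hypothetical pair of Python helpers (not GDAL's own parsing code) that only splits on the first two colons:

```python
def make_gtiff_dir_name(index, filename):
    """Build GTIFF_DIR:{index}:filename, where {index} starts at 1."""
    if index < 1:
        raise ValueError("GTIFF_DIR indices start at 1")
    return f"GTIFF_DIR:{index}:{filename}"

def parse_gtiff_dir_name(name):
    """Split on the first two colons only, so 'd:\\my.tif' stays whole."""
    prefix, index, filename = name.split(":", 2)
    if prefix != "GTIFF_DIR":
        raise ValueError(f"not a GTIFF_DIR name: {name!r}")
    return int(index), filename
```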

> 
> > gdalinfo EEDAI:my/asset
> > gdalinfo EEDAI: -oo ASSET=my/asset
> > gdalinfo EEDAI:my/asset:band1, band2
> > gdalinfo EEDAI: -oo ASSET=my/asset -oo BANDS=band1,band2
> 
> Never seen these.

Cf https://gdal.org/drivers/raster/eedai.html

This driver shows a case where we handle both worlds. The specification of a
dataset can be in the dataset name ("EEDAI:my/asset") or given as a dataset
name ("EEDAI:") + open options (ASSET=my/asset). This was my attempt to use
open options as a way of being more explicit about how to specify the subparts
of a subdataset, but perhaps this wasn't a good idea. In any case, this is a
case where the border between what is the dataset/subdataset name and what is
an open option is fuzzy. "gdalinfo EEDAI:" without any option will not work:
you can't reasonably list all datasets hosted on Earth Engine...

But in some circumstances (when all bands don't have the same georeferencing,
resolution, CRS or image dimensions), whether you open it with a dataset name
("EEDAI:my/asset") or with a dataset name ("EEDAI:") + open options
(ASSET=my/asset), you may get a list of subdatasets.
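To make the equivalence concrete, here is a hypothetical Python helper that rewrites the dataset-name forms into the "EEDAI:" + open options forms from the list above. It assumes the asset path itself contains no colon and that bands are comma-separated; the real EEDAI driver may accept more than this.

```python
def eedai_to_open_options(name):
    """Map 'EEDAI:asset[:band1,band2]' onto ('EEDAI:', [open options])."""
    prefix = "EEDAI:"
    if not name.startswith(prefix):
        raise ValueError(f"not an EEDAI name: {name!r}")
    rest = name[len(prefix):]
    if not rest:
        # Bare "EEDAI:" carries nothing; ASSET must come from open options
        return prefix, []
    # Assumption: the asset path has no ':', so the first ':' starts the bands
    asset, sep, bands = rest.partition(":")
    options = [f"ASSET={asset}"]
    if sep:
        band_list = [b.strip() for b in bands.split(",") if b.strip()]
        options.append("BANDS=" + ",".join(band_list))
    return prefix, options
```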

> > GDALOpen() is not even aware that HDF5:bla means that the dataset will be
> > recognized by the HDF5 driver
> 
> Wait what?

GDALOpen() just iterates over drivers and passes the dataset name and open
options to them until one says "yes, that's for me". The use of
"DRIVER_NAME:bla" is mostly a convention, but in no way a core mechanism. If
you use DRIVER_NAME:bla for a driver that doesn't recognize the DRIVER_NAME:
prefix, that won't work.
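A toy Python model of that probing loop, just to illustrate the point: the generic open function never parses the "HDF5:" prefix itself, it only asks each driver in turn. ToyDriver, toy_open() and the recognition rules are made up for this illustration and are far simpler than what real drivers check.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

@dataclass
class ToyDriver:
    name: str
    # Returns True when the driver recognizes the name ("yes, that's for me")
    identify: Callable[[str, Sequence[str]], bool]

def toy_open(dataset_name: str, drivers: Sequence[ToyDriver],
             open_options: Sequence[str] = ()) -> Optional[str]:
    """Return the name of the first driver that accepts the dataset, else None."""
    for drv in drivers:
        # Name and open options go to each driver unchanged; no prefix parsing here
        if drv.identify(dataset_name, open_options):
            return drv.name
    return None

# Only the toy HDF5 driver knows about the "HDF5:" convention
hdf5 = ToyDriver("HDF5", lambda name, oo: name.startswith("HDF5:")
                 or name.endswith((".h5", ".hdf5")))
gtiff = ToyDriver("GTiff", lambda name, oo: name.lower().endswith((".tif", ".tiff")))
DRIVERS = [gtiff, hdf5]
```

With these toy drivers, "HDF5:foo.hdf:my_variable" is claimed by HDF5, while a prefix no driver knows about (say "XYZ:data.bin") is simply not opened.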

Even

-- 
Spatialys - Geospatial professional services
http://www.spatialys.com
___
gdal-dev mailing list
gdal-dev@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/gdal-dev

Re: [gdal-dev] Passing open options along dataset name in a string ?

2020-11-17 Thread jratike80
Hi,

I have done some helpdesk work within the GDAL community and I know well
that the open options and config options are confusing. I also know that
they exist for a reason, but a simplified and uniform way to use them would be
nice.

Some comments on comments:

>> gdalinfo my.tif -oo GEOREF_SOURCES=WORLDFILE,PAM
>>

> Ideally this would be baked into the format, but, yes, I think we've got a
> bead on dataset open options.

I don't know how it could be baked into the format. The option gives the user
a way to override wrong GeoTIFF georeferencing with a worldfile, for example.

>> gdalinfo BAG:"data/test_vr.bag":supergrid:0:1
>>

> DRIVER:"file":something

> Right. This will require some work because of multiple colons. Though I've
> never seen BAG driver data in the wild. Is this a real live format?

As far as I know, BAG is the HDF of bathymetry and is widely used in that
context.

>> gdalinfo data/test_vr.bag -oo MODE=RESAMPLED_GRID -oo SUPERGRIDS_MASK=YES
>> gdalinfo HDF5:"d:\foo.he5"://HDFEOS/SWATHS/foo/bar
>>

> HDF5 driver, filename using Windows drive, and UNC path within it. This is
> marginal, right?

The part beginning with // is not a UNC path but the name of the subdataset
within the HDF5 file, see https://gdal.org/drivers/raster/hdf5.html. It is no
more marginal than HDF5 itself.

>> ogrinfo OCI:warmerda/password@.dreadfest

>Wat?

The text has just been formatted into an email link because of the @ sign that
belongs to the Oracle connection string "username" / "password" @ "the name
of the Oracle database as it appears in the tnsnames.ora file". Let's see if
formatting happens again when I send this from Nabble:

OCI:warmerda/passw...@gdal800.dreadfest.com abc.shp

-Jukka Rahkonen-




Sean Gillies-3 wrote
> Hi Even,
> 
> On Wed, Nov 4, 2020 at 9:01 AM Even Rouault wrote:
> 
>> > > Another particularity we have in GDAL is that the dataset name might
>> > > be almost anything. Most of the time, it is a regular file path, or
>> > > some /vsi path. But sometimes, it can be JSON content (the GeoJSON
>> > > driver accepts the content to be directly provided as the dataset
>> > > name), or XML (VRT, WMS drivers).
>> > > We also have the subdataset syntax "HDF5:foo.hdf:my_variable"
>> >
>> > Could VRT XML and JSON be exempted? We already have a way to embed open
>> > options in the XML.
>>
>> If the gdn: mechanism is a new possibility offered that doesn't exclude
>> existing ones (otherwise that would be a pretty big breaking change), we
>> could possibly exempt the odd cases I mentioned (or have some
>> quoting/escaping rules to enable that payload to be seen as a file),
>> which generally don't need a "permanent" way of referring to the dataset
>> like gdn: would offer, since this is content often generated
>> programmatically or retrieved dynamically.
>>
>> Covering subdatasets would be a more important use case. Something that
>> would have to be decided is whether the way we express subdatasets would
>> be somehow standardized or whether it would be a black-box string for the
>> gdn: encapsulation.
>> For a black-box approach, we would have to define some escaping/quoting
>> rules to avoid any potential issue with separators of the gdn syntax. If
>> we decide that the subdataset syntax is part of what is standardized by
>> GDN, that would be a more challenging exercise, because the subdataset
>> syntax varies from driver to driver.
>>
> 
> The variation of subdataset syntax among drivers is a bug, let's try to
> fix this.
> 
> It seems to me that the internet way to address subdatasets would be to
> use a # URL fragment. But since most of our formats and the servers that
> serve files of these formats are not aware, we may have to come up with
> something different. We may need to consider making subdatasets a layer
> opening option?
> 
>> Depending on how we design things, that might impact:
>> - just GDALOpen() generic code if GDALOpen() decodes the gdn: string to
>> decompose it into 'classic' dataset names and open options
>> - all drivers if the gdn: string would be passed to each
>> GDALDriver::pfnOpen() implementation
>> - intermediate situation if we decide to drop (at least for future
>> drivers) per-driver subdataset syntax (which has deficiencies, as the
>> quoting rules to separate the filename from the non-filename component
>> vary from driver to driver and are most of the time not defined) to come
>> up with something more standardized
>>
>> To help brainstorming, a non-exhaustive overview of a few situations
>> mixing driver prefixing, subdataset syntax and open options:
>>
>> gdalinfo my.tif
>>
> 
> Yes. We have to handle bare paths to local dataset files.
> 
> 
>> gdalinfo my.tif -oo GEOREF_SOURCES=WORLDFILE,PAM
>>
> 
> Ideally this would be baked into the format, but, yes, I think we've got a
> bead on dataset open options.
> 
> 
>> gdalinfo GTIFF_DIR:0:d:\my.tif
>>
> 
> WTF is this? :)
> 
> 
>> gdalinfo EEDAI:my/asset
>> 

Re: [gdal-dev] Passing open options along dataset name in a string ?

2020-11-16 Thread Sean Gillies
Hi Even,

On Wed, Nov 4, 2020 at 9:01 AM Even Rouault 
wrote:

> > > Another particularity we have in GDAL is that the dataset name might be
> > > almost anything. Most of the time, it is a regular file path, or some
> > > /vsi path. But sometimes, it can be JSON content (the GeoJSON driver
> > > accepts the content to be directly provided as the dataset name), or
> > > XML (VRT, WMS drivers).
> > > We also have the subdataset syntax "HDF5:foo.hdf:my_variable"
> >
> > Could VRT XML and JSON be exempted? We already have a way to embed open
> > options in the XML.
>
> If the gdn: mechanism is a new possibility offered that doesn't exclude
> existing ones (otherwise that would be a pretty big breaking change), we
> could possibly exempt the odd cases I mentioned (or have some
> quoting/escaping rules to enable that payload to be seen as a file), which
> generally don't need a "permanent" way of referring to the dataset like
> gdn: would offer, since this is content often generated programmatically
> or retrieved dynamically.
>
> Covering subdatasets would be a more important use case. Something that
> would have to be decided is whether the way we express subdatasets would be
> somehow standardized or whether it would be a black-box string for the gdn:
> encapsulation.
> For a black-box approach, we would have to define some escaping/quoting
> rules to avoid any potential issue with separators of the gdn syntax. If we
> decide that the subdataset syntax is part of what is standardized by GDN,
> that would be a more challenging exercise, because the subdataset syntax
> varies from driver to driver.
>

The variation of subdataset syntax among drivers is a bug, let's try to fix
this.

It seems to me that the internet way to address subdatasets would be to use
a # URL fragment. But since most of our formats and the servers that serve
files of these formats are not aware, we may have to come up with something
different. We may need to consider making subdatasets a layer opening
option?

> Depending on how we design things, that might impact:
> - just GDALOpen() generic code if GDALOpen() decodes the gdn: string to
> decompose it into 'classic' dataset names and open options
> - all drivers if the gdn: string would be passed to each
> GDALDriver::pfnOpen() implementation
> - intermediate situation if we decide to drop (at least for future
> drivers) per-driver subdataset syntax (which has deficiencies, as the
> quoting rules to separate the filename from the non-filename component
> vary from driver to driver and are most of the time not defined) to come
> up with something more standardized
>
> To help brainstorming, a non-exhaustive overview of a few situations
> mixing driver prefixing, subdataset syntax and open options:
>
> gdalinfo my.tif
>

Yes. We have to handle bare paths to local dataset files.


> gdalinfo my.tif -oo GEOREF_SOURCES=WORLDFILE,PAM
>

Ideally this would be baked into the format, but, yes, I think we've got a
bead on dataset open options.


> gdalinfo GTIFF_DIR:0:d:\my.tif
>

WTF is this? :)


> gdalinfo EEDAI:my/asset
> gdalinfo EEDAI: -oo ASSET=my/asset
> gdalinfo EEDAI:my/asset:band1, band2
> gdalinfo EEDAI: -oo ASSET=my/asset -oo BANDS=band1,band2
>

Never seen these.


> gdalinfo BAG:"data/test_vr.bag":supergrid:0:1
>

DRIVER:"file":something

Right. This will require some work because of multiple colons. Though I've
never seen BAG driver data in the wild. Is this a real live format?
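For the quoted-filename convention (DRIVER:"file":something), a regular expression is enough as long as the filename itself contains no double quote, which is exactly the kind of escaping rule that is currently undefined. A hypothetical Python sketch of the common pattern, not GDAL's parser (each driver parses these names itself, with its own rules):

```python
import re

# DRIVER:"quoted filename":rest  (the quotes protect colons inside the filename)
_QUOTED = re.compile(r'^(?P<driver>[A-Za-z0-9_]+):"(?P<filename>[^"]*)":(?P<rest>.*)$')

def split_quoted_subdataset(name):
    """Return (driver, filename, rest), or None if the name doesn't match."""
    m = _QUOTED.match(name)
    if m is None:
        return None
    return m.group("driver"), m.group("filename"), m.group("rest")
```

Applied to the examples in this thread, it separates BAG:"data/test_vr.bag":supergrid:0:1 into ("BAG", "data/test_vr.bag", "supergrid:0:1"), and keeps the drive letter in HDF5:"d:\foo.he5"://HDFEOS/SWATHS/foo/bar inside the filename.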


> gdalinfo data/test_vr.bag -oo MODE=RESAMPLED_GRID -oo SUPERGRIDS_MASK=YES
> gdalinfo HDF5:"d:\foo.he5"://HDFEOS/SWATHS/foo/bar
>

HDF5 driver, filename using Windows drive, and UNC path within it. This is
marginal, right?


> gdalinfo netCDF:"/vsicurl/http://example.com/my.nc":my_var
>

This looks less complicated than some of the examples above.


> ogrinfo "PG:dbname=testdb user=foo"
> ogrinfo "mySQL:testdb,user=foo"
>

These seem like they could be driver specific, but generalized key-value
parameters.
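A sketch of what "generalized key-value parameters" could look like: two hypothetical Python parsers that reduce the PG and MySQL strings above to the same dict shape. Calling the MySQL database entry "dbname" is an assumption, and real PG connection strings also allow quoted values, which this ignores.

```python
def parse_pg(conn):
    """'PG:dbname=testdb user=foo' -> space-separated key=value pairs."""
    body = conn.split(":", 1)[1]
    return dict(item.split("=", 1) for item in body.split())

def parse_mysql(conn):
    """'mySQL:testdb,user=foo' -> database name first, then key=value pairs."""
    db, *rest = conn.split(":", 1)[1].split(",")
    params = {"dbname": db}  # key name chosen to match the PG form (assumption)
    params.update(item.split("=", 1) for item in rest)
    return params
```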


> ogrinfo OCI:warmerda/passw...@gdal800.dreadfest.com


Wat?


> GDALOpen() is not even aware that HDF5:bla means that the dataset will be
> recognized by the HDF5 driver
>
>
Wait what?

-- 
Sean Gillies

Re: [gdal-dev] Passing open options along dataset name in a string ?

2020-11-04 Thread Sean Gillies
Even,

On Wed, Nov 4, 2020 at 3:40 AM Even Rouault 
wrote:

> Sean,
>
> What does GDN stand for? GDAL Dataset Name?
>

Yes. I just made that up on the spot. Think of it as a GDAL or FOSS4G
specific namespace. Until now, GDAL has been using symbols like WFS: and
HDF5: in the global namespace, which causes problems for interoperability.


>
> > The URN or GDN version might look something like the thing below, using ?+
> > and ?= [3] to identify vsi and driver option sections
> >
> > gdn:curl:csv:example.com/foo.csv?a=1=2?+max_retry=5?=autodetect_type=yes_geom_columns=no
>
> The http or https protocol should be captured too.
>
> > Bringing a little more order to how we name and address datasets was on my
> > todo list at the start of the year, but then 2020 went into a spiral. I
> > don't think rasterio's "zip+s3" etc approach is the best.
>
> I see fsspec has a syntax for chaining filesystems in
> https://filesystem-spec.readthedocs.io/en/latest/features.html#url-chaining


Yes, I recognized this as being a different approach to a similar problem
we have in GDAL: data files that can be accessed with different layers of
optional protocols.


>
>
> Another particularity we have in GDAL is that the dataset name might be
> almost anything. Most of the time, it is a regular file path, or some /vsi
> path. But sometimes, it can be JSON content (the GeoJSON driver accepts the
> content to be directly provided as the dataset name), or XML (VRT, WMS
> drivers).
> We also have the subdataset syntax "HDF5:foo.hdf:my_variable"
>

Could VRT XML and JSON be exempted? We already have a way to embed open
options in the XML.

-- 
Sean Gillies

Re: [gdal-dev] Passing open options along dataset name in a string ?

2020-11-04 Thread Even Rouault
> > Another particularity we have in GDAL is that the dataset name might be
> > almost anything. Most of the time, it is a regular file path, or some /vsi
> > path. But sometimes, it can be JSON content (the GeoJSON driver accepts
> > the content to be directly provided as the dataset name), or XML (VRT, WMS
> > drivers).
> > We also have the subdataset syntax "HDF5:foo.hdf:my_variable"
> 
> Could VRT XML and JSON be exempted? We already have a way to embed open
> options in the XML.

If the gdn: mechanism is a new possibility offered that doesn't exclude
existing ones (otherwise that would be a pretty big breaking change), we could
possibly exempt the odd cases I mentioned (or have some quoting/escaping rules
to enable that payload to be seen as a file), which generally don't need a
"permanent" way of referring to the dataset like gdn: would offer, since this
is content often generated programmatically or retrieved dynamically.

Covering subdatasets would be a more important use case. Something that would
have to be decided is whether the way we express subdatasets would be somehow
standardized or whether it would be a black-box string for the gdn:
encapsulation.
For a black-box approach, we would have to define some escaping/quoting rules
to avoid any potential issue with separators of the gdn syntax. If we decide
that the subdataset syntax is part of what is standardized by GDN, that would
be a more challenging exercise, because the subdataset syntax varies from
driver to driver.
Depending on how we design things, that might impact:
- just GDALOpen() generic code if GDALOpen() decodes the gdn: string to
decompose it into 'classic' dataset names and open options
- all drivers if the gdn: string would be passed to each GDALDriver::pfnOpen()
implementation
- intermediate situation if we decide to drop (at least for future drivers)
per-driver subdataset syntax (which has deficiencies, as the quoting rules to
separate the filename from the non-filename component vary from driver to
driver and are most of the time not defined) to come up with something more
standardized

To help brainstorming, a non-exhaustive overview of a few situations mixing 
driver prefixing, subdataset syntax and open options:

gdalinfo my.tif
gdalinfo my.tif -oo GEOREF_SOURCES=WORLDFILE,PAM
gdalinfo GTIFF_DIR:0:d:\my.tif
gdalinfo EEDAI:my/asset
gdalinfo EEDAI: -oo ASSET=my/asset
gdalinfo EEDAI:my/asset:band1, band2
gdalinfo EEDAI: -oo ASSET=my/asset -oo BANDS=band1,band2
gdalinfo BAG:"data/test_vr.bag":supergrid:0:1
gdalinfo data/test_vr.bag -oo MODE=RESAMPLED_GRID -oo SUPERGRIDS_MASK=YES
gdalinfo HDF5:"d:\foo.he5"://HDFEOS/SWATHS/foo/bar
gdalinfo netCDF:"/vsicurl/http://example.com/my.nc":my_var
ogrinfo "PG:dbname=testdb user=foo"
ogrinfo "mySQL:testdb,user=foo"
ogrinfo OCI:warmerda/passw...@gdal800.dreadfest.com

GDALOpen() is not even aware that HDF5:bla means that the dataset will be 
recognized by the HDF5 driver

Even

-- 
Spatialys - Geospatial professional services
http://www.spatialys.com

Re: [gdal-dev] Passing open options along dataset name in a string ?

2020-11-04 Thread Even Rouault
Sean,

What does GDN stand for? GDAL Dataset Name?

> The URN or GDN version might look something like the thing below, using ?+
> and ?= [3] to identify vsi and driver option sections
> 
> gdn:curl:csv:example.com/foo.csv?a=1=2?+max_retry=5?=autodetect_type=yes_geom_columns=no

The http or https protocol should be captured too.

> Bringing a little more order to how we name and address datasets was on my
> todo list at the start of the year, but then 2020 went into a spiral. I
> don't think rasterio's "zip+s3" etc approach is the best.

I see fsspec has a syntax for chaining filesystems in
https://filesystem-spec.readthedocs.io/en/latest/features.html#url-chaining

Another particularity we have in GDAL is that the dataset name might be almost 
anything. Most of the time, it is a regular file path, or some /vsi path. But 
sometimes, it can be JSON content (the GeoJSON driver accepts the content to 
be directly provided as the dataset name), or XML (VRT, WMS drivers).
We also have the subdataset syntax "HDF5:foo.hdf:my_variable"

-- 
Spatialys - Geospatial professional services
http://www.spatialys.com

Re: [gdal-dev] Passing open options along dataset name in a string ?

2020-11-03 Thread Sean Gillies
Even,

On Mon, Nov 2, 2020 at 1:16 PM Even Rouault 
wrote:

> Sean,
>
> > We already have a way of passing "open" options for vsicurl:
> > https://gdal.org/user/virtual_file_systems.html#vsicurl-http-https-ftp-files-random-access.
> > What about reusing that conceptual framework and syntax?
> >
> > For example:
> >
> > "foo.csv?AUTODETECT_TYPE=YES&KEEP_GEOM_COLUMNS=NO"
>
> I actually considered that, but realized that things would get messy if
> you want to use that vsicurl syntax and open options...
>
> You would then have strings like
>
> /vsicurl?max_retry=5&url=http://example.com/foo.csv&AUTODETECT_TYPE=YES&KEEP_GEOM_COLUMNS=NO
>
> and the GDALOpen() logic would have to figure out what is the /vsicurl
> part and what is the open option part.
>
> Or we would have to URL-escape the
> "/vsicurl?max_retry=5&url=http://example.com/foo.csv" part
> to avoid using '?' and '&', like:
>
> /vsicurl%3Fmax_retry=5%26url=http://example.com/foo.csv?AUTODETECT_TYPE=YES&KEEP_GEOM_COLUMNS=NO
>
>
> Another issue is that we have connection strings like
> "WFS:http://example.com/wfs?SERVICE=WFS&VERSION=2.0.0" (or actually
> just the "/vsicurl?max_retry=5&url=http://example.com/foo.csv" string
> mentioned above).
> GDALOpen() would then misinterpret this as dataset name =
> "WFS:http://example.com/wfs"
> with open options SERVICE=WFS and VERSION=2.0.0
>

I see.

I wish our data formats were more standard and less slippery and didn't
need these open options. But it's true that some files are very different
without the proper combination of opening options and there's a benefit to
helping applications use the right combination.

I'm not a fan of the mix of JSON and not-JSON elements in the syntax you
proposed. I think a good solution for naming datasets and including all the
driver options and vsi options looks more like a URN [1] and I think we
should write a GDAL RFC to standardize it. I also think that we should get
some people outside of GDAL involved. Like folks from the Dask community,
who might share some lessons learned from writing fsspec [2].

The URN or GDN version might look something like the thing below, using ?+
and ?= [3] to identify vsi and driver option sections

gdn:curl:csv:example.com/foo.csv?a=1=2?+max_retry=5?=autodetect_type=yes_geom_columns=no
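Reading [3], the ?+ part is the RFC 8141 r-component and the ?= part the q-component, with # available for an f-component. A minimal Python sketch of splitting such a name, assuming a gdn: prefix and &-separated key=value pairs in both components (neither is settled). The test string used with it, gdn:curl:csv:example.com/foo.csv?+max_retry=5?=autodetect_type=yes&keep_geom_columns=no, is a simplified, made-up variant of the example above.

```python
from urllib.parse import parse_qsl

def split_gdn(name):
    """Split a hypothetical gdn: name into (NSS, vsi options, driver options, fragment)."""
    prefix = "gdn:"
    if not name.startswith(prefix):
        raise ValueError(f"not a gdn: name: {name!r}")
    rest, _, fragment = name[len(prefix):].partition("#")
    # RFC 8141 order is NSS, then ?+ (r-component), then ?= (q-component)
    rest, _, q_component = rest.partition("?=")
    nss, _, r_component = rest.partition("?+")
    # ?+ carries the vsi options, ?= the driver open options
    return nss, dict(parse_qsl(r_component)), dict(parse_qsl(q_component)), fragment
```

As Even points out in his reply, a real scheme would also need to capture whether the resource is reached over http or https.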

Bringing a little more order to how we name and address datasets was on my
todo list at the start of the year, but then 2020 went into a spiral. I
don't think rasterio's "zip+s3" etc approach is the best. We should start
from scratch and come up with something excellent and expressive and
broadly supported in GDAL, QGIS, rasterio, GeoPandas, GeoTrellis etc.

[1] https://en.wikipedia.org/wiki/Uniform_Resource_Name
[2] https://filesystem-spec.readthedocs.io/en/latest/index.html
[3] https://tools.ietf.org/html/rfc8141#section-2.3

-- 
Sean Gillies

Re: [gdal-dev] Passing open options along dataset name in a string ?

2020-11-02 Thread Sean Gillies
Hi Even,

On Mon, Nov 2, 2020 at 3:10 AM Even Rouault 
wrote:

> Hi,
>
> I've heard interest in having the capability of passing a GDAL dataset
> name and its open options in a single string, since this is easier for
> storing.
>
> The syntax could be a JSON serialized string prefixed by GDAL_JSON: to
> avoid any ambiguity with drivers that would accept JSON as a connection
> string/dataset name.
>
> So something like:
>
> GDAL_JSON:{"dataset":"foo.csv","open_options":["AUTODETECT_TYPE=YES",
> "KEEP_GEOM_COLUMNS=NO"]}
>
> An "allowed_drivers" member could also be added to reflect the
> corresponding argument of GDALOpenEx()
>
> GDALOpen()/GDALOpenEx() would parse this, and process it exactly as if
> it was called with the dataset name, open options and allowed drivers put
> in the dedicated C arguments. So no change in drivers, just in
> GDALOpenEx().
>
> If using that syntax, it wouldn't make sense to have both serialized
> options and options passed as C arguments together, so a warning would be
> emitted if that happened.
>
> Thoughts?
>
> Even
>

We already have a way of passing "open" options for vsicurl:
https://gdal.org/user/virtual_file_systems.html#vsicurl-http-https-ftp-files-random-access.
What about reusing that conceptual framework and syntax?

For example:

"foo.csv?AUTODETECT_TYPE=YES&KEEP_GEOM_COLUMNS=NO"

-- 
Sean Gillies

Re: [gdal-dev] Passing open options along dataset name in a string ?

2020-11-02 Thread Even Rouault
Sean,

> We already have a way of passing "open" options for vsicurl:
> https://gdal.org/user/virtual_file_systems.html#vsicurl-http-https-ftp-files-random-access.
> What about reusing that conceptual framework and syntax?
> 
> For example:
> 
> "foo.csv?AUTODETECT_TYPE=YES&KEEP_GEOM_COLUMNS=NO"

I actually considered that, but realized that things would get messy if you want
to use that vsicurl syntax and open options...

You would then have strings like

/vsicurl?max_retry=5&url=http://example.com/foo.csv&AUTODETECT_TYPE=YES&KEEP_GEOM_COLUMNS=NO

and the GDALOpen() logic would have to figure out what is the /vsicurl part and
what is the open option part.

Or we would have to URL-escape the
"/vsicurl?max_retry=5&url=http://example.com/foo.csv" part
to avoid using '?' and '&', like:

/vsicurl%3Fmax_retry=5%26url=http://example.com/foo.csv?AUTODETECT_TYPE=YES&KEEP_GEOM_COLUMNS=NO
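The escaping route can be sketched with Python's urllib.parse: percent-encode the /vsicurl part (keeping '/', ':' and '=' readable, as in the example above), then append the open options after a plain '?'. encode_name() and decode_name() are hypothetical helpers; they still break if an open option value contains '&', so they move the escaping problem rather than remove it.

```python
from urllib.parse import quote, unquote

def encode_name(vsi_part, options):
    # Percent-encode '?' and '&' (among others) in the /vsicurl part;
    # '/', ':' and '=' are left readable
    escaped = quote(vsi_part, safe="/:=")
    return escaped + "?" + "&".join(options)

def decode_name(name):
    # The escaped part contains no literal '?', so the first '?' is the separator
    escaped, _, query = name.partition("?")
    return unquote(escaped), query.split("&") if query else []
```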


Another issue is that we have connection strings like
"WFS:http://example.com/wfs?SERVICE=WFS&VERSION=2.0.0" (or actually
just the "/vsicurl?max_retry=5&url=http://example.com/foo.csv" string mentioned
above).
GDALOpen() would then misinterpret this as dataset name =
"WFS:http://example.com/wfs"
with open options SERVICE=WFS and VERSION=2.0.0
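Here is that misreading in a few lines of Python, using a hypothetical naive_split() that applies the "everything after the first '?' is open options" rule from the foo.csv proposal. It works for the plain file but turns the WFS URL's own query string into bogus open options:

```python
def naive_split(name):
    """Naive rule: everything after the first '?' is '&'-separated open options."""
    dataset, sep, query = name.partition("?")
    return dataset, (query.split("&") if sep else [])

# Works as intended for a plain file...
print(naive_split("foo.csv?AUTODETECT_TYPE=YES&KEEP_GEOM_COLUMNS=NO"))
# ...but the WFS service URL's own query is mistaken for open options
print(naive_split("WFS:http://example.com/wfs?SERVICE=WFS&VERSION=2.0.0"))
```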

-- 
Spatialys - Geospatial professional services
http://www.spatialys.com

[gdal-dev] Passing open options along dataset name in a string ?

2020-11-02 Thread Even Rouault
Hi,

I've heard interest in having the capability of passing a GDAL dataset name 
and its open options in a single string, since this is easier for storing.

The syntax could be a JSON serialized string prefixed by GDAL_JSON: to avoid 
any ambiguity with drivers that would accept JSON as a connection string/
dataset name.

So something like:

GDAL_JSON:{"dataset":"foo.csv","open_options":["AUTODETECT_TYPE=YES", 
"KEEP_GEOM_COLUMNS=NO"]}

An "allowed_drivers" member could also be added to reflect the corresponding
argument of GDALOpenEx()

GDALOpen()/GDALOpenEx() would parse this, and process that exactly as if it 
was called with the dataset name, open options and allowed drivers put in the 
dedicated C arguments. So no change in drivers, just in GDALOpenEx().

If using that syntax, it wouldn't make sense to have both serialized options
and options passed as C arguments together, so a warning would be emitted if
that happened.
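A minimal Python sketch of the GDALOpenEx()-side step described above: if the name starts with GDAL_JSON:, decode it into the three arguments; otherwise pass everything through. resolve_open_args() is hypothetical, and letting the serialized options win over explicit ones (after the warning) is an assumption, since the proposal only says a warning would be emitted.

```python
import json
import warnings

PREFIX = "GDAL_JSON:"

def resolve_open_args(name, open_options=None, allowed_drivers=None):
    """Return (dataset, open_options, allowed_drivers) as the open call would use them."""
    if not name.startswith(PREFIX):
        # Classic call: nothing to decode
        return name, list(open_options or []), list(allowed_drivers or [])
    if open_options or allowed_drivers:
        # Assumption: warn, then ignore the explicit C-style arguments
        warnings.warn("GDAL_JSON: name combined with explicit options; "
                      "using the serialized ones")
    spec = json.loads(name[len(PREFIX):])
    return (spec["dataset"],
            list(spec.get("open_options", [])),
            list(spec.get("allowed_drivers", [])))
```

One benefit of JSON here is that the "dataset" member can contain colons, '?' or '&' without any extra escaping rules of our own.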

Thoughts ?

Even

-- 
Spatialys - Geospatial professional services
http://www.spatialys.com