Re: [gdal-dev] HDF5 and identified fields / primary dimension

2024-04-03 Thread Michael Sumner via gdal-dev
> For that particular file, I see that the "feature_id" variable
> (corresponding to the "feature_id" dimension) has a cf_role =
> "timeseries_id" attribute, and that the global metadata has a
> featureType = "timeSeries" attribute. So given
> <https://cfconventions.org/Data/cf-conventions/cf-conventions-1.11/cf-conventions.html#coordinates-metadata>,
> this seems to be relatively standardized, and in that case the
> heuristics could be improved to recognize that the main dimension is
> feature_id (probably with a test that the size of the time dimension is
> 1).  As far as I can see/remember, the vector layer support in netCDF
> was originally developed for the featureType=point and profile use
> cases, so some tuning for timeseries isn't unexpected.
>
>
Thanks! I've made *some* progress; this is the deepest I've been into that
file so far ... I hope to be able to craft these suggestions at some point.

Cheers, Mike



> Even
>
> --
> http://www.spatialys.com
> My software is free, but my time generally not.
>
>

-- 
Michael Sumner
Software and Database Engineer
Australian Antarctic Division
Hobart, Australia
e-mail: mdsum...@gmail.com
___
gdal-dev mailing list
gdal-dev@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/gdal-dev


Re: [gdal-dev] HDF5 and identified fields / primary dimension

2024-04-01 Thread Even Rouault via gdal-dev

Michael,


> Warning 1: The dataset has several variables that could be identified
> as vector fields, but not all share the same primary dimension.
> Consequently they will be ignored.

Yes, the driver is super conservative/picky when trying to recognize a
netCDF file as a vector layer, and its heuristics will return an error
if there is any ambiguity.


> I've seen similar cases in other files. I presume the driver could be
> updated to 1) choose the primary dimension and read the values while
> ignoring the others, 2) user-specify the dimension to include, or 3)
> user-specify the fields to exclude

I guess option 2 could be reasonable as an open option.

For that particular file, I see that the "feature_id" variable 
(corresponding to the "feature_id" dimension) has a cf_role = 
"timeseries_id" attribute, and that the global metadata has a 
featureType = "timeSeries" attribute. So given
<https://cfconventions.org/Data/cf-conventions/cf-conventions-1.11/cf-conventions.html#coordinates-metadata>,
this seems to be relatively standardized, and in that case the
heuristics could be improved to recognize that the main dimension is
feature_id (probably with a test that the size of the time dimension is
1).  As far as I can see/remember, the vector layer support in netCDF
was originally developed for the featureType=point and profile use
cases, so some tuning for timeseries isn't unexpected.
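
As a rough illustration of that CF-based check, here is a minimal sketch in
plain Python. It is not GDAL driver code: the function name and the
dictionary layout used to describe the file are assumptions made for this
example, and the metadata only mirrors what is described in this thread.

```python
from typing import Optional


def cf_timeseries_dimension(global_attrs: dict, variables: dict) -> Optional[str]:
    """Return the instance dimension of a CF timeSeries file, or None.

    `variables` maps a variable name to a dict with two keys (a layout
    assumed for this sketch): "dims", a tuple of dimension names, and
    "attrs", a dict of the variable's attributes.
    """
    # CF treats featureType values as case-insensitive.
    if str(global_attrs.get("featureType", "")).lower() != "timeseries":
        return None
    for var in variables.values():
        # The variable tagged cf_role = "timeseries_id" identifies each
        # feature; its single dimension is taken as the primary one.
        if var["attrs"].get("cf_role") == "timeseries_id" and len(var["dims"]) == 1:
            return var["dims"][0]
    return None


# Hypothetical metadata shaped like the file discussed in this thread.
variables = {
    "feature_id": {"dims": ("feature_id",), "attrs": {"cf_role": "timeseries_id"}},
    "time": {"dims": ("time",), "attrs": {}},
    "elevation": {"dims": ("feature_id",), "attrs": {}},
}
primary = cf_timeseries_dimension({"featureType": "timeSeries"}, variables)
```

Returning None when either attribute is missing keeps the check conservative,
so files that do not follow the convention fall through to other heuristics.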


Or maybe, if the set of dimensions contains only one with > 1 sample and
the other ones all have size 1, consider only the one with > 1 sample.
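
That size-based fallback could look roughly like the sketch below. Again
this is plain Python with a hypothetical function name, not the driver's
actual code; the dimension sizes are the ones reported in this thread.

```python
from typing import Optional


def sole_non_degenerate_dimension(dim_sizes: dict) -> Optional[str]:
    """Return the only dimension with more than one sample, or None.

    `dim_sizes` maps dimension name -> size. None is returned when zero or
    several dimensions have size > 1, i.e. when the choice is ambiguous.
    """
    candidates = [name for name, size in dim_sizes.items() if size > 1]
    return candidates[0] if len(candidates) == 1 else None


# Sizes as described in this thread: one long dimension, one of length 1.
dim_sizes = {"feature_id": 2729077, "time": 1}
primary = sole_non_degenerate_dimension(dim_sizes)
```

With two or more dimensions of size > 1 the function returns None, which
matches the current conservative behaviour of refusing to guess.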


Even

--
http://www.spatialys.com
My software is free, but my time generally not.



Re: [gdal-dev] HDF5 and identified fields / primary dimension

2024-04-01 Thread Michael Sumner via gdal-dev
Well, actually, I think what I'm asking for is the intended behaviour, but
there's an error.

Is it meant to detect sets of variables on 1D dimensions and present them
as layers? That's what would make sense to me.
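
To make that idea concrete, here is a minimal sketch of grouping 1D
variables by the dimension they are defined on, one candidate layer per
dimension. It is plain Python, not a description of what the driver does;
the function name is hypothetical and the variable names are the ones
mentioned in this thread.

```python
from collections import defaultdict


def group_1d_variables(var_dims: dict) -> dict:
    """Group strictly 1D variables by their dimension.

    `var_dims` maps variable name -> tuple of dimension names. Variables
    with zero or several dimensions are skipped.
    """
    layers = defaultdict(list)
    for name, dims in var_dims.items():
        if len(dims) == 1:
            layers[dims[0]].append(name)
    return dict(layers)


# Variable names taken from the file described in this thread.
var_dims = {
    "elevation": ("feature_id",),
    "longitude": ("feature_id",),
    "latitude": ("feature_id",),
    "time": ("time",),
    "reference_time": ("time",),
}
layers = group_1d_variables(var_dims)
```

Here `layers` has one entry for `feature_id` and one for `time`, so each
dimension could in principle be presented as its own table.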

Still exploring.

Cheers, Mike



On Tue, Apr 2, 2024 at 5:36 AM Michael Sumner wrote:

> This source has an array on 'feature_id' with 2729077 values, with various
> fields
>
> elevation, longitude, latitude, qBtmVertRunoff, qBucket, etc
>
>
> '/vsis3/noaa-nwm-retro-v2.0-pds/full_physics/2017/20170401.CHRTOUT_DOMAIN1.comp'
>
> It is accessible via the mdim api.
>
> Structurally it is basically a table with one row per feature_id and one
> column per field, but it also has a length-1 pair of fields "time" and
> "reference_time" defined on dimension time; this is like a single time step
> per file (like an unlimited dimension in the classic 2D case).
>
> Accessing with the vector API reports that it can't treat this as a table
> because of those time values that don't match the feature_id dimension:
>
> ogrinfo NETCDF:'/vsis3/noaa-nwm-retro-v2.0-pds/full_physics/2017/20170401.CHRTOUT_DOMAIN1.comp' -ro
>
> Warning 1: The dataset has several variables that could be identified as
> vector fields, but not all share the same primary dimension. Consequently
> they will be ignored.
>
> I've seen similar cases in other files. I presume the driver could be
> updated to 1) choose the primary dimension and read the values while
> ignoring the others, 2) user-specify the dimension to include, or 3)
> user-specify the fields to exclude
>
> So:
>
> - is there a workaround to enable the vector driver to focus on the
> primary dimension?
> - would a PR along those lines have to consider greater difficulties than
> applying the proposed updates to arrays using the primary dimension only?
>  I'd only consider this for strictly 1D arrays.
> - degenerate dimensions could be used to copy-out the value of the other
> dims (I'd consider this an optional extra)
>
> (It's a bit special-case-y, you wouldn't want to go to multi-arrays and
> have them flatten out multi-dims in a general way, I think, but degenerate
> dimensions might be worth consideration.)
>
> Appreciate any thoughts, thanks! I'd quite like to have the
> vector-approach work as well as the mdim approach, I think they are nicely
> complementary and provide different pros and cons.
>
> Cheers, Mike
>
> --
> Michael Sumner
> Software and Database Engineer
> Australian Antarctic Division
> Hobart, Australia
> e-mail: mdsum...@gmail.com
>


-- 
Michael Sumner
Software and Database Engineer
Australian Antarctic Division
Hobart, Australia
e-mail: mdsum...@gmail.com