Re: [CF-metadata] Feedback requested on proposed CF Simple Geometries

2017-02-02 Thread David Blodgett
Dear Jonathan,

As I mentioned in my response yesterday, we have worked through these issues 
and think we have a compromise proposal for the community. 

Since the conversation is active, I’ll go ahead and share our work with the 
list in a follow up email in a moment.

A couple specific responses to your note below:

There is a balance to be found between “opaque” and “transparent” encoding of 
geometries (or ragged arrays for that matter). “Transparent” tends to require 
more dimensions and really breaks the geometries apart while “opaque” starts to 
impinge on the human readability and self-describing ideals of CF. We feel that 
a middle way is available to us and I’ll outline the logic for that in my 
follow up.

Regarding the indexed array: the approach we will propose (node sharing with 
indexed array notation) seems good for a couple of reasons. First, it allows 
data to be topologically intact without the need for node comparison. This is 
very important for some applications and we feel it should be possible in the 
encoding. Second, it is in line with the approach taken by UGRID and will be 
familiar to some for that reason. The argument around storage volume can go 
either way. You are right that in cases where people don’t have shared nodes, 
the index will take extra space. But I’m not convinced this is a factor that 
would change the decision based on the first two considerations.

Regards,

- Dave

> On Feb 2, 2017, at 3:31 AM, Jonathan Gregory wrote:
> 
> Dear Ben and Chris
> 
> Following Chris's comment about preferring variables to multi-valued
> attributes, here are the examples for linestring and multipolygon redone so
> that both use variables to store the counts of parts and nodes. In this scheme
> more variables and dimensions are needed, but it may be easier to read and it
> is more CF-like, because the topology information is a "container" variable,
> like the CF grid_mapping and the ugrid mesh topology, having no numerical
> information in itself, just with attributes that point to variables.
> 
>   dimensions:
>     station = 3; // stream segments
>     time = UNLIMITED;
>     node = 9; // = 2 + 4 + 3
>   variables:
>     float flow(station,time) ;
>       flow:units="m3 s-1";
>       flow:topology="SOMETHING";
>     double time(time) ;
>       time:standard_name = "time";
>       time:units = "days since 1970-01-01 00:00:00" ;
>     char SOMETHING;
>       SOMETHING:node_coordinates="lon lat";
>       SOMETHING:node_count="node_count_var";
>       SOMETHING:topology_type="linestring";
>     int node_count_var(station);  // number of nodes for each linestring
>     float lon(node) ;
>       lon:standard_name = "longitude";
>       lon:units = "degrees_east";
>     float lat(node) ;
>       lat:standard_name = "latitude";
>       lat:units = "degrees_north" ;
>   data:
>     node_count_var=2, 4, 3;
>     lon=0, 1,  0, -1, -2, -3,  2, 3, 4;
>     lat=51, 52,  51, 50, 50, 49,  55, 55, 56;
> 
>   dimensions:
>     station = 3; // collections of polygons
>     time = UNLIMITED;
>     node = 24; // = 4 + 3 + 3 + 3 + 5 + 3 + 3
>     part = 7 ; // = 3 + 2 + 2
>   variables:
>     float flow(station,time) ;
>       flow:units="m3 s-1";
>       flow:topology="SOMETHING";
>     double time(time) ;
>       time:standard_name = "time";
>       time:units = "days since 1970-01-01 00:00:00" ;
>     char SOMETHING;
>       SOMETHING:node_coordinates="lon lat";
>       SOMETHING:node_count="node_count_var";
>       SOMETHING:part_count="part_count_var";
>       SOMETHING:topology_type="multipolygon";
>     int node_count_var(part); // number of nodes in each polygon
>     int part_count_var(station); // number of polygons in each collection
>     float lon(node) ;
>       lon:standard_name = "longitude";
>       lon:units = "degrees_east";
>     float lat(node) ;
>       lat:standard_name = "latitude";
>       lat:units = "degrees_north" ;
>   data:
>     node_count_var=4, 3, 3, 3, 5, 3, 3;
>     part_count_var=3, 2, 2;
>     lon=0, 20, 20, 0, ... // first polygon, etc. ...
>     lat=0, 0, 20, 20, ...
> 
> Also, two more thoughts regarding not using indirection, but instead
> duplicating coincident coordinate values:
> 
> * Doing it this way (without indexing) is consistent with ordinary CF bounds.
> Contiguous cells in 1D have bounds with equal values. Thus N cells have 2N
> bounds, although usually only N+1 distinct values of bounds. There are several
> reasons why we made this choice, one being that it's more flexible, in allowing
> non-contiguous and overlapping cells.
> 
> * The indexing itself takes space. If you have N (lon,lat) points which are all
> boundaries between two regions, so they're all used twice, you will have 4N
> coordinate values without indexing. With indexing you will have only 2N, but
> the index takes N, making 3N in total. Thus you save 25% of the space, not 50%.
> 
> Best wishes
> 
> Jonathan

Re: [CF-metadata] Feedback requested on proposed CF Simple Geometries

2017-02-02 Thread Jonathan Gregory
Dear Ben and Chris

Following Chris's comment about preferring variables to multi-valued
attributes, here are the examples for linestring and multipolygon redone so
that both use variables to store the counts of parts and nodes. In this scheme
more variables and dimensions are needed, but it may be easier to read and it
is more CF-like, because the topology information is a "container" variable,
like the CF grid_mapping and the ugrid mesh topology, having no numerical
information in itself, just with attributes that point to variables.

  dimensions:
    station = 3; // stream segments
    time = UNLIMITED;
    node = 9; // = 2 + 4 + 3
  variables:
    float flow(station,time) ;
      flow:units="m3 s-1";
      flow:topology="SOMETHING";
    double time(time) ;
      time:standard_name = "time";
      time:units = "days since 1970-01-01 00:00:00" ;
    char SOMETHING;
      SOMETHING:node_coordinates="lon lat";
      SOMETHING:node_count="node_count_var";
      SOMETHING:topology_type="linestring";
    int node_count_var(station);  // number of nodes for each linestring
    float lon(node) ;
      lon:standard_name = "longitude";
      lon:units = "degrees_east";
    float lat(node) ;
      lat:standard_name = "latitude";
      lat:units = "degrees_north" ;
  data:
    node_count_var=2, 4, 3;
    lon=0, 1,  0, -1, -2, -3,  2, 3, 4;
    lat=51, 52,  51, 50, 50, 49,  55, 55, 56;

  dimensions:
    station = 3; // collections of polygons
    time = UNLIMITED;
    node = 24; // = 4 + 3 + 3 + 3 + 5 + 3 + 3
    part = 7 ; // = 3 + 2 + 2
  variables:
    float flow(station,time) ;
      flow:units="m3 s-1";
      flow:topology="SOMETHING";
    double time(time) ;
      time:standard_name = "time";
      time:units = "days since 1970-01-01 00:00:00" ;
    char SOMETHING;
      SOMETHING:node_coordinates="lon lat";
      SOMETHING:node_count="node_count_var";
      SOMETHING:part_count="part_count_var";
      SOMETHING:topology_type="multipolygon";
    int node_count_var(part); // number of nodes in each polygon
    int part_count_var(station); // number of polygons in each collection
    float lon(node) ;
      lon:standard_name = "longitude";
      lon:units = "degrees_east";
    float lat(node) ;
      lat:standard_name = "latitude";
      lat:units = "degrees_north" ;
  data:
    node_count_var=4, 3, 3, 3, 5, 3, 3;
    part_count_var=3, 2, 2;
    lon=0, 20, 20, 0, ... // first polygon, etc. ...
    lat=0, 0, 20, 20, ...
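
As a rough illustration of how a reader might unpack the two count variables
in the multipolygon example, the Python sketch below turns part_count_var and
node_count_var into node index ranges for each station. Only the counts come
from the example; the helper names (offsets, node_slices) are made up for this
note and are not part of any proposal. The coordinates are not used, because
the example elides most of them.

```python
from itertools import accumulate

# Counts copied from the multipolygon example.
node_count_var = [4, 3, 3, 3, 5, 3, 3]  # nodes in each polygon (part)
part_count_var = [3, 2, 2]              # polygons in each collection (station)


def offsets(counts):
    """Turn per-row counts into start offsets, with the grand total appended."""
    return [0, *accumulate(counts)]


def node_slices(part_counts, node_counts):
    """For each station, list the (start, stop) node range of each polygon."""
    part_off = offsets(part_counts)
    node_off = offsets(node_counts)
    if part_off[-1] != len(node_counts):
        raise ValueError("part counts do not add up to the number of parts")
    return [
        [(node_off[p], node_off[p + 1])
         for p in range(part_off[s], part_off[s + 1])]
        for s in range(len(part_counts))
    ]


slices = node_slices(part_count_var, node_count_var)
for station, polygons in enumerate(slices):
    print(station, polygons)
```

With these counts, the first station owns node ranges (0, 4), (4, 7) and
(7, 10), so lon[0:4] and lat[0:4] would hold its first polygon.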

Also, two more thoughts regarding not using indirection, but instead
duplicating coincident coordinate values:

* Doing it this way (without indexing) is consistent with ordinary CF bounds.
Contiguous cells in 1D have bounds with equal values. Thus N cells have 2N
bounds, although usually only N+1 distinct values of bounds. There are several
reasons why we made this choice, one being that it's more flexible, in allowing
non-contiguous and overlapping cells.

* The indexing itself takes space. If you have N (lon,lat) points which are all
boundaries between two regions, so they're all used twice, you will have 4N
coordinate values without indexing. With indexing you will have only 2N, but
the index takes N, making 3N in total. Thus you save 25% of the space, not 50%.
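
The bounds arithmetic in the first point can be checked with a small Python
sketch. The edge values and the helper name contiguous_bounds are hypothetical
and only serve to show that N contiguous cells store 2N bound values of which
N+1 are distinct, and that the same layout also accommodates gaps and overlaps.

```python
def contiguous_bounds(edges):
    """Build per-cell (lower, upper) bounds from N+1 cell edges."""
    return [(edges[i], edges[i + 1]) for i in range(len(edges) - 1)]


edges = [0.0, 10.0, 20.0, 30.0]          # hypothetical edges for N = 3 cells
bounds = contiguous_bounds(edges)
stored = [value for pair in bounds for value in pair]

# N cells, 2N stored values, N+1 distinct values
print(len(bounds), len(stored), len(set(stored)))

# The same (cell, 2) layout can also hold cells that overlap or leave gaps,
# which a shared list of N+1 edges could not express.
irregular = [(0.0, 12.0), (10.0, 20.0), (25.0, 30.0)]
print(len(irregular), len({value for pair in irregular for value in pair}))
```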

Best wishes

Jonathan
___
CF-metadata mailing list
CF-metadata@cgd.ucar.edu
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata


Re: [CF-metadata] Feedback requested on proposed CF Simple Geometries

2017-02-01 Thread David Blodgett
Dear Jonathan and Chris, 

Thanks for bringing this thread back to life! Please don’t take silence on the 
part of Ben and me as a lack of activity. 

We have been working on a thorough proposal and are hoping to share it with the 
community very soon.

Chris, I think you will find a number of things in our proposal to your liking. 
We have attempted to reconcile a number of issues you brought up with our 
original proposal and the ideas that Jonathan shared. 

We are working through one last issue (what to do with the old “point” feature 
type now that we have geometries that are a superset). Once we have some 
finality on that, I will be circulating a proposal.

Regards.

- Dave
 
> On Feb 1, 2017, at 11:00 AM, Jonathan Gregory wrote:
> 
> Dear Chris
> 
>> I really don't like storing info like this in an attribute -- I think it
>> should be another variable, instead. It is a bit tricky with "nested" data
>> like this, but you can link variables together with something like:
>> 
>>int SOMETHING(station); // number of polygons in each collection
>>  SOMETHING:node_coordinates="lon lat";
>>  SOMETHING:geometry_type="multipolygon";
>>  SOMETHING:node_count="node_count_1"
>>int node_count_1(num_nodes);
>> 
>> ...
>> data
>>node_count_1 = 4, 3, 3, 3, 5, 3, 3;
> 
> Yes, I thought of doing it that way too: that is, use a string attribute to
> name a vector integer variable, rather than using a vector integer attribute.
> This/your way is more consistent with CF in general, where we have few vector
> attributes, and none with variable dimension. So I actually prefer it. I didn't
> do it that way because I thought it looked simpler with an attribute. But I
> don't mind.
> 
>>> Thus I
>>> have combined the two variables I suggested last time (number_of_parts and
>>> number_of_nodes) into SOMETHING.
>>> 
>> I think we should come up with a better name here -- it would help me parse
>> it anyway :-)
> 
> Indeed. :-) SOMETHING is just the variable name, not the term for this kind
> of variable. It might be called a topology variable, for instance.
> 
> Speaking of that, I wonder whether topology_type is a better name than
> geometry_type for the specification as points, lines or polygons. That is
> topological information.
> 
> Best wishes
> 
> Jonathan


Re: [CF-metadata] Feedback requested on proposed CF Simple Geometries

2017-02-01 Thread Jonathan Gregory
Dear Chris

> I really don't like storing info like this in an attribute -- I think it
> should be another variable, instead. It is a bit tricky with "nested" data
> like this, but you can link variables together with something like:
> 
> int SOMETHING(station); // number of polygons in each collection
>   SOMETHING:node_coordinates="lon lat";
>   SOMETHING:geometry_type="multipolygon";
>   SOMETHING:node_count="node_count_1"
> int node_count_1(num_nodes);
> 
> ...
> data
> node_count_1 = 4, 3, 3, 3, 5, 3, 3;

Yes, I thought of doing it that way too: that is, use a string attribute to
name a vector integer variable, rather than using a vector integer attribute.
This/your way is more consistent with CF in general, where we have few vector
attributes, and none with variable dimension. So I actually prefer it. I didn't
do it that way because I thought it looked simpler with an attribute. But I
don't mind.

> > Thus I
> > have combined the two variables I suggested last time (number_of_parts and
> > number_of_nodes) into SOMETHING.
> >
> I think we should come up with a better name here -- it would help me parse
> it anyway :-)

Indeed. :-) SOMETHING is just the variable name, not the term for this kind
of variable. It might be called a topology variable, for instance.

Speaking of that, I wonder whether topology_type is a better name than
geometry_type for the specification as points, lines or polygons. That is
topological information.

Best wishes

Jonathan


Re: [CF-metadata] Feedback requested on proposed CF Simple Geometries

2017-02-01 Thread Chris Barker
My CDL-reading was off a bit yesterday, so:

On Tue, Jan 31, 2017 at 1:22 AM, Jonathan Gregory wrote:


> So, for example, we could
> store three timeseries, each applying to a collection of polygons, like
> this:
>
>   dimensions:
> station = 3; // collections of polygons
> time = UNLIMITED;
> node = 24; // = 4 + 3 + 3 + 3 + 5 + 3 + 3
>   variables:
> float flow(station,time) ;
>   flow:units="m3 s-1";
>   flow:topology="SOMETHING";
> double time(time) ;
>   time:standard_name = "time";
>   time:units = "days since 1970-01-01 00:00:00" ;
> int SOMETHING(station); // number of polygons in each collection
>   SOMETHING:node_coordinates="lon lat";
>   SOMETHING:geometry_type="multipolygon";
>   SOMETHING:nodes=4, 3, 3, 3, 5, 3, 3; // number of nodes in each
> polygon
>

I really don't like storing info like this in an attribute -- I think it
should be another variable, instead. It is a bit tricky with "nested" data
like this, but you can link variables together with something like:

int SOMETHING(station); // number of polygons in each collection
  SOMETHING:node_coordinates="lon lat";
  SOMETHING:geometry_type="multipolygon";
  SOMETHING:node_count="node_count_1";
int node_count_1(num_nodes);

...
data
node_count_1 = 4, 3, 3, 3, 5, 3, 3;

> Thus I
> have combined the two variables I suggested last time (number_of_parts and
> number_of_nodes) into SOMETHING.
>

I think we should come up with a better name here -- it would help me parse
it anyway :-)

-CHB


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [CF-metadata] Feedback requested on proposed CF Simple Geometries

2017-02-01 Thread Jonathan Gregory
Dear Chris

Thanks for your comments on my comments. Here are replies to a subset!

> > Your aim is to
> > describe the network alone.
...
>  > You would like to have SOMETHING alone in the file, just to
> > describe the network itself. CF doesn't do this at present (domain without
> > data),
> 
> I don't see a conflict here -- if you can describe the network (geometry)
> then you can associate data with it (UGRID used indexes into cells, nodes,
> etc, this should be equally applicable)
> 
> doesn't a set of coordinate variables essentially do that? i.e. you can
> define a rectangular grid -- even if there is no data on it. And you can
> certainly do that with UGRID, which is another standard, but I don't think
> it conflicts with CF.

There isn't a conflict, I agree, but it's not currently possible in CF. That
is because the data variable has all the coordinates attached to it, so you
can't have coordinates without data. Of course it could easily be done, for
instance by providing a dummy data variable which identified the dimensions
but was itself a scalar - that's been discussed before, but no-one's proposed
yet to add it to the convention. It's not a conceptual difficulty, but it is
an addition to the data model.

> >   data:
> > SOMETHING=2, 4, 3;
> > lon=0, 1,  0, -1, -2, -3,  2, 3, 4;
> > lat=51, 52,  51, 50, 50, 49,  55, 55, 56;
> I'm confused about what this is.

There are three linestrings. The SOMETHING variable says how many nodes each
has, and the lon and lat variables are the coordinates of those nodes.

> > For the sake of applications which can
> > read CF but don't understand simple geometries, it might be a good idea in
> > addition to provide a "representative" location for each timeseries, as
> > representive_lat(station) and representative_lon(station), which could for
> > instance be the mean of the node coordinates for each geometry.
> 
> We do that in UGRID, too -- I think it's even required (and called
> coordinates, actually). It may make little sense with complex geometries,
> but it can be handy.

Yes. It is required in CF as well, and the attribute is named coordinates;
I think ugrid follows this.

> The stream network example would be a good one. also things like political
> boundaries -- they tend to be complex polygons with shared vertices.

There's a shared vertex at the confluence of two streams, but I guess those
are a fairly small fraction of the total number of points. With political
boundaries, I agree that most points (not coastlines) will appear twice.

Best wishes

Jonathan


Re: [CF-metadata] Feedback requested on proposed CF Simple Geometries

2017-01-31 Thread Chris Barker
A couple quick comments:

I think we're close here, so that's good. I'm not that clear on where there
are decisions left to be made, but I'll highlight two:

...

> Your aim is to
> describe the network alone.
>
...

>  a collection of timeseries is stored as a
> data variable with a single dimension of time and a single dimension of
> space.
>

I don't see a conflict here -- if you can describe the network (geometry)
then you can associate data with it (UGRID used indexes into cells, nodes,
etc, this should be equally applicable)

> You would like to have SOMETHING alone in the file, just to
> describe the network itself. CF doesn't do this at present (domain without
> data),


Doesn't a set of coordinate variables essentially do that? i.e. you can
define a rectangular grid -- even if there is no data on it. And you can
certainly do that with UGRID, which is another standard, but I don't think
it conflicts with CF.


> Taking your previous comments into account (I'll come back to them below), as
> a modified version of what I suggested before, here's a possible way to handle
> this case, for a small number (3) of linestrings:
>

That looks good to me, I think...


>
>   data:
> SOMETHING=2, 4, 3;
> lon=0, 1,  0, -1, -2, -3,  2, 3, 4;
> lat=51, 52,  51, 50, 50, 49,  55, 55, 56;
>

I'm confused about what this is.

> These simple geometries can be regarded as a more complex alternative to cell
> bounds - each timeseries has a complicated geometry of nodes and lines, but
> logically it's still a single "cell".


yup.


> For the sake of applications which can
> read CF but don't understand simple geometries, it might be a good idea in
> addition to provide a "representative" location for each timeseries, as
> representive_lat(station) and representative_lon(station), which could for
> instance be the mean of the node coordinates for each geometry.


We do that in UGRID, too -- I think it's even required (and called
coordinates, actually). It may make little sense with complex geometries,
but it can be handy.

> You propose the index variable in order for the convention to be like
> ugrid. However this still seems to me to be an unnecessary complexity and
> use of space if you aren’t going to have many shared nodes.



> To be frank, I'm not convinced by either argument. Regarding the first, in your
> example you don't reuse any points at all. Can you give an example where there
> is a lot of reuse?


The stream network example would be a good one. also things like political
boundaries -- they tend to be complex polygons with shared vertices.


> Regarding the second, I agree that it is a nuisance and
> unreliable to have to make comparisons with tolerance between floating-point
> numbers to determine equality. However, when you write a file, I suppose you
> can and would write exactly the same numbers for the coordinates of a node if
> it appears several times, wouldn't you? Thus the coincidence of nodes can be
> tested by *exact* equality of coordinates - no tolerance needed.
>

You still don't know for sure whether the vertices are the SAME or whether
they just happen to be the same.

This is a tough one -- the "normal" GIS data model does not have shared
nodes (that I know of) so perhaps we should follow that. But this lack of
shared nodes is actually a substantial pain for GIS systems and uses --
there is a lot of complex "snapping" that needs to be done.  So I'm on the
fence about this -- I'm pretty convinced shared nodes are a better model,
but if we want to interact seamlessly with other GIS formats, we may be
better off matching that data model.
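
To make the distinction concrete, here is a minimal Python sketch of the
indexed (shared-node) layout. The two squares, the variable names and the
helper shared_nodes are hypothetical and not taken from any proposal. With an
index, two polygons share a vertex exactly when they reference the same node
number, so no coordinate comparison is involved at all.

```python
# Hypothetical data: two unit squares sharing the edge at lon = 1.
node_lon = [0.0, 1.0, 1.0, 0.0, 2.0, 2.0]
node_lat = [0.0, 0.0, 1.0, 1.0, 0.0, 1.0]
polygon_nodes = [
    [0, 1, 2, 3],  # left square
    [1, 4, 5, 2],  # right square
]


def shared_nodes(a, b):
    """Node numbers referenced by both polygons."""
    return sorted(set(a) & set(b))


def coordinates(nodes):
    """Expand node numbers into (lon, lat) pairs."""
    return [(node_lon[i], node_lat[i]) for i in nodes]


common = shared_nodes(polygon_nodes[0], polygon_nodes[1])
print(common, coordinates(common))

# Without an index, sharing has to be inferred from coordinate values, which
# is fragile when the "same" value was computed by two different routes.
print(0.1 + 0.2 == 0.3)
```

The last line prints False, which is the kind of near-coincidence that forces
GIS tools into tolerance-based snapping.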

> In my example above, I assumed the polygons have no holes in them, so I've
> omitted the inside/outside information. If needed, this information could also
> be an attribute e.g. SOMETHING:inout="OIIIIOO", with as many elements as
> there are polygons in total. Thinking again about it, I wonder whether this
> information is really needed. If you draw all the polygons, isn't it apparent
> which ones are inside anyway? When would you use this information?
>

It's not always clear. If there is a hole in a polygon, you can figure it
out, but if there is a lake in a land polygon, and an island in the lake,
then it gets pretty tricky.

I think shapefiles use clockwise vs anti-clockwise to indicate
inside-outside, but IIUC, they are pretty limited with nested polygons, too.
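
For reference, winding direction can be read off the sign of the shoelace
(signed area) formula. The Python sketch below is only an illustration: the
function name and the hole coordinates are hypothetical, and which direction
marks an outer ring is a convention of each format, not something the code
decides.

```python
def signed_area(ring):
    """Shoelace formula for a ring given without a repeated closing point.

    With x to the right and y up, the result is positive for an
    anticlockwise ring and negative for a clockwise one.
    """
    total = 0.0
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        total += x0 * y1 - x1 * y0
    return total / 2.0


outer = [(0, 0), (20, 0), (20, 20), (0, 20)]  # anticlockwise square
hole = [(5, 5), (5, 15), (15, 15), (15, 5)]   # hypothetical clockwise square

for name, ring in (("outer", outer), ("hole", hole)):
    area = signed_area(ring)
    print(name, area, "anticlockwise" if area > 0 else "clockwise")
```

Winding alone only separates two roles per ring, which fits the remark that it
is limited for deeper nesting such as an island in a lake.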


> My scheme avoids the use of break values, which you're not very keen on
> yourselves, it sounds like.


I don't like break values either.


> You wrote
> > - It is more difficult to extract a single geometry using this approach.
> It's not hard, though, and the same comment would apply to the CF
> contiguous ragged array representation.


yes -- you can represent a ragged array by either specifying the
start-index of each "row", or by specifying the size of each row. CF
specifies the size of each row. I think that's a worse way to  do it --
it's similar if you 

Re: [CF-metadata] Feedback requested on proposed CF Simple Geometries

2017-01-31 Thread Jonathan Gregory
Dear Ben

Thanks for your new thoughts. I find this intriguing but still puzzling, and I
think this means we are talking at cross-purposes. Perhaps we ought to speak on
the phone? However, here are some replies.

Maybe this is a clue to our differences:

> We intend for this proposal to fit in the Discrete Sampling Geometry
> timeSeries featureType. So this proposal does not contain any new mechanism
> to link a time-varying data variable with a network composed of polygons,
> points, and lines (a whole hydrologic system for example).
...
> We would never associate time-varying data with nodes or
> the edges between them.

So far, CF describes data, and provides coordinates to locate the data in space
and time (and other dimensions). I'm not really familiar with the terminology,
but I understand that this is called a "coverage" - that is, a data which is a
function of a domain. Your "new mechanism" sentence suggests that your aim is
to describe just a domain, with no data. Maybe that's why you're agnostic about
whether and how the 2.7 million stream segments are grouped. Your aim is to
describe the network alone.

But if you want to link it to CF timeseries, as you say you do, this question
must have a definite answer, because a collection of timeseries is stored as a
data variable with a single dimension of time and a single dimension of space.
The latter is an index to information which locates the data e.g. a simplified
version of CF example H.2:

  dimensions:
    station = 10 ; // measurement locations
    time = UNLIMITED ;
  variables:
    float humidity(station,time) ;
      humidity:standard_name = "specific_humidity" ;
      humidity:coordinates = "lat lon" ;
    double time(time) ;
      time:standard_name = "time";
      time:units = "days since 1970-01-01 00:00:00" ;
    float lon(station) ;
      lon:standard_name = "longitude";
      lon:units = "degrees_east";
    float lat(station) ;
      lat:standard_name = "latitude";
      lat:units = "degrees_north" ;

Here the data is located at 10 (lon,lat) points. In the streamflow example
I guess that each of the 2.7M stream segments has a timeseries of flow rates -
is that right? That means we have to replace the points with linestrings (which
I think is essentially the same as polylines, isn't it?), one for each stream
segment. There must be exactly the same number of linestrings as there are
timeseries. We need something like:

  dimensions:
    station = 270; // stream segments
    time = UNLIMITED;
  variables:
    float flow(station,time) ;
      flow:units="m3 s-1";
    double time(time) ;
      time:standard_name = "time";
      time:units = "days since 1970-01-01 00:00:00" ;
    SOMETHING(station) // to describe the geometry of each stream segment

Your proposal is about the SOMETHING, but not how it links to the data. Is
that right? You would like to have SOMETHING alone in the file, just to
describe the network itself. CF doesn't do this at present (domain without
data), but it's been discussed before, and if we agree a CF convention for
SOMETHING, it could also be linked to the timeseries data variables.

Taking your previous comments into account (I'll come back to them below), as
a modified version of what I suggested before, here's a possible way to handle
this case, for a small number (3) of linestrings:

  dimensions:
    station = 3; // stream segments
    time = UNLIMITED;
    node = 9; // = 2 + 4 + 3
  variables:
    float flow(station,time) ;
      flow:units="m3 s-1";
      flow:topology="SOMETHING";
    double time(time) ;
      time:standard_name = "time";
      time:units = "days since 1970-01-01 00:00:00" ;
    int SOMETHING(station); // number of nodes for each linestring
      SOMETHING:node_coordinates="lon lat";
      SOMETHING:geometry_type="linestring";
    float lon(node) ;
      lon:standard_name = "longitude";
      lon:units = "degrees_east";
    float lat(node) ;
      lat:standard_name = "latitude";
      lat:units = "degrees_north" ;
  data:
    SOMETHING=2, 4, 3;
    lon=0, 1,  0, -1, -2, -3,  2, 3, 4;
    lat=51, 52,  51, 50, 50, 49,  55, 55, 56;

The timeseries flow(0,*) is for the 2-point line from (0,51) to (1,52), and
flow(1,*) is for the 4-point line (0,51) -> (-1,50) -> (-2,50) -> (-3,49).
Timeseries of data on polygons (one timeseries for each polygon) would be done
in the same way, with geometry_type="polygon". The topology attribute of the
data variable provides a link to the SOMETHING variable, which specifies how
the nodes are connected to make the linestring or polygon for each timeseries.
I use the attribute names "topology" and "node_coordinates" to be reminiscent
of ugrid.  The SOMETHING variable could exist in a file without data variables
to describe the linestrings alone.
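
The mapping from the count variable to individual linestrings can be sketched
in a few lines of Python. The data values are copied from the example; the
function name split_linestrings is made up for this note, and the sketch uses
plain lists rather than reading a netCDF file.

```python
from itertools import accumulate

# Values copied from the linestring example.
node_count = [2, 4, 3]                       # the SOMETHING variable
lon = [0, 1, 0, -1, -2, -3, 2, 3, 4]
lat = [51, 52, 51, 50, 50, 49, 55, 55, 56]


def split_linestrings(counts, lon, lat):
    """Cut the flat node arrays into one list of (lon, lat) pairs per station."""
    if len(lon) != len(lat) or sum(counts) != len(lon):
        raise ValueError("node counts do not match the coordinate arrays")
    stops = list(accumulate(counts))
    starts = [0, *stops[:-1]]
    return [list(zip(lon[a:b], lat[a:b])) for a, b in zip(starts, stops)]


lines = split_linestrings(node_count, lon, lat)
for station, line in enumerate(lines):
    print(f"flow({station},*):", " -> ".join(str(point) for point in line))
```

The first two rows reproduce the lines described in the text: (0, 51) to
(1, 52), and (0, 51) through (-1, 50) and (-2, 50) to (-3, 49).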

With linestrings and polygons, the geometry of each timeseries has a single
part (one linestring or one polygon), so the SOMETHING variable is used to
specify the number of nodes for each geometry.  The example 

Re: [CF-metadata] Feedback requested on proposed CF Simple Geometries

2016-11-01 Thread Chris Barker
A little note:

On Tue, Nov 1, 2016 at 9:43 AM, Chris Barker wrote:

> I'm on shakier ground about when you want to use a GeometryCollection vs a
> FeatureCollection, but I _think_ that the point of a geometrycollection is
> that you can group different types of geometry -- but still want them to be
> treated as a single entity.
>

some quick googling led me to this discussion:

https://github.com/topojson/topojson/issues/37

which indicates that a GeometryCollection is generally treated as a single
entity.

-CHB




> I've dealt with all this trying to jam data that fits well into netcdf
> into geoJSON, or GIS_oriented systems -- it's quite hard to be efficient
> about it :-) - i.e there is really no way to associate an array of data
> with an array of geometries -- it sure looks like you could do it with
> GeometryCollections, but the systems aren't expecting that.
>
> Of course, CF doesn't need to follow this data model, but it's a good idea
> to be informed by it.
>
>
>> Nonetheless in both cases the geometries have to be described. I think the
>> difference is how we attach this description to the data or coordinates, rather
>> than how the description is constructed.
>>
>
> indeed.
>
>
>> You propose the index variable in order for the convention to be like ugrid.
>> However this still seems to me to be an unnecessary complexity and use of space
>> if you aren't going to have many shared nodes.
>
>
> In the GIS data model, nodes are not shared between geometries, and you
> are quite right that keeping nodes separate with geometries indexing into them
> is an added complication and would not be space-efficient.
>
> However, there is another reason to do it -- it makes it definitive that
> two (or more) geometries share the exact same node, rather than them being
> distinct points that happened to be at the same location (or worse, with FP
> error and all, two points that are very close).
>
> This is actually a major limitation in the standard GIS model.
>
>
>> I think the case for having
>> another convention, distinct from ugrid, is stronger if it is *unlike* ugrid
>> in this respect, and therefore simpler as well.
>>
>
> I still think that it should be separate from UGRID -- it really is a
> different use case, though they should still share whatever they can, and
> it could turn out that UGRID is a special case of geometries?
>
> -CHB
>
>



-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [CF-metadata] Feedback requested on proposed CF Simple Geometries

2016-11-01 Thread Chris Barker
A few comments, though you all seem to have this in hand :-)


> I was asking whether this means that for each *collection* (of points, lines
> or polygons) there is a *single* timeseries.


I don't get why this matters -- any number of time series could be
associated with a single "entity" -- just like any number of timeseries can
be associated with given coordinates in regular old CF.


> For instance, in your example of a
> single geometry composed of several polygons, there is a single number for each
> time. But that is not the case for weather stations; for each weather station
> there is a timeseries, and at each time there is a different number (value of
> temperature, precipitation or whatever) for each weather station.


I think it may be helpful to borrow terminology (and the data model) from
the GIS world here. In this case, I am referencing the geoJSON spec, as I
happen to be working with that at the moment, but the basic data model is
pretty consistent.

http://geojson.org/geojson-spec.html

Note that they have "geometries" which can be things like points, polygons,
polylines. IIUC (and I'm no osgeo maven) geometries represent a "single"
entity. Then there are "Features": a Feature is essentially data associated
with a particular geometry.

But note: there are "Collections" -- both Geometry and Feature Collections
-- that is what you use to "bundle" various data together.

I think we may be well served by thinking in terms of mapping the GIS data
model to CF/netcdf -- for instance it would be great to be able to write a
netcdf<->geoJSON converter that was lossless, AND would be fairly "native"
in both cases.
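
As a very rough sketch of one direction of such a converter, the Python below
wraps already-decoded linestrings in a GeoJSON FeatureCollection using only the
standard json module. The coordinates, the "station" property and the function
name to_feature_collection are hypothetical; reading the netCDF file and
carrying the timeseries data across are left out.

```python
import json


def to_feature_collection(lines, properties):
    """Wrap linestrings and per-feature properties in a FeatureCollection."""
    if len(lines) != len(properties):
        raise ValueError("need exactly one properties dict per linestring")
    features = [
        {
            "type": "Feature",
            "geometry": {
                "type": "LineString",
                # GeoJSON positions are [x, y], i.e. [lon, lat]
                "coordinates": [list(point) for point in line],
            },
            "properties": props,
        }
        for line, props in zip(lines, properties)
    ]
    return {"type": "FeatureCollection", "features": features}


# Hypothetical linestrings as (lon, lat) pairs, one per station.
lines = [[(0, 51), (1, 52)], [(2, 55), (3, 55), (4, 56)]]
properties = [{"station": 0}, {"station": 1}]

collection = to_feature_collection(lines, properties)
print(json.dumps(collection, indent=2))
```

Each geometry carries its own properties here, which is the "array of
features" arrangement rather than one array of data alongside one array of
geometries.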


> You also
> write, "The US National Weather Service’s National Water Model (NWM) ...
> forecasts streamflow rates in about 2.7 million stream segments averaging 2km."
> The stream network is a MultiLineString geometry, but I don't think there is
> just one value of streamflow applying to the entire network at any given time;
>

no -- of course not. So that network (if I understand the GIS data model)
should be a Feature Collection, not all one Feature. So a whole collection
of geometries as well.

The "trick" with this data model is that it "de-vectorizes" the data.
Those of us used to working with netcdf, CF, gridded data, etc, tend to
think that you'd want to have, for instance, a vector of geometries, and
then various vectors of data associated with those geometries, whereas the
GIS data model associates data with a given geometry, and then creates
collections of those. This is kind of like the old C conundrum:

Do you use a struct of arrays, or an array of structs? netcdf is very much
about the struct of arrays approach.
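
As a rough illustration of the two layouts (a sketch with invented stream segments and flow values, not anything proposed in the thread), the same two features can be held either as one record per feature or as one array per field. Both helper functions are hypothetical names.

```python
# Array of structs: one record per feature (the GIS / GeoJSON style).
features = [
    {"geometry": [(0.0, 0.0), (1.0, 1.0)], "flow": 3.2},
    {"geometry": [(1.0, 1.0), (2.0, 0.5), (3.0, 0.0)], "flow": 4.7},
]


def to_struct_of_arrays(records):
    """Regroup per-feature records into one array per field (netCDF style)."""
    return {
        "geometry": [r["geometry"] for r in records],
        "flow": [r["flow"] for r in records],
    }


def to_array_of_structs(columns):
    """Inverse regrouping: rebuild one record per feature."""
    return [
        {"geometry": g, "flow": f}
        for g, f in zip(columns["geometry"], columns["flow"])
    ]


columns = to_struct_of_arrays(features)
```

The conversion is lossless in both directions; what changes is which access pattern is cheap (all flows at once versus everything about one feature).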

(though I'm still confused, maybe you can have an "array" of data
associated with a GeometryCollection?)

As for MultiLineString -- you could associate an array of data with the
MultiLineString -- so one value per segment. But I think that violates the
intent of the data model -- you should have a GeometryCollection of
LineStrings instead, and then each segment has its own geometry and you can
associate an array of data with that. (Or should it be a FeatureCollection?
I'm getting confused now!)

> I guess there is a different timeseries for each stream segment. But in my
> example above, the Atlantic Ocean is a single polygon with a single
> timeseries for its average temperature, not a different timeseries for
> each node.


right, so that Polygon would be a single Feature.


> Thus I am unclear about the dimensions of the data. In terms of your
> original example, does the data have dimensions (time,geometry, where
> geometry=1) or (time,node)?

(time,geometry, where geometry=1)

time,node would be for data associated with a FeatureCollection of Points
(or a MultiPoint).

Does anyone "get" the GIS data model? I'm quite confused as to when you
would use:

MultiPolygon
vs
GeometryCollection of Polygons
vs
FeatureCollection of Features with Polygon Geometries

But I'm going to take a stab at it:

MultiPolygon (and MultiLineString, and MultiPoint) is used when you have
more than one of a particular type of geometry that are logically one thing
-- maybe an archipelago, for instance. A Polygon geometry can represent a
simple polygon, or a polygon with holes in it -- but cannot represent two
separate polygons. So if you have multiple polygons that are geometrically
distinct, but logically connected, you use a MultiPolygon.
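
The distinction shows up directly in the nesting of GeoJSON coordinates. The sketch below uses invented shapes and a hypothetical helper, `count_polygons_and_holes`, to show that a Polygon is a list of rings (exterior first, then holes), while a MultiPolygon is a list of such ring lists.

```python
# Polygon: a list of rings; the first is the exterior, the rest are holes.
square_with_hole = {
    "type": "Polygon",
    "coordinates": [
        [[0, 0], [20, 0], [20, 20], [0, 20], [0, 0]],  # exterior
        [[5, 5], [10, 10], [10, 5], [5, 5]],           # hole
    ],
}

# MultiPolygon: a list of polygons, e.g. two islands treated as one entity.
archipelago = {
    "type": "MultiPolygon",
    "coordinates": [
        [[[0, 0], [4, 0], [2, 3], [0, 0]]],
        [[[10, 10], [14, 10], [12, 13], [10, 10]]],
    ],
}


def count_polygons_and_holes(geometry):
    """Return (number of separate polygons, total number of holes)."""
    if geometry["type"] == "Polygon":
        polygons = [geometry["coordinates"]]
    elif geometry["type"] == "MultiPolygon":
        polygons = geometry["coordinates"]
    else:
        raise ValueError("expected a Polygon or MultiPolygon geometry")
    holes = sum(len(rings) - 1 for rings in polygons)
    return len(polygons), holes
```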

I'm on shakier ground about when you want to use a GeometryCollection vs a
FeatureCollection, but I _think_ that the point of a GeometryCollection is
that you can group different types of geometry -- but still want them to be
treated as a single entity.

I've dealt with all this trying to jam data that fits well into netcdf into
GeoJSON, or GIS-oriented systems -- it's quite hard to be efficient about
it :-) - i.e. there is really no way to associate an array of data with an
array of geometries -- it sure looks like you could do it 

Re: [CF-metadata] Feedback requested on proposed CF Simple Geometries

2016-10-26 Thread Jonathan Gregory
Dear Ben and Bert

Thanks for your emails, which help me to understand the simple geometry
proposals better. Just to be clear, I'd like to repeat my first question.

> You explain that the need is to specify spatial coordinates with a simple
> geometry for a timeSeries variable. For example, this could be for the
> discharge as a function of time across some line in a river (your example),
> or I suppose it could be an average temperature as a function of time for
> the Atlantic Ocean, where you wanted to supply the polygon which drew the
> outline of the basin. Have I got the idea?

to which you replied

> Yes, you have this mostly right. It’s common to have a collection of points
> (weather stations), lines (stream reaches), or polygons (hydrologic
> catchments) with an associated time series

I was asking whether this means that for each *collection* (of points, lines or
polygons) there is a *single* timeseries. For instance, in your example of a
single geometry composed of several polygons, there is a single number for each
time. But that is not the case for weather stations; for each weather station
there is a timeseries, and at each time there is a different number (value of
temperature, precipitation or whatever) for each weather station. You also
write, "The US National Weather Service’s National Water Model (NWM) ...
forecasts streamflow rates in about 2.7 million stream segments averaging 2km."
The stream network is a MultiLineString geometry, but I don't think there is
just one value of streamflow applying to the entire network at any given time;
I guess there is a different timeseries for each stream segment. But in my
example above, the Atlantic Ocean is a single polygon with a single timeseries
for its average temperature, not a different timeseries for each node. Thus I
am unclear about the dimensions of the data. In terms of your original example,
does the data have dimensions (time,geometry, where geometry=1) or (time,node)?

This seems to me to be a crucial difference. In the former case the simple
geometry can be regarded as a more complex alternative to cell bounds - the
cell has a complicated geometry of nodes and lines, but it's still a single
cell. In the latter case you're providing many timeseries in an unstructured
geometry, which is what ugrid describes. Which do you have in mind?

Nonetheless in both cases the geometries have to be described. I think the
difference is how we attach this description to the data or coordinates, rather
than how the description is constructed.

You propose the index variable in order for the convention to be like ugrid.
However this still seems to me to be an unnecessary complexity and use of space
if you aren't going to have many shared nodes. I think the case for having
another convention, distinct from ugrid, is stronger if it is *unlike* ugrid
in this respect, and therefore simpler as well.

I agree that repeating the inside/outside flag many times is wasteful. That,
coupled with your clarification that you may have several geometries, each
consisting of several elements (points, lines, polygons), means that you need,
in effect, a ragged array of ragged arrays (geometry,element,node). This is
more complicated than DSGs, but it seems to me it would be reasonably easy to
understand if your multi-geometry example
https://github.com/bekozi/netCDF-CF-simple-geometry/wiki/VLEN-Arrays-in-NetCDF-3#multipolygon-example
was stored something like this:

  geom=3;
  part=11;
  node=36;
  int number_of_parts(geom);
    number_of_parts:parts="number_of_nodes";
  int number_of_nodes(part);
    number_of_nodes:inout="inout";
  char inout(part);
  float x(node);
  float y(node);
  number_of_parts=6, 3, 2;
  number_of_nodes=4, 3, 3, 3, 3, 3, 3, 5, 3, 3, 3;
  inout="OIIIOOOIO";
  x=0, 20, 20, 0, 1, 10, 19, 5, 7, 9, 11, 13, 15, 5, 9, 7, 11, 15, 13, -40,
    -20, -45, -20, -10, -10, -30, -45, -30, -20, -20, 30, 45, 10, 25, 50, 30;
  y=0, 0, 20, 20, 1, 5, 1, 15, 19, 15, 15, 19, 15, 25, 25, 29, 25, 25, 29,
    -40, -45, -30, -35, -30, -10, -5, -20, -20, -15, -25, 20, 40, 40, 5, 10, 15;

where I assume that all polygons are closed.
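
A reader might unpack this (geometry,element,node) layout along the following lines. This is only a sketch: the count and coordinate values are copied from the listing above, the inout flags are left out, and `split_by_counts` is a hypothetical helper, not part of any netCDF library.

```python
from itertools import accumulate

# Values copied from the listing in this message.
number_of_parts = [6, 3, 2]
number_of_nodes = [4, 3, 3, 3, 3, 3, 3, 5, 3, 3, 3]
x = [0, 20, 20, 0, 1, 10, 19, 5, 7, 9, 11, 13, 15, 5, 9, 7, 11, 15, 13, -40,
     -20, -45, -20, -10, -10, -30, -45, -30, -20, -20, 30, 45, 10, 25, 50, 30]
y = [0, 0, 20, 20, 1, 5, 1, 15, 19, 15, 15, 19, 15, 25, 25, 29, 25, 25, 29,
     -40, -45, -30, -35, -30, -10, -5, -20, -20, -15, -25, 20, 40, 40, 5, 10,
     15]


def split_by_counts(items, counts):
    """Cut a flat sequence into consecutive chunks of the given lengths."""
    ends = list(accumulate(counts))
    starts = [0] + ends[:-1]
    return [items[s:e] for s, e in zip(starts, ends)]


nodes = list(zip(x, y))
# First level of raggedness: nodes grouped into parts.
parts = split_by_counts(nodes, number_of_nodes)
# Second level: parts grouped into geometries.
geoms = split_by_counts(parts, number_of_parts)
```

The same helper is applied twice, once per level of raggedness, which is the sense in which the layout is a ragged array of ragged arrays.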

What do you think?

Best wishes

Jonathan


Re: [CF-metadata] Feedback requested on proposed CF Simple Geometries

2016-09-27 Thread Chris Barker
Thanks for all the great input, Bert.

One comment:

>
> 5) Besides inventing our own storage format (either in line with UGRID or
> CF), a third way was discussed namely: to store the simple geometry shapes
> as ascii or binary blobs in an extended format NetCDF 4 file.


I think binary blobs are a really bad idea (and what would be the format of
those blobs? Shapefiles? Or maybe Well-Known Binary?).

But Well-Known Text or GeoJSON might be reasonable.

I'd still rather have it done "properly" with netcdf arrays.

-Chris




-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [CF-metadata] Feedback requested on proposed CF Simple Geometries

2016-09-27 Thread Ben Koziol - NOAA Affiliate
Jonathan and CF-Metadata List,

Thanks for the suggestions and discussion. We’ve attempted to respond to
the major questions and concerns using Jonathan's mail as a template.
Apologies in advance if we missed anything outstanding or did not
appropriately acknowledge contributions in this thread.

> You explain that the need is to specify spatial coordinates with a simple
> geometry for a timeSeries variable. For example, this could be for the
> discharge as a function of time across some line in a river (your example),
> or I suppose it could be an average temperature as a function of time for
> the Atlantic Ocean, where you wanted to supply the polygon which drew the
> outline of the basin. Have I got the idea?


Yes, you have this mostly right. It’s common to have a collection of points
(weather stations), lines (stream reaches), or polygons (hydrologic
catchments) with an associated time series.

> Timeseries like this can be stored in CF, but their geographical extent is
> usually described only in words e.g. a region name of atlantic_ocean, and
> this is fine for applications like CMIP where you want to compare data from
> different data sources in which the Atlantic Ocean may have different exact
> shapes (different AOGCMs, in particular). An array of region names is also
> possible, so I don't think we need a new convention to contain your dwarf
> planet example.


The dwarf planet example is intended to describe our generalized approach
to continuous ragged arrays that may be used for arbitrarily-sized data
arrays. For some (including me), using a string instead of a numeric
example helps illustrate the concept. It is an idiosyncratic example in
many ways. Sorry for the confusion.

> Sect 9.1 on discrete sampling geometries says it cannot yet be used for
> cases "where geo-positioning cannot be described as a discrete point
> location. Problematic examples include time series that refer to a
> geographical region (e.g. the northern hemisphere) ...". Actually I think
> that's not quite right. The existing convention *can* describe regions
> which are contiguous, and rectangular or polygonal, using its usual bounds
> convention (Sect 7.1). I think we should consider changing this text,
> because it seems unnecessarily restrictive.


Your explanation makes sense, and this should be captured in the DSG
convention text.


> If the regions were irregular polygons in latitude and longitude, nv would
> be the number of vertices and the lat and lon bounds would trace the
> outline of the polygon e.g. nv=3, lat=0,90,0 and lon=0,0,90 describes the
> eighth of the sphere which is bounded by the meridians at 0E and 90E and
> the Equator. I think, therefore, we do not need an additional convention
> for points or polygonal regions.


Many earth science datasets (excluding triangular, hexagonal, etc. meshes)
representable as polygons and lines have differing node counts. "nv" could
not efficiently capture watershed A with 5 nodes and watershed B with 100.
Additionally, the cell bounds concept does not include the structure and
semantics needed to support MultiLines, MultiPolygons, or polygons with
holes/interiors.
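
The storage argument can be made concrete with a back-of-the-envelope count. The sketch below is illustrative only: it counts coordinate slots per coordinate variable (not bytes), uses the 5-node and 100-node watersheds mentioned above, and the function names are hypothetical.

```python
def padded_size(node_counts):
    """Slots needed when every polygon is padded to the longest one (nv)."""
    return len(node_counts) * max(node_counts)


def ragged_size(node_counts):
    """Slots needed for a ragged layout: all nodes plus one count per polygon."""
    return sum(node_counts) + len(node_counts)


# Watershed A with 5 nodes and watershed B with 100 nodes.
counts = [5, 100]
padded = padded_size(counts)   # 2 polygons x 100 slots each
ragged = ragged_size(counts)   # 105 nodes + 2 counts
```

With a fixed "nv" dimension, the small watershed pays for the largest one; the gap widens as node counts become more uneven.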

> However, we would need new conventions for a timeseries where each value
> applies to a set of discontiguous regions or regions with holes in them, a
> set of points, a line or a set of lines. I guess that these are included in
> the geometry types you list (LineString, Multipoint, MultiLineString, and
> MultiPolygon).


Yes.

> Do you have definite use-cases for all of these?  (I ask this because we
> don't add new functionality to CF until there is a definite and common need
> for it in practice.)


David Arctur described the primary motivation for developing the simple
geometries approach: "Among other applications, NetCDF-CF is now being used
as an intermediate & output data format in the US National Weather
Service’s National Water Model (NWM). This forecasts streamflow rates in
about 2.7 million stream segments averaging 2km, throughout the continental
US, at multiple time horizons (3 hr, 18 hr, 10 days) every hour, and an
ensemble for 30-day forecast less frequently." These data also contain
multi-geometries primarily in the form of MultiLineStrings and
MultiPolygons.

To this we would add that working with GIS datasets of this magnitude is
difficult with current NetCDF metadata conventions, often yielding an
unwieldy hybrid of NetCDF data and other software like ESRI ArcGIS and
PostGIS. ESRI ArcGIS and PostGIS are not usable on many HPC platforms where
models like the NWM reside.

> I suspect that geometries of this kind can be described by the ugrid
> convention http://ugrid-conventions.github.io/ugrid-conventions, which is
> compliant with CF. Their purpose is to describe a set of connected points,
> edges or faces at which values are given, whereas in your case you'd give a
> single value for the whole set, but the description of the geometry itself
> might be similar. Have you had a look at whether 

Re: [CF-metadata] Feedback requested on proposed CF Simple Geometries

2016-09-26 Thread Bert Jagers
 thread in the UGRID community first.

Best regards,

Bert

-Original Message-
From: CF-metadata [mailto:cf-metadata-boun...@cgd.ucar.edu] On Behalf Of 
Jonathan Gregory
Sent: 22 September 2016 18:26
To: cf-metadata@cgd.ucar.edu
Subject: Re: [CF-metadata] Feedback requested on proposed CF Simple Geometries

Dear Chris

> > If the regions were irregular
> > polygons in latitude and longitude, nv would be the number of
> > vertices and the lat and lon bounds would trace the outline of the
> > polygon e.g. nv=3,
> > lat=0,90,0
> > and lon=0,0,90 describes the eighth of the sphere which is bounded
> > by the meridians at 0E and 90E and the Equator. I think, therefore,
> > we do not need an additional convention for points or polygonal
> > regions.
>
> this seems fine for this simple example, but burying a bunch of
> coordinates of a complex polygon in a text string in an attribute is
> really not a good idea -- the coordinates of a polygon should be in
> the array data one way or another, rather than having to parse out attribute 
> strings.

To avoid confusion:

I didn't suggest parsing attribute strings. The same numbers that Ben would put 
in his x and y auxiliary coordinate variables for a single polygon can appear 
in coordinate bounds variables according to the existing convention.

> * I suspect that geometries of this kind can be described by the ugrid
> > convention http://ugrid-conventions.github.io/ugrid-conventions,
> > which is compliant with CF. Their purpose is to describe a set of
> > connected points, edges or faces at which values are given,
>
> I'm not so sure -- UGRID is about defining a bunch of polygons that
> all share vertices, and are all of the same order (usually all
> triangles, or quads, or maybe hexes). if they are a mixture, you still
> store the full set (say, six vertices), while marking some as unused.
> But it's not that well set up for a bunch of polygons of different order.
>
> Not too bad if there are only one or two complex polygons, but it
> would be a bit weird -- you'd have vertices and boundaries, but no
> faces. And you'd lose the order of the vertices (though that could
> probably be added to the UGRID standard)

OK. I didn't investigate this, but it would be good to know about it. If ugrid 
can do something like this, but not all of it, maybe ugrid could be extended. 
If ugrid seems too complicated for these cases, maybe a "light"
version of ugrid could be proposed for them. I think we should avoid having two 
partially overlapping conventions.

> * So far CF does not say anything about the use of netCDF-4 features (i.e.
> > not
> > the classic model). We have often discussed allowing them but the
> > general argument is also made that there has to be a compelling case
> > for providing a new way to do something which can already be done.
> > (Steve Hankin often made this argument, but since he's mostly
> > retired I'll make it now in his name
> > :-)
> >
>
> Maybe it's time to embrace netcdf4? It's been a while! Though maybe
> for CF
> 2.* -- any movement on that?

I think, as we generally do, that we should adopt netCDF-4 features if there is 
a definite need to do so. I mean something you can't do with an existing 
mechanism, or which is done so much more easily with a new mechanism that it 
justifies the extra effort of requiring alternatives to be programmed in 
software. I'm not arguing against it in general, but I think it has to be 
argued for each specific need within the convention.

CF2 is not well-defined. I have to admit to being nervous about that. I am very 
much opposed to an idea of "starting all over again" and maintaining two 
conventions in parallel (since old data would continue to exist for a long time 
and so the old CF would have to be supported), and I also think
backwards-incompatibility has to be strongly justified. So I favour
step-by-step evolution.
Another idea we've discussed, which I'm comfortable with, is of defining 
"strict" compliance to the convention, which a data-writer could optionally 
adhere to. This could exclude older features we wanted to deprecate. However 
this is really not the subject of the discussion - it's another thread.

> I think the ragged array option is fine -- though I haven't looked at
> vlen arrays enough to know if they offer a compelling alternative. One
> issue is that the programming environments that we use to work with
> the data may not have an equivalent of vlen arrays.

That's a good point, and a reason why we have to be cautious in general about 
adopting netCDF-4 features.

Best wishes

Jonathan

Re: [CF-metadata] Feedback requested on proposed CF Simple Geometries

2016-09-26 Thread Hedley, Mark
Hello Ben

I think this is fascinating and fantastic work which is likely to prove very 
useful for a range of domains.

I am afraid that, just now, I don't have any specific insights with regard to:
> Questions for the CF Community
> 
> 1. Are our VLEN netCDF-3 and netCDF-4 approaches acceptable? What changes 
> would you recommend?
> 2. Are the geometry types point, line, polygon, and their multipart 
> equivalents sufficient for the community?

but I do think these are really valuable areas to get feedback on.

all the best
mark



From: CF-metadata [cf-metadata-boun...@cgd.ucar.edu] on behalf of Ben Koziol -
NOAA Affiliate [ben.koz...@noaa.gov]
Sent: 07 September 2016 19:13
To: CF metadata
Cc: Bob Simons - NOAA Federal; Whiteaker, Timothy L
Subject: [CF-metadata] Feedback requested on proposed CF Simple Geometries

Greetings,



As part of an EarthCube project for advancing netCDF-CF [1], we are
developing an approach to represent simple geometries in enhanced netCDF-4
with a variable length array backport for netCDF-3. Simple geometries, for
example, may be used to associate stream discharge with river lines or
surface runoff with watershed polygons. We've drafted an initial approach and
reference implementation on the GitHub netCDF-CF-simple-geometry project [2]
and would greatly appreciate feedback from the CF community. We'd like to
make sure our scope is appropriate and our approach is acceptable.


Scope

The result of this effort will be a standard that the CF timeSeries feature
type could use to specify spatial coordinates (define a simple geometry) for
a timeSeries variable.

For those familiar with the OGC WKT standard geometry types [3], we will
include Point, LineString, Polygon, Multipoint, MultiLineString, and
MultiPolygon (WKT primitives and multipart geometries).

We anticipate that the six chosen geometry types will cover the needs of most
people generating netCDF data. These types also align with other geospatial
data formats such as GeoJSON and ESRI Shapefile. If our approach is well
received by the CF community, we may later adapt it to include parametric
shapes such as circles and ellipses.


Simple Geometry Encoding Method

Driven by the possibility that different features will require different
numbers of coordinates to describe their geometries, our approach uses
variable length (VLEN) arrays in enhanced netCDF-4 and continuous ragged
arrays (CRAs) in netCDF-3. We describe the VLEN netCDF-4 approach first. The
netCDF-3 CRA description follows.

In our approach, a VLEN coordinate_index variable identifies the indices of
geometry coordinates in separate coordinate arrays. The coordinate_index
variable includes a coordinates attribute which stores the names of the
coordinate variables and a geom_type attribute to indicate the geometry type.

For multipart geometries, the coordinate index variable may include a
negative integer flag(s) indicating the start of each new geometry "part"
for the current feature. The first geometry part is not preceded by the
negative integer flag. The variable shall include an attribute named
multipart_break_value identifying the flag's value.

For polygon geometries with holes (also called "interiors"), the coordinate
index values shall include a negative integer flagging the start of each
hole. In this case, the variable shall include a hole_break_value attribute
to indicate the flag value.
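
The break-value scheme just described could be decoded roughly as follows. This is a minimal sketch, not the project's reference implementation: the index values and the two flag values (-1 and -2) are taken from the multipolygon example in this message, and `decode_coordinate_index` is a hypothetical function name.

```python
# Index values for one feature, copied from the multipolygon example.
coordinate_index = [0, 1, 2, 3, 4, -2, 5, 6, 7, 8, -2, 9, 10, 11, 12, -2,
                    13, 14, 15, 16, -1, 17, 18, 19, 20, -1, 21, 22, 23, 24]


def decode_coordinate_index(indices, multipart_break=-1, hole_break=-2):
    """Split a flagged index list into polygons, each a list of rings.

    The first ring of each polygon is its exterior; later rings are holes.
    """
    polygons = [[[]]]
    for value in indices:
        if value == multipart_break:
            polygons.append([[]])      # start a new part with a new exterior
        elif value == hole_break:
            polygons[-1].append([])    # start a hole in the current part
        else:
            polygons[-1][-1].append(value)
    return polygons


polygons = decode_coordinate_index(coordinate_index)
```

Each decoded integer is an index into the x and y coordinate arrays, so the rings can then be materialized by looking up those positions.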


Other attributes on the coordinate index variable describe clockwise or
anticlockwise node order for polygons and polygon closure convention. For
additional details, see the wiki [4]. With these concepts defined, an example
for multipolygons with holes is shown below. You can copy the WKT description
below into Wicket [5] if you'd like to see what the geometry in this example
looks like.


Well-Known Text (WKT):
MULTIPOLYGON(((0 0, 20 0, 20 20, 0 20, 0 0), (1 1, 10 5, 19 1, 1 1),
 (5 15, 7 19, 9 15, 5 15), (11 15, 13 19, 15 15, 11 15)),
 ((5 25, 9 25, 7 29, 5 25)), ((11 25, 15 25, 13 29, 11 25)))


Common Data Language (CDL) for netCDF-4 VLEN Arrays:

netcdf multipolygon_example {
types:
  int64(*) geom_VLType ;
dimensions:
  node = 25 ;
  geom = 1 ;
variables:
  geom_VLType coordinate_index(geom) ;
    string coordinate_index:geom_type = "multipolygon" ;
    string coordinate_index:coordinates = "x y" ;
    coordinate_index:multipart_break_value = -1 ;
    coordinate_index:hole_break_value = -2 ;
    string coordinate_index:outer_ring_order = "anticlockwise" ;
    string coordinate_index:closure_convention = "last_node_equals_first" ;
  double x(node) ;
  double y(node) ;
data:

  coordinate_index =
    {0, 1, 2, 3, 4, -2, 5, 6, 7, 8, -2, 9, 10, 11, 12, -2, 13, 14, 15, 16, -1,
     17, 18, 19, 20, -1, 21, 22, 23, 24} ;

  x = 0, 20, 20, 0, 0, 1, 10, 19, 1, 5, 7, 9, 5, 11, 13, 15, 11, 5, 9, 7, 5,
      11, 15, 13, 11 ;

  y = 0, 0, 20, 20, 0, 1, 5, 1, 1, 15, 19, 15, 15, 15, 19, 15, 15, 25, 25, 29,
      25, 

Re: [CF-metadata] Feedback requested on proposed CF Simple Geometries

2016-09-22 Thread Ethan Davis
Hi all,

Just a quick note on Chris' CF 2 question (until I have a bit more time to
think more fully on this discussion)  ...

The EC netCDF-CF project that Ben mentioned is working on a number of CF
extension efforts that are looking to use features of the netCDF enhanced
data model. Those efforts will all target CF 2 rather than CF 1.x. However,
as with the Simple Geometries, we should also expect suggestions for
changes to CF 1.x spinning out of these efforts.

The CF-2 discussion has been pretty quiet for a while now. However, I expect
it will be more active as these various CF extension efforts start seeking
more community input and making proposals.

Cheers,

Ethan


On Thu, Sep 22, 2016 at 12:00 PM, Chris Barker 
wrote:

> Sorry, not enough time to really read this all carefully, but a couple
> comments from a brief look:
>
>>
>> If the regions were irregular
>> polygons in latitude and longitude, nv would be the number of vertices
>> and the
>> lat and lon bounds would trace the outline of the polygon e.g. nv=3,
>> lat=0,90,0
>> and lon=0,0,90 describes the eighth of the sphere which is bounded by the
>> meridians at 0E and 90E and the Equator. I think, therefore, we do not
>> need an
>> additional convention for points or polygonal regions.
>
>
> this seems fine for this simple example, but burying a bunch of
> coordinates of a complex polygon in a text string in an attribute is really
> not a good idea -- the coordinates of a polygon should be in the array data
> one way or another, rather than having to parse out attribute strings.
>
> * I suspect that geometries of this kind can be described by the ugrid
>> convention http://ugrid-conventions.github.io/ugrid-conventions, which is
>> compliant with CF. Their purpose is to describe a set of connected points,
>> edges or faces at which values are given,
>
>
> I'm not so sure -- UGRID is about defining a bunch of polygons that all
> share vertices, and are all of the same order (usually all triangles, or
> quads, or maybe hexes). if they are a mixture, you still store the full set
> (say, six vertices), while marking some as unused. But it's not that well
> set up for a bunch of polygons of different order.
>
> Not too bad if there are only one or two complex polygons, but it would be
> a bit weird -- you'd have vertices and boundaries, but no faces. And you'd
> lose the order of the vertices (though that could probably be added to the
> UGRID standard)
>
>
>> whereas in your case you'd give a
>> single value for the whole set, but the description of the geometry itself
>> might be similar. Have you had a look at whether ugrid could meet your
>> needs?
>> If it almost does so, perhaps a better thing to do would be to propose
>> additions to ugrid. We would like to avoid having more than one way to
>> describe
>> such geometries.
>>
>
> Ben has been involved in UGRID, so I'm sure he's thought this out. For my
> part, I think it's really a different problem, though it would be nice if
> it were as similar to UGRID as possible.
>
> * So far CF does not say anything about the use of netCDF-4 features (i.e.
>> not
>> the classic model). We have often discussed allowing them but the general
>> argument is also made that there has to be a compelling case for
>> providing a
>> new way to do something which can already be done. (Steve Hankin often
>> made
>> this argument, but since he's mostly retired I'll make it now in his name
>> :-)
>>
>
> Maybe it's time to embrace netcdf4? It's been a while! Though maybe for CF
> 2.* -- any movement on that?
>
>
>> If there are two ways to do something, software has to support both of
>> them. We
>> already have ways to encode ragged arrays, so is there a compelling case
>> for
>> needing the netCDF-4 vlen array as well?
>
>
> I think the ragged array option is fine -- though I haven't looked at
> vlen arrays enough to know if they offer a compelling alternative. One
> issue is that the programming environments that we use to work with the
> data may not have an equivalent of vlen arrays.
>
> * Similarly, you propose attributes for clockwise/anticlockwise node order
>> and
>> for the polygon closure convention.
>
>
> This should match the OGC conventions as much as is practical.
>
> -CHB


Re: [CF-metadata] Feedback requested on proposed CF Simple Geometries

2016-09-22 Thread Chris Barker
On Thu, Sep 22, 2016 at 9:26 AM, Jonathan Gregory  wrote:

> I didn't suggest parsing attribute strings. The same numbers that Ben
> would put
> in his x and y auxiliary coordinate variables for a single polygon can
> appear
> in coordinate bounds variables according to the existing convention.


OK then, sorry for the confusion, probably me reading it too fast...

> OK. I didn't investigate this, but it would be good to know about it. If
> ugrid can do something like this, but not all of it, maybe ugrid could be
> extended.


sure.


> If ugrid seems too complicated for these cases, maybe a "light"
> version of ugrid could be proposed for them. I think we should avoid having
> two partially overlapping conventions.


I agree -- but it seems like these are really different use cases to me --
sure there are similarities, but a different enough focus that a different
standard may make sense -- though hopefully UGRID can inform the "new" one,
so as to not have different ways to accomplish the parts that are the same.


> CF2 is not well-defined.


I thought it wasn't defined at all. But I think we all share your concerns
about that.

-CHB



Re: [CF-metadata] Feedback requested on proposed CF Simple Geometries

2016-09-22 Thread Jonathan Gregory
Dear Chris

> > If the regions were irregular
> > polygons in latitude and longitude, nv would be the number of vertices and
> > the
> > lat and lon bounds would trace the outline of the polygon e.g. nv=3,
> > lat=0,90,0
> > and lon=0,0,90 describes the eighth of the sphere which is bounded by the
> > meridians at 0E and 90E and the Equator. I think, therefore, we do not
> > need an
> > additional convention for points or polygonal regions.
> 
> this seems fine for this simple example, but burying a bunch of coordinates
> of a complex polygon in a text string in an attribute is really not a good
> idea -- the coordinates of a polygon should be in the array data one way or
> another, rather than having to parse out attribute strings.

To avoid confusion:

I didn't suggest parsing attribute strings. The same numbers that Ben would put
in his x and y auxiliary coordinate variables for a single polygon can appear
in coordinate bounds variables according to the existing convention.

> * I suspect that geometries of this kind can be described by the ugrid
> > convention http://ugrid-conventions.github.io/ugrid-conventions, which is
> > compliant with CF. Their purpose is to describe a set of connected points,
> > edges or faces at which values are given,
> 
> I'm not so sure -- UGRID is about defining a bunch of polygons that all
> share vertices, and are all of the same order (usually all triangles, or
> quads, or maybe hexes). if they are a mixture, you still store the full set
> (say, six vertices), while marking some as unused. But it's not that well
> set up for a bunch of polygons of different order.
> 
> Not too bad if there are only one or two complex polygons, but it would be
> a bit weird -- you'd have vertices and boundaries, but no faces. And you'd
> lose the order of the vertices (though that could probably be added to the
> UGRID standard)

OK. I didn't investigate this, but it would be good to know about it. If
ugrid can do something like this, but not all of it, maybe ugrid could be
extended. If ugrid seems too complicated for these cases, maybe a "light"
version of ugrid could be proposed for them. I think we should avoid having
two partially overlapping conventions.
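
To make the padding Chris describes concrete, here is a minimal Python sketch.
It assumes a fixed-width connectivity table in which unused slots hold a fill
value. The names pad_connectivity, padding_fraction and FILL are hypothetical
and are not taken from the ugrid document.

```python
FILL = -1  # hypothetical marker for an unused vertex slot


def pad_connectivity(faces, fill=FILL):
    """Pad every face's vertex-index list to the width of the largest face."""
    width = max(len(face) for face in faces)
    return [list(face) + [fill] * (width - len(face)) for face in faces]


def padding_fraction(table, fill=FILL):
    """Return the share of slots in the table that hold only padding."""
    total = sum(len(row) for row in table)
    unused = sum(row.count(fill) for row in table)
    return unused / total


# One triangle, one quad and one hexagon that share some vertices.
faces = [[0, 1, 2], [2, 3, 4, 5], [5, 6, 7, 8, 9, 10]]
table = pad_connectivity(faces)
wasted = padding_fraction(table)
```

Here 5 of the 18 slots are padding. The waste grows quickly when a few
polygons have many more vertices than the rest, which is the situation with
detailed outlines.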

> > * So far CF does not say anything about the use of netCDF-4 features
> > (i.e. not the classic model). We have often discussed allowing them but
> > the general argument is also made that there has to be a compelling case
> > for providing a new way to do something which can already be done. (Steve
> > Hankin often made this argument, but since he's mostly retired I'll make
> > it now in his name :-)
> >
> 
> Maybe it's time to embrace netcdf4? It's been a while! Though maybe for CF
> 2.* -- any movement on that?

I think, as we generally do, that we should adopt netCDF-4 features if there
is a definite need to do so. I mean something you can't do with an existing
mechanism, or which is done so much more easily with a new mechanism that it
justifies the extra effort of requiring alternatives to be programmed in
software. I'm not arguing against it in general, but I think it has to be
argued for each specific need within the convention.

CF2 is not well-defined. I have to admit to being nervous about that. I am
very much opposed to the idea of "starting all over again" and maintaining
two conventions in parallel (since old data would continue to exist for a long
time and so the old CF would have to be supported), and I also think
backwards-incompatibility has to be strongly justified. So I favour
step-by-step evolution.
Another idea we've discussed, which I'm comfortable with, is of defining
"strict" compliance to the convention, which a data-writer could optionally
adhere to. This could exclude older features we wanted to deprecate. However
this is really not the subject of the discussion - it's another thread.

> I think the ragged array option is fine -- though I haven't looked at vlen
> arrays enough to know if they offer a compelling alternative. One issue is
> that the programming environments that we use to work with the data may not
> have an equivalent of vlen arrays.

That's a good point, and a reason why we have to be cautious in general about
adopting netCDF-4 features.

Best wishes

Jonathan


Re: [CF-metadata] Feedback requested on proposed CF Simple Geometries

2016-09-22 Thread Arctur, David K
Dear Jonathan, 

I can’t speak to the technical details, but can mention some motivation for 
simple geometries. Among other applications, NetCDF-CF is now being used as an 
intermediate & output data format in the US National Weather Service’s National 
Water Model (NWM). This forecasts streamflow rates in about 2.7 million stream 
segments averaging 2 km, throughout the continental US, at multiple time
horizons (3 hr, 18 hr, 10 days) every hour, and an ensemble for a 30-day forecast
less frequently. There are many applications which can benefit from detailed
polyline and polygon geometries. While ugrid could also be used, the simple 
geometries approach presented is simpler to implement. 

Regards, 
David Arctur



On Sep 22, 2016, at 5:40 AM, Jonathan Gregory wrote:

Dear Ben

Thank you for your thoughtful and interesting proposal. I have quite a lot of
questions and comments about it.

* You explain that the need is to specify spatial coordinates with a simple
geometry for a timeSeries variable. For example, this could be for the
discharge as a function of time across some line in a river (your example), or
I suppose it could be an average temperature as a function of time for the
Atlantic Ocean, where you wanted to supply the polygon which drew the outline
of the basin. Have I got the idea? Timeseries like this can be stored in CF,
but their geographical extent is usually described only in words e.g. a region
name of atlantic_ocean, and this is fine for applications like CMIP where you
want to compare data from different data sources in which the Atlantic Ocean
may have different exact shapes (different AOGCMs, in particular). An array of
region names is also possible, so I don't think we need a new convention to
contain your dwarf planet example.

* Sect 9.1 on discrete sampling geometries says it cannot yet be used for cases
"where geo-positioning cannot be described as a discrete point location.
Problematic examples include time series that refer to a geographical region
(e.g. the northern hemisphere) ...". Actually I think that's not quite right.
The existing convention *can* describe regions which are contiguous, and
rectangular or polygonal, using its usual bounds convention (Sect 7.1). I think
we should consider changing this text, because it seems unnecessarily
restrictive. For example, a timeSeries for the average temperature in the
Northern Hemisphere can be stored like this:

  dimensions:
    region=1;
    nv=2;
    time=UNLIMITED;
  variables:
    float temperature(region,time);
      temperature:standard_name="surface_temperature";
      temperature:units="K";
      temperature:coordinates="lat lon";
      temperature:cell_methods="time: mean area: mean";
    float lat(region);
      lat:standard_name="latitude";
      lat:units="degrees_north";
      lat:bounds="lat_bounds";
    float lat_bounds(region,nv);
    float lon(region);
      lon:standard_name="longitude";
      lon:units="degrees_east";
      lon:bounds="lon_bounds";
    float lon_bounds(region,nv);
  data:
    lat_bounds=0,90;
    lon_bounds=0,360;

which means the region is 0-90N and 0-360E. If the regions were irregular
polygons in latitude and longitude, nv would be the number of vertices and the
lat and lon bounds would trace the outline of the polygon e.g. nv=3, lat=0,90,0
and lon=0,0,90 describes the eighth of the sphere which is bounded by the
meridians at 0E and 90E and the Equator. I think, therefore, we do not need an
additional convention for points or polygonal regions. However, we would need
new conventions for a timeseries where each value applies to a set of
discontiguous regions or regions with holes in them, a set of points, a line or
a set of lines. I guess that these are included in the geometry types you list
(LineString, MultiPoint, MultiLineString, and MultiPolygon). Do you have
definite use-cases for all of these?  (I ask this because we don't add new
functionality to CF until there is a definite and common need for it in
practice.)

* I suspect that geometries of this kind can be described by the ugrid
convention http://ugrid-conventions.github.io/ugrid-conventions, which is
compliant with CF. Their purpose is to describe a set of connected points,
edges or faces at which values are given, whereas in your case you'd give a
single value for the whole set, but the description of the geometry itself
might be similar. Have you had a look at whether ugrid could meet your needs?
If it almost does so, perhaps a better thing to do would be to propose
additions to ugrid. We would like to avoid having more than one way to describe
such geometries.

If you decide to make use of ugrid instead, the rest of my comments may
not be relevant!

* So far CF does not say anything about the use of netCDF-4 features (i.e. not
the classic model). We have often discussed allowing them but the general
argument is also made that there has to be a compelling case for providing a
new way to do something which can already 

Re: [CF-metadata] Feedback requested on proposed CF Simple Geometries

2016-09-22 Thread Jonathan Gregory
Dear Ben

Thank you for your thoughtful and interesting proposal. I have quite a lot of
questions and comments about it.

* You explain that the need is to specify spatial coordinates with a simple
geometry for a timeSeries variable. For example, this could be for the
discharge as a function of time across some line in a river (your example), or
I suppose it could be an average temperature as a function of time for the
Atlantic Ocean, where you wanted to supply the polygon which drew the outline
of the basin. Have I got the idea? Timeseries like this can be stored in CF,
but their geographical extent is usually described only in words e.g. a region
name of atlantic_ocean, and this is fine for applications like CMIP where you
want to compare data from different data sources in which the Atlantic Ocean
may have different exact shapes (different AOGCMs, in particular). An array of
region names is also possible, so I don't think we need a new convention to
contain your dwarf planet example.

* Sect 9.1 on discrete sampling geometries says it cannot yet be used for cases
"where geo-positioning cannot be described as a discrete point location.
Problematic examples include time series that refer to a geographical region
(e.g. the northern hemisphere) ...". Actually I think that's not quite right.
The existing convention *can* describe regions which are contiguous, and
rectangular or polygonal, using its usual bounds convention (Sect 7.1). I think
we should consider changing this text, because it seems unnecessarily
restrictive. For example, a timeSeries for the average temperature in the
Northern Hemisphere can be stored like this:

  dimensions:
    region=1;
    nv=2;
    time=UNLIMITED;
  variables:
    float temperature(region,time);
      temperature:standard_name="surface_temperature";
      temperature:units="K";
      temperature:coordinates="lat lon";
      temperature:cell_methods="time: mean area: mean";
    float lat(region);
      lat:standard_name="latitude";
      lat:units="degrees_north";
      lat:bounds="lat_bounds";
    float lat_bounds(region,nv);
    float lon(region);
      lon:standard_name="longitude";
      lon:units="degrees_east";
      lon:bounds="lon_bounds";
    float lon_bounds(region,nv);
  data:
    lat_bounds=0,90;
    lon_bounds=0,360;

which means the region is 0-90N and 0-360E. If the regions were irregular
polygons in latitude and longitude, nv would be the number of vertices and the
lat and lon bounds would trace the outline of the polygon e.g. nv=3, lat=0,90,0
and lon=0,0,90 describes the eighth of the sphere which is bounded by the
meridians at 0E and 90E and the Equator. I think, therefore, we do not need an
additional convention for points or polygonal regions. However, we would need
new conventions for a timeseries where each value applies to a set of
discontiguous regions or regions with holes in them, a set of points, a line or
a set of lines. I guess that these are included in the geometry types you list
(LineString, MultiPoint, MultiLineString, and MultiPolygon). Do you have
definite use-cases for all of these?  (I ask this because we don't add new
functionality to CF until there is a definite and common need for it in
practice.)
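
The nv=3 polygon above can be checked numerically. The following is a minimal
Python sketch, assuming the polygon edges are great-circle arcs (true here,
since they are two meridians and the Equator) and that plain lists stand in
for the lat_bounds and lon_bounds variables. The function names are
hypothetical.

```python
import math


def unit_vector(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to a 3-D unit vector."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def triangle_sphere_fraction(lat_bounds, lon_bounds):
    """Fraction of the sphere covered by a triangle with great-circle edges."""
    if len(lat_bounds) != 3 or len(lon_bounds) != 3:
        raise ValueError("this sketch handles nv=3 only")
    a, b, c = (unit_vector(la, lo) for la, lo in zip(lat_bounds, lon_bounds))
    # Solid angle from the triple-product formula:
    # tan(omega / 2) = |a . (b x c)| / (1 + a.b + b.c + c.a)
    numerator = abs(dot(a, cross(b, c)))
    denominator = 1.0 + dot(a, b) + dot(b, c) + dot(c, a)
    omega = 2.0 * math.atan2(numerator, denominator)
    return omega / (4.0 * math.pi)


# The bounds from the example: vertices at (0N,0E), (90N,0E), (0N,90E).
lat_bounds = [0.0, 90.0, 0.0]
lon_bounds = [0.0, 0.0, 90.0]
fraction = triangle_sphere_fraction(lat_bounds, lon_bounds)
```

The result agrees with one eighth of the sphere to within rounding error.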

* I suspect that geometries of this kind can be described by the ugrid
convention http://ugrid-conventions.github.io/ugrid-conventions, which is
compliant with CF. Their purpose is to describe a set of connected points,
edges or faces at which values are given, whereas in your case you'd give a
single value for the whole set, but the description of the geometry itself
might be similar. Have you had a look at whether ugrid could meet your needs?
If it almost does so, perhaps a better thing to do would be to propose
additions to ugrid. We would like to avoid having more than one way to describe
such geometries.

If you decide to make use of ugrid instead, the rest of my comments may
not be relevant!

* So far CF does not say anything about the use of netCDF-4 features (i.e. not
the classic model). We have often discussed allowing them but the general
argument is also made that there has to be a compelling case for providing a
new way to do something which can already be done. (Steve Hankin often made
this argument, but since he's mostly retired I'll make it now in his name :-)
If there are two ways to do something, software has to support both of them. We
already have ways to encode ragged arrays, so is there a compelling case for
needing the netCDF-4 vlen array as well? We already have a way to encode
strings too, as character arrays. I think this is probably a discussion we
should have again in a different thread, so I'll just talk about your classic
encoding. The same points apply to both encodings.
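
For reference, the kind of ragged-array encoding I have in mind stores all the
values in one flat array together with a count per row, much as the contiguous
ragged array representation for discrete sampling geometries does. Here is a
minimal Python sketch of that idea; pack_ragged and unpack_ragged are
hypothetical names, and plain lists stand in for netCDF variables.

```python
def pack_ragged(rows):
    """Flatten variable-length rows into one flat list plus a count per row."""
    values = [value for row in rows for value in row]
    counts = [len(row) for row in rows]
    return values, counts


def unpack_ragged(values, counts):
    """Rebuild the variable-length rows from the flat list and the counts."""
    rows, start = [], 0
    for n in counts:
        rows.append(values[start:start + n])
        start += n
    if start != len(values):
        raise ValueError("counts do not add up to the number of values")
    return rows


# Longitudes of two polygons with different numbers of vertices.
polygon_lons = [[0.0, 0.0, 90.0], [10.0, 20.0, 20.0, 10.0]]
values, counts = pack_ragged(polygon_lons)
restored = unpack_ragged(values, counts)
```

Only classic-model arrays are needed: one dimension for the flat values and
one for the counts. A vlen array would have to offer something beyond this to
justify supporting both.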

* Your approach uses a coordinate_index variable to identify indices of
geometry coordinates e.g.

  dimensions:
    indices = 30;
    node = 25 ;
    geom = 1 ;
  variables:
    int coordinate_index(indices) ;