Re: [CF-metadata] Feedback requested on proposed CF Simple Geometries
Dear Jonathan,

As I mentioned in my response yesterday, we have worked through these issues and think we have a compromise proposal for the community. Since the conversation is active, I’ll go ahead and share our work with the list in a follow-up email in a moment. A couple of specific responses to your note below:

There is a balance to be found between “opaque” and “transparent” encodings of geometries (or of ragged arrays, for that matter). “Transparent” tends to require more dimensions and really breaks the geometries apart, while “opaque” starts to impinge on the human readability and self-describing ideals of CF. We feel that a middle way is available to us, and I’ll outline the logic for it in my follow-up.

Regarding the indexed array: the approach we will propose (node sharing with the indexed-array notation) seems good for a couple of reasons. First, it allows data to be topologically intact without the need for node comparison. This is very important for some applications, and we feel it should be possible in the encoding. Second, it is in line with the approach taken by UGRID and will be familiar to some for that reason. The argument around storage volume can go either way. You are right that in cases where people don’t have shared nodes, it will be extra. But I’m not convinced this is a factor that would change the decision, given the first two considerations.

Regards,
- Dave

> On Feb 2, 2017, at 3:31 AM, Jonathan Gregory wrote:
>
> Dear Ben and Chris
>
> Following Chris's comment about preferring variables to multi-valued
> attributes, here are the examples for linestring and multipolygon redone so
> that both use variables to store the counts of parts and nodes.
Re: [CF-metadata] Feedback requested on proposed CF Simple Geometries
Dear Ben and Chris

Following Chris's comment about preferring variables to multi-valued attributes, here are the examples for linestring and multipolygon redone so that both use variables to store the counts of parts and nodes. In this scheme more variables and dimensions are needed, but it may be easier to read and it is more CF-like, because the topology information is a "container" variable, like the CF grid_mapping and the ugrid mesh topology, having no numerical information in itself, just with attributes that point to variables.

dimensions:
    station = 3; // stream segments
    time = UNLIMITED;
    node = 9; // = 2 + 4 + 3
variables:
    float flow(station,time) ;
        flow:units="m3 s-1";
        flow:topology="SOMETHING";
    double time(time) ;
        time:standard_name = "time";
        time:units = "days since 1970-01-01 00:00:00" ;
    char SOMETHING;
        SOMETHING:node_coordinates="lon lat";
        SOMETHING:node_count="node_count_var";
        SOMETHING:topology_type="linestring";
    int node_count_var(station); // number of nodes for each linestring
    float lon(node) ;
        lon:standard_name = "longitude";
        lon:units = "degrees_east";
    float lat(node) ;
        lat:standard_name = "latitude";
        lat:units = "degrees_north" ;
data:
    node_count_var=2, 4, 3;
    lon=0, 1, 0, -1, -2, -3, 2, 3, 4;
    lat=51, 52, 51, 50, 50, 49, 55, 55, 56;

dimensions:
    station = 3; // collections of polygons
    time = UNLIMITED;
    node = 24; // = 4 + 3 + 3 + 3 + 5 + 3 + 3
    part = 7 ; // = 3 + 2 + 2
variables:
    float flow(station,time) ;
        flow:units="m3 s-1";
        flow:topology="SOMETHING";
    double time(time) ;
        time:standard_name = "time";
        time:units = "days since 1970-01-01 00:00:00" ;
    char SOMETHING;
        SOMETHING:node_coordinates="lon lat";
        SOMETHING:node_count="node_count_var";
        SOMETHING:part_count="part_count_var";
        SOMETHING:topology_type="multipolygon";
    int node_count_var(part); // number of nodes in each polygon
    int part_count_var(station); // number of polygons in each collection
    float lon(node) ;
        lon:standard_name = "longitude";
        lon:units = "degrees_east";
    float lat(node) ;
        lat:standard_name = "latitude";
        lat:units = "degrees_north" ;
data:
    node_count_var=4, 3, 3, 3, 5, 3, 3;
    part_count_var=3, 2, 2;
    lon=0, 20, 20, 0, ... // first polygon, etc.
    ...
    lat=0, 0, 20, 20, ...

Also, two more thoughts regarding not using indirection, but instead duplicating coincident coordinate values:

* Doing it this way (without indexing) is consistent with ordinary CF bounds. Contiguous cells in 1D have bounds with equal values. Thus N cells have 2N bounds, although usually only N+1 distinct values of bounds. There are several reasons why we made this choice, one being that it's more flexible, in allowing non-contiguous and overlapping cells.

* The indexing itself takes space. If you have N (lon,lat) points which are all boundaries between two regions, so they're all used twice, you will have 4N coordinate values without indexing. With indexing you will have only 2N, but the index takes N, making 3N in total. Thus you save 25% of the space, not 50%.

Best wishes

Jonathan
___
CF-metadata mailing list
CF-metadata@cgd.ucar.edu
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
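Because both layouts store run lengths (counts) rather than start positions, a reader recovers the individual geometries with a running sum. Below is a minimal Python sketch of that decoding, assuming the variables have already been read into plain lists (file I/O is left out, and `offsets` and `split_by_counts` are hypothetical helper names, not part of any proposed convention). The multipolygon lon/lat values are elided in the CDL above, so node indices 0-23 stand in for them.

```python
from itertools import accumulate


def offsets(counts):
    """Start index of each run, given CF-style run lengths (counts)."""
    return [0] + list(accumulate(counts))[:-1]


def split_by_counts(values, counts):
    """Split a flat sequence into consecutive runs of the given lengths."""
    if sum(counts) != len(values):
        raise ValueError("counts must sum to the length of values")
    return [values[start:start + n] for start, n in zip(offsets(counts), counts)]


# Linestring example: one linestring per station, node_count_var(station).
node_count_var = [2, 4, 3]
lon = [0, 1, 0, -1, -2, -3, 2, 3, 4]
lat = [51, 52, 51, 50, 50, 49, 55, 55, 56]
linestrings = [
    list(zip(x, y))
    for x, y in zip(split_by_counts(lon, node_count_var),
                    split_by_counts(lat, node_count_var))
]

# Extracting a single geometry needs only its start offset and its length.
start, n = offsets(node_count_var)[1], node_count_var[1]
second = list(zip(lon[start:start + n], lat[start:start + n]))

# Multipolygon example: node_count_var(part) and part_count_var(station).
# lon/lat are elided in the CDL, so node indices 0..23 stand in for them.
poly_node_count = [4, 3, 3, 3, 5, 3, 3]
part_count_var = [3, 2, 2]
node_index = list(range(sum(poly_node_count)))
polygons = split_by_counts(node_index, poly_node_count)   # 7 polygons
collections = split_by_counts(polygons, part_count_var)   # 3 stations
```

Computing the offsets once turns extraction of any single geometry into a slice, which is the same bookkeeping a reader already does for the CF contiguous ragged array representation.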
Re: [CF-metadata] Feedback requested on proposed CF Simple Geometries
Dear Jonathan and Chris,

Thanks for bringing this thread back to life! Please don’t take silence on the part of Ben and me as a lack of activity. We have been working on a thorough proposal and are hoping to share it with the community very soon.

Chris, I think you will find a number of things in our proposal to your liking. We have attempted to reconcile a number of issues you brought up with our original proposal and the ideas that Jonathan shared. We are working through one last issue (what to do with the old “point” feature type now that we have geometries that are a superset of it). Once we have some finality on that, I will circulate a proposal.

Regards,
- Dave
Re: [CF-metadata] Feedback requested on proposed CF Simple Geometries
Dear Chris

> I really don't like storing info like this in an attribute -- I think it
> should be another variable, instead. It is a bit tricky with "nested" data
> like this, but you can link variables together with something like:
>
>     int SOMETHING(station); // number of polygons in each collection
>         SOMETHING:node_coordinates="lon lat";
>         SOMETHING:geometry_type="multipolygon";
>         SOMETHING:node_count="node_count_1";
>     int node_count_1(num_nodes);
>
>     ...
>     data:
>     node_count_1 = 4, 3, 3, 3, 5, 3, 3;

Yes, I thought of doing it that way too: that is, use a string attribute to name a vector integer variable, rather than using a vector integer attribute. This/your way is more consistent with CF in general, where we have few vector attributes, and none with variable dimension. So I actually prefer it. I didn't do it that way because I thought it looked simpler with an attribute. But I don't mind.

> > Thus I
> > have combined the two variables I suggested last time (number_of_parts and
> > number_of_nodes) into SOMETHING.
>
> I think we should come up with a better name here -- it would help me parse
> it anyway :-)

Indeed. :-) SOMETHING is just the variable name, not the term for this kind of variable. It might be called a topology variable, for instance.

Speaking of that, I wonder whether topology_type is a better name than geometry_type for the specification as points, lines or polygons. That is topological information.

Best wishes

Jonathan
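Using a string attribute to name an integer variable means a reader resolves names instead of parsing vector attributes. The sketch below follows that chain in Python, assuming a plain dict as a stand-in for an opened file. The attribute names (`topology`, `node_coordinates`, `node_count`, `part_count`, `topology_type`) are those of the proposal under discussion in this thread, not an adopted CF convention, and `resolve_topology` is a hypothetical helper.

```python
# Hypothetical in-memory stand-in for a netCDF file:
# variable name -> (attributes, data). Data is None for container variables.
dataset = {
    "flow": ({"units": "m3 s-1", "topology": "SOMETHING"}, None),
    "SOMETHING": ({"node_coordinates": "lon lat",
                   "node_count": "node_count_var",
                   "topology_type": "linestring"}, None),
    "node_count_var": ({}, [2, 4, 3]),
    "lon": ({"units": "degrees_east"}, [0, 1, 0, -1, -2, -3, 2, 3, 4]),
    "lat": ({"units": "degrees_north"}, [51, 52, 51, 50, 50, 49, 55, 55, 56]),
}


def resolve_topology(ds, data_var):
    """Follow the string attributes from a data variable to its geometry arrays."""
    data_attrs, _ = ds[data_var]
    container_attrs, _ = ds[data_attrs["topology"]]

    def lookup(name):
        # Each attribute value names another variable; fail loudly if it is missing.
        if name not in ds:
            raise KeyError(f"attribute refers to missing variable {name!r}")
        return ds[name][1]

    coords = {name: lookup(name) for name in container_attrs["node_coordinates"].split()}
    node_count = lookup(container_attrs["node_count"])
    part_name = container_attrs.get("part_count")   # only present for multi-part types
    part_count = lookup(part_name) if part_name else None
    return container_attrs["topology_type"], coords, node_count, part_count


kind, coords, node_count, part_count = resolve_topology(dataset, "flow")
```

The container variable carries no data of its own, which is why its entry holds `None`; all numerical content lives in the variables its attributes name.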
Re: [CF-metadata] Feedback requested on proposed CF Simple Geometries
My CDL-reading was off a bit yesterday, so:

On Tue, Jan 31, 2017 at 1:22 AM, Jonathan Gregory wrote:

> So, for example, we could
> store three timeseries, each applying to a collection of polygons, like
> this:
>
> dimensions:
>     station = 3; // collections of polygons
>     time = UNLIMITED;
>     node = 24; // = 4 + 3 + 3 + 3 + 5 + 3 + 3
> variables:
>     float flow(station,time) ;
>         flow:units="m3 s-1";
>         flow:topology="SOMETHING";
>     double time(time) ;
>         time:standard_name = "time";
>         time:units = "days since 1970-01-01 00:00:00" ;
>     int SOMETHING(station); // number of polygons in each collection
>         SOMETHING:node_coordinates="lon lat";
>         SOMETHING:geometry_type="multipolygon";
>         SOMETHING:nodes=4, 3, 3, 3, 5, 3, 3; // number of nodes in each polygon

I really don't like storing info like this in an attribute -- I think it should be another variable, instead. It is a bit tricky with "nested" data like this, but you can link variables together with something like:

    int SOMETHING(station); // number of polygons in each collection
        SOMETHING:node_coordinates="lon lat";
        SOMETHING:geometry_type="multipolygon";
        SOMETHING:node_count="node_count_1";
    int node_count_1(num_nodes);

    ...
    data:
    node_count_1 = 4, 3, 3, 3, 5, 3, 3;

> Thus I
> have combined the two variables I suggested last time (number_of_parts and
> number_of_nodes) into SOMETHING.

I think we should come up with a better name here -- it would help me parse it anyway :-)

-CHB

--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

chris.bar...@noaa.gov
Re: [CF-metadata] Feedback requested on proposed CF Simple Geometries
Dear Chris

Thanks for your comments on my comments. Here are replies to a subset!

> > Your aim is to
> > describe the network alone. ...
> > You would like to have SOMETHING alone in the file, just to
> > describe the network itself. CF doesn't do this at present (domain without
> > data),
>
> I don't see a conflict here -- if you can describe the network (geometry)
> then you can associate data with it (UGRID uses indexes into cells, nodes,
> etc.; this should be equally applicable)
>
> Doesn't a set of coordinate variables essentially do that? i.e. you can
> define a rectangular grid -- even if there is no data on it. And you can
> certainly do that with UGRID, which is another standard, but I don't think
> it conflicts with CF.

There isn't a conflict, I agree, but it's not currently possible in CF. That is because the data variable has all the coordinates attached to it, so you can't have coordinates without data. Of course it could easily be done, for instance by providing a dummy data variable which identified the dimensions but was itself a scalar - that's been discussed before, but no-one has yet proposed adding it to the convention. It's not a conceptual difficulty, but it is an addition to the data model.

> > data:
> >     SOMETHING=2, 4, 3;
> >     lon=0, 1, 0, -1, -2, -3, 2, 3, 4;
> >     lat=51, 52, 51, 50, 50, 49, 55, 55, 56;
>
> I'm confused about what this is.

There are three linestrings. The SOMETHING variable says how many nodes each has, and the lon and lat variables are the coordinates of those nodes.

> > For the sake of applications which can
> > read CF but don't understand simple geometries, it might be a good idea in
> > addition to provide a "representative" location for each timeseries, as
> > representative_lat(station) and representative_lon(station), which could for
> > instance be the mean of the node coordinates for each geometry.
>
> We do that in UGRID, too -- I think it's even required (and called
> coordinates, actually). It may make little sense with complex geometries,
> but it can be handy.

Yes. It is required in CF as well, and the attribute is named coordinates; I think ugrid follows this.

> The stream network example would be a good one. Also things like political
> boundaries -- they tend to be complex polygons with shared vertices.

There's a shared vertex at the confluence of two streams, but I guess those are a fairly small fraction of the total number of points. With political boundaries, I agree that most points (not coastlines) will appear twice.

Best wishes

Jonathan
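Taking the representative location of each timeseries as the mean of its geometry's node coordinates, as suggested here, amounts to a per-run average over the ragged node arrays. A minimal Python sketch using the three-linestring data from this thread; `representative_points` is a hypothetical helper, and its output would be written to the representative lon/lat variables named in the data variable's coordinates attribute.

```python
def representative_points(lon, lat, node_count):
    """Mean of the node coordinates of each geometry, one (lon, lat) per geometry."""
    points = []
    start = 0
    for n in node_count:
        if n == 0:
            raise ValueError("a geometry with no nodes has no representative point")
        xs = lon[start:start + n]
        ys = lat[start:start + n]
        points.append((sum(xs) / n, sum(ys) / n))
        start += n
    return points


lon = [0, 1, 0, -1, -2, -3, 2, 3, 4]
lat = [51, 52, 51, 50, 50, 49, 55, 55, 56]
rep = representative_points(lon, lat, [2, 4, 3])
```

A plain arithmetic mean is only a convenience: it is not the centroid of a polygon, it counts a repeated closing node twice, and it misbehaves for geometries that straddle the antimeridian.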
Re: [CF-metadata] Feedback requested on proposed CF Simple Geometries
A couple quick comments:

I think we're close here, so that's good. I'm not that clear on where there are decisions left to be made, but I'll highlight two:

...

> Your aim is to
> describe the network alone.
> ...
> a collection of timeseries is stored as a
> data variable with a single dimension of time and a single dimension of
> space.

I don't see a conflict here -- if you can describe the network (geometry) then you can associate data with it (UGRID uses indexes into cells, nodes, etc.; this should be equally applicable)

> You would like to have SOMETHING alone in the file, just to
> describe the network itself. CF doesn't do this at present (domain without
> data),

Doesn't a set of coordinate variables essentially do that? i.e. you can define a rectangular grid -- even if there is no data on it. And you can certainly do that with UGRID, which is another standard, but I don't think it conflicts with CF.

> Taking your previous comments into account (I'll come back to them below), as
> a modified version of what I suggested before, here's a possible way to handle
> this case, for a small number (3) of linestrings:

That looks good to me, I think...

> data:
>     SOMETHING=2, 4, 3;
>     lon=0, 1, 0, -1, -2, -3, 2, 3, 4;
>     lat=51, 52, 51, 50, 50, 49, 55, 55, 56;

I'm confused about what this is.

> These simple geometries can be regarded as a more complex alternative to cell
> bounds - each timeseries has a complicated geometry of nodes and lines, but
> logically it's still a single "cell".

yup.

> For the sake of applications which can
> read CF but don't understand simple geometries, it might be a good idea in
> addition to provide a "representative" location for each timeseries, as
> representative_lat(station) and representative_lon(station), which could for
> instance be the mean of the node coordinates for each geometry.

We do that in UGRID, too -- I think it's even required (and called coordinates, actually).
It may make little sense with complex geometries, but it can be handy.

> > You propose the index variable in order for the convention to be like
> > ugrid. However this still seems to me to be an unnecessary complexity and
> > use of space if you aren’t going to have many shared nodes.
> To be frank, I'm not convinced by either argument. Regarding the first, in your
> example you don't reuse any points at all. Can you give an example where there
> is a lot of reuse?

The stream network example would be a good one. Also things like political boundaries -- they tend to be complex polygons with shared vertices.

> Regarding the second, I agree that it is a nuisance and
> unreliable to have to make comparisons with tolerance between floating-point
> numbers to determine equality. However, when you write a file, I suppose you
> can and would write exactly the same numbers for the coordinates of a node if
> it appears several times, wouldn't you? Thus the coincidence of nodes can be
> tested by *exact* equality of coordinates - no tolerance needed.

You still don't know for sure whether the vertices are the SAME or just happen to be the same.

This is a tough one -- the "normal" GIS data model does not have shared nodes (that I know of), so perhaps we should follow that. But this lack of shared nodes is actually a substantial pain for GIS systems and uses -- there is a lot of complex "snapping" that needs to be done. So I'm on the fence about this -- I'm pretty convinced shared nodes are a better model, but if we want to interact seamlessly with other GIS formats, we may be better off matching that data model.

> In my example above, I assumed the polygons have no holes in them, so I've
> omitted the inside/outside information. If needed, this information could also
> be an attribute e.g. SOMETHING:inout="OIIIIOO", with as many elements as
> there are polygons in total. Thinking again about it, I wonder whether this
> information is really needed.
> If you draw all the polygons, isn't it apparent
> which ones are inside anyway? When would you use this information?

It's not always clear. If there is a hole in a polygon, you can figure it out, but if there is a lake in a land polygon, and an island in the lake, then it gets pretty tricky. I think shapefiles use clockwise vs. anti-clockwise ordering to indicate inside/outside, but IIUC, they are pretty limited with nested polygons, too.

> My scheme avoids the use of break values, which you're not very keen on
> yourselves, it sounds like.

I don't like break values either.

> You wrote
> - It is more difficult to extract a single geometry using this approach.
> It's not hard, though, and the same comment would apply to the CF
> contiguous ragged array representation.

Yes -- you can represent a ragged array either by specifying the start index of each "row" or by specifying the size of each row. CF specifies the size of each row. I think that's a worse way to do it -- it's similar if you a
Re: [CF-metadata] Feedback requested on proposed CF Simple Geometries
Dear Ben

Thanks for your new thoughts. I find this intriguing but still puzzling, and I think this means we are talking at cross-purposes. Perhaps we ought to speak on the phone? However, here are some replies.

Maybe this is a clue to our differences:

> We intend for this proposal to fit in the Discrete Sampling Geometry
> timeSeries featureType. So this proposal does not contain any new mechanism
> to link a time-varying data variable with a network composed of polygons,
> points, and lines (a whole hydrologic system for example). ...
> We would never associate time-varying data with nodes or
> the edges between them.

So far, CF describes data, and provides coordinates to locate the data in space and time (and other dimensions). I'm not really familiar with the terminology, but I understand that this is called a "coverage" - that is, data which are a function of a domain. Your "new mechanism" sentence suggests that your aim is to describe just a domain, with no data. Maybe that's why you're agnostic about whether and how the 2.7 million stream segments are grouped. Your aim is to describe the network alone.

But if you want to link it to CF timeseries, as you say you do, this question must have a definite answer, because a collection of timeseries is stored as a data variable with a single dimension of time and a single dimension of space. The latter is an index to information which locates the data e.g. a simplified version of CF example H.2:

dimensions:
    station = 10 ; // measurement locations
    time = UNLIMITED ;
variables:
    float humidity(station,time) ;
        humidity:standard_name = "specific_humidity" ;
        humidity:coordinates = "lat lon" ;
    double time(time) ;
        time:standard_name = "time";
        time:units = "days since 1970-01-01 00:00:00" ;
    float lon(station) ;
        lon:standard_name = "longitude";
        lon:units = "degrees_east";
    float lat(station) ;
        lat:standard_name = "latitude";
        lat:units = "degrees_north" ;

Here the data is located at 10 (lon,lat) points.
In the streamflow example I guess that each of the 2.7M stream segments has a timeseries of flow rates - is that right? That means we have to replace the points with linestrings (which I think is essentially the same as polylines, isn't it?), one for each stream segment. There must be exactly the same number of linestrings as there are timeseries. We need something like:

dimensions:
    station = 270; // stream segments
    time = UNLIMITED;
variables:
    float flow(station,time) ;
        flow:units="m3 s-1";
    double time(time) ;
        time:standard_name = "time";
        time:units = "days since 1970-01-01 00:00:00" ;
    SOMETHING(station) // to describe the geometry of each stream segment

Your proposal is about the SOMETHING, but not how it links to the data. Is that right? You would like to have SOMETHING alone in the file, just to describe the network itself. CF doesn't do this at present (domain without data), but it's been discussed before, and if we agree a CF convention for SOMETHING, it could also be linked to the timeseries data variables.
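The requirement that there be exactly one geometry per timeseries, together with the node-dimension comments in the CDL examples (for instance `node = 9; // = 2 + 4 + 3`), can be checked mechanically. A minimal Python sketch, assuming the dimension sizes and count variables have already been read as integers and lists; `check_geometry_sizes` is hypothetical and follows the node_count/part_count layout proposed in this thread.

```python
def check_geometry_sizes(n_station, n_node, node_count, part_count=None):
    """Return a list of inconsistencies between dimension sizes and count variables."""
    problems = []
    # One geometry (linestring, polygon or multi-part collection) per timeseries.
    n_geometries = len(part_count) if part_count is not None else len(node_count)
    if n_geometries != n_station:
        problems.append(f"{n_geometries} geometries for {n_station} stations")
    # For multi-part geometries, the part counts must cover every node_count entry.
    if part_count is not None and sum(part_count) != len(node_count):
        problems.append(f"part counts sum to {sum(part_count)}, "
                        f"but there are {len(node_count)} parts")
    # The node counts must account for the whole node dimension.
    if sum(node_count) != n_node:
        problems.append(f"node counts sum to {sum(node_count)}, "
                        f"but the node dimension is {n_node}")
    return problems


linestring_ok = check_geometry_sizes(3, 9, [2, 4, 3])
multipolygon_ok = check_geometry_sizes(3, 24, [4, 3, 3, 3, 5, 3, 3], [3, 2, 2])
```

An empty list means the sizes are consistent; a writer could run the same check before creating the file.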
Taking your previous comments into account (I'll come back to them below), as a modified version of what I suggested before, here's a possible way to handle this case, for a small number (3) of linestrings:

dimensions:
    station = 3; // stream segments
    time = UNLIMITED;
    node = 9; // = 2 + 4 + 3
variables:
    float flow(station,time) ;
        flow:units="m3 s-1";
        flow:topology="SOMETHING";
    double time(time) ;
        time:standard_name = "time";
        time:units = "days since 1970-01-01 00:00:00" ;
    int SOMETHING(station); // number of nodes for each linestring
        SOMETHING:node_coordinates="lon lat";
        SOMETHING:geometry_type="linestring";
    float lon(node) ;
        lon:standard_name = "longitude";
        lon:units = "degrees_east";
    float lat(node) ;
        lat:standard_name = "latitude";
        lat:units = "degrees_north" ;
data:
    SOMETHING=2, 4, 3;
    lon=0, 1, 0, -1, -2, -3, 2, 3, 4;
    lat=51, 52, 51, 50, 50, 49, 55, 55, 56;

The timeseries flow(0,*) is for the 2-point line from (0,51) to (1,52), and flow(1,*) is for the 4-point line (0,51) -> (-1,50) -> (-2,50) -> (-3,49). Timeseries of data on polygons (one timeseries for each polygon) would be done in the same way, with geometry_type="polygon".

The topology attribute of the data variable provides a link to the SOMETHING variable, which specifies how the nodes are connected to make the linestring or polygon for each timeseries. I use the attribute names "topology" and "node_coordinates" to be reminiscent of ugrid. The SOMETHING variable could exist in a file without data variables to describe the linestrings alone.

With linestrings and polygons, the geometry of each timeseries has a single part (one linestring or one polygon), so the SOMETHING variable is used to specify the number of nodes for each geometry. The example we
Re: [CF-metadata] Feedback requested on proposed CF Simple Geometries
Jonathan, Chris, and CF-Metadata,

Thank you kindly for the replies, and apologies for the delay in responding. Please see responses below. The comments were very useful in exposing some conceptual hang-ups and will help us moving forward. As usual, looking forward to continued discussion.

A quick note: we’ve revised our thinking a bit and recorded our current design in an AGU poster: http://goo.gl/0NI4Sd. Based on the feedback in this thread and what we’ve learned in the process of preparing the poster and software, we need to discuss a bit more and will prepare a proposal for the community to review soon.

> I was asking whether this means that for each *collection* (of points, lines or polygons) there is a *single* timeseries. For instance, in your example of a single geometry composed of several polygons, there is a single number for each time. But that is not the case for weather stations; for each weather station there is a timeseries, and at each time there is a different number (value of temperature, precipitation or whatever) for each weather station. You also write, “The US National Weather Service’s National Water Model (NWM) … forecasts streamflow rates in about 2.7 million stream segments averaging 2km.” The stream network is a MultiLineString geometry, but I don’t think there is just one value of streamflow applying to the entire network at any given time; I guess there is a different timeseries for each stream segment. But in my example above, the Atlantic Ocean is a single polygon with a single timeseries for its average temperature, not a different timeseries for each node. Thus I am unclear about the dimensions of the data. In terms of your original example, does the data have dimensions (time,geometry, where geometry=1) or (time,node)?

Before diving in, it’s critical to define some terminology. A geometry is meant to refer to a potentially multipart geometric entity that might otherwise be called a feature.
A geometry is made up of one or more points, lines, or polygons. That said, we are thinking the dimensions of time-varying data would be (time, geometry), where time and geometry may have arbitrary lengths. Hence, multiple time-varying variables could be associated with each geometry. Chris addressed this in his response. How geometry data is “exploded” is up to the client software.

The 2.7 million stream segments would likely not be a single MultiLineString geometry. The geometry counts could be 2.7 million LineStrings and 2.7 million Polygons. One could collapse all this geometry data into single multi-geometries, but this would prove unwieldy. Some of the LineStrings could be discontinuous MultiLineString geometries (requiring only one index on the geometry dimension but consisting of two physical LineStrings).

> This seems to me to be a crucial difference. In the former case the simple geometry can be regarded as a more complex alternative to cell bounds - the cell has a complicated geometry of nodes and lines, but it’s still a single cell. In the latter case you’re providing many timeseries in an unstructured geometry, which is what ugrid describes. Which do you have in mind?

We intend for this proposal to fit in the Discrete Sampling Geometry timeSeries featureType. So this proposal does not contain any new mechanism to link a time-varying data variable with a network composed of polygons, points, and lines (a whole hydrologic system for example). UGRID provides some mechanisms for this similar to other CF conventions (data is associated with a grid center point and its bounds, for example, or a “face” has a center point in UGRID). It’s possible your question is still not being addressed. “Nodes” are used in all geometries. We would never associate time-varying data with nodes or the edges between them. Data would always be associated with the geometry (a feature composed of nodes).

> You propose the index variable in order for the convention to be like ugrid.
> However this still seems to me to be an unnecessary complexity and use of space if you aren’t going to have many shared nodes. I think the case for having another convention, distinct from ugrid, is stronger if it is *unlike* ugrid in this respect, and therefore simpler as well.

Sharing nodes should be possible within the spec, in our opinion. There may be overhead for some dataset encodings, but if one is willing to sacrifice computational time when writing complex geometric datasets with shared-node topology, considerable disk space and memory may be saved. Reusing coordinate-indexing indirection does not seem like a duplication of UGRID. In fact, it makes sense to align with UGRID as much as possible to facilitate data exchange.

> I agree that repeating the inside/outside flag many times is wasteful. That, coupled with your clarification that you may have several geometries, each consisting of several elements (points, lines, polygons), means that you need, in effect, a ragged array of ragged arrays (geometry,elem
Re: [CF-metadata] Feedback requested on proposed CF Simple Geometries
A little note:

On Tue, Nov 1, 2016 at 9:43 AM, Chris Barker wrote:

> I'm on shakier ground about when you want to use a GeometryCollection vs a
> FeatureCollection, but I _think_ that the point of a GeometryCollection is
> that you can group different types of geometry -- but still want them to be
> treated as a single entity.

Some quick googling led me to this discussion:

https://github.com/topojson/topojson/issues/37

which indicates that a GeometryCollection is generally treated as a single entity.

-CHB

> I've dealt with all this trying to jam data that fits well into netcdf
> into geoJSON, or GIS-oriented systems -- it's quite hard to be efficient
> about it :-) - i.e. there is really no way to associate an array of data
> with an array of geometries -- it sure looks like you could do it with
> GeometryCollections, but the systems aren't expecting that.
>
> Of course, CF doesn't need to follow this data model, but it's a good idea
> to be informed by it.
>
>> Nonetheless in both cases the geometries have to be described. I think the
>> difference is how we attach this description to the data or coordinates,
>> rather than how the description is constructed.
>
> indeed.
>
>> You propose the index variable in order for the convention to be like
>> ugrid. However this still seems to me to be an unnecessary complexity and
>> use of space if you aren't going to have many shared nodes.
>
> In the GIS data model, nodes are not shared between geometries, and you
> are quite right that keeping nodes separate with geometries indexing into
> them is an added complication and would not be space-efficient.
>
> However, there is another reason to do it -- it makes it definitive that
> two (or more) geometries share the exact same node, rather than them being
> distinct points that happened to be at the same location (or worse, with FP
> error and all, two points that are very close).
>
> This is actually a major limitation in the standard GIS model.
> > >> I think the case for having >> another convention, distinct from ugrid, is stronger if it is *unlike* >> ugrid >> in this respect, and therefore simpler as well. >> > > I still think that it should be separate from UGRID -- it really is a > different use case, though they should still share whatever they can, and > it could turn out that UGRID is a special case of geometries? > > -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ CF-metadata mailing list CF-metadata@cgd.ucar.edu http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
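Chris's argument for the index variable can be illustrated with a small Python sketch. The coordinates, the 1e-12 discrepancy and the helper names `shares_node_by_coords` and `shares_node_by_index` are hypothetical; the point is only that, without shared indices, whether two geometries "share" a node depends on a tolerance the reader has to pick.

```python
# Two linestrings meeting at a junction, stored two ways (hypothetical data).

# GIS-style: each geometry carries its own copy of every coordinate.
# The junction (1, 1) was written twice and one copy picked up a tiny
# floating-point discrepancy.
line_a = [(0.0, 0.0), (1.0, 1.0)]
line_b = [(1.0 + 1e-12, 1.0), (2.0, 0.0)]

# UGRID-style: one node array; geometries refer to nodes by index.
nodes = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
line_a_idx = [0, 1]
line_b_idx = [1, 2]


def shares_node_by_coords(g1, g2, tol=0.0):
    """Infer a shared node by comparing coordinates within a tolerance."""
    return any(
        abs(p[0] - q[0]) <= tol and abs(p[1] - q[1]) <= tol
        for p in g1
        for q in g2
    )


def shares_node_by_index(i1, i2):
    """A shared node is stated explicitly: both geometries use the same index."""
    return bool(set(i1) & set(i2))


print(shares_node_by_coords(line_a, line_b))            # False (exact test)
print(shares_node_by_coords(line_a, line_b, tol=1e-9))  # True (loose test)
print(shares_node_by_index(line_a_idx, line_b_idx))     # True
```

Note that `shares_node_by_index` never looks at the coordinates in `nodes`: connectivity is answered from the indices alone, which is the property Chris describes as missing from the standard GIS model.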
Re: [CF-metadata] Feedback requested on proposed CF Simple Geometries
A few comments, though you all seem to have this in hand :-) > I was asking whether this means that for each *collection* (of points, lines or polygons) there is a *single* timeseries. I don't get why this matters -- any number of time series could be associated with a single "entity" -- just like any number of timeseries can be associated with given coordinates in regular old CF. > For instance, in your example of a single geometry composed of several polygons, there is a single number for each time. But that is not the case for weather stations; for each weather station there is a timeseries, and at each time there is a different number (value of temperature, precipitation or whatever) for each weather station. I think it may be helpful to borrow terminology (and the data model) from the GIS world here. In this case, I am referencing the GeoJSON spec, as I happen to be working with that at the moment, but the basic data model is pretty consistent: http://geojson.org/geojson-spec.html Note that they have "geometries", which can be things like points, polygons, and polylines. IIUC (and I'm no OSGeo maven) a geometry represents a "single" entity. Then there are "Features": a Feature is essentially data associated with a particular geometry. But note: there are also "Collections" -- both Geometry and Feature Collections -- which are what you use to "bundle" various data together. I think we may be well served by thinking in terms of mapping the GIS data model to CF/netCDF -- for instance, it would be great to be able to write a netCDF<->GeoJSON converter that was lossless AND fairly "native" in both cases. > You also write, "The US National Weather Service’s National Water Model (NWM) ... forecasts streamflow rates in about 2.7 million stream segments averaging 2km." The stream network is a MultiLineString geometry, but I don't think there is just one value of streamflow applying to the entire network at any given time; No -- of course not.
So that network (if I understand the GIS data model) should be a FeatureCollection, not all one Feature -- so a whole collection of geometries as well. The "trick" with this data model is that it "de-vectorizes" the data. Those of us used to working with netCDF, CF, gridded data, etc. tend to think that you'd want to have, for instance, a vector of geometries, and then various vectors of data associated with those geometries, whereas the GIS data model associates data with a given geometry, and then creates collections of those. This is kind of like the old C conundrum: do I use a struct of arrays, or an array of structs? netCDF is very much about the struct-of-arrays approach. (Though I'm still confused: maybe you can have an "array" of data associated with a GeometryCollection?) As for MultiLineString -- you could associate an array of data with the MultiLineString, so one value per segment. But I think that violates the intent of the data model -- you should have a GeometryCollection of LineStrings instead, and then each segment has its own geometry and you can associate an array of data with that. (Or should it be a FeatureCollection? I'm getting confused now!) > I guess there is a different timeseries for each stream segment. But in my example above, the Atlantic Ocean is a single polygon with a single timeseries for its average temperature, not a different timeseries for each node. Right, so that Polygon would be a single Feature. > Thus I am unclear about the dimensions of the data. In terms of your original example, does the data have dimensions (time,geometry, where geometry=1) or (time,node)? (time,geometry, where geometry=1). (time,node) would be for data associated with a FeatureCollection of Points (or a MultiPoint). Does anyone "get" the GIS data model?
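The "struct of arrays" versus "array of structs" distinction can be made concrete with a short Python sketch that regroups CF-style parallel arrays into a GeoJSON-style FeatureCollection, one Feature per stream segment. The variable names and values are hypothetical, and the dictionaries only follow the GeoJSON layout (FeatureCollection, Feature, LineString, coordinates in longitude, latitude order); this is not a complete or lossless converter.

```python
import json

# CF/netCDF style ("struct of arrays"): parallel arrays, one entry per
# stream segment, with node coordinates stored end to end.
# All values are hypothetical.
node_count = [2, 3]
lon = [0.0, 1.0, 5.0, 6.0, 7.0]
lat = [50.0, 51.0, 52.0, 52.5, 53.0]
flow = [1.5, 0.7]


def to_feature_collection(node_count, lon, lat, values, name):
    """Regroup parallel arrays into one GeoJSON-style Feature per segment
    ("array of structs"), each Feature carrying its own data value."""
    features, start = [], 0
    for n, value in zip(node_count, values):
        coords = [[x, y] for x, y in zip(lon[start:start + n], lat[start:start + n])]
        features.append({
            "type": "Feature",
            "geometry": {"type": "LineString", "coordinates": coords},
            "properties": {name: value},
        })
        start += n
    return {"type": "FeatureCollection", "features": features}


fc = to_feature_collection(node_count, lon, lat, flow, name="flow")
print(json.dumps(fc["features"][1]))
```

Going the other way (Features back to parallel arrays) is the same loop run in reverse; a lossless netCDF<->GeoJSON converter would need both directions to agree.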
I'm quite confused as to when you would use: MultiPolygon vs. a GeometryCollection of Polygons vs. a FeatureCollection of Features with Polygon geometries. But I'm going to take a stab at it: MultiPolygon (and MultiLineString, and MultiPoint) is used when you have more than one of a particular type of geometry that are logically one thing -- maybe an archipelago, for instance. A Polygon geometry can represent a simple polygon, or a polygon with holes in it -- but cannot represent two separate polygons. So if you have multiple polygons that are geometrically distinct, but logically connected, you use a MultiPolygon. I'm on shakier ground about when you want to use a GeometryCollection vs a FeatureCollection, but I _think_ that the point of a GeometryCollection is that you can group different types of geometry -- but still want them to be treated as a single entity. I've dealt with all this trying to jam data that fits well into netCDF into GeoJSON, or GIS-oriented systems -- it's quite hard to be efficient about it :-) - i.e. there is really no way to associate an array of data with an array of geometries -- it sure looks like you could do it with
[CF-metadata] Feedback requested on proposed CF Simple Geometries
Dear Dave > I’ll respond the first question by saying that we are talking about > (time,geometry) NOT (time,node). Good. Thanks for the clarification. > You are correct in thinking that this is analogous to a complex (potentially > multipart) cell. In this case, we feel that it is more analogous to a > different spatial representation of a station DSG data type than an extension > of cell geometry, but different use cases may have differing a-priori > relationships to the existing standards baseline. Timeseries have data(time,station), which is like your case. When all stations have the same sampling times, this is just a 2D array and the station dimension is a "discrete axis" (CF sect 4.5). This existed before sect 9 on DSGs was added to CF. That new section provides a mechanism for storing ragged data arrays for multiple timeseries without wasting space, but logically it is equivalent to the rectangular array, which it upholds as a possible way to store the data. So I think actually the two cases you mention are the same. The new thing you want to do is describe the "station" for each timeseries in a geometrically complex way, rather than its being a single point or a single polygon, which can already be described by coordinates and coordinate bounds, respectively. > DSG handles data from one or a collection of TimeSeries (point), Trajectory, > Profile, TrajectoryProfile or TimeSeriesProfile. So measurements are from a > point (TimeSeries and Profile) or points along a trajectory. DSG can be used > for at least some of what you want to do here if you say, e.g., I have a > TimeSeries which is stream flow measured (or modeled) at a given point on a > stream. But DSG has no system to define a geometry (point, polyline, polygon, > etc) and say e.g, The expected rainfall in this polygon for some period of > time is 5304 liters, except to assign a nominal point (centroid?) or just use > an ID (e.g., San Francisquito Catchment section 5A). 
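Jonathan's point that a contiguous ragged array is logically equivalent to the rectangular (station, time) array can be checked with a minimal Python sketch. The count list plays the role of the CF count variable (identified in CF by its `sample_dimension` attribute); the name `row_size`, the data, and the use of NaN as padding are assumptions for illustration.

```python
import math

# Hypothetical station timeseries of different lengths, stored end to end
# (a contiguous ragged array) with one count per station.
row_size = [3, 1, 2]
values = [10.0, 11.0, 12.0, 20.0, 30.0, 31.0]


def ragged_to_rectangular(row_size, values, fill=math.nan):
    """Expand to a padded (station, obs) table; short rows are filled."""
    width = max(row_size)
    table, start = [], 0
    for n in row_size:
        table.append(values[start:start + n] + [fill] * (width - n))
        start += n
    return table


def rectangular_to_ragged(table):
    """Drop the padding again (assumes real data values are never NaN)."""
    row_size, values = [], []
    for row in table:
        kept = [v for v in row if not math.isnan(v)]
        row_size.append(len(kept))
        values.extend(kept)
    return row_size, values


table = ragged_to_rectangular(row_size, values)
print(len(table), len(table[0]))  # 3 3
```

The round trip loses nothing, which is the sense in which the ragged form is only a storage optimisation of the rectangular one.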
I suggest that if we see it this way you would still provide representative coordinates for each station, whether you stored it as a rectangular array or a ragged one. These could be useful for the simplest kind of plotting, which wants a location for each "geometry". However, these coordinates would not have bounds, because that's not adequate to describe the structure. There are many choices for how the geometry data could be stored. Here's a suggestion along the lines of my last email. I have made it resemble ugrid (and grid_mapping) in using a "container" variable to "host" the geometry description. In this way of doing it, each data variable points to the geometry variable, and the geometry variable points to the geometry coordinates, with no direct link between the representative (geometry) coordinates and the geometry (node) coordinates. It could alternatively, or additionally, be arranged so that the representative coordinates point to the geometry coordinates. However, my suggestion below is unlike other uses of container variables, which are scalars that don't contain information. My geometry variable is an auxiliary coordinate variable, pointed to by the usual CF coordinates attribute, and its value contains information - it gives the number of parts in each geometry. It is identifiable as a geometry variable by its special attributes. Making it formally an aux coord variable avoids having to invent a new attribute to point to it. The "inout" variable contains I or O for inside or outside polygon; it could also contain L for line and P for point. 
dimensions:
  geom = 3;
  part = 11;
  node = 36;
  time = 20;
variables:
  float p(time,geom);
    p:standard_name = "precipitation_flux";
    p:units = "kg m-2 s-1";
    p:coordinates = "xrep yrep geom3";
  float t(time,geom);
    t:standard_name = "air_temperature";
    t:units = "K";
    t:coordinates = "xrep yrep geom3 z";
  float z;
    z:standard_name = "height";
    z:units = "m";
  int geom3(geom); // number of parts in each geometry
    geom3:part_dimension = "part"; // size must equal the sum of geom3
    geom3:node_count = "nodes_per_part";
    geom3:part_type = "inout";
    geom3:node_dimension = "node"; // size must equal the sum of the node_count
    geom3:node_coordinates = "x y"; // also an attribute in ugrid
  float xrep(geom);
  float yrep(geom);
  int nodes_per_part(part);
  char inout(part);
  float x(node);
  float y(node);
data:
  geom3 = 6, 3, 2;
  nodes_per_part = 4, 3, 3, 3, 3, 3, 3, 5, 3, 3, 3;
  inout = "OIIIOOOIO";
  x = 0, 20, 20, 0, 1, 10, 19, 5, 7, 9, 11, 13, 15, 5, 9, 7, 11, 15, 13, -40, -20, -45, -20, -10, -10, -30, -45, -30, -20, -20, 30, 45, 10, 25, 50, 30;
  y = 0, 0, 20, 20, 1, 5, 1, 15, 19, 15, 15, 19, 15, 25, 25, 29, 25, 25, 29, -40, -45, -30, -35, -30, -10, -5, -20, -20, -15, -25, 20, 40, 40, 5, 10, 15;
  z = 1.5;
I agree you can't use ugrid as it stands, because ugrid describes a single mesh per data variable, with many data "points" at nodes, edges and faces of the mesh. You want several meshes (in ugrid terms) for each data variable, with only one data "point" fo
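Jonathan suggests keeping representative coordinates (`xrep`, `yrep`) alongside the full geometry. One possible way to fill them is sketched below in Python, using the `geom3`, `nodes_per_part`, `x` and `y` values from the CDL above. Taking the vertex mean of each geometry's first part (assumed to be its outer ring) is only an illustrative choice, not part of the proposal; it is not the area centroid, and another rule may suit plotting better.

```python
from itertools import accumulate

# Values copied from the CDL example above.
geom3 = [6, 3, 2]                                    # parts per geometry
nodes_per_part = [4, 3, 3, 3, 3, 3, 3, 5, 3, 3, 3]   # nodes per part
x = [0, 20, 20, 0, 1, 10, 19, 5, 7, 9, 11, 13, 15, 5, 9, 7, 11, 15, 13,
     -40, -20, -45, -20, -10, -10, -30, -45, -30, -20, -20,
     30, 45, 10, 25, 50, 30]
y = [0, 0, 20, 20, 1, 5, 1, 15, 19, 15, 15, 19, 15, 25, 25, 29, 25, 25, 29,
     -40, -45, -30, -35, -30, -10, -5, -20, -20, -15, -25,
     20, 40, 40, 5, 10, 15]


def representative_points(parts_per_geom, nodes_per_part, x, y):
    """Vertex mean of the first part of each geometry (assumed outer ring)."""
    # The dimension-size rules stated in the CDL comments.
    if sum(parts_per_geom) != len(nodes_per_part):
        raise ValueError("part dimension must equal the sum of geom3")
    if sum(nodes_per_part) != len(x) or len(x) != len(y):
        raise ValueError("node dimension must equal the sum of node_count")
    part_start = [0, *accumulate(nodes_per_part)]  # first node of each part
    geom_start = [0, *accumulate(parts_per_geom)]  # first part of each geometry
    xrep, yrep = [], []
    for g in range(len(parts_per_geom)):
        p = geom_start[g]
        s, n = part_start[p], nodes_per_part[p]
        xrep.append(sum(x[s:s + n]) / n)
        yrep.append(sum(y[s:s + n]) / n)
    return xrep, yrep


xrep, yrep = representative_points(geom3, nodes_per_part, x, y)
print(xrep, yrep)
```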
Re: [CF-metadata] Feedback requested on proposed CF Simple Geometries
Dear Ben and Bert Thanks for your emails, which help me to understand the simple geometry proposals better. Just to be clear, I'd like to repeat my first question. > You explain that the need is to specify spatial coordinates with a simple > geometry for a timeSeries variable. For example, this could be for the > discharge as a function of time across some line in a river (your example), > or I suppose it could be an average temperature as a function of time for > the Atlantic Ocean, where you wanted to supply the polygon which drew the > outline of the basin. Have I got the idea? to which you replied > Yes, you have this mostly right. It’s common to have a collection of points > (weather stations), lines (stream reaches), or polygons (hydrologic > catchments) with an associated time series I was asking whether this means that for each *collection* (of points, lines or polygons) there is a *single* timeseries. For instance, in your example of a single geometry composed of several polygons, there is a single number for each time. But that is not the case for weather stations; for each weather station there is a timeseries, and at each time there is a different number (value of temperature, precipitation or whatever) for each weather station. You also write, "The US National Weather Service’s National Water Model (NWM) ... forecasts streamflow rates in about 2.7 million stream segments averaging 2km." The stream network is a MultiLineString geometry, but I don't think there is just one value of streamflow applying to the entire network at any given time; I guess there is a different timeseries for each stream segment. But in my example above, the Atlantic Ocean is a single polygon with a single timeseries for its average temperature, not a different timeseries for each node. Thus I am unclear about the dimensions of the data. In terms of your original example, does the data have dimensions (time,geometry, where geometry=1) or (time,node)? 
This seems to me to be a crucial difference. In the former case the simple geometry can be regarded as a more complex alternative to cell bounds -- the cell has a complicated geometry of nodes and lines, but it's still a single cell. In the latter case you're providing many timeseries in an unstructured geometry, which is what ugrid describes. Which do you have in mind? Nonetheless in both cases the geometries have to be described. I think the difference is how we attach this description to the data or coordinates, rather than how the description is constructed. You propose the index variable in order for the convention to be like ugrid. However this still seems to me to be an unnecessary complexity and use of space if you aren't going to have many shared nodes. I think the case for having another convention, distinct from ugrid, is stronger if it is *unlike* ugrid in this respect, and therefore simpler as well. I agree that repeating the inside/outside flag many times is wasteful. That, coupled with your clarification that you may have several geometries, each consisting of several elements (points, lines, polygons), means that you need, in effect, a ragged array of ragged arrays (geometry,element,node).
This is more complicated than DSGs, but it seems to me it would be reasonably easy to understand if your multi-geometry example https://github.com/bekozi/netCDF-CF-simple-geometry/wiki/VLEN-Arrays-in-NetCDF-3#multipolygon-example was stored something like this:
dimensions:
  geom = 3;
  part = 11;
  node = 36;
variables:
  int number_of_parts(geom);
    number_of_parts:parts = "number_of_nodes";
  int number_of_nodes(part);
    number_of_nodes:inout = "inout";
  char inout(part);
  float x(node);
  float y(node);
data:
  number_of_parts = 6, 3, 2;
  number_of_nodes = 4, 3, 3, 3, 3, 3, 3, 5, 3, 3, 3;
  inout = "OIIIOOOIO";
  x = 0, 20, 20, 0, 1, 10, 19, 5, 7, 9, 11, 13, 15, 5, 9, 7, 11, 15, 13, -40, -20, -45, -20, -10, -10, -30, -45, -30, -20, -20, 30, 45, 10, 25, 50, 30;
  y = 0, 0, 20, 20, 1, 5, 1, 15, 19, 15, 15, 19, 15, 25, 25, 29, 25, 25, 29, -40, -45, -30, -35, -30, -10, -5, -20, -20, -15, -25, 20, 40, 40, 5, 10, 15;
where I assume that all polygons are closed. What do you think? Best wishes Jonathan
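The ragged array of ragged arrays (geometry, part, node) in this layout can be unpacked with two levels of cumulative offsets. A minimal Python sketch using the `number_of_parts`, `number_of_nodes`, `x` and `y` values above; the function name `unpack` is hypothetical, and, following the assumption that all polygons are closed, the first node of each ring is not repeated at the end.

```python
from itertools import accumulate

# Values copied from the CDL example above.
number_of_parts = [6, 3, 2]
number_of_nodes = [4, 3, 3, 3, 3, 3, 3, 5, 3, 3, 3]
x = [0, 20, 20, 0, 1, 10, 19, 5, 7, 9, 11, 13, 15, 5, 9, 7, 11, 15, 13,
     -40, -20, -45, -20, -10, -10, -30, -45, -30, -20, -20,
     30, 45, 10, 25, 50, 30]
y = [0, 0, 20, 20, 1, 5, 1, 15, 19, 15, 15, 19, 15, 25, 25, 29, 25, 25, 29,
     -40, -45, -30, -35, -30, -10, -5, -20, -20, -15, -25,
     20, 40, 40, 5, 10, 15]


def unpack(number_of_parts, number_of_nodes, x, y):
    """Return a nested list: geometry -> part -> list of (x, y) nodes."""
    if sum(number_of_parts) != len(number_of_nodes):
        raise ValueError("part dimension must equal the sum of number_of_parts")
    if sum(number_of_nodes) != len(x) or len(x) != len(y):
        raise ValueError("node dimension must equal the sum of number_of_nodes")
    # Inner ragged level: slice the node arrays into parts.
    node_end = list(accumulate(number_of_nodes))
    parts = [list(zip(x[e - n:e], y[e - n:e]))
             for n, e in zip(number_of_nodes, node_end)]
    # Outer ragged level: group consecutive parts into geometries.
    part_end = list(accumulate(number_of_parts))
    return [parts[e - n:e] for n, e in zip(number_of_parts, part_end)]


geometries = unpack(number_of_parts, number_of_nodes, x, y)
for g, parts in enumerate(geometries):
    print(g, [len(p) for p in parts])
```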
Re: [CF-metadata] Feedback requested on proposed CF Simple Geometries
On Tue, Sep 27, 2016 at 3:52 PM, Chris Barker wrote: > Thanks for all the great input, Bert. > > One comment: > >> >> 5) Besides inventing our own storage format (either in line with UGRID or >> CF), a third way was discussed, namely: to store the simple geometry shapes >> as ASCII or binary blobs in an extended format NetCDF 4 file. > > > I think binary blobs are a really bad idea (and what would be the format of > those blobs? Shapefiles? Or maybe Well-Known Binary?) > > I agree -- it sounds almost absurd to take a file format whose claim to fame is being self-describing, and use it to store data in a format that is no longer self-describing. I also lean (though less strongly) towards applying this same argument to well-known text. Ryan -- Ryan May, Ph.D. Software Engineer UCAR/Unidata Boulder, CO
Re: [CF-metadata] Feedback requested on proposed CF Simple Geometries
Thanks for all the great input, Bert. One comment: > > 5) Besides inventing our own storage format (either in line with UGRID or > CF), a third way was discussed, namely: to store the simple geometry shapes > as ASCII or binary blobs in an extended format NetCDF 4 file. I think binary blobs are a really bad idea (and what would be the format of those blobs? Shapefiles? Or maybe Well-Known Binary?). But Well-Known Text or GeoJSON might be reasonable. I'd still rather have it done "properly" with netCDF arrays. -Chris
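Storing geometries "properly" as netCDF arrays does not rule out text formats: a Well-Known Text string can be generated from the arrays whenever one is needed. A minimal Python sketch for a single exterior ring; the helper name `polygon_wkt` is hypothetical, it handles only a simple POLYGON without holes, and it closes the ring if the last node does not repeat the first, as WKT expects.

```python
def polygon_wkt(x, y):
    """Format one exterior ring stored as coordinate arrays as WKT."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need matching x/y arrays with at least 3 nodes")
    ring = list(zip(x, y))
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # WKT polygon rings are explicitly closed
    # .15g keeps integers compact while preserving typical double precision.
    coords = ", ".join(f"{a:.15g} {b:.15g}" for a, b in ring)
    return f"POLYGON(({coords}))"


print(polygon_wkt([0, 20, 20, 0], [0, 0, 20, 20]))
# POLYGON((0 0, 20 0, 20 20, 0 20, 0 0))
```

In this arrangement the arrays remain the self-describing source of truth, and WKT is only an export format.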
Re: [CF-metadata] Feedback requested on proposed CF Simple Geometries
Jonathan and CF-Metadata List, Thanks for the suggestions and discussion. We’ve attempted to respond to the major questions and concerns using Jonathan's mail as a template. Apologies in advance if we missed anything outstanding or did not appropriately acknowledge contributions in this thread. > You explain that the need is to specify spatial coordinates with a simple > geometry for a timeSeries variable. For example, this could be for the > discharge as a function of time across some line in a river (your example), > or I suppose it could be an average temperature as a function of time for > the Atlantic Ocean, where you wanted to supply the polygon which drew the > outline of the basin. Have I got the idea? Yes, you have this mostly right. It’s common to have a collection of points (weather stations), lines (stream reaches), or polygons (hydrologic catchments) with an associated time series. > Timeseries like this can be stored in CF, but their geographical extent is > usually described only in words e.g. a region name of atlantic_ocean, and > this is fine for applications like CMIP where you want to compare data from > different data sources in which the Atlantic Ocean may have different exact > shapes (different AOGCMs, in particular). An array of region names is also > possible, so I don't think we need a new convention to contain your dwarf > planet example. The dwarf planet example is intended to describe our generalized approach to contiguous ragged arrays that may be used for arbitrarily sized data arrays. For some (including me), using a string instead of a numeric example helps illustrate the concept. It is an idiosyncratic example in many ways. Sorry for the confusion. > Sect 9.1 on discrete sampling geometries says it cannot yet be used for > cases "where geo-positioning cannot be described as a discrete point > location. Problematic examples include time series that refer to a > geographical region (e.g. the northern hemisphere) ...".
Actually I think > that's not quite right. The existing convention *can* describe regions > which are contiguous, and rectangular or polygonal, using its usual bounds > convention (Sect 7.1). I think we should consider changing this text, > because it seems unnecessarily restrictive. Your explanation makes sense, and this should be captured in the DSG convention text. > If the regions were irregular polygons in latitude and longitude, nv would > be the number of vertices and the lat and lon bounds would trace the > outline of the polygon e.g. nv=3, lat=0,90,0 and lon=0,0,90 describes the > eighth of the sphere which is bounded by the meridians at 0E and 90E and > the Equator. I think, therefore, we do not need an additional convention > for points or polygonal regions. Many earth science datasets (excluding triangular, hexagonal, etc. meshes) representable as polygons and lines have differing node counts per feature. A fixed "nv" could not efficiently capture watershed A with 5 nodes and watershed B with 100. Additionally, the cell bounds concept does not include the structure and semantics needed to support MultiLineStrings, MultiPolygons, or polygons with holes/interiors. > However, we would need new conventions for a timeseries where each value > applies to a set of discontiguous regions or regions with holes in them, a > set of points, a line or a set of lines. I guess that these are included in > the geometry types you list (LineString, Multipoint, MultiLineString, and > MultiPolygon). Yes. > Do you have definite use-cases for all of these? (I ask this because we > don't add new functionality to CF until there is a definite and common need > for it in practice.) David Arctur described the primary motivation for developing the simple geometries approach: "Among other applications, NetCDF-CF is now being used as an intermediate & output data format in the US National Weather Service’s National Water Model (NWM).
This forecasts streamflow rates in about 2.7 million stream segments averaging 2km, throughout the continental US, at multiple time horizons (3 hr, 18 hr, 10 days) every hour, and an ensemble for 30-day forecast less frequently." These data also contain multi-geometries, primarily in the form of MultiLineStrings and MultiPolygons. To this we would add that working with GIS datasets of this magnitude is difficult with current NetCDF metadata conventions, often yielding an unwieldy hybrid of NetCDF data and other software such as ESRI ArcGIS and PostGIS. ESRI ArcGIS and PostGIS are not usable on many HPC platforms where models like the NWM reside. > I suspect that geometries of this kind can be described by the ugrid > convention http://ugrid-conventions.github.io/ugrid-conventions, which is > compliant with CF. Their purpose is to describe a set of connected points, > edges or faces at which values are given, whereas in your case you'd give a > single value for the whole set, but the description of the geometry itself > might be similar. Have you had a look at whether
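The storage argument against a fixed `nv` can be quantified with a small Python sketch. Using the example of watershed A with 5 nodes and watershed B with 100, a bounds-style (cell, nv) layout must pad every cell to the largest node count, while a ragged layout stores only the nodes plus one count per cell. The function name is hypothetical, and the sketch ignores the further limitation that bounds cannot express multipart geometries or holes.

```python
def layout_sizes(node_counts):
    """Values per coordinate variable for a padded vs a ragged layout."""
    nv = max(node_counts)
    padded = len(node_counts) * nv   # bounds(cell, nv): unused slots still stored
    ragged = sum(node_counts)        # node coordinates stored end to end
    counts = len(node_counts)        # one node count per cell, shared by lon and lat
    return padded, ragged, counts


print(layout_sizes([5, 100]))  # (200, 105, 2)
```

With many small features and a few very detailed ones, the padded layout grows with the largest feature rather than with the total number of nodes.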
Re: [CF-metadata] Feedback requested on proposed CF Simple Geometries
uts, and circular arcs would be automatically covered consistently by this > method. > > 6) Actually, I have a related use case for storing a network of polylines > (rather than straight edges) in UGRID compatible format for 1D hydraulic > models. In this case I need to store polylines representing river branches: > their connectivity at bifurcations and confluences is important, but so is > their overall length - river chainage - and hence I can't split them into the > base edges. This network defines basically the 1D coordinate system to be > used by the actual 1D (UGRID) simulation mesh which will be defined on top of > this channel network. Because of the link to 1D numerical modelling, this > will be discussed in a separate thread in the UGRID community first. > > Best regards, > > Bert > > -Original Message- > From: CF-metadata [mailto:cf-metadata-boun...@cgd.ucar.edu] On Behalf Of > Jonathan Gregory > Sent: 22 September 2016 18:26 > To: cf-metadata@cgd.ucar.edu > Subject: Re: [CF-metadata] Feedback requested on proposed CF Simple Geometries > > Dear Chris > >>> If the regions were irregular >>> polygons in latitude and longitude, nv would be the number of >>> vertices and the lat and lon bounds would trace the outline of the >>> polygon e.g. nv=3, >>> lat=0,90,0 >>> and lon=0,0,90 describes the eighth of the sphere which is bounded >>> by the meridians at 0E and 90E and the Equator. I think, therefore, >>> we do not need an additional convention for points or polygonal >>> regions. >> >> this seems fine for this simple example, but burying a bunch of >> coordinates of a complex polygon in a text string in an attribute is >> really not a good idea -- the coordinates of a polygon should be in >> the array data one way or another, rather than having to parse out attribute >> strings. > > To avoid confusion: > > I didn't suggest parsing attribute strings. 
The same numbers that Ben would > put in his x and y auxiliary coordinate variables for a single polygon can > appear in coordinate bounds variables according to the existing convention. > >> * I suspect that geometries of this kind can be described by the ugrid >>> convention http://ugrid-conventions.github.io/ugrid-conventions, >>> which is compliant with CF. Their purpose is to describe a set of >>> connected points, edges or faces at which values are given, >> >> I'm not so sure -- UGRID is about defining a bunch of polygons that >> all share vertices, and are all of the same order (usually all >> triangles, or quads, or maybe hexes). If they are a mixture, you still >> store the full set (say, six vertices), while marking some as unused. >> But it's not that well set up for a bunch of polygons of different order. >> >> Not too bad if there are only one or two complex polygons, but it >> would be a bit weird -- you'd have vertices and boundaries, but no >> faces. And you'd lose the order of the vertices (though that could >> probably be added to the UGRID standard) > > OK. I didn't investigate this, but it would be good to know about it. If > ugrid can do something like this, but not all of it, maybe ugrid could be > extended. If ugrid seems too complicated for these cases, maybe a "light" > version of ugrid could be proposed for them. I think we should avoid having > two partially overlapping conventions. > >> * So far CF does not say anything about the use of netCDF-4 features (i.e. >>> not >>> the classic model). We have often discussed allowing them but the >>> general argument is also made that there has to be a compelling case >>> for providing a new way to do something which can already be done. >>> (Steve Hankin often made this argument, but since he's mostly >>> retired I'll make it now in his name >>> :-) >>> >> >> Maybe it's time to embrace netcdf4? It's been a while! Though maybe >> for CF >> 2.* -- any movement on that?
> > I think, as we generally do, that we should adopt netCDF-4 features if there > is a definite need to do so. I mean something you can't do with an existing > mechanism, or which is done so much more easily with a new mechanism that it > justifies the extra effort of requiring alternatives to be programmed in > software. I'm not arguing against it in general, but I think it has to be > argued for each specific need within the convention. > > CF2 is not well-defined. I have to admit to being nervous about that. I am > very much oppos
Re: [CF-metadata] Feedback requested on proposed CF Simple Geometries
elling, this will be discussed in a separate thread in the UGRID community first. Best regards, Bert -Original Message- From: CF-metadata [mailto:cf-metadata-boun...@cgd.ucar.edu] On Behalf Of Jonathan Gregory Sent: 22 September 2016 18:26 To: cf-metadata@cgd.ucar.edu Subject: Re: [CF-metadata] Feedback requested on proposed CF Simple Geometries Dear Chris > > If the regions were irregular > > polygons in latitude and longitude, nv would be the number of > > vertices and the lat and lon bounds would trace the outline of the > > polygon e.g. nv=3, > > lat=0,90,0 > > and lon=0,0,90 describes the eighth of the sphere which is bounded > > by the meridians at 0E and 90E and the Equator. I think, therefore, > > we do not need an additional convention for points or polygonal > > regions. > > this seems fine for this simple example, but burying a bunch of > coordinates of a complex polygon in a text string in an attribute is > really not a good idea -- the coordinates of a polygon should be in > the array data one way or another, rather than having to parse out attribute > strings. To avoid confusion: I didn't suggest parsing attribute strings. The same numbers that Ben would put in his x and y auxiliary coordinate variables for a single polygon can appear in coordinate bounds variables according to the existing convention. > * I suspect that geometries of this kind can be described by the ugrid > > convention http://ugrid-conventions.github.io/ugrid-conventions, > > which is compliant with CF. Their purpose is to describe a set of > > connected points, edges or faces at which values are given, > > I'm not so sure -- UGRID is about defining a bunch of polygons that > all share vertices, and are all of the same order (usually all > triangles, or quads, or maybe hexes). if they are a mixture, you still > store the full set (say, six vertices), while marking some as unused. > But it's not that well set up for a bunch of polygons of different order. 
> > Not too bad if there are only one or two complex polygons, but it > would be a bit weird -- you'd have vertices and boundaries, but no > faces. And you'd lose the order of the vertices (though that could > probably be added to the UGRID standard) OK. I didn't investigate this, but it would be good to know about it. If ugrid can do something like this, but not all of it, maybe ugrid could be extended. If ugrid seems too complicated for these cases, maybe a "light" version of ugrid could be proposed for them. I think we should avoid having two partially overlapping conventions. > * So far CF does not say anything about the use of netCDF-4 features (i.e. > > not > > the classic model). We have often discussed allowing them but the > > general argument is also made that there has to be a compelling case > > for providing a new way to do something which can already be done. > > (Steve Hankin often made this argument, but since he's mostly > > retired I'll make it now in his name > > :-) > > > > Maybe it's time to embrace netcdf4? It's been a while! Though maybe > for CF > 2.* -- any movement on that? I think, as we generally do, that we should adopt netCDF-4 features if there is a definite need to do so. I mean something you can't do with an existing mechanism, or which is done so much more easily with a new mechanism that it justifies the extra effort of requiring alternatives to be programmed in software. I'm not arguing against it in general, but I think it has to be argued for each specific need within the convention. CF2 is not well-defined. I have to admit to being nervous about that. I am very much opposed to an idea of "starting all over again" and maintaining two conventions in parallel (since old data would continue to exist for a long time and so the old CF would have to be supported), and I also think backwards incompatibility has to be strongly justified. So I favour step-by-step evolution.
Another idea we've discussed, which I'm comfortable with, is that of defining "strict" compliance with the convention, which a data-writer could optionally adhere to. This could exclude older features we wanted to deprecate. However this is really not the subject of the discussion - it's another thread. > I think the ragged array option is fine -- though I haven't looked at > vlen arrays enough to know if they offer a compelling alternative. One > issue is that the programming environments that we use to work with > the data may not have an equivalent of vlen arrays. That's a good point, and a reason why we have to be cautious in general about adopting netCDF-4 features. Best wishes Jonathan
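The concern that some programming environments lack a VLEN equivalent is one reason the contiguous ragged array (CRA) fallback matters: it needs only ordinary one-dimensional arrays. A minimal Python sketch of the round trip between per-feature variable-length lists (standing in for VLEN values) and a CRA's flat values plus counts; the function names and data are hypothetical.

```python
from itertools import accumulate


def vlen_to_cra(rows):
    """Flatten variable-length rows into (counts, flat values)."""
    counts = [len(row) for row in rows]
    flat = [value for row in rows for value in row]
    return counts, flat


def cra_to_vlen(counts, flat):
    """Rebuild variable-length rows from counts and flat values."""
    if sum(counts) != len(flat):
        raise ValueError("counts must sum to the length of the flat array")
    ends = list(accumulate(counts))
    return [flat[end - n:end] for n, end in zip(counts, ends)]


# Hypothetical per-feature node indices of different lengths.
rows = [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9, 10, 11]]
counts, flat = vlen_to_cra(rows)
print(counts)  # [4, 3, 5]
```

Only `counts` and `flat` need to be written to the file, so any environment that can read plain 1-D integer arrays can reconstruct the rows.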
Re: [CF-metadata] Feedback requested on proposed CF Simple Geometries
Hello Ben I think this is fascinating and fantastic work which is likely to prove very useful for a range of domains. I am afraid that, just now, I don't have any specific insights with regard to: > Questions for the CF Community > > 1. Are our VLEN netCDF-3 and netCDF-4 approaches acceptable? What changes > would you recommend? > 2. Are the geometry types point, line, polygon, and their multipart > equivalents sufficient for the community? but I do think these are really valuable areas to get feedback on. all the best mark From: CF-metadata [cf-metadata-boun...@cgd.ucar.edu] on behalf of Ben Koziol - NOAA Affiliate [ben.koz...@noaa.gov] Sent: 07 September 2016 19:13 To: CF metadata Cc: Bob Simons - NOAA Federal; Whiteaker, Timothy L Subject: [CF-metadata] Feedback requested on proposed CF Simple Geometries Greetings, As part of an EarthCube project for advancing netCDF-CF [1], we are developing an approach to represent simple geometries in enhanced netCDF-4 with a variable length array backport for netCDF-3. Simple geometries, for example, may be used to associate stream discharge with river lines or surface runoff with watershed polygons. We've drafted an initial approach and reference implementation on the GitHub netCDF-CF-simple-geometry project [2] and would greatly appreciate feedback from the CF community. We'd like to make sure our scope is appropriate and our approach is acceptable. Scope The result of this effort will be a standard that the CF timeSeries feature type could use to specify spatial coordinates (define a simple geometry) for a timeSeries variable. For those familiar with the OGC WKT standard geometry types [3], we will include Point, LineString, Polygon, MultiPoint, MultiLineString, and MultiPolygon (WKT primitives and multipart geometries). We anticipate that the six chosen geometry types will cover the needs of most people generating netCDF data. These types also align with other geospatial data formats such as GeoJSON and ESRI Shapefile.
If our approach is well received by the CF community, we may later adapt it to include parametric shapes such as circles and ellipses.

Simple Geometry Encoding Method

Driven by the possibility that different features will require different numbers of coordinates to describe their geometries, our approach uses variable length (VLEN) arrays in enhanced netCDF-4 and contiguous ragged arrays (CRAs) in netCDF-3. We describe the VLEN netCDF-4 approach first. The netCDF-3 CRA description follows. In our approach, a VLEN coordinate_index variable identifies the indices of geometry coordinates in separate coordinate arrays. The coordinate_index variable includes a coordinates attribute, which stores the names of the coordinate variables, and a geom_type attribute to indicate the geometry type. For multipart geometries, the coordinate index variable may include one or more negative integer flags, each indicating the start of a new geometry "part" for the current feature. The first geometry part is not preceded by the negative integer flag. The variable shall include an attribute named multipart_break_value identifying the flag's value. For polygon geometries with holes (also called "interiors"), the coordinate index values shall include a negative integer flagging the start of each hole. In this case, the variable shall include a hole_break_value attribute to indicate the flag value. Other attributes on the coordinate index variable describe the clockwise or anticlockwise node order for polygons and the polygon closure convention. For additional details, see the wiki [4]. With these concepts defined, an example for multipolygons with holes is shown below. You can copy the WKT description below into Wicket [5] if you'd like to see what the geometry in this example looks like.
Well-Known Text (WKT):

MULTIPOLYGON(((0 0, 20 0, 20 20, 0 20, 0 0), (1 1, 10 5, 19 1, 1 1), (5 15, 7 19, 9 15, 5 15), (11 15, 13 19, 15 15, 11 15)), ((5 25, 9 25, 7 29, 5 25)), ((11 25, 15 25, 13 29, 11 25)))

Common Data Language (CDL) for netCDF-4 VLEN Arrays:

netcdf multipolygon_example {
types:
  int64(*) geom_VLType ;
dimensions:
  node = 25 ;
  geom = 1 ;
variables:
  geom_VLType coordinate_index(geom) ;
    string coordinate_index:geom_type = "multipolygon" ;
    string coordinate_index:coordinates = "x y" ;
    coordinate_index:multipart_break_value = -1 ;
    coordinate_index:hole_break_value = -2 ;
    string coordinate_index:outer_ring_order = "anticlockwise" ;
    string coordinate_index:closure_convention = "last_node_equals_first" ;
  double x(node) ;
  double y(node) ;
data:
  coordinate_index = {0, 1, 2, 3, 4, -2, 5, 6, 7, 8, -2, 9, 10, 11, 12, -2, 13, 14, 15, 16, -1, 17, 18, 19, 20, -1, 21, 22, 23, 24} ;
  x = 0, 20, 20, 0, 0, 1, 10, 19, 1, 5, 7, 9, 5, 11, 13, 15, 11, 5, 9, 7,
Re: [CF-metadata] Feedback requested on proposed CF Simple Geometries
Dear Ethan > The EC netCDF-CF project that Ben mentioned is working on a number of CF > extension efforts that are looking to use features of the netCDF enhanced > data model. Those efforts will all target CF 2 rather than CF 1.x. However, > as with the Simple Geometries, we should also expect suggestions for > changes to CF 1.x spinning out of these efforts. > > The CF-2 discussion has been pretty quiet for a while now. However, I expect > it will be more active as these various CF extension efforts start seeking > more community input and making proposals. This implies forking CF. I don't see the need to do that. If we did make a major backward-incompatible change to CF, I agree that it would be logical to call it CF2.x, and subsequent development would be based on it, but I don't see why EarthCube proposals like Ben's shouldn't be accommodated in CF1.x. Best wishes Jonathan ___ CF-metadata mailing list CF-metadata@cgd.ucar.edu http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
Re: [CF-metadata] Feedback requested on proposed CF Simple Geometries
Hi all, Just a quick note on Chris' CF 2 question (until I have a bit more time to think more fully on this discussion) ... The EC netCDF-CF project that Ben mentioned is working on a number of CF extension efforts that are looking to use features of the netCDF enhanced data model. Those efforts will all target CF 2 rather than CF 1.x. However, as with the Simple Geometries, we should also expect suggestions for changes to CF 1.x spinning out of these efforts. The CF-2 discussion has been pretty quiet for a while now. However, I expect it will be more active as these various CF extension efforts start seeking more community input and making proposals. Cheers, Ethan On Thu, Sep 22, 2016 at 12:00 PM, Chris Barker wrote: > Sorry, not enough time to really read this all carefully, but a couple > comments from a brief look: > >> >> If the regions were irregular >> polygons in latitude and longitude, nv would be the number of vertices >> and the >> lat and lon bounds would trace the outline of the polygon e.g. nv=3, >> lat=0,90,0 >> and lon=0,0,90 describes the eighth of the sphere which is bounded by the >> meridians at 0E and 90E and the Equator. I think, therefore, we do not >> need an >> additional convention for points or polygonal regions. > > > this seems fine for this simple example, but burying a bunch of > coordinates of a complex polygon in a text string in an attribute is really > not a good idea -- the coordinates of a polygon should be in the array data > one way or another, rather than having to parse out attribute strings. > > * I suspect that geometries of this kind can be described by the ugrid >> convention http://ugrid-conventions.github.io/ugrid-conventions, which is >> compliant with CF.
Their purpose is to describe a set of connected points, >> edges or faces at which values are given, > > > I'm not so sure -- UGRID is about defining a bunch of polygons that all > share vertices, and are all of the same order (usually all triangles, or > quads, or maybe hexes). If they are a mixture, you still store the full set > (say, six vertices), while marking some as unused. But it's not that well > set up for a bunch of polygons of different order. > > Not too bad if there are only one or two complex polygons, but it would be > a bit weird -- you'd have vertices and boundaries, but no faces. And you'd > lose the order of the vertices (though that could probably be added to the > UGRID standard) > > >> whereas in your case you'd give a >> single value for the whole set, but the description of the geometry itself >> might be similar. Have you had a look at whether ugrid could meet your >> needs? >> If it almost does so, perhaps a better thing to do would be to propose >> additions to ugrid. We would like to avoid having more than one way to >> describe >> such geometries. >> > > Ben has been involved in UGRID, so I'm sure he's thought this out. For my > part, I think it's really a different problem, though it would be nice if > it were as similar to UGRID as possible. > > * So far CF does not say anything about the use of netCDF-4 features (i.e. >> not >> the classic model). We have often discussed allowing them but the general >> argument is also made that there has to be a compelling case for >> providing a >> new way to do something which can already be done. (Steve Hankin often >> made >> this argument, but since he's mostly retired I'll make it now in his name >> :-) >> > > Maybe it's time to embrace netcdf4? It's been a while! Though maybe for CF > 2.* -- any movement on that? > > >> If there are two ways to do something, software has to support both of >> them.
We >> already have ways to encode ragged arrays, so is there a compelling case >> for >> needing the netCDF-4 vlen array as well? > > > I think the ragged array option is fine -- though I haven't looked at > vlen arrays enough to know if they offer a compelling alternative. One > issue is that the programming environments that we use to work with the > data may not have an equivalent of vlen arrays. > > * Similarly, you propose attributes for clockwise/anticlockwise node order >> and >> for the polygon closure convention. > > > This should match the OGC conventions as much as is practical. > > -CHB > > > -- > > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > chris.bar...@noaa.gov
Re: [CF-metadata] Feedback requested on proposed CF Simple Geometries
On Thu, Sep 22, 2016 at 9:26 AM, Jonathan Gregory wrote: > I didn't suggest parsing attribute strings. The same numbers that Ben > would put > in his x and y auxiliary coordinate variables for a single polygon can > appear > in coordinate bounds variables according to the existing convention. OK then, sorry for the confusion, probably me reading it too fast... > OK. I didn't investigate this, but it would be good to know about it. If > ugrid can do something like this, but not all of it, maybe ugrid could be > extended. Sure. > If ugrid seems too complicated for these cases, maybe a "light" > version of ugrid could be proposed for them. I think we should avoid having > two partially overlapping conventions. I agree -- but these seem like really different use cases to me -- sure, there are similarities, but the focus is different enough that a different standard may make sense -- though hopefully UGRID can inform the "new" one, so as not to have different ways to accomplish the parts that are the same. > CF2 is not well-defined. I thought it wasn't defined at all. But I think we all share your concerns about that. -CHB
Re: [CF-metadata] Feedback requested on proposed CF Simple Geometries
Dear Chris > > If the regions were irregular > > polygons in latitude and longitude, nv would be the number of vertices and > > the > > lat and lon bounds would trace the outline of the polygon e.g. nv=3, > > lat=0,90,0 > > and lon=0,0,90 describes the eighth of the sphere which is bounded by the > > meridians at 0E and 90E and the Equator. I think, therefore, we do not > > need an > > additional convention for points or polygonal regions. > > this seems fine for this simple example, but burying a bunch of coordinates > of a complex polygon in a text string in an attribute is really not a good > idea -- the coordinates of a polygon should be in the array data one way or > another, rather than having to parse out attribute strings. To avoid confusion: I didn't suggest parsing attribute strings. The same numbers that Ben would put in his x and y auxiliary coordinate variables for a single polygon can appear in coordinate bounds variables according to the existing convention. > * I suspect that geometries of this kind can be described by the ugrid > > convention http://ugrid-conventions.github.io/ugrid-conventions, which is > > compliant with CF. Their purpose is to describe a set of connected points, > > edges or faces at which values are given, > > I'm not so sure -- UGRID is about defining a bunch of polygons that all > share vertices, and are all of the same order (usually all triangles, or > quads, or maybe hexes). If they are a mixture, you still store the full set > (say, six vertices), while marking some as unused. But it's not that well > set up for a bunch of polygons of different order. > > Not too bad if there are only one or two complex polygons, but it would be > a bit weird -- you'd have vertices and boundaries, but no faces. And you'd > lose the order of the vertices (though that could probably be added to the > UGRID standard) OK. I didn't investigate this, but it would be good to know about it.
If ugrid can do something like this, but not all of it, maybe ugrid could be extended. If ugrid seems too complicated for these cases, maybe a "light" version of ugrid could be proposed for them. I think we should avoid having two partially overlapping conventions. > * So far CF does not say anything about the use of netCDF-4 features (i.e. > > not > > the classic model). We have often discussed allowing them but the general > > argument is also made that there has to be a compelling case for providing > > a > > new way to do something which can already be done. (Steve Hankin often made > > this argument, but since he's mostly retired I'll make it now in his name > > :-) > > > > Maybe it's time to embrace netcdf4? It's been a while! Though maybe for CF > 2.* -- any movement on that? I think, as we generally do, that we should adopt netCDF-4 features if there is a definite need to do so. By that I mean something you can't do with an existing mechanism, or which is done so much more easily with a new mechanism that it justifies the extra effort of requiring alternatives to be programmed in software. I'm not arguing against it in general, but I think it has to be argued for each specific need within the convention. CF2 is not well-defined. I have to admit to being nervous about that. I am very much opposed to an idea of "starting all over again" and maintaining two conventions in parallel (since old data would continue to exist for a long time and so the old CF would have to be supported), and I also think backwards incompatibility has to be strongly justified. So I favour step-by-step evolution. Another idea we've discussed, which I'm comfortable with, is to define "strict" compliance with the convention, which a data-writer could optionally adhere to. This could exclude older features we wanted to deprecate. However, this is really not the subject of the discussion - it's another thread.
> I think the ragged array option is fine -- though I haven't looked at vlen > arrays enough to know if they offer a compelling alternative. One issue is > that the programming environments that we use to work with the data may not > have an equivalent of vlen arrays. That's a good point, and a reason why we have to be cautious in general about adopting netCDF-4 features. Best wishes Jonathan
Re: [CF-metadata] Feedback requested on proposed CF Simple Geometries
Sorry, not enough time to really read this all carefully, but a couple comments from a brief look: > > If the regions were irregular > polygons in latitude and longitude, nv would be the number of vertices and > the > lat and lon bounds would trace the outline of the polygon e.g. nv=3, > lat=0,90,0 > and lon=0,0,90 describes the eighth of the sphere which is bounded by the > meridians at 0E and 90E and the Equator. I think, therefore, we do not > need an > additional convention for points or polygonal regions. this seems fine for this simple example, but burying a bunch of coordinates of a complex polygon in a text string in an attribute is really not a good idea -- the coordinates of a polygon should be in the array data one way or another, rather than having to parse out attribute strings. > * I suspect that geometries of this kind can be described by the ugrid > convention http://ugrid-conventions.github.io/ugrid-conventions, which is > compliant with CF. Their purpose is to describe a set of connected points, > edges or faces at which values are given, I'm not so sure -- UGRID is about defining a bunch of polygons that all share vertices, and are all of the same order (usually all triangles, or quads, or maybe hexes). If they are a mixture, you still store the full set (say, six vertices), while marking some as unused. But it's not that well set up for a bunch of polygons of different order. Not too bad if there are only one or two complex polygons, but it would be a bit weird -- you'd have vertices and boundaries, but no faces. And you'd lose the order of the vertices (though that could probably be added to the UGRID standard) > whereas in your case you'd give a > single value for the whole set, but the description of the geometry itself > might be similar. Have you had a look at whether ugrid could meet your > needs? > If it almost does so, perhaps a better thing to do would be to propose > additions to ugrid.
> We would like to avoid having more than one way to > describe > such geometries. > Ben has been involved in UGRID, so I'm sure he's thought this out. For my part, I think it's really a different problem, though it would be nice if it were as similar to UGRID as possible. > * So far CF does not say anything about the use of netCDF-4 features (i.e. > not > the classic model). We have often discussed allowing them but the general > argument is also made that there has to be a compelling case for providing > a > new way to do something which can already be done. (Steve Hankin often made > this argument, but since he's mostly retired I'll make it now in his name > :-) > Maybe it's time to embrace netcdf4? It's been a while! Though maybe for CF 2.* -- any movement on that? > If there are two ways to do something, software has to support both of > them. We > already have ways to encode ragged arrays, so is there a compelling case > for > needing the netCDF-4 vlen array as well? I think the ragged array option is fine -- though I haven't looked at vlen arrays enough to know if they offer a compelling alternative. One issue is that the programming environments that we use to work with the data may not have an equivalent of vlen arrays. > * Similarly, you propose attributes for clockwise/anticlockwise node order > and > for the polygon closure convention. This should match the OGC conventions as much as is practical. -CHB
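Chris's point about polygons of different order can be made concrete. The sketch below stores three faces (a triangle, a quadrilateral and a pentagon) in two layouts. The first is a fixed-width face-node table padded with a fill value, in the manner of UGRID's mixed-mesh layout, which marks unused slots with a _FillValue. The second is a flat node list with per-face counts, as in a CF contiguous ragged array. The face data, the fill value of -1 and the function names are illustrative assumptions, not taken from either convention.

```python
"""Compare a padded (UGRID-style) face-node table with a ragged layout.

The faces, the FILL value and all function names are hypothetical.
"""

FILL = -1  # hypothetical fill value marking unused vertex slots

# Three faces of different order, given as node indices.
faces = [
    [0, 1, 2],        # triangle
    [2, 3, 4, 5],     # quadrilateral
    [5, 6, 7, 8, 9],  # pentagon
]


def to_padded(faces, fill=FILL):
    """Pad every face to the largest face order (fixed-width table)."""
    width = max(len(f) for f in faces)
    return [f + [fill] * (width - len(f)) for f in faces]


def to_ragged(faces):
    """Flatten faces into one node list plus a node count per face."""
    flat = [n for f in faces for n in f]
    counts = [len(f) for f in faces]
    return flat, counts


def from_padded(table, fill=FILL):
    """Recover each face by dropping the fill slots."""
    return [[n for n in row if n != fill] for row in table]


def from_ragged(flat, counts):
    """Recover each face by walking the counts from the start."""
    out, start = [], 0
    for c in counts:
        out.append(flat[start:start + c])
        start += c
    return out


padded = to_padded(faces)
flat, counts = to_ragged(faces)
print(len(padded) * len(padded[0]), "slots padded vs", len(flat), "ragged")
```

The padded table wastes slots in proportion to the spread of polygon orders, which is the awkwardness Chris describes. The ragged form has no waste, but the counts must be walked (or pre-summed) to reach a given face.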
Re: [CF-metadata] Feedback requested on proposed CF Simple Geometries
Dear Jonathan, I can’t speak to the technical details, but I can mention some motivation for simple geometries. Among other applications, netCDF-CF is now being used as an intermediate and output data format in the US National Weather Service’s National Water Model (NWM). The NWM forecasts streamflow rates for about 2.7 million stream segments (averaging 2 km in length) throughout the continental US at multiple time horizons (3 hr, 18 hr, 10 days) every hour, and produces a 30-day ensemble forecast less frequently. There are many applications that can benefit from detailed polyline and polygon geometries. While ugrid could also be used, the simple geometries approach presented is simpler to implement. Regards, David Arctur On Sep 22, 2016, at 5:40 AM, Jonathan Gregory wrote: Dear Ben Thank you for your thoughtful and interesting proposal. I have quite a lot of questions and comments about it. * You explain that the need is to specify spatial coordinates with a simple geometry for a timeSeries variable. For example, this could be for the discharge as a function of time across some line in a river (your example), or I suppose it could be an average temperature as a function of time for the Atlantic Ocean, where you wanted to supply the polygon which drew the outline of the basin. Have I got the idea? Timeseries like this can be stored in CF, but their geographical extent is usually described only in words e.g. a region name of atlantic_ocean, and this is fine for applications like CMIP where you want to compare data from different data sources in which the Atlantic Ocean may have different exact shapes (different AOGCMs, in particular). An array of region names is also possible, so I don't think we need a new convention to contain your dwarf planet example. * Sect 9.1 on discrete sampling geometries says it cannot yet be used for cases "where geo-positioning cannot be described as a discrete point location. Problematic examples include time series that refer to a geographical region (e.g.
the northern hemisphere) ...". Actually I think that's not quite right. The existing convention *can* describe regions which are contiguous, and rectangular or polygonal, using its usual bounds convention (Sect 7.1). I think we should consider changing this text, because it seems unnecessarily restrictive. For example, a timeSeries for the average temperature in the Northern Hemisphere can be stored like this:

dimensions:
  region=1;
  nv=2;
  time=UNLIMITED;
variables:
  float temperature(region,time);
    temperature:standard_name="surface_temperature";
    temperature:units="K";
    temperature:coordinates="lat lon";
    temperature:cell_methods="time: mean area: mean";
  float lat(region);
    lat:standard_name="latitude";
    lat:units="degrees_north";
    lat:bounds="lat_bounds";
  float lat_bounds(region,nv);
  float lon(region);
    lon:standard_name="longitude";
    lon:units="degrees_east";
    lon:bounds="lon_bounds";
  float lon_bounds(region,nv);
data:
  lat_bounds=0,90;
  lon_bounds=0,360;

which means the region is 0-90N and 0-360E. If the regions were irregular polygons in latitude and longitude, nv would be the number of vertices and the lat and lon bounds would trace the outline of the polygon e.g. nv=3, lat=0,90,0 and lon=0,0,90 describes the eighth of the sphere which is bounded by the meridians at 0E and 90E and the Equator. I think, therefore, we do not need an additional convention for points or polygonal regions. However, we would need new conventions for a timeseries where each value applies to a set of discontiguous regions or regions with holes in them, a set of points, a line or a set of lines. I guess that these are included in the geometry types you list (LineString, Multipoint, MultiLineString, and MultiPolygon). Do you have definite use-cases for all of these? (I ask this because we don't add new functionality to CF until there is a definite and common need for it in practice.)
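Jonathan's octant example can be checked numerically. The sketch below treats the three (lat, lon) bound pairs as corners of a spherical triangle on the unit sphere. It assumes great-circle edges, which holds here because the edges lie on two meridians and the Equator. It then computes the area from the spherical excess using the Van Oosterom and Strackee formula. This is an illustrative check added for clarity, not something CF prescribes, and the function names are hypothetical.

```python
import math


def to_unit_vector(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to a 3-D unit vector."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def spherical_triangle_area(a, b, c):
    """Solid angle (steradians) of a triangle with great-circle edges.

    Van Oosterom & Strackee: tan(E/2) = |a.(b x c)| / (1 + a.b + b.c + c.a)
    """
    numerator = abs(dot(a, cross(b, c)))
    denominator = 1.0 + dot(a, b) + dot(b, c) + dot(c, a)
    return 2.0 * math.atan2(numerator, denominator)


# Bounds from the example: nv=3, lat=0,90,0 and lon=0,0,90.
lat_bounds = [0.0, 90.0, 0.0]
lon_bounds = [0.0, 0.0, 90.0]
vertices = [to_unit_vector(la, lo) for la, lo in zip(lat_bounds, lon_bounds)]

area = spherical_triangle_area(*vertices)
fraction = area / (4.0 * math.pi)  # the whole sphere is 4*pi steradians
print(f"area = {area:.6f} sr, fraction of sphere = {fraction:.6f}")
```

The area comes out as pi/2 steradians, one eighth of the sphere, as stated. Note that one row of bounds can only trace a single ring, which is why holes and multipart regions need something beyond the existing bounds convention.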
* I suspect that geometries of this kind can be described by the ugrid convention http://ugrid-conventions.github.io/ugrid-conventions, which is compliant with CF. Their purpose is to describe a set of connected points, edges or faces at which values are given, whereas in your case you'd give a single value for the whole set, but the description of the geometry itself might be similar. Have you had a look at whether ugrid could meet your needs? If it almost does so, perhaps a better thing to do would be to propose additions to ugrid. We would like to avoid having more than one way to describe such geometries. If you decide to make use of ugrid instead, the rest of my comments may not be relevant! * So far CF does not say anything about the use of netCDF-4 features (i.e. not the classic model). We have often discussed allowing them but the general argument is also made that there has to be a compelling case for providing a new way to do something which can already be done. (Steve Hankin often
Re: [CF-metadata] Feedback requested on proposed CF Simple Geometries
Dear Ben Thank you for your thoughtful and interesting proposal. I have quite a lot of questions and comments about it. * You explain that the need is to specify spatial coordinates with a simple geometry for a timeSeries variable. For example, this could be for the discharge as a function of time across some line in a river (your example), or I suppose it could be an average temperature as a function of time for the Atlantic Ocean, where you wanted to supply the polygon which drew the outline of the basin. Have I got the idea? Timeseries like this can be stored in CF, but their geographical extent is usually described only in words e.g. a region name of atlantic_ocean, and this is fine for applications like CMIP where you want to compare data from different data sources in which the Atlantic Ocean may have different exact shapes (different AOGCMs, in particular). An array of region names is also possible, so I don't think we need a new convention to contain your dwarf planet example. * Sect 9.1 on discrete sampling geometries says it cannot yet be used for cases "where geo-positioning cannot be described as a discrete point location. Problematic examples include time series that refer to a geographical region (e.g. the northern hemisphere) ...". Actually I think that's not quite right. The existing convention *can* describe regions which are contiguous, and rectangular or polygonal, using its usual bounds convention (Sect 7.1). I think we should consider changing this text, because it seems unnecessarily restrictive. 
For example, a timeSeries for the average temperature in the Northern Hemisphere can be stored like this:

dimensions:
  region=1;
  nv=2;
  time=UNLIMITED;
variables:
  float temperature(region,time);
    temperature:standard_name="surface_temperature";
    temperature:units="K";
    temperature:coordinates="lat lon";
    temperature:cell_methods="time: mean area: mean";
  float lat(region);
    lat:standard_name="latitude";
    lat:units="degrees_north";
    lat:bounds="lat_bounds";
  float lat_bounds(region,nv);
  float lon(region);
    lon:standard_name="longitude";
    lon:units="degrees_east";
    lon:bounds="lon_bounds";
  float lon_bounds(region,nv);
data:
  lat_bounds=0,90;
  lon_bounds=0,360;

which means the region is 0-90N and 0-360E. If the regions were irregular polygons in latitude and longitude, nv would be the number of vertices and the lat and lon bounds would trace the outline of the polygon e.g. nv=3, lat=0,90,0 and lon=0,0,90 describes the eighth of the sphere which is bounded by the meridians at 0E and 90E and the Equator. I think, therefore, we do not need an additional convention for points or polygonal regions. However, we would need new conventions for a timeseries where each value applies to a set of discontiguous regions or regions with holes in them, a set of points, a line or a set of lines. I guess that these are included in the geometry types you list (LineString, Multipoint, MultiLineString, and MultiPolygon). Do you have definite use-cases for all of these? (I ask this because we don't add new functionality to CF until there is a definite and common need for it in practice.) * I suspect that geometries of this kind can be described by the ugrid convention http://ugrid-conventions.github.io/ugrid-conventions, which is compliant with CF. Their purpose is to describe a set of connected points, edges or faces at which values are given, whereas in your case you'd give a single value for the whole set, but the description of the geometry itself might be similar.
Have you had a look at whether ugrid could meet your needs? If it almost does so, perhaps a better thing to do would be to propose additions to ugrid. We would like to avoid having more than one way to describe such geometries. If you decide to make use of ugrid instead, the rest of my comments may not be relevant! * So far CF does not say anything about the use of netCDF-4 features (i.e. not the classic model). We have often discussed allowing them but the general argument is also made that there has to be a compelling case for providing a new way to do something which can already be done. (Steve Hankin often made this argument, but since he's mostly retired I'll make it now in his name :-) If there are two ways to do something, software has to support both of them. We already have ways to encode ragged arrays, so is there a compelling case for needing the netCDF-4 vlen array as well? We already have a way to encode strings too, as character arrays. I think this is probably a discussion we should have again in a different thread, so I'll just talk about your classic encoding. The same points apply to both encodings. * Your approach uses a coordinate_index variable to identify indices of geometry coordinates e.g.

dimensions:
  indices = 30;
  node = 25 ;
  geom = 1 ;
variables:
  int coordinate_index(indices) ;
    coordinate_index:coordin
[CF-metadata] Feedback requested on proposed CF Simple Geometries
Greetings,

As part of an EarthCube project for advancing netCDF-CF [1], we are developing an approach to represent simple geometries in enhanced netCDF-4 with a variable length array backport for netCDF-3. Simple geometries, for example, may be used to associate stream discharge with river lines or surface runoff with watershed polygons. We've drafted an initial approach and reference implementation on the GitHub netCDF-CF-simple-geometry project [2] and would greatly appreciate feedback from the CF community. We'd like to make sure our scope is appropriate and our approach is acceptable.

Scope

- The result of this effort will be a standard that the CF timeSeries feature type could use to specify spatial coordinates (define a simple geometry) for a timeSeries variable.
- For those familiar with the OGC WKT standard geometry types [3], we will include Point, LineString, Polygon, Multipoint, MultiLineString, and MultiPolygon (WKT primitives and multipart geometries). We anticipate that the six chosen geometry types will cover the needs of most people generating netCDF data. These types also align with other geospatial data formats such as GeoJSON and ESRI Shapefile.

If our approach is well received by the CF community, we may later adapt it to include parametric shapes such as circles and ellipses.

Simple Geometry Encoding Method

Driven by the possibility that different features will require different numbers of coordinates to describe their geometries, our approach uses variable length (VLEN) arrays in enhanced netCDF-4 and contiguous ragged arrays (CRAs) in netCDF-3. We describe the VLEN netCDF-4 approach first. The netCDF-3 CRA description follows.

In our approach, a VLEN coordinate_index variable identifies the indices of geometry coordinates in separate coordinate arrays. The coordinate_index variable includes a coordinates attribute, which stores the names of the coordinate variables, and a geom_type attribute to indicate the geometry type.
For multipart geometries, the coordinate index variable may include one or more negative integer flags, each indicating the start of a new geometry "part" for the current feature. The first geometry part is not preceded by the negative integer flag. The variable shall include an attribute named multipart_break_value identifying the flag's value. For polygon geometries with holes (also called "interiors"), the coordinate index values shall include a negative integer flagging the start of each hole. In this case, the variable shall include a hole_break_value attribute to indicate the flag value. Other attributes on the coordinate index variable describe the clockwise or anticlockwise node order for polygons and the polygon closure convention. For additional details, see the wiki [4].

With these concepts defined, an example for multipolygons with holes is shown below. You can copy the WKT description below into Wicket [5] if you'd like to see what the geometry in this example looks like.

Well-Known Text (WKT):

MULTIPOLYGON(((0 0, 20 0, 20 20, 0 20, 0 0), (1 1, 10 5, 19 1, 1 1), (5 15, 7 19, 9 15, 5 15), (11 15, 13 19, 15 15, 11 15)), ((5 25, 9 25, 7 29, 5 25)), ((11 25, 15 25, 13 29, 11 25)))

Common Data Language (CDL) for netCDF-4 VLEN Arrays:

netcdf multipolygon_example {
types:
  int64(*) geom_VLType ;
dimensions:
  node = 25 ;
  geom = 1 ;
variables:
  geom_VLType coordinate_index(geom) ;
    string coordinate_index:geom_type = "multipolygon" ;
    string coordinate_index:coordinates = "x y" ;
    coordinate_index:multipart_break_value = -1 ;
    coordinate_index:hole_break_value = -2 ;
    string coordinate_index:outer_ring_order = "anticlockwise" ;
    string coordinate_index:closure_convention = "last_node_equals_first" ;
  double x(node) ;
  double y(node) ;
data:
  coordinate_index = {0, 1, 2, 3, 4, -2, 5, 6, 7, 8, -2, 9, 10, 11, 12, -2, 13, 14, 15, 16, -1, 17, 18, 19, 20, -1, 21, 22, 23, 24} ;
  x = 0, 20, 20, 0, 0, 1, 10, 19, 1, 5, 7, 9, 5, 11, 13, 15, 11, 5, 9, 7, 5, 11, 15, 13, 11 ;
  y = 0, 0, 20, 20, 0, 1, 5, 1, 1, 15, 19, 15, 15, 15, 19, 15, 15, 25, 25, 29, 25, 25, 25, 29, 25 ;
}

You'll find additional examples for VLEN geometry storage on our wiki [6].

Variable Length (VLEN) Arrays in NetCDF-3

To support netCDF-3, we created a VLEN approach for netCDF-3 [7]. Inspired by CF contiguous ragged arrays (CRAs), our approach drops the CRA count variable in favor of a stop variable that stores the stop index for each geometry within an array of geometry coordinates. This improves random access to the CRA "elements", avoiding the need to sum the counts preceding the target element index. The stop variable includes a contiguous_ragged_dimension attribute whose value is the name of the dimension to which the stop indices apply (similar to the CRA sample_dimension attribute). An example showing how strings can be stored with this approach is shown below.

Common Data Language (CDL) for netCDF-3 CRAs:

netcdf dwarf_planets { dime
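The break-value and stop-index mechanics in Ben's proposal can be sketched in Python. This is a minimal sketch under stated assumptions, not the proposal's reference implementation [2]. The multipolygon example is flattened into one classic-model index array with a single hypothetical stop entry. Stop indices are assumed to be exclusive (Python slice semantics). The helper names (counts_to_stops, geometry_slice, split_parts, signed_area) are illustrative.

```python
from itertools import accumulate

# Break values as declared on coordinate_index in the multipolygon example.
MULTIPART_BREAK = -1  # coordinate_index:multipart_break_value
HOLE_BREAK = -2       # coordinate_index:hole_break_value

# The multipolygon example, flattened into one classic-model index array.
coordinate_index = [0, 1, 2, 3, 4, -2, 5, 6, 7, 8, -2, 9, 10, 11, 12, -2,
                    13, 14, 15, 16, -1, 17, 18, 19, 20, -1, 21, 22, 23, 24]
x = [0, 20, 20, 0, 0, 1, 10, 19, 1, 5, 7, 9, 5, 11, 13, 15, 11,
     5, 9, 7, 5, 11, 15, 13, 11]
y = [0, 0, 20, 20, 0, 1, 5, 1, 1, 15, 19, 15, 15, 15, 19, 15, 15,
     25, 25, 29, 25, 25, 25, 29, 25]
stop = [30]  # hypothetical stop variable: one geometry, exclusive end index


def counts_to_stops(counts):
    """Turn per-geometry counts (CF sample_dimension style) into stops."""
    return list(accumulate(counts))


def geometry_slice(flat, stops, i):
    """Return geometry i's indices directly, without summing earlier counts."""
    start = stops[i - 1] if i > 0 else 0
    return flat[start:stops[i]]


def split_parts(indices):
    """Split on break values into parts; each part is [outer, hole, ...]."""
    parts = [[[]]]
    for k in indices:
        if k == MULTIPART_BREAK:
            parts.append([[]])       # start a new part with an empty outer ring
        elif k == HOLE_BREAK:
            parts[-1].append([])     # start a new hole in the current part
        else:
            parts[-1][-1].append(k)  # node index belongs to the current ring
    return parts


def signed_area(ring, xs, ys):
    """Shoelace formula; > 0 means anticlockwise (x to the right, y up).

    Assumes last_node_equals_first, so consecutive pairs close the ring.
    """
    total = 0.0
    for a, b in zip(ring, ring[1:]):
        total += xs[a] * ys[b] - xs[b] * ys[a]
    return total / 2.0


parts = split_parts(geometry_slice(coordinate_index, stop, 0))
for p, rings in enumerate(parts):
    outer, holes = rings[0], rings[1:]
    order = "anticlockwise" if signed_area(outer, x, y) > 0 else "clockwise"
    print(f"part {p}: outer ring {outer} is {order}, {len(holes)} hole(s)")
```

For the netCDF-4 VLEN variable, each element of coordinate_index already holds one geometry's indices, so only split_parts and signed_area apply. The lookup in geometry_slice is a constant-time index, which is the random-access benefit of a stop variable. With a count variable, the same start offset needs a running sum, which counts_to_stops precomputes.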