Re: [CF-metadata] CF-metadata Digest, Vol 166, Issue 5

2017-02-06 Thread Jim Biard
It may be worth noting that the IEEE-754 representation was decided by 
Unidata and netCDF. If it had been left up to CF, I wonder what would 
have happened? ;-)



On 2/3/17 2:41 PM, Bob Simons - NOAA Federal wrote:

...
And it seems odd to reject existing standards that have been so 
painstakingly hammered out, in favor of starting the process all over 
again.  We follow existing standards for other things (e.g., IEEE-754 
for representing floating point numbers in binary files), why can't we 
follow an existing Simple Features standard?

...


--
Jim Biard
Research Scholar
Cooperative Institute for Climate and Satellites NC
North Carolina State University
NOAA National Centers for Environmental Information
formerly NOAA’s National Climatic Data Center
151 Patton Ave, Asheville, NC 28801
e: jbi...@cicsnc.org
o: +1 828 271 4900

Connect with us on Facebook for climate and ocean and geophysics information,
and follow us on Twitter at @NOAANCEIclimate and @NOAANCEIocngeo.



___
CF-metadata mailing list
CF-metadata@cgd.ucar.edu
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata


Re: [CF-metadata] CF-metadata Digest, Vol 166, Issue 5

2017-02-04 Thread David Blodgett
Dear Chris, 

Thanks for your thorough treatment of these issues. We went through a similar 
thought process in arriving at our proposal. I’ll answer as briefly as I can.

1) How would you translate between netcdf geometries and, say, GeoJSON?

The thinking is that node coordinate sharing is optional. If the writer wants 
to check or already knows that nodes share coordinates, then it’s possible. 
Otherwise, it doesn’t have to be used. I’ve always felt that this was 
important, but maybe not critical for a core NetCDF-CF data model. Some offline 
conversation has led to an example that does not use it, which may be a good 
alternative; more on that later.

2) Break Values

You really do have to hold your nose on the break values. The issue is that you 
have to store that information somehow, and it is almost worse to create new 
variables to store the multi-part and hole/not-hole information. The 
alternative approach that’s taking shape, as mentioned above, does break the 
information out into additional variables but simplifies things otherwise. In 
that case it doesn’t feel overly complex to me… so stay tuned for more on this 
front.
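
For illustration, a minimal sketch of the break-value idea: sentinel values in
the coordinate arrays stand in for part boundaries, so no extra count or part
variables are needed, at the cost of readers scanning for the sentinel. This is
not the proposal's exact encoding; the sentinel and names are made up.

import numpy as np

BREAK = -999.9  # hypothetical sentinel marking a part boundary

# Coordinates of a two-part geometry, with the sentinel separating the parts.
x = np.array([0.0, 10.0, 10.0, 0.0, BREAK, 20.0, 25.0, 22.0])
y = np.array([0.0,  0.0,  5.0, 5.0, BREAK,  0.0,  0.0,  3.0])

# A reader recovers the parts by splitting at the sentinel and dropping it.
idx = np.where(x == BREAK)[0]
parts_x = [seg[seg != BREAK] for seg in np.split(x, idx)]
parts_y = [seg[seg != BREAK] for seg in np.split(y, idx)]
# parts_x -> [array([ 0., 10., 10.,  0.]), array([20., 25., 22.])]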

3) Ragged Indexing

Your thought process follows ours exactly. The key is that you either have to 
create the “pointer” array as a first order of business or loop over the counts 
ad nauseam. I’m actually leaning toward the counts for two reasons. First, the 
counts approach is already in CF so is a natural fit and will be familiar to 
developers in this space. Second, the issue of 0 vs 1 indexing is annoying. In 
our proposal, we settled on 0 indexing because it aligns with the idea of an 
offset, but it is still annoying and some applications would always have to 
adjust that pointer array as a first order of business. 
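
A minimal sketch of the trade-off, assuming a CF/DSG-style count variable and
numpy (names are illustrative): the counts are what get stored, and the 0-based
"pointer" array is what a reader derives from them.

import numpy as np

# Stored form (a): a count of nodes per geometry, CF contiguous-ragged style.
node_count = np.array([4, 3, 5])
node_x = np.arange(node_count.sum(), dtype=float)  # stand-in coordinate data

# Derived form (b): 0-based offsets to the start of each geometry, i.e. the
# pointer array many applications build as a first order of business.
start = np.concatenate(([0], np.cumsum(node_count)[:-1]))   # [0, 4, 7]

# Pulling out geometry i is then a single slice.
i = 1
geom_x = node_x[start[i]:start[i] + node_count[i]]          # [4., 5., 6.]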

On to Bob’s comments.

Regarding aligning with other data models / encodings, I guess this needs to be 
unpacked a bit. 

1) In this setting, simple features is a data model, not an encoding. An 
encoding can implement part or all of a data model as is needed by the use 
case(s) at hand. There is no problem with partial implementations; you still get 
interoperability for the intended use cases.
2) Attempting to align with other encoding standards (UGRID and NetCDF-CF are 
the primary ones here) is simply to keep the implementation patterns similar 
and familiar. This may be a fool's errand, but it is presumably good for 
adoptability and consistency. 
So, I don’t see a problem with implementing important simple features types in 
a way that aligns with the way the existing community standards work.

I don’t see this as ignoring existing standards at all. There is no open 
community standard for binary encoding of geometries and related data that 
passes the CF requirements of human readability and self-description. We are 
adopting the appropriate data model and suggesting a new encoding that will 
solve a lot of problems in the environmental modeling space. 

As we’ve discussed before, your "different approach" sounds great, but seems 
like an exercise for a future effort that doesn’t attempt to align with CF 1.7. 
Maybe what you suggest is a path forward for variable length arrays in the CF 
2.0 “vision in the mist”, but I don’t see it as a tenable solution for CF 1.*.

Best Regards,

- Dave


> On Feb 3, 2017, at 3:31 PM, Chris Barker  wrote:
> 
> a few thoughts. First, I think there are three core "issues" that need to be 
> resolved:
> 
> 1) Coordinate indexing (indirection)
> 
> the question of whether you have an array of "vertices" that the geometry 
> types index into to get their data:
> 
> Advantages:
>  - if a number of geometries share a lot of vertices, it can be more efficient
>  - the relationship between geometries that share vertices (i.e. polygons 
> that share a boundary) etc. is well defined. You don't need to check for 
> closeness, and maybe have a tolerance, etc.
> 
> These were absolutely critical for UGRID for example -- a UGRID mesh is a 
> "single thing", NOT a collection of polygons that happen to share some 
> vertices.
> 
> Disadvantages:
>  -  if the geometries do not share many vertices, it is less efficient.
>  -  there are additional code complications in "getting" the vertices of the 
> given geometry
>  - it does not match the OGC data model.
> 
> My 0.02 -- given my use cases, I tend to want the advantages -- but I don't 
> know that that's a typical use case. And I think it's a really good idea to 
> keep with the OGC data model where possible -- i.e. be able to translate from 
> netcdf to, say, GeoJSON as losslessly as possible. Given that, I think it's 
> probably a better idea not to have the indirection.
> 
> However (to equivocate) perhaps the types of information people are likely to 
> want to store in netcdf are a subset of what the OGC standards are designed 
> for -- and for those use-cases, maybe shared vertices are critical.
> 
> One way to think about it -- how would you 

Re: [CF-metadata] CF-metadata Digest, Vol 166, Issue 5

2017-02-03 Thread Chris Barker
a few thoughts. First, I think there are three core "issues" that need to
be resolved:

1) Coordinate indexing (indirection)

the question of whether you have an array of "vertices" that the geometry
types index into to get their data:

Advantages:
 - if a number of geometries share a lot of vertices, it can be more
efficient
 - the relationship between geometries that share vertices (i.e. polygons
that share a boundary) etc. is well defined. You don't need to check for
closeness, and maybe have a tolerance, etc.

These were absolutely critical for UGRID for example -- a UGRID mesh is a
"single thing", NOT a collection of polygons that happen to share some
vertices.

Disadvantages:
 -  if the geometries do not share many vertices, it is less efficient.
 -  there are additional code complications in "getting" the vertices of
the given geometry
 - it does not match the OGC data model.
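
A rough sketch of what the indirection looks like in practice, loosely in the
spirit of UGRID's node/connectivity split; the variable names are illustrative,
not from any proposal.

import numpy as np

# One shared table of node coordinates...
node_x = np.array([0.0, 1.0, 1.0, 0.0, 2.0])
node_y = np.array([0.0, 0.0, 1.0, 1.0, 0.5])

# ...and geometries that store indices into it: two triangles sharing an edge.
face_nodes = np.array([[0, 1, 2],
                       [1, 4, 2]])

# Resolving a geometry's coordinates is an extra gather step...
tri0_x, tri0_y = node_x[face_nodes[0]], node_y[face_nodes[0]]

# ...but shared topology is explicit: no coordinate comparison or tolerance is
# needed to see that the two faces share nodes 1 and 2.
shared = np.intersect1d(face_nodes[0], face_nodes[1])        # [1, 2]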

My 0.02 -- given my use cases, I tend to want the advantages -- but I don't
know that that's a typical use case. And I think it's a really good idea to
keep with the OGC data model where possible -- i.e. be able to translate
from netcdf to, say, GeoJSON as losslessly as possible. Given that, I think
it's probably a better idea not to have the indirection.

However (to equivocate) perhaps the types of information people are likely
to want to store in netcdf are a subset of what the OGC standards are
designed for -- and for those use-cases, maybe shared vertices are critical.

One way to think about it -- how would you translate between netcdf
geometries and, say, GeoJSON:
  - nc => geojson would lose the shared index info.
  - geojson => nc -- would you try to reconstruct the shared vertices?? I'm
thinking that would be a bit dangerous in the general case, because you are
adding information that you don't know is true -- are these a shared vertex
or two that just happen to be at the same location?
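
A sketch of why the geojson => nc direction is risky: the only rule available
is "identical coordinates imply a shared vertex", which silently asserts
topology the GeoJSON never stated. The data and names here are made up.

import numpy as np

# Flattened coordinates of two triangles that happen to touch along an edge.
coords = np.array([
    [0.0, 0.0], [1.0, 0.0], [1.0, 1.0],   # polygon A
    [1.0, 0.0], [2.0, 0.0], [1.0, 1.0],   # polygon B
])

# Deduplicate by exact value: a unique vertex table plus per-polygon indices.
nodes, index = np.unique(coords, axis=0, return_inverse=True)
# nodes has 4 rows; index.reshape(2, 3) gives each polygon's node indices.
# Whether the polygons truly share vertices, or merely coincide, is
# information the original GeoJSON never carried.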

> > Break values

I don't really like break values as an approach, but with netcdf any option
will be ugly one way or another. So keeping with the WKT approach makes
sense to me. Either way you'll need custom code to unpack it. (BTW -- what
does WellKnownBinary do?)
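
On the WKB aside: well-known binary sidesteps break values by carrying explicit
counts. A polygon record is a byte-order flag, a geometry type, the number of
rings, and then each ring's point count followed by its coordinates. A quick
check, assuming shapely is available:

import struct
from shapely.geometry import Polygon

poly = Polygon([(0, 0), (4, 0), (4, 4), (0, 0)],
               holes=[[(1, 1), (2, 1), (2, 2), (1, 1)]])
wkb = poly.wkb

byte_order, geom_type, n_rings = struct.unpack_from("<BII", wkb, 0)
assert byte_order == 1                   # NDR / little-endian, GEOS's default
n_points_outer = struct.unpack_from("<I", wkb, 9)[0]
# geom_type == 3 (Polygon), n_rings == 2, n_points_outer == 4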

> > Ragged indexing

There are two "natural" ways to represent a ragged array:

(a) store the length of each "row"
(b) store the index to the beginning (or end) of each "row"

CF already uses (a). However, working with it, I'm pretty convinced that
it's the "wrong" choice:

If you want to know how long a given row is, that is really easy with (a),
and almost as easy with (b) (it involves two indexes and a subtraction).

However, if you want to extract a particular row, (b) makes this really
easy -- you simply access the slice of the array you want. With (a) you
need to loop through the entire "length_of_rows" array (up to the row of
interest) and add up the values to find the slice you need. Not a huge
issue, but it is an issue. In fact, in my code to read ragged arrays in
netcdf, the first thing I do is pre-compute the index to each row, so I can
then use that to access individual rows later -- if you are
accessing via OpenDAP, that's particularly helpful.
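
That pre-computation amounts to something like the following sketch (made-up
names, not a reference to any existing library):

import numpy as np

class RaggedReader:
    """Build row offsets once from a count variable, then slice rows directly."""

    def __init__(self, row_length, data):
        self.data = np.asarray(data)
        # offsets[i] is the first flat index of row i; offsets[i + 1] is one
        # past its last element.
        self.offsets = np.concatenate(([0], np.cumsum(row_length)))

    def row(self, i):
        return self.data[self.offsets[i]:self.offsets[i + 1]]

reader = RaggedReader([3, 5, 2], np.arange(10))
reader.row(1)   # array([3, 4, 5, 6, 7])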

So -- (b) is clearly (to me) the "best" way to do it -- but is it worth
introducing a second way to handle ragged arrays in CF? I would think yes,
but that would be offset if:

 - There is a bunch of existing library code that transparently handles
ragged arrays in netcdf (does netcdfJava have something? I'm pretty sure
Python doesn't -- certainly not in netCDF4)

 - That such existing library code would be advantageous to leverage for code
reading features: I suspect that there will have to be enough custom code
that the ragged array bits are going to be the least of it.

So I'm for the "new" way of representing ragged arrays

-CHB


On Fri, Feb 3, 2017 at 11:41 AM, Bob Simons - NOAA Federal <
bob.sim...@noaa.gov> wrote:

> Then, isn't this proposal just the first step in the creation of a new
> model and a new encoding of Simple Features, one that is "align[ed] ...
> with as many other encoding standards in this space as is practical"? In
> other words, yet another standard for Simple Features?
>
> If so, it seems risky to me to take just the first (easy?) step "to
> support the use cases that have a compelling need today" and not solve the
> entire problem. I know the CF way is to just solve real, current needs, but
> in this case it seems to risk a head slap moment in the future when we
> realize that, in order to deal with some new simple feature variant, we
> should have done things differently from the beginning?
>
> And it seems odd to reject existing standards that have been so
> painstakingly hammered out, in favor of starting the process all over
> again.  We follow existing standards for other things (e.g., IEEE-754 for
> representing floating point numbers in binary files), why can't we follow
> an existing Simple Features standard?
>
> ---
> Rather than just be a naysayer, let me suggest a very 

Re: [CF-metadata] CF-metadata Digest, Vol 166, Issue 5

2017-02-03 Thread Bob Simons - NOAA Federal
Then, isn't this proposal just the first step in the creation of a new
model and a new encoding of Simple Features, one that is "align[ed] ...
with as many other encoding standards in this space as is practical"? In
other words, yet another standard for Simple Features?

If so, it seems risky to me to take just the first (easy?) step "to support
the use cases that have a compelling need today" and not solve the entire
problem. I know the CF way is to just solve real, current needs, but in
this case it seems to risk a head slap moment in the future when we realize
that, in order to deal with some new simple feature variant, we should have
done things differently from the beginning?

And it seems odd to reject existing standards that have been so
painstakingly hammered out, in favor of starting the process all over
again.  We follow existing standards for other things (e.g., IEEE-754 for
representing floating point numbers in binary files), why can't we follow
an existing Simple Features standard?

---
Rather than just be a naysayer, let me suggest a very different alternative:

There are several projects in the CF realm (e.g., this Simple Features
project, Discrete Sampling Geometry (DSG), true variable-length Strings,
ugrid(?)) which share a common underlying problem: how to deal with
variable-length multidimensional arrays: a[b][c], where the length of the c
dimension may be different for different b indices.
DSG solved this (5 different ways!), but only for DSG.
The Simple Features proposal seeks to solve the problem for Simple Features.
We still have no support for Unicode variable-length Strings.

Instead of continuing to solve the variable-length problem a different way
every time we confront it, shouldn't we solve it once, with one small
addition to the standard, and then use that solution repeatedly?
The solution could be a simple variant of one of the DSG solutions, but
generalized so that it could be used in different situations.
An encoding standard and built-in support for variable-length data arrays
in netcdf-java/c would solve a lot of problems, now and in the future.
Some work on this is already done: I think the netcdf-java API already
supports variable-length arrays when reading netcdf-4 files.
For Simple Features, the problem would reduce to: store the feature (using
some specified existing standard like WKT or WKB) in a variable-length
array.
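
For what it's worth, the netCDF-4 data model (and the netCDF4-python bindings)
already exposes variable-length types, so a sketch of what that could look like
is below; the file and variable names are purely illustrative.

import numpy as np
from netCDF4 import Dataset

with Dataset("features.nc", "w", format="NETCDF4") as ds:
    ds.createDimension("instance", 2)

    # Variable-length numeric array per instance: a[b][c] with a ragged c.
    vl_double = ds.createVLType(np.float64, "vl_double")
    lon = ds.createVariable("node_lon", vl_double, ("instance",))
    lon[0] = np.array([0.0, 1.0, 1.0, 0.0])
    lon[1] = np.array([5.0, 6.0, 5.5])

    # Variable-length string per instance, e.g. holding a WKT geometry.
    wkt = ds.createVariable("geometry_wkt", str, ("instance",))
    wkt[0] = "POLYGON ((0 0, 1 0, 1 1, 0 0))"
    wkt[1] = "LINESTRING (5 0, 6 1, 5.5 2)"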





On Fri, Feb 3, 2017 at 9:07 AM,  wrote:

> Date: Fri, 3 Feb 2017 11:07:00 -0600
> From: David Blodgett 
> To: Bob Simons - NOAA Federal 
> Cc: CF Metadata 
> Subject: Re: [CF-metadata] Extension of Discrete Sampling Geometries
> for Simple Features
> Message-ID: <8ee85e65-2815-4720-90fc-13c72d3c7...@usgs.gov>
> Content-Type: text/plain; charset="utf-8"
>
> Dear Bob,
>
> I’ll just take these in line.
>
> 1) noted. We have been trying to figure out what to do with the point
> featureType and I think leaving it more or less alone is a viable path
> forward.
>
> 2) This is not an exact replica of WKT, but rather a similar approach to
> WKT. As I stated, we have followed the ISO simple features data model and
> well known text feature types in concept, but have not used the same
> standardization formalisms. We aren’t advocating for supporting “all of”
> any standard but are rather attempting to support the use cases that have a
> compelling need today while aligning this with as many other encoding
> standards in this space as is practical. Hopefully that answers your
> question; sorry if it’s vague.
>
> 3) The google doc linked in my response contains the encoding we are
> proposing as a starting point for conversation: http://goo.gl/Kq9ASq.
> I want to stress, as a starting point for discussion. I expect that this
> proposal will change drastically before we’re done.
>
> 4) We absolutely envision tools doing what you say: converting to/from standard
> spatial formats and NetCDF-CF geometries. We intend to introduce an R and a
> Python implementation that does exactly that, along with whatever form this
> standard takes in the end. R and Python were chosen because the team that
> brought this together is familiar with those two languages; additional
> implementations would be more than welcome.
>
> 5) We do include a ‘geometry’ featureType similar to the ‘point’
> featureType. Thus our difficulty with what to do with the ‘point’
> featureType. You are correct, there are lots of non timeSeries applications
> to be solved and this proposal does intend to support them (within the
> existing DSG constructs).
>
> Thanks for your questions, hopefully my answers close some gaps for you.
>
> - Dave
>
> > On Feb 3, 2017, at 10:47 AM, Bob Simons - NOAA Federal <
> bob.sim...@noaa.gov> wrote:
> >
> > 1) There is a vague comment in the proposal about possibly changing the
> point featureType. Please don't, unless the changes don't