> UTF-8 is only an encoding, so we should just say "unicode" for strings.
We could do that if and only if netCDF itself were clear about how Unicode is
encoded in files. It is for variable names, though I am not so sure it is
anywhere else.
But even so, once the encoding has been specified,
I agree and would go one small step further: UTF-8 is only an encoding, so we
should just say "unicode" for strings. If we need to restrict that, say to
disallow a leading underscore or to reserve a separator character such as
space in attributes right now, we should do so at the character
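
To illustrate the distinction between characters and encodings, here is a minimal Python sketch. The function `is_valid_name` and its two rules (no leading underscore, no space) are hypothetical and only mirror the examples mentioned above; they are not taken from CF or the NUG. The point is that such a rule can be checked on the characters alone, while the byte representation depends on which encoding is chosen.

```python
def is_valid_name(name: str) -> bool:
    """Hypothetical character-level rule: non-empty, no leading
    underscore, no space. Not an actual CF or NUG rule."""
    if not name:
        return False
    if name.startswith("_"):
        return False
    if " " in name:
        return False
    return True


# One Unicode string (U+00E9 is a precomposed e with acute accent).
name = "temp\u00e9rature"

# Two different encodings of the same characters.
utf8_bytes = name.encode("utf-8")
utf16_bytes = name.encode("utf-16-le")

# The bytes differ, but both decode to the same characters.
assert utf8_bytes != utf16_bytes
assert utf8_bytes.decode("utf-8") == utf16_bytes.decode("utf-16-le") == name

# The rule is applied to the characters, never to the bytes.
print(is_valid_name(name))
print(is_valid_name("_" + name))
```

The check never looks at bytes, so it gives the same answer however the string ends up stored on disk.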
@DocOtak @zklaus Sorry if I've pulled the discussion off track. The question of
exactly why NUG worded things the way they did is intriguing, but I think Klaus
is right that we shouldn't get wrapped around that particular axle in this
issue — particularly if we are going to split encoding off
I think there is some confusion here.
First, the regex discussion concerns only the physical byte layout of the
netCDF classic file format. In principle, I would suggest focusing entirely on
netCDF-4 files instead.
Second, I think CF should not concern itself with encodings and byte order