Re: [CF-metadata] CF upgrade to netCDF variable names

2014-01-15 Thread jbiard
 

Hi. 

I don't think we should use ease of mapping variable names to
a programming language as a reason for allowing (or not allowing) any
particular character in variable names. CF has, as I understood it,
considered variable names as completely up to the producer, relying on
attributes to provide meaning. So, I can name a temperature variable
fluffy_bunny if I want to, and it is completely valid. 

Section 1.3 of the Conventions states, "No variable or dimension names
are standardized by this convention."

Section 2.3 states: 

Variable, dimension and attribute names should begin with a letter and
be composed of letters, digits, and underscores. Note that this is in
conformance with the COARDS conventions, but is more restrictive than
the netCDF interface which allows use of the hyphen character. The
netCDF interface also allows leading underscores in names, but the NUG
states that this is reserved for system use.

Case is significant in netCDF names, but it is recommended that names
should not be distinguished purely by case, i.e., if case is
disregarded, no two names should be the same. It is also recommended
that names should be obviously meaningful, if possible, as this renders
the file more effectively self-describing.

This convention does not standardize any variable or dimension names.

While the Conventions makes recommendations about variable names, NO
STANDARDS are set by the Conventions.
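
For anyone who wants to test names against the Section 2.3 wording quoted
above, the recommendation can be written as a small check. This is only a
sketch: the function names are made up for illustration, and reading
"letters" as ASCII letters is an assumption, since the quoted text does not
say.

```python
import re

# Section 2.3 wording: begin with a letter, then letters, digits, underscores.
# Assumption: "letter" means ASCII A-Z / a-z; the quoted text does not specify.
CF_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def follows_cf_recommendation(name):
    """Return True if the whole name matches the Section 2.3 pattern."""
    return CF_NAME.fullmatch(name) is not None


def case_collisions(names):
    """Return groups of names that are identical once case is disregarded."""
    groups = {}
    for name in names:
        groups.setdefault(name.lower(), []).append(name)
    return [group for group in groups.values() if len(group) > 1]


for candidate in ["fluffy_bunny", "fluffy-bunny", "_private", "2m_temp"]:
    print(candidate, follows_cf_recommendation(candidate))

print(case_collisions(["Temp", "temp", "salinity"]))
```

Of the four candidates only fluffy_bunny passes, and "Temp" and "temp" are
reported as a pair that differs purely by case. Whether failing this check
makes a file non-compliant is exactly the open question here.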

So, why were non-alphanumeric characters
other than '_' excluded by practice back in the day? Are these reasons
still valid? In fact, given the statements in the Conventions, is there
actually anything other than opinion constraining people from using any
characters they like in variable (and dimension) names (as long as they
are OK with netCDF and maybe NUG)? 

Grace and peace, 

Jim 

On 2014-01-14 12:08, Chris Barker wrote:

 There is another reason:

 mapping CF variable names directly to programming language variable
 names is pretty handy -- so it's nice if those are legal.

 I'm sure not all programming languages have the same restrictions on
 names, but there is surely a subset that's pretty common (i.e. none of
 the usual math characters).
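
 To make the "legal in a programming language" idea concrete, here is a
 minimal sketch using Python as the example language (an assumption; other
 languages have different rules). The helper name and the sample names are
 hypothetical. Note that Python is more permissive than the CF wording in
 some respects: it accepts leading underscores and non-ASCII letters.

```python
import keyword
from types import SimpleNamespace


def usable_as_python_name(name):
    """True if the string can be used directly as a Python identifier."""
    return name.isidentifier() and not keyword.iskeyword(name)


# Hypothetical variable names, as they might appear in a file.
names = ["fluffy_bunny", "fluffy-bunny", "mx.depth", "lambda"]
legal = [n for n in names if usable_as_python_name(n)]

# Legal names can be exposed as attributes, e.g. data.fluffy_bunny
data = SimpleNamespace(**{n: None for n in legal})
print(legal)
```

 Names containing "-" or "." have to be reached through string lookups
 instead, which is the inconvenience being described.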
 -Chris 
 
 On Mon, Jan 13, 2014 at 12:57 PM, Steve Hankin steven.c.han...@noaa.gov
 wrote:
 
 Hi John,
 

Philosophically I am aligned with Bryan: the purpose of the CF standard
is to constrain (simplify and make predictable) the use of a highly
general file creation toolkit like netCDF. The question of limitations
placed on name strings should be evaluated against this yardstick.
 

There is a class of problems that are created by embedding special
syntax characters willy-nilly into name strings. Namely, that the use of
such characters can render mathematical expressions ambiguous. Here's a
simple example. Suppose a file contains 3 surface marine variables --
let's say atmospheric CO2, ocean CO2 and an artfully computed delta
across the surface. Further say that the file creator chooses to name
the delta variable using a "-", as in

 atmosCO2
 waterCO2
 and
 atmosCO2-waterCO2

Then the meaning of the mathematical expression "atmosCO2-waterCO2" has
been rendered ambiguous. Is it a single variable name, or the difference
of two? One is forced to use arbitrary tricks that are alien to the
scientific users we are trying to serve -- say disambiguating the
expression by insisting on surrounding quotes, 'atmosCO2-waterCO2', or
white space, atmosCO2 - waterCO2. (Would any scientist read
"atmosCO2 - waterCO2" and "atmosCO2-waterCO2" to have distinct meanings?)
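
The ambiguity is easy to reproduce in any language whose expression syntax
treats "-" as subtraction. The sketch below uses Python's own parser purely
as a stand-in for an analysis tool's expression parser (an assumption; no
particular tool's grammar is implied), and the helper name describe is made
up for illustration.

```python
import ast


def describe(expression):
    """Report how Python's parser reads a bare expression string."""
    node = ast.parse(expression, mode="eval").body
    if isinstance(node, ast.Name):
        return "variable " + node.id
    if (
        isinstance(node, ast.BinOp)
        and isinstance(node.op, ast.Sub)
        and isinstance(node.left, ast.Name)
        and isinstance(node.right, ast.Name)
    ):
        return "difference " + node.left.id + " minus " + node.right.id
    return "other"


for text in ["atmosCO2-waterCO2", "atmosCO2 - waterCO2", "atmosCO2_minus_waterCO2"]:
    print(repr(text), "->", describe(text))
```

With or without white space, the hyphenated form is read as a subtraction
of two names; only the underscore form is read as a single variable. A
parser like this can never reach a variable literally named
atmosCO2-waterCO2 without some quoting rule.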
 
As you say we have already headed down this (slippery) slope. Characters
like "+", "-", "." and case-sensitivity have leaked through into fairly
common practice. For better or worse. :-( (Should the publishers of
science textbooks start using case-sensitive variable names?) So the
question that you've posed is in a sense, _now that the horse is out of
the barn, is there any merit to keeping the other animals penned?_ Like
Bryan, I would argue that the way to answer this is to insist that at
least there be significant gains from letting them out.
 
Another unintended negative consequence: the impact on free text
searches when our variable names include special syntax characters. Are
our metadata procedures on an arc so promising that we have no need to
rely on general Google-style tools for discovery?
 
 - Steve
 

 
 On 1/13/2014 12:12 PM, John Graybeal wrote:

 Not sure I am following you -- constraints are presumably there for a
 reason, I wasn't sure what the reason was for these particular
 constraints, but thought they might have simply echoed earlier netCDF
 constraints.

 To your 'use case' question, we were thinking about alternatives to
 mx_ as prefix for our own attributes, to minimize the chance of
 collisions (e.g., with some maintenance variables someone might name
 mx_).

 john
 
 On Jan 13, 2014, at 11:27, Bryan Lawrence bryan.lawre...@ncas.ac.uk
 wrote:
 
 

Re: [CF-metadata] CF upgrade to netCDF variable names

2014-01-15 Thread Jim Biard
Chris,

The point is, the Conventions themselves state that there is no standard.  
People are all the time trying to add meaning to variable names, but the 
standard actually states that the meaning is to reside in the attributes.  The 
variable names are just keys for differentiating the variables.  (I could name 
all my variables “vNN”, where N is a digit, and I would be completely 
valid according to the standard.)  The long_name and standard_name attributes 
are the places where descriptors of the variable content are to be found.
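
The idea that names are only keys, with meaning carried by attributes, can
be sketched as a lookup that never inspects the name. The dictionary below
is a hypothetical in-memory stand-in for a file's variables and their
attributes (not any library's API), and the variable names are invented.

```python
# Hypothetical stand-in for a file's contents: variable name -> attributes.
# A real reader would pull these from the netCDF file; plain dicts keep the
# sketch self-contained.
variables = {
    "fluffy_bunny": {"standard_name": "air_temperature", "units": "K"},
    "v01": {"standard_name": "sea_water_temperature", "units": "K"},
    "v02": {"long_name": "battery voltage", "units": "V"},
}


def find_by_standard_name(variables, standard_name):
    """Return the names of variables whose standard_name attribute matches."""
    return [
        name
        for name, attrs in variables.items()
        if attrs.get("standard_name") == standard_name
    ]


print(find_by_standard_name(variables, "air_temperature"))
```

The lookup finds fluffy_bunny from its standard_name alone, so the
name could contain any characters without changing the result.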

So I’m raising a question.  Is there actually anything other than sentiment 
(i.e., an actual rule) that anyone can point to that prevents someone from 
using “new” characters in their variable names?

Grace and peace,

Jim

Jim Biard
Research Scholar
Cooperative Institute for Climate and Satellites NC
North Carolina State University
NOAA's National Climatic Data Center
151 Patton Ave, Asheville, NC 28801
e: jbi...@cicsnc.org
o: +1 828 271 4900




On Jan 15, 2014, at 12:00 PM, Chris Barker chris.bar...@noaa.gov wrote:

 On Wed, Jan 15, 2014 at 7:39 AM, jbiard jbi...@mail.cicsnc.org wrote:
 I don't think we should use ease of mapping variable names to a programming 
 language as a reason for allowing (or not allowing) any particular character 
 in variable names. 
 
 Why not? maybe not a compelling reason, but I can't imagine a compelling 
 reason to have more flexible naming conventions, either.
 CF has, as I understood it, considered variable names as completely up to the 
 producer, relying on attributes to provide meaning.  So, I can name a 
 temperature variable fluffy_bunny if I want to, and it is completely valid.
 
 valid yes, a good idea? probably not.
 
 Section 1.3 of the Conventions states, No variable or dimension names are 
 standardized by this convention. 
 
 so there are no standard variable names -- that's not the same as standards 
 for variable names
 
 Personally, I wish there were standards for variable names, it would make it
 easier to code against -- but that cat's out of the bag. But this cat isn't:
 the restrictions have been there for a long time, so the question now is:

 what are the reasons for easing those restrictions?

 and

 what are the reasons for keeping those restrictions?

 we've given a few reasons for keeping them (maybe not all that compelling
 to you, but reasons nonetheless) -- what are the reasons for relaxing them,
 other than "I like this naming convention that is currently not allowed"?
 
 I'm not convinced that fluffy-bunny is any more readable or anything else 
 than fluffy_bunny
 
 -Chris
 
 
 -- 
 
 Christopher Barker, Ph.D.
 Oceanographer
 
 Emergency Response Division
 NOAA/NOS/ORR             (206) 526-6959   voice
 7600 Sand Point Way NE   (206) 526-6329   fax
 Seattle, WA  98115       (206) 526-6317   main reception
 
 chris.bar...@noaa.gov
 ___
 CF-metadata mailing list
 CF-metadata@cgd.ucar.edu
 http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata



Re: [CF-metadata] CF upgrade to netCDF variable names

2014-01-15 Thread Steve Hankin


On 1/15/2014 9:24 AM, Jim Biard wrote:

Chris,

The point is, the Conventions themselves state that there is *no 
standard*.  People are all the time trying to add meaning to variable 
names, but the standard actually states that the meaning is to reside 
in the attributes.  The variable names are just keys for 
differentiating the variables.  (I could name all my variables 
vNN, where N is a digit, and I would be completely valid 
according to the standard.)  The long_name and standard_name 
attributes are the places where descriptors of the variable content 
are to be found.


So I'm raising a question. _ Is there actually anything other than 
sentiment (i.e., an actual rule) that anyone can point to that 
prevents someone from using new characters in their variable names?_


How about the lines from the CF document that you cut-pasted (thank you):

   /Variable, dimension and attribute names should begin with a letter
   and be composed of letters, digits, and underscores. Note that this
   is in conformance with the COARDS conventions, but is more
   restrictive than the netCDF interface which allows use of the hyphen
   character. The netCDF interface also allows leading underscores in
   names, but the NUG states that this is reserved for system use./

- Steve




Re: [CF-metadata] CF upgrade to netCDF variable names

2014-01-15 Thread Karl Taylor

All,

Yes, that statement seems quite definitive and unambiguous, and for the 
reasons stated in other emails, I support retaining it.


regards,
Karl

On 1/15/14 9:37 AM, Steve Hankin wrote:


On 1/15/2014 9:24 AM, Jim Biard wrote:

Chris,

The point is, the Conventions themselves state that there is *no 
standard*.  People are all the time trying to add meaning to variable 
names, but the standard actually states that the meaning is to reside 
in the attributes.  The variable names are just keys for 
differentiating the variables.  (I could name all my variables 
vNN, where N is a digit, and I would be completely valid 
according to the standard.)  The long_name and standard_name 
attributes are the places where descriptors of the variable content 
are to be found.


So I'm raising a question. _ Is there actually anything other than 
sentiment (i.e., an actual rule) that anyone can point to that 
prevents someone from using new characters in their variable names?_


How about the lines from the CF document that you cut-pasted (thank you):

/Variable, dimension and attribute names should begin with a
letter and be composed of letters, digits, and underscores. Note
that this is in conformance with the COARDS conventions, but is
more restrictive than the netCDF interface which allows use of the
hyphen character. The netCDF interface also allows leading
underscores in names, but the NUG states that this is reserved for
system use./

- Steve




Re: [CF-metadata] CF upgrade to netCDF variable names

2014-01-15 Thread Ethan Davis
Hi all,

The use of "should" may, by many, be interpreted as a recommendation
rather than as a requirement.

Though the terms "must", "should", and "may" are used throughout the CF
spec, I am not finding any text that defines those terms.

Perhaps a reference to the IETF RFC 2119 [1] (which defines these and a
few other related terms) should be added to the CF spec. Though it seems
that might require a fairly full review of the uses in CF of the terms
defined in RFC 2119.

Ethan

[1] http://www.ietf.org/rfc/rfc2119.txt

On 1/15/2014 10:46 AM, Karl Taylor wrote:
 All,
 
 Yes, that statement seems quite definitive and unambiguous, and for the
 reasons stated in other emails, I support retaining it.
 
 regards,
 Karl
 

Re: [CF-metadata] CF upgrade to netCDF variable names

2014-01-15 Thread John Graybeal
 Perhaps a reference to the IETF RFC 2119 [1] (which defines these and a few 
 other related terms) should be added to the CF spec

Yes please, since discussion on this thread has already varied in its 
understanding/application of those terms.

The ambiguity in the sentence "No variable or dimension names are standardized
by this convention." is also relevant. It could mean "This convention defines
no requirements about variable or dimension names." or "This convention does
not specify any particular variable or dimension names." The former meaning
obviously reinforces the interpretation that 'should' is not a requirement.

While the arguments pushing for the restrictive naming convention ("_" as the
only special character) are perhaps not strong, for my own use I don't have a
compelling use case on the need for more characters either. Mostly this is a
matter of personal taste -- I like being able to use "." and "-" to help with
visual parsing and "+" and "@" for semantic reasons, and they help reduce the
number of likely prefix collisions (which a single separator doesn't help with
at all).

There is also a social benefit from relaxing the CF almost-standard: 
on-boarding. We want to encourage netCDF users to transition to CF. Minimizing 
the number of inconsistencies seems practical and forward-thinking. Forcing a 
netCDF user (which may include lots of HDF users too, these days) to abandon 
established attribute names is a significant cost for the affected users, now 
and going forward.

John



Re: [CF-metadata] CF upgrade to netCDF variable names

2014-01-15 Thread Mike McCann
I have two use cases:

1. MBARI has data from an underwater vehicle that contains hundreds of
   engineering variables that are automatically logged using the onboard
   software's names for the variables. Those variables include the '.'
   character. We tried to use our existing NetCDF TDS/Hyrax
   infrastructure to handle these data but ran into several frustrating
   inconsistencies in how various packages handled the '.'.
   Unfortunately, we are not using the infrastructure for these data.

2. The ESIP Federation documentation group discussed creating a flattened
   object serialization convention for hierarchical metadata and wanted
   to use '.' as a delimiter but needed to abandon that consideration to
   stay CF compliant.
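
For illustration, a flattened serialization of the kind described in the
second use case might look like the sketch below. Everything here is an
assumption made for the example: the function name, the sample metadata,
and the double underscore used as a CF-friendly separator. It is not the
convention the ESIP group discussed.

```python
def flatten(tree, sep="__", prefix=""):
    """Flatten nested dicts into one level, joining keys with sep."""
    flat = {}
    for key, value in tree.items():
        name = prefix + sep + key if prefix else key
        if isinstance(value, dict):
            # Recurse, carrying the joined name down as the new prefix.
            flat.update(flatten(value, sep, name))
        else:
            flat[name] = value
    return flat


# Hypothetical hierarchical metadata.
metadata = {"platform": {"name": "AUV", "sensor": {"id": 7}}}

print(flatten(metadata))           # underscore-only names
print(flatten(metadata, sep="."))  # the '.' form that was abandoned
```

The underscore form stays within the letters, digits and underscores
wording, but it becomes ambiguous if a key already contains the separator,
which is part of why '.' is attractive for this purpose.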
-Mike 

--
Mike McCann
Software Engineer
Monterey Bay Aquarium Research Institute
7700 Sandholdt Road
Moss Landing, CA 95039-9644
Voice: 831.775.1769  Fax: 831.775.1736 http://www.mbari.org


Re: [CF-metadata] CF upgrade to netCDF variable names

2014-01-15 Thread Steve Hankin


On 1/15/2014 10:28 AM, John Graybeal wrote:

Perhaps a reference to the IETF RFC 2119 [1] (which defines these and a few 
other related terms) should be added to the CF spec

Yes please, since discussion on this thread has already varied in its 
understanding/application of those terms.

The ambiguity in the sentence "No variable or dimension names are standardized by this convention."
is also relevant. It could mean "This convention defines no requirements about variable or dimension
names." or "This convention does not specify any particular variable or dimension names." The
former meaning obviously reinforces the interpretation that 'should' is not a requirement.


It feels like we are veering towards hair-splitting, no?  CF contains a
clear (if overly polite) statement about the proper way to create a
variable name: /Variable, dimension and attribute names should begin
with a letter and be composed of letters, digits, and underscores/.
Clarification of the word "should" would be useful, yes, but the
discussion would be highly unlikely to end up changing foundational
compliance guidelines that have been in CF since the COARDS days.
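
For anyone who wants to test names against the sentence quoted above, here is a minimal Python sketch. It assumes "letters" means the ASCII letters only, which the CF text does not actually say, and the helper name follows_cf_naming is made up for illustration.

```python
import re

# Assumption: "letters" is read as ASCII A-Z / a-z; the CF text does not say so.
CF_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def follows_cf_naming(name: str) -> bool:
    """True if name begins with a letter and contains only letters, digits, '_'."""
    return CF_NAME.fullmatch(name) is not None


# Illustrative names only; none of them come from a real file.
for name in ["fluffy_bunny", "v01", "atmosCO2-waterCO2", "_hidden", "2m_temp"]:
    print(f"{name!r}: {follows_cf_naming(name)}")
```

Whether a failed check is an error or only a warning depends on how one reads "should", which is exactly the open question in this thread.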


Since it follows the preceding sentence,  /This convention does not 
standardize any variable or dimension names./ also seems quite clear.   
The loophole that is implied here -- that CF does not standardize 
variable and dimension names, but other groups may do so -- has been 
usefully exploited by groups like OceanSites, who have chosen to 
standardize their own names and naming patterns sitting atop CF as a 
normative standard.


While the arguments pushing for the restrictive naming convention ("_" as the
only special character) are perhaps not strong, for my own use I don't have a
compelling use case on the need for more characters either. Mostly this is a
matter of personal taste -- I like being able to use "." and "-" to help with
visual parsing and "+" and "@" for semantic reasons, and they help reduce the
number of likely prefix collisions (which a single separator doesn't help with
at all).

Agree.  There are factors sitting in the balance pans on both the pro
and con side.  Special syntax names allow one to create very concise 
names with (we hope) self-evident meanings.  When you are the person 
engaged in the act of defining a new file, this is especially 
attractive.  But over the lifecycle of the data -- considering data 
discovery and data usage in a wide range of contexts -- the special 
syntax characters come back to bite you time and again.


Mike's example of an embedded dot is an interesting one because it 
cuts both ways.  Yes, there are times when creating CF files where it 
seems convenient to embed "." into a name in order to preserve a
hierarchy from the software of origin.  But there will then be 
downstream situations that we make a muddle of when those applications 
want to use the same approach to designate a different hierarchy.  For 
example, downstream applications that want to refer to 
varname.attributename are forced into ugly hacks like 
var.name.with.dots.attributename.  (Admittedly, this Pandora's box has 
already been opened.  We are already forced to contend with this today.)
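
To make the muddle concrete, here is a small Python sketch. It assumes a downstream tool that writes references as variable.attribute and that both parts may themselves contain dots; the function names candidate_splits and resolve are hypothetical.

```python
def candidate_splits(reference: str):
    """Return every (variable, attribute) reading of a dotted reference."""
    parts = reference.split(".")
    return [
        (".".join(parts[:i]), ".".join(parts[i:]))
        for i in range(1, len(parts))
    ]


def resolve(reference: str, known_variables):
    """Keep only the readings whose variable part names a known variable."""
    return [
        (var, attr)
        for var, attr in candidate_splits(reference)
        if var in known_variables
    ]


ref = "var.name.with.dots.attributename"
print(candidate_splits(ref))                          # four possible readings
print(resolve(ref, {"var.name.with.dots"}))           # one reading survives
print(resolve(ref, {"var", "var.name.with.dots"}))    # two readings survive
```

With dot-free variable names the split is always unique; once dots are allowed, the tool has to consult the file's variable list, and even that may not settle it.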


A point I feel we ought to remind ourselves of, is that in an issue like 
the naming of variables we should try to put ourselves into the head 
space of the users of the data -- scientists.  Funky looking camel-case 
strings are bread and butter to software developers, but not so much to 
the sensibilities of scientists (particularly older ones).


There is also a social benefit from relaxing the CF almost-standard: 
on-boarding. We want to encourage netCDF users to transition to CF. Minimizing 
the number of inconsistencies seems practical and forward-thinking. Forcing a 
netCDF user (which may include lots of HDF users too, these days) to abandon 
established attribute names is a significant cost for the affected users, now 
and going forward.

I agree that this is a valid consideration.  There is gray surrounding
this issue.


- Steve


John


On Jan 15, 2014, at 10:00, Ethan Davis eda...@unidata.ucar.edu wrote:


Hi all,

The use of "should" may, by many, be interpreted as a recommendation
rather than as a requirement.

Though the terms "must", "should", and "may" are used throughout the CF
spec, I am not finding any text that defines those terms.

Perhaps a reference to the IETF RFC 2119 [1] (which defines these and a
few other related terms) should be added to the CF spec. Though it seems
that might require a fairly full review of the uses in CF of the terms
defined in RFC 2119.

Ethan

[1] http://www.ietf.org/rfc/rfc2119.txt

On 1/15/2014 10:46 AM, Karl Taylor wrote:

All,

Yes, that statement seems quite definitive and unambiguous, and for the
reasons stated in other emails, I support retaining it.

regards,
Karl


Re: [CF-metadata] CF upgrade to netCDF variable names

2014-01-15 Thread Chris Barker
On Wed, Jan 15, 2014 at 9:24 AM, Jim Biard jbi...@cicsnc.org wrote:

 The point is, the Conventions themselves state that there is *no standard*.
  People are all the time trying to add meaning to variable names, but the
 standard actually states that the meaning is to reside in the attributes.


but we aren't talking about assigning meaning, or telling anyone what names
they can use.

There is an existing rule about what characters can be used for variable
names, that's it -- and we've given a couple of not-all-that-compelling
reasons why that rule is good, and no reason other than maybe taste, why
that rule would be extended.

(and it certainly shouldn't be removed completely -- variable names with
arbitrary bytes in them would really be a mess). Is it ASCII-only now? It
probably should stay that way.

Perhaps there are some reasons to want less-restrictive variable names --
I'm not always that imaginative, but if so, then present them.

 The variable names are just keys for differentiating the variables.  (I
 could name all my variables “vNN”, where N is a digit, and I would
 be completely valid according to the standard.)


yup, but you couldn't name them: vNNN-NNN -- and why do you need to?

Given your point about the real meaning being encoded in the attributes,
then a prime reason to choose a given variable name is that it matches a
name you are using elsewhere in your process -- which is why I like them
being restricted to names that are valid variable names in programming
languages. But it also may be a reason to be more flexible -- if you call
something "this+that" elsewhere in your process, you may want to use it in
your netCDF files, too.

-Chris



-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR             (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

chris.bar...@noaa.gov
___
CF-metadata mailing list
CF-metadata@cgd.ucar.edu
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata


Re: [CF-metadata] CF upgrade to netCDF variable names

2014-01-15 Thread John Graybeal
And I wasn't going to say anything else, but this crystallized an issue or two 
from past mails. I promise to (try to) let it go after this. 

On Jan 15, 2014, at 12:37, Chris Barker chris.bar...@noaa.gov wrote:

 There is an existing rule about what characters can be used for variable
 names, that's it -- and we've given a couple not-all-that compelling reasons 
 why that rule is good, and no reason other than maybe taste, why that rule 
 would be extended.

I don't think multiple use cases from different individuals and communities
should be categorized as "no reason other than maybe taste."  Just sayin'...

 (and it certainly shouldn't be removed completely -- variable names with 
 arbitrary bytes in them would really be a mess). Is it ascii-only now? it 
 probably should stay that way.

This prompts me to observe that somehow, in this brave new age of computer 
programming, people are developing netCDF software that supports Unicode 
characters -- Unicode!! -- in variable (attribute etc) names. There will be 
netCDF files in the wild, used by scientists and normal people (especially 
normal people from non-English-speaking countries) that use all sorts of wild 
and crazy characters in their variable names. (Perhaps CF thinks these are 
alphanumeric, in which case I've found a solution! The standard certainly is 
not explicitly ASCII-only.)  By the way, I was amazed to learn that using 
Unicode in programming languages is starting to take hold.

At some point, we in the CF-supporting community are going to have to support 
the standard practices in this aspect that are going on everywhere else in the 
software world, or decide we want a permanent back-water for the 'scientists 
who are not interested in or capable of supporting these practices' (not my 
claim).

 Perhaps there are some reasons to want less-restrictive variable names -- I'm 
 not always that imaginative, but if so, then present them.

Let's just make the list so far, to get everyone up to speed with the 
discussion:
* easier visual parsing (taste, yes, but practical also if you work with lots 
of data sets from different communities)
* embedding semantic meaning (taste)
* clearly isolating the context (namespace, hierarchy)
* matching attribute names that come from the source data
* consistency with netCDF usage/files - easier onboarding of those files
* Unicode/internationalization support

John




Re: [CF-metadata] CF upgrade to netCDF variable names

2014-01-15 Thread Russ Rew
John Graybeal wrote:
 This prompts me to observe that somehow, in this brave new age of computer
 programming, people are developing netCDF software that supports Unicode
 characters -- Unicode!! -- in variable (attribute etc) names. There will be
 netCDF files in the wild, used by scientists and normal people (especially
 normal people from non-English-speaking countries) that use all sorts of wild
 and crazy characters in their variable names. (Perhaps CF thinks these are
 alphanumeric, in which case I've found a solution! The standard certainly is
 not explicitly ASCII-only.)  By the way, I was amazed to learn that using
 Unicode in programming languages is starting to take hold.

Yes, since June 2008 we have supported use of Unicode characters in
names in both netCDF-3 and netCDF-4 software.  The intent was to make
netCDF more suitable for international use, rather than to encode
mathematical operations in variable names.  But we were also responding
to needs of some user communities, for example atmospheric chemists who
wanted to be able to use standard notations for chemical species in
variable names.

Here's a small nonsensical example of ncdump output for a file
containing Unicode names:

  
http://www.unidata.ucar.edu/netcdf/workshops/most-recent/utilities/Unicode.html

The precise rules for netCDF names are in the format documentation, but
the short version is:

  ... The first character of a name must be alphanumeric, a multi-byte
  UTF-8 character, or '_' (reserved for special names with meaning to
  implementations, such as the “_FillValue” attribute). Subsequent
  characters may also include printing special characters, except for
  '/' which is not allowed in names. Names that have trailing space
  characters are also not permitted.

That document also warns:

  Note that by using special characters in names, you may make your data
  not compliant with conventions that have more stringent requirements
  on valid names for netCDF components, for example the CF Conventions.
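
As a rough illustration of the short version quoted above, here is a Python sketch. It encodes only that summary, not the full rules in the format documentation, and it approximates "multi-byte UTF-8 character" as any code point above U+007F. The function name looks_like_netcdf_name is an assumption for this example.

```python
def looks_like_netcdf_name(name: str) -> bool:
    """Rough check against the quoted summary; NOT the complete netCDF rules."""
    if not name or name.endswith(" "):
        return False  # empty names and trailing spaces are rejected

    def is_multibyte(ch: str) -> bool:
        # Assumption: any non-ASCII code point needs more than one UTF-8 byte.
        return ord(ch) > 0x7F

    first = name[0]
    # First character: alphanumeric, a multi-byte character, or '_'.
    if not (first == "_" or is_multibyte(first) or first.isalnum()):
        return False

    # Later characters: also printable ASCII specials, except '/'.
    for ch in name[1:]:
        if is_multibyte(ch):
            continue
        if not (0x20 <= ord(ch) <= 0x7E) or ch == "/":
            return False
    return True


for name in ["this+that", "atmosCO2-waterCO2", "a/b", "+x", "trailing "]:
    print(f"{name!r}: {looks_like_netcdf_name(name)}")
```

Names such as "this+that" pass this check but would fail the stricter CF wording, which is the gap the warning above describes.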

 At some point, we in the CF-supporting community are going to have to support
 the standard practices in this aspect that are going on everywhere else in the
 software world, or decide we want a permanent back-water for the 'scientists
 who are not interested in or capable of supporting these practices' (not my
 claim).

 Perhaps there are some reasons to want less-restrictive variable names -- I'm
 not always that imaginative, but if so, then present them.

 Let's just make the list so far, to get everyone up to speed with the
 discussion:
 * easier visual parsing (taste, yes, but practical also if you work with lots
 of data sets from different communities)
 * embedding semantic meaning (taste)
 * clearly isolating the context (namespace, hierarchy)
 * matching attribute names that come from the source data
 * consistency with netCDF usage/files - easier onboarding of those files
 * Unicode/internationalization support

--Russ


Re: [CF-metadata] CF upgrade to netCDF variable names

2014-01-15 Thread Chris Barker
Let the bike shedding continue!

On Wed, Jan 15, 2014 at 1:14 PM, John Graybeal jbgrayb...@mindspring.com wrote:

 I don't think multiple use cases from different individuals and
 communities should be categorized as "no reason other than maybe taste."
 Just sayin'...

Multiple use-cases are examples, not reasons -- "I'd like to do that" or
"I've been doing that" doesn't give a why... though you do below, thanks.


(and it certainly shouldn't be removed completely -- variable names with
arbitrary bytes in them would really be a mess). Is it ascii-only now? it
probably should stay that way.


This prompts me to observe that somehow, in this brave new age of computer
 programming, people are developing netCDF software that supports Unicode
 characters -- Unicode!! -- in variable (attribute etc) names.


I'm a fan of Unicode, actually, but despite it being around a long time
now, it's still a pain in the *%^ in C, C++, and, I'm guessing, Fortran.
Not so bad in more modern languages, though apparently some use UTF-16 and
don't always handle the larger code points correctly. So still a pain.
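
A short Python sketch of the UTF-16 issue, for the curious: code points outside the Basic Multilingual Plane take two 16-bit code units, so software that counts or indexes by code unit can split a character in half. The helper name utf16_code_units is made up for this example.

```python
def utf16_code_units(text: str) -> int:
    """Number of 16-bit code units needed to store text as UTF-16."""
    return len(text.encode("utf-16-le")) // 2


bmp_theta = "\u03b8"        # Greek small theta, inside the BMP
math_theta = "\U0001d703"   # a Mathematical Alphanumeric Symbols code point, outside the BMP

for ch in (bmp_theta, math_theta):
    print(f"U+{ord(ch):04X}: {len(ch)} code point, "
          f"{utf16_code_units(ch)} UTF-16 code unit(s)")
```

Python strings are indexed by code point, so the mismatch only shows up here after encoding; languages whose native strings are UTF-16 expose it directly.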

And as you can tell, I'm a fan of restricting names to particular classes
of characters, and Unicode includes a lot of concepts that are pretty hard
to define: e.g. "alphanumeric". I can see how it would be really nice for
non-English speakers or math and science geeks to use all sorts of great
variable names, but I'm afraid opening up fully might be more of a nightmare
than it is worth.

My pet programming language, Python, currently allows Unicode variable
names, with restrictions, but this is a heck of a list to keep track of!

http://www.dcl.hpi.uni-potsdam.de/home/loewis/table-3131.html
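
In practice nobody has to consult the table by hand: Python 3 exposes its identifier rules through str.isidentifier() and the keyword module. A minimal sketch follows; the wrapper usable_as_python_name and the sample names are only illustrative.

```python
import keyword


def usable_as_python_name(name: str) -> bool:
    """True if name is a legal Python identifier and not a reserved word."""
    return name.isidentifier() and not keyword.iskeyword(name)


samples = ["sea_surface_temp", "\u03b8_surface", "this+that",
           "vNNN-NNN", "2m_temp", "lambda"]
for name in samples:
    print(f"{name!r}: {usable_as_python_name(name)}")
```

Note that a Greek-letter name passes while a hyphenated ASCII name does not, so "Unicode or not" and "operator characters or not" really are separate questions.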


 There will be netCDF files in the wild, used by scientists and normal
 people (especially normal people from non-English-speaking countries) that
 use all sorts of wild and crazy characters in their variable names.
 (Perhaps CF thinks these are alphanumeric, in which case I've found a
 solution! The standard certainly is not explicitly ASCII-only.)  By the
 way, I was amazed to learn that using Unicode in programming languages is
 starting to take hold.


but still only starting

At some point, we in the CF-supporting community are going to have to
 support the standard practices in this aspect that are going on everywhere
 else in the software world, or decide we want a permanent back-water for
 the 'scientists who are not interested in or capable of supporting these
 practices' (not my claim).


I think Unicode is a red herring for this issue -- not that it isn't
interesting, and for sure full Unicode options would allow nice expressive
variable names, but I'd still rather have variable names that don't look
like math expressions, and that are legal names in programming languages.

The current CF document says
Variable, dimension and attribute names should begin with a letter and be
composed of letters, digits, and underscores.

but "letters" is not very well defined when you get outside of ASCII -- it
seems we have work to do.
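
To show how much hangs on the definition, here is a Python sketch comparing two possible readings of "letter": ASCII A-Z/a-z versus any code point whose Unicode general category starts with "L". Both readings are assumptions; CF does not pick one. The function names are hypothetical.

```python
import unicodedata


def is_ascii_letter(ch: str) -> bool:
    """Reading 1: only the 52 ASCII letters count."""
    return ("a" <= ch <= "z") or ("A" <= ch <= "Z")


def is_unicode_letter(ch: str) -> bool:
    """Reading 2: any code point in a Unicode 'L*' general category."""
    return unicodedata.category(ch).startswith("L")


# 'T', e-acute, a CJK ideograph, superscript two, underscore
for ch in ["T", "\u00e9", "\u6e29", "\u00b2", "_"]:
    print(f"U+{ord(ch):04X} category={unicodedata.category(ch)} "
          f"ascii={is_ascii_letter(ch)} unicode={is_unicode_letter(ch)}")
```

Under the second reading an accented or CJK name is made of letters, while a superscript two (as one might want in a chemical formula) is neither a letter nor an ASCII digit.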




 Perhaps there are some reasons to want less-restrictive variable names --
 I'm not always that imaginative, but if so, then present them.


 Let's just make the list so far, to get everyone up to speed with the
 discussion:
 * easier visual parsing (taste, yes, but practical also if you work with
 lots of data sets from different communities)
 * embedding semantic meaning (taste)
 * clearly isolating the context (namespace, hierarchy)


I'm having trouble seeing how adding math symbols, etc will help these --
they can be done pretty well with underscores...


 * matching attribute names that come from the source data
 * consistency with netCDF usage/files - easier onboarding of those files


mixed bag here -- CF is intended to be more restricted than netCDF


* Unicode/internationalization support

orthogonal question, I think, unless there's a language that uses "+" as a
letter.

I think we've only heard from me and Steve saying we didn't like this
proposal -- don't take our word for it!


-Chris





Re: [CF-metadata] CF upgrade to netCDF variable names

2014-01-14 Thread Chris Barker
There is another reason:

mapping CF variable names directly to programming language variable names
is pretty handy -- so it's nice if those are legal.

I'm sure not all programming languages have the same restrictions on names,
but there is surely a subset that's pretty common (i.e. none of the usual
math characters).
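
Here is a small Python sketch of what "pretty handy" means in practice. It assumes a reader that exposes each variable as an attribute of a namespace object; the variable names sst and air-temp and the helper as_namespace are made up for illustration.

```python
from types import SimpleNamespace


def as_namespace(variables: dict) -> SimpleNamespace:
    """Expose each variable as an attribute; setattr accepts any string name."""
    ns = SimpleNamespace()
    for name, value in variables.items():
        setattr(ns, name, value)
    return ns


data = as_namespace({"sst": [14.2, 14.5], "air-temp": [11.0, 11.3]})

print(data.sst)                    # legal identifier: plain attribute access
print(getattr(data, "air-temp"))   # hyphenated name: only reachable via getattr()
# Writing data.air-temp would be parsed as (data.air) - temp and fail.
```

The hyphenated name is still stored and retrievable, but it loses the convenient dotted syntax.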

-Chris



On Mon, Jan 13, 2014 at 12:57 PM, Steve Hankin steven.c.han...@noaa.gov wrote:

  Hi John,

 Philosophically I am aligned with Bryan:  the purpose of the CF standard
 is to constrain (simplify and make predictable) the use of a highly general
 file creation toolkit like netCDF.   The question of limitations placed on
 name strings should be evaluated on this yard stick.

 There is a class of problems that are created by embedding special syntax
 characters willy-nilly into name strings.  Namely, that the use of such
 characters can render mathematical expressions ambiguous.  Here's a simple
 example.  Suppose a file contains 3 surface marine variables -- let's say
 atmospheric CO2, ocean CO2 and an artfully computed delta across the
 surface.  Further say that the file creator chooses to name the delta
 variable using a "-", as in
 atmosCO2
 waterCO2
 and
 atmosCO2-waterCO2

 Then the meaning of the mathematical expression "atmosCO2-waterCO2" has
 been rendered ambiguous.  Is it a single variable name, or the difference
 of two?   One is forced to use arbitrary tricks that are alien to the
 scientific users we are trying to serve -- say disambiguating the
 expression by insisting on surrounding quotes, "atmosCO2-waterCO2", or
 white space, "atmosCO2 - waterCO2".  (Would any scientist read "atmosCO2
 - waterCO2" and "atmosCO2-waterCO2" to have distinct meanings?)

 As you say we have already headed down this (slippery) slope.  Characters
 like "+", "-", "." and case-sensitivity have leaked through into fairly
 common practice.  For better or worse.  :-(   (Should the publishers of
 science textbooks start using case-sensitive variable names?)  So the
 question that you've posed is in a sense, *now that the horse is out of
 the barn, is there any merit to keeping the other animals penned?*   Like
 Bryan, I would argue that the way to answer this is to insist that at least
 there be significant gains from letting them out.

 Another unintended negative consequence:  the impact on free text searches
 when our variable names include special syntax characters.   Are our
 metadata procedures on an arc so promising that we have no need to rely on
 general Google-style tools for discovery?

   - Steve

 =


 On 1/13/2014 12:12 PM, John Graybeal wrote:

 Not sure I am following you -- constraints are presumably there for a
 reason, I wasn't sure what the reason was for these particular constraints,
 but thought they might have simply echoed earlier netCDF constraints.

  To your 'use case' question, we were thinking about alternatives to mx_
 as prefix for our own attributes, to minimize the chance of collisions
 (e.g., with some maintenance variables someone might name mx_).

  john

  On Jan 13, 2014, at 11:27, Bryan Lawrence bryan.lawre...@ncas.ac.uk
 wrote:

  Hi John

  In the spirit of CF being *constrained* netCDF, it seems that we
 wouldn't, unless we had a specific use case ... do you?

 Cheers
 Bryan


 On 13 January 2014 18:54, john.grayb...@marinexplore.com wrote:

 As netCDF is growing to allow @, +, hyphen, and period in
 variable/dimension/attribute names, is there any likelihood CF will grow to
 allow some or all of those characters?

 I seem to recall some tools have conflicts with some of those characters
 (aside from them being non-conformant). But consistency and flexibility
 would be nice.

 john
 
 John Graybeal
 Sr. Data Manager, Metadata & Semantics

 M +1 408 675-5445
 skype: graybealski
 Marinexplore
 920 Stewart Drive
 Sunnyvale 94085
 California, USA
 www.marinexplore.com http://marinexplore.com


 --
 Scanned by iCritical.




  --

  Bryan Lawrence
 University of Reading: Professor of Weather and Climate Computing.
 National Centre for Atmospheric Science: Director of Models and Data.
 STFC: Director of the Centre for Environmental Data Archival.
 Ph: +44 118 3786507 or 1235 445012; Web:home.badc.rl.ac.uk/lawrence








Re: [CF-metadata] CF upgrade to netCDF variable names

2014-01-13 Thread John Graybeal
Not sure I am following you -- constraints are presumably there for a reason, I 
wasn't sure what the reason was for these particular constraints, but thought 
they might have simply echoed earlier netCDF constraints.

To your 'use case' question, we were thinking about alternatives to mx_ as 
prefix for our own attributes, to minimize the chance of collisions (e.g., with 
some maintenance variables someone might name mx_).

john

On Jan 13, 2014, at 11:27, Bryan Lawrence bryan.lawre...@ncas.ac.uk wrote:

 Hi John
 
 In the spirit of CF being *constrained* netCDF, it seems that we wouldn't, 
 unless we had a specific use case ... do you?
 
 Cheers
 Bryan
 
 
 On 13 January 2014 18:54, john.grayb...@marinexplore.com wrote:
 As netCDF is growing to allow @, +, hyphen, and period in 
 variable/dimension/attribute names, is there any likelihood CF will grow to 
 allow some or all of those characters?
 
 I seem to recall some tools have conflicts with some of those characters 
 (aside from them being non-conformant). But consistency and flexibility would 
 be nice.
 
 john
 




Re: [CF-metadata] CF upgrade to netCDF variable names

2014-01-13 Thread Steve Hankin

Hi John,

Philosophically I am aligned with Bryan:  the purpose of the CF standard 
is to constrain (simplify and make predictable) the use of a highly 
general file creation toolkit like netCDF.   The question of limitations 
placed on name strings should be evaluated on this yard stick.


There is a class of problems that are created by embedding special
syntax characters willy-nilly into name strings.  Namely, that the use
of such characters can render mathematical expressions ambiguous.
Here's a simple example.  Suppose a file contains 3 surface marine
variables -- let's say atmospheric CO2, ocean CO2 and an artfully
computed delta across the surface.  Further say that the file creator
chooses to name the delta variable using a "-", as in

atmosCO2
waterCO2
and
atmosCO2-waterCO2

Then the meaning of the mathematical expression "atmosCO2-waterCO2" has
been rendered ambiguous.  Is it a single variable name, or the
difference of two?   One is forced to use arbitrary tricks that are
alien to the scientific users we are trying to serve -- say
disambiguating the expression by insisting on surrounding quotes,
"atmosCO2-waterCO2", or white space, "atmosCO2 - waterCO2".  (Would any
scientist read "atmosCO2 - waterCO2" and "atmosCO2-waterCO2" to have
distinct meanings?)
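
The ambiguity can be sketched in a few lines of Python. The sketch assumes a very simple expression reader that ignores white space (as a scientist reading the expression would) and knows only the "-" operator; the function name interpretations and the name delta_CO2 are hypothetical.

```python
def interpretations(expression: str, variables: set):
    """List the ways an expression can be read, given a file's variable names."""
    compact = expression.replace(" ", "")  # assumption: white space carries no meaning
    readings = []
    if compact in variables:
        readings.append(("variable", compact))
    for i, ch in enumerate(compact):
        if ch == "-":
            left, right = compact[:i], compact[i + 1:]
            if left in variables and right in variables:
                readings.append(("difference", left, right))
    return readings


with_hyphen = {"atmosCO2", "waterCO2", "atmosCO2-waterCO2"}
without_hyphen = {"atmosCO2", "waterCO2", "delta_CO2"}

print(interpretations("atmosCO2-waterCO2", with_hyphen))     # two readings
print(interpretations("atmosCO2 - waterCO2", without_hyphen))  # one reading
```

When the delta variable is named with an underscore instead, the expression has exactly one reading and no quoting tricks are needed.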


As you say we have already headed down this (slippery) slope. Characters
like "+", "-", "." and case-sensitivity have leaked through into fairly
common practice.  For better or worse. :-(   (Should the publishers of
science textbooks start using case-sensitive variable names?)  So the
question that you've posed is in a sense, /now that the horse is out of
the barn, is there any merit to keeping the other animals penned?/
Like Bryan, I would argue that the way to answer this is to insist that
at least there be significant gains from letting them out.


Another unintended negative consequence:  the impact on free text 
searches when our variable names include special syntax characters.   
Are our metadata procedures on an arc so promising that we have no need 
to rely on general Google-style tools for discovery?


  - Steve

=

On 1/13/2014 12:12 PM, John Graybeal wrote:
Not sure I am following you -- constraints are presumably there for a 
reason, I wasn't sure what the reason was for these particular 
constraints, but thought they might have simply echoed earlier netCDF 
constraints.


To your 'use case' question, we were thinking about alternatives to 
mx_ as prefix for our own attributes, to minimize the chance of 
collisions (e.g., with some maintenance variables someone might name mx_).


john

On Jan 13, 2014, at 11:27, Bryan Lawrence bryan.lawre...@ncas.ac.uk wrote:



Hi John

In the spirit of CF being *constrained* netCDF, it seems that we 
wouldn't, unless we had a specific use case ... do you?


Cheers
Bryan


On 13 January 2014 18:54, john.grayb...@marinexplore.com wrote:


As netCDF is growing to allow @, +, hyphen, and period in
variable/dimension/attribute names, is there any likelihood CF
will grow to allow some or all of those characters?

I seem to recall some tools have conflicts with some of those
characters (aside from them being non-conformant). But
consistency and flexibility would be nice.

john



