[CF-metadata] Metadata/standard names for variables involving thresholds (e.g. climate indices)

2016-10-17 Thread martin.juckes
Dear Lars,

See comments inline below:

Martin
---
Dear all,

Before the summer I asked in separate emails a few questions related to 
standard names. Based on the responses I have worked on this a bit more and 
made some good progress, but there still are some open issues that I am trying 
to wrap my head around. Most of the issues are related to making use of 
existing standard names involving thresholds to describe well established 
climate indices (aka indices of climate extremes). To give an overview, I 
collect the issues here (with references to the previous email threads), but 
if necessary we can fork them again:

1. Canonical units 
(http://mailman.cgd.ucar.edu/pipermail/cf-metadata/2016/058800.html, 3rd point):
"number_of_days_" variables have dimension "1" (dimensionless) and 
"spell_length_of_days_..." variables have dimension "day". This seems somewhat 
inconsistent, as both standard names specify "_days_". Since both standard 
names contain the unit, it may from a formal point of view be more appropriate 
for both to be dimensionless. From a practical point of view, however, e.g. when 
designing software, it would be (much) more useful to have the unit ("day") in 
the metadata rather than having to parse it from the standard name for these 
particular variables, when this is otherwise mostly unnecessary.

It does appear problematic to have units which will be confusing for the 
majority of users who are likely to be interested in these variables. Perhaps 
we could sidestep the issue by adding new standard names of the form 
"time_with_air_temperature_above_threshold" etc. Changing the units of the 
existing standard names would be problematic for existing data files which are 
valid with the current definitions.

2. Climatological time variable 
(http://mailman.cgd.ucar.edu/pipermail/cf-metadata/2016/058800.html, 2nd point):
It is not clear to me why "number_of_days_..." and "spell_length_of_days_..." 
variables must have a climatological time variable. I would suggest that it 
makes sense to have an ordinary time coordinate variable that allows bounds to 
be specified. For example, assume that we have a time-series of daily 
near-surface temperature fields from a CMIP5 model ("tas") and want for each 
month to calculate the number_of_days_with_air_temperature_above_threshold. 
I.e. this is not much different from calculating the monthly mean temperature, 
for which an ordinary time coordinate variable with bounds is used. The same 
line of argument applies to the "spell_length_of_days_..." variables and the 
two "integral_of_air_temperature_..." variables.

The climatological time axis specifies what kind of "air_temperature" is being 
compared against the threshold (e.g. daily mean, daily max).
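As a minimal illustration of why this matters, the Python sketch below counts 
the days above one threshold twice from the same hypothetical 6-hourly data: 
once using the daily maximum and once using the daily mean. The two counts 
differ, and that difference is what the climatological time axis would have to 
record (e.g. via a cell_methods entry such as "time: maximum within days"). The 
data, threshold value and function names are invented for illustration.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical 6-hourly near-surface air temperatures (K) for two days.
samples = [
    (datetime(2016, 7, 1, 0), 290.0),
    (datetime(2016, 7, 1, 6), 294.0),
    (datetime(2016, 7, 1, 12), 302.0),
    (datetime(2016, 7, 1, 18), 296.0),
    (datetime(2016, 7, 2, 0), 288.0),
    (datetime(2016, 7, 2, 6), 291.0),
    (datetime(2016, 7, 2, 12), 297.0),
    (datetime(2016, 7, 2, 18), 292.0),
]

def daily_stat(samples, stat):
    """Reduce sub-daily samples to one value per calendar day using `stat`."""
    by_day = defaultdict(list)
    for t, value in samples:
        by_day[t.date()].append(value)
    return {day: stat(values) for day, values in sorted(by_day.items())}

def days_above(daily, threshold):
    """Count days whose daily statistic is strictly above the threshold."""
    return sum(1 for v in daily.values() if v > threshold)

THRESHOLD = 298.15  # 25 degC, an illustrative value (assumption)
n_max = days_above(daily_stat(samples, max), THRESHOLD)    # daily max compared
n_mean = days_above(daily_stat(samples, mean), THRESHOLD)  # daily mean compared
print(n_max, n_mean)
```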

3. Threshold must have a coordinate variable
(http://mailman.cgd.ucar.edu/pipermail/cf-metadata/2016/058824.html):
This means that the threshold must be either a constant or several alternative 
constants. As I understand it, it is neither possible to have a two-dimensional 
threshold that varies over the domain, e.g. based on a climatological 
(quantile) value for each gridpoint, nor to have a three-dimensional threshold, 
e.g. a seasonally varying climatological threshold for each gridpoint. I can 
envisage two ways to allow 2-/3-dim thresholds: either 1) allow ancillary 
variables as thresholds, where the ancillary variable would have a coordinate 
variable providing the underlying constant(s), e.g. quantile levels; or 2) 
allow the threshold to have a different unit from the main variable, which 
would enable the underlying quantiles to be used directly as the threshold 
variable, with the actual 2-/3-dim climatological thresholds available as an 
ancillary variable.

You could try the following, which, I believe, allows a separate threshold at 
each (lat, lon) point:
    variables:
      float myClimateIndex(index) ;
        myClimateIndex:coordinates = "lat lon threshold" ;
      float lat(index) ;
        lat:standard_name = "latitude" ;
      float lon(index) ;
        lon:standard_name = "longitude" ;
      float threshold(index) ;
        threshold:standard_name = "air_temperature" ;
        threshold:coordinates = "lat lon" ;
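A minimal Python sketch of how data laid out this way would be used: each 
element of the `index` dimension carries its own lat, lon and threshold, so the 
same air temperature is compared against a different threshold at each point. 
The class, function names and sample values are hypothetical and do not 
represent the CF data model or any netCDF library API; "above" is taken to mean 
strictly greater (an assumption).

```python
from dataclasses import dataclass
from typing import List

@dataclass
class IndexPoint:
    """One element of the `index` dimension: position plus its own threshold."""
    lat: float
    lon: float
    threshold: float  # K; plays the role of the threshold(index) variable

def count_exceedances(points: List[IndexPoint], daily_tas: List[List[float]]) -> List[int]:
    """For each index point, count days with tas above that point's threshold.

    daily_tas[d][i] is the daily temperature (K) on day d at index point i.
    """
    counts = [0] * len(points)
    for day in daily_tas:
        for i, (point, value) in enumerate(zip(points, day)):
            if value > point.threshold:
                counts[i] += 1
    return counts

# Hypothetical points with different thresholds and three days of data.
points = [
    IndexPoint(lat=58.6, lon=16.2, threshold=293.0),
    IndexPoint(lat=40.4, lon=-3.7, threshold=304.0),
]
daily_tas = [
    [294.0, 301.0],
    [292.0, 305.0],
    [296.0, 304.0],
]
print(count_exceedances(points, daily_tas))
```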

4. Standard name of the threshold variable 
(http://mailman.cgd.ucar.edu/pipermail/cf-metadata/2016/058856.html):
The standard name air_temperature_threshold is only used in the definition of 
integral_of_air_temperature_deficit_wrt_time and 
integral_of_air_temperature_excess_wrt_time. For other threshold-based standard 
names involving air temperature, wind and lwe thickness of precipitation 
(listed below) the explanation reads "... must have a ... variable with the a 
standard name of X to supply the threshold(s)". I.e. the threshold variable 
must have the same standard name as the 'main variable' to which the threshold 
is applied.
The latter seems more in line with the view Jonathan expressed in a previous 
email exchange (lower part of 
http://mailman.cgd.ucar.edu/pipermail/cf-metadata/2014/057420.html).
Based on this, my suggestion is

[CF-metadata] Metadata/standard names for variables involving thresholds (e.g. climate indices)

2016-09-26 Thread Bärring Lars
Dear all,

Before the summer I asked in separate emails a few questions related to 
standard names. Based on the responses I have worked on this a bit more and 
made some good progress, but there still are some open issues that I am trying 
to wrap my head around. Most of the issues are related to making use of 
existing standard names involving thresholds to describe well established 
climate indices (aka indices of climate extremes). To give an overview, I 
collect the issues here (with references to the previous email threads), but 
if necessary we can fork them again:

1. Canonical units  
(http://mailman.cgd.ucar.edu/pipermail/cf-metadata/2016/058800.html, 3rd point):
"number_of_days_" variables have dimension "1" (dimensionless) and 
"spell_length_of_days_..." variables have dimension "day". This seems somewhat 
inconsistent, as both standard names specify "_days_". Since both standard 
names contain the unit, it may from a formal point of view be more appropriate 
for both to be dimensionless. From a practical point of view, however, e.g. when 
designing software, it would be (much) more useful to have the unit ("day") in 
the metadata rather than having to parse it from the standard name for these 
particular variables, when this is otherwise mostly unnecessary.
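To make the practical point concrete, here is a sketch of the special-casing 
that software has to carry when the canonical units are "1" for 
number_of_days_... but "day" for spell_length_of_days_.... The helper name and 
the prefix list are hypothetical; a real tool would need one such rule for 
every affected family of standard names.

```python
# Hypothetical list of standard-name prefixes whose canonical units ("1")
# do not carry the time unit that the name itself implies.
DIMENSIONLESS_DAY_COUNTS = ("number_of_days_",)

def time_unit_for(standard_name: str, units: str) -> str:
    """Return the time unit a day-based index is actually counted in."""
    if units == "1" and standard_name.startswith(DIMENSIONLESS_DAY_COUNTS):
        # The unit is only recoverable by parsing the standard name.
        return "day"
    # Otherwise the units attribute already says what we need.
    return units

print(time_unit_for("number_of_days_with_air_temperature_above_threshold", "1"))
print(time_unit_for("spell_length_of_days_with_air_temperature_above_threshold", "day"))
```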

2. Climatological time variable  
(http://mailman.cgd.ucar.edu/pipermail/cf-metadata/2016/058800.html, 2nd point):
It is not clear to me why "number_of_days_..." and "spell_length_of_days_..." 
variables must have a climatological time variable. I would suggest that it 
makes sense to have an ordinary time coordinate variable that allows bounds to 
be specified. For example, assume that we have a time-series of daily 
near-surface temperature fields from a CMIP5 model ("tas") and want for each 
month to calculate the number_of_days_with_air_temperature_above_threshold. 
I.e. this is not much different from calculating the monthly mean temperature, 
for which an ordinary time coordinate variable with bounds is used. The same 
line of argument applies to the "spell_length_of_days_..." variables and the 
two "integral_of_air_temperature_..." variables.
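The monthly example can be sketched in Python with ordinary time bounds: each 
month gets the bounds [first day of month, first day of next month), exactly as 
for a monthly mean, and within each month the sketch computes the number of days 
above the threshold, the longest spell, and a degree-day style excess integral. 
Assumptions: the input has one value per day with no gaps, "above" means 
strictly greater, spells are cut at the month bounds, and the excess integral is 
accumulated in K day (multiply by 86400 to express it in K s). Names and data 
are hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple

Bounds = Tuple[date, date]

def month_bounds(d: date) -> Bounds:
    """Ordinary time bounds [start, end) of the calendar month containing d."""
    start = d.replace(day=1)
    end = (start + timedelta(days=32)).replace(day=1)
    return start, end

def monthly_indices(daily_tas: Dict[date, float], threshold: float) -> List[dict]:
    """Per calendar month: days above threshold, longest spell, excess integral.

    Assumes one value per day with no gaps; spells are cut at month bounds.
    """
    months: Dict[Bounds, List[float]] = {}
    for d in sorted(daily_tas):
        months.setdefault(month_bounds(d), []).append(daily_tas[d])

    results = []
    for bounds, values in months.items():
        count = spell = longest = 0
        excess = 0.0  # K day: each daily value stands for one day
        for v in values:
            if v > threshold:  # "above" taken as strictly greater
                count += 1
                spell += 1
                longest = max(longest, spell)
                excess += v - threshold
            else:
                spell = 0
        results.append({
            "time_bnds": bounds,
            "number_of_days": count,       # canonical units "1"
            "spell_length_days": longest,  # canonical units "day"
            "excess_K_day": excess,        # x 86400 for K s
        })
    return results

daily_tas = {
    date(2016, 6, 29): 299.0,
    date(2016, 6, 30): 297.0,
    date(2016, 7, 1): 300.0,
    date(2016, 7, 2): 301.5,
    date(2016, 7, 3): 296.0,
}
for row in monthly_indices(daily_tas, threshold=298.0):
    print(row)
```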

3. Threshold must have a coordinate variable
(http://mailman.cgd.ucar.edu/pipermail/cf-metadata/2016/058824.html): 
This means that the threshold must be either a constant or several alternative 
constants. As I understand it, it is neither possible to have a two-dimensional 
threshold that varies over the domain, e.g. based on a climatological 
(quantile) value for each gridpoint, nor to have a three-dimensional threshold, 
e.g. a seasonally varying climatological threshold for each gridpoint. I can 
envisage two ways to allow 2-/3-dim thresholds: either 1) allow ancillary 
variables as thresholds, where the ancillary variable would have a coordinate 
variable providing the underlying constant(s), e.g. quantile levels; or 2) 
allow the threshold to have a different unit from the main variable, which 
would enable the underlying quantiles to be used directly as the threshold 
variable, with the actual 2-/3-dim climatological thresholds available as an 
ancillary variable.
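To illustrate what a seasonally varying, per-gridpoint climatological threshold 
involves, the sketch below derives one quantile threshold per (calendar month, 
gridpoint) from a reference period. The resulting field is a data array in its 
own right (the ancillary variable in option 1), while the quantile level q is a 
single constant that could live in a coordinate variable. The nearest-rank 
quantile, the calendar-month pooling and all names are assumptions for 
illustration; operational percentile-based indices may use different pooling 
and interpolation choices.

```python
import math
from collections import defaultdict
from datetime import date
from typing import Dict, List, Tuple

def nearest_rank_quantile(values: List[float], q: float) -> float:
    """Nearest-rank quantile of a non-empty list, for 0 < q <= 1."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))  # 1-based rank
    return ordered[rank - 1]

def monthly_quantile_thresholds(
    reference: Dict[date, List[float]], q: float
) -> Dict[Tuple[int, int], float]:
    """One threshold per (calendar month, gridpoint) from a reference period.

    reference[day][i] is the daily temperature (K) at gridpoint i. With the
    gridpoints on a lat/lon grid, the result is the 3-dim threshold case.
    """
    pooled: Dict[Tuple[int, int], List[float]] = defaultdict(list)
    for day, field in reference.items():
        for i, value in enumerate(field):
            pooled[(day.month, i)].append(value)
    return {key: nearest_rank_quantile(vals, q) for key, vals in pooled.items()}

# Hypothetical reference period: two gridpoints, January and July only.
reference = {
    date(2001, 1, 1): [270.0, 280.0],
    date(2001, 1, 2): [272.0, 281.0],
    date(2002, 1, 1): [268.0, 283.0],
    date(2001, 7, 1): [290.0, 300.0],
    date(2001, 7, 2): [293.0, 302.0],
    date(2002, 7, 1): [291.0, 298.0],
}
thresholds = monthly_quantile_thresholds(reference, q=0.9)
print(thresholds)
```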

4. Standard name of the threshold variable  
(http://mailman.cgd.ucar.edu/pipermail/cf-metadata/2016/058856.html): 
The standard name air_temperature_threshold is only used in the definition of 
integral_of_air_temperature_deficit_wrt_time and 
integral_of_air_temperature_excess_wrt_time. For other threshold-based standard 
names involving air temperature, wind and lwe thickness of precipitation 
(listed below) the explanation reads "... must have a ... variable with the a 
standard name of X to supply the threshold(s)". I.e. the threshold variable 
must have the same standard name as the 'main variable' to which the threshold 
is applied.
The latter seems more in line with the view Jonathan expressed in a previous 
email exchange (lower part of 
http://mailman.cgd.ucar.edu/pipermail/cf-metadata/2014/057420.html). 
Based on this, my suggestion is to change the explanation of the two standard 
names integral_of_air_temperature_deficit|excess_wrt_time to read 
"... The air_temperature variable, which is the data variable of the integral, 
should have a scalar coordinate variable or a size-one coordinate variable with 
the standard name of either air_temperature (preferred) or 
air_temperature_threshold to indicate the threshold..."
With this minor change, consistency with other threshold-based variables is 
introduced (and encouraged) while backward compatibility is maintained.
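Read literally, the proposed wording gives software a simple lookup rule. A 
minimal Python sketch, assuming coordinate variables have already been read into 
plain dictionaries (the dictionary layout, the optional "size" key standing in 
for scalar vs size-one coordinates, and the function name are all hypothetical, 
not part of any netCDF library):

```python
from typing import Dict, List, Optional

# Hypothetical representation: each coordinate variable as a dict of
# attributes; "size" is absent for a scalar coordinate.
Coordinate = Dict[str, object]

# Order encodes the proposal: air_temperature is preferred, while the
# existing air_temperature_threshold is still accepted (backward compatibility).
THRESHOLD_NAMES = ("air_temperature", "air_temperature_threshold")

def find_threshold(coords: List[Coordinate]) -> Optional[Coordinate]:
    """Return the scalar or size-one coordinate that supplies the threshold."""
    single_valued = [c for c in coords if c.get("size", 1) == 1]
    for wanted in THRESHOLD_NAMES:
        for c in single_valued:
            if c.get("standard_name") == wanted:
                return c
    return None

legacy_coords = [
    {"name": "time", "standard_name": "time", "size": 12},
    {"name": "tas_thr", "standard_name": "air_temperature_threshold", "value": 273.15},
]
new_coords = legacy_coords + [
    {"name": "threshold", "standard_name": "air_temperature", "size": 1, "value": 273.15},
]
print(find_threshold(legacy_coords)["name"])
print(find_threshold(new_coords)["name"])
```

Files written under the current definition keep resolving, while files using 
the preferred name win when both are present.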


Kind regards,
Lars Bärring

FDr, Forskare
PhD, Research Scientist

SMHI  /  Swedish Meteorological and Hydrological Institute 
Rossby Centre 
SE - 601 76 NORRKÖPING 
http://www.smhi.se

E-post / Email: lars.barr...@smhi.se
Tel / Phone: +46 (0)11 495 8604  
Fax: +46 (0)11 495 8001 
Besöksadress / Visiting address: Folkborgsvägen 17


___
CF-metadata mailing list
CF-metadata@cgd.uca