Re: [CF-metadata] Missing data bins in histograms

2019-05-16 Thread Jonathan Gregory
Dear Martin

Thanks for explaining the use case. I agree that where flag_meanings is used
to encode a string-valued data variable, the permissible flag_values are those
which are allowed by the data variable's standard name, if they are standardised.
I also agree that we should give guidance for the use of flag_values with
coordinate variables. I suggest something more general, such that coordinate
values which are flag_values are status codes that indicate that a numerical
coordinate value cannot be assigned to the cell, but the data is nonetheless
meaningful (in some way indicated by flag_meanings). For example (as I said
before) a possible application is for all coordinate values which are too small
or too large.
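
A minimal Python sketch of this idea, for illustration only. The function name describe_coordinate, the flag values -1.0e30 and 1.0e30, and the meanings too_small and too_large are all made up for the example and are not taken from the conventions. Any coordinate value listed in flag_values is read as a status code, and every other value as an ordinary numerical coordinate.

```python
def describe_coordinate(values, flag_values, flag_meanings):
    """Pair each coordinate value with a status label, or None.

    flag_meanings is the blank-separated string used by CF; its words
    correspond one-to-one with the entries of flag_values.
    """
    meanings = flag_meanings.split()
    if len(meanings) != len(flag_values):
        raise ValueError("flag_values and flag_meanings differ in length")
    lookup = dict(zip(flag_values, meanings))
    # Values not listed in flag_values are ordinary numerical coordinates.
    return [(v, lookup.get(v)) for v in values]


# Hypothetical height coordinate with one cell below and one above the range.
coords = [-1.0e30, 25.0, 100.0, 1.0e30]
described = describe_coordinate(
    coords, flag_values=[-1.0e30, 1.0e30], flag_meanings="too_small too_large"
)
for value, status in described:
    print(value, status if status is not None else "numerical coordinate")
```

Note that the lookup is by exact equality, so the flag values written in the attribute would need to match the stored coordinate values bit for bit.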

Best wishes

Jonathan


Re: [CF-metadata] Missing data bins in histograms

2019-05-16 Thread Martin Juckes - UKRI STFC
Dear Jonathan,


OK, the flagged-dimension-array approach is certainly a more compact representation 
than my previous proposal (which had a flagged auxiliary coordinate). It would 
certainly be an extension to the scope of the examples in the CF Conventions 
document. I have a slight concern that there might be a subtle conflict in 
terms of the way in which the "flag_meanings" are interpreted -- but I think 
that can be dealt with if we frame the description of the new example carefully.


Firstly, it is perhaps relevant to clear up that "out_of_range" would not work 
in this case. The data in this bin is, I believe, a count of the profiles taken 
by the sensor for which the retrieval algorithm returned an error code rather 
than a height value. It is common practice to want an "out_of_range" bin in a 
histogram, but that is not the use case here. If we were recording the results 
from individual retrievals in a field with standard name "height", these data 
points would be flagged using the "missing_value" attribute.
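
A minimal Python sketch of the case described above, assuming hypothetical retrieval results in which None stands for an error code returned by the retrieval algorithm. The bin edges, the sample data and the function name histogram_with_error_bin are illustrative only. The first element counts failed retrievals; the remaining elements count heights falling in each range.

```python
import bisect


def histogram_with_error_bin(retrievals, edges):
    """Count retrievals per height bin, with failed retrievals in bin 0.

    edges are ascending bin edges in metres; bin i+1 covers
    edges[i] <= height < edges[i+1]. Heights outside the edges are
    ignored, since this sketch has no out-of-range bin.
    """
    counts = [0] * len(edges)  # one error bin + (len(edges) - 1) height bins
    for height in retrievals:
        if height is None:  # the retrieval returned an error code
            counts[0] += 1
            continue
        if edges[0] <= height < edges[-1]:
            counts[bisect.bisect_right(edges, height)] += 1
    return counts


retrievals = [12.0, None, 75.0, 49.9, None, 120.0]
counts = histogram_with_error_bin(retrievals, edges=[0.0, 50.0, 150.0])
print(counts)
```

Here the error bin is a count of failed retrievals, not of heights that fell outside the binned range, which is the distinction being drawn above.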


In the existing examples of "flag_meanings" it appears that the status codes 
used are application specific, so there is perhaps no need to make a firm 
decision on the text used here ... though the example should make sense.


There are standard names which assert specific rules about the usage of 
"flag_meanings" -- we have discussed these in trac ticket 
153 (https://cf-trac.llnl.gov/trac/ticket/153). That discussion is not 
concluded, but the clear intention is that, for some names at least, the flag 
values/meanings mechanism should be used with "flag_meanings" which conform to 
the specification of the standard name. For example, if the standard name is 
"region", the "flag_meanings" values should come from the accepted list of 
region names.


In general, I think the "flag_meanings" should be consistent with the standard 
name. Perhaps we could express this as "The flag_meanings values should be 
consistent with the specifications of the standard name or, alternatively, when 
used in a coordinate array, represent status codes associated with data which 
would be reported as missing." With a clarification of this kind, I think the 
approach you are suggesting can work. This makes it clear that we are dealing 
with data points which would use the "missing_value" when recorded in a data 
array, but we have a different approach here because we are constructing a 
coordinate array to label data values, and the coordinate array has a label for 
missing values, not a missing value.


regards,

Martin





Re: [CF-metadata] Missing data bins in histograms

2019-05-15 Thread Jonathan Gregory
Dear Martin

Yes, that's what I meant, thanks. Indeed, I might be suggesting an extension
of the use of flag_values, because you're right that its existing uses are in
cases where every data value that occurs is one which appears in flag_values.
However this is not a requirement in the convention or conformance documents.

In your particular case, is it best to give the meaning as "missing"? It would
be more informative to call them "out of range", perhaps.

Best wishes

Jonathan

- Forwarded message from Martin Juckes - UKRI STFC 
 -

> Date: Tue, 14 May 2019 14:34:48 +
> From: Martin Juckes - UKRI STFC 
> To: Jonathan Gregory , "cf-metadata@cgd.ucar.edu"
>   
> Subject: Re: [CF-metadata] Missing data bins in histograms
> 
> Dear Jonathan,
> 
> 
> Sorry, I think I misunderstood the scope of valid usage of "flag_values". 
> I've only seen it used in contexts in which all values of the flagged array 
> are translated using the "flag_values"/"flag_meanings" pairs, but you are 
> suggesting, I think, that it should only apply to the one anomalous bin. If 
> we can use a single "flag_values" without changing the interpretation of the 
> rest of the array, that would make the solution easier.
> 
> 
> Does this correspond to what you are thinking of:
> 
> 
> float data(time,lat,lon,zbins);
>   data: standard_name =   
> "histogram_of_equivalent_reflectivity_factor_over_height_above_reference_ellipsoid";
>   data: coordinates="status";
> float zbins(zbins);
>   zbins: long_name="Height ranges (with bin for missing data at first 
> element)"
>   zbins: units="m";
>   zbins: bounds="zbin_bnds";
>   zbins: standard_name = "height";
> 
>   zbins:flag_values =  -9999.f;
>   zbins:flag_meanings = "missing_values";
> float zbin_bnds(zindex,2);
> character status(char_len);
>    status:standard_name = "status_flag";
>    status:long_name = "Flag indicating quality of histogram";
> float lat(lat);
> float lon(lon);
> 
> data:
>   zbins = -9999., 25., 100., ....;
>   zbin_bnds = -.,0., 0., 50., 50., 150., ...
> 
> regards,
> Martin
> 

Re: [CF-metadata] Missing data bins in histograms

2019-05-15 Thread Hollis, Dan
Hi Martin,

That seems like a good way forward. I shall make a proposal for a new standard 
name in the next day or two.

Regards,

Dan



Re: [CF-metadata] Missing data bins in histograms

2019-05-14 Thread Martin Juckes - UKRI STFC
Hi Dan,


If we were starting from a blank sheet, that would be a strong point. As it is, 
we are rather constrained by the existing practices in the community. I hope 
that we can find an agreement along the lines of the discussion that Jonathan 
and I are having which makes it possible to support this approach without major 
adjustment.


This is likely (if we succeed) to include presentation of a new example in the 
conventions document. Perhaps we could, at the same time, include an example 
showing the alternative approach which you are suggesting -- but that would 
depend on having a standard name for the number of missing or rejected 
observations approved.


regards,

Martin




Re: [CF-metadata] Missing data bins in histograms

2019-05-14 Thread Hollis, Dan
Hi Martin,

I agree there is no clear line between data and metadata and I didn't really 
intend to suggest there was one. As you say, there are different equally-valid 
views of where the line could/should be drawn in any particular situation 
between the different types of data that we wish to record. My instinct would 
be to separate the result of processing the available data (whether that be a 
mean, a total, a count or a histogram) from information about the data that was 
not available (such as a count of missing observations), but I appreciate that 
is not always necessary or practical.

Regards,

Dan


-Original Message-
From: Martin Juckes - UKRI STFC  
Sent: Tuesday, 14 May 2019 15:04
To: Hollis, Dan ; Gregory, Jonathan 
; cf-metadata@cgd.ucar.edu; jbi...@cicsnc.org
Subject: Re: [CF-metadata] Missing data bins in histograms

Hi Dan,


Thanks, that makes it clearer.


The conversation below follows on from one that Karl and I had with people from 
CFMIP (Cloud Feedback Model Intercomparison Project). The variable in question, 
which contains the histogram, is produced to make it possible to compare climate 
model output with a standard product from the MISR imaging spectrometer.


I realise now that I have overlooked a change in the variable definition: 
although the product is computed as a histogram, the results are then 
normalised by total number of observations in each grid cell and reported as a 
percentage, so the actual variable name is 
cloud_area_fraction_in_atmosphere_layer rather than histogram. Their standard 
product has 16 bins: 15 for height ranges and one for the error flag.


When Karl and I started the conversation, one of us did suggest splitting the 
16th bin off into a separate variable, but this was considered as being an 
unwarranted complication: the variable is produced by one software package as a 
single array and used by a range of data analysis packages as a single array. 
Splitting it into two in the NetCDF file and then reassembling the parts 
afterwards would create significant extra work that nobody wants to do.


A considerable volume of data has already been written in the CMIP5 archive 
using this approach, with no CF metadata to inform people of the special nature 
of the 16th bin: the aim here is to improve on that state of affairs by 
providing specific metadata.


I would say that your view of the count of missing values as ancillary data is 
a valid perspective, but the suggestion that you are able to draw a clear line 
between "data" and "metadata" and that this perspective should become standard 
is not tenable. The perspective that counts of error flags are just as much 
data as counts of the other height range bins is also valid.


regards,

Martin



From: Hollis, Dan 
Sent: 14 May 2019 13:47
To: Juckes, Martin (STFC,RAL,RALSP); Gregory, Jonathan; 
cf-metadata@cgd.ucar.edu; jbi...@cicsnc.org
Subject: RE: [CF-metadata] Missing data bins in histograms

Hi Martin,

Sorry, I didn't mean to imply that we would do away with the histogram standard 
names - these would be retained, of course. I just meant that we both want to 
store one extra bit of information (maximum number of obs or, equivalently, 
missing number of obs) and that in both use cases ('histogram_of...' and 
'number_of...') this could be in an ancillary variable, for which we'd need a 
new standard name. Does that make more sense?

I appreciate that your users wish to display the number of missing values 
alongside the counts for the different bins, however I'd argue that this 
information is ancillary to the histogram itself (in the same way that the 
number of missing days is ancillary to a count of days of air frost) and should 
be stored as such in the netCDF file (rather than in a 'pseudo-bin').
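
The two layouts carry the same information, as this small Python sketch suggests. It assumes, purely for illustration, that the pseudo-bin is the first element of the array; the counts are made up and the function names are hypothetical.

```python
def split_pseudo_bin(counts, index=0):
    """Return (histogram without the pseudo-bin, count of missing obs)."""
    histogram = counts[:index] + counts[index + 1:]
    return histogram, counts[index]


def merge_pseudo_bin(histogram, n_missing, index=0):
    """Reinsert the missing-observation count as a pseudo-bin."""
    return histogram[:index] + [n_missing] + histogram[index:]


combined = [7, 120, 310, 95]  # made-up counts; element 0 is the pseudo-bin
histogram, n_missing = split_pseudo_bin(combined)
print(histogram, n_missing)
restored = merge_pseudo_bin(histogram, n_missing)
print(restored == combined)
```

The round trip is lossless, so the choice between an ancillary variable and a pseudo-bin is about convenience for producers and users rather than about information content.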

Regards,

Dan


-Original Message-
From: Martin Juckes - UKRI STFC 
Sent: Tuesday, 14 May 2019 13:29
To: Hollis, Dan ; Gregory, Jonathan 
; cf-metadata@cgd.ucar.edu; jbi...@cicsnc.org
Subject: Re: [CF-metadata] Missing data bins in histograms

Hi Dan,


it is a similar concept, but the aim here is to record it in a histogram. We 
have a standard name for the histogram ... I'm not sure why you think we need 
to change this. Perhaps it would be possible to do away with "histogram_" 
standard names and just use "number_of_observations", but I'm afraid I don't 
see much merit in that approach.


For your use case, I can certainly see that there could be a case for a 
"number_of_missing_observations" standard name, but it doesn't help with the 
specification of the histogram that I want to store.


regards,

Martin


From: Hollis, Dan 
Sent: 14 May 2019 13:13
To: Juckes, Martin (STFC,RAL,RALSP); Gregory, Jonathan; 
cf-metadata@cgd.ucar.edu; jbi...@cicsnc.org
Subject: RE: [CF-metadata] Missing data bins in histograms

Hi Martin,

Thanks for your suggestion - I can see how thi

Re: [CF-metadata] Missing data bins in histograms

2019-05-14 Thread Martin Juckes - UKRI STFC
Dear Jonathan,


Sorry, I think I misunderstood the scope of valid usage of "flag_values". I've 
only seen it used in contexts in which all values of the flagged array are 
translated using the "flag_values"/"flag_meanings" pairs, but you are 
suggesting, I think, that it should only apply to the one anomalous bin. If we 
can use a single "flag_values" without changing the interpretation of the rest 
of the array, that would make the solution easier.


Does this correspond to what you are thinking of:


float data(time,lat,lon,zbins);
  data:standard_name = "histogram_of_equivalent_reflectivity_factor_over_height_above_reference_ellipsoid";
  data:coordinates = "status";
float zbins(zbins);
  zbins:long_name = "Height ranges (with bin for missing data at first element)";
  zbins:units = "m";
  zbins:bounds = "zbin_bnds";
  zbins:standard_name = "height";
  // the flag value matches the first coordinate value in the data section
  zbins:flag_values = -9999.f;
  zbins:flag_meanings = "missing_values";
// bounds share the zbins dimension, plus a trailing dimension of size 2
float zbin_bnds(zbins,2);
char status(char_len);
  status:standard_name = "status_flag";
  status:long_name = "Flag indicating quality of histogram";
float lat(lat);
float lon(lon);

data:
  zbins = -9999., 25., 100., ...;
  zbin_bnds = -.,0., 0., 50., 50., 150., ...
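
For illustration only, a short Python sketch of how a reader of such a file might label the bins. It assumes a sentinel of -9999.0 for the first bin and uses placeholder bounds for that bin, since its bounds carry no physical meaning; bin_labels is a hypothetical helper, not part of any library.

```python
def bin_labels(zbins, zbin_bnds, flag_values, flag_meanings, units="m"):
    """Return one label per bin: a flag meaning or a height range."""
    meanings = dict(zip(flag_values, flag_meanings.split()))
    labels = []
    for value, (lower, upper) in zip(zbins, zbin_bnds):
        if value in meanings:
            # Flagged coordinate: the bounds are not physical, so ignore them.
            labels.append(meanings[value])
        else:
            labels.append(f"{lower:g}-{upper:g} {units}")
    return labels


zbins = [-9999.0, 25.0, 100.0]
# The first pair is a placeholder for the flagged bin.
zbin_bnds = [(-9999.0, -9999.0), (0.0, 50.0), (50.0, 150.0)]
labels = bin_labels(zbins, zbin_bnds, [-9999.0], "missing_values")
print(labels)
```

Only the one anomalous bin is translated through flag_values and flag_meanings; the other coordinate values keep their usual numerical interpretation.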

regards,
Martin
____
From: CF-metadata  on behalf of Jonathan 
Gregory 
Sent: 14 May 2019 13:43
To: cf-metadata@cgd.ucar.edu
Subject: Re: [CF-metadata] Missing data bins in histograms

Dear Martin

I agree that if valid_range implies masked-out data in some software, we can't
put special values out of the range, and that we shouldn't tamper with missing
data. I still think that flag_values is a better way to indicate special
values in a coordinate variable than an auxiliary coordinate variable would be.
If there are flag values, by definition those values aren't physical coordinate
values, and the user of such data needs to be aware of that. That would be the
consequence of changing the convention to allow flag_values for coordinate
variables, just as it is presently the case that a user of a data variable
ought to check whether it has flag_values, which would likewise indicate that
some of the valid values are not actually physical values. However I don't
think we ought to change the standard_name to signal it, since introducing new
standard_names requires software to recognise both versions.
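
The concern about valid_range can be seen in a small Python sketch. This is not the netCDF4 implementation, only a stand-in for any reader that masks values outside valid_range; the range of 0 to 20000 m and the sentinel -9999.0 are assumed for the example.

```python
def mask_outside_valid_range(values, valid_range):
    """Replace values outside valid_range with None, as a masking reader might."""
    low, high = valid_range
    return [v if low <= v <= high else None for v in values]


zbins = [-9999.0, 25.0, 100.0]
masked = mask_outside_valid_range(zbins, valid_range=(0.0, 20000.0))
print(masked)
```

The special coordinate value is masked before any lookup against flag_values could take place, which is why a special value cannot safely sit outside the valid range.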

Best wishes

Jonathan

- Forwarded message from Martin Juckes - UKRI STFC 
 -

> Date: Tue, 14 May 2019 09:03:19 +
> From: Martin Juckes - UKRI STFC 
> To: Jonathan Gregory , "cf-metadata@cgd.ucar.edu"
>
> Subject: Re: [CF-metadata] Missing data bins in histograms
>
> Dear Jonathan,
>
>
> I looked at "valid_range", and also "actual_range", but I believe that the 
> definitions of either of these would have to be changed to accommodate this 
> usage, and we would run into the problem that Jim raised in connection with 
> my earlier suggestion of using "missing_value": such changes can break 
> assumptions made by existing software. Data outside the "valid_range" may 
> well be automatically rejected by a user application before the data gets to 
> any CF aware libraries. For instance, python netCDF4 at version 1.3.0 and 
> 1.3.1 automatically removes data outside the valid_range, giving the user a 
> masked array.  There is some discussion of this here: 
> https://github.com/Unidata/netcdf4-python/issues/748.
>
>
> It is possible to 
> circumvent this behaviour by changing the auto-masking setting in python 
> netCDF4, and the NUG does suggest using values outside the "valid_range" as 
> flags. NUG also suggests using the missing_value attribute to list such flag 
> values ... but Jim has pointed out that such an approach is likely to cause 
> problems with many applications. This is a complex area because the meaning 
> of "missing_value" in NUG has evolved. Up until CF 1.5 it appears that a 
> "missing_value" meant, unambiguously, missing data.  The current CF appears 
> to have changed this in line with NUG so that different usages are now 
> permissible, but I still agree with Jim's objection. We can't, I'm sure, at 
> this stage, follow an approach which depends on users being able to control 
> the auto-masking settings (it is a simple call to the "set_auto_mask" method 
> if you are using the python netCDF4 library directly ... but may not be 
> available to users who are working with applications built on the library).
>
>
> I wanted to use a new standard name for the height bins because of the fact 
> that the value in the first bin, which I have set to -., is not a h

Re: [CF-metadata] Missing data bins in histograms

2019-05-14 Thread Martin Juckes - UKRI STFC
Hi Dan,


Thanks, that makes it clearer.


The conversation below follows on from one that Karl and I had with people from 
CFMIP (Cloud Forcing Model Intercomparison Project). The variable in question, 
which contains the histogram, is produced to make it possible to compare climate 
model output with a standard product from the MISR imaging spectrometer.


I realise now that I have overlooked a change in the variable definition: 
although the product is computed as a histogram, the results are then 
normalised by total number of observations in each grid cell and reported as a 
percentage, so the actual variable name is 
cloud_area_fraction_in_atmosphere_layer rather than histogram. Their standard 
product has 16 bins: 15 for height ranges and one for the error flag.
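
As a rough illustration of the normalisation described above, here is a minimal sketch. The function name and the sample counts are made up for illustration, and it assumes that the total used for normalisation includes the error-flag bin, which the message does not state.

```python
def normalise_histogram(counts):
    """Convert raw per-bin counts for one grid cell into percentages.

    Assumption: the denominator is the sum over all bins, including the
    error-flag bin. The original message does not say which total is used.
    """
    total = sum(counts)
    if total == 0:
        # No observations in this grid cell: report zeros rather than divide by zero.
        return [0.0 for _ in counts]
    return [100.0 * c / total for c in counts]


# Hypothetical counts: 15 height-range bins followed by one error-flag bin.
height_counts = [4, 6, 10, 12, 9, 8, 7, 5, 5, 4, 3, 3, 2, 1, 1]
error_count = 20
percentages = normalise_histogram(height_counts + [error_count])
```

Under that assumption the 16 percentages sum to 100, so the error-flag bin is read on the same scale as the height bins.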


When Karl and I started the conversation, one of us did suggest splitting the 
16th bin off into a separate variable, but this was considered as being an 
unwarranted complication: the variable is produced by one software package as a 
single array and used by a range of data analysis packages as a single array. 
Splitting it into two in the NetCDF file and then reassembling the parts 
afterwards would create significant extra work that nobody wants to do.


A considerable volume of data has already been written in the CMIP5 archive 
using this approach, with no CF metadata to inform people of the special nature 
of the 16th bin: the aim here is to improve on that state of affairs by 
providing specific metadata.


I would say that your view of the count of missing values as ancillary data is 
a valid perspective, but the suggestion that you are able to draw a clear line 
between "data" and "metadata" and that this perspective should become standard 
is not tenable. The perspective that counts of error flags are just as much 
data as counts of the other height range bins is also valid.


regards,

Martin



From: Hollis, Dan 
Sent: 14 May 2019 13:47
To: Juckes, Martin (STFC,RAL,RALSP); Gregory, Jonathan; 
cf-metadata@cgd.ucar.edu; jbi...@cicsnc.org
Subject: RE: [CF-metadata] Missing data bins in histograms

Hi Martin,

Sorry, I didn't mean to imply that we would do away with the histogram standard 
names - these would be retained, of course. I just meant that we both want to 
store one extra bit of information (maximum number of obs or, equivalently, 
missing number of obs) and that in both use cases ('histogram_of...' and 
'number_of...') this could be in an ancillary variable, for which we'd need a 
new standard name. Does that make more sense?

I appreciate that your users wish to display the number of missing values 
alongside the counts for the different bins, however I'd argue that this 
information is ancillary to the histogram itself (in the same way that the 
number of missing days is ancillary to a count of days of air frost) and should 
be stored as such in the netCDF file (rather than in a 'pseudo-bin').

Regards,

Dan


-Original Message-
From: Martin Juckes - UKRI STFC 
Sent: Tuesday, 14 May 2019 13:29
To: Hollis, Dan ; Gregory, Jonathan 
; cf-metadata@cgd.ucar.edu; jbi...@cicsnc.org
Subject: Re: [CF-metadata] Missing data bins in histograms

Hi Dan,


it is a similar concept, but the aim here is to record it in a histogram. We 
have a standard name for the histogram  .. I'm not sure why you think we need 
to change this. Perhaps it would be possible to do away with "histogram_" 
standard names and just use "number_of_observations", but I'm afraid I don't 
see much merit in that approach.


For your use case, I can certainly see that there could be a case for a 
"number_of_missing_observations" standard name, but it doesn't help with the 
specification of the histogram that I want to store.


regards,

Martin


From: Hollis, Dan 
Sent: 14 May 2019 13:13
To: Juckes, Martin (STFC,RAL,RALSP); Gregory, Jonathan; 
cf-metadata@cgd.ucar.edu; jbi...@cicsnc.org
Subject: RE: [CF-metadata] Missing data bins in histograms

Hi Martin,

Thanks for your suggestion - I can see how this could work for our data. 
However I can also see that having to parse the 'interval' text from the 
'cell_methods' comment field and combine that with the bounds from the time 
coordinate is not especially user-friendly! It would be much easier if we could 
store 'maximum_number_of_observations' (or 'number_of_missing_observations') as 
well.

I guess the reason your suggestion does not work for your histograms is that 
there is no obvious place to record the sampling intervals (angular and 
distance) of the radar data. However, if I'm understanding this correctly, all 
the user really needs is the total number of data bins in one sweep of the 
radar. I'd argue that this is similar in concept to 
'maximum_number_of_observations' i.e. maybe we just need a new standard name 
that we can both use. What do you think?

Apologies if I haven't fully gra

Re: [CF-metadata] Missing data bins in histograms

2019-05-14 Thread Hollis, Dan
Hi Martin,

Sorry, I didn't mean to imply that we would do away with the histogram standard 
names - these would be retained, of course. I just meant that we both want to 
store one extra bit of information (maximum number of obs or, equivalently, 
missing number of obs) and that in both use cases ('histogram_of...' and 
'number_of...') this could be in an ancillary variable, for which we'd need a 
new standard name. Does that make more sense?

I appreciate that your users wish to display the number of missing values 
alongside the counts for the different bins, however I'd argue that this 
information is ancillary to the histogram itself (in the same way that the 
number of missing days is ancillary to a count of days of air frost) and should 
be stored as such in the netCDF file (rather than in a 'pseudo-bin').

Regards,

Dan


-Original Message-
From: Martin Juckes - UKRI STFC  
Sent: Tuesday, 14 May 2019 13:29
To: Hollis, Dan ; Gregory, Jonathan 
; cf-metadata@cgd.ucar.edu; jbi...@cicsnc.org
Subject: Re: [CF-metadata] Missing data bins in histograms

Hi Dan,


it is a similar concept, but the aim here is to record it in a histogram. We 
have a standard name for the histogram  .. I'm not sure why you think we need 
to change this. Perhaps it would be possible to do away with "histogram_" 
standard names and just use "number_of_observations", but I'm afraid I don't 
see much merit in that approach.


For your use case, I can certainly see that there could be a case for a 
"number_of_missing_observations" standard name, but it doesn't help with the 
specification of the histogram that I want to store.


regards,

Martin


From: Hollis, Dan 
Sent: 14 May 2019 13:13
To: Juckes, Martin (STFC,RAL,RALSP); Gregory, Jonathan; 
cf-metadata@cgd.ucar.edu; jbi...@cicsnc.org
Subject: RE: [CF-metadata] Missing data bins in histograms

Hi Martin,

Thanks for your suggestion - I can see how this could work for our data. 
However I can also see that having to parse the 'interval' text from the 
'cell_methods' comment field and combine that with the bounds from the time 
coordinate is not especially user-friendly! It would be much easier if we could 
store 'maximum_number_of_observations' (or 'number_of_missing_observations') as 
well.

I guess the reason your suggestion does not work for your histograms is that 
there is no obvious place to record the sampling intervals (angular and 
distance) of the radar data. However, if I'm understanding this correctly, all 
the user really needs is the total number of data bins in one sweep of the 
radar. I'd argue that this is similar in concept to 
'maximum_number_of_observations' i.e. maybe we just need a new standard name 
that we can both use. What do you think?

Apologies if I haven't fully grasped the complexities of your data.

Regards,

Dan

-Original Message-
From: Martin Juckes - UKRI STFC 
Sent: Tuesday, 14 May 2019 12:02
To: Hollis, Dan ; Gregory, Jonathan 
; cf-metadata@cgd.ucar.edu; jbi...@cicsnc.org
Subject: Re: [CF-metadata] Missing data bins in histograms

Hello Dan,


I think there is a method for recording the number of valid observations in 
each data point, which, if I've understood correctly, would meet the 
requirement you are describing: using an "ancillary_variable" with standard 
name "number_of_observations".  I don't think there is a method for explicitly 
recording missing values, but you can use "interval" (in the "cell_methods" 
comment) to specify the interval of input data which, together with the 
duration of the calculation, will tell you the maximum amount of input values 
available.


In your use-case the number of missing values would be part of the ancillary 
information, in my use case it is the data itself -- the users want a histogram 
which includes a count of failed retrievals.


regards,

Martin


From: Hollis, Dan 
Sent: 14 May 2019 11:22
To: Juckes, Martin (STFC,RAL,RALSP); Gregory, Jonathan; 
cf-metadata@cgd.ucar.edu; jbi...@cicsnc.org
Subject: RE: [CF-metadata] Missing data bins in histograms

Dear Martin/Jonathan/Jim,

I appreciate that this discussion is focussed on histograms, however I wonder 
if there is a wider issue here i.e. how should one record the number of missing 
values for any extensive quantity?

For example, we use number_of_days_with_air_temperature_below_threshold to 
store counts of days of air frost (computed from station observations of daily 
minimum temperature). The threshold is specified using a scalar coordinate 
variable called 'air_temperature' with a value of 0.0. The counts of air frost 
are for periods of months, seasons or years and, inevitably, the values for 
some periods for some stations are based on incomplete data. Is there a 
recommended method for recording the number of missing observations for each 
data point (apologies if I've missed this 

Re: [CF-metadata] Missing data bins in histograms

2019-05-14 Thread Jonathan Gregory
Dear Martin

I agree that if valid_range implies masked-out data in some software, we can't
put special values out of the range, and that we shouldn't tamper with missing
data. I still think that flag_values is a better way to indicate special
values in a coordinate variable than an auxiliary coordinate variable would be.
If there are flag values, by definition those values aren't physical coordinate
values, and the user of such data needs to be aware of that. That would be the
consequence of changing the convention to allow flag_values for coordinate
variables, just as it is presently the case that a user of a data variable
ought to check whether it has flag_values, which would likewise indicate that
some of the valid values are not actually physical values. However I don't
think we ought to change the standard_name to signal it, since introducing new
standard_names requires software to recognise both versions.
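
To make the proposal concrete, here is a minimal reader-side sketch of how an application might treat a coordinate variable that carries flag_values. It is only an illustration of the idea under discussion, not something the convention currently defines for coordinate variables. The function name, the flag value -1.0 and the meaning "retrieval_failed" are hypothetical; the attributes are held in a plain dict rather than read from a file.

```python
def describe_coordinate(values, attrs):
    """Return one entry per coordinate value: the flag meaning where the
    value is listed in flag_values, otherwise the numeric value itself.

    attrs mimics the attributes of the coordinate variable. flag_meanings
    is a blank-separated string, paired one-to-one with flag_values.
    """
    flag_values = list(attrs.get("flag_values", []))
    flag_meanings = attrs.get("flag_meanings", "").split()
    if len(flag_values) != len(flag_meanings):
        raise ValueError("flag_values and flag_meanings must have equal length")
    lookup = dict(zip(flag_values, flag_meanings))
    # Exact comparison is used here; real data may need a tolerance for floats.
    return [lookup.get(v, v) for v in values]


# Hypothetical height coordinate whose first cell is a status code, not a height.
height_attrs = {
    "standard_name": "height",
    "units": "m",
    "flag_values": [-1.0],
    "flag_meanings": "retrieval_failed",
}
labels = describe_coordinate([-1.0, 250.0, 750.0], height_attrs)
```

The point of the sketch is that the check is the same one a user of a data variable with flag_values already has to make.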

Best wishes

Jonathan

----- Forwarded message from Martin Juckes - UKRI STFC -----

> Date: Tue, 14 May 2019 09:03:19 +
> From: Martin Juckes - UKRI STFC 
> To: Jonathan Gregory , "cf-metadata@cgd.ucar.edu"
>       
> Subject: Re: [CF-metadata] Missing data bins in histograms
> 
> Dear Jonathan,
> 
> 
> I looked at "valid_range", and also "actual_range", but I believe that the 
> definitions of either of these would have to be changed to accommodate this 
> usage, and we would run into the problem that Jim raised in connection with 
> my earlier suggestion of using "missing_value": such changes can break 
> assumptions made by existing software. Data outside the "valid_range" may 
> well be automatically rejected by a user application before the data gets to 
> any CF aware libraries. For instance, python netCDF4 at version 1.3.0 and 
> 1.3.1 automatically removes data outside the valid_range, giving the user a 
> masked array.  There is some discussion of this here: 
> https://github.com/Unidata/netcdf4-python/issues/748.
> 
> 
> It is possible to 
> circumvent this behaviour by changing the auto-masking setting in python 
> netCDF4, and the NUG does suggest using values outside the "valid_range" as 
> flags. NUG also suggests using the missing_value attribute to list such flag 
> values ... but Jim has pointed out that such an approach is likely to cause 
> problems with many applications. This is a complex area because the meaning 
> of "missing_value" in NUG has evolved. Up until CF 1.5 it appears that a 
> "missing_value" meant, unambiguously, missing data.  The current CF appears 
> to have changed this in line with NUG so that different usages are now 
> permissible, but I still agree with Jim's objection. We can't, I'm sure, at 
> this stage, follow an approach which depends on users being able to control 
> the auto-masking settings (it is a simple call to the "set_auto_mask" method 
> if you are using the python netCDF4 library directly ... but may not be 
> available to users who are working with applications built on the library).
> 
> 
> I wanted to use a new standard name for the height bins because of the fact 
> that the value in the first bin, which I have set to -., is not a height. 
> This data point needs to have a valid floating point value to conform to the 
> rules for a coordinate array, but, unlike the rest of the array, it should 
> not be interpreted as height. This is signalled by the presence of an 
> auxiliary coordinate -- but I'm not sure that that is adequate. Applications 
> and users are entitled to believe that a variable which has standard name 
> "height" really refers to height, without having to check all the auxiliary 
> coordinates to see if there is something there which modifies the meaning of 
> the variable. The standard name "height_bins" would signal that they must 
> look in the auxiliary coordinate.
> 
> 
> Do you agree with the necessity and appropriateness of the new name of 
> "bin_status_flag" which I have suggested for the auxiliary coordinate?
> 
> 
> regards,
> 
> Martin
> 
> 
> From: CF-metadata  on behalf of Jonathan 
> Gregory 
> Sent: 13 May 2019 18:00
> To: cf-metadata@cgd.ucar.edu
> Subject: Re: [CF-metadata] Missing data bins in histograms
> 
> Dear Martin
> 
> I agree that an alternative which would not require a change to the
> convention is to attach a string-valued aux coord variable. However, the
> flags are much more economical and seem natural, as you say.
> 
> As I said in my last email, I feel that it's better to keep the standard name
> as it is, despite the presence of a special value in it w

Re: [CF-metadata] Missing data bins in histograms

2019-05-14 Thread Martin Juckes - UKRI STFC
Hi Dan,


it is a similar concept, but the aim here is to record it in a histogram. We 
have a standard name for the histogram  .. I'm not sure why you think we need 
to change this. Perhaps it would be possible to do away with "histogram_" 
standard names and just use "number_of_observations", but I'm afraid I don't 
see much merit in that approach.


For your use case, I can certainly see that there could be a case for a 
"number_of_missing_observations" standard name, but it doesn't help with the 
specification of the histogram that I want to store.


regards,

Martin


From: Hollis, Dan 
Sent: 14 May 2019 13:13
To: Juckes, Martin (STFC,RAL,RALSP); Gregory, Jonathan; 
cf-metadata@cgd.ucar.edu; jbi...@cicsnc.org
Subject: RE: [CF-metadata] Missing data bins in histograms

Hi Martin,

Thanks for your suggestion - I can see how this could work for our data. 
However I can also see that having to parse the 'interval' text from the 
'cell_methods' comment field and combine that with the bounds from the time 
coordinate is not especially user-friendly! It would be much easier if we could 
store 'maximum_number_of_observations' (or 'number_of_missing_observations') as 
well.

I guess the reason your suggestion does not work for your histograms is that 
there is no obvious place to record the sampling intervals (angular and 
distance) of the radar data. However, if I'm understanding this correctly, all 
the user really needs is the total number of data bins in one sweep of the 
radar. I'd argue that this is similar in concept to 
'maximum_number_of_observations' i.e. maybe we just need a new standard name 
that we can both use. What do you think?

Apologies if I haven't fully grasped the complexities of your data.

Regards,

Dan

-Original Message-
From: Martin Juckes - UKRI STFC 
Sent: Tuesday, 14 May 2019 12:02
To: Hollis, Dan ; Gregory, Jonathan 
; cf-metadata@cgd.ucar.edu; jbi...@cicsnc.org
Subject: Re: [CF-metadata] Missing data bins in histograms

Hello Dan,


I think there is a method for recording the number of valid observations in 
each data point, which, if I've understood correctly, would meet the 
requirement you are describing: using an "ancillary_variable" with standard 
name "number_of_observations".  I don't think there is a method for explicitly 
recording missing values, but you can use "interval" (in the "cell_methods" 
comment) to specify the interval of input data which, together with the 
duration of the calculation, will tell you the maximum amount of input values 
available.


In your use-case the number of missing values would be part of the ancillary 
information, in my use case it is the data itself -- the users want a histogram 
which includes a count of failed retrievals.


regards,

Martin


From: Hollis, Dan 
Sent: 14 May 2019 11:22
To: Juckes, Martin (STFC,RAL,RALSP); Gregory, Jonathan; 
cf-metadata@cgd.ucar.edu; jbi...@cicsnc.org
Subject: RE: [CF-metadata] Missing data bins in histograms

Dear Martin/Jonathan/Jim,

I appreciate that this discussion is focussed on histograms, however I wonder 
if there is a wider issue here i.e. how should one record the number of missing 
values for any extensive quantity?

For example, we use number_of_days_with_air_temperature_below_threshold to 
store counts of days of air frost (computed from station observations of daily 
minimum temperature). The threshold is specified using a scalar coordinate 
variable called 'air_temperature' with a value of 0.0. The counts of air frost 
are for periods of months, seasons or years and, inevitably, the values for 
some periods for some stations are based on incomplete data. Is there a 
recommended method for recording the number of missing observations for each 
data point (apologies if I've missed this in the conventions)? If so then maybe 
the same approach could be used for histograms too. If not then my feeling is 
that whatever solution you propose should be applicable to all extensive 
quantities (i.e. all quantities that can be derived from a set of constituent 
observations). Having a special 'bin' might work for histogram data but would 
not work for other variables so I think a different approach is required.

My feeling is that the number of missing values is sort of like metadata i.e. 
it's telling you something about the quality of the data itself. Would an 
ancillary variable suit this purpose?

Regards,

Dan


-Original Message-
From: CF-metadata  On Behalf Of Martin Juckes 
- UKRI STFC
Sent: Tuesday, 14 May 2019 10:03
To: Gregory, Jonathan ; cf-metadata@cgd.ucar.edu
Subject: Re: [CF-metadata] Missing data bins in histograms

Dear Jonathan,


I looked at "valid_range", and also "actual_range", but I believe that the 
definitions of either of these would have to be changed to accommodate this 
usage, and we wou

Re: [CF-metadata] Missing data bins in histograms

2019-05-14 Thread Hollis, Dan
Hi Martin,

Thanks for your suggestion - I can see how this could work for our data. 
However I can also see that having to parse the 'interval' text from the 
'cell_methods' comment field and combine that with the bounds from the time 
coordinate is not especially user-friendly! It would be much easier if we could 
store 'maximum_number_of_observations' (or 'number_of_missing_observations') as 
well.
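
To show the kind of parsing this implies, here is a minimal sketch. The function name, the small table of unit spellings and the example cell_methods string are assumptions for illustration, and the time bounds are assumed to be expressed in days.

```python
import re

# Hypothetical, deliberately small table of unit spellings and their length in seconds.
_SECONDS_PER_UNIT = {
    "s": 1, "second": 1, "seconds": 1,
    "min": 60, "minute": 60, "minutes": 60,
    "h": 3600, "hr": 3600, "hour": 3600, "hours": 3600,
    "d": 86400, "day": 86400, "days": 86400,
}


def max_observations(cell_methods, bounds_start_days, bounds_end_days):
    """Estimate the maximum number of input values for one cell.

    Parses the 'interval: <value> <unit>' text from a cell_methods string
    and divides the length of the time cell (from its bounds, assumed to be
    in days) by that interval.
    """
    match = re.search(r"interval:\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)", cell_methods)
    if match is None:
        raise ValueError("no interval found in cell_methods")
    interval_seconds = float(match.group(1)) * _SECONDS_PER_UNIT[match.group(2)]
    span_seconds = (bounds_end_days - bounds_start_days) * 86400
    return int(round(span_seconds / interval_seconds))


# Hypothetical monthly cell of 31 days built from daily inputs.
n_max = max_observations("time: sum (interval: 1 day)", 0.0, 31.0)
n_missing = n_max - 28  # if, say, 28 valid observations were recorded
```

Every reader would have to repeat this, including the unit handling, which is the inconvenience a stored count would avoid.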

I guess the reason your suggestion does not work for your histograms is that 
there is no obvious place to record the sampling intervals (angular and 
distance) of the radar data. However, if I'm understanding this correctly, all 
the user really needs is the total number of data bins in one sweep of the 
radar. I'd argue that this is similar in concept to 
'maximum_number_of_observations' i.e. maybe we just need a new standard name 
that we can both use. What do you think?

Apologies if I haven't fully grasped the complexities of your data.

Regards,

Dan

-Original Message-
From: Martin Juckes - UKRI STFC  
Sent: Tuesday, 14 May 2019 12:02
To: Hollis, Dan ; Gregory, Jonathan 
; cf-metadata@cgd.ucar.edu; jbi...@cicsnc.org
Subject: Re: [CF-metadata] Missing data bins in histograms

Hello Dan,


I think there is a method for recording the number of valid observations in 
each data point, which, if I've understood correctly, would meet the 
requirement you are describing: using an "ancillary_variable" with standard 
name "number_of_observations".  I don't think there is a method for explicitly 
recording missing values, but you can use "interval" (in the "cell_methods" 
comment) to specify the interval of input data which, together with the 
duration of the calculation, will tell you the maximum amount of input values 
available.


In your use-case the number of missing values would be part of the ancillary 
information, in my use case it is the data itself -- the users want a histogram 
which includes a count of failed retrievals.


regards,

Martin


From: Hollis, Dan 
Sent: 14 May 2019 11:22
To: Juckes, Martin (STFC,RAL,RALSP); Gregory, Jonathan; 
cf-metadata@cgd.ucar.edu; jbi...@cicsnc.org
Subject: RE: [CF-metadata] Missing data bins in histograms

Dear Martin/Jonathan/Jim,

I appreciate that this discussion is focussed on histograms, however I wonder 
if there is a wider issue here i.e. how should one record the number of missing 
values for any extensive quantity?

For example, we use number_of_days_with_air_temperature_below_threshold to 
store counts of days of air frost (computed from station observations of daily 
minimum temperature). The threshold is specified using a scalar coordinate 
variable called 'air_temperature' with a value of 0.0. The counts of air frost 
are for periods of months, seasons or years and, inevitably, the values for 
some periods for some stations are based on incomplete data. Is there a 
recommended method for recording the number of missing observations for each 
data point (apologies if I've missed this in the conventions)? If so then maybe 
the same approach could be used for histograms too. If not then my feeling is 
that whatever solution you propose should be applicable to all extensive 
quantities (i.e. all quantities that can be derived from a set of constituent 
observations). Having a special 'bin' might work for histogram data but would 
not work for other variables so I think a different approach is required.

My feeling is that the number of missing values is sort of like metadata i.e. 
it's telling you something about the quality of the data itself. Would an 
ancillary variable suit this purpose?

Regards,

Dan


-Original Message-
From: CF-metadata  On Behalf Of Martin Juckes 
- UKRI STFC
Sent: Tuesday, 14 May 2019 10:03
To: Gregory, Jonathan ; cf-metadata@cgd.ucar.edu
Subject: Re: [CF-metadata] Missing data bins in histograms

Dear Jonathan,


I looked at "valid_range", and also "actual_range", but I believe that the 
definitions of either of these would have to be changed to accommodate this 
usage, and we would run into the problem that Jim raised in connection with my 
earlier suggestion of using "missing_value": such changes can break assumptions 
made by existing software. Data outside the "valid_range" may well be 
automatically rejected by a user application before the data gets to any CF 
aware libraries. For instance, python netCDF4 at version 1.3.0 and 1.3.1 
automatically removes data outside the valid_range, giving the user a masked 
array.  There is some discussion of this here: 
https://github.com/Unidata/netcdf4-python/issues/748.


It is possible to 
circumvent this behaviour by changing the auto-masking setting in python 
netCDF4, and the NUG does suggest using values outside the "valid_range" as 
flags. NUG also suggests using the missing_value

Re: [CF-metadata] Missing data bins in histograms

2019-05-14 Thread Martin Juckes - UKRI STFC
Hello Dan,


I think there is a method for recording the number of valid observations in 
each data point, which, if I've understood correctly, would meet the 
requirement you are describing: using an "ancillary_variable" with standard 
name "number_of_observations".  I don't think there is a method for explicitly 
recording missing values, but you can use "interval" (in the "cell_methods" 
comment) to specify the interval of input data which, together with the 
duration of the calculation, will tell you the maximum amount of input values 
available.
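
A minimal sketch of how a reader might follow that link, with the variables held as plain dicts rather than read from a file. The variable names "frost_days" and "frost_days_nobs" and the function name are hypothetical. The linking attribute is spelled "ancillary_variables" in the conventions.

```python
def find_ancillary(variables, data_name, standard_name):
    """Return the name of the ancillary variable of data_name that has the
    requested standard_name, or None if there is no such variable.
    """
    attrs = variables[data_name]["attrs"]
    # ancillary_variables is a blank-separated list of variable names.
    for name in attrs.get("ancillary_variables", "").split():
        if variables[name]["attrs"].get("standard_name") == standard_name:
            return name
    return None


# Hypothetical file contents.
variables = {
    "frost_days": {
        "attrs": {
            "standard_name": "number_of_days_with_air_temperature_below_threshold",
            "cell_methods": "time: sum (interval: 1 day)",
            "ancillary_variables": "frost_days_nobs",
        },
    },
    "frost_days_nobs": {
        "attrs": {"standard_name": "number_of_observations"},
    },
}
nobs_name = find_ancillary(variables, "frost_days", "number_of_observations")
```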


In your use-case the number of missing values would be part of the ancillary 
information, in my use case it is the data itself -- the users want a histogram 
which includes a count of failed retrievals.


regards,

Martin


From: Hollis, Dan 
Sent: 14 May 2019 11:22
To: Juckes, Martin (STFC,RAL,RALSP); Gregory, Jonathan; 
cf-metadata@cgd.ucar.edu; jbi...@cicsnc.org
Subject: RE: [CF-metadata] Missing data bins in histograms

Dear Martin/Jonathan/Jim,

I appreciate that this discussion is focussed on histograms, however I wonder 
if there is a wider issue here i.e. how should one record the number of missing 
values for any extensive quantity?

For example, we use number_of_days_with_air_temperature_below_threshold to 
store counts of days of air frost (computed from station observations of daily 
minimum temperature). The threshold is specified using a scalar coordinate 
variable called 'air_temperature' with a value of 0.0. The counts of air frost 
are for periods of months, seasons or years and, inevitably, the values for 
some periods for some stations are based on incomplete data. Is there a 
recommended method for recording the number of missing observations for each 
data point (apologies if I've missed this in the conventions)? If so then maybe 
the same approach could be used for histograms too. If not then my feeling is 
that whatever solution you propose should be applicable to all extensive 
quantities (i.e. all quantities that can be derived from a set of constituent 
observations). Having a special 'bin' might work for histogram data but would 
not work for other variables so I think a different approach is required.

My feeling is that the number of missing values is sort of like metadata i.e. 
it's telling you something about the quality of the data itself. Would an 
ancillary variable suit this purpose?

Regards,

Dan


-Original Message-
From: CF-metadata  On Behalf Of Martin Juckes 
- UKRI STFC
Sent: Tuesday, 14 May 2019 10:03
To: Gregory, Jonathan ; cf-metadata@cgd.ucar.edu
Subject: Re: [CF-metadata] Missing data bins in histograms

Dear Jonathan,


I looked at "valid_range", and also "actual_range", but I believe that the 
definitions of either of these would have to be changed to accommodate this 
usage, and we would run into the problem that Jim raised in connection with my 
earlier suggestion of using "missing_value": such changes can break assumptions 
made by existing software. Data outside the "valid_range" may well be 
automatically rejected by a user application before the data gets to any CF 
aware libraries. For instance, python netCDF4 at version 1.3.0 and 1.3.1 
automatically removes data outside the valid_range, giving the user a masked 
array.  There is some discussion of this here: 
https://github.com/Unidata/netcdf4-python/issues/748.


It is possible to 
circumvent this behaviour by changing the auto-masking setting in python 
netCDF4, and the NUG does suggest using values outside the "valid_range" as 
flags. NUG also suggests using the missing_value attribute to list such flag 
values ... but Jim has pointed out that such an approach is likely to cause 
problems with many applications. This is a complex area because the meaning of 
"missing_value" in NUG has evolved. Up until CF 1.5 it appears that a 
"missing_value" meant, unambiguously, missing data.  The current CF appears to 
have changed this in line with NUG so that different usages are now permissible, but 
I still agree with Jim's objection. We can't, I'm sure, at this stage, follow 
an approach which depends on users being able to control the auto-masking 
settings (it is a simple call to the "set_auto_mask" method if you are using 
the python netCDF4 library directly ... but may not be available to users who 
are working with applications built on the library).


I wanted to use a new standard name for the height bins because of the fact that 
the value in the first bin, which I have set to -., is not a height. This 
data point needs to have a valid floating point value to conform to the rules 
for a coordinate array, but, unlike the rest of the array, it should not be 
interpreted as height. This is signalled by the presence of an auxiliary 
c

Re: [CF-metadata] Missing data bins in histograms

2019-05-14 Thread Hollis, Dan
Dear Martin/Jonathan/Jim,

I appreciate that this discussion is focussed on histograms, however I wonder 
if there is a wider issue here i.e. how should one record the number of missing 
values for any extensive quantity?

For example, we use number_of_days_with_air_temperature_below_threshold to 
store counts of days of air frost (computed from station observations of daily 
minimum temperature). The threshold is specified using a scalar coordinate 
variable called 'air_temperature' with a value of 0.0. The counts of air frost 
are for periods of months, seasons or years and, inevitably, the values for 
some periods for some stations are based on incomplete data. Is there a 
recommended method for recording the number of missing observations for each 
data point (apologies if I've missed this in the conventions)? If so then maybe 
the same approach could be used for histograms too. If not then my feeling is 
that whatever solution you propose should be applicable to all extensive 
quantities (i.e. all quantities that can be derived from a set of constituent 
observations). Having a special 'bin' might work for histogram data but would 
not work for other variables so I think a different approach is required.

My feeling is that the number of missing values is sort of like metadata i.e. 
it's telling you something about the quality of the data itself. Would an 
ancillary variable suit this purpose?

Regards,

Dan


-Original Message-
From: CF-metadata  On Behalf Of Martin Juckes 
- UKRI STFC
Sent: Tuesday, 14 May 2019 10:03
To: Gregory, Jonathan ; cf-metadata@cgd.ucar.edu
Subject: Re: [CF-metadata] Missing data bins in histograms

Dear Jonathan,


I looked at "valid_range", and also "actual_range", but I believe that the 
definitions of either of these would have to be changed to accommodate this 
usage, and we would run into the problem that Jim raised in connection with my 
earlier suggestion of using "missing_value": such changes can break assumptions 
made by existing software. Data outside the "valid_range" may well be 
automatically rejected by a user application before the data gets to any CF 
aware libraries. For instance, python netCDF4 at version 1.3.0 and 1.3.1 
automatically removes data outside the valid_range, giving the user a masked 
array.  There is some discussion of this here: 
https://github.com/Unidata/netcdf4-python/issues/748.


It is possible to 
circumvent this behaviour by changing the auto-masking setting in python 
netCDF4, and the NUG does suggest using values outside the "valid_range" as 
flags. NUG also suggests using the missing_value attribute to list such flag 
values ... but Jim has pointed out that such an approach is likely to cause 
problems with many applications. This is a complex area because the meaning of 
"missing_value" in NUG has evolved. Up until CF 1.5 it appears that a 
"missing_value" meant, unambiguously, missing data.  The current CF appears to 
have changed this in line with NUG so that different usages are now permissible, but 
I still agree with Jim's objection. We can't, I'm sure, at this stage, follow 
an approach which depends on users being able to control the auto-masking 
settings (it is a simple call to the "set_auto_mask" method if you are using 
the python netCDF4 library directly ... but may not be available to users who 
are working with applications built on the library).
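
To make the hazard described above concrete, here is a pure-Python stand-in for auto-masking. It does not call the netCDF4 library; the function apply_valid_range, the sentinel value -1.0 and the chosen valid_range are assumptions for this sketch only. In the library itself the switch mentioned in the message is the "set_auto_mask" method.

```python
from typing import List, Optional, Sequence, Tuple


def apply_valid_range(
    values: Sequence[float],
    valid_range: Tuple[float, float],
    auto_mask: bool = True,
) -> List[Optional[float]]:
    """Mimic auto-masking: values outside valid_range become None."""
    if not auto_mask:
        # With masking switched off the caller sees the raw values.
        return list(values)
    lo, hi = valid_range
    return [v if lo <= v <= hi else None for v in values]


# Hypothetical bin coordinate: the first entry is a sentinel, not a height.
zbins = [-1.0, 25.0, 100.0]
valid_range = (0.0, 20000.0)

masked = apply_valid_range(zbins, valid_range)
raw = apply_valid_range(zbins, valid_range, auto_mask=False)
print(masked)
print(raw)
```

With masking on, the special bin value is lost before any CF-aware code can interpret it, which is the concern raised about relying on "valid_range".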


I wanted to use a new standard name for the height bins because of the fact that 
the value in the first bin, which I have set to -., is not a height. This 
data point needs to have a valid floating point value to conform to the rules 
for a coordinate array, but, unlike the rest of the array, it should not be 
interpreted as height. This is signalled by the presence of an auxiliary 
coordinate -- but I'm not sure that that is adequate. Applications and users 
are entitled to believe that a variable which has standard name "height" really 
refers to height, without having to check all the auxiliary coordinates to see 
if there is something there which modifies the meaning of the variable. The 
standard name "height_bins" would signal that they must look in the auxiliary 
coordinate.


Do you agree with the necessity and appropriateness of the new name of 
"bin_status_flag" which I have suggested for the auxiliary coordinate?


regards,

Martin


From: CF-metadata  on behalf of Jonathan 
Gregory 
Sent: 13 May 2019 18:00
To: cf-metadata@cgd.ucar.edu
Subject: Re: [CF-metadata] Missing data bins in histograms

Dear Martin

I agree that an alternative which would not require a change to the convention 
is to attach a string-valued aux coord variable. However, the flags are much 
more economical and seem natural, as you say.

As I said in my last email, I feel th

Re: [CF-metadata] Missing data bins in histograms

2019-05-13 Thread Jonathan Gregory
Dear Martin

I agree that an alternative which would not require a change to the
convention is to attach a string-valued aux coord variable. However, the
flags are much more economical and seem natural, as you say.

As I said in my last email, I feel that it's better to keep the standard name
as it is, despite the presence of a special value in it which isn't really a
coordinate value. Maybe a valid_range could be specified, with the special
value outside the range? I'm not sure if that would count as an error, but it
is not the same as reinterpreting missing data, which would be problematic.

Best wishes

Jonathan

- Forwarded message from Martin Juckes - UKRI STFC 
 -

> Date: Mon, 13 May 2019 09:39:11 +
> From: Martin Juckes - UKRI STFC 
> To: Jim Biard , "cf-metadata@cgd.ucar.edu"
>       
> Subject: Re: [CF-metadata] Missing data bins in histograms
> 
> Dear Jim, Jonathan,
> 
> 
> OK, I accept your point that this would be a new meaning of "missing_value", 
> and that probably causes more serious problems than the one we are trying to 
> solve.
> 
> 
> I believe that Jonathan is also correct in saying that we can't use 
> flag_values for this purpose without a change in the convention ... but 
> perhaps we should explore whether such a change would lead to a good 
> solution. Do you have a specific proposal in mind?
> 
> 
> If not, here is an idea adapted from my last post. In this example I have 
> used a real valued dimension coordinate to define the data bins, with an 
> arbitrary value in the first bin. The new feature is an auxiliary coordinate 
> which is constructed to label the first bin as "missing_data" and the 
> remaining bins as "data".
> 
> 
> I believe that it would be mis-leading to use the existing standard name 
> "height" for the variable "zbins" in this example, because the first bin is 
> not a height range. Hence, I suggest introducing a new standard name 
> "height_bins" which would allow one or more bins to have special meanings 
> which should be indicated by a string valued auxiliary coordinate.
> 
> 
> "zbin_flags" could also be encoded as a character array, with values 
> ["missing_values", "data", "data", ..]. This could be done without 
> changing the convention text, since character arrays are allowed auxiliary 
> coordinate. However, I believe that it would be worth making an adjustment 
> here, since the use of flags seems more natural.
> 
> 
> I've given the "zbin_flags" a new standard name "bin_status_flag", which is 
> related to the existing term "status_flag", but refers to the status of the 
> data being counted in the histogram rather than to the status of the 
> histogram itself. This allows us to refer unambiguously to the fact that we 
> are counting missing data in the first bin.
> 
> 
> Would this work?
> 
> 
> float data(time,lat,lon,zbins);
> 
>   data: standard_name =   
> "histogram_of_equivalent_reflectivity_factor_over_height_above_reference_ellipsoid";
> 
>   data: coordinates="zbin_flags status";
> 
> float zbins(zbina);
> 
>   zbins: long_name="Height ranges (with bin for missing data at first 
> element)"
> 
>   zbins: units="m";
> 
>   zbins: bounds="zbin_bnds";
> 
>   zbins: standard_name = "height_bins";
> 
> float zbin_bnds(zindex,2);
> 
> integer zbin_flags(zbins);
> 
>zbin_flags: long_name = "Flags indicating the status of data which is 
> counted in each bin";
> 
>zbin_flags:standard_name = "bin_status_flag";
> 
>zbin_flags:flag_values = 0,1;
> 
>zbin_flags:flag_meanings = "missing_values data";
> 
> character status(char_len);
> 
>    status:standard_name = "status_flag";
> 
>status:long_name = "Flag indicating quality of histogram";
> 
> float lat(lat);
> 
> float lon(lon);
> 
> 
> data:
> 
>   zbins = -., 25., 100., ;
> 
>   zbin_bnds = -.,0., 0., 50., 50., 150., ...
> 
>   zbin_flags = 0,1,1,1,...
> 
> 
> Regards,
> Martin
> 
> 
> From: CF-metadata  on behalf of Jim Biard 
> 
> Sent: 10 May 2019 15:58
> To: cf-metadata@cgd.ucar.edu
> Subject: Re: [CF-metadata] Missing data bins in histograms
> 
> 
> Hi.
> 
> Sorry I have been so quiet lately. I've been caught up in other activities.
> 
> I have a strong aversion to the proposal to overload the missing_value 
> attribute with a wholly different meaning. Using missing_value in this way 
> will produce unexp

Re: [CF-metadata] Missing data bins in histograms

2019-05-13 Thread Jonathan Gregory
Dear Jim and Martin

I think the minimum is to change App A, but it would be helpful to add some
words and a new example to Section 3.5, because the flag_* attributes are
introduced and discussed only for data variables there.

Cheers

Jonathan

- Forwarded message from Jim Biard  -

> Date: Mon, 13 May 2019 10:46:30 -0400
> From: Jim Biard 
> To: "cf-metadata@cgd.ucar.edu" 
> Subject: Re: [CF-metadata] Missing data bins in histograms
> User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:60.0)
>   Gecko/20100101 Thunderbird/60.6.1
> 
> Martin,
> 
> I believe that the convention change is as simple as changing the
> table cells where it declares the flag_* attributes to be for data
> variables only to declare that the flag_* attributes can be used on
> coordinate variables as well.
> 
> Grace and peace,
> 
> Jim
> 
> On 5/13/19 5:39 AM, Martin Juckes - UKRI STFC wrote:
> >Dear Jim, Jonathan,
> >
> >
> >OK, I accept your point that this would be a new meaning of "missing_value", 
> >and that probably causes more serious problems than the one we are trying to 
> >solve.
> >
> >
> >I believe that Jonathan is also correct in saying that we can't use 
> >flag_values for this purpose without a change in the convention ... but 
> >perhaps we should explore whether such a change would lead to a good 
> >solution. Do you have a specific proposal in mind?
> >
> >
> >If not, here is an idea adapted from my last post. In this example I have 
> >used a real valued dimension coordinate to define the data bins, with an 
> >arbitrary value in the first bin. The new feature is an auxiliary coordinate 
> >which is constructed to label the first bin as "missing_data" and the 
> >remaining bins as "data".
> >
> >
> >I believe that it would be mis-leading to use the existing standard name 
> >"height" for the variable "zbins" in this example, because the first bin is 
> >not a height range. Hence, I suggest introducing a new standard name 
> >"height_bins" which would allow one or more bins to have special meanings 
> >which should be indicated by a string valued auxiliary coordinate.
> >
> >
> >"zbin_flags" could also be encoded as a character array, with values 
> >["missing_values", "data", "data", ..]. This could be done without 
> >changing the convention text, since character arrays are allowed auxiliary 
> >coordinate. However, I believe that it would be worth making an adjustment 
> >here, since the use of flags seems more natural.
> >
> >
> >I've given the "zbin_flags" a new standard name "bin_status_flag", which is 
> >related to the existing term "status_flag", but refers to the status of the 
> >data being counted in the histogram rather than to the status of the 
> >histogram itself. This allows us to refer unambiguously to the fact that we 
> >are counting missing data in the first bin.
> >
> >
> >Would this work?
> >
> >
> >float data(time,lat,lon,zbins);
> >
> >   data: standard_name =   
> > "histogram_of_equivalent_reflectivity_factor_over_height_above_reference_ellipsoid";
> >
> >   data: coordinates="zbin_flags status";
> >
> >float zbins(zbina);
> >
> >   zbins: long_name="Height ranges (with bin for missing data at first 
> > element)"
> >
> >   zbins: units="m";
> >
> >   zbins: bounds="zbin_bnds";
> >
> >   zbins: standard_name = "height_bins";
> >
> >float zbin_bnds(zindex,2);
> >
> >integer zbin_flags(zbins);
> >
> >zbin_flags: long_name = "Flags indicating the status of data which is 
> > counted in each bin";
> >
> >zbin_flags:standard_name = "bin_status_flag";
> >
> >zbin_flags:flag_values = 0,1;
> >
> >zbin_flags:flag_meanings = "missing_values data";
> >
> >character status(char_len);
> >
> >status:standard_name = "status_flag";
> >
> >status:long_name = "Flag indicating quality of histogram";
> >
> >float lat(lat);
> >
> >float lon(lon);
> >
> >
> >data:
> >
> >   zbins = -., 25., 100., ;
> >
> >   zbin_bnds = -.,0., 0., 50., 50., 150., ...
> >
> >   zbin_flags = 0,1,1,1,...
> >
> >
> >Regards,
> >Martin
> >
> >__

Re: [CF-metadata] Missing data bins in histograms

2019-05-13 Thread Jim Biard

Martin,

I believe that the convention change is as simple as changing the table 
cells where it declares the flag_* attributes to be for data variables 
only to declare that the flag_* attributes can be used on coordinate 
variables as well.


Grace and peace,

Jim

On 5/13/19 5:39 AM, Martin Juckes - UKRI STFC wrote:

Dear Jim, Jonathan,


OK, I accept your point that this would be a new meaning of "missing_value", 
and that probably causes more serious problems than the one we are trying to solve.


I believe that Jonathan is also correct in saying that we can't use flag_values 
for this purpose without a change in the convention ... but perhaps we should 
explore whether such a change would lead to a good solution. Do you have a 
specific proposal in mind?


If not, here is an idea adapted from my last post. In this example I have used a real valued 
dimension coordinate to define the data bins, with an arbitrary value in the first bin. The new 
feature is an auxiliary coordinate which is constructed to label the first bin as 
"missing_data" and the remaining bins as "data".


I believe that it would be misleading to use the existing standard name "height" for the variable 
"zbins" in this example, because the first bin is not a height range. Hence, I suggest introducing 
a new standard name "height_bins" which would allow one or more bins to have special meanings which 
should be indicated by a string-valued auxiliary coordinate.


"zbin_flags" could also be encoded as a character array, with values ["missing_values", 
"data", "data", ..]. This could be done without changing the convention text, since character 
arrays are allowed as auxiliary coordinates. However, I believe that it would be worth making an 
adjustment here, since the use of flags seems more natural.


I've given the "zbin_flags" a new standard name "bin_status_flag", which is related to 
the existing term "status_flag", but refers to the status of the data being counted in the 
histogram rather than to the status of the histogram itself. This allows us to refer unambiguously to the 
fact that we are counting missing data in the first bin.


Would this work?


float data(time,lat,lon,zbins);
   data:standard_name = "histogram_of_equivalent_reflectivity_factor_over_height_above_reference_ellipsoid";
   data:coordinates = "zbin_flags status";
float zbins(zbins);
   zbins:long_name = "Height ranges (with bin for missing data at first element)";
   zbins:units = "m";
   zbins:bounds = "zbin_bnds";
   zbins:standard_name = "height_bins";
float zbin_bnds(zbins,2);
int zbin_flags(zbins);
   zbin_flags:long_name = "Flags indicating the status of data which is counted in each bin";
   zbin_flags:standard_name = "bin_status_flag";
   zbin_flags:flag_values = 0,1;
   zbin_flags:flag_meanings = "missing_values data";
char status(char_len);
   status:standard_name = "status_flag";
   status:long_name = "Flag indicating quality of histogram";
float lat(lat);
float lon(lon);

data:

   zbins = -., 25., 100., ;
   zbin_bnds = -.,0., 0., 50., 50., 150., ...
   zbin_flags = 0,1,1,1,...


Regards,
Martin


From: CF-metadata  on behalf of Jim Biard 

Sent: 10 May 2019 15:58
To: cf-metadata@cgd.ucar.edu
Subject: Re: [CF-metadata] Missing data bins in histograms


Hi.

Sorry I have been so quiet lately. I've been caught up in other activities.

I have a strong aversion to the proposal to overload the missing_value 
attribute with a wholly different meaning. Using missing_value in this way will 
produce unexpected results in a number of existing software packages. If the 
minor modification to CF to designate flag attributes to be used on coordinate 
variables doesn't seem like an acceptable solution for one reason or another, I 
think we should define a new convention that doesn't add contradictory 
interpretations of existing attributes.

Grace and peace,

Jim

On 5/2/19 11:49 AM, Martin Juckes - UKRI STFC wrote:

Dear Jonathan, Jim,



I’m sorry to have dropped this conversation after starting it three years ago. 
We ended up not fixing the problem for CMIP6, but I think it is worth taking 
another look.



Coming back to it again, I think that a variation on Jim’s suggestion could 
work: rather than using flags it should be possible to use a coordinate 
variable, as is done for some CMIP variables that have region names along one 
axis. The NetCDF  dimension would be an index, and the array of values defining 
the bins would be an auxiliary coordinate which, I believe, is not subject to 
the rules on monotonicity and missing values which apply to NetCDF dimensions. 
There may be a need for some clarifications, but I think this approa

Re: [CF-metadata] Missing data bins in histograms

2019-05-13 Thread Martin Juckes - UKRI STFC
Dear Jim, Jonathan,


OK, I accept your point that this would be a new meaning of "missing_value", 
and that probably causes more serious problems than the one we are trying to 
solve.


I believe that Jonathan is also correct in saying that we can't use flag_values 
for this purpose without a change in the convention ... but perhaps we should 
explore whether such a change would lead to a good solution. Do you have a 
specific proposal in mind?


If not, here is an idea adapted from my last post. In this example I have used 
a real valued dimension coordinate to define the data bins, with an arbitrary 
value in the first bin. The new feature is an auxiliary coordinate which is 
constructed to label the first bin as "missing_data" and the remaining bins as 
"data".


I believe that it would be misleading to use the existing standard name 
"height" for the variable "zbins" in this example, because the first bin is not 
a height range. Hence, I suggest introducing a new standard name "height_bins" 
which would allow one or more bins to have special meanings which should be 
indicated by a string-valued auxiliary coordinate.


"zbin_flags" could also be encoded as a character array, with values 
["missing_values", "data", "data", ..]. This could be done without changing 
the convention text, since character arrays are allowed as auxiliary coordinates. 
However, I believe that it would be worth making an adjustment here, since the 
use of flags seems more natural.


I've given the "zbin_flags" a new standard name "bin_status_flag", which is 
related to the existing term "status_flag", but refers to the status of the 
data being counted in the histogram rather than to the status of the histogram 
itself. This allows us to refer unambiguously to the fact that we are counting 
missing data in the first bin.


Would this work?


float data(time,lat,lon,zbins);
  data:standard_name = "histogram_of_equivalent_reflectivity_factor_over_height_above_reference_ellipsoid";
  data:coordinates = "zbin_flags status";
float zbins(zbins);
  zbins:long_name = "Height ranges (with bin for missing data at first element)";
  zbins:units = "m";
  zbins:bounds = "zbin_bnds";
  zbins:standard_name = "height_bins";
float zbin_bnds(zbins,2);
int zbin_flags(zbins);
  zbin_flags:long_name = "Flags indicating the status of data which is counted in each bin";
  zbin_flags:standard_name = "bin_status_flag";
  zbin_flags:flag_values = 0,1;
  zbin_flags:flag_meanings = "missing_values data";
char status(char_len);
  status:standard_name = "status_flag";
  status:long_name = "Flag indicating quality of histogram";
float lat(lat);
float lon(lon);

data:

  zbins = -., 25., 100., ;
  zbin_bnds = -.,0., 0., 50., 50., 150., ...
  zbin_flags = 0,1,1,1,...
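
The layout in the example above can be sketched in Python as follows: a histogram whose first bin counts retrievals that returned no height, paired with a flag array marking that bin. This is only a sketch under stated assumptions; the function name, the sample retrievals and the use of None for a failed retrieval are hypothetical, and heights outside all bounds are simply not counted here.

```python
from typing import List, Optional, Sequence, Tuple


def histogram_with_missing_bin(
    heights: Sequence[Optional[float]],
    bounds: Sequence[Tuple[float, float]],
) -> List[int]:
    """Count heights per bin; element 0 counts the missing retrievals."""
    counts = [0] * (len(bounds) + 1)
    for h in heights:
        if h is None:
            counts[0] += 1  # retrieval returned no height value
            continue
        for i, (lo, hi) in enumerate(bounds):
            if lo <= h < hi:
                counts[i + 1] += 1
                break
        # Heights outside every bin are ignored in this sketch.
    return counts


# Height bounds taken from the example (metres); retrievals are made up.
bounds = [(0.0, 50.0), (50.0, 150.0)]
retrievals = [10.0, None, 75.0, 49.9, None, 120.0, 500.0]

counts = histogram_with_missing_bin(retrievals, bounds)
# Flag per bin as in zbin_flags: 0 = missing_values, 1 = data.
zbin_flags = [0] + [1] * len(bounds)
print(counts, zbin_flags)
```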


Regards,
Martin


From: CF-metadata  on behalf of Jim Biard 

Sent: 10 May 2019 15:58
To: cf-metadata@cgd.ucar.edu
Subject: Re: [CF-metadata] Missing data bins in histograms


Hi.

Sorry I have been so quiet lately. I've been caught up in other activities.

I have a strong aversion to the proposal to overload the missing_value 
attribute with a wholly different meaning. Using missing_value in this way will 
produce unexpected results in a number of existing software packages. If the 
minor modification to CF to designate flag attributes to be used on coordinate 
variables doesn't seem like an acceptable solution for one reason or another, I 
think we should define a new convention that doesn't add contradictory 
interpretations of existing attributes.

Grace and peace,

Jim

On 5/2/19 11:49 AM, Martin Juckes - UKRI STFC wrote:

Dear Jonathan, Jim,



I’m sorry to have dropped this conversation after starting it three years ago. 
We ended up not fixing the problem for CMIP6, but I think it is worth taking 
another look.



Coming back to it again, I think that a variation on Jim’s suggestion could 
work: rather than using flags it should be possible to use a coordinate 
variable, as is done for some CMIP variables that have region names along one 
axis. The NetCDF  dimension would be an index, and the array of values defining 
the bins would be an auxiliary coordinate which, I believe, is not subject to 
the rules on monotonicity and missing values which apply to NetCDF dimensions. 
There may be a need for some clarifications, but I think this approach would be 
much closer to the current convention than any change in the specification for 
non-auxiliary coordinate variables.


We have a specific use case in CMIP6 for which the bins are height bins (height 
of detected cloud), with one bin reserved for "retri

Re: [CF-metadata] Missing data bins in histograms

2019-05-13 Thread Jonathan Gregory
Dear Martin and Jim

I prefer the proposal to use flag_values and flag_meanings to indicate special
values of a coordinate axis. In Martin's use-case, the coordinate variable
contains physical values of height for all the elements of the axis apart
from the special "retrieval error" bin. To me it feels less satisfactory to
convert this coordinate variable to an auxiliary coordinate variable with a
different standard name just because of this bin. Instead, I feel we should
keep the standard name unchanged, with the special bin in the right place in
monotonic order. That means generic software which doesn't recognise the
special coordinate value will treat it as a physical value. This is a hazard
because it could lead to peculiar plots, or to meaningless results if, for
example, the coordinate axis is differentiated. I think we can tolerate this
hazard. What do you think?

The flag_values mechanism could be used to indicate to up-to-date CF-aware
software that the value is special (not missing), as Jim suggested. As I said
before, this would need a change to the convention, but only a small one, just
to permit flag_values and flag_meanings for coordinate variables as well as
data variables. This requires a change to Appendix A, and it would be sensible
to have a few words and a new example in Section 3.5.

I suspect that if we add this possibility there would be other applications for
it. For example, it could be used to indicate that the bottom or top element of
a coordinate axis has bounds which are open on one side.
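
A minimal Python sketch of how CF-aware software might use flag_values and flag_meanings on a coordinate variable, as proposed above, is given below. The function label_coordinate, the sentinel -1.0 and the meaning "retrieval_error" are assumptions for illustration; the convention text for this usage had not been written at the time of this thread.

```python
from typing import List, Optional, Sequence


def label_coordinate(
    values: Sequence[float],
    flag_values: Sequence[float],
    flag_meanings: str,
) -> List[Optional[str]]:
    """Return the flag meaning for special values, None for physical ones."""
    # flag_meanings is a blank-separated string, one word per flag value.
    meanings = dict(zip(flag_values, flag_meanings.split()))
    return [meanings.get(v) for v in values]


# Hypothetical height coordinate whose first element is a special value.
height = [-1.0, 25.0, 100.0]
labels = label_coordinate(height, [-1.0], "retrieval_error")

# Software that recognises the flags can drop special values before plotting
# or differentiating; software that does not will treat -1.0 as a height.
physical = [v for v, label in zip(height, labels) if label is None]
print(labels, physical)
```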

Best wishes

Jonathan

- Forwarded message from Jim Biard  -

> Date: Fri, 10 May 2019 10:58:48 -0400
> From: Jim Biard 
> To: "cf-metadata@cgd.ucar.edu" 
> Subject: Re: [CF-metadata] Missing data bins in histograms
> User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:60.0)
>   Gecko/20100101 Thunderbird/60.6.1
> 
> Hi.
> 
> Sorry I have been so quiet lately. I've been caught up in other activities.
> 
> I have a strong aversion to the proposal to overload the
> missing_value attribute with a wholly different meaning. Using
> missing_value in this way will produce unexpected results in a
> number of existing software packages. If the minor modification to
> CF to designate flag attributes to be used on coordinate variables
> doesn't seem like an acceptable solution for one reason or another,
> I think we should define a new convention that doesn't add
> contradictory interpretations of existing attributes.
> 
> Grace and peace,
> 
> Jim
> 
> On 5/2/19 11:49 AM, Martin Juckes - UKRI STFC wrote:
> >Dear Jonathan, Jim,
> >
> >
> >
> >I’m sorry to have dropped this conversation after starting it three years 
> >ago. We ended up not fixing the problem for CMIP6, but I think it is worth 
> >taking another look.
> >
> >
> >
> >Coming back to it again, I think that a variation on Jim’s suggestion could 
> >work: rather than using flags it should be possible to use a coordinate 
> >variable, as is done for some CMIP variables that have region names along 
> >one axis. The NetCDF  dimension would be an index, and the array of values 
> >defining the bins would be an auxiliary coordinate which, I believe, is not 
> >subject to the rules on monotonicity and missing values which apply to 
> >NetCDF dimensions. There may be a need for some clarifications, but I think 
> >this approach would be much closer to the current convention that any change 
> >in the specification for non-auxiliary coordinate variables.
> >
> >
> >We have a specific use case in CMIP6 for which the bins are height bins 
> >(height of detected cloud), with one bin reserved for "retrieval error".
> >
> >
> >This might not need a change in the convention rules, but it would help, I 
> >think, to at least add an example and a standard name for the coordinate 
> >variable. For example:
> >
> >
> >float data(time,lat,lon,zindex);
> >
> >   data: standard_name =   
> > "histogram_of_equivalent_reflectivity_factor_over_height_above_reference_ellipsoid";
> >
> >   data: coordinates="zbins";
> >
> >float zbins(zindex);
> >
> >   zbins: long_name="Height ranges (with bin for missing data at first 
> > element)";
> >
> >   zbins:missing_value= -.;
> >
> >   zbins: units="m";
> >
> >   zbins: bounds="zbin_bnds";
> >
> >   zbins: standard_name = "";
> >
> >float zbin_bnds(zindex,2);
> >
> >   zbin_bnds:missing_value= -.;
> >
> >float lat(lat);
> >
> >float lon(lon);
> >
> >
> >data:
> >
>

Re: [CF-metadata] Missing data bins in histograms

2019-05-10 Thread Jim Biard

Hi.

Sorry I have been so quiet lately. I've been caught up in other activities.

I have a strong aversion to the proposal to overload the missing_value 
attribute with a wholly different meaning. Using missing_value in this 
way will produce unexpected results in a number of existing software 
packages. If the minor modification to CF to designate flag attributes 
to be used on coordinate variables doesn't seem like an acceptable 
solution for one reason or another, I think we should define a new 
convention that doesn't add contradictory interpretations of existing 
attributes.


Grace and peace,

Jim

On 5/2/19 11:49 AM, Martin Juckes - UKRI STFC wrote:

Dear Jonathan, Jim,



I’m sorry to have dropped this conversation after starting it three years ago. 
We ended up not fixing the problem for CMIP6, but I think it is worth taking 
another look.



Coming back to it again, I think that a variation on Jim’s suggestion could 
work: rather than using flags it should be possible to use a coordinate 
variable, as is done for some CMIP variables that have region names along one 
axis. The NetCDF  dimension would be an index, and the array of values defining 
the bins would be an auxiliary coordinate which, I believe, is not subject to 
the rules on monotonicity and missing values which apply to NetCDF dimensions. 
There may be a need for some clarifications, but I think this approach would be 
much closer to the current convention than any change in the specification for 
non-auxiliary coordinate variables.


We have a specific use case in CMIP6 for which the bins are height bins (height of 
detected cloud), with one bin reserved for "retrieval error".


This might not need a change in the convention rules, but it would help, I 
think, to at least add an example and a standard name for the coordinate 
variable. For example:


float data(time,lat,lon,zindex);
   data:standard_name = "histogram_of_equivalent_reflectivity_factor_over_height_above_reference_ellipsoid";
   data:coordinates = "zbins";
float zbins(zindex);
   zbins:long_name = "Height ranges (with bin for missing data at first element)";
   zbins:missing_value = -.;
   zbins:units = "m";
   zbins:bounds = "zbin_bnds";
   zbins:standard_name = "";
float zbin_bnds(zindex,2);
   zbin_bnds:missing_value = -.;
float lat(lat);
float lon(lon);

data:

   zbins = -., 25., 100., ;
   zbin_bnds = -.,-., 0., 50., 50., 150., ...


The use of missing_value in the bounds variable appears to conflict with 
conformance rules, but I'm not sure if this is really banned by the convention 
in this context.


Using missing_value in this way appears to be acceptable to the convention, but I think 
it conflicts with the spirit of the convention: it is not indicating that a value of 
"zbins" is missing, but indicating that this index of the array relates to a 
count of missing values. For this reason I have omitted _FillValue.


The "zbins" auxiliary coordinate here is a height-like variable, but I don't think we can use a standard name 
"height": is it worth adding a standard name "height_bins" defined to be "Height ranges, as 
used, for example in a histogram or frequency distribution. A variable with this standard name may include a special 
bin for the count or frequency of missing data. This should be indicated by setting the value of that bin and its 
bounds to equal the missing_value of the variable. If there is no missing value bin, it is recommended that the term 
'height' be used instead."


regards,

Martin


[CF-metadata] Missing data bins in histograms

Jonathan Gregory j.m.gregory at reading.ac.uk
Thu Oct 13 03:42:47 MDT 2016




Dear Jim



In Appendix A it does not say that the flag attributes are allowed for
coordinate variables - it has just "D" in the "Use" column. This is not an
argument why they shouldn't be if there is a need, but they weren't introduced
with that in mind. The use 

Re: [CF-metadata] Missing data bins in histograms

2019-05-02 Thread Martin Juckes - UKRI STFC
Dear Jonathan, Jim,



I’m sorry to have dropped this conversation after starting it three years ago. 
We ended up not fixing the problem for CMIP6, but I think it is worth taking 
another look.



Coming back to it again, I think that a variation on Jim’s suggestion could 
work: rather than using flags it should be possible to use a coordinate 
variable, as is done for some CMIP variables that have region names along one 
axis. The NetCDF  dimension would be an index, and the array of values defining 
the bins would be an auxiliary coordinate which, I believe, is not subject to 
the rules on monotonicity and missing values which apply to NetCDF dimensions. 
There may be a need for some clarifications, but I think this approach would be 
much closer to the current convention than any change in the specification for 
non-auxiliary coordinate variables.


We have a specific use case in CMIP6 for which the bins are height bins (height 
of detected cloud), with one bin reserved for "retrieval error".


This might not need a change in the convention rules, but it would help, I 
think, to at least add an example and a standard name for the coordinate 
variable. For example:


float data(time,lat,lon,zindex);
  data:standard_name = "histogram_of_equivalent_reflectivity_factor_over_height_above_reference_ellipsoid";
  data:coordinates = "zbins";
float zbins(zindex);
  zbins:long_name = "Height ranges (with bin for missing data at first element)";
  zbins:missing_value = -.;
  zbins:units = "m";
  zbins:bounds = "zbin_bnds";
  zbins:standard_name = "";
float zbin_bnds(zindex,2);
  zbin_bnds:missing_value = -.;
float lat(lat);
float lon(lon);

data:

  zbins = -., 25., 100., ;
  zbin_bnds = -.,-., 0., 50., 50., 150., ...


The use of missing_value in the bounds variable appears to conflict with 
conformance rules, but I'm not sure if this is really banned by the convention 
in this context.


Using missing_value in this way appears to be acceptable to the convention, but 
I think it conflicts with the spirit of the convention: it is not indicating 
that a value of "zbins" is missing, but indicating that this index of the array 
relates to a count of missing values. For this reason I have omitted _FillValue.
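
To illustrate how a reader of such a file might separate the special bin from the height bins, here is a minimal Python sketch. It is not part of any CF tooling: the sentinel -9999.0, the counts, and the function name split_missing_bin are assumptions made for the example, since the numeric missing_value in the listing above did not survive archiving.

```python
# Hypothetical sentinel; the real missing_value is not recoverable from the listing.
MISSING = -9999.0


def split_missing_bin(zbins, counts, missing_value=MISSING):
    """Return (missing_count, [(height, count), ...]) for one histogram.

    zbins  -- auxiliary coordinate values, one per bin
    counts -- histogram counts, same length as zbins
    """
    if len(zbins) != len(counts):
        raise ValueError("zbins and counts must have the same length")
    missing_count = 0
    height_bins = []
    for z, n in zip(zbins, counts):
        if z == missing_value:
            # This bin counts cases with no usable height, not a height range.
            missing_count += n
        else:
            height_bins.append((z, n))
    return missing_count, height_bins


# Invented data: one special bin followed by two height bins.
zbins = [MISSING, 25.0, 100.0]
counts = [7, 12, 30]
missing_count, height_bins = split_missing_bin(zbins, counts)
print(missing_count)  # 7
print(height_bins)    # [(25.0, 12), (100.0, 30)]
```

Because the test is on the coordinate value rather than on its position, the special bin could sit at any index, and more than one bin could carry the sentinel.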


The "zbins" auxiliary coordinate here is a height-like variable, but I don't 
think we can use a standard name "height": is it worth adding a standard name 
"height_bins" defined to be "Height ranges, as used, for example, in a histogram 
or frequency distribution. A variable with this standard name may include a 
special bin for the count or frequency of missing data. This should be 
indicated by setting the value of that bin and its bounds to equal the 
missing_value of the variable. If there is no missing value bin, it is 
recommended that the term 'height' be used instead."


regards,

Martin


[CF-metadata] Missing data bins in histograms

Jonathan Gregory (j.m.gregory at reading.ac.uk)
Thu Oct 13 03:42:47 MDT 2016




Dear Jim

In Appendix A it does not say that the flag attributes are allowed for
coordinate variables - it has just "D" in the "Use" column. This is not an
argument why they shouldn't be if there is a need, but they weren't introduced
with that in mind. The use which you suggested for Martin's case is a good
idea, but I think it would need a change to the convention.

Best wishes

Jonathan

- Forwarded message from Jim Biard <jbi...@cicsnc.org> -

> Date: Wed, 12 Oct 2016 14:58:11 -0400
> From: Jim Biard <jbi...@cicsnc.org>
> To: cf-metadata at cgd.ucar.edu
> Subject: Re: [CF-metadata] Missing data bins in histograms
> User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:45.0)
>  Gecko/20100101 Thund

Re: [CF-metadata] Missing data bins in histograms

2016-10-13 Thread Jonathan Gregory
Dear Jim

In Appendix A it does not say that the flag attributes are allowed for
coordinate variables - it has just "D" in the "Use" column. This is not an
argument why they shouldn't be if there is a need, but they weren't introduced
with that in mind. The use which you suggested for Martin's case is a good
idea, but I think it would need a change to the convention.

Best wishes

Jonathan

- Forwarded message from Jim Biard <jbi...@cicsnc.org> -

> Date: Wed, 12 Oct 2016 14:58:11 -0400
> From: Jim Biard <jbi...@cicsnc.org>
> To: cf-metadata@cgd.ucar.edu
> Subject: Re: [CF-metadata] Missing data bins in histograms
> User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:45.0)
>   Gecko/20100101 Thunderbird/45.4.0
> 
> Jonathan,
> 
> Missing/fill values are not allowed, but I don't see any language
> prohibiting flags. I'd appreciate it if you could expand on your
> thoughts about why they aren't allowed.
> 
> Grace and peace,
> 
> Jim
> On 10/12/16 1:30 PM, Jonathan Gregory wrote:
> >Dear Jim
> >
> >That is an ingenious idea. I don't think the flag atts are currently allowed
> >for coord variables, but they could be, I agree.
> >
> >Best wishes
> >
> >Jonathan
> >
> >- Forwarded message from Jim Biard <jbi...@cicsnc.org> -
> >
> >>Date: Tue, 11 Oct 2016 14:39:56 -0400
> >>From: Jim Biard <jbi...@cicsnc.org>
> >>To: cf-metadata@cgd.ucar.edu
> >>Subject: Re: [CF-metadata] Missing data bins in histograms
> >>User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:45.0)
> >>Gecko/20100101 Thunderbird/45.4.0
> >>
> >>Hi.
> >>
> >>Another approach could be to use flag_values and flag_meanings on
> >>the coordinate variable to indicate one or more special coordinate
> >>values that correspond to any number of "missing data" or "out of
> >>bounds" bins. These attributes aren't forbidden by CF, and
> >>everything should be fine as long as the coordinate variable remains
> >>monotonic.
> >>
> >>Grace and peace,
> >>
> >>Jim
> >>
> >>On 10/11/16 8:41 AM, martin.juc...@stfc.ac.uk wrote:
> >>>Hello,
> >>>
> >>>the CF standard name list has two "histogram_ " entries, and in the 
> >>>CMIP6 data request we may need to add a third, a 
> >>>histogram_of_cloud_top_height. Besides the standard name, we also need, 
> >>>for this new variable, a method of encoding the "missing data" bin in the 
> >>>histogram. That is, the histogram should record frequency in 16 data bins 
> >>>and one additional bin for the frequency of missing data.
> >>>
> >>>Can we define a "missing_data_index" attribute for histogram variables, 
> >>>and use this to indicate that the first bin in the array has this special 
> >>>purpose. It might be more pythonic to put the _FillValue in the coordinate 
> >>>value for the missing data bin, but I suspect that this would cause 
> >>>substantial problems for many software packages.
> >>>
> >>>regards,
> >>>Martin
> >>>___
> >>>CF-metadata mailing list
> >>>CF-metadata@cgd.ucar.edu
> >>>http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
> >>-- 
> >>CICS-NC <http://www.cicsnc.org/> Visit us on
> >>Facebook <http://www.facebook.com/cicsnc>   *Jim Biard*
> >>*Research Scholar*
> >>Cooperative Institute for Climate and Satellites NC <http://cicsnc.org/>
> >>North Carolina State University <http://ncsu.edu/>
> >>NOAA National Centers for Environmental Information <http://ncdc.noaa.gov/>
> >>/formerly NOAA’s National Climatic Data Center/
> >>151 Patton Ave, Asheville, NC 28801
> >>e: jbi...@cicsnc.org <mailto:jbi...@cicsnc.org>
> >>o: +1 828 271 4900
> >>
> >>/Connect with us on Facebook for climate
> >><https://www.facebook.com/NOAANCEIclimate> and ocean and geophysics
> >><https://www.facebook.com/NOAANCEIoceangeo> information, and follow
> >>us on Twitter at @NOAANCEIclimate
> >><https://twitter.com/NOAANCEIclimate> and @NOAANCEIocngeo
> >><https://twitter.com/NOAANCEIocngeo>. /
> >>
> >>

[CF-metadata] Missing data bins in histograms

2016-10-13 Thread Jonathan Gregory
Dear Martin

Ah, OK, thanks. I must have misunderstood.

Best wishes

Jonathan

- Forwarded message from martin.juc...@stfc.ac.uk -

> Date: Thu, 13 Oct 2016 08:20:56 +
> From: martin.juc...@stfc.ac.uk
> To: cf-metadata@cgd.ucar.edu
> Subject: [CF-metadata]  Missing data bins in histograms
> 
> Dear Jonathan,
> 
> I'm sorry I didn't respond on the point about it being the first bin: I had 
> not intended the special value to be restricted to the first bin, so I guess 
> there is something ambiguous in my initial formulation which is giving this 
> impression. I agree that we should formulate any extension so that it can 
> apply to any bin, and I also think it should be possible to label multiple 
> bins in this way.
> 
> regards,
> Martin


[CF-metadata] Missing data bins in histograms

2016-10-13 Thread martin.juckes
Dear Jonathan,

I'm sorry I didn't respond on the point about it being the first bin: I had not 
intended the special value to be restricted to the first bin, so I guess there 
is something ambiguous in my initial formulation which is giving this 
impression. I agree that we should formulate any extension so that it can apply 
to any bin, and I also think it should be possible to label multiple bins in 
this way.

regards,
Martin


Re: [CF-metadata] Missing data bins in histograms

2016-10-12 Thread Jim Biard

Jonathan,

Missing/fill values are not allowed, but I don't see any language 
prohibiting flags. I'd appreciate it if you could expand on your 
thoughts about why they aren't allowed.


Grace and peace,

Jim
On 10/12/16 1:30 PM, Jonathan Gregory wrote:

Dear Jim

That is an ingenious idea. I don't think the flag atts are currently allowed
for coord variables, but they could be, I agree.

Best wishes

Jonathan

- Forwarded message from Jim Biard <jbi...@cicsnc.org> -


Date: Tue, 11 Oct 2016 14:39:56 -0400
From: Jim Biard <jbi...@cicsnc.org>
To: cf-metadata@cgd.ucar.edu
Subject: Re: [CF-metadata] Missing data bins in histograms
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:45.0)
Gecko/20100101 Thunderbird/45.4.0

Hi.

Another approach could be to use flag_values and flag_meanings on
the coordinate variable to indicate one or more special coordinate
values that correspond to any number of "missing data" or "out of
bounds" bins. These attributes aren't forbidden by CF, and
everything should be fine as long as the coordinate variable remains
monotonic.

Grace and peace,

Jim

On 10/11/16 8:41 AM, martin.juc...@stfc.ac.uk wrote:

Hello,

the CF standard name list has two "histogram_ " entries, and in the CMIP6 data 
request we may need to add a third, a histogram_of_cloud_top_height. Besides the standard name, we 
also need, for this new variable, a method of encoding the "missing data" bin in the 
histogram. That is, the histogram should record frequency in 16 data bins and one additional bin 
for the frequency of missing data.

Can we define a "missing_data_index" attribute for histogram variables, and use 
this to indicate that the first bin in the array has this special purpose. It might be 
more pythonic to put the _FillValue in the coordinate value for the missing data bin, but 
I suspect that this would cause substantial problems for many software packages.

regards,
Martin


- End forwarded message -


[CF-metadata] Missing data bins in histograms

2016-10-12 Thread Jonathan Gregory
Dear Martin

I'm still uneasy about it having to be the first bin, in particular, or are
you not set on that? If it can be identified from the coordinate value by
flags, it could be any bin.

I believe that a change to convention would be needed to allow flag values
to be used with coordinates, unless we've already agreed that in some ticket.

Best wishes

Jonathan

- Forwarded message from martin.juc...@stfc.ac.uk -

> Date: Wed, 12 Oct 2016 17:14:38 +
> From: martin.juc...@stfc.ac.uk
> To: cf-metadata@cgd.ucar.edu
> Subject: [CF-metadata]  Missing data bins in histograms
> 
> Dear Karl, Jonathan, Jim,
> 
> thanks for those comments.
> 
> The CMIP6 variable in question is clmisr 
> (http://clipc-services.ceda.ac.uk/dreq/u/59151ed6-9e49-11e5-803c-0d0b866b59f3.html)
>  with a coordinate of 16 altitude bins 
> (http://clipc-services.ceda.ac.uk/dreq/u/dim:alt16.html ).
> 
> I'd be happy with Jim's proposed solution, which does not need any change to 
> the convention, though it may be a bit cryptic: all the examples in the 
> convention are for cases in which all array values are intended to match one 
> of the flag_values. Having an array which is a mixture of flags and "normal" 
> values would be a new usage.  We could, perhaps, introduce a consistency 
> problem: ticket 151 (http://cf-trac.llnl.gov/trac/ticket/151) explains how, 
> for variables with standard_name "area_type", flag_values and flag_meanings 
> can be used to encode the data, in which case it is the "flag_meanings" which 
> match the requirements of the standard name. Here, on the other hand, we want 
> the special bin to be the exception which is not described by the standard 
> name (altitude). So .. perhaps it is simpler to introduce a new attribute 
> name?
> 
> Concerning Jonathan and Karl's comments, the idea of calling it a 
> "missing_value" was a mistake I made, but it actually refers to locations 
> where cloud is detected but the height of the cloud cannot be retrieved.
> 
> The current proposal is to have a value of 0.0 in the coordinate and 
> (-99000.0,0.0) in the bounds of the special value "bin". I imagine these need 
> to be present, but I think their values are not going to mean anything.
> 
> It is certainly possible to do as Karl suggests and place an explanation in 
> the variable description. Having the special status of the first bin 
> explicitly flagged in way which can be easily picked up by software brings 
> added value.
> 
> regards,
> Martin
> 

- End forwarded message -


Re: [CF-metadata] Missing data bins in histograms

2016-10-12 Thread Jonathan Gregory
Dear Jim

That is an ingenious idea. I don't think the flag atts are currently allowed
for coord variables, but they could be, I agree. 

Best wishes

Jonathan

- Forwarded message from Jim Biard <jbi...@cicsnc.org> -

> Date: Tue, 11 Oct 2016 14:39:56 -0400
> From: Jim Biard <jbi...@cicsnc.org>
> To: cf-metadata@cgd.ucar.edu
> Subject: Re: [CF-metadata] Missing data bins in histograms
> User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:45.0)
>   Gecko/20100101 Thunderbird/45.4.0
> 
> Hi.
> 
> Another approach could be to use flag_values and flag_meanings on
> the coordinate variable to indicate one or more special coordinate
> values that correspond to any number of "missing data" or "out of
> bounds" bins. These attributes aren't forbidden by CF, and
> everything should be fine as long as the coordinate variable remains
> monotonic.
> 
> Grace and peace,
> 
> Jim
> 
> On 10/11/16 8:41 AM, martin.juc...@stfc.ac.uk wrote:
> >Hello,
> >
> >the CF standard name list has two "histogram_ " entries, and in the 
> >CMIP6 data request we may need to add a third, a 
> >histogram_of_cloud_top_height. Besides the standard name, we also need, for 
> >this new variable, a method of encoding the "missing data" bin in the 
> >histogram. That is, the histogram should record frequency in 16 data bins 
> >and one additional bin for the frequency of missing data.
> >
> >Can we define a "missing_data_index" attribute for histogram variables, and 
> >use this to indicate that the first bin in the array has this special 
> >purpose. It might be more pythonic to put the _FillValue in the coordinate 
> >value for the missing data bin, but I suspect that this would cause 
> >substantial problems for many software packages.
> >
> >regards,
> >Martin
> 


- End forwarded message -


[CF-metadata] Missing data bins in histograms

2016-10-12 Thread martin.juckes
Dear Karl, Jonathan, Jim,

thanks for those comments.

The CMIP6 variable in question is clmisr 
(http://clipc-services.ceda.ac.uk/dreq/u/59151ed6-9e49-11e5-803c-0d0b866b59f3.html)
 with a coordinate of 16 altitude bins 
(http://clipc-services.ceda.ac.uk/dreq/u/dim:alt16.html ).

I'd be happy with Jim's proposed solution, which does not need any change to 
the convention, though it may be a bit cryptic: all the examples in the 
convention are for cases in which all array values are intended to match one of 
the flag_values. Having an array which is a mixture of flags and "normal" 
values would be a new usage.  We could, perhaps, introduce a consistency 
problem: ticket 151 (http://cf-trac.llnl.gov/trac/ticket/151) explains how, for 
variables with standard_name "area_type", flag_values and flag_meanings can be 
used to encode the data, in which case it is the "flag_meanings" which match 
the requirements of the standard name. Here, on the other hand, we want the 
special bin to be the exception which is not described by the standard name 
(altitude). So .. perhaps it is simpler to introduce a new attribute name?

Concerning Jonathan and Karl's comments, the idea of calling it a 
"missing_value" was a mistake I made, but it actually refers to locations where 
cloud is detected but the height of the cloud cannot be retrieved.

The current proposal is to have a value of 0.0 in the coordinate and 
(-99000.0,0.0) in the bounds of the special value "bin". I imagine these need 
to be present, but I think their values are not going to mean anything.
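
As a rough illustration of that layout, the sketch below prepends a special bin with coordinate 0.0 and bounds (-99000.0, 0.0) to a set of height bins and checks that the result is still strictly increasing. The two height bins and the function names are invented for the example; the real alt16 values are not given in this thread.

```python
def prepend_special_bin(heights, bounds,
                        special_height=0.0,
                        special_bounds=(-99000.0, 0.0)):
    """Return new coordinate and bounds lists with the special bin placed first."""
    return [special_height] + list(heights), [special_bounds] + list(bounds)


def check_bins(heights, bounds):
    """True if heights strictly increase and each lies within its own bounds."""
    increasing = all(a < b for a, b in zip(heights, heights[1:]))
    inside = all(lo <= h <= hi for h, (lo, hi) in zip(heights, bounds))
    return increasing and inside


# Hypothetical height bins in metres (assumed values, not the alt16 definition).
heights = [250.0, 750.0]
bounds = [(0.0, 500.0), (500.0, 1000.0)]

all_heights, all_bounds = prepend_special_bin(heights, bounds)
print(all_heights)                          # [0.0, 250.0, 750.0]
print(check_bins(all_heights, all_bounds))  # True
```

The check passes only because the placeholder values happen to sit below the first real bin; nothing in the numbers themselves tells software that the first bin is special.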

It is certainly possible to do as Karl suggests and place an explanation in the 
variable description. Having the special status of the first bin explicitly 
flagged in a way which can be easily picked up by software brings added value.

regards,
Martin



Re: [CF-metadata] Missing data bins in histograms

2016-10-11 Thread Jim Biard

Hi.

Another approach could be to use flag_values and flag_meanings on the 
coordinate variable to indicate one or more special coordinate values 
that correspond to any number of "missing data" or "out of bounds" bins. 
These attributes aren't forbidden by CF, and everything should be fine 
as long as the coordinate variable remains monotonic.
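
A minimal Python sketch of how software might read such a coordinate is given here. It assumes the usual CF form of the attributes (flag_values as a list of values, flag_meanings as a blank-separated string); the coordinate values, the meaning "missing_data", and the function names are hypothetical.

```python
def is_strictly_monotonic(values):
    """True if values are strictly increasing or strictly decreasing."""
    pairs = list(zip(values, values[1:]))
    return all(a < b for a, b in pairs) or all(a > b for a, b in pairs)


def label_bins(coord, flag_values, flag_meanings):
    """Map each coordinate value to its flag meaning, or None if it is ordinary.

    flag_meanings is a blank-separated string, one word per flag value.
    """
    meanings = flag_meanings.split()
    if len(meanings) != len(flag_values):
        raise ValueError("flag_values and flag_meanings differ in length")
    lookup = dict(zip(flag_values, meanings))
    return [lookup.get(value) for value in coord]


# Hypothetical coordinate: one flagged value placed below the real height range.
coord = [-1.0, 250.0, 750.0, 1250.0]
labels = label_bins(coord, [-1.0], "missing_data")
print(labels)                        # ['missing_data', None, None, None]
print(is_strictly_monotonic(coord))  # True
```

Choosing flag values outside the physical range of the coordinate keeps the array monotonic, and several flags (for example one below and one above the range) could be used in the same way.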


Grace and peace,

Jim

On 10/11/16 8:41 AM, martin.juc...@stfc.ac.uk wrote:

Hello,

the CF standard name list has two "histogram_ " entries, and in the CMIP6 data 
request we may need to add a third, a histogram_of_cloud_top_height. Besides the standard name, we 
also need, for this new variable, a method of encoding the "missing data" bin in the 
histogram. That is, the histogram should record frequency in 16 data bins and one additional bin 
for the frequency of missing data.

Can we define a "missing_data_index" attribute for histogram variables, and use 
this to indicate that the first bin in the array has this special purpose. It might be 
more pythonic to put the _FillValue in the coordinate value for the missing data bin, but 
I suspect that this would cause substantial problems for many software packages.

regards,
Martin


Re: [CF-metadata] Missing data bins in histograms

2016-10-11 Thread Karl Taylor

Hello,

The histogram records frequencies of a single characteristic of a 
variable (in this case for cloud top height).  I think that information 
about whether or not a cloud exists should not be formally a part of the 
histogram.  We could adopt the convention for this variable that in the 
absence of clouds, the cloud is considered to be "under ground" so the 
upper bound of the height of a missing cloud would be 0. [This is 
akin to Lorenz's definition of the potential temperature isotherms as 
coinciding with the ground in his discussion of available potential energy.]


By the way, I couldn't find this variable in the current release of the 
CMIP6 data request.  Is it there?  If not, could you say a bit more 
about how the bins are defined?  Are they height or pressure bins?


thanks,
Karl

On 10/11/16 5:41 AM, martin.juc...@stfc.ac.uk wrote:

Hello,

the CF standard name list has two "histogram_ " entries, and in the CMIP6 data 
request we may need to add a third, a histogram_of_cloud_top_height. Besides the standard name, we 
also need, for this new variable, a method of encoding the "missing data" bin in the 
histogram. That is, the histogram should record frequency in 16 data bins and one additional bin 
for the frequency of missing data.

Can we define a "missing_data_index" attribute for histogram variables, and use 
this to indicate that the first bin in the array has this special purpose. It might be 
more pythonic to put the _FillValue in the coordinate value for the missing data bin, but 
I suspect that this would cause substantial problems for many software packages.

regards,
Martin


[CF-metadata] Missing data bins in histograms

2016-10-11 Thread Jonathan Gregory
Dear Martin

I feel there would be an advantage in flexibility, by not requiring the missing
data count to be the first bin necessarily. The new attribute could indicate
the index of the bin which contains the missing data count. I suggest that
this would be an attribute of the coordinate variable of the histogram for the
quantity which is binned (cloud top height), since the index refers to that
dimension specifically.

I agree it would be neat if it were possible instead to put _FillValue in the
coordinate variable. Actually _FillValue is not allowed in coordinate vars by
CF, so as far as CF is concerned it would not be a problem to adopt this as a
new convention. But maybe software would have problems with it. If we need the
new attribute, I'd suggest missing_value_index, to make it more similar to
missing_value and _FillValue. What would you put in the coordinate and bounds
for the missing data bin?
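
For comparison, a reader using an index attribute might look like the following sketch. The attribute is only a proposal in this thread; treating it as a zero-based index, and the function name split_by_index, are assumptions made for the example.

```python
def split_by_index(counts, missing_value_index):
    """Return (missing_count, other_counts) given the index of the special bin.

    counts              -- histogram counts along the binned dimension
    missing_value_index -- zero-based position of the missing-data bin (assumed)
    """
    if not 0 <= missing_value_index < len(counts):
        raise IndexError("missing_value_index is outside the histogram")
    missing_count = counts[missing_value_index]
    other_counts = counts[:missing_value_index] + counts[missing_value_index + 1:]
    return missing_count, other_counts


# Invented counts: the special bin is first here, but any position works.
print(split_by_index([5, 10, 20], 0))  # (5, [10, 20])
print(split_by_index([10, 20, 5], 2))  # (5, [10, 20])
```

An index identifies exactly one bin, so labelling several special bins would need a list-valued attribute or a different mechanism.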

In any case, this needs a new convention to be proposed as a trac ticket.

Best wishes

Jonathan

- Forwarded message from martin.juc...@stfc.ac.uk -

> Date: Tue, 11 Oct 2016 12:41:21 +
> From: martin.juc...@stfc.ac.uk
> To: cf-metadata@cgd.ucar.edu
> CC: rojma...@u.washington.edu
> Subject: [CF-metadata] Missing data bins in histograms
> 
> Hello,
> 
> the CF standard name list has two "histogram_ " entries, and in the CMIP6 
> data request we may need to add a third, a histogram_of_cloud_top_height. 
> Besides the standard name, we also need, for this new variable, a method of 
> encoding the "missing data" bin in the histogram. That is, the histogram 
> should record frequency in 16 data bins and one additional bin for the 
> frequency of missing data.
> 
> Can we define a "missing_data_index" attribute for histogram variables, and 
> use this to indicate that the first bin in the array has this special 
> purpose. It might be more pythonic to put the _FillValue in the coordinate 
> value for the missing data bin, but I suspect that this would cause 
> substantial problems for many software packages.
> 
> regards,
> Martin

- End forwarded message -


[CF-metadata] Missing data bins in histograms

2016-10-11 Thread martin.juckes
Hello,

the CF standard name list has two "histogram_ " entries, and in the CMIP6 
data request we may need to add a third, a histogram_of_cloud_top_height. 
Besides the standard name, we also need, for this new variable, a method of 
encoding the "missing data" bin in the histogram. That is, the histogram should 
record frequency in 16 data bins and one additional bin for the frequency of 
missing data.

Can we define a "missing_data_index" attribute for histogram variables, and use 
this to indicate that the first bin in the array has this special purpose. It 
might be more pythonic to put the _FillValue in the coordinate value for the 
missing data bin, but I suspect that this would cause substantial problems for 
many software packages.

regards,
Martin