Re: [CF-metadata] Missing data bins in histograms

2016-10-12 Thread Jim Biard

Jonathan,

Missing/fill values are not allowed, but I don't see any language 
prohibiting flags. I'd appreciate it if you could expand on your 
thoughts about why they aren't allowed.


Grace and peace,

Jim

On 10/12/16 1:30 PM, Jonathan Gregory wrote:

Dear Jim

That is an ingenious idea. I don't think the flag atts are currently allowed
for coord variables, but they could be, I agree.

Best wishes

Jonathan

- Forwarded message from Jim Biard  -


Date: Tue, 11 Oct 2016 14:39:56 -0400
From: Jim Biard 
To: cf-metadata@cgd.ucar.edu
Subject: Re: [CF-metadata] Missing data bins in histograms

Hi.

Another approach could be to use flag_values and flag_meanings on
the coordinate variable to indicate one or more special coordinate
values that correspond to any number of "missing data" or "out of
bounds" bins. These attributes aren't forbidden by CF, and
everything should be fine as long as the coordinate variable remains
monotonic.

Grace and peace,

Jim

On 10/11/16 8:41 AM, martin.juc...@stfc.ac.uk wrote:

Hello,

the CF standard name list has two "histogram_ " entries, and in the CMIP6 data 
request we may need to add a third, a histogram_of_cloud_top_height. Besides the standard name, we 
also need, for this new variable, a method of encoding the "missing data" bin in the 
histogram. That is, the histogram should record frequency in 16 data bins and one additional bin 
for the frequency of missing data.

Can we define a "missing_data_index" attribute for histogram variables, and use 
this to indicate that the first bin in the array has this special purpose? It might be 
more pythonic to put the _FillValue in the coordinate value for the missing data bin, but 
I suspect that this would cause substantial problems for many software packages.

regards,
Martin
___
CF-metadata mailing list
CF-metadata@cgd.ucar.edu
http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata

--
Jim Biard
Research Scholar
Cooperative Institute for Climate and Satellites NC
North Carolina State University
NOAA National Centers for Environmental Information
formerly NOAA’s National Climatic Data Center
151 Patton Ave, Asheville, NC 28801
e: jbi...@cicsnc.org
o: +1 828 271 4900




- End forwarded message -


[CF-metadata] Usage of histogram_of_X_over_Z

2016-10-12 Thread martin.juckes
Hello,

There are two standard names of the form histogram_of_... in the CF Standard 
Name list (at version 36): 
histogram_of_backscattering_ratio_over_height_above_reference_ellipsoid and 
histogram_of_equivalent_reflectivity_factor_over_height_above_reference_ellipsoid.
Both of these were used in CMIP5 and are set to be used in CMIP6, but the usage 
does not appear to match the standard name descriptions. 

The possible confusion is over the role of different coordinates. The CF 
definitions say '"histogram_of_X[_over_Z]" means histogram (i.e. number of 
counts for each range of X) of variations (over Z) of X.' This implies to me 
that you start with a function of Z and possibly other coordinates and end up 
with a function of X and the other coordinates. E.g. if the source data is 
X(lat,lon,Z), then the histogram data will be of the form frequency(lat,lon,X).
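
As a rough illustration of this reading, the sketch below reduces a small X(lat,lon,Z) array to frequency(lat,lon,X) in plain Python. The helper name histogram_over_z, the data values, and the bin edges are all made up for the example; nothing here is taken from CF or from the CMIP data request.

```python
from bisect import bisect_right

def histogram_over_z(column, bin_edges):
    """Count how many values of one X(Z) column fall into each X bin.

    column    : sequence of X values along Z for a single (lat, lon) point
    bin_edges : increasing sequence of nbins + 1 edges; bins are [lo, hi),
                and values outside the edges are ignored
    """
    counts = [0] * (len(bin_edges) - 1)
    for value in column:
        if bin_edges[0] <= value < bin_edges[-1]:
            counts[bisect_right(bin_edges, value) - 1] += 1
    return counts

# Hypothetical source data X(lat, lon, Z): 1 x 2 grid, 4 levels in Z.
x = [[[0.5, 1.5, 1.7, 3.2],
      [2.1, 2.2, 2.9, 0.1]]]
edges = [0.0, 1.0, 2.0, 3.0, 4.0]

# Result frequency(lat, lon, X): the Z axis is replaced by the X-bin axis.
frequency = [[histogram_over_z(col, edges) for col in row] for row in x]
print(frequency)
```

The Z dimension disappears from the result and a dimension over the X bins takes its place, which is the shape change the definition seems to imply.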

In the two CMIP5/CMIP6 draft variables (cfadLidarsr532, cfadDbze94) using these 
standard names the "Z" coordinate which is included in the standard name 
("height_above_reference_ellipsoid") is one of the coordinates of the histogram 
data variable. Both these variables appear to be joint distributions (frequency 
of X and Y values) over sub-grid variability as a function of latitude, 
longitude and time. 

I've been reviewing these existing definitions in some detail because there are 
some new distribution variables in the request and I'd like to make sure that 
we have a consistent approach. 

If we need to describe a variable which carries a joint distribution of X and 
Y, then the variable will have to use X and Y as coordinates, so perhaps we can 
simplify the process by leaving them out of the standard name. Similarly the 
"over_Z" part of the name would be better expressed as a cell_methods 
construct. This line of reasoning suggests using a new standard name such as 
"frequency_distribution" (units "1"). The only difficulty is that the frequency 
distribution might be a function of the quantities X and Y (scattering ratio 
and cloud top height for cfadLidarsr532) and also of latitude, longitude and 
time. There should be some way of distinguishing the different roles of these 5 
coordinates: it is the distribution of X and Y as a function of latitude, 
longitude and time. I think this could be done conveniently by introducing a 
single new attribute, e.g. "bin_coords: X Y".
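
A minimal sketch of how software might use such an attribute is given below. Both "bin_coords" and the "frequency_distribution" standard name are only proposals in this message, and the dimension names and the helper split_dimensions are placeholders invented for the example, not names from the data request.

```python
def split_dimensions(dimensions, attributes):
    """Separate the bin axes of a distribution from the axes it varies over."""
    bin_coords = attributes.get("bin_coords", "").split()
    unknown = [name for name in bin_coords if name not in dimensions]
    if unknown:
        raise ValueError(f"bin_coords names not among dimensions: {unknown}")
    others = [name for name in dimensions if name not in bin_coords]
    return bin_coords, others

# Hypothetical description of a joint distribution variable.
dims = ["time", "scattering_ratio", "cloud_top_height", "lat", "lon"]
attrs = {
    "standard_name": "frequency_distribution",        # proposed name, not in the table
    "units": "1",
    "bin_coords": "scattering_ratio cloud_top_height",  # proposed attribute
}

bins, others = split_dimensions(dims, attrs)
print("distribution of", bins, "as a function of", others)
```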

"frequency_distribution" could be used for single or joint distributions.

My questions to the list are:
(1) am I missing something in my interpretation of the existing 
histogram_of_... names?
(2) if not, is the adoption of a "frequency_distribution" standard name an 
appropriate way forward?

regards,
Martin



[CF-metadata] Missing data bins in histograms

2016-10-12 Thread Jonathan Gregory
Dear Martin

I'm still uneasy about it having to be the first bin, in particular, or are
you not set on that? If it can be identified from the coordinate value by
flags, it could be any bin.

I believe that a change to convention would be needed to allow flag values
to be used with coordinates, unless we've already agreed that in some ticket.
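
To make the idea concrete, here is a small sketch of how a reader could locate flagged bins from the coordinate values alone, wherever they sit in the array. It assumes flag_values and flag_meanings were permitted on a coordinate variable, which is exactly the point under discussion; the altitude values, the sentinel 99999.0, the meaning string, and the helper special_bins are invented for the example.

```python
def special_bins(coord_values, flag_values, flag_meanings):
    """Map the index of each flagged coordinate value to its meaning."""
    meanings = flag_meanings.split()
    if len(meanings) != len(flag_values):
        raise ValueError("flag_values and flag_meanings must have equal length")
    lookup = dict(zip(flag_values, meanings))
    # Exact comparison is intended: a flag value is a sentinel, not a measurement.
    return {i: lookup[v] for i, v in enumerate(coord_values) if v in lookup}

# Made-up altitude coordinate in metres. The sentinel 99999.0 is larger than
# every real bin so that the coordinate stays monotonic.
altitude = [250.0, 750.0, 1500.0, 99999.0]
attrs = {"flag_values": [99999.0], "flag_meanings": "height_not_retrieved"}

flagged = special_bins(altitude, attrs["flag_values"], attrs["flag_meanings"])
print(flagged)
```

Because the lookup is by value, not by position, the special bin need not be the first one, provided the sentinel is chosen so that the coordinate remains monotonic.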

Best wishes

Jonathan

- Forwarded message from martin.juc...@stfc.ac.uk -

> Date: Wed, 12 Oct 2016 17:14:38 +
> From: martin.juc...@stfc.ac.uk
> To: cf-metadata@cgd.ucar.edu
> Subject: [CF-metadata]  Missing data bins in histograms
> 
> Dear Karl, Jonathan, Jim,
> 
> thanks for those comments.
> 
> The CMIP6 variable in question is clmisr 
> (http://clipc-services.ceda.ac.uk/dreq/u/59151ed6-9e49-11e5-803c-0d0b866b59f3.html)
>  with a coordinate of 16 altitude bins 
> (http://clipc-services.ceda.ac.uk/dreq/u/dim:alt16.html ).
> 
> I'd be happy with Jim's proposed solution, which does not need any change to 
> the convention, though it may be a bit cryptic: all the examples in the 
> convention are for cases in which all array values are intended to match one 
> of the flag_values. Having an array which is a mixture of flags and "normal" 
> values would be a new usage.  We could, perhaps, introduce a consistency 
> problem: ticket 151 (http://cf-trac.llnl.gov/trac/ticket/151) explains how, 
> for variables with standard_name "area_type", flag_values and flag_meanings 
> can be used to encode the data, in which case it is the "flag_meanings" which 
> match the requirements of the standard name. Here, on the other hand, we want 
> the special bin to be the exception which is not described by the standard 
> name (altitude). So perhaps it is simpler to introduce a new attribute 
> name?
> 
> Concerning Jonathan and Karl's comments, calling it a "missing_value" was a 
> mistake on my part: the bin actually refers to locations where cloud is 
> detected but the height of the cloud cannot be retrieved.
> 
> The current proposal is to have a value of 0.0 in the coordinate and 
> (-99000.0,0.0) in the bounds of the special value "bin". I imagine these need 
> to be present, but I think their values are not going to mean anything.
> 
> It is certainly possible to do as Karl suggests and place an explanation in 
> the variable description. Having the special status of the first bin 
> explicitly flagged in a way which can be easily picked up by software brings 
> added value.
> 
> regards,
> Martin
> 

- End forwarded message -


Re: [CF-metadata] Missing data bins in histograms

2016-10-12 Thread Jonathan Gregory
Dear Jim

That is an ingenious idea. I don't think the flag atts are currently allowed
for coord variables, but they could be, I agree. 

Best wishes

Jonathan

- Forwarded message from Jim Biard  -

> Date: Tue, 11 Oct 2016 14:39:56 -0400
> From: Jim Biard 
> To: cf-metadata@cgd.ucar.edu
> Subject: Re: [CF-metadata] Missing data bins in histograms
> 
> Hi.
> 
> Another approach could be to use flag_values and flag_meanings on
> the coordinate variable to indicate one or more special coordinate
> values that correspond to any number of "missing data" or "out of
> bounds" bins. These attributes aren't forbidden by CF, and
> everything should be fine as long as the coordinate variable remains
> monotonic.
> 
> Grace and peace,
> 
> Jim
> 
> On 10/11/16 8:41 AM, martin.juc...@stfc.ac.uk wrote:
> >Hello,
> >
> >the CF standard name list has two "histogram_ " entries, and in the 
> >CMIP6 data request we may need to add a third, a 
> >histogram_of_cloud_top_height. Besides the standard name, we also need, for 
> >this new variable, a method of encoding the "missing data" bin in the 
> >histogram. That is, the histogram should record frequency in 16 data bins 
> >and one additional bin for the frequency of missing data.
> >
> >Can we define a "missing_data_index" attribute for histogram variables, and 
> >use this to indicate that the first bin in the array has this special 
> >purpose? It might be more pythonic to put the _FillValue in the coordinate 
> >value for the missing data bin, but I suspect that this would cause 
> >substantial problems for many software packages.
> >
> >regards,
> >Martin
> 


- End forwarded message -


[CF-metadata] Missing data bins in histograms

2016-10-12 Thread martin.juckes
Dear Karl, Jonathan, Jim,

thanks for those comments.

The CMIP6 variable in question is clmisr 
(http://clipc-services.ceda.ac.uk/dreq/u/59151ed6-9e49-11e5-803c-0d0b866b59f3.html)
 with a coordinate of 16 altitude bins 
(http://clipc-services.ceda.ac.uk/dreq/u/dim:alt16.html ).

I'd be happy with Jim's proposed solution, which does not need any change to 
the convention, though it may be a bit cryptic: all the examples in the 
convention are for cases in which all array values are intended to match one of 
the flag_values. Having an array which is a mixture of flags and "normal" 
values would be a new usage.  We could, perhaps, introduce a consistency 
problem: ticket 151 (http://cf-trac.llnl.gov/trac/ticket/151) explains how, for 
variables with standard_name "area_type", flag_values and flag_meanings can be 
used to encode the data, in which case it is the "flag_meanings" which match 
the requirements of the standard name. Here, on the other hand, we want the 
special bin to be the exception which is not described by the standard name 
(altitude). So perhaps it is simpler to introduce a new attribute name?

Concerning Jonathan and Karl's comments, calling it a "missing_value" was a 
mistake on my part: the bin actually refers to locations where cloud is 
detected but the height of the cloud cannot be retrieved.

The current proposal is to have a value of 0.0 in the coordinate and 
(-99000.0,0.0) in the bounds of the special value "bin". I imagine these need 
to be present, but I think their values are not going to mean anything.
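
The following sketch checks that such a special bin keeps the coordinate well formed. Only the first coordinate value (0.0) and its bounds (-99000.0, 0.0) come from the proposal above; the remaining altitude bins are invented for illustration and are not the actual alt16 values.

```python
def is_strictly_increasing(values):
    """True if each value is larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))

# First entry: the proposed special bin. Remaining entries: made-up altitude
# bins in metres, used only to exercise the checks.
coord = [0.0, 250.0, 750.0, 1250.0]
bounds = [(-99000.0, 0.0), (0.0, 500.0), (500.0, 1000.0), (1000.0, 1500.0)]

# The coordinate must remain monotonic for generic software to accept it.
monotonic = is_strictly_increasing(coord)
# Adjacent cells share an edge, so the special bin does not leave a gap.
contiguous = all(bounds[i][1] == bounds[i + 1][0] for i in range(len(bounds) - 1))
# Each coordinate value lies within (or on the edge of) its own cell.
inside = [lo <= c <= hi for c, (lo, hi) in zip(coord, bounds)]

print(monotonic, contiguous, inside)
```

With these values the special bin passes the same structural checks as an ordinary bin, which is why software needs some additional marker to tell that its coordinate and bounds carry no physical meaning.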

It is certainly possible to do as Karl suggests and place an explanation in the 
variable description. Having the special status of the first bin explicitly 
flagged in a way which can be easily picked up by software brings added value.

regards,
Martin
