Re: [CF-metadata] Usage of histogram_of_X_over_Z

2016-10-13 Thread Bodas-Salcedo, Alejandro
Dear Martin,

You are right, those definitions are not correct.

> From your reply I understand now that these are univariate distributions
> giving the frequency of different radar reflectivities in different height
> bands. Coming from radar/lidar instruments (or an emulator of these
> instruments), there are multiple observations in each GCM-scale height band.
> Presumably, there are also multiple profiles in the GCM-scale grid square, so
> that we have a frequency distribution over sub-grid scale variability in the
> vertical and the horizontal? Or is it actually evaluated at a spatial point?
>
There is a sub-grid distribution of vertical profiles from which they are 
constructed.

The definition that you propose seems accurate to me. Thanks again for your 
time spent clarifying this.

Regards,

Alejandro

> -Original Message-
> From: CF-metadata [mailto:cf-metadata-boun...@cgd.ucar.edu] On Behalf Of
> martin.juc...@stfc.ac.uk
> Sent: 13 October 2016 13:05
> To: cf-metadata@cgd.ucar.edu
> Subject: [CF-metadata] Usage of histogram_of_X_over_Z
> 
> Dear Alejandro,
> 
> The two CMIP variables which I'm talking about are cfadDbze94 currently
> defined as "CFAD (Cloud Frequency Altitude Diagrams) are joint height - radar
> reflectivity (or lidar scattering ratio) distributions." and cfadLidarsr532,
> which has the same definition. If they are not joint distributions we clearly
> have a problem with these definitions.
> 
> From your reply I understand now that these are univariate distributions
> giving the frequency of different radar reflectivities in different height
> bands. Coming from radar/lidar instruments (or an emulator of these
> instruments), there are multiple observations in each GCM-scale height band.
> Presumably, there are also multiple profiles in the GCM-scale grid square, so
> that we have a frequency distribution over sub-grid scale variability in the
> vertical and the horizontal? Or is it actually evaluated at a spatial point?
> 
> If this is the case, you are right and we just need to correct the
> definitions in the CMIP tables (though there is still a case for introducing
> a frequency_distribution for other variables, but that should be another
> thread). I would favour a slightly more verbose and explicit definition, e.g.
> "CFAD (Cloud Frequency Altitude Diagrams) are frequency distributions of radar
> reflectivity (or lidar scattering ratio) as a function of altitude.
> cfadDbze94 is defined as the simulated relative frequency of radar
> reflectivity in sampling volumes defined by altitude bins and model grid
> cells."
> 
> Note that I'm using "altitude" rather than "height" to match the standard
> names: in the CF Convention, "altitude" means height above the geoid, and
> "height" means height above the surface.
> 
> Is that an accurate definition?
> 
> regards,
> Martin
> 
> 
> Dear Martin,
> 
> Thanks for your detailed explanation. I'd like to add a bit more information.
> These variables are not joint distributions, they are 1D distributions for
> different ranges of Z. The question is, does "histogram_of_X[_over_Z]" mean
> that the Z coordinate has to be completely collapsed? It is not clear to me
> that the current definition implies that. If Z is not completely collapsed,
> you can then end up with a function of the form frequency(lat,lon,X,Z2),
> where the coordinate Z is only partially collapsed into bins described by Z2.
> I'm using here Z2 to explicitly show when the Z coordinate represents bins.
> This would look like a joint histogram, but it is not. I think that your
> proposal of dropping "_over_Z" from the standard name works for a joint
> distribution, but not for a collection of 1D distributions along Z, unless
> there is a way of distinguishing between both cases with the use of
> attributes.
> 
> Another detail is that these histograms provide relative frequencies (values
> between 0 and 1, not counts), not absolute frequencies. Is that inconsistent
> with the current definition of histogram in CF?
> 
> Regards,
> 
> Alejandro
> 
> > -Original Message-
> > From: martin.juckes at stfc.ac.uk [mailto:martin.juckes at stfc.ac.uk]
> > Sent: 12 October 2016 19:05
> > To: cf-metadata at cgd.ucar.edu
> > Cc: Bodas-Salcedo, Alejandro
> > Subject: Usage of histogram_of_X_over_Z
> >
> > Hello,
> >
> > There are two standard names of the form histogram_of_... in the CF
> > Standard Name list (at version 36):
> > histogram_of_backscattering_ratio_over_height_above_reference_ellipsoid and
> > histogram_of_equivalent_reflectivity_factor_over_height_above_reference_ellipsoid.
> > Both of these were used in CMIP5 and are set to be used in CMIP6, but the
> > usage does not

[CF-metadata] Usage of histogram_of_X_over_Z

2016-10-13 Thread martin.juckes
Dear Alejandro,

The two CMIP variables which I'm talking about are cfadDbze94 currently defined 
as "CFAD (Cloud Frequency Altitude Diagrams) are joint height - radar 
reflectivity (or lidar scattering ratio) distributions." and cfadLidarsr532, 
which has the same definition. If they are not joint distributions we clearly 
have a problem with these definitions.

From your reply I understand now that these are univariate distributions 
giving the frequency of different radar reflectivities in different height 
bands. Coming from radar/lidar instruments (or an emulator of these 
instruments), there are multiple observations in each GCM-scale height band. 
Presumably, there are also multiple profiles in the GCM-scale grid square, so 
that we have a frequency distribution over sub-grid scale variability in the 
vertical and the horizontal? Or is it actually evaluated at a spatial point?

If this is the case, you are right and we just need to correct the definitions 
in the CMIP tables (though there is still a case for introducing a 
frequency_distribution for other variables, but that should be another thread). 
I would favour a slightly more verbose and explicit definition, e.g.
"CFAD (Cloud Frequency Altitude Diagrams) are frequency distributions of radar 
reflectivity (or lidar scattering ratio) as a function of altitude. cfadDbze94 
is defined as the simulated relative frequency of radar reflectivity in 
sampling volumes defined by altitude bins and model grid cells."

Note that I'm using "altitude" rather than "height" to match the standard 
names: in the CF Convention, "altitude" means height above the geoid, and 
"height" means height above the surface.

Is that an accurate definition?

regards,
Martin


Dear Martin,

Thanks for your detailed explanation. I'd like to add a bit more information. 
These variables are not joint distributions, they are 1D distributions for 
different ranges of Z. The question is, does "histogram_of_X[_over_Z]" mean 
that the Z coordinate has to be completely collapsed? It is not clear to me that 
the current definition implies that. If Z is not completely collapsed, you can 
then end up with a function of the form frequency(lat,lon,X,Z2), where the 
coordinate Z is only partially collapsed into bins described by Z2. I'm using 
here Z2 to explicitly show when the Z coordinate represents bins. This would 
look like a joint histogram, but it is not. I think that your proposal of 
dropping "_over_Z" from the standard name works for a joint distribution, but 
not for a collection of 1D distributions along Z, unless there is a way of 
distinguishing between both cases with the use of attributes.

Another detail is that these histograms provide relative frequencies (values 
between 0 and 1, not counts), not absolute frequencies. Is that inconsistent 
with the current definition of histogram in CF?

Regards,

Alejandro

> -Original Message-
> From: martin.juckes at stfc.ac.uk [mailto:martin.juckes at stfc.ac.uk]
> Sent: 12 October 2016 19:05
> To: cf-metadata at cgd.ucar.edu
> Cc: Bodas-Salcedo, Alejandro
> Subject: Usage of histogram_of_X_over_Z
>
> Hello,
>
> There are two standard names of the form histogram_of_... in the CF Standard
> Name list (at version 36):
> histogram_of_backscattering_ratio_over_height_above_reference_ellipsoid and
> histogram_of_equivalent_reflectivity_factor_over_height_above_reference_ellipsoid.
> Both of these were used in CMIP5 and are set to be used in CMIP6, but the
> usage does not appear to match the standard name descriptions.
>
> The possible confusion is over the role of different coordinates. The CF
> definitions say '"histogram_of_X[_over_Z]" means histogram (i.e. number of
> counts for each range of X) of variations (over Z) of X.' This implies to me
> that you start with a function of Z and possibly other coordinates and end up
> with a function of X and the other coordinates. E.g. if the source data is
> X(lat,lon,Z), then the histogram data will be of the form
> frequency(lat,lon,X).
>
> In the two CMIP5/CMIP6 draft variables (cfadLidarsr532, cfadDbze94) using
> these standard names the "Z" coordinate which is included in the standard
> name ("height_above_reference_ellipsoid") is one of the coordinates of the
> histogram data variable. Both these variables appear to be joint
> distributions (frequency of X and Y values) over sub-grid variability as a
> function of latitude, longitude and time.
>
> I've been reviewing these existing definitions in some detail because there
> are some new distribution variables in the request and I'd like to make sure
> that we have a consistent approach.
>
> If we need to describe a variable which carries a joint distribution of X and
> Y, then the variable will have to

Re: [CF-metadata] Missing data bins in histograms

2016-10-13 Thread Jonathan Gregory
Dear Jim

In Appendix A it does not say that the flag attributes are allowed for
coordinate variables - it has just "D" in the "Use" column. This is not an
argument why they shouldn't be if there is a need, but they weren't introduced
with that in mind. The use which you suggested for Martin's case is a good
idea, but I think it would need a change to the convention.
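
For concreteness, the suggestion could be sketched roughly as below. This is only an illustration under stated assumptions: the bin-centre values, the sentinel MISSING_BIN and the plain dict standing in for the variable's attributes are all made up for the example, and the use of flag_values/flag_meanings on a coordinate variable is the idea under discussion, not current CF practice.

```python
import numpy as np

# Bin-centre coordinate for 16 data bins (made-up values, in m), preceded by
# one special value that labels the "missing data" bin. The special value is
# chosen below the real range so the coordinate stays strictly monotonic.
data_centres = np.arange(500.0, 16000.0, 1000.0)   # 16 values
MISSING_BIN = -1.0                                  # hypothetical sentinel
coord = np.concatenate(([MISSING_BIN], data_centres))

# Attributes as they might appear on the coordinate variable. CF defines
# flag_values and flag_meanings, but attaching them to a coordinate variable
# is the proposal here, not something the convention currently describes.
coord_attrs = {
    "units": "m",
    "flag_values": np.array([MISSING_BIN]),
    "flag_meanings": "missing_data",
}

def is_strictly_monotonic(a):
    """True if a is strictly increasing or strictly decreasing."""
    d = np.diff(a)
    return bool(np.all(d > 0) or np.all(d < 0))

# Boolean mask marking which coordinate values are flagged as special bins.
special = np.isin(coord, coord_attrs["flag_values"])
```

More than one special bin could be labelled the same way by listing several values in flag_values, as long as the coordinate remains monotonic.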

Best wishes

Jonathan

- Forwarded message from Jim Biard  -

> Date: Wed, 12 Oct 2016 14:58:11 -0400
> From: Jim Biard 
> To: cf-metadata@cgd.ucar.edu
> Subject: Re: [CF-metadata] Missing data bins in histograms
> User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:45.0)
>   Gecko/20100101 Thunderbird/45.4.0
> 
> Jonathan,
> 
> Missing/fill values are not allowed, but I don't see any language
> prohibiting flags. I'd appreciate it if you could expand on your
> thoughts about why they aren't allowed.
> 
> Grace and peace,
> 
> Jim
> On 10/12/16 1:30 PM, Jonathan Gregory wrote:
> >Dear Jim
> >
> >That is an ingenious idea. I don't think the flag atts are currently allowed
> >for coord variables, but they could be, I agree.
> >
> >Best wishes
> >
> >Jonathan
> >
> >- Forwarded message from Jim Biard  -
> >
> >>Date: Tue, 11 Oct 2016 14:39:56 -0400
> >>From: Jim Biard 
> >>To: cf-metadata@cgd.ucar.edu
> >>Subject: Re: [CF-metadata] Missing data bins in histograms
> >>User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:45.0)
> >>Gecko/20100101 Thunderbird/45.4.0
> >>
> >>Hi.
> >>
> >>Another approach could be to use flag_values and flag_meanings on
> >>the coordinate variable to indicate one or more special coordinate
> >>values that correspond to any number of "missing data" or "out of
> >>bounds" bins. These attributes aren't forbidden by CF, and
> >>everything should be fine as long as the coordinate variable remains
> >>monotonic.
> >>
> >>Grace and peace,
> >>
> >>Jim
> >>
> >>On 10/11/16 8:41 AM, martin.juc...@stfc.ac.uk wrote:
> >>>Hello,
> >>>
> >>>the CF standard name list has two "histogram_ " entries, and in the 
> >>>CMIP6 data request we may need to add a third, a 
> >>>histogram_of_cloud_top_height. Besides the standard name, we also need, 
> >>>for this new variable, a method of encoding the "missing data" bin in the 
> >>>histogram. That is, the histogram should record frequency in 16 data bins 
> >>>and one additional bin for the frequency of missing data.
> >>>
> >>>Can we define a "missing_data_index" attribute for histogram variables, 
> >>>and use this to indicate that the first bin in the array has this special 
> >>>purpose. It might be more pythonic to put the _FillValue in the coordinate 
> >>>value for the missing data bin, but I suspect that this would cause 
> >>>substantial problems for many software packages.
> >>>
> >>>regards,
> >>>Martin
> >>>___
> >>>CF-metadata mailing list
> >>>CF-metadata@cgd.ucar.edu
> >>>http://mailman.cgd.ucar.edu/mailman/listinfo/cf-metadata
> >>-- 
> >>*Jim Biard*
> >>*Research Scholar*
> >>Cooperative Institute for Climate and Satellites NC 
> >>North Carolina State University 
> >>NOAA National Centers for Environmental Information 
> >>/formerly NOAA’s National Climatic Data Center/
> >>151 Patton Ave, Asheville, NC 28801
> >>e: jbi...@cicsnc.org 
> >>o: +1 828 271 4900
> >>
> >>
> >
> >- End forwarded message -
> 

[CF-metadata] Missing data bins in histograms

2016-10-13 Thread Jonathan Gregory
Dear Martin

Ah, OK, thanks. I must have misunderstood.

Best wishes

Jonathan

- Forwarded message from martin.juc...@stfc.ac.uk -

> Date: Thu, 13 Oct 2016 08:20:56 +
> From: martin.juc...@stfc.ac.uk
> To: cf-metadata@cgd.ucar.edu
> Subject: [CF-metadata]  Missing data bins in histograms
> 
> Dear Jonathan,
> 
> I'm sorry I didn't respond on the point about it being the first bin: I had 
> not intended the special value to be restricted to the first bin, so I guess 
> there is something ambiguous in my initial formulation which is giving this 
> impression. I agree that we should formulate any extension so that it can 
> apply to any bin, and I also think it should be possible to label multiple 
> bins in this way.
> 
> regards,
> Martin


[CF-metadata] Missing data bins in histograms

2016-10-13 Thread martin.juckes
Dear Jonathan,

I'm sorry I didn't respond on the point about it being the first bin: I had not 
intended the special value to be restricted to the first bin, so I guess there 
is something ambiguous in my initial formulation which is giving this 
impression. I agree that we should formulate any extension so that it can apply 
to any bin, and I also think it should be possible to label multiple bins in 
this way.
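
As a rough sketch of what such an extension might look like in practice (assumptions: the attribute name "missing_data_index" is the hypothetical one floated earlier in this thread and is not part of CF, the bin edges and sample values are invented, and the missing-data bin is placed last purely for convenience):

```python
import numpy as np

def histogram_with_missing_bin(values, edges):
    """Relative frequencies over len(edges)-1 data bins, plus one extra
    trailing bin holding the fraction of missing (NaN) samples.
    Samples outside the edges are counted in no bin at all."""
    values = np.asarray(values, dtype=float)
    missing = np.isnan(values)
    counts, _ = np.histogram(values[~missing], bins=edges)
    freq = np.append(counts, missing.sum()) / values.size
    # Hypothetical attribute; a list so that several bins could be labelled.
    attrs = {"missing_data_index": [len(edges) - 1]}
    return freq, attrs

edges = np.linspace(0.0, 16000.0, 17)   # 16 data bins (made-up edges, in m)
cloud_top = np.array([500.0, 1500.0, np.nan, 9000.0,
                      np.nan, 15999.0, 1200.0, 700.0])
freq, attrs = histogram_with_missing_bin(cloud_top, edges)
```

Here freq has 17 entries: 16 data bins followed by the missing-data bin. Holding the index in a list leaves room for labelling any bin, or several bins.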

regards,
Martin


Re: [CF-metadata] Usage of histogram_of_X_over_Z

2016-10-13 Thread Bodas-Salcedo, Alejandro
Dear Martin,

Thanks for your detailed explanation. I'd like to add a bit more information. 
These variables are not joint distributions, they are 1D distributions for 
different ranges of Z. The question is, does "histogram_of_X[_over_Z]" mean 
that the Z coordinate has to be completely collapsed? It is not clear to me that 
the current definition implies that. If Z is not completely collapsed, you can 
then end up with a function of the form frequency(lat,lon,X,Z2), where the 
coordinate Z is only partially collapsed into bins described by Z2. I'm using 
here Z2 to explicitly show when the Z coordinate represents bins. This would 
look like a joint histogram, but it is not. I think that your proposal of 
dropping "_over_Z" from the standard name works for a joint distribution, but 
not for a collection of 1D distributions along Z, unless there is a way of 
distinguishing between both cases with the use of attributes.

Another detail is that these histograms provide relative frequencies (values 
between 0 and 1, not counts), not absolute frequencies. Is that inconsistent 
with the current definition of histogram in CF?
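
To make the two cases concrete, here is a minimal NumPy sketch. It is not the COSP code; the function names, array shapes, bin edges and random input are assumptions made up for illustration. The first function collapses Z completely, giving frequency(lat,lon,X); the second builds one 1D histogram of X per Z2 band, giving frequency(lat,lon,X,Z2). Both return relative frequencies rather than counts.

```python
import numpy as np

def hist_collapsed(x, x_edges):
    """x has shape (lat, lon, Z). Returns relative frequency (lat, lon, X).
    Values of x outside x_edges are ignored."""
    nlat, nlon, _ = x.shape
    out = np.zeros((nlat, nlon, len(x_edges) - 1))
    for i in range(nlat):
        for j in range(nlon):
            counts, _ = np.histogram(x[i, j], bins=x_edges)
            out[i, j] = counts / max(counts.sum(), 1)
    return out

def hist_per_band(x, z, x_edges, z2_edges):
    """x: (lat, lon, Z); z: (Z,) level coordinate. Returns (lat, lon, X, Z2),
    one separately normalised 1D histogram of X for each Z2 band."""
    nlat, nlon, _ = x.shape
    nx, nz2 = len(x_edges) - 1, len(z2_edges) - 1
    out = np.zeros((nlat, nlon, nx, nz2))
    band = np.digitize(z, z2_edges) - 1   # index of the Z2 band for each level
    for k in range(nz2):
        in_band = band == k
        if in_band.any():
            out[:, :, :, k] = hist_collapsed(x[:, :, in_band], x_edges)
    return out

rng = np.random.default_rng(0)
x = rng.uniform(-30.0, 20.0, size=(2, 3, 40))          # e.g. dBZ on 40 levels
z = np.linspace(0.0, 19500.0, 40)                       # level altitudes in m
x_edges = np.linspace(-30.0, 20.0, 6)                   # 5 bins of X
z2_edges = np.array([0.0, 5000.0, 10000.0, 20000.0])    # 3 bands of Z

f1 = hist_collapsed(x, x_edges)               # shape (2, 3, 5)
f2 = hist_per_band(x, z, x_edges, z2_edges)   # shape (2, 3, 5, 3)
```

In the second result each Z2 band sums to 1 over X on its own, which is what distinguishes a collection of 1D distributions from a joint histogram, where the sum over X and Z2 together would be 1.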

Regards,

Alejandro

> -Original Message-
> From: martin.juc...@stfc.ac.uk [mailto:martin.juc...@stfc.ac.uk]
> Sent: 12 October 2016 19:05
> To: cf-metadata@cgd.ucar.edu
> Cc: Bodas-Salcedo, Alejandro
> Subject: Usage of histogram_of_X_over_Z
> 
> Hello,
> 
> There are two standard names of the form histogram_of_... in the CF Standard
> Name list (at version 36):
> histogram_of_backscattering_ratio_over_height_above_reference_ellipsoid and
> histogram_of_equivalent_reflectivity_factor_over_height_above_reference_ellipsoid.
> Both of these were used in CMIP5 and are set to be used in CMIP6, but the
> usage does not appear to match the standard name descriptions.
> 
> The possible confusion is over the role of different coordinates. The CF
> definitions say '"histogram_of_X[_over_Z]" means histogram (i.e. number of
> counts for each range of X) of variations (over Z) of X.' This implies to me
> that you start with a function of Z and possibly other coordinates and end up
> with a function of X and the other coordinates. E.g. if the source data is
> X(lat,lon,Z), then the histogram data will be of the form
> frequency(lat,lon,X).
> 
> In the two CMIP5/CMIP6 draft variables (cfadLidarsr532, cfadDbze94) using
> these standard names the "Z" coordinate which is included in the standard
> name ("height_above_reference_ellipsoid") is one of the coordinates of the
> histogram data variable. Both these variables appear to be joint
> distributions (frequency of X and Y values) over sub-grid variability as a
> function of latitude, longitude and time.
> 
> I've been reviewing these existing definitions in some detail because there
> are some new distribution variables in the request and I'd like to make sure
> that we have a consistent approach.
> 
> If we need to describe a variable which carries a joint distribution of X and
> Y, then the variable will have to use X and Y as coordinates, so perhaps we
> can simplify the process by leaving them out of the standard name. Similarly
> the "over_Z" part of the name would be better expressed as a cell_methods
> construct. This line of reasoning suggests using a new standard name such as
> "frequency_distribution" (units "1"). The only difficulty is that the
> frequency distribution might be a function of the quantities X and Y
> (scattering ratio and cloud top height for cfadLidarsr532) and also of
> latitude, longitude and time. There should be some way of distinguishing the
> different roles of these 5 coordinates: it is the distribution of X and Y as
> a function of latitude, longitude and time. I think this could be done
> conveniently by introducing a single new attribute, e.g. "bin_coords: X Y".
> 
> "frequency_distribution" could be used for single or joint distributions.
> 
> My questions to the list are:
> (1) am I missing something in my interpretation of the existing
> histogram_of_... names?
> (2) if not, is the adoption of a "frequency_distribution" standard name an
> appropriate way forward?
> 
> regards,
> Martin
> 