-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Jaroslav Hajek wrote:
> On Fri, Mar 6, 2009 at 9:25 AM, Alois Schlögl <alois.schlo...@tugraz.at>
> wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> Jaroslav Hajek wrote:
>>> On Fri, Mar 6, 2009 at 8:09 AM, Alois Schlögl <alois.schlo...@tugraz.at>
>>> wrote:
>>>> -----BEGIN PGP SIGNED MESSAGE-----
>>>> Hash: SHA1
>>>>
>>>> Jaroslav Hajek wrote:
>>>>> On Thu, Mar 5, 2009 at 4:04 PM, Alois Schlögl <alois.schlo...@tugraz.at>
>>>>> wrote:
>>>>>> -----BEGIN PGP SIGNED MESSAGE-----
>>>>>> Hash: SHA1
>>>>>>
>>>>>> Jaroslav Hajek wrote:
>>>>>>> On Thu, Mar 5, 2009 at 12:02 PM, Alois Schlögl
>>>>>>> <alois.schlo...@tugraz.at> wrote:
>>>>>>>> -----BEGIN PGP SIGNED MESSAGE-----
>>>>>>>> Hash: SHA1
>>>>>>>>
>>>>>>>> Jaroslav Hajek wrote:
>>>>>>>>>> sumskipnan counts also the number of non-NaNs.
>>>>>>>>>> [s,c]=sumskipnan(...)
>>>>>>>>>>
>>>>>>>>>> computing both s and c in a single step is beneficial for estimating
>>>>>>>>>> mean, variance and other statistics.
>>>>>>>>>>
>>>>>>>>> well, you can do
>>>>>>>>>
>>>>>>>>> nans = isnan (x);
>>>>>>>>> x(nans) = 0;
>>>>>>>>> s = sum (x, dim);
>>>>>>>>> c = size (x, dim) - sum (nans);
>>>>>>>>>
>>>>>>>>> Not exactly as fast as doing it all in a single loop, but simplistic.
>>>>>>>> I guess, you meant
>>>>>>>> c = size (x, dim) - sum (nans,dim);
>>>>>>>>
>>>>>>>> In terms of simplicity,
>>>>>>>> [s,c]=sumskipnan(x,dim);
>>>>>>>> will win.
>>>>>>>>
>>>>>>> Depends on what you count in. I wrote the first from top of my head,
>>>>>>> whereas for the second I'd need to look up the syntax. But I don't
>>>>>>> have any fundamental objections against the existence of sumskipnan,
>>>>>>> of course.
>>>>>> Fine.
>>>>>>
>>>>>>>>>>> Besides, I think the fact that the NaN package shadows Octave's
>>>>>>>>>>> built-in functions is very dangerous and confusing, even though I
>>>>>>>>>>> understand the motivation. I think this package should not be
>>>>>>>>>>> installed by default.
>>>>>>>>>> Where do you see a danger? Please explain.
>>>>>>>>>>
>>>>>>>>> It seems that sometimes users (especially Windows users) get this
>>>>>>>>> package unknowingly loaded. Not that this is your fault, just that it
>>>>>>>>> probably shouldn't be on by default in distributions.
>>>>>>>>>
>>>>>>>>> The more painful issue is that it makes the package less attractive to
>>>>>>>>> use - for instance, if I want to use the nanmean function to get
>>>>>>>>> nan-free mean, but I *don't* want the built-in mean to be shadowed
>>>>>>>>> (because the replacement is slower).
>>>>>>>> Therefore, it would be nice to have a pre-compiled sumskipnan that
>>>>>>>> limits the performance hit. And there is certainly room for further
>>>>>>>> improvement.
>>>>>>> I don't want to limit it. I just don't want it to be there. I would
>>>>>>> like to be able to use *both* nanmean and the default mean at the same
>>>>>>> time.
>>>>>> And there are many others, like me for example, who do not want to
>>>>>> think about whether nanmean or mean is the proper function for a
>>>>>> specific problem.
>>>>>>
>>>>>> In case there are no NaNs, both yield the same result.
>>>>>> In the presence of NaNs, the default mean results in NaN, while a
>>>>>> perfectly valid result could be obtained.
>>>>>>
>>>>>> Or can you think of any reasonable problem where mean should propagate
>>>>>> the NaNs? I cannot. Consequently, there is no need to have both
>>>>>> nanmean and mean.
>>>>>>
>>>>> Just like Soren said, in most cases where NaN does not represent a
>>>>> missing value.
>>>> In statistics nobody is asking what the meaning of the NaN is. Ignoring
>>>> NaN is just the right thing to do.
>>>>
>>>> Again, I'm just talking about statistical functions, and do not
>>>> generalize this to other areas.
>>>>
>>> That's OK. But I may want to use both "statistical" mean and
>>> "non-statistical" in totally different areas of a single computation.
>>
>> Do you really have a case where you want the mean estimation to behave
>> differently than the statistical mean? That is, where NaNs should be
>> propagated?
>>
>
> You just think too statistically of the mean. I may well use "mean"
> just for its mathematical definition, that is, sum divided by count,
> completely unrelated to any statistics. For instance, to calculate the
> centroid of a simplex. In that case, skipping NaNs is complete
> nonsense because it will silently give a wrong result.
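
To make the centroid case concrete, here is a minimal sketch. The data and
variable names are my own illustration, and the NaN-skipping variant is
written out by hand; it is not the NaN-toolbox implementation. Neither
variant calls mean(), so the result does not depend on whether the built-in
is shadowed.

```octave
% Vertices of a triangle (a 2-simplex), one vertex per row. The second
% vertex contains a NaN, e.g. left over from a failed earlier step.
V = [0 0; NaN 3; 6 3];

% Plain definition of the mean: sum divided by count. The NaN propagates.
c_plain = sum(V, 1) / size(V, 1);

% NaN-skipping mean written out explicitly (illustrative only):
% zero out the NaNs and divide by the number of valid entries per column.
valid = ~isnan(V);
W = V;
W(~valid) = 0;
c_skip = sum(W, 1) ./ sum(valid, 1);

disp(c_plain)   % first coordinate is NaN, so the problem stays visible
disp(c_skip)    % [3 2]: finite, but not the centroid of any triangle
```

The skipping variant averages the first coordinate over two vertices and the
second over three, which is why its result is not a centroid at all.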

Fair example. It requires some explicit handling of NaNs. Let's look at
the case that raises an error:

    c = mean(x);
    if any(isnan(c)), error('NaN in mean'); end

With the NaN-skipping mean() you do

    if any(isnan(x(:))), error('NaN in input'); end
    c = mean(x);

In both cases you need to do something about the NaNs, e.g. some error
handling. Except for the performance issue, there is no disadvantage in
using the NaN-skipping mean().

One could also imagine addressing the performance issue by a change in
the interface (e.g. by raising a global flag):

    [c,N] = mean(x);
    if flag_nans_occured(), error('NaN in input'); end

Actually, flag_nans_occured() is now supported.

- ---

You might consider it an advantage that you can do the error checking
much later, e.g.

    c = mean(x);
    d = do_some_more(c);
    if any(isnan(d)), error('NaN in result'); end

However, this makes reading the code and finding the error more
difficult, because one cannot easily see which step is causing the NaN.

>
>> I'm asking because in 15+ years of using Matlab and Octave, I've never
>> found such a case. Maybe I can learn something new.
>
> See above.
>
>> Even in case NaN propagation is desired, I guess I'd prefer to have an
>> explicit check for NaNs in order to emphasize that special case and
>> make the code more readable. Again, I've never come across a case where I
>> needed the mean to propagate NaNs.
>>
>
> Same thing - you're just used to skipping NaNs in mean, others may not be.

Yes, currently we have two different approaches. That's good for
comparing them. I also understand that there is resistance to change -
that's just the way it is, and it's good because it provides a rather
stable system. However, this resistance should not stop one from adopting
new or better approaches once their advantages become clear.

>
>>> But the different NaN treatment is not actually that bad, I doubt
>>> anyone would notice (the performance hit may be noticeable, but it is
>>> also unlikely).
>> I'm aware that the performance hit might be a disadvantage in using the
>> NaN-toolbox (although the benchmark tests have not been widely applied).
>> I guess it's the major obstacle to wider application.
>>
>
> I can't judge that. Maybe most people are fine with it. In any case,
> I'm certainly free to not use the package if I don't like what it
> does. Besides, the functionality I was asking for (i.e. nanmean
> without shadowed mean) is provided by another package, so I just have
> no problem.
>
>> On the other hand, you gain in terms of programming effort:
>> (i) software does the right thing more often,
>
> depends...
>
>> (ii) it's less likely to fail due to NaN-related issues,
>
> depends...
>
>> (iii) it's more likely that users unaware of the NaN issue get it right
>> in the first place,
>
> and stay unaware... (if it's right, of course)
>
>> (iv) no need to think about whether nanmean or mean is the right function;
>> (v) of course always using nanmean() would also do, but it's nicer to
>> write only mean();
>>
>
> I strongly prefer to have different syntax for functions doing different
> things.
>
>> In my experience, these advantages outweigh the small performance
>> penalty. These are also the reasons why it was developed. Except for
>> compatibility tests, I've never found a need to turn off the NaN-toolbox.
>>
>
> Good for you :)
>
> cheers
>

Cheers,
  Alois
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkmxHBEACgkQzSlbmAlvEIhsfgCguFUSjwyJFat9M0dTJkzIhtxE
G9UAoJTHujFRjJzobem77EBmi300tcX4
=y397
-----END PGP SIGNATURE-----

_______________________________________________
Octave-dev mailing list
Octave-dev@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/octave-dev