-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Jaroslav Hajek wrote:
> On Fri, Mar 6, 2009 at 9:25 AM, Alois Schlögl <alois.schlo...@tugraz.at> 
> wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> Jaroslav Hajek wrote:
>>> On Fri, Mar 6, 2009 at 8:09 AM, Alois Schlögl <alois.schlo...@tugraz.at> 
>>> wrote:
>>>> -----BEGIN PGP SIGNED MESSAGE-----
>>>> Hash: SHA1
>>>>
>>>> Jaroslav Hajek wrote:
>>>>> On Thu, Mar 5, 2009 at 4:04 PM, Alois Schlögl <alois.schlo...@tugraz.at> 
>>>>> wrote:
>>>>>> -----BEGIN PGP SIGNED MESSAGE-----
>>>>>> Hash: SHA1
>>>>>>
>>>>>> Jaroslav Hajek wrote:
>>>>>>> On Thu, Mar 5, 2009 at 12:02 PM, Alois Schlögl 
>>>>>>> <alois.schlo...@tugraz.at> wrote:
>>>>>>>> -----BEGIN PGP SIGNED MESSAGE-----
>>>>>>>> Hash: SHA1
>>>>>>>>
>>>>>>>> Jaroslav Hajek wrote:
>>>>>>>>>> sumskipnan counts also the number of non-NaNs.
>>>>>>>>>> [s,c]=sumskipnan(...)
>>>>>>>>>>
>>>>>>>>>> computing both s and c in a single step is beneficial for estimating
>>>>>>>>>> mean, variance and other statistics.
>>>>>>>>>>
>>>>>>>>> well, you can do
>>>>>>>>>
>>>>>>>>> nans = isnan (x);
>>>>>>>>> x(nans) = 0;
>>>>>>>>> s = sum (x, dim);
>>>>>>>>> c = size (x, dim) - sum (nans);
>>>>>>>>>
>>>>>>>>> Not exactly as fast as doing it all in a single loop, but simplistic.
>>>>>>>> I guess, you meant
>>>>>>>>    c = size (x, dim) - sum (nans,dim);
>>>>>>>>
>>>>>>>> In terms of simplicity,
>>>>>>>>       [s,c]=sumskipnan(x,dim);
>>>>>>>> will win.
>>>>>>>>
>>>>>>> Depends on what you count in. I wrote the first from top of my head,
>>>>>>> whereas for the second I'd need to look up the syntax. But I don't
>>>>>>> have any fundamental objections against the existence of sumskipnan,
>>>>>>> of course.
>>>>>> Fine.
>>>>>>
>>>>>>>>>>> Besides, I think the fact that the NaN package shadows Octave's
>>>>>>>>>>> built-in functions is very dangerous and confusing, even though I
>>>>>>>>>>> understand the motivation. I think this package should not be
>>>>>>>>>>> installed by default.
>>>>>>>>>> Where do you see a danger ? Please explain.
>>>>>>>>>>
>>>>>>>>> It seems that sometimes users (especially windows users) get this
>>>>>>>>> package unknowingly loaded. Not that this is your fault, just that it
>>>>>>>>> probably shouldn't be on by default in distributions.
>>>>>>>>>
>>>>>>>>> The more painful issue is that it makes the package less attractive to
>>>>>>>>> use - for instance, if I want to use the nanmean function to get
>>>>>>>>> nan-free mean, but I *don't* want the built-in mean to be shadowed
>>>>>>>>> (because the replacement is slower).
>>>>>>>> Therefore, it would be nice to have a pre-compiled sumskipnan that
>>>>>>>> limits the performance hit. And there is certainly room for further
>>>>>>>> improvement.
>>>>>>> I don't want to limit it. I just don't want it to be there. I would
>>>>>>> like to be able to use *both* nanmean and the default mean at the same
>>>>>>> time.
>>>>>> And there are many others, like me for example, that do not want to
>>>>>> think about, whether nanmean or mean is the proper function for a
>>>>>> specific problem.
>>>>>>
>>>>>> In case there are no NaN's, both yield the same result.
>>>>>> In the presence of NaN's, the default mean results in NaN, while a
>>>>>> perfectly valid result could be obtained.
>>>>>>
>>>>>> Or can You think of any reasonable problem, when mean should propagate
>>>>>> the NaN's ? I can not. Consequently, there is no need to have both
>>>>>> nanmean and mean.
>>>>>>
>>>>> Just like Soren said, in most cases where NaN does not represent a
>>>>> missing value.
>>>> In statistics nobody is asking what the meaning of the NaN is. Ignoring
>>>> NaN is just the right thing to do.
>>>>
>>>> Again, I'm just talking about statistical functions, and do not
>>>> generalize this to other areas.
>>>>
>>> That's OK. But I may want to use both "statistical" mean and
>>> "non-statistical" in totally different areas of a single computation.
>>
>> Do you really have a case where you want the mean estimation to behave
>> differently than the statistical mean ? That is, where NaN's should be
>> propagated ?
>>
> 
> You just think too statistically of the mean. I may well use "mean"
> just for its mathematical definition, that is, sum divided by count,
> completely unrelated to any statistics. For instance, to calculate the
> centroid of a simplex. In that case, skipping NaNs is complete
> nonsense because it will silently give a wrong result.


Fair example. This example requires some explicit handling of NaN's.
Let's look at the case that raises an error:

c = mean(x);
if any(isnan(c))
        error('mean: result contains NaN');
end


With the NaN-skipping mean() you do

if any(isnan(x(:)))
        error('mean: input contains NaN');
end
c=mean(x);


In both cases you need to do something about the NaN's, e.g. some error
handling. Except for the performance issue, there is no disadvantage in
using the NaN-skipping mean().
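
A minimal sketch of the centroid case, under some assumptions: the triangle
data is made up, the variable names are only illustrative, and the skipping
mean is written out by hand (same idea as [s,c]=sumskipnan(x,dim), but not
the NaN package's code), so that it does not matter which mean() is on the
path.

```octave
% Vertices of a triangle, one per row; one coordinate is missing (made-up data).
x = [0 0; 4 0; NaN 3];

% NaN-skipping mean along dim 1, written out by hand
% (illustrative only, not the NaN package's implementation).
nans = isnan (x);
x0 = x;
x0(nans) = 0;
s = sum (x0, 1);
c = size (x, 1) - sum (nans, 1);
m_skip = s ./ c;

% Plain sum-divided-by-count mean: the NaN propagates into the result.
m_prop = sum (x, 1) / size (x, 1);

% Either way, a centroid needs an explicit check on the input.
has_nan = any (isnan (x(:)));

disp (m_skip);
disp (m_prop);
disp (has_nan);
```

Here m_skip contains no NaN although the first coordinate of the centroid is
unknown, while m_prop carries a NaN in that column. The check on x catches
the problem before either mean is used.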

And one could also imagine addressing the performance issue by a change
in the interface (e.g. by raising a global flag):

[c,N]=mean(x);
if flag_nans_occured(),
        error('mean: input contained NaN');
end

Actually, flag_nans_occured() is now supported.


---

You might consider it an advantage that you can do the error checking
much later, e.g.

c = mean(x);
d = do_some_more(c);
if any(isnan(d))
        error('do_some_more: result contains NaN');
end

However, this makes reading the code and finding the error more
difficult, because one cannot easily see which step is causing the NaN.
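
A rough sketch of checking right after each step instead, so that the
failing step is named. Assumptions: check_nan is a hypothetical helper (not
part of Octave or the NaN package), the data is made up, and "2 * c + 1"
merely stands in for do_some_more(c).

```octave
% Hypothetical helper: raise an error naming the step whose result has a NaN.
check_nan = @(v, step) assert (~any (isnan (v(:))), "NaN after step: %s", step);

x = [1 2 NaN 4];
msg = "";
try
  c = sum (x) / numel (x);        % propagating mean, written out by hand
  check_nan (c, "mean");          % fails here, where the NaN first appears
  d = 2 * c + 1;                  % stand-in for do_some_more(c)
  check_nan (d, "do_some_more");
catch err
  msg = err.message;              % keep the message for inspection
end
disp (msg);
```

With a single check on d at the end, the message could only say that
something went wrong somewhere between x and d.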


> 
>> I'm asking because in 15+ years of using Matlab and Octave, I've never
>> found such a case. Maybe I can learn something new.
> 
> See above.
> 
>> Even if NaN propagation is desired, I guess I'd prefer to have an
>> explicit check for NaN's in order to emphasize that special case and
>> make the code more readable. Again, I've never come across a case where I
>> needed the mean to propagate NaN's.
>>
> 
> Same thing - you're just used to skipping NaNs in mean, others may not be.


Yes, currently we have two different approaches. That's good for
comparing both approaches.

I understand also that there is resistance to changes - that's just the
way it is, and it's good because it provides a rather stable system.
However, this resistance should not stop one from adopting new/better
approaches once the advantages of the new approach become clear.


> 
>>> But the different NaN treatment is not actually that bad, I doubt
>>> anyone would notice (the performance hit may be noticeable, but it is
>>> also unlikely).
>> I'm aware that the performance hit might be a disadvantage in using the
>> NaN-toolbox (although the benchmark tests have not been widely applied).
>>  I guess it's the major obstacle to wider adoption.
>>
> 
> I can't judge that. Maybe most people are fine with it. In any case,
> I'm certainly free to not use the package if I don't like what it
> does. Besides, the functionality I was asking for (i.e. nanmean
> without shadowed mean) is provided by another package, so I just have
> no problem.
> 
>> On the other hand, you gain in terms of programming effort:
>> (i) the software does the right thing more often,
> 
> depends...
> 
>> (ii) it's less likely to fail due to NaN-related issues.
> 
> depends...
> 
>> (iii) it's more likely that users unaware of the NaN-issue get it right
>> in the first place,
> 
> and stay unaware... (if it's right, of course)
> 
>> (iv) no need to think about whether nanmean or mean is the right function;
>> (v) of course always using nanmean() would also do, but it's nicer to
>> write only mean();
>>
> 
> I strongly prefer to have different syntax for functions doing different 
> things.
> 
>> In my experience, these advantages outweigh the small performance
>> penalty. These are also the reasons, why it was developed. Except for
>> compatibility tests, I've never found a need to turn off the NaN-toolbox.
>>
> 
> Good for you :)
> 
> cheers
> 


Cheers,
  Alois

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkmxHBEACgkQzSlbmAlvEIhsfgCguFUSjwyJFat9M0dTJkzIhtxE
G9UAoJTHujFRjJzobem77EBmi300tcX4
=y397
-----END PGP SIGNATURE-----

_______________________________________________
Octave-dev mailing list
Octave-dev@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/octave-dev