Sun, 08 Mar 2009 at 09:40 +0100, Jaroslav Hajek wrote:
> 1. "use all" -> "all" etc - I think this is more Octavish

Agreed.

> 2. covariances of zero-length vectors are returned as NA. covariances
> of length 1 vectors are zero.

Makes sense.

> 3. vectorizing the "pairs" case was really tricky (due to NaN/Inf/NA
> issues), but I think I got there in the end. I welcome testing.

I tried the following:

  ## Create data
  data = rand (10, 2);
  na_data = data; na_data (6, 1) = na_data (7, 2) = NA;

  ## Compute covariances
  c1 = cov (na_data, "complete");
  c2 = cov (na_data, "pairs");

I get

  c1 =

     0.062607   0.042061
     0.042061   0.081121

which seems right, but

  c2 =

     NaN   NaN
     NaN   NaN

which doesn't really seem right.

> PS. this shows that for "cov", the penalty incurred by NA handling is
> nontrivial, especially for "pairs". Further, it is not clear which one
> of "complete" or "pairs" should be the default.

I actually think "all" should be the default, as this is the compatible
behaviour. This is also what R does, so statisticians should be happy.

[a couple of minutes later]

On modern processors NaN (and hence NA) handling is really slow. So, just
to get an idea of how this influences performance, I did

  octave:20> data = rand (10000, 20);
  octave:21> na_data = data; na_data (6, 1) = na_data (7, 2) = NA;
  octave:22> tic, cov (data); toc
  Elapsed time is 0.0366599 seconds.
  octave:23> tic, cov (na_data); toc
  Elapsed time is 0.216626 seconds.
  octave:24> tic, cov (na_data, "complete"); toc
  Elapsed time is 0.055954 seconds.

So, removing NA's actually speeds up the computation, while providing a
more sensible result. Of course, when no NA's are present, the cost of
checking for them still has to be paid. Hmm, now I'm not sure about the
default behaviour...

> I think this and
> Matlab/R compatibility sums up to just not care about missing values
> by default. For consistency, we should probably do the same for mean,
> std etc.
>
> Opinions?

I think the most important point of this thread is that it seems
reasonable/possible to skip NA's in statistical functions.
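To make the expected "pairs" result concrete, here is a minimal loop-based sketch of what I would expect it to compute, assuming "pairs" means that each entry of the covariance matrix uses only the rows where both columns are non-missing. The name pairwise_cov is hypothetical and is not part of the patch; it is slow and only meant as a reference to test the vectorized code against.

```octave
1;  ## make this a script file, so the function below can be defined inline

## Hypothetical reference implementation (NOT the patch under review):
## pairwise-complete covariance, computed one column pair at a time.
function c = pairwise_cov (x)
  n = columns (x);
  c = NA (n, n);                # pairs with no usable rows stay NA
  for i = 1:n
    for j = i:n
      ## Keep only the rows where both columns are observed.
      ## isnan is true for NA as well as for NaN.
      ok = ! (isnan (x(:, i)) | isnan (x(:, j)));
      m = sum (ok);
      if (m > 1)
        xi = x(ok, i) - mean (x(ok, i));
        xj = x(ok, j) - mean (x(ok, j));
        c(i, j) = c(j, i) = (xi' * xj) / (m - 1);
      elseif (m == 1)
        c(i, j) = c(j, i) = 0;  # length-1 vectors give zero, as in item 2
      endif
    endfor
  endfor
endfunction

## Same test data as above
data = rand (10, 2);
na_data = data; na_data (6, 1) = na_data (7, 2) = NA;
c2 = pairwise_cov (na_data);
```

With this data the diagonal entries are computed from 9 rows each and the off-diagonal entry from the 8 rows where neither column is NA, so none of the entries should come out as NaN.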
So, I guess it makes sense to discuss doing this at the maintainers list
to get a feel of the general opinion of doing this.

Søren