Hi Hannes,
octave-forge mailing list is the right place for
bugs and patches on the statistics package

-------- Original Message --------
Subject: Re: Problems with Statistics package
Date: Wed, 30 May 2012 11:06:42 +0200
From: Hannes <h2+lists2...@fsfe.org>
To: help-oct...@octave.org

Ok, I think I got it pinned downed and solved. I have attached a patch.

Heres the explenation:

  total_mean = mean (y(:));
  SSB = sum (group_count .* (group_mean - total_mean) .^ 2);

is always positive [sum of squares between]

  SST = sumsq (reshape (y, n, 1) - total_mean);

is always positive and >= SSB [sum of squares total]

  SSW = SST - SSB;

=> should always be >=0
but it is an ill conditioned operation if SST==SSB resulting in
possibly negative error. Note that if the error is positive, it
doesn't matter because it will produce f == Inf and p = 0 as expected.

  df_b = k - 1;
  df_w = n - k;
  v_b = SSB / df_b;
  v_w = SSW / df_w;

this can become negative, if the SSW is negative or 0 if SSW is zero
or close to zero.


  f = v_b / v_w;

this results in divide by zero if v_w is zero and results in *wrong
result without warning* if v_w is negative.


My patch changes this whole block to:

   SSB = sum (group_count .* (group_mean - total_mean) .^ 2);
   SST = sumsq (reshape (y, n, 1) - total_mean);
   SSW = SST - SSB;
   if (SSW < 0) # machine error if SSB == SSW
     SSW = 0;
   endif
   df_b = k - 1;
   df_w = n - k;
   v_b = SSB / df_b;
   v_w = SSW / df_w;
   if (v_w != 0)
     f = v_b / v_w;
     pval = 1 - fcdf (f, df_b, df_w);
   elseif (v_b == 0)
     f = NaN;
     pval = 1;

0 / 0 is  NaN and the hypothesis mus be rejected

   else
     f = Inf;
     pval = 0;

x / 0 is Inf and hypothesis must be accepted (because there is no
variance inside the sample but there is outside

   endif

HTH,
Hannes
--- /usr/share/octave/3.6.1/m/statistics/tests/anova.m	2012-04-27 01:06:51.000000000 +0800
+++ anova.m	2012-05-30 16:49:20.000000000 +0800
@@ -83,12 +83,23 @@
   SSB = sum (group_count .* (group_mean - total_mean) .^ 2);
   SST = sumsq (reshape (y, n, 1) - total_mean);
   SSW = SST - SSB;
+  if (SSW < 0) # machine error if SSB == SSW
+    SSW = 0;
+  endif
   df_b = k - 1;
   df_w = n - k;
   v_b = SSB / df_b;
   v_w = SSW / df_w;
-  f = v_b / v_w;
-  pval = 1 - fcdf (f, df_b, df_w);
+  if (v_w != 0)
+    f = v_b / v_w;
+    pval = 1 - fcdf (f, df_b, df_w);
+  elseif (v_b == 0)
+    f = NaN;
+    pval = 1;
+  else
+    f = Inf;
+    pval = 0;
+  endif
 
   if (nargout == 0)
     ## This eventually needs to be done more cleanly ...

_______________________________________________
Help-octave mailing list
help-oct...@octave.org
https://mailman.cae.wisc.edu/listinfo/help-octave

------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
Octave-dev mailing list
Octave-dev@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/octave-dev

Reply via email to