Bugs item #1585977, was opened at 2006-10-27 16:06
Message generated for change (Settings changed) made by warnes
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=453021&aid=1585977&group_id=48422

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
>Category: User Education
Group: None
>Status: Closed
>Resolution: Invalid
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
>Assigned to: Gregory Warnes (warnes)
Summary: some floating point values are being altered

Initial Comment:
I am using
        Python 2.3.5 (#2, Mar  6 2006, 10:12:24) [GCC 4.0.3 20060304 (prerelease) (Debian 4.0.2-10)] on linux2
        I do not know which version of RPy I am using; how can I find that information?
        I also do not know which version of R I am controlling with RPy; how can I find that?
        The manually-controlled R I'm using is R version 1.0 (v2004-10-14) on Mac OSX 10.3.9.

In a nutshell, it appears that some floating point values are being 
altered, either on the way from Py to R or back.

I am calling the R function ks.test from RPy, and occasionally I get 
negative p-values, which is not possible.  When I perform the identical 
command in R manually, I do not get the same result.  Further 
investigation showed other cases where the answers were different, but 
I will stick with this example.

----

In R:

data844 = c(0.98326834745, 0.98371832020, 0.98390990837, 0.98410972477,
            0.98448770184, 0.98459213417, 0.98838426859, 0.98853557083,
            0.98876263763, 0.98905877724, 0.98989187920)
ks.test(data844, "punif")

output:

                One-sample Kolmogorov-Smirnov test

        data:  data844
        D = 0.9833, p-value = 1.158e-09
        alternative hypothesis: two.sided

----

In python:

from rpy import *
data844 = [0.98326834745, 0.98371832020, 0.98390990837, 0.98410972477,
           0.98448770184, 0.98459213417, 0.98838426859, 0.98853557083,
           0.98876263763, 0.98905877724, 0.98989187920]
r.ks_test(data844, "punif")

output:

        {'data.name': ['c(0.98326834745, 0.9837183202, 0.98390990837, 0.98410972477, ',
                       '0.98448770184, 0.98459213417, 0.98838426859, 0.98853557083, 0.98876263763, ',
                       '0.98905877724, 0.9898918792)'],
         'alternative': 'two.sided',
         'method': 'One-sample Kolmogorov-Smirnov test',
         'p.value': -2.2204460492503131e-16,      <----- note difference from 1.158e-09
         'statistic': {'D': 0.98326834745000002}}

----

Note that my input vector contains highly precise values.  One guess is 
that the data going into R is truncated to less precision.  The problem 
with that theory is that, mathematically, we should never expect ks.test 
to produce a negative p-value.  But I can't vouch for the fact that R will 
never do that.
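One simple way to test the truncation theory is to compute the one-sample KS statistic D against Uniform(0,1) directly in Python and compare it with the 'statistic' value RPy returned. The sketch below is illustrative only: `ks_d_uniform` is a hypothetical helper (not part of RPy or R), written for Python 3 (true division), and it uses the standard formula D = max over i of max(i/n - x_(i), x_(i) - (i-1)/n) for sorted data.

```python
# Hypothetical helper (not part of RPy): one-sample Kolmogorov-Smirnov
# statistic D for data tested against the Uniform(0, 1) CDF, F(x) = x.
def ks_d_uniform(values):
    xs = sorted(values)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        # The empirical CDF jumps from (i-1)/n to i/n at x;
        # compare both sides of the jump with F(x) = x.
        d = max(d, i / n - x, x - (i - 1) / n)
    return d

data844 = [0.98326834745, 0.98371832020, 0.98390990837, 0.98410972477,
           0.98448770184, 0.98459213417, 0.98838426859, 0.98853557083,
           0.98876263763, 0.98905877724, 0.98989187920]

d = ks_d_uniform(data844)
print(d)
# Compare with the 'statistic' RPy returned, {'D': 0.98326834745000002}.
print(d == 0.98326834745000002)
```

For this data D equals the smallest value, and the comparison is exact, so at least that value made the trip into R and back without losing precision.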

My second guess is that the result somehow gets mis-converted on the
way back to Python.  A small conversion error would be acceptable, but
not one that changes the sign.

Are there any simple tests you can suggest I do that would help me 
figure out where the error is introduced?  I read the manual section 
(manual version 0.3.3, not necessarily in sync with the installed version 
here) on type conversion and I didn't see any mention of floats other 
than with respect to NaN and list-vs-tuple.

I can live with some conversion error.  The problem I have right now is 
that it's difficult to trust the results I'm getting, so I'm trying to figure 
out where the problem is, if it's something I can correct for, or detect, 
or what have you.

Thanks for any help,
Bob H


----------------------------------------------------------------------

Comment By: Gregory Warnes (warnes)
Date: 2006-10-30 15:38

Message:
Logged In: YES 
user_id=9316

Hi,

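ks.test returns an *object* of class 'htest'.  When you ask
R to print this object, it checks the value of the p.value
field before printing it.  If the value is smaller than the
machine epsilon (about 2.2e-16), which includes any negative
value, it prints "< 2.2e-16" instead of the number.  For example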

> tmp = ks.test(data844,"punif")
> tmp

        One-sample Kolmogorov-Smirnov test

data:  data844 
D = 0.9833, p-value < 2.2e-16
alternative hypothesis: two.sided 

But looking at the actual object itself you see:

> str(tmp)
List of 5
 $ statistic  : Named num 0.983
  ..- attr(*, "names")= chr "D"
 $ p.value    : num -2.22e-16
 $ alternative: chr "two.sided"
 $ method     : chr "One-sample Kolmogorov-Smirnov test"
 $ data.name  : chr "data844"
 - attr(*, "class")= chr "htest"

and asking for just the p-value component gives

> tmp$p.value
[1] -2.220446e-16

So, you need to check for negative values in your program
code and handle them appropriately, e.g. by treating them
as p < 2.2e-16.

-Greg
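In Python, the check Greg describes might look like the sketch below. The function name `report_pvalue` is hypothetical, and clamping negative values to 0 is an assumption about what "appropriately" means for a given analysis. The display rule roughly mirrors what R printed for this result: values below double-precision machine epsilon (about 2.2e-16) are shown as "< 2.2e-16" rather than as a number.

```python
import sys

EPS = sys.float_info.epsilon  # 2**-52, about 2.2e-16 (R's .Machine$double.eps)

def report_pvalue(p):
    """Hypothetical helper: clamp a p-value returned by RPy into [0, 1]
    and format it roughly the way R's printed test summary shows it."""
    if p != p:  # NaN is the only float not equal to itself
        raise ValueError("p-value is NaN")
    clamped = min(1.0, max(0.0, p))
    if clamped < EPS:
        # Tiny or negative values are reported as an upper bound, not a number.
        return clamped, "< %.2g" % EPS
    return clamped, "= %.4g" % clamped

p_value = -2.2204460492503131e-16  # value RPy returned for data844
print(report_pvalue(p_value))
```

Keeping the clamped number and the display string separate means downstream code can still test `p < 0.05` numerically while reports match what R prints.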

----------------------------------------------------------------------

Comment By: Peter (maubp)
Date: 2006-10-30 14:31

Message:
Logged In: YES 
user_id=259020

Confirmed behaviour - but it's not a bug in rpy!

I am using rpy 0.99.2 (if I recall correctly) with Windows
XP, R 2.3.1, Python 2.3

I tried several things including:

from rpy import *
data844 = [0.98326834745, 0.98371832020, 0.98390990837,
           0.98410972477, 0.98448770184, 0.98459213417,
           0.98838426859, 0.98853557083, 0.98876263763,
           0.98905877724, 0.98989187920]
result = r.ks_test(data844, "punif")
print(result["p.value"])

Giving:

-2.22044604925e-016

In R, as you say:

> ks.test(data844,"punif")

        One-sample Kolmogorov-Smirnov test

data:  data844 
D = 0.9833, p-value < 2.2e-16
alternative hypothesis: two.sided 

However, try this at the R command line:

> ks.test(data844,"punif")$p.value
[1] -2.220446e-16

Evidently ks.test stored the p-value internally as a
negative number (I have no idea why), and rpy is
converting it faithfully.

I.e., you have not found a bug in rpy.  You might have
found a bug in R, or perhaps an undocumented feature.
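As for why the stored value is negative, one possible explanation (not confirmed anywhere in this thread) is floating-point rounding: -2.2204460492503131e-16 is exactly the negative of double-precision machine epsilon (2**-52), which is what you get when a p-value is computed as 1 minus a probability that rounding has pushed just past 1.0. The sketch below only checks that arithmetic; it does not reproduce R's internal ks.test code.

```python
import sys

eps = sys.float_info.epsilon        # 2**-52 for IEEE 754 doubles
returned = -2.2204460492503131e-16  # p.value reported by both R and RPy

# The returned value is exactly minus one machine epsilon ...
print(returned == -eps)
# ... which equals 1.0 minus the next representable double above 1.0.
print(1.0 - (1.0 + eps) == returned)
```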

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2006-10-27 22:35

Message:
Logged In: NO 

I have some more info.

First, the R version controlled by RPy is 2.2.1.  The manual version I was 
using was R 2.0.1.

Second, part of the problem is that the result from ks.test is not always a
number.  Sometimes it is what I would call a relation.  I now have my hands
on a command-line version of R 2.2.1, and the command
ks.test(data844,"punif") gives me
     ...
    D = 0.9833, p-value < 2.2e-16
     ...

RPy seems to be converting the "< 2.2e-16" result into -2.2e-16.

Bob H


----------------------------------------------------------------------

_______________________________________________
rpy-list mailing list
rpy-list@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rpy-list
