Re: [Rd] `sort` hanging without R_CheckUserInterrupt()

2024-02-22 Thread Aidan Lakshman
Hi Martin,

Thanks for the quick response. I had observed the machine-dependent
behavior—my advisor initially asked me about this because this code
would be killed by the OS on his machine, and I wasn’t able to
replicate it on mine.

> R is *not* working at all at that
> point in time, it's waiting for the OS to feed memory space to R.

Ah, I should have suspected as much. That makes sense; of course there
can be long lag times when asking the OS to allocate many gigabytes of
space all at once.
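
For concreteness, the arithmetic for the example in my original post (a
numeric vector of length 2^31), as a quick R check:

```
2^31 * 8          # bytes in a length-2^31 double vector: 17179869184, ~17 billion
2^31 * 8 / 2^30   # exactly 16 GiB, the entire RAM of a 16 GB machine
```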

> R would be terribly slow if it allowed interruption everywhere.

Yep, agreed.

Thank you for the detailed answer, I appreciate it!

-Aidan


-------
Aidan Lakshman (he/him)
www.AHL27.com

On 22 Feb 2024, at 10:09, Martin Maechler wrote:

>>>>>> Aidan Lakshman
>>>>>> on Wed, 21 Feb 2024 15:10:35 -0500 writes:
>
> > Hi everyone,
> > Just a quick question/problem I encountered; I wanted to make sure this
> is known behavior. Running `sort` on a long vector can take quite a bit of
> time, and I found today that there don’t seem to be any calls to
> `R_CheckUserInterrupt()` during execution. Calling something like
> `sort((2^31):1)` takes a good bit of time to run on my machine and is
> uninterruptible without force-terminating the entire R session.
>
> > There doesn’t seem to be any mention in the help files that this method 
> is uninterruptible. All the methods called from `sortVector` in 
> `src/main/sort.c` lack checks for user interrupts as well.
>
> > My main question is, is this known behavior? Is it worth including a 
> warning in the help file? I don’t doubt that including a bunch of 
> `R_CheckUserInterrupt()` calls would really hurt performance, but it may be 
> possible to add an occasional call if `sort` is called on a long vector.
>
> > This may not even be a problem that affects people, which is my main 
> reason for inquiring.
>
> What you claim is partly incorrect.
> It depends very much on the platform you are using, and this
> case depends quite a bit on the amount of RAM it has,
> but sort() is definitely interruptible {read on, see later}:
>
> The reason that your interrupt does not happen for a while is
> that you are working with huge objects. For such objects, even
>    v <- v + 1
>
> typically takes several seconds... also depending on the
> platform. *And* R would be terribly slow if it allowed
> interruption everywhere.
> Also with such huge objects *and* when you are close to the RAM boundary,
> the computer starts swapping {easy to observe with a system
> monitor, e.g. `htop` on Linux} and such processes belong to the
> OS, not to R, so are typically *not* interruptible by just
> telling R to stop working: R is *not* working at all at that
> point in time, it's waiting for the OS to feed memory space to R.
>
>
> If I use my personal computer with 16 GB RAM, my process is even
> *killed* by the OS when I do   v <- v+1
> because my OS is Fedora Linux and it uses an OOM Daemon process
> (OOM = Out Of Memory) which kills processes if they start to eat
> most of the computer's RAM ... because the whole computer becomes
> unusable in such situations [yes, one can tweak the OOMD or
> disable it].  I assume your computer also has 16 GB RAM because
> that is really the critical size for *numeric* vectors of length 2^31:
> (numeric = double prec = 8 = 2^3 bytes).
>
>   > 2^34
>   [1] 17'179'869'184  # (the "'" added by MM) i.e. 17 billion
>
>   16 GB is roughly 16 billion bytes
>
> As soon as I switch to one of our powerful "compute clients"
> with several hundred giga bytes of RAM, everything behaves
> normally ... well if you are aware that 2^31 *is* large and
> hence slow by necessity, and almost *every* operation takes a
> few seconds.
>
> Here's a log on such a computer {using my package's
> sfsmisc::Sys.memGB(), not crucially}:
>
> ---
>
> R version 4.3.3 RC (2024-02-21 r85967) -- "Angel Food Cake"
> Copyright (C) 2024 The R Foundation for Statistical Computing
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> R is free software and comes with ABSOLUTELY NO WARRANTY.
> You are welcome to redistribute it under certain conditions.
> Type 'license()' or 'licence()' for distribution details.
>
>   Natural language support but running in an English locale
>
> R is a collaborative project with many contributors.
> Type 'contributors()' for more information and
> 'citation()' on how to cite R or R packages in publications.
>
> Type 'demo()' for some demos, 'help()' for on-line help, or
> 'help.start()' for an HTML browser interface to help.
> Type 'q()' to quit R.

[Rd] `sort` hanging without R_CheckUserInterrupt()

2024-02-21 Thread Aidan Lakshman
Hi everyone,

Just a quick question/problem I encountered; I wanted to make sure this is known 
behavior. Running `sort` on a long vector can take quite a bit of time, and I 
found today that there don’t seem to be any calls to `R_CheckUserInterrupt()` 
during execution. Calling something like `sort((2^31):1)` takes a good bit of 
time to run on my machine and is uninterruptible without force-terminating the 
entire R session.

There doesn’t seem to be any mention in the help files that this method is 
uninterruptible. All the methods called from `sortVector` in `src/main/sort.c` 
lack checks for user interrupts as well.

My main question is, is this known behavior? Is it worth including a warning in 
the help file? I don’t doubt that including a bunch of `R_CheckUserInterrupt()` 
calls would really hurt performance, but it may be possible to add an 
occasional call if `sort` is called on a long vector.

This may not even be a problem that affects people, which is my main reason for 
inquiring.

-Aidan



---
Aidan Lakshman (he/him)
www.AHL27.com



[Rd] Errors in wilcox family functions

2024-02-12 Thread Aidan Lakshman
Hi Everyone,

Following the previous discussion on optimizing the *wilcox functions, Andreas 
Loeffler brought to my attention a few other bugs in the `wilcox` family of 
functions. These issues have been discussed online in the past few months, but 
unless I missed an email, the discussion never made it to R-devel. I haven’t 
seen any bug reports on Bugzilla to this effect either, so I’m emailing the 
list to start discussion, with both Andreas and Andrey cc’d so they can provide 
additional detail.

Most of these issues have been raised by Andrey Akinshin on his blog, which I 
will link throughout for reproducible examples and code rather than making this 
email even longer than it already is. Of the following points, (1) and (2) could 
be considered bugs, and (3) and (4) enhancements. I wanted to reach out to 
determine whether R-devel is already aware of these, and if so, whether people 
are already working on them.

The current issues are the following:

1. `wilcox.test` returns very distorted p-values when `exact=FALSE`, especially 
at the tails of the distribution. These errors are due to inaccuracy in the 
normal approximation. While any approximation will have some error, an Edgeworth 
expansion correction would uniformly decrease the error in the p-value 
approximation without substantially rewriting the internals (a small comparison 
sketch follows after this list). An example patch and benchmarks are available at 
https://github.com/ahl27/R_Patches/tree/94e8e0bcf5076841637f1031ea9cf456ad18d7ef/EdgeworthExpansion.

More verbose details, examples, and benchmarks:
- https://aakinshin.net/posts/r-mann-whitney-incorrect-p-value/
- https://aakinshin.net/posts/mw-edgeworth/

2. The built-in Hodges-Lehmann estimator has some edge cases that result in 
incorrect answers without suitable warnings. These are detailed extensively at 
the links below, and reproduced in a short sketch after this list. In short, the 
following cases result in an incorrect value for 
`wilcox.test(..., conf.int=TRUE)$estimate`:
- `x` has zero values: these are trimmed before the median is calculated, so 
`x=c(0,1,2)` returns a median of 1.5.
- tied values in either one- or two-sample tests: these can force R to use a 
normal approximation even in low-sample cases.
- degenerate two-sample tests (`max(x)==min(x)` and `max(y)==min(y)`): this 
produces an error due to an unhandled edge case.

This particular issue has caused problems for the DescTools package, which 
recently reimplemented its `HodgesLehmann` estimator due to inaccurate results 
in base R. At the very least, warnings should probably be added so that users 
don’t misinterpret these results. Better still would be to fix these cases 
outright, but I haven’t dug very deep into the codebase yet, so I’m not sure 
how difficult that would be.

Details and examples:
- https://aakinshin.net/posts/r-hodges-lehmann-problems/
- Discussion in DescTools 
(https://github.com/AndriSignorell/DescTools/issues/97)

3. `*signrank` functions hang for large values of `n`. The exact threshold 
varies, but tends to be around `n>=1000`. `wilcox.test` supports an `exact` 
argument; should an inexact approximation be implemented for `*signrank` 
functions with `n>=1000`? An Edgeworth-corrected normal approximation could be 
similarly leveraged here to return results in reasonable time (a rough sketch 
follows after this list).

Full writeup: https://aakinshin.net/posts/signrank-limitations/

4. Suggestions for updating tie correction in Mann-Whitney U tests. Andrey has 
a very extensive writeup of the cases that make tie correction sometimes 
unintuitive, and I’m just going to link it here: 
https://aakinshin.net/posts/mw-confusing-tie-correction/.
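
To make (1) concrete, here is a minimal R sketch comparing the exact tail
p-value with a plain continuity-corrected normal approximation (mean mn/2,
variance mn(m+n+1)/12). This is a simplification of the `exact=FALSE` path
(which also applies a tie correction), not the proposed Edgeworth patch itself:

```
m <- n <- 30
u <- 130  # deep in the lower tail, where the approximation degrades
exact  <- pwilcox(u, m, n)
approx <- pnorm((u + 0.5 - m * n / 2) / sqrt(m * n * (m + n + 1) / 12))
c(exact = exact, normal = approx, relative_error = approx / exact - 1)
```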
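
For (2), the zero-trimming and degenerate cases can be reproduced directly
(behavior as reported above; exact output may vary by R version):

```
## zeros are trimmed before the estimate is computed, so the
## pseudomedian of c(0, 1, 2) comes out as 1.5 rather than 1
wilcox.test(c(0, 1, 2), conf.int = TRUE)$estimate

## degenerate two-sample case (max(x)==min(x), max(y)==min(y)): errors
wilcox.test(c(1, 1, 1), c(2, 2, 2), conf.int = TRUE)
```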
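
For (3), a plain normal approximation from the known moments of the signed-rank
distribution (mean n(n+1)/4, variance n(n+1)(2n+1)/24) is easy to sketch; this
is a rough illustration of the idea, not a proposed patch, and an Edgeworth
term would refine it:

```
psignrank_approx <- function(q, n) {
  mu <- n * (n + 1) / 4
  sigma <- sqrt(n * (n + 1) * (2 * n + 1) / 24)
  pnorm(q + 0.5, mean = mu, sd = sigma)  # continuity-corrected
}
## sanity check against the exact function at a size where both still work:
psignrank_approx(500, 50) - psignrank(500, 50)
```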

If these seem like they’d be welcome improvements to R, I’ll work with Andrey 
on putting some patches up on Bugzilla this week. If these are already known 
and have been discussed and I just somehow missed it, then I apologize for the 
verbose email.

Thanks again to Andrey for the writeup, and Andreas for pointing me to it.

-Aidan

---
Aidan Lakshman (he/him)
www.AHL27.com



Re: [Rd] round.Date and trunc.Date not working / implemented

2024-02-08 Thread Aidan Lakshman
I just wanted to add a few observations from trying to replicate this.
`trunc` does indeed work as advertised here, as does calling
`round.POSIXt(Sys.Date())` and `round(Sys.Date())`. These functions are
definitely implemented.

However, I am also able to replicate Jiří’s error:

```
> round(Sys.Date())
[1] "2024-02-08"
> round(Sys.time(), 'years')
[1] "2024-01-01 EST"
> round(Sys.Date(), 'years')
Error in round.default(19761, "years") :
  non-numeric argument to mathematical function
```

Specifying `units="years"` causes errors — either an error from named argument
(`unused argument units="years"`) or the above error when the argument is 
unnamed.
The error they’re experiencing isn’t actually from `round.Date`, it’s from 
trying
to call `round("years")`, which is done implicitly when `"years"` is provided 
as an
unnamed second argument to `round()`.

I suspect that the original bug report is referencing this behavior. Yes, it is
correct that `round.Date` does not accept a `units` parameter, both as
implemented and as specified in the help file. However, it does feel a little
odd that `round.POSIXt` accepts a `units` parameter while `round.Date` does
not. Adding that capability would be fairly simple, unless there are other
reasons why it isn’t implemented. Just my quick thoughts.
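
For illustration, a minimal sketch of what a units-aware `round.Date` might
look like, delegating to the existing POSIXt method. This is hypothetical, not
the actual base R implementation, and time-zone handling would need more care:

```
round.Date <- function(x, units = NULL, ...) {
  if (is.null(units)) return(x)  # Dates are whole days already
  as.Date(round(as.POSIXlt(x, tz = "UTC"), units))
}
round(Sys.Date(), "years")  # "2024-01-01" at the time of this thread
```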

-Aidan

---
Aidan Lakshman (he/him)
http://www.ahl27.com/

On 8 Feb 2024, at 9:36, Olivier Benz via R-devel wrote:

>> On 8 Feb 2024, at 15:15, Martin Maechler  wrote:
>>
>>>>>>> Jiří Moravec
>>>>>>> on Wed, 7 Feb 2024 10:23:15 +1300 writes:
>>
>>> This is my first time working with dates, so if the answer is "Duh, work
>>> with POSIXt", please ignore it.
>>
>>> Why is not `round.Date` and `trunc.Date` "implemented" for `Date`?
>>
>>> Is this because `Date` is (mostly) a virtual class setup for a better
>>> inheritance or is that something that is just missing? (like
>>> `sort.data.frame`). Would R core welcome a patch?
>>
>>> I decided to convert some dates to date using `as.Date` function, which
>>> converts to a plain `Date` class, because that felt natural.
>>
>>> But then when trying to round to closest year, I have realized that the
>>> `round` and `trunc` for `Date` do not behave as for `POSIXt`.
>>
>>> I would assume that these will have equivalent output:
>>
>>> Sys.time() |> round("years") # 2024-01-01 NZDT
>>
>>> Sys.Date() |> round("years") # Error in round.default(...): non-numeric
>>> argument to mathematical function
>>
>>
>>> Looking at the code (and reading the documentation more carefully) shows
>>> the issue, but this looks like an omission that should be patched.
>>
>>> -- Jirka
>>
>> You are wrong:  They *are* implemented,
>> both even visible since they are in the 'base' package!
>>
>> ==> they have help pages you can read 
>>
>> Here are examples:
>>
>>> trunc(Sys.Date())
>> [1] "2024-02-08"
>>> trunc(Sys.Date(), "month")
>> [1] "2024-02-01"
>>> trunc(Sys.Date(), "year")
>> [1] "2024-01-01"
>>>
>>
>
> Maybe he meant
>
> r$> Sys.time() |> round.POSIXt("years")
> [1] "2024-01-01 CET"
>
> r$> Sys.Date() |> round.POSIXt("years")
> [1] "2024-01-01 UTC"
>
> The only difference is the timezone
>


Re: [Rd] cwilcox - new version

2024-01-17 Thread Aidan Lakshman
Hi everyone,

I’ve opened a Bugzilla report for Andreas with the most recent implementation 
here: https://bugs.r-project.org/show_bug.cgi?id=18655. Feedback would be 
greatly appreciated.


The most straightforward approach is likely to implement both methods and 
determine which to use based on population sizes. The cutoff at n=50 is very 
sharp; it would be a large improvement to call Andreas’s algorithm when 
either population is larger than 50, and use the current method otherwise.
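
In R-level pseudocode, the dispatch idea is just the following; `qwilcox_old`
and `qwilcox_new` are hypothetical stand-ins for the two C implementations
(here both aliased to the shipped `qwilcox` so the sketch runs):

```
qwilcox_old <- qwilcox_new <- function(p, m, n) qwilcox(p, m, n)  # stand-ins
qwilcox_dispatch <- function(p, m, n) {
  if (max(m, n) > 50L)
    qwilcox_new(p, m, n)   # Andreas's algorithm: better past the cutoff
  else
    qwilcox_old(p, m, n)   # current algorithm: faster for small populations
}
qwilcox_dispatch(0.5, 50, 51)
```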

For the Bugzilla report I’ve only submitted the new version for benchmarking 
purposes. I think that if there is a way to improve this algorithm such that it 
matches the performance of the current version for population sizes under 50, 
that would be significantly cleaner than having two algorithms with an 
internal switch.

As for remaining performance improvements:

1. cwilcox_sigma is definitely a performance loss. It would improve performance 
to instead just loop from 1 to min(m, sqrt(k)) and from n+1 to min(m+n, 
sqrt(k)), since the formula just finds potential factors of k (see the sketch 
after this list). Maybe there are other ways to improve this, but factorization 
is a notoriously intensive problem, so further optimization may be intractable.

2. Calculation of the distribution values has quadratic scaling. Maybe there’s 
a way to optimize that further? See lines 91-103 in the most recent version.
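
As a reference point for (1), a hedged R sketch of the divisor-sum term as I
read it from the posted C code and its description; the second range (n+1 to
n+m) is my assumption, since the posted listing is truncated:

```
sigma_term <- function(k, m, n) {
  d <- seq_len(n + m)
  d <- d[k %% d == 0]               # divisors of k up to n+m
  sum(d[d <= m]) - sum(d[d > n])    # add the 1..m range, subtract (n+1)..(n+m)
}
sigma_term(10, 5, 5)
```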

Regardless of runtime, memory usage is certainly improved. For population sizes 
m and n, the current version has memory complexity O((mn)^2), whereas Andreas’s 
version has complexity O(mn). Running `qwilcox(0.5,500,500)` crashes my R 
session with the old version, but runs successfully in about 10s with the new 
version.
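
A back-of-the-envelope check of those figures for `qwilcox(0.5, 500, 500)`,
assuming 8-byte entries (a rough asymptotic illustration, not exact allocation
sizes):

```
m <- n <- 500
(m * n)^2 * 8 / 2^30   # old, O((mn)^2): ~466 GiB, hence the crash
m * n * 8 / 2^20       # new, O(mn): ~1.9 MiB
```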

I’ve written up all the information so far on the Bugzilla report, and I’m sure 
Andreas will add more information if necessary when his account is approved. 
Thanks again to Andreas for introducing this algorithm—I’m hopeful that this is 
able to improve performance of the wilcox functions.

-Aidan


---
Aidan Lakshman (he/him)
PhD Candidate, Wright Lab
University of Pittsburgh School of Medicine
Department of Biomedical Informatics
www.AHL27.com
ah...@pitt.edu | (724) 612-9940

On 17 Jan 2024, at 5:55, Andreas Löffler wrote:

>>
>>
>> Performance statistics are interesting. If we assume the two populations
>> have a total of `m` members, then this implementation runs slightly slower
>> for m < 20, and much slower for 50 < m < 100. However, this implementation
>> works significantly *faster* for m > 200. The breakpoint is precisely when
>> each population has a size of 50; `qwilcox(0.5,50,50)` runs in 8
>> microseconds in the current version, but `qwilcox(0.5, 50, 51)` runs in 5
>> milliseconds. The new version runs in roughly 1 millisecond for both. This
>> is probably because of internal logic that requires many more `free/calloc`
>> calls if either population is larger than `WILCOX_MAX`, which is set to 50.
>>
> Also because cwilcox_sigma has to be evaluated, and this is slightly more
> demanding since it uses k%d.
>
> There is a tradeoff here between memory usage and execution time. I am
> not a heavy user of the U test, but I think the typical use case does not
> involve several hundred tests in a session, so execution time (my 2
> cents) is less important. But if R crashes, even one execution is
> problematic.
>
> But the takeaway is probably: we should implement both approaches in the
> code and leave it to the user which one she prefers. If time is important,
> memory is not an issue, and m and n are low, go for the "traditional
> approach". Otherwise, use my formula?
>
> PS (@Aidan): I applied for a Bugzilla account two days ago and have not
> heard back from them. My spam folder is empty, too. Is that ok, or shall
> I do something?



Re: [Rd] cwilcox - new version

2024-01-16 Thread Aidan Lakshman
I’ve been looking at this for a couple of hours; it ended up being trickier to 
implement well than I expected.

I’ve attached a new patch here. This version scales significantly better than 
the existing method for both `pwilcox` and `dwilcox`. Examples are included 
below.

I can’t think of any other ways to improve runtime at this point. That’s not to 
say there aren’t any, though; I’m hopeful that inspection by someone else could 
identify more ways to save time.

This version uses Andreas’s algorithm for computing the value of `cwilcox`, but 
evaluates it lazily and caches results. Memory usage of this should be orders 
of magnitude less than the current version, and recursion is eliminated. As far 
as I can tell, the numerical accuracy of the results is unchanged.
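
To illustrate the lazy-plus-caching idea in R terms, here is a toy memoization
of the classic three-argument `cwilcox` recurrence (the one quoted later in
this thread); an illustration only, not the actual C patch:

```
cwilcox_memo <- local({
  cache <- new.env(hash = TRUE)
  function(k, m, n) {
    u <- m * n
    if (k < 0 || k > u) return(0)
    if (k > u / 2) k <- u - k                        # symmetry of the distribution
    if (m == 0 || n == 0) return(as.numeric(k == 0))
    key <- paste(k, m, n)
    if (exists(key, envir = cache, inherits = FALSE))
      return(get(key, envir = cache))                # cached: computed only once
    v <- Recall(k - n, m - 1, n) + Recall(k, m, n - 1)  # the standard recurrence
    assign(key, v, envir = cache)
    v
  }
})
cwilcox_memo(50, 10, 10) / choose(20, 10)  # equals dwilcox(50, 10, 10)
```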

Performance statistics are interesting. If we assume the two populations have a 
total of `m` members, then this implementation runs slightly slower for m < 20, 
and much slower for 50 < m < 100. However, this implementation works 
significantly *faster* for m > 200. The breakpoint is precisely when either 
population exceeds a size of 50: `qwilcox(0.5,50,50)` runs in 8 microseconds in 
the current version, but `qwilcox(0.5, 50, 51)` runs in 5 milliseconds. The new 
version runs in roughly 1 millisecond for both. This is probably because of 
internal logic that requires many more `free/calloc` calls if either population 
is larger than `WILCOX_MAX`, which is set to 50.

I’m hopeful that this can be optimized to be suitable for inclusion in R. Lower 
performance for population sizes below 50 is not ideal, since `wilcox.test` 
switches to non-exact testing for population sizes above 50.

-Aidan

Benchmarking results on my machine using `microbenchmark`:
(median runtimes over 100 iterations, us=microseconds, ms=milliseconds, 
s=seconds)

`qwilcox(0.5, n, n)`
               n:      10      25      50     100      200
old version:        1.2us   2.9us   9.0us  87.4ms     2.3s
Andreas version:    2.7us  68.6us   1.1ms  16.9ms  264.3ms

`dwilcox((n*n)%/%2, n, n)`
               n:      10      25      50     100      200
old version:        1.4us   0.9us   0.9us  43.2ms  851.6ms
Andreas version:    2.3us  53.9us   1.0ms  16.4ms  261.7ms

`pwilcox(1:100, 10, 10)`
old version:       62.9us
Andreas version:   22.9us
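
For reference, the medians above were gathered along these lines (a sketch;
assumes the `microbenchmark` package is installed):

```
library(microbenchmark)
microbenchmark(
  qwilcox(0.5, 50, 50),   # last size handled by the WILCOX_MAX fast path
  qwilcox(0.5, 50, 51),   # first size past it
  times = 100
)
```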


---
Aidan Lakshman (he/him)
PhD Candidate, Wright Lab
University of Pittsburgh School of Medicine
Department of Biomedical Informatics
http://www.ahl27.com/
ah...@pitt.edu | (724) 612-9940

On 16 Jan 2024, at 9:47, Aidan Lakshman wrote:

> New email client, hoping that Outlook won’t continue to HTMLify my emails and 
> mess up my attachments without my consent :/
>
> I’m re-attaching the patch file here, compressed as a .gz in case that was 
> the issue with the previous attachment. Thanks to Ivan for pointing out that 
> my attachment didn’t go through correctly on the first try.
>
> I’ll be working on seeing if I can optimize this to actually improve 
> performance during the day today.
>
> -Aidan
>
> On 15 Jan 2024, at 5:51, Andreas Löffler wrote:
>
>> I am a newbie to C (although I did some programming in Perl and Pascal) so
>> forgive me for the language that follows. I am writing because I have a
>> question concerning the *implementation* of the internal function
>> *cwilcox* in
>> https://svn.r-project.org/R/!svn/bc/81274/branches/R-DL/src/nmath/wilcox.c.
>> This function is used to determine the test statistics of the
>> Wilcoxon-Mann-Whitney U-test.
>>
>> ___
>> The function `cwilcox` has three inputs: k, m and n. To establish the test
>> statistics `cwilcox` determines the number of possible sums of natural
>> numbers with the following restrictions:
>> - the sum of the elements must be k,
>> - the elements (summands) must be in a decreasing (or increasing) order,
>> - every element must be less than m and
>> - there must be at most n summands.
>>
>> The problem with the evaluation of this function `cwilcox(k,m,n)` is that
>> it requires a recursion where in fact a three-dimensional array has to be
>> evaluated (see the code around line 157). This requires a lot of memory and
>> takes time, and still seems to be an issue even on newer machines; see the
>> warning in the documentation
>> https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/wilcox.test
>> .
>>
>> In an old paper I derived a formula with which the evaluation can be done
>> in a one-dimensional array, using far less memory and running much faster.
>> The paper itself is in German (you find a copy here:
>> https://upload.wikimedia.org/wikipedia/commons/f/f5/LoefflerWilcoxonMannWhitneyTest.pdf),
>>

Re: [Rd] cwilcox - new version

2024-01-16 Thread Aidan Lakshman
New email client, hoping that Outlook won’t continue to HTMLify my emails and 
mess up my attachments without my consent :/

I’m re-attaching the patch file here, compressed as a .gz in case that was the 
issue with the previous attachment. Thanks to Ivan for pointing out that my 
attachment didn’t go through correctly on the first try.

I’ll be working on seeing if I can optimize this to actually improve 
performance during the day today.

-Aidan

On 15 Jan 2024, at 5:51, Andreas Löffler wrote:

> I am a newbie to C (although I did some programming in Perl and Pascal) so
> forgive me for the language that follows. I am writing because I have a
> question concerning the *implementation* of the internal function
> *cwilcox* in
> https://svn.r-project.org/R/!svn/bc/81274/branches/R-DL/src/nmath/wilcox.c.
> This function is used to determine the test statistics of the
> Wilcoxon-Mann-Whitney U-test.
>
> ___
> The function `cwilcox` has three inputs: k, m and n. To establish the test
> statistics `cwilcox` determines the number of possible sums of natural
> numbers with the following restrictions:
> - the sum of the elements must be k,
> - the elements (summands) must be in a decreasing (or increasing) order,
> - every element must be less than m and
> - there must be at most n summands.
>
> The problem with the evaluation of this function `cwilcox(k,m,n)` is that
> it requires a recursion where in fact a three-dimensional array has to be
> evaluated (see the code around line 157). This requires a lot of memory and
> takes time, and still seems to be an issue even on newer machines; see the
> warning in the documentation
> https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/wilcox.test
> .
>
> In an old paper I derived a formula with which the evaluation can be done
> in a one-dimensional array, using far less memory and running much faster.
> The paper itself is in German (you find a copy here:
> https://upload.wikimedia.org/wikipedia/commons/f/f5/LoefflerWilcoxonMannWhitneyTest.pdf),
> so I uploaded a translation into English (to be found in
> https://upload.wikimedia.org/wikipedia/de/1/19/MannWhitney_151102.pdf).
>
> I also looked at the code in R and wrote a new version of the code that
> uses my formulae so that a faster implementation is possible (code with
> comments is below). I have several questions regarding the code:
>
>1. A lot of commands in the original R code are used to handle the
>memory. I do not know how to do this, so I skipped memory handling
>completely and simply allocated space to an array (see
>`int cwilcox_distr[(m*n+1)/2];` below). Since my array is 1-dimensional
>instead of 3-dimensional, I think as a first guess that will be ok.
>2. I read the documentation on when and why code in R should be changed.
>I am not familiar enough with R to understand how this applies here. My
>code uses less memory - is that enough? Or should another function be
>defined that uses my code? Or shall it be disregarded?
>3. I was not able to file my ideas on bugzilla.r-project.org or the like,
>and I hope that this mailing list is a good address for my question.
>
> I also have written an example of a main function where the original code
> from R is compared to my code. I do not attach this example because this
> email is already very long but I can provide that example if needed.
>
> Maybe we can discuss this further. Best wishes, Andreas Löffler.
>
>
> ```
> #include 
> #include 
>
> /* The traditional approach to determining the Mann-Whitney statistics uses a
> recursion formula for partitions of natural numbers, namely in the line
> w[i][j][k] = cwilcox(k - j, i - 1, j) + cwilcox(k, i, j - 1);
> (taken from
> https://svn.r-project.org/R/!svn/bc/81274/branches/R-DL/src/nmath/wilcox.c).
> This approach requires a huge number of partitions to be evaluated because
> the second variable (j in the left term) and the third variable (k in the
> left term) in this recursion are not constant but change as well. Hence, a
> three dimensional array is evaluated.
>
> In an old paper a recursion equation was shown that avoids this
> disadvantage. The recursion equation of that paper uses only an array where
> the second as well as the third variable remain constant. This implies
> faster evaluation and less memory use. The original paper is in German and
> can be found in
> https://upload.wikimedia.org/wikipedia/commons/f/f5/LoefflerWilcoxonMannWhitneyTest.pdf
> and the author has uploaded a translation into English in
> https://upload.wikimedia.org/wikipedia/de/1/19/MannWhitney_151102.pdf. This
> function uses this approach. */
>
> static int
> cwilcox_sigma(int k, int m, int n) { /* this relates to the sigma function below */
>     int s, d;
>
>     s = 0;
>     /* sum every divisor d of k with 1 <= d <= m */
>     for (d = 1; d <= m; d++) {
>         if (k % d == 0) {
>             s = s + d;
>         }
>     }
>     for (d =