subject:"Re\: \[Rd\] stats\:\:fft produces inconsistent results"

Re: [Rd] stats::fft produces inconsistent results

2021-10-21 Thread Ben Bolker

 
have to be allocated for more time. A call to R_PreserveObject 
protects the object, even after the C code returns to R, until 
R_ReleaseObject is called. Without an explicit call to 
R_ReleaseObject, memory is leaked!


There is another mechanism in R that must be known. If you call any R 
function from C code, or any internal R function that may fail with an 
error, or any internal R function that can be stopped by the user (see 
R_CheckUserInterrupt), then, R may call a longjmp to exit all the C 
code. This is very much incompatible with C++ exceptions or 
constructors/destructors. Rcpp can avoid, to some extent, that problem.


With C code, this means that some malloc'ed memory or allocated 
resources (file descriptors, sockets, etc.) may be leaked unless 
something is done to prevent that. All PROTECT'ed objects are 
automatically unprotected, so there is no problem with memory leak of 
garbage collectable objects. There is a R_UnwindProtect() mechanism to 
free temporary resources (e.g. a socket you allocated) when a longjmp 
is triggered. Non-memory resources (e.g. a socket) returned to R 
should use theR_MakeExternalPtr() mechanism to make sure that, when 
the memory is freed by the garbage collector, the resource is also freed.


"Writing R extensions" contains more extensive documentation, but I 
hope that my quick description of the system will make it easier to 
understand the extensive documentation.


--
Sincèrement
André GILLIBERT

*De :*Dipterix Wang <mailto:dipterix.w...@gmail.com>>

*Envoyé :*mercredi 20 octobre 2021 22:01
*À :*Martin Maechler <mailto:maech...@stat.math.ethz.ch>>; GILLIBERT, Andre 
<mailto:andre.gillib...@chu-rouen.fr>>;bbol...@gmail.com 
<mailto:bbol...@gmail.com>

*Cc :*r-devel@r-project.org <mailto:r-devel@r-project.org>
*Objet :*Re: [Rd] stats::fft produces inconsistent results

ATTENTION:Cet e-mail provient d’une adresse mail extérieure au CHU de 
Rouen. Ne cliquez pas sur les liens ou n'ouvrez pas les pièces jointes 
à moins de connaître l'expéditeur et de savoir que le contenu est sûr. 
En cas de doute, transférer le mail à « DSI, Sécurité » pour analyse. 
Merci de votre vigilance


Wow, you guys are amazing!

as part of its pipeline, ravetools::mvfftw computes the
mean of the
input vector **and then centers it to a mean of zero**
(intentionally or
accidentally?)



because variables are passed to compiled code by reference
(someone
can feel free to correct my terminology), this means that
the original
vector in R now has a mean of zero

I didn’t center the input vector in my code. The data was fed “as-is” 
into FFTW3. My guess is FFTW3 internally center the data. It could be 
that FFTW3 library behave differently on different platforms, which 
could explain the reproducibility issue.


/"Indeed, R vectors are passed "by reference" to C code,
but the semantic must be "by value", i.e. the C function
must NOT change the contents of the vector, except in very
specific cases.”/

CRAN has already had fftw and fftwtools, the issue is the data I’m 
targeting at are at GB-level, copying the vectors can be memory 
inefficient or even use up memories. The strategy of ravetools is to 
import signals from local files, fft, then directly write to disk. So 
only one reference will be used and modifying in-place is on purpose. 
In fact, and the fft functions I created are not intended to be used 
directly by users.


However, I should've been very cautious when using these functions. 
This is my fault. I’ll check the whole package to make sure only one 
reference is used or otherwise the vectors will be copied.


/This can be tested by the MAYBE_REFERENCED() macro in
Rinternals.h./

Nice to learn! I’ll add it to my code.

Properly using R vectors in C code is tricky. You have to
understand.

1) When you are allowed or not to modify vectors

2) When to PROTECT()vectors

3) How the garbage collector works and when it can trigger
(answer : basically, when you call any internal R function)

Chapter 5 of "Writing R Extensions" documentation is quite
extensive:


https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Handling-R-objects-in-C<https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Handling-R-objects-in-C>

Indeed, I found myself often confused about when to PROTECT and when not.

... but then ravetools  is not even a CRAN package, so
why should you dare to use it for anything serious ?

Haha, thanks : ) I guess I will probably be grouchy too if seeing so 
many people making the same mistakes again and again. It just happened 
to be me.
But it’s good

Re: [Rd] stats::fft produces inconsistent results

2021-10-21 Thread Dipterix Wang

t; memory they use internally except the object that is returned. Sometimes, 
> some "background" memory, hidden to R code, may have to be allocated for more 
> time. A call to R_PreserveObject protects the object, even after the C code 
> returns to R, until R_ReleaseObject is called. Without an explicit call to 
> R_ReleaseObject, memory is leaked!
> 
> There is another mechanism in R that must be known. If you call any R 
> function from C code, or any internal R function that may fail with an error, 
> or any internal R function that can be stopped by the user (see 
> R_CheckUserInterrupt), then, R may call a longjmp to exit all the C code. 
> This is very much incompatible with C++ exceptions or 
> constructors/destructors. Rcpp can avoid, to some extent, that problem.
> 
> With C code, this means that some malloc'ed memory or allocated resources 
> (file descriptors, sockets, etc.) may be leaked unless something is done to 
> prevent that. All PROTECT'ed objects are automatically unprotected, so there 
> is no problem with memory leak of garbage collectable objects. There is a 
> R_UnwindProtect() mechanism to free temporary resources (e.g. a socket you 
> allocated) when a longjmp is triggered. Non-memory resources (e.g. a socket) 
> returned to R should use the R_MakeExternalPtr() mechanism to make sure that, 
> when the memory is freed by the garbage collector, the resource is also freed.
> 
> "Writing R extensions" contains more extensive documentation, but I hope that 
> my quick description of the system will make it easier to understand the 
> extensive documentation.
> 
>  -- <>
> Sincèrement
> André GILLIBERT
>  
> De : <> Dipterix Wang  <mailto:dipterix.w...@gmail.com>> 
> Envoyé : mercredi 20 octobre 2021 22:01
> À : Martin Maechler  <mailto:maech...@stat.math.ethz.ch>>; GILLIBERT, Andre 
> mailto:andre.gillib...@chu-rouen.fr>>; 
> bbol...@gmail.com <mailto:bbol...@gmail.com>
> Cc : r-devel@r-project.org <mailto:r-devel@r-project.org>
> Objet : Re: [Rd] stats::fft produces inconsistent results
>  
> ATTENTION: Cet e-mail provient d’une adresse mail extérieure au CHU de Rouen. 
> Ne cliquez pas sur les liens ou n'ouvrez pas les pièces jointes à moins de 
> connaître l'expéditeur et de savoir que le contenu est sûr. En cas de doute, 
> transférer le mail à « DSI, Sécurité » pour analyse. Merci de votre vigilance
>  
> Wow, you guys are amazing! 
>  
> as part of its pipeline, ravetools::mvfftw computes the mean of the
> input vector **and then centers it to a mean of zero** (intentionally or
> accidentally?)
> 
> 
> because variables are passed to compiled code by reference (someone
> can feel free to correct my terminology), this means that the original
> vector in R now has a mean of zero
>  
> I didn’t center the input vector in my code. The data was fed “as-is” into 
> FFTW3. My guess is FFTW3 internally center the data. It could be that FFTW3 
> library behave differently on different platforms, which could explain the 
> reproducibility issue. 
>  
> "Indeed, R vectors are passed "by reference" to C code, but the semantic must 
> be "by value", i.e. the C function must NOT change the contents of the 
> vector, except in very specific cases.”
>  
> CRAN has already had fftw and fftwtools, the issue is the data I’m targeting 
> at are at GB-level, copying the vectors can be memory inefficient or even use 
> up memories. The strategy of ravetools is to import signals from local files, 
> fft, then directly write to disk. So only one reference will be used and 
> modifying in-place is on purpose. In fact, and the fft functions I created 
> are not intended to be used directly by users.
>  
> However, I should've been very cautious when using these functions. This is 
> my fault. I’ll check the whole package to make sure only one reference is 
> used or otherwise the vectors will be copied.
>  
> This can be tested by the MAYBE_REFERENCED() macro in Rinternals.h.
>  
> Nice to learn! I’ll add it to my code.
>  
> Properly using R vectors in C code is tricky. You have to understand.
> 1) When you are allowed or not to modify vectors
> 2) When to PROTECT()vectors
> 3) How the garbage collector works and when it can trigger (answer : 
> basically, when you call any internal R function)
> Chapter 5 of "Writing R Extensions" documentation is quite extensive:
>  
> https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Handling-R-objects-in-C
>  
> <https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Handling-R-objects-in-C>
>  
> Indeed, I found myself often confused about when to PROTECT and when not. 
>  
&

Re: [Rd] stats::fft produces inconsistent results

2021-10-21 Thread GILLIBERT, Andre

ary resources (e.g. a socket you 
allocated) when a longjmp is triggered. Non-memory resources (e.g. a socket) 
returned to R should use the R_MakeExternalPtr() mechanism to make sure that, 
when the memory is freed by the garbage collector, the resource is also freed.


"Writing R extensions" contains more extensive documentation, but I hope that 
my quick description of the system will make it easier to understand the 
extensive documentation.


 --

Sinc�rement

Andr� GILLIBERT



De : Dipterix Wang 
Envoy� : mercredi 20 octobre 2021 22:01
� : Martin Maechler ; GILLIBERT, Andre 
; bbol...@gmail.com
Cc : r-devel@r-project.org
Objet : Re: [Rd] stats::fft produces inconsistent results



ATTENTION: Cet e-mail provient d�une adresse mail ext�rieure au CHU de Rouen. 
Ne cliquez pas sur les liens ou n'ouvrez pas les pi�ces jointes � moins de 
conna�tre l'exp�diteur et de savoir que le contenu est s�r. En cas de doute, 
transf�rer le mail � � DSI, S�curit� � pour analyse. Merci de votre vigilance



Wow, you guys are amazing!



as part of its pipeline, ravetools::mvfftw computes the mean of the
input vector **and then centers it to a mean of zero** (intentionally or
accidentally?)



because variables are passed to compiled code by reference (someone
can feel free to correct my terminology), this means that the original
vector in R now has a mean of zero



I didn�t center the input vector in my code. The data was fed �as-is� into 
FFTW3. My guess is FFTW3 internally center the data. It could be that FFTW3 
library behave differently on different platforms, which could explain the 
reproducibility issue.



"Indeed, R vectors are passed "by reference" to C code, but the semantic must 
be "by value", i.e. the C function must NOT change the contents of the vector, 
except in very specific cases.�



CRAN has already had fftw and fftwtools, the issue is the data I�m targeting at 
are at GB-level, copying the vectors can be memory inefficient or even use up 
memories. The strategy of ravetools is to import signals from local files, fft, 
then directly write to disk. So only one reference will be used and modifying 
in-place is on purpose. In fact, and the fft functions I created are not 
intended to be used directly by users.



However, I should've been very cautious when using these functions. This is my 
fault. I�ll check the whole package to make sure only one reference is used or 
otherwise the vectors will be copied.



This can be tested by the MAYBE_REFERENCED() macro in Rinternals.h.



Nice to learn! I�ll add it to my code.



Properly using R vectors in C code is tricky. You have to understand.

1) When you are allowed or not to modify vectors

2) When to PROTECT()vectors

3) How the garbage collector works and when it can trigger (answer : basically, 
when you call any internal R function)

Chapter 5 of "Writing R Extensions" documentation is quite extensive:

 
https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Handling-R-objects-in-C



Indeed, I found myself often confused about when to PROTECT and when not.



... but then ravetools  is not even a CRAN package, so why should you dare to 
use it for anything serious ?



Haha, thanks : ) I guess I will probably be grouchy too if seeing so many 
people making the same mistakes again and again. It just happened to be me.

But it�s good to be rigorous. Sooner or later I'll have to face these problems. 
It�s better to make mistakes before having made many.



Thanks y�all!



Best,

- Dipterix Wang





On Oct 20, 2021, at 5:32 AM, Martin Maechler 
mailto:maech...@stat.math.ethz.ch>> wrote:



Martin Maechler
   on Wed, 20 Oct 2021 11:26:21 +0200 writes:

[]



Thank you, Andr� , that's very good.



Just to state the obvious conclusion:



If Ben's suggestion is correct (and Andr� has explained *how*
that could happen) this would mean  a
SEVERE BUG in package ravetools's  mvfftw() function.



and it would have been (yet another) case of gaining speed by
killing correctness...



... but then ravetools  is not even a CRAN package, so why
should you dare to use it for anything serious ?



... yes, being grouchy ..

which I should rather not be.

Dipterix Wang *did* say initially that he is currently
developing ravetools so it's very reasonabl this is not yet a
CRAN package..

Best,
Martin



[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] stats::fft produces inconsistent results

2021-10-20 Thread Dipterix Wang

Wow, you guys are amazing!

>>> as part of its pipeline, ravetools::mvfftw computes the mean of the
>>> input vector **and then centers it to a mean of zero** (intentionally or
>>> accidentally?)
> 
>>> because variables are passed to compiled code by reference (someone
>>> can feel free to correct my terminology), this means that the original
>>> vector in R now has a mean of zero

I didn’t center the input vector in my code. The data was fed “as-is” into 
FFTW3. My guess is FFTW3 internally center the data. It could be that FFTW3 
library behave differently on different platforms, which could explain the 
reproducibility issue. 

>>> "Indeed, R vectors are passed "by reference" to C code, but the semantic 
>>> must be "by value", i.e. the C function must NOT change the contents of the 
>>> vector, except in very specific cases.”

CRAN has already had fftw and fftwtools, the issue is the data I’m targeting at 
are at GB-level, copying the vectors can be memory inefficient or even use up 
memories. The strategy of ravetools is to import signals from local files, fft, 
then directly write to disk. So only one reference will be used and modifying 
in-place is on purpose. In fact, and the fft functions I created are not 
intended to be used directly by users.

However, I should've been very cautious when using these functions. This is my 
fault. I’ll check the whole package to make sure only one reference is used or 
otherwise the vectors will be copied.

>>> This can be tested by the MAYBE_REFERENCED() macro in Rinternals.h.

Nice to learn! I’ll add it to my code.

>>> Properly using R vectors in C code is tricky. You have to understand.
>>> 1) When you are allowed or not to modify vectors
>>> 2) When to PROTECT()vectors
>>> 3) How the garbage collector works and when it can trigger (answer : 
>>> basically, when you call any internal R function)
>>> Chapter 5 of "Writing R Extensions" documentation is quite extensive:
>>>  
>>> https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Handling-R-objects-in-C
>>>  
>>> 

Indeed, I found myself often confused about when to PROTECT and when not. 

 ... but then ravetools  is not even a CRAN package, so why should you dare 
 to use it for anything serious ?

Haha, thanks : ) I guess I will probably be grouchy too if seeing so many 
people making the same mistakes again and again. It just happened to be me.
But it’s good to be rigorous. Sooner or later I'll have to face these problems. 
It’s better to make mistakes before having made many.

Thanks y’all!

Best,
- Dipterix Wang

> On Oct 20, 2021, at 5:32 AM, Martin Maechler  
> wrote:
> 
>> Martin Maechler 
>>on Wed, 20 Oct 2021 11:26:21 +0200 writes:
> 
> []
> 
>> Thank you, André , that's very good.
> 
>> Just to state the obvious conclusion:
> 
>> If Ben's suggestion is correct (and André has explained *how*
>> that could happen) this would mean  a
>> SEVERE BUG in package ravetools's  mvfftw() function.
> 
>> and it would have been (yet another) case of gaining speed by
>> killing correctness...
> 
>> ... but then ravetools  is not even a CRAN package, so why
>> should you dare to use it for anything serious ?
> 
>> ... yes, being grouchy ..
> 
> which I should rather not be.
> 
> Dipterix Wang *did* say initially that he is currently
> developing ravetools so it's very reasonabl this is not yet a
> CRAN package..
> 
> Best,
> Martin
> 

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] stats::fft produces inconsistent results

2021-10-20 Thread Martin Maechler

>>>>> Martin Maechler 
>>>>> on Wed, 20 Oct 2021 11:26:21 +0200 writes:

[]

> Thank you, André , that's very good.

> Just to state the obvious conclusion:

> If Ben's suggestion is correct (and André has explained *how*
> that could happen) this would mean  a
> SEVERE BUG in package ravetools's  mvfftw() function.

> and it would have been (yet another) case of gaining speed by
> killing correctness...

> ... but then ravetools  is not even a CRAN package, so why
> should you dare to use it for anything serious ?

> ... yes, being grouchy ..

which I should rather not be.

Dipterix Wang *did* say initially that he is currently
developing ravetools so it's very reasonabl this is not yet a
CRAN package..

Best,
Martin

>> -Message d'origine-
>> De : R-devel  De la part de Ben Bolker
>> Envoyé : mercredi 20 octobre 2021 03:27
>> À : r-devel@r-project.org
>> Objet : Re: [Rd] stats::fft produces inconsistent results


>> This is a long shot, but here's a plausible scenario:

>> as part of its pipeline, ravetools::mvfftw computes the mean of the
>> input vector **and then centers it to a mean of zero** (intentionally or
>> accidentally?)

>> because variables are passed to compiled code by reference (someone
>> can feel free to correct my terminology), this means that the original
>> vector in R now has a mean of zero

>> the first element of fft() is mean(x)*length(x), so if mean(x) has
>> been forced to zero, that would explain your issue.

>> I don't know about the non-reproducibility part.

>> On 10/19/21 7:06 PM, Dipterix Wang wrote:
>>> Dear R-devel Team,
>>> 
>>> I'm developing a neuroscience signal pipeline package in R 
(https://github.com/dipterix/ravetools) and I noticed a weird issue that failed 
my unit test.
>>> 
>>> Basically I was trying to use `fftw3` library to implement fast 
multivariate fft function in C++. When I tried to compare my results with 
stats::fft, the test result showed the first element of **expected** (which was 
produced by stats::fft) was zero, which, I am pretty sure, is wrong, and I can 
confirm that my function produces correct results.
>>> 
>>> However, somehow I couldn’t reproduce this issue on my personal 
computer (osx, M1, R4.1.1), the error simply went away.
>>> 
>>> The catch is my function produced consistent and correct results but 
stats::fft was not. This does not mean `stats::fft` has bugs. Instead, I 
suspect there could be some weird interactions between my code and stats::fft 
at C/C++ level, but I couldn’t figure it out why.
>>> 
>>> +++ Details:
>>> 
>>> Here’s the code I used for the test:
>>> 
>>> 
https://github.com/dipterix/ravetools/blob/4dc35d64763304aff869d92dddad38a7f2b30637/tests/testthat/test-fftw.R#L33-L41
>>> 
>>> Test code
>>> set.seed(1)
>>> x <- rnorm(1000)
>>> dim(x) <- c(100,10)
>>> a <- ravetools:::mvfftw_r2c(x, 0)
>>> c <- apply(x, 2, stats::fft)[1:51,]
>>> expect_equal(a, c)
>>> 
>>> 
>>> Here are the tests that gave me the errors:
>>> 
>>> The test logs on win-builder
>>> https://win-builder.r-project.org/07586ios8AbL/00check.log
>>> 
>>> Test logs on GitHub
>>> 
https://github.com/dipterix/ravetools/runs/3944874310?check_suite_focus=true
>>> 
>>> 
>>> —— Failed tests ——
>>> -- Failure (test-fftw.R:41:3): mvfftw_r2c 
--
>>> `a` (`actual`) not equal to `c` (`expected`).
>>> 
>>> actual vs expected
>>> [,1][,2]  [,3]  
[,4]...
>>> - actual[1, ] 10.8887367+ 0.000i  -3.7808077+ 0.000i   
2.967354+ 0.00i   5.160186+ 0.00i ...
>>> + expected[1, ]0.000+ 0.000i  -3.7808077+ 0.000i   
2.967354+ 0.00i   5.160186+ 0.00i...
>>> 
>>> 
>>> 
>>> The first columns are different, `actual` is the results I produced via 
`ravetools:::mvfftw_r2c`, and `expected` was produced by `stats::fft`
>>> 
>>> 
>>> Any help or attention is very much appreciated.
>>> Thanks,
>>> - Zhengjia

> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] stats::fft produces inconsistent results

2021-10-20 Thread Martin Maechler

>>>>> GILLIBERT, Andre 
>>>>> on Wed, 20 Oct 2021 08:10:00 + writes:

> Hello,
> That sounds like a good diagnosis!
> Indeed, R vectors are passed "by reference" to C code, but the semantic 
must be "by value", i.e. the C function must NOT change the contents of the 
vector, except in very specific cases.

> A good program that has to work on a vector, must first duplicate the 
vector, unless the only reference to the vector is the reference inside the C 
function.
> This can be tested by the MAYBE_REFERENCED() macro in Rinternals.h.

> A good example can be found in the fft() function in 
src/library/stats/src/fourier.c in R source code:
> switch (TYPEOF(z)) {
> case INTSXP:
> case LGLSXP:
> case REALSXP:
> z = coerceVector(z, CPLXSXP);
> break;
> case CPLXSXP:
> if (MAYBE_REFERENCED(z)) z = duplicate(z);
> break;
> default:
> error(_("non-numeric argument"));
> }
> PROTECT(z);

> This code coerces non-complex vectors to complex. Since this makes a 
copy, there is no need to duplicate.
> Complex vectors are duplicated, unless they are not referenced by 
anything but the fft() function.

> Now, the z vector can be modified "in place" without inconsistency.

> Properly using R vectors in C code is tricky. You have to understand.
> 1) When you are allowed or not to modify vectors
> 2) When to PROTECT()vectors
> 3) How the garbage collector works and when it can trigger (answer : 
basically, when you call any internal R function)

> Chapter 5 of "Writing R Extensions" documentation is quite extensive:
> 
https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Handling-R-objects-in-C

> -- 
> Sincerely
> André GILLIBERT

Thank you, André , that's very good.

Just to state the obvious conclusion:

If Ben's suggestion is correct (and André has explained *how*
that could happen) this would mean  a
SEVERE BUG in package ravetools's  mvfftw() function.

and it would have been (yet another) case of gaining speed by
killing correctness...

... but then ravetools  is not even a CRAN package, so why
should you dare to use it for anything serious ?

... yes, being grouchy ..

    > -----Message d'origine-
    > De : R-devel  De la part de Ben Bolker
> Envoyé : mercredi 20 octobre 2021 03:27
> À : r-devel@r-project.org
> Objet : Re: [Rd] stats::fft produces inconsistent results


> This is a long shot, but here's a plausible scenario:

> as part of its pipeline, ravetools::mvfftw computes the mean of the
> input vector **and then centers it to a mean of zero** (intentionally or
> accidentally?)

> because variables are passed to compiled code by reference (someone
> can feel free to correct my terminology), this means that the original
> vector in R now has a mean of zero

> the first element of fft() is mean(x)*length(x), so if mean(x) has
> been forced to zero, that would explain your issue.

> I don't know about the non-reproducibility part.

> On 10/19/21 7:06 PM, Dipterix Wang wrote:
>> Dear R-devel Team,
>> 
>> I'm developing a neuroscience signal pipeline package in R 
(https://github.com/dipterix/ravetools) and I noticed a weird issue that failed 
my unit test.
>> 
>> Basically I was trying to use `fftw3` library to implement fast 
multivariate fft function in C++. When I tried to compare my results with 
stats::fft, the test result showed the first element of **expected** (which was 
produced by stats::fft) was zero, which, I am pretty sure, is wrong, and I can 
confirm that my function produces correct results.
>> 
>> However, somehow I couldn’t reproduce this issue on my personal computer 
(osx, M1, R4.1.1), the error simply went away.
>> 
>> The catch is my function produced consistent and correct results but 
stats::fft was not. This does not mean `stats::fft` has bugs. Instead, I 
suspect there could be some weird interactions between my code and stats::fft 
at C/C++ level, but I couldn’t figure it out why.
>> 
>> +++ Details:
>> 
>> Here’s the code I used for the test:
>> 
>> 
https://github.com/dipterix/ravetools/blob/4dc35d64763304aff869d92dddad38a7f2b30637/tests/testthat/test-fftw.R#L33-L41
>> 
>> Test code
>> set.seed(1)
>> x <- rnorm(1000)
>> dim(x) <- c(100,10)
>> a <- ravetools:::mvfftw_r2c(x, 0)
>> c <- apply(x, 2, stats::fft)[1:51,]
>> expect_equal(a, c)
>> 
>> 
>>

Re: [Rd] stats::fft produces inconsistent results

2021-10-20 Thread GILLIBERT, Andre

Hello,

That sounds like a good diagnosis!
Indeed, R vectors are passed "by reference" to C code, but the semantic must be 
"by value", i.e. the C function must NOT change the contents of the vector, 
except in very specific cases.

A good program that has to work on a vector, must first duplicate the vector, 
unless the only reference to the vector is the reference inside the C function.
This can be tested by the MAYBE_REFERENCED() macro in Rinternals.h.

A good example can be found in the fft() function in 
src/library/stats/src/fourier.c in R source code:
switch (TYPEOF(z)) {
case INTSXP:
case LGLSXP:
case REALSXP:
z = coerceVector(z, CPLXSXP);
break;
case CPLXSXP:
if (MAYBE_REFERENCED(z)) z = duplicate(z);
break;
default:
error(_("non-numeric argument"));
}
PROTECT(z);

This code coerces non-complex vectors to complex. Since this makes a copy, 
there is no need to duplicate.
Complex vectors are duplicated, unless they are not referenced by anything but 
the fft() function.

Now, the z vector can be modified "in place" without inconsistency.

Properly using R vectors in C code is tricky. You have to understand.
1) When you are allowed or not to modify vectors
2) When to PROTECT()vectors
3) How the garbage collector works and when it can trigger (answer : basically, 
when you call any internal R function)

Chapter 5 of "Writing R Extensions" documentation is quite extensive:
https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Handling-R-objects-in-C

-- 
Sincerely
André GILLIBERT

-Message d'origine-
De : R-devel  De la part de Ben Bolker
Envoyé : mercredi 20 octobre 2021 03:27
À : r-devel@r-project.org
Objet : Re: [Rd] stats::fft produces inconsistent results


   This is a long shot, but here's a plausible scenario:

   as part of its pipeline, ravetools::mvfftw computes the mean of the
input vector **and then centers it to a mean of zero** (intentionally or
accidentally?)

   because variables are passed to compiled code by reference (someone
can feel free to correct my terminology), this means that the original
vector in R now has a mean of zero

   the first element of fft() is mean(x)*length(x), so if mean(x) has
been forced to zero, that would explain your issue.

   I don't know about the non-reproducibility part.

On 10/19/21 7:06 PM, Dipterix Wang wrote:
> Dear R-devel Team,
>
> I'm developing a neuroscience signal pipeline package in R 
> (https://github.com/dipterix/ravetools) and I noticed a weird issue that 
> failed my unit test.
>
> Basically I was trying to use `fftw3` library to implement fast multivariate 
> fft function in C++. When I tried to compare my results with stats::fft, the 
> test result showed the first element of **expected** (which was produced by 
> stats::fft) was zero, which, I am pretty sure, is wrong, and I can confirm 
> that my function produces correct results.
>
> However, somehow I couldn’t reproduce this issue on my personal computer 
> (osx, M1, R4.1.1), the error simply went away.
>
> The catch is my function produced consistent and correct results but 
> stats::fft was not. This does not mean `stats::fft` has bugs. Instead, I 
> suspect there could be some weird interactions between my code and stats::fft 
> at C/C++ level, but I couldn’t figure it out why.
>
> +++ Details:
>
> Here’s the code I used for the test:
>
> https://github.com/dipterix/ravetools/blob/4dc35d64763304aff869d92dddad38a7f2b30637/tests/testthat/test-fftw.R#L33-L41
>
> Test code
> set.seed(1)
> x <- rnorm(1000)
> dim(x) <- c(100,10)
> a <- ravetools:::mvfftw_r2c(x, 0)
> c <- apply(x, 2, stats::fft)[1:51,]
> expect_equal(a, c)
> 
>
> Here are the tests that gave me the errors:
>
> The test logs on win-builder
> https://win-builder.r-project.org/07586ios8AbL/00check.log
>
> Test logs on GitHub
> https://github.com/dipterix/ravetools/runs/3944874310?check_suite_focus=true
>
>
> —— Failed tests ——
>   -- Failure (test-fftw.R:41:3): mvfftw_r2c 
> --
>   `a` (`actual`) not equal to `c` (`expected`).
>
>   actual vs expected
>   [,1][,2]
>   [,3]  [,4]...
>   - actual[1, ] 10.8887367+ 0.000i  -3.7808077+ 0.000i   
> 2.967354+ 0.00i   5.160186+ 0.00i ...
>   + expected[1, ]0.000+ 0.000i  -3.7808077+ 0.000i   
> 2.967354+ 0.00i   5.160186+ 0.00i...
>
> 
>
> The first columns are different, `actual` is the results I produced via 
> `ravetools:::mvfftw_r2c`, and `expected` was produced by `sta

Re: [Rd] stats::fft produces inconsistent results

2021-10-19 Thread Ben Bolker


  This is a long shot, but here's a plausible scenario:

  as part of its pipeline, ravetools::mvfftw computes the mean of the 
input vector **and then centers it to a mean of zero** (intentionally or 
accidentally?)


  because variables are passed to compiled code by reference (someone 
can feel free to correct my terminology), this means that the original 
vector in R now has a mean of zero


  the first element of fft() is mean(x)*length(x), so if mean(x) has 
been forced to zero, that would explain your issue.


  I don't know about the non-reproducibility part.

On 10/19/21 7:06 PM, Dipterix Wang wrote:

Dear R-devel Team,

I'm developing a neuroscience signal pipeline package in R 
(https://github.com/dipterix/ravetools) and I noticed a weird issue that failed 
my unit test.

Basically I was trying to use `fftw3` library to implement fast multivariate 
fft function in C++. When I tried to compare my results with stats::fft, the 
test result showed the first element of **expected** (which was produced by 
stats::fft) was zero, which, I am pretty sure, is wrong, and I can confirm that 
my function produces correct results.

However, somehow I couldn’t reproduce this issue on my personal computer (osx, 
M1, R4.1.1), the error simply went away.

The catch is my function produced consistent and correct results but stats::fft 
was not. This does not mean `stats::fft` has bugs. Instead, I suspect there 
could be some weird interactions between my code and stats::fft at C/C++ level, 
but I couldn’t figure it out why.

+++ Details:

Here’s the code I used for the test:

https://github.com/dipterix/ravetools/blob/4dc35d64763304aff869d92dddad38a7f2b30637/tests/testthat/test-fftw.R#L33-L41

Test code
set.seed(1)
x <- rnorm(1000)
dim(x) <- c(100,10)
a <- ravetools:::mvfftw_r2c(x, 0)
c <- apply(x, 2, stats::fft)[1:51,]
expect_equal(a, c)


Here are the tests that gave me the errors:

The test logs on win-builder
https://win-builder.r-project.org/07586ios8AbL/00check.log

Test logs on GitHub
https://github.com/dipterix/ravetools/runs/3944874310?check_suite_focus=true


—— Failed tests ——
  -- Failure (test-fftw.R:41:3): mvfftw_r2c 
--
  `a` (`actual`) not equal to `c` (`expected`).

  actual vs expected
  [,1][,2]  
[,3]  [,4]...
  - actual[1, ] 10.8887367+ 0.000i  -3.7808077+ 0.000i   2.967354+ 
0.00i   5.160186+ 0.00i ...
  + expected[1, ]0.000+ 0.000i  -3.7808077+ 0.000i   2.967354+ 
0.00i   5.160186+ 0.00i...



The first columns are different, `actual` is the results I produced via 
`ravetools:::mvfftw_r2c`, and `expected` was produced by `stats::fft`


Any help or attention is very much appreciated.
Thanks,
- Zhengjia
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



--
Dr. Benjamin Bolker
Professor, Mathematics & Statistics and Biology, McMaster University
Director, School of Computational Science and Engineering
Graduate chair, Mathematics & Statistics

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] stats::fft produces inconsistent results

Re: [Rd] stats::fft produces inconsistent results

Re: [Rd] stats::fft produces inconsistent results

Re: [Rd] stats::fft produces inconsistent results

Re: [Rd] stats::fft produces inconsistent results

Re: [Rd] stats::fft produces inconsistent results

Re: [Rd] stats::fft produces inconsistent results

Re: [Rd] stats::fft produces inconsistent results

8 matches

Site Navigation

Mail list logo

Footer information