Re: [Rd] R-devel (r81196) hanging at dchisq(large) (PR#13309)

2021-11-24 Thread Avraham Adler
On Wed, Nov 24, 2021 at 11:45 AM Martin Maechler
 wrote:
>
> > Avraham Adler  on Thu, 18 Nov 2021 02:18:54 + writes:
>
> > Hello.  I have isolated the issue: it is the
> > fused-multiply-add instruction set (FMA on Intel
> > processors). Running -march=skylake -mno-fma not only does
> > not hang, but passes make check-all (using R's native
> > BLAS).  My intuition remains that something in the new
> > more precise ebd0 code used in dpois_raw—called by dgamma,
> > called by dchsq, called by dnchisq—is hanging when the
> > assembler uses FMA. Unfortunately, I have come across
> > other cases online where the extra precision and the
> > different assembler code of FMA vs. non-FMA has caused
> > bugs, such as [1]. Page 5 of this paper by Dr. William
> > Kahan sheds some light on why this may be happening [2]
> > (PDF).
>
> > Martin & Morton, having written (PR#15628 [3]) and/or
> > implemented the ebd0 code that is now being used, can
> > either of you think of any reason why it would hang if
> > compiled using FMA?
>
> I vaguely remember I had a version of ebd0(), either Morton
> Welinder's original, or a slight modification of it that needed some
> mending, because in some border case, there was an out of
> array-boundary indexing... but that's just a vague recollection.
>
> I had investigated  ebd0()'s behavior quite a bit, also notably
> the version -- together with a pure R code version --
> in my CRAN package DPQ, yesterday updated to version 0.5-0 on CRAN
> {written in Summer, but published to CRAN only yesterday}
> where I have  dpois_raw() optionally using several experimental versions of
> bd0(), and both 'pure R' and a C version of ebd0(),
> as DPQ::ebd0() and DPQ::edb0C()
> each with an option  'verbose' which shows you which branches are chosen
> for the given arguments.
>
> So, if you install this version (0.5-0 or newer) from the development
> sources, using the *same* FMA configuration,
> I hope you should see the same "hanging" but would be able to see some
> more.. ?
>
> Can you install it from R-forge
>
> install.packages("DPQ", type = "source",
>  repos="http://R-Forge.R-project.org;)
>
> and then experiment?
> I'd be grateful  {and we maybe can move "off - mailing list"}
>
> Thank you in advance,
> Martin
>
> Martin Maechler
> ETH Zurich  and  R Core team

Sure. Responding here simply for closure. Will direct further
questions and output directly to you.

Thank yyou,

Avi

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] R-devel (r81196) hanging at dchisq(large) (PR#13309)

2021-11-24 Thread Martin Maechler
> Avraham Adler  on Thu, 18 Nov 2021 02:18:54 + writes:

> Hello.  I have isolated the issue: it is the
> fused-multiply-add instruction set (FMA on Intel
> processors). Running -march=skylake -mno-fma not only does
> not hang, but passes make check-all (using R's native
> BLAS).  My intuition remains that something in the new
> more precise ebd0 code used in dpois_raw—called by dgamma,
> called by dchsq, called by dnchisq—is hanging when the
> assembler uses FMA. Unfortunately, I have come across
> other cases online where the extra precision and the
> different assembler code of FMA vs. non-FMA has caused
> bugs, such as [1]. Page 5 of this paper by Dr. William
> Kahan sheds some light on why this may be happening [2]
> (PDF).

> Martin & Morton, having written (PR#15628 [3]) and/or
> implemented the ebd0 code that is now being used, can
> either of you think of any reason why it would hang if
> compiled using FMA? 

I vaguely remember I had a version of ebd0(), either Morton
Welinder's original, or a slight modification of it that needed some
mending, because in some border case, there was an out of
array-boundary indexing... but that's just a vague recollection.

I had investigated  ebd0()'s behavior quite a bit, also notably
the version -- together with a pure R code version --
in my CRAN package DPQ, yesterday updated to version 0.5-0 on CRAN
{written in Summer, but published to CRAN only yesterday}
where I have  dpois_raw() optionally using several experimental versions of
bd0(), and both 'pure R' and a C version of ebd0(),
as DPQ::ebd0() and DPQ::edb0C()
each with an option  'verbose' which shows you which branches are chosen
for the given arguments.

So, if you install this version (0.5-0 or newer) from the development
sources, using the *same* FMA configuration,
I hope you should see the same "hanging" but would be able to see some
more.. ?

Can you install it from R-forge

install.packages("DPQ", type = "source",
 repos="http://R-Forge.R-project.org;)

and then experiment?
I'd be grateful  {and we maybe can move "off - mailing list"}

Thank you in advance,
Martin

Martin Maechler
ETH Zurich  and  R Core team


> Again, I'm not a professional, but
> line 325 of the ebd0 function in bd0.c [4] has "ADD1(-x *
> log1pmx ((M * fg - x) / x))" which looks like a
> Multiply-Add to me, at least in the inner parenthesis. Is
> there anything that can be, or should be, done with the
> code to prevent the hang, or must we forbid the use of FMA
> instructions (and I guess FMA4 on AMD processors) when
> compiling R?

> Also, What happens in the case where M/x neither over- nor
> under-flowed, M_LN2 * ((double) -e) <= 1. + DBL_MAX / x,
> fg != 1, and after 4 loops of lines 329 & 330, *yh is
> still finite? How does ebd0 exit in that case? There is no
> "return" after line 331. Am I missing something? Could
> that be related to this issue?

> As an aside, could ebd0 be enhanced by using FMA
> instructions on processors which support them?

> Thank you very much,

> Avi

> [1]
> 
https://flameeyes.blog/2014/10/27/the-subtlety-of-modern-cpus-or-the-search-for-the-phantom-bug/
> [2]
> https://people.eecs.berkeley.edu/~wkahan/ieee754status/IEEE754.PDF
> [3] https://bugs.r-project.org/show_bug.cgi?id=15628 [4]
> https://github.com/wch/r-source/blob/trunk/src/nmath/bd0.c

> On Wed, Nov 17, 2021 at 3:55 PM Avraham Adler
>  wrote:
>> 
>> Hello, Martin et. al.
>> 
>> I apologize for top posting, but I believe I have tracked
>> down the difference why last time my build worked and now
>> it hangs on `dchisq(c(Inf, 1e80, 1e50, 1e40), df=10,
>> ncp=1)`. and it's NOT the BLAS. I built against both 3.15
>> AND R's vanilla and it hung both times. The issue was
>> passing "march=skylake". I own an i7-8700K which gcc
>> considers a skylake. When I pass mtune=skylake, it does
>> not hang and the make check-devel after the build
>> completes.
>> 
>> Below is a list of the different flags passed when using
>> mtune vs.  march. It stands to reason that at least one
>> of them contributed to the hanging issue which Martin
>> fixed in
>> https://bugs.r-project.org/show_bug.cgi?id=13309. While I
>> recognize the obvious ones, I'm not an expert and do not
>> understand which if any may be the culprit. For
>> reference, most of these flags are described here:
>> 
https://gcc.gnu.org/onlinedocs/gcc-8.3.0/gcc/x86-Options.html#x86-Options.
>> 
>> All the following flags are DISABLED for mtune=skylake
>> (so march=x86-64) and ENABLED when passing
>> march-skylake. For the record, I usually passed march in
>> the past without problem:
>> 
>> madx, maes, mavx, mavx2, mbmi, mbmi2, mclflushopt, mcx16,
>> mf16c, mfma, mfsgsbase, 

Re: [Rd] R-devel (r81196) hanging at dchisq(large) (PR#13309)

2021-11-17 Thread Avraham Adler
Hello.

I have isolated the issue: it is the fused-multiply-add instruction
set (FMA on Intel processors). Running -march=skylake -mno-fma not
only does not hang, but passes make check-all (using R's native BLAS).
My intuition remains that something in the new more precise ebd0 code
used in dpois_raw—called by dgamma, called by dchsq, called by
dnchisq—is hanging when the assembler uses FMA. Unfortunately, I have
come across other cases online where the extra precision and the
different assembler code of FMA vs. non-FMA has caused bugs, such as
[1]. Page 5 of this paper by Dr. William Kahan sheds some light on why
this may be happening [2] (PDF).

Martin & Morton, having written (PR#15628 [3]) and/or implemented the
ebd0 code that is now being used, can either of you think of any
reason why it would hang if compiled using FMA? Again, I'm not a
professional, but line 325 of the ebd0 function in bd0.c [4] has
"ADD1(-x * log1pmx ((M * fg - x) / x))" which looks like a
Multiply-Add to me, at least in the inner parenthesis. Is there
anything that can be, or should be, done with the code to prevent the
hang, or must we forbid the use of FMA instructions (and I guess FMA4
on AMD processors) when compiling R?

Also, What happens in the case where M/x neither over- nor
under-flowed, M_LN2 * ((double) -e) <= 1. + DBL_MAX / x, fg != 1, and
after 4 loops of lines 329 & 330, *yh is still finite? How does ebd0
exit in that case? There is no "return" after line 331. Am I missing
something? Could that be related to this issue?

As an aside, could ebd0 be enhanced by using FMA instructions on
processors which support them?

Thank you very much,

Avi

[1] 
https://flameeyes.blog/2014/10/27/the-subtlety-of-modern-cpus-or-the-search-for-the-phantom-bug/
[2] https://people.eecs.berkeley.edu/~wkahan/ieee754status/IEEE754.PDF
[3] https://bugs.r-project.org/show_bug.cgi?id=15628
[4] https://github.com/wch/r-source/blob/trunk/src/nmath/bd0.c

On Wed, Nov 17, 2021 at 3:55 PM Avraham Adler  wrote:
>
> Hello, Martin et. al.
>
> I apologize for top posting, but I believe I have tracked down the
> difference why last time my build worked and now it hangs on
> `dchisq(c(Inf, 1e80, 1e50, 1e40), df=10, ncp=1)`. and it's NOT the
> BLAS. I built against both 3.15 AND R's vanilla and it hung both
> times. The issue was passing "march=skylake". I own an i7-8700K which
> gcc considers a skylake. When I pass mtune=skylake, it does not hang
> and the make check-devel after the build completes.
>
> Below is a list of the different flags passed when using mtune vs.
> march. It stands to reason that at least one of them contributed to
> the hanging issue which Martin fixed in
> https://bugs.r-project.org/show_bug.cgi?id=13309. While I recognize
> the obvious ones, I'm not an expert and do not understand which if any
> may be the culprit. For reference, most of these flags are described
> here: 
> https://gcc.gnu.org/onlinedocs/gcc-8.3.0/gcc/x86-Options.html#x86-Options.
>
> All the following flags are DISABLED for mtune=skylake (so
> march=x86-64) and ENABLED when passing march-skylake. For the record,
> I usually passed march in the past without problem:
>
> madx, maes, mavx, mavx2, mbmi, mbmi2, mclflushopt, mcx16, mf16c, mfma,
> mfsgsbase, mhle, mlzcnt, mmovbe, mpclmul, mpopcnt, mprfchw, mrdrnd,
> mrdseed, msahf, msgx, msse3, msse4, msse4.1, msse4.2, mssse3, mxsave,
> mxsavec, mxsaveopt, and mxsaves.
>
> Inversely, mno-sse4 is enabled when using mtune and disabled when
> using arch, of course.
>
> For completeness, the following two are disabled on both mtune and
> march but enabled when passing march=native, otherwise the latter is
> the same as march=skylake: mabm and mrtm. Obviously these cannot
> contribute to the hanging issue.
>
> Any ideas, especially from the experts who understand how the flags
> would address the code in dchisq, would be greatly appreciated.
>
> Thank you,
>
> Avi
>
> On Wed, Nov 17, 2021 at 7:15 AM Avraham Adler  wrote:
> >
> > On Tue, Nov 16, 2021 at 10:42 PM Kevin Ushey  wrote:
> > >
> > > Do you see this same hang in a build of R with debug symbols? Can you
> > > try running R with GDB, or even WinDbg or another debugger, to see
> > > what the call stack looks like when the hang occurs? Does the hang
> > > depend on the number of threads used by OpenBLAS?
> > >
> > > On the off chance it's relevant, I've seen hangs / crashes when using
> > > a multithreaded OpenBLAS with R on some Linux systems before, but
> > > never found the time to isolate a root cause.
> > >
> >
> >
> > This last was a good thought, Kevin, as I had just compiled OpenBLAS
> > 3.18 multi-threaded, but I recompiled it single threaded and it still
> > crashes. The version of R I built from source last time, (2021-05-20
> > r80347), does not hang when calling `dchisq(c(Inf, 1e80, 1e50, 1e40),
> > df=10, ncp=1)`. I think I built that with OpenBLAS 3.15. I can try
> > doing that here. As for building with debug symbols, I have never done
> > that 

Re: [Rd] R-devel (r81196) hanging at dchisq(large) (PR#13309)

2021-11-17 Thread Avraham Adler
Hello, Martin et. al.

I apologize for top posting, but I believe I have tracked down the
difference why last time my build worked and now it hangs on
`dchisq(c(Inf, 1e80, 1e50, 1e40), df=10, ncp=1)`. and it's NOT the
BLAS. I built against both 3.15 AND R's vanilla and it hung both
times. The issue was passing "march=skylake". I own an i7-8700K which
gcc considers a skylake. When I pass mtune=skylake, it does not hang
and the make check-devel after the build completes.

Below is a list of the different flags passed when using mtune vs.
march. It stands to reason that at least one of them contributed to
the hanging issue which Martin fixed in
https://bugs.r-project.org/show_bug.cgi?id=13309. While I recognize
the obvious ones, I'm not an expert and do not understand which if any
may be the culprit. For reference, most of these flags are described
here: https://gcc.gnu.org/onlinedocs/gcc-8.3.0/gcc/x86-Options.html#x86-Options.

All the following flags are DISABLED for mtune=skylake (so
march=x86-64) and ENABLED when passing march-skylake. For the record,
I usually passed march in the past without problem:

madx, maes, mavx, mavx2, mbmi, mbmi2, mclflushopt, mcx16, mf16c, mfma,
mfsgsbase, mhle, mlzcnt, mmovbe, mpclmul, mpopcnt, mprfchw, mrdrnd,
mrdseed, msahf, msgx, msse3, msse4, msse4.1, msse4.2, mssse3, mxsave,
mxsavec, mxsaveopt, and mxsaves.

Inversely, mno-sse4 is enabled when using mtune and disabled when
using arch, of course.

For completeness, the following two are disabled on both mtune and
march but enabled when passing march=native, otherwise the latter is
the same as march=skylake: mabm and mrtm. Obviously these cannot
contribute to the hanging issue.

Any ideas, especially from the experts who understand how the flags
would address the code in dchisq, would be greatly appreciated.

Thank you,

Avi

On Wed, Nov 17, 2021 at 7:15 AM Avraham Adler  wrote:
>
> On Tue, Nov 16, 2021 at 10:42 PM Kevin Ushey  wrote:
> >
> > Do you see this same hang in a build of R with debug symbols? Can you
> > try running R with GDB, or even WinDbg or another debugger, to see
> > what the call stack looks like when the hang occurs? Does the hang
> > depend on the number of threads used by OpenBLAS?
> >
> > On the off chance it's relevant, I've seen hangs / crashes when using
> > a multithreaded OpenBLAS with R on some Linux systems before, but
> > never found the time to isolate a root cause.
> >
>
>
> This last was a good thought, Kevin, as I had just compiled OpenBLAS
> 3.18 multi-threaded, but I recompiled it single threaded and it still
> crashes. The version of R I built from source last time, (2021-05-20
> r80347), does not hang when calling `dchisq(c(Inf, 1e80, 1e50, 1e40),
> df=10, ncp=1)`. I think I built that with OpenBLAS 3.15. I can try
> doing that here. As for building with debug symbols, I have never done
> that before, so if you could provide some guidance (off-list if you
> think it is inappropriate to keep it here) or point me in the
> direction of some already posted advice, I would appreciate it!
>
> Avi
>
>
>
> > Best,
> > Kevin
> >
> > On Tue, Nov 16, 2021 at 5:12 AM Avraham Adler  
> > wrote:
> > >
> > > On Tue, Nov 16, 2021 at 8:43 AM Martin Maechler
> > >  wrote:
> > > >
> > > > > Avraham Adler
> > > > > on Tue, 16 Nov 2021 02:35:56 + writes:
> > > >
> > > > > I am building r-devel on Windows 10 64bit using Jeroen's mingw 
> > > > system,
> > > > > and I am finding that my make check-devel hangs on the above 
> > > > issue.
> > > > > Everything is vanila except that I am using OpenBLAS 0.3.18. I 
> > > > have
> > > > > been using OpenBLAS for over a decade and have not had this issue
> > > > > before. Is there anything I can do to dig deeper into this issue 
> > > > from
> > > > > my end? Could there be anything that changed in R-devel that may 
> > > > have
> > > > > triggered this? The bugzilla report doesn't have any code 
> > > > attached to
> > > > > it.
> > > >
> > > > > Thank you,
> > > > > Avi
> > > >
> > > > Hmm.. it would've be nice to tell a bit more, instead of having all
> > > > your readers to search links, etc.
> > > >
> > > > In the bugzilla bug report PR#13309
> > > > https://bugs.r-project.org/show_bug.cgi?id=13309 ,the example was
> > > >
> > > >  dchisq(x=Inf, df=10, ncp=1)
> > > >
> > > > I had fixed the bug 13 years ago, in svn rev 47005
> > > > with regression test in /tests/d-p-q-r-tests.R :
> > > >
> > > >
> > > > ## Non-central Chi^2 density for large x
> > > > stopifnot(0 == dchisq(c(Inf, 1e80, 1e50, 1e40), df=10, ncp=1))
> > > > ## did hang in 2.8.0 and earlier (PR#13309).
> > > >
> > > >
> > > > and you are seeing your version of R hanging at exactly this
> > > > location?
> > > >
> > > >
> > > > I'd bet quite a bit that the underlying code in these
> > > > non-central chi square computations *never* calls BLAS and hence
> > > > I cannot imagine how openBLAS could play a role.
> > > >
> > > > However, there 

Re: [Rd] R-devel (r81196) hanging at dchisq(large) (PR#13309)

2021-11-16 Thread Avraham Adler
On Tue, Nov 16, 2021 at 10:42 PM Kevin Ushey  wrote:
>
> Do you see this same hang in a build of R with debug symbols? Can you
> try running R with GDB, or even WinDbg or another debugger, to see
> what the call stack looks like when the hang occurs? Does the hang
> depend on the number of threads used by OpenBLAS?
>
> On the off chance it's relevant, I've seen hangs / crashes when using
> a multithreaded OpenBLAS with R on some Linux systems before, but
> never found the time to isolate a root cause.
>


This last was a good thought, Kevin, as I had just compiled OpenBLAS
3.18 multi-threaded, but I recompiled it single threaded and it still
crashes. The version of R I built from source last time, (2021-05-20
r80347), does not hang when calling `dchisq(c(Inf, 1e80, 1e50, 1e40),
df=10, ncp=1)`. I think I built that with OpenBLAS 3.15. I can try
doing that here. As for building with debug symbols, I have never done
that before, so if you could provide some guidance (off-list if you
think it is inappropriate to keep it here) or point me in the
direction of some already posted advice, I would appreciate it!

Avi



> Best,
> Kevin
>
> On Tue, Nov 16, 2021 at 5:12 AM Avraham Adler  wrote:
> >
> > On Tue, Nov 16, 2021 at 8:43 AM Martin Maechler
> >  wrote:
> > >
> > > > Avraham Adler
> > > > on Tue, 16 Nov 2021 02:35:56 + writes:
> > >
> > > > I am building r-devel on Windows 10 64bit using Jeroen's mingw 
> > > system,
> > > > and I am finding that my make check-devel hangs on the above issue.
> > > > Everything is vanila except that I am using OpenBLAS 0.3.18. I have
> > > > been using OpenBLAS for over a decade and have not had this issue
> > > > before. Is there anything I can do to dig deeper into this issue 
> > > from
> > > > my end? Could there be anything that changed in R-devel that may 
> > > have
> > > > triggered this? The bugzilla report doesn't have any code attached 
> > > to
> > > > it.
> > >
> > > > Thank you,
> > > > Avi
> > >
> > > Hmm.. it would've be nice to tell a bit more, instead of having all
> > > your readers to search links, etc.
> > >
> > > In the bugzilla bug report PR#13309
> > > https://bugs.r-project.org/show_bug.cgi?id=13309 ,the example was
> > >
> > >  dchisq(x=Inf, df=10, ncp=1)
> > >
> > > I had fixed the bug 13 years ago, in svn rev 47005
> > > with regression test in /tests/d-p-q-r-tests.R :
> > >
> > >
> > > ## Non-central Chi^2 density for large x
> > > stopifnot(0 == dchisq(c(Inf, 1e80, 1e50, 1e40), df=10, ncp=1))
> > > ## did hang in 2.8.0 and earlier (PR#13309).
> > >
> > >
> > > and you are seeing your version of R hanging at exactly this
> > > location?
> > >
> > >
> > > I'd bet quite a bit that the underlying code in these
> > > non-central chi square computations *never* calls BLAS and hence
> > > I cannot imagine how openBLAS could play a role.
> > >
> > > However, there must be something peculiar in your compiler setup,
> > > compilation options, 
> > > as of course the above regression test has been run 100s of
> > > 1000s of times also under Windows in the last 13 years ..
> > >
> > > Last but not least (but really only vaguely related):
> > >There is still a FIXME in the source code (but not about
> > > hanging, but rather of loosing some accuracy in border cases),
> > > see e.g. https://svn.r-project.org/R/trunk/src/nmath/dnchisq.c
> > > and for that reason I had written an R version of that C code
> > > even back in 2008 which I've made available in  CRAN package
> > > DPQ  a few years ago (together with many other D/P/Q
> > > distribution computations/approximations).
> > >  -> https://cran.r-project.org/package=DPQ
> > >
> > > Best,
> > > Martin
> > >
> >
> > Hello, Martin.
> >
> > Apologies, I thought the PR # was sufficient. Yes, I am seeing this at
> > this exact location. This is what I saw in d-p-q-r-tst-2.Rout.fail and
> > I then ran d-p-q-r-tst.R line-by-line and R hung precisely after
> > `stopifnot(0 == dchisq(c(Inf, 1e80, 1e50, 1e40), df=10, ncp=1))`.
> >
> > Is it at all possible that this has to do with the recent change from
> > bd0 to ebd0 (PR #15628) [1]?
> >
> > For completeness, I ran all the code _beneath_ the call, and while
> > nothing else cause an infinite loop, I posted what I believe may be
> > unexpected results below,
> >
> > Thank you,
> >
> > Avi
> >
> > [1]: https://bugs.r-project.org/show_bug.cgi?id=15628
> >
> > > ## FIXME ?!:  MxM/2 seems +- ok ??
> > > (dLmM <- dnbinom(xL, mu = 1, size = MxM))  # all NaN but the last
> > Warning in dnbinom(xL, mu = 1, size = MxM) : NaNs produced
> >  [1] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
> > NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN   0
> > > (dLpI <- dnbinom(xL, prob=1/2, size = Inf))#  ditto
> > Warning in dnbinom(xL, prob = 1/2, size = Inf) : NaNs produced
> >  [1] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
> > NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN   0

Re: [Rd] R-devel (r81196) hanging at dchisq(large) (PR#13309)

2021-11-16 Thread Kevin Ushey
Do you see this same hang in a build of R with debug symbols? Can you
try running R with GDB, or even WinDbg or another debugger, to see
what the call stack looks like when the hang occurs? Does the hang
depend on the number of threads used by OpenBLAS?

On the off chance it's relevant, I've seen hangs / crashes when using
a multithreaded OpenBLAS with R on some Linux systems before, but
never found the time to isolate a root cause.

Best,
Kevin

On Tue, Nov 16, 2021 at 5:12 AM Avraham Adler  wrote:
>
> On Tue, Nov 16, 2021 at 8:43 AM Martin Maechler
>  wrote:
> >
> > > Avraham Adler
> > > on Tue, 16 Nov 2021 02:35:56 + writes:
> >
> > > I am building r-devel on Windows 10 64bit using Jeroen's mingw system,
> > > and I am finding that my make check-devel hangs on the above issue.
> > > Everything is vanila except that I am using OpenBLAS 0.3.18. I have
> > > been using OpenBLAS for over a decade and have not had this issue
> > > before. Is there anything I can do to dig deeper into this issue from
> > > my end? Could there be anything that changed in R-devel that may have
> > > triggered this? The bugzilla report doesn't have any code attached to
> > > it.
> >
> > > Thank you,
> > > Avi
> >
> > Hmm.. it would've be nice to tell a bit more, instead of having all
> > your readers to search links, etc.
> >
> > In the bugzilla bug report PR#13309
> > https://bugs.r-project.org/show_bug.cgi?id=13309 ,the example was
> >
> >  dchisq(x=Inf, df=10, ncp=1)
> >
> > I had fixed the bug 13 years ago, in svn rev 47005
> > with regression test in /tests/d-p-q-r-tests.R :
> >
> >
> > ## Non-central Chi^2 density for large x
> > stopifnot(0 == dchisq(c(Inf, 1e80, 1e50, 1e40), df=10, ncp=1))
> > ## did hang in 2.8.0 and earlier (PR#13309).
> >
> >
> > and you are seeing your version of R hanging at exactly this
> > location?
> >
> >
> > I'd bet quite a bit that the underlying code in these
> > non-central chi square computations *never* calls BLAS and hence
> > I cannot imagine how openBLAS could play a role.
> >
> > However, there must be something peculiar in your compiler setup,
> > compilation options, 
> > as of course the above regression test has been run 100s of
> > 1000s of times also under Windows in the last 13 years ..
> >
> > Last but not least (but really only vaguely related):
> >There is still a FIXME in the source code (but not about
> > hanging, but rather of loosing some accuracy in border cases),
> > see e.g. https://svn.r-project.org/R/trunk/src/nmath/dnchisq.c
> > and for that reason I had written an R version of that C code
> > even back in 2008 which I've made available in  CRAN package
> > DPQ  a few years ago (together with many other D/P/Q
> > distribution computations/approximations).
> >  -> https://cran.r-project.org/package=DPQ
> >
> > Best,
> > Martin
> >
>
> Hello, Martin.
>
> Apologies, I thought the PR # was sufficient. Yes, I am seeing this at
> this exact location. This is what I saw in d-p-q-r-tst-2.Rout.fail and
> I then ran d-p-q-r-tst.R line-by-line and R hung precisely after
> `stopifnot(0 == dchisq(c(Inf, 1e80, 1e50, 1e40), df=10, ncp=1))`.
>
> Is it at all possible that this has to do with the recent change from
> bd0 to ebd0 (PR #15628) [1]?
>
> For completeness, I ran all the code _beneath_ the call, and while
> nothing else cause an infinite loop, I posted what I believe may be
> unexpected results below,
>
> Thank you,
>
> Avi
>
> [1]: https://bugs.r-project.org/show_bug.cgi?id=15628
>
> > ## FIXME ?!:  MxM/2 seems +- ok ??
> > (dLmM <- dnbinom(xL, mu = 1, size = MxM))  # all NaN but the last
> Warning in dnbinom(xL, mu = 1, size = MxM) : NaNs produced
>  [1] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
> NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN   0
> > (dLpI <- dnbinom(xL, prob=1/2, size = Inf))#  ditto
> Warning in dnbinom(xL, prob = 1/2, size = Inf) : NaNs produced
>  [1] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
> NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN   0
> > (dLpM <- dnbinom(xL, prob=1/2, size = MxM))#  ditto
> Warning in dnbinom(xL, prob = 1/2, size = MxM) : NaNs produced
>  [1] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
> NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN   0
>
> > d <- dnbinom(x,  mu = mu, size = Inf) # gave NaN (for 0 and L), now all 0
> Warning in dnbinom(x, mu = mu, size = Inf) : NaNs produced
> > p <- pnbinom(x,  mu = mu, size = Inf) # gave all NaN, now uses ppois(x, mu)
> Warning in pnbinom(x, mu = mu, size = Inf) : NaNs produced
>
> > pp <- (0:16)/16
> > q <- qnbinom(pp, mu = mu, size = Inf) # gave all NaN
> > set.seed(1); NI <- rnbinom(32, mu = mu, size = Inf)# gave all NaN
> > set.seed(1); N2 <- rnbinom(32, mu = mu, size = L  )
> > stopifnot(exprs = {
> + all.equal(d, c(0.006737947, 0.033689735, 0.0842243375,
> 0.140373896, 0,0,0,0), tol = 9e-9)# 7.6e-10
> + all.equal(p, c(0.006737947, 

Re: [Rd] R-devel (r81196) hanging at dchisq(large) (PR#13309)

2021-11-16 Thread Avraham Adler
On Tue, Nov 16, 2021 at 8:43 AM Martin Maechler
 wrote:
>
> > Avraham Adler
> > on Tue, 16 Nov 2021 02:35:56 + writes:
>
> > I am building r-devel on Windows 10 64bit using Jeroen's mingw system,
> > and I am finding that my make check-devel hangs on the above issue.
> > Everything is vanila except that I am using OpenBLAS 0.3.18. I have
> > been using OpenBLAS for over a decade and have not had this issue
> > before. Is there anything I can do to dig deeper into this issue from
> > my end? Could there be anything that changed in R-devel that may have
> > triggered this? The bugzilla report doesn't have any code attached to
> > it.
>
> > Thank you,
> > Avi
>
> Hmm.. it would've be nice to tell a bit more, instead of having all
> your readers to search links, etc.
>
> In the bugzilla bug report PR#13309
> https://bugs.r-project.org/show_bug.cgi?id=13309 ,the example was
>
>  dchisq(x=Inf, df=10, ncp=1)
>
> I had fixed the bug 13 years ago, in svn rev 47005
> with regression test in /tests/d-p-q-r-tests.R :
>
>
> ## Non-central Chi^2 density for large x
> stopifnot(0 == dchisq(c(Inf, 1e80, 1e50, 1e40), df=10, ncp=1))
> ## did hang in 2.8.0 and earlier (PR#13309).
>
>
> and you are seeing your version of R hanging at exactly this
> location?
>
>
> I'd bet quite a bit that the underlying code in these
> non-central chi square computations *never* calls BLAS and hence
> I cannot imagine how openBLAS could play a role.
>
> However, there must be something peculiar in your compiler setup,
> compilation options, 
> as of course the above regression test has been run 100s of
> 1000s of times also under Windows in the last 13 years ..
>
> Last but not least (but really only vaguely related):
>There is still a FIXME in the source code (but not about
> hanging, but rather of loosing some accuracy in border cases),
> see e.g. https://svn.r-project.org/R/trunk/src/nmath/dnchisq.c
> and for that reason I had written an R version of that C code
> even back in 2008 which I've made available in  CRAN package
> DPQ  a few years ago (together with many other D/P/Q
> distribution computations/approximations).
>  -> https://cran.r-project.org/package=DPQ
>
> Best,
> Martin
>

Hello, Martin.

Apologies, I thought the PR # was sufficient. Yes, I am seeing this at
this exact location. This is what I saw in d-p-q-r-tst-2.Rout.fail and
I then ran d-p-q-r-tst.R line-by-line and R hung precisely after
`stopifnot(0 == dchisq(c(Inf, 1e80, 1e50, 1e40), df=10, ncp=1))`.

Is it at all possible that this has to do with the recent change from
bd0 to ebd0 (PR #15628) [1]?

For completeness, I ran all the code _beneath_ the call, and while
nothing else cause an infinite loop, I posted what I believe may be
unexpected results below,

Thank you,

Avi

[1]: https://bugs.r-project.org/show_bug.cgi?id=15628

> ## FIXME ?!:  MxM/2 seems +- ok ??
> (dLmM <- dnbinom(xL, mu = 1, size = MxM))  # all NaN but the last
Warning in dnbinom(xL, mu = 1, size = MxM) : NaNs produced
 [1] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN   0
> (dLpI <- dnbinom(xL, prob=1/2, size = Inf))#  ditto
Warning in dnbinom(xL, prob = 1/2, size = Inf) : NaNs produced
 [1] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN   0
> (dLpM <- dnbinom(xL, prob=1/2, size = MxM))#  ditto
Warning in dnbinom(xL, prob = 1/2, size = MxM) : NaNs produced
 [1] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN   0

> d <- dnbinom(x,  mu = mu, size = Inf) # gave NaN (for 0 and L), now all 0
Warning in dnbinom(x, mu = mu, size = Inf) : NaNs produced
> p <- pnbinom(x,  mu = mu, size = Inf) # gave all NaN, now uses ppois(x, mu)
Warning in pnbinom(x, mu = mu, size = Inf) : NaNs produced

> pp <- (0:16)/16
> q <- qnbinom(pp, mu = mu, size = Inf) # gave all NaN
> set.seed(1); NI <- rnbinom(32, mu = mu, size = Inf)# gave all NaN
> set.seed(1); N2 <- rnbinom(32, mu = mu, size = L  )
> stopifnot(exprs = {
+ all.equal(d, c(0.006737947, 0.033689735, 0.0842243375,
0.140373896, 0,0,0,0), tol = 9e-9)# 7.6e-10
+ all.equal(p, c(0.006737947, 0.040427682, 0.1246520195,
0.265025915, 1,1,1,1), tol = 9e-9)# 7.3e-10
+ all.equal(d, dpois(x, mu))# current implementation: even identical()
+ all.equal(p, ppois(x, mu))
+ q == c(0, 2, 3, 3, 3, 4, 4, 4, 5, 5, 6, 6, 6, 7, 8, 9, Inf)
+ q == qpois(pp, mu)
+ identical(NI, N2)
+ })
Error: d and c(0.006737947, 0.033689735, 0.0842243375, 0.140373896, 0,
0,   are not equal:
  'is.NA' value mismatch: 0 in current 1 in target

> if(!(onWindows && arch == "x86")) {
+  ## This gave a practically infinite loop (on 64-bit Lnx, Windows;
not in 32-bit)
+ tools::assertWarning(p <- pchisq(1.0012e200, df=1e200, ncp=100),
+  "simpleWarning", verbose=TRUE)
+ stopifnot(p == 1)
+ 

Re: [Rd] R-devel (r81196) hanging at dchisq(large) (PR#13309)

2021-11-16 Thread Martin Maechler
> Avraham Adler 
> on Tue, 16 Nov 2021 02:35:56 + writes:

> I am building r-devel on Windows 10 64bit using Jeroen's mingw system,
> and I am finding that my make check-devel hangs on the above issue.
> Everything is vanila except that I am using OpenBLAS 0.3.18. I have
> been using OpenBLAS for over a decade and have not had this issue
> before. Is there anything I can do to dig deeper into this issue from
> my end? Could there be anything that changed in R-devel that may have
> triggered this? The bugzilla report doesn't have any code attached to
> it.

> Thank you,
> Avi

Hmm.. it would've be nice to tell a bit more, instead of having all
your readers to search links, etc.

In the bugzilla bug report PR#13309
https://bugs.r-project.org/show_bug.cgi?id=13309 ,the example was

 dchisq(x=Inf, df=10, ncp=1)

I had fixed the bug 13 years ago, in svn rev 47005
with regression test in /tests/d-p-q-r-tests.R :


## Non-central Chi^2 density for large x
stopifnot(0 == dchisq(c(Inf, 1e80, 1e50, 1e40), df=10, ncp=1))
## did hang in 2.8.0 and earlier (PR#13309).


and you are seeing your version of R hanging at exactly this
location?


I'd bet quite a bit that the underlying code in these
non-central chi square computations *never* calls BLAS and hence
I cannot imagine how openBLAS could play a role.

However, there must be something peculiar in your compiler setup,
compilation options, 
as of course the above regression test has been run 100s of
1000s of times also under Windows in the last 13 years ..

Last but not least (but really only vaguely related):
   There is still a FIXME in the source code (but not about
hanging, but rather of loosing some accuracy in border cases),
see e.g. https://svn.r-project.org/R/trunk/src/nmath/dnchisq.c
and for that reason I had written an R version of that C code
even back in 2008 which I've made available in  CRAN package
DPQ  a few years ago (together with many other D/P/Q
distribution computations/approximations).
 -> https://cran.r-project.org/package=DPQ

Best,
Martin


> sessionInfo:
> R Under development (unstable) (2021-11-15 r81196)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
> Running under: Windows 10 x64 (build 19043)

> Matrix products: default

> locale:
> [1] LC_COLLATE=English_United States.1252
> [2] LC_CTYPE=English_United States.1252
> [3] LC_MONETARY=English_United States.1252
> [4] LC_NUMERIC=C
> [5] LC_TIME=English_United States.1252

> attached base packages:
> [1] stats graphics  grDevices utils datasets  methods   base

> loaded via a namespace (and not attached):
> [1] compiler_4.2.0 tools_4.2.0

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel