Re: [Rd] [External] Re: Operations with long altrep vectors cause segfaults on Windows

2020-09-08 Thread Hugh Parsonage
I am unable to set breakpoints or use gdb with any success when I use that version.

On Linux I would run R -d gdb, but here that gives "unknown option '-d'",
while gdb R.exe (in the same directory as the debug version) gives the
same output as before.

I'm happy to help but I appreciate this list might not be the best
place to get a tutorial on using gdb on Windows.

On Wed, 9 Sep 2020 at 07:47, Jeroen Ooms  wrote:
>
> On Tue, Sep 8, 2020 at 11:44 PM Jeroen Ooms  wrote:
> >
> > On Tue, Sep 8, 2020 at 5:20 PM Tomas Kalibera  
> > wrote:
> > >
> > > On 9/8/20 4:48 PM, Hugh Parsonage wrote:
> > > > Unfortunately I only get
> > > >
> > > > [Thread 21752.0x4aa8 exited with code 3221225477]
> > > > [Thread 21752.0x4514 exited with code 3221225477]
> > > > [Thread 21752.0x3f10 exited with code 3221225477]
> > > > [Inferior 1 (process 21752) exited with code 0305]
> > > >
> > > > (I'm guessing I would need to build an instrumented version of R, or
> > > > can R be debugged using gdb with an off-the-shelf installation?)
> > >
> > > No, the default build lacks debug symbols. You need a build with debug
> > > symbols, and if you can reproduce in a build without compiler
> > > optimizations (-O0), the backtrace may be easier to interpret. Some bugs
> > > however "disappear" when optimizations are disabled. You can build R
> > > from source (and there may be debug builds provided by someone else
> > > (Jeroen?)).
> >
> > Debug builds for each revision are available from
> > https://r-devel.github.io . To download the installer you need to
> > click the github icon in the last column in the table. You need to be
> > signed in with a (free) Github account in order to download builds
> > (artifacts) from Github actions. It will show download links for both
> > the regular installer and installer with debug symbols.
> >
> > In other news, the https://r-devel.github.io table also shows that the
> > fix that martin committed is segfaulting on 32-bit.
>
> Sorry that was inaccurate, it is not segfaulting at all, but the unit
> test is raising an error on 32-bit.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] failing automatic incoming check

2020-09-08 Thread Uwe Ligges




On 08.09.2020 21:34, Sebastian P. Luque wrote:
> Hello,
>
> I got a notification regarding a failure to pass incoming checks
> automatically after a CRAN submission.  The details are given here:
>
> https://win-builder.r-project.org/incoming_pretest/diveMove_1.5.0_20200908_191325/
>
> The only visible issue is a NOTE from the macosx build, with the very
> terse:
>
> "No Protocol Specified"
>
> My searches suggest this can be ignored, but it would be nice to squash
> it.  Any tips welcome.




For some reason this should have undergone manual inspection but got
auto-rejected. Ideally you would reduce the test timing so that the
overall check time is less than 10 min.


Best,
Uwe Ligges



Re: [Rd] more Matrix weirdness

2020-09-08 Thread Rui Barradas

Hello,

R 4.0.2 on Ubuntu 20.04, sessionInfo() below.

I can reproduce this, sort of. The error I'm getting is different:


x[rr, cc] <- m
#Error in x[rr, cc] <- m : replacement has length zero

But if I check lengths and dimensions, they are identical().

identical(length(x[rr, cc]), length(m))
#[1] TRUE
identical(dim(x[rr, cc]), dim(m))
#[1] TRUE


What works is


x[rr, cc] <- as.matrix(m)
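
For reference, here is the workaround run end to end on Ben's example. It is only a sketch and assumes the Matrix package is installed. My reading, which nobody in this thread has confirmed, is that base `[<-` on a plain matrix does not know how to unpack an S4 Matrix value, so converting first sidesteps the problem.

```r
library(Matrix)

# Ben's example: a base matrix 'x' and a Matrix-class object 'm'
x  <- matrix(0, nrow = 3, ncol = 10,
             dimnames = list(letters[1:3], LETTERS[1:10]))
rr <- c("a", "b", "c")
cc <- c("B", "C", "E")
m  <- Matrix(matrix(1:9, 3, 3))

# Convert to a base matrix before assigning into the base matrix
x[rr, cc] <- as.matrix(m)

# The selected block now holds 1..9 in column-major order
stopifnot(all(x[rr, cc] == matrix(1:9, 3, 3)))
# Columns that were not selected are untouched
stopifnot(all(x[, c("A", "D")] == 0))
```

The two stopifnot() calls pass silently when the assignment has worked.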

I ran Ben's code on RStudio 1.3.1073, the following is run with Rscript 
and the error message is the same.



rui@rui:~$ Rscript --vanilla rhelp.R
R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.1 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=pt_PT.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=pt_PT.UTF-8        LC_COLLATE=pt_PT.UTF-8
 [5] LC_MONETARY=pt_PT.UTF-8    LC_MESSAGES=pt_PT.UTF-8
 [7] LC_PAPER=pt_PT.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=pt_PT.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] Matrix_1.2-18

loaded via a namespace (and not attached):
[1] compiler_4.0.2  grid_4.0.2  lattice_0.20-41
Error in x[rr, cc] <- m :
  number of items to replace is not a multiple of replacement length
Execution halted


Hope this helps,

Rui Barradas


At 03:04 on 09/09/20, Ben Bolker wrote:
>   Am I being too optimistic in expecting this (mixing and matching
> matrices and Matrices) to work?  If x is a matrix and m is a Matrix,
> replacing a commensurately sized sub-matrix of x with m throws "number
> of items to replace is not a multiple of replacement length" ...
>
>
> x <- matrix(0,nrow=3,ncol=10, dimnames=list(letters[1:3],LETTERS[1:10]))
> rr <- c("a","b","c")
> cc <- c("B","C","E")
> m <- Matrix(matrix(1:9,3,3))
> x[rr,cc] <- m
>
>     cheers
>      Ben Bolker



[Rd] more Matrix weirdness

2020-09-08 Thread Ben Bolker
  Am I being too optimistic in expecting this (mixing and matching 
matrices and Matrices) to work?  If x is a matrix and m is a Matrix, 
replacing a commensurately sized sub-matrix of x with m throws "number 
of items to replace is not a multiple of replacement length" ...


x <- matrix(0,nrow=3,ncol=10, dimnames=list(letters[1:3],LETTERS[1:10]))
rr <- c("a","b","c")
cc <- c("B","C","E")
m <- Matrix(matrix(1:9,3,3))
x[rr,cc] <- m

   cheers
Ben Bolker



Re: [Rd] [External] Re: Operations with long altrep vectors cause segfaults on Windows

2020-09-08 Thread Jeroen Ooms
On Tue, Sep 8, 2020 at 11:44 PM Jeroen Ooms  wrote:
>
> On Tue, Sep 8, 2020 at 5:20 PM Tomas Kalibera  
> wrote:
> >
> > On 9/8/20 4:48 PM, Hugh Parsonage wrote:
> > > Unfortunately I only get
> > >
> > > [Thread 21752.0x4aa8 exited with code 3221225477]
> > > [Thread 21752.0x4514 exited with code 3221225477]
> > > [Thread 21752.0x3f10 exited with code 3221225477]
> > > [Inferior 1 (process 21752) exited with code 0305]
> > >
> > > (I'm guessing I would need to build an instrumented version of R, or
> > > can R be debugged using gdb with an off-the-shelf installation?)
> >
> > No, the default build lacks debug symbols. You need a build with debug
> > symbols, and if you can reproduce in a build without compiler
> > optimizations (-O0), the backtrace may be easier to interpret. Some bugs
> > however "disappear" when optimizations are disabled. You can build R
> > from source (and there may be debug builds provided by someone else
> > (Jeroen?)).
>
> Debug builds for each revision are available from
> https://r-devel.github.io . To download the installer you need to
> click the github icon in the last column in the table. You need to be
> signed in with a (free) Github account in order to download builds
> (artifacts) from Github actions. It will show download links for both
> the regular installer and installer with debug symbols.
>
> In other news, the https://r-devel.github.io table also shows that the
> fix that martin committed is segfaulting on 32-bit.

Sorry, that was inaccurate: it is not segfaulting at all, but the unit
test is raising an error on 32-bit.



Re: [Rd] [External] Re: Operations with long altrep vectors cause segfaults on Windows

2020-09-08 Thread Jeroen Ooms
On Tue, Sep 8, 2020 at 5:20 PM Tomas Kalibera  wrote:
>
> On 9/8/20 4:48 PM, Hugh Parsonage wrote:
> > Unfortunately I only get
> >
> > [Thread 21752.0x4aa8 exited with code 3221225477]
> > [Thread 21752.0x4514 exited with code 3221225477]
> > [Thread 21752.0x3f10 exited with code 3221225477]
> > [Inferior 1 (process 21752) exited with code 0305]
> >
> > (I'm guessing I would need to build an instrumented version of R, or
> > can R be debugged using gdb with an off-the-shelf installation?)
>
> No, the default build lacks debug symbols. You need a build with debug
> symbols, and if you can reproduce in a build without compiler
> optimizations (-O0), the backtrace may be easier to interpret. Some bugs
> however "disappear" when optimizations are disabled. You can build R
> from source (and there may be debug builds provided by someone else
> (Jeroen?)).

Debug builds for each revision are available from
https://r-devel.github.io . To download the installer, click the
GitHub icon in the last column of the table. You need to be
signed in with a (free) GitHub account in order to download builds
(artifacts) from GitHub Actions. The page shows download links for both
the regular installer and the installer with debug symbols.

In other news, the https://r-devel.github.io table also shows that the
fix that Martin committed is segfaulting on 32-bit.



Re: [Rd] some questions about R internal SEXP types

2020-09-08 Thread Dan Kortschak via R-devel
Thank you everyone who has helped a non-R developer attempt to build a
tool to extend the R ecosystem.

From what I've read, it looks like I should document the sexp internals
package I provide as a here-be-dragons package, keep the hand-holding
level of the rgo tool using Cgo calls to perform data interchange, and
try to sort out some form of cross-language testing to catch skew
between my understanding of R internals and what actually happens
internally, including as that potentially changes over time.

If anyone has any additional comments that they feel will be helpful in
this thread for me, please make sure that my address is included in the
cc list as I will be unsubscribing.

Again, thanks for the help.

Dan



[Rd] failing automatic incoming check

2020-09-08 Thread Sebastian P. Luque
Hello,

I got a notification regarding a failure to pass incoming checks
automatically after a CRAN submission.  The details are given here:

https://win-builder.r-project.org/incoming_pretest/diveMove_1.5.0_20200908_191325/

The only visible issue is a NOTE from the macosx build, with the very
terse:

"No Protocol Specified"

My searches suggest this can be ignored, but it would be nice to squash
it.  Any tips welcome.

-- 
Seb



Re: [Rd] [External] Re: Operations with long altrep vectors cause segfaults on Windows

2020-09-08 Thread luke-tierney

On Tue, 8 Sep 2020, Martin Maechler wrote:


luke-tierney
on Tue, 8 Sep 2020 09:42:43 -0500 (CDT) writes:


   > On Tue, 8 Sep 2020, Martin Maechler wrote:
   >>> Martin Maechler
   >>> on Tue, 8 Sep 2020 10:40:24 +0200 writes:
   >>
   >>> Hugh Parsonage
   >>> on Tue, 8 Sep 2020 18:08:11 +1000 writes:
   >>
   >> >> I can only reproduce on Windows, but reliably (both 4.0.0 and 4.0.2):
   >>
   >> >> $> R --vanilla
   >> >> x <- c(0L, -2e9:2e9)
   >>
   >> >> # > Segmentation fault
   >>
   >> >> Tried to reproduce on Linux but the above worked as expected. Not an
   >> >> issue merely with the length of the vector; for example, x <-
   >> >> rep_len(1:10, 1e10) works, though the altrep vector must be long to
   >> >> reproduce:
   >>
   >> >> x <- c(0L, -1e9:1e9)  #ok
   >>
   >> >> Segmentation faults occur with the following too:
   >>
   >> >> x <- (-2e9:2e9) + 1L
   >>
   >> > Your operation would "need" (not in theory, but in practice)
   >> > to go from altrep to regular vectors.
   >> > I guess the segfault occurs because of something like this :
   >>
   >> > R asks Windows to hand it a huge amount of memory and Windows replies
   >> > "ok, here is the memory pointer"
   >> > and then R tries to write to there, but illegally (because
   >> > Windows should have told R that it does not really have enough
   >> > memory for that ..).
   >>
   >> > I cannot reproduce the segmentation fault .. but I can confirm
   >> > there is a bug there that shows for me on Windows but not on
   >> > Linux:
   >>
   >> > "My" Windows is on a terminalserver not with too many GB of memory
   >> > (but then in a version of Windows that recognizes that it cannot
   >> > get so much memory):
   >>
   >> > - Here some transcript (thanks to
   >> > using Emacs w/ ESS also on Windows) --
   >>
   >> > R Under development (unstable) (2020-08-24 r79074) -- "Unsuffered Consequences"
   >> > Copyright (C) 2020 The R Foundation for Statistical Computing
   >> > Platform: x86_64-w64-mingw32/x64 (64-bit)
   >>
   >> > R ist freie Software und kommt OHNE JEGLICHE GARANTIE.
   >> > Sie sind eingeladen, es unter bestimmten Bedingungen weiter zu verbreiten.
   >> > Tippen Sie 'license()' or 'licence()' für Details dazu.
   >>
   >> > R ist ein Gemeinschaftsprojekt mit vielen Beitragenden.
   >> > Tippen Sie 'contributors()' für mehr Information und 'citation()',
   >> > um zu erfahren, wie R oder R packages in Publikationen zitiert werden können.
   >>
   >> > Tippen Sie 'demo()' für einige Demos, 'help()' für on-line Hilfe, oder
   >> > 'help.start()' für eine HTML Browserschnittstelle zur Hilfe.
   >> > Tippen Sie 'q()', um R zu verlassen.
   >>
   >> >> x <- (-2e9:2e9) + 1L
   >> > Fehler: kann Vektor der Größe 14.9 GB nicht allozieren
   >> >> y <- c(0L, -2e9:2e9)
   >> > Fehler: kann Vektor der Größe 14.9 GB nicht allozieren
   >> >> Sys.setenv(LANGUAGE="en")
   >> >> y <- c(0L, -2e9:2e9)
   >> > Error: cannot allocate vector of size 14.9 Gb
   >> >> y <- -1e9:4e9
   >> >> .Internal(inspect(y))
   >> > @0x195a6808 14 REALSXP g0c0 [REF(65535)]  -10 : -294967296 (compact)
   >> >> .Machine$integer.max / 1e9
   >> > [1] 2.147484
   >> >> y <- -1e6:2.2e9
   >> >> .Internal(inspect(y))
   >> > @0x0a11a5d8 14 REALSXP g0c0 [REF(65535)]  -100 : -2094967296 (compact)
   >> >> y <- -1e6:2e9
   >> >> .Internal(inspect(y))
   >> > @0x0a13adf0 13 INTSXP g0c0 [REF(65535)]  -100 : 20 (compact)
   >> >>
   >> > - end of transcript ---
   >>
   >> > So indeed, no seg.fault, R notices that it can't get 15 GB of
   >> > memory.
   >>
   >> > But the bug is bad news:  We have *silent* integer overflow happening
   >> > according to what  .Internal(inspect(y)) shows...
   >>
   >> >  less bad news: Probably the bug is only in the 'internal inspect' code
   >> > where a format specifier is used in C's printf() that does not work
   >> > correctly on Windows, at least the way it is currently compiled ..
   >>
   >>
   >> > On (64-bit) Linux, I get
   >>
   >> >> y <- -1e9:4e9 ; .Internal(inspect(y))
   >> > @7d86388 14 REALSXP g0c0 [REF(65535)]  -10 : 40 (compact)
   >>
   >> >> y <- c(0L, y)
   >> > Error: cannot allocate vector of size 37.3 Gb
   >>
   >> > which seems much better ... until I do find a bug, maybe again
   >> > only in the C code underlying .Internal(inspect(.)) :
   >>
   >> >> y <- -1e9:2e9 ; .Internal(inspect(y))
   >> > @7d86ac0 13 INTSXP g0c0 [REF(65535)] Error: long vectors not supported yet: ../../../R/src/main/altclasses.c:139
   >> >>
   >>
   >> Indeed, the purported "integer overflow" (above) does not
   >> happen.
   >> It is "only" a  'printf' related bug inside .Internal(inspect(.)) on Windows.
   >>
   >> *interestingly*, the above bug I've noticed on (64-bit) Linux
   >> does *not* show on Windows (64-bit), at least not for tha

Re: [Rd] [External] Re: Operations with long altrep vectors cause segfaults on Windows

2020-09-08 Thread Martin Maechler
> luke-tierney  
> on Tue, 8 Sep 2020 09:42:43 -0500 (CDT) writes:

> On Tue, 8 Sep 2020, Martin Maechler wrote:
>>> Martin Maechler
>>> on Tue, 8 Sep 2020 10:40:24 +0200 writes:
>> 
>>> Hugh Parsonage
>>> on Tue, 8 Sep 2020 18:08:11 +1000 writes:
>> 
>> >> I can only reproduce on Windows, but reliably (both 4.0.0 and 4.0.2):
>> 
>> >> $> R --vanilla
>> >> x <- c(0L, -2e9:2e9)
>> 
>> >> # > Segmentation fault
>> 
>> >> Tried to reproduce on Linux but the above worked as expected. Not an
>> >> issue merely with the length of the vector; for example, x <-
>> >> rep_len(1:10, 1e10) works, though the altrep vector must be long to
>> >> reproduce:
>> 
>> >> x <- c(0L, -1e9:1e9)  #ok
>> 
>> >> Segmentation faults occur with the following too:
>> 
>> >> x <- (-2e9:2e9) + 1L
>> 
>> > Your operation would "need" (not in theory, but in practice)
>> > to go from altrep to regular vectors.
>> > I guess the segfault occurs because of something like this :
>> 
>> > R asks Windows to hand it a huge amount of memory and Windows replies
>> > "ok, here is the memory pointer"
>> > and then R tries to write to there, but illegally (because
>> > Windows should have told R that it does not really have enough
>> > memory for that ..).
>> 
>> > I cannot reproduce the segmentation fault .. but I can confirm
>> > there is a bug there that shows for me on Windows but not on
>> > Linux:
>> 
>> > "My" Windows is on a terminalserver not with too many GB of memory
>> > (but then in a version of Windows that recognizes that it cannot
>> > get so much memory):
>> 
>> > - Here some transcript (thanks to
>> > using Emacs w/ ESS also on Windows) --
>> 
>> > R Under development (unstable) (2020-08-24 r79074) -- "Unsuffered Consequences"
>> > Copyright (C) 2020 The R Foundation for Statistical Computing
>> > Platform: x86_64-w64-mingw32/x64 (64-bit)
>>
>> > R ist freie Software und kommt OHNE JEGLICHE GARANTIE.
>> > Sie sind eingeladen, es unter bestimmten Bedingungen weiter zu verbreiten.
>> > Tippen Sie 'license()' or 'licence()' für Details dazu.
>>
>> > R ist ein Gemeinschaftsprojekt mit vielen Beitragenden.
>> > Tippen Sie 'contributors()' für mehr Information und 'citation()',
>> > um zu erfahren, wie R oder R packages in Publikationen zitiert werden können.
>> 
>> > Tippen Sie 'demo()' für einige Demos, 'help()' für on-line Hilfe, oder
>> > 'help.start()' für eine HTML Browserschnittstelle zur Hilfe.
>> > Tippen Sie 'q()', um R zu verlassen.
>> 
>> >> x <- (-2e9:2e9) + 1L
>> > Fehler: kann Vektor der Größe 14.9 GB nicht allozieren
>> >> y <- c(0L, -2e9:2e9)
>> > Fehler: kann Vektor der Größe 14.9 GB nicht allozieren
>> >> Sys.setenv(LANGUAGE="en")
>> >> y <- c(0L, -2e9:2e9)
>> > Error: cannot allocate vector of size 14.9 Gb
>> >> y <- -1e9:4e9
>> >> .Internal(inspect(y))
>> > @0x195a6808 14 REALSXP g0c0 [REF(65535)]  -10 : -294967296 (compact)
>> >> .Machine$integer.max / 1e9
>> > [1] 2.147484
>> >> y <- -1e6:2.2e9
>> >> .Internal(inspect(y))
>> > @0x0a11a5d8 14 REALSXP g0c0 [REF(65535)]  -100 : -2094967296 (compact)
>> >> y <- -1e6:2e9
>> >> .Internal(inspect(y))
>> > @0x0a13adf0 13 INTSXP g0c0 [REF(65535)]  -100 : 20 (compact)
>> >>
>> > - end of transcript ---
>> 
>> > So indeed, no seg.fault, R notices that it can't get 15 GB of
>> > memory.
>> 
>> > But the bug is bad news:  We have *silent* integer overflow happening
>> > according to what  .Internal(inspect(y)) shows...
>> 
>> >  less bad news: Probably the bug is only in the 'internal inspect' code
>> > where a format specifier is used in C's printf() that does not work
>> > correctly on Windows, at least the way it is currently compiled ..
>> 
>> 
>> > On (64-bit) Linux, I get
>> 
>> >> y <- -1e9:4e9 ; .Internal(inspect(y))
>> > @7d86388 14 REALSXP g0c0 [REF(65535)]  -10 : 40 (compact)
>> 
>> >> y <- c(0L, y)
>> > Error: cannot allocate vector of size 37.3 Gb
>> 
>> > which seems much better ... until I do find a bug, maybe again
>> > only in the C code underlying .Internal(inspect(.)) :
>>
>> >> y <- -1e9:2e9 ; .Internal(inspect(y))
>> > @7d86ac0 13 INTSXP g0c0 [REF(65535)] Error: long vectors not supported yet: ../../../R/src/main/altclasses.c:139
>> >>
>> 
>> Indeed, the purported "integer overflow" (above) does not
>> happen.
>> It is "only" a  'printf' related bug inside .Internal(inspect(.)) on Windows.
>> 
>> *interestingly*

Re: [Rd] [External] Re: Operations with long altrep vectors cause segfaults on Windows

2020-09-08 Thread Tomas Kalibera

On 9/8/20 4:48 PM, Hugh Parsonage wrote:

Unfortunately I only get

[Thread 21752.0x4aa8 exited with code 3221225477]
[Thread 21752.0x4514 exited with code 3221225477]
[Thread 21752.0x3f10 exited with code 3221225477]
[Inferior 1 (process 21752) exited with code 0305]

(I'm guessing I would need to build an instrumented version of R, or
can R be debugged using gdb with an off-the-shelf installation?)


No, the default build lacks debug symbols. You need a build with debug 
symbols, and if you can reproduce in a build without compiler 
optimizations (-O0), the backtrace may be easier to interpret. Some bugs 
however "disappear" when optimizations are disabled. You can build R 
from source (and there may be debug builds provided by someone else 
(Jeroen?)).


Tomas



On Wed, 9 Sep 2020 at 00:32,  wrote:

On Tue, 8 Sep 2020, Hugh Parsonage wrote:


Thanks Martin.  On further testing, it seems that the segmentation
fault can only occur when the amount of obtainable memory is
sufficiently high. On my machine (admittedly with other processes
running):

$ R --vanilla --max-mem-size=30G -e "x <- c(0L, -2e9:2e9)"
Segmentation fault

$ R --vanilla --max-mem-size=29G -e "x <- c(0L, -2e9:2e9)"
Error: cannot allocate vector of size 14.9 Gb
Execution halted
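
The sizes in these error messages can be checked by hand. The sketch below is back-of-the-envelope arithmetic only: it assumes 4 bytes per integer and 8 bytes per double, ignores the vector header, and takes R's "Gb" to mean 1024^3 bytes. The helper name vec_gib is made up for illustration.

```r
# Size in Gb (1024^3 bytes) of a vector with n elements of the given width
vec_gib <- function(n, bytes_per_element) n * bytes_per_element / 2^30

# c(0L, -2e9:2e9): 4e9 + 1 integers from the sequence plus one more element
n_int <- (2e9 - (-2e9) + 1) + 1
size_int <- round(vec_gib(n_int, 4), 1)

# c(0L, -1e9:4e9): stored as double because 4e9 exceeds .Machine$integer.max
n_dbl <- (4e9 - (-1e9) + 1) + 1
size_dbl <- round(vec_gib(n_dbl, 8), 1)

print(c(size_int, size_dbl))
```

These come out at the 14.9 Gb reported on Windows and the 37.3 Gb reported elsewhere in this thread on Linux. Two 14.9 Gb vectors add up to about 29.8 Gb, which sits between the 29G and 30G limits tried above; nothing in this thread establishes whether that is more than a coincidence.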

Unfortunately I don't have access to a Windows machine with enough
memory to get to the point of failure. If you have rtools and gdb
installed can you run in gdb and see where the segfault is happening?

Best,

luke


On Tue, 8 Sep 2020 at 18:52, Martin Maechler  wrote:

Martin Maechler
 on Tue, 8 Sep 2020 10:40:24 +0200 writes:
Hugh Parsonage
 on Tue, 8 Sep 2020 18:08:11 +1000 writes:

>> I can only reproduce on Windows, but reliably (both 4.0.0 and 4.0.2):

>> $> R --vanilla
>> x <- c(0L, -2e9:2e9)

>> # > Segmentation fault

>> Tried to reproduce on Linux but the above worked as expected. Not an
>> issue merely with the length of the vector; for example, x <-
>> rep_len(1:10, 1e10) works, though the altrep vector must be long to
>> reproduce:

>> x <- c(0L, -1e9:1e9)  #ok

>> Segmentation faults occur with the following too:

>> x <- (-2e9:2e9) + 1L

> Your operation would "need" (not in theory, but in practice)
> to go from altrep to regular vectors.
> I guess the segfault occurs because of something like this :

> R asks Windows to hand it a huge amount of memory and Windows replies
> "ok, here is the memory pointer"
> and then R tries to write to there, but illegally (because
> Windows should have told R that it does not really have enough
> memory for that ..).

> I cannot reproduce the segmentation fault .. but I can confirm
> there is a bug there that shows for me on Windows but not on
> Linux:

> "My" Windows is on a terminalserver not with too many GB of memory
> (but then in a version of Windows that recognizes that it cannot
> get so much memory):

> - Here some transcript (thanks to
> using Emacs w/ ESS also on Windows) --

> R Under development (unstable) (2020-08-24 r79074) -- "Unsuffered Consequences"
> Copyright (C) 2020 The R Foundation for Statistical Computing
> Platform: x86_64-w64-mingw32/x64 (64-bit)

> R ist freie Software und kommt OHNE JEGLICHE GARANTIE.
> Sie sind eingeladen, es unter bestimmten Bedingungen weiter zu verbreiten.
> Tippen Sie 'license()' or 'licence()' für Details dazu.

> R ist ein Gemeinschaftsprojekt mit vielen Beitragenden.
> Tippen Sie 'contributors()' für mehr Information und 'citation()',
> um zu erfahren, wie R oder R packages in Publikationen zitiert werden können.

> Tippen Sie 'demo()' für einige Demos, 'help()' für on-line Hilfe, oder
> 'help.start()' für eine HTML Browserschnittstelle zur Hilfe.
> Tippen Sie 'q()', um R zu verlassen.

>> x <- (-2e9:2e9) + 1L
> Fehler: kann Vektor der Größe 14.9 GB nicht allozieren
>> y <- c(0L, -2e9:2e9)
> Fehler: kann Vektor der Größe 14.9 GB nicht allozieren
>> Sys.setenv(LANGUAGE="en")
>> y <- c(0L, -2e9:2e9)
> Error: cannot allocate vector of size 14.9 Gb
>> y <- -1e9:4e9
>> .Internal(inspect(y))
> @0x195a6808 14 REALSXP g0c0 [REF(65535)]  -10 : -294967296 (compact)
>> .Machine$integer.max / 1e9
> [1] 2.147484
>> y <- -1e6:2.2e9
>> .Internal(inspect(y))
> @0x0a11a5d8 14 REALSXP g0c0 [REF(65535)]  -100 : -2094967296 (compact)
>> y <- -1e6:2e9
>> .Internal(inspect(y))
> @0x0a13adf0 13 INTSXP g0c0 [REF(65535)]  -100 : 20 (compact)
>>
> - end of transcript ---

> So indeed, no seg.fault, R notices that it can't get 15 GB of
> memory.

> But the bug is bad news:  We have *silent* integer overflow happening
> according to what  .Internal(inspect(y)) shows...

> .

Re: [Rd] [External] Re: Operations with long altrep vectors cause segfaults on Windows

2020-09-08 Thread Hugh Parsonage
Unfortunately I only get

[Thread 21752.0x4aa8 exited with code 3221225477]
[Thread 21752.0x4514 exited with code 3221225477]
[Thread 21752.0x3f10 exited with code 3221225477]
[Inferior 1 (process 21752) exited with code 0305]

(I'm guessing I would need to build an instrumented version of R, or
can R be debugged using gdb with an off-the-shelf installation?)
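
A side note on the thread exit code: written in hexadecimal, 3221225477 is 0xC0000005, which is the Windows status code for an access violation (STATUS_ACCESS_VIOLATION), i.e. the Windows counterpart of a segfault. So gdb did see the crash; it just had no symbols to say where. A small sketch of the conversion:

```r
# Exit code reported by gdb for each thread, as a decimal number
code <- 3221225477

# The value is larger than .Machine$integer.max, so format the
# upper and lower 16 bits separately as integers
hi <- as.integer(code %/% 65536)
lo <- as.integer(code %% 65536)
hex <- sprintf("0x%04X%04X", hi, lo)
print(hex)
```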

On Wed, 9 Sep 2020 at 00:32,  wrote:
>
> On Tue, 8 Sep 2020, Hugh Parsonage wrote:
>
> > Thanks Martin.  On further testing, it seems that the segmentation
> > fault can only occur when the amount of obtainable memory is
> > sufficiently high. On my machine (admittedly with other processes
> > running):
> >
> > $ R --vanilla --max-mem-size=30G -e "x <- c(0L, -2e9:2e9)"
> > Segmentation fault
> >
> > $ R --vanilla --max-mem-size=29G -e "x <- c(0L, -2e9:2e9)"
> > Error: cannot allocate vector of size 14.9 Gb
> > Execution halted
>
> Unfortunately I don't have access to a Windows machine with enough
> memory to get to the point of failure. If you have rtools and gdb
> installed can you run in gdb and see where the segfault is happening?
>
> Best,
>
> luke
>
> >
> > On Tue, 8 Sep 2020 at 18:52, Martin Maechler  
> > wrote:
> >>
> >>> Martin Maechler
> >>> on Tue, 8 Sep 2020 10:40:24 +0200 writes:
> >>
> >>> Hugh Parsonage
> >>> on Tue, 8 Sep 2020 18:08:11 +1000 writes:
> >>
> >>>> I can only reproduce on Windows, but reliably (both 4.0.0 and 4.0.2):
> >>
> >>>> $> R --vanilla
> >>>> x <- c(0L, -2e9:2e9)
> >>
> >>>> # > Segmentation fault
> >>
> >>>> Tried to reproduce on Linux but the above worked as expected. Not an
> >>>> issue merely with the length of the vector; for example, x <-
> >>>> rep_len(1:10, 1e10) works, though the altrep vector must be long to
> >>>> reproduce:
> >>
> >>>> x <- c(0L, -1e9:1e9)  #ok
> >>
> >>>> Segmentation faults occur with the following too:
> >>
> >>>> x <- (-2e9:2e9) + 1L
> >>
> >>> Your operation would "need" (not in theory, but in practice)
> >>> to go from altrep to regular vectors.
> >>> I guess the segfault occurs because of something like this :
> >>
> >>> R asks Windows to hand it a huge amount of memory and Windows replies
> >>> "ok, here is the memory pointer"
> >>> and then R tries to write to there, but illegally (because
> >>> Windows should have told R that it does not really have enough
> >>> memory for that ..).
> >>
> >>> I cannot reproduce the segmentation fault .. but I can confirm
> >>> there is a bug there that shows for me on Windows but not on
> >>> Linux:
> >>
> >>> "My" Windows is on a terminalserver not with too many GB of memory
> >>> (but then in a version of Windows that recognizes that it cannot
> >>> get so much memory):
> >>
> >>> - Here some transcript (thanks to
> >>> using Emacs w/ ESS also on Windows) --
> >>
> >>> R Under development (unstable) (2020-08-24 r79074) -- "Unsuffered Consequences"
> >>> Copyright (C) 2020 The R Foundation for Statistical Computing
> >>> Platform: x86_64-w64-mingw32/x64 (64-bit)
> >>
> >>> R ist freie Software und kommt OHNE JEGLICHE GARANTIE.
> >>> Sie sind eingeladen, es unter bestimmten Bedingungen weiter zu verbreiten.
> >>> Tippen Sie 'license()' or 'licence()' für Details dazu.
> >>
> >>> R ist ein Gemeinschaftsprojekt mit vielen Beitragenden.
> >>> Tippen Sie 'contributors()' für mehr Information und 'citation()',
> >>> um zu erfahren, wie R oder R packages in Publikationen zitiert werden können.
> >>
> >>> Tippen Sie 'demo()' für einige Demos, 'help()' für on-line Hilfe, oder
> >>> 'help.start()' für eine HTML Browserschnittstelle zur Hilfe.
> >>> Tippen Sie 'q()', um R zu verlassen.
> >>
> >>>> x <- (-2e9:2e9) + 1L
> >>> Fehler: kann Vektor der Größe 14.9 GB nicht allozieren
> >>>> y <- c(0L, -2e9:2e9)
> >>> Fehler: kann Vektor der Größe 14.9 GB nicht allozieren
> >>>> Sys.setenv(LANGUAGE="en")
> >>>> y <- c(0L, -2e9:2e9)
> >>> Error: cannot allocate vector of size 14.9 Gb
> >>>> y <- -1e9:4e9
> >>>> .Internal(inspect(y))
> >>> @0x195a6808 14 REALSXP g0c0 [REF(65535)]  -10 : -294967296 (compact)
> >>>> .Machine$integer.max / 1e9
> >>> [1] 2.147484
> >>>> y <- -1e6:2.2e9
> >>>> .Internal(inspect(y))
> >>> @0x0a11a5d8 14 REALSXP g0c0 [REF(65535)]  -100 : -2094967296 (compact)
> >>>> y <- -1e6:2e9
> >>>> .Internal(inspect(y))
> >>> @0x0a13adf0 13 INTSXP g0c0 [REF(65535)]  -100 : 20 (compact)
> >>>>
> >>> - end of transcript ---
> >>
> >>> So indeed, no seg.fault, R notices that it can't get 15 GB of
> >>> memory.
> >>
> >>> But the bug is bad news:  We have *silent* integer overflow happening
>

Re: [Rd] [External] Re: Operations with long altrep vectors cause segfaults on Windows

2020-09-08 Thread luke-tierney

On Tue, 8 Sep 2020, Martin Maechler wrote:


Martin Maechler
on Tue, 8 Sep 2020 10:40:24 +0200 writes:



Hugh Parsonage
on Tue, 8 Sep 2020 18:08:11 +1000 writes:


   >> I can only reproduce on Windows, but reliably (both 4.0.0 and 4.0.2):

   >> $> R --vanilla
   >> x <- c(0L, -2e9:2e9)

   >> # > Segmentation fault

   >> Tried to reproduce on Linux but the above worked as expected. Not an
   >> issue merely with the length of the vector; for example, x <-
   >> rep_len(1:10, 1e10) works, though the altrep vector must be long to
   >> reproduce:

   >> x <- c(0L, -1e9:1e9)  #ok

   >> Segmentation faults occur with the following too:

   >> x <- (-2e9:2e9) + 1L

   > Your operation would "need" (not in theory, but in practice)
   > to go from altrep to regular vectors.
   > I guess the segfault occurs because of something like this :

   > R asks Windows to hand it a huge amount of memory and Windows replies
   > "ok, here is the memory pointer"
   > and then R tries to write to there, but illegally (because
   > Windows should have told R that it does not really have enough
   > memory for that ..).

   > I cannot reproduce the segmentation fault .. but I can confirm
   > there is a bug there that shows for me on Windows but not on
   > Linux:

   > "My" Windows is on a terminalserver not with too many GB of memory
   > (but then in a version of Windows that recognizes that it cannot
   > get so much memory):

   > - Here some transcript (thanks to
   > using Emacs w/ ESS also on Windows) --

   > R Under development (unstable) (2020-08-24 r79074) -- "Unsuffered Consequences"
   > Copyright (C) 2020 The R Foundation for Statistical Computing
   > Platform: x86_64-w64-mingw32/x64 (64-bit)

   > R ist freie Software und kommt OHNE JEGLICHE GARANTIE.
   > Sie sind eingeladen, es unter bestimmten Bedingungen weiter zu verbreiten.
   > Tippen Sie 'license()' or 'licence()' für Details dazu.

   > R ist ein Gemeinschaftsprojekt mit vielen Beitragenden.
   > Tippen Sie 'contributors()' für mehr Information und 'citation()',
   > um zu erfahren, wie R oder R packages in Publikationen zitiert werden können.

   > Tippen Sie 'demo()' für einige Demos, 'help()' für on-line Hilfe, oder
   > 'help.start()' für eine HTML Browserschnittstelle zur Hilfe.
   > Tippen Sie 'q()', um R zu verlassen.

   >> x <- (-2e9:2e9) + 1L
   > Fehler: kann Vektor der Größe 14.9 GB nicht allozieren
   >> y <- c(0L, -2e9:2e9)
   > Fehler: kann Vektor der Größe 14.9 GB nicht allozieren
   >> Sys.setenv(LANGUAGE="en")
   >> y <- c(0L, -2e9:2e9)
   > Error: cannot allocate vector of size 14.9 Gb
   >> y <- -1e9:4e9
   >> .Internal(inspect(y))
   > @0x195a6808 14 REALSXP g0c0 [REF(65535)]  -1000000000 : -294967296 (compact)
   >> .Machine$integer.max / 1e9
   > [1] 2.147484
   >> y <- -1e6:2.2e9
   >> .Internal(inspect(y))
   > @0x0a11a5d8 14 REALSXP g0c0 [REF(65535)]  -1000000 : -2094967296 (compact)
   >> y <- -1e6:2e9
   >> .Internal(inspect(y))
   > @0x0a13adf0 13 INTSXP g0c0 [REF(65535)]  -1000000 : 2000000000 (compact)
   >>
   > - end of transcript ---

   > So indeed, no seg.fault, R notices that it can't get 15 GB of
   > memory.

   > But the bug is bad news:  We have *silent* integer overflow happening
   > according to what  .Internal(inspect(y)) shows...

   > Less bad news: probably the bug is only in the 'internal inspect' code
   > where a format specifier is used in C's printf() that does not work
   > correctly on Windows, at least the way it is currently compiled ..


   > On (64-bit) Linux, I get

   >> y <- -1e9:4e9 ; .Internal(inspect(y))
   > @7d86388 14 REALSXP g0c0 [REF(65535)]  -1000000000 : 4000000000 (compact)

   >> y <- c(0L, y)
   > Error: cannot allocate vector of size 37.3 Gb

   > which seems much better ... until I do find a bug, maybe again
   > only in the C code underlying .Internal(inspect(.)) :

   >> y <- -1e9:2e9 ; .Internal(inspect(y))
   > @7d86ac0 13 INTSXP g0c0 [REF(65535)] Error: long vectors not supported yet: ../../../R/src/main/altclasses.c:139
   >>

Indeed, the purported "integer overflow" (above) does not
happen.
It is "only" a  'printf' related bug inside .Internal(inspect(.)) on Windows.

*interestingly*, the above bug I've noticed on (64-bit) Linux
does *not* show on Windows (64-bit), at least not for that case:

On Windows, things are fine as long as they remain (compacted
aka 'ALTREP') INTSXP:

 > y <- -1e3:2e9 ;.Internal(inspect(y))
 @0x0a285648 13 INTSXP g0c0 [REF(65535)]  -1000 : 2000000000 (compact)
 > y <- -1e3:2.1e9 ;.Internal(inspect(y))
 @0x19925930 13 INTSXP g0c0 [REF(65535)]  -1000 : 2100000000 (compact)

and here, y is correct, just the printing from
.Internal(inspect(y)) is buggy (probably prints the double as an integer):


It's a '%ld' that probably needs to be '%lld' for Windows. Will fix
sometime soon.
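
For what it's worth, the odd end points in the Windows transcript are exactly what you get when the 64-bit values are squeezed into 32 bits, which is what a 32-bit C long would hold. A minimal sketch in Go (this is not the code in altclasses.c; wrap32 is a made-up helper for illustration only) that reproduces the printed numbers:

```go
package main

import "fmt"

// wrap32 mimics storing a sequence end point in a 32-bit signed integer:
// only the low 32 bits of the 64-bit value survive the conversion.
func wrap32(x float64) int32 {
	return int32(int64(x))
}

func main() {
	// End points taken from the transcript: 2e9 still fits, 2.2e9 and 4e9 do not.
	for _, end := range []float64{2e9, 2.2e9, 4e9} {
		fmt.Printf("%.0f -> %d\n", end, wrap32(end))
	}
}
```

This prints 2000000000, -2094967296 and -294967296 for the three end points, which matches what inspect showed while y itself stayed correct.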

Best,

luke

Re: [Rd] [External] Re: Operations with long altrep vectors cause segfaults on Windows

2020-09-08 Thread luke-tierney

On Tue, 8 Sep 2020, Hugh Parsonage wrote:


Thanks Martin.  On further testing, it seems that the segmentation
fault can only occur when the amount of obtainable memory is
sufficiently high. On my machine (admittedly with other processes
running):

$ R --vanilla --max-mem-size=30G -e "x <- c(0L, -2e9:2e9)"
Segmentation fault

$ R --vanilla --max-mem-size=29G -e "x <- c(0L, -2e9:2e9)"
Error: cannot allocate vector of size 14.9 Gb
Execution halted
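
As a sanity check on the numbers, 14.9 Gb is what the fully expanded vector would need. A rough sketch of the arithmetic in Go, assuming the message counts in units of 2^30 bytes and ignoring the small vector header (sizeGb is a hypothetical helper, not anything in R):

```go
package main

import "fmt"

// sizeGb formats the payload size of n elements of elemBytes bytes each,
// in units of 2^30 bytes, with one decimal place.
func sizeGb(n, elemBytes int64) string {
	return fmt.Sprintf("%.1f", float64(n*elemBytes)/float64(1<<30))
}

func main() {
	// c(0L, -2e9:2e9): 4e9+1 sequence elements plus the leading 0L,
	// stored as 4-byte integers.
	fmt.Println(sizeGb(4000000002, 4), "Gb")
	// c(0L, -1e9:4e9): 5e9+1 elements plus one, stored as 8-byte doubles
	// because the end point exceeds the integer range.
	fmt.Println(sizeGb(5000000002, 8), "Gb")
}
```

The two lines come out as 14.9 Gb and 37.3 Gb, the same figures as in the error messages reported on Windows and on Linux.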


Unfortunately I don't have access to a Windows machine with enough
memory to get to the point of failure. If you have rtools and gdb
installed can you run in gdb and see where the segfault is happening?

Best,

luke



On Tue, 8 Sep 2020 at 18:52, Martin Maechler  wrote:



Martin Maechler
on Tue, 8 Sep 2020 10:40:24 +0200 writes:



Hugh Parsonage
on Tue, 8 Sep 2020 18:08:11 +1000 writes:


   >> I can only reproduce on Windows, but reliably (both 4.0.0 and 4.0.2):

   >> $> R --vanilla
   >> x <- c(0L, -2e9:2e9)

   >> # > Segmentation fault


Re: [Rd] [External] Re: some questions about R internal SEXP types

2020-09-08 Thread luke-tierney

On Tue, 8 Sep 2020, Hadley Wickham wrote:


On Tue, Sep 8, 2020 at 4:12 AM Tomas Kalibera  wrote:



The general principle is that R packages are only allowed to use what is
documented in the R help (? command) and in Writing R Extensions. The
former covers what is allowed from R code in extensions, the latter
mostly what is allowed from C code in extensions (with some references
to Fortran).


Could you clarify what you mean by "documented"? For example,
Rf_allocVector() is mentioned several times in R-exts, but I don't see
anywhere where the inputs and output are precisely described (which is
what I would consider to be documented). Is Rf_allocVector() part of
the API?


For now, documented means mentioned as something extension writers can
use.  Details are in the header files, Rinternals.h for
Rf_allocVector().

Ideally someone would find the time to refactor the header files,
Rinternals.h in particular, so everything in installed headers is
considered in the API and everything else is considered private and
subject to change. Unfortunately that would take a lot of effort, both
technical and political, and I don't see it happening soon. But I'm
happy to be proved wrong.

Best,

luke



Hadley




--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
   Actuarial Science
241 Schaeffer Hall                  email:   luke-tier...@uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] some questions about R internal SEXP types

2020-09-08 Thread Dan Kortschak via R-devel
I was not. I was explaining why my expectations exist. I am honestly
surprised that this would be misinterpreted.

Dan

On Tue, 2020-09-08 at 13:47 +0200, Tomas Kalibera wrote:
> Please don't use this list for advertising on other languages, there
> may be other lists for that.



Re: [Rd] some questions about R internal SEXP types

2020-09-08 Thread Hadley Wickham
On Tue, Sep 8, 2020 at 4:12 AM Tomas Kalibera  wrote:
>
>
> The general principle is that R packages are only allowed to use what is
> documented in the R help (? command) and in Writing R Extensions. The
> former covers what is allowed from R code in extensions, the latter
> mostly what is allowed from C code in extensions (with some references
> to Fortran).

Could you clarify what you mean by "documented"? For example,
Rf_allocVector() is mentioned several times in R-exts, but I don't see
anywhere where the inputs and output are precisely described (which is
what I would consider to be documented). Is Rf_allocVector() part of
the API?

Hadley

-- 
http://hadley.nz



Re: [Rd] some questions about R internal SEXP types

2020-09-08 Thread Tomas Kalibera

On 9/8/20 1:13 PM, Dan Kortschak wrote:

On Tue, 2020-09-08 at 12:08 +0200, Tomas Kalibera wrote:

I am not sure if I understand correctly, but if you were accessing
directly the memory of SEXPs from Go implementation instead of calling
through exported access functions documented in WRE, that would be a
really bad idea. Of course fine for research and experimentation, but
the internal structure can and does change at any time, otherwise we
would not be able to develop nor maintain R. Such direct access
bypassing WRE would likely be a clear case for rejection in CRAN for
this interface and any packages using it, and I hope in other package
repositories as well.

Sorry, I'm coming from a language that has strong backwards
compatibility guarantees and (generally) machine level data types, so
it is surprising to me that basic data types are that fluid.


Since R does not allow these things, it can change the object
header without breaking compatibility.


In a managed language, it is certainly not typical to let native code
extensions access object headers directly, for safety, to allow
optimizations, because of synchronization, etc. In R, a recent optimization
that would not have been possible otherwise is the ALTREP framework.


Please don't use this list for advertising on other languages, there may 
be other lists for that.



However, I believe the overhead of calling the C-level access functions
R exports should be minimal compared to other overheads. You can't hope,
anyway, for being able to efficiently call tiny functions frequently
between Go and R. This can only work for bigger functions, anyway, and
then the Go-C overhead should not be important.

This really depends on the complexity/structure of the data structures
that are being handed in to Go. The entirety of the tool is there to
allow interchange of data between Go and R, in the case of atomic
vectors, this cost is very cheap with direct access or via Cgo calling,
however each name access or attribute access (both of which are
necessary for struct population - and structs may come in slices) is a
Cgo call; these look ups go from ~nanosecond to ~hundred nanoseconds
per lookup.


Probably most data in R would be in vectors (as part of data frames), 
anyway. In some cases you may be able to cache the calls (some R objects 
are immutable, see WRE 5.9.10).
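
The caching idea can be sketched in plain Go without touching R at all. The following is only an illustration under the assumption that the looked-up object is not modified while the cache is alive; nameCache and its lookup function are hypothetical names, and the slow lookup stands in for a Cgo call:

```go
package main

import "fmt"

// nameCache memoizes a lookup that is expensive to repeat, such as a
// cross-language call. It is only valid while the underlying object is
// not modified, which the caller has to guarantee.
type nameCache struct {
	lookup func(name string) int // hypothetical slow lookup
	seen   map[string]int        // results of earlier lookups
	calls  int                   // how many times the slow path ran
}

func newNameCache(lookup func(string) int) *nameCache {
	return &nameCache{lookup: lookup, seen: make(map[string]int)}
}

// index returns the cached result if there is one and falls back to the
// slow lookup otherwise.
func (c *nameCache) index(name string) int {
	if i, ok := c.seen[name]; ok {
		return i
	}
	c.calls++
	i := c.lookup(name)
	c.seen[name] = i
	return i
}

func main() {
	names := []string{"id", "score", "label"}
	slow := func(name string) int {
		for i, n := range names {
			if n == name {
				return i
			}
		}
		return -1
	}
	c := newNameCache(slow)
	for i := 0; i < 1000; i++ {
		c.index("score")
	}
	fmt.Println(c.index("score"), c.calls)
}
```

However many structs are populated, each distinct name pays the slow path once; whether that is safe depends entirely on the immutability assumption above.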


Tomas




Note that there is a lot in WRE that's beyond what I want rgo to be
able to do (calling in to R from Go for example). In fact, there's just
a lot in WRE (it's almost 3 times the length of the Go language spec
and memory model reference combined). The issues around weak references
and external pointers are not something that I want to deal with;
working with that kind of object is not idiomatic for Go (in fact
without using C.malloc, R external pointers from Go would be forbidden
by the Go runtime) and I would not expect that they are likely to be
used by people writing extensions for R in Go.

Sure, I think it is perfectly fine to cover only a subset, if that is
already useful to write some extensions in Go. Maintenance would be
easiest if Go programs didn't call back into the R runtime at all, so
fewer calls the better for maintenance.

This is apparently unavoidable though from what I read here.


Best
Tomas


thanks
Dan






Re: [Rd] some questions about R internal SEXP types

2020-09-08 Thread Dan Kortschak via R-devel
On Tue, 2020-09-08 at 12:08 +0200, Tomas Kalibera wrote:
> I am not sure if I understand correctly, but if you were accessing
> directly the memory of SEXPs from Go implementation instead of calling
> through exported access functions documented in WRE, that would be a
> really bad idea. Of course fine for research and experimentation, but
> the internal structure can and does change at any time, otherwise we
> would not be able to develop nor maintain R. Such direct access
> bypassing WRE would likely be a clear case for rejection in CRAN for
> this interface and any packages using it, and I hope in other package
> repositories as well.

Sorry, I'm coming from a language that has strong backwards
compatibility guarantees and (generally) machine level data types, so
it is surprising to me that basic data types are that fluid.

> However, I believe the overhead of calling the C-level access functions
> R exports should be minimal compared to other overheads. You can't hope,
> anyway, for being able to efficiently call tiny functions frequently
> between Go and R. This can only work for bigger functions, anyway, and
> then the Go-C overhead should not be important.

This really depends on the complexity/structure of the data structures
that are being handed in to Go. The entirety of the tool is there to
allow interchange of data between Go and R, in the case of atomic
vectors, this cost is very cheap with direct access or via Cgo calling,
however each name access or attribute access (both of which are
necessary for struct population - and structs may come in slices) is a
Cgo call; these look ups go from ~nanosecond to ~hundred nanoseconds
per lookup.

> > Note that there is a lot in WRE that's beyond what I want rgo to be
> > able to do (calling in to R from Go for example). In fact, there's just
> > a lot in WRE (it's almost 3 times the length of the Go language spec
> > and memory model reference combined). The issues around weak references
> > and external pointers are not something that I want to deal with;
> > working with that kind of object is not idiomatic for Go (in fact
> > without using C.malloc, R external pointers from Go would be forbidden
> > by the Go runtime) and I would not expect that they are likely to be
> > used by people writing extensions for R in Go.
> 
> Sure, I think it is perfectly fine to cover only a subset, if that is
> already useful to write some extensions in Go. Maintenance would be
> easiest if Go programs didn't call back into the R runtime at all, so
> fewer calls the better for maintenance.

This is apparently unavoidable though from what I read here.

> Best
> Tomas


thanks
Dan



Re: [Rd] some questions about R internal SEXP types

2020-09-08 Thread Dan Kortschak via R-devel


Thanks, Tomas.

This is unfortunate. Calling between Go and C is not cheap; the gc
implementation of the Go compiler (as opposed to gccgo) uses different
calling conventions from C and there are checks to ensure that Go
allocated memory pointers do not leak into C code. For this reason I
wanted to avoid these if at all possible (I cannot for allocations
since I don't want to keep tracking changes in how R implements its GC
and allocation).

However, if SEXP type behaviour of the standard types, and how
attributes are handled are not highly mobile, I think that what I'm
doing will be OK - at worst the Go code will panic and result in an R
error. The necessary interface to R for allocations is only eight
functions[1].

Note that there is a lot in WRE that's beyond what I want rgo to be
able to do (calling in to R from Go for example). In fact, there's just
a lot in WRE (it's almost 3 times the length of the Go language spec
and memory model reference combined). The issues around weak references
and external pointers are not something that I want to deal with;
working with that kind of object is not idiomatic for Go (in fact
without using C.malloc, R external pointers from Go would be forbidden
by the Go runtime) and I would not expect that they are likely to be
used by people writing extensions for R in Go.

Dan

[1]


https://github.com/rgonomic/rgo/blob/2ce7717c85516bbfb94d0b5c7ef1d9749dd1f817/sexp/r_internal.go#L86-L118

On Tue, 2020-09-08 at 11:07 +0200, Tomas Kalibera wrote:
> The general principle is that R packages are only allowed to use what is
> documented in the R help (? command) and in Writing R Extensions. The
> former covers what is allowed from R code in extensions, the latter
> mostly what is allowed from C code in extensions (with some references
> to Fortran).
>
> If you are implementing a Go interface for writing R packages, such Go
> interface should thus only use what is in the R help and in Writing R
> Extensions. Otherwise, packages would not be able to use such interface.
>
> What is described in R Internals is for understanding the internal
> structure of R implementation itself, so for development of R itself, it
> could help indeed also debugging of R itself and in some cases debugging
> or performance analysis of extensions. R Internals can help in giving an
> intuition, but when people are implementing R itself, they also need to
> check the code. R Internals does not describe any interface for external
> code, if it states any constraints about say pairlists, etc, take it as
> an intuition for what has been intended and probably holds or held at
> some level of abstraction, but you need to check the source code for the
> details, anyway (e.g., at some very low level CAR and CDR can be any
> SEXP or R_NilValue, locally in some functions even C NULL). Internally,
> some C code uses C NULL SEXPs, but it is rare and local, and again, only
> the interface described in Writing R Extensions is for external use.
>
> WRE speaks about "R NULL", "R NULL object" or "C NULL" in some cases to
> avoid confusion, e.g. for values types as "void *". SEXPs that packages
> obtain using the interface in WRE should not be C NULL, only R NULL
> (R_NilValue). External pointers can become C NULL and this is documented
> in WRE 5.13.
> 
> Best
> Tomas
> 
> On 9/6/20 3:44 AM, Dan Kortschak via R-devel wrote:
> > Hello,
> > 
> > I am writing an R/Go interoperability tool[1] that works similarly to
> > Rcpp; the tool takes packages written in Go and performs the necessary
> > Go type analysis to wrap the Go code with C and R shims that allow the
> > Go code to then be called from R. The system is largely complete (with
> > the exception of having a clean approach to handling generalised
> > attributes in the easy case[2] - the less hand holding case does handle
> > these). Testing of some of the code is unfortunately lacking because of
> > the difficulties of testing across environments.
> >
> > To make the system flexible I have provided an (intentionally
> > incomplete) Go API into the R internals which allows reasonably Go
> > type-safe interaction with SEXP values (Go does not have unions, so
> > this is uglier than it might be otherwise and unions are faked with Go
> > interface values). For efficiency reasons I've avoided using R internal
> > calls where possible (accessors are done with Go code directly, but
> > allocations are done in R's C code to avoid having to duplicate the
> > garbage collection mechanics in Go with the obvious risks of error and
> > possible behaviour skew in the future).
> >
> > In doing this work I have some questions that I have not been able to
> > find answers for in the R-ints doc or hadley/r-internals.
> >
> > 1. In R-ints, the LISTSXP SEXP type CDR is said to hold "usually"
> >    LISTSXP or NULL. What does the "usually" mean here? Is it possible
> >    for the CDR to ho

Re: [Rd] some questions about R internal SEXP types

2020-09-08 Thread Tomas Kalibera



On 9/8/20 11:47 AM, Dan Kortschak wrote:

Thanks, Tomas.

This is unfortunate. Calling between Go and C is not cheap; the gc
implementation of the Go compiler (as opposed to gccgo) uses different
calling conventions from C and there are checks to ensure that Go
allocated memory pointers do not leak into C code. For this reason I
wanted to avoid these if at all possible (I cannot for allocations
since I don't want to keep tracking changes in how R implements its GC
and allocation).

However, if SEXP type behaviour of the standard types, and how
attributes are handled are not highly mobile, I think that what I'm
doing will be OK - at worst the Go code will panic and result in an R
error. The necessary interface to R for allocations is only eight
functions[1].


I am not sure if I understand correctly, but if you were accessing 
directly the memory of SEXPs from Go implementation instead of calling 
through exported access functions documented in WRE, that would be a 
really bad idea. Of course fine for research and experimentation, but 
the internal structure can and does change at any time, otherwise we 
would not be able to develop nor maintain R. Such direct access 
bypassing WRE would likely be a clear case for rejection in CRAN for 
this interface and any packages using it, and I hope in other package 
repositories as well.


However, I believe the overhead of calling the C-level access functions 
R exports should be minimal compared to other overheads. You can't hope, 
anyway, for being able to efficiently call tiny functions frequently 
between Go and R. This can only work for bigger functions, anyway, and 
then the Go-C overhead should not be important.



Note that there is a lot in WRE that's beyond what I want rgo to be
able to do (calling in to R from Go for example). In fact, there's just
a lot in WRE (it's almost 3 times the length of the Go language spec
and memory model reference combined). The issues around weak references
and external pointers are not something that I want to deal with;
working with that kind of object is not idiomatic for Go (in fact
without using C.malloc, R external pointers from Go would be forbidden
by the Go runtime) and I would not expect that they are likely to be
used by people writing extensions for R in Go.


Sure, I think it is perfectly fine to cover only a subset, if that is 
already useful to write some extensions in Go. Maintenance would be 
easiest if Go programs didn't call back into the R runtime at all, so 
fewer calls the better for maintenance.


Best
Tomas



Dan

[1]


https://github.com/rgonomic/rgo/blob/2ce7717c85516bbfb94d0b5c7ef1d9749dd1f817/sexp/r_internal.go#L86-L118


Re: [Rd] some questions about R internal SEXP types

2020-09-08 Thread Dan Kortschak via R-devel


Thanks, Gabriel.

On Mon, 2020-09-07 at 14:38 -0700, Gabriel Becker wrote:
> I cannot speak to initial intent, perhaps others can. I can say that
> there is at least one place where the difference between R_NilValue
> and NULL is very important as of right now. The current design of the
> ALTREP framework contract expects ALTREP methods that return a SEXP
> to return C NULL when they fail (or decline) to do the requested
> computation and the non-altclass-specific machinery should be run as
> a fallback. The places where ALTREP methods are plugged into the
> existing, general internals then check for C-NULL after attempting to
> fast-path the computation via ALTREP. Any non-C-NULL SEXP, including
> R_NilValue will be taken as an indication that the altrep-method
> succeeded and that SEXP is the resulting value, causing the fall-back
> machinery to be skipped.

This is helpful. Currently this will work in the low level SEXP API,
though not in the hand-holding level (and I think this is probably a
reasonable behavioural distinction); in the low level SEXP API in
rgo/sexp there are two facilitated ways to return values to R, the
Value.Pointer method and the Value.Export method, the first returns
whatever the value of the SEXP is, C NULL, R_NilValue or non-null
result, the second converts C NULL to R_NilValue before returning.
However, in line with the Go philosophy of not doing too much, the user
is free to return a Go nil (equivalent to a C NULL) or anything else if
they want.

The Pointer method is a pure type conversion:

```
// Pointer returns v unchanged, so a nil v reaches R as C NULL.
func (v *T) Pointer() unsafe.Pointer {
	return unsafe.Pointer(v)
}
```

and the Export method was an addition I made when I accidentally
returned a nil during testing and the R runtime complained at me.

```
// Export maps a nil v (C NULL) to R_NilValue before handing it to R.
func (v *T) Export() unsafe.Pointer {
	if v == nil {
		return NilValue.Pointer()
	}
	return unsafe.Pointer(v)
}
```

These are really just helpers that mean users don't need to use the Go
unsafe package directly for anything other than making their function
signatures valid.
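
To make the difference concrete without needing R, here is a self-contained sketch with stand-in types. It is not the rgo/sexp code: value and nilValue are hypothetical placeholders for the package's types and for R_NilValue, and only the nil handling is modelled.

```go
package main

import (
	"fmt"
	"unsafe"
)

// value is a hypothetical stand-in for an R object header.
type value struct{ kind int }

// nilValue is a hypothetical stand-in for R_NilValue.
var nilValue = &value{kind: 0}

// Pointer is a pure type conversion: a nil receiver stays nil (C NULL).
func (v *value) Pointer() unsafe.Pointer {
	return unsafe.Pointer(v)
}

// Export converts a nil receiver to the R_NilValue stand-in.
func (v *value) Export() unsafe.Pointer {
	if v == nil {
		return nilValue.Pointer()
	}
	return unsafe.Pointer(v)
}

func main() {
	var missing *value // plays the role of a C NULL result
	fmt.Println(missing.Pointer() == nil)
	fmt.Println(missing.Export() == nilValue.Pointer())
}
```

Both comparisons print true: Pointer passes the nil through, Export replaces it. Calling a method on a nil pointer receiver is fine in Go as long as the method does not dereference it.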

Similarly, the parameter passed in to Go can be C NULL, R_NilValue or a
non-null value. It's a little more work in the case that C NULL needs
to be distinguished from R_NilValue:

```
func UserGoCode(p unsafe.Pointer) unsafe.Pointer {
	if p == nil {
		// We have a C NULL.
		// If this condition is omitted, v below will be
		// R_NilValue when p is nil.
	}
	v := (*sexp.Value)(p).Value()
	// We have v as a type that is one of the R TYPE values.
	...
```

> IIUC the system you described, this means that it would be impossible
> to implement (a fully general) ALTREP class in GO using your
> framework (at least for the method types that return SEXP and for
> which R_NilValue is a valid return value) because your code is unable
> to distinguish safely between the two. In practice in most currently
> existing methods, you wouldn't ever need to return R_NilValue, I
> wouldn't think.

This should be OK from what I've said above. What the user won't be
able to do is distinguish between C NULL and R_NilValue in values that
come from. So I guess a better phrasing of my original question is
whether valid SEXP value fields ever hold C NULL. If they do, then I
have a problem. I'm very much hoping that some kind of sanity in the
code prevails and this doesn't ever happen.

> The problem that jumps out at me is Extract_subset. Now I'd need to
> do some digging to be certain but there, for some types in some
> situations, it DOES seem like you might need to return the R-NULL and
> find yourself unable to do so. 

I have not looked at all at ALTREP (though it looks like it would be
valuable given the goal of the project), but as above, I *can* return
the C NULL.
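
My reading of the contract described above, as a plain Go sketch with stand-in types (nothing here is R's or rgo's actual API; sexp, rNilValue, method and dispatch are made-up names): a method that returns nil has declined and the general code runs, while any non-nil result, including the R_NilValue stand-in, is taken as the answer.

```go
package main

import "fmt"

// sexp is a hypothetical stand-in for an R value.
type sexp struct{ desc string }

// rNilValue is a hypothetical stand-in for R_NilValue; it is a real,
// non-nil object, unlike Go nil (which plays the role of C NULL here).
var rNilValue = &sexp{desc: "R_NilValue"}

// method is a fast-path implementation that may decline by returning nil.
type method func(x *sexp) *sexp

// dispatch tries the fast path first and runs the general fallback only
// when the method returned nil.
func dispatch(m method, x *sexp) *sexp {
	if res := m(x); res != nil {
		return res
	}
	return &sexp{desc: "fallback(" + x.desc + ")"}
}

func main() {
	decline := func(*sexp) *sexp { return nil }
	returnsNull := func(*sexp) *sexp { return rNilValue }
	x := &sexp{desc: "x"}
	fmt.Println(dispatch(decline, x).desc)
	fmt.Println(dispatch(returnsNull, x).desc)
}
```

The first call prints fallback(x) and the second prints R_NilValue, so the two "null" results are only distinguishable if the method can return both.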

> It's also possible more methods will be added to the table in the
> future that would be problematic in light of that restriction.
> 
> In particular, if ALTREP list/environment implementations were to
> ever be supported I would expect you to be dead in the water entirely
> in terms of building those as you'd find yourself entirely unable to
> implement the Basic Single-element getter machinery, I think.

Is this still a concern with my clarifications above?

thanks
Dan



Re: [Rd] some questions about R internal SEXP types

2020-09-08 Thread Dan Kortschak via R-devel


Thanks, Tomas.

This is unfortunate. Calling between Go and C is not cheap; the gc
implementation of the Go compiler (as opposed to gccgo) uses different
calling conventions from C and there are checks to ensure that Go
allocated memory pointers do not leak into C code. For this reason I
wanted to avoid these if at all possible (I cannot for allocations
since I don't want to keep tracking changes in how R implements its GC
and allocation).

However, if the behaviour of the standard SEXP types and the way
attributes are handled are not likely to change much, I think that what
I'm doing will be OK - at worst the Go code will panic and result in an
R error. The necessary interface to R for allocations is only eight
functions[1].

Note that there is a lot in WRE that's beyond what I want rgo to be
able to do (calling in to R from Go for example). In fact, there's just
a lot in WRE (it's almost 3 times the length of the Go language spec
and memory model reference combined). The issues around weak references
and external pointers are not something that I want to deal with;
working with that kind of object is not idiomatic for Go (in fact
without using C.malloc, R external pointers from Go would be forbidden
by the Go runtime) and I would not expect that they are likely to be
used by people writing extensions for R in Go.

Dan

[1] https://github.com/rgonomic/rgo/blob/2ce7717c85516bbfb94d0b5c7ef1d9749dd1f817/sexp/r_internal.go#L86-L118

On Tue, 2020-09-08 at 11:07 +0200, Tomas Kalibera wrote:
> The general principle is that R packages are only allowed to use what
> is documented in the R help (? command) and in Writing R Extensions.
> The former covers what is allowed from R code in extensions, the
> latter mostly what is allowed from C code in extensions (with some
> references to Fortran).
>
> If you are implementing a Go interface for writing R packages, such a
> Go interface should thus only use what is in the R help and in Writing
> R Extensions. Otherwise, packages would not be able to use such an
> interface.
>
> What is described in R Internals is for understanding the internal
> structure of the R implementation itself, that is, for development of
> R itself; it can also help with debugging R itself and, in some cases,
> with debugging or performance analysis of extensions. R Internals can
> help in giving an intuition, but people implementing R itself also
> need to check the code. R Internals does not describe any interface
> for external code. If it states any constraints about, say, pairlists,
> take them as an intuition for what was intended and probably holds, or
> held, at some level of abstraction; you still need to check the source
> code for the details (e.g., at some very low level CAR and CDR can be
> any SEXP or R_NilValue, locally in some functions even C NULL).
> Internally, some C code uses C NULL SEXPs, but this is rare and local,
> and again, only the interface described in Writing R Extensions is for
> external use.
>
> WRE speaks about "R NULL", "R NULL object" or "C NULL" in some cases
> to avoid confusion, e.g. for values typed as "void *". SEXPs that
> packages obtain using the interface in WRE should not be C NULL, only
> R NULL (R_NilValue). External pointers can become C NULL and this is
> documented in WRE 5.13.
> 
> Best
> Tomas
> 
> On 9/6/20 3:44 AM, Dan Kortschak via R-devel wrote:
> > Hello,
> > 
> > I am writing an R/Go interoperability tool[1] that works similarly to
> > Rcpp; the tool takes packages written in Go and performs the
> > necessary Go type analysis to wrap the Go code with C and R shims
> > that allow the Go code to then be called from R. The system is
> > largely complete (with the exception of having a clean approach to
> > handling generalised attributes in the easy case[2] - the less hand
> > holding case does handle these). Testing of some of the code is
> > unfortunately lacking because of the difficulties of testing across
> > environments.
> >
> > To make the system flexible I have provided an (intentionally
> > incomplete) Go API into the R internals which allows reasonably Go
> > type-safe interaction with SEXP values (Go does not have unions, so
> > this is uglier than it might be otherwise and unions are faked with
> > Go interface values). For efficiency reasons I've avoided using R
> > internal calls where possible (accessors are done with Go code
> > directly, but allocations are done in R's C code to avoid having to
> > duplicate the garbage collection mechanics in Go with the obvious
> > risks of error and possible behaviour skew in the future).
> >
> > In doing this work I have some questions that I have not been able
> > to find answers for in the R-ints doc or hadley/r-internals.
> >
> > 1. In R-ints, the LISTSXP SEXP type CDR is said to hold "usually"
> >    LISTSXP or NULL. What does the "usually" mean here? Is it possible
> >    for the CDR to ho

Re: [Rd] some questions about R internal SEXP types

2020-09-08 Thread Bertram, Alexander via R-devel
Hi Dan,

For what it's worth, Renjin requires LISTSXPs to hold either a LISTSXP or a
NULL, and this appears to be largely the case in practice based on running
tests for thousands of packages (including cross compiled C code). I can
only remember it being briefly an issue with the rlang package, but Lionel
graciously changed it:
https://github.com/r-lib/rlang/pull/579
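
That invariant can be checked mechanically. A minimal sketch in C, assuming a toy cons-cell type (toy_cell, toy_nil and toy_proper_length are hypothetical names for illustration, not R's or Renjin's API; only the type codes mirror R's SEXPTYPE numbering): a pairlist is accepted only if every CDR is another list cell or the nil singleton, never C NULL or some other type.

```c
#include <assert.h>
#include <stddef.h>

/* Toy type tags; the numeric values mirror R's SEXPTYPE codes. */
enum toy_type { TOY_NILSXP = 0, TOY_LISTSXP = 2, TOY_INTSXP = 13 };

/* Toy cons cell, NOT the real SEXPREC layout. */
typedef struct toy_cell {
    enum toy_type type;
    struct toy_cell *car;
    struct toy_cell *cdr;
} toy_cell;

/* Toy analogue of R_NilValue: an allocated singleton, distinct from C NULL. */
static toy_cell toy_nil = { TOY_NILSXP, NULL, NULL };

/*
 * Returns the number of cells if the list is "proper" in the sense above
 * (each CDR is a list cell or the nil singleton), and -1 otherwise.
 * Cyclic lists are not detected in this sketch.
 */
long toy_proper_length(const toy_cell *head) {
    assert(head != NULL);  /* callers pass a cell or &toy_nil, never C NULL */
    long n = 0;
    const toy_cell *p = head;
    while (p != &toy_nil) {
        if (p == NULL || p->type != TOY_LISTSXP)
            return -1;     /* CDR held C NULL or a non-list value */
        n++;
        p = p->cdr;
    }
    return n;
}
```

With this shape, a traversal can return a typed list pointer and treat the nil singleton as the only terminator, which is roughly the simplification Dan is after on the Go side.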

Best,
Alex

On Mon, Sep 7, 2020 at 1:24 PM Dan Kortschak via R-devel <
r-devel@r-project.org> wrote:

>
> Hello,
>
> I am writing an R/Go interoperability tool[1] that works similarly to
> Rcpp; the tool takes packages written in Go and performs the necessary
> Go type analysis to wrap the Go code with C and R shims that allow the
> Go code to then be called from R. The system is largely complete (with
> the exception of having a clean approach to handling generalised
> attributes in the easy case[2] - the less hand holding case does handle
> these). Testing of some of the code is unfortunately lacking because of
> the difficulties of testing across environments.
>
> To make the system flexible I have provided an (intentionally
> incomplete) Go API into the R internals which allows reasonably Go
> type-safe interaction with SEXP values (Go does not have unions, so
> this is uglier than it might be otherwise and unions are faked with Go
> interface values). For efficiency reasons I've avoided using R internal
> calls where possible (accessors are done with Go code directly, but
> allocations are done in R's C code to avoid having to duplicate the
> garbage collection mechanics in Go with the obvious risks of error and
> possible behaviour skew in the future).
>
> In doing this work I have some questions that I have not been able to
> find answers for in the R-ints doc or hadley/r-internals.
>
> 1. In R-ints, the LISTSXP SEXP type CDR is said to hold "usually"
>    LISTSXP or NULL. What does the "usually" mean here? Is it possible
>    for the CDR to hold values other than LISTSXP or NULL, and is
>    this NULL NILSXP or C NULL? I assume that the CAR can hold any type
>    of SEXP, is this correct?
> 2. The LANGSXP and DOTSXP types are lists, but the R-ints comments on
>    them do not say whether the CDR of one of these lists is the same as
>    the head of the list or devolves to a LISTSXP. Looking through the
>    code suggests to me that functions that allocate these two types
>    allocate a LISTSXP and then change only the head of the list to be
>    the LANGSXP or DOTSXP that's required, meaning that the tail of the
>    list is all LISTSXP. Is this correct?
>
> The last question is more a question of interest in design strategy,
> and the answer may have been lost to time. In order to reduce the need
> to go through Go's interface assertions in a number of cases I have
> decided to reinterpret R_NilValue to an untyped Go nil (this is
> important for example in list traversal where the CDR can (hopefully)
> be only one of two types LISTSXP or NILSXP; in Go this would require a
> generalised SEXP return, but by doing this reinterpretation I can
> return a *List pointer which may be nil, greatly simplifying the code
> and improving the performance). My question here is why a singleton null
> value was chosen to be represented as a fully allocated SEXP value
> rather than just a C NULL. Also, whether C NULL is used to any great
> extent within the internal code. Note that the Go API provides a
> mechanism to easily reconvert the nils used back to an R_NilValue when
> returning from a Go function[3].
>
> thanks
> Dan Kortschak
>
> [1]https://github.com/rgonomic/rgo
> [2]https://github.com/rgonomic/rgo/issues/1
> [3]https://pkg.go.dev/github.com/rgonomic/rgo/sexp?tab=doc#Value.Export
>
>


-- 
Alexander Bertram
Technical Director
*BeDataDriven BV*

Web: http://bedatadriven.com
Email: a...@bedatadriven.com
Tel. Nederlands: +31(0)647205388
Skype: akbertram



Re: [Rd] some questions about R internal SEXP types

2020-09-08 Thread Gabriel Becker
Dan,

Sounds like a cool project! Response to one of your questions inline

On Mon, Sep 7, 2020 at 4:24 AM Dan Kortschak via R-devel <
r-devel@r-project.org> wrote:

>
> The last question is more a question of interest in design strategy,
> and the answer may have been lost to time. In order to reduce the need
> to go through Go's interface assertions in a number of cases I have
> decided to reinterpret R_NilValue to an untyped Go nil (this is
> important for example in list traversal where the CDR can (hopefully)
> be only one of two types LISTSXP or NILSXP; in Go this would require a
> generalised SEXP return, but by doing this reinterpretation I can
> return a *List pointer which may be nil, greatly simplifying the code
> and improving the performance). My question here is why a singleton null
> value was chosen to be represented as a fully allocated SEXP value
> rather than just a C NULL. Also, whether C NULL is used to any great
> extent within the internal code.


I cannot speak to initial intent, perhaps others can. I can say that there
is at least one place where the difference between R_NilValue and NULL is
very important as of right now. The current design of the ALTREP framework
contract expects ALTREP methods that return a SEXP to return C NULL when
they fail (or decline) to do the requested computation, signalling that the
non-altclass-specific machinery should be run as a fallback. The places
where ALTREP methods are plugged into the existing, general internals then
check for C NULL after attempting to fast-path the computation via ALTREP.
Any non-C-NULL SEXP, including R_NilValue, will be taken as an indication
that the ALTREP method succeeded and that SEXP is the resulting value,
causing the fallback machinery to be skipped.
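
To make the contract concrete, here is a rough sketch in C. It uses a toy object type rather than R's headers, and every name in it (toy_sexp, TOY_NIL, toy_dispatch and the methods) is hypothetical; it only illustrates the pattern of "C NULL means declined, anything else, including the nil object, is a result", not the actual ALTREP dispatch code.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an R object; NOT the real SEXP from Rinternals.h. */
typedef struct toy_sexp { int type; double value; } toy_sexp;

/* Toy analogue of R_NilValue: a real, allocated singleton object. */
static toy_sexp toy_nil_object = { 0, 0.0 };
#define TOY_NIL (&toy_nil_object)

/* Hypothetical method signature: return C NULL to decline the request. */
typedef toy_sexp *(*toy_method)(toy_sexp *x);

/* Result produced by the general (non-class-specific) code path. */
static toy_sexp toy_fallback_result = { 14, 42.0 };

/* A method that declines, so the fallback must run. */
static toy_sexp *method_declines(toy_sexp *x) { (void)x; return NULL; }

/* A method whose legitimate answer is the nil object. */
static toy_sexp *method_returns_nil(toy_sexp *x) { (void)x; return TOY_NIL; }

/* Stand-in for the default machinery. */
static toy_sexp *default_machinery(toy_sexp *x) {
    (void)x;
    return &toy_fallback_result;
}

/* Try the class-specific method first; fall back only on C NULL. */
toy_sexp *toy_dispatch(toy_method m, toy_sexp *x) {
    assert(m != NULL);
    toy_sexp *ans = m(x);
    if (ans != NULL)
        return ans;               /* includes TOY_NIL: treated as success */
    return default_machinery(x);  /* the method declined */
}
```

A binding layer that folds the nil object and C NULL into a single nil value cannot express the difference between method_declines and method_returns_nil, which is the concern here.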

IIUC the system you described, this means that it would be impossible to
implement (a fully general) ALTREP class in Go using your framework (at
least for the method types that return SEXP and for which R_NilValue is a
valid return value) because your code is unable to distinguish safely
between the two. In practice, in most currently existing methods, you
wouldn't ever need to return R_NilValue, I wouldn't think.

The problem that jumps out at me is Extract_subset. Now I'd need to do some
digging to be certain but there, for some types in some situations, it DOES
*seem* like you might need to return the R-NULL and find yourself unable to
do so.

It's also possible that more methods will be added to the table in the
future that would be problematic in light of that restriction.

In particular, if ALTREP list/environment implementations were to ever be
supported I would expect you to be dead in the water entirely in terms of
building those as you'd find yourself entirely unable to implement the
Basic Single-element getter machinery, I think.

Beyond that, a quick grep of the sources tells me there are definitely a
few places where SEXP objects are tested with == NULL, though not
overwhelmingly many. Most such tests are for non-SEXP pointers.

Best,
~G



> Note that the Go API provides a
> mechanism to easily reconvert the nils used back to an R_NilValue when
> returning from a Go function[3].
>
> thanks
> Dan Kortschak
>
> [1]https://github.com/rgonomic/rgo
> [2]https://github.com/rgonomic/rgo/issues/1
> [3]https://pkg.go.dev/github.com/rgonomic/rgo/sexp?tab=doc#Value.Export
>
>



Re: [Rd] Operations with long altrep vectors cause segfaults on Windows

2020-09-08 Thread Hugh Parsonage
Thanks Martin.  On further testing, it seems that the segmentation
fault can only occur when the amount of obtainable memory is
sufficiently high. On my machine (admittedly with other processes
running):

$ R --vanilla --max-mem-size=30G -e "x <- c(0L, -2e9:2e9)"
Segmentation fault

$ R --vanilla --max-mem-size=29G -e "x <- c(0L, -2e9:2e9)"
Error: cannot allocate vector of size 14.9 Gb
Execution halted
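
For reference, the 14.9 Gb in the second message is about what the data alone would need: 1 + length(-2e9:2e9) = 4000000002 four-byte integers. A small sketch of that arithmetic in C (vector_bytes and bytes_to_gib are hypothetical helper names; the vector header is ignored and "Gb" is taken to mean 1024^3 bytes):

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed for the data of n elements of elt_size bytes each. */
uint64_t vector_bytes(uint64_t n, uint64_t elt_size) {
    /* guard against wrap-around in the multiplication */
    assert(elt_size == 0 || n <= UINT64_MAX / elt_size);
    return n * elt_size;
}

/* Convert a byte count to units of 1024^3 bytes. */
double bytes_to_gib(uint64_t bytes) {
    return (double)bytes / (1024.0 * 1024.0 * 1024.0);
}
```

bytes_to_gib(vector_bytes(4000000002, 4)) comes out just over 14.9. This says nothing about why the 30G limit segfaults while the 29G limit fails cleanly; it only shows that the reported size matches the requested integer vector.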

On Tue, 8 Sep 2020 at 18:52, Martin Maechler  wrote:
>
> > Martin Maechler
> > on Tue, 8 Sep 2020 10:40:24 +0200 writes:
>
> > Hugh Parsonage
> > on Tue, 8 Sep 2020 18:08:11 +1000 writes:
>
> >> I can only reproduce on Windows, but reliably (both 4.0.0 and 4.0.2):
>
> >> $> R --vanilla
> >> x <- c(0L, -2e9:2e9)
>
> >> # > Segmentation fault
>
> >> Tried to reproduce on Linux but the above worked as expected. Not an
> >> issue merely with the length of the vector; for example, x <-
> >> rep_len(1:10, 1e10) works, though the altrep vector must be long to
> >> reproduce:
>
> >> x <- c(0L, -1e9:1e9)  #ok
>
> >> Segmentation faults occur with the following too:
>
> >> x <- (-2e9:2e9) + 1L
>
> > Your operation would "need" (not in theory, but in practice)
> > to go from altrep to regular vectors.
> > I guess the segfault occurs because of something like this :
>
> > R asks Windows to hand it a huge amount of memory and Windows replies
> > "ok, here is the memory pointer"
> > and then R tries to write to there, but illegally (because
> > Windows should have told R that it does not really have enough
> > memory for that ..).
>
> > I cannot reproduce the segmentation fault .. but I can confirm
> > there is a bug there that shows for me on Windows but not on
> > Linux:
>
> > "My" Windows is on a terminalserver not with too many GB of memory
> > (but then in a version of Windows that recognizes that it cannot
> > get so much memory):
>
> > - Here some transcript (thanks to
> > using Emacs w/ ESS also on Windows) --
>
> > R Under development (unstable) (2020-08-24 r79074) -- "Unsuffered Consequences"
> > Copyright (C) 2020 The R Foundation for Statistical Computing
> > Platform: x86_64-w64-mingw32/x64 (64-bit)
>
> > R is free software and comes with ABSOLUTELY NO WARRANTY.
> > You are welcome to redistribute it under certain conditions.
> > Type 'license()' or 'licence()' for distribution details.
>
> > R is a collaborative project with many contributors.
> > Type 'contributors()' for more information and 'citation()'
> > on how to cite R or R packages in publications.
>
> > Type 'demo()' for some demos, 'help()' for on-line help, or
> > 'help.start()' for an HTML browser interface to help.
> > Type 'q()' to quit R.
>
> >> x <- (-2e9:2e9) + 1L
> > Fehler: kann Vektor der Größe 14.9 GB nicht allozieren
> >> y <- c(0L, -2e9:2e9)
> > Fehler: kann Vektor der Größe 14.9 GB nicht allozieren
> >> Sys.setenv(LANGUAGE="en")
> >> y <- c(0L, -2e9:2e9)
> > Error: cannot allocate vector of size 14.9 Gb
> >> y <- -1e9:4e9
> >> .Internal(inspect(y))
> > @0x195a6808 14 REALSXP g0c0 [REF(65535)]  -1000000000 : -294967296 (compact)
> >> .Machine$integer.max / 1e9
> > [1] 2.147484
> >> y <- -1e6:2.2e9
> >> .Internal(inspect(y))
> > @0x0a11a5d8 14 REALSXP g0c0 [REF(65535)]  -1000000 : -2094967296 (compact)
> >> y <- -1e6:2e9
> >> .Internal(inspect(y))
> > @0x0a13adf0 13 INTSXP g0c0 [REF(65535)]  -1000000 : 2000000000 (compact)
> >>
> > - end of transcript 
> ---
>
> > So indeed, no seg.fault, R notices that it can't get 15 GB of
> > memory.
>
> > But the bug is bad news:  We have *silent* integer overflow happening
> > according to what  .Internal(inspect(y)) shows...
>
> > Less bad news: probably the bug is only in the 'internal inspect' code
> > where a format specifier is used in C's printf() that does not work
> > correctly on Windows, at least the way it is currently compiled ..
>
>
> > On (64-bit) Linux, I get
>
> >> y <- -1e9:4e9 ; .Internal(inspect(y))
> > @7d86388 14 REALSXP g0c0 [REF(65535)]  -1000000000 : 4000000000 (compact)
>
> >> y <- c(0L, y)
> > Error: cannot allocate vector of size 37.3 Gb
>
> > which seems much better ... until I do find a bug, maybe again
> > only in the C code underlying .Internal(inspect(.)) :
>
> >> y <- -1e9:2e9 ; .Internal(inspect(y))
> > @7d86ac0 13 INTSXP g0c0 [REF(65535)] Error: long vectors not supported yet: ../../../R/src/main/altclasses.c:139
> >>
>
> Indeed, the purported "integer overflow" (above) does not
> happen.
> It is "only" a  'printf

Re: [Rd] some questions about R internal SEXP types

2020-09-08 Thread Tomas Kalibera



The general principle is that R packages are only allowed to use what is
documented in the R help (? command) and in Writing R Extensions. The
former covers what is allowed from R code in extensions, the latter
mostly what is allowed from C code in extensions (with some references
to Fortran).

If you are implementing a Go interface for writing R packages, such a Go
interface should thus only use what is in the R help and in Writing R
Extensions. Otherwise, packages would not be able to use such an
interface.

What is described in R Internals is for understanding the internal
structure of the R implementation itself, that is, for development of R
itself; it can also help with debugging R itself and, in some cases,
with debugging or performance analysis of extensions. R Internals can
help in giving an intuition, but people implementing R itself also need
to check the code. R Internals does not describe any interface for
external code. If it states any constraints about, say, pairlists, take
them as an intuition for what was intended and probably holds, or held,
at some level of abstraction; you still need to check the source code
for the details (e.g., at some very low level CAR and CDR can be any
SEXP or R_NilValue, locally in some functions even C NULL). Internally,
some C code uses C NULL SEXPs, but this is rare and local, and again,
only the interface described in Writing R Extensions is for external
use.

WRE speaks about "R NULL", "R NULL object" or "C NULL" in some cases to
avoid confusion, e.g. for values typed as "void *". SEXPs that packages
obtain using the interface in WRE should not be C NULL, only R NULL
(R_NilValue). External pointers can become C NULL and this is documented
in WRE 5.13.


Best
Tomas

On 9/6/20 3:44 AM, Dan Kortschak via R-devel wrote:

Hello,

I am writing an R/Go interoperability tool[1] that works similarly to
Rcpp; the tool takes packages written in Go and performs the necessary
Go type analysis to wrap the Go code with C and R shims that allow the
Go code to then be called from R. The system is largely complete (with
the exception of having a clean approach to handling generalised
attributes in the easy case[2] - the less hand holding case does handle
these). Testing of some of the code is unfortunately lacking because of
the difficulties of testing across environments.

To make the system flexible I have provided an (intentionally
incomplete) Go API into the R internals which allows reasonably Go
type-safe interaction with SEXP values (Go does not have unions, so
this is uglier than it might be otherwise and unions are faked with Go
interface values). For efficiency reasons I've avoided using R internal
calls where possible (accessors are done with Go code directly, but
allocations are done in R's C code to avoid having to duplicate the
garbage collection mechanics in Go with the obvious risks of error and
possible behaviour skew in the future).

In doing this work I have some questions that I have not been able to
find answers for in the R-ints doc or hadley/r-internals.

1. In R-ints, the LISTSXP SEXP type CDR is said to hold "usually"
   LISTSXP or NULL. What does the "usually" mean here? Is it possible
   for the CDR to hold values other than LISTSXP or NULL, and is
   this NULL NILSXP or C NULL? I assume that the CAR can hold any type
   of SEXP, is this correct?
2. The LANGSXP and DOTSXP types are lists, but the R-ints comments on
   them do not say whether the CDR of one of these lists is the same as
   the head of the list or devolves to a LISTSXP. Looking through the
   code suggests to me that functions that allocate these two types
   allocate a LISTSXP and then change only the head of the list to be
   the LANGSXP or DOTSXP that's required, meaning that the tail of the
   list is all LISTSXP. Is this correct?

The last question is more a question of interest in design strategy,
and the answer may have been lost to time. In order to reduce the need
to go through Go's interface assertions in a number of cases I have
decided to reinterpret R_NilValue to an untyped Go nil (this is
important for example in list traversal where the CDR can (hopefully)
be only one of two types LISTSXP or NILSXP; in Go this would require a
generalised SEXP return, but by doing this reinterpretation I can
return a *List pointer which may be nil, greatly simplifying the code
and improving the performance). My question here is why a singleton null
value was chosen to be represented as a fully allocated SEXP value
rather than just a C NULL. Also, whether C NULL is used to any great
extent within the internal code. Note that the Go API provides a
mechanism to easily reconvert the nils used back to an R_NilValue when
returning from a Go function[3].

thanks
Dan Kortschak

[1]https://github.com/rgonomic/rgo
[2]https://github.com/rgonomic/rgo/issues/1
[3]https://pkg.go.dev/github.com/rgonomic/rgo/sexp?tab=doc#

Re: [Rd] some questions about R internal SEXP types

2020-09-08 Thread Dan Kortschak via R-devel
Thanks, Alex.

That might be good enough for me for this particular concern; in the
absence of a language specification, specifying my behaviour and
referring to precedent seems like a reasonable fallback.

Dan

On Tue, 2020-09-08 at 09:33 +0200, Bertram, Alexander wrote:
> Hi Dan,
> 
> For what it's worth, Renjin requires LISTSXPs to hold either a
> LISTSXP or a NULL, and this appears to be largely the case in
> practice based on running tests for thousands of packages (including
> cross compiled C code). I can only remember it being briefly an issue
> with the rlang package, but Lionel graciously changed it:
> https://github.com/r-lib/rlang/pull/579
> 
> Best,
> Alex



Re: [Rd] Operations with long altrep vectors cause segfaults on Windows

2020-09-08 Thread Martin Maechler
> Martin Maechler 
> on Tue, 8 Sep 2020 10:40:24 +0200 writes:

> Hugh Parsonage 
> on Tue, 8 Sep 2020 18:08:11 +1000 writes:

>> I can only reproduce on Windows, but reliably (both 4.0.0 and 4.0.2):

>> $> R --vanilla
>> x <- c(0L, -2e9:2e9)

>> # > Segmentation fault

>> Tried to reproduce on Linux but the above worked as expected. Not an
>> issue merely with the length of the vector; for example, x <-
>> rep_len(1:10, 1e10) works, though the altrep vector must be long to
>> reproduce:

>> x <- c(0L, -1e9:1e9)  #ok

>> Segmentation faults occur with the following too:

>> x <- (-2e9:2e9) + 1L

> Your operation would "need" (not in theory, but in practice)
> to go from altrep to regular vectors.
> I guess the segfault occurs because of something like this :

> R asks Windows to hand it a huge amount of memory and Windows replies
> "ok, here is the memory pointer"
> and then R tries to write to there, but illegally (because
> Windows should have told R that it does not really have enough
> memory for that ..). 
 
> I cannot reproduce the segmentation fault .. but I can confirm
> there is a bug there that shows for me on Windows but not on
> Linux:

> "My" Windows is on a terminalserver not with too many GB of memory
> (but then in a version of Windows that recognizes that it cannot
> get so much memory):

> - Here some transcript (thanks to
> using Emacs w/ ESS also on Windows) --

> R Under development (unstable) (2020-08-24 r79074) -- "Unsuffered Consequences"
> Copyright (C) 2020 The R Foundation for Statistical Computing
> Platform: x86_64-w64-mingw32/x64 (64-bit)

> R is free software and comes with ABSOLUTELY NO WARRANTY.
> You are welcome to redistribute it under certain conditions.
> Type 'license()' or 'licence()' for distribution details.

> R is a collaborative project with many contributors.
> Type 'contributors()' for more information and 'citation()'
> on how to cite R or R packages in publications.

> Type 'demo()' for some demos, 'help()' for on-line help, or
> 'help.start()' for an HTML browser interface to help.
> Type 'q()' to quit R.

>> x <- (-2e9:2e9) + 1L
> Fehler: kann Vektor der Größe 14.9 GB nicht allozieren
>> y <- c(0L, -2e9:2e9)
> Fehler: kann Vektor der Größe 14.9 GB nicht allozieren
>> Sys.setenv(LANGUAGE="en")
>> y <- c(0L, -2e9:2e9)
> Error: cannot allocate vector of size 14.9 Gb
>> y <- -1e9:4e9
>> .Internal(inspect(y))
> @0x195a6808 14 REALSXP g0c0 [REF(65535)]  -1000000000 : -294967296 (compact)
>> .Machine$integer.max / 1e9
> [1] 2.147484
>> y <- -1e6:2.2e9
>> .Internal(inspect(y))
> @0x0a11a5d8 14 REALSXP g0c0 [REF(65535)]  -1000000 : -2094967296 (compact)
>> y <- -1e6:2e9
>> .Internal(inspect(y))
> @0x0a13adf0 13 INTSXP g0c0 [REF(65535)]  -1000000 : 2000000000 (compact)
>> 
> - end of transcript 
---

> So indeed, no seg.fault, R notices that it can't get 15 GB of
> memory.

> But the bug is bad news:  We have *silent* integer overflow happening
> according to what  .Internal(inspect(y)) shows...

> Less bad news: probably the bug is only in the 'internal inspect' code
> where a format specifier is used in C's printf() that does not work
> correctly on Windows, at least the way it is currently compiled ..


> On (64-bit) Linux, I get

>> y <- -1e9:4e9 ; .Internal(inspect(y))
> @7d86388 14 REALSXP g0c0 [REF(65535)]  -1000000000 : 4000000000 (compact)

>> y <- c(0L, y)
> Error: cannot allocate vector of size 37.3 Gb

> which seems much better ... until I do find a bug, maybe again
> only in the C code underlying .Internal(inspect(.)) :

>> y <- -1e9:2e9 ; .Internal(inspect(y))
> @7d86ac0 13 INTSXP g0c0 [REF(65535)] Error: long vectors not supported yet: ../../../R/src/main/altclasses.c:139
>> 

Indeed, the purported "integer overflow" (above) does not
happen.
It is "only" a  'printf' related bug inside .Internal(inspect(.)) on Windows.
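
The printed values are what one gets when the true endpoint is squeezed through a signed 32-bit field, which is plausible on 64-bit Windows, where C's long is 32 bits wide. A small sketch, assuming that is the mechanism (wrap_to_int32 and format_range are hypothetical helpers, not the code in R's inspect.c):

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Value seen when v is reduced to a signed 32-bit (two's complement) field. */
int64_t wrap_to_int32(int64_t v) {
    uint32_t low = (uint32_t)v;       /* well-defined: reduction modulo 2^32 */
    if (low > (uint32_t)INT32_MAX)
        return (int64_t)low - 4294967296LL;
    return (int64_t)low;
}

/*
 * Print a range with a fixed 64-bit format, so the result does not
 * depend on how wide long is on the platform. Returns snprintf's result.
 */
int format_range(char *buf, size_t size, double from, double to) {
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "%" PRId64 " : %" PRId64,
                    (int64_t)from, (int64_t)to);
}
```

Under that assumption, 4e9 wraps to -294967296 and 2.2e9 to -2094967296, matching the transcript, while 2e9 and -1e9 are unchanged because they fit in 32 bits.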

*interestingly*, the above bug I've noticed on (64-bit) Linux
does *not* show on Windows (64-bit), at least not for that case:

On Windows, things are fine as long as they remain (compacted
aka 'ALTREP') INTSXP:

  > y <- -1e3:2e9 ;.Internal(inspect(y))
  @0x0a285648 13 INTSXP g0c0 [REF(65535)]  -1000 : 2000000000 (compact)
  > y <- -1e3:2.1e9 ;.Internal(inspect(y))
  @0x19925930 13 INTSXP g0c0 [REF(65535)]  -1000 : 2100000000 (compact)

and here, y is correct, just the printing from
.Internal(inspect(y)) is buggy (probably prints the double as an integer):

  > y <- -1e3:2.2e9 ; .Internal(ins

Re: [Rd] Operations with long altrep vectors cause segfaults on Windows

2020-09-08 Thread Martin Maechler
> Hugh Parsonage 
> on Tue, 8 Sep 2020 18:08:11 +1000 writes:

> I can only reproduce on Windows, but reliably (both 4.0.0 and 4.0.2):

> $> R --vanilla
> x <- c(0L, -2e9:2e9)

> # > Segmentation fault

> Tried to reproduce on Linux but the above worked as expected. Not an
> issue merely with the length of the vector; for example, x <-
> rep_len(1:10, 1e10) works, though the altrep vector must be long to
> reproduce:

> x <- c(0L, -1e9:1e9)  #ok

> Segmentation faults occur with the following too:

> x <- (-2e9:2e9) + 1L

Your operation would "need" (not in theory, but in practice)
to go from altrep to regular vectors.
I guess the segfault occurs because of something like this :

 R asks Windows to hand it a huge amount of memory and Windows replies
 "ok, here is the memory pointer"
 and then R tries to write to there, but illegally (because
 Windows should have told R that it does not really have enough
 memory for that ..). 
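
As a generic illustration of the kind of check involved (not R's allocator, and checked_alloc is a hypothetical name), an allocation wrapper in C would normally guard the size computation against wrap-around and test the returned pointer before anything is written:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Allocate room for n elements of elt_size bytes each.
 * Returns NULL if n * elt_size would overflow size_t or if
 * malloc itself fails; the caller must check before writing.
 */
void *checked_alloc(size_t n, size_t elt_size) {
    assert(elt_size > 0);
    if (n > SIZE_MAX / elt_size)
        return NULL;              /* size computation would wrap around */
    return malloc(n * elt_size);  /* NULL on failure */
}
```

A non-NULL return is necessary but may not be sufficient: on systems that overcommit memory, the failure can surface only when the pages are first written. Whether something of that sort is what happens here on Windows is only a guess.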
 
I cannot reproduce the segmentation fault .. but I can confirm
there is a bug there that shows for me on Windows but not on
Linux:

"My" Windows is on a terminalserver not with too many GB of memory
(but then in a version of Windows that recognizes that it cannot
 get so much memory):

- Here some transcript (thanks to
  using Emacs w/ ESS also on Windows) --

R Under development (unstable) (2020-08-24 r79074) -- "Unsuffered Consequences"
Copyright (C) 2020 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and 'citation()'
on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> x <- (-2e9:2e9) + 1L
Fehler: kann Vektor der Größe 14.9 GB nicht allozieren
> y <- c(0L, -2e9:2e9)
Fehler: kann Vektor der Größe 14.9 GB nicht allozieren
> Sys.setenv(LANGUAGE="en")
> y <- c(0L, -2e9:2e9)
Error: cannot allocate vector of size 14.9 Gb
> y <- -1e9:4e9
> .Internal(inspect(y))
@0x195a6808 14 REALSXP g0c0 [REF(65535)]  -1000000000 : -294967296 (compact)
> .Machine$integer.max / 1e9
[1] 2.147484
> y <- -1e6:2.2e9
> .Internal(inspect(y))
@0x0a11a5d8 14 REALSXP g0c0 [REF(65535)]  -1000000 : -2094967296 (compact)
> y <- -1e6:2e9
> .Internal(inspect(y))
@0x0a13adf0 13 INTSXP g0c0 [REF(65535)]  -1000000 : 2000000000 (compact)
> 
- end of transcript ---
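
The switch between INTSXP and REALSXP in the transcript is consistent with a simple rule: from:to stays integer only if from is a whole number and both ends of the sequence fit in a 32-bit int. A simplified sketch of that rule in C (range_fits_int is a hypothetical helper, not the code in R's seq.c; it assumes a 32-bit int and excludes INT_MIN, which R reserves for NA_integer_):

```c
#include <assert.h>
#include <limits.h>

/* 1 if from:to could be stored as 32-bit ints under the simplified rule, else 0. */
int range_fits_int(double from, double to) {
    assert(from == from && to == to);             /* reject NaN */
    if (from <= (double)INT_MIN || from > (double)INT_MAX)
        return 0;                                 /* start outside int range */
    if ((double)(long long)from != from)
        return 0;                                 /* start is not a whole number */
    double span = to >= from ? to - from : from - to;
    assert(span < 9.0e18);                        /* keep the cast below in range */
    double steps = (double)(long long)span;       /* truncation == floor, span >= 0 */
    double last = to >= from ? from + steps : from - steps;
    return last > (double)INT_MIN && last <= (double)INT_MAX;
}
```

With .Machine$integer.max at about 2.147e9, this gives integer storage for -1e6:2e9 and double storage for -1e6:2.2e9 and -1e9:4e9, as in the output above.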

So indeed, no seg.fault, R notices that it can't get 15 GB of
memory.

But the bug is bad news:  We have *silent* integer overflow happening
according to what  .Internal(inspect(y)) shows...

  Less bad news: probably the bug is only in the 'internal inspect' code
  where a format specifier is used in C's printf() that does not work
  correctly on Windows, at least the way it is currently compiled ..


On (64-bit) Linux, I get

> y <- -1e9:4e9 ; .Internal(inspect(y))
@7d86388 14 REALSXP g0c0 [REF(65535)]  -1000000000 : 4000000000 (compact)

> y <- c(0L, y)
Error: cannot allocate vector of size 37.3 Gb

which seems much better ... until I do find a bug, maybe again
only in the C code underlying .Internal(inspect(.)) :

> y <- -1e9:2e9 ; .Internal(inspect(y))
@7d86ac0 13 INTSXP g0c0 [REF(65535)] Error: long vectors not supported yet: ../../../R/src/main/altclasses.c:139
>



[Rd] Operations with long altrep vectors cause segfaults on Windows

2020-09-08 Thread Hugh Parsonage
I can only reproduce on Windows, but reliably (both 4.0.0 and 4.0.2):

$> R --vanilla
x <- c(0L, -2e9:2e9)

# > Segmentation fault

Tried to reproduce on Linux but the above worked as expected. Not an
issue merely with the length of the vector; for example, x <-
rep_len(1:10, 1e10) works, though the altrep vector must be long to
reproduce:

x <- c(0L, -1e9:1e9)  #ok

Segmentation faults occur with the following too:

x <- (-2e9:2e9) + 1L
