Re: [Rd] max on numeric_version with long components

2024-04-27 Thread Ivan Krylov via R-devel
On Sat, 27 Apr 2024 13:56:58 -0500
Jonathan Keane writes:

> In devel:
> > max(numeric_version(c("1.0.1.1", "1.0.3.1",  
> "1.0.2.1")))
> [1] ‘1.0.1.1’
> > max(numeric_version(c("1.0.1.1000", "1.0.3.1000",  
> "1.0.2.1000")))
> [1] ‘1.0.3.1000’

Thank you Jon for spotting this!

This is an unintended consequence of
https://bugs.r-project.org/show_bug.cgi?id=18697.

The old behaviour of max() was to call
which.max(xtfrm(x)), which first produced a permutation that sorted the
entire .encode_numeric_version(x). The new behaviour is to call
which.max directly on .encode_numeric_version(x), which is faster (only
O(length(x)) instead of a sort).

What do the encoded version strings look like?

x <- numeric_version(c(
 "1.0.1.1", "1.0.3.1", "1.0.2.1"
))
# Ignore the attributes
(e <- as.vector(.encode_numeric_version(x)))
# [1] "101575360400"
# [2] "103575360400"
# [3] "102575360400"

# order(), xtfrm(), sort() all agree that e[2] is the maximum:
order(e)
# [1] 1 3 2
xtfrm(e)
# [1] 1 3 2
sort(e)
# [1] "101575360400"
# [2] "102575360400"
# [3] "103575360400"

# but not which.max:
which.max(e)
# [1] 1

This happens because which.max() converts its argument to double, which
loses precision:

(n <- as.numeric(e))
# [1] 1e+27 1e+27 1e+27
identical(n[1], n[2])
# [1] TRUE
identical(n[3], n[2])
# [1] TRUE

I will be curious to know whether there is a clever way to keep both
the O(N) complexity and the full arbitrary precision.
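
A minimal sketch of one possibility, assuming the encoded strings are
always padded to a common width within the vector (so that they compare
correctly as strings under C collation):

which_max_encoded <- function(e) {
  best <- 1L
  for (i in seq_along(e)[-1L])
    if (e[[i]] > e[[best]]) best <- i
  best
}
which_max_encoded(e)
# [1] 2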

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [R-pkg-devel] Extending proj with proj.line3d methods and overloading the methods

2024-04-27 Thread Ivan Krylov via R-package-devel
On 27 April 2024 00:49:47 GMT+03:00, Leo Mada via R-package-devel
writes:
>Dear List-Members,
>
>I try to implement a proj.line3d method and to overload this method as follows:
>
>proj.line3d <- function(p, x, y, z, ...)
>  UseMethod("proj.line3d")
>
>proj.line3d.numeric = function(p, x, y, z, ...) {
>  # ...
>}
>
>proj.line3d.matrix = function(p, x, y, z, ...) {
>  # ...
>}

>p = c(1,2,3)
>line = matrix(c(0,5,2,3,1,4), 2)
>proj.line3d(p, line)
>#  Error in UseMethod("proj.line3d") :
>#   no applicable method for 'proj.line3d' applied to an object of class 
>"c('double', 'numeric')"

>methods(proj)
># [1] proj.aov*   proj.aovlist*   proj.default*   proj.line3d
># [5] proj.line3d.matrix  proj.line3d.numeric proj.lm

In your NAMESPACE, you've registered methods for the generic function 'proj', 
classes 'line3d.matrix' and 'line3d.numeric', but above you are calling a 
different generic, 'proj.line3d', for which no methods are registered.

For proj.line3d(<numeric>, <matrix>) to work, you'll have to register the
methods for the proj.line3d generic. If you need a visible connection to the 
proj() generic, you can try registering a method on the 'proj' generic, class 
'line3d' *and* creating a class 'line3d' that would wrap your vectors and 
matrices:

proj(line3d(p), line) -> call lands in proj.line3d -> maybe additional dispatch 
on the remaining classes of 'p'?

This seems to work, but I haven't tested it extensively:

> proj.line3d <- \(x, ...) UseMethod('proj.line3d')
> proj.line3d.numeric <- \(x, ...) { message('proj.line3d.numeric'); x }
> line3d <- \(x) structure(x, class = c('line3d', class(x)))
> proj(line3d(pi))
proj.line3d.numeric
[1] 3.141593
attr(,"class")
[1] "line3d"  "numeric"

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [Rd] R 4.4.0 has version of Matrix 1.7-0, but it's not available on CRAN

2024-04-26 Thread Ivan Krylov via R-devel
On Fri, 26 Apr 2024 13:15:47 +0200
Gábor Csárdi  wrote:

> That's not how this worked in the past AFAIR. Simply, the packages in
> the x.y.z/Recommended directories were included in
> src/contrib/PACKAGES*, metadata, with the correct R version
> dependencies, in the correct order, so that `install.packages()`
> automatically installed the correct version without having to add
> extra repositories or manually search for package files.

That's great, then there is no need to patch anything. Thanks for
letting me know.

Should we be asking c...@r-project.org to add 4.4.0/Recommended to the
index, then?

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] R 4.4.0 has version of Matrix 1.7-0, but it's not available on CRAN

2024-04-26 Thread Ivan Krylov via R-devel
On Fri, 26 Apr 2024 12:32:59 +0200
Martin Maechler  wrote:

> Finally, I'd think it definitely would be nice for
> install.packages("Matrix") to automatically get the correct
> Matrix version from CRAN ... so we (R-core) would be grateful
> for a patch to install.packages() to achieve this

Since the binaries offered on CRAN are already of the correct version
(1.7-0 for -release and -devel), only source package installation needs
to concern itself with the Recommended subdirectory.

Would it be possible to generate the PACKAGES* index files in the
4.4.0/Recommended subdirectory? Then, on the R side, one would only need
to add a new repo (adjusting chooseCRANmirror() to set it together with
repos["CRAN"]) and keep the rest of the machinery intact.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [R-pkg-devel] Some, but not all vignettes compressed

2024-04-25 Thread Ivan Krylov via R-package-devel
On Thu, 25 Apr 2024 11:54:49 -0700
Bryan Hanson writes:

> So my version of gs blows things up!

The relatively good news is that GhostScript is not solely to blame. A
fresh build of "GPL Ghostscript 10.03.0 (2024-03-06)" was able to
reduce the files to 16..70% of their original size on my computer. But
I just typed ./configure && make and relied on the dependencies already
present on my system.

We can try to compare the build settings (which will involve compiling
things by hand) or ask the Homebrew people [*] (and they will probably
ask for a PDF file and a specific command line that works on some
builds of gs-10.03.0 but not with Homebrew).

What would you rather do?

qpdf, on the other hand, results in no size reduction (99.7% or worse),
just like on your system.

-- 
Best regards,
Ivan

[*]
https://docs.brew.sh/Troubleshooting
https://github.com/Homebrew/homebrew-core/issues?q=ghostscript

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] Some, but not all vignettes compressed

2024-04-25 Thread Ivan Krylov via R-package-devel
On Thu, 25 Apr 2024 08:54:41 -0700
Bryan Hanson writes:

>   'gs+qpdf' made some significant size reductions:
>  compacted 'Vig_02_Conceptual_Intro_PCA.pdf' from 432Kb to 143Kb
>  compacted 'Vig_03_Step_By_Step_PCA.pdf' from 414Kb to 101Kb
>  compacted 'Vig_04_Scores_Loadings.pdf' from 334Kb to 78Kb
>  compacted 'Vig_06_Math_Behind_PCA.pdf' from 558Kb to 147Kb
>  compacted 'Vig_07_Functions_PCA.pdf' from 381Kb to 90Kb

I'm getting similar (but not same) results on Debian Stable, gs 10.00.0
& qpdf 11.3.0:

# R CMD build --no-resave-data --compact-vignettes=both
compacted ‘Vig_01_Start_Here.pdf’ from 244Kb to 45Kb   
compacted ‘Vig_02_Conceptual_Intro_PCA.pdf’ from 432Kb to 143Kb
compacted ‘Vig_03_Step_By_Step_PCA.pdf’ from 411Kb to 100Kb
compacted ‘Vig_04_Scores_Loadings.pdf’ from 335Kb to 78Kb  
compacted ‘Vig_05_Visualizing_PCA_3D.pdf’ from 679Kb to 478Kb  
compacted ‘Vig_06_Math_Behind_PCA.pdf’ from 556Kb to 145Kb 
compacted ‘Vig_07_Functions_PCA.pdf’ from 378Kb to 89Kb
compacted ‘Vig_08_Notes.pdf’ from 239Kb to 39Kb

 
> - doc/Vig_01_Start_Here.pdf:gs: res=0;  + qpdf: res=0; 
> ==> (new=49942)/(old=45101) = 1.10734 .. not worth using  
> - doc/Vig_02_Conceptual_Intro_PCA.pdf:gs: res=0;  + qpdf: res=0; 
> ==> (new=1.00061e+07)/(old=442210) = 22.6275 .. not worth using  
> - doc/Vig_03_Step_By_Step_PCA.pdf:gs: res=0;  + qpdf: res=0; 
> ==> (new=5.763e+06)/(old=423484) = 13.6085 .. not worth using  
> - doc/Vig_04_Scores_Loadings.pdf:gs: res=0;  + qpdf: res=0; 
> ==> (new=5.41409e+06)/(old=341680) = 15.8455 .. not worth using  
> - doc/Vig_05_Visualizing_PCA_3D.pdf:gs: res=0;  + qpdf: res=0; 
> ==> (new=1.23622e+07)/(old=692901) = 17.8412 .. not worth using  
> - doc/Vig_06_Math_Behind_PCA.pdf:gs: res=0;  + qpdf: res=0; 
> ==> (new=816690)/(old=571493) = 1.42905 .. not worth using  
> - doc/Vig_07_Functions_PCA.pdf:gs: res=0;  + qpdf: res=0; 
> ==> (new=1.36419e+06)/(old=389478) = 3.50262 .. not worth using  
> - doc/Vig_08_Notes.pdf:gs: res=0;  + qpdf: res=0; 
> ==> (new=40919)/(old=38953) = 1.05047 .. not worth using  

Thank you for providing this data! Somehow, instead of compacting the
PDFs, one of the tools manages to blow them up in size, as much as ~23
times.

Can you try tools::compactPDF() separately with gs_quality = 'none'
(isolating qpdf) and with qpdf = '' (isolating GhostScript)?
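
Something along these lines should isolate the two tools (assuming the
to-be-compacted PDFs live in the doc/ directory of the unpacked
package):

tools::compactPDF('doc', gs_quality = 'none')             # qpdf only
tools::compactPDF('doc', qpdf = '', gs_quality = 'ebook') # gs only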

If the culprit turns out to be GhostScript, it may be due to their
rewritten PDF rendering engine (now in C instead of PostScript with
special extensions) not being up to par when the PDF file needs to be
compressed. If it turns out to be qpdf, we might have to extract the
exact command lines and compare results further.

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [Rd] Big speedup in install.packages() by re-using connections

2024-04-25 Thread Ivan Krylov via R-devel
On Thu, 25 Apr 2024 14:45:04 +0200
Jeroen Ooms  wrote:

> Thoughts?

How verboten would it be to create an empty external pointer object,
add it to the preserved list, and set an on-exit finalizer to clean up
the curl multi-handle? As far as I can tell, the internet module is not
supposed to be unloaded, so this would not introduce an opportunity to
jump to an unmapped address. This makes it possible to avoid adding a
CurlCleanup() function to the internet module:

Index: src/modules/internet/libcurl.c
===
--- src/modules/internet/libcurl.c  (revision 86484)
+++ src/modules/internet/libcurl.c  (working copy)
@@ -55,6 +55,47 @@
 
 static int current_timeout = 0;
 
+// The multi-handle is shared between downloads for reusing connections
+static CURLM *shared_mhnd = NULL;
+static SEXP mhnd_sentinel = NULL;
+
+static void cleanup_mhnd(SEXP ignored)
+{
+if(shared_mhnd){
+curl_multi_cleanup(shared_mhnd);
+shared_mhnd = NULL;
+}
+curl_global_cleanup();
+}
+static void rollback_mhnd_sentinel(void* sentinel) {
+// Failed to allocate memory while registering a finalizer,
+// therefore must release the object
+R_ReleaseObject((SEXP)sentinel);
+}
+static CURLM *get_mhnd(void)
+{
+if (!mhnd_sentinel) {
+  SEXP sentinel = PROTECT(R_MakeExternalPtr(NULL, R_NilValue, R_NilValue));
+  R_PreserveObject(sentinel);
+  UNPROTECT(1);
+  // Avoid leaking the sentinel before setting the finalizer
+  RCNTXT cntxt;
+  begincontext(&cntxt, CTXT_CCODE, R_NilValue, R_BaseEnv, R_BaseEnv,
+   R_NilValue, R_NilValue);
+  cntxt.cend = &rollback_mhnd_sentinel;
+  cntxt.cenddata = sentinel;
+  R_RegisterCFinalizerEx(sentinel, cleanup_mhnd, TRUE);
+  // Succeeded, no need to clean up if endcontext() fails allocation
+  mhnd_sentinel = sentinel;
+  cntxt.cend = NULL;
+  endcontext();
+}
+if(!shared_mhnd) {
+  shared_mhnd = curl_multi_init();
+}
+return shared_mhnd;
+}
+
 # if LIBCURL_VERSION_MAJOR < 7 || (LIBCURL_VERSION_MAJOR == 7 && 
LIBCURL_VERSION_MINOR < 28)
 
 // curl/curl.h includes  and headers it requires.
@@ -565,8 +606,6 @@
if (c->hnd && c->hnd[i])
curl_easy_cleanup(c->hnd[i]);
 }
-if (c->mhnd)
-   curl_multi_cleanup(c->mhnd);
 if (c->headers)
curl_slist_free_all(c->headers);
 
@@ -668,7 +707,7 @@
c.headers = headers = tmp;
 }
 
-CURLM *mhnd = curl_multi_init();
+CURLM *mhnd = get_mhnd();
 if (!mhnd)
error(_("could not create curl handle"));
 c.mhnd = mhnd;


-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [External] Re: Is ALTREP "non-API"?

2024-04-25 Thread Ivan Krylov via R-devel
On Wed, 24 Apr 2024 15:31:39 -0500 (CDT)
luke-tierney--- via R-devel  wrote:

> We would be better off (in my view, not necessarily shared by others
> in R-core) if we could get to a point where:
> 
>  all entry points listed in installed header files can be used in
>  packages, at least with some caveats;
> 
>  the caveats are expressed in a standard way that is searchable,
>  e.g. with a standardized comment syntax at the header file or
>  individual declaration level.

This sounds almost like Doxygen, although the exact syntax used to
denote the entry points and the necessary comments is far from the most
important detail at this point.

> There are some 500 entry points in the R shared library that are in
> the installed headers but not mentioned in WRE. These would need to
> be reviewed and adjusted.

Is there a way for outsiders to help? For example, would it help to
produce the linking graph (package P links to entry points X, Y)? I
understand that an entry point being unpopular doesn't mean it
shouldn't be public (and the other way around), but combined with a
list of entry points that are listed in WRE, such a graph could be
useful to direct effort or estimate impact from interface changes.
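
A rough sketch of collecting one package's edges on GNU/Linux (assuming
GNU binutils and a shared libR; 'Matrix' is just an example):

so <- system.file('libs', 'Matrix.so', package = 'Matrix')
undefined <- sub('.* U ', '', grep(
 ' U ', system2('nm', c('-D', so), stdout = TRUE), value = TRUE
))
exported <- sub('.* T ', '', grep(
 ' T ', system2('nm', c('-D', file.path(R.home('lib'), 'libR.so')),
  stdout = TRUE), value = TRUE
))
intersect(undefined, exported) # libR entry points used by the package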

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] View() segfaulting ...

2024-04-25 Thread Ivan Krylov via R-devel
On Wed, 24 Apr 2024 19:35:42 -0400
Ben Bolker  wrote:

>  I'm using bleeding-edge R-devel, so maybe my build is weird. Can 
> anyone else reproduce this?
> 
>View() seems to crash on just about anything.

Not for me, sorry.

If you have a sufficiently new processor, you can use `rr` [*] to
capture the crash, set a breakpoint in in_R_X11_dataviewer and rewind,
then set a watchpoint on the stack canary and run the program forward
again:
https://www.redhat.com/en/blog/debugging-stack-protector-failures

If you can't locate the canary, try setting watchpoints on large local
variables. Without `rr`, the procedure is probably the same, but
without rewinding: set a breakpoint in in_R_X11_dataviewer, set some
watchpoints, see if they fire when they shouldn't, start from scratch
if you get past the watchpoints and the process crashes.
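
A rough outline of such a session (the watched location is a
placeholder that has to be taken from the actual stack frame):

$ rr record R --vanilla
> View(mtcars) # reproduce the crash while recording
$ rr replay    # drops into gdb at the start of the recording
(rr) break in_R_X11_dataviewer
(rr) continue
(rr) watch -l *(unsigned long *) $canary_location
(rr) continue  # or reverse-continue when starting from the crash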

I think that either an object file didn't get rebuilt when it
should have, or a shared library used by something downstream from
View() got an ABI-breaking update. If this still reproduces with a clean
rebuild of R, it's definitely worth investigating further, perhaps using
AddressSanitizer. Valgrind may be lacking the information about the
stack canary and thus failing to distinguish between overwriting the
canary and normal access to a stack variable via a pointer.

-- 
Best regards,
Ivan

[*] https://rr-project.org/
Edit distance of one from the domain name of the R project!

Use rr replay -g $EVENT_NUMBER to debug past the initial execve()
from the shell wrapper: https://github.com/rr-debugger/rr/wiki/FAQ

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [R-pkg-devel] [External] Re: Package submission to CRAN not passing incoming checks

2024-04-24 Thread Ivan Krylov via R-package-devel
On Wed, 24 Apr 2024 00:17:28 +
"Petersen, Isaac T" writes:

> I included the packages (including the raw package folders and their
> .tar.gz files) in the /inst/extdata folder.

Would you prefer your test to install them from the source directories
(as you currently do, in which case the *.tar.gz files can be omitted)
or the *.tar.gz files (in which case you can set the `repos` argument
to a file:/// URI and omit the package directories and the setwd()
calls)?

I think (but haven't tested) that the two problems that are currently
breaking your test are with .libPaths() and setwd().

.libPaths(temp_lib) overwrites the library paths with `temp_lib` and
the system libraries, the ones in %PROGRAMFILES%\R\R-*\library. In
particular, this removes %LOCALAPPDATA%\R\win-library\* from the list
of library paths, so the packages installed by the user (including
'waldo', which is needed by 'testthat') stop being available.

In order to add temp_lib to the list of the paths, use
.libPaths(c(temp_lib, .libPaths())).

Since setwd() returns the previous directory, one that was current
before setwd() was called, the code newpath <- setwd(filepath);
setwd(newpath) will keep the current directory, not set it to
`filepath`. Use oldpath <- setwd(filepath) instead.

Since you're already using 'testthat' and it already depends on
'withr', you may find it easier to use withr::local_dir(...) and
withr::local_temp_libpaths(...).

In order to test for a package being attached by load_or_install() (and
not just installed and loadable), check for 'package:testpackage1'
being present in the return value of search(). (This check is good
enough and much easier to write than comparing environments on the
search path with the package exports or comparing searchpaths() with
the paths under the temporary library.)
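
Putting the pieces together, the body of the test could look roughly
like this (a sketch; the exact load_or_install() calling convention is
whatever your package documents):

test_that('load_or_install() attaches the package', {
  withr::local_temp_libpaths() # fresh library first on .libPaths()
  withr::local_dir(filepath)   # both are restored when the test exits
  load_or_install('testpackage1')
  expect_true('package:testpackage1' %in% search())
})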

Finally, I think that there is no need for the test_load_or_install()
call because I don't see the function being defined anywhere. Doesn't
test_that(...) run the tests by itself?

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] Package submission to CRAN not passing incoming checks

2024-04-23 Thread Ivan Krylov via R-package-devel
Dear Isaac,

On Mon, 22 Apr 2024 17:00:27 +
"Petersen, Isaac T" writes:

> This my first post--I read the posting guidelines, but my apologies
> in advance if I make a mistake.

Welcome to R-package-devel! You're doing just fine.

> 1) The first note <...> includes the contents of the LICENSE file

It's multiple NOTEs in a trench coat. Kasper has addressed the "large
version components" and the DOIs interpreted as file URIs, but there's
one more.

The '<license> + file LICENSE' syntax has two uses: (1)
for when the text of the license is a template, requiring the author
of the software to substitute some information (e.g. the year and the
copyright holder for MIT), and (2) for when a package puts additional
restrictions on the base license.

(Hmm. Only case (2) is currently described in the Licensing section of
Writing R Extensions; case (1) is only described inside the license
files.)

The CRAN team has expressed a preference for the package authors not to
put 2 twisty little copies of standard licenses, all slightly
different, inside their packages. Since you're not restricting CC BY
4.0, it's enough to say 'License: CC BY 4.0'. If you'd like a full copy
of the license text in your source code repository, that's fine, but
you'll need to list the file in .Rbuildignore:
https://cran.r-project.org/doc/manuals/R-exts.html#Building-package-tarballs

Speaking of the Creative Commons license: the choice of a license for
your code is obviously yours, but Creative Commons themselves recommend
against using their licenses for software (see their FAQ).
I can't recommend you a license - that would be politically motivated
meddling in foreign affairs - but the lists linked by the CC FAQ and
Writing R Extensions section 1.1.2 should provide a good starting point.

> Here are the results from win-builder:
> https://win-builder.r-project.org/incoming_pretest/petersenlab_0.1.2-9033_20240415_212322/

There is one more NOTE:

>> * checking examples ... [437s/438s] NOTE
>> Examples with CPU (user + system) or elapsed time > 5s
>>user system elapsed
>> load_or_install 349.410 37.410 387.233
>> vwReg35.199  0.379  35.606
 
The examples are not only for the user to read in the help page; they
are also for the user to run example(vwReg) and see your code in action
(and for R CMD check to see whether they crash, including regularly on
CRAN).

For vwReg, try reducing the number of regressions you are running
(since your dataset is mtcars, which is already very compact).

For load_or_install, we have the additional issue that running
example(load_or_install) modifies the contents of the R library and the
search path, which belong to the user. The CRAN policy forbids such
modifications: 

Examples in general should change as little of the global state of the
R session and the underlying computer as possible. I suggest wrapping
the example in \dontrun{} (since everything about load_or_install() is
about altering global state) and creating a test for the function in
tests/*.R.

The test should set up a new library under tempdir(), run
load_or_install(), check the outcomes (that the desired package is
attached, etc.) and clean up after itself. There's also the matter of
the package not failing without a connection to the Internet, which is
another CRAN policy requirement. You might have to bring a very small
test package in inst/extdata just for load_or_install() to install and
load it, so that R CMD check won't fail when running offline.
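
In outline, such a test could look approximately like this (a sketch
only; the load_or_install() calling convention and the name
'testpackage1' are placeholders):

lib <- file.path(tempdir(), 'testlib')
dir.create(lib)
oldlibs <- .libPaths()
.libPaths(c(lib, oldlibs))
load_or_install('testpackage1') # should install from inst/extdata
stopifnot('package:testpackage1' %in% search())
detach('package:testpackage1', unload = TRUE)
.libPaths(oldlibs)
unlink(lib, recursive = TRUE)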

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] Old references in the Description file.

2024-04-11 Thread Ivan Krylov via R-package-devel
On Thu, 11 Apr 2024 11:57:00 +
Gabriel Constantino Blain writes:

> The problem is that it is a paper from the 70's (Priestley and
> Taylor, 1972) and its DOI has very uncommon symbols, such as <>. The
> DOI is: 10.1175/1520-0493(1972)100<0081:OTAOSH>2.3.CO;2.

Since the R CMD check function responsible for locating and checking
the DOIs from the package metadata expects to see them URL-encoded, it
should be possible to URL-encode the DOI yourself (at minimum replacing
'<' with '%3C' and '>' with '%3E') in order to generate the correct
link.
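
For example:

doi <- '10.1175/1520-0493(1972)100<0081:OTAOSH>2.3.CO;2'
gsub('>', '%3E', gsub('<', '%3C', doi, fixed = TRUE), fixed = TRUE)
# [1] "10.1175/1520-0493(1972)100%3C0081:OTAOSH%3E2.3.CO;2"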

Another workaround is to generate a shortDOI that would redirect to the
same place as the original DOI:
https://shortdoi.org/10.1175/1520-0493(1972)100%3C0081:OTAOSH%3E2.3.CO;2
Now  should work like the original DOI.

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] Question about CRAN submission resulting in 1 note

2024-04-10 Thread Ivan Krylov via R-package-devel
On Wed, 10 Apr 2024 14:11:53 +
Chris Knoll writes:

> For "Package has VignetteBuilder field but no prebuilt vignette
> index", how would this be resolved?

The package at https://github.com/OHDSI/CirceR/ doesn't seem to have any
vignettes. Without vignettes, there's no need for VignetteBuilder:
knitr.

> For "Package ahs FOSS license, installs .class/.jar but has no 'java
> directory'':  This is custom code that I've written in Java plus has
> a few maven dependencies and I'm not sure if they are asking me to
> bundle the source code of all Java dependencies (that have classes in
> the jar file).   That could be hard to do, and was hoping if anyone
> had experience in this, is it enough to put into the Readme where
> such source code could be found?

Here's what the policy has to say:

>> For Java .class and .jar files, the sources should be in a top-level
>> java directory in the source package (or that directory should
>> explain how they can be obtained).



At the very least, XLconnect seems to be fine supplying just the
README. If it's not too much trouble, shipping your custom source code
(definitely not all of the maven dependencies) would be the kind thing
to do, I think. (Feel free to disregard this part if a more experienced
Java package developer says otherwise.)

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [Rd] Wish: a way to track progress of parallel operations

2024-04-09 Thread Ivan Krylov via R-devel
Dear Henrik (and everyone else):

Here's a patch implementing support for immediateConditions in
'parallel' socket clusters. What do you think?

I've tried to make the feature backwards-compatible in the sense that
an older R starting a newer cluster worker will not pass the flag
enabling condition passing and so will avoid being confused by packets
with type = 'CONDITION'.

In order to propagate the conditions in a timely manner, all 'parallel'
functions that currently use recvData() on individual nodes will have
to switch to calling recvOneData(). I've already adjusted
staticClusterApply(), but e.g. clusterCall() would still postpone
immediateConditions from nodes later in the list (should they appear).

If this is deemed a good way forward, I can prepare a similar patch for
the MPI and socket clusters implemented in the 'snow' package.

-- 
Best regards,
Ivan
Index: src/library/parallel/R/clusterApply.R
===
--- src/library/parallel/R/clusterApply.R	(revision 86373)
+++ src/library/parallel/R/clusterApply.R	(working copy)
@@ -28,8 +28,12 @@
 end <- min(n, start + p - 1L)
 	jobs <- end - start + 1L
 for (i in 1:jobs)
-sendCall(cl[[i]], fun, argfun(start + i - 1L))
-val[start:end] <- lapply(cl[1:jobs], recvResult)
+sendCall(cl[[i]], fun, argfun(start + i - 1L),
+ tag = start + i - 1L)
+for (i in 1:jobs) {
+d <- recvOneResult(cl)
+val[d$tag] <- list(d$value)
+}
 start <- start + jobs
 }
 checkForRemoteErrors(val)
Index: src/library/parallel/R/snow.R
===
--- src/library/parallel/R/snow.R	(revision 86373)
+++ src/library/parallel/R/snow.R	(working copy)
@@ -120,7 +120,8 @@
 rprog = file.path(R.home("bin"), "R"),
 snowlib = .libPaths()[1],
 useRscript = TRUE, # for use by snow clusters
-useXDR = TRUE)
+useXDR = TRUE,
+forward_conditions = TRUE)
 defaultClusterOptions <<- addClusterOptions(emptyenv(), options)
 }
 
Index: src/library/parallel/R/snowSOCK.R
===
--- src/library/parallel/R/snowSOCK.R	(revision 86373)
+++ src/library/parallel/R/snowSOCK.R	(working copy)
@@ -32,6 +32,7 @@
 methods <- getClusterOption("methods", options)
 useXDR <- getClusterOption("useXDR", options)
 homogeneous <- getClusterOption("homogeneous", options)
+forward_conditions <- getClusterOption('forward_conditions', options)
 
 ## build the local command for starting the worker
 env <- paste0("MASTER=", master,
@@ -40,7 +41,8 @@
  " SETUPTIMEOUT=", setup_timeout,
  " TIMEOUT=", timeout,
  " XDR=", useXDR,
- " SETUPSTRATEGY=", setup_strategy)
+ " SETUPSTRATEGY=", setup_strategy,
+ " FORWARDCONDITIONS=", forward_conditions)
 ## Should cmd be run on a worker with R <= 4.0.2,
 ## .workRSOCK will not exist, so fallback to .slaveRSOCK
 arg <- "tryCatch(parallel:::.workRSOCK,error=function(e)parallel:::.slaveRSOCK)()"
@@ -130,17 +132,26 @@
 sendData.SOCKnode <- function(node, data) serialize(data, node$con)
 sendData.SOCK0node <- function(node, data) serialize(data, node$con, xdr = FALSE)
 
-recvData.SOCKnode <- recvData.SOCK0node <- function(node) unserialize(node$con)
+recvData.SOCKnode <- recvData.SOCK0node <- function(node) repeat {
+val <- unserialize(node$con)
+if (val$type != 'CONDITION') return(val)
+signalCondition(val$value)
+}
 
 recvOneData.SOCKcluster <- function(cl)
 {
 socklist <- lapply(cl, function(x) x$con)
 repeat {
-ready <- socketSelect(socklist)
-if (length(ready) > 0) break;
+repeat {
+ready <- socketSelect(socklist)
+if (length(ready) > 0) break;
+}
+n <- which.max(ready) # may need rotation or some such for fairness
+value <- unserialize(socklist[[n]])
+if (value$type != 'CONDITION')
+return(list(node = n, value = value))
+signalCondition(value$value)
 }
-n <- which.max(ready) # may need rotation or some such for fairness
-list(node = n, value = unserialize(socklist[[n]]))
 }
 
 makePSOCKcluster <- function(names, ...)
@@ -349,6 +360,7 @@
 timeout <- 2592000L   # wait 30 days for new cmds before failing
 useXDR <- TRUE# binary serialization
 setup_strategy <- "sequential"
+forward_conditions <- FALSE
 
 for (a in commandArgs(TRUE)) {
 ## Or use strsplit?
@@ -365,6 +377,9 @@
SETUPSTRATEGY = {
setup_strategy <- match.arg(value,
c("sequential", 

Re: [R-pkg-devel] Linking Tutorial Site to CRAN Package site.

2024-04-07 Thread Ivan Krylov via R-package-devel
On Sat, 6 Apr 2024 18:27:24 +
"Ruff, Sergej" writes:

> The CRAN site
> (https://cran.r-project.org/web/packages/RepeatedHighDim/index.html)
> has a "documentation" part with the refrence pdf.
> 
> Can I link to our tutorial site (https://software.klausjung-lab.de/.)
> under documentation?

Since your tutorial is relatively short and contains R code intermixed
with the results of running it, it could make a great vignette.
Vignettes are linked on the CRAN page for a package right under the
PDF reference manual. For example, the BiocManager package has one
vignette: https://cran.r-project.org/package=BiocManager

Vignettes are a part of the package and their code is automatically
checked together with your examples. For the users of your package,
this will help keep the tutorial available (even if the website moves
in the future) and compatible with the current version of the package
(even if the package evolves and the tutorial website evolves together
with it).

R has built-in support for PDF vignettes via LaTeX using Sweave [*].
HTML vignettes can be much more accessible than PDF files, but there is
no built-in HTML vignette engine in R [**]. The 'markdown' package is
reasonably lightweight and has an HTML vignette engine. Markdown tries
to be a superset of HTML, so it should be possible to keep most of your
original HTML, including the styling, while rewriting the tutorial as
an executable vignette.
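
For reference, a minimal Sweave skeleton (say, vignettes/tutorial.Rnw;
the index entry is whatever should appear in the vignette index):

\documentclass{article}
%\VignetteIndexEntry{RepeatedHighDim tutorial}
\begin{document}
Some explanatory text, followed by executable code:
<<>>=
1 + 1 # chunks are run when the vignette is built
@
\end{document}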

-- 
Best regards,
Ivan

[*]
https://cran.r-project.org/doc/manuals/R-exts.html#Writing-package-vignettes

[**]
It's possible to write a crude HTML vignette engine in ~100 lines of R
code, but we cannot expect every package author to do that.

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [Rd] Bug in out-of-bounds assignment of list object to expression() vector

2024-04-05 Thread Ivan Krylov via R-devel
On Fri, 5 Apr 2024 08:15:20 -0400
June Choe  wrote:

> When assigning a list to an out of bounds index (ex: the next, n+1
> index), it errors the same but now changes the values of the vector
> to NULL:
> 
> ```
> x <- expression(a,b,c)
> x[[4]] <- list() # Error
> x
> #> expression(NULL, NULL, NULL)  
> ```
> 
> Curiously, this behavior disappears if a prior attempt is made at
> assigning to the same index, using a different incompatible object
> that does not share this bug (like a function)

Here's how the problem happens:

1. The call lands in src/main/subassign.c, do_subassign2_dflt().

2. do_subassign2_dflt() calls SubassignTypeFix() to prepare the operand
for the assignment.

3. Since the assignment is "stretching", SubassignTypeFix() calls
EnlargeVector() to provide the space for the assignment.

The bug relies on `x` not being IS_GROWABLE(), which may explain 
why a plain x[[4]] <- list() sometimes doesn't fail.

The future assignment result `x` is now expression(a, b, c, NULL), and
the old `x` is set to expression(NULL, NULL, NULL) by SET_VECTOR_ELT(newx,
i, VECTOR_ELT(x, i)); CLEAR_VECTOR_ELT(x, i); during EnlargeVector().

4. But then the assignment fails, raising the error back in
do_subassign2_dflt(), because the assignment kind is invalid: there is
no way to put data.frames into an expression vector. The new resized
`x` is lost, and the old overwritten `x` stays there.

Not sure what the right way to fix this is. It's desirable to avoid
shallow_duplicate(x) for the overwriting assignments, but then the
sub-assignment must either succeed or leave the operand untouched.
Is there a way to perform the type check before overwriting the operand?

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] hand compile; link to MKL fails at BLAS zdotu

2024-03-30 Thread Ivan Krylov via R-devel
On Sat, 30 Mar 2024 20:31:25 +0300
Ivan Krylov via R-devel writes:

> It seems to crash inside MKL!

Should have read some more about mkl_gf_lp64 before posting. According
to the Intel forums, it is indeed required in order to work with the
GFortran calling convention, but if you're linking against it, you also
have to add the rest of the linker command line, i.e.:

-lmkl_gf_lp64 -lmkl_core -lmkl_sequential 
-Wl,--no-as-needed -lpthread -lm -ldl

https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/ARPACK-with-MKL-crashes-when-calling-zdotc/m-p/1054316

Maybe it's even documented somewhere, but Intel makes it too annoying
to read their documentation, and they definitely don't mention it in
the link line advisor. There's also the ominous comment saying that

>> you cannot call standard BLAS [c,z]dot[c,u] functions from C/C++
>> because the interface library that is linked is specific for
>> GFortran which has a different calling convention of returning a
>> Complex type and would cause issues

I'm not seeing any calls to [c,z]dot[c,u] from inside R's C code (which
is why R seems to work when running with libmkl_rt.so), and the
respective declarations in R_ext/BLAS.h have an appropriate warning:

>> WARNING!  The next two return a value that may not be compatible
>> between C and Fortran, and even if it is, this might not be the
>> right translation to C.

...so it's likely that everything will keep working.

Indeed, R configured with

--with-blas='-lmkl_gf_lp64 -lmkl_core -lmkl_sequential'
--with-lapack='-lmkl_gf_lp64 -lmkl_core -lmkl_sequential'

seems to work with MKL.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] hand compile; link to MKL fails at BLAS zdotu

2024-03-30 Thread Ivan Krylov via R-devel
On Sat, 30 Mar 2024 10:55:48 +
Ramón Fallon writes:

> In contrast to Dirk's solution, I've found R's configure script
> doesn't recognise the update-alternatives system on debian/ubuntu, if
> it's MKL.

It ought to work if configured with --with-blas=-lblas
--with-lapack=-llapack, but, as you found out (and I can confirm), if
libblas.so and liblapack.so already point to MKL, ./configure somehow
fails the test for zdotu and falls back to bundled Rblas and Rlapack.

If you'd like the built R to work with the update-alternatives system,
the workaround that seems to help is to temporarily switch the alternatives
to reference BLAS & LAPACK, configure and build R, and then switch the
alternatives back to MKL.

> appending "-lmkl_gf_lp64" to the --with-blas option does not help
> (that's suggested by several posts out there).

MKL has an official "link line advisor" at
,
which may suggest a completely different set of linker options
depending on what it is told. Here's how R's zdotu test always fails
when linking directly with MKL:

# pre-configure some variables
echo '#define HAVE_F77_UNDERSCORE 1' > confdefs.h
FC=gfortran
FFLAGS='-g -Og'
CC=gcc
CFLAGS='-g -Og'
CPPFLAGS=-I/usr/local/include
MAIN_LDFLAGS='-Wl,--export-dynamic -fopenmp'
LDFLAGS='-L/usr/local/lib'
LIBM=-lm
FLIBS=' -lgfortran -lm -lquadmath'
# copied & pasted from the Intel web page
BLAS_LIBS='-lmkl_rt -Wl,--no-as-needed -lpthread -lm -ldl'

# R prepares to call zdotu from Fortran...
cat > conftestf.f <<EOF
      subroutine test1(iflag)
      double complex zx(2), ztemp, zres, zdotu
      integer iflag
      zx(1) = (3.1d0,1.7d0)
      zx(2) = (1.6d0,-0.6d0)
      zres = zdotu(2,zx,1,zx,1)
      ztemp = (0.0d0,0.0d0)
      do 10 i = 1,2
 10   ztemp = ztemp + zx(i)*zx(i)
      if(abs(zres - ztemp) > 1.0d-10) then
        iflag = 1
      else
        iflag = 0
      endif
      end
EOF
${FC} ${FFLAGS} -c conftestf.f

# and then call the Fortran subroutine from the C runner...
cat > conftest.c <<EOF
#include <stdlib.h>
#include "confdefs.h"
#ifdef HAVE_F77_UNDERSCORE
# define F77_SYMBOL(x)   x ## _
#else
# define F77_SYMBOL(x)   x
#endif
extern void F77_SYMBOL(test1)(int *iflag);

int main () {
  int iflag;
  F77_SYMBOL(test1)(&iflag);
  exit(iflag);
}
EOF
${CC} ${CPPFLAGS} ${CFLAGS} -c conftest.c

# and then finally link and execute the program
${CC} ${CPPFLAGS} ${CFLAGS} ${LDFLAGS} ${MAIN_LDFLAGS} \
 -o conftest conftest.o conftestf.o \
 ${BLAS_LIBS} ${FLIBS} ${LIBM}
./conftest

It seems to crash inside MKL!

rax=cccd rbx=5590ee102008 rcx=7ffdab2ddb20 
rdx=5590ee102008 
rsi=7ffdab2ddb18 rdi=5590ee10200c rbp=7ffdab2dd910 
rsp=7ffdab2db600 
 r8=5590ee102008  r9=7ffdab2ddb28 r10=7f4086a99178 
r11=7f4086e02490 
r12=5590ee10200c r13=7ffdab2ddb20 r14=5590ee102008 
r15=7ffdab2ddb28 
ip = 7f4086e02a60, sp = 7ffdab2db600 [mkl_blas_zdotu+1488]
ip = 7f4085dc5250, sp = 7ffdab2dd920 [zdotu+256]
ip = 5590ee1011cc, sp = 7ffdab2ddb40 [test1_+91]
ip = 5590ee101167, sp = 7ffdab2ddb70 [main+14]

It's especially strange that R does seem to work if you just
update-alternatives after linking it with the reference BLAS, but
./conftest starts crashing again in the same place. This is with
Debian's MKL version 2020.4.304-4, by the way.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] paths capability FALSE on devel?

2024-03-27 Thread Ivan Krylov via R-devel
On Wed, 27 Mar 2024 11:28:17 +0100
Alexandre Courtiol writes:

> after installing R-devel the output of
> grDevices::dev.capabilities()$paths is FALSE, while it is TRUE for R
> 4.3.3

Your system must be missing Cairo development headers, making x11()
fall back to type = 'Xlib':

$ R-devel -q -s -e 'x11(); grDevices::dev.capabilities()$paths'
 [1] TRUE
$ R-devel -q -s -e \
 'x11(type="Xlib"); grDevices::dev.capabilities()$paths'
 [1] FALSE

If that's not the case and capabilities()['cairo'] is TRUE in your
build of R-devel, please show us the sessionInfo() from your build of
R-devel.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Wish: a way to track progress of parallel operations

2024-03-26 Thread Ivan Krylov via R-devel
Henrik,

Thank you for taking the time to read and reply to my message!

On Mon, 25 Mar 2024 10:19:38 -0700
Henrik Bengtsson  wrote:

> * Target a solution that works the same regardless whether we run in
> parallel or not, i.e. the code/API should look the same regardless of
> using, say, parallel::parLapply(), parallel::mclapply(), or
> base::lapply(). The solution should also work as-is in other parallel
> frameworks.

You are absolutely right about mclapply(): it suffers from the same
problem where the task running inside it has no reliable mechanism of
reporting progress. Just like on a 'parallel' cluster (which can be
running on top of an R connection, MPI, the 'mirai' package, a server
pretending to be multiple cluster nodes, or something completely
different), there is currently no documented interface for the task to
report any additional data except the result of the computation.

> I argue the end-user should be able to decided whether they want to
> "see" progress updates or not, and the developer should focus on
> where to report on progress, but not how and when.

Agreed. As a package developer, I don't even want to bother calling
setTxtProgressBar(...), but it gets most of the job done at zero
dependency cost, and the users don't complain. The situation could
definitely be improved.

> It is possible to use the existing PSOCK socket connections to send
> such 'immediateCondition':s.

Thanks for pointing me towards ClusterFuture, that's a great hack, and
conditions are a much better fit for progress tracking than callbacks.

It would be even better if 'parallel' clusters could "officially"
handle immediateConditions and re-signal them in the main R session.
Since R-4.4 exports (but not yet documents) sendData, recvData and
recvOneData generics from 'parallel', we are still in a position to
codify and implement the change to the 'parallel' cluster back-end API.

It shouldn't be too hard to document the requirement that recvData() /
recvOneData() must signal immediateConditions arriving from the nodes
and patch the existing cluster types (socket and MPI). Not sure how
hard it will be to implement for 'mirai' clusters.
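
For reference, the worker side of it could stay as simple as this (the
extra condition class is illustrative, not a proposed standard):

report_progress <- function(done, total) signalCondition(structure(
 list(message = sprintf('%d/%d done', done, total), call = NULL),
 class = c('progressUpdate', 'immediateCondition', 'condition')
))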

> I honestly think we could arrive at a solution where base-R proposes
> a very light, yet powerful, progress API that handles all of the
> above. The main task is to come up with a standard API/protocol -
> then the implementation does not matter.

Since you've already given it a lot of thought, which parts of
progressr would you suggest for inclusion into R, besides 'parallel'
clusters and mclapply() forwarding immediateConditions from the worker
processes?

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Wish: a way to track progress of parallel operations

2024-03-25 Thread Ivan Krylov via R-devel
Hello R-devel,

A function to be run inside lapply() or one of its friends is trivial
to augment with side effects to show a progress bar. When the code is
intended to be run on a 'parallel' cluster, it generally cannot rely on
its own side effects to report progress.

I've found three approaches to progress bars for parallel processes on
CRAN:

 - Importing 'snow' (not 'parallel') internals like sendCall and
   implementing parallel processing on top of them (doSNOW). This has
   the downside of having to write higher-level code from scratch
   using undocumented interfaces.

 - Splitting the workload into length(cluster)-sized chunks and
   processing them in separate parLapply() calls between updating the
   progress bar (pbapply). This approach trades off parallelism against
   the precision of the progress information: the function has to wait
   until all chunk elements have been processed before updating the
   progress bar and submitting a new portion; dynamic load balancing
   becomes much less efficient.

 - Adding local side effects to the function and detecting them while
   the parallel function is running in a child process (parabar). A
   clever hack, but much harder to extend to distributed clusters.

With recvData and recvOneData becoming exported in R-4.4 [*], another
approach becomes feasible: wrap the cluster object (and all nodes) into
another class, attach the progress callback as an attribute, and let
recvData / recvOneData call it. This makes it possible to give wrapped
cluster objects to unchanged code, but requires knowing the precise
number of chunks that the workload will be split into.
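
In outline, the wrapper could look like this (a sketch; the class and
attribute names are made up, and method registration is glossed over):

with_progress <- function(cl, callback) {
  cl[] <- lapply(cl, function(node) structure(
    node, callback = callback,
    class = c('progressnode', class(node))
  ))
  cl
}
recvData.progressnode <- function(node) {
  on.exit(attr(node, 'callback')()) # runs once per received result
  NextMethod()
}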

Could it be feasible to add an optional .progress argument after the
ellipsis to parLapply() and its friends? We can require it to be a
function accepting (done_chunk, total_chunks, ...). If not a new
argument, what other interfaces could be used to get accurate progress
information from staticClusterApply and dynamicClusterApply?

I understand that the default parLapply() behaviour is not very
amenable to progress tracking, but when running clusterMap(.scheduling
= 'dynamic') spanning multiple hours if not whole days, having progress
information sets the mind at ease.

I would be happy to prepare code and documentation. If there is no time
now, we can return to it after R-4.4 is released.

-- 
Best regards,
Ivan

[*] https://bugs.r-project.org/show_bug.cgi?id=18587

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [R-pkg-devel] How to store large data to be used in an R package?

2024-03-25 Thread Ivan Krylov via R-package-devel
On Mon, 25 Mar 2024 11:12:57 +0100
Jairo Hidalgo Migueles writes:

> Specifically, this data consists of regression and random forest
> models crucial for making predictions within our R package.

Apologies for asking a silly question, but is there a chance that these
models are large by accident (e.g. because an object references a large
environment containing multiple copies of the training dataset)? Or
are there really more than a million weights required to make
predictions?

> Initially, I attempted to save these models as internal data within
> the package. While this approach maintains functionality, it has led
> to a package size exceeding 20 MB. I'm concerned that this would
> complicate submitting the package to CRAN in the future.

The policy mentions the possibility of having a separate large
data-only package. Since CRAN strives to archive all package versions,
this data-only package will have to be updated as rarely as possible.
You will need to ask CRAN for approval.

If there is a significant amount of core functionality inside your
package that does *not* require the large data (so that it can still
be installed and used without the data), you can publish the data-only
package yourself (e.g. using the 'drat' package), put it in Suggests
and link to it in the Additional_repositories field of your DESCRIPTION.
Alternatively, you can publish the data on Zenodo and offer to download
it on first use. Make sure to (1) use tools::R_user_dir to determine
where to put the files, (2) only download the files after the user
explicitly agrees to it and (3) test as much of your package
functionality as possible without requiring the data to be downloaded.
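
In outline (a sketch; the URL and the file names are placeholders):

get_models <- function() {
  path <- file.path(tools::R_user_dir('mypackage', 'data'), 'models.rds')
  if (!file.exists(path)) {
    if (!isTRUE(utils::askYesNo('Download ~20 MB of model data?')))
      stop('model data is required but the download was declined')
    dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
    utils::download.file('https://example.org/models.rds', path, mode = 'wb')
  }
  readRDS(path)
}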

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] Request for assistance: error in installing on Debian (undefined symbol: omp_get_num_procs) and note in checking the HTML versions (no command 'tidy' found, package 'V8' unavailable

2024-03-22 Thread Ivan Krylov via R-package-devel
On Thu, 21 Mar 2024 18:32:59 +
Annaig De-Walsche writes:

> If ever I condition the use of OpenMP directives, users will indeed
> be capable of installing the package, but they won't have access to a
> performant version of the code, as it necessitates the use of OpenMP.
> Is there a method to explicitly express that the use of OpenMP is
> highly encouraged?

I think the most practical method would be to produce a
packageStartupMessage() from the .onAttach function of your package if
you detect that the package has been compiled without OpenMP support:
https://cran.r-project.org/doc/manuals/R-exts.html#Load-hooks

> In practice, how to know from R code if OpenMP is present or not?

Your C code will have to detect it and provide this information to the
R code. WRE 1.6.4 says:

>> [C]heck carefully that you have followed the advice in the
>> subsection on OpenMP support [WRE 1.2.1.1]. In particular, any use
>> of OpenMP in C/C++ code will need to use
>> 
>>  #ifdef _OPENMP
>>  # include 
>>  #endif



Similarly, any time you use #pragma omp ... or call
omp_set_num_threads(), it needs to be wrapped in #ifdef _OPENMP ...
#endif.
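
The detection part in C could be as small as this (the function name is
made up; it would be called from .onAttach via .Call()):

#include <Rinternals.h>
#ifdef _OPENMP
# include <omp.h>
#endif

SEXP have_openmp(void)
{
#ifdef _OPENMP
    return Rf_ScalarLogical(1); /* compiled with OpenMP support */
#else
    return Rf_ScalarLogical(0); /* OpenMP was not available */
#endif
}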

Additionally, it is important to make sure that during tests and
examples, your OpenMP code doesn't use more than two threads:
https://cran.r-project.org/web/packages/policies.html
This is in place because CRAN checks are run in parallel, and a package
that tries to helpfully use all of the processor cores would interfere
with other packages being checked at the same time.

>   [[alternative HTML version deleted]]

This mailing list removes HTML e-mails. If you compose your messages in
HTML, we only get the plain text version automatically prepared by your
mailer:
https://stat.ethz.ch/pipermail/r-package-devel/2024q1/010595.html

In order to preserve the content and the presentation of your messages,
it's best to compose them in plain text.

-- 
Très cordialement,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] help diagnosing win-builder failures

2024-03-17 Thread Ivan Krylov via R-package-devel
Hi,

This may need the help of Uwe Ligges to diagnose. I suspect this may be
related to the Windows machine having too much memory committed (as Uwe
has been able to pinpoint recently [*] about a package that failed to
compile some heavily templated C++), but there is not enough information
to give a conclusive diagnosis.

On Sun, 17 Mar 2024 14:01:33 -0400
Ben Bolker  wrote:

> 2. an ERROR running tests, where the output ends with a cryptic
> 
>Anova: ..
> 
> (please try to refrain from snarky comments about not using testthat
> ...)

Pardon my ignorance, but is it an option to upload a version of the
package that uses test_check(pkg, reporter=LocationReporter()) instead
of the summary reporter?

-- 
Best regards,
Ivan

[*] https://stat.ethz.ch/pipermail/r-package-devel/2024q1/010304.html

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [Rd] Spurious warning in as.data.frame.factor()

2024-03-17 Thread Ivan Krylov via R-devel
On Fri, 15 Mar 2024 11:24:22 +0100
Martin Maechler  wrote:

> I think just adding
> 
>  removeGeneric('as.data.frame')
> 
> is appropriate here as it is self-explaining and should not leave
> much traces.

Thanks for letting me know! I'll make sure to use removeGeneric() in
similar cases in the future.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [R-pkg-devel] Removing import(methods) stops exporting S4 "meta name"

2024-03-15 Thread Ivan Krylov via R-package-devel
On Thu, 14 Mar 2024 16:06:50 -0400
Duncan Murdoch  wrote:

> Error in xj[i] : invalid subscript type 'list'
> Calls: join_inner -> data.frame -> [ -> [.data.table -> [.data.frame
> Execution halted

And here's how it happens:

join_inner calls xi[yi,on=by,nomatch=0] on data.tables xi and yi.

`[.data.table` calls cedta() to determine whether the calling
environment is data.table-aware. If the import of `.__T__[:base` is
removed, cedta() returns FALSE.

`[.data.table` then forwards the call to `[.data.frame`, which cannot
handle data.table-style subsetting.

This is warned about in the 'datatable-importing' vignette;
the 'do' package should have set the .datatable.aware = TRUE marker in
its environment. In fact, example(join_inner) doesn't raise an error
with the following changes when running with data.table commit f92aee69
(i.e. pre-#6001):

diff -rU2 do/NAMESPACE do_2.0.0.0.2/NAMESPACE
--- do/NAMESPACE2021-08-03 12:37:00.0 +0300
+++ do_2.0.0.0.2/NAMESPACE  2024-03-15 14:01:10.588561222 +0300
@@ -130,5 +130,4 @@
 export(upper.dir)
 export(write_xlsx)
-importFrom(data.table,`.__T__[:base`)
 importFrom(methods,as)
 importFrom(reshape2,melt)
diff -rU2 do/R/join.R do_2.0.0.0.2/R/join.R
--- do/R/join.R 2020-06-30 06:47:22.0 +0300
+++ do_2.0.0.0.2/R/join.R   2024-03-15 13:54:02.289440613 +0300
@@ -1,2 +1,4 @@
+.datatable.aware = TRUE
+
 #' @title Join two dataframes together
 #' @description Join two dataframes by the same id column.

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [Rd] Spurious warning in as.data.frame.factor()

2024-03-14 Thread Ivan Krylov via R-devel
On Thu, 14 Mar 2024 10:41:54 +0100
Martin Maechler  wrote:

> Anybody trying S7 examples and see if they work w/o producing
> wrong warnings?

It looks like this is not applicable to S7. If I overwrite
as.data.frame with a newly created S7 generic, it fails to dispatch on
existing S3 classes:

new_generic('as.data.frame', 'x')(factor(1))
# Error: Can't find method for `as.data.frame(S3)`.

But there is no need to overwrite the generic, because S7 classes
should work with existing S3 generics:

foo <- new_class('foo', parent = class_double)
method(as.data.frame, foo) <- function(x) structure(
 # this is probably not generally correct
 list(x),
 names = deparse1(substitute(x)),
 row.names = seq_len(length(x)),
 class = 'data.frame'
)
str(as.data.frame(foo(pi)))
# 'data.frame':   1 obs. of  1 variable:
#  $ x:  num 3.14

So I think there is nothing to break, because S7 methods for
as.data.frame will rely on S3 for dispatch.

> > The patch passes make check-devel, but I'm not sure how to safely
> > put setGeneric('as.data.frame'); as.data.frame(factor(1:10)) in a
> > regression test.  
> 
> {What's the danger/problem?  we do have "similar" tests in both
>   src/library/methods/tests/*.R
>   tests/reg-S4.R
> 
>  -- maybe we can discuss bi-laterally  (or here, as you prefer)
> }

This might be educational for other people wanting to add a regression
test to their patch. I see that tests/reg-tests-1e.R is already running
under options(warn = 2), so if I add the following near line 750
("Deprecation of *direct* calls to as.data.frame.")...

# Should not warn for a call from a derivedDefaultMethod to the raw
# S3 method -- implementation detail of S4 dispatch
setGeneric('as.data.frame')
as.data.frame(factor(1))

...then as.data.frame will remain an S4 generic. Should the test then
rm(as.data.frame) and keep going? (Or even keep the S4 generic?) Is
there any hidden state I may be breaking for the rest of the test this
way? The test does pass like this, so this may be worrying about
nothing.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Spurious warning in as.data.frame.factor()

2024-03-13 Thread Ivan Krylov via R-devel
On Tue, 12 Mar 2024 12:33:17 -0700
Hervé Pagès writes:

> The acrobatics that as.data.frame.factor() is going thru in order to 
> recognize a direct call don't play nice if as.data.frame() is an S4 
> generic:
> 
>      df <- as.data.frame(factor(11:12))
> 
>      suppressPackageStartupMessages(library(BiocGenerics))
>      isGeneric("as.data.frame")
>      # [1] TRUE
> 
>      df <- as.data.frame(factor(11:12))
>      # Warning message:
>      # In as.data.frame.factor(factor(11:12)) :
>      #   Direct call of 'as.data.frame.factor()' is deprecated.

How about something like the following:

Index: src/library/base/R/zzz.R
===
--- src/library/base/R/zzz.R(revision 86109)
+++ src/library/base/R/zzz.R(working copy)
@@ -681,7 +681,14 @@
 bdy <- body(as.data.frame.vector)
bdy <- bdy[c(1:2, seq_along(bdy)[-1L])] # taking [(1,2,2:n)] to insert at [2]:
## deprecation warning only when not called by method dispatch from as.data.frame():
-bdy[[2L]] <- quote(if((sys.nframe() <= 1L || !identical(sys.function(-1L), as.data.frame)))
+bdy[[2L]] <- quote(if((sys.nframe() <= 1L || !(
+   identical(sys.function(-1L), as.data.frame) || (
+   .isMethodsDispatchOn() &&
+   methods::is(sys.function(-1L), 'derivedDefaultMethod') &&
+   identical(
+   sys.function(-1L)@generic,
+   structure('as.data.frame', package = 'base')
+   )
.Deprecated(
msg = gettextf(
"Direct call of '%s()' is deprecated.  Use '%s()' or
'%s()' instead",

The patch passes make check-devel, but I'm not sure how to safely put
setGeneric('as.data.frame'); as.data.frame(factor(1:10)) in a
regression test.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [R-pkg-devel] confusion over spellchecking

2024-03-13 Thread Ivan Krylov via R-package-devel
On Sun, 10 Mar 2024 13:55:43 -0400
Ben Bolker writes:

> I am working on a package and can't seem to get rid of a NOTE about
> 
> Possibly misspelled words in DESCRIPTION:
>glmmTMB (10:88)
>lme (10:82)
> 
> on win-builder.

Do you have these words anywhere else in the package (e.g. in the Rd
files)? It turns out that R has a special environment variable that
makes it ignore custom dictionaries specifically for DESCRIPTION:

>>## Allow providing package defaults but make this controllable via
>>##   _R_ASPELL_USE_DEFAULTS_FOR_PACKAGE_DESCRIPTION_
>>## to safeguard against possible mis-use for CRAN incoming checks.

I cannot see it used anywhere under the trunk/CRAN subdirectory in the
developer.r-project.org Subversion repo, but it could be set somewhere
else on Win-Builder.

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] Submission after archived version

2024-03-13 Thread Ivan Krylov via R-package-devel
On Mon, 11 Mar 2024 23:45:13 +0100
Nils Mechtel writes:

> Despite R CMD check not giving any errors or warnings, the package
> doesn’t pass the pre-tests:

If your question was more about the reasons for the difference between
your R CMD check and the pre-tests, most of it is due to --as-cran:

(Using commit ffe216d from https://github.com/nilsmechtel/MetAlyzer as
the basis for the example, which seems to be different from the
incoming pretest from the link you've shared.)

$ R-devel CMD check MetAlyzer_1.0.0.tar.gz
<...>
Status: OK  
$ R-devel CMD check --as-cran MetAlyzer_1.0.0.tar.gz
<...>
* checking for non-standard things in the check directory ... NOTE
Found the following files/directories: ‘metabolomics_data.csv’
<...>

It's less wasteful to run checks without --as-cran in CI (as you
currently do), but you need to perform additional testing before making
a release. The incoming pre-tests use a custom set of environment
variables that go a bit further than just --as-cran:
https://svn.r-project.org/R-dev-web/trunk/CRAN/QA/Kurt/lib/R/Scripts/check_CRAN_incoming.R

In particular, _R_CHECK_CRAN_INCOMING_USE_ASPELL_=true enables the
check for words that are possibly misspelled:

(Using an extra environment variable because your package has already
been published and R filters out "misspellings" found in the CRAN
version of the package. Congratulations!)

$ env \
 _R_CHECK_CRAN_INCOMING_ASPELL_RECHECK_MAYBE_=FALSE \
 _R_CHECK_CRAN_INCOMING_USE_ASPELL_=true \
 R-devel CMD check --as-cran MetAlyzer_1.0.0.tar.gz
<...>
Possibly misspelled words in DESCRIPTION:
  metabolomics (15:78)
<...>

Yet another way to avoid false misspellings is to create a custom
dictionary:
http://dirk.eddelbuettel.com/blog/2017/08/10/#008_aspell_cran_incoming

$ mkdir -p .aspell
$ echo '
 Rd_files <- vignettes <- R_files <- description <- list(
  encoding = "UTF-8",
  language = "en",
  dictionaries = c("en_stats", "dictionary")
 )
' > .aspell/defaults.R
$ R -q -s -e '
 saveRDS(c(
  "metabolomics" # , extra words go here
 ), file.path(".aspell", "dictionary.rds"))
'
$ R CMD build .
$ env \
 _R_CHECK_CRAN_INCOMING_ASPELL_RECHECK_MAYBE_=FALSE \
 _R_CHECK_CRAN_INCOMING_USE_ASPELL_=true \
 R-devel CMD check --as-cran MetAlyzer_1.0.0.tar.gz
# No more "Possibly misspelled words in DESCRIPTION"!

Some day, this will be documented in Writing R Extensions, or maybe in
R Internals (where the other _R_CHECK_* variables are documented), or
perhaps in the CRAN policy. See also:
https://stat.ethz.ch/pipermail/r-package-devel/2024q1/010558.html

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] Submission after archived version

2024-03-12 Thread Ivan Krylov via R-package-devel
В Mon, 11 Mar 2024 23:45:13 +0100
Nils Mechtel  пишет:

> Debian:
> 
> Status: 3 NOTEs

>> * checking CRAN incoming feasibility ... [4s/6s] NOTE

>> Possibly misspelled words in DESCRIPTION:
>>  metabolomics (36:78)

This one can be explained in the submission comment. The rest of the
NOTE is to be expected.

>> * checking DESCRIPTION meta-information ... NOTE
>> Author field differs from that derived from Authors@R

Just remove the Author: field from your DESCRIPTION and let R CMD build
automatically generate it from Authors@R.

>> * checking for non-standard things in the check directory ... NOTE
>> Found the following files/directories:
>>  ‘metabolomics_data.csv’

Make sure that when your tests and examples create files, they do so in
the session temp directory and then remove the files afterwards. If a
user had a valuable file named metabolomics_data.csv in the current
directory, ran example(...) and had it overwritten as a result, they
would be very unhappy.
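
A sketch of the pattern (the data frame here is a stand-in for real
output):

tmp <- file.path(tempdir(), 'metabolomics_data.csv')
write.csv(data.frame(metabolite = 'alanine', conc = 1), tmp)
## ... demonstrate reading it back ...
unlink(tmp) # leave the check directory as we found it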

The NOTEs on Windows are similar.

Good luck!

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] [EXTERN] Re: [EXTERN] Re: [EXTERN] Re: @doctype is deprecated. need help for r package documentation

2024-03-12 Thread Ivan Krylov via R-package-devel
В Mon, 11 Mar 2024 14:57:58 +
"Ruff, Sergej"  пишет:

> I uploaded the old version of the package to my repo:
> https://github.com/SergejRuff/boot

After installing this tarball, running RStudio and typing:

library(bootGSEA)
?bootGSEA

...I see the help page in RStudio's help tab, not in the browser. I
think this is the expected behaviour for RStudio.

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] RFC: C backtraces for R CMD check via just-in-time debugging

2024-03-11 Thread Ivan Krylov via R-package-devel
Vladimir,

Thank you for the example and for sharing the ideas regarding
symbol-relative offsets!

On Thu, 7 Mar 2024 09:38:18 -0500 (EST)
Vladimir Dergachev  wrote:

>  unw_get_reg(&cursor, UNW_REG_IP, &ip);

Is it ever possible for unw_get_reg() to fail (return non-zero) for
UNW_REG_IP? The documentation isn't being obvious about this. Then
again, if the process is so damaged it cannot even read the instruction
pointer from its own stack frame, any attempts at self-debugging must
be doomed.

>* this should work as a package, but I am not sure whether the
> offsets between package symbols and R symbols would be static or not.

Since package shared objects are mmap()ed into the address space and
(at least on Linux with ASLR enabled) mmap()s are supposed to be made
unpredictable, this offset ends up not being static. On Linux, R seems
to be normally built as a position-independent executable, so no matter
whether there is a libR.so, both the R base address and the package
shared object base address are randomised:

$ cat ex.c
#include <stddef.h>
#include <R_ext/Print.h>
void addr_diff(void) {
 ptrdiff_t diff = (char*)&addr_diff - (char*)&Rprintf;
 Rprintf("self - Rprintf = %td\n", diff);
}
$ R CMD SHLIB ex.c
$ R-dynamic -q -s -e 'dyn.load("ex.so"); .C("addr_diff");'
self - Rprintf = -9900928
$ R-dynamic -q -s -e 'dyn.load("ex.so"); .C("addr_diff");'
self - Rprintf = -15561600
$ R-static -q -s -e 'dyn.load("ex.so"); .C("addr_diff");'
self - Rprintf = 45537907472976
$ R-static -q -s -e 'dyn.load("ex.so"); .C("addr_diff");'
self - Rprintf = 46527711447632

>* R ought to know where packages are loaded, we might want to be
> clever and print out information on which package contains which
> function, or there might be identical R_init_RMVL() printouts.

That's true. Information on all registered symbols is available from
getLoadedDLLs().

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] [EXTERN] Re: [EXTERN] Re: @doctype is deprecated. need help for r package documentation

2024-03-07 Thread Ivan Krylov via R-package-devel
В Thu, 7 Mar 2024 20:27:29 +
"Ruff, Sergej"  пишет:

> I am refering to Rstudio. I checked the settings and type is set to
> "htlm", not text. And I was wondering why the package documentation
> opened in a browser when I used @doctype.

Do you still have the source package .tar.gz file for which ?bootGSEA
would start a browser from inside RStudio?

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] @doctype is deprecated. need help for r package documentation

2024-03-07 Thread Ivan Krylov via R-package-devel
В Thu, 7 Mar 2024 10:37:51 +
"Ruff, Sergej"  пишет:

> I noticed that when I try _?bootGSEA_ it goes to the help page in R
> itself but not to the html page

That's up to the user to choose. help(bootGSEA, help_type = 'html')
should get you to the HTML documentation; help(bootGSEA, help_type =
'text') should give you plain text. The default depends on
options(help_type=...). On Windows, you get a choice during
installation of R; this gets recorded in file.path(R.home('etc'),
'Rprofile.site').
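
For example, to make HTML help the default for a session (or from a
startup file):

options(help_type = 'html')
?bootGSEA # subsequent help requests use HTML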

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] RFC: C backtraces for R CMD check via just-in-time debugging

2024-03-07 Thread Ivan Krylov via R-package-devel
On Tue, 5 Mar 2024 18:26:28 -0500 (EST)
Vladimir Dergachev  wrote:

> I use libunwind in my programs, works quite well, and simple to use.
> 
> Happy to share the code if there is interest..

Do you mean that you use libunwind in signal handlers? An example on
how to produce a backtrace without calling any async-signal-unsafe
functions would indeed be greatly useful.

Speaking of shared objects injected using LD_PRELOAD, I've experimented
some more, and I think that none of them would work with R without
additional adjustments. They install their signal handler very soon
after the process starts up, and later, when R initialises, it
installs its own signal handler, overwriting the previous one. For this
scheme to work, either R would have to cooperate, remembering a pointer
to the previous signal handler and calling it at some point (which
sounds unsafe), or the injected shared object would have to override
sigaction() and call R's signal handler from its own (which sounds
extremely unsafe).

Without that, if we want C-level backtraces, we either need to patch R
to produce them (using backtrace() and limiting this to glibc systems
or using libunwind and paying the dependency cost) or to use a debugger.

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] [External] [External] RcmdrPlugin.HH_1.1-48.tar.gz

2024-03-07 Thread Ivan Krylov via R-package-devel
On Wed, 6 Mar 2024 13:46:55 -0500
Duncan Murdoch  wrote:

> is this just a more or less harmless error, thinking that 
> the dot needs escaping

I think it's this one. You are absolutely right that the dot doesn't
need escaping in either TRE (which is what's used inside exportPattern)
or PCRE. In PCRE, this regular expression would have worked as intended:

# We do match backslashes by mistake.
grepl('[\\.]', '\\')
# [1] TRUE

# In PCRE, this wouldn't have been a mistake.
grepl('[\\.]', c('\\', '.'), perl = TRUE)
# [1] FALSE TRUE

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


[Rd] Never exporting .__global__ and .__suppressForeign__?

2024-03-06 Thread Ivan Krylov via R-devel
Hello,

(Dear Richard, I hope you don't mind being Cc:'d on this thread in
R-devel. This is one of the ways we can prevent similar problems from
happening in the future.)

Sometimes, package authors who use both exportPattern('.') and
utils::globalVariables(...) get confusing WARNINGs about undocumented
exports:
https://stat.ethz.ch/pipermail/r-package-devel/2024q1/010531.html

I would like to suggest adding the variables used by
utils::globalVariables and utils::suppressForeignCheck to the list of
things that should never be exported:

Index: src/library/base/R/namespace.R
===
--- src/library/base/R/namespace.R  (revision 86054)
+++ src/library/base/R/namespace.R  (working copy)
@@ -806,7 +806,8 @@
 if (length(exports)) {
 stoplist <- c(".__NAMESPACE__.", ".__S3MethodsTable__.",
   ".packageName", ".First.lib", ".onLoad",
-  ".onAttach", ".conflicts.OK", ".noGenerics")
+  ".onAttach", ".conflicts.OK", ".noGenerics",
+  ".__global__", ".__suppressForeign__")
 exports <- exports[! exports %in% stoplist]
 }
if(lev > 2L) message("--- processing exports for ", dQuote(package))

(Indeed, R CMD check is very careful to only access these variables
using the interface functions in the utils package, so there doesn't
seem to be any code that depends on them being exported, and they
usually aren't.)
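
For context, .__global__ appears in a package namespace when the
package's code contains a declaration like the following (the names are
examples), usually to silence codetools notes about non-standard
evaluation:

utils::globalVariables(c('count', 'group'))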

Alternatively (or maybe additionally), it may be possible to enhance
the R CMD check diagnostics by checking whether the name of the
undocumented object starts with a dot and asking the user whether it
was intended to be exported. This is not as easy to implement due to
tools:::.check_packages working with the log output from
tools::undoc(), not the object itself. Would a change to
tools:::format.undoc be warranted?

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [R-pkg-devel] RcmdrPlugin.HH_1.1-48.tar.gz

2024-03-05 Thread Ivan Krylov via R-package-devel
В Tue, 5 Mar 2024 22:41:32 +
"Richard M. Heiberger"  пишет:

>  Undocumented code objects:
>'.__global__'
>  All user-level objects in a package should have documentation
> entries. See chapter 'Writing R documentation files' in the 'Writing R
>  Extensions' manual.

This object is not here for the user of the package. If you don't
export it, there will be no WARNING about it being undocumented. This
variable is exported because of exportPattern(".") in the file
NAMESPACE. The lone dot is a regular expression that matches any name
of an R object.

If you don't want to manually list your exports in the NAMESPACE file
(which can get tedious) or generate it (which takes additional
dependencies and build steps), you can use exportPattern('^[^\\.]') to
export everything except objects with a name starting with a period:
https://cran.r-project.org/doc/manuals/R-exts.html#Specifying-imports-and-exports
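
A quick way to see which names the pattern keeps (the names below are
made up):

grepl('^[^\\.]', c('.__global__', '.onLoad', 'plotHH'))
# [1] FALSE FALSE  TRUE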

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] RFC: C backtraces for R CMD check via just-in-time debugging

2024-03-03 Thread Ivan Krylov via R-package-devel
On Sun, 3 Mar 2024 19:19:43 -0800
Kevin Ushey  wrote:

> Would libSegFault be useful here?

Glad to know it has been moved to <...> and not just removed altogether
after the upstream commit <...>.

libSegFault is safer than, say, libsegfault [*] because it both
supports SA_ONSTACK (for when a SIGSEGV is caused by stack overflow)
and avoids functions like snprintf() (which depend on the locale code,
which may have been the source of the crash). The only correctness
problem that may still be unaddressed is potential memory allocations
in backtrace() when it loads libgcc on first use. That should be easy
to fix by calling backtrace() once in segfault_init(). Unfortunately,
libSegFault is limited to glibc systems, so a different solution will
be needed on Windows, macOS and Linux systems with the musl libc.

Google-owned "backward" [**] tries to do most of this right, but (1) is
designed to be compiled together with C++ programs, not injected into
unrelated processes and (2) will exit the process if it survives
raise(signum), which will interfere with both rJava (judging by the
number of Java-related SIGSEGVs I saw while running R CMD check) and R's
own stack overflow survival attempts.

-- 
Best regards,
Ivan

[*] https://github.com/stass/libsegfault
(Which doesn't compile out of the box on GNU/Linux due to missing
pthread_np.h, although that should be easy to patch.)

[**] https://github.com/bombela/backward-cpp

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


[R-pkg-devel] RFC: C backtraces for R CMD check via just-in-time debugging

2024-03-03 Thread Ivan Krylov via R-package-devel
Hello,

This may be of interest to people who run lots of R CMD checks and have
to deal with resulting crashes in compiled code.

Every now and then, the CRAN checks surface a particularly nasty crash.
The R-level traceback stops in the compiled code. It's not obvious
where exactly the crash happens. Naturally, this never happened on the
maintainer's computer before and, in fact, is hard to reproduce.

Containers would help, but they cannot solve the problem completely.
Some problems only surface when there's more than 32 logical
processors, or during certain times of day. It may help to at least see
the location of the crash as it happens on the computer running the
check.

One way to provide that would be to run a special debugger that does
nothing most of the time, attaches to child threads and processes, and
produces backtraces when processes receive a crashing signal. There is
such a debugger for Windows [1], and there is now a proof of concept
for amd64 Linux [2]. 

I've just tried [2] on a 250-package reverse dependency check and saw a
lot of SIGSEGVs with rcx=cafebabe or Java in the backtrace, but
other than that, it seems to work fine. Do you think it's worth
developing further?

The major downside of using a debugger like this is a noticeable change
in the environment: [v]fork(), clone() and exec() become slower,
attaching another tracer becomes impossible, SIGSEGVs may become much
slower (although I do hope that most software I rely upon doesn't care
about SIGSEGVs per second). On the other hand, these wrappers are as
transparent as they get and don't even need R -d to pass the arguments
to the child process.

The other way to provide C-level backtraces is a post-mortem debugger
(registered via the AeDebug registry key on Windows or
kernel.core_pattern sysctl on Linux). This avoids interference with the
process environment during normal execution, but requires more
integration work to collect the crash dumps, process them into usable
backtraces and associate with the R CMD check runs. There are also
injectable DLLs like libbacktrace, but these have to interfere with the
process from the inside, which may be worse than ptrace() in terms of
observable environment changes. On glibc systems (but not musl, macOS,
Windows), R's SIGSEGV handler could be enhanced to call
backtrace_symbols_fd(), which should be safe (no malloc()) as long as
libgcc is preloaded.

Is adding C-level backtraces to R CMD checks worth the effort? Could it
be a good idea to add this on CRAN? If yes, how can I help?

-- 
Best regards,
Ivan

[1] https://github.com/jrfonseca/drmingw, see "catchsegv"

[2] https://codeberg.org/aitap/tracecrash

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] Additional issues: Intel segfault

2024-03-01 Thread Ivan Krylov via R-package-devel
В Sat, 2 Mar 2024 02:07:47 +
Murray Efford  пишет:

> Gabor suggested https://github.com/r-hub/rhub2 and that worked like a
> charm. A check there on the Intel platform found no errors in my
> present version of secrdesign, so I'll resubmit with confidence.

Thank you for letting me know! Having this as a container simplifies a
lot of things.

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] Additional issues: Intel segfault

2024-03-01 Thread Ivan Krylov via R-package-devel
В Fri, 1 Mar 2024 07:42:01 +
Murray Efford  пишет:

> R CMD check suggests it is most likely in the Examples for
> 'validate', but all code there is wrapped in \dontrun{}.

The crash happens after q('no'), suggesting a corruption in the heap or
in the R memory manager. At least it's a null pointer being
dereferenced and not a 0xRANDOM_LOOKING_NUMBER: this limits the impact
of the problem.

I don't know if anyone created an easily reproducible container with an
Intel build of R (there's https://hub.docker.com/r/intel/oneapi, but
aren't the compilers themselves supposed to be not redistributable?),
so you will most likely have to follow
https://www.stats.ox.ac.uk/pub/bdr/Intel/README.txt and
https://cran.r-project.org/doc/manuals/r-devel/R-admin.html#Intel-compilers
manually, compiling R using Intel compilers yourself in order to
reproduce this.

I think it would be great if CRAN checking machines used a just-in-time
debugger to provide C-level backtraces at the place of the crash. For
Windows, such a utility does exist [*], but I recently learned that the
glibc `catchsegv` program (and most other similar programs) used to
perform shared object preloading (before being thrown out of the
codebase altogether), which is more intrusive than it could be. A proof
of concept using GDB on Linux can be shown to work:

R -d gdb \
 --debugger-args='-batch -ex run -ex bt -ex c -ex q' \
 -e '
  Rcpp::sourceCpp(code =
   "//[[Rcpp::export]]\nvoid rip() { *(double*)(42) = 42; }"
  ); rip()
 '

-- 
Best regards,
Ivan

[*] https://github.com/jrfonseca/drmingw

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] Unexpected multi-core CPU usage in package tests

2024-02-28 Thread Ivan Krylov via R-package-devel
В Tue, 27 Feb 2024 11:14:19 +
Jon Clayden  пишет:

> My testing route is to install the packages within the
> 'rocker/r-devel' Docker container, which is Debian-based, then use
> 'time' to evaluate CPU usage. Note that, even though 'RNifti' does not
> use OpenMP, setting OMP_NUM_THREADS changes its CPU usage

I think that's because rocker/r-devel uses parallel OpenBLAS:

$ podman run --rm -it docker.io/rocker/r-devel \
 R -q -s -e 'sessionInfo()' | grep -A1 BLAS
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.24.so;  
LAPACK version 3.11.0

The incoming CRAN check machine either sets the BLAS parallellism to 1
or uses a non-parallel BLAS. With rocker/r-devel, you can run R with
the environment variable OPENBLAS_NUM_THREADS set to 1. It's been
effective in the past to run R -d gdb and set a breakpoint on
pthread_create before launching the test. (In theory, it may be
required to set a breakpoint on every system call that may be used to
create threads, including various variations of clone(), subject to
variations between operating systems, but pthread_create has been
enough for me so far.)

With OPENBLAS_NUM_THREADS=1, I'm only seeing OpenMP threads created by
the mmand package during tests for your package tractor.base, and the
latest commit (that temporary disables testing of mmand) doesn't hit
the breakpoint or raise any NOTEs at all.

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


[Rd] How to avoid the Markdown code block bug on R Bugzilla

2024-02-27 Thread Ivan Krylov via R-devel
Hello,

There's a rare but annoying bug in Bugzilla 5.1.2...5.3.2+ where a
Markdown code block inside a comment may be replaced by U+F111 or
U+F222, and then the following code blocks may end up being replaced by
the preceding ones. For example, the problem can be seen in PR16158:
https://bugs.r-project.org/show_bug.cgi?id=16158.

Here's how to avoid it:

1. If no code blocks have been already swallowed by Bugzilla, use the
comment preview to make sure yours won't be swallowed either. If you do
see a  or a  instead of your code block in the preview tab, try:
 - starting the comment with an empty line
 - removing the colons from the starting sentence
 - if all else fails, switching Markdown off

2. If you would like to post some code into a bug where this has
already happened, the preview won't be enough. Bugzilla::Markdown has
separate queues for fenced code blocks and indented code blocks, so if
one was swallowed, it may be possible to post the other. Unfortunately,
you won't know whether it'll fail until you post the comment, and by
then it may be a part of the problem. The only safe way to continue is
to switch Markdown off for the comment.

A technical analysis of the bug is available at
,
but it may take a while to get this fixed on the Bugzilla side.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [R-pkg-devel] CRAN Package Check Note: Warning: trimming empty

2024-02-24 Thread Ivan Krylov via R-package-devel
В Fri, 23 Feb 2024 17:04:39 +
Sunmee Kim  пишет:

> Version: 1.0.4
> Check: HTML version of manual
> Result: NOTE

This may not be immediately obvious in the e-mail from CRAN, but I
think this is a reminder of a warning from the previous version of the
package. Haven't you just uploaded version 1.0.5? I'm not getting any
warnings for gesca_1.0.5.tar.gz from the /incoming/archive subdirectory
on the CRAN FTP server, except perhaps "This build time stamp is over a
month old", and the latest check looks almost clean in the same manner:
https://win-builder.r-project.org/incoming_pretest/gesca_1.0.5_20240223_172938/

What does the rest of the e-mail say?

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [Rd] Tcl socket server (tcltk) does not work any more on R 4.3.2

2024-02-21 Thread Ivan Krylov via R-devel
В Wed, 21 Feb 2024 08:01:16 +0100
"webmail.gandi.net"  пишет:

> Since the {tcltk} package was working fine with  "while
> (Tcl_DoOneEvent(TCL_DONT_WAIT) && max_ev) max_ev--;", unless there is
> a clear performance enhancement with "while (i-- &&
> Tcl_ServiceAll())", it would perhaps be wise to revert this back.

I forgot to mention the comment in the new version of the function
explaining the switch:

>> [Tcl_DoOneEvent(TCL_DONT_WAIT)] <...> causes infinite recursion with
>> R handlers that have a re-entrancy guard, when TclSpinLoop is
>> invoked from such a handler (seen with Rhttp server)

The difference between Tcl_ServiceAll() and Tcl_DoOneEvent() is that
the latter calls Tcl_WaitForEvent(). The comments say that it is called
for the side effect of queuing the events detected by select(). The
function can indeed be observed to access the fileHandlers via the
thread-specific data pointer, which contain the file descriptors and
the instructions saying what to do with them.

Without Tcl_WaitForEvent, the only event sources known to Tcl are
RTcl_{setup,check}Proc (which only checks file descriptors owned by R),
Display{Setup,Check}Proc (which seems to be owned by Tk), and
Timer{Setup,Check}Proc (for which there don't seem to be any timers
by default).

As far as I understand the problem, while the function
worker_input_handler() from src/modules/internet/Rhttpd.c is running,
TclHandler() might be invoked, causing Tcl_DoOneEvent() to call
RTcl_checkProc() and therefore trying to run worker_input_handler()
again. The Rhttpd handler prevents this and doesn't clear the
condition, which causes the event loop to keep calling it. Is that
correct? Are there easy ways to reproduce the problem?

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [R-pkg-devel] Conversion failure in 'mbcsToSbcs'

2024-02-21 Thread Ivan Krylov
В Wed, 21 Feb 2024 12:29:02 +
Package Maintainer  пишет:

> Error: processing vignette 'ggenealogy.Rnw' failed with diagnostics:
>  chunk 58 (label = plotCBText)

In order to use the non-standard graphics device, the chunk must
set the option fig=TRUE. Otherwise, when something calls
graphics::strwidth('Lubomír Kubáček', "inches"), R notices that no
graphics device is active and creates a default one, which happens to
be pdf() and has all these problems. With fig=TRUE, Sweave will
initialise the cairo_pdf() device first, and then graphics::strwidth()
will use the existing device, avoiding the error.
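
For instance, a chunk along these lines (the body is a stand-in for the
actual plotting code) opens the device before strwidth() is consulted:

<<plotCBText, fig=TRUE>>=
plot.new()
text(0.5, 0.5, 'Lubomír Kubáček')
@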

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [Rd] Tcl socket server (tcltk) does not work any more on R 4.3.2

2024-02-20 Thread Ivan Krylov via R-devel
В Tue, 20 Feb 2024 12:27:35 +0100
"webmail.gandi.net"  пишет:

> When R process #1 is R 4.2.3, it works as expected (whatever version
> of R #2). When R process #1 is R 4.3.2, nothing is sent or received
> through the socket apparently, but no error is issued and process #2
> seems to be able to connect to the socket.

The difference is related to the change in
src/library/tcltk/src/tcltk_unix.c.

In R-4.2.1, the function static void TclSpinLoop(void *data) says:

int max_ev = 100;
/* Tcl_ServiceAll is not enough here, for reasons that escape me */
while (Tcl_DoOneEvent(TCL_DONT_WAIT) && max_ev) max_ev--;

In R-devel, the function instead says:

int i = R_TCL_SPIN_MAX; 
while (i-- && Tcl_ServiceAll())
;

Manually calling Tcl_DoOneEvent(0) from the debugger at this point
makes the Tcl code respond to the connection. Tcl_ServiceAll() seems to
be still not enough. I'll try reading Tcl documentation to investigate
this further.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [R-pkg-devel] Conversion failure in 'mbcsToSbcs'

2024-02-15 Thread Ivan Krylov
В Mon, 12 Feb 2024 16:01:27 +
Package Maintainer  пишет:

> Unfortunately, I received a reply from the CRAN submission team
> stating that my vignette file is still obtaining the "mbcsToSbcs"
> ERROR as is shown here
> (https://win-builder.r-project.org/incoming_pretest/ggenealogy_1.0.3_20240212_152455/Debian/00check.log).

I am sorry for leading you down the wrong way with my advice. It turns
out that no 8-bit Type-1 encoding known to pdf() can represent both
'Lubomír Kubáček' and 'Anders Ågren':

lapply(
 setNames(nm = c(
  'latin1', 'cp1252', 'latin2', 'latin7',
  'latin-9', 'CP1250', 'CP1257'
 )), function(enc)
  iconv(enc2utf8(c(
   'Lubomír Kubáček', 'Anders Ågren'
  )), 'UTF-8', enc, toRaw = TRUE)
) |> sapply(lengths)
# one of the two strings cannot be represented, returning a NULL:
#  latin1 cp1252 latin2 latin7 latin-9 CP1250 CP1257
# [1,]  0  0 15  0   0 15  0
# [2,] 12 12  0 12  12  0 12

While it may still be possible to give extra parameters to pdf() to use
a font encoding that covers all the relevant characters, it seems
easier to switch to cairo_pdf() for your multi-lingual plots. Place the
following somewhere in the beginning of the vignette:

<>=
my.Swd <- function(name, width, height, ...)
 grDevices::cairo_pdf(
  filename = paste(name, "pdf", sep = "."),
  width = width, height = height
 )
@
\SweaveOpts{grdevice=my.Swd,pdf=FALSE}

This should define a new plot device function for Sweave, one that
handles more Unicode characters correctly.

> PS: Thanks for the advice about plain text mode. Hopefully, I have
> correctly abide by that advice in this current email.

This e-mail arrived in plain text, thank you!

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [Rd] certain pipe() use cases not working in r-devel

2024-02-15 Thread Ivan Krylov via R-devel
В Wed, 14 Feb 2024 14:43:12 -0800
Jennifer Bryan  пишет:

> But in r-devel on macOS, this is silent no-op, i.e. "hello, world"
> does not print:
> 
> > R.version.string  
> [1] "R Under development (unstable) (2024-02-13 r85895)"
> > con <- pipe("cat")
> > writeLines("hello, world", con)  

I can reproduce this on 64-bit Linux.

I think that this boils down to problems with cleanup in R_pclose_pg
[*]. The FILE* fp corresponding to the child process pipe is created
using fdopen() in R_popen_pg(), but R_pclose_pg() only performs close()
on the file descriptor returned by fileno(). The FILE* itself is
leaked, and any buffered content waiting to be written out is lost.

One of the last few lines in the strace output before the process
terminates is the standard C library cleaning up the FILE* object and
trying to flush the buffer:

$ strace -f bin/R -q -s \
 -e 'writeLines("hello", x <- pipe("cat")); close(x)'
...skip...
write(5, "hello\n", 6)  = -1 EBADF (Bad file descriptor)
exit_group(0)   = ?
+++ exited with 0 +++

There is a comment saying "see timeout_wait for why not to use fclose",
which I think references a different function, R_pclose_timeout():

>> Do not use fclose, because on Solaris it sets errno to "Invalid
>> seek" when the pipe is already closed (e.g. because of timeout).
>> fclose would not return an error, but it would set errno and the
>> non-zero errno would then be reported by R's "system" function.

(There are no comments about fclose() in timeout_wait() itself.)

Is there a way to work around the errno problem without letting the
FILE* leak?

-- 
Best regards,
Ivan

[*] Introduced in https://bugs.r-project.org/show_bug.cgi?id=17764#c6
to run child processes in a separate process group, safe from
interrupts aimed at R.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [R-pkg-devel] failing CRAN checks due to problems with dependencies

2024-02-08 Thread Ivan Krylov via R-package-devel
В Wed, 7 Feb 2024 08:40:44 -0600
Marcin Jurek  пишет:

> Packages required but not available: 'Rcpp', 'FNN',
> 'RcppArmadillo' Packages suggested but not available for checking:
> 'fields', 'rmarkdown', 'testthat', 'maptools'

One of the machines running the incoming checks was having problems. If
you followed the failing dependency chain by looking at the CRAN check
results of the packages described as "not available", you could
eventually find a package needing compilation (Rcpp or stringi or
something else), look at the installation log and see Make trying to
run commands that are completely wrong.

It looked like the path to the compiler was empty:
https://web.archive.org/web/20240208191430/https://www.r-project.org/nosvn/R.check/r-devel-linux-x86_64-debian-clang/Rcpp-00install.html

I think that the problems are solved now, so it should be safe to
increment the version and submit it again.

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [Rd] Difficult debug

2024-02-07 Thread Ivan Krylov via R-devel
On Wed, 07 Feb 2024 14:01:44 -0600
"Therneau, Terry M., Ph.D. via R-devel"  wrote:

>  > test2 <- mysurv(fit2, pbc2$bili4, p0= 4:0/10, fit2, x0 =50)  
> ==31730== Invalid read of size 8
> ==31730==    at 0x298A07: Rf_allocVector3 (memory.c:2861)
> ==31730==    by 0x299B2C: Rf_allocVector (Rinlinedfuns.h:595)
> ==31730==    by 0x299B2C: R_alloc (memory.c:2330)
> ==31730==    by 0x3243C6: do_which (summary.c:1152)
<...>
> ==31730==  Address 0x10 is not stack'd, malloc'd or (recently) free'd
<...>
>   *** caught segfault ***
> address 0x10, cause 'memory not mapped'

An unrelated allocation function suddenly dereferencing a null pointer
is a likely indication of heap corruption. Valgrind may be silent about
it because the C heap (that it knows how to override and track) is still
intact, but the R memory management metadata got corrupted (which looks
like a valid memory access to Valgrind).

An easy solution might come from adding more instrumentation.

R can tell Valgrind to consider some memory accesses invalid if you
configure it using --with-valgrind-instrumentation [*], but I'm not
sure it will be able to trap overwriting GC metadata, so let's set it
aside for now.

If you compile your own R, you can configure it with -fsanitize=address
added to the compiler and linker flags [**]. I'm not sure whether the
bounds checks performed by AddressSanitizer would be sufficient to
catch the problem, but it's worth a try. Instead of compiling R with
sanitizers, it should be also possible to use the container image
docker.io/rocker/r-devel-san.

The hard option remains if no instrumentation lets you pinpoint the
error. Since the first (as far as Valgrind is concerned) memory error
already happens to result in a SIGSEGV, you can run R in a regular
debugger and try to work backwards from the local variables at the
location of the crash. Maybe there's a way to identify the block
containing the pointer that gets overwritten and set a watchpoint on
it for the next run of R. Maybe you can read the overwritten value as
double and guess where the number came from. If your processor is
sufficiently new, you can try `rr`, the time-travelling debugger [***],
to rewind the process execution back to the point where the pointer gets
overwritten.

-- 
Best regards,
Ivan

[*]
https://cran.r-project.org/doc/manuals/R-exts.html#Using-valgrind

[**]
https://cran.r-project.org/doc/manuals/R-exts.html#Using-Address-Sanitizer

[***]
https://rr-project.org
Judging by the domain name, it's practically designed to fix troublesome
bugs in R packages!

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [R-pkg-devel] r-oldrel-linux- not in CRAN checks?

2024-02-06 Thread Ivan Krylov via R-package-devel
В Tue, 6 Feb 2024 18:27:32 +0100
Vincent van Hees  пишет:

> For details see:
> https://github.com/RfastOfficial/Rfast/issues/99

GitHub processed your plain text description of the problem as if it
was Markdown and among other things ate the text that used to be there
between angle brackets:

> #include
>  ^~~

By digging through the raw source code of the issue at
https://api.github.com/repos/RfastOfficial/Rfast/issues/99 it is
possible to find out which header was missing for Rfast:

> ../inst/include/Rfast/parallel.h:20:10: fatal error: execution: No such
> file or directory
> #include <execution>
>          ^~~~~~~~~~~
>compilation terminated.

Indeed, <execution> is a C++17 header [1]. While g++ version
7.5.0-3ubuntu1~18.04 seems to accept --std=c++17 without complaint, its
libstdc++-7-dev package is missing this header. Moreover, there's still
no <execution> in libstdc++-8-dev. I think that you need libstdc++-9
for that to work, which is not in Bionic; older versions aren't
C++17-compliant enough to compile Rfast, and C++17 is listed in the
SystemRequirements of the package.

Installing clang-10 and editing Makeconf to use clang++-10 instead of
g++ seems to let the compilation proceed. In order to successfully link
the resulting shared object, I also had to edit Makeconf to specify
-L/usr/lib/gcc/x86_64-linux-gnu/7 when linking -lgfortran.

If you plan to use this in production, be very careful. I don't know
about binary compatibility guarantees between g++-7 and clang++-10, so
you might have to recompile every C++-using R package from source with
clang++-10 in order to avoid hard-to-debug problems when using them
together. (It might also work fine. That's the worst thing about such
problems.)

-- 
Best regards,
Ivan

[1] https://en.cppreference.com/w/cpp/header/execution

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] new maintainer for CRAN package XML

2024-02-05 Thread Ivan Krylov via R-package-devel
Dear Uwe Ligges,

On Mon, 22 Jan 2024 15:50:44 +0100
Uwe Ligges  wrote:

> So we are looking for a person volunteering to take over 'XML'.
> Please let us know if you are interested.

Unless someone else has been discussing this with CRAN in private or
had a package depending on XML and was planning to step up but forgot,
I would like to volunteer.

I'm assuming that the Omegahat page is best preserved in its current
form for historical reasons, so instead I have prepared a Git
repository and a page with an option to file issues on the Codeberg
forge: https://codeberg.org/aitap/XML

With the help of the amazing list members, I have also set up a virtual
machine to run the reverse dependency checks, so it should be possible
to avoid immediate breakage if I have to make any changes.

That's the theory, at least.

(Also, thank you for your reply to my question!)

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [Rd] Advice debugging M1Mac check errors

2024-02-05 Thread Ivan Krylov via R-devel
On Sun, 4 Feb 2024 20:41:51 +0100
Holger Hoefling  wrote:

> I wanted to ask if people have good advice on how to debug M1Mac
> package check errors when you don't have a Mac?

Apologies for not answering the question you asked, but is this about
hdf5r and problems printing R_xlen_t [*] that appeared in 1.3.8 and you
tried to solve in 1.3.9?

We had a thread about this last November:
https://stat.ethz.ch/pipermail/r-package-devel/2023q4/010123.html

To summarise, there is no single standard C format specifier that can be
used to print R_xlen_t. As an implementation detail, it can be defined
as int or ptrdiff_t (or something completely different in the future),
and ptrdiff_t itself is usually defined as long or long long (or, also,
something completely different on a weirder platform). All three basic
types can have different widths and cause painful stack-related
problems when a mismatch happens.

In R-4.4, there will be a macro R_PRIdXLEN_T defining a compatible
printf specifier. Until then (and for compatibility with R-4.3 and
lower), it's relatively safe to cast to (long long) or (ptrdiff_t) and
then use the corresponding specifier, but that's not 100% future-proof.
Also, mind the warnings that mingw compilers sometimes emit for "new"
printf specifiers even though UCRT is documented to support them.

-- 
Best regards,
Ivan

[*] https://www.stats.ox.ac.uk/pub/bdr/M1mac/hdf5r.out

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [R-pkg-devel] Bioconductor reverse dependency checks for a CRAN package

2024-02-05 Thread Ivan Krylov via R-package-devel
Thank you Georgi Boshnakov, Ben Bolker, and Diego Hernangómez Herrero
for introducing me to `revdepcheck`!

On Tue, 30 Jan 2024 12:38:57 -0500
Ben Bolker  wrote:

> I have had a few issues with it 
>  but overall it's
> been very helpful.

Indeed that looks perplexing. Writable .Library can also cause problems
for people running R-svn built in their home directories without
R_LIBS_USER set when they check their packages without Suggests.
I'm also relying on .Library.site for the dependencies of the reverse
dependencies. So far, my setup seems to be working as intended, but I'll
keep this issue in mind.

On Tue, 30 Jan 2024 18:57:41 +0100
Diego Hernangómez Herrero  wrote:

> Haven’t tried with a package with such an amount of revdeps, but my
> approach is revdepcheck in GH actions and commiting the result to the
> repo (that is somehow similar to the docker approach if you host the
> package in GitHub).

Great to know that reverse dependency checks can run in CI! I think
I'll keep a stateful virtual machine for now, because otherwise I would
need to find space for 4 to 32 gigabytes of cache somewhere (or download
everything from the repository mirrors every time).

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] Bioconductor reverse dependency checks for a CRAN package

2024-02-05 Thread Ivan Krylov via R-package-devel
On Tue, 30 Jan 2024 16:24:40 +
Martin Morgan  wrote:

> BiocManager (the recommended way to install Bioconductor packages) at
> the end of the day does essentially install.packages(repos =
> BiocManager::repositories()), ensuring that the right versions of
> Bioconductor packages are installed for the version of R in use.

That's great to know, thanks! I think I will use BiocManager::install
for now, both because it uses the correct repositories and because it
doesn't forcibly reinstall the packages I am asking for. With bspm, I
can run BiocManager::install(all_the_dependencies) and have the system
perform the least amount of work required to reach the desired state.
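
The call I ended up with looks roughly like this ('deps' stands for the
vector of package names computed earlier; update and ask are set to keep
it non-interactive):

BiocManager::install(deps, update = FALSE, ask = FALSE)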

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] Bioconductor reverse dependency checks for a CRAN package

2024-02-05 Thread Ivan Krylov via R-package-devel
Dear Dirk,

Thank you very much for your help here and over on GitHub!

I have finally managed to get the reverse dependency checks working. It
took some additional disk space and a few more system dependencies. If
not for r2u, I would have been stuck for much longer. I really
appreciate the work that went into packaging all these R packages.

On Tue, 30 Jan 2024 10:32:36 -0600
Dirk Eddelbuettel  wrote:

> For what it is worth, my own go-to for many years has been a VM in
> which I install 'all packages needed' for the rev.dep to be checked.

This approach seems to be working for me, too. I had initially hoped to
set something up using CI infrastructure, but there are too many
dependencies to install in a prepare step and it's too much work to
make a container image with all dependencies anew every time I want to
run a reverse dependency check. Easier to just let it run overnight on
a spare computer.

> Well a few of us maintain packages with quite a tail and cope. Rcpp
> has 2700, RcppArmadillo have over 100, BH a few hundred. These aren't
> 'light'.

Maintaining a top-5 CRAN package by in-degree rank
[10.32614/RJ-2023-060] is indeed a very serious responsibility. 

> I wrote myself the `prrd` package (on CRAN) for this, others have
> other tools -- Team data.table managed to release 1.5.0 to CRAN today
> too. So this clearly is possible.

I'll check out `prrd` next, thanks. tools::check_packages_in_dir is
nice, but it could be faster if I could disable mc.preschedule. 

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


[R-pkg-devel] Bioconductor reverse dependency checks for a CRAN package

2024-01-30 Thread Ivan Krylov via R-package-devel
Hello R-package-devel,

What would you recommend in order to run reverse dependency checks for
a package with 182 direct strong reverse dependencies from CRAN and 66 from
Bioconductor (plus 3 more from annotations and experiments)?

Without extra environment variables, R CMD check requires the Suggested
packages to be available, which means installing...

library(tools) # package_dependencies(), standard_package_names()
# 'revdep' holds the names of the reverse dependencies to be checked
revdepdep <- package_dependencies(revdep, which = 'most')
revdeprest <- package_dependencies(
 unique(unlist(revdepdep)),
 which = 'strong', recursive = TRUE
)
length(setdiff(
 unlist(c(revdepdep, revdeprest)),
 unlist(standard_package_names())
))

...up to 1316 packages. 7 of these suggested packages aren't on CRAN or
Bioconductor (because they've been archived or have always lived on
GitHub), but even if I filter those out, it's not easy. Some of the
Bioconductor dependencies are large; I now have multiple gigabytes of
genome fragments and mass spectra, but also a 500-megabyte arrow.so in
my library. As long as a data package declares a dependency on your
package, it still has to be installed and checked, right?

Manually installing the SystemRequirements is no fun at all, so I've
tried the rocker/r2u container. It got me most of the way there, but
there were a few remaining packages with newer versions on CRAN. For
these, I had to install the system packages manually in order to build
them from source.

Someone told me to try the rocker/r-base container together with pak.
It was more proactive at telling me about dependency conflicts and
would have got me most of the way there too, except it somehow got me a
'stringi' binary without the corresponding libicu*.so*, which stopped
the installation process. Again, nothing that a bit of manual work
wouldn't fix, but I don't feel comfortable setting this up on a CI
system. (Not on every commit, of course - that would be extremely
wasteful - but it would be nice if it was possible to run these checks
before release on a different computer and spot more problems this way.)

I can't help but notice that neither install.packages() nor pak() is
the recommended way to install Bioconductor packages. Could that
introduce additional problems with checking the reverse dependencies?

Then there's the check_packages_in_dir() function itself. Its handling
of the reverse dependencies is not very helpful: they are removed
altogether or at least moved away. Something may be wrong with my CRAN
mirror, because some of the downloaded reverse dependencies come out
with a size of zero and subsequently fail the check very quickly.

I am thinking of keeping a separate persistent library with all the
1316 dependencies required to check the reverse dependencies and a
persistent directory with the reverse dependencies themselves. Instead
of using the reverse=... argument, I'm thinking of using the following
scheme:

1. Use package_dependencies() to determine the list of packages to test.
2. Use download.packages() to download the latest version of everything
if it doesn't already exist. Retry if the download produced zero-sized
or otherwise damaged tarballs. Remove old versions of packages if a
newer version exists.
3. Run check_packages_in_dir() on the whole directory with the
downloaded reverse dependencies.
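
A minimal sketch of steps (1) and (2), without the retry logic and with
'XML' standing for the package being checked:

library(tools)
db <- utils::available.packages()
revdep <- package_dependencies('XML', db = db, reverse = TRUE)[['XML']]
utils::download.packages(revdep, destdir = 'revdeps', type = 'source')
# step (3) would then be check_packages_in_dir('revdeps')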

For this to work, I need a way to run step (3) twice, ensuring that one
of the runs is performed with the CRAN version of the package in the
library and the other one is performed with the to-be-released version
of the package in the library. Has anyone already come up with an
automated way to do that?

No wonder nobody wants to maintain the XML package.

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] Possible malware(?) in a vignette

2024-01-28 Thread Ivan Krylov via R-package-devel
There used to be a long analysis in the draft of this e-mail [*], but
let me cut to the chase.

Even something as simple as replacing the four-byte comment [**] at the
beginning of the file ("%\xd0\xd4\xc5\xd8" -> "%") that keeps the
file fully readable (!) results in the same behaviour but zero
detections:

$ sha256sum d_jss_paper*.pdf
0ae3b229fdd763a0571463dc98e02010752bb0213a672db6826afcd72ccaf291  
d_jss_paper1.pdf
9486d99c1c1f2d1b06f0b6c5d27c54d4f6e39d69a91d7fad845f323b0ab88de9  
d_jss_paper.pdf
$ diff -u <(hd d_jss_paper.pdf) <(hd d_jss_paper1.pdf)
--- /dev/fd/63  2024-01-28 13:00:43.454419322 +0300
+++ /dev/fd/62  2024-01-28 13:00:43.454419322 +0300
@@ -1,4 +1,4 @@
-00000000  25 50 44 46 2d 31 2e 35  0a 25 d0 d4 c5 d8 0a 37  |%PDF-1.5.%.....7|
+00000000  25 50 44 46 2d 31 2e 35  0a 25 20 20 20 20 0a 37  |%PDF-1.5.%    .7|
 00000010  37 20 30 20 6f 62 6a 0a  3c 3c 0a 2f 4c 65 6e 67  |7 0 obj.<<./Leng|
 00000020  74 68 20 32 36 32 38 20  20 20 20 20 20 0a 2f 46  |th 2628      ./F|
 00000030  69 6c 74 65 72 20 2f 46  6c 61 74 65 44 65 63 6f  |ilter /FlateDeco|

https://www.virustotal.com/gui/file/0ae3b229fdd763a0571463dc98e02010752bb0213a672db6826afcd72ccaf291

The scary-looking files and hosts being accessed are just Adobe Reader
and Chrome behaving in a manner indistinguishable from spyware. Upload
any PDF file with links in it and you'll see the same picture. Even the
original report for d_jss_paper.pdf from poweRlaw_0.70.6 says "no
sandboxes flagged this file as malicious".

I think that the few non-major antivirus products that "detected" the
original file remembered a low-quality checksum of a different file,
and this whole thread resulted from a checksum collision. 0x043BC33F
(71025471) is what, four bytes? Doesn't seem to be a standard CRC-32 or
the sum of all bytes modulo 2^32, though.

I cannot prove a negative, but I invite infosec people with more PDF
experience to comment further on the issue.

-- 
Best regards,
Ivan

[*] Colin seems to have used the Debian build of TeX Live 2017 to
generate it, which is non-trivial but possible to reproduce by
installing it from Debian Snapshots on top of Stretch. The resulting
file has a different hash (for valid reasons), the same behaviour, but
zero detections:
https://www.virustotal.com/gui/file/f7b0e0400167e06970ac61fcadfda29daec1c2ee685d4c9ff805e375bcffc985/behavior

Trying a "binary search" by removing PDF objects or replacing byte
ranges with ASCII spaces was also a dead end: any change results in no
detections.

[**] PDF 1.5 specification, section 3.1.2:

>> Comments (other than the %PDF−1.4 and %%EOF comments described in
>> Section 3.4, “File Structure”) have no semantics.

https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/pdfreference1.5_v6.pdf#G8.1860480

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] Possible malware(?) in a vignette

2024-01-27 Thread Ivan Krylov via R-package-devel
Apologies for being insufficiently clear. By "a file straight from NOAA" I 
meant a completely different PDF, 
, 
that gives the same SHA-256 hash whether downloaded by VirusTotal 

 or me, comes from a supposedly trusted source, and still makes Acrobat Reader 
behave like it's infected, show a crashed Firefox on the screenshot and drop a 
number of scary-looking files. Surely there will be a difference between 
reading an infected file and a non-infected file?

27 января 2024 г. 15:10:53 GMT+03:00, Bob Rudis  пишет:
>Ivan: do you know what mirror NOAA used at that time to get that version of
>the package? Or, did they pull it "directly" from cran.r-project.org
>(scare-quotes only b/c DNS spoofing is and has been a pretty solid attack
>vector)?

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] Possible malware(?) in a vignette

2024-01-27 Thread Ivan Krylov via R-package-devel
В Sat, 27 Jan 2024 03:52:01 -0500
Bob Rudis  пишет:

> Two VT sandboxes used Adobe Acrobat Reader to open the PDF and the PDF
> seems to either had malicious JavaScript or had been crafted
> sufficiently to caused a buffer overflow in Reader that then let it
> perform other functions on those sandboxes.

Let's talk package versions and SHA256 hashes of
poweRlaw/inst/doc/d_jss_paper.pdf.

poweRlaw version 0.70.4:
Packaged: 2020-04-07 14:55:32 UTC
Date/Publication: 2020-04-07 16:10:02 UTC
SHA-256(poweRlaw/inst/doc/d_jss_paper.pdf):
96535de112f471c66e29b74c77444b34a29b82d6525c04d477ed2d987ea6ccae

Not previously uploaded to VirusTotal, currently checks out clean:
https://www.virustotal.com/gui/file/96535de112f471c66e29b74c77444b34a29b82d6525c04d477ed2d987ea6ccae

poweRlaw version 0.70.5:
Packaged: 2020-04-23 15:36:49 UTC
Date/Publication: 2020-04-23 16:40:06 UTC
SHA-256(poweRlaw/inst/doc/d_jss_paper.pdf):
5f827302ede74e1345fba5ba52c279129823da3c104baa821d654ebb8d7a67fb

Not previously uploaded to VirusTotal, also checks out clean:
https://www.virustotal.com/gui/file/5f827302ede74e1345fba5ba52c279129823da3c104baa821d654ebb8d7a67fb/behavior

For some reason, the Zenbox report shows a browser starting up and
someone (something?) moving the mouse:
https://vtbehaviour.commondatastorage.googleapis.com/5f827302ede74e1345fba5ba52c279129823da3c104baa821d654ebb8d7a67fb_Zenbox.html?GoogleAccessId=758681729565-rc7fgq07icj8c9dm2gi34a4cckv23...@developer.gserviceaccount.com=1706348766=KSTxSZJJUUv0FOA51Kwuot89ep4PKUDTY6tHL7kTyG7VwaMlF8VjmU90loeF4ytLBxKjkEtAk%2Ffr39xFrTTyOym3mehtc3HLyT9DS3C5qGa9OPVcu%2BfQfd8qr%2BRubBWb3SKNnhGpi%2Bn%2BTDhaiRx3PilEz%2BwVGiukfNUzWGBlGweG%2BmR1Y%2F0fIgDxJ3eyZ8KwTaocbywMoOLJeC1GSmoW8VYUAnFS2bb8P9Jt%2Bs%2F0axvAkc0M2pmSN3s2lpMq8u5P%2FZZ8yRIMdmv%2B1kUR5ajBdIa%2FHV8Vw8xAdNjZID6ozwAsmBOOizJmHgzr4zh1tX4V65qmcz8D3jctvDRKsuEqXA%3D%3D=text%2Fhtml;#overview

Lots of file activity. I think that all of it can be attributed to
either normal Acrobat Reader activity or normal Chrome activity.

Then we come to poweRlaw version 0.70.6:
Packaged: 2020-04-24 10:44:31 UTC
Date/Publication: 2020-04-25 07:30:12 UTC
SHA-256(inst/doc/d_jss_paper.pdf):
9486d99c1c1f2d1b06f0b6c5d27c54d4f6e39d69a91d7fad845f323b0ab88de9

The Web Archive capture version 20201205222617 for the address
https://cran.r-project.org/web/packages/poweRlaw/vignettes/d_jss_paper.pdf
has the same SHA-256 hash.

This file is being disputed because some antivirus applications flag it:
https://www.virustotal.com/gui/file/9486d99c1c1f2d1b06f0b6c5d27c54d4f6e39d69a91d7fad845f323b0ab88de9/behavior

The behaviour is exactly the same as the one from version 0.70.5:
browser opens with a link to a wrong DOI. Some links are followed.
https://vtbehaviour.commondatastorage.googleapis.com/9486d99c1c1f2d1b06f0b6c5d27c54d4f6e39d69a91d7fad845f323b0ab88de9_Zenbox.html?GoogleAccessId=758681729565-rc7fgq07icj8c9dm2gi34a4cckv23...@developer.gserviceaccount.com=1706347808=Kv1LXUGvDe988Br0pU1AMlttjYY1K9sDwouvZrlzAVSspkdOGS9Ow%2Bg%2F3VjnQLEshx08QqgOHZzQcghownumPDUJLBbEHbOk6KG9IZSH43rxkYhTIy%2BYT5PfNFIupevbJA5XrnJHrm1wKho2%2BDb4t8vA4cgOJJY0UahXTbIMKUeUmPCKAzx9W5kYKj55WhNDrIPrEuni9EeGWkFV45kPr%2BBwYfl2hK4%2BWv6K78CB7zJtzFltF6P3pewafn5Lg3M3AY5YcZ4TryXi01t0dq04Fha83fLRP7JUkmcfpAJauA48Ct0XN7RdCRPSogb0TAGwG%2BDstxNzLAphOEsVju9LUQ%3D%3D=text%2Fhtml;#dropped-info

I've uploaded a decompressed version (prepared using qpdf in.pdf
--stream-data=uncompress out.pdf) of the same file to VirusTotal, and
there are no detections. Zero detections, but the behaviour is the same:
some files are "dropped", but all of them relate to cache in Acrobat
Reader (which is nowadays a piece of Chrome) and Chrome itself:
https://www.virustotal.com/gui/file/5acbc41f103c88a801db36fa72f01d4fa81b9afa1879c36235b1f5373d46ee1a/behavior

Finally, there's poweRlaw version 0.80.0:
Packaged: 2024-01-25 10:39:42 UTC
Date/Publication: 2024-01-25 18:00:02 UTC
SHA-256(inst/doc/d_jss_paper.pdf):
17c252a38e6c9bcfab90a69070b17c5e9d8a1713b7bb376badaeba28b3a38739
Same zero flags, same behaviour of starting the browser, same "dropped"
files in the cache:
https://www.virustotal.com/gui/file/17c252a38e6c9bcfab90a69070b17c5e9d8a1713b7bb376badaeba28b3a38739/behavior
https://vtbehaviour.commondatastorage.googleapis.com/17c252a38e6c9bcfab90a69070b17c5e9d8a1713b7bb376badaeba28b3a38739_Zenbox.html?GoogleAccessId=758681729565-rc7fgq07icj8c9dm2gi34a4cckv23...@developer.gserviceaccount.com=1706348864=UjXMjCvz0uTjS1sqyr5y%2FOwluE%2BskW9F2XupXuOs5JgODlsL1BuwJcWJ56xddQNEtKDHDOaXoRfNxynsffmSaza4yJD9hvPJ6%2BrNMibbB8hojY53g07WKnCd3wdaOmOHEqIP7Md06QWD4CnLEN0KlRvWdsUUA%2F9YTB1bAVqkIR%2FtiaJcRrOTAmdG%2F9Hwrq4xpiEBaFZzO%2FsQPVj3dzNS1LQEXOHFAfnOTaC1LlbBfn9QQWCPib%2FpCOL7huVYqIFSm%2FO8VHWv67JD1qwcTOY7JSl8XPw1ueyumRpF5xF1rpWYCPjC1awU8tho25A2COA7f7LSkku0BRqkuHYW3kuZaw%3D%3D=text%2Fhtml;#dropped-info

I've also uploaded a PDF that came directly from a US agency (NOAA) and
got a similar report:

Re: [R-pkg-devel] How to deal with issues when using devtools::check_rhub(), rhub::check(), and web form

2024-01-24 Thread Ivan Krylov via R-package-devel
В Wed, 24 Jan 2024 16:14:05 -0800
Carl Schwarz  пишет:

> I tried using the web interface at https://builder.r-hub.io/ to
> select the denebian machines, and it returns a message saying
> 
> We're sorry, but something went wrong.
> If you are the application owner check the logs for more information.

> So how do I tell if this a "Rhub issue" or an issue with my package?

A problem with your package would look more like the check at least
starting and then producing errors. Here, it doesn't look like the
check is even starting.

> Or do I just give up on using Rhub to check the denebian machines?

For a while, Rhub used to offer the only on-demand checking service
specifically on Linux machines (there was Win-builder by Uwe Ligges and
macOS builder by Simon Urbanek, but no "Linux builder"), including
Debian [*]. Now that the funding ran out [**], you can try using
various continuous integration services to run your checks in a Linux
virtual machine. Many of them offer free compute minutes.

I think that you've already fulfilled the requirements of the CRAN
policy by fixing all known problems and having R CMD check --as-cran on
R-devel run for you by Win-Builder (which is what
devtools::check_win_devel() does).

-- 
Best regards,
Ivan

[*]
Named after Debra Lynn and Ian Murdock

[**]
https://github.com/RConsortium/r-repositories-wg/blob/main/minutes/2023-09-07_Minutes.md

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] New Package Removal because Shared Library Too Large from Debugging Symbols

2024-01-24 Thread Ivan Krylov via R-package-devel
On Mon, 22 Jan 2024 17:14:04 +0100
Tomas Kalibera  wrote:

> Yes, inside a bigger email, reports can get overlooked, particularly 
> when in a thread with a rather different subject. It wasn't
> overlooked this time thanks to Martin.

Then additional thanks goes to Martin, and I'll make sure to report in
the right place if a similar situation happens again.

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] lost braces note on CRAN pretest related to \itemize

2024-01-23 Thread Ivan Krylov via R-package-devel
On Tue, 23 Jan 2024 19:39:54 +0100
Patrick Giraudoux  wrote:

>    \itemize{
>    \item{.}{lm and glm objects can be passed directly as the upper
> scope of term addition (all terms added).

Inside the \itemize and \enumerate commands, the \item command doesn't
take any arguments:
https://cran.r-project.org/doc/manuals/R-exts.html#Lists-and-tables

Instead, it starts a new paragraph with a number (\enumerate) or a
bullet point (\itemize). R CMD check is reminding you that \itemize{
\item{foo}{bar} } is equivalent to \itemize{ \item foo bar } without
any braces.

If you meant to highlight a word by making it an argument of the \item
command, use the \describe command. Here, you're highlighting a dot,
which would be rendered with a bullet point before it, so it's probably
neither semantically nor visually appropriate.

> \value{
>    A \code{\link[sf]{sfc}} object, of POINT geometry, with the
> following columns:
>    \itemize{
>    \item{ID}{ ID number}

The same problem applies here.

Additionally, R CMD check is reminding you that \value{} is implicitly
a special case of a \describe{} environment:
https://cran.r-project.org/doc/manuals/R-exts.html#index-_005cvalue

Since you're already using \item{}{} labels to name the components of
the value, just drop the \itemize{} (but keep its contents). \value{} is
enough.
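
For instance, a minimal sketch of the corrected section, reusing the
fields quoted above:

\value{
  A \code{\link[sf]{sfc}} object, of POINT geometry, with the
  following columns:
  \item{ID}{ID number}
  % further \item{name}{description} entries, with no \itemize{}
}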

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] Cannot see the failure output on Fedora clang/gcc falvor (page not found)

2024-01-22 Thread Ivan Krylov via R-package-devel
On Sun, 21 Jan 2024 16:51:39 +0000
Sameh Abdulah  wrote:

> However, we cannot access the webpage (page not found) to identify
> and address the failures on Fedora systems.
> 
> https://cran-archive.r-project.org/web/checks/2024/2024-01-12_check_results_MPCR.html
> 
> How can we see the failures on these systems?

I cannot help you with the exact output from the Fedora system (I think
it's lost), but here's how the package fails on mine:

* installing *source* package 'MPCR' ...
** using staged installation
Linux

/tmp/RtmpCSPOGc/Rbuild6043fb1a651/MPCR
/usr/bin/cmake
CMake is installed in: /usr/bin
-- The C compiler identification is GNU 12.2.0
-- The CXX compiler identification is GNU 12.2.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/gcc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/g++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- WORKING ON RELEASE MODE
MPCR Install Result : FALSE
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
OpenMp Found
R Include Path :  /home/ivan/R-build/include
Rcpp Lib Path :  /home/ivan/R-build/library/Rcpp
R Home Path :  /home/ivan/R-build
CMake Error at cmake/FindR.cmake:63 (find_library):
  Could not find R_LIB using the following names: libR.so
Call Stack (most recent call first):
  CMakeLists.txt:70 (FIND_PACKAGE)


-- Configuring incomplete, errors occurred!
See also 
"/tmp/RtmpCSPOGc/Rbuild6043fb1a651/MPCR/bin/CMakeFiles/CMakeOutput.log".
make: *** No rule to make target 'clean'.  Stop.
make: *** No rule to make target 'all'.  Stop.
cp: cannot stat '/tmp/RtmpCSPOGc/Rbuild6043fb1a651/MPCR/bin/src/libmpcr.so': No 
such file or directory
Failed: libmpcr.so -> src
** libs
make: Nothing to be done for 'all'.
** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded from temporary location
Error: package or namespace load failed for 'MPCR' in library.dynam(lib, 
package, package.lib):
 shared object 'MPCR.so' not found

It is not the default to build R as a shared library, and this
installation of R has been built without --enable-R-shlib. I'm sure
that with enough effort it's possible to propagate the information from
R to CMake so that it would make you a shared library in the correct
manner, but I think it's easier to separate your code into two parts:

 1. One part should contain most of your code, without the dependencies
on R. It can be built using CMake if that's what you prefer. It
will probably be more convenient to build it as a static library.

 2. The other part will be the R interface. Let the R build system
(described in WRE 1.2 [*] and below, especially 1.2.6) link the
final shared library from the small remaining part of the source
files (those that include R-related headers) and the static library
from the previous step. If you play your cards right, it will also
work on Windows without significant additional effort.
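
A minimal Makevars sketch of that layout (the directory and library
names are hypothetical; the $(SHLIB)-depends-on-mylibs pattern is the
one documented in WRE):

PKG_LIBS = core/libmpcr_core.a

.PHONY: all mylibs
all: $(SHLIB)
$(SHLIB): mylibs
mylibs:
	(cd core && cmake . && $(MAKE))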

Have you considered linking your R package against the BLAS and LAPACK
that already come with R? This may not give the user the best possible
performance ever, but those who do care about performance have probably
installed a copy of BLAS of their own choice and may not prefer an
extra copy of OpenBLAS that may or may not match the optimal parameters
for their hardware. Same goes for libgfortran (that may be required
depending on what you're linking) [**].

This would also make it easier to comply with CRAN policy on external
libraries [***]: if you want to download software during package
installation, you may be required to host a fixed version of the
package on something extra reliable (like Zenodo) and verify a
cryptographic hash of the file you download before using it.

-- 
Best regards,
Ivan

[*]
https://cran.r-project.org/doc/manuals/R-exts.html#Configure-and-cleanup

[**]
https://cran.r-project.org/doc/manuals/R-exts.html#index-FLIBS

[***]
https://cran.r-project.org/web/packages/using_rust.html
https://cran.r-project.org/web/packages/external_libs.html

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] New Package Removal because Shared Library Too Large from Debugging Symbols

2024-01-22 Thread Ivan Krylov via R-package-devel
On Mon, 22 Jan 2024 12:30:46 +0100
Tomas Kalibera  wrote:

> Thanks, ported now to R-patched.

Thank you!

Is it fine to mention problems like this one in the middle of an
e-mail, or should I have left a note in the Bugzilla instead?

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] Assistance Needed for Resolving Submission Issues with openaistream Package

2024-01-22 Thread Ivan Krylov via R-package-devel
Hello Li Gen and welcome to R-package-devel!

On Mon, 22 Jan 2024 17:50:33 +0800
 wrote:

> The specific areas of concern are:License Information: There's a note
> indicating that the license stub is an "invalid DCF". I've used 'MIT
> + file LICENSE' as the licensing terms. I would appreciate guidance
> on how to correctly format this section to meet the DCF standards.

Leave just the following lines in the LICENSE file, as it currently is
on CRAN [*]:

YEAR: 2023
COPYRIGHT HOLDER: openaistream authors

Why would you like to change it? CRAN doesn't want packages to provide
yet another copy of the MIT license inside the tarball. The text of the
MIT license is always available in an R install at
file.path(R.home('share'), 'licenses', 'MIT').

If you need a copy of the MIT license inside your GitHub repository,
store it elsewhere (e.g. LICENSE.md) and list it in .Rbuildignore [**].

Since you composed your e-mail in HTML and left your mailer to generate
a plain text equivalent, we only got the latter, somewhat mangled:
https://stat.ethz.ch/pipermail/r-package-devel/2024q1/010356.html

Please compose your messages to R mailing lists in plain text.

-- 
Best regards,
Ivan

[*]
https://cran.r-project.org/web/packages/openaistream/LICENSE

[**]
https://cran.r-project.org/doc/manuals/R-exts.html#Building-package-tarballs

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] New Package Removal because Shared Library Too Large from Debugging Symbols

2024-01-21 Thread Ivan Krylov via R-package-devel
On Sat, 20 Jan 2024 20:28:00 -0500
Johann Gaebler  wrote:

> most likely there’s some error on my part in how I’ve set up cpp11,
> but it also seems possible that cpp11 should have detected that that
> header needs to be included and added it automatically

Upon further investigation, it's more complicated than a missing
#include.

cpp11::cpp_register() uses
tools::package_native_routine_registration_skeleton() to generate these
declarations. This function works by scanning the R code for calls to
.Call(), .C(), .Fortran(), and others and then trying to come up with
appropriate prototypes for the native functions being called. For
.Call()s, the function must output the correct type of SEXP for every
argument in the generated declaration.

This works the right way, for example, in R-4.2.2 (2022-11-10) and
today's R-devel, but was broken for a while (e.g. in R-4.3.1 and
R-4.3.2), and the fix, unfortunately, hasn't been backported (not to
R-patched either): https://bugs.r-project.org/show_bug.cgi?id=18585

I can suggest three workarounds.

1. Edit src/cpp11.cpp on a separate "for-CRAN" branch and rebase it on
   top of the main branch every time you update the package.

2. Install R-devel and use it to generate the source package. Strictly
   speaking, this would go against the letter of the CRAN policy
   (builds "should be done with current R-patched or the current
   release of R"), but would at least follow its spirit (use the
   version of R where the known package-building-related bug was fixed).

3. Add a configure script that would modify src/cpp11.cpp while the
   package is being installed. This way, the only thing modifying
   generated code would be more code, which is considered
   architecturally pure by some developers.

   Lots of ways to implement it, too: you can do it in a single shell
   script (using sed or patch -- are these tools guaranteed to be
   available?), delegate to tools/configure.R (that you would also
   write yourself), or go full GNU Autoconf and generate a
   megabyte-sized ./configure from some m4 macros just to replace one
   line.

   There is definitely a lot of performance art value if you go this
   way, but extra code means extra ways for it to go wrong. For more
   style points, make it a Makevars target instead of a configure
   script.
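
For workaround 3, the whole configure script could be as small as the
following sketch (the two prototypes are taken from the ODR warning in
the check log; do test it before trusting it):

#!/bin/sh
sed 's/extern SEXP run_testthat_tests(void \*);/extern "C" SEXP run_testthat_tests(SEXP);/' \
  src/cpp11.cpp > src/cpp11.cpp.new && mv src/cpp11.cpp.new src/cpp11.cpp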

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] New Package Removal because Shared Library Too Large from Debugging Symbols

2024-01-20 Thread Ivan Krylov via R-package-devel
On Sat, 20 Jan 2024 14:38:55 -0500
Johann Gaebler  wrote:

> The issue is that the compiled libraries are too large.

Was it in the e-mail? As you quite correctly observed, many other
packages get the NOTE about shared library size.

It may be not exactly obvious, but the red link saying "LTO" on the
check page that points to
 is hiding a more
serious issue:

> cpp11.cpp:18:13: warning: 'run_testthat_tests' violates the C++ One 
> Definition Rule [-Wodr]
>18 | extern SEXP run_testthat_tests(void *);
>   | ^
> /data/gannet/ripley/R/test-dev/testthat/include/testthat/testthat.h:172:17: 
> note: 'run_testthat_tests' was previously declared here
>   172 | extern "C" SEXP run_testthat_tests(SEXP use_xml_sxp) {
>   | ^

Modern C++ compilers are painfully pedantic about undefined behaviour
and can optimise away large sections of code if they think they have a
proof that your code causes it [*]. If you edit cpp11.cpp to provide the
correct declaration (#include the testthat header if possible), the
error should go away.
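
In other words, the generated line should match the prototype from
testthat.h quoted above:

/* in src/cpp11.cpp */
extern "C" SEXP run_testthat_tests(SEXP use_xml_sxp);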

-- 
Best regards,
Ivan

[*] For example, see this issue in R: 
https://bugs.r-project.org/show_bug.cgi?id=18430

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] Inquiry Regarding Package Organization in CRAN

2024-01-19 Thread Ivan Krylov via R-package-devel
Hello Andriy and welcome to R-package-devel!

On Fri, 19 Jan 2024 14:34:25 +0000
Protsak Andriy via R-package-devel 
wrote:

> to achieve this the initial focus is on exploring the possibility of
> renaming the packages so that they share a common prefix, making it
> easier for users to locate them in the package list.

CRAN package names are long-term identifiers. Assume that there are
many users happy with the packages as they are. If you rename a
package, they will have to patch their scripts and their own packages
just to keep them working as before. Red Queen's race is not something
people like to participate in.

It is certainly not impossible to rename a package, but there has to be
a very good reason to break backwards compatibility and assume a new
name, while the old name stays in the archive, unavailable for new
packages.

Here are some past responses to similar questions:

https://stat.ethz.ch/pipermail/r-package-devel/2022q2/008140.html
https://stat.ethz.ch/pipermail/r-package-devel/2017q2/001678.html
https://stat.ethz.ch/pipermail/r-package-devel/2015q3/000271.html

> If you believe there are alternative strategies to achieve a similar
> result, please feel free to share your perspective.

There are approximately 20,000 active packages on CRAN. Looking for
useful packages by scanning a list of names will not be very effective.
Better results can be achieved using tools like RSiteSearch
. If you want a package to be more
visible, request its addition to a Task View
. If some packages are related,
make them link to each other in their documentation. David's options
are all very good.

> Additionally, I'm looking into the prospect of merging two packages
> that contain similar functionalities. The aim is to create a more
> comprehensive package by incorporating additional features and
> ensuring seamless compatibility.

The previous point about keeping backwards compatibility still stands.
It should be possible to move all the functions to one package and then
import() it from the other package. Both packages can then export() all
functions, making them available to the dependencies of either package.
Eventually, the skeleton package may grow packageStartupMessage()s
letting the users know that it is deprecated and could they please use
the other package instead. After a while, it should be possible to
archive the skeleton package. But deprecation cycles should be long:
for example, rgeos and rgdal took more than a year to retire
.
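
A sketch of what the skeleton package could contain once everything has
moved ('bigpkg', 'foo' and 'bar' are hypothetical names):

# NAMESPACE: import the merged package, re-export the public functions
import(bigpkg)
export(foo, bar)

# R/zzz.R: announce the deprecation on attach
.onAttach <- function(libname, pkgname) {
  packageStartupMessage(
    "This package is deprecated; please use 'bigpkg' instead."
  )
}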

Or do you intend to come up with a completely new API? Beware of the
second system effect (although it's certainly not unheard of for second
system projects to succeed).

The spatstat package went through the opposite process a few years ago:
it grew too big and had to be split into multiple packages. Here's one
of its maintainers sharing the experience:
https://stat.ethz.ch/pipermail/r-package-devel/2022q4/008557.html

What is the nature of your final year project? If it can include
technical writing, you could add well-written vignettes to the packages
(only one of the CRAN packages maintained by people @uah.es has a
vignette, and it's very terse). If it has to be mostly programming or
maintenance of R packages, I'm out of ideas.

Either way, good luck, and I hope your project succeeds!

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [Rd] [External] Re: Choices to remove `srcref` (and its buddies) when serializing objects

2024-01-18 Thread Ivan Krylov via R-devel
On Thu, 18 Jan 2024 09:59:31 -0600 (CST)
luke-tier...@uiowa.edu wrote:

> What does 'blow up' mean? If it is anything other than signal a "bad
> binding access" error then it would be good to have more details.

My apologies for not being precise enough. I meant the "bad binding
access" error in all such cases.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Choices to remove `srcref` (and its buddies) when serializing objects

2024-01-18 Thread Ivan Krylov via R-devel
On Tue, 16 Jan 2024 14:16:19 -0500
Dipterix Wang  wrote:

> Could you recommend any packages/functions that compute hash such
> that the source references and sexpinfo_struct are ignored? Basically
> a version of `serialize` that convert R objects to raw without
> storing the ancillary source reference and sexpinfo.

I can show how this can be done, but it's not currently on CRAN or even
a well-defined package API. I have adapted a copy of R's serialize()
[*] with the following changes:

 * Function bytecode and flags are ignored:

f <- function() invisible()
depcache:::hash(f, 2) # This is plain FNV1a-64 of serialize() output
# [1] "9b7a1af5468deba4"
.Call(depcache:::C_hash2, f) # This is the new hash
[1] 91 5f b8 a1 b0 6b cb 40
f() # called once: function gets the MAYBEJIT_MASK flag
depcache:::hash(f, 2)
# [1] "7d30e05546e7a230"
.Call(depcache:::C_hash2, f)
# [1] 91 5f b8 a1 b0 6b cb 40
f() # called twice: function now has bytecode
depcache:::hash(f, 2)
# [1] "2a2cba4150e722b8"
.Call(depcache:::C_hash2, f)
# [1] 91 5f b8 a1 b0 6b cb 40 # new hash stays the same

 * Source references are ignored:

.Call(depcache:::C_hash2, \( ) invisible( ))
# [1] 91 5f b8 a1 b0 6b cb 40 # compare vs. above

# For quoted function definitions, source references have to be handled
# differently 
.Call(depcache:::C_hash2, quote(function(){}))
[1] 58 0d 44 8e d4 fd 37 6f
.Call(depcache:::C_hash2, quote(\( ){  }))
[1] 58 0d 44 8e d4 fd 37 6f

 * ALTREP is ignored:

identical(1:10, 1:10+0L)
# [1] TRUE
identical(serialize(1:10, NULL), serialize(1:10+0L, NULL))
# [1] FALSE
identical(
 .Call(depcache:::C_hash2, 1:10),
 .Call(depcache:::C_hash2, 1:10+0L)
)
# [1] TRUE

 * Strings not marked as bytes are encoded into UTF-8:

identical('\uff', iconv('\uff', 'UTF-8', 'latin1'))
# [1] TRUE
identical(
 serialize('\uff', NULL),
 serialize(iconv('\uff', 'UTF-8', 'latin1'), NULL)
)
# [1] FALSE
identical(
 .Call(depcache:::C_hash2, '\uff'),
 .Call(depcache:::C_hash2, iconv('\uff', 'UTF-8', 'latin1'))
)
# [1] TRUE

 * NaNs with different payloads (except NA_numeric_) are replaced by
   R_NaN.

One of the many downsides to the current approach is that we rely on
the non-API entry point getPRIMNAME() in order to hash builtins.
Looking at the source code for identical() is no help here, because it
uses the private PRIMOFFSET macro.

The bitstream being hashed is also, unfortunately, not exactly
compatible with R serialization format version 2: I had to ignore the
LEVELS of the language objects being hashed both because identical()
seems to ignore those and because I was missing multiple private
definitions (e.g. the MAYBEJIT flag) to handle them properly.

Then there's also the problem of immediate bindings [**]: I've seen bits
of vctrs, rstudio, rlang blow up when calling CAR() on SEXP objects that
are not safe to handle this way, but R_expand_binding_value() (used by
serialize()) is again a private function that is not accessible from
packages. identical() won't help here, because it compares reference
objects (which may or may not contain such immediate bindings) by their
pointer values instead of digging down into them.

Dropping the (already violated) requirement to be compatible with R
serialization bitstream will make it possible to simplify the code
further.

Finally:

a <- new.env()
b <- new.env()
a$x <- b$x <- 42
identical(a, b)
# [1] FALSE
.Call(depcache:::C_hash2, a)
# [1] 44 21 f1 36 5d 92 03 1b
.Call(depcache:::C_hash2, b)
# [1] 44 21 f1 36 5d 92 03 1b

...but that's unavoidable when looking at frozen object contents
instead of their live memory layout.

If you're interested, here's the development version of the package:
install.packages('depcache',contriburl='https://aitap.github.io/Rpackages')

-- 
Best regards,
Ivan

[*]
https://github.com/aitap/depcache/blob/serialize_canonical/src/serialize.c

[**]
https://svn.r-project.org/R/trunk/doc/notes/immbnd.md

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [R-pkg-devel] Additional Issues: Intel

2024-01-17 Thread Ivan Krylov via R-package-devel
On Wed, 17 Jan 2024 10:30:36 +1100
Hugh Parsonage  wrote:

> I am unable to immediately see where in the test suite this error has
> occurred.

Without testthat, you would have gotten a line by line printout of the code, 
letting you pinpoint the (top-level) place of the crash. With
testthat, you will need a more verbose reporter that would print tests
as they are executed to find out which test causes the crash.

> The only hunch I have is that the package uses C code and includes
> structs with arrays on the stack, which perhaps are excessive for the
> Intel check machine, but am far from confident that's the issue.

According to GNU cflow, your only recursive C functions are
getListElement (from getListElement.c) and nthOffset (from Offset.c),
but the recursion seems bounded in both cases.

I've tried looking for variable-length arrays in your code using a
Coccinelle patch, but found none. If you had variable-bounded recursion
or variable-length stack arrays (VLA or alloca()), it would be prudent
to use R_CheckStack() or R_CheckStack2(size_of_VLA), but your C code
contains neither, so there's no obvious culprit. If you know about
R-level recursion happening in your code and have a way to reduce it,
that might help too.

Otherwise, it's time to install Intel Everything and reproduce and
debug the problem the hard way.

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


[R-pkg-devel] CMake on CRAN Systems

2024-01-16 Thread Ivan Krylov via R-package-devel
Dear Sameh,

Regarding your question about the MPCR package and the use of CMake
:
on a Mac, you have to look for the cmake executable in more than one
place because it is not guaranteed to be on the $PATH. As described in
Writing R Extensions
, the
following is one way to work around the problem:

if test -z "$CMAKE"; then CMAKE="`which cmake`"; fi
if test -z "$CMAKE"; then
 CMAKE=/Applications/CMake.app/Contents/bin/cmake;
fi
if test ! -f "$CMAKE"; then echo "no ‘cmake’ command found"; exit 1; fi

Please don't reply to existing threads when starting a new topic on
mailing lists. Your message had a mangled link that went to
urldefense.com instead of cran-archive.r-project.org, letting Amazon
(who host the website) know about every visit to the link:
https://stat.ethz.ch/pipermail/r-package-devel/2024q1/010328.html

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] checking CRAN incoming feasibility

2024-01-16 Thread Ivan Krylov via R-package-devel
On Tue, 16 Jan 2024 08:47:07 +0000
David Hugh-Jones  wrote:

> If I understand correctly, the current procedure is that the client
> downloads every package name from CRAN, and then checks its name is
> unique.

This is not the only check that relies on utils::available.packages().

In particular, strong dependencies are ensured to be present in
mainstream repositories, and the whole strong dependency tree is checked
for packages with FOSS licenses to ensure that their dependencies do not
restrict use.

Additional checks require even more files:

 - src/contrib/PACKAGES.in is checked for CRAN notes on packages
 - src/contrib/Meta/archive.rds is also checked for potential name
   collisions, case-insensitively.
 - src/contrib/Meta/current.rds is checked together with archive.rds
   for update frequency
 - web/packages/packages.rds is checked for maintainer changes

> Wouldn’t it be faster (for both parties) to check name uniqueness
> directly on the server?

The current scheme, if somewhat wasteful, makes it possible to run R
CMD check with any CRAN mirror without making it run any code server
side. (With the small exception of .htaccess to rewrite some paths, but
that should be translatable for other servers like nginx too.)

It's probably not impossible to transmit only data related to the
current package while keeping this property, but recursive dependency
checks in particular will not be easy. I think it's not worth the
effort.

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [Rd] as.matrix.dist patch (performance)

2024-01-16 Thread Ivan Krylov
Dear Tim,

On Thu, 10 Aug 2023 22:38:44 +0100
Tim Taylor  wrote:

> Submitting here in the first instance but happy to move to Bugzilla
> if more appropriate.

It's a fine patch. The 1.7 times speed up from not transposing the
return value shouldn't be sneezed at. I think it's time to move it to
Bugzilla so that it won't be completely forgotten.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [R-pkg-devel] checking CRAN incoming feasibility

2024-01-15 Thread Ivan Krylov via R-package-devel
On Tue, 16 Jan 2024 05:49:01 +0000
Rolf Turner  wrote:

> The problem is persistent/repeatable.  I don't believe that there is
> any faulty connection.

One of the things done by R CMD check --as-cran at this point is
sending a HEAD request to every Web link mentioned in the package
documentation and DESCRIPTION. One of the hosts may be slow to respond,
either by accident or due to misguided anti-robot countermeasures.
(Most website protection systems would say that R CMD check counts as a
robot because there's no human behind it to look at the ads.)

Here's what you could try. Unpack your built source package. If you
have a fresh .Rcheck directory from an R CMD check, use
YOURPACKAGE.Rcheck/00_pkg_src/YOURPACKAGE. Then profile the check
function, using the subdirectory from the source package archive as the
argument:

Rprof(); tools:::.check_package_CRAN_incoming(dir); Rprof(NULL)

Does any one function stand out in the subsequent summaryRprof()
output? For me, it's readRDS (not very helpful), but by reading
Rprof.out I can see that it's used by CRAN_package_db and
CRAN_archive_db to download web/packages/packages.rds and
src/contrib/Meta/archive.rds from the chosen CRAN mirror, which for me
takes a few seconds for both files.

Do you have a CRAN mirror set up in ~/.Rprofile? It could be having a
slow day.

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] Does dependencies up to date on the pretest CRAN infrastructure

2024-01-13 Thread Ivan Krylov via R-package-devel
On Fri, 12 Jan 2024 21:19:00 +0100
Serge  wrote:

> After somme minor midficiations, I make a try on the winbuilder site.
> I was able to build the archive with the static library
> but I get again a Bad address error. You can have a look to
> 
> https://win-builder.r-project.org/bw47qsMX3HTd/00install.out

I think that Win-Builder is running out of memory. It took some
experimenting, but I was able to reproduce something like this using
the following:

1. Set the swap file in the Windows settings to minimal recommended
size and disable its automatic growth

2. Write and run a program that does malloc(LARGE_NUMBER); getchar();
so that almost all physical memory is allocated (a minimal C sketch
follows after this list)

3. Run gcc -DFOO=`/path/to/Rscript -e 'some script'` & many times

I got a lot of interesting errors, including the "Bad address":

Warnings:
1: .getGeneric(f, , package) : internal error -4 in R_decompress1
2: package "methods" in options("defaultPackages") was not found

0 [main] bash (2892) child_copy: cygheap read copy failed,
0x0..0x800025420, done 0, windows pid 2892, Win32 error 299

0 [main] bash (3256) C:\rtools43\usr\bin\bash.exe: *** fatal error in
forked process - MEM_COMMIT failed, Win32 error 1455

-bash: fork: retry: Resource temporarily unavailable

-bash: R-devel/bin/Rscript.exe: Bad address

Your package is written in C++, but that by itself shouldn't disqualify
it. On my Linux computer, /usr/bin/time R -e
'install.packages("MixAll")' says that the installation takes slightly
less than a gigabyte of memory ("912516maxresident k"), which is par
the course for such packages. (My small Rcpp-using package takes
approximately half a gigabyte by the same metric.)

I'm still not 100% sure (if Win-Builder is running out of memory, why
are you seeing "Bad address" only and not the rest of the carnage?),
but I'm not seeing a problem with your package, either. If EFAULT is
Cygwin's way of saying "I caught a bad pointer in your system call"
(which, I must stress, is happening inside /bin/sh, not your package
or even R at all), it's not impossible that Win-Builder is having
hardware problems. Unfortunately, they take a lot of effort and
downtime to diagnose and could be hiding anywhere from RAM to the power
supply.

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] Does dependencies up to date on the pretest CRAN infrastructure

2024-01-12 Thread Ivan Krylov via R-package-devel
On Fri, 12 Jan 2024 19:09:29 +0100
Serge  wrote:

> I updated the package rtkore one month ago, fixing a compilation
> problem on windows devel platform.
> 
> MixAll has a dependency to rtkore. Thus, I suspect that the error
> reported below is due to the presence of the old version of rtkore on
> the pretest infrastructure of the CRAN.

:

/usr/bin/make -C projects/Clustering/src/
make[2]: Entering directory 
'/d/temp/RtmpYJkDTJ/R.INSTALL316dc7c0f48e6/MixAll/inst/projects/Clustering/src'
g++ -std=gnu++17  -I"D:/RCompile/recent/R/include" -DNDEBUG 
`D:/RCompile/recent/R/bin/Rscript -e "rtkore:::CppFlags()"`  
-I'D:/RCompile/CRANpkg/lib/4.4/Rcpp/include' 
-I'D:/RCompile/CRANpkg/lib/4.4/rtkore/include'   
-I"d:/rtools43/x86_64-w64-mingw32.static.posix/include"
`D:/RCompile/recent/R/bin/Rscript -e "rtkore:::CxxFlags()"` -I../inst/projects/ 
-I../inst/include/ -fopenmp   -pedantic -O2 -Wall  -mfpmath=sse -msse2 
-mstackrealign  -I../../../projects/ -I../../../include/ 
STK_CategoricalParameters.cpp -c -o ../../../bin/STK_CategoricalParameters.o
/bin/sh: line 1: /x86_64-w64-mingw32.static.posix/bin/g++: Bad address
make[2]: *** [makefile:54: ../../../bin/STK_CategoricalParameters.o] Error 126

RTools uses Cygwin features to emulate the presence of certain virtual
paths; /x86_64-w64-mingw32.static.posix/bin/g++ actually exists and is
transparently mapped to
d:/rtools43/x86_64-w64-mingw32.static.posix/bin/g++.exe:

User@WINMACHINE MSYS ~
$ /x86_64-w64-mingw32.static.posix/bin/g++ --version
g++.exe (GCC) 12.2.0

The "Bad address" here means that /bin/sh got an EFAULT while trying to
launch g++.exe:
https://stat.ethz.ch/pipermail/r-package-devel/2023q4/010223.html

Unless there is something extremely weird in the command line arguments
returned by Rscript -e "rtkore:::CxxFlags()" that causes the process to
fail to launch (in my opinion, very unlikely, but can you print them
from your compilation process just in case?), I would be looking for
problems elsewhere.

In particular, the problem cannot be in having rtkore installed that is
one version too old, because you only changed Makevars in that version,
and your package MixAll doesn't use the Makevars from a different
source package.

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [Rd] Sys.which() caching path to `which`

2024-01-12 Thread Ivan Krylov via R-devel
On Thu, 11 Jan 2024 09:30:55 +1300
Simon Urbanek  wrote:

> That said, WHICH is a mess - it may make sense to switch to the
> command -v built-in which is part of POSIX (where available - which
> is almost everywhere today) which would not require an external tool

This is a bit tricky to implement. I've prepared the patch at the end
of this e-mail, tested it on GNU/Linux and tried to test on OpenBSD [*]
(I cannot test on a Mac), but then I realised one crucial detail:
unlike `which`, `command -v` returns names of shell builtins if
something is both an executable and a builtin. So for things like `[`,
Sys.which would behave differently if changed to use command -v:

$ sh -c 'which ['
/usr/bin/[
$ sh -c 'command -v ['
[

R checks the returned string with file.exists(), so the new
Sys.which('[') returns an empty string instead of /usr/bin/[. That's
probably undesirable, isn't it?

Index: configure
===
--- configure   (revision 85802)
+++ configure   (working copy)
@@ -949,7 +949,6 @@
 PDFTEX
 TEX
 PAGER
-WHICH
 SED
 INSTALL_DATA
 INSTALL_SCRIPT
@@ -5390,66 +5389,6 @@
 done
 test -n "$SED" || SED="/bin/sed"
 
-
-## 'which' is not POSIX, and might be a shell builtin or alias
-##  (but should not be in 'sh')
-for ac_prog in which
-do
-  # Extract the first word of "$ac_prog", so it can be a program name with 
args.
-set dummy $ac_prog; ac_word=$2
-{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5
-printf %s "checking for $ac_word... " >&6; }
-if test ${ac_cv_path_WHICH+y}
-then :
-  printf %s "(cached) " >&6
-else $as_nop
-  case $WHICH in
-  [\\/]* | ?:[\\/]*)
-  ac_cv_path_WHICH="$WHICH" # Let the user override the test with a path.
-  ;;
-  *)
-  as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
-for as_dir in $PATH
-do
-  IFS=$as_save_IFS
-  case $as_dir in #(((
-'') as_dir=./ ;;
-*/) ;;
-*) as_dir=$as_dir/ ;;
-  esac
-for ac_exec_ext in '' $ac_executable_extensions; do
-  if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then
-ac_cv_path_WHICH="$as_dir$ac_word$ac_exec_ext"
-printf "%s\n" "$as_me:${as_lineno-$LINENO}: found 
$as_dir$ac_word$ac_exec_ext" >&5
-break 2
-  fi
-done
-  done
-IFS=$as_save_IFS
-
-  ;;
-esac
-fi
-WHICH=$ac_cv_path_WHICH
-if test -n "$WHICH"; then
-  { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $WHICH" >&5
-printf "%s\n" "$WHICH" >&6; }
-else
-  { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5
-printf "%s\n" "no" >&6; }
-fi
-
-
-  test -n "$WHICH" && break
-done
-test -n "$WHICH" || WHICH="which"
-
-if test "${WHICH}" = which ; then
-  ## needed to build and run R
-  ## ends up hard-coded in the utils package
-  as_fn_error $? "which is required but missing" "$LINENO" 5
-fi
-
 ## Make
 : ${MAKE=make}
 
Index: configure.ac
===
--- configure.ac(revision 85802)
+++ configure.ac(working copy)
@@ -680,15 +680,6 @@
 ## we would like a POSIX sed, and need one on Solaris
 AC_PATH_PROGS(SED, sed, /bin/sed, [/usr/xpg4/bin:$PATH])
 
-## 'which' is not POSIX, and might be a shell builtin or alias
-##  (but should not be in 'sh')
-AC_PATH_PROGS(WHICH, which, which)
-if test "${WHICH}" = which ; then
-  ## needed to build and run R
-  ## ends up hard-coded in the utils package
-  AC_MSG_ERROR([[which is required but missing]])
-fi
-
 ## Make
 : ${MAKE=make}
 AC_SUBST(MAKE)
Index: src/library/base/Makefile.in
===
--- src/library/base/Makefile.in(revision 85802)
+++ src/library/base/Makefile.in(working copy)
@@ -28,7 +28,7 @@
 all: Makefile DESCRIPTION
@$(ECHO) "building package '$(pkg)'"
@$(MKINSTALLDIRS) $(top_builddir)/library/$(pkg)
-   @WHICH="@WHICH@" $(MAKE) mkRbase mkdesc2 mkdemos2
+   @$(MAKE) mkRbase mkdesc2 mkdemos2
@$(INSTALL_DATA) $(srcdir)/inst/CITATION $(top_builddir)/library/$(pkg)
 
 include $(top_srcdir)/share/make/basepkg.mk
@@ -45,12 +45,12 @@
 mkR: mkRbase
 
 Rsimple:
-   @WHICH="@WHICH@" $(MAKE) mkRbase mkRsimple
+   @$(MAKE) mkRbase mkRsimple
 
 ## Remove files to allow this to be done repeatedly
 Rlazy:
-@rm -f  $(top_builddir)/library/$(pkg)/R/$(pkg)*
-   @WHICH="@WHICH@" $(MAKE) mkRbase
+   @$(MAKE) mkRbase
@cat $(srcdir)/makebasedb.R | \
  R_DEFAULT_PACKAGES=NULL LC_ALL=C $(R_EXE) > /dev/null
@$(INSTALL_DATA) $(srcdir)/baseloader.R \
@@ -57,4 +57,4 @@
  $(top_builddir)/library/$(pkg)/R/$(pkg)
 
 Rlazycomp:
-   @WHICH="@WHICH@" $(MAKE) mkRbase mklazycomp
+   @$(MAKE) mkRbase mklazycomp
Index: src/library/base/R/unix/system.unix.R
===
--- src/library/base/R/unix/system.unix.R   (revision 85802)
+++ src/library/base/R/unix/system.unix.R   (working copy)
@@ -114,23 +114,14 @@
 Sys.which <- function(names)
 {
 res <- 

Re: [Rd] Choices to remove `srcref` (and its buddies) when serializing objects

2024-01-12 Thread Ivan Krylov via R-devel
On Fri, 12 Jan 2024 00:11:45 -0500
Dipterix Wang  wrote:

> I wonder how hard it would be to have options to discard source when
> serializing R objects? 

> Currently my analyses heavily depend on digest function to generate
> file caches and automatically schedule pipelines (to update cache)
> when changes are detected.

Source references may be the main problem here, but not the only one.
There are also string encodings and function bytecode (which may or may
not be present and probably changes between R versions). I've been
collecting the ways that the objects that are identical() to each other
can serialize() differently in my package 'depcache'; I'm sure I missed
a few.

Admittedly, string encodings are less important nowadays (except on
older Windows and weirdly set up Unix-like systems). Thankfully, the
digest package already knows to skip the serialization header (which
contains the current version of R).

serialize() only knows about basic types [*], and source references are
implemented on top of these as objects of class 'srcref'. Sometimes
they are attached as attributes to other objects, other times (e.g. in
quote(function(){}), [**]) just sitting there as arguments to a call.

Sometimes you can hash the output of deparse(x) instead of serialize(x)
[***]. Text representations aren't without their own problems (e.g.
IEEE floating-point numbers not being representable as decimal
fractions), but at least deparsing both ignores the source references
and punts the encoding problem to the abstraction layer above it:
deparse() is the same for both '\uff' and iconv('\uff', 'UTF-8',
'latin1'): just "ÿ".

Unfortunately, this doesn't solve the environment problem. For these,
you really need a way to canonicalize the reference-semantics objects
before serializing them without changing the originals, even in cases
like a <- new.env(); b <- new.env(); a$x <- b; b$x <- a. I'm not sure
that reference hooks can help with that. In order to implement it
properly, the fixup process will have to rely on global state and keep
weak references to the environments it visits and creates shadow copies
of.

I think it's not impossible to implement
serialize_to_canonical_representation() for an R package, but it will
be a lot of work to decide which parts are canonical and which should
be discarded.

-- 
Best regards,
Ivan

[*]
https://cran.r-project.org/doc/manuals/R-ints.html#Serialization-Formats

[**]
https://bugs.r-project.org/show_bug.cgi?id=18638

[***]
https://stat.ethz.ch/pipermail/r-devel/2023-March/082505.html

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [R-pkg-devel] "Examples with CPU time > 2.5 times elapsed time" and other NOTEs on CRAN and rhub checks

2024-01-11 Thread Ivan Krylov via R-package-devel
On Thu, 11 Jan 2024 12:39:17 +0000
D Z  wrote:

> The package itself has no parallelism built-in, but Imports
> data.table. This NOTE does not surface on other platforms (eg using
> rhub or on my GitHub actions runners). My unit tests already limit
> data.table to 2 cores using setDTthreads(2), but I would like to keep
> this line out of the help files for my functions.

A breakpoint on pthread_create confirms that these are OpenMP threads
created by data.table. You can wrap setDTthreads(2) in \dontshow{} to
avoid visual pollution:
https://cran.r-project.org/doc/manuals/R-exts.html#index-_005cdontshow
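
For example, a sketch of the examples section:

\examples{
\dontshow{data.table::setDTthreads(2)}
# the visible part of the example goes here
}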

> I receive the NOTE that my libs/ sub-directory is at 7.7Mb. Can I
> ignore this or do I need to figure out how to reduce the binary size
> of the package?

I think this is typically accepted for packages using C++.

> And last but not least, on some rhub instances (Fedora and Ubuntu
> GCC) I receive a NOTE that the package runs its examples too slowly
> (eg above 5secs). I have already tweaked the example code already
> that it runs reliably <4 secs on my development laptop

Then it should be fine.

Additionally, you may need to cast some of your Rprintf arguments to
avoid format warnings on Windows:
https://win-builder.r-project.org/incoming_pretest/RITCH_0.1.23_20240110_120457/Windows/00check.log
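
For example (a hypothetical function; the point is that the cast makes
the argument match the format on every platform):

#include <Rinternals.h>
#include <R_ext/Print.h>

void report_rows(R_xlen_t n)
{
    /* R_xlen_t varies between platforms; (long long) always holds it
       and matches "%lld" */
    Rprintf("%lld rows\n", (long long) n);
}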

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [Rd] using Paraview "in-situ" with R?

2024-01-09 Thread Ivan Krylov via R-devel
On Tue, 9 Jan 2024 14:20:17 +0000
Mike Marchywka  wrote:

> it seems like an excellent tool to interface to R allowing
> visualization without a bunch of temp files or 
> 
> Is anyone aware of anyone doing this interface or reasons its  a
> boondoggle?

This sounds like it's better suited for r-package-de...@r-project.org,
not R-devel itself.

In theory, nothing should prevent you from writing C++ code interfacing
with ParaView (via its "adios" streaming library) and with R. The Rcpp
package will likely help you bring the semantics of the two languages
closer together. (Memory allocation and error handling are the two
major topics where R and C++ especially disagree.)

On the R side, make an object with reference semantics (i.e. an
external pointer) and use callbacks to update it with new information
while R code is running. On the R extension side, translate these
callbacks into necessary calls to the adios library to transfer the
data to ParaView.

For more informaion, see Writing R Extensions at
 and Rcpp
documentation at .

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [R-pkg-devel] CRAN submission struggle

2024-01-07 Thread Ivan Krylov
On Sun, 7 Jan 2024 10:52:44 +0200
Christiaan Pieterse  wrote:

> I have edited my package to have two examples. One uses a small
> self-generated dataset and another uses a big dataset. For the big
> dataset example, I put \donttest{} around it, should this be fine?

The small example is definitely fine: it exercises the code and does so
fast.

The big example wrapped in \donttest{} could be fine, I'm not sure.
I've seen CRAN packages that wrap the long parts of their examples in
\donttest{} while still making sure that R CMD check --run-donttest
exercises most of their code.

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] CRAN submission struggle

2024-01-06 Thread Ivan Krylov
On Sat, 6 Jan 2024 15:16:01 +0300
Ivan Krylov  wrote:

> Congratulations! I also get the single expected NOTE in my checks.

Apologies for the double e-mail, but I've read the code now, and
wrapping the example of your only function in \dontrun{} will most
likely not be allowed.

Is it really the case that you cannot remove a single row from the
example dataset without making the example crash? It may help to write
a function that would remove rows one by one, make sure that the
example still runs, and keep doing that until not a single row can be
removed. The complexity is terrible (something like O(n^k)), but let it
run for a while, and maybe it'll reduce the dataset enough to fit in
the example time limit.
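
A rough sketch of that loop, where still_runs() is a hypothetical
predicate returning TRUE if your example finishes without error on the
given subset:

dat <- ExampleTradeData
repeat {
  n_before <- nrow(dat)
  for (i in rev(seq_len(nrow(dat)))) {
    candidate <- dat[-i, ]
    if (still_runs(candidate)) dat <- candidate # drop the row for good
  }
  if (nrow(dat) == n_before) break # no removable rows left
}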

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] CRAN submission struggle

2024-01-06 Thread Ivan Krylov
On Sat, 6 Jan 2024 13:50:49 +0200
Christiaan Pieterse  wrote:

> Is there a way to confirm that this package is ready for submission?
> I submitted it to https://win-builder.r-project.org/ and
> https://mac.r-project.org/macbuilder/submit.html and
> https://builder.r-hub.io/. All of these seem to only show the
> expected new submission note.

Congratulations! I also get the single expected NOTE in my checks.

This may be your last chance to rename the package from iopspackage to,
say, "IOPS". If you go over the package and the CRAN policy at
 one last time
and deem the package compliant, it should be ready for submission.

The CRAN reviewer may find additional problems and ask you to fix them,
but you are likely most of the way there.

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [Rd] static html vignette

2024-01-04 Thread Ivan Krylov
On Thu, 4 Jan 2024 11:57:15 +0200
Adrian Dușa  wrote:

> I wonder if it would be possible to include an html static vignette. 

This is better suited for R-package-devel, not R-devel.

I would say that static vignettes are against the spirit of vignettes:
the idea is to provide another layer of unit testing to the package by
providing a deeper executable example than is possible with just Rd
examples. I think that Bioconductor will even refuse a package with a
vignette with no executable code in it.

Still, you can use the R.rsp package to provide static vignettes in
both PDF and HTML formats:
https://cran.r-project.org/package=R.rsp/vignettes/R_packages-Static_PDF_and_HTML_vignettes.pdf

This will add 6 packages to your total Suggests budget:

library(tools)
setdiff(
 unlist(package_dependencies('R.rsp', recursive = TRUE)),
 unlist(standard_package_names())
)
# [1] "R.methodsS3" "R.oo"        "R.utils"     "R.cache"     "digest"

HTML vignettes currently have much better accessibility than PDF
vignettes, and the need for a low-dependency-footprint (in terms of
both R packages and external tools like Pandoc) HTML vignette engine is
evident . It's easy
to solve this problem ~80% of the way, but making something that ticks
all the boxes (zero-dependency and/or suitable for inclusion into R
itself, handles plots *and* math, low-boilerplate, no external
dependencies like Pandoc or JavaScript CDNs, compact output) is a hard
problem that's mostly not fun to work on.

The R2HTML package has no non-core hard dependencies and provides an
HTML Sweave engine, but I'm not sure it can be used in a vignette (and
it probably needs more maintainer work to be up to modern standards).

The zero-dependency approach will be to bring your own vignette engine
with you, but that requires lots of additional work (including bug
workarounds: ). I've
seen CRAN packages that do that, but I cannot find them right now. Yet
another trick would be to provide a dummy *.Rnw file to trigger the
vignette-building code and a Makefile in which the real vignette is
produced (removing the dummy vignette and its intermediate output).
Again, writing a portable Makefile is non-trivial and only lets you
work around PR18191.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [R-pkg-devel] how to use pkgdown::build_site() with a project using S7 with a specialized plot()?

2024-01-03 Thread Ivan Krylov
On Wed, 3 Jan 2024 13:34:27 +0000
Daniel Kelley  wrote:

> Error: 
> ! in callr subprocess.
> Caused by error in `map2(.x, vec_index(.x), .f, ...)`:
> ! In index: 1.

Interesting that the actual error messages seem to be completely
empty.

By chance (I was searching for "rlang See `$stdout` for standard
output" because I was curious to know what is this error message
telling the user to subset) I found a bug report that seems relevant
(as it's also about S7, has the same warning and crashes in the same
call to rlang::check_installed):
https://github.com/r-lib/pkgdown/issues/2186

Unfortunately, there's no solution, just two similar-looking cases.

Is there an equivalent of options(error = recover) for callr child
processes? If you can recover the expression evaluated by the child
process, it could be worth executing it directly and walking the call
stack looking at the local variables at the time of the crash.

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] CRAN submission struggle

2023-12-29 Thread Ivan Krylov
On Thu, 28 Dec 2023 18:00:37 +0200
Christiaan Pieterse  wrote:

> I only get 3 notes (see below), and if I run it in PositCloud, it
> crashes or yields the same 1 ERROR and 2 NOTES result as before. Why
> might this be? 

Does the PositCloud check crash with "Killed" (most likely out of RAM)
or with a different error message?

> Is it a problem or is it fine if I continue working in RStudio since
> I cannot increase the specs in PositCloud because I'm working on a
> research group account?

If your local R CMD check works, it should be fine. Rough
specifications for the machines running CRAN checks can be found at
. We ought to
test our packages on the weakest hardware that could plausibly be used
to run our code, but that's not always easy to do. I know I don't
always dig out my old Intel Atom ultraportable to run the checks myself.

> The second is the runtime that is too long:
> * checking examples ... [43s] NOTE
> Examples with CPU (user + system) or elapsed time > 5s
>   user system elapsed
> IOPS 10.06   3.35   35.04

Similar NOTEs can be seen about the use of multi-threading, but here
the "elapsed" (real, as measured by a clock) time exceeds the "user"
(CPU time spent inside applications) + "system" (CPU time spent inside
the operating system kernel) time, so the code uses less than 100% of
one CPU core on average, which fits comfortably in the 200% allowed by
the CRAN policy for examples and tests.

Unfortunately, 35 seconds is still too much.

> How can I reduce this time? I'm not sure how to reduce the size of my
> ExampleTradeData without the check giving errors when running the
> example.

How does the algorithm work? I've seen it fail due to Proximities being
a 0x0 matrix. Can you work backwards from
economiccomplexity::proximity() returning a 0x0 matrix to derive the
requirements that IOPS places on the dataset? It may help to experiment
with sample.int() to subset the rows and see which combinations work.
Perhaps you can reduce the dataset to two countries and two products?

Have you tried profiling? You can profile your code for both speed and
memory use, and replacing less performant idioms with those using less
CPU time and memory may solve both the CPU time problem and the
OOM-crash problem:
https://cran.r-project.org/doc/manuals/R-exts.html#Tidying-and-profiling-R-code
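
For example, a quick sketch (assuming the package is installed
locally):

Rprof(memory.profiling = TRUE)
example("IOPS", package = "iopspackage") # or source the example code
Rprof(NULL)
summaryRprof(memory = "both")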
 
> The third note I am unsure what it means:
> * checking for detritus in the temp directory ... NOTE
> Found the following files/directories:
>   'lastMiKTeXException'

Have you installed the inconsolata MiKTeX package?
https://cran.r-project.org/doc/manuals/R-admin.html#LaTeX-on-Windows
Try running R CMD Rd2pdf on your package directory: maybe MiKTeX will
pop up an interactive dialog to let you install any remaining missing
dependencies. If not, there should be a lastMiKTeXException file for
you to read.

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] Issue with flang-new (segfault from C stack overflow)

2023-12-18 Thread Ivan Krylov
On Mon, 18 Dec 2023 11:06:16 +0100
Jisca Huisman  wrote:

> I isolated the problem in a minimal working example available here: 
> https://github.com/JiscaH/flang_segfault_min_example . All that does
> is pass a vector of length N*N back and forth between R and Fortran.
> It works fine for very long vectors (tested up to length 5e8), but
> throws a segfault when I reshape a large array in Fortran to a vector
> to pass to R, both when using RESHAPE() and when using loops.

You've done an impressive amount of investigative work. Thank you for
reducing your problem to such a small example! My eyes are drawn to
these two lines:

>>  integer, intent(IN) :: N
>>  integer :: M(N,N)

If this was C, such a declaration would mean a variable-length array
that would have to be placed on the (limited-size) stack and eventually
overflow it. gfortran places the array on the heap, so the program
works:

  integer, intent(IN) :: N
  integer, intent(INOUT) :: V(N*N)
  integer :: M(N,N)
1205:   48 63 dbmovslq %ebx,%rbx
1208:   b8 00 00 00 00  mov$0x0,%eax
120d:   48 85 dbtest   %rbx,%rbx
1210:   49 89 c4mov%rax,%r12
1213:   4c 0f 49 e3 cmovns %rbx,%r12
1217:   48 89 dfmov%rbx,%rdi
121a:   49 0f af fc imul   %r12,%rdi
121e:   48 85 fftest   %rdi,%rdi
1221:   48 0f 48 f8 cmovs  %rax,%rdi
1225:   48 c1 e7 02 shl$0x2,%rdi
1229:   b8 01 00 00 00  mov$0x1,%eax
122e:   48 0f 44 f8 cmove  %rax,%rdi
1232:   e8 19 fe ff ff  callq  1050 
1237:   48 89 c5mov%rax,%rbp
123a:   4c 89 e7mov%r12,%rdi
123d:   48 f7 d7not%rdi

(Looking at the address of M in GDB and comparing it with the output
of info proc mappings, I can confirm that it lives on the heap.)

flang-new makes M into a C-style VLA:

  integer, intent(IN) :: N
  integer, intent(INOUT) :: V(N*N)
  integer :: M(N,N)
74ec:   48 63 17movslq (%rdi),%rdx
74ef:   89 d1   mov%edx,%ecx
74f1:   31 c0   xor%eax,%eax
74f3:   48 85 d2test   %rdx,%rdx
74f6:   48 0f 49 c2 cmovns %rdx,%rax
74fa:   48 89 85 b0 fe ff ffmov%rax,-0x150(%rbp)
7501:   48 89 c2mov%rax,%rdx
7504:   48 0f af d2 imul   %rdx,%rdx
7508:   48 8d 34 95 0f 00 00lea0xf(,%rdx,4),%rsi
750f:   00
7510:   48 83 e6 f0 and$0xfff0,%rsi
7514:   48 89 e2mov%rsp,%rdx
7517:   48 29 f2sub%rsi,%rdx
751a:   48 89 95 b8 fe ff ffmov%rdx,-0x148(%rbp)
7521:   48 89 d4mov%rdx,%rsp

(Looking at the value of the stack pointer in GDB after M(N,N) is
declared, I can see it way below the end of the stack and the loaded
shared libraries according to info proc mappings. GDB doesn't let me
see the address of M. The program crashes in `M = 42`, trying to
overwrite the code from the C standard library.)

Are Fortran processors allowed to place such "automatic data objects"
like integer :: M(N,N) on the stack? The Fortran standard doesn't seem
to give an answer to this question, but if you make your M allocatable,
you won't have to worry about stack usage:

subroutine dostuff(N,V)
  implicit none

  integer, intent(IN) :: N
  integer, intent(INOUT) :: V(N*N)
  integer, allocatable :: M(:,:) ! <-- here

  allocate(M(N,N))   ! <-- and here
  M = 42
  V = RESHAPE(M, (/N*N/))
end subroutine dostuff

No leaks or crashes observed with these two changes and either
compiler. The Fortran standard requires that local allocatable unsaved
arrays (except for the function result) are deallocated at the end of
procedures.

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] CRAN submission struggle

2023-12-18 Thread Ivan Krylov
On Sun, 17 Dec 2023 21:48:51 +0200
Christiaan Pieterse  wrote:

> Warning in complexity_measures(Mbin, method = "reflections",
> iterations = iterCompl) :
>   'iterations' was changed to 'iterations + 1' to work with an even
> number of iterations
> Killed

If this is happening on Linux, this could mean the current example (or
some other, completely unrelated process running at the same time)
allocating too much memory and summoning the OOM-killer.

In the current HEAD of the CRAN-prep branch, the man/*.Rd files are
empty, which prevents the package from being installed, so I couldn't
reproduce the problem. Are there any local changes you need to commit
and push to GitHub? If you're not comfortable keeping the package
source in sync with GitHub, we can consider other options.

> * checking HTML version of manual ... NOTE
> Skipping checking HTML validation: no command 'tidy' found

This is a problem with the system running R CMD check, not your
package. In order to check the validity of generated HTML
documentation, R needs to use the program called HTML-Tidy, which is
not installed on this computer.

Where are you getting these results from? Win-Builder? The macOS
builder?

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] CRAN submission struggle

2023-12-17 Thread Ivan Krylov
On Sun, 17 Dec 2023 15:29:34 +0200
Christiaan Pieterse  wrote:

> But, I've uploaded the newly created package as discussed in my first
> email, available at:
> https://github.com/ChristiaanPieterse/iopspackage2.1.0

Are you sure it wouldn't be better to clean up the existing package
instead of creating unrelated forks? If you're afraid of breaking
something that works, do the work on a separate branch until it both
works as well as your current repo does *and* passes
R CMD check --as-cran.

> * checking CRAN incoming feasibility ... [25s] NOTE
> Maintainer: 'C.J Pieterse '
> New submission

"New submission" is to be expected.

> Unknown, possibly misspelled, fields in DESCRIPTION:
>   'Exports'

"Writing R Extensions" doesn't define a field named "Exports". The
exports are declared in the NAMESPACE file. Since you're using
roxygen2, use its @export tag to export your functions and remove the
Exports: field.
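
For example, after you run roxygenise(), a block like this adds
export(iops_example) to NAMESPACE (the function name and body here are
placeholders):

#' An exported function (name and body are placeholders).
#' @param x an object to return unchanged.
#' @export
iops_example <- function(x) {
  x
}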

> * checking whether package 'iopspackage' can be installed ... [27s]
> WARNING Found the following significant warnings:
>   Warning: package 'Rcpp' was built under R version 4.3.2

This is probably not a problem with your package (but may be a problem
with the way the machine running R CMD check is set up).

> * checking dependencies in R code ... NOTE
> Namespaces in Imports field not imported from:
>   'openxlsx' 'roxygen2' 'tibble'
>   All declared Imports should be used.

> * checking R code for possible problems ... [12s] NOTE
> IOPS: no visible global function definition for 'createWorkbook'
> IOPS: no visible global function definition for 'addWorksheet'
> IOPS: no visible global function definition for 'writeData'
> IOPS: no visible global function definition for 'saveWorkbook'
> Undefined global functions or variables:
>   addWorksheet createWorkbook saveWorkbook writeData

Are you sure you should be importing roxygen2? You only run
roxygenise() before running R CMD build in order to generate
man/*.Rd and NAMESPACE; I don't think it's used after that.

If you don't use functions from tibble, there's no need to import it
or depend on it either. I also don't see you directly using Rcpp, but
there's no warning about it for some reason.

Use the @importFrom tags to import individual functions that you
actually use (i.e. createWorkbook and friends). See the section on
package namespaces in "Writing R Extensions" for more information on
importing.
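
For instance, a tag next to the function that uses openxlsx; the helper
itself is hypothetical, but the imported functions are the ones named
in the NOTE:

#' @importFrom openxlsx createWorkbook addWorksheet writeData saveWorkbook
write_results <- function(d, path) {
  wb <- createWorkbook()
  addWorksheet(wb, 'results')
  writeData(wb, 'results', d)
  saveWorkbook(wb, path, overwrite = TRUE)
}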

Also, remove all library() calls from R/iopspackage2.R. Packages live
in namespaces, not in the global environment; your package should rely
upon the dependency information in DESCRIPTION and NAMESPACE (the
latter generated by roxygen2) for its dependencies.

> > data(ExampleTradeData)  
> Warning in data(ExampleTradeData) :
>   data set 'ExampleTradeData' not found

There's no 'data' directory and no file named ExampleTradeData.* in it.
Data for use by the function data() should be prepared as described in
the section on data in packages in "Writing R Extensions".
If you want to use files under inst/extdata, you have to read them
manually:

ETD <- read.csv(system.file(
 file.path('extdata','ExampleTradeData.csv'),
 package = 'iopspackage'
))
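
If you do want data(ExampleTradeData) to work, prepare an .rda file
under data/ instead. A sketch, run once by you while preparing the
package, not at install time:

ExampleTradeData <- read.csv('ExampleTradeData.csv')
save(ExampleTradeData, file = 'data/ExampleTradeData.rda',
     compress = 'xz')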

> * checking for detritus in the temp directory ... NOTE
> Found the following files/directories:
>   'lastMiKTeXException'

Is this on R-hub? This usually happens on R-hub and doesn't indicate a
problem with your package.

> #' temp_dir <- tempdir()
> #' 
> #' # Set the working directory to a temporary directory
> #' setwd(temp_dir)
<...>
> #' # Clean up the temporary directory
> #' unlink(temp_dir, recursive = TRUE)

Please make it a subdirectory of the session temporary directory:

temp_dir <- tempfile()
dir.create(temp_dir)
...

Removing the session temporary directory is not as bad as directly
overwriting user data (it all goes away after the process shuts down),
but many parts of R and other packages rely on tempdir() being a
directory that exists, not to mention that other packages may have
temporary files of their own in use there.
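
Putting it together for an example section (the directory prefix is
arbitrary):

temp_dir <- tempfile('iops-example-')
dir.create(temp_dir)
old_wd <- setwd(temp_dir)
# ... run the example, letting it write its files here ...
setwd(old_wd)
unlink(temp_dir, recursive = TRUE)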

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] CRAN submission struggle

2023-12-16 Thread Ivan Krylov
On Sat, 16 Dec 2023 19:41:16 +0200
Christiaan Pieterse  wrote:

> This .R file contained the Roxygen2 comments. (I was very unsure what
> comments to include in this so it might be wrong, I'm unsure)

If you like roxygen2 and would like to keep using it, you're welcome to
keep the roxygen2 comments in your R files. Just don't forget to re-run
roxygenise() every time you update them.

>3. Included a DESCRIPTION file is the 'iopspackage' folder. (Once
> again I was very unsure what to include in this file so it might be
> wrong).

Does it help to follow the guide in the section on the DESCRIPTION
file in "Writing R Extensions"?
Start with the mandatory fields Package, Version, License,
Description, Title and Authors@R (to generate Author: and Maintainer:
from).
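
A skeleton with the mandatory fields could look like this (every value
below is a placeholder for you to replace):

Package: iopspackage
Version: 2.1.0
Title: One Line Describing the Package in Title Case
Description: A paragraph describing what the package does.
License: GPL (>= 3)
Authors@R: person("C.J.", "Pieterse", email = "cjp@example.org",
    role = c("aut", "cre"))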

>8. I checked the tar file using *R CMD check --as-cran
>"iopspackage_2.1.0.tar.gz". *This yielded errors, warnings and
> notes which I don't know how to solve and suspect are due to me
> setting the file up wrong.

Can you show us the log from the check? It should be fine to copy &
paste the entries that don't say OK (i.e. NOTEs, WARNINGs, and ERRORs).

Most of what you'll need to fix is described in "Writing R Extensions"
(link above) and in the CRAN Repository Policy.

> I've been told before not to include my package as an attachment, so
> can someone please help me with the submission process?

Can you publish the code anywhere? Ideally, this place should provide
instant access to the latest version of every source code file inside
your package. The most popular option nowadays is GitHub, but it does
not have to be GitHub. Two GDPR-friendly alternatives are Codeberg and
SourceHut. If you don't like Git (which does take effort to learn),
there's R-Forge and Chiselapp.com. If you don't want to learn software
version control right now, any free web/file hosting will suffice as
long as you keep the files updated and accessible.

> It is a 10mb file

I think we've discussed this before. A 10-megabyte package mostly
consisting of example data is not a good fit for CRAN. It's possible to
use free Web hosting services to distribute data packages (see the
'drat' package and the function tools::write_PACKAGES()) separate from the
CRAN package that should mainly contain the code.
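
A sketch of the idea (the package and directory names are made up):

# one-time setup of a source repository to be served over HTTP(S)
dir.create('repo/src/contrib', recursive = TRUE)
file.copy('iopsdata_1.0.tar.gz', 'repo/src/contrib')
tools::write_PACKAGES('repo/src/contrib', type = 'source')
# users would then install the data package with
# install.packages('iopsdata', repos = 'https://example.org/repo')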

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [Rd] Partial matching performance in data frame rownames using [

2023-12-16 Thread Ivan Krylov
On Wed, 13 Dec 2023 09:04:18 +0100
Hilmar Berger via R-devel  wrote:

> Still, I feel that default partial matching cripples the functionality
> of data.frame for larger tables.

Changing the default now would require a long deprecation cycle to give
everyone who uses `[.data.frame` and relies on partial matching
(whether they know it or not) enough time to adjust.

Still, adding an argument feels like a small change: edit
https://svn.r-project.org/R/trunk/src/library/base/R/dataframe.R and
add a condition before calling pmatch(). Adjust the warning() for named
arguments. Don't forget to document the new argument in the man page at
https://svn.r-project.org/R/trunk/src/library/base/man/Extract.data.frame.Rd

Index: src/library/base/R/dataframe.R
===
--- src/library/base/R/dataframe.R  (revision 85664)
+++ src/library/base/R/dataframe.R  (working copy)
@@ -591,14 +591,14 @@
 ###  These are a little less general than S
 
 `[.data.frame` <-
-function(x, i, j, drop = if(missing(i)) TRUE else length(cols) == 1)
+function(x, i, j, drop = if(missing(i)) TRUE else length(cols) == 1, pmatch.rows = TRUE)
 {
 mdrop <- missing(drop)
 Narg <- nargs() - !mdrop  # number of arg from x,i,j that were specified
 has.j <- !missing(j)
-if(!all(names(sys.call()) %in% c("", "drop"))
+if(!all(names(sys.call()) %in% c("", "drop", "pmatch.rows"))
&& !isS4(x)) # at least don't warn for callNextMethod!
-warning("named arguments other than 'drop' are discouraged")
+warning("named arguments other than 'drop', 'pmatch.rows' are 
discouraged")
 
 if(Narg < 3L) {  # list-like indexing or matrix indexing
 if(!mdrop) warning("'drop' argument will be ignored")
@@ -679,7 +679,11 @@
 ## for consistency with [, ]
 if(is.character(i)) {
 rows <- attr(xx, "row.names")
-i <- pmatch(i, rows, duplicates.ok = TRUE)
+i <- if (pmatch.rows) {
+pmatch(i, rows, duplicates.ok = TRUE)
+} else {
+match(i, rows)
+}
 }
 ## need to figure which col was selected:
 ## cannot use .subset2 directly as that may
@@ -699,7 +703,11 @@
  # as this can be expensive.
 if(is.character(i)) {
 rows <- attr(xx, "row.names")
-i <- pmatch(i, rows, duplicates.ok = TRUE)
+i <- if (pmatch.rows) {
+pmatch(i, rows, duplicates.ok = TRUE)
+} else {
+match(i, rows)
+}
 }
 for(j in seq_along(x)) {
 xj <- xx[[ sxx[j] ]]
Index: src/library/base/man/Extract.data.frame.Rd
===
--- src/library/base/man/Extract.data.frame.Rd  (revision 85664)
+++ src/library/base/man/Extract.data.frame.Rd  (working copy)
@@ -15,7 +15,7 @@
   Extract or replace subsets of data frames.
 }
 \usage{
-\method{[}{data.frame}(x, i, j, drop = )
+\method{[}{data.frame}(x, i, j, drop =, pmatch.rows = TRUE)
 \method{[}{data.frame}(x, i, j) <- value
 \method{[[}{data.frame}(x, ..., exact = TRUE)
 \method{[[}{data.frame}(x, i, j) <- value
@@ -45,6 +45,9 @@
 column is selected.}
 
\item{exact}{logical: see \code{\link{[}}, and applies to column names.}
+
+   \item{pmatch.rows}{logical: whether to perform partial matching on
+ row names in case \code{i} is a character vector.}
 }
 \details{
   Data frames can be indexed in several modes.  When \code{[} and


system.time({r <- d1[q2,, drop=FALSE, pmatch.rows = FALSE]})
#user  system elapsed 
#   0.478   0.004   0.482 

Unfortunately, that would be only the beginning. The prose in the whole
?`[.data.frame` would have to be updated; the new behaviour would have
to be tested in tests/**.R. There may be very good reasons why named
arguments to `[` other than drop= are discouraged for data.frames. I'm
afraid I lack the whole-project view to consider whether such an
addition would be safe.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [R-pkg-devel] Getting 'Rscript: Bad address' error when CRAN build my package on windows platforms

2023-12-14 Thread Ivan Krylov
On Thu, 7 Dec 2023 19:29:46 +0100
Serge  wrote:

> g++ -std=gnu++11 -I"D:/RCompile/recent/R-4.3.2/include" -DNDEBUG
> -I../inst/projects/ -I../inst/include/ -DIS_RTKPP_LIB -DSTKUSELAPACK
> -I'D:/RCompile/CRANpkg/lib/4.3/Rcpp/include'
> -I"d:/rtools43/x86_64-w64-mingw32.static.posix/include" -fopenmp -O2
> -Wall -mfpmath=sse -msse2 -mstackrealign -c fastRand.cpp -o
> fastRand.o
> /bin/sh: line 1: /x86_64-w64-mingw32.static.posix/bin/g++: Bad address

I don't think this is a problem with your package. The shell says "Bad
address" when it gets an EFAULT while trying to run a program:

$ strace -f -e fault=execve:error=EFAULT:when=1 -e trace=execve \
 /bin/sh -c '/usr/bin/g++'
execve("/bin/sh", ["/bin/sh", "-c", "/usr/bin/g++"], [/* 51 vars */]) = 0
strace: Process 20756 attached
[pid 20756] execve("/usr/bin/g++", ["/usr/bin/g++"], [/* 50 vars */]) = -1 EFAULT (Bad address) (INJECTED)
/bin/sh: 1: /usr/bin/g++: Bad address

There is not enough information to find out why this happens. I think
that since Rtools are based on MSYS2 which is based on Cygwin, the
place to look for EFAULT is Cygwin's implementation of the exec()
system call. Indeed, there's one such place there, after the giant
structured exception handling __try block, where errno is set to EFAULT
if the system got such an exception while launching a process without
previously setting errno to ENOMEM:
https://cygwin.com/cgit/newlib-cygwin/tree/winsup/cygwin/spawn.cc?id=ca2a4ec243627b19f0ac2c7262703f81712f3be4#n947

Does this happen every time? If not, I think the problem was
Win-Builder temporarily running out of memory.

P.S.: Please don't compose HTML e-mail to this list with Thunderbird.
Thunderbird's auto-generated plain text version is all we get, and it's
severely mangled:
https://stat.ethz.ch/pipermail/r-package-devel/2023q4/010178.html

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] How to fix: non-standard things in the check directory: 'NUL' ?

2023-12-13 Thread Ivan Krylov
Dear Friedemann von Lampe,

Welcome to R-package-devel! This is a good, concise description of the
problem, but please also provide a link to your code in the future.

On Wed, 13 Dec 2023 09:58:41 +0100
Friedemann von Lampe 
wrote:

> Flavor: r-devel-linux-x86_64-debian-gcc
> Check: for non-standard things in the check directory, Result: NOTE
> Found the following files/directories:
> 'NUL'

The file named 'NUL' is created in the function screeplot_NMDS:

R/screeplot_NMDS.R:  capture.output(nmds_i <- invisible(metaMDS(matrix,
distance = distance, k = i, trymax = trymax, engine = "monoMDS",
autotransform = autotransform)), file='NUL')

https://github.com/fvlampe/goeveg/blob/db94c4a567eeac67b6df1df5a4f2d1aa771e629a/R/screeplot_NMDS.R#L76

On Windows, everything goes right and the output is redirected to the
null device. On Linux, the null device is '/dev/null', not 'NUL', and
this name doesn't hold any special powers, so the file with this name
gets created.

Use the base function nullfile() to obtain the name of the null device
in a portable manner. I think you can also not supply the `file`
argument and ignore the return value of the capture.output(...)
expression. This may be less efficient if there's truly a lot of output.
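
Adapting the call quoted above (untested, since I don't have your
data; the invisible() wrapper isn't needed inside capture.output):

capture.output(
 nmds_i <- metaMDS(matrix, distance = distance, k = i,
  trymax = trymax, engine = 'monoMDS',
  autotransform = autotransform),
 file = nullfile()
)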

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [Rd] Partial matching performance in data frame rownames using [

2023-12-12 Thread Ivan Krylov
On Mon, 11 Dec 2023 21:11:48 +0100
Hilmar Berger via R-devel  wrote:

> What was unexpected is that in this case was that [.data.frame was
> hanging for a long time (I waited about 10 minutes and then restarted
> R). Also, this cannot be interrupted in interactive mode.

That's unfortunate. If an operation takes a long time, it ought to be
interruptible. Here's a patch that passes make check-devel:

--- src/main/unique.c   (revision 85667)
+++ src/main/unique.c   (working copy)
@@ -1631,6 +1631,7 @@
}
 }
 
+unsigned int ic = 10000000;
 if(nexact < n_input) {
/* Second pass, partial matching */
for (R_xlen_t i = 0; i < n_input; i++) {
@@ -1642,6 +1643,10 @@
mtch = 0;
mtch_count = 0;
for (int j = 0; j < n_target; j++) {
+   if (!--ic) {
+   R_CheckUserInterrupt();
+   ic = 10000000;
+   }
if (no_dups && used[j]) continue;
if (strncmp(ss, tar[j], temp) == 0) {
mtch = j + 1;

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [R-pkg-devel] vignette with "Run Examples"

2023-12-12 Thread Ivan Krylov
On Tue, 12 Dec 2023 08:24:11 +0100
Sigbert Klinke  wrote:

> is it possible to get a button or link to run an example in a vignette

Technically, yes, but very hard to implement in practice.

Vignettes are a form of literate programming, expressed in terms of
files: there's a source file containing code mixed with prose, and
there are two programs, one of which extracts the code into a runnable
.R file and the other renders the code together with prose and any
resulting plots into a human-readable document. A link to run examples
implies that there's R running somewhere, which cannot be guaranteed by
the time the human-readable document is opened by the human.
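
For an Rmd vignette, the two programs are, roughly (the file name is
hypothetical):

knitr::purl('vignettes/intro.Rmd')# 'tangle': extract the code chunks into intro.R
rmarkdown::render('vignettes/intro.Rmd') # 'weave': render code, prose and plots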

One way around this problem would be to embed a copy of webR [*] in the
document so that R would run in the browser. This involves a
significant developer effort and would either bloat your vignette to
the size of an R installation or make it depend on external resources
to load webR from (that could go away or spy on the user). webR is
still experimental; last time I tried it, it crashed the browser tab
when I invoked functions from the quadprog package.

Another way would be to add a hack to the vignette engine to start a
server at vignette rendering time, insert the link to this server into
the vignette as it's being rendered and hope that the server is still
running by the time the vignette is opened. This would require the user
to re-render the vignette every time they restart the server.

Technically, one could also invent a completely new kind of vignette
engine that would output self-contained executable files with a
document rendering engine and R built in, so that a click on the "Run
examples" link would use that built-in R. This is basically the webR
solution without the web and with a lot of extra pain.

You could also fake some of it by writing extra JavaScript (with the
help of third-party statistics libraries, e.g. [**]) to do the same
thing in the browser as is done in R, but that's still a lot of work
for little benefit.

Yet another way would be to make these links point to an external
service somewhere on the Internet that would run the R code. Since R is
not designed to work with untrusted input (not to mention untrusted
users entering code), that would be an informational security nightmare
both on your side (R would have to run in locked-down read-only
disposable virtual machines hardened against sandbox escape and
privilege escalation exploits) and on the GDPR side of things.

There are doubtlessly more approaches, but I think they would all be
this convoluted or worse.

-- 
Best regards,
Ivan

[*] https://docs.r-wasm.org/webr/latest/

[**] https://github.com/svkucheryavski/mdatools-js

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


  1   2   3   4   5   >