Re: [R-pkg-devel] Compile issues on r-devel-linux-x86_64-debian-clang with OpenMP

2024-05-23 Thread Ivan Krylov via R-package-devel
On Wed, 22 May 2024 09:18:13 -0500
Dirk Eddelbuettel  wrote:

> Testing via 'nm' as you show is possible but not exactly 'portable'.
> So any suggestions as to what to condition on here?

(My apologies if you already got an answer from Kurt. I think we're not
seeing his mails to the list.)

Perhaps take the configure test a bit further and try to dyn.load() the
resulting shared object? To be extra sure, call the function that uses
the OpenMP features? (Some weird systems may have lazy binding enabled,
making dyn.load() succeed but crashing the process on invocation of a
missing function.)

On GNU/Linux, the linker will happily leave undefined symbols in when
creating a shared library (unlike on, say, Windows, where extern void
foo(void); foo(); is a link-time error unless an object file or an
import library providing foo() is also present). When loading such a
library, the operation fails unless the missing symbols are already
present in the address space of the process (e.g. from a different
shared library).

A fresh process of R built without OpenMP support will neither link in
the OpenMP runtime while running SHLIB nor have the OpenMP runtime
loaded and so should successfully fail the test.
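This load-time failure is easy to reproduce outside R as well: resolving a symbol that no loaded library provides fails at lookup time. A quick sketch using Python's ctypes on a POSIX system (the symbol name is, of course, made up):

```python
import ctypes

# Handle to the running process itself (a POSIX-only idiom);
# the actual symbol lookup happens on first attribute access.
whole_process = ctypes.CDLL(None)

try:
    whole_process.symbol_that_no_runtime_provides_12345
    resolved = True
except AttributeError:
    # Missing symbol: the lookup fails, just as dyn.load() or the
    # first OpenMP call fails when no OpenMP runtime is loaded.
    resolved = False

print(resolved)  # False
```

dlopen()ing a shared object with immediate binding fails the same way when one of its undefined symbols cannot be resolved from the libraries already in the address space.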

I also wouldn't call the entry point "main" just in case some future
compiler considers this a violation of the rules™ [*] and breaks the
code. extern "C" void configtest(int*) would be compatible with .C()
without having to talk to R's memory manager:

# The configure script:
cat > test-omp.cpp <<EOF
#include <omp.h>
extern "C" void configtest(int * arg) {
  *arg = omp_get_num_threads();
}
EOF
# Without the following you're relying on the GNU/Linux-like behaviour
# w.r.t. undefined symbols (see WRE 1.2.1.1):
cat > Makevars 

Re: [Rd] confint Attempts to Use All Server CPUs by Default

2024-05-21 Thread Ivan Krylov via R-devel
On Tue, 21 May 2024 08:00:11 +
Dario Strbenac via R-devel  wrote:

> Would a less resource-intensive value, such as 1, be a safer default
> CPU value for confint?

Which confint() method do you have in mind? There are at least four of
them by default in R, and many additional classes could make use of
stats:::confint.default by implementing vcov().

> Also, there is no mention of such parallel processing in ?confint, so
> it was not clear at first where to look for performance degradation.
> It could at least be described in the manual page so that users would
> know that export OPENBLAS_NUM_THREADS=1 is a solution.

There isn't much R can do about the behaviour of the BLAS, because
there is no standard interface to set the number of threads. Some BLASes
(like ATLAS) don't even offer it as a tunable number at all [*].

A system administrator could link the installation of R against
FlexiBLAS [**], provide safe defaults in the environment variables and
educate the users about its tunables [***], but that's a choice just
like it had been a choice to link R against a parallel variant of
OpenBLAS on a shared computer. This is described in R Installation and
Administration, section A.3.1 [].

-- 
Best regards,
Ivan

[*]
https://math-atlas.sourceforge.net/faq.html#tnum

[**]
https://www.mpi-magdeburg.mpg.de/projects/flexiblas

[***]
https://search.r-project.org/CRAN/refmans/flexiblas/html/flexiblas-threads.html

[]
https://cran.r-project.org/doc/manuals/R-admin.html#BLAS

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [R-pkg-devel] [External] Re: Assistance Needed to Resolve CRAN Submission Note

2024-05-19 Thread Ivan Krylov via R-package-devel
On Sun, 19 May 2024 09:52:08 +
Daniel Kelley  wrote:

> In answer to the question about tidy version on macOS, I have the
> latest version of that OS (Sonoma 14.5 release 23F79 -- a beta
> release, unless the official has caught up in recent days) and I get
> as follows.

> $  ~ /usr/bin/tidy --version
> HTML Tidy for Mac OS X released on 31 October 2006 - Apple Inc. build
> 9576

Thank you for providing the output! R CMD check only knows about "Apple
Inc. build 2649" (not 9576) being old, which must be why the spurious
NOTEs appeared on Zeinab's computer.

Submitted the updated patch at
.

> The second one is from the homebrew project, and that's
> what gets used by default on my machine.  I don't know which of these
> R would be using, but I could check that if required (and if provided
> a hint on how to invoke R to tell me).

I would expect Sys.which(Sys.getenv("R_TIDYCMD", "tidy")) to point to
the new version in /usr/local/bin/.

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] [External] Re: Assistance Needed to Resolve CRAN Submission Note

2024-05-18 Thread Ivan Krylov via R-package-devel
On Sat, 18 May 2024 21:10:18 +
"Richard M. Heiberger"  wrote:

> when checking a package and discovering these messages about html5,
> can you generate an informational message about tidy with a link to
> updating tidy?

That's a useful suggestion.

Would you mind testing the patch from
?

If you or someone else here has a computer running macOS, what exactly
does it print when running `tidy --version` (1) with an old version of
Tidy (that comes with macOS) and (2) with a new (>= 5) version of Tidy?
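For what it's worth, the useful signal in that output is whether a dotted version number appears at all: modern Tidy prints something like "version 5.8.0", while the 2006 Apple builds only print a build number. A rough illustration of such a check (not the actual logic in R CMD check, which matches specific known-old build strings):

```python
import re

def tidy_is_modern(version_output):
    """Heuristic: treat Tidy as recent if it reports a dotted version >= 5."""
    m = re.search(r"(\d+)\.(\d+)\.(\d+)", version_output)
    if m is None:
        return False  # the 2006 Apple builds print no x.y.z version at all
    return int(m.group(1)) >= 5

old = "HTML Tidy for Mac OS X released on 31 October 2006 - Apple Inc. build 9576"
new = "HTML Tidy for HTML5 for Linux version 5.8.0"

print(tidy_is_modern(old), tidy_is_modern(new))  # False True
```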

-- 
Best regards,
Ivan



Re: [R-pkg-devel] Altrep header, MSVC, and STRUCT_SUBTYPES macro

2024-05-17 Thread Ivan Krylov via R-package-devel
On Thu, 16 May 2024 21:32:24 +0200
David Cortes  wrote:

> Unfortunately, after some further testing, it seems this was just a
> matter of getting lucky - using the alternative non-STRUCT_SUBTYPES 
> def. of Altrep still leads to memory corruptions and crashes, just at
> different points than when using the STRUCT_SUBTYPES definition.

So much for the hope for an easy solution.
 
> May I ask: how would you go around getting R code into Godbolt?

Definitely not much of it. I was assuming that the problem was due to
passing structs by value (something that had been a problem for MSVC
compatibility more than a decade ago on x86 Windows), so I only
provided typedef struct SEXPREC *SEXP, one of the two definitions of
R_altrep_class_t and related macros, declared a number of functions
that would accept or return R_altrep_class_t by value, and tried to
call them.

I don't know a good way to use Godbolt for larger amounts of code.

> As far as I can tell, the altrep methods are calling the functions
> which they were assigned so at least the 'set' and 'dataptr' methods
> are working, but memory corruptions that crash the program happen
> after calling such altrep methods, particularly when there is a
> combination of 'R_UnwindProtect', C++ 'catch' that involves
> destructing variables before 'R_ContinueUnwind', and then 'Rf_error'.

What do the crashes look like? Is it heap corruption? Stack corruption?
Are they at least deterministic?

Can you reproduce the error by compiling one of the example ALTREP
classes [1] with MSVC, without C++ exception handling? What about using
R_ContinueUnwind() and/or C++ exceptions in some non-ALTREP code
compiled using MSVC?

Maybe if you run the code under Dr. Memory [2] or Application Verifier
[3], it'll detect the corruption slightly earlier and let you pinpoint
the problem? I'm assuming that there is no good way to link a sanitizer
into the process.

Can you eliminate C runtime incompatibility? Is there a chance that a
heap object allocated by the UCRT linked to R is freed by the CRT
linked to the MSVC-side library (or vice versa)?

-- 
Best regards,
Ivan

[1]
https://github.com/altrep-examples

[2]
https://drmemory.org/

[3]
https://learn.microsoft.com/en-us/windows-hardware/drivers/devtest/application-verifier



Re: [R-pkg-devel] Assistance Needed to Resolve CRAN Submission Note

2024-05-16 Thread Ivan Krylov via R-package-devel
On Thu, 16 May 2024 16:01:45 +
Zeinab Mashreghi  wrote:

> checking HTML version of manual ... NOTE
> Found the following HTML validation problems:
> All.data.html:4:1 (All.data.Rd:10): Warning:  inserting "type"
> attribute
> All.data.html:12:1 (All.data.Rd:10): Warning: 

Re: [Rd] FR: Customize background colour of row and column headers for the View output

2024-05-16 Thread Ivan Krylov via R-devel
The change suggested by Iago Giné Vázquez is indeed very simple. It
sets the background colour of the row and column headers to the
background of the rest of the dataentry window. With this patch, R
passes 'make check'. As Duncan Murdoch mentions, the X11 editor already
behaves this way.

If it's not acceptable to make the row and column headers the same
colour as the rest of the text, let's make it into a separate setting.

--- src/library/utils/src/windows/dataentry.c   (revision 86557)
+++ src/library/utils/src/windows/dataentry.c   (working copy)
@@ -1474,7 +1474,7 @@
 resize(DE->de, r);
 
 DE->CellModified = DE->CellEditable = FALSE;
-bbg = dialog_bg();
+bbg = guiColors[dataeditbg];
 /* set the active cell to be the upper left one */
 DE->crow = 1;
 DE->ccol = 1;

-- 
Best regards,
Ivan



Re: [R-pkg-devel] Altrep header, MSVC, and STRUCT_SUBTYPES macro

2024-05-16 Thread Ivan Krylov via R-package-devel
On Wed, 15 May 2024 18:54:37 +0200
David Cortes  wrote:

> The code compiles without errors under MSVC, but executing code that
> involves returning Altrep objects leads to segfaults and memory
> corruptions, even though it works fine under other compilers.
> 
> I see the R Altrep header has this section:
> #define STRUCT_SUBTYPES
> #ifdef STRUCT_SUBTYPES
> # define R_SEXP(x) (x).ptr
> # define R_SUBTYPE_INIT(x) { x }
>   typedef struct { SEXP ptr; } R_altrep_class_t;
> #else
> # define R_SEXP(x) ((SEXP) (x))
> # define R_SUBTYPE_INIT(x) (void *) (x)
>   typedef struct R_altcls *R_altrep_class_t;
> #endif

Interesting ABI incompatibility you've found. Can you show a minimal
example? I've tried playing with https://godbolt.org/ and passing
around values of type R_altrep_class_t between functions, but couldn't
convince "x64 msvc v19.latest" to generate different assembly no matter
whether R_altrep_class_t was a pointer or a struct containing a SEXP.

> If I manually edit the R header to remove the definition of
> 'STRUCT_SUBTYPES', leading to the second definition of
> 'R_altrep_class_t' being used, then things work as expected when the
> package is compiled with MSVC (no segfaults and no memory
> corruptions).

While it's hard to argue with results (I don't think it'll ever be
broken on x86_64 Windows), this workaround relies on undefined
behaviour and will only work as long as the ABI as understood by GCC
passes a structure with a pointer inside exactly the same way as the
ABI as understood by MSVC passes a bare pointer.
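The reason the workaround happens to hold on x86-64 is that a one-pointer struct and a bare pointer share size and alignment, and both the SysV and Microsoft x64 conventions usually place such an 8-byte aggregate in a single register. The layout half of that claim can be checked with a quick ctypes sketch; the calling-convention half is exactly what may silently differ between compilers:

```python
import ctypes

# A stand-in for the STRUCT_SUBTYPES definition:
#   typedef struct { SEXP ptr; } R_altrep_class_t;
class AltrepClass(ctypes.Structure):
    _fields_ = [("ptr", ctypes.c_void_p)]

# Same size and alignment as a bare pointer -- necessary, but not
# sufficient, for the two definitions to be interchangeable across
# compilers: the ABI must also classify them identically for argument
# passing and return values.
print(ctypes.sizeof(AltrepClass) == ctypes.sizeof(ctypes.c_void_p))      # True
print(ctypes.alignment(AltrepClass) == ctypes.alignment(ctypes.c_void_p))  # True
```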

Isolating the MSVC-specific code as suggested by Vladimir should be
safer, but it's also important to find out where exactly the
incompatibility arises from. The GCC and MSVC parts still have to use a
common ABI to talk to each other.

-- 
Best regards,
Ivan



Re: [R-pkg-devel] Assistance Needed to Resolve CRAN Submission Note

2024-05-16 Thread Ivan Krylov via R-package-devel
Dear Zeinab,

Welcome to R-package-devel!

On Thu, 16 May 2024 03:22:56 +
Zeinab Mashreghi  wrote:

> I recently submitted my R package to CRAN, and I received this note
> from the CRAN teams: "checking CRAN incoming feasibility ... NOTE."

Without a link to the full error log and, ideally, to the source code
of the package, it's impossible to help with such a NOTE, because the
check for "CRAN incoming feasibility" encompasses many tests. I was
lucky to fish your package from the archived queue and correlate it
with the publicly available logs, but it's not always this simple:

https://win-builder.r-project.org/incoming_pretest/bootsurv_0.0.0.9000_20240515_212834/

>> New submission

This is expected and will always result in a NOTE to flag the package
to a CRAN reviewer.

>> Version contains large components (0.0.0.9000)

The convention is to use version components like 9000 for pre-release,
untested versions of packages. Could you please use a version like
0.0.1 for the version of the package to be released on CRAN?

> Unknown, possibly misspelled, fields in DESCRIPTION:
>  ‘ImportFrom’ ‘Data’

'importFrom' is a NAMESPACE file directive [1]. The DESCRIPTION must
list 'Imports:' instead [2]. What did you mean by the Data:
field of your DESCRIPTION?

>> The Title field should be in title case. Current version is:
>> ‘Bootstrap Methods for complete (absence of missing values) survey
>> data’
>> In title case that is:
>> ‘Bootstrap Methods for Complete (Absence of Missing Values) Survey
>> Data’

This is yet another CRAN convention. You'll need to change the 'Title:'
field of the DESCRIPTION file.

>> * checking Rd line widths ... NOTE
>> Rd file 'boot.twostage.Rd':
>>  \examples lines wider than 100 characters:

Could you please wrap the lines of your \examples{} sections to 100
characters or less?

>> * checking examples ... [21m/21m] NOTE
>> Examples with CPU (user + system) or elapsed time > 5s
>>                        user  system  elapsed
>> boot.weights.stsrs 1242.735   0.492 1243.285
>> boot.twostage         9.799   0.048    9.847

The \examples{...} in your documentation are not only to be read by the
user. R CMD check runs them periodically on the CRAN servers. The user
should also be able to run example(boot.weights.stsrs) and see your
code directly in action. CRAN requires examples to run in 5 seconds or
less, both elapsed and CPU time. An example(...) that runs for 20
minutes is way too long. You'll need to find a way to reduce the time
spent in the example.

Wasn't something like this said in the e-mail you received from CRAN?

> When I run R CMD check on my device, I do not encounter any issues,

Even with R CMD check --as-cran?

-- 
Best regards,
Ivan

[1]
https://cran.r-project.org/doc/manuals/R-exts.html#Specifying-imports-and-exports

[2]
https://cran.r-project.org/doc/manuals/R-exts.html#The-DESCRIPTION-file



Re: [Rd] [External] R hang/bug with circular references and promises

2024-05-13 Thread Ivan Krylov via R-devel
On Mon, 13 May 2024 09:54:27 -0500 (CDT)
luke-tierney--- via R-devel  wrote:

> Looks like I added that warning 22 years ago, so that should be enough
> notice :-). I'll look into removing it now.

Dear Luke,

I've got a somewhat niche use case: as a way of protecting myself
against rogue *.rds files and vulnerabilities in the C code, I've been
manually unserializing "plain" data objects (without anything
executable), including environments, in R [1].

I see that SET_ENCLOS() is already commented as "not API and probably
should not be <...> used". Do you think there is a way to recreate an
environment, taking the REFSXP entries into account, without
`parent.env<-`?  Would you recommend to abandon the folly of
unserializing environments manually?

-- 
Best regards,
Ivan

[1]
https://codeberg.org/aitap/unserializeData/src/commit/33d72705c1ee265349b3e369874ce4b47f9cd358/R/unserialize.R#L289-L313



Re: [R-pkg-devel] An issue regarding the authors field in DESCRIPTION

2024-05-13 Thread Ivan Krylov via R-package-devel
On Mon, 13 May 2024 08:33:04 -0500
Ruwani Herath  wrote:

> This is what I entered in DESCRIPTION field.
> 
> Authors@R: c(person(given = "Ruwani", family = "Herath", role =
> c("aut","cre"), email = "ruwanirasanja...@gmail.com"),
>person(given = "Leila", family = "Amiri",  role = "ctb"),
>  person(given = "Mahmoud", family = "Torabi", role =
> "ctb"))
> 
> Authors: Ruwani Herath [aut, cre],
>   Leila Amiri [ctb],
>   Mahmoud Torabi [ctb]
> Maintainer: Ruwani Herath 

R CMD build generates the fields "Author" and "Maintainer" from the
field "Authors@R", so the easiest way forward is to delete Authors: and
Maintainer: from your DESCRIPTION. The next time you run R CMD build,
the DESCRIPTION file inside the resulting *.tar.gz file will contain
the correctly generated "Author" and "Maintainer" fields.

-- 
Best regards,
Ivan



Re: [R-pkg-devel] clang-UBSAN

2024-05-13 Thread Ivan Krylov via R-package-devel
On Sun, 12 May 2024 14:43:18 -0400
Kaifeng Lu  wrote:

> /data/gannet/ripley/R/test-clang/Rcpp/include/Rcpp/internal/caster.h:30:25:
> runtime error: nan is outside the range of representable values of
> type 'int'

On line 4618 of src/misc.cpp of the lrstat package, you have a
suspicious default parameter value:

>> const int n = NA_REAL

NA_REAL is a special kind of NaN, and C++ signed integers cannot
represent NaNs. You probably meant NA_INTEGER.
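The distinction is easy to demonstrate outside R: a NaN has no integer representation at all, while R encodes a missing integer as INT_MIN. A sketch (the NA_REAL bit pattern below follows R's documented convention of a NaN carrying the payload 1954; treat the exact constant as illustrative):

```python
import math
import struct

# R's NA_REAL: an IEEE 754 NaN whose mantissa carries the payload 1954
na_real = struct.unpack("<d", struct.pack("<Q", 0x7FF00000000007A2))[0]
print(math.isnan(na_real))  # True

# Casting a NaN to int is meaningless (undefined behaviour in C/C++);
# Python refuses it outright:
try:
    int(na_real)
    converted = True
except ValueError:
    converted = False
print(converted)  # False

# What R actually uses for a missing integer (NA_INTEGER) is INT_MIN:
print(-2**31)  # -2147483648
```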

I think that Rcpp::traits::input_parameter takes care of
asking R to cast NA_REAL to NA_INTEGER, so this shouldn't directly
cause problems, but without a link to the code and the full error
report we have to resort to forbidden OSINT techniques [1], which don't
always work reliably and may attract the wrong kind of attention on the
darknet [2].

> Is there any way to reproduce the error before submitting the package
> to CRAN?

Yes.

If you use containers, try the rocker/r-devel-ubsan-clang [3] image
that should already contain a "sanitized" build of R produced with the
clang compiler.

If that doesn't help, start with a Fedora 36 installation and follow
the description [4] to install clang and compile R from source with
sanitizers enabled. This procedure is described in more detail in WRE
4.3.4 [5].

If you start having problems using the Docker/podman image or compiling
R from source, don't hesitate to ask further questions.

-- 
Best regards,
Ivan

[1]
Such as searching your name on CRAN and GitHub.

[2]
Such as Google suggesting AI-powered results.

[3]
https://rocker-project.org/images/

[4]
https://www.stats.ox.ac.uk/pub/bdr/memtests/README.txt

[5]
https://cran.r-project.org/doc/manuals/R-exts.html#Using-Undefined-Behaviour-Sanitizer



Re: [R-pkg-devel] Fast Matrix Serialization in R?

2024-05-10 Thread Ivan Krylov via R-package-devel
On Fri, 10 May 2024 15:12:17 +1200
Simon Urbanek  wrote:

> I wonder if it may be worth doing something a bit smarter and tag
> officially a "reverse XDR" format instead - that way it would be
> well-defined and could be made the default.

Do you mean changing R so that when reading a "B\n" serialized stream,
a format code read as 0x0200 or 0x0300 would mean regular
formats 2 or 3 but byte-swapped? That would be backwards-compatible,
and we probably weren't going to have >= 65536 format versions anyway...
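Since no legitimate version number will ever reach 65536, a reader could treat an implausibly large format word as the byte-swapped flag. A sketch of that detection idea (an illustration only, not R's actual serialization code):

```python
import struct

def read_format_version(raw):
    """Return (version, was_byte_swapped) for a 4-byte format word."""
    v = struct.unpack(">I", raw)[0]   # XDR order is big-endian
    if v < 0x10000:                   # plausible version number
        return v, False
    # >= 65536 is not a version that will ever be emitted: reinterpret
    # the word as little-endian ("reverse XDR") instead.
    return struct.unpack("<I", raw)[0], True

print(read_format_version(struct.pack(">I", 3)))  # (3, False)
print(read_format_version(struct.pack("<I", 3)))  # (3, True)
```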

-- 
Best regards,
Ivan



Re: [R-pkg-devel] Overcoming CRAN's 5mb vendoring requirement

2024-05-09 Thread Ivan Krylov via R-package-devel
On Wed, 8 May 2024 16:01:23 -0400
Josiah Parry  wrote:

>- I'll see if I can get the configure.ac to make the appropriate
> Rscript call for configure.win.
>   - I think the idea of having a single `confgure.ac` file to
> generate both configure and configure.win is nice. Guidance with
> GitHub actions and ChatGPT is essentially a must for me since my bash
> is remedial at best.

Then you might like Kevin Ushey's configure
, which is like autoconf
redone in R. The only few lines of bash are the system-specific bits in
{configure,cleanup}{.win,} to run the R scripts under tools/, and they
are already written for you.

Generating two system-specific configures from one configure.ac might
be possible - GNU m4 is very versatile - but to implement that, you
would have to program m4, which is even more niche than bash.

> The requirement to avoid GitHub feels surprisingly anachronistic
> given how central it is to the vast majority of software development.

I think that Ben Bolker's answer explains it very well. Part of the
goal of the CRAN archive is to be able to take a package, a
period-appropriate version of R and install the former on the latter.
The URL carrying the code must be able to survive as long. Unlike
Zenodo, GitHub's goal is not directly to provide storage forever, and
its current owners have a reputation [*] that could have played a part
in the requirement to avoid them.

I wonder if it would be ethical to use Archive.org for this.

In an ideal world, CRAN would be able to directly archive larger
software packages (just like PyPI is currently hosting more than a
terabyte of Tensorflow builds and a few terabytes more of other
GPU-related code [**]) without requiring the maintainers to swim
between the Scylla of vendoring the dependencies and the Charybdis of
making the build depend on an external URL, but that's a luxury someone
would have to pay for.

-- 
Best regards,
Ivan

[*]
https://stat.ethz.ch/pipermail/r-package-devel/2024q2/010708.html

[**]
https://discuss.python.org/t/what-to-do-about-gpus-and-the-built-distributions-that-support-them/7125/16



Re: [R-pkg-devel] flang doesn't support derived types

2024-05-09 Thread Ivan Krylov via R-package-devel
On Thu, 09 May 2024 15:31:25 +
Othman El Hammouchi  wrote:

> Do I understand it correctly that there is no way to specify a
> Fortran standard in the SystemRequirements?

It's possible (and even recommended) to describe the Fortran version
requirement in SystemRequirements [1], but this field is for now mostly
informational. I think I remember efforts to standardise it, but they
are far from complete.

> I had resubmitted my package in the mean time with a configure script
> that aborts the install if the compiler does not support
> polymorphism, but I understand that this is a fruitless avenue for
> CRAN?

Signs point to yes, at least judging by a previous time we had
flang-related problems [2]. On the other hand, there were relatively
easy workarounds that time, and here I'm not seeing anything as simple.

> I should point out my local flang install is version 16, but I cannot
> install 18 on my system since it's in unstable (this again
> underscores the problem of developing under these constraints).

Would you consider containers for this purpose? I was able to reproduce
the problem relatively quickly by starting podman run --rm -it
debian:sid and installing flang-18 in there. (Unlike Docker a few years
ago, podman can be installed straight from the repository, at least on
Debian, and doesn't require adding users to special groups in order to
work. Maybe Docker has also improved.) I don't like containers as a
basis for software distribution, but I can't deny that they are being
great at letting me quickly reproduce problems without installing 10
different GNU/Linux distros.

> What would you advise? And don't you think these Fortran constraints
> should be better documented.

I'm afraid I don't have any more specific advice besides testing your
workarounds with Debian Sid in a container or a virtual machine or a
chroot. I can try to take a look at more concrete problems. I hope you
will be able to find a relatively painless workaround.

I do wish that flang-new would be a better compiler or at least a
better documented one, but instead of a list of features on their
website, I can only see "Getting Involved [3] for tips on how to get in
touch <...> and to learn more about the current status". There are only
so many projects one can get involved in.

-- 
Best regards,
Ivan

[1]
https://cran.r-project.org/doc/manuals/R-exts.html#Using-modern-Fortran-code

[2]
https://stat.ethz.ch/pipermail/r-package-devel/2023q4/010065.html

[3]
https://flang.llvm.org/docs/GettingInvolved.html



Re: [R-pkg-devel] flang doesn't support derived types

2024-05-09 Thread Ivan Krylov via R-package-devel
Dear Othman El Hammouchi,

Welcome to R-package-devel!

On Wed, 08 May 2024 16:52:51 +
Othman El Hammouchi  wrote:

> However, upon submission I received an automatic reply shortly
> afterwards saying the build had failed on CRAN's servers for Debian.
> The log gives the following error:
> 
> flang/lib/Lower/CallInterface.cpp:949: not yet implemented: support
> for polymorphic types

Your use of contained procedures in class(t_mack_triangle) and
class(t_cl_res) signifies the derived types as being extensible and
thus potentially polymorphic. You'll have to replace class(...) with
type(...) and move the contained procedures out of the type definitions
(and maybe additionally make the types 'sequence' or 'bind(C)' to
signify them being non-extensible) to make the code work with flang-18.
I'm afraid this will also prevent you from defining destructors for
these types.

flang-new can be a very disappointing compiler at times [*], but it's
what people do use in the real world, especially for 64-bit ARM
processors, so in order to keep our packages portable, we have to cater
to its whims.

-- 
Best regards,
Ivan

[*] https://stat.ethz.ch/pipermail/r-package-devel/2023q4/009987.html



Re: [R-pkg-devel] Overcoming CRAN's 5mb vendoring requirement

2024-05-08 Thread Ivan Krylov via R-package-devel
On Wed, 8 May 2024 14:08:36 -0400
Josiah Parry  wrote:

> With ChatGPT's ability to write autoconf, I *think *I have something
> that can work.

You don't have to write autoconf if your configure.ac is mostly a plain
shell script. You can write the configure script itself. Set the PATH
and then exec "${R_HOME}/bin/Rscript" tools/configure.R (in the
regular, non-multiarch configure for Unix-like systems) or exec
"${R_HOME}/bin${R_ARCH_BIN}/Rscript.exe" tools/configure.R (in
configure.win, which you'll also need). You've already written the rest
of the code in a language you know well: R.

Autoconf would be useful if you had system-specific dependencies with
the need to perform lots of compile tests. Those would have been a pain
to set up in R. Here you mostly need sys.which() instead of
AC_CHECK_PROGS and command -v.

> The configure file runs tools/get-deps.R which will download the
> dependencies from the repo if available and verify the checksums.

One of the pain points is the need for a strong, cryptographically
secure hash. MD5 is, unfortunately, no longer such a hash. In a cmake
build, you would be able to use cmake's built in strong hashes (such as
SHA-2 or SHA-3). The CRAN policy doesn't explicitly forbid MD5; it only
requires a "checksum". If you figure out a way to use a strong hash
from tools/configure.R for the downloaded tarball, please let us know.
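For reference, the verification itself is trivial once a strong hash is available; the hard part is getting one from base R, whose tools package only ships md5sum(). Sketched here in Python's standard hashlib to show what the configure step needs to accomplish:

```python
import hashlib

def tarball_matches(path, expected_sha256_hex):
    """Stream the downloaded tarball through SHA-256 and compare digests."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 64 KiB chunks so large vendor tarballs stay cheap on memory.
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_sha256_hex
```

The expected digest would be hard-coded in the package next to the download URL, so that a tampered or truncated download aborts the build.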

> If the checksums don't match, an error is thrown, otherwise it can
> continue. I believe this meets the requirements of CRAN?

The other important CRAN requirement is to store the vendor tarball
somewhere as permanent as CRAN itself (see the caveats at the bottom of
https://cran.r-project.org/web/packages/using_rust.html), that is, not
GitHub. I think that Zenodo counts as a sufficiently reliable store.

-- 
Best regards,
Ivan



Re: [R-pkg-devel] package removed from CRAN

2024-05-08 Thread Ivan Krylov via R-package-devel
On Wed, 8 May 2024 17:30:46 +0200
"Jose V. Die Ramon"  wrote:

> Could anyone please help me understand the reasons behind this, or
> suggest any steps I should take to resolve it?

Here's what I could find in
https://cran.r-project.org/src/contrib/PACKAGES.in:

>> X-CRAN-Comment: Archived on 2024-04-30 for policy violation.
>>  .
>>  On Internet access.  Also other errors.

So Avi is right, this is about the tests and/or examples failing
(possibly due to problems on the remote server).

If possible, try to emit errors with a special class set for
Internet-related errors. This will make it possible for your examples
and tests to catch them, as in:

tests/*.R:

tryCatch(
  <your code that accesses the Internet>,
 refseqR_internet_error = function(e)
  message("Caught Internet-related error")
)
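On the signalling side, the package would wrap its network failures in a classed condition, e.g. stop(errorCondition(msg, class = "refseqR_internet_error")). The same pattern in exception-based languages is an exception subclass; a Python analogy (names are illustrative):

```python
class InternetError(RuntimeError):
    """Plays the role of an R condition with a package-specific class."""

def fetch_records():
    # Stand-in for a download that failed; in a real package this would
    # wrap the underlying network error and re-raise it with our class.
    raise InternetError("could not reach the remote server")

try:
    fetch_records()
except InternetError as err:
    print("Caught Internet-related error:", err)
```

Because the class is specific to the package, tests catch only the expected network failures and still crash loudly on genuine bugs.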

-- 
Best regards,
Ivan



Re: [R-pkg-devel] Cannot repro failing CRAN autochecks

2024-05-07 Thread Ivan Krylov via R-package-devel
On Tue, 7 May 2024 21:40:31 +0300
Ivan Krylov via R-package-devel  wrote:

> It's too late for Makevars to exclude files from the source package
> tarball. Use .Rbuildignore instead:

Sorry, that was mostly misguided. .Rbuildignore won't help with the
contents of the Rust vendor tarball.

1. Can you omit the .cff file from src/rust/vendor.tar.xz when building
it?

2. I think that there is --exclude in both GNU tar and BSD tar. How
about tar --exclude="*.cff" -x -f rust/vendor.tar.xz ?

3. From
<https://win-builder.r-project.org/incoming_pretest/arcgisutils_0.3.0_20240507_194020/Debian/00install.out>,
it can be seen that the "clean" target does not get called. Can you
remove the *.cff file in the same Make target?

-- 
Best regards,
Ivan



Re: [R-pkg-devel] Cannot repro failing CRAN autochecks

2024-05-07 Thread Ivan Krylov via R-package-devel
On Tue, 7 May 2024 14:03:42 -0400
Josiah Parry  wrote:

> This NOTE does not appear in Ubuntu, Mac, or Windows checks
> https://github.com/R-ArcGIS/arcgisutils/actions/runs/8989812276/job/24693685840

That's a bit strange. It fires for me in a local R CMD check for a test
package even without --as-cran. The code performing the check has been
in R since ~2010.

> I've made an edit to the Makevars to specifically remove this
> directory, but it seems to continue to persist.

It's too late for Makevars to exclude files from the source package
tarball. Use .Rbuildignore instead:
https://cran.r-project.org/doc/manuals/R-exts.html#Building-binary-packages

I think that the line src/vendor/chrono/CITATION\\.cff will prevent the
file from appearing in the package tarball.

-- 
Best regards,
Ivan



Re: [R-pkg-devel] Trouble with dependencies on phyloseq and microViz

2024-05-07 Thread Ivan Krylov via R-package-devel
On Tue, 7 May 2024 10:07:59 +1200
Simon Urbanek  wrote:

> That doesn't work - additional repositories are not allowed on CRAN
> other than in very exceptional cases, because it means the package
> cannot be installed by users making it somewhat pointless.

I suppose that with(tools::CRAN_package_db(),
sum(!is.na(Additional_repositories)) / length(Additional_repositories))
= 0.7% does make it very rare. But not even for a weak dependency? Is
it for data packages only, as seems to be the focus of
[10.32614/RJ-2017-026]? The current wording of the CRAN policy makes it
sound like Additional_repositories is preferred to explaining the
non-mainstream weak dependencies in Description.

So what should be done about the non-Bioconductor weak dependency
microViz?

> As for the OP, can you post the name of the package and/or the link
> to the errors so I can have a look?

Sharon has since got rid of the WARNING and now only has NOTEs due to
microViz and a URL to its repo in the Description:
https://win-builder.r-project.org/incoming_pretest/HybridMicrobiomes_0.1.2_20240504_185748/Debian/00check.log

If Additional_repositories: is the correct way to specify a
non-mainstream weak dependency for a CRAN package, the URL must be
specified as https://david-barnett.r-universe.dev/src/contrib, not just
https://david-barnett.r-universe.dev/. I am sorry for not getting it
right the first time.

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] Trouble with dependencies on phyloseq and microViz

2024-05-04 Thread Ivan Krylov via R-package-devel
On Sat, 4 May 2024 15:53:25 +
Sharon Bewick  wrote:

> I have a dependency on phyloseq, which is available through GitHub
> but not on the CRAN site. I have a similar problem with microViz,
> however I’ve managed to make it suggested, rather than required.
> There is no way to get around the phyloseq requirement. How do I fix
> this problem so that I can upload my package to the CRAN website?

Did a human reviewer tell you to get rid of the dependencies? There are
at least 444 packages on CRAN with strong dependencies on Bioconductor
packages, so your phyloseq dependency should work. In fact, 14 of them
depend on phyloseq.

What you need is an Additional_repositories field in your DESCRIPTION
specifying the source package repository where microViz could be
installed from. I think that

Additional_repositories: https://david-barnett.r-universe.dev

...should work.

Besides that, you'll need to increment the version and list the *.Rproj
file in .Rbuildignore:
https://win-builder.r-project.org/incoming_pretest/HybridMicrobiomes_0.1.1_20240504_173331/Debian/00check.log

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] Urgent Review of R Packages in Light of Recent RDS Exploit

2024-05-04 Thread Ivan Krylov via R-package-devel
On Sat, 4 May 2024 08:09:28 +0200
Maciej Nasinski  wrote:

> What do you think about promoting containers?

Containers have an attack surface too, have user experience problems
(how's Docker on Windows?) and may bring in more third-party code than
what you're trying to protect against (whole operating system images!).
Even Firejail and Bubblewrap, containers specifically designed to
sandbox untrusted code, have bugs in their setup or implementation
every now and then.

Still, you are welcome to run third-party code in a virtual machine or
a container. It may be not everyone's favourite trade-off, but is a net
increase in security over running untrusted code directly. Feel free to
search for a point on the Pareto optimal line between security and
convenience that you'll be comfortable with: https://xkcd.com/2044/

> Nowadays, containers are more accessible, with GitHub codespaces
> being more affordable (mostly free for students and the educational
> sector).

The GitHub-isation of the development process is kind of a
vulnerability too, or at the very least has a cost. I'm a few
handshakes away from several people who have been disappeared from
GitHub and couldn't get their accounts back. Microsoft is too big to
have real tech support [*], so once you fall foul of their AI
moderation systems, you'll have to be a Hacker News celebrity to
attract attention of a human on the inside.

I've got an ageing ThinkPad that I cannot afford to replace. It can
process all the data I've been gathering during my PhD and then some,
least squares, inverse problems, you name it, all while playing music
and having Quake I open. But the moment I try to launch Codespaces, it
downloads more bytes of JavaScript than the whole Quake I installation
takes in size, and then the browser overheats the laptop.

Maybe programming other people's computers in the browser is the
future, but then you need a fancy laptop and maybe a friend in
Microsoft just to be admitted into that future. A solution for some,
but not for all.

-- 
Best regards,
Ivan

[*] https://danluu.com/diseconomies-scale/

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] Urgent Review of R Packages in Light of Recent RDS Exploit

2024-05-03 Thread Ivan Krylov via R-package-devel
On Fri, 3 May 2024 18:17:52 +0200
Maciej Nasinski  wrote:

> I found the https://github.com/hrbrmstr/rdaradar solution and ran it
> on the 100 most downloaded R packages.
> Happily, all data/inst rda files are safe/non-exposed to RDS exploit
> (using the linked solution).

This is a bit useful - knowing that there are no obvious exploits in
the 100 most downloaded CRAN packages is better than not knowing that - 
but it is important to keep the big picture in mind. Bob himself said
that the script is "super basic". Currently, it only checks whether an
*.rda file, when loaded in the global environment, would shadow certain
important functions. This is not an attack a package author would
perform; this is something one would send directly to the victim.

In order to defeat an attacker, you must think like an attacker.

Here's someone jokingly describing how they would trojan the world's
online shop checkout systems if they wanted to commit financial crimes:
https://archive.ph/FCdBu
(With kindness and pull requests.)

Here's someone spending two years to plant a fake maintainer with a
backdoor in a key free software project:
https://lwn.net/Articles/967192/
(The backdoor was assembled from obfuscated "test files for the
decompressor".)

Here's the 2015 Underhanded C Contest, where people competed in writing
the most harmless-looking code that would instead do something
nefarious: http://www.underhanded-c.org/

On the one hand, hiding the bad functions in a data file (which is
compressed and binary) instead of the R files (which are plain text and
indexed everywhere) would be the obvious first step, so it may be
useful to flag data files with functions in them for human review.
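
To illustrate (my own sketch, not part of Bob's script), a recursive scan
of a loaded object can flag the most obvious case: a closure tucked inside
nested lists. It deliberately ignores attributes, environments and S4
slots, which a real review would also have to walk:

```r
# Flag R objects that carry functions inside nested lists.
# Limitation: does not descend into attributes, environments or slots.
has_function <- function(x) {
  if (is.function(x)) return(TRUE)
  if (is.list(x)) return(any(vapply(x, has_function, logical(1))))
  FALSE
}
has_function(list(1, list(a = mean)))
# [1] TRUE
```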

On the other hand, an evil package author has so many tools at their
disposal that they may not need this one in particular. There are CRAN
packages with tens of megabytes of compiled code inside. Sneaking a
little extra something in a file starting with "// This is generated
grammar parser. Do not edit!" followed by an impenetrable wall of C
could be easier and stay undetected for longer. How many packages use
Java? You don't even have to ship the Java source together with an R
package, so one of your *.jars could have a poisoned dependency with
nobody being the wiser.

Attackers are very cunning, and we don't even know what exactly we are
looking for. We can automate some of it, but the kind of code review
that will spot an evil function tucked 50 layers inside a giant
auxiliary data object is a lot of effort, hours to days per package.

> It will be great to run it on all CRAN packages, but I imagine we
> should be sure that the check is decent enough to not overload the
> servers without a need.

This probably counts as creating an unofficial CRAN mirror:
https://cran.r-project.org/mirror-howto.html

(I remember someone sending too many requests to download packages one
by one and losing access from a university address to CRAN as a result.)

You'll need 12.7 Gb for the current versions of the packages or >400 Gb
for the whole archive.

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] Urgent Review of R Packages in Light of Recent RDS Exploit

2024-05-03 Thread Ivan Krylov via R-package-devel
Dear Maciej Nasinski,

On Fri, 3 May 2024 11:37:57 +0200
Maciej Nasinski  wrote:

> I believe we must conduct a comprehensive review of all existing CRAN
> packages.

Why now? R packages are already code. You don't need poisoned RDS files
to wreak havoc using an R package.

On the other hand, R data files contain R objects, which contain code.
You don't need exploits to smuggle code inside an R object.

> Additionally, I will expect an introduction of an additional
> step in the R CMD check process.

What exactly would you like this step to be?

> It is stated that R Team is aware of
> that, and the exploit is fixed in R 4.4.0, but I can not find any
> clear bullet point in the NEWS file for 4.4.0
> (https://cran.r-project.org/doc/manuals/r-release/NEWS.html).

This has recently been discussed in the R-help thread:
https://stat.ethz.ch/pipermail/r-help/2024-May/479287.html

> I look forward to your thoughts and collaborating closely on this
> urgent review.

It may be worth teaching people that in general, R data files should be
as trusted as R code.

It may also be worth setting aside a strict subset of the R data format
to carry data only, without any executable code [*], but it may turn
out to be much less useful than it sounds. For example, you won't be
able to save many kinds of model objects using this plain data format,
which makes it unrealistic to require plain data only inside data files
in CRAN packages.

An independent review of the whole >2 packages on CRAN for
malicious behaviour is a noble endeavour, but it will require people
and funding. Perhaps you could try to apply for an R Consortium
infrastructure grant to do that.

-- 
Best regards,
Ivan

[*] https://aitap.github.io/2024/05/02/unserialize.html#subset

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] Extending proj with proj.line3d methods and overloading the methods

2024-04-28 Thread Ivan Krylov via R-package-devel
В Sun, 28 Apr 2024 15:15:06 +
Leo Mada  пишет:

> This is why I intended to define a new method "proj.line3d" and
> overload this method. But it seems that R interprets "line3d.numeric"
> as a class - which originates probably from the "data,frame" class.

It may help to call the original 'proj' function and your new
'proj.line3d' function "generics", because that's what most S3
literature calls these functions that you overload. This separates them
from the "methods" 'proj.line3d.numeric' and 'proj.line3d.matrix' that
can be said to "implement" or "overload" the generic.

A concise but very readable guide to S3 and other built-in OOP systems
in R can be found in Advanced R by Hadley Wickham:
http://adv-r.had.co.nz/OO-essentials.html#s3

> How can I define a real method "proj.line3d"?

In order to export an S3 generic and register methods for it from a
package, you need the following directives in your NAMESPACE:

export(proj.line3d)
S3method(proj.line3d, numeric) # will use function proj.line3d.numeric
S3method(proj.line3d, matrix) # similar



> There might be some limitations from Roxygen as well (as I use it for
> the package); but it might be easier to proceed, once I understand
> how to do it in R.

The roxygen2 documentation says that if there are multiple dots in the
name of a function, you need to use the two-argument form of the
@method keyword: @method proj.line3d numeric (untested).
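
If it helps, the roxygen block might look like this (untested; the title
line is a placeholder and the method body is elided):

```r
#' Project a point onto a 3D line
#' @method proj.line3d numeric
#' @export
proj.line3d.numeric <- function(p, x, y, z, ...) {
  # ... method body as in the package ...
}
```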


> I thought that this solves the problem:
> proj.line3d <- function(p, x, y, z, ...)
>   UseMethod("proj.line3d")

Right. This is the definition of an S3 generic in R. As long as
all the methods also accept the arguments (p, x, y, z, ...), all will be fine.

> The other solution, as you pointed out, is more cumbersome; and it
> needs 2 separate classes, so I would need to define "proj" as an S4
> class (as S3 does not handle 2 classes at once).

Moreover, you still need an exported generic proj.line3d and registered
methods for it to work. Inheritance does work in S3 (see NextMethod()),
but it alone won't help you call proj.line3d.numeric() from proj() and
'numeric' x.

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [Rd] max on numeric_version with long components

2024-04-27 Thread Ivan Krylov via R-devel
On Sat, 27 Apr 2024 13:56:58 -0500
Jonathan Keane  wrote:

> In devel:
> > max(numeric_version(c("1.0.1.1", "1.0.3.1",  
> "1.0.2.1")))
> [1] ‘1.0.1.1’
> > max(numeric_version(c("1.0.1.1000", "1.0.3.1000",  
> "1.0.2.1000")))
> [1] ‘1.0.3.1000’

Thank you Jon for spotting this!

This is an unintended consequence of
https://bugs.r-project.org/show_bug.cgi?id=18697.

The old behaviour of max() was to call
which.max(xtfrm(x)), which first produced a permutation that sorted the
entire .encode_numeric_version(x). The new behaviour is to call
which.max directly on .encode_numeric_version(x), which is faster (only
O(length(x)) instead of a sort).

What do the encoded version strings look like?

x <- numeric_version(c(
 "1.0.1.1", "1.0.3.1", "1.0.2.1"
))
# Ignore the attributes
(e <- as.vector(.encode_numeric_version(x)))
# [1] "101575360400"
# [2] "103575360400"
# [3] "102575360400"

# order(), xtfrm(), sort() all agree that e[2] is the maximum:
order(e)
# [1] 1 3 2
xtfrm(e)
# [1] 1 3 2
sort(e)
# [1] "101575360400"
# [2] "102575360400"
# [3] "103575360400"

# but not which.max:
which.max(e)
# [1] 1

This happens because which.max() converts its argument to double, which
loses precision:

(n <- as.numeric(e))
# [1] 1e+27 1e+27 1e+27
identical(n[1], n[2])
# [1] TRUE
identical(n[3], n[2])
# [1] TRUE

Will be curious to know if there is a clever way to keep both the O(N)
complexity and the full arbitrary precision.
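
One possible sketch (my own illustration, not something R has adopted):
since .encode_numeric_version() pads every component to a fixed width,
lexicographic comparison of the encoded strings preserves the version
order, so a single linear scan keeps O(length(x)) without ever converting
to double:

```r
# Linear scan over the encoded version strings; the fixed-width encoding
# means string comparison of ASCII digits agrees with the numeric order.
e <- c("101575360400", "103575360400", "102575360400")
imax <- 1L
for (i in seq_along(e)) {
  if (e[i] > e[imax]) imax <- i
}
imax
# [1] 2
```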

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [R-pkg-devel] Extending proj with proj.line3d methods and overloading the methods

2024-04-27 Thread Ivan Krylov via R-package-devel
On 27 April 2024 00:49:47 GMT+03:00, Leo Mada via R-package-devel  wrote:
>Dear List-Members,
>
>I try to implement a proj.line3d method and to overload this method as follows:
>
>proj.line3d <- function(p, x, y, z, ...)
>  UseMethod("proj.line3d")
>
>proj.line3d.numeric = function(p, x, y, z, ...) {
>  # ...
>}
>
>proj.line3d.matrix = function(p, x, y, z, ...) {
>  # ...
>}

>p = c(1,2,3)
>line = matrix(c(0,5,2,3,1,4), 2)
>proj.line3d(p, line)
>#  Error in UseMethod("proj.line3d") :
>#   no applicable method for 'proj.line3d' applied to an object of class 
>"c('double', 'numeric')"

>methods(proj)
># [1] proj.aov*   proj.aovlist*   proj.default*   proj.line3d
># [5] proj.line3d.matrix  proj.line3d.numeric proj.lm

In your NAMESPACE, you've registered methods for the generic function 'proj', 
classes 'line3d.matrix' and 'line3d.numeric', but above you are calling a 
different generic, 'proj.line3d', for which no methods are registered.

For proj.line3d(, ) to work, you'll have to register the 
methods for the proj.line3d generic. If you need a visible connection to the 
proj() generic, you can try registering a method on the 'proj' generic, class 
'line3d' *and* creating a class 'line3d' that would wrap your vectors and 
matrices:

proj(line3d(p), line) -> call lands in proj.line3d -> maybe additional dispatch 
on the remaining classes of 'p'?

This seems to work, but I haven't tested it extensively:

> proj.line3d <- \(x, ...) UseMethod('proj.line3d')
> proj.line3d.numeric <- \(x, ...) { message('proj.line3d.numeric'); x }
> line3d <- \(x) structure(x, class = c('line3d', class(x)))
> proj(line3d(pi))
proj.line3d.numeric
[1] 3.141593
attr(,"class")
[1] "line3d"  "numeric"

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [Rd] R 4.4.0 has version of Matrix 1.7-0, but it's not available on CRAN

2024-04-26 Thread Ivan Krylov via R-devel
On Fri, 26 Apr 2024 13:15:47 +0200
Gábor Csárdi  wrote:

> That's not how this worked in the past AFAIR. Simply, the packages in
> the x.y.z/Recommended directories were included in
> src/contrib/PACKAGES*, metadata, with the correct R version
> dependencies, in the correct order, so that `install.packages()`
> automatically installed the correct version without having to add
> extra repositories or manually search for package files.

That's great, then there is no need to patch anything. Thanks for
letting me know.

Should we be asking c...@r-project.org to add 4.4.0/Recommended to the
index, then?

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] R 4.4.0 has version of Matrix 1.7-0, but it's not available on CRAN

2024-04-26 Thread Ivan Krylov via R-devel
On Fri, 26 Apr 2024 12:32:59 +0200
Martin Maechler  wrote:

> Finally, I'd think it definitely would be nice for
> install.packages("Matrix") to automatically get the correct
> Matrix version from CRAN ... so we (R-core) would be grateful
> for a patch to install.packages() to achieve this

Since the binaries offered on CRAN are already of the correct version
(1.7-0 for -release and -devel), only source package installation needs
to concern itself with the Recommended subdirectory.

Would it be possible to generate the PACKAGES* index files in the
4.4.0/Recommended subdirectory? Then on the R side it would be needed
to add a new repo (adjusting chooseCRANmirror() to set it together with
repos["CRAN"]) and keep the rest of the machinery intact.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [R-pkg-devel] Some, but not all vignettes compressed

2024-04-25 Thread Ivan Krylov via R-package-devel
On Thu, 25 Apr 2024 11:54:49 -0700
Bryan Hanson  wrote:

> So my version of gs blows things up!

The relatively good news is that GhostScript is not solely to blame. A
fresh build of "GPL Ghostscript 10.03.0 (2024-03-06)" was able to
reduce the files to 16..70% of their original size on my computer. But
I just typed ./configure && make and relied on the dependencies already
present on my system.

We can try to compare the build settings (which will involve compiling
things by hand) or ask the Homebrew people [*] (and they will probably
ask for a PDF file and a specific command line that works on some
builds of gs-10.03.0 but not with Homebrew).

What would you rather do?

qpdf, on the other hand, results in no size reduction (99.7% or worse),
just like on your system.

-- 
Best regards,
Ivan

[*]
https://docs.brew.sh/Troubleshooting
https://github.com/Homebrew/homebrew-core/issues?q=ghostscript

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] Some, but not all vignettes compressed

2024-04-25 Thread Ivan Krylov via R-package-devel
On Thu, 25 Apr 2024 08:54:41 -0700
Bryan Hanson  wrote:

>   'gs+qpdf' made some significant size reductions:
>  compacted 'Vig_02_Conceptual_Intro_PCA.pdf' from 432Kb to 143Kb
>  compacted 'Vig_03_Step_By_Step_PCA.pdf' from 414Kb to 101Kb
>  compacted 'Vig_04_Scores_Loadings.pdf' from 334Kb to 78Kb
>  compacted 'Vig_06_Math_Behind_PCA.pdf' from 558Kb to 147Kb
>  compacted 'Vig_07_Functions_PCA.pdf' from 381Kb to 90Kb

I'm getting similar (but not same) results on Debian Stable, gs 10.00.0
& qpdf 11.3.0:

# R CMD build --no-resave-data --compact-vignettes=both
compacted ‘Vig_01_Start_Here.pdf’ from 244Kb to 45Kb   
compacted ‘Vig_02_Conceptual_Intro_PCA.pdf’ from 432Kb to 143Kb
compacted ‘Vig_03_Step_By_Step_PCA.pdf’ from 411Kb to 100Kb
compacted ‘Vig_04_Scores_Loadings.pdf’ from 335Kb to 78Kb  
compacted ‘Vig_05_Visualizing_PCA_3D.pdf’ from 679Kb to 478Kb  
compacted ‘Vig_06_Math_Behind_PCA.pdf’ from 556Kb to 145Kb 
compacted ‘Vig_07_Functions_PCA.pdf’ from 378Kb to 89Kb
compacted ‘Vig_08_Notes.pdf’ from 239Kb to 39Kb

 
> - doc/Vig_01_Start_Here.pdf:gs: res=0;  + qpdf: res=0; 
> ==> (new=49942)/(old=45101) = 1.10734 .. not worth using  
> - doc/Vig_02_Conceptual_Intro_PCA.pdf:gs: res=0;  + qpdf: res=0; 
> ==> (new=1.00061e+07)/(old=442210) = 22.6275 .. not worth using  
> - doc/Vig_03_Step_By_Step_PCA.pdf:gs: res=0;  + qpdf: res=0; 
> ==> (new=5.763e+06)/(old=423484) = 13.6085 .. not worth using  
> - doc/Vig_04_Scores_Loadings.pdf:gs: res=0;  + qpdf: res=0; 
> ==> (new=5.41409e+06)/(old=341680) = 15.8455 .. not worth using  
> - doc/Vig_05_Visualizing_PCA_3D.pdf:gs: res=0;  + qpdf: res=0; 
> ==> (new=1.23622e+07)/(old=692901) = 17.8412 .. not worth using  
> - doc/Vig_06_Math_Behind_PCA.pdf:gs: res=0;  + qpdf: res=0; 
> ==> (new=816690)/(old=571493) = 1.42905 .. not worth using  
> - doc/Vig_07_Functions_PCA.pdf:gs: res=0;  + qpdf: res=0; 
> ==> (new=1.36419e+06)/(old=389478) = 3.50262 .. not worth using  
> - doc/Vig_08_Notes.pdf:gs: res=0;  + qpdf: res=0; 
> ==> (new=40919)/(old=38953) = 1.05047 .. not worth using  

Thank you for providing this data! Somehow, instead of compacting the
PDFs, one of the tools manages to blow them up in size, as much as ~23
times.

Can you try tools::compactPDF() separately with gs_quality = 'none'
(isolating qpdf) and with qpdf = '' (isolating GhostScript)?

If the culprit turns out to be GhostScript, it may be due to their
rewritten PDF rendering engine (now in C instead of PostScript with
special extensions) not being up to par when the PDF file needs to be
compressed. If it turns out to be qpdf, we might have to extract the
exact command lines and compare results further.
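
Concretely, the two isolating runs could look like this (a sketch; "doc"
stands for the directory holding the built vignette PDFs):

```r
library(tools)
# qpdf only: gs_quality = "none" skips the GhostScript pass
compactPDF("doc", gs_quality = "none")
# GhostScript only: an empty qpdf path disables the qpdf pass
compactPDF("doc", qpdf = "", gs_quality = "ebook")
```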

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [Rd] Big speedup in install.packages() by re-using connections

2024-04-25 Thread Ivan Krylov via R-devel
On Thu, 25 Apr 2024 14:45:04 +0200
Jeroen Ooms  wrote:

> Thoughts?

How verboten would it be to create an empty external pointer object,
add it to the preserved list, and set an on-exit finalizer to clean up
the curl multi-handle? As far as I can tell, the internet module is not
supposed to be unloaded, so this would not introduce an opportunity to
jump to an unmapped address. This makes it possible to avoid adding a
CurlCleanup() function to the internet module:

Index: src/modules/internet/libcurl.c
===
--- src/modules/internet/libcurl.c  (revision 86484)
+++ src/modules/internet/libcurl.c  (working copy)
@@ -55,6 +55,47 @@
 
 static int current_timeout = 0;
 
+// The multi-handle is shared between downloads for reusing connections
+static CURLM *shared_mhnd = NULL;
+static SEXP mhnd_sentinel = NULL;
+
+static void cleanup_mhnd(SEXP ignored)
+{
+if(shared_mhnd){
+curl_multi_cleanup(shared_mhnd);
+shared_mhnd = NULL;
+}
+curl_global_cleanup();
+}
+static void rollback_mhnd_sentinel(void* sentinel) {
+// Failed to allocate memory while registering a finalizer,
+// therefore must release the object
+R_ReleaseObject((SEXP)sentinel);
+}
+static CURLM *get_mhnd(void)
+{
+if (!mhnd_sentinel) {
+  SEXP sentinel = PROTECT(R_MakeExternalPtr(NULL, R_NilValue, R_NilValue));
+  R_PreserveObject(sentinel);
+  UNPROTECT(1);
+  // Avoid leaking the sentinel before setting the finalizer
+  RCNTXT cntxt;
+  begincontext(&cntxt, CTXT_CCODE, R_NilValue, R_BaseEnv, R_BaseEnv,
+   R_NilValue, R_NilValue);
+  cntxt.cend = &rollback_mhnd_sentinel;
+  cntxt.cenddata = sentinel;
+  R_RegisterCFinalizerEx(sentinel, cleanup_mhnd, TRUE);
+  // Succeeded, no need to clean up if endcontext() fails allocation
+  mhnd_sentinel = sentinel;
+  cntxt.cend = NULL;
+  endcontext(&cntxt);
+}
+if(!shared_mhnd) {
+  shared_mhnd = curl_multi_init();
+}
+return shared_mhnd;
+}
+
# if LIBCURL_VERSION_MAJOR < 7 || (LIBCURL_VERSION_MAJOR == 7 && LIBCURL_VERSION_MINOR < 28)
 
 // curl/curl.h includes  and headers it requires.
@@ -565,8 +606,6 @@
if (c->hnd && c->hnd[i])
curl_easy_cleanup(c->hnd[i]);
 }
-if (c->mhnd)
-   curl_multi_cleanup(c->mhnd);
 if (c->headers)
curl_slist_free_all(c->headers);
 
@@ -668,7 +707,7 @@
c.headers = headers = tmp;
 }
 
-CURLM *mhnd = curl_multi_init();
+CURLM *mhnd = get_mhnd();
 if (!mhnd)
error(_("could not create curl handle"));
 c.mhnd = mhnd;


-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [External] Re: Is ALTREP "non-API"?

2024-04-25 Thread Ivan Krylov via R-devel
On Wed, 24 Apr 2024 15:31:39 -0500 (CDT)
luke-tierney--- via R-devel  wrote:

> We would be better off (in my view, not necessarily shared by others
> in R-core) if we could get to a point where:
> 
>  all entry points listed in installed header files can be used in
>  packages, at least with some caveats;
> 
>  the caveats are expressed in a standard way that is searchable,
>  e.g. with a standardized comment syntax at the header file or
>  individual declaration level.

This sounds almost like Doxygen, although the exact syntax used to
denote the entry points and the necessary comments is far from the most
important detail at this point.

> There are some 500 entry points in the R shared library that are in
> the installed headers but not mentioned in WRE. These would need to
> be reviewed and adjusted.

Is there a way for outsiders to help? For example, would it help to
produce the linking graph (package P links to entry points X, Y)? I
understand that an entry point being unpopular doesn't mean it
shouldn't be public (and the other way around), but combined with a
list of entry points that are listed in WRE, such a graph could be
useful to direct effort or estimate impact from interface changes.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] View() segfaulting ...

2024-04-25 Thread Ivan Krylov via R-devel
On Wed, 24 Apr 2024 19:35:42 -0400
Ben Bolker  wrote:

>  I'm using bleeding-edge R-devel, so maybe my build is weird. Can 
> anyone else reproduce this?
> 
>View() seems to crash on just about anything.

Not for me, sorry.

If you have a sufficiently new processor, you can use `rr` [*] to
capture the crash, set a breakpoint in in_R_X11_dataviewer and rewind,
then set a watchpoint on the stack canary and run the program forward
again:
https://www.redhat.com/en/blog/debugging-stack-protector-failures

If you can't locate the canary, try setting watchpoints on large local
variables. Without `rr`, the procedure is probably the same, but
without rewinding: set a breakpoint in in_R_X11_dataviewer, set some
watchpoints, see if they fire when they shouldn't, start from scratch
if you get past the watchpoints and the process crashes.
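
Roughly, the non-rr session described above would go like this (a sketch;
the watched variable is a placeholder you would pick from the locals of
in_R_X11_dataviewer):

```
$ R -d gdb
(gdb) break in_R_X11_dataviewer
(gdb) run
> View(mtcars)              # reach the breakpoint
(gdb) info locals           # look for large buffers / the canary
(gdb) watch -l some_buffer  # placeholder name for a suspect local
(gdb) continue              # a watchpoint hit before the crash shows the culprit
```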

I think that that either an object file didn't get rebuilt when it
should have, or a shared library used by something downstream from
View() got an ABI-breaking update. If this still reproduces with a clean
rebuild of R, it's definitely worth investigating further, perhaps using
AddressSanitizer. Valgrind may be lacking the information about the
stack canary and thus failing to distinguish between overwriting the
canary and normal access to a stack variable via a pointer.

-- 
Best regards,
Ivan

[*] https://rr-project.org/
Edit distance of one from the domain name of the R project!

Use rr replay -g $EVENT_NUMBER to debug past the initial execve()
from the shell wrapper: https://github.com/rr-debugger/rr/wiki/FAQ

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [R-pkg-devel] [External] Re: Package submission to CRAN not passing incoming checks

2024-04-24 Thread Ivan Krylov via R-package-devel
On Wed, 24 Apr 2024 00:17:28 +
"Petersen, Isaac T"  wrote:

> I included the packages (including the raw package folders and their
> .tar.gz files) in the /inst/extdata folder.

Would you prefer your test to install them from the source directories
(as you currently do, in which case the *.tar.gz files can be omitted)
or the *.tar.gz files (in which case you can set the `repos` argument
to a file:/// URI and omit the package directories and the setwd()
calls)?

I think (but haven't tested) that the two problems that are currently
breaking your test are with .libPaths() and setwd().

.libPaths(temp_lib) overwrites the library paths with `temp_lib` and
the system libraries, the ones in %PROGRAMFILES%\R\R-*\library. In
particular, this removes %LOCALAPPDATA%\R\win-library\* from the list
of library paths, so the packages installed by the user (including
'waldo', which is needed by 'testthat') stop being available.

In order to add temp_lib to the list of the paths, use
.libPaths(c(temp_lib, .libPaths())).

Since setwd() returns the previous directory, one that was current
before setwd() was called, the code newpath <- setwd(filepath);
setwd(newpath) will keep the current directory, not set it to
`filepath`. Use oldpath <- setwd(filepath) instead.

Since you're already using 'testthat' and it already depends on
'withr', you may find it easier to use withr::local_dir(...) and
withr::local_temp_libpaths(...).
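
In a testthat test, the 'withr' variants of both fixes might look like
this (a sketch; 'filepath' is the directory from your existing test):

```r
test_that("load_or_install() installs into a scratch library", {
  withr::local_temp_libpaths()  # fresh library first on .libPaths(),
                                # restored when the test exits
  withr::local_dir(filepath)    # a setwd() that undoes itself
  # ... install and check the test packages here ...
})
```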

In order to test for a package being attached by load_or_install() (and
not just installed and loadable), check for 'package:testpackage1'
being present in the return value of search(). (This check is good
enough and much easier to write than comparing environments on the
search path with the package exports or comparing searchpaths() with
the paths under the temporary library.)

Finally, I think that there is no need for the test_load_or_install()
call because I don't see the function being defined anywhere. Doesn't
test_that(...) run the tests by itself?

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] Package submission to CRAN not passing incoming checks

2024-04-23 Thread Ivan Krylov via R-package-devel
Dear Isaac,

On Mon, 22 Apr 2024 17:00:27 +
"Petersen, Isaac T"  wrote:

> This my first post--I read the posting guidelines, but my apologies
> in advance if I make a mistake.

Welcome to R-package-devel! You're doing just fine.

> 1) The first note <...> includes the contents of the LICENSE file

It's multiple NOTEs in a trench coat. Kasper has addressed the "large
version components" and the DOIs interpreted as file URIs, but there's
one more.

The ' + file LICENSE' syntax has two uses: (1)
for when the terms of the license is a template, requiring the author
of the software to substitute some information (e.g. the year and the
copyright holder for MIT) and (2) for when a package puts additional
restrictions on the base license.

(Hmm. Only case (2) is currently described at
; case
(1) is only described inside the license files.)

The CRAN team has expressed a preference for the package authors not to
put 2 twisty little copies of standard licenses, all slightly
different, inside their packages. Since you're not restricting CC BY
4.0, it's enough to say 'License: CC BY 4.0'. If you'd like a full copy
of the license text in your source code repository, that's fine, but
you'll need to list the file in .Rbuildignore:
https://cran.r-project.org/doc/manuals/R-exts.html#Building-package-tarballs

Speaking of the Creative Commons license: the choice of a license for
your code is obviously yours, but Creative Commons themselves recommend
against using their licenses for software:
.
I can't recommend you a license - that would be politically motivated
meddling in foreign affairs - but the lists linked by the CC FAQ and
Writing R Extensions section 1.1.2 should provide a good starting point.

> Here are the results from win-builder:
> https://win-builder.r-project.org/incoming_pretest/petersenlab_0.1.2-9033_20240415_212322/

There is one more NOTE:

>> * checking examples ... [437s/438s] NOTE
>> Examples with CPU (user + system) or elapsed time > 5s
>>                    user  system elapsed
>> load_or_install 349.410  37.410 387.233
>> vwReg            35.199   0.379  35.606
 
The examples are not only for the user to read in the help page; they
are also for the user to run example(vwReg) and see your code in action
(and for R CMD check to see whether they crash, including regularly on
CRAN).

For vwReg, try reducing the number of regressions you are running
(since your dataset is mtcars, which is already very compact).

For load_or_install, we have the additional issue that running
example(load_or_install) modifies the contents of the R library and the
search path, which belong to the user. The CRAN policy forbids such
modifications: 

Examples in general should change as little of the global state of the
R session and the underlying computer as possible. I suggest wrapping
the example in \dontrun{} (since everything about load_or_install() is
about altering global state) and creating a test for the function in
tests/*.R.

The test should set up a new library under tempdir(), run
load_or_install(), check the outcomes (that the desired package is
attached, etc.) and clean up after itself. There's also the matter of
the package not failing without a connection to the Internet, which is
another CRAN policy requirement. You might have to bring a very small
test package in inst/extdata just for load_or_install() to install and
load it, so that R CMD check won't fail when running offline.
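A rough sketch of what such a tests/*.R file could look like
(load_or_install() belongs to the package under discussion; the
'testpkg' tarball name and the exact calling convention are
hypothetical):

```r
library(petersenlab)

# Install into a throwaway library so the user's own library is untouched
lib <- file.path(tempdir(), "testlib")
dir.create(lib)
old_libs <- .libPaths()
.libPaths(c(lib, old_libs))

# Hypothetical tiny package shipped in inst/extdata for offline testing
tarball <- system.file("extdata", "testpkg_1.0.tar.gz",
                       package = "petersenlab")
if (nzchar(tarball)) {
  install.packages(tarball, lib = lib, repos = NULL, type = "source")
  load_or_install("testpkg")
  stopifnot("package:testpkg" %in% search())

  # Clean up: restore the search path and remove the throwaway library
  detach("package:testpkg", unload = TRUE)
  .libPaths(old_libs)
  unlink(lib, recursive = TRUE)
}
```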

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] Old references in the Description file.

2024-04-11 Thread Ivan Krylov via R-package-devel
On Thu, 11 Apr 2024 11:57:00 +
Gabriel Constantino Blain  wrote:

> The problem is that it is a paper from the 70's (Priestley and
> Taylor, 1972) and its DOI has very uncommon symbols, such as <>. The
> DOI is: 10.1175/1520-0493(1972)100<0081:OTAOSH>2.3.CO;2.

Since the R CMD check function responsible for locating and checking
the DOIs from the package metadata expects to see them URL-encoded, it
should be possible to percent-encode the angle brackets in your DOI and
paste0() the result into the link in order to generate the correct form.

Another workaround is to generate a shortDOI that would redirect to the
same place as the original DOI:
https://shortdoi.org/10.1175/1520-0493(1972)100%3C0081:OTAOSH%3E2.3.CO;2
Now  should work like the original DOI.
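For completeness, percent-encoding just the angle brackets (the
characters that trip up the check) is a one-liner; this is merely one
way to do it:

```r
doi <- "10.1175/1520-0493(1972)100<0081:OTAOSH>2.3.CO;2"
# Replace only the two offending characters with their percent-encodings
encoded <- gsub(">", "%3E", gsub("<", "%3C", doi, fixed = TRUE), fixed = TRUE)
encoded
## [1] "10.1175/1520-0493(1972)100%3C0081:OTAOSH%3E2.3.CO;2"
```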

-- 
Best regards,
Ivan



Re: [R-pkg-devel] Question about CRAN submission resulting in 1 note

2024-04-10 Thread Ivan Krylov via R-package-devel
On Wed, 10 Apr 2024 14:11:53 +
Chris Knoll  wrote:

> For "Package has VignetteBuilder field but no prebuilt vignette
> index", how would this be resolved?

The package at https://github.com/OHDSI/CirceR/ doesn't seem to have any
vignettes. Without vignettes, there's no need for VignetteBuilder:
knitr.

> For "Package ahs FOSS license, installs .class/.jar but has no 'java
> directory'':  This is custom code that I've written in Java plus has
> a few maven dependencies and I'm not sure if they are asking me to
> bundle the source code of all Java dependencies (that have classes in
> the jar file).   That could be hard to do, and was hoping if anyone
> had experience in this, is it enough to put into the Readme where
> such source code could be found?

Here's what the policy has to say:

>> For Java .class and .jar files, the sources should be in a top-level
>> java directory in the source package (or that directory should
>> explain how they can be obtained).



At the very least, XLConnect seems to be fine supplying just the
README. If it's not too much trouble, shipping your custom source code
(definitely not all of the maven dependencies) would be the kind thing
to do, I think. (Feel free to disregard this part if a more experienced
Java package developer says otherwise.)

-- 
Best regards,
Ivan



Re: [Rd] Wish: a way to track progress of parallel operations

2024-04-09 Thread Ivan Krylov via R-devel
Dear Henrik (and everyone else):

Here's a patch implementing support for immediateConditions in
'parallel' socket clusters. What do you think?

I've tried to make the feature backwards-compatible in the sense that
an older R starting a newer cluster worker will not pass the flag
enabling condition passing and so will avoid being confused by packets
with type = 'CONDITION'.

In order to propagate the conditions in a timely manner, all 'parallel'
functions that currently use recvData() on individual nodes will have
to switch to calling recvOneData(). I've already adjusted
staticClusterApply(), but e.g. clusterCall() would still postpone
immediateConditions from nodes later in the list (should they appear).

If this is deemed a good way forward, I can prepare a similar patch for
the MPI and socket clusters implemented in the 'snow' package.

-- 
Best regards,
Ivan
Index: src/library/parallel/R/clusterApply.R
===
--- src/library/parallel/R/clusterApply.R	(revision 86373)
+++ src/library/parallel/R/clusterApply.R	(working copy)
@@ -28,8 +28,12 @@
 end <- min(n, start + p - 1L)
 	jobs <- end - start + 1L
 for (i in 1:jobs)
-sendCall(cl[[i]], fun, argfun(start + i - 1L))
-val[start:end] <- lapply(cl[1:jobs], recvResult)
+sendCall(cl[[i]], fun, argfun(start + i - 1L),
+ tag = start + i - 1L)
+for (i in 1:jobs) {
+d <- recvOneResult(cl)
+val[d$tag] <- list(d$value)
+}
 start <- start + jobs
 }
 checkForRemoteErrors(val)
Index: src/library/parallel/R/snow.R
===
--- src/library/parallel/R/snow.R	(revision 86373)
+++ src/library/parallel/R/snow.R	(working copy)
@@ -120,7 +120,8 @@
 rprog = file.path(R.home("bin"), "R"),
 snowlib = .libPaths()[1],
 useRscript = TRUE, # for use by snow clusters
-useXDR = TRUE)
+useXDR = TRUE,
+forward_conditions = TRUE)
 defaultClusterOptions <<- addClusterOptions(emptyenv(), options)
 }
 
Index: src/library/parallel/R/snowSOCK.R
===
--- src/library/parallel/R/snowSOCK.R	(revision 86373)
+++ src/library/parallel/R/snowSOCK.R	(working copy)
@@ -32,6 +32,7 @@
 methods <- getClusterOption("methods", options)
 useXDR <- getClusterOption("useXDR", options)
 homogeneous <- getClusterOption("homogeneous", options)
+forward_conditions <- getClusterOption('forward_conditions', options)
 
 ## build the local command for starting the worker
 env <- paste0("MASTER=", master,
@@ -40,7 +41,8 @@
  " SETUPTIMEOUT=", setup_timeout,
  " TIMEOUT=", timeout,
  " XDR=", useXDR,
- " SETUPSTRATEGY=", setup_strategy)
+ " SETUPSTRATEGY=", setup_strategy,
+ " FORWARDCONDITIONS=", forward_conditions)
 ## Should cmd be run on a worker with R <= 4.0.2,
 ## .workRSOCK will not exist, so fallback to .slaveRSOCK
 arg <- "tryCatch(parallel:::.workRSOCK,error=function(e)parallel:::.slaveRSOCK)()"
@@ -130,17 +132,26 @@
 sendData.SOCKnode <- function(node, data) serialize(data, node$con)
 sendData.SOCK0node <- function(node, data) serialize(data, node$con, xdr = FALSE)
 
-recvData.SOCKnode <- recvData.SOCK0node <- function(node) unserialize(node$con)
+recvData.SOCKnode <- recvData.SOCK0node <- function(node) repeat {
+val <- unserialize(node$con)
+if (val$type != 'CONDITION') return(val)
+signalCondition(val$value)
+}
 
 recvOneData.SOCKcluster <- function(cl)
 {
 socklist <- lapply(cl, function(x) x$con)
 repeat {
-ready <- socketSelect(socklist)
-if (length(ready) > 0) break;
+repeat {
+ready <- socketSelect(socklist)
+if (length(ready) > 0) break;
+}
+n <- which.max(ready) # may need rotation or some such for fairness
+value <- unserialize(socklist[[n]])
+if (value$type != 'CONDITION')
+return(list(node = n, value = value))
+signalCondition(value$value)
 }
-n <- which.max(ready) # may need rotation or some such for fairness
-list(node = n, value = unserialize(socklist[[n]]))
 }
 
 makePSOCKcluster <- function(names, ...)
@@ -349,6 +360,7 @@
 timeout <- 2592000L   # wait 30 days for new cmds before failing
 useXDR <- TRUE# binary serialization
 setup_strategy <- "sequential"
+forward_conditions <- FALSE
 
 for (a in commandArgs(TRUE)) {
 ## Or use strsplit?
@@ -365,6 +377,9 @@
SETUPSTRATEGY = {
setup_strategy <- match.arg(value,
c("sequential", 

Re: [R-pkg-devel] Linking Tutorial Site to CRAN Package site.

2024-04-07 Thread Ivan Krylov via R-package-devel
On Sat, 6 Apr 2024 18:27:24 +
"Ruff, Sergej"  wrote:

> The CRAN site
> (https://cran.r-project.org/web/packages/RepeatedHighDim/index.html)
> has a "documentation" part with the refrence pdf.
> 
> Can I link to our tutorial site (https://software.klausjung-lab.de/.)
> under documentation?

Since your tutorial is relatively short and contains R code intermixed
with the results of running it, it could make a great vignette.
Vignettes are linked on the CRAN page for a package right under the
PDF reference manual. For example, the BiocManager package has one
vignette: https://cran.r-project.org/package=BiocManager

Vignettes are a part of the package and their code is automatically
checked together with your examples. For the users of your package,
this will help keep the tutorial available (even if the website moves
in the future) and compatible with the current version of the package
(even if the package evolves and the tutorial website evolves together
with it).

R has built-in support for PDF vignettes via LaTeX using Sweave [*].
HTML vignettes can be much more accessible than PDF files, but there is
no built-in HTML vignette engine in R [**]. The 'markdown' package is
reasonably lightweight and has an HTML vignette engine. Markdown tries
to be a superset of HTML, so it should be possible to keep most of your
original HTML, including the styling, while rewriting the tutorial as
an executable vignette.

-- 
Best regards,
Ivan

[*]
https://cran.r-project.org/doc/manuals/R-exts.html#Writing-package-vignettes

[**]
It's possible to write a crude HTML vignette engine in ~100 lines of R
code, but we cannot expect every package author to do that.
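For the curious, a deliberately minimal sketch of what registering such
an engine involves (the engine name, the package name and the *.txt
"vignette format" are all made up, and no R code is evaluated, which a
real engine would have to do):

```r
tools::vignetteEngine(
  "plaintext",
  package = "mypkg",   # hypothetical package registering the engine
  pattern = "[.]txt$",
  weave = function(file, ...) {
    # "Weave": wrap the text in minimal HTML; must return the output file
    out <- sub("[.]txt$", ".html", basename(file))
    writeLines(c("<!DOCTYPE html><html><body><pre>",
                 readLines(file),
                 "</pre></body></html>"), out)
    out
  },
  tangle = function(file, ...) {
    # "Tangle": extract R code; this toy format has none to extract
    out <- sub("[.]txt$", ".R", basename(file))
    writeLines("## no executable code in this vignette format", out)
    out
  }
)
```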



Re: [Rd] Bug in out-of-bounds assignment of list object to expression() vector

2024-04-05 Thread Ivan Krylov via R-devel
On Fri, 5 Apr 2024 08:15:20 -0400
June Choe  wrote:

> When assigning a list to an out of bounds index (ex: the next, n+1
> index), it errors the same but now changes the values of the vector
> to NULL:
> 
> ```
> x <- expression(a,b,c)
> x[[4]] <- list() # Error
> x
> #> expression(NULL, NULL, NULL)  
> ```
> 
> Curiously, this behavior disappears if a prior attempt is made at
> assigning to the same index, using a different incompatible object
> that does not share this bug (like a function)

Here's how the problem happens:

1. The call lands in src/main/subassign.c, do_subassign2_dflt().

2. do_subassign2_dflt() calls SubassignTypeFix() to prepare the operand
for the assignment.

3. Since the assignment is "stretching", SubassignTypeFix() calls
EnlargeVector() to provide the space for the assignment.

The bug relies on `x` not being IS_GROWABLE(), which may explain 
why a plain x[[4]] <- list() sometimes doesn't fail.

The future assignment result `x` is now expression(a, b, c, NULL), and
the old `x` set to expression(NULL, NULL, NULL) by SET_VECTOR_ELT(newx,
i, VECTOR_ELT(x, i)); CLEAR_VECTOR_ELT(x, i); during EnlargeVector().

4. But then the assignment fails, raising the error back in
do_subassign2_dflt(), because the assignment kind is invalid: there is
no way to put data.frames into an expression vector. The new resized
`x` is lost, and the old overwritten `x` stays there.

Not sure what the right way to fix this is. It's desirable to avoid
shallow_duplicate(x) for the overwriting assignments, but then the
sub-assignment must either succeed or leave the operand untouched.
Is there a way to perform the type check before overwriting the operand?

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] hand compile; link to MKL fails at BLAS zdotu

2024-03-30 Thread Ivan Krylov via R-devel
On Sat, 30 Mar 2024 20:31:25 +0300
Ivan Krylov via R-devel  wrote:

> It seems to crash inside MKL!

Should have read some more about mkl_gf_lp64 before posting. According
to the Intel forums, it is indeed required in order to work with the
GFortran calling convention, but if you're linking against it, you also
have to add the rest of the linker command line, i.e.:

-lmkl_gf_lp64 -lmkl_core -lmkl_sequential 
-Wl,--no-as-needed -lpthread -lm -ldl

https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/ARPACK-with-MKL-crashes-when-calling-zdotc/m-p/1054316

Maybe it's even documented somewhere, but Intel makes it too annoying
to read their documentation, and they definitely don't mention it in
the link line advisor. There's also the ominous comment saying that

>> you cannot call standard BLAS [c,z]dot[c,u] functions from C/C++
>> because the interface library that is linked is specific for
>> GFortran which has a different calling convention of returning a
>> Complex type and would cause issues

I'm not seeing any calls to [c,z]dot[c,u] from inside R's C code (which
is why R seems to work when running with libmkl_rt.so), and the
respective declarations in R_ext/BLAS.h have an appropriate warning:

>> WARNING!  The next two return a value that may not be compatible
>> between C and Fortran, and even if it is, this might not be the
>> right translation to C.

...so it's likely that everything will keep working.

Indeed, R configured with

--with-blas='-lmkl_gf_lp64 -lmkl_core -lmkl_sequential'
--with-lapack='-lmkl_gf_lp64 -lmkl_core -lmkl_sequential'

seems to work with MKL.

-- 
Best regards,
Ivan



Re: [Rd] hand compile; link to MKL fails at BLAS zdotu

2024-03-30 Thread Ivan Krylov via R-devel
On Sat, 30 Mar 2024 10:55:48 +
Ramón Fallon  wrote:

> In contrast to Dirk's solution, I've found R's configure script
> doesn't recognise the update-alternatives system on debian/ubuntu, if
> it's MKL.

It ought to work if configured with --with-blas=-lblas
--with-lapack=-llapack, but, as you found out (and I can confirm), if
libblas.so and liblapack.so already point to MKL, ./configure somehow
fails the test for zdotu and falls back to bundled Rblas and Rlapack.

If you'd like the built R to work with the update-alternatives system,
the workaround that seems to help is to temporarily switch the alternatives
to reference BLAS & LAPACK, configure and build R, and then switch the
alternatives back to MKL.

> appending "-lmkl_gf_lp64" to the --with-blas option does not help
> (that's suggested by several posts out there).

MKL has an official "link line advisor" at
,
which may suggest a completely different set of linker options
depending on what it is told. Here's how R's zdotu test always fails
when linking directly with MKL:

# pre-configure some variables
echo '#define HAVE_F77_UNDERSCORE 1' > confdefs.h
FC=gfortran
FFLAGS='-g -Og'
CC=gcc
CFLAGS='-g -Og'
CPPFLAGS=-I/usr/local/include
MAIN_LDFLAGS='-Wl,--export-dynamic -fopenmp'
LDFLAGS='-L/usr/local/lib'
LIBM=-lm
FLIBS=' -lgfortran -lm -lquadmath'
# copied & pasted from the Intel web page
BLAS_LIBS='-lmkl_rt -Wl,--no-as-needed -lpthread -lm -ldl'

# R prepares to call zdotu from Fortran...
cat > conftestf.f <<EOF
      subroutine test1(iflag)
      double complex zx(2), ztemp, zres, zdotu
      integer iflag
      zx(1) = (3.1d0, 1.7d0)
      zx(2) = (1.6d0, -0.6d0)
      zres = zdotu(2, zx, 1, zx, 1)
      ztemp = (0.0d0, 0.0d0)
      do 10 i = 1, 2
 10   ztemp = ztemp + zx(i)*zx(i)
      if(abs(zres - ztemp) > 1.0d-10) then
        iflag = 1
      else
        iflag = 0
      endif
      end
EOF
${FC} ${FFLAGS} -c conftestf.f

# and then call the Fortran subroutine from the C runner...
cat > conftest.c <<EOF
#include <stdlib.h>
#include "confdefs.h"
#ifdef HAVE_F77_UNDERSCORE
# define F77_SYMBOL(x)   x ## _
#else
# define F77_SYMBOL(x)   x
#endif
extern void F77_SYMBOL(test1)(int *iflag);

int main () {
  int iflag;
  F77_SYMBOL(test1)(&iflag);
  exit(iflag);
}
EOF
${CC} ${CPPFLAGS} ${CFLAGS} -c conftest.c

# and then finally link and execute the program
${CC} ${CPPFLAGS} ${CFLAGS} ${LDFLAGS} ${MAIN_LDFLAGS} \
 -o conftest conftest.o conftestf.o \
 ${BLAS_LIBS} ${FLIBS} ${LIBM}
./conftest

It seems to crash inside MKL!

rax=cccd rbx=5590ee102008 rcx=7ffdab2ddb20 
rdx=5590ee102008 
rsi=7ffdab2ddb18 rdi=5590ee10200c rbp=7ffdab2dd910 
rsp=7ffdab2db600 
 r8=5590ee102008  r9=7ffdab2ddb28 r10=7f4086a99178 
r11=7f4086e02490 
r12=5590ee10200c r13=7ffdab2ddb20 r14=5590ee102008 
r15=7ffdab2ddb28 
ip = 7f4086e02a60, sp = 7ffdab2db600 [mkl_blas_zdotu+1488]
ip = 7f4085dc5250, sp = 7ffdab2dd920 [zdotu+256]
ip = 5590ee1011cc, sp = 7ffdab2ddb40 [test1_+91]
ip = 5590ee101167, sp = 7ffdab2ddb70 [main+14]

It's especially strange that R does seem to work if you just
update-alternatives after linking it with the reference BLAS, but
./conftest starts crashing again in the same place. This is with
Debian's MKL version 2020.4.304-4, by the way.

-- 
Best regards,
Ivan



Re: [Rd] paths capability FALSE on devel?

2024-03-27 Thread Ivan Krylov via R-devel
On Wed, 27 Mar 2024 11:28:17 +0100
Alexandre Courtiol  wrote:

> after installing R-devel the output of
> grDevices::dev.capabilities()$paths is FALSE, while it is TRUE for R
> 4.3.3

Your system must be missing Cairo development headers, making x11()
fall back to type = 'Xlib':

$ R-devel -q -s -e 'x11(); grDevices::dev.capabilities()$paths'
 [1] TRUE
$ R-devel -q -s -e \
 'x11(type="Xlib"); grDevices::dev.capabilities()$paths'
 [1] FALSE

If that's not the case and capabilities()['cairo'] is TRUE in your
build of R-devel, please show us the sessionInfo() from your build of
R-devel.

-- 
Best regards,
Ivan



Re: [Rd] Wish: a way to track progress of parallel operations

2024-03-26 Thread Ivan Krylov via R-devel
Henrik,

Thank you for taking the time to read and reply to my message!

On Mon, 25 Mar 2024 10:19:38 -0700
Henrik Bengtsson  wrote:

> * Target a solution that works the same regardless whether we run in
> parallel or not, i.e. the code/API should look the same regardless of
> using, say, parallel::parLapply(), parallel::mclapply(), or
> base::lapply(). The solution should also work as-is in other parallel
> frameworks.

You are absolutely right about mclapply(): it suffers from the same
problem where the task running inside it has no reliable mechanism of
reporting progress. Just like on a 'parallel' cluster (which can be
running on top of an R connection, MPI, the 'mirai' package, a server
pretending to be multiple cluster nodes, or something completely
different), there is currently no documented interface for the task to
report any additional data except the result of the computation.

> I argue the end-user should be able to decided whether they want to
> "see" progress updates or not, and the developer should focus on
> where to report on progress, but not how and when.

Agreed. As a package developer, I don't even want to bother calling
setTxtProgressBar(...), but it gets most of the job done at zero
dependency cost, and the users don't complain. The situation could
definitely be improved.

> It is possible to use the existing PSOCK socket connections to send
> such 'immediateCondition':s.

Thanks for pointing me towards ClusterFuture, that's a great hack, and
conditions are a much better fit for progress tracking than callbacks.

It would be even better if 'parallel' clusters could "officially"
handle immediateConditions and re-signal them in the main R session.
Since R-4.4 exports (but not yet documents) sendData, recvData and
recvOneData generics from 'parallel', we are still in a position to
codify and implement the change to the 'parallel' cluster back-end API.

It shouldn't be too hard to document the requirement that recvData() /
recvOneData() must signal immediateConditions arriving from the nodes
and patch the existing cluster types (socket and MPI). Not sure how
hard it will be to implement for 'mirai' clusters.

> I honestly think we could arrive at a solution where base-R proposes
> a very light, yet powerful, progress API that handles all of the
> above. The main task is to come up with a standard API/protocol -
> then the implementation does not matter.

Since you've already given it a lot of thought, which parts of
progressr would you suggest for inclusion into R, besides 'parallel'
clusters and mclapply() forwarding immediateConditions from the worker
processes?

-- 
Best regards,
Ivan



[Rd] Wish: a way to track progress of parallel operations

2024-03-25 Thread Ivan Krylov via R-devel
Hello R-devel,

A function to be run inside lapply() or one of its friends is trivial
to augment with side effects to show a progress bar. When the code is
intended to be run on a 'parallel' cluster, it generally cannot rely on
its own side effects to report progress.

I've found three approaches to progress bars for parallel processes on
CRAN:

 - Importing 'snow' (not 'parallel') internals like sendCall and
   implementing parallel processing on top of them (doSNOW). This has
   the downside of having to write higher-level code from scratch
   using undocumented interfaces.

 - Splitting the workload into length(cluster)-sized chunks and
   processing them in separate parLapply() calls between updating the
   progress bar (pbapply). This approach trades off parallelism against
   the precision of the progress information: the function has to wait
   until all chunk elements have been processed before updating the
   progress bar and submitting a new portion; dynamic load balancing
   becomes much less efficient.

 - Adding local side effects to the function and detecting them while
   the parallel function is running in a child process (parabar). A
   clever hack, but much harder to extend to distributed clusters.

With recvData and recvOneData becoming exported in R-4.4 [*], another
approach becomes feasible: wrap the cluster object (and all nodes) into
another class, attach the progress callback as an attribute, and let
recvData / recvOneData call it. This makes it possible to give wrapped
cluster objects to unchanged code, but requires knowing the precise
number of chunks that the workload will be split into.

Could it be feasible to add an optional .progress argument after the
ellipsis to parLapply() and its friends? We can require it to be a
function accepting (done_chunk, total_chunks, ...). If not a new
argument, what other interfaces could be used to get accurate progress
information from staticClusterApply and dynamicClusterApply?
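For concreteness, the callback could be as simple as the following
(nothing more than a sketch: the .progress argument itself is only a
proposal, not an existing parLapply() parameter):

```r
# Callback matching the proposed (done_chunk, total_chunks, ...) signature
show_progress <- function(done_chunks, total_chunks, ...) {
  cat(sprintf("\rprocessed %d of %d chunks", done_chunks, total_chunks))
  if (done_chunks >= total_chunks) cat("\n")
  flush.console()
}

# Proposed, hypothetical interface:
# parallel::parLapply(cl, X, fun, .progress = show_progress)
```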

I understand that the default parLapply() behaviour is not very
amenable to progress tracking, but when running clusterMap(.scheduling
= 'dynamic') spanning multiple hours if not whole days, having progress
information sets the mind at ease.

I would be happy to prepare code and documentation. If there is no time
now, we can return to it after R-4.4 is released.

-- 
Best regards,
Ivan

[*] https://bugs.r-project.org/show_bug.cgi?id=18587



Re: [R-pkg-devel] How to store large data to be used in an R package?

2024-03-25 Thread Ivan Krylov via R-package-devel
On Mon, 25 Mar 2024 11:12:57 +0100
Jairo Hidalgo Migueles  wrote:

> Specifically, this data consists of regression and random forest
> models crucial for making predictions within our R package.

Apologies for asking a silly question, but is there a chance that these
models are large by accident (e.g. because an object references a large
environment containing multiple copies of the training dataset)? Or
are there really more than a million weights required to make
predictions?

> Initially, I attempted to save these models as internal data within
> the package. While this approach maintains functionality, it has led
> to a package size exceeding 20 MB. I'm concerned that this would
> complicate submitting the package to CRAN in the future.

The policy mentions the possibility of having a separate large
data-only package. Since CRAN strives to archive all package versions,
this data-only package will have to be updated as rarely as possible.
You will need to ask CRAN for approval.

If there is a significant amount of core functionality inside your
package that does *not* require the large data (so that it can still
be installed and used without the data), you can publish the data-only
package yourself (e.g. using the 'drat' package), put it in Suggests
and link to it in the Additional_repositories field of your DESCRIPTION.
Alternatively, you can publish the data on Zenodo and offer to download
it on first use. Make sure to (1) use tools::R_user_dir to determine
where to put the files, (2) only download the files after the user
explicitly agrees to it and (3) test as much of your package
functionality as possible without requiring the data to be downloaded.
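A sketch of the download-on-first-use pattern, with a placeholder
package name and URL (the numbered policy points above are marked in
the comments):

```r
get_model_data <- function() {
  # (1) store the cached file under the per-user directory for the package
  dir <- tools::R_user_dir("mypackage", which = "cache")
  path <- file.path(dir, "models.rds")
  if (!file.exists(path)) {
    # (2) ask before touching the network or the user's file system
    ok <- interactive() &&
      isTRUE(utils::askYesNo("Download ~20 MB of model data?"))
    if (!ok) stop("Model data is not available and download was declined.")
    dir.create(dir, recursive = TRUE, showWarnings = FALSE)
    utils::download.file("https://example.org/models.rds", path, mode = "wb")
  }
  readRDS(path)
}
```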

-- 
Best regards,
Ivan



Re: [R-pkg-devel] Request for assistance: error in installing on Debian (undefined symbol: omp_get_num_procs) and note in checking the HTML versions (no command 'tidy' found, package 'V8' unavailable

2024-03-22 Thread Ivan Krylov via R-package-devel
On Thu, 21 Mar 2024 18:32:59 +
Annaig De-Walsche  wrote:

> If ever I condition the use of OpenMP directives, users will indeed
> be capable of installing the package, but they won't have access to a
> performant version of the code, as it necessitates the use of OpenMP.
> Is there a method to explicitly express that the use of OpenMP is
> highly encouraged?

I think the most practical method would be to produce a
packageStartupMessage() from the .onAttach function of your package if
you detect that the package has been compiled without OpenMP support:
https://cran.r-project.org/doc/manuals/R-exts.html#Load-hooks
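As a sketch (the helper openmp_available() is invented here; it would
wrap whatever your C code reports about its compile-time OpenMP
status):

```r
# In R/zzz.R of the package
.onAttach <- function(libname, pkgname) {
  if (!openmp_available())   # hypothetical wrapper around the C-side check
    packageStartupMessage(
      "This package was compiled without OpenMP support ",
      "and will run single-threaded."
    )
}
```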

> In practice, how does one know from R code if OpenMP is present or not?

Your C code will have to detect it and provide this information to the
R code. WRE 1.6.4 says:

>> [C]heck carefully that you have followed the advice in the
>> subsection on OpenMP support [WRE 1.2.1.1]. In particular, any use
>> of OpenMP in C/C++ code will need to use
>> 
>>  #ifdef _OPENMP
>>  # include <omp.h>
>>  #endif



Similarly, any time you use #pragma omp ... or call
omp_set_num_threads(), it needs to be wrapped in #ifdef _OPENMP ...
#endif.

Additionally, it is important to make sure that during tests and
examples, your OpenMP code doesn't use more than two threads:
https://cran.r-project.org/web/packages/policies.html
This is in place because CRAN checks are run in parallel, and a package
that tries to helpfully use all of the processor cores would interfere
with other packages being checked at the same time.

>   [[alternative HTML version deleted]]

This mailing list removes HTML e-mails. If you compose your messages in
HTML, we only get the plain text version automatically prepared by your
mailer:
https://stat.ethz.ch/pipermail/r-package-devel/2024q1/010595.html

In order to preserve the content and the presentation of your messages,
it's best to compose them in plain text.

-- 
Très cordialement,
Ivan



Re: [R-pkg-devel] help diagnosing win-builder failures

2024-03-17 Thread Ivan Krylov via R-package-devel
Hi,

This may need the help of Uwe Ligges to diagnose. I suspect this may be
related to the Windows machine having too much memory committed (as Uwe
has been able to pinpoint recently [*] about a package that failed to
compile some heavily templated C++), but there is not enough information
to give a conclusive diagnosis.

On Sun, 17 Mar 2024 14:01:33 -0400
Ben Bolker  wrote:

> 2. an ERROR running tests, where the output ends with a cryptic
> 
>Anova: ..
> 
> (please try to refrain from snarky comments about not using testthat
> ...)

Pardon my ignorance, but is it an option to upload a version of the
package that uses test_check(pkg, reporter=LocationReporter()) instead
of the summary reporter?

-- 
Best regards,
Ivan

[*] https://stat.ethz.ch/pipermail/r-package-devel/2024q1/010304.html



Re: [Rd] Spurious warning in as.data.frame.factor()

2024-03-17 Thread Ivan Krylov via R-devel
On Fri, 15 Mar 2024 11:24:22 +0100
Martin Maechler  wrote:

> I think just adding
> 
>  removeGeneric('as.data.frame')
> 
> is appropriate here as it is self-explaining and should not leave
> much traces.

Thanks for letting me know! I'll make sure to use removeGeneric() in
similar cases in the future.

-- 
Best regards,
Ivan



Re: [R-pkg-devel] Removing import(methods) stops exporting S4 "meta name"

2024-03-15 Thread Ivan Krylov via R-package-devel
On Thu, 14 Mar 2024 16:06:50 -0400
Duncan Murdoch  wrote:

> Error in xj[i] : invalid subscript type 'list'
> Calls: join_inner -> data.frame -> [ -> [.data.table -> [.data.frame
> Execution halted

And here's how it happens:

join_inner calls xi[yi,on=by,nomatch=0] on data.tables xi and yi.

`[.data.table` calls cedta() to determine whether the calling
environment is data.table-aware. If the import of `.__T__[:base` is
removed, cedta() returns FALSE.

`[.data.table` then forwards the call to `[.data.frame`, which cannot
handle data.table-style subsetting.

This is warned about in
;
the 'do' package should have set the .datatable.aware = TRUE marker in
its environment. In fact, example(join_inner) doesn't raise an error
with the following changes when running with data.table commit f92aee69
(i.e. pre-#6001):

diff -rU2 do/NAMESPACE do_2.0.0.0.2/NAMESPACE
--- do/NAMESPACE2021-08-03 12:37:00.0 +0300
+++ do_2.0.0.0.2/NAMESPACE  2024-03-15 14:01:10.588561222 +0300
@@ -130,5 +130,4 @@
 export(upper.dir)
 export(write_xlsx)
-importFrom(data.table,`.__T__[:base`)
 importFrom(methods,as)
 importFrom(reshape2,melt)
diff -rU2 do/R/join.R do_2.0.0.0.2/R/join.R
--- do/R/join.R 2020-06-30 06:47:22.0 +0300
+++ do_2.0.0.0.2/R/join.R   2024-03-15 13:54:02.289440613 +0300
@@ -1,2 +1,4 @@
+.datatable.aware = TRUE
+
 #' @title Join two dataframes together
 #' @description Join two dataframes by the same id column.

-- 
Best regards,
Ivan



Re: [Rd] Spurious warning in as.data.frame.factor()

2024-03-14 Thread Ivan Krylov via R-devel
On Thu, 14 Mar 2024 10:41:54 +0100
Martin Maechler  wrote:

> Anybody trying S7 examples and see if they work w/o producing
> wrong warnings?

It looks like this is not applicable to S7. If I overwrite
as.data.frame with a newly created S7 generic, it fails to dispatch on
existing S3 classes:

new_generic('as.data.frame', 'x')(factor(1))
# Error: Can't find method for `as.data.frame(S3)`.

But there is no need to overwrite the generic, because S7 classes
should work with existing S3 generics:

foo <- new_class('foo', parent = class_double)
method(as.data.frame, foo) <- function(x) structure(
 # this is probably not generally correct
 list(x),
 names = deparse1(substitute(x)),
 row.names = seq_len(length(x)),
 class = 'data.frame'
)
str(as.data.frame(foo(pi)))
# 'data.frame':   1 obs. of  1 variable:
#  $ x:  num 3.14

So I think there is nothing to break, because S7 methods for
as.data.frame will rely on S3 for dispatch.

> > The patch passes make check-devel, but I'm not sure how to safely
> > put setGeneric('as.data.frame'); as.data.frame(factor(1:10)) in a
> > regression test.  
> 
> {What's the danger/problem?  we do have "similar" tests in both
>   src/library/methods/tests/*.R
>   tests/reg-S4.R
> 
>  -- maybe we can discuss bi-laterally  (or here, as you prefer)
> }

This might be educational for other people wanting to add a regression
test to their patch. I see that tests/reg-tests-1e.R is already running
under options(warn = 2), so if I add the following near line 750
("Deprecation of *direct* calls to as.data.frame.")...

# Should not warn for a call from a derivedDefaultMethod to the raw
# S3 method -- implementation detail of S4 dispatch
setGeneric('as.data.frame')
as.data.frame(factor(1))

...then as.data.frame will remain an S4 generic. Should the test then
rm(as.data.frame) and keep going? (Or even keep the S4 generic?) Is
there any hidden state I may be breaking for the rest of the test this
way? The test does pass like this, so this may be worrying about
nothing.

-- 
Best regards,
Ivan



Re: [Rd] Spurious warning in as.data.frame.factor()

2024-03-13 Thread Ivan Krylov via R-devel
On Tue, 12 Mar 2024 12:33:17 -0700
Hervé Pagès  wrote:

> The acrobatics that as.data.frame.factor() is going thru in order to 
> recognize a direct call don't play nice if as.data.frame() is an S4 
> generic:
> 
>      df <- as.data.frame(factor(11:12))
> 
>      suppressPackageStartupMessages(library(BiocGenerics))
>      isGeneric("as.data.frame")
>      # [1] TRUE
> 
>      df <- as.data.frame(factor(11:12))
>      # Warning message:
>      # In as.data.frame.factor(factor(11:12)) :
>      #   Direct call of 'as.data.frame.factor()' is deprecated.

How about something like the following:

Index: src/library/base/R/zzz.R
===================================================================
--- src/library/base/R/zzz.R    (revision 86109)
+++ src/library/base/R/zzz.R    (working copy)
@@ -681,7 +681,14 @@
 bdy <- body(as.data.frame.vector)
 bdy <- bdy[c(1:2, seq_along(bdy)[-1L])] # taking [(1,2,2:n)] to insert at [2]:
 ## deprecation warning only when not called by method dispatch from as.data.frame():
-bdy[[2L]] <- quote(if((sys.nframe() <= 1L || !identical(sys.function(-1L), as.data.frame)))
+bdy[[2L]] <- quote(if((sys.nframe() <= 1L || !(
+   identical(sys.function(-1L), as.data.frame) || (
+   .isMethodsDispatchOn() &&
+   methods::is(sys.function(-1L), 'derivedDefaultMethod') &&
+   identical(
+   sys.function(-1L)@generic,
+   structure('as.data.frame', package = 'base')
+   )
 .Deprecated(
     msg = gettextf(
         "Direct call of '%s()' is deprecated.  Use '%s()' or '%s()' instead",

The patch passes make check-devel, but I'm not sure how to safely put
setGeneric('as.data.frame'); as.data.frame(factor(1:10)) in a
regression test.

-- 
Best regards,
Ivan



Re: [R-pkg-devel] confusion over spellchecking

2024-03-13 Thread Ivan Krylov via R-package-devel
On Sun, 10 Mar 2024 13:55:43 -0400
Ben Bolker  wrote:

> I am working on a package and can't seem to get rid of a NOTE about
> 
> Possibly misspelled words in DESCRIPTION:
>glmmTMB (10:88)
>lme (10:82)
> 
> on win-builder.

Do you have these words anywhere else in the package (e.g. in the Rd
files)? It turns out that R has a special environment variable that
makes it ignore custom dictionaries specifically for DESCRIPTION:

>>## Allow providing package defaults but make this controllable via
>>##   _R_ASPELL_USE_DEFAULTS_FOR_PACKAGE_DESCRIPTION_
>>## to safeguard against possible mis-use for CRAN incoming checks.

I cannot see it used anywhere under the trunk/CRAN subdirectory in the
developer.r-project.org Subversion repo, but it could be set somewhere
else on Win-Builder.

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] Submission after archived version

2024-03-13 Thread Ivan Krylov via R-package-devel
On Mon, 11 Mar 2024 23:45:13 +0100
Nils Mechtel  wrote:

> Despite R CMD check not giving any errors or warnings, the package
> doesn’t pass the pre-tests:

If your question was more about the reasons for the difference between
your R CMD check and the pre-tests, most of it is due to --as-cran:

(Using commit ffe216d from https://github.com/nilsmechtel/MetAlyzer as
the basis for the example, which seems to be different from the
incoming pretest from the link you've shared.)

$ R-devel CMD check MetAlyzer_1.0.0.tar.gz
<...>
Status: OK  
$ R-devel CMD check --as-cran MetAlyzer_1.0.0.tar.gz
<...>
* checking for non-standard things in the check directory ... NOTE
Found the following files/directories: ‘metabolomics_data.csv’
<...>

It's less wasteful to run checks without --as-cran in CI (as you
currently do), but you need to perform additional testing before making
a release. The incoming pre-tests use a custom set of environment
variables that go a bit further than just --as-cran:
https://svn.r-project.org/R-dev-web/trunk/CRAN/QA/Kurt/lib/R/Scripts/check_CRAN_incoming.R

In particular, _R_CHECK_CRAN_INCOMING_USE_ASPELL_=true enables the
check for words that are possibly misspelled:

(Using an extra environment variable because your package has been
already published and R filters out "misspellings" found in the CRAN
version of the package. Congratulations!)

$ env \
 _R_CHECK_CRAN_INCOMING_ASPELL_RECHECK_MAYBE_=FALSE \
 _R_CHECK_CRAN_INCOMING_USE_ASPELL_=true \
 R-devel CMD check --as-cran MetAlyzer_1.0.0.tar.gz
<...>
Possibly misspelled words in DESCRIPTION:
  metabolomics (15:78)
<...>

Yet another way to avoid false misspellings is to create a custom
dictionary:
http://dirk.eddelbuettel.com/blog/2017/08/10/#008_aspell_cran_incoming

$ mkdir -p .aspell
$ echo '
 Rd_files <- vignettes <- R_files <- description <- list(
  encoding = "UTF-8",
  language = "en",
  dictionaries = c("en_stats", "dictionary")
 )
' > .aspell/defaults.R
$ R -q -s -e '
 saveRDS(c(
  "metabolomics" # , extra words go here
 ), file.path(".aspell", "dictionary.rds"))
'
$ R CMD build .
$ env \
 _R_CHECK_CRAN_INCOMING_ASPELL_RECHECK_MAYBE_=FALSE \
 _R_CHECK_CRAN_INCOMING_USE_ASPELL_=true \
 R-devel CMD check --as-cran MetAlyzer_1.0.0.tar.gz
# No more "Possibly misspelled words in DESCRIPTION"!

Some day, this will be documented in Writing R Extensions, or maybe in
R Internals (where the other _R_CHECK_* variables are documented), or
perhaps in the CRAN policy. See also:
https://stat.ethz.ch/pipermail/r-package-devel/2024q1/010558.html

-- 
Best regards,
Ivan



Re: [R-pkg-devel] Submission after archived version

2024-03-12 Thread Ivan Krylov via R-package-devel
On Mon, 11 Mar 2024 23:45:13 +0100
Nils Mechtel  wrote:

> Debian:
> 
> Status: 3 NOTEs

>> * checking CRAN incoming feasibility ... [4s/6s] NOTE

>> Possibly misspelled words in DESCRIPTION:
>>  metabolomics (36:78)

This one can be explained in the submission comment. The rest of the
NOTE is to be expected.

>> * checking DESCRIPTION meta-information ... NOTE
>> Author field differs from that derived from Authors@R

Just remove the Author: field from your DESCRIPTION and let R CMD build
automatically generate it from Authors@R.

>> * checking for non-standard things in the check directory ... NOTE
>> Found the following files/directories:
>>  ‘metabolomics_data.csv’

Make sure that when your tests and examples create files, they do so in
the session temp directory and then remove the files afterwards. If a
user had a valuable file named metabolomics_data.csv in the current
directory, ran example(...) and had it overwritten as a result, they
would be very unhappy.

The NOTEs on Windows are similar.

Good luck!

-- 
Best regards,
Ivan



Re: [R-pkg-devel] [EXTERN] Re: [EXTERN] Re: [EXTERN] Re: @doctype is deprecated. need help for r package documentation

2024-03-12 Thread Ivan Krylov via R-package-devel
On Mon, 11 Mar 2024 14:57:58 +
"Ruff, Sergej"  wrote:

> I uploaded the old version of the package to my repo:
> https://github.com/SergejRuff/boot

After installing this tarball, running RStudio and typing:

library(bootGSEA)
?bootGSEA

...I see the help page in RStudio's help tab, not in the browser. I
think this is the expected behaviour for RStudio.

-- 
Best regards,
Ivan



Re: [R-pkg-devel] RFC: C backtraces for R CMD check via just-in-time debugging

2024-03-11 Thread Ivan Krylov via R-package-devel
Vladimir,

Thank you for the example and for sharing the ideas regarding
symbol-relative offsets!

On Thu, 7 Mar 2024 09:38:18 -0500 (EST)
Vladimir Dergachev  wrote:

>  unw_get_reg(&cursor, UNW_REG_IP, &ip);

Is it ever possible for unw_get_reg() to fail (return non-zero) for
UNW_REG_IP? The documentation isn't being obvious about this. Then
again, if the process is so damaged it cannot even read the instruction
pointer from its own stack frame, any attempts at self-debugging must
be doomed.

>* this should work as a package, but I am not sure whether the
> offsets between package symbols and R symbols would be static or not.

Since package shared objects are mmap()ed into the address space and
(at least on Linux with ASLR enabled) mmap()s are supposed to be made
unpredictable, this offset ends up not being static. On Linux, R seems
to be normally built as a position-independent executable, so no matter
whether there is a libR.so, both the R base address and the package
shared object base address are randomised:

$ cat ex.c
#include <stddef.h>
#include <R_ext/Print.h>
void addr_diff(void) {
 ptrdiff_t diff = (char*)&addr_diff - (char*)&Rprintf;
 Rprintf("self - Rprintf = %td\n", diff);
}
$ R CMD SHLIB ex.c
$ R-dynamic -q -s -e 'dyn.load("ex.so"); .C("addr_diff");'
self - Rprintf = -9900928
$ R-dynamic -q -s -e 'dyn.load("ex.so"); .C("addr_diff");'
self - Rprintf = -15561600
$ R-static -q -s -e 'dyn.load("ex.so"); .C("addr_diff");'
self - Rprintf = 45537907472976
$ R-static -q -s -e 'dyn.load("ex.so"); .C("addr_diff");'
self - Rprintf = 46527711447632

>* R ought to know where packages are loaded, we might want to be
> clever and print out information on which package contains which
> function, or there might be identical R_init_RMVL() printouts.

That's true. Information on all registered symbols is available from
getLoadedDLLs().

-- 
Best regards,
Ivan



Re: [R-pkg-devel] [EXTERN] Re: [EXTERN] Re: @doctype is deprecated. need help for r package documentation

2024-03-07 Thread Ivan Krylov via R-package-devel
On Thu, 7 Mar 2024 20:27:29 +
"Ruff, Sergej"  wrote:

> I am refering to Rstudio. I checked the settings and type is set to
> "htlm", not text. And I was wondering why the package documentation
> opened in a browser when I used @doctype.

Do you still have the source package .tar.gz file for which ?bootGSEA
would start a browser from inside RStudio?

-- 
Best regards,
Ivan



Re: [R-pkg-devel] @doctype is deprecated. need help for r package documentation

2024-03-07 Thread Ivan Krylov via R-package-devel
On Thu, 7 Mar 2024 10:37:51 +
"Ruff, Sergej"  wrote:

> I noticed that when I try _?bootGSEA_ it goes to the help page in R
> itself but not to the html page

That's up to the user to choose. help(bootGSEA, help_type = 'html')
should get you to the HTML documentation; help(bootGSEA, help_type =
'text') should give you plain text. The default depends on
options(help_type=...). On Windows, you get a choice during
installation of R; this gets recorded in file.path(R.home('etc'),
'Rprofile.site').

-- 
Best regards,
Ivan



Re: [R-pkg-devel] RFC: C backtraces for R CMD check via just-in-time debugging

2024-03-07 Thread Ivan Krylov via R-package-devel
On Tue, 5 Mar 2024 18:26:28 -0500 (EST)
Vladimir Dergachev  wrote:

> I use libunwind in my programs, works quite well, and simple to use.
> 
> Happy to share the code if there is interest..

Do you mean that you use libunwind in signal handlers? An example on
how to produce a backtrace without calling any async-signal-unsafe
functions would indeed be greatly useful.

Speaking of shared objects injected using LD_PRELOAD, I've experimented
some more, and I think that none of them would work with R without
additional adjustments. They install their signal handler very soon
after the process starts up, and later, when R initialises, it
installs its own signal handler, overwriting the previous one. For this
scheme to work, either R would have to cooperate, remembering a pointer
to the previous signal handler and calling it at some point (which
sounds unsafe), or the injected shared object would have to override
sigaction() and call R's signal handler from its own (which sounds
extremely unsafe).

Without that, if we want C-level backtraces, we either need to patch R
to produce them (using backtrace() and limiting this to glibc systems
or using libunwind and paying the dependency cost) or to use a debugger.

-- 
Best regards,
Ivan



Re: [R-pkg-devel] [External] [External] RcmdrPlugin.HH_1.1-48.tar.gz

2024-03-07 Thread Ivan Krylov via R-package-devel
On Wed, 6 Mar 2024 13:46:55 -0500
Duncan Murdoch  wrote:

> is this just a more or less harmless error, thinking that 
> the dot needs escaping

I think it's this one. You are absolutely right that the dot doesn't
need escaping in either TRE (which is what's used inside exportPattern)
or PCRE. In PCRE, this regular expression would have worked as intended:

# We do match backslashes by mistake.
grepl('[\\.]', '\\')
# [1] TRUE

# In PCRE, this wouldn't have been a mistake.
grepl('[\\.]', c('\\', '.'), perl = TRUE)
# [1] FALSE TRUE

-- 
Best regards,
Ivan



[Rd] Never exporting .__global__ and .__suppressForeign__?

2024-03-06 Thread Ivan Krylov via R-devel
Hello,

(Dear Richard, I hope you don't mind being Cc:'d on this thread in
R-devel. This is one of the ways we can prevent similar problems from
happening in the future.)

Sometimes, package authors who use both exportPattern('.') and
utils::globalVariables(...) get confusing WARNINGs about undocumented
exports:
https://stat.ethz.ch/pipermail/r-package-devel/2024q1/010531.html

I would like to suggest adding the variables used by
utils::globalVariables and utils::suppressForeignCheck to the list of
things that should never be exported:

Index: src/library/base/R/namespace.R
===================================================================
--- src/library/base/R/namespace.R  (revision 86054)
+++ src/library/base/R/namespace.R  (working copy)
@@ -806,7 +806,8 @@
 if (length(exports)) {
 stoplist <- c(".__NAMESPACE__.", ".__S3MethodsTable__.",
   ".packageName", ".First.lib", ".onLoad",
-  ".onAttach", ".conflicts.OK", ".noGenerics")
+  ".onAttach", ".conflicts.OK", ".noGenerics",
+  ".__global__", ".__suppressForeign__")
 exports <- exports[! exports %in% stoplist]
 }
if(lev > 2L) message("--- processing exports for ", dQuote(package))

(Indeed, R CMD check is very careful to only access these variables
using the interface functions in the utils package, so there doesn't
seem to be any code that depends on them being exported, and they
usually aren't.)

Alternatively (or maybe additionally), it may be possible to enhance
the R CMD check diagnostics by checking whether the name of the
undocumented object starts with a dot and asking the user whether it
was intended to be exported. This is not as easy to implement due to
tools:::.check_packages working with the log output from
tools::undoc(), not the object itself. Would a change to
tools:::format.undoc be warranted?

-- 
Best regards,
Ivan



Re: [R-pkg-devel] RcmdrPlugin.HH_1.1-48.tar.gz

2024-03-05 Thread Ivan Krylov via R-package-devel
On Tue, 5 Mar 2024 22:41:32 +
"Richard M. Heiberger"  wrote:

>  Undocumented code objects:
>'.__global__'
>  All user-level objects in a package should have documentation
> entries. See chapter 'Writing R documentation files' in the 'Writing R
>  Extensions' manual.

This object is not here for the user of the package. If you don't
export it, there will be no WARNING about it being undocumented. This
variable is exported because of exportPattern(".") in the file
NAMESPACE. The lone dot is a regular expression that matches any name
of an R object.

If you don't want to manually list your exports in the NAMESPACE file
(which can get tedious) or generate it (which takes additional
dependencies and build steps), you can use exportPattern('^[^\\.]') to
export everything except objects with a name starting with a period:
https://cran.r-project.org/doc/manuals/R-exts.html#Specifying-imports-and-exports

-- 
Best regards,
Ivan



Re: [R-pkg-devel] RFC: C backtraces for R CMD check via just-in-time debugging

2024-03-03 Thread Ivan Krylov via R-package-devel
On Sun, 3 Mar 2024 19:19:43 -0800
Kevin Ushey  wrote:

> Would libSegFault be useful here?

Glad to know it has been moved to
 and not
just removed altogether after the upstream commit
.

libSegFault is safer than, say, libsegfault [*] because it both
supports SA_ONSTACK (for when a SIGSEGV is caused by stack overflow)
and avoids functions like snprintf() (which depend on the locale code,
which may have been the source of the crash). The only correctness
problem that may still be unaddressed is potential memory allocations
in backtrace() when it loads libgcc on first use. That should be easy
to fix by calling backtrace() once in segfault_init(). Unfortunately,
libSegFault is limited to glibc systems, so a different solution will
be needed on Windows, macOS and Linux systems with the musl libc.

Google-owned "backward" [**] tries to do most of this right, but (1) is
designed to be compiled together with C++ programs, not injected into
unrelated processes and (2) will exit the process if it survives
raise(signum), which will interfere with both rJava (judging by the
number of Java-related SIGSEGVs I saw while running R CMD check) and R's
own stack overflow survival attempts.

-- 
Best regards,
Ivan

[*] https://github.com/stass/libsegfault
(Which doesn't compile out of the box on GNU/Linux due to missing
pthread_np.h, although that should be easy to patch.)

[**] https://github.com/bombela/backward-cpp



[R-pkg-devel] RFC: C backtraces for R CMD check via just-in-time debugging

2024-03-03 Thread Ivan Krylov via R-package-devel
Hello,

This may be of interest to people who run lots of R CMD checks and have
to deal with resulting crashes in compiled code.

Every now and then, the CRAN checks surface a particularly nasty crash.
The R-level traceback stops in the compiled code. It's not obvious
where exactly the crash happens. Naturally, this never happened on the
maintainer's computer before and, in fact, is hard to reproduce.

Containers would help, but they cannot solve the problem completely.
Some problems only surface when there's more than 32 logical
processors, or during certain times of day. It may help to at least see
the location of the crash as it happens on the computer running the
check.

One way to provide that would be to run a special debugger that does
nothing most of the time, attaches to child threads and processes, and
produces backtraces when processes receive a crashing signal. There is
such a debugger for Windows [1], and there is now a proof of concept
for amd64 Linux [2]. 

I've just tried [2] on a 250-package reverse dependency check and saw a
lot of SIGSEGVs with rcx=cafebabe or Java in the backtrace, but
other than that, it seems to work fine. Do you think it's worth
developing further?

The major downside of using a debugger like this is a noticeable change
in the environment: [v]fork(), clone() and exec() become slower,
attaching another tracer becomes impossible, SIGSEGVs may become much
slower (although I do hope that most software I rely upon doesn't care
about SIGSEGVs per second). On the other hand, these wrappers are as
transparent as they get and don't even need R -d to pass the arguments
to the child process.

The other way to provide C-level backtraces is a post-mortem debugger
(registered via the AeDebug registry key on Windows or
kernel.core_pattern sysctl on Linux). This avoids interference with the
process environment during normal execution, but requires more
integration work to collect the crash dumps, process them into usable
backtraces and associate with the R CMD check runs. There are also
injectable DLLs like libbacktrace, but these have to interfere with the
process from the inside, which may be worse than ptrace() in terms of
observable environment changes. On glibc systems (but not musl, macOS,
Windows), R's SIGSEGV handler could be enhanced to call
backtrace_symbols_fd(), which should be safe (no malloc()) as long as
libgcc is preloaded.

Is adding C-level backtraces to R CMD checks worth the effort? Could it
be a good idea to add this on CRAN? If yes, how can I help?

-- 
Best regards,
Ivan

[1] , see "catchsegv"

[2] https://codeberg.org/aitap/tracecrash



Re: [R-pkg-devel] Additional issues: Intel segfault

2024-03-01 Thread Ivan Krylov via R-package-devel
On Sat, 2 Mar 2024 02:07:47 +
Murray Efford  wrote:

> Gabor suggested https://github.com/r-hub/rhub2 and that worked like a
> charm. A check there on the Intel platform found no errors in my
> present version of secrdesign, so I'll resubmit with confidence.

Thank you for letting me know! Having this as a container simplifies a
lot of things.

-- 
Best regards,
Ivan



Re: [R-pkg-devel] Additional issues: Intel segfault

2024-03-01 Thread Ivan Krylov via R-package-devel
On Fri, 1 Mar 2024 07:42:01 +
Murray Efford  wrote:

> R CMD check suggests it is most likely in the Examples for
> 'validate', but all code there is wrapped in \dontrun{}.

The crash happens after q('no'), suggesting a corruption in the heap or
in the R memory manager. At least it's a null pointer being
dereferenced and not a 0xRANDOM_LOOKING_NUMBER: this limits the impact
of the problem.

I don't know if anyone created an easily reproducible container with an
Intel build of R (there's https://hub.docker.com/r/intel/oneapi, but
aren't the compilers themselves supposed to be not redistributable?),
so you will most likely have to follow
https://www.stats.ox.ac.uk/pub/bdr/Intel/README.txt and
https://cran.r-project.org/doc/manuals/r-devel/R-admin.html#Intel-compilers
manually, compiling R using Intel compilers yourself in order to
reproduce this.

I think it would be great if CRAN checking machines used a just-in-time
debugger to provide C-level backtraces at the place of the crash. For
Windows, such a utility does exist [*], but I recently learned that the
glibc `catchsegv` program (and most other similar programs) used to
perform shared object preloading (before being thrown out of the
codebase altogether), which is more intrusive than it could be. A proof
of concept using GDB on Linux can be shown to work:

R -d gdb \
 --debugger-args='-batch -ex run -ex bt -ex c -ex q' \
 -e '
  Rcpp::sourceCpp(code =
   "//[[Rcpp::export]]\nvoid rip() { *(double*)(42) = 42; }"
  ); rip()
 '

-- 
Best regards,
Ivan

[*] https://github.com/jrfonseca/drmingw



Re: [R-pkg-devel] Unexpected multi-core CPU usage in package tests

2024-02-28 Thread Ivan Krylov via R-package-devel
On Tue, 27 Feb 2024 11:14:19 +
Jon Clayden  wrote:

> My testing route is to install the packages within the
> 'rocker/r-devel' Docker container, which is Debian-based, then use
> 'time' to evaluate CPU usage. Note that, even though 'RNifti' does not
> use OpenMP, setting OMP_NUM_THREADS changes its CPU usage

I think that's because rocker/r-devel uses parallel OpenBLAS:

$ podman run --rm -it docker.io/rocker/r-devel \
 R -q -s -e 'sessionInfo()' | grep -A1 BLAS
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.24.so;  
LAPACK version 3.11.0

The incoming CRAN check machine either sets the BLAS parallellism to 1
or uses a non-parallel BLAS. With rocker/r-devel, you can run R with
the environment variable OPENBLAS_NUM_THREADS set to 1. It's been
effective in the past to run R -d gdb and set a breakpoint on
pthread_create before launching the test. (In theory, it may be
required to set a breakpoint on every system call that may be used to
create threads, including various variations of clone(), subject to
variations between operating systems, but pthread_create has been
enough for me so far.)

With OPENBLAS_NUM_THREADS=1, I'm only seeing OpenMP threads created by
the mmand package during tests for your package tractor.base, and the
latest commit (that temporary disables testing of mmand) doesn't hit
the breakpoint or raise any NOTEs at all.

-- 
Best regards,
Ivan



[Rd] How to avoid the Markdown code block bug on R Bugzilla

2024-02-27 Thread Ivan Krylov via R-devel
Hello,

There's a rare but annoying bug in Bugzilla 5.1.2...5.3.2+ where a
Markdown code block inside a comment may be replaced by U+F111 or
U+F222, and then the following code blocks may end up being replaced by
the preceding ones. For example, the problem can be seen in PR16158:
https://bugs.r-project.org/show_bug.cgi?id=16158.

Here's how to avoid it:

1. If no code blocks have been already swallowed by Bugzilla, use the
comment preview to make sure yours won't be swallowed either. If you do
see a U+F111 or a U+F222 instead of your code block in the preview tab, try:
 - starting the comment with an empty line
 - removing the colons from the starting sentence
 - if all else fails, switching Markdown off

2. If you would like to post some code into a bug where this has
already happened, the preview won't be enough. Bugzilla::Markdown has
separate queues for fenced code blocks and indented code blocks, so if
one was swallowed, it may be possible to post the other. Unfortunately,
you won't know whether it'll fail until you post the comment, and by
then it may be a part of the problem. The only safe way to continue is
to switch Markdown off for the comment.

A technical analysis of the bug is available at
,
but it may take a while to get this fixed on the Bugzilla side.

-- 
Best regards,
Ivan



Re: [R-pkg-devel] CRAN Package Check Note: Warning: trimming empty

2024-02-24 Thread Ivan Krylov via R-package-devel
On Fri, 23 Feb 2024 17:04:39 +
Sunmee Kim  wrote:

> Version: 1.0.4
> Check: HTML version of manual
> Result: NOTE

This may not be immediately obvious in the e-mail from CRAN, but I
think this is a reminder of a warning from the previous version of the
package. Haven't you just uploaded version 1.0.5? I'm not getting any
warnings for gesca_1.0.5.tar.gz from the /incoming/archive subdirectory
on the CRAN FTP server, except perhaps "This build time stamp is over a
month old", and the latest check looks almost clean in the same manner:
https://win-builder.r-project.org/incoming_pretest/gesca_1.0.5_20240223_172938/

What does the rest of the e-mail say?

-- 
Best regards,
Ivan



Re: [Rd] Tcl socket server (tcltk) does not work any more on R 4.3.2

2024-02-21 Thread Ivan Krylov via R-devel
On Wed, 21 Feb 2024 08:01:16 +0100
"webmail.gandi.net"  wrote:

> Since the {tcltk} package was working fine with  "while
> (Tcl_DoOneEvent(TCL_DONT_WAIT) && max_ev) max_ev—;", unless there is
> a clear performance enhancement with "while (i-- &&
> Tcl_ServiceAll())", it would perhaps be wise to revert this back.

I forgot to mention the comment in the new version of the function
explaining the switch:

>> [Tcl_DoOneEvent(TCL_DONT_WAIT)] <...> causes infinite recursion with
>> R handlers that have a re-entrancy guard, when TclSpinLoop is
>> invoked from such a handler (seen with Rhttp server)

The difference between Tcl_ServiceAll() and Tcl_DoOneEvent() is that
the latter calls Tcl_WaitForEvent(). The comments say that it is called
for the side effect of queuing the events detected by select(). The
function can indeed be observed to access the fileHandlers via the
thread-specific data pointer, which contain the file descriptors and
the instructions saying what to do with them.

Without Tcl_WaitForEvent, the only event sources known to Tcl are
RTcl_{setup,check}Proc (which only checks file descriptors owned by R),
Display{Setup,Check}Proc (which seems to be owned by Tk), and
Timer{Setup,Check}Proc (for which there doesn't seem to be any timers
by default).

As far as I understand the problem, while the function
worker_input_handler() from src/modules/internet/Rhttpd.c is running,
TclHandler() might be invoked, causing Tcl_DoOneEvent() to call
RTcl_checkProc() and therefore trying to run worker_input_handler()
again. The Rhttpd handler prevents this and doesn't clear the
condition, which causes the event loop to keep calling it. Is that
correct? Are there easy ways to reproduce the problem?

-- 
Best regards,
Ivan



Re: [R-pkg-devel] Conversion failure in 'mbcsToSbcs'

2024-02-21 Thread Ivan Krylov
On Wed, 21 Feb 2024 12:29:02 +
Package Maintainer  wrote:

> Error: processing vignette 'ggenealogy.Rnw' failed with diagnostics:
>  chunk 58 (label = plotCBText)

In order to use the non-standard graphics device, the chunk must
set the option fig=TRUE. Otherwise, when something calls
graphics::strwidth('Lubomír Kubáček', "inches"), R notices that no
graphics device is active and creates a default one, which happens to
be pdf() and has all these problems. With fig=TRUE, Sweave will
initialise the cairo_pdf() device first, and then graphics::strwidth()
will use the existing device, avoiding the error.

-- 
Best regards,
Ivan



Re: [Rd] Tcl socket server (tcltk) does not work any more on R 4.3.2

2024-02-20 Thread Ivan Krylov via R-devel
On Tue, 20 Feb 2024 12:27:35 +0100
"webmail.gandi.net"  wrote:

> When R process #1 is R 4.2.3, it works as expected (whatever version
> of R #2). When R process #1 is R 4.3.2, nothing is sent or received
> through the socket apparently, but no error is issued and process #2
> seems to be able to connect to the socket.

The difference is related to the change in
src/library/tcltk/src/tcltk_unix.c.

In R-4.2.1, the function static void TclSpinLoop(void *data) says:

int max_ev = 100;
/* Tcl_ServiceAll is not enough here, for reasons that escape me */
while (Tcl_DoOneEvent(TCL_DONT_WAIT) && max_ev) max_ev--;

In R-devel, the function instead says:

int i = R_TCL_SPIN_MAX; 
while (i-- && Tcl_ServiceAll())
;

Manually calling Tcl_DoOneEvent(0) from the debugger at this point
makes the Tcl code respond to the connection. Tcl_ServiceAll() seems to
be still not enough. I'll try reading Tcl documentation to investigate
this further.

-- 
Best regards,
Ivan



Re: [R-pkg-devel] Conversion failure in 'mbcsToSbcs'

2024-02-15 Thread Ivan Krylov
On Mon, 12 Feb 2024 16:01:27 +
Package Maintainer  wrote:

> Unfortunately, I received a reply from the CRAN submission team
> stating that my vignette file is still obtaining the "mbcsToSbcs"
> ERROR as is shown here
> (https://win-builder.r-project.org/incoming_pretest/ggenealogy_1.0.3_20240212_152455/Debian/00check.log).

I am sorry for leading you down the wrong way with my advice. It turns
out that no 8-bit Type-1 encoding known to pdf() can represent both
'Lubomír Kubáček' and 'Anders Ågren':

lapply(
 setNames(nm = c(
  'latin1', 'cp1252', 'latin2', 'latin7',
  'latin-9', 'CP1250', 'CP1257'
 )), function(enc)
  iconv(enc2utf8(c(
   'Lubomír Kubáček', 'Anders Ågren'
  )), 'UTF-8', enc, toRaw = TRUE)
) |> sapply(lengths)
# one of the two strings cannot be represented, returning a NULL:
#  latin1 cp1252 latin2 latin7 latin-9 CP1250 CP1257
# [1,]  0  0 15  0   0 15  0
# [2,] 12 12  0 12  12  0 12

While it may still be possible to give extra parameters to pdf() to use
a font encoding that covers all the relevant characters, it seems
easier to switch to cairo_pdf() for your multi-lingual plots. Place the
following somewhere in the beginning of the vignette:

<>=
my.Swd <- function(name, width, height, ...)
 grDevices::cairo_pdf(
  filename = paste(name, "pdf", sep = "."),
  width = width, height = height
 )
@
\SweaveOpts{grdevice=my.Swd,pdf=FALSE}

This should define a new plot device function for Sweave, one that
handles more Unicode characters correctly.

> PS: Thanks for the advice about plain text mode. Hopefully, I have
> correctly abide by that advice in this current email.

This e-mail arrived in plain text, thank you!

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [Rd] certain pipe() use cases not working in r-devel

2024-02-15 Thread Ivan Krylov via R-devel
On Wed, 14 Feb 2024 14:43:12 -0800
Jennifer Bryan  wrote:

> But in r-devel on macOS, this is silent no-op, i.e. "hello, world"
> does not print:
> 
> > R.version.string  
> [1] "R Under development (unstable) (2024-02-13 r85895)"
> > con <- pipe("cat")
> > writeLines("hello, world", con)  

I can reproduce this on 64-bit Linux.

I think that this boils down to problems with cleanup in R_pclose_pg
[*]. The FILE* fp corresponding to the child process pipe is created
using fdopen() in R_popen_pg(), but R_pclose_pg() only performs close()
on the file descriptor returned by fileno(). The FILE* itself is
leaked, and any buffered content waiting to be written out is lost.

One of the last few lines in the strace output before the process
terminates is the standard C library cleaning up the FILE* object and
trying to flush the buffer:

$ strace -f bin/R -q -s \
 -e 'writeLines("hello", x <- pipe("cat")); close(x)'
...skip...
write(5, "hello\n", 6)  = -1 EBADF (Bad file descriptor)
exit_group(0)   = ?
+++ exited with 0 +++

There is a comment saying "see timeout_wait for why not to use fclose",
which I think references a different function, R_pclose_timeout():

>> Do not use fclose, because on Solaris it sets errno to "Invalid
>> seek" when the pipe is already closed (e.g. because of timeout).
>> fclose would not return an error, but it would set errno and the
>> non-zero errno would then be reported by R's "system" function.

(There are no comments about fclose() in timeout_wait() itself.)

Is there a way to work around the errno problem without letting the
FILE* leak?

-- 
Best regards,
Ivan

[*] Introduced in https://bugs.r-project.org/show_bug.cgi?id=17764#c6
to run child processes in a separate process group, safe from
interrupts aimed at R.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [R-pkg-devel] failing CRAN checks due to problems with dependencies

2024-02-08 Thread Ivan Krylov via R-package-devel
On Wed, 7 Feb 2024 08:40:44 -0600
Marcin Jurek  wrote:

> Packages required but not available: 'Rcpp', 'FNN',
> 'RcppArmadillo' Packages suggested but not available for checking:
> 'fields', 'rmarkdown', 'testthat', 'maptools'

One of the machines running the incoming checks was having problems. If
you followed the failing dependency chain by looking at the CRAN check
results of the packages described as "not available", you could
eventually find a package needing compilation (Rcpp or stringi or
something else), look at the installation log and see Make trying to
run commands that are completely wrong.

It looked like the path to the compiler was empty:
https://web.archive.org/web/20240208191430/https://www.r-project.org/nosvn/R.check/r-devel-linux-x86_64-debian-clang/Rcpp-00install.html

I think that the problems are solved now, so it should be safe to
increment the version and submit it again.

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [Rd] Difficult debug

2024-02-07 Thread Ivan Krylov via R-devel
On Wed, 07 Feb 2024 14:01:44 -0600
"Therneau, Terry M., Ph.D. via R-devel"  wrote:

>  > test2 <- mysurv(fit2, pbc2$bili4, p0= 4:0/10, fit2, x0 =50)  
> ==31730== Invalid read of size 8
> ==31730==    at 0x298A07: Rf_allocVector3 (memory.c:2861)
> ==31730==    by 0x299B2C: Rf_allocVector (Rinlinedfuns.h:595)
> ==31730==    by 0x299B2C: R_alloc (memory.c:2330)
> ==31730==    by 0x3243C6: do_which (summary.c:1152)
<...>
> ==31730==  Address 0x10 is not stack'd, malloc'd or (recently) free'd
<...>
>   *** caught segfault ***
> address 0x10, cause 'memory not mapped'

An unrelated allocation function suddenly dereferencing a null pointer
is likely indication of heap corruption. Valgrind may be silent about
it because the C heap (that it knows how to override and track) is still
intact, but the R memory management metadata got corrupted (which looks
like a valid memory access to Valgrind).

An easy solution could be brought by more instrumentation.

R can tell Valgrind to consider some memory accesses invalid if you
configure it using --with-valgrind-instrumentation [*], but I'm not
sure it will be able to trap overwriting GC metadata, so let's set it
aside for now.

If you compile your own R, you can configure it with -fsanitize=address
added to the compiler and linker flags [**]. I'm not sure whether the
bounds checks performed by AddressSanitizer would be sufficient to
catch the problem, but it's worth a try. Instead of compiling R with
sanitizers, it should be also possible to use the container image
docker.io/rocker/r-devel-san.

The hard option is left if no instrumentation lets you pinpoint the
error. Since the first (as far as Valgrind is concerned) memory error
already happens to result in a SIGSEGV, you can run R in a regular
debugger and try to work backwards from the local variables at the
location of the crash. Maybe there's a way to identify the block
containing the pointer that gets overwritten and set a watchpoint on
it for the next run of R. Maybe you can read the overwritten value as
double and guess where the number came from. If your processor is
sufficiently new, you can try `rr`, the time-travelling debugger [***],
to rewind the process execution back to the point where the pointer gets
overwritten.
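A rough sketch of that workflow (the commands are illustrative; the watchpoint address is a placeholder you would take from the first crash):

```
# First run: let it crash under gdb and inspect the state.
$ R -d gdb
(gdb) run
# ... reproduce, SIGSEGV in Rf_allocVector3 ...
(gdb) bt full            # work backwards from the locals here

# If the corrupted address is stable between runs, set a watchpoint:
(gdb) watch -l *(SEXP *) 0xADDRESS
(gdb) run

# Or record once with rr and travel back to the corrupting write:
$ rr record Rscript reproduce.R
$ rr replay
(gdb) watch -l *(SEXP *) 0xADDRESS
(gdb) reverse-continue
```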

-- 
Best regards,
Ivan

[*]
https://cran.r-project.org/doc/manuals/R-exts.html#Using-valgrind

[**]
https://cran.r-project.org/doc/manuals/R-exts.html#Using-Address-Sanitizer

[***]
https://rr-project.org
Judging by the domain name, it's practically designed to fix troublesome
bugs in R packages!

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [R-pkg-devel] r-oldrel-linux- not in CRAN checks?

2024-02-06 Thread Ivan Krylov via R-package-devel
On Tue, 6 Feb 2024 18:27:32 +0100
Vincent van Hees  wrote:

> For details see:
> https://github.com/RfastOfficial/Rfast/issues/99

GitHub processed your plain text description of the problem as if it
was Markdown and among other things ate the text that used to be there
between angle brackets:

> #include
>  ^~~

By digging through the raw source code of the issue at
https://api.github.com/repos/RfastOfficial/Rfast/issues/99 it is
possible to find out which header was missing for Rfast:

> ../inst/include/Rfast/parallel.h:20:10:fatal error: tion: No such
> file or directory
> #include 
>  ^~~
>compilation terminated.

Indeed,  is a C++17 header [1]. While g++ version
7.5.0-3ubuntu1~18.04 seems to accept --std=c++17 without complaint, its
libstdc++-7-dev package is missing this header. Moreover, there's still
no  in libstdc++-8-dev. I think that you need libstdc++-9
for that to work, which is not in Bionic; older versions aren't
C++17-compliant enough to compile Rfast, and C++17 is listed in the
SystemRequirements of the package.

Installing clang-10 and editing Makeconf to use clang++-10 instead of
g++ seems to let the compilation proceed. In order to successfully link
the resulting shared object, I also had to edit Makeconf to specify
-L/usr/lib/gcc/x86_64-linux-gnu/7 when linking -lgfortran.

If you plan to use this in production, be very careful. I don't know
about binary compatibility guarantees between g++-7 and clang++-10, so
you might have to recompile every C++-using R package from source with
clang++-10 in order to avoid hard-to-debug problems when using them
together. (It might also work fine. That's the worst thing about such
problems.)

-- 
Best regards,
Ivan

[1] https://en.cppreference.com/w/cpp/header/execution

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] new maintainer for CRAN package XML

2024-02-05 Thread Ivan Krylov via R-package-devel
Dear Uwe Ligges,

On Mon, 22 Jan 2024 15:50:44 +0100
Uwe Ligges  wrote:

> So we are looking for a person volunteering to take over 'XML'.
> Please let us know if you are interested.

Unless someone else has been discussing this with CRAN in private or
had a package depending on XML and was planning to step up but forgot,
I would like to volunteer.

I'm assuming that the Omegahat page is best preserved in its current
form for historical reasons, so instead I have prepared a Git
repository and a page with an option to file issues on the Codeberg
forge: https://codeberg.org/aitap/XML

With the help of the amazing list members, I have also set up a virtual
machine to run the reverse dependency checks, so it should be possible
to avoid immediate breakage if I have to make any changes.

That's the theory, at least.

(Also, thank you for your reply to my question!)

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [Rd] Advice debugging M1Mac check errors

2024-02-05 Thread Ivan Krylov via R-devel
On Sun, 4 Feb 2024 20:41:51 +0100
Holger Hoefling  wrote:

> I wanted to ask if people have good advice on how to debug M1Mac
> package check errors when you don't have a Mac?

Apologies for not answering the question you asked, but is this about
hdf5r and problems printing R_xlen_t [*] that appeared in 1.3.8 and you
tried to solve in 1.3.9?

We had a thread about this last November:
https://stat.ethz.ch/pipermail/r-package-devel/2023q4/010123.html

To summarise, there is no single standard C format specifier that can be
used to print R_xlen_t. As an implementation detail, it can be defined
as int or ptrdiff_t (or something completely different in the future),
and ptrdiff_t itself is usually defined as long or long long (or, also,
something completely different on a weirder platform). All three basic
types can have different widths and cause painful stack-related
problems when a mismatch happens.

In R-4.4, there will be a macro R_PRIdXLEN_T defining a compatible
printf specifier. Until then (and for compatibility with R-4.3 and
lower), it's relatively safe to cast to (long long) or (ptrdiff_t) and
then use the corresponding specifier, but that's not 100% future-proof.
Also, mind the warnings that mingw compilers sometimes emit for "new"
printf specifiers even though UCRT is documented to support them.

-- 
Best regards,
Ivan

[*] https://www.stats.ox.ac.uk/pub/bdr/M1mac/hdf5r.out

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [R-pkg-devel] Bioconductor reverse dependency checks for a CRAN package

2024-02-05 Thread Ivan Krylov via R-package-devel
Thank you Georgi Boshnakov, Ben Bolker, and Diego Hernangómez Herrero
for introducing me to `revdepcheck`!

On Tue, 30 Jan 2024 12:38:57 -0500
Ben Bolker  wrote:

> I have had a few issues with it 
>  but overall it's
> been very helpful.

Indeed that looks perplexing. Writable .Library can also cause problems
for people running R-svn built in their home directories without
R_LIBS_USER set when they check their packages without Suggests.
I'm also relying on .Library.site for the dependencies of the reverse
dependencies. So far, my setup seems to be working as intended, but I'll
keep this issue in mind.

On Tue, 30 Jan 2024 18:57:41 +0100
Diego Hernangómez Herrero  wrote:

> Haven’t tried with a package with such an amount of revdeps, but my
> approach is revdepcheck in GH actions and commiting the result to the
> repo (that is somehow similar to the docker approach if you host the
> package in GitHub).

Great to know that reverse dependency checks can run in CI! I think
I'll keep a stateful virtual machine for now, because otherwise I would
need to find space for 4 to 32 gigabytes of cache somewhere (or download
everything from the repository mirrors every time).

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] Bioconductor reverse dependency checks for a CRAN package

2024-02-05 Thread Ivan Krylov via R-package-devel
On Tue, 30 Jan 2024 16:24:40 +
Martin Morgan  wrote:

> BiocManager (the recommended way to install Bioconductor packages) at
> the end of the day does essentially install.packages(repos =
> BiocManager::repositories()), ensuring that the right versions of
> Bioconductor packages are installed for the version of R in use.

That's great to know, thanks! I think I will use BiocManager::install
for now, both because it uses the correct repositories and because it
doesn't forcibly reinstall the packages I am asking for. With bspm, I
can run BiocManager::install(all_the_dependencies) and have the system
perform the least amount of work required to reach the desired state.

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] Bioconductor reverse dependency checks for a CRAN package

2024-02-05 Thread Ivan Krylov via R-package-devel
Dear Dirk,

Thank you very much for your help here and over on GitHub!

I have finally managed to get the reverse dependency checks working. It
took some additional disk space and a few more system dependencies. If
not for r2u, I would have been stuck for much longer. I really
appreciate the work that went into packaging all these R packages.

On Tue, 30 Jan 2024 10:32:36 -0600
Dirk Eddelbuettel  wrote:

> For what it is worth, my own go-to for many years has been a VM in
> which I install 'all packages needed' for the rev.dep to be checked.

This approach seems to be working for me, too. I had initially hoped to
set something up using CI infrastructure, but there's too many
dependencies to install in a prepare step and it's too much work to
make a container image with all dependencies anew every time I want to
run a reverse dependency check. Easier to just let it run overnight on
a spare computer.

> Well a few of us maintain packages with quite a tail and cope. Rcpp
> has 2700, RcppArmadillo have over 100, BH a few hundred. These aren't
> 'light'.

Maintaining a top-5 CRAN package by in-degree rank
[10.32614/RJ-2023-060] is indeed a very serious responsibility. 

> I wrote myself the `prrd` package (on CRAN) for this, others have
> other tools -- Team data.table managed to release 1.5.0 to CRAN today
> too. So this clearly is possible.

I'll check out `prrd` next, thanks. tools::check_packages_in_dir is
nice, but it could be faster if I could disable mc.preschedule. 

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


[R-pkg-devel] Bioconductor reverse dependency checks for a CRAN package

2024-01-30 Thread Ivan Krylov via R-package-devel
Hello R-package-devel,

What would you recommend in order to run reverse dependency checks for
a package with 182 direct strong dependencies from CRAN and 66 from
Bioconductor (plus 3 more from annotations and experiments)?

Without extra environment variables, R CMD check requires the Suggested
packages to be available, which means installing...

revdepdep <- package_dependencies(revdep, which = 'most')
revdeprest <- package_dependencies(
 unique(unlist(revdepdep)),
 which = 'strong', recursive = TRUE
)
length(setdiff(
 unlist(c(revdepdep, revdeprest)),
 unlist(standard_package_names())
))

...up to 1316 packages. 7 of these suggested packages aren't on CRAN or
Bioconductor (because they've been archived or have always lived on
GitHub), but even if I filter those out, it's not easy. Some of the
Bioconductor dependencies are large; I now have multiple gigabytes of
genome fragments and mass spectra, but also a 500-megabyte arrow.so in
my library. As long as a data package declares a dependency on your
package, it still has to be installed and checked, right?

Manually installing the SystemRequirements is no fun at all, so I've
tried the rocker/r2u container. It got me most of the way there, but
there were a few remaining packages with newer versions on CRAN. For
these, I had to install the system packages manually in order to build
them from source.

Someone told me to try the rocker/r-base container together with pak.
It was more proactive at telling me about dependency conflicts and
would have got me most of the way there too, except it somehow got me a
'stringi' binary without the corresponding libicu*.so*, which stopped
the installation process. Again, nothing that a bit of manual work
wouldn't fix, but I don't feel comfortable setting this up on a CI
system. (Not on every commit, of course - that would be extremely
wasteful - but it would be nice if it was possible to run these checks
before release on a different computer and spot more problems this way.)

I can't help but notice that neither install.packages() nor pak() is
the recommended way to install Bioconductor packages. Could that
introduce additional problems with checking the reverse dependencies?

Then there's the check_packages_in_dir() function itself. Its behaviour
about the reverse dependencies is not very helpful: they are removed
altogether or at least moved away. Something may be wrong with my CRAN
mirror, because some of the downloaded reverse dependencies come out
with a size of zero and subsequently fail the check very quickly.

I am thinking of keeping a separate persistent library with all the
1316 dependencies required to check the reverse dependencies and a
persistent directory with the reverse dependencies themselves. Instead
of using the reverse=... argument, I'm thinking of using the following
scheme:

1. Use package_dependencies() to determine the list of packages to test.
2. Use download.packages() to download the latest version of everything
if it doesn't already exist. Retry if got zero-sized or otherwise
damaged tarballs. Remove old versions of packages if a newer version
exists.
3. Run check_packages_in_dir() on the whole directory with the
downloaded reverse dependencies.
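Steps 1 and 2 might look like this in R (an untested sketch; `pkg` and `destdir` are placeholders):

```r
revdeps <- tools::package_dependencies(pkg, reverse = TRUE,
                                       which = "strong")[[1]]
paths <- utils::download.packages(revdeps, destdir = destdir,
                                  type = "source")[, 2]
## a zero-sized tarball means a failed download: delete it so the
## next run fetches it again
unlink(paths[file.size(paths) == 0])
```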

For this to work, I need a way to run step (3) twice, ensuring that one
of the runs is performed with the CRAN version of the package in the
library and the other one is performed with the to-be-released version
of the package in the library. Has anyone already come up with an
automated way to do that?

No wonder nobody wants to maintain the XML package.

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] Possible malware(?) in a vignette

2024-01-28 Thread Ivan Krylov via R-package-devel
There used to be a long analysis in the draft of this e-mail [*], but
let me cut to the chase.

Even something as simple as replacing the four-byte comment [**] at the
beginning of the file ("%\xd0\xd4\xc5\xd8" -> "%") that keeps the
file fully readable (!) results in the same behaviour but zero
detections:

$ sha256sum d_jss_paper*.pdf
0ae3b229fdd763a0571463dc98e02010752bb0213a672db6826afcd72ccaf291  d_jss_paper1.pdf
9486d99c1c1f2d1b06f0b6c5d27c54d4f6e39d69a91d7fad845f323b0ab88de9  d_jss_paper.pdf
$ diff -u <(hd d_jss_paper.pdf) <(hd d_jss_paper1.pdf)
--- /dev/fd/63  2024-01-28 13:00:43.454419322 +0300
+++ /dev/fd/62  2024-01-28 13:00:43.454419322 +0300
@@ -1,4 +1,4 @@
-00000000  25 50 44 46 2d 31 2e 35  0a 25 d0 d4 c5 d8 0a 37  |%PDF-1.5.%.....7|
+00000000  25 50 44 46 2d 31 2e 35  0a 25 20 20 20 20 0a 37  |%PDF-1.5.%    .7|
 00000010  37 20 30 20 6f 62 6a 0a  3c 3c 0a 2f 4c 65 6e 67  |7 0 obj.<<./Leng|
 00000020  74 68 20 32 36 32 38 20  20 20 20 20 20 0a 2f 46  |th 2628      ./F|
 00000030  69 6c 74 65 72 20 2f 46  6c 61 74 65 44 65 63 6f  |ilter /FlateDeco|

https://www.virustotal.com/gui/file/0ae3b229fdd763a0571463dc98e02010752bb0213a672db6826afcd72ccaf291

The scary-looking files and hosts being accessed are just Adobe Reader
and Chrome behaving in a manner indistinguishable from spyware. Upload
any PDF file with links in it and you'll see the same picture. Even the
original report for d_jss_paper.pdf from poweRlaw_0.70.6 says "no
sandboxes flagged this file as malicious".

I think that the few non-major antivirus products that "detected" the
original file remembered a low-quality checksum of a different file,
and this whole thread resulted from a checksum collision. 0x043BC33F
(71025471) is what, four bytes? Doesn't seem to be a standard CRC-32 or
the sum of all bytes modulo 2^32, though.

I cannot prove a negative, but I invite infosec people with more PDF
experience to comment further on the issue.

-- 
Best regards,
Ivan

[*] Colin seems to have used the Debian build of TeX Live 2017 to
generate it, which is non-trivial but possible to reproduce by
installing it from Debian Snapshots on top of Stretch. The resulting
file has a different hash (for valid reasons), the same behaviour, but
zero detections:
https://www.virustotal.com/gui/file/f7b0e0400167e06970ac61fcadfda29daec1c2ee685d4c9ff805e375bcffc985/behavior

Trying a "binary search" by removing PDF objects or replacing byte
ranges with ASCII spaces was also a dead end: any change results in no
detections.

[**] PDF 1.5 specification, section 3.1.2:

>> Comments (other than the %PDF−1.4 and %%EOF comments described in
>> Section 3.4, “File Structure”) have no semantics.

https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/pdfreference1.5_v6.pdf#G8.1860480

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] Possible malware(?) in a vignette

2024-01-27 Thread Ivan Krylov via R-package-devel
Apologies for being insufficiently clear. By "a file straight from NOAA" I
meant a completely different PDF,
that gives the same SHA-256 hash whether downloaded by VirusTotal
or me, comes from a supposedly trusted source, and still makes Acrobat Reader
behave like it's infected, show a crashed Firefox on the screenshot and drop a
number of scary-looking files. Surely there will be a difference between
reading an infected file and a non-infected file?

On 27 January 2024 15:10:53 GMT+03:00, Bob Rudis  wrote:
>Ivan: do you know what mirror NOAA used at that time to get that version of
>the package? Or, did they pull it "directly" from cran.r-project.org
>(scare-quotes only b/c DNS spoofing is and has been a pretty solid attack
>vector)?

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] Possible malware(?) in a vignette

2024-01-27 Thread Ivan Krylov via R-package-devel
On Sat, 27 Jan 2024 03:52:01 -0500
Bob Rudis  wrote:

> Two VT sandboxes used Adobe Acrobat Reader to open the PDF and the PDF
> seems to either had malicious JavaScript or had been crafted
> sufficiently to caused a buffer overflow in Reader that then let it
> perform other functions on those sandboxes.

Let's talk package versions and SHA256 hashes of
poweRlaw/inst/doc/d_jss_paper.pdf.

poweRlaw version 0.70.4:
Packaged: 2020-04-07 14:55:32 UTC
Date/Publication: 2020-04-07 16:10:02 UTC
SHA-256(poweRlaw/inst/doc/d_jss_paper.pdf):
96535de112f471c66e29b74c77444b34a29b82d6525c04d477ed2d987ea6ccae

Not previously uploaded to VirusTotal, currently checks out clean:
https://www.virustotal.com/gui/file/96535de112f471c66e29b74c77444b34a29b82d6525c04d477ed2d987ea6ccae

poweRlaw version 0.70.5:
Packaged: 2020-04-23 15:36:49 UTC
Date/Publication: 2020-04-23 16:40:06 UTC
SHA-256(poweRlaw/inst/doc/d_jss_paper.pdf):
5f827302ede74e1345fba5ba52c279129823da3c104baa821d654ebb8d7a67fb

Not previously uploaded to VirusTotal, also checks out clean:
https://www.virustotal.com/gui/file/5f827302ede74e1345fba5ba52c279129823da3c104baa821d654ebb8d7a67fb/behavior

For some reason, the Zenbox report shows a browser starting up and
someone (something?) moving the mouse:
https://vtbehaviour.commondatastorage.googleapis.com/5f827302ede74e1345fba5ba52c279129823da3c104baa821d654ebb8d7a67fb_Zenbox.html?GoogleAccessId=758681729565-rc7fgq07icj8c9dm2gi34a4cckv23...@developer.gserviceaccount.com=1706348766=KSTxSZJJUUv0FOA51Kwuot89ep4PKUDTY6tHL7kTyG7VwaMlF8VjmU90loeF4ytLBxKjkEtAk%2Ffr39xFrTTyOym3mehtc3HLyT9DS3C5qGa9OPVcu%2BfQfd8qr%2BRubBWb3SKNnhGpi%2Bn%2BTDhaiRx3PilEz%2BwVGiukfNUzWGBlGweG%2BmR1Y%2F0fIgDxJ3eyZ8KwTaocbywMoOLJeC1GSmoW8VYUAnFS2bb8P9Jt%2Bs%2F0axvAkc0M2pmSN3s2lpMq8u5P%2FZZ8yRIMdmv%2B1kUR5ajBdIa%2FHV8Vw8xAdNjZID6ozwAsmBOOizJmHgzr4zh1tX4V65qmcz8D3jctvDRKsuEqXA%3D%3D=text%2Fhtml;#overview

Lots of file activity. I think that all of it can be attributed to
either normal Acrobat Reader activity or normal Chrome activity.

Then we come to poweRlaw version 0.70.6:
Packaged: 2020-04-24 10:44:31 UTC
Date/Publication: 2020-04-25 07:30:12 UTC
SHA-256(inst/doc/d_jss_paper.pdf):
9486d99c1c1f2d1b06f0b6c5d27c54d4f6e39d69a91d7fad845f323b0ab88de9

The Web Archive capture version 20201205222617 for the address
https://cran.r-project.org/web/packages/poweRlaw/vignettes/d_jss_paper.pdf
has the same SHA-256 hash.

This file is being disputed because some antivirus applications flag it:
https://www.virustotal.com/gui/file/9486d99c1c1f2d1b06f0b6c5d27c54d4f6e39d69a91d7fad845f323b0ab88de9/behavior

The behaviour is exactly the same as the one from version 0.70.5:
browser opens with a link to a wrong DOI. Some links are followed.
https://vtbehaviour.commondatastorage.googleapis.com/9486d99c1c1f2d1b06f0b6c5d27c54d4f6e39d69a91d7fad845f323b0ab88de9_Zenbox.html?GoogleAccessId=758681729565-rc7fgq07icj8c9dm2gi34a4cckv23...@developer.gserviceaccount.com=1706347808=Kv1LXUGvDe988Br0pU1AMlttjYY1K9sDwouvZrlzAVSspkdOGS9Ow%2Bg%2F3VjnQLEshx08QqgOHZzQcghownumPDUJLBbEHbOk6KG9IZSH43rxkYhTIy%2BYT5PfNFIupevbJA5XrnJHrm1wKho2%2BDb4t8vA4cgOJJY0UahXTbIMKUeUmPCKAzx9W5kYKj55WhNDrIPrEuni9EeGWkFV45kPr%2BBwYfl2hK4%2BWv6K78CB7zJtzFltF6P3pewafn5Lg3M3AY5YcZ4TryXi01t0dq04Fha83fLRP7JUkmcfpAJauA48Ct0XN7RdCRPSogb0TAGwG%2BDstxNzLAphOEsVju9LUQ%3D%3D=text%2Fhtml;#dropped-info

I've uploaded a decompressed version (prepared using qpdf in.pdf
--stream-data=uncompress out.pdf) of the same file to VirusTotal, and
there are no detections. Zero detections, but the behaviour is the same:
some files are "dropped", but all of them relate to cache in Acrobat
Reader (which is nowadays a piece of Chrome) and Chrome itself:
https://www.virustotal.com/gui/file/5acbc41f103c88a801db36fa72f01d4fa81b9afa1879c36235b1f5373d46ee1a/behavior

Finally, there's poweRlaw version 0.80.0:
Packaged: 2024-01-25 10:39:42 UTC
Date/Publication: 2024-01-25 18:00:02 UTC
SHA-256(inst/doc/d_jss_paper.pdf):
17c252a38e6c9bcfab90a69070b17c5e9d8a1713b7bb376badaeba28b3a38739
Same zero flags, same behaviour of starting the browser, same "dropped"
files in the cache:
https://www.virustotal.com/gui/file/17c252a38e6c9bcfab90a69070b17c5e9d8a1713b7bb376badaeba28b3a38739/behavior
https://vtbehaviour.commondatastorage.googleapis.com/17c252a38e6c9bcfab90a69070b17c5e9d8a1713b7bb376badaeba28b3a38739_Zenbox.html?GoogleAccessId=758681729565-rc7fgq07icj8c9dm2gi34a4cckv23...@developer.gserviceaccount.com=1706348864=UjXMjCvz0uTjS1sqyr5y%2FOwluE%2BskW9F2XupXuOs5JgODlsL1BuwJcWJ56xddQNEtKDHDOaXoRfNxynsffmSaza4yJD9hvPJ6%2BrNMibbB8hojY53g07WKnCd3wdaOmOHEqIP7Md06QWD4CnLEN0KlRvWdsUUA%2F9YTB1bAVqkIR%2FtiaJcRrOTAmdG%2F9Hwrq4xpiEBaFZzO%2FsQPVj3dzNS1LQEXOHFAfnOTaC1LlbBfn9QQWCPib%2FpCOL7huVYqIFSm%2FO8VHWv67JD1qwcTOY7JSl8XPw1ueyumRpF5xF1rpWYCPjC1awU8tho25A2COA7f7LSkku0BRqkuHYW3kuZaw%3D%3D=text%2Fhtml;#dropped-info

I've also uploaded a PDF that came directly from a US agency (NOAA) and
got a similar report:

Re: [R-pkg-devel] How to deal with issues when using devtools::check_rhub(), rhub::check(), and web form

2024-01-24 Thread Ivan Krylov via R-package-devel
On Wed, 24 Jan 2024 16:14:05 -0800
Carl Schwarz  wrote:

> I tried using the web interface at https://builder.r-hub.io/ to
> select the denebian machines, and it returns a message saying
> 
> We're sorry, but something went wrong.
> If you are the application owner check the logs for more information.

> So how do I tell if this a "Rhub issue" or an issue with my package?

A problem with your package would look more like the check at least
starting and then producing errors. Here, it doesn't look like the
check is even starting.

> Or do I just give up on using Rhub to check the denebian machines?

For a while, Rhub used to offer the only on-demand checking service
specifically on Linux machines (there was Win-builder by Uwe Ligges and
macOS builder by Simon Urbanek, but no "Linux builder"), including
Debian [*]. Now that the funding ran out [**], you can try using
various continuous integration services to run your checks in a Linux
virtual machine. Many of them offer free compute minutes.

I think that you've already fulfilled the requirements of the CRAN
policy by fixing all known problems and having R CMD check --as-cran on
R-devel run for you by Win-Builder (which is what
devtools::check_win_devel() does).

-- 
Best regards,
Ivan

[*]
Named after Debra Lynn and Ian Murdock

[**]
https://github.com/RConsortium/r-repositories-wg/blob/main/minutes/2023-09-07_Minutes.md

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] New Package Removal because Shared Library Too Large from Debugging Symbols

2024-01-24 Thread Ivan Krylov via R-package-devel
On Mon, 22 Jan 2024 17:14:04 +0100
Tomas Kalibera  wrote:

> Yes, inside a bigger email, reports can get overlooked, particularly 
> when in a thread with a rather different subject. It wasn't
> overlooked this time thanks to Martin.

Then additional thanks goes to Martin, and I'll make sure to report in
the right place if a similar situation happens again.

-- 
Best regards,
Ivan

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] lost braces note on CRAN pretest related to \itemize

2024-01-23 Thread Ivan Krylov via R-package-devel
On Tue, 23 Jan 2024 19:39:54 +0100
Patrick Giraudoux  wrote:

>    \itemize{
>    \item{.}{lm and glm objects can be passed directly as the upper
> scope of term addition (all terms added).

Inside the \itemize and \enumerate commands, the \item command doesn't
take any arguments:
https://cran.r-project.org/doc/manuals/R-exts.html#Lists-and-tables

Instead, it starts a new paragraph with a number (\enumerate) or a
bullet point (\itemize). R CMD check is reminding you that \itemize{
\item{foo}{bar} } is equivalent to \itemize{ \item foo bar } without
any braces.

If you meant to highlight a word by making it an argument of the \item
command, use the \describe command. Here, you're highlighting a dot,
which would be rendered with a bullet point before it, so it's probably
neither semantically nor visually appropriate.
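If a label is actually wanted, the same entry under \describe would look like this (the label here is made up for illustration):

```
\describe{
  \item{upper scope}{lm and glm objects can be passed directly as the
    upper scope of term addition (all terms added).}
}
```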

> \value{
>    A \code{\link[sf]{sfc}} object, of POINT geometry, with the
> following columns:
>    \itemize{
>    \item{ID}{ ID number}

The same problem applies here.

Additionally, R CMD check is reminding you that \value{} is implicitly
a special case of a \describe{} environment:
https://cran.r-project.org/doc/manuals/R-exts.html#index-_005cvalue

Since you're already using \item{}{} labels to name the components of
the value, just drop the \itemize{} (but keep its contents). \value{} is
enough.
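
Putting both fixes together, the two fragments could look like this
(abbreviated to the text quoted above):

```
\itemize{
  \item lm and glm objects can be passed directly as the upper
    scope of term addition (all terms added).
}

\value{
  A \code{\link[sf]{sfc}} object, of POINT geometry, with the
  following columns:
  \item{ID}{ID number}
}
```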

-- 
Best regards,
Ivan



Re: [R-pkg-devel] Cannot see the failure output on Fedora clang/gcc falvor (page not found)

2024-01-22 Thread Ivan Krylov via R-package-devel
On Sun, 21 Jan 2024 16:51:39 +
Sameh Abdulah  wrote:

> However, we cannot access the webpage (page not found) to identify
> and address the failures on Fedora systems.
> 
> https://cran-archive.r-project.org/web/checks/2024/2024-01-12_check_results_MPCR.html
> 
> How can we see the failures on these systems?

I cannot help you with the exact output from the Fedora system (I think
it's lost), but here's how the package fails on mine:

* installing *source* package 'MPCR' ...
** using staged installation
Linux

/tmp/RtmpCSPOGc/Rbuild6043fb1a651/MPCR
/usr/bin/cmake
CMake is installed in: /usr/bin
-- The C compiler identification is GNU 12.2.0
-- The CXX compiler identification is GNU 12.2.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/gcc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/g++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- WORKING ON RELEASE MODE
MPCR Install Result : FALSE
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
OpenMp Found
R Include Path :  /home/ivan/R-build/include
Rcpp Lib Path :  /home/ivan/R-build/library/Rcpp
R Home Path :  /home/ivan/R-build
CMake Error at cmake/FindR.cmake:63 (find_library):
  Could not find R_LIB using the following names: libR.so
Call Stack (most recent call first):
  CMakeLists.txt:70 (FIND_PACKAGE)


-- Configuring incomplete, errors occurred!
See also 
"/tmp/RtmpCSPOGc/Rbuild6043fb1a651/MPCR/bin/CMakeFiles/CMakeOutput.log".
make: *** No rule to make target 'clean'.  Stop.
make: *** No rule to make target 'all'.  Stop.
cp: cannot stat '/tmp/RtmpCSPOGc/Rbuild6043fb1a651/MPCR/bin/src/libmpcr.so': No 
such file or directory
Failed: libmpcr.so -> src
** libs
make: Nothing to be done for 'all'.
** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded from temporary location
Error: package or namespace load failed for 'MPCR' in library.dynam(lib, 
package, package.lib):
 shared object 'MPCR.so' not found

It is not the default to build R as a shared library, and this
installation of R has been built without --enable-R-shlib. I'm sure
that with enough effort it's possible to propagate the information from
R to CMake so that it would make you a shared library in the correct
manner, but I think it's easier to separate your code into two parts:

 1. One part should contain most of your code, without the dependencies
on R. It can be built using CMake if that's what you prefer. It
will probably be more convenient to build it as a static library.

 2. The other part will be the R interface. Let the R build system
(described in WRE 1.2 [*] and below, especially 1.2.6) link the
final shared library from the small remaining part of the source
files (those that include R-related headers) and the static library
from the previous step. If you play your cards right, it will also
work on Windows without significant additional effort.
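
As a sketch of the two-part layout, src/Makevars could delegate the
CMake build of the static library and then let R link it into the final
shared object (the "core" directory and "mpcrcore" target names here
are made up):

```
# src/Makevars (sketch; directory and target names are hypothetical)
PKG_LIBS = core/build/libmpcrcore.a

$(SHLIB): core/build/libmpcrcore.a

core/build/libmpcrcore.a:
	mkdir -p core/build && cd core/build && \
	  cmake .. && $(MAKE) mpcrcore
```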

Have you considered linking your R package against the BLAS and LAPACK
that already come with R? This may not give the user the best possible
performance ever, but those who do care about performance have probably
installed a copy of BLAS of their own choice and may not prefer an
extra copy of OpenBLAS that may or may not match the optimal parameters
for their hardware. Same goes for libgfortran (that may be required
depending on what you're linking) [**].
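
If you go that route, R already provides make macros for its own
BLAS/LAPACK and the Fortran runtime (see WRE 1.2.1.1), so the linking
step reduces to one line in src/Makevars:

```
# Link against whatever BLAS/LAPACK R itself was configured with
PKG_LIBS = $(LAPACK_LIBS) $(BLAS_LIBS) $(FLIBS)
```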

This would also make it easier to comply with CRAN policy on external
libraries [***]: if you want to download software during package
installation, you may be required to host a fixed version of the
package on something extra reliable (like Zenodo) and verify a
cryptographic hash of the file you download before using it.

-- 
Best regards,
Ivan

[*]
https://cran.r-project.org/doc/manuals/R-exts.html#Configure-and-cleanup

[**]
https://cran.r-project.org/doc/manuals/R-exts.html#index-FLIBS

[***]
https://cran.r-project.org/web/packages/using_rust.html
https://cran.r-project.org/web/packages/external_libs.html



Re: [R-pkg-devel] New Package Removal because Shared Library Too Large from Debugging Symbols

2024-01-22 Thread Ivan Krylov via R-package-devel
On Mon, 22 Jan 2024 12:30:46 +0100
Tomas Kalibera  wrote:

> Thanks, ported now to R-patched.

Thank you!

Is it fine to mention problems like this one in the middle of an
e-mail, or should I have left a note in the Bugzilla instead?

-- 
Best regards,
Ivan



Re: [R-pkg-devel] Assistance Needed for Resolving Submission Issues with openaistream Package

2024-01-22 Thread Ivan Krylov via R-package-devel
Hello Li Gen and welcome to R-package-devel!

On Mon, 22 Jan 2024 17:50:33 +0800
 wrote:

> The specific areas of concern are:License Information: There's a note
> indicating that the license stub is an "invalid DCF". I've used 'MIT
> + file LICENSE' as the licensing terms. I would appreciate guidance
> on how to correctly format this section to meet the DCF standards.

Leave just the following lines in the LICENSE file, as it currently is
on CRAN [*]:

YEAR: 2023
COPYRIGHT HOLDER: openaistream authors

Why would you like to change it? CRAN doesn't want packages to provide
yet another copy of the MIT license inside the tarball. The text of the
MIT license is always available in an R install at
file.path(R.home('share'), 'licenses', 'MIT').

If you need a copy of the MIT license inside your GitHub repository,
store it elsewhere (e.g. LICENSE.md) and list it in .Rbuildignore [**].

Since you composed your e-mail in HTML and left your mailer to generate
a plain text equivalent, we only got the latter, somewhat mangled:
https://stat.ethz.ch/pipermail/r-package-devel/2024q1/010356.html

Please compose your messages to R mailing lists in plain text.

-- 
Best regards,
Ivan

[*]
https://cran.r-project.org/web/packages/openaistream/LICENSE

[**]
https://cran.r-project.org/doc/manuals/R-exts.html#Building-package-tarballs



Re: [R-pkg-devel] New Package Removal because Shared Library Too Large from Debugging Symbols

2024-01-21 Thread Ivan Krylov via R-package-devel
On Sat, 20 Jan 2024 20:28:00 -0500
Johann Gaebler  wrote:

> most likely there’s some error on my part in how I’ve set up cpp11,
> but it also seems possible that cpp11 should have detected that that
> header needs to be included and added it automatically

Upon further investigation, it's more complicated than a missing
#include.

cpp11::cpp_register() uses
tools::package_native_routine_registration_skeleton() to generate these
declarations. This function works by scanning the R code for calls to
.Call(), .C(), .Fortran(), and others and then trying to come up with
appropriate prototypes for the native functions being called. For
.Call()s, the function must output the correct type of SEXP for every
argument in the generated declaration.

This works the right way, for example, in R-4.2.2 (2022-11-10) and
today's R-devel, but was broken for a while (e.g. in R-4.3.1 and
R-4.3.2), and the fix, unfortunately, hasn't been backported (not to
R-patched either): https://bugs.r-project.org/show_bug.cgi?id=18585

I can suggest three workarounds.

1. Edit src/cpp11.cpp on a separate "for-CRAN" branch and rebase it on
   top of the main branch every time you update the package.

2. Install R-devel and use it to generate the source package. Strictly
   speaking, this would go against the letter of the CRAN policy
   (builds "should be done with current R-patched or the current
   release of R"), but would at least follow its spirit (use the
   version of R where the known package-building-related bug was fixed).

3. Add a configure script that would modify src/cpp11.cpp while the
   package is being installed. This way, the only thing modifying
   generated code would be more code, which is considered
   architecturally pure by some developers.

   Lots of ways to implement it, too: you can do it in a single shell
   script (using sed or patch -- are these tools guaranteed to be
   available?), delegate to tools/configure.R (that you would also
   write yourself), or go full GNU Autoconf and generate a
   megabyte-sized ./configure from some m4 macros just to replace one
   line.

   There is definitely a lot of performance art value if you go this
   way, but extra code means extra ways for it to go wrong. For more
   style points, make it a Makevars target instead of a configure
   script.

-- 
Best regards,
Ivan



Re: [R-pkg-devel] New Package Removal because Shared Library Too Large from Debugging Symbols

2024-01-20 Thread Ivan Krylov via R-package-devel
On Sat, 20 Jan 2024 14:38:55 -0500
Johann Gaebler  wrote:

> The issue is that the compiled libraries are too large.

Was it in the e-mail? As you quite correctly observed, many other
packages get the NOTE about shared library size.

It may not be exactly obvious, but the red link saying "LTO" on the
check page that points to
 is hiding a more
serious issue:

> cpp11.cpp:18:13: warning: 'run_testthat_tests' violates the C++ One 
> Definition Rule [-Wodr]
>18 | extern SEXP run_testthat_tests(void *);
>   | ^
> /data/gannet/ripley/R/test-dev/testthat/include/testthat/testthat.h:172:17: 
> note: 'run_testthat_tests' was previously declared here
>   172 | extern "C" SEXP run_testthat_tests(SEXP use_xml_sxp) {
>   | ^

Modern C++ compilers are painfully pedantic about undefined behaviour
and can optimise away large sections of code if they think they have a
proof that your code causes it [*]. If you edit cpp11.cpp to provide the
correct declaration (#include the testthat header if possible), the
error should go away.

-- 
Best regards,
Ivan

[*] For example, see this issue in R: 
https://bugs.r-project.org/show_bug.cgi?id=18430



Re: [R-pkg-devel] Inquiry Regarding Package Organization in CRAN

2024-01-19 Thread Ivan Krylov via R-package-devel
Hello Andriy and welcome to R-package-devel!

On Fri, 19 Jan 2024 14:34:25 +
Protsak Andriy via R-package-devel 
wrote:

> to achieve this the initial focus is on exploring the possibility of
> renaming the packages so that they share a common prefix, making it
> easier for uses to locate them in the package list.

CRAN package names are long-term identifiers. Assume that there are
many users happy with the packages as they are. If you rename a
package, they will have to patch their scripts and their own packages
just to keep them working as before. Red Queen's race is not something
people like to participate in.

It is certainly not impossible to rename a package, but there has to be
a very good reason to break backwards compatibility and assume a new
name, while the old name stays in the archive, unavailable for new
packages.

Here are some past responses to similar questions:

https://stat.ethz.ch/pipermail/r-package-devel/2022q2/008140.html
https://stat.ethz.ch/pipermail/r-package-devel/2017q2/001678.html
https://stat.ethz.ch/pipermail/r-package-devel/2015q3/000271.html

> If you believe there are alternative strategies to achieve a similar
> result, please feel free to share your perspective.

There are approximately 20,000 active packages on CRAN. Looking for
useful packages by scanning a list of names will not be very effective.
Better results can be achieved using tools like RSiteSearch
. If you want a package to be more
visible, request its addition to a Task View
. If some packages are related,
make them link to each other in their documentation. David's options
are all very good.

> Additionally, I'm looking into the prospect of merging two packages
> that contain similar functionalities. The aim is to create a more
> comprehensive package by incorporation additional features and
> ensuring seamless compatibility.

The previous point about keeping backwards compatibility still stands.
It should be possible to move all the functions to one package and then
import() it from the other package. Both packages can then export() all
functions, making them available to the dependencies of either package.
Eventually, the skeleton package may grow packageStartupMessage()s
letting the users know that it is deprecated and could they please use
the other package instead. After a while, it should be possible to
archive the skeleton package. But deprecation cycles should be long:
for example, rgeos and rgdal took more than a year to retire
.
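
As a sketch (the package names and the exported function are
hypothetical), the skeleton package would reduce to little more than:

```
# NAMESPACE of the deprecated skeleton package 'oldpkg':
#   importFrom(newpkg, foo)
#   export(foo)

# R/zzz.R:
.onAttach <- function(libname, pkgname) {
  packageStartupMessage(
    "'oldpkg' is deprecated; please use 'newpkg' directly."
  )
}
```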

Or do you intend to come up with a completely new API? Beware of the
second system effect (although it's certainly not unheard of for second
system projects to succeed).

The spatstat package went through the opposite process a few years ago:
it grew too big and had to be split into multiple packages. Here's one
of its maintainers sharing the experience:
https://stat.ethz.ch/pipermail/r-package-devel/2022q4/008557.html

What is the nature of your final year project? If it can include
technical writing, you could add well-written vignettes to the packages
(only one of the CRAN packages maintained by people @uah.es has a
vignette, and it's very terse). If it has to be mostly programming or
maintenance of R packages, I'm out of ideas.

Either way, good luck, and I hope your project succeeds!

-- 
Best regards,
Ivan



Re: [Rd] [External] Re: Choices to remove `srcref` (and its buddies) when serializing objects

2024-01-18 Thread Ivan Krylov via R-devel
On Thu, 18 Jan 2024 09:59:31 -0600 (CST)
luke-tier...@uiowa.edu wrote:

> What does 'blow up' mean? If it is anything other than signal a "bad
> binding access" error then it would be good to have more details.

My apologies for not being precise enough. I meant the "bad binding
access" error in all such cases.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Choices to remove `srcref` (and its buddies) when serializing objects

2024-01-18 Thread Ivan Krylov via R-devel
On Tue, 16 Jan 2024 14:16:19 -0500
Dipterix Wang  wrote:

> Could you recommend any packages/functions that compute hash such
> that the source references and sexpinfo_struct are ignored? Basically
> a version of `serialize` that convert R objects to raw without
> storing the ancillary source reference and sexpinfo.

I can show how this can be done, but it's not currently on CRAN or even
a well-defined package API. I have adapted a copy of R's serialize()
[*] with the following changes:

 * Function bytecode and flags are ignored:

f <- function() invisible()
depcache:::hash(f, 2) # This is plain FNV1a-64 of serialize() output
# [1] "9b7a1af5468deba4"
.Call(depcache:::C_hash2, f) # This is the new hash
# [1] 91 5f b8 a1 b0 6b cb 40
f() # called once: function gets the MAYBEJIT_MASK flag
depcache:::hash(f, 2)
# [1] "7d30e05546e7a230"
.Call(depcache:::C_hash2, f)
# [1] 91 5f b8 a1 b0 6b cb 40
f() # called twice: function now has bytecode
depcache:::hash(f, 2)
# [1] "2a2cba4150e722b8"
.Call(depcache:::C_hash2, f)
# [1] 91 5f b8 a1 b0 6b cb 40 # new hash stays the same

 * Source references are ignored:

.Call(depcache:::C_hash2, \( ) invisible( ))
# [1] 91 5f b8 a1 b0 6b cb 40 # compare vs. above

# For quoted function definitions, source references have to be handled
# differently 
.Call(depcache:::C_hash2, quote(function(){}))
# [1] 58 0d 44 8e d4 fd 37 6f
.Call(depcache:::C_hash2, quote(\( ){  }))
# [1] 58 0d 44 8e d4 fd 37 6f

 * ALTREP is ignored:

identical(1:10, 1:10+0L)
# [1] TRUE
identical(serialize(1:10, NULL), serialize(1:10+0L, NULL))
# [1] FALSE
identical(
 .Call(depcache:::C_hash2, 1:10),
 .Call(depcache:::C_hash2, 1:10+0L)
)
# [1] TRUE

 * Strings not marked as bytes are encoded into UTF-8:

identical('\uff', iconv('\uff', 'UTF-8', 'latin1'))
# [1] TRUE
identical(
 serialize('\uff', NULL),
 serialize(iconv('\uff', 'UTF-8', 'latin1'), NULL)
)
# [1] FALSE
identical(
 .Call(depcache:::C_hash2, '\uff'),
 .Call(depcache:::C_hash2, iconv('\uff', 'UTF-8', 'latin1'))
)
# [1] TRUE

 * NaNs with different payloads (except NA_numeric_) are replaced by
   R_NaN.

One of the many downsides to the current approach is that we rely on
the non-API entry point getPRIMNAME() in order to hash builtins.
Looking at the source code for identical() is no help here, because it
uses the private PRIMOFFSET macro.

The bitstream being hashed is also, unfortunately, not exactly
compatible with R serialization format version 2: I had to ignore the
LEVELS of the language objects being hashed both because identical()
seems to ignore those and because I was missing multiple private
definitions (e.g. the MAYBEJIT flag) to handle them properly.

Then there's also the problem of immediate bindings [**]: I've seen bits
of vctrs, rstudio, rlang blow up when calling CAR() on SEXP objects that
are not safe to handle this way, but R_expand_binding_value() (used by
serialize()) is again a private function that is not accessible from
packages. identical() won't help here, because it compares reference
objects (which may or may not contain such immediate bindings) by their
pointer values instead of digging down into them.

Dropping the (already violated) requirement to be compatible with R
serialization bitstream will make it possible to simplify the code
further.

Finally:

a <- new.env()
b <- new.env()
a$x <- b$x <- 42
identical(a, b)
# [1] FALSE
.Call(depcache:::C_hash2, a)
# [1] 44 21 f1 36 5d 92 03 1b
.Call(depcache:::C_hash2, b)
# [1] 44 21 f1 36 5d 92 03 1b

...but that's unavoidable when looking at frozen object contents
instead of their live memory layout.

If you're interested, here's the development version of the package:
install.packages('depcache',contriburl='https://aitap.github.io/Rpackages')

-- 
Best regards,
Ivan

[*]
https://github.com/aitap/depcache/blob/serialize_canonical/src/serialize.c

[**]
https://svn.r-project.org/R/trunk/doc/notes/immbnd.md


