> From: Tomas Kalibera <tomas.kalib...@gmail.com> > > Thanks for the report, I am working on a patch that will address this. > > I confirm there is a lot of potential for speedup. On my system, > > 'N=200000; x <- substring(paste(rep("A", N), collapse=""), 1:N, 1:N)' > > spends 96% time in checking if the string is ascii and 3% in strlen(); > if we take advantage of the pre-computed value in the ASCII bit, the > speed up is about 40x.
The latest version of pqR (at pqR-project.org) has changes that considerably speed up both this and other string operations. Here's a test (both compiled with gcc 8.2.0 with -O3 on a Skylake processor). R-3.5.2: > N=200000; system.time(for (i in 1:100) r<-paste(rep("A",N),collapse="")) user system elapsed 1.548 0.023 1.572 > system.time(for (i in 1:10) x<-substring(r,1:N,1:N)) user system elapsed 4.462 0.016 4.478 pqR-2019-02-19: > N=200000; system.time(for (i in 1:100) r<-paste(rep("A",N),collapse="")) user system elapsed 0.318 0.071 0.388 > system.time(for (i in 1:10) x<-substring(r,1:N,1:N)) user system elapsed 0.041 0.000 0.041 Some of this may be due to pqR's faster garbage collector - R Core implementatons have a particular GC problem with strings, as explained at https://radfordneal.wordpress.com/2018/11/29/faster-garbage-collection-in-pqr/ But there are also some specific improvements to string operations that you might want to have a look at. Radford Neal ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel