Re: [Rd] [FORGED] grDevices::convertColor and colorRamp(space='Lab') Performance Improvements

2018-10-03 Thread Paul Murrell



Thanks Brodie.  Having a look at all of this is on my list.

Paul

On 03/10/18 14:13, Brodie Gaslam via R-devel wrote:
`grDevices::convertColor` performance can be improved by 30-300x with 
small changes to the code.  `colorRamp(space='Lab')` uses `convertColor` 
so it too benefits from substantial performance gains.
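
For readers unfamiliar with the two functions, here is a minimal usage sketch. The colors chosen are arbitrary assumptions for illustration and do not come from the benchmarks in this message.

```r
library(grDevices)

# Interpolate between two colors in Lab space; colorRamp() returns a function
# that maps values in [0, 1] to rows of RGB values.
ramp <- colorRamp(c("red", "blue"), space = "Lab")
rgb_rows <- ramp(c(0, 0.5, 1))      # one row per requested position

# Direct conversion of a small sRGB matrix (one color per row) to Lab.
srgb <- matrix(c(1, 0, 0,
                 0, 0, 1), ncol = 3, byrow = TRUE)
lab <- convertColor(srgb, from = "sRGB", to = "Lab")
dim(lab)
```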


`convertColor` currently vectorizes explicitly over the rows of the input 
color matrix using `apply`. The level-1 patch [1] illustrates a possible 
minimal set of changes that vectorizes instead with plain R matrix and 
vector operations.  The changes consist primarily of switching `if/else` 
to `ifelse`, `c` to `cbind`, `sum` to `rowSums`, etc. This results in 
speedups of 30-100x as shown in table 1:


            to
from         Apple RGB  sRGB CIE RGB   XYZ  Lab  Luv
  Apple RGB         NA  38.3    55.8  30.3 60.2 56.3
  sRGB            38.7    NA    55.7  36.5 62.9 52.7
  CIE RGB         45.2  44.4      NA  30.6 51.5 43.1
  XYZ             73.4  57.5    69.1    NA 92.2 69.0
  Lab             46.6  56.6    65.4  72.0   NA 61.3
  Luv             73.2 107.3    67.3 105.8 97.8   NA

## Table 1:
## Ratios of `grDevices` to 'level-1' patch speeds for 8000 colors
## from each supported color space to all other supported color spaces.
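
The kind of rewrite described for the level-1 patch can be sketched on a toy transform. The functions `f_row` and `f_mat` below are hypothetical and are not taken from `grDevices` or from the patch; they only show the `if/else` to `ifelse`, `sum` to `rowSums`, and `c` to `cbind` substitutions side by side.

```r
# Toy per-row transform (hypothetical), written in the scalar style that
# apply(m, 1, f_row) requires.
f_row <- function(x) {
  s <- sum(x)
  y <- if (s > 1) x / s else x       # scalar branch
  c(y[1], y[2], s)                   # assemble one output row
}

# The same transform applied to all rows at once.
f_mat <- function(m) {
  s <- rowSums(m)                    # sum     -> rowSums
  scale <- ifelse(s > 1, s, 1)       # if/else -> ifelse
  y <- m / scale                     # recycling divides row i by scale[i]
  unname(cbind(y[, 1], y[, 2], s))   # c       -> cbind
}

set.seed(1)
m <- matrix(runif(8000 * 3), ncol = 3)

slow <- t(apply(m, 1, f_row))        # apply() returns one column per input row
fast <- f_mat(m)
all.equal(slow, fast, check.attributes = FALSE)
```

Wrapping each call in `system.time()` gives a rough feel for the difference; the ratios in the table come from the actual patch, not from this toy.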

A few incremental changes as detailed in the level-2 patch [2] yield 
additional performance improvements:


            to
from         Apple RGB  sRGB CIE RGB   XYZ Lab   Luv
  Apple RGB         NA  97.1   106.2  89.0 117  83.4
  sRGB            92.5    NA    99.4  86.4 120  76.0
  CIE RGB        119.2 184.2      NA  82.2 135  83.4
  XYZ            122.3 209.8   140.9    NA 171 148.8
  Lab            166.4 168.2   255.4 288.5  NA 265.1
  Luv            141.7 173.6   279.6 310.1 195    NA

## Table 2:
## Ratios of `grDevices` to level-2 patch speeds for 8000 colors
## from each supported color space to all other supported color spaces.

Not shown here is that the patched versions of `convertColor` are faster 
with small inputs as well, though the difference is less marked.


I have posted tests on github [3], along with the results of running 
them on my system [4].  While these tests are not proof that the patches 
do not change the function other than to make it faster, the test 
results combined with the narrow scope of the changes hopefully provide 
sufficient evidence that this is the case.  For those wanting to run the 
tests, installation and testing instructions are on the github landing 
page for this project [5].


There are some minor (in my opinion) changes in behavior that need to be 
discussed:


* Inputs that would previously stop with errors or work inconsistently 
now work consistently (e.g. zero-row inputs or inputs containing 
NA/NaN/Inf).
* Column names are consistently set to the color space initials; these 
were previously inconsistently set / mangled by `c`.


These are discussed at length with examples in [6].

It would be possible to preserve the existing behavior, but doing so 
would require some contortions that serve no other purposes than that. 
Additionally, in many cases the existing behavior is inconsistent across 
different input/output color spaces.  Finally, most of the differences 
involve corner case inputs such as those containing NA/NaN/Inf, or zero 
rows.


I am entirely amenable to modifying the patches to preserve existing 
behavior in these cases if that is considered desirable.


These patches should be coordinated with #17473 [7], which I found while 
working on this issue.


---

'level-1' patch:
[1]: 
https://raw.githubusercontent.com/brodieG/grDevices2/level-2/extra/level-1.txt 



'level-2' patch:
[2]: 
https://raw.githubusercontent.com/brodieG/grDevices2/level-2/extra/level-2.txt 



Tests on github, and the result of running them:
[3]: 
https://github.com/brodieG/grDevices2/blob/level-2/tests/convertColor.R
[4]: 
https://raw.githubusercontent.com/brodieG/grDevices2/level-2/tests/convertColor.Rout 



Github landing page for this project:
[5]: https://github.com/brodieG/grDevices2

Discussion of differences introduced by the patches:
[6]: 
https://htmlpreview.github.io/?https://raw.githubusercontent.com/brodieG/grDevices2/level-2/extra/differences.html 



Indirectly related bugzilla issue #17473:
[7]: https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17473

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


--
Dr Paul Murrell
Department of Statistics
The University of Auckland
Private Bag 92019
Auckland
New Zealand
64 9 3737599 x85392
p...@stat.auckland.ac.nz
http://www.stat.auckland.ac.nz/~paul/



[R-pkg-devel] Fwd: unable to load shared object

2018-10-03 Thread Witold E Wolski
Hello,

I am trying to install a package with some src files on Windows (the Linux
build works smoothly).

The sources seem to build (against MSYS2 libraries) and there is a *.dll in
the src folder, but then the installation process fails with:

```
** testing if installed package can be loaded
Error: package or namespace load failed for 'grpc' in inDL(x,
as.logical(local), as.logical(now), ...):
 unable to load shared object
'C:/Users/wewol/OneDrive/Documents/R/win-library/3.5/grpc/libs/x64/grpc.dll':
  LoadLibrary failure:  The specified module could not be found.

Error: loading failed
Execution halted
```

Help would be greatly appreciated.
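
One check that may be worth making, offered as an assumption rather than a diagnosis: on Windows, "The specified module could not be found" for a DLL that does exist often means that a DLL it depends on cannot be located through PATH. The helper `dir_on_path` below is hypothetical, and the MSYS2 `bin` directory is a guess based on the `C:/msys64/mingw64` prefix used in the related Makevars.Win thread.

```r
# Hypothetical helper: is `dir` one of the entries of the PATH variable?
dir_on_path <- function(dir, path = Sys.getenv("PATH")) {
  entries <- strsplit(path, .Platform$path.sep, fixed = TRUE)[[1]]
  norm <- function(p) normalizePath(p, winslash = "/", mustWork = FALSE)
  norm(dir) %in% norm(entries)
}

# Assumed location of the MSYS2 runtime DLLs; adjust to the local install.
msys_bin <- "C:/msys64/mingw64/bin"
dir_on_path(msys_bin)
```

If this returns `FALSE`, adding that directory to PATH before installing and loading the package is a cheap experiment.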

The package sources are here:
https://github.com/wolski/grpc-1

A more detailed discussion with the package author can be found here:
https://github.com/nfultz/grpc/issues/21

Witek

PS. I posted this on r-devel, but since no one has replied I am trying here.
-- 
Witold Eryk Wolski

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [Bioc-devel] a pattern to be avoided? mcols(x)$y <- z

2018-10-03 Thread Hervé Pagès

Hi Vince,

This issue was reported here a couple of weeks ago:

  https://github.com/Bioconductor/GenomicRanges/issues/11

Internally $<- uses something like:

  do.call(DataFrame, list(DF1, DF2))

to combine the metadata columns. However in some situations
the do.call(DataFrame, list(...)) form is **very** inefficient
compared to the more direct DataFrame(...) form:

  library(S4Vectors)
  DF1 <- DataFrame(a=Rle(11:1999, 1011:2999), b=5)
  DF2 <- DataFrame(c=Rle(12:2000, 1011:2999))
  system.time(DF12 <- do.call(DataFrame, list(DF1, DF2)))
  #   user  system elapsed
  #  4.476   0.000   4.476
  system.time(DF12b <- DataFrame(DF1, DF2))
  #   user  system elapsed
  #  0.002   0.000   0.001
  identical(DF12, DF12b)
  # [1] TRUE

@Michael: Any idea what's going on?
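
One general difference between the two call forms, shown here in base R only. Whether it explains the timing above is not established in this thread; it would matter only if the callee inspects its unevaluated arguments (for example to derive names), which is an assumption. The function `label_of` is hypothetical.

```r
# A function that looks at its unevaluated argument, as name-deriving
# constructors commonly do.
label_of <- function(x) deparse(substitute(x))

big <- c(5, 3, 8)

label_of(big)                        # direct call: the argument is the symbol `big`
do.call(label_of, list(big))         # do.call: the *value* is inlined into the call
do.call(label_of, list(quote(big)))  # quoting restores the symbol
```

With a direct call the deparsed argument is the short string "big"; with `do.call` on an evaluated list the whole object is deparsed, which grows with the size of the object.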

Thanks,
H.


On 10/03/2018 07:01 AM, Vincent Carey wrote:

The following comes up in use of Fdb.InfiniumMethylation.hg19::getPlatform


debug: mcols(GR)$channel <- Rle(as.factor(mcols(GR)$channel450))

Browse[3]> system.time(uu <- Rle(as.factor(mcols(GR)$channel450)))
   user  system elapsed
  0.020   0.003   0.022

Browse[3]> system.time(mcols(GR)$channel <- Rle(as.factor(mcols(GR)$channel450)))
   user  system elapsed
 47.263   0.067  47.373

Browse[3]> GR$channel[1]
factor-Rle of length 1 with 1 run
  Lengths: 1
  Values : Both
Levels(3): Both Grn Red

Browse[3]> system.time(GR$channel <- Rle(as.factor(mcols(GR)$channel450)))
   user  system elapsed
  0.058   0.006   0.065


Presumably the mcols()$<- copies/rewrites a lot of data needlessly?



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


[Bioc-devel] a pattern to be avoided? mcols(x)$y <- z

2018-10-03 Thread Vincent Carey
The following comes up in use of Fdb.InfiniumMethylation.hg19::getPlatform


debug: mcols(GR)$channel <- Rle(as.factor(mcols(GR)$channel450))

Browse[3]> system.time(uu <- Rle(as.factor(mcols(GR)$channel450)))
   user  system elapsed
  0.020   0.003   0.022

Browse[3]> system.time(mcols(GR)$channel <- Rle(as.factor(mcols(GR)$channel450)))
   user  system elapsed
 47.263   0.067  47.373

Browse[3]> GR$channel[1]
factor-Rle of length 1 with 1 run
  Lengths: 1
  Values : Both
Levels(3): Both Grn Red

Browse[3]> system.time(GR$channel <- Rle(as.factor(mcols(GR)$channel450)))
   user  system elapsed
  0.058   0.006   0.065


Presumably the mcols()$<- copies/rewrites a lot of data needlessly?



Re: [Rd] maximum matrix size

2018-10-03 Thread Therneau, Terry M., Ph.D. via R-devel
That is indeed helpful; reading the sections around it largely answered my
questions.  Rinternals.h has the definitions

#define allocMatrix Rf_allocMatrix
SEXP Rf_allocMatrix(SEXPTYPE, int, int);
#define allocVector        Rf_allocVector
SEXP Rf_allocVector(SEXPTYPE, R_xlen_t);

This answers the further question of what to expect inside C routines invoked
by .Call.

It looks like the internal C routines for coxph work on large matrices by pure
serendipity (nrow and ncol each less than 2^31 but with the product > 2^31),
but residuals.coxph fails with an allocation error on the same data.  A slight
change and it could just as easily have led to a hard crash.  Sigh...  I'll
need to do a complete code review.  I've been converting .C routines to .Call
as convenient; this will force conversion of many of the rest as a side effect
(20 done, 23 to go).  As a statistician my overall response is "haven't they
ever heard of sampling?"  But as I said earlier, it isn't just one user.
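
The overflow discussed in this thread can be reproduced with small numbers, and a guard of the kind described in the quoted message below could look like the following sketch. The helper name `check_c_length` and its message are hypothetical; the values of `n` and `nvar` are chosen only so that each fits in an integer while their product does not.

```r
n    <- 50000L
nvar <- 50000L

# Integer multiplication past .Machine$integer.max yields NA (with a warning).
len_int <- suppressWarnings(n * nvar)
is.na(len_int)

# Promoting to double gives the true product, but the result is then longer
# than anything an int-indexed C routine can address.
len_dbl <- as.double(n) * nvar
len_dbl > .Machine$integer.max

# Hypothetical guard to place in front of a call that allocates n * nvar
# elements for a routine using int indices.
check_c_length <- function(n, nvar) {
  len <- as.double(n) * as.double(nvar)
  if (len > .Machine$integer.max)
    stop("n * nvar = ", format(len, scientific = FALSE),
         " exceeds .Machine$integer.max")
  invisible(len)
}

check_c_length(10L, 3L)              # passes silently
```

This is why replacing `double(n*nvar)` with `double(as.double(n)*nvar)` alone only moves the failure: the length is computed correctly, but the C side still indexes with `int`.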

Terry T.

On 10/02/2018 12:22 PM, Peter Langfelder wrote:
> Does this help a little?
>
> https://cran.r-project.org/doc/manuals/r-release/R-ints.html#Long-vectors
>
> One thing I seem to remember but cannot find a reference for is that
> long vectors can only be passed to .Call calls, not C/Fortran. I
> remember rewriting .C() in my WGCNA package to .Call for this very
> reason but perhaps the restriction has been removed.
>
> Peter
> On Tue, Oct 2, 2018 at 9:43 AM Therneau, Terry M., Ph.D. via R-devel
>  wrote:
>> I am now getting the occasional complaint about survival routines that are
>> not able to handle big data.  I looked in the manuals to try and update my
>> understanding of max vector size, max matrix, max data set, etc; but it is
>> either not there or I missed it (the latter more likely).  Is it still
>> .Machine$integer.max for everything?  Will that change?  Found where?
>>
>> I am going to need to go through the survival package and put specific
>> checks in front of some or all of my .Call() statements, in order to give a
>> sensible message whenever a boundary is struck.  A well meaning person just
>> posted a suggested "bug fix" to the github source of one routine where my
>> .C call allocates a scratch vector, suggesting
>> "resid = double( as.double(n) *nvar)" to prevent a "NA produced by integer
>> overflow" message, in the code below.  A fix is obviously not quite that
>> easy :-)
>>
>>   resid <- .C(Ccoxscore, as.integer(n),
>>   as.integer(nvar),
>>   as.double(y),
>>   x=as.double(x),
>>   as.integer(newstrat),
>>   as.double(score),
>>   as.double(weights[ord]),
>>   as.integer(method=='efron'),
>>   resid= double(n*nvar),
>>   double(2*nvar))$resid
>>
>> Terry T.




[Rd] unable to load shared object

2018-10-03 Thread Witold E Wolski
Hello,

I am trying to install a package with some src files on Windows (the Linux
install works fine). The sources seem to build and there is a *.dll in
the src folder, but then the installation process fails with:

```
** testing if installed package can be loaded
Error: package or namespace load failed for 'grpc' in inDL(x,
as.logical(local), as.logical(now), ...):
 unable to load shared object
'C:/Users/wewol/OneDrive/Documents/R/win-library/3.5/grpc/libs/x64/grpc.dll':
  LoadLibrary failure:  The specified module could not be found.

Error: loading failed
Execution halted
```

Do I need to point the installer to the *.dll in the src directory by
calling some special function (e.g. dyn.load) or creating some
special file?

Help would be greatly appreciated.

The package sources are here:
https://github.com/wolski/grpc-1

Witek


-- 
Witold Eryk Wolski



Re: [Rd] How do I set a compile flag _WIN32_WINNT=0x600 in Makevars.Win

2018-10-03 Thread Witold E Wolski
Hi,

Solved; some searching on preprocessor flags helped.

PKG_CPPFLAGS=-D_WIN32_WINNT=0x600
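
Put together with the Makevars.Win from the quoted message below, the file might look like this. This is a sketch that assumes the original include and library paths are still needed; the message does not say whether they were kept.

```
PKG_CPPFLAGS=-IC:/msys64/mingw64/include -D_WIN32_WINNT=0x600
PKG_LIBS=-LC:/msys64/mingw64/lib -lgrpc
```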

Very sorry for bothering you.


Thank you
On Tue, 2 Oct 2018 at 23:01, Witold E Wolski  wrote:
>
> Sorry for bothering you
>
> I am trying to build the R grpc package on windows:
> https://github.com/nfultz/grpc
>
> against an MSYS2 build of grpc.
>
> when running devtools::install() I am getting the following error:
>
> C:/msys64/mingw64/include/grpc/impl/codegen/port_platform.h:47:2:
> error: #error "Please compile grpc with _WIN32_WINNT of at least 0x600
> (aka Windows Vista)"
>  #error \
>   ^
>
> Which, if I am correct, asks me to set _WIN32_WINNT=0x600
>
> My Makevars.Win looks as follows:
>
> PKG_CPPFLAGS=-IC:/msys64/mingw64/include
> PKG_LIBS=-LC:/msys64/mingw64/lib -lgrpc
>
> What should I do?
>
> Have a great day
> best regards
> Witek
>
>
>
> --
> Witold Eryk Wolski



-- 
Witold Eryk Wolski
