Re: [Rd] [External] svd For Large Matrix

2021-08-13 Thread Dirk Eddelbuettel


Dario,

On 14 August 2021 at 00:00, Dario Strbenac via R-devel wrote:
| Good day,
| 
| Ah, I was confident it wouldn't be environment-specific but it is. My environment is
| 
| R version 4.1.0 (2021-05-18)
| Platform: x86_64-pc-linux-gnu (64-bit)
| Running under: Debian GNU/Linux 10 (buster)
| 
| Matrix products: default
| BLAS:   /usr/lib/x86_64-linux-gnu/atlas/libblas.so.3.10.3
| LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3
| 
| It crashes at about 180 GB RAM usage. The server has 1024 GB physical RAM in
| it. Modestly downsampling to 30 million cells avoids the segmentation fault.
| The segmentation fault originates from BLAS
| 
| Program received signal SIGSEGV, Segmentation fault.
| 0x77649c10 in ATL_dgecopy () from /usr/lib/x86_64-linux-gnu/libblas.so.3

That allows you to do what was suggested: try a different BLAS. On Debian
and its derivatives you can just install Atlas (as you have), or OpenBLAS, or
the reference BLAS, or now also BLIS, or ... (There is also my script and post
from a few years ago on using MKL, though depending on which release you use
MKL may actually be directly accessible.)

In short, we see a bug when using Atlas. I would at least try OpenBLAS.
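
After installing and selecting another BLAS, a quick check from R on a small
matrix can confirm that svd() still returns a sensible decomposition before
the full 45 million row run is attempted. This is a minimal sketch with
made-up sizes and variable names; it does not exercise the long-vector code
path where the crash was reported.

```r
set.seed(42)
# Small illustrative matrix (2000 rows is an arbitrary choice, not from the thread)
x <- matrix(rnorm(2000 * 60), ncol = 60)

s <- svd(x)

# Rebuild x from its factors: x = u %*% diag(d) %*% t(v)
recon <- s$u %*% diag(s$d) %*% t(s$v)

# Largest absolute reconstruction error; expected to be near machine precision
err <- max(abs(x - recon))
err

# Stops with an error if the reconstruction is not numerically equal to x
stopifnot(isTRUE(all.equal(x, recon)))
```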

Here is what I see in a Docker container using testing/unstable -- you will
see a shorter list but you *will* have the three different openblas versions
at a minimum.

root@somedocker:~# apt-cache search libblas | grep -- -dev
libatlas-base-dev - Automatically Tuned Linear Algebra Software, generic static
libblis-openmp-dev - BLAS-like Library Instantiation Software Framework (dev,32bit,openmp)
libblis-pthread-dev - BLAS-like Library Instantiation Software Framework (dev,32bit,pthread)
libblis-serial-dev - BLAS-like Library Instantiation Software Framework (dev,32bit,serial)
libblis64-openmp-dev - BLAS-like Library Instantiation Software Framework (dev,64bit,openmp)
libblis64-pthread-dev - BLAS-like Library Instantiation Software Framework (dev,64bit,pthread)
libblis64-serial-dev - BLAS-like Library Instantiation Software Framework (dev,64bit,serial)
libblas-dev - Basic Linear Algebra Subroutines 3, static library
libblas64-dev - Basic Linear Algebra Subroutines 3, static library (64bit-index)
libopenblas-openmp-dev - Optimized BLAS (linear algebra) library (dev, openmp)
libopenblas-pthread-dev - Optimized BLAS (linear algebra) library (dev, pthread)
libopenblas-serial-dev - Optimized BLAS (linear algebra) library (dev, serial)
libopenblas64-openmp-dev - Optimized BLAS (linear algebra) library (dev, 64bit, openmp)
libopenblas64-pthread-dev - Optimized BLAS (linear algebra) library (dev, 64bit, pthread)
libopenblas64-serial-dev - Optimized BLAS (linear algebra) library (dev, 64bit, serial)
libblasr-dev - tools for aligning PacBio reads to target sequences (development files)
root@somedocker:~# 

Dirk

-- 
https://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [External] svd For Large Matrix

2021-08-13 Thread Dario Strbenac via R-devel
Good day,

Ah, I was confident it wouldn't be environment-specific but it is. My 
environment is

R version 4.1.0 (2021-05-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 10 (buster)

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/atlas/libblas.so.3.10.3
LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3

It crashes at about 180 GB RAM usage. The server has 1024 GB physical RAM in 
it. Modestly downsampling to 30 million cells avoids the segmentation fault. 
The segmentation fault originates from BLAS

Program received signal SIGSEGV, Segmentation fault.
0x77649c10 in ATL_dgecopy () from /usr/lib/x86_64-linux-gnu/libblas.so.3
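
A back-of-the-envelope check, offered as an assumption rather than a
diagnosis: a 45 million by 60 matrix has more elements than a 32-bit signed
integer can index, while the 30 million row version does not. That would be
consistent with an index overflow inside the BLAS, although nothing in the
gdb output above confirms it. The sizes are the ones quoted in this thread;
the variable names are made up for the sketch.

```r
n_cols  <- 60
int_max <- .Machine$integer.max   # largest 32-bit signed integer in R

n_full <- 45e6 * n_cols           # elements in the full matrix
n_down <- 30e6 * n_cols           # elements after downsampling

n_full > int_max                  # TRUE: the data is a long vector
n_down > int_max                  # FALSE: still fits 32-bit indexing

# Largest row count for which rows * 60 stays within 32-bit indexing
max_rows <- floor(int_max / n_cols)
max_rows

# Size of one double-precision copy of the full matrix, in GiB
gib_full <- n_full * 8 / 2^30
gib_full
```

One double-precision copy of the full matrix comes to roughly 20 GiB, far
below the 1024 GB on the server.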

--
Dario Strbenac
University of Sydney
Camperdown NSW 2050
Australia


Re: [Rd] [External] svd For Large Matrix

2021-08-13 Thread Prof Brian Ripley

On 13/08/2021 15:58, luke-tier...@uiowa.edu wrote:

| [copying the list]
| 
| svd() does support matrices with long vector data. Your example works
| fine for me on a machine with enough memory with either the reference
| BLAS/LAPACK or the BLAS/LAPACK used on Fedora 33 (flexiblas backed, I
| believe, by a version of openBLAS). Take a look at sessionInfo() to
| see what you are using and consider switching to another BLAS/LAPACK
| if necessary. Running under gdb may help tracking down where the issue
| is and reporting it for the BLAS/LAPACK you are using.


See also 
https://cran.r-project.org/doc/manuals/r-devel/R-ints.html#Large-matrices which 
(to nuance Prof Tierney's comment) mentions that svd on long-vector 
*complex* data has been known to segfault (with the reference BLAS/Lapack).


My guess was that this was an out-of-memory condition not handled 
elegantly by the OS.  (There are many reasons why the posting guide asks 
for the output of sessionInfo().)


We do not have the statistical context, but it seems unlikely that anyone 
is interested in each of the 45m samples, and for information on the 
proteins quite a small sample of cells would suffice.  It also seems 
unlikely that all the left singular vectors (45m rows) are required (most 
likely none are, in which case the underlying Lapack routine can use a more 
efficient calculation).
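
Both suggestions can be tried on a small stand-in matrix. This is a minimal
sketch: the sizes and the subsample of 1000 rows are arbitrary choices for
illustration, not values from the thread, and it only shows how to request
the reduced output through the nu and nv arguments of svd(), not how much
work LAPACK actually saves.

```r
set.seed(1)
# Small stand-in for the 45 million x 60 matrix (sizes are illustrative only)
x <- matrix(rnorm(10000 * 60), ncol = 60)

# Default call: u has one row per cell, the part that grows with the data
full <- svd(x)

# nu = 0 leaves the left singular vectors out of the result
no_u <- svd(x, nu = 0)

# Singular values only: no singular vectors requested at all
d_only <- svd(x, nu = 0, nv = 0)$d

# A random subsample of cells still yields a 60 x 60 matrix of right
# singular vectors describing the proteins
idx <- sample(nrow(x), 1000)
sub <- svd(x[idx, ], nu = 0)

dim(full$u)       # 10000 60
is.null(no_u$u)   # TRUE
dim(sub$v)        # 60 60
```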




| Best,
| 
| luke
| 
| On Fri, 13 Aug 2021, Dario Strbenac via R-devel wrote:
| 
| | Good day,
| | 
| | I have a real scenario involving 45 million biological cells (samples)
| | and 60 proteins (variables) which leads to a segmentation fault for
| | svd. I thought this might be a good example of why it might benefit
| | from a long vector upgrade.
| | 
| | test <- matrix(rnorm(45000000*60), ncol = 60)
| | testSVD <- svd(test)
| | 
| | *** caught segfault ***
| | address 0x7fe93514d618, cause 'memory not mapped'
| | 
| | Traceback:
| | 1: La.svd(x, nu, nv)
| | 2: svd(test)




--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Emeritus Professor of Applied Statistics, University of Oxford



Re: [Rd] [External] svd For Large Matrix

2021-08-13 Thread luke-tierney

[copying the list]

svd() does support matrices with long vector data. Your example works
fine for me on a machine with enough memory with either the reference
BLAS/LAPACK or the BLAS/LAPACK used on Fedora 33 (flexiblas backed, I
believe, by a version of openBLAS). Take a look at sessionInfo() to
see what you are using and consider switching to another BLAS/LAPACK
if necessary. Running under gdb may help tracking down where the issue
is and reporting it for the BLAS/LAPACK you are using.
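
The BLAS and LAPACK in use can also be read off individually from within R.
A minimal sketch, assuming R 3.4.0 or later (where, as far as I know, these
fields and functions are available); the paths printed will differ by system
and may be empty if R cannot determine them.

```r
info <- sessionInfo()

info$BLAS       # path of the BLAS library R is using
info$LAPACK     # path of the LAPACK library R is using

La_version()    # version string of the LAPACK in use
La_library()    # path of the LAPACK shared library, if R can determine it
```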

Best,

luke

On Fri, 13 Aug 2021, Dario Strbenac via R-devel wrote:


| Good day,
| 
| I have a real scenario involving 45 million biological cells (samples) and 60
| proteins (variables) which leads to a segmentation fault for svd. I thought
| this might be a good example of why it might benefit from a long vector upgrade.
| 
| test <- matrix(rnorm(45000000*60), ncol = 60)
| testSVD <- svd(test)
| 
| *** caught segfault ***
| address 0x7fe93514d618, cause 'memory not mapped'
| 
| Traceback:
| 1: La.svd(x, nu, nv)
| 2: svd(test)
| 
| --
| Dario Strbenac
| University of Sydney
| Camperdown NSW 2050
| Australia




--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:  319-335-3386
Department of Statistics and        Fax:    319-335-3017
   Actuarial Science
241 Schaeffer Hall                  email:  luke-tier...@uiowa.edu
Iowa City, IA 52242                 WWW:    http://www.stat.uiowa.edu
