Re: [R] questions on the ff package
> I wonder how efficient it is to do the following command on a frequent basis.
> nrow(matFF) <- nrow(matFF)+1

Obviously there is overhead (closing the file, enlarging the file, opening the file). I recommend you measure for yourself whether this is acceptable for you.

> no large file copying is needed each time the nrow is changed?

With a decent filesystem there is *no* copying from the smaller to the larger file.

> would you think I can open 2000 large matrices and leave them open or I need to close each after it is opened and used?

Not tested yet. I guess the number of open files can be configured when compiling your OS. Please test and let us know your experience.

Regards

Jens Oehlschlägel

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
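The alternative raised in the question, closing each matrix after it is used, could look roughly like the sketch below. It is untested against 2000 real files: it uses three small stand-in matrices, and the names `mats`, `col_sums` and `still_open` are illustrative only. It assumes ff's `open()`, `close()`, `is.open()` and `delete()` methods behave as described in the package documentation.

```r
library(ff)

# Three small stand-ins for the ~2000 large matrices (hypothetical sizes)
mats <- lapply(1:3, function(i) ff(as.double(i), dim = c(4, 5), vmode = "double"))
invisible(lapply(mats, close))        # start with every file closed

col_sums <- lapply(mats, function(m) {
  open(m)                             # map the file again
  on.exit(close(m))                   # release the file handle when done
  colSums(m[, ])                      # m[, ] pulls this small matrix into RAM
})

# Record the state before cleaning up: no file should still be open
still_open <- vapply(mats, is.open, logical(1))

invisible(lapply(mats, delete))       # remove the temporary ff files
```

With this pattern only one file is open at any time, so the open-file limit never comes into play. On Linux the current per-process limit can be inspected with `ulimit -n` in the shell.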
Re: [R] questions on the ff package
Hi Jens,

Thanks for your prompt and informative answers. ff is a fabulous package, and your suggestions helped me solve the problems at hand.

I need to incrementally grow each of several large matrices (about 1000 rows *1 columns, 1000 matrices) by a row every day. I wonder how efficient it is to run the following command on a frequent basis:

nrow(matFF) <- nrow(matFF)+1

As far as I know, mmap preallocates a file of fixed size. I don't know the ff implementation or how it handles file size changes. Does the command above preallocate, say, 10% more space than the current size, so that no large file copying is needed each time nrow is changed?

Another problem I am facing is that I have over 2000 large matrices that need the help of ff. Suppose I have a list of 2000 ff objects. My computing environment is 64-bit Linux with 64 GB of memory. I remember there are some limitations on the maximum number of files that can be open in Linux. If I need to access each matrix, do you think I can open each matrix and leave them all open, or do I need to close each one after it is opened and used?

Thanks a lot

Jeff

-----Original Message-----
From: Jens Oehlschlägel [mailto:oehl_l...@gmx.de]
Sent: Wednesday, November 25, 2009 8:04 AM
To: h...@andrew.cmu.edu
Cc: r-h...@lists.r-project.org
Subject: [R] questions on the ff package

Jeff,

> I need to save a matrix as a memory-mapped file and load it back later. To save the matrix, I use
> mat = matrix(1:20, 4, 5)
> matFF = ff(mat, dim=dim(mat), filename="~/a.mat", overwrite=TRUE, dimnames = dimnames(mat))

# This stores the data in an ff file,
# but not the metadata in R's ff object.
# To do the latter you need to do
save(matFF, file="~/matFF.RData")
# Assuming that your ff file remains in the same location,
# in a new R session you simply
load(file="~/matFF.RData")
# and the ff file is available automagically

> However, I don't always know the dimension when loading the matrix back. If I miss the dim attributes, ff will return it as vector. Is there a way to load the matrix without specifying the dimension?

# You can create an ff object using your existing ff file by
matFF <- ff(filename="~/a.mat", vmode="double", dim=c(4,5))
# You can do the same at unknown file size with
matFF <- ff(filename="~/a.mat", vmode="double")
# which gives you the length of the ff object
length(matFF)
# if you know the number of columns you can calculate the number of rows
# and give your ff object the interpretation of a matrix
dim(matFF) <- c(length(matFF)/5, 5)

> the matrix may grow in terms of the number of rows. Is there an efficient way to do this?

# there are two ways to grow a matrix by rows
# 1) you create the matrix in row-major order
matFF <- ff(1:20, dim=c(4,5), dimorder=c(2:1))
# then you require a higher number of rows
nrow(matFF) <- 6
# as you can see there are new empty rows in the file
matFF
# 2) Instead of a matrix you create an ffdf data.frame
#    which you can also give more rows using nrow<-
#    An example of this is in read.table.ffdf
#    which reads a csv file in chunks and extends the
#    number of rows in the ffdf

Jens Oehlschlägel
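To get a feel for the cost of growing a matrix one row at a time, the row-major approach described above can be timed directly. The following is only a rough sketch: the matrix size and the number of appends (`n_appends`) are made-up values, the file lives in ff's temporary directory, and the result will depend heavily on the filesystem.

```r
library(ff)

# Row-major layout, so that new rows are appended at the end of the file
matFF <- ff(0, dim = c(1000, 10), dimorder = c(2, 1), vmode = "double")

n_appends <- 100                      # hypothetical number of appends to simulate
elapsed <- system.time(
  for (i in seq_len(n_appends)) {
    nrow(matFF) <- nrow(matFF) + 1    # grow the file by one row
  }
)[["elapsed"]]

final_rows <- nrow(matFF)
cat("rows now:", final_rows, "\n")
cat("seconds per append:", elapsed / n_appends, "\n")

delete(matFF)                         # remove the temporary ff file
rm(matFF)
```

Running the same loop on the real data and the real disk is the only reliable way to judge whether a daily append is cheap enough.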
Re: [R] questions on the ff package
Dear Jeff,

This is not exactly what you are asking, but what I do is close the object and save it as RData; then, when I need it again, I load the RData. The RData objects themselves are very small.

Best,

R.

On Wed, Nov 25, 2009 at 2:28 AM, Hao Cen h...@andrew.cmu.edu wrote:
> Hi,
>
> I have two questions on using the ff package and wonder if anyone who has used ff can share some thoughts.
>
> I need to save a matrix as a memory-mapped file and load it back later. To save the matrix, I use
>
> mat = matrix(1:20, 4, 5)
> matFF = ff(mat, dim=dim(mat), filename="~/a.mat", overwrite=TRUE, dimnames = dimnames(mat))
>
> To load it back, I use
>
> matFF2 = ff(vmode = "double", dim= ???, filename="~/a.mat", overwrite=FALSE)
>
> However, I don't always know the dimension when loading the matrix back. If I miss the dim attributes, ff will return it as a vector. Is there a way to load the matrix without specifying the dimension?
>
> The second question is that the matrix may grow in terms of the number of rows. I would like to synchronize the change to the memory-mapped file. Is there an efficient way to do this?
>
> Thanks
>
> Jeff

--
Ramon Diaz-Uriarte
Structural Biology and Biocomputing Programme
Spanish National Cancer Centre (CNIO)
http://ligarto.org/rdiaz
Phone: +34-91-732-8000 ext. 3019
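The close, save, and load workflow described in this message might be sketched as follows. The file locations are temporary paths chosen for illustration, and `finalizer = "close"` is set explicitly on the assumption that ff would otherwise treat a file in its temporary directory as disposable. This is not necessarily the exact procedure used by the poster.

```r
library(ff)

ff_file   <- tempfile(fileext = ".ff")      # hypothetical data file location
meta_file <- tempfile(fileext = ".RData")   # hypothetical metadata location

matFF <- ff(as.double(1:20), dim = c(4, 5), filename = ff_file,
            finalizer = "close")            # keep the file when the object goes away
close(matFF)                                # flush and release the data file
save(matFF, file = meta_file)               # stores only the small R-side object

# Later, typically in a new session: load the metadata, dimensions included
env <- new.env()
load(meta_file, envir = env)
restored <- env$matFF

restored_dim   <- dim(restored)
restored_value <- restored[2, 3]            # the data file is reopened on access

delete(restored)                            # clean up the example files
unlink(meta_file)
```

Because the dimensions travel with the saved object, this also avoids having to pass `dim` when the matrix is loaded back.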
[R] questions on the ff package
Jeff,

> I need to save a matrix as a memory-mapped file and load it back later. To save the matrix, I use
> mat = matrix(1:20, 4, 5)
> matFF = ff(mat, dim=dim(mat), filename="~/a.mat", overwrite=TRUE, dimnames = dimnames(mat))

# This stores the data in an ff file,
# but not the metadata in R's ff object.
# To do the latter you need to do
save(matFF, file="~/matFF.RData")
# Assuming that your ff file remains in the same location,
# in a new R session you simply
load(file="~/matFF.RData")
# and the ff file is available automagically

> However, I don't always know the dimension when loading the matrix back. If I miss the dim attributes, ff will return it as vector. Is there a way to load the matrix without specifying the dimension?

# You can create an ff object using your existing ff file by
matFF <- ff(filename="~/a.mat", vmode="double", dim=c(4,5))
# You can do the same at unknown file size with
matFF <- ff(filename="~/a.mat", vmode="double")
# which gives you the length of the ff object
length(matFF)
# if you know the number of columns you can calculate the number of rows
# and give your ff object the interpretation of a matrix
dim(matFF) <- c(length(matFF)/5, 5)

> the matrix may grow in terms of the number of rows. Is there an efficient way to do this?

# there are two ways to grow a matrix by rows
# 1) you create the matrix in row-major order
matFF <- ff(1:20, dim=c(4,5), dimorder=c(2:1))
# then you require a higher number of rows
nrow(matFF) <- 6
# as you can see there are new empty rows in the file
matFF
# 2) Instead of a matrix you create an ffdf data.frame
#    which you can also give more rows using nrow<-
#    An example of this is in read.table.ffdf
#    which reads a csv file in chunks and extends the
#    number of rows in the ffdf

Jens Oehlschlägel
[R] questions on the ff package
Hi,

I have two questions on using the ff package and wonder if anyone who has used ff can share some thoughts.

I need to save a matrix as a memory-mapped file and load it back later. To save the matrix, I use

mat = matrix(1:20, 4, 5)
matFF = ff(mat, dim=dim(mat), filename="~/a.mat", overwrite=TRUE, dimnames = dimnames(mat))

To load it back, I use

matFF2 = ff(vmode = "double", dim= ???, filename="~/a.mat", overwrite=FALSE)

However, I don't always know the dimension when loading the matrix back. If I miss the dim attributes, ff will return it as a vector. Is there a way to load the matrix without specifying the dimension?

The second question is that the matrix may grow in terms of the number of rows. I would like to synchronize the change to the memory-mapped file. Is there an efficient way to do this?

Thanks

Jeff