Re: [R] questions on the ff package

2009-11-27 Thread Jens Oehlschlägel
> I wonder how efficient it is to do the following command on a frequent
> basis.
> nrow(matFF) <- nrow(matFF)+1

Obviously there is overhead (closing file, enlarging file, opening file).
I recommend you measure yourself whether this is acceptable for you.
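
One way to take that measurement is sketched below. It assumes the ff package is installed; the matrix size and the repetition count are arbitrary illustration values, and the matrix is created in row-major order because that is the layout in which rows can be appended.

```r
library(ff)  # assumes the ff package is installed

# Hypothetical example matrix in row-major order, backed by a temporary file
matFF <- ff(0, vmode = "double", dim = c(1000, 10), dimorder = c(2, 1))

# Time 100 single-row extensions; each one resizes the backing file
elapsed <- system.time(
  for (i in 1:100) nrow(matFF) <- nrow(matFF) + 1
)["elapsed"]

elapsed / 100            # rough seconds per extension on this machine
final_dim <- dim(matFF)  # 100 rows more than at the start
final_dim

delete(matFF)            # remove the temporary backing file
rm(matFF)
```

The per-call cost will depend on the filesystem and on the size of the matrix, so the timing is only meaningful on the target machine with realistic dimensions.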

> no large file copying is needed each time the nrow is changed?

With a decent filesystem there is *no* copying from smaller to larger file.

> would you think I can open 2000 large matrices and leave them open or I
> need to close each after it is opened and used?

Not tested yet. 
I guess the number of open files can be configured when compiling your OS. 
Please test and let us know your experience.
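
If the limit on open files does turn out to be a problem, one possible pattern is to keep every ff object closed and open it only while it is being used. The sketch below assumes the ff package is installed; `with_ff` is a hypothetical helper name, and three small vectors stand in for the 2000 large matrices. On Linux the per-process limit can typically be inspected with the shell builtin `ulimit -n`.

```r
library(ff)  # assumes the ff package is installed

# Hypothetical helper: open an ff object, apply a function, close it again
with_ff <- function(x, f) {
  open(x)
  on.exit(close(x))
  f(x)
}

# Three small ff vectors stand in for the 2000 large matrices
ffList <- lapply(1:3, function(i) {
  x <- ff(as.double(i), length = 10)
  close(x)  # keep the object closed until it is needed
  x
})

# Only one backing file is open at any time
sums <- sapply(ffList, with_ff, f = function(x) sum(x[]))
sums

# On Linux, the soft limit on open files could be queried with:
# system("ulimit -n", intern = TRUE)

invisible(lapply(ffList, delete))  # clean up the temporary files
```

Opening and closing on every access adds overhead of its own, so this trades speed for a small number of simultaneously open files.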

Regards
Jens Oehlschlägel

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] questions on the ff package

2009-11-26 Thread Hao Cen
Hi Jens,

Thanks for your prompt and informative answers. ff is a fabulous package and
your suggestions helped me solve the problems at hand.

I need to grow each of several large matrices (about
1000 rows *1 columns, 1000 matrices) by a row every day, so I wonder how
efficient it is to run the following command on a frequent basis.

nrow(matFF) <- nrow(matFF)+1

As far as I know, mmap preallocates a file of fixed size. I don't
know the ff implementation or how it handles file size changes. Does the
command above preallocate, say, 10% more space than the current
size so that no large file copying is needed each time the nrow is changed?



Another problem I am facing is that I have over 2000 large matrices that
need the help of ff. Suppose I have a list of 2000 ff objects. My computing
environment is 64-bit Linux with 64 GB of memory. I remember there is a limit
on the number of files that can be open at once in Linux. If I need to access
each matrix, do you think I can open each matrix and leave them all open, or
do I need to close each one after it is opened and used?

Thanks a lot

Jeff



Re: [R] questions on the ff package

2009-11-25 Thread Ramon Diaz-Uriarte
Dear Jeff,

This is not exactly what you are asking, but what I do is close the
object, save it as RData, and then load the RData when I need it.  The
RData objects themselves are very small.
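
That workflow might look roughly like the sketch below. It assumes the ff package is installed; the file paths are temporary, hypothetical locations chosen only so the example is self-contained.

```r
library(ff)  # assumes the ff package is installed

ffFile <- tempfile(fileext = ".ff")     # hypothetical path of the data file
rdFile <- tempfile(fileext = ".RData")  # hypothetical path of the metadata

matFF <- ff(1:20, dim = c(4, 5), filename = ffFile)
close(matFF)                # close the backing file
save(matFF, file = rdFile)  # stores only the small ff metadata object
rm(matFF)

load(rdFile)                # restores the metadata; the data stay on disk
open(matFF)
restored <- matFF[, ]       # read the whole matrix back into memory
restored

delete(matFF)               # remove the backing file when done
```

The RData file holds only the description of the ff object (file name, vmode, dimensions), which is why it stays small regardless of the matrix size.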

Best,

R.






-- 
Ramon Diaz-Uriarte
Structural Biology and Biocomputing Programme
Spanish National Cancer Centre (CNIO)
http://ligarto.org/rdiaz
Phone: +34-91-732-8000 ext. 3019



[R] questions on the ff package

2009-11-25 Thread Jens Oehlschlägel
Jeff,

> I need to save a matrix as a memory-mapped file and load it back later.
> To save the matrix, I use
> mat = matrix(1:20, 4, 5)
> matFF = ff(mat, dim=dim(mat), filename="~/a.mat"
> , overwrite=TRUE, dimnames = dimnames(mat))

# This stores the data in an ff file,
# but not the metadata in R's ff object.
# To do the latter you need to do
save(matFF, file="~/matFF.RData")

# Assuming that your ff file remains in the same location,
# in a new R session you simply
load(file="~/matFF.RData")
# and the ff file is available automagically

> However, I don't always know the dimension when loading the matrix back.
> If I miss the dim attributes, ff will return it as vector.
> Is there a way to load the matrix without specifying the dimension?

# You can create an ff object using your existing ff file by
matFF <- ff(filename="~/a.mat", vmode="double", dim=c(4,5))

# You can do the same at unknown file size with
matFF <- ff(filename="~/a.mat", vmode="double")
# which gives you the length of the ff object
length(matFF)
# if you know the number of columns you can calculate the number of rows and
# give your ff object the interpretation of a matrix
dim(matFF) <- c(length(matFF)/5, 5)
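
A self-contained version of these steps is sketched below, assuming the ff package is installed. The temporary file path and the variable `nCol` are hypothetical. Note that the vmode given when reopening has to match the one the file was written with; `matrix(1:20, 4, 5)` is an integer matrix in R, so this sketch converts the data to double explicitly before writing.

```r
library(ff)  # assumes the ff package is installed

ffFile <- tempfile(fileext = ".ff")  # hypothetical path of the data file

# Write a 4 x 5 double matrix, then forget everything except the file
x <- ff(as.double(1:20), dim = c(4, 5), filename = ffFile)
close(x)
rm(x)

# Reopen without dimensions: the length follows from the file size
y <- ff(filename = ffFile, vmode = "double")
length(y)

# With a known number of columns the row count can be derived
nCol <- 5
dim(y) <- c(length(y) / nCol, nCol)
recovered <- y[, ]
recovered

delete(y)  # remove the backing file when done
```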

> the matrix may grow in terms of the number of rows.
> Is there an efficient way to do this?

# there are two ways to grow a matrix by rows

# 1) you create the matrix in row-major order
matFF <- ff(1:20, dim=c(4,5), dimorder=c(2:1))
# then you require a higher number of rows
nrow(matFF) <- 6
# as you can see there are new empty rows in the file
matFF

# 2) Instead of a matrix you create a ffdf data.frame
#    which you can also give more rows using nrow<-
#    An example of this is in read.table.ffdf
#    which reads a csv file in chunks and extends the
#    number of rows in the ffdf
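
The second option could be sketched as follows, assuming the ff package is installed. The data frame contents and the variable names are invented for illustration, and the sketch mirrors only the general idea of what read.table.ffdf does (extend nrow, then fill the new rows), not its actual code.

```r
library(ff)  # assumes the ff package is installed

# Hypothetical starting data
df <- data.frame(id = 1:4, value = c(0.1, 0.2, 0.3, 0.4))
fdf <- as.ffdf(df)  # each column is stored as an ff vector

nrow(fdf) <- 6      # extend the ffdf by two rows
fdf[5:6, ] <- data.frame(id = 5:6, value = c(0.5, 0.6))  # fill the new rows

grown <- fdf[, ]    # read everything back as an ordinary data.frame
grown
```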

Jens Oehlschlägel

-- 
Preisknaller: GMX DSL Flatrate für nur 16,99 Euro/mtl.!
http://portal.gmx.net/de/go/dsl02

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] questions on the ff package

2009-11-24 Thread Hao Cen
Hi,

I have two questions on using the ff package and wonder if anyone who has
used ff can share some thoughts.

I need to save a matrix as a memory-mapped file and load it back later. To
save the matrix, I use

mat = matrix(1:20, 4, 5)
matFF = ff(mat, dim=dim(mat), filename="~/a.mat", overwrite=TRUE,
           dimnames = dimnames(mat))

To load it back, I use
matFF2 = ff(vmode = "double", dim= ???, filename="~/a.mat", overwrite=F)

However, I don't always know the dimensions when loading the matrix back.
If I omit the dim attribute, ff returns a vector. Is there a way
to load the matrix without specifying the dimensions?

The second question is that the matrix may grow in terms of the number of
rows. I would like to synchronize the change to the memory-mapped file. Is
there an efficient way to do this?

Thanks

Jeff
