Re: [R] assigning and saving datasets in a loop, with names changing with i

2007-12-21 Thread Tony Plate
Marie Pierre Sylvestre wrote:
 Dear R users,
 
 I am analysing a very large data set and I need to perform several data
 manipulations. The dataset is so big that the only way I can play with it
 without having memory problems (E.g. cannot allocate vectors of size...)
 is to write a batch script to:
 
 1. cut the data into pieces 
 2. save the pieces in separate .RData files
 3. Remove everything from the environment
 4. load one of the pieces
 5. perform the manipulations on it
 6. save it and remove it from the environment
 7. Redo 4-6 for every piece
 8. Merge everything together at the end
 
 It works if coded line by line but since I'll have to perform these tasks
 on other data sets, I am trying to automate this as much as I can. 

The trackObjs package is designed to make it easy to work in approximately 
this manner -- it saves objects automatically to disk but they are still 
accessible as normal.

Here's how you could do the above; this example works with ten 8Mb objects 
in an R session with a limit of 40Mb.

# allow R only 40Mb of vector memory
mem.limits(vsize=40e6)
mem.limits()/1e6
library(trackObjs)
# start tracking to store data objects in the directory 'data'
# each object is 8Mb, and we store 10 of them
track.start("data")
n <- 10
m <- 1e6
constructObject <- function(i) i+rnorm(m)
# steps 1, 2 & 3:
for (i in 1:n) {
    xname <- paste("x", i, sep="")
    cat("", xname)
    assign(xname, constructObject(i))
    # store in a file, accessible by name:
    track(list=xname)
}
cat("\n")
gc(TRUE)
# accessing object by name
object.size(x1)/2^20 # In Mb
mean(x1)
mean(x2)
gc(TRUE)
# steps 4:6
# accessing object through a constructed name
result <- sapply(1:n, function(i) mean(get(paste("x", i, sep=""))))
result
# remove the data objects
track.remove(list=paste("x", 1:n, sep=""))
track.stop()
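Without trackObjs, a rough base-R approximation of the same idea (each object lives in its own file, yet can still be used by name) is an active binding that reads the file on every access. This is a sketch under stated assumptions, not how trackObjs is implemented: `disk_put()` and `store_dir` are hypothetical names, saveRDS()/readRDS() need a more recent R than the one used in this thread, and `m` is reduced to 1e5 so it runs quickly.

```r
# Hypothetical helper, not part of trackObjs: store each object in its own
# .rds file and leave behind an active binding that reads it back on access.
store_dir <- file.path(tempdir(), "objstore")
dir.create(store_dir, showWarnings = FALSE)

disk_put <- function(name, value, envir = globalenv()) {
    path <- file.path(store_dir, paste(name, ".rds", sep = ""))
    saveRDS(value, file = path)
    # makeActiveBinding() refuses to replace an ordinary binding, so drop it
    if (exists(name, envir = envir, inherits = FALSE))
        rm(list = name, envir = envir)
    # Reading 'name' now calls readRDS(); the value is not held in 'envir'.
    # The binding is read-only: assigning to 'name' raises an error.
    makeActiveBinding(name, function() readRDS(path), envir)
    invisible(path)
}

n <- 10
m <- 1e5
for (i in 1:n) {
    disk_put(paste("x", i, sep = ""), i + rnorm(m))
}
# Access through a constructed name, as in the trackObjs script
result <- sapply(1:n, function(i) mean(get(paste("x", i, sep = ""))))
```

Every access re-reads the file, so two calls to mean(x1) read it twice; this trades speed for a small memory footprint.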

Here's a full transcript of the above; note how whenever gc() is 
called there is hardly any vector memory in use.

> # allow R only 40Mb of vector memory
> mem.limits(vsize=40e6)
   nsize    vsize 
      NA 40000000 
> mem.limits()/1e6
nsize vsize 
   NA    40 
> library(trackObjs)
> # start tracking to store data objects in the directory 'data'
> # each object is 8Mb, and we store 10 of them
> track.start("data")
> n <- 10
> m <- 1e6
> constructObject <- function(i) i+rnorm(m)
> # steps 1, 2 & 3:
> for (i in 1:n) {
+     xname <- paste("x", i, sep="")
+     cat("", xname)
+     assign(xname, constructObject(i))
+     # store in a file, accessible by name:
+     track(list=xname)
+ }
 x1 x2 x3 x4 x5 x6 x7 x8 x9 x10> cat("\n")

> gc(TRUE)
Garbage collection 19 = 6+0+13 (level 2) ... 
4.0 Mbytes of cons cells used (42%)
0.7 Mbytes of vectors used (5%)
         used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
Ncells 148362  4.0     350000  9.4         NA   350000  9.4
Vcells  89973  0.7    1950935 14.9       38.2  2112735 16.2
> # accessing object by name
> object.size(x1)/2^20 # In Mb
[1] 7.629417
> mean(x1)
[1] 0.998635
> mean(x2)
[1] 1.999656
> gc(TRUE)
Garbage collection 22 = 7+1+14 (level 2) ... 
4.0 Mbytes of cons cells used (43%)
0.7 Mbytes of vectors used (6%)
         used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
Ncells 149264  4.0     350000  9.4         NA   350000  9.4
Vcells  90160  0.7    1560747 12.0       38.2  2112735 16.2
> # steps 4:6
> result <- sapply(1:n, function(i) mean(get(paste("x", i, sep=""))))
> result
 [1]  0.998635  1.999656  2.997368  4.000197  5.000159  6.001216  6.999552
 [8]  7.999743  8.82 10.001355
> # remove the data objects
> track.remove(list=paste("x", 1:n, sep=""))
 [1] "x1"  "x2"  "x3"  "x4"  "x5"  "x6"  "x7"  "x8"  "x9"  "x10"
> track.stop()

Re: [R] assigning and saving datasets in a loop, with names changing with i

2007-12-19 Thread Moshe Olshansky
Won't it be simpler to do:

for (i in 1:12){
    data <- my.fun(my.list[[i]])   # [[i]] is the data frame; [i] would be a one-element list
    save(data, file = paste("data", i, ".RData", sep=""))
}
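With this approach every file holds an object called `data`, so calling load() on several of them in the workspace keeps overwriting the same name. A sketch of reading such files back into separate environments and stacking them (step 8 of the original plan); the two small data frames and `out_dir` are made up for illustration.

```r
# Made-up processed pieces standing in for my.fun(my.list[[i]])
out_dir <- tempdir()
pieces <- list(data.frame(id = 1:3, v = c(0.5, 1.5, 2.5)),
               data.frame(id = 4:6, v = c(3.5, 4.5, 5.5)))

# Moshe's pattern: each file stores an object named "data"
for (i in seq_along(pieces)) {
    data <- pieces[[i]]
    save(data, file = file.path(out_dir, paste("data", i, ".RData", sep = "")))
}

# Load each file into its own environment so the pieces do not clobber
# each other, then pull the object out by the name it was saved under.
read_piece <- function(i) {
    e <- new.env()
    load(file.path(out_dir, paste("data", i, ".RData", sep = "")), envir = e)
    e$data
}
merged <- do.call(rbind, lapply(seq_along(pieces), read_piece))
```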


--- Marie Pierre Sylvestre [EMAIL PROTECTED] wrote:


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] assigning and saving datasets in a loop, with names changing with i

2007-12-18 Thread Marie Pierre Sylvestre
Dear R users,

I am analysing a very large data set and I need to perform several data
manipulations. The dataset is so big that the only way I can play with it
without having memory problems (E.g. cannot allocate vectors of size...)
is to write a batch script to:

1. cut the data into pieces 
2. save the pieces in separate .RData files
3. Remove everything from the environment
4. load one of the pieces
5. perform the manipulations on it
6. save it and remove it from the environment
7. Redo 4-6 for every piece
8. Merge everything together at the end
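The eight steps can be sketched without assign() and get() at all by keeping one file per piece and reading it back into an ordinary variable. This is one possible approach, not the poster's code: it swaps in saveRDS()/readRDS() (available in R versions newer than those current when this thread was written) for save()/load(), and `my.fun`, `big`, `piece_dir` and `piece_file()` are hypothetical stand-ins.

```r
# Hypothetical stand-ins for the real data and the real per-piece work
my.fun <- function(df) transform(df, y2 = y * 2)
big <- data.frame(g = rep(1:12, each = 5), y = seq_len(60))

piece_dir <- file.path(tempdir(), "pieces")
dir.create(piece_dir, showWarnings = FALSE)
piece_file <- function(i) file.path(piece_dir, paste("piece", i, ".rds", sep = ""))

# Steps 1-3: cut the data into pieces, write each to its own file, free memory
pieces <- split(big, big$g)
n_pieces <- length(pieces)
for (i in seq_len(n_pieces)) saveRDS(pieces[[i]], piece_file(i))
rm(big, pieces)
invisible(gc())

# Steps 4-7: load one piece, process it, save the result, drop it from memory
for (i in seq_len(n_pieces)) {
    piece <- readRDS(piece_file(i))
    saveRDS(my.fun(piece), piece_file(i))   # overwrite with the processed piece
    rm(piece)
}

# Step 8: read the processed pieces back and merge them into one data frame
result <- do.call(rbind, lapply(seq_len(n_pieces), function(i) readRDS(piece_file(i))))
```

Overwriting each raw piece with its processed version keeps disk use flat; use a second file name per piece if the raw pieces must be kept.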

It works if coded line by line but since I'll have to perform these tasks
on other data sets, I am trying to automate this as much as I can. 

I am using a loop in which I used 'assign' and 'get' (pseudo code below).
My problem is when I use 'get', it prints the whole object on the screen.
I am wondering whether there is a more efficient way to do what I need to
do. Any help would be appreciated. Please keep in mind that the whole
process is quite computer-intensive, so I can't keep everything in the
environment while R performs calculations.
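On the printing problem: R auto-prints the value of a top-level expression that is not assigned, so a bare get(name) typed at the prompt shows the whole object. Assigning the result, or wrapping it in invisible(), keeps it quiet, and expressions evaluated inside a for loop are not auto-printed at all. A small illustration with made-up objects data1 to data3:

```r
# Made-up objects data1, data2, data3 created with assign(), as in the question
for (i in 1:3) {
    assign(paste("data", i, sep = ""), data.frame(x = rnorm(5)))
}

nm <- paste("data", 2, sep = "")
obj <- get(nm)                 # assigned, so nothing is printed
invisible(get(nm))             # silent even at the top level
for (i in 1:3) {
    get(paste("data", i, sep = ""))   # not auto-printed inside a loop
}
# By contrast, typing get(nm) on its own at the prompt prints the data frame.
```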

Say I have 1 big dataframe called data. I use 'split' to divide it into a
list of 12 dataframes (call this list my.list)
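split() needs a grouping vector. If the data have no natural grouping variable, one way to cut them into 12 blocks of consecutive rows is to build that vector with cut() on the row index. A sketch on a made-up 100-row stand-in for `data`:

```r
# Made-up stand-in for the large data frame called 'data' in the question
data <- data.frame(id = 1:100, value = rnorm(100))

# Label each row 1..12 by cutting the row index into 12 equal-width intervals
block <- cut(seq_len(nrow(data)), breaks = 12, labels = FALSE)
my.list <- split(data, block)

length(my.list)          # 12 data frames
sapply(my.list, nrow)    # here the block sizes differ by at most one row
```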

my.fun is a function that takes a dataframe, performs several
manipulations on it and returns a dataframe.


for (i in 1:12){
  assign(paste("data", i, sep=""), my.fun(my.list[[i]]))   # this works
  # now I need to save this new object as an RData file.

  # The following line does not work
  save(paste("data", i, sep=""), file=paste(paste("data", i, sep=""), "RData", sep="."))

  # This works but it is a bit convoluted!!!
  temp <- get(paste("data", i, sep=""))
  save(temp, file="lala.RData")
}


I am *sure* there is something more clever to do but I can't find it. Any
help would be appreciated.

best regards,

MP



Re: [R] assigning and saving datasets in a loop, with names changing with i

2007-12-18 Thread Benilton Carvalho

you want to use:

save(list=paste("data", i, sep=""), file=paste("data", i, ".Rdata", sep=""))
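The key point is that save()'s `list` argument takes object names as a character vector, whereas its `...` arguments must be written as symbols or literal character strings, so a paste() call placed there is not turned into a name. A sketch of the whole loop using `list`, with a round-trip check; the two-piece `my.list`, the `my.fun` placeholder and `out_dir` are made up for illustration.

```r
# Hypothetical stand-ins: a two-piece list and a placeholder my.fun
my.list <- list(data.frame(a = 1:2), data.frame(a = 3:4))
my.fun <- function(df) transform(df, b = a^2)
out_dir <- tempdir()

for (i in seq_along(my.list)) {
    nm <- paste("data", i, sep = "")
    assign(nm, my.fun(my.list[[i]]))
    # 'list' takes the object's name as a character string
    save(list = nm, file = file.path(out_dir, paste(nm, ".RData", sep = "")))
    rm(list = nm)   # free memory before the next piece
}

# load() restores the object under the name it was saved with (data2 here)
# and returns that name invisibly
loaded <- load(file.path(out_dir, "data2.RData"))
```

Because each file keeps the object's own name, loading several files into one workspace does not make them overwrite each other.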


b

On Dec 18, 2007, at 9:24 PM, Marie Pierre Sylvestre wrote: