[R] Memory usage
Hi, I have the lines of code below to understand how R manages memory.

> library(pryr)
Warning message:
package 'pryr' was built under R version 3.4.3
> mem_change(x <- 1:1e6)
4.01 MB
> mem_change(y <- x)
976 B
> mem_change(x[100] < NA)
976 B
> mem_change(rm(x))
864 B
> mem_change(rm(y))
-4 MB

I do understand why there is only a 976 B positive change in the third line: y and x both point to the same block of memory that holds 1:1e6. But I don't understand the result below:

> mem_change(rm(x))
864 B

Why did memory consumption increase here, even if only by a small amount, while deleting an object? Any detailed explanation will be appreciated.

Thanks,
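A short sketch of what seems to be going on (assuming pryr is installed; the byte counts in the comments are illustrative, not reproduced from the original session): mem_change() runs gc() before and after evaluating the expression, so tiny positive values mostly reflect the bookkeeping R does while evaluating the call, and rm(x) cannot release the large vector because y still references it.

library(pryr)
x <- 1:1e6
y <- x
address(x); address(y)   # same address: x and y share one block of memory
object_size(x, y)        # shared memory is counted only once (about 4 MB, not 8 MB)
mem_change(rm(x))        # small +/- change: the block is still owned by y
mem_change(rm(y))        # about -4 MB: the last reference is gone, so the block is freed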
Re: [R] Memory usage in prcomp
> On Mar 22, 2016, at 10:00 AM, Martin Maechler wrote:
>
>>>>> Roy Mendelssohn - NOAA Federal on Tue, 22 Mar 2016 07:42:10 -0700 writes:
>>
>> Hi All:
>> I am running prcomp on a very large array, roughly [50, 3650]. The array itself is 16GB. I am running on a Unix machine and am running "top" at the same time and am quite surprised to see that the application memory usage is 76GB. I have "tol" set very high (.8) so that it should only pull out a few components. I am surprised at this memory usage because prcomp uses the SVD if I am not mistaken, and when I take guesses at the size of the SVD matrices they shouldn't be that large. While I can fit this in, for a variety of reasons I would like to reduce the memory footprint. My questions:
>>
>> 1. I am running with "center=FALSE" and "scale=TRUE". Would I save memory if I scaled the data first myself, saved the result, cleared out the workspace, read the scaled data back in and did the prcomp call? Basically, are the intermediate calculations for scaling kept in memory after use?
>>
>> 2. I don't know how prcomp memory usage compares to a direct call to "svd", which allows me to explicitly set how many singular vectors to compute (I only need the first five at most). prcomp is convenient because it does a lot of the other work for me.
>
> For your example, where p := ncol(x) is 3650 but you only want the first 5 PCs, it would be *considerably* more efficient to use svd(..., nv = 5) directly.
>
> So I would take stats:::prcomp.default and modify it correspondingly.
>
> This seems such a useful idea in general that I consider updating the function in R with a new optional 'rank.' argument which you'd set to 5 in your case.
>
> Scrutinizing R's underlying svd() code, however, I now see that there are typically still two other [n x p] matrices created (one in R's La.svd(), one in C code) ... which I think should be unnecessary in this case ... but that would really be another topic (for R-devel, not R-help).
>
> Martin

Thanks. It is easy enough to recode using svd, and I think I will. It gives me a little more control over what the algorithm does.

-Roy
Re: [R] Memory usage in prcomp
>>>>> Roy Mendelssohn - NOAA Federal on Tue, 22 Mar 2016 07:42:10 -0700 writes:

> Hi All:
> I am running prcomp on a very large array, roughly [50, 3650]. The array itself is 16GB. I am running on a Unix machine and am running "top" at the same time and am quite surprised to see that the application memory usage is 76GB. I have "tol" set very high (.8) so that it should only pull out a few components. I am surprised at this memory usage because prcomp uses the SVD if I am not mistaken, and when I take guesses at the size of the SVD matrices they shouldn't be that large. While I can fit this in, for a variety of reasons I would like to reduce the memory footprint. My questions:
>
> 1. I am running with "center=FALSE" and "scale=TRUE". Would I save memory if I scaled the data first myself, saved the result, cleared out the workspace, read the scaled data back in and did the prcomp call? Basically, are the intermediate calculations for scaling kept in memory after use?
>
> 2. I don't know how prcomp memory usage compares to a direct call to "svd", which allows me to explicitly set how many singular vectors to compute (I only need the first five at most). prcomp is convenient because it does a lot of the other work for me.

For your example, where p := ncol(x) is 3650 but you only want the first 5 PCs, it would be *considerably* more efficient to use svd(..., nv = 5) directly.

So I would take stats:::prcomp.default and modify it correspondingly.

This seems such a useful idea in general that I consider updating the function in R with a new optional 'rank.' argument which you'd set to 5 in your case.

Scrutinizing R's underlying svd() code, however, I now see that there are typically still two other [n x p] matrices created (one in R's La.svd(), one in C code) ... which I think should be unnecessary in this case ... but that would really be another topic (for R-devel, not R-help).

Martin
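A minimal sketch of the svd()-based route Martin describes, assuming `x` is the (hypothetical) data matrix and that it should be scaled the way prcomp(center = FALSE, scale. = TRUE) would scale it; this is not Martin's intended code for a future 'rank.' argument, just one way to get the first few components:

k  <- 5
xs <- scale(x, center = FALSE, scale = TRUE)   # same scaling prcomp() applies internally
sv <- svd(xs, nu = 0, nv = k)                  # ask for only the first k right singular vectors
rotation <- sv$v                               # p x k loadings  (prcomp's $rotation)
scores   <- xs %*% sv$v                        # n x k scores    (prcomp's $x)
sdev     <- sv$d / sqrt(nrow(xs) - 1)          # singular values -> component standard deviations

As Martin notes, svd() still allocates some full-size intermediates internally, so this mainly avoids the full-width rotation and score matrices prcomp would otherwise build and return.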
[R] Memory usage in prcomp
Hi All:

I am running prcomp on a very large array, roughly [50, 3650]. The array itself is 16GB. I am running on a Unix machine and am running "top" at the same time, and am quite surprised to see that the application memory usage is 76GB. I have "tol" set very high (.8) so that it should only pull out a few components. I am surprised at this memory usage because prcomp uses the SVD if I am not mistaken, and when I take guesses at the size of the SVD matrices they shouldn't be that large. While I can fit this in, for a variety of reasons I would like to reduce the memory footprint. My questions:

1. I am running with "center=FALSE" and "scale=TRUE". Would I save memory if I scaled the data first myself, saved the result, cleared out the workspace, read the scaled data back in and did the prcomp call? Basically, are the intermediate calculations for scaling kept in memory after use?

2. I don't know how prcomp memory usage compares to a direct call to "svd", which allows me to explicitly set how many singular vectors to compute (I only need the first five at most). prcomp is convenient because it does a lot of the other work for me.

"The contents of this message do not reflect any position of the U.S. Government or NOAA."
Roy Mendelssohn, Supervisory Operations Research Analyst, NOAA/NMFS Environmental Research Division, Southwest Fisheries Science Center, Santa Cruz, CA
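On question 1, a hedged sketch of the scale-first idea (object and file names here are hypothetical, and whether it actually helps depends on how many intermediate copies prcomp's internal scaling would otherwise make):

xs <- scale(x, center = FALSE, scale = TRUE)   # the same scaling prcomp(center = FALSE, scale. = TRUE) applies
saveRDS(xs, "x_scaled.rds")                    # park the scaled copy on disk
rm(x, xs); gc()                                # drop both copies from the workspace
xs <- readRDS("x_scaled.rds")                  # read back only the scaled data
pc <- prcomp(xs, center = FALSE, scale. = FALSE, tol = 0.8)   # tol as in the original post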
[R] Memory usage problem while using nlm function
Hi, I am trying to do nonlinear minimization using the nlm() function, but for a large amount of data it runs out of memory. The code I am using:

f <- function(p, n11, E) {
  sum(-log(p[5] * dnbinom(n11, size = p[1], prob = p[2]/(p[2] + E)) +
           (1 - p[5]) * dnbinom(n11, size = p[3], prob = p[4]/(p[4] + E))))
}
p_out <- nlm(f, p = c(alpha1 = 0.2, beta1 = 0.06, alpha2 = 1.4, beta2 = 1.8, w = 0.1),
             n11 = n11_c, E = E_c)

When the n11_c or E_c vector is too large, it runs out of memory. Please give me some solution for this.
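One possible way to shrink the problem, offered only as a sketch: if many (n11, E) pairs are duplicated (for example when E is rounded), the data can be collapsed to unique pairs and their contributions weighted, so the vectors passed to dnbinom() are much shorter. If E is continuous and pairs rarely repeat, this buys nothing.

agg <- aggregate(list(w = rep(1, length(n11_c))),
                 by = list(n11 = n11_c, E = E_c), FUN = sum)   # count each unique (n11, E) pair
f_w <- function(p, n11, E, w) {
  -sum(w * log(p[5] * dnbinom(n11, size = p[1], prob = p[2]/(p[2] + E)) +
               (1 - p[5]) * dnbinom(n11, size = p[3], prob = p[4]/(p[4] + E))))
}
p_out <- nlm(f_w, p = c(alpha1 = 0.2, beta1 = 0.06, alpha2 = 1.4, beta2 = 1.8, w = 0.1),
             n11 = agg$n11, E = agg$E, w = agg$w)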
[R] memory usage with party::cforest
Is there a way to shrink the size of a RandomForest-class object (an S4 object), so that it requires less memory at run time and less disk space for serialization? On my system the data slot is about 2GB, which is causing problems, and I'd like to see whether predict() works without it.

# example with a much smaller data set (i.e., less than 2GB)
require(party)
data(iris)
cf <- cforest(Species ~ ., data = iris)
str(cf, max.level = 2)
cf@data <- NULL   # this fails

Andrew
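A hedged sketch of how to see why the NULL assignment is rejected and which slots are actually large (these are all base 'methods'/'utils' functions; note that object.size() does not descend into environments, so an environment-valued slot such as @data may look deceptively small):

slotNames(cf)
getSlots(class(cf))   # declared class of each slot; S4 validity rejects a NULL that does not match it
sapply(slotNames(cf), function(s) object.size(slot(cf, s)))   # rough size per slot, in bytes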
Re: [R] Memory usage bar plot
init 148.0 KiB + 26.0 KiB = 174.0 KiB mapping-daemon 152.0 KiB + 25.5 KiB = 177.5 KiB gnome-keyring-daemon 152.0 KiB + 27.5 KiB = 179.5 KiB portmap 164.0 KiB + 18.0 KiB = 182.0 KiB syslogd 168.0 KiB + 24.5 KiB = 192.5 KiB atd 180.0 KiB + 18.5 KiB = 198.5 KiB brcm_iscsiuio 188.0 KiB + 37.0 KiB = 225.0 KiB rpc.statd 208.0 KiB + 24.0 KiB = 232.0 KiB audispd 208.0 KiB + 40.5 KiB = 248.5 KiB hald-runner 244.0 KiB + 23.5 KiB = 267.5 KiB smartd 240.0 KiB + 35.5 KiB = 275.5 KiB hpiod 244.0 KiB + 35.0 KiB = 279.0 KiB hcid 228.0 KiB + 75.0 KiB = 303.0 KiB hald-addon-keyboard (2) 196.0 KiB + 144.0 KiB = 340.0 KiB sh 328.0 KiB + 32.5 KiB = 360.5 KiB gam_server 336.0 KiB + 32.5 KiB = 368.5 KiB xinetd 364.0 KiB + 28.5 KiB = 392.5 KiB auditd 420.0 KiB + 84.0 KiB = 504.0 KiB mingetty (6) 552.0 KiB + 19.5 KiB = 571.5 KiB udevd 532.0 KiB + 56.0 KiB = 588.0 KiB rpc.idmapd 544.0 KiB + 50.5 KiB = 594.5 KiB ssh-agent 612.0 KiB + 29.0 KiB = 641.0 KiB crond 484.0 KiB + 176.0 KiB = 660.0 KiB avahi-daemon (2) 576.0 KiB + 164.0 KiB = 740.0 KiB sftp-server 744.0 KiB + 74.5 KiB = 818.5 KiB automount 756.0 KiB + 186.5 KiB = 942.5 KiB gnome-vfs-daemon 736.0 KiB + 296.0 KiB = 1.0 MiB dbus-daemon (2) 988.0 KiB + 61.5 KiB = 1.0 MiB pcscd 824.0 KiB + 231.5 KiB = 1.0 MiB pam-panel-icon 1.0 MiB + 26.0 KiB = 1.1 MiB nmon 864.0 KiB + 229.5 KiB = 1.1 MiB bt-applet 712.0 KiB + 398.0 KiB = 1.1 MiB nm-system-settings 1.0 MiB + 63.0 KiB = 1.1 MiB nmbd 996.0 KiB + 131.0 KiB = 1.1 MiB bonobo-activation-server 740.0 KiB + 395.5 KiB = 1.1 MiB escd 880.0 KiB + 432.0 KiB = 1.3 MiB bash (2) 1.1 MiB + 212.5 KiB = 1.3 MiB gnome-screensaver 796.0 KiB + 617.5 KiB = 1.4 MiB gdm-rh-security-token-helper 916.0 KiB + 739.5 KiB = 1.6 MiB gdm-binary (2) 1.2 MiB + 387.5 KiB = 1.6 MiB gnome-session 1.4 MiB + 221.0 KiB = 1.6 MiB cupsd 1.3 MiB + 443.5 KiB = 1.8 MiB notification-area-applet 2.1 MiB + 69.0 KiB = 2.2 MiB xfs 1.8 MiB + 545.5 KiB = 2.3 MiB eggcups 2.2 MiB + 86.5 KiB = 2.3 MiB gconfd-2 1.9 MiB + 492.5 KiB = 2.4 MiB gnome-settings-daemon 2.0 MiB + 421.5 KiB = 2.4 MiB gnome-power-manager 1.9 MiB + 569.0 KiB = 2.5 MiB trashapplet 1.7 MiB + 1.0 MiB = 2.7 MiB smbd (2) 2.6 MiB + 365.0 KiB = 2.9 MiB iscsid (2) 2.7 MiB + 349.0 KiB = 3.0 MiB sendmail.sendmail (2) 3.2 MiB + 73.0 KiB = 3.2 MiB hald 2.7 MiB + 649.0 KiB = 3.4 MiB clock-applet 2.5 MiB + 1.4 MiB = 3.9 MiB nm-applet 3.4 MiB + 729.5 KiB = 4.1 MiB metacity 2.8 MiB + 1.4 MiB = 4.2 MiB sshd (4) 3.4 MiB + 853.0 KiB = 4.3 MiB wnck-applet 4.4 MiB + 377.5 KiB = 4.8 MiB Xorg 4.3 MiB + 717.5 KiB = 5.0 MiB mixer_applet2 4.5 MiB + 809.5 KiB = 5.3 MiB gnome-panel 5.3 MiB + 251.5 KiB = 5.6 MiB hpssd.py 4.0 MiB + 3.3 MiB = 7.2 MiB httpd (11) 10.5 MiB + 870.0 KiB = 11.3 MiB gdmgreeter 12.8 MiB + 1.1 MiB = 13.8 MiB Xvnc 13.7 MiB + 515.5 KiB = 14.2 MiB yum-updatesd 16.3 MiB + 1.6 MiB = 17.9 MiB nautilus 20.8 MiB + 1.4 MiB = 22.2 MiB puplet 1.5 GiB + 438.0 KiB = 1.5 GiB java - 1.7 GiB = Thanks, Mohan From: jim holtman jholt...@gmail.com To: mohan.radhakrish...@polarisft.com Cc: R mailing list r-help@r-project.org Date: 08/30/2013 07:14 PM Subject:Re: [R] Memory usage bar plot Here is how to parse the data and put it into groups. Not sure what the 'timing' of each group is since not time information was given. Also not sure is there is an 'MiB' qualifier on the data, but you have the matrix of data which is easy to do with as you want. 
input - readLines(textConnection( + Private + Shared = RAM used Program + + 96.0 KiB + 11.5 KiB = 107.5 KiB uuidd + 108.0 KiB + 12.5 KiB = 120.5 KiB klogd + 124.0 KiB + 17.0 KiB = 141.0 KiB hidd + 116.0 KiB + 30.0 KiB = 146.0 KiB acpid + 124.0 KiB + 29.5 KiB = 153.5 KiB hald-addon-storage + 144.0 KiB + 15.0 KiB = 159.0 KiB gpm + 136.0 KiB + 26.5 KiB = 162.5 KiB pam_timestamp_check + - + 453.9 MiB + + = + Private + Shared = RAM used Program + + 96.0 KiB + 11.5 KiB = 107.5 KiB uuidd + 108.0 KiB + 12.5 KiB = 120.5 KiB klogd + 124.0 KiB + 17.0 KiB = 141.0 KiB hidd + 116.0 KiB + 30.0 KiB = 146.0 KiB acpid + 124.0 KiB + 29.5 KiB = 153.5 KiB hald-addon-storage + 144.0
Re: [R] Memory usage bar plot
brcm_iscsiuio 188.0 KiB + 37.0 KiB = 225.0 KiB rpc.statd 208.0 KiB + 24.0 KiB = 232.0 KiB audispd 208.0 KiB + 40.5 KiB = 248.5 KiB hald-runner 244.0 KiB + 23.5 KiB = 267.5 KiB smartd 240.0 KiB + 35.5 KiB = 275.5 KiB hpiod 244.0 KiB + 35.0 KiB = 279.0 KiB hcid 228.0 KiB + 75.0 KiB = 303.0 KiB hald-addon-keyboard (2) 196.0 KiB + 144.0 KiB = 340.0 KiB sh 328.0 KiB + 32.5 KiB = 360.5 KiB gam_server 336.0 KiB + 32.5 KiB = 368.5 KiB xinetd 364.0 KiB + 28.5 KiB = 392.5 KiB auditd 420.0 KiB + 84.0 KiB = 504.0 KiB mingetty (6) 552.0 KiB + 19.5 KiB = 571.5 KiB udevd 532.0 KiB + 56.0 KiB = 588.0 KiB rpc.idmapd 544.0 KiB + 50.5 KiB = 594.5 KiB ssh-agent 612.0 KiB + 29.0 KiB = 641.0 KiB crond 484.0 KiB + 176.0 KiB = 660.0 KiB avahi-daemon (2) 576.0 KiB + 164.0 KiB = 740.0 KiB sftp-server 744.0 KiB + 74.5 KiB = 818.5 KiB automount 756.0 KiB + 186.5 KiB = 942.5 KiB gnome-vfs-daemon 736.0 KiB + 296.0 KiB = 1.0 MiB dbus-daemon (2) 988.0 KiB + 61.5 KiB = 1.0 MiB pcscd 824.0 KiB + 231.5 KiB = 1.0 MiB pam-panel-icon 1.0 MiB + 26.0 KiB = 1.1 MiB nmon 864.0 KiB + 229.5 KiB = 1.1 MiB bt-applet 712.0 KiB + 398.0 KiB = 1.1 MiB nm-system-settings 1.0 MiB + 63.0 KiB = 1.1 MiB nmbd 996.0 KiB + 131.0 KiB = 1.1 MiB bonobo-activation-server 740.0 KiB + 395.5 KiB = 1.1 MiB escd 880.0 KiB + 432.0 KiB = 1.3 MiB bash (2) 1.1 MiB + 212.5 KiB = 1.3 MiB gnome-screensaver 796.0 KiB + 617.5 KiB = 1.4 MiB gdm-rh-security-token-helper 916.0 KiB + 739.5 KiB = 1.6 MiB gdm-binary (2) 1.2 MiB + 387.5 KiB = 1.6 MiB gnome-session 1.4 MiB + 221.0 KiB = 1.6 MiB cupsd 1.3 MiB + 443.5 KiB = 1.8 MiB notification-area-applet 2.1 MiB + 69.0 KiB = 2.2 MiB xfs 1.8 MiB + 545.5 KiB = 2.3 MiB eggcups 2.2 MiB + 86.5 KiB = 2.3 MiB gconfd-2 1.9 MiB + 492.5 KiB = 2.4 MiB gnome-settings-daemon 2.0 MiB + 421.5 KiB = 2.4 MiB gnome-power-manager 1.9 MiB + 569.0 KiB = 2.5 MiB trashapplet 1.7 MiB + 1.0 MiB = 2.7 MiB smbd (2) 2.6 MiB + 365.0 KiB = 2.9 MiB iscsid (2) 2.7 MiB + 349.0 KiB = 3.0 MiB sendmail.sendmail (2) 3.2 MiB + 73.0 KiB = 3.2 MiB hald 2.7 MiB + 649.0 KiB = 3.4 MiB clock-applet 2.5 MiB + 1.4 MiB = 3.9 MiB nm-applet 3.4 MiB + 729.5 KiB = 4.1 MiB metacity 2.8 MiB + 1.4 MiB = 4.2 MiB sshd (4) 3.4 MiB + 853.0 KiB = 4.3 MiB wnck-applet 4.4 MiB + 377.5 KiB = 4.8 MiB Xorg 4.3 MiB + 717.5 KiB = 5.0 MiB mixer_applet2 4.5 MiB + 809.5 KiB = 5.3 MiB gnome-panel 5.3 MiB + 251.5 KiB = 5.6 MiB hpssd.py 4.0 MiB + 3.3 MiB = 7.2 MiB httpd (11) 10.5 MiB + 870.0 KiB = 11.3 MiB gdmgreeter 12.8 MiB + 1.1 MiB = 13.8 MiB Xvnc 13.7 MiB + 515.5 KiB = 14.2 MiB yum-updatesd 16.3 MiB + 1.6 MiB = 17.9 MiB nautilus 20.8 MiB + 1.4 MiB = 22.2 MiB puplet 1.5 GiB + 438.0 KiB = 1.5 GiB java - 1.7 GiB =)) input1- input input2- str_trim(gsub([=+],,input1)) input3- input2[input2!=] dat1-read.table(text=gsub(\\,+,,,gsub(\\s{2},,,input3)),sep=,,header=FALSE,stringsAsFactors=FALSE,fill=TRUE) dat2- dat1[,3:4] dat3- dat2[dat2[,1]!=,][-1,] lst1-lapply(split(dat3,cumsum(1*grepl(RAM,dat3[,1]))),function(x) {x1-if(length(grep(RAM,x[,1]))0) x[-grep(RAM,x[,1]),] else x; x2- data.frame(read.table(text=x1[,1],sep=,header=FALSE,stringsAsFactors=FALSE),x1[,2],stringsAsFactors=FALSE); colnames(x2)- c(RAM, used, Program);x2}) str(lst1) #List of 2 # $ 0:'data.frame': 79 obs. of 3 variables: # ..$ RAM : num [1:79] 98.5 119.5 139 140.5 144.5 ... # ..$ used : chr [1:79] KiB KiB KiB KiB ... # ..$ Program: chr [1:79] sleep klogd hidd gpm ... # $ 1:'data.frame': 79 obs. of 3 variables: # ..$ RAM : num [1:79] 120 139 140 146 148 ... # ..$ used : chr [1:79] KiB KiB KiB KiB ... 
# ..$ Program: chr [1:79] klogd hidd gpm hald-addon-storage ... lapply(lst1,head) #$`0` # RAM used Program #1 98.5 KiB sleep #2 119.5 KiB klogd #3 139.0 KiB hidd #4 140.5 KiB gpm #5 144.5 KiB hald-addon-storage #6 148.0 KiB acpid # #$`1` # RAM used Program #1 119.5 KiB klogd #2 139.0 KiB hidd #3 140.5 KiB gpm #4 145.5 KiB hald-addon-storage #5 148.0 KiB acpid #6 153.0 KiB dbus-launch A.K. - Original Message - From: mohan.radhakrish...@polarisft.com mohan.radhakrish...@polarisft.com To: jim holtman jholt...@gmail.com Cc: R mailing list r-help@r-project.org Sent: Wednesday, September 4, 2013 6:43 AM Subject: Re: [R] Memory usage bar plot Hi, I have tried the ideas with an actual data set but couldn't pass the parsing
[R] Memory usage bar plot
Hi, I haven't tried the code yet. Is there a way to parse this data using R and create bar plots so that each program's 'RAM used' figures are grouped together? So the 'uuidd' bars will be together. The data will have about 50 sets, so if there are 100 processes each will have about 50 bars. What is the recommended way to graph these big barplots? I am looking for only the 'RAM used' figures.

Thanks, Mohan

Private + Shared = RAM used Program
96.0 KiB + 11.5 KiB = 107.5 KiB uuidd
108.0 KiB + 12.5 KiB = 120.5 KiB klogd
124.0 KiB + 17.0 KiB = 141.0 KiB hidd
116.0 KiB + 30.0 KiB = 146.0 KiB acpid
124.0 KiB + 29.5 KiB = 153.5 KiB hald-addon-storage
144.0 KiB + 15.0 KiB = 159.0 KiB gpm
136.0 KiB + 26.5 KiB = 162.5 KiB pam_timestamp_check
---------------------------------
453.9 MiB =

Private + Shared = RAM used Program
96.0 KiB + 11.5 KiB = 107.5 KiB uuidd
108.0 KiB + 12.5 KiB = 120.5 KiB klogd
124.0 KiB + 17.0 KiB = 141.0 KiB hidd
116.0 KiB + 30.0 KiB = 146.0 KiB acpid
124.0 KiB + 29.5 KiB = 153.5 KiB hald-addon-storage
144.0 KiB + 15.0 KiB = 159.0 KiB gpm
136.0 KiB + 26.5 KiB = 162.5 KiB pam_timestamp_check
---------------------------------
453.9 MiB =
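For what it's worth, once the data are in a data frame, a grouped bar chart is short in ggplot2. The sketch below assumes a hypothetical data frame `dat` with columns Program, RamKiB and Set (the snapshot number), such as the parsing code elsewhere in this thread could produce; with 100 programs and 50 snapshots the plot will be crowded, as Petr points out in his reply.

library(ggplot2)
ggplot(dat, aes(x = Program, y = RamKiB, fill = factor(Set))) +
  geom_bar(stat = "identity", position = "dodge") +
  coord_flip() +                                   # long program names read better on the y axis
  labs(fill = "Snapshot", y = "RAM used (KiB)")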
Re: [R] Memory usage bar plot
Hello, This memory usage should be graphed with time. Are there examples of scatterplots that can clearly show usage vs time ? This is for memory leak detection. Thanks, Mohan From: PIKAL Petr petr.pi...@precheza.cz To: mohan.radhakrish...@polarisft.com mohan.radhakrish...@polarisft.com, r-help@r-project.org r-help@r-project.org Date: 08/30/2013 05:33 PM Subject:RE: [R] Memory usage bar plot Hi For reading data into R you shall look to read.table and similar. For plotting ggplot could handle it. However I wonder if 100 times 50 bars is the way how to present your data. You shall think over what do you want to show to yourself or your audience. Maybe boxplots or scatterplots could be better. Petr -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-bounces@r- project.org] On Behalf Of mohan.radhakrish...@polarisft.com Sent: Friday, August 30, 2013 1:25 PM To: r-help@r-project.org Subject: [R] Memory usage bar plot Hi, I haven't tried the code yet. Is there a way to parse this data using R and create bar plots so that each program's 'RAM used' figures are grouped together. So 'uuidd' bars will be together. The data will have about 50 sets. So if there are 100 processes each will have about 50 bars. What is the recommended way to graph these big barplots ? I am looking for only 'RAM used' figures. Thanks, Mohan Private + Shared = RAM used Program 96.0 KiB + 11.5 KiB = 107.5 KiB uuidd 108.0 KiB + 12.5 KiB = 120.5 KiB klogd 124.0 KiB + 17.0 KiB = 141.0 KiB hidd 116.0 KiB + 30.0 KiB = 146.0 KiB acpid 124.0 KiB + 29.5 KiB = 153.5 KiB hald-addon-storage 144.0 KiB + 15.0 KiB = 159.0 KiB gpm 136.0 KiB + 26.5 KiB = 162.5 KiB pam_timestamp_check - 453.9 MiB = Private + Shared = RAM used Program 96.0 KiB + 11.5 KiB = 107.5 KiB uuidd 108.0 KiB + 12.5 KiB = 120.5 KiB klogd 124.0 KiB + 17.0 KiB = 141.0 KiB hidd 116.0 KiB + 30.0 KiB = 146.0 KiB acpid 124.0 KiB + 29.5 KiB = 153.5 KiB hald-addon-storage 144.0 KiB + 15.0 KiB = 159.0 KiB gpm 136.0 KiB + 26.5 KiB = 162.5 KiB pam_timestamp_check -- 453.9 MiB = This e-Mail may contain proprietary and confidential information and is sent for the intended recipient(s) only. If by an addressing or transmission error this mail has been misdirected to you, you are requested to delete this mail immediately. You are also hereby notified that any use, any form of reproduction, dissemination, copying, disclosure, modification, distribution and/or publication of this e- mail message, contents or its attachment other than by its intended recipient/s is strictly prohibited. Visit us at http://www.polarisFT.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting- guide.html and provide commented, minimal, self-contained, reproducible code. This e-Mail may contain proprietary and confidential information and is sent for the intended recipient(s) only. If by an addressing or transmission error this mail has been misdirected to you, you are requested to delete this mail immediately. You are also hereby notified that any use, any form of reproduction, dissemination, copying, disclosure, modification, distribution and/or publication of this e-mail message, contents or its attachment other than by its intended recipient/s is strictly prohibited. 
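A small sketch of the usage-versus-time view asked about here (for leak detection the interesting signal is a per-process upward trend). It assumes a hypothetical data frame `dat` with columns time (POSIXct), RamKiB and Program:

library(ggplot2)
ggplot(dat, aes(x = time, y = RamKiB, colour = Program)) +
  geom_line() +
  geom_point(size = 0.8) +
  labs(y = "RAM used (KiB)")

# or, in base graphics, one process at a time:
# plot(dat$time, dat$RamKiB, type = "b", xlab = "time", ylab = "RAM used (KiB)")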
Re: [R] Memory usage bar plot
Here is how to parse the data and put it into groups. Not sure what the 'timing' of each group is since not time information was given. Also not sure is there is an 'MiB' qualifier on the data, but you have the matrix of data which is easy to do with as you want. input - readLines(textConnection( + Private + Shared = RAM used Program + + 96.0 KiB + 11.5 KiB = 107.5 KiB uuidd + 108.0 KiB + 12.5 KiB = 120.5 KiB klogd + 124.0 KiB + 17.0 KiB = 141.0 KiB hidd + 116.0 KiB + 30.0 KiB = 146.0 KiB acpid + 124.0 KiB + 29.5 KiB = 153.5 KiB hald-addon-storage + 144.0 KiB + 15.0 KiB = 159.0 KiB gpm + 136.0 KiB + 26.5 KiB = 162.5 KiB pam_timestamp_check + - + 453.9 MiB + + = + Private + Shared = RAM used Program + + 96.0 KiB + 11.5 KiB = 107.5 KiB uuidd + 108.0 KiB + 12.5 KiB = 120.5 KiB klogd + 124.0 KiB + 17.0 KiB = 141.0 KiB hidd + 116.0 KiB + 30.0 KiB = 146.0 KiB acpid + 124.0 KiB + 29.5 KiB = 153.5 KiB hald-addon-storage + 144.0 KiB + 15.0 KiB = 159.0 KiB gpm + 136.0 KiB + 26.5 KiB = 162.5 KiB pam_timestamp_check + -- + 453.9 MiB + =)) # keep only the data input - input[grepl('=', input)] # separate into groups grps - split(input, cumsum(grepl(= RAM, input))) # parse the data (not sure if there is also 'MiB') parsed - lapply(grps, function(.grp){ + # parse ignoring first and last lines + .data - sub(.*= ([^ ]+) ([^ ]+)\\s+(.*), \\1 \\2 \\3 + , .grp[2:(length(.grp) - 1L)] + ) + # return matrix + do.call(rbind, strsplit(.data, ' ')) + }) parsed $`1` [,1][,2] [,3] [1,] 107.5 KiB uuidd [2,] 120.5 KiB klogd [3,] 141.0 KiB hidd [4,] 146.0 KiB acpid [5,] 153.5 KiB hald-addon-storage [6,] 159.0 KiB gpm [7,] 162.5 KiB pam_timestamp_check $`2` [,1][,2] [,3] [1,] 107.5 KiB uuidd [2,] 120.5 KiB klogd [3,] 141.0 KiB hidd [4,] 146.0 KiB acpid [5,] 153.5 KiB hald-addon-storage [6,] 159.0 KiB gpm [7,] 162.5 KiB pam_timestamp_check Jim Holtman Data Munger Guru What is the problem that you are trying to solve? Tell me what you want to do, not how you want to do it. On Fri, Aug 30, 2013 at 7:24 AM, mohan.radhakrish...@polarisft.com wrote: Hi, I haven't tried the code yet. Is there a way to parse this data using R and create bar plots so that each program's 'RAM used' figures are grouped together. So 'uuidd' bars will be together. The data will have about 50 sets. So if there are 100 processes each will have about 50 bars. What is the recommended way to graph these big barplots ? I am looking for only 'RAM used' figures. Thanks, Mohan Private + Shared = RAM used Program 96.0 KiB + 11.5 KiB = 107.5 KiB uuidd 108.0 KiB + 12.5 KiB = 120.5 KiB klogd 124.0 KiB + 17.0 KiB = 141.0 KiB hidd 116.0 KiB + 30.0 KiB = 146.0 KiB acpid 124.0 KiB + 29.5 KiB = 153.5 KiB hald-addon-storage 144.0 KiB + 15.0 KiB = 159.0 KiB gpm 136.0 KiB + 26.5 KiB = 162.5 KiB pam_timestamp_check - 453.9 MiB = Private + Shared = RAM used Program 96.0 KiB + 11.5 KiB = 107.5 KiB uuidd 108.0 KiB + 12.5 KiB = 120.5 KiB klogd 124.0 KiB + 17.0 KiB = 141.0 KiB hidd 116.0 KiB + 30.0 KiB = 146.0 KiB acpid 124.0 KiB + 29.5 KiB = 153.5 KiB hald-addon-storage 144.0 KiB + 15.0 KiB = 159.0 KiB gpm 136.0 KiB + 26.5 KiB = 162.5 KiB pam_timestamp_check -- 453.9 MiB = This e-Mail may contain proprietary and confidential information and is sent for the intended recipient(s) only. If by an addressing or transmission error this mail has been misdirected to you, you are requested to delete this mail immediately. 
Re: [R] Memory usage bar plot
HI, You could also parse the data by: input1- input library(stringr) input2-str_trim(gsub([=+],,input1)) dat1-read.table(text=word(input2[!grepl(---,input2) input2!= !grepl(RAM|MiB,input2)],8,15),sep=,header=FALSE,stringsAsFactors=FALSE) lst1-split(dat1,cumsum(dat1$V3==uuidd)) lst1 #$`1` # V1 V2 V3 #1 107.5 KiB uuidd #2 120.5 KiB klogd #3 141.0 KiB hidd #4 146.0 KiB acpid #5 153.5 KiB hald-addon-storage #6 159.0 KiB gpm #7 162.5 KiB pam_timestamp_check # #$`2` # V1 V2 V3 #8 107.5 KiB uuidd #9 120.5 KiB klogd #10 141.0 KiB hidd #11 146.0 KiB acpid #12 153.5 KiB hald-addon-storage #13 159.0 KiB gpm #14 162.5 KiB pam_timestamp_check A.K. - Original Message - From: jim holtman jholt...@gmail.com To: mohan.radhakrish...@polarisft.com Cc: R mailing list r-help@r-project.org Sent: Friday, August 30, 2013 9:44 AM Subject: Re: [R] Memory usage bar plot Here is how to parse the data and put it into groups. Not sure what the 'timing' of each group is since not time information was given. Also not sure is there is an 'MiB' qualifier on the data, but you have the matrix of data which is easy to do with as you want. input - readLines(textConnection( + Private + Shared = RAM used Program + + 96.0 KiB + 11.5 KiB = 107.5 KiB uuidd + 108.0 KiB + 12.5 KiB = 120.5 KiB klogd + 124.0 KiB + 17.0 KiB = 141.0 KiB hidd + 116.0 KiB + 30.0 KiB = 146.0 KiB acpid + 124.0 KiB + 29.5 KiB = 153.5 KiB hald-addon-storage + 144.0 KiB + 15.0 KiB = 159.0 KiB gpm + 136.0 KiB + 26.5 KiB = 162.5 KiB pam_timestamp_check + - + 453.9 MiB + + = + Private + Shared = RAM used Program + + 96.0 KiB + 11.5 KiB = 107.5 KiB uuidd + 108.0 KiB + 12.5 KiB = 120.5 KiB klogd + 124.0 KiB + 17.0 KiB = 141.0 KiB hidd + 116.0 KiB + 30.0 KiB = 146.0 KiB acpid + 124.0 KiB + 29.5 KiB = 153.5 KiB hald-addon-storage + 144.0 KiB + 15.0 KiB = 159.0 KiB gpm + 136.0 KiB + 26.5 KiB = 162.5 KiB pam_timestamp_check + -- + 453.9 MiB + =)) # keep only the data input - input[grepl('=', input)] # separate into groups grps - split(input, cumsum(grepl(= RAM, input))) # parse the data (not sure if there is also 'MiB') parsed - lapply(grps, function(.grp){ + # parse ignoring first and last lines + .data - sub(.*= ([^ ]+) ([^ ]+)\\s+(.*), \\1 \\2 \\3 + , .grp[2:(length(.grp) - 1L)] + ) + # return matrix + do.call(rbind, strsplit(.data, ' ')) + }) parsed $`1` [,1] [,2] [,3] [1,] 107.5 KiB uuidd [2,] 120.5 KiB klogd [3,] 141.0 KiB hidd [4,] 146.0 KiB acpid [5,] 153.5 KiB hald-addon-storage [6,] 159.0 KiB gpm [7,] 162.5 KiB pam_timestamp_check $`2` [,1] [,2] [,3] [1,] 107.5 KiB uuidd [2,] 120.5 KiB klogd [3,] 141.0 KiB hidd [4,] 146.0 KiB acpid [5,] 153.5 KiB hald-addon-storage [6,] 159.0 KiB gpm [7,] 162.5 KiB pam_timestamp_check Jim Holtman Data Munger Guru What is the problem that you are trying to solve? Tell me what you want to do, not how you want to do it. On Fri, Aug 30, 2013 at 7:24 AM, mohan.radhakrish...@polarisft.com wrote: Hi, I haven't tried the code yet. Is there a way to parse this data using R and create bar plots so that each program's 'RAM used' figures are grouped together. So 'uuidd' bars will be together. The data will have about 50 sets. So if there are 100 processes each will have about 50 bars. What is the recommended way to graph these big barplots ? I am looking for only 'RAM used' figures. 
Thanks, Mohan Private + Shared = RAM used Program 96.0 KiB + 11.5 KiB = 107.5 KiB uuidd 108.0 KiB + 12.5 KiB = 120.5 KiB klogd 124.0 KiB + 17.0 KiB = 141.0 KiB hidd 116.0 KiB + 30.0 KiB = 146.0 KiB acpid 124.0 KiB + 29.5 KiB = 153.5 KiB hald-addon-storage 144.0 KiB + 15.0 KiB = 159.0 KiB gpm 136.0 KiB + 26.5 KiB = 162.5 KiB pam_timestamp_check - 453.9 MiB = Private + Shared = RAM used Program 96.0 KiB + 11.5 KiB = 107.5 KiB uuidd 108.0 KiB + 12.5 KiB = 120.5 KiB klogd 124.0 KiB + 17.0 KiB = 141.0 KiB hidd 116.0 KiB + 30.0 KiB = 146.0 KiB acpid 124.0 KiB + 29.5 KiB = 153.5 KiB hald-addon-storage 144.0 KiB + 15.0 KiB = 159.0 KiB gpm 136.0 KiB + 26.5 KiB = 162.5 KiB pam_timestamp_check
Re: [R] Memory usage bar plot
Hi From: mohan.radhakrish...@polarisft.com [mailto:mohan.radhakrish...@polarisft.com] Sent: Friday, August 30, 2013 3:16 PM To: PIKAL Petr Cc: r-help@r-project.org Subject: RE: [R] Memory usage bar plot Hello, This memory usage should be graphed with time. Are there examples of scatterplots that can clearly show usage vs time ? This is for memory leak detection. Hm, Actually I do not understand what do you want. No data, no code just some vague description. If you have data frame with variables usage and time you can plot plot(time, usage) Regards Petr Thanks, Mohan From:PIKAL Petr petr.pi...@precheza.czmailto:petr.pi...@precheza.cz To: mohan.radhakrish...@polarisft.commailto:mohan.radhakrish...@polarisft.com mohan.radhakrish...@polarisft.commailto:mohan.radhakrish...@polarisft.com, r-help@r-project.orgmailto:r-help@r-project.org r-help@r-project.orgmailto:r-help@r-project.org Date:08/30/2013 05:33 PM Subject:RE: [R] Memory usage bar plot Hi For reading data into R you shall look to read.table and similar. For plotting ggplot could handle it. However I wonder if 100 times 50 bars is the way how to present your data. You shall think over what do you want to show to yourself or your audience. Maybe boxplots or scatterplots could be better. Petr -Original Message- From: r-help-boun...@r-project.orgmailto:r-help-boun...@r-project.org [mailto:r-help-bounces@r- project.org] On Behalf Of mohan.radhakrish...@polarisft.commailto:mohan.radhakrish...@polarisft.com Sent: Friday, August 30, 2013 1:25 PM To: r-help@r-project.orgmailto:r-help@r-project.org Subject: [R] Memory usage bar plot Hi, I haven't tried the code yet. Is there a way to parse this data using R and create bar plots so that each program's 'RAM used' figures are grouped together. So 'uuidd' bars will be together. The data will have about 50 sets. So if there are 100 processes each will have about 50 bars. What is the recommended way to graph these big barplots ? I am looking for only 'RAM used' figures. Thanks, Mohan Private + Shared = RAM used Program 96.0 KiB + 11.5 KiB = 107.5 KiB uuidd 108.0 KiB + 12.5 KiB = 120.5 KiB klogd 124.0 KiB + 17.0 KiB = 141.0 KiB hidd 116.0 KiB + 30.0 KiB = 146.0 KiB acpid 124.0 KiB + 29.5 KiB = 153.5 KiB hald-addon-storage 144.0 KiB + 15.0 KiB = 159.0 KiB gpm 136.0 KiB + 26.5 KiB = 162.5 KiB pam_timestamp_check - 453.9 MiB = Private + Shared = RAM used Program 96.0 KiB + 11.5 KiB = 107.5 KiB uuidd 108.0 KiB + 12.5 KiB = 120.5 KiB klogd 124.0 KiB + 17.0 KiB = 141.0 KiB hidd 116.0 KiB + 30.0 KiB = 146.0 KiB acpid 124.0 KiB + 29.5 KiB = 153.5 KiB hald-addon-storage 144.0 KiB + 15.0 KiB = 159.0 KiB gpm 136.0 KiB + 26.5 KiB = 162.5 KiB pam_timestamp_check -- 453.9 MiB = This e-Mail may contain proprietary and confidential information and is sent for the intended recipient(s) only. If by an addressing or transmission error this mail has been misdirected to you, you are requested to delete this mail immediately. You are also hereby notified that any use, any form of reproduction, dissemination, copying, disclosure, modification, distribution and/or publication of this e- mail message, contents or its attachment other than by its intended recipient/s is strictly prohibited. 
Re: [R] Memory usage bar plot
Hi For reading data into R you shall look to read.table and similar. For plotting ggplot could handle it. However I wonder if 100 times 50 bars is the way how to present your data. You shall think over what do you want to show to yourself or your audience. Maybe boxplots or scatterplots could be better. Petr -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-bounces@r- project.org] On Behalf Of mohan.radhakrish...@polarisft.com Sent: Friday, August 30, 2013 1:25 PM To: r-help@r-project.org Subject: [R] Memory usage bar plot Hi, I haven't tried the code yet. Is there a way to parse this data using R and create bar plots so that each program's 'RAM used' figures are grouped together. So 'uuidd' bars will be together. The data will have about 50 sets. So if there are 100 processes each will have about 50 bars. What is the recommended way to graph these big barplots ? I am looking for only 'RAM used' figures. Thanks, Mohan Private + Shared = RAM used Program 96.0 KiB + 11.5 KiB = 107.5 KiB uuidd 108.0 KiB + 12.5 KiB = 120.5 KiB klogd 124.0 KiB + 17.0 KiB = 141.0 KiB hidd 116.0 KiB + 30.0 KiB = 146.0 KiB acpid 124.0 KiB + 29.5 KiB = 153.5 KiB hald-addon-storage 144.0 KiB + 15.0 KiB = 159.0 KiB gpm 136.0 KiB + 26.5 KiB = 162.5 KiB pam_timestamp_check - 453.9 MiB = Private + Shared = RAM used Program 96.0 KiB + 11.5 KiB = 107.5 KiB uuidd 108.0 KiB + 12.5 KiB = 120.5 KiB klogd 124.0 KiB + 17.0 KiB = 141.0 KiB hidd 116.0 KiB + 30.0 KiB = 146.0 KiB acpid 124.0 KiB + 29.5 KiB = 153.5 KiB hald-addon-storage 144.0 KiB + 15.0 KiB = 159.0 KiB gpm 136.0 KiB + 26.5 KiB = 162.5 KiB pam_timestamp_check -- 453.9 MiB = This e-Mail may contain proprietary and confidential information and is sent for the intended recipient(s) only. If by an addressing or transmission error this mail has been misdirected to you, you are requested to delete this mail immediately. You are also hereby notified that any use, any form of reproduction, dissemination, copying, disclosure, modification, distribution and/or publication of this e- mail message, contents or its attachment other than by its intended recipient/s is strictly prohibited. Visit us at http://www.polarisFT.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting- guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Memory usage bar plot
## Here is a plot. The input was parsed with Jim Holtman's code. ## The panel.dumbell is something I devised to show differences. ## Rich input - readLines(textConnection( Private + Shared = RAM used Program 96.0 KiB + 11.5 KiB = 107.5 KiB uuidd 108.0 KiB + 12.5 KiB = 120.5 KiB klogd 124.0 KiB + 17.0 KiB = 141.0 KiB hidd 116.0 KiB + 30.0 KiB = 146.0 KiB acpid 124.0 KiB + 29.5 KiB = 153.5 KiB hald-addon-storage 144.0 KiB + 15.0 KiB = 159.0 KiB gpm 136.0 KiB + 26.5 KiB = 162.5 KiB pam_timestamp_check - 453.9 MiB = Private + Shared = RAM used Program 96.0 KiB + 11.5 KiB = 107.5 KiB uuidd 108.0 KiB + 12.5 KiB = 120.5 KiB klogd 124.0 KiB + 17.0 KiB = 141.0 KiB hidd 116.0 KiB + 30.0 KiB = 146.0 KiB acpid 124.0 KiB + 29.5 KiB = 153.5 KiB hald-addon-storage 144.0 KiB + 15.0 KiB = 159.0 KiB gpm 136.0 KiB + 26.5 KiB = 162.5 KiB pam_timestamp_check -- 453.9 MiB =)) # keep only the data input - input[grepl('=', input)] # separate into groups grps - split(input, cumsum(grepl(= RAM, input))) # parse the data (not sure if there is also 'MiB') parsed - lapply(grps, function(.grp){ # parse ignoring first and last lines .data - sub(.*= ([^ ]+) ([^ ]+)\\s+(.*), \\1 \\2 \\3 , .grp[2:(length(.grp) - 1L)] ) # return matrix do.call(rbind, strsplit(.data, ' ')) }) parsed tmp1 - do.call(rbind, lapply(parsed, function(x) data.frame(x))) names(tmp1) - c(RamUsed, units, Program) tmp1$Time - factor(rep(1:2, each=7)) tmp1$RamUsed - as.numeric(tmp1$RamUsed) library(lattice) dotplot(Program ~ RamUsed, groups=Time, data=tmp1) ## this is silly. Let me construct a more interesting example with different values at each time. tmp1$RamUsed[8:14] - tmp1$RamUsed[1:7] + 10*(sample(1:7)) tmp1 dotplot(Program ~ RamUsed, groups=Time, data=tmp1, auto.key=list(title=Time, border=TRUE, columns=2)) panel.dumbell - function(x, y, ..., lwd=1) { n - length(x)/2 panel.segments(x[1:n], as.numeric(y)[n+(1:n)], x[n+(1:n)], as.numeric(y)[n+(1:n)], lwd=lwd) panel.dotplot(x, y, ...) } dotplot(Program ~ RamUsed, groups=Time, data=tmp1, auto.key=list(title=Time, border=TRUE, columns=2), panel=panel.dumbell, par.settings=list(superpose.symbol=list(pch=19)), ) On Fri, Aug 30, 2013 at 9:44 AM, jim holtman jholt...@gmail.com wrote: Here is how to parse the data and put it into groups. Not sure what the 'timing' of each group is since not time information was given. Also not sure is there is an 'MiB' qualifier on the data, but you have the matrix of data which is easy to do with as you want. 
input - readLines(textConnection( + Private + Shared = RAM used Program + + 96.0 KiB + 11.5 KiB = 107.5 KiB uuidd + 108.0 KiB + 12.5 KiB = 120.5 KiB klogd + 124.0 KiB + 17.0 KiB = 141.0 KiB hidd + 116.0 KiB + 30.0 KiB = 146.0 KiB acpid + 124.0 KiB + 29.5 KiB = 153.5 KiB hald-addon-storage + 144.0 KiB + 15.0 KiB = 159.0 KiB gpm + 136.0 KiB + 26.5 KiB = 162.5 KiB pam_timestamp_check + - + 453.9 MiB + + = + Private + Shared = RAM used Program + + 96.0 KiB + 11.5 KiB = 107.5 KiB uuidd + 108.0 KiB + 12.5 KiB = 120.5 KiB klogd + 124.0 KiB + 17.0 KiB = 141.0 KiB hidd + 116.0 KiB + 30.0 KiB = 146.0 KiB acpid + 124.0 KiB + 29.5 KiB = 153.5 KiB hald-addon-storage + 144.0 KiB + 15.0 KiB = 159.0 KiB gpm + 136.0 KiB + 26.5 KiB = 162.5 KiB pam_timestamp_check + -- + 453.9 MiB + =)) # keep only the data input - input[grepl('=', input)] # separate into groups grps - split(input, cumsum(grepl(= RAM, input))) # parse the data (not sure if there is also 'MiB') parsed - lapply(grps, function(.grp){ + # parse ignoring first and last lines + .data - sub(.*= ([^ ]+) ([^ ]+)\\s+(.*), \\1 \\2 \\3 + , .grp[2:(length(.grp) - 1L)] + ) + # return matrix + do.call(rbind, strsplit(.data, ' ')) + }) parsed $`1` [,1][,2] [,3] [1,] 107.5 KiB uuidd [2,] 120.5 KiB klogd [3,] 141.0 KiB hidd [4,] 146.0 KiB acpid [5,] 153.5 KiB hald-addon-storage [6,] 159.0 KiB gpm [7,] 162.5 KiB pam_timestamp_check $`2` [,1][,2] [,3] [1,] 107.5 KiB uuidd [2,] 120.5 KiB klogd [3,] 141.0 KiB hidd [4,] 146.0 KiB acpid [5,] 153.5 KiB
Re: [R] Memory usage reported by gc() differs from 'top'
Merci beaucoup Milan, thank you very much Martin and Kjetil for your responses. I appreciate the caveat about virtual memory. I gather that besides resident memory and swap space, it may also include memory-mapped files, which don't cost anything. Maybe by pure chance, in my case virtual memory still seems mildly relevant. While there is RAM available, res tracks virt closely, about till this point:

top - 13:32:28 up 208 days, 20:46, 3 users, load average: 1.68, 1.41, 1.17
Tasks: 1 total, 1 running, 0 sleeping, 0 stopped, 0 zombie
Cpu(s): 46.5%us, 6.5%sy, 0.0%ni, 6.0%id, 41.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 8063744k total, 8012976k used, 50768k free, 464k buffers
Swap: 19543064k total, 3445236k used, 16097828k free, 35096k cached

  PID USER  PR NI  VIRT  RES  SHR S %CPU %MEM   TIME+  COMMAND
 6210 brech 20  0 7486m 7.2g 7424 R   98 93.2  6:15.74 R

(That's 7.3g virtual memory.) After that, res stays the same, while virt keeps growing. That's an issue because if it uses up all the swap space (a bit beyond the state I showed in my original post), R starts reporting problems, e.g.:

Error in system(command = command, intern = output) :
  cannot popen 'whoami', probable reason 'Cannot allocate memory'

It sounds like a major reason for the discrepancy could be fragmentation, possibly caused by repeated copying. It will take some work to profile memory usage (thanks for your pointers to the tools), get a better picture, and create a minimal reproducible example.

I'm glad you pointed me to /proc/[pid]/maps and smaps; they have a wealth of information. The most interesting entry is [heap]; it's growing rapidly during the run of my code, and accounts for all of res and 98% of virt. The others are less exciting, mostly memory-mapped files (e.g., lib/R/library/MASS/libs/x86_64/MASS.so), and change at most by a few kB of Referenced, or move from Rss to Swap. So clearly my interest is in R's heap.

Again many thanks for all your help!

/Christian
Re: [R] Memory usage reported by gc() differs from 'top'
Le mercredi 17 avril 2013 à 23:17 -0400, Christian Brechbühler a écrit : In help(gc) I read, ...the primary purpose of calling 'gc' is for the report on memory usage. What memory usage does gc() report? And more importantly, which memory uses does it NOT report? Because I see one answer from gc(): used (Mb) gc trigger (Mb) max used (Mb) Ncells 14875922 794.5 21754962 1161.9 17854776 953.6 Vcells 59905567 457.1 84428913 644.2 72715009 554.8 (That's about 1.5g max used, 1.8g trigger.) And a different answer from an OS utility, 'top': PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 6210 brech 20 0 18.2g 7.2g 2612 S 1 93.4 16:26.73 R So the R process is holding on to 18.2g memory, but it only seems to have accout of 1.5g or so. Where is the rest? I tried searching the archives, and found answers like just buy more RAM. Which doesn't exactly answer my question. And come on, 18g is pretty big; sure it doesn't fit in my RAM (only 7.2g are in), but that's beside the point. The huge memory demand is specific to R version 2.15.3 Patched (2013-03-13 r62500) -- Security Blanket. The same test runs without issues under R version 2.15.1 beta (2012-06-11 r59557) -- Roasted Marshmallows. I appreciate any insights you can share into R's memory management, and gc() in particular. /Christian First, completely stop looking at virtual memory: it does not mean much, if anything. What you care about is resident memory. See e.g.: http://serverfault.com/questions/138427/top-what-does-virtual-memory-size-mean-linux-ubuntu Then, there is a limitation with R/Linux: gc() does not reorder objects in memory so that they are all on the same area. This means that while the total size of R objects in memory is 457MB, they are spread all over the RAM, and a single object in a memory page forces the Linux kernel to keep it in RAM. I do not know the exact details, as it seems that Windows does a better job than Linux in that regard. One workaround is to save the session and restart R: objects will be loaded in a more compact fashion. As for the differences between R 2.15.1 and R 2.15.3, maybe there is some more copying that increases memory fragmentation, but the fundamental problem has not changed AFAIK. You can call tracemem() on large objects to see how many times they are being copied. See http://developer.r-project.org/memory-profiling.html My two cents [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
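A tiny illustration of the tracemem() suggestion above (tracemem() needs an R build with memory profiling enabled, which the standard CRAN binaries have): R prints a message every time the traced object is duplicated, which helps locate unexpected copies.

x <- rnorm(1e6)
tracemem(x)       # starts tracing and returns the object's address
y <- x            # no copy yet: x and y share the same vector
y[1] <- 0         # modifying the shared vector forces a duplication, which tracemem reports
untracemem(x)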
Re: [R] Memory usage reported by gc() differs from 'top'
On 04/18/2013 03:18 AM, Milan Bouchet-Valat wrote: Le mercredi 17 avril 2013 à 23:17 -0400, Christian Brechbühler a écrit : In help(gc) I read, ...the primary purpose of calling 'gc' is for the report on memory usage. What memory usage does gc() report? And more importantly, which memory uses does it NOT report? Because I see one answer from gc(): used (Mb) gc trigger (Mb) max used (Mb) Ncells 14875922 794.5 21754962 1161.9 17854776 953.6 Vcells 59905567 457.1 84428913 644.2 72715009 554.8 From the R side of things, this is an (approximate) accounting of memory actually reached by objects in the current session. One possible reason for discrepancy with the OS is that you are using a package that references memory R does not know about (e.g., 'external pointers'), or there is a memory leak in R or a third party package where memory is not returned to the OS. Even if the reason is 'memory fragmentation' as suggested by Milan, it is interesting to understand how that fragmentation arises, either to identify a work-around or more productively to understand and address the underlying problem. So a reasonable avenue is to develop a minimal, reproducible example of how one could arrive at the situation you report. Martin (That's about 1.5g max used, 1.8g trigger.) And a different answer from an OS utility, 'top': PID USER PR NI VIRT RES SHR S %CPU %MEMTIME+ COMMAND 6210 brech 20 0 18.2g 7.2g 2612 S1 93.4 16:26.73 R So the R process is holding on to 18.2g memory, but it only seems to have accout of 1.5g or so. Where is the rest? I tried searching the archives, and found answers like just buy more RAM. Which doesn't exactly answer my question. And come on, 18g is pretty big; sure it doesn't fit in my RAM (only 7.2g are in), but that's beside the point. The huge memory demand is specific to R version 2.15.3 Patched (2013-03-13 r62500) -- Security Blanket. The same test runs without issues under R version 2.15.1 beta (2012-06-11 r59557) -- Roasted Marshmallows. I appreciate any insights you can share into R's memory management, and gc() in particular. /Christian First, completely stop looking at virtual memory: it does not mean much, if anything. What you care about is resident memory. See e.g.: http://serverfault.com/questions/138427/top-what-does-virtual-memory-size-mean-linux-ubuntu Then, there is a limitation with R/Linux: gc() does not reorder objects in memory so that they are all on the same area. This means that while the total size of R objects in memory is 457MB, they are spread all over the RAM, and a single object in a memory page forces the Linux kernel to keep it in RAM. I do not know the exact details, as it seems that Windows does a better job than Linux in that regard. One workaround is to save the session and restart R: objects will be loaded in a more compact fashion. As for the differences between R 2.15.1 and R 2.15.3, maybe there is some more copying that increases memory fragmentation, but the fundamental problem has not changed AFAIK. You can call tracemem() on large objects to see how many times they are being copied. See http://developer.r-project.org/memory-profiling.html My two cents [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
Re: [R] Memory usage reported by gc() differs from 'top'
On Thursday 18 April 2013 12:18:03 Milan Bouchet-Valat wrote:
> First, completely stop looking at virtual memory: it does not mean much, if anything. What you care about is resident memory. See e.g.:
> http://serverfault.com/questions/138427/top-what-does-virtual-memory-size-mean-linux-ubuntu

I concur. I have lost track of R's internals long ago, but in a previous life analyzing the Apache HTTP server's actual memory use (something that focused on shared RAM, quite different from what you'd probably like to do), I found that if you really need to understand what's going on, you would need to look elsewhere. On Linux, you'll find the details in the /proc/[pid]/maps and /proc/[pid]/smaps pseudo-filesystem files, where [pid] is the process ID, in your example 6210. That's where you really see what's eating your RAM. :-)

Cheers,

Kjetil
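A hedged helper along the lines Kjetil describes (Linux only; it simply pulls the Vm* lines out of /proc/self/status so the numbers can be compared with gc() from inside the running R session):

proc_mem <- function() {
  s    <- readLines("/proc/self/status")
  vals <- s[grepl("^Vm(Size|RSS|Swap):", s)]              # lines look like "VmRSS:  7354321 kB"
  setNames(as.numeric(gsub("[^0-9]", "", vals)) / 1024,   # kB -> MB
           sub(":.*", "", vals))
}
proc_mem()
gc()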
[R] Memory usage reported by gc() differs from 'top'
In help(gc) I read, "...the primary purpose of calling 'gc' is for the report on memory usage." What memory usage does gc() report? And more importantly, which memory uses does it NOT report? Because I see one answer from gc():

            used  (Mb) gc trigger   (Mb) max used  (Mb)
Ncells  14875922 794.5   21754962 1161.9 17854776 953.6
Vcells  59905567 457.1   84428913  644.2 72715009 554.8

(That's about 1.5g max used, 1.8g trigger.) And a different answer from an OS utility, 'top':

  PID USER  PR NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 6210 brech 20  0 18.2g 7.2g 2612 S    1 93.4 16:26.73 R

So the R process is holding on to 18.2g of memory, but it only seems to have an account of 1.5g or so. Where is the rest? I tried searching the archives, and found answers like "just buy more RAM", which doesn't exactly answer my question. And come on, 18g is pretty big; sure it doesn't fit in my RAM (only 7.2g are in), but that's beside the point.

The huge memory demand is specific to R version 2.15.3 Patched (2013-03-13 r62500) -- "Security Blanket". The same test runs without issues under R version 2.15.1 beta (2012-06-11 r59557) -- "Roasted Marshmallows".

I appreciate any insights you can share into R's memory management, and gc() in particular.

/Christian
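As a rough way to see what gc() is (and is not) counting, one can compare its report with the sizes of the objects reachable from the workspace; memory the allocator has obtained from the OS but not handed back (the fragmentation discussed elsewhere in this thread) shows up in top but in neither number below. A sketch:

gc()
sizes <- sapply(ls(envir = globalenv()),
                function(nm) object.size(get(nm, envir = globalenv())))
head(sort(sizes, decreasing = TRUE), 10)   # the largest objects, in bytes
sum(sizes) / 2^20                          # total size of named workspace objects, in MB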
Re: [R] Memory usage in R grows considerably while calculating word frequencies
On 25/09/12 01:29, mcelis wrote:
> I am working with some large text files (up to 16 GBytes). I am interested in extracting the words and counting each time each word appears in the text. I have written a very simple R program by following some suggestions and examples I found online.

Just an idea (I have no experience with what you want to do, so it might not work): what about putting the text in a database (sqlite comes to mind) where each word is one entry? Then you could use SQL to query the database, which should need much less memory. In addition, it should make further processing much easier.

Cheers,

Rainer

> If my input file is 1 GByte, I see that R uses up to 11 GBytes of memory when executing the program on a 64-bit system running CentOS 6.3. Why is R using so much memory? Is there a better way to do this that will minimize memory usage? I am very new to R, so I would appreciate some tips on how to improve my program or a better way to do it.
>
> R program:
>
> # Read in the entire file and convert all words in text to lower case
> words.txt <- tolower(scan("text_file", "character", sep="\n"))
> # Extract words
> pattern <- "(\\b[A-Za-z]+\\b)"
> match <- gregexpr(pattern, words.txt)
> words.txt <- regmatches(words.txt, match)
> # Create a vector from the list of words
> words.txt <- unlist(words.txt)
> # Calculate word frequencies
> words.txt <- table(words.txt, dnn="words")
> # Sort by frequency, not alphabetically
> words.txt <- sort(words.txt, decreasing=TRUE)
> # Put into some readable form, Name of word and Number of times it occurs
> words.txt <- paste(names(words.txt), words.txt, sep="\t")
> # Results to a file
> cat("Word\tFREQ", words.txt, file="frequencies", sep="\n")
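A rough sketch of the database idea using RSQLite (the table and file names are made up; the word extraction still has to happen somewhere, and here it is assumed to have produced a character vector `words`, written in chunks with append = TRUE if it is too large to hold at once). The counting is then done by SQL rather than by an in-memory table() call.

library(DBI)
library(RSQLite)
con <- dbConnect(SQLite(), "words.sqlite")
dbWriteTable(con, "words", data.frame(word = words), overwrite = TRUE)
freq <- dbGetQuery(con,
  "SELECT word, COUNT(*) AS freq FROM words GROUP BY word ORDER BY freq DESC")
dbDisconnect(con)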
Re: [R] Memory usage in R grows considerably while calculating word frequencies
HI, In a text file of 6834 words, I compared your program with a modified program. sapply(strsplit(txt1, ),length) #[1] 6834 #your program system.time({ txt1-tolower(scan(text_file,character,sep=\n)) pattern - (\\b[A-Za-z]+\\b) match - gregexpr(pattern,txt1) words.txt - regmatches(txt1,match) words.txt-unlist(words.txt) words.txt-table(words.txt,dnn=words) words.txt-sort(words.txt,decreasing=TRUE) words.txt-paste(names(words.txt),words.txt,sep=\t) cat(Word\tFREQ,words.txt,file=frequencies,sep=\n) }) # user system elapsed # 0.208 0.000 0.206 #Modified code system.time({ txt1-tolower(scan(text_file,character,sep=\n)) words.txt-sort(table(strsplit(tolower(txt1),\\s)),decreasing=TRUE) words.txt-paste(names(words.txt),words.txt,sep=\t) cat(Word\tFREQ,words.txt,file=frequencies,sep=\n) }) # user system elapsed # 0.016 0.000 0.014 A.K. - Original Message - From: mcelis mce...@lightminersystems.com To: r-help@r-project.org Cc: Sent: Monday, September 24, 2012 7:29 PM Subject: [R] Memory usage in R grows considerably while calculating word frequencies I am working with some large text files (up to 16 GBytes). I am interested in extracting the words and counting each time each word appears in the text. I have written a very simple R program by following some suggestions and examples I found online. If my input file is 1 GByte, I see that R uses up to 11 GBytes of memory when executing the program on a 64-bit system running CentOS 6.3. Why is R using so much memory? Is there a better way to do this that will minimize memory usage. I am very new to R, so I would appreciate some tips on how to improve my program or a better way to do it. R program: # Read in the entire file and convert all words in text to lower case words.txt-tolower(scan(text_file,character,sep=\n)) # Extract words pattern - (\\b[A-Za-z]+\\b) match - gregexpr(pattern,words.txt) words.txt - regmatches(words.txt,match) # Create a vector from the list of words words.txt-unlist(words.txt) # Calculate word frequencies words.txt-table(words.txt,dnn=words) # Sort by frequency, not alphabetically words.txt-sort(words.txt,decreasing=TRUE) # Put into some readable form, Name of word and Number of times it occurs words.txt-paste(names(words.txt),words.txt,sep=\t) # Results to a file cat(Word\tFREQ,words.txt,file=frequencies,sep=\n) -- View this message in context: http://r.789695.n4.nabble.com/Memory-usage-in-R-grows-considerably-while-calculating-word-frequencies-tp4644053.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Memory usage in R grows considerably while calculating word frequencies
HI, In the previous email, I forgot to add unlist(). With four paragraphs, sapply(strsplit(txt1, ),length) #[1] 4850 9072 6400 2071 #Your code: system.time({ txt1-tolower(scan(text_file,character,sep=\n)) pattern - (\\b[A-Za-z]+\\b) match - gregexpr(pattern,txt1) words.txt - regmatches(txt1,match) words.txt-unlist(words.txt) words.txt-table(words.txt,dnn=words) words.txt-sort(words.txt,decreasing=TRUE) words.txt-paste(names(words.txt),words.txt,sep=\t) cat(Word\tFREQ,words.txt,file=frequencies,sep=\n) }) #Read 4 items # user system elapsed # 11.781 0.004 11.799 #Modified code: system.time({ txt1-tolower(scan(text_file,character,sep=\n)) words.txt-sort(table(unlist(strsplit(tolower(txt1),\\s))),decreasing=TRUE) words.txt-paste(names(words.txt),words.txt,sep=\t) cat(Word\tFREQ,words.txt,file=frequencies,sep=\n) }) #Read 4 items #user system elapsed # 0.036 0.008 0.043 A.K. - Original Message - From: mcelis mce...@lightminersystems.com To: r-help@r-project.org Cc: Sent: Monday, September 24, 2012 7:29 PM Subject: [R] Memory usage in R grows considerably while calculating word frequencies I am working with some large text files (up to 16 GBytes). I am interested in extracting the words and counting each time each word appears in the text. I have written a very simple R program by following some suggestions and examples I found online. If my input file is 1 GByte, I see that R uses up to 11 GBytes of memory when executing the program on a 64-bit system running CentOS 6.3. Why is R using so much memory? Is there a better way to do this that will minimize memory usage. I am very new to R, so I would appreciate some tips on how to improve my program or a better way to do it. R program: # Read in the entire file and convert all words in text to lower case words.txt-tolower(scan(text_file,character,sep=\n)) # Extract words pattern - (\\b[A-Za-z]+\\b) match - gregexpr(pattern,words.txt) words.txt - regmatches(words.txt,match) # Create a vector from the list of words words.txt-unlist(words.txt) # Calculate word frequencies words.txt-table(words.txt,dnn=words) # Sort by frequency, not alphabetically words.txt-sort(words.txt,decreasing=TRUE) # Put into some readable form, Name of word and Number of times it occurs words.txt-paste(names(words.txt),words.txt,sep=\t) # Results to a file cat(Word\tFREQ,words.txt,file=frequencies,sep=\n) -- View this message in context: http://r.789695.n4.nabble.com/Memory-usage-in-R-grows-considerably-while-calculating-word-frequencies-tp4644053.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
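For reference, the modified approach above in runnable form (a reconstruction; text_file is assumed to hold the path to the input file and the output file name is a placeholder):

text_file <- "input.txt"
txt1 <- tolower(scan(text_file, "character", sep = "\n"))
words.txt <- sort(table(unlist(strsplit(txt1, "\\s"))), decreasing = TRUE)
words.txt <- paste(names(words.txt), words.txt, sep = "\t")
cat("Word\tFREQ", words.txt, file = "frequencies.txt", sep = "\n")

As the timings show, a single strsplit() over the lines is far cheaper than gregexpr()/regmatches(), at the cost of the cruder definition of a "word" discussed in the next message.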
Re: [R] Memory usage in R grows considerably while calculating word frequencies
arun smartpink...@yahoo.com on Mon, 24 Sep 2012 19:59:35 -0700 writes: HI, In the previous email, I forgot to add unlist(). With four paragraphs, sapply(strsplit(txt1, ),length) #[1] 4850 9072 6400 2071 #Your code: system.time({ txt1-tolower(scan(text_file,character,sep=\n)) pattern - (\\b[A-Za-z]+\\b) match - gregexpr(pattern,txt1) words.txt - regmatches(txt1,match) words.txt-unlist(words.txt) words.txt-table(words.txt,dnn=words) words.txt-sort(words.txt,decreasing=TRUE) words.txt-paste(names(words.txt),words.txt,sep=\t) cat(Word\tFREQ,words.txt,file=frequencies,sep=\n) }) #Read 4 items # user system elapsed # 11.781 0.004 11.799 #Modified code: system.time({ txt1-tolower(scan(text_file,character,sep=\n)) words.txt-sort(table(unlist(strsplit(tolower(txt1),\\s))),decreasing=TRUE) words.txt-paste(names(words.txt),words.txt,sep=\t) cat(Word\tFREQ,words.txt,file=frequencies,sep=\n) }) #Read 4 items #user system elapsed # 0.036 0.008 0.043 A.K. Well, dear A.K., your definition of word is really different, and in my view clearly much too simplistic, compared to what the OP (= original-poster) asked from. E.g., from the above paragraph, your method will get words such as A.K., different, or (= clearly wrongly. Martin Maechler, ETH Zurich - Original Message - From: mcelis mce...@lightminersystems.com To: r-help@r-project.org Cc: Sent: Monday, September 24, 2012 7:29 PM Subject: [R] Memory usage in R grows considerably while calculating word frequencies I am working with some large text files (up to 16 GBytes). I am interested in extracting the words and counting each time each word appears in the text. I have written a very simple R program by following some suggestions and examples I found online. If my input file is 1 GByte, I see that R uses up to 11 GBytes of memory when executing the program on a 64-bit system running CentOS 6.3. Why is R using so much memory? Is there a better way to do this that will minimize memory usage. I am very new to R, so I would appreciate some tips on how to improve my program or a better way to do it. R program: # Read in the entire file and convert all words in text to lower case words.txt-tolower(scan(text_file,character,sep=\n)) # Extract words pattern - (\\b[A-Za-z]+\\b) match - gregexpr(pattern,words.txt) words.txt - regmatches(words.txt,match) # Create a vector from the list of words words.txt-unlist(words.txt) # Calculate word frequencies words.txt-table(words.txt,dnn=words) # Sort by frequency, not alphabetically words.txt-sort(words.txt,decreasing=TRUE) # Put into some readable form, Name of word and Number of times it occurs words.txt-paste(names(words.txt),words.txt,sep=\t) # Results to a file cat(Word\tFREQ,words.txt,file=frequencies,sep=\n) -- View this message in context: http://r.789695.n4.nabble.com/Memory-usage-in-R-grows-considerably-while-calculating-word-frequencies-tp4644053.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
Re: [R] Memory usage in R grows considerably while calculating word frequencies
Le lundi 24 septembre 2012 à 16:29 -0700, mcelis a écrit : I am working with some large text files (up to 16 GBytes). I am interested in extracting the words and counting each time each word appears in the text. I have written a very simple R program by following some suggestions and examples I found online. If my input file is 1 GByte, I see that R uses up to 11 GBytes of memory when executing the program on a 64-bit system running CentOS 6.3. Why is R using so much memory? Is there a better way to do this that will minimize memory usage. I am very new to R, so I would appreciate some tips on how to improve my program or a better way to do it. First, I think you should have a look at the tm package by Ingo Feinerer. It will help you to import the texts, optionally run processing steps on it, and then extract the words and create a document-term matrix counting their frequencies. No need to reinvent the wheel. Second, there's nothing wrong with using RAM as long as it's available. If other programs need it, the Linux will reclaim it. There's a problem only if R's memory use does not reduce at that point. Use gc() to check whether the RAM allocated to R is really in use. But tm should improve the efficiency of the computations. My two cents R program: # Read in the entire file and convert all words in text to lower case words.txt-tolower(scan(text_file,character,sep=\n)) # Extract words pattern - (\\b[A-Za-z]+\\b) match - gregexpr(pattern,words.txt) words.txt - regmatches(words.txt,match) # Create a vector from the list of words words.txt-unlist(words.txt) # Calculate word frequencies words.txt-table(words.txt,dnn=words) # Sort by frequency, not alphabetically words.txt-sort(words.txt,decreasing=TRUE) # Put into some readable form, Name of word and Number of times it occurs words.txt-paste(names(words.txt),words.txt,sep=\t) # Results to a file cat(Word\tFREQ,words.txt,file=frequencies,sep=\n) -- View this message in context: http://r.789695.n4.nabble.com/Memory-usage-in-R-grows-considerably-while-calculating-word-frequencies-tp4644053.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
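For reference, a minimal sketch of the tm suggestion, assuming the tm package (and its slam dependency) is installed; the directory path is a placeholder and the function names are those of current tm versions:

library(tm)

corpus <- VCorpus(DirSource("texts/"))                 # one document per file in the directory
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)

tdm <- TermDocumentMatrix(corpus)                      # sparse term-by-document counts
freq <- sort(slam::row_sums(tdm), decreasing = TRUE)   # total frequency of each term
head(freq)

Because the term-document matrix is stored in sparse form, the counts stay compact even for large collections.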
Re: [R] Memory usage in R grows considerably while calculating word frequencies
Dear Martin, Thanks for testing the code. You are right. I modified the code: If I test it for a sample text, txt1-Romney A.K. different, (= than other people. Is it? OP's code: pattern - (\\b[A-Za-z]+\\b) match - gregexpr(pattern,txt1) words.txt - regmatches(txt1,match) words.txt-unlist(words.txt) words.txt-table(words.txt,dnn=words) words.txt-sort(words.txt,decreasing=TRUE) words.txt #words # A different Is it K other people Romney # 1 1 1 1 1 1 1 1 # than # 1 #My code: words.txt1-sort(table(gsub(\\W,,unlist(strsplit(tolower(txt1),\\s)))[grepl(\\b\\w+\\b,gsub(\\W,,unlist(strsplit(tolower(txt1),\\s])) # ak different is it other people romney than # 1 1 1 1 1 1 1 1 Here, as you can see, OP's code split A.K. to two words, but my code joins it. I didn't fix it because the concern is to minimize memory usage. I again, tested the new code with text of : sapply(strsplit(txt1, ),length) #[1] 4850 9072 6400 2071 sum(sapply(strsplit(txt1, ),length)) #[1] 22393 : words. #OP's code: system.time({ txt1-tolower(scan(text_file,character,sep=\n)) pattern - (\\b[A-Za-z]+\\b) match - gregexpr(pattern,txt1) words.txt - regmatches(txt1,match) words.txt-unlist(words.txt) words.txt-table(words.txt,dnn=words) words.txt-sort(words.txt,decreasing=TRUE) words.txt-paste(names(words.txt),words.txt,sep=\t) cat(Word\tFREQ,words.txt,file=frequencies,sep=\n) }) #Read 4 items # user system elapsed # 12.056 0.000 12.066 #My code: system.time({ txt1-tolower(scan(text_file,character,sep=\n)) words.txt-sort(table(gsub(\\W,,unlist(strsplit(tolower(txt1),\\s)))[grepl(\\b\\w+\\b,gsub(\\W,,unlist(strsplit(tolower(txt1),\\s]),decreasing=TRUE) words.txt-paste(names(words.txt),words.txt,sep=\t) cat(Word\tFREQ,words.txt,file=frequencies,sep=\n) }) #Read 4 items # user system elapsed # 0.148 0.000 0.150 There is improvement in the speed. Output also looked similar. This code may be still improved. A.K. - Original Message - From: Martin Maechler maech...@stat.math.ethz.ch To: arun smartpink...@yahoo.com Cc: mcelis mce...@lightminersystems.com; R help r-help@r-project.org Sent: Tuesday, September 25, 2012 9:07 AM Subject: Re: [R] Memory usage in R grows considerably while calculating word frequencies arun smartpink...@yahoo.com on Mon, 24 Sep 2012 19:59:35 -0700 writes: HI, In the previous email, I forgot to add unlist(). With four paragraphs, sapply(strsplit(txt1, ),length) #[1] 4850 9072 6400 2071 #Your code: system.time({ txt1-tolower(scan(text_file,character,sep=\n)) pattern - (\\b[A-Za-z]+\\b) match - gregexpr(pattern,txt1) words.txt - regmatches(txt1,match) words.txt-unlist(words.txt) words.txt-table(words.txt,dnn=words) words.txt-sort(words.txt,decreasing=TRUE) words.txt-paste(names(words.txt),words.txt,sep=\t) cat(Word\tFREQ,words.txt,file=frequencies,sep=\n) }) #Read 4 items # user system elapsed # 11.781 0.004 11.799 #Modified code: system.time({ txt1-tolower(scan(text_file,character,sep=\n)) words.txt-sort(table(unlist(strsplit(tolower(txt1),\\s))),decreasing=TRUE) words.txt-paste(names(words.txt),words.txt,sep=\t) cat(Word\tFREQ,words.txt,file=frequencies,sep=\n) }) #Read 4 items #user system elapsed # 0.036 0.008 0.043 A.K. Well, dear A.K., your definition of word is really different, and in my view clearly much too simplistic, compared to what the OP (= original-poster) asked from. E.g., from the above paragraph, your method will get words such as A.K., different, or (= clearly wrongly. 
Martin Maechler, ETH Zurich
[R] Memory usage in R grows considerably while calculating word frequencies
I am working with some large text files (up to 16 GBytes). I am interested in extracting the words and counting each time each word appears in the text. I have written a very simple R program by following some suggestions and examples I found online. If my input file is 1 GByte, I see that R uses up to 11 GBytes of memory when executing the program on a 64-bit system running CentOS 6.3. Why is R using so much memory? Is there a better way to do this that will minimize memory usage. I am very new to R, so I would appreciate some tips on how to improve my program or a better way to do it. R program: # Read in the entire file and convert all words in text to lower case words.txt-tolower(scan(text_file,character,sep=\n)) # Extract words pattern - (\\b[A-Za-z]+\\b) match - gregexpr(pattern,words.txt) words.txt - regmatches(words.txt,match) # Create a vector from the list of words words.txt-unlist(words.txt) # Calculate word frequencies words.txt-table(words.txt,dnn=words) # Sort by frequency, not alphabetically words.txt-sort(words.txt,decreasing=TRUE) # Put into some readable form, Name of word and Number of times it occurs words.txt-paste(names(words.txt),words.txt,sep=\t) # Results to a file cat(Word\tFREQ,words.txt,file=frequencies,sep=\n) -- View this message in context: http://r.789695.n4.nabble.com/Memory-usage-in-R-grows-considerably-while-calculating-word-frequencies-tp4644053.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
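For reference, the program above in runnable form (a reconstruction; text_file is assumed to hold the path to the input file). Note that gregexpr() and regmatches() build a full list of per-line matches alongside the original text, which is part of why the footprint ends up several times the size of the file:

text_file <- "input.txt"
# Read in the entire file and convert all words in the text to lower case
words.txt <- tolower(scan(text_file, "character", sep = "\n"))
# Extract words
pattern <- "(\\b[A-Za-z]+\\b)"
match <- gregexpr(pattern, words.txt)
words.txt <- regmatches(words.txt, match)
# Create a vector from the list of words
words.txt <- unlist(words.txt)
# Calculate word frequencies
words.txt <- table(words.txt, dnn = "words")
# Sort by frequency, not alphabetically
words.txt <- sort(words.txt, decreasing = TRUE)
# Name of word and number of times it occurs, tab separated
words.txt <- paste(names(words.txt), words.txt, sep = "\t")
# Results to a file
cat("Word\tFREQ", words.txt, file = "frequencies.txt", sep = "\n")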
[R] memory usage benefit from anonymous variable constructions.
This is an "I was just wondering" question. When the dataframe package was announced and its author claimed to reduce the number of times a data frame gets copied, I started to wonder whether I should care about this in my projects. Has anybody written a general guide for how to write R code that doesn't needlessly exhaust RAM? In Objective-C, we used to gain some considerable advantages by avoiding declaring objects separately, using anonymous variables instead. The storage was allocated on the stack, I think, and I think there was talk that the numbers might stay 'closer' to the CPU (registers?) for immediate use. Does this provide a benefit in R as well? For example, instead of the way I would usually do this: mf <- model.frame(model) y <- model.response(mf) Here is the anonymous alternative, where mf is never declared: y <- model.response(model.frame(model)) On the face of it, I can imagine this might be better because no permanent object mf is created, and the garbage collector wouldn't be called into play if all the data is local and disappears immediately. But, then again, R is doing lots of stuff under the hood that I've never bothered to learn about. pj -- Paul E. Johnson Professor, Political Science Assoc. Director 1541 Lilac Lane, Room 504 Center for Research Methods University of Kansas University of Kansas http://pj.freefaculty.org http://quant.ku.edu __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
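One way to check the question empirically with base R tools is sketched below (the lm() fit is just a stand-in example). R's copy-on-modify semantics mean the intermediate model frame is not copied merely because it is bound to a name, so the two styles mainly differ in how long the intermediate stays reachable before the garbage collector can reclaim it, not in where it is allocated:

model <- lm(mpg ~ wt + hp, data = mtcars)    # small stand-in model

# Style 1: named intermediate -- 'mf' keeps the model frame reachable
# until it is removed or the enclosing function returns.
mf <- model.frame(model)
y1 <- model.response(mf)
rm(mf); invisible(gc())

# Style 2: anonymous intermediate -- the frame becomes garbage as soon
# as the expression finishes, with no rm() needed.
y2 <- model.response(model.frame(model))

identical(y1, y2)                            # same result either way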
[R] memory usage upon web-query using try function
Dear Community, my program below runs quite slowly and I'm not sure whether the HTTP requests are to blame for this. Also, while running it gradually increases its memory usage enormously, and after the program finishes the memory is not freed. Can someone point out a problem in the code? Sorry for the basic question, but I am totally new to R programming... Many thanks for your time, Cyrus

require(XML)
row <- 0
URL <- "http://de.finance.yahoo.com/lookup?s="
df <- matrix(ncol = 6, nrow = 10)
for (Ticker in 10:20) {
  URLTicker <- paste(URL, Ticker, sep = "")
  query <- try(readHTMLTable(URLTicker, which = 2, header = TRUE,
                             colClasses = c("character", "character", "character",
                                            "character", "character", "character"),
                             stringsAsFactors = FALSE)[1, ], silent = TRUE)
  if (class(query) == "data.frame") {
    row <- row + 1
    df[row, ] <- as.character(query)
  }
}

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
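A hedged sketch of one way to restructure the loop (not a tested fix for the memory growth itself): size the result to the number of tickers actually requested and build it with vapply() rather than growing state inside the loop; the try() handling and the Yahoo URL are as in the original post:

library(XML)

base_url <- "http://de.finance.yahoo.com/lookup?s="
tickers  <- 10:20

fetch_row <- function(ticker) {
  tab <- try(readHTMLTable(paste0(base_url, ticker), which = 2, header = TRUE,
                           stringsAsFactors = FALSE), silent = TRUE)
  if (inherits(tab, "try-error") || nrow(tab) < 1) return(rep(NA_character_, 6))
  as.character(tab[1, ])[1:6]                 # first row, padded/truncated to 6 entries
}

df <- t(vapply(tickers, fetch_row, character(6)))   # one row per ticker, preallocated
gc()                                                # check what R still holds afterwards

If memory still grows after the calls return, whether the XML package is holding on to parsed documents would be the next thing to check.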
Re: [R] Memory usage in read.csv()
Hi Jim Gabor - Apparently, it was most likely a hardware issue (shortly after sending my last e-mail, the computer promptly died). After buying a new system and restoring, the script runs fine. Thanks for your help! On Tue, Jan 19, 2010 at 2:02 PM, jim holtman - jholt...@gmail.com +nabble+miller_2555+9dc9649aca.jholtman#gmail@spamgourmet.com wrote: I read vmstat data in just fine without any problems. Here is an example of how I do it: VMstat - read.table('vmstat.txt', header=TRUE, as.is=TRUE) vmstat.txt looks like this: date time r b w swap free re mf pi po fr de sr intr syscalls cs user sys id 07/27/05 00:13:06 0 0 0 27755440 13051648 20 86 0 0 0 0 0 456 2918 1323 0 1 99 07/27/05 00:13:36 0 0 0 27755280 13051480 11 53 0 0 0 0 0 399 1722 1411 0 1 99 07/27/05 00:14:06 0 0 0 27753952 13051248 18 88 0 0 0 0 0 424 1259 1254 0 1 99 07/27/05 00:14:36 0 0 0 27755304 13051496 17 85 0 0 0 0 0 430 1029 1246 0 1 99 07/27/05 00:15:06 0 0 0 27755064 13051232 41 278 0 1 1 0 0 452 2047 1386 0 1 99 07/27/05 00:15:36 0 0 0 27753824 13040720 125 1039 0 0 0 0 0 664 4097 1901 3 2 95 07/27/05 00:16:06 0 0 0 27754472 13027000 15 91 0 0 0 0 0 432 1160 1273 0 1 99 07/27/05 00:16:36 0 0 0 27754568 13027104 17 85 0 0 0 0 0 416 1058 1271 0 1 99 Have you tried a smaller portion of data? Here is what it took to read in a file with 85K lines: system.time(vmstat - read.table('c:/vmstat.txt', header=TRUE)) user system elapsed 2.01 0.01 2.03 str(vmstat) 'data.frame': 85680 obs. of 20 variables: $ date : Factor w/ 2 levels 07/27/05,07/28/05: 1 1 1 1 1 1 1 1 1 1 ... $ time : Factor w/ 2856 levels 00:00:26,00:00:56,..: 27 29 31 33 35 37 39 41 43 45 ... $ r : int 0 0 0 0 0 0 0 0 0 0 ... $ b : int 0 0 0 0 0 0 0 0 0 0 ... $ w : int 0 0 0 0 0 0 0 0 0 0 ... $ swap : int 27755440 27755280 27753952 27755304 27755064 27753824 27754472 27754568 27754560 27754704 ... $ free : int 13051648 13051480 13051248 13051496 13051232 13040720 13027000 13027104 13027096 13027240 ... $ re : int 20 11 18 17 41 125 15 17 13 12 ... $ mf : int 86 53 88 85 278 1039 91 85 69 51 ... $ pi : int 0 0 0 0 0 0 0 0 0 0 ... $ po : int 0 0 0 0 1 0 0 0 0 1 ... $ fr : int 0 0 0 0 1 0 0 0 0 1 ... $ de : int 0 0 0 0 0 0 0 0 0 0 ... $ sr : int 0 0 0 0 0 0 0 0 0 0 ... $ intr : int 456 399 424 430 452 664 432 416 425 432 ... $ syscalls: int 2918 1722 1259 1029 2047 4097 1160 1058 1198 1727 ... $ cs : int 1323 1411 1254 1246 1386 1901 1273 1271 1268 1477 ... $ user : int 0 0 0 0 0 3 0 0 0 0 ... $ sys : int 1 1 1 1 1 2 1 1 1 1 ... $ id : int 99 99 99 99 99 95 99 99 99 99 ... On Tue, Jan 19, 2010 at 9:25 AM, nabble.30.miller_2...@spamgourmet.com wrote: I'm sure this has gotten some attention before, but I have two CSV files generated from vmstat and free that are roughly 6-8 Mb (about 80,000 lines) each. When I try to use read.csv(), R allocates all available memory (about 4.9 Gb) when loading the files, which is over 300 times the size of the raw data. 
Here are the scripts used to generate the CSV files as well as the R code: Scripts (run for roughly a 24-hour period): vmstat -ant 1 | awk '$0 !~ /(proc|free)/ {FS= ; OFS=,; print strftime(%F %T %Z),$6,$7,$12,$13,$14,$15,$16,$17;}' ~/vmstat_20100118_133845.o; free -ms 1 | awk '$0 ~ /Mem\:/ {FS= ; OFS=,; print strftime(%F %T %Z),$2,$3,$4,$5,$6,$7}' ~/memfree_20100118_140845.o; R code: infile.vms - ~/vmstat_20100118_133845.o; infile.mem - ~/memfree_20100118_140845.o; vms.colnames - c(time,r,b,swpd,free,inact,active,si,so,bi,bo,in,cs,us,sy,id,wa,st); vms.colclass - c(character,rep(integer,length(vms.colnames)-1)); mem.colnames - c(time,total,used,free,shared,buffers,cached); mem.colclass - c(character,rep(integer,length(mem.colnames)-1)); vmsdf - (read.csv(infile.vms,header=FALSE,colClasses=vms.colclass,col.names=vms.colnames)); memdf - (read.csv(infile.mem,header=FALSE,colClasses=mem.colclass,col.names=mem.colnames)); I am running R v2.10.0 on a 64-bit machine with Fedora 10 (Linux version 2.6.27.41-170.2.117.fc10.x86_64 ) with 6Gb of memory. There are no other significant programs running and `rm()` followed by ` gc()` successfully frees the memory (followed by swapins after other programs seek to used previously cached information swapped to disk). I've incorporated the memory-saving suggestions in the `read.csv()` manual page, excluding the limit on the lines read (which shouldn't really be necessary here since we're only talking about 20 Mb of raw data. Any suggestions, or is the read.csv() code known to have memory leak/ overcommit issues? Thanks __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
[R] Memory usage in read.csv()
I'm sure this has gotten some attention before, but I have two CSV files generated from vmstat and free that are roughly 6-8 Mb (about 80,000 lines) each. When I try to use read.csv(), R allocates all available memory (about 4.9 Gb) when loading the files, which is over 300 times the size of the raw data. Here are the scripts used to generate the CSV files as well as the R code: Scripts (run for roughly a 24-hour period): vmstat -ant 1 | awk '$0 !~ /(proc|free)/ {FS= ; OFS=,; print strftime(%F %T %Z),$6,$7,$12,$13,$14,$15,$16,$17;}' ~/vmstat_20100118_133845.o; free -ms 1 | awk '$0 ~ /Mem\:/ {FS= ; OFS=,; print strftime(%F %T %Z),$2,$3,$4,$5,$6,$7}' ~/memfree_20100118_140845.o; R code: infile.vms - ~/vmstat_20100118_133845.o; infile.mem - ~/memfree_20100118_140845.o; vms.colnames - c(time,r,b,swpd,free,inact,active,si,so,bi,bo,in,cs,us,sy,id,wa,st); vms.colclass - c(character,rep(integer,length(vms.colnames)-1)); mem.colnames - c(time,total,used,free,shared,buffers,cached); mem.colclass - c(character,rep(integer,length(mem.colnames)-1)); vmsdf - (read.csv(infile.vms,header=FALSE,colClasses=vms.colclass,col.names=vms.colnames)); memdf - (read.csv(infile.mem,header=FALSE,colClasses=mem.colclass,col.names=mem.colnames)); I am running R v2.10.0 on a 64-bit machine with Fedora 10 (Linux version 2.6.27.41-170.2.117.fc10.x86_64 ) with 6Gb of memory. There are no other significant programs running and `rm()` followed by ` gc()` successfully frees the memory (followed by swapins after other programs seek to used previously cached information swapped to disk). I've incorporated the memory-saving suggestions in the `read.csv()` manual page, excluding the limit on the lines read (which shouldn't really be necessary here since we're only talking about 20 Mb of raw data. Any suggestions, or is the read.csv() code known to have memory leak/ overcommit issues? Thanks __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
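For reference, the R code above in runnable form (a reconstruction of the same calls, with the file names as posted):

infile.vms <- "~/vmstat_20100118_133845.o"
infile.mem <- "~/memfree_20100118_140845.o"

vms.colnames <- c("time","r","b","swpd","free","inact","active","si","so","bi","bo","in","cs","us","sy","id","wa","st")
vms.colclass <- c("character", rep("integer", length(vms.colnames) - 1))
mem.colnames <- c("time","total","used","free","shared","buffers","cached")
mem.colclass <- c("character", rep("integer", length(mem.colnames) - 1))

vmsdf <- read.csv(infile.vms, header = FALSE, colClasses = vms.colclass, col.names = vms.colnames)
memdf <- read.csv(infile.mem, header = FALSE, colClasses = mem.colclass, col.names = mem.colnames)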
Re: [R] Memory usage in read.csv()
I read vmstat data in just fine without any problems. Here is an example of how I do it: VMstat - read.table('vmstat.txt', header=TRUE, as.is=TRUE) vmstat.txt looks like this: date time r b w swap free re mf pi po fr de sr intr syscalls cs user sys id 07/27/05 00:13:06 0 0 0 27755440 13051648 20 86 0 0 0 0 0 456 2918 1323 0 1 99 07/27/05 00:13:36 0 0 0 27755280 13051480 11 53 0 0 0 0 0 399 1722 1411 0 1 99 07/27/05 00:14:06 0 0 0 27753952 13051248 18 88 0 0 0 0 0 424 1259 1254 0 1 99 07/27/05 00:14:36 0 0 0 27755304 13051496 17 85 0 0 0 0 0 430 1029 1246 0 1 99 07/27/05 00:15:06 0 0 0 27755064 13051232 41 278 0 1 1 0 0 452 2047 1386 0 1 99 07/27/05 00:15:36 0 0 0 27753824 13040720 125 1039 0 0 0 0 0 664 4097 1901 3 2 95 07/27/05 00:16:06 0 0 0 27754472 13027000 15 91 0 0 0 0 0 432 1160 1273 0 1 99 07/27/05 00:16:36 0 0 0 27754568 13027104 17 85 0 0 0 0 0 416 1058 1271 0 1 99 Have you tried a smaller portion of data? Here is what it took to read in a file with 85K lines: system.time(vmstat - read.table('c:/vmstat.txt', header=TRUE)) user system elapsed 2.010.012.03 str(vmstat) 'data.frame': 85680 obs. of 20 variables: $ date: Factor w/ 2 levels 07/27/05,07/28/05: 1 1 1 1 1 1 1 1 1 1 ... $ time: Factor w/ 2856 levels 00:00:26,00:00:56,..: 27 29 31 33 35 37 39 41 43 45 ... $ r : int 0 0 0 0 0 0 0 0 0 0 ... $ b : int 0 0 0 0 0 0 0 0 0 0 ... $ w : int 0 0 0 0 0 0 0 0 0 0 ... $ swap: int 27755440 27755280 27753952 27755304 27755064 27753824 27754472 27754568 27754560 27754704 ... $ free: int 13051648 13051480 13051248 13051496 13051232 13040720 13027000 13027104 13027096 13027240 ... $ re : int 20 11 18 17 41 125 15 17 13 12 ... $ mf : int 86 53 88 85 278 1039 91 85 69 51 ... $ pi : int 0 0 0 0 0 0 0 0 0 0 ... $ po : int 0 0 0 0 1 0 0 0 0 1 ... $ fr : int 0 0 0 0 1 0 0 0 0 1 ... $ de : int 0 0 0 0 0 0 0 0 0 0 ... $ sr : int 0 0 0 0 0 0 0 0 0 0 ... $ intr: int 456 399 424 430 452 664 432 416 425 432 ... $ syscalls: int 2918 1722 1259 1029 2047 4097 1160 1058 1198 1727 ... $ cs : int 1323 1411 1254 1246 1386 1901 1273 1271 1268 1477 ... $ user: int 0 0 0 0 0 3 0 0 0 0 ... $ sys : int 1 1 1 1 1 2 1 1 1 1 ... $ id : int 99 99 99 99 99 95 99 99 99 99 ... On Tue, Jan 19, 2010 at 9:25 AM, nabble.30.miller_2...@spamgourmet.com wrote: I'm sure this has gotten some attention before, but I have two CSV files generated from vmstat and free that are roughly 6-8 Mb (about 80,000 lines) each. When I try to use read.csv(), R allocates all available memory (about 4.9 Gb) when loading the files, which is over 300 times the size of the raw data. 
Here are the scripts used to generate the CSV files as well as the R code: Scripts (run for roughly a 24-hour period): vmstat -ant 1 | awk '$0 !~ /(proc|free)/ {FS= ; OFS=,; print strftime(%F %T %Z),$6,$7,$12,$13,$14,$15,$16,$17;}' ~/vmstat_20100118_133845.o; free -ms 1 | awk '$0 ~ /Mem\:/ {FS= ; OFS=,; print strftime(%F %T %Z),$2,$3,$4,$5,$6,$7}' ~/memfree_20100118_140845.o; R code: infile.vms - ~/vmstat_20100118_133845.o; infile.mem - ~/memfree_20100118_140845.o; vms.colnames - c(time,r,b,swpd,free,inact,active,si,so,bi,bo,in,cs,us,sy,id,wa,st); vms.colclass - c(character,rep(integer,length(vms.colnames)-1)); mem.colnames - c(time,total,used,free,shared,buffers,cached); mem.colclass - c(character,rep(integer,length(mem.colnames)-1)); vmsdf - (read.csv(infile.vms,header=FALSE,colClasses=vms.colclass,col.names=vms.colnames)); memdf - (read.csv(infile.mem,header=FALSE,colClasses=mem.colclass,col.names=mem.colnames)); I am running R v2.10.0 on a 64-bit machine with Fedora 10 (Linux version 2.6.27.41-170.2.117.fc10.x86_64 ) with 6Gb of memory. There are no other significant programs running and `rm()` followed by ` gc()` successfully frees the memory (followed by swapins after other programs seek to used previously cached information swapped to disk). I've incorporated the memory-saving suggestions in the `read.csv()` manual page, excluding the limit on the lines read (which shouldn't really be necessary here since we're only talking about 20 Mb of raw data. Any suggestions, or is the read.csv() code known to have memory leak/ overcommit issues? Thanks __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal,
Re: [R] Memory usage in read.csv()
You could also try read.csv.sql in sqldf. See examples on sqldf home page: http://code.google.com/p/sqldf/#Example_13._read.csv.sql_and_read.csv2.sql On Tue, Jan 19, 2010 at 9:25 AM, nabble.30.miller_2...@spamgourmet.com wrote: I'm sure this has gotten some attention before, but I have two CSV files generated from vmstat and free that are roughly 6-8 Mb (about 80,000 lines) each. When I try to use read.csv(), R allocates all available memory (about 4.9 Gb) when loading the files, which is over 300 times the size of the raw data. Here are the scripts used to generate the CSV files as well as the R code: Scripts (run for roughly a 24-hour period): vmstat -ant 1 | awk '$0 !~ /(proc|free)/ {FS= ; OFS=,; print strftime(%F %T %Z),$6,$7,$12,$13,$14,$15,$16,$17;}' ~/vmstat_20100118_133845.o; free -ms 1 | awk '$0 ~ /Mem\:/ {FS= ; OFS=,; print strftime(%F %T %Z),$2,$3,$4,$5,$6,$7}' ~/memfree_20100118_140845.o; R code: infile.vms - ~/vmstat_20100118_133845.o; infile.mem - ~/memfree_20100118_140845.o; vms.colnames - c(time,r,b,swpd,free,inact,active,si,so,bi,bo,in,cs,us,sy,id,wa,st); vms.colclass - c(character,rep(integer,length(vms.colnames)-1)); mem.colnames - c(time,total,used,free,shared,buffers,cached); mem.colclass - c(character,rep(integer,length(mem.colnames)-1)); vmsdf - (read.csv(infile.vms,header=FALSE,colClasses=vms.colclass,col.names=vms.colnames)); memdf - (read.csv(infile.mem,header=FALSE,colClasses=mem.colclass,col.names=mem.colnames)); I am running R v2.10.0 on a 64-bit machine with Fedora 10 (Linux version 2.6.27.41-170.2.117.fc10.x86_64 ) with 6Gb of memory. There are no other significant programs running and `rm()` followed by ` gc()` successfully frees the memory (followed by swapins after other programs seek to used previously cached information swapped to disk). I've incorporated the memory-saving suggestions in the `read.csv()` manual page, excluding the limit on the lines read (which shouldn't really be necessary here since we're only talking about 20 Mb of raw data. Any suggestions, or is the read.csv() code known to have memory leak/ overcommit issues? Thanks __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
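For reference, a minimal sketch of the sqldf route with one of the files from this thread (assuming the sqldf package is installed):

library(sqldf)

vmsdf <- read.csv.sql("~/vmstat_20100118_133845.o", header = FALSE,
                      sql = "select * from file")

The file is loaded into a temporary SQLite database and only the result of the SQL query is returned to R, so restricting or aggregating rows in the sql= argument keeps the R-side memory footprint small.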
[R] Passing lists and R memory usage growth
Hello, I can't think of an explanation for this memory allocation behaviour and was hoping someone on the list could help out. Setup: -- R version 2.8.1, 32-bit Ubuntu 9.04 Linux, Core 2 Duo with 3GB ram Description: Inside a for loop, I am passing a list to a function. The function accesses various members of the list. I understand that in this situation, the entire list may be duplicated in each function call. That's ok. But the memory given to these duplicates doesn't seem to be recovered by the garbage collector after the function call has ended and more memory is allocated in each iteration. (See output below.) I also tried summing up object.size() for all objects in all environments, and the total is constant about 15 Mbytes at each iteration. But overall memory consumption as reported by gc() (and my operating system) keeps going up to 2 Gbytes and more. Pseudocode: --- # This function and its callees need a 'results' list some.function.1 - function(iter, res, par) { # access res$gamma[[iter-1]], res$beta[[iter-1]] ... } # This function and its callees need a 'results' list some.function.2 - function(iter, res, par) { # access res$gamma[[iter-1]], res$beta[[iter-1]] ... } # Some parameters par - list( ... ) # List storing results. # Only results$gamma[1:3], results$beta[1:3] are used results - list(gamma = list(), beta = list()) for (iter in 1:100) { print(paste(Iteration , iter)) # min(iter, 3) is the most recent slot of results$gamma etc. results$gamma[[min(iter, 3)]] - some.function.1(min(iter, 3), results, par) results$beta[[min(iter, 3)]] - some.function.2(min(iter, 3), results, par) # Delete earlier results if (iter 2) { results$gamma[[1]] - NULL results$beta[[1]] - NULL } # Report on memory usage gc(verbose=TRUE) } Output from an actual run of my program: [1] Iteration 1 Garbage collection 255 = 122+60+73 (level 2) ... 6.1 Mbytes of cons cells used (48%) 232.3 Mbytes of vectors used (69%) [1] Iteration 2 Garbage collection 257 = 123+60+74 (level 2) ... 6.1 Mbytes of cons cells used (48%) 238.3 Mbytes of vectors used (67%) [1] Iteration 3 Garbage collection 258 = 123+60+75 (level 2) ... 6.1 Mbytes of cons cells used (49%) 242.8 Mbytes of vectors used (69%) [1] Iteration 4 Garbage collection 259 = 123+60+76 (level 2) ... 6.2 Mbytes of cons cells used (49%) 247.3 Mbytes of vectors used (66%) [1] Iteration 5 Garbage collection 260 = 123+60+77 (level 2) ... 6.2 Mbytes of cons cells used (50%) 251.8 Mbytes of vectors used (68%) ... Thanks, Rajeev. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Passing lists and R memory usage growth
You need to give reproducible code for a question like this, not pseudocode. And you should consider using a recent version of R, not the relatively ancient 2.8.1 (which was released in late 2008. Duncan Murdoch On 03/10/2009 1:30 PM, Rajeev Ayyagari wrote: Hello, I can't think of an explanation for this memory allocation behaviour and was hoping someone on the list could help out. Setup: -- R version 2.8.1, 32-bit Ubuntu 9.04 Linux, Core 2 Duo with 3GB ram Description: Inside a for loop, I am passing a list to a function. The function accesses various members of the list. I understand that in this situation, the entire list may be duplicated in each function call. That's ok. But the memory given to these duplicates doesn't seem to be recovered by the garbage collector after the function call has ended and more memory is allocated in each iteration. (See output below.) I also tried summing up object.size() for all objects in all environments, and the total is constant about 15 Mbytes at each iteration. But overall memory consumption as reported by gc() (and my operating system) keeps going up to 2 Gbytes and more. Pseudocode: --- # This function and its callees need a 'results' list some.function.1 - function(iter, res, par) { # access res$gamma[[iter-1]], res$beta[[iter-1]] ... } # This function and its callees need a 'results' list some.function.2 - function(iter, res, par) { # access res$gamma[[iter-1]], res$beta[[iter-1]] ... } # Some parameters par - list( ... ) # List storing results. # Only results$gamma[1:3], results$beta[1:3] are used results - list(gamma = list(), beta = list()) for (iter in 1:100) { print(paste(Iteration , iter)) # min(iter, 3) is the most recent slot of results$gamma etc. results$gamma[[min(iter, 3)]] - some.function.1(min(iter, 3), results, par) results$beta[[min(iter, 3)]] - some.function.2(min(iter, 3), results, par) # Delete earlier results if (iter 2) { results$gamma[[1]] - NULL results$beta[[1]] - NULL } # Report on memory usage gc(verbose=TRUE) } Output from an actual run of my program: [1] Iteration 1 Garbage collection 255 = 122+60+73 (level 2) ... 6.1 Mbytes of cons cells used (48%) 232.3 Mbytes of vectors used (69%) [1] Iteration 2 Garbage collection 257 = 123+60+74 (level 2) ... 6.1 Mbytes of cons cells used (48%) 238.3 Mbytes of vectors used (67%) [1] Iteration 3 Garbage collection 258 = 123+60+75 (level 2) ... 6.1 Mbytes of cons cells used (49%) 242.8 Mbytes of vectors used (69%) [1] Iteration 4 Garbage collection 259 = 123+60+76 (level 2) ... 6.2 Mbytes of cons cells used (49%) 247.3 Mbytes of vectors used (66%) [1] Iteration 5 Garbage collection 260 = 123+60+77 (level 2) ... 6.2 Mbytes of cons cells used (50%) 251.8 Mbytes of vectors used (68%) ... Thanks, Rajeev. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Passing lists and R memory usage growth
Duncan: I took your suggestion and upgraded to R 2.9.2, but the problem persists. I am not able to reproduce the problem in a simple case. In my actual code the functions some.function.1() and some.function.2() are quite complicated and call various other functions which also access elements of the list. If I can find a simple way to reproduce it, I will post the code to the list. I know it must be the results list in the pseudocode which is causing the problem because: 1. I tried tracemem() on par and results; results is duplicated several times but par is not. 2. I can eliminate the memory problem completely by rewriting some.function.1() and some.function.2() to accept individual elements of the list as arguments, and passing several list elements like results$gamma[[iter-1]] etc. in the call. (Rather than passing the entire list as a single argument.) This makes the code harder to read but the memory problem is eliminated. Regards Rajeev. On Sat, Oct 3, 2009 at 1:43 PM, Duncan Murdoch murd...@stats.uwo.ca wrote: You need to give reproducible code for a question like this, not pseudocode. And you should consider using a recent version of R, not the relatively ancient 2.8.1 (which was released in late 2008. Duncan Murdoch On 03/10/2009 1:30 PM, Rajeev Ayyagari wrote: Hello, I can't think of an explanation for this memory allocation behaviour and was hoping someone on the list could help out. Setup: -- R version 2.8.1, 32-bit Ubuntu 9.04 Linux, Core 2 Duo with 3GB ram Description: Inside a for loop, I am passing a list to a function. The function accesses various members of the list. I understand that in this situation, the entire list may be duplicated in each function call. That's ok. But the memory given to these duplicates doesn't seem to be recovered by the garbage collector after the function call has ended and more memory is allocated in each iteration. (See output below.) I also tried summing up object.size() for all objects in all environments, and the total is constant about 15 Mbytes at each iteration. But overall memory consumption as reported by gc() (and my operating system) keeps going up to 2 Gbytes and more. Pseudocode: --- # This function and its callees need a 'results' list some.function.1 - function(iter, res, par) { # access res$gamma[[iter-1]], res$beta[[iter-1]] ... } # This function and its callees need a 'results' list some.function.2 - function(iter, res, par) { # access res$gamma[[iter-1]], res$beta[[iter-1]] ... } # Some parameters par - list( ... ) # List storing results. # Only results$gamma[1:3], results$beta[1:3] are used results - list(gamma = list(), beta = list()) for (iter in 1:100) { print(paste(Iteration , iter)) # min(iter, 3) is the most recent slot of results$gamma etc. results$gamma[[min(iter, 3)]] - some.function.1(min(iter, 3), results, par) results$beta[[min(iter, 3)]] - some.function.2(min(iter, 3), results, par) # Delete earlier results if (iter 2) { results$gamma[[1]] - NULL results$beta[[1]] - NULL } # Report on memory usage gc(verbose=TRUE) } Output from an actual run of my program: [1] Iteration 1 Garbage collection 255 = 122+60+73 (level 2) ... 6.1 Mbytes of cons cells used (48%) 232.3 Mbytes of vectors used (69%) [1] Iteration 2 Garbage collection 257 = 123+60+74 (level 2) ... 6.1 Mbytes of cons cells used (48%) 238.3 Mbytes of vectors used (67%) [1] Iteration 3 Garbage collection 258 = 123+60+75 (level 2) ... 
6.1 Mbytes of cons cells used (49%) 242.8 Mbytes of vectors used (69%) [1] Iteration 4 Garbage collection 259 = 123+60+76 (level 2) ... 6.2 Mbytes of cons cells used (49%) 247.3 Mbytes of vectors used (66%) [1] Iteration 5 Garbage collection 260 = 123+60+77 (level 2) ... 6.2 Mbytes of cons cells used (50%) 251.8 Mbytes of vectors used (68%) ... Thanks, Rajeev. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
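A small, self-contained illustration of the behaviour described in this thread, under the copy-on-modify rules of the R versions discussed here (the list contents and functions are made up for the example):

results <- list(gamma = list(rnorm(1e6)), beta = list(rnorm(1e6)))
tracemem(results)                   # prints a message whenever 'results' is duplicated

f <- function(iter, res) mean(res$gamma[[1]])   # read-only use of the whole list

for (iter in 1:3) {
  f(iter, results)                  # passing the whole list marks it as referenced...
  results$gamma[[1]] <- rnorm(1e6)  # ...so this modification forces a duplicate
}

# Passing only the pieces a function needs, as described above, avoids that:
g <- function(iter, gamma1) mean(gamma1)
for (iter in 1:3) {
  g(iter, results$gamma[[1]])
  results$gamma[[1]] <- rnorm(1e6)  # no duplication of the whole 'results' list
}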
Re: [R] R Memory Usage Concerns
On Mon, Sep 14, 2009 at 10:01 PM, Henrik Bengtsson h...@stat.berkeley.edu wrote: As already suggested, you're (much) better off if you specify colClasses, e.g. tab <- read.table("~/20090708.tab", colClasses=c("factor", "double", "double")); Otherwise, R has to load all the data, make a best guess of the column classes, and then coerce (which requires a copy). Thanks Henrik, I tried this as well as a variant that another user sent me privately. When I tell R the colClasses, it does a much better job of allocating memory (ending up with 96M of RSS memory, which isn't great but is definitely acceptable). A couple of notes I made from testing some variants, if anyone else is interested: * giving it an nrows argument doesn't help it allocate less memory (just a guess, but maybe because it's trying the powers-of-two allocation strategy in both cases) * there's no difference in memory usage between telling it a column is numeric vs double * when telling it the types in advance, loading the table is much, much faster Maybe if I gather some more fortitude in the future, I'll poke around at the internals, since I'm still curious where the extra memory is going. Is that just the overhead of allocating a full object for each value (i.e. rather than just a double[] or whatever)? -- Evan Klitzke e...@eklitzke.org :wq __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
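A sketch of the kind of comparison being reported here (same file and column types as in this thread): gc() and object.size() give the R-side picture, while top/ps gives the operating system's view:

cc <- c("factor", "double", "double")

system.time(tab1 <- read.table("~/20090708.tab"))                   # column classes guessed
system.time(tab2 <- read.table("~/20090708.tab", colClasses = cc))  # column classes declared

object.size(tab1)   # the final objects are about the same size either way;
object.size(tab2)   # the difference shows up in temporaries and in peak usage
gc()                # the "max used" column reflects that peak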
Re: [R] R Memory Usage Concerns
On Tue, 15 Sep 2009, Evan Klitzke wrote: On Mon, Sep 14, 2009 at 10:01 PM, Henrik Bengtsson h...@stat.berkeley.edu wrote: As already suggested, you're (much) better off if you specify colClasses, e.g. tab - read.table(~/20090708.tab, colClasses=c(factor, double, double)); Otherwise, R has to load all the data, make a best guess of the column classes, and then coerce (which requires a copy). Thanks Henrik, I tried this as well as a variant that another user sent me privately. When I tell R the colClasses, it does a much better job of allocating memory (ending up with 96M of RSS memory, which isn't great but is definitely acceptable). A couple of notes I made from testing some variants, if anyone else is interested: * giving it an nrows argument doesn't help it allocate less memory (just a guess, but maybe because it's trying the powers-of-two allocation strategy in both cases) * there's no difference in memory usage between telling it a column is numeric vs double Because they are the same type * when telling it the types in advance, loading the table is much, much faster Indeed. Maybe if I gather some more fortitude in the future, I'll poke around at the internals and see where the extra memory is going, since I'm still curious where the extra memory is going. Is that just the overhead of allocating a full object for each value (i.e. rather than just a double[] or whatever)? No, because it doesn't allocate a full object for each value, it does just allocate a double[] plus a constant amount of overhead. R doesn't have scalar types so there isn't even such a thing as an object for a single value, just vectors with a single element. R will use more than the object size for the data set, because it makes temporary copies of things. -thomas Thomas Lumley Assoc. Professor, Biostatistics tlum...@u.washington.eduUniversity of Washington, Seattle __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R Memory Usage Concerns
Hello, I do not know whether my package colbycol may help you. It can help you read files that would not have fitted into memory otherwise. Internally, as the name indicates, data is read into R in a column by column fashion. IO times increase but you need just a fraction of intermediate memory to read the files. Best regards, Carlos J. Gil Bellosta http://www.datanalytics.com On Tue, 2009-09-15 at 00:10 -0700, Evan Klitzke wrote: On Mon, Sep 14, 2009 at 10:01 PM, Henrik Bengtsson h...@stat.berkeley.edu wrote: As already suggested, you're (much) better off if you specify colClasses, e.g. tab - read.table(~/20090708.tab, colClasses=c(factor, double, double)); Otherwise, R has to load all the data, make a best guess of the column classes, and then coerce (which requires a copy). Thanks Henrik, I tried this as well as a variant that another user sent me privately. When I tell R the colClasses, it does a much better job of allocating memory (ending up with 96M of RSS memory, which isn't great but is definitely acceptable). A couple of notes I made from testing some variants, if anyone else is interested: * giving it an nrows argument doesn't help it allocate less memory (just a guess, but maybe because it's trying the powers-of-two allocation strategy in both cases) * there's no difference in memory usage between telling it a column is numeric vs double * when telling it the types in advance, loading the table is much, much faster Maybe if I gather some more fortitude in the future, I'll poke around at the internals and see where the extra memory is going, since I'm still curious where the extra memory is going. Is that just the overhead of allocating a full object for each value (i.e. rather than just a double[] or whatever)? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] R Memory Usage Concerns
Hello all, To start with, these measurements are on Linux with R 2.9.2 (64-bit build) and Python 2.6 (also 64-bit). I've been investigating R for some log file analysis that I've been doing. I'm coming at this from the angle of a programmer whose primarily worked in Python. As I've been playing around with R, I've noticed that R seems to use a *lot* of memory, especially compared to Python. Here's an example of what I'm talking about. I have a sample data file whose characteristics are like this: [e...@t500 ~]$ ls -lh 20090708.tab -rw-rw-r-- 1 evan evan 63M 2009-07-08 20:56 20090708.tab [e...@t500 ~]$ head 20090708.tab spice 1247036405.04 0.0141088962555 spice 1247036405.01 0.046797990799 spice 1247036405.13 0.0137498378754 spice 1247036404.87 0.0594480037689 spice 1247036405.02 0.0170919895172 topic 1247036404.74 0.512196063995 user_details 1247036404.64 0.242133140564 spice 1247036405.23 0.0408620834351 biz_details 1247036405.04 0.40732884407 spice 1247036405.35 0.0501029491425 [e...@t500 ~]$ wc -l 20090708.tab 1797601 20090708.tab So it's basically a CSV file (actually, space delimited) where all of the lines are three columns, a low-cardinality string, a double, and a double. The file itself is 63M. Python can load all of the data from the file really compactly (source for the script at the bottom of the message): [e...@t500 ~]$ python code/scratch/pymem.py VIRT = 25230, RSS = 860 VIRT = 81142, RSS = 55825 So this shows that my Python process starts out at 860K RSS memory before doing any processing, and ends at 55M of RSS memory. This is pretty good, actually it's better than the size of the file, since a double can be stored more compactly than the textual data stored in the data file. Since I'm new to R I didn't know how to read /proc and so forth, so instead I launched an R repl and used ps to record the RSS memory usage before and after running the following statement: tab - read.table(~/20090708.tab) The numbers I measured were: VIRT = 176820, RSS = 26180 (just after starting the repl) VIRT = 414284, RSS = 263708 (after executing the command) This kind of concerns me. I can understand why R uses more memory at startup, since it's launching a full repl which my Python script wasn't doing. But I would have expected the memory usage to not have grown more like Python did after loading the data. In fact, R ought to be able to use less memory, since the first column is textual and has low cardinality (I think 7 distinct values), so storing it as a factor should be very memory efficient. For the things that I want to use R for, I know I'll be processing much larger datasets, and at the rate that R is consuming memory it may not be possible to fully load the data into memory. I'm concerned that it may not be worth pursuing learning R if it's possible to load the data into memory using something like Python but not R. I don't want to overlook the possibility that I'm overlooking something, since I'm new to the language. Can anyone answer for me: * What is R doing with all of that memory? * Is there something I did wrong? Is there a more memory-efficient way to load this data? * Are there R modules that can store large data-sets in a more memory-efficient way? Can anyone relate their experiences with them? 
For reference, here's the Python script I used to measure Python's memory usage: import os def show_mem(): statm = open('/proc/%d/statm' % os.getpid()).read() print 'VIRT = %s, RSS = %s' % tuple(statm.split(' ')[:2]) def read_data(fname): servlets = [] timestamps = [] elapsed = [] for line in open(fname, 'r'): s, t, e = line.strip().split(' ') servlets.append(s) timestamps.append(float(t)) elapsed.append(float(e)) show_mem() if __name__ == '__main__': show_mem() read_data('/home/evan/20090708.tab') -- Evan Klitzke e...@eklitzke.org :wq __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R Memory Usage Concerns
When you read your file into R, show the structure of the object: str(tab) also the size of the object: object.size(tab) This will tell you what your data looks like and the size taken in R. Also in read.table, use colClasses to define what the format of the data is; may make it faster. You might want to force a garbage collection 'gc()' to see if that frees up any memory. If your input is about 2M lines and it looks like there are three column (alpha, numeric, numeric), I would guess that you will probably have an object.size of about 50MB. This information would help. On Mon, Sep 14, 2009 at 11:11 PM, Evan Klitzke e...@eklitzke.org wrote: Hello all, To start with, these measurements are on Linux with R 2.9.2 (64-bit build) and Python 2.6 (also 64-bit). I've been investigating R for some log file analysis that I've been doing. I'm coming at this from the angle of a programmer whose primarily worked in Python. As I've been playing around with R, I've noticed that R seems to use a *lot* of memory, especially compared to Python. Here's an example of what I'm talking about. I have a sample data file whose characteristics are like this: [e...@t500 ~]$ ls -lh 20090708.tab -rw-rw-r-- 1 evan evan 63M 2009-07-08 20:56 20090708.tab [e...@t500 ~]$ head 20090708.tab spice 1247036405.04 0.0141088962555 spice 1247036405.01 0.046797990799 spice 1247036405.13 0.0137498378754 spice 1247036404.87 0.0594480037689 spice 1247036405.02 0.0170919895172 topic 1247036404.74 0.512196063995 user_details 1247036404.64 0.242133140564 spice 1247036405.23 0.0408620834351 biz_details 1247036405.04 0.40732884407 spice 1247036405.35 0.0501029491425 [e...@t500 ~]$ wc -l 20090708.tab 1797601 20090708.tab So it's basically a CSV file (actually, space delimited) where all of the lines are three columns, a low-cardinality string, a double, and a double. The file itself is 63M. Python can load all of the data from the file really compactly (source for the script at the bottom of the message): [e...@t500 ~]$ python code/scratch/pymem.py VIRT = 25230, RSS = 860 VIRT = 81142, RSS = 55825 So this shows that my Python process starts out at 860K RSS memory before doing any processing, and ends at 55M of RSS memory. This is pretty good, actually it's better than the size of the file, since a double can be stored more compactly than the textual data stored in the data file. Since I'm new to R I didn't know how to read /proc and so forth, so instead I launched an R repl and used ps to record the RSS memory usage before and after running the following statement: tab - read.table(~/20090708.tab) The numbers I measured were: VIRT = 176820, RSS = 26180 (just after starting the repl) VIRT = 414284, RSS = 263708 (after executing the command) This kind of concerns me. I can understand why R uses more memory at startup, since it's launching a full repl which my Python script wasn't doing. But I would have expected the memory usage to not have grown more like Python did after loading the data. In fact, R ought to be able to use less memory, since the first column is textual and has low cardinality (I think 7 distinct values), so storing it as a factor should be very memory efficient. For the things that I want to use R for, I know I'll be processing much larger datasets, and at the rate that R is consuming memory it may not be possible to fully load the data into memory. I'm concerned that it may not be worth pursuing learning R if it's possible to load the data into memory using something like Python but not R. 
I don't want to rule out the possibility that I'm simply overlooking something, since I'm new to the language. Can anyone answer for me:

* What is R doing with all of that memory?
* Is there something I did wrong? Is there a more memory-efficient way to load this data?
* Are there R modules that can store large data-sets in a more memory-efficient way? Can anyone relate their experiences with them?

For reference, here's the Python script I used to measure Python's memory usage: [...]

-- Evan Klitzke e...@eklitzke.org :wq

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

-- Jim Holtman
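A minimal sketch of the diagnostics suggested above, assuming the space-delimited file described in the post (the column classes here are only a guess at what the file contains):

tab <- read.table("~/20090708.tab",
                  colClasses = c("factor", "numeric", "numeric"))
str(tab)                               # column types and a preview of the values
print(object.size(tab), units = "Mb")  # size of the object itself, in megabytes
gc()                                   # force a garbage collection and report R's heap usage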
Re: [R] R Memory Usage Concerns
And, by the way, factors take up _more_ memory than character vectors:

> object.size(sample(c("a", "b"), 1000, replace=TRUE))
4088 bytes
> object.size(factor(sample(c("a", "b"), 1000, replace=TRUE)))
4296 bytes

On Mon, Sep 14, 2009 at 11:35 PM, jim holtman jholt...@gmail.com wrote:
When you read your file into R, show the structure of the object: str(tab) [...]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R Memory Usage Concerns
On Mon, Sep 14, 2009 at 8:35 PM, jim holtman jholt...@gmail.com wrote:
When you read your file into R, show the structure of the object: ...

Here's the data I get:

> tab <- read.table("~/20090708.tab")
> str(tab)
'data.frame':   1797601 obs. of  3 variables:
 $ V1: Factor w/ 6 levels "biz_details",..: 4 4 4 4 4 5 6 4 1 4 ...
 $ V2: num 1.25e+09 1.25e+09 1.25e+09 1.25e+09 1.25e+09 ...
 $ V3: num 0.0141 0.0468 0.0137 0.0594 0.0171 ...
> object.size(tab)
35953640 bytes
> gc()
          used (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  119580  6.4    1489330  79.6  2380869 127.2
Vcells 6647905 50.8   17367032 132.5 16871956 128.8

Forcing a GC doesn't seem to free up an appreciable amount of memory (the memory usage reported by ps is about the same), but it's encouraging that the output from object.size shows that the object is small.

I am, however, a little bit skeptical of this: 1797601 * (4 + 8 + 8) = 35952020, which is awfully close to 35953640. My assumption is that the first column is mapped to a 32-bit integer, plus two 8-byte numbers for the doubles, plus a little bit of overhead for whatever structs back the objects and for the mapping of servlet name to its 32-bit representation (i.e. the string -> int mapping used by the factor). This estimate seems like it might be too conservative, since it implies that R allocated exactly as much memory for the lists as there were numbers in the list (typically in an interpreter like this you'd be allocating on order-of-two boundaries, i.e. sizeof(obj) 21; this is how Python lists work internally).

Is it possible that R is counting its memory usage naively, e.g. just adding up the size of all of the constituent objects, rather than the amount of space it actually allocated for those objects?

-- Evan Klitzke e...@eklitzke.org :wq

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
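The back-of-the-envelope figure above can be checked per column; a small sketch (the exact totals differ slightly because each R object carries a fixed header):

n <- 1797601
n * 4                 # factor column: one 32-bit integer code per row
n * 8                 # each numeric column: one 8-byte double per row
n * (4 + 8 + 8)       # 35952020, close to the 35953640 bytes reported for the data frame
# the same breakdown can be read directly off the loaded object:
# sapply(tab, object.size)   # per-column sizes, assuming tab has been read in as above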
Re: [R] R Memory Usage Concerns
On Mon, Sep 14, 2009 at 8:58 PM, Eduardo Leoni leoni...@msu.edu wrote:
And, by the way, factors take up _more_ memory than character vectors. [...]

I think this is just because you picked short strings. If the factor is mapping each string to a native integer type, the strings would have to be longer for you to notice:

> object.size(sample(c("a pretty long string", "another pretty long string"), 1000, replace=TRUE))
8184 bytes
> object.size(factor(sample(c("a pretty long string", "another pretty long string"), 1000, replace=TRUE)))
4560 bytes

-- Evan Klitzke e...@eklitzke.org :wq

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R Memory Usage Concerns
I think this is just because you picked short strings. If the factor is mapping the string to a native integer type, the strings would have to be longer for you to notice: [...]

No, it's probably because you have an older version of R, which doesn't have the global string cache. With the cache, identical strings in a character vector share storage, so the character vector no longer pays for each element separately:

> object.size(sample(c("a pretty long string", "another pretty long string"), 1000, replace=TRUE))
4136 bytes
> object.size(factor(sample(c("a pretty long string", "another pretty long string"), 1000, replace=TRUE)))
4344 bytes

Hadley

-- http://had.co.nz/

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R Memory Usage Concerns
its 32-bit representation. This seems like it might be too conservative for me, since it implies that R allocated exactly as much memory for the lists as there were numbers in the list (e.g. typically in an interpreter like this you'd be allocating on order-of-two boundaries, i.e. sizeof(obj) 21; this is how Python lists internally work). This is not how R vectors work. R data structures tend to be immutable, and so are designed somewhat differently to their python equivalents. Hadley -- http://had.co.nz/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
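A small sketch illustrating that point: an R atomic vector is allocated at exactly its requested length plus a fixed header, with no power-of-two padding, and "growing" it actually allocates a new vector and copies rather than extending in place:

object.size(numeric(1000))   # roughly 8000 bytes of data plus a small header
object.size(numeric(2000))   # roughly twice that: no over-allocation
x <- numeric(1000)
x[1001] <- 0                 # extending the vector triggers a fresh allocation and a copy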
Re: [R] R Memory Usage Concerns
As already suggested, you're (much) better off if you specify colClasses, e.g.

tab <- read.table("~/20090708.tab", colClasses=c("factor", "double", "double"));

Otherwise, R has to load all the data, make a best guess at the column classes, and then coerce (which requires a copy).

/Henrik

On Mon, Sep 14, 2009 at 9:26 PM, Evan Klitzke e...@eklitzke.org wrote:
On Mon, Sep 14, 2009 at 8:35 PM, jim holtman jholt...@gmail.com wrote: When you read your file into R, show the structure of the object: ...
Here's the data I get: [...]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
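A rough way to see the difference is simply to time the two calls; a sketch, assuming the same file as above (per ?read.table, supplying nrows as a mild over-estimate also helps memory usage):

system.time(tab1 <- read.table("~/20090708.tab"))      # type guessing plus coercion copies
system.time(tab2 <- read.table("~/20090708.tab",
                               colClasses = c("factor", "double", "double"),
                               nrows = 1800000))        # classes known up front
identical(dim(tab1), dim(tab2))                         # same result either way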
Re: [R] memory usage grows too fast
Thanks for Peter, William, and Hadley's help. Your code is much more concise than mine. :P

Both William's and Hadley's suggestions are the same. Here is their code:

f <- function(dataMatrix) rowMeans(dataMatrix == "02")

And Peter's code is the following:

apply(yourMatrix, 1, function(x) length(x[x == yourPattern])) / ncol(yourMatrix)

In terms of running time, the first one ran faster than the latter on my dataset (2.5 mins vs. 6.4 mins). The memory consumption of the first one, however, is much higher than the latter (8G vs. ~3G).

Any thoughts? My guess is that rowMeans created extra copies to perform its calculation, but I am not so sure. I am also interested in understanding ways to handle memory issues. Hope someone can shed light on this for me. :)

Best,
Mike

-Original Message-
From: Peter Alspach [mailto:palsp...@hortresearch.co.nz]
Sent: Thursday, May 14, 2009 4:47 PM
To: Ping-Hsun Hsieh
Subject: RE: [R] memory usage grows too fast

Tena koe Mike

If I understand you correctly, you should be able to use something like:

apply(yourMatrix, 1, function(x) length(x[x == yourPattern])) / ncol(yourMatrix)

I see you've divided by nrow(yourMatrix), so perhaps I am missing something.

HTH ...

Peter Alspach

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Ping-Hsun Hsieh
Sent: Friday, 15 May 2009 11:22 a.m.
To: r-help@r-project.org
Subject: [R] memory usage grows too fast

Hi All,

I have a 1000x100 matrix. The calculation I would like to do is actually very simple: for each row, calculate the frequency of a given pattern. [...]

The contents of this e-mail are confidential and may be ...{{dropped:14}}

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] memory usage grows too fast
Hi William,

Thanks for the comments and explanation. It is really good to know the details of rowMeans. I did modify Peter's code from length(x[x=="02"]) to sum(x=="02"), though it improved things by only a few seconds. :)

Best,
Mike

-Original Message-
From: William Dunlap [mailto:wdun...@tibco.com]
Sent: Friday, May 15, 2009 10:09 AM
To: Ping-Hsun Hsieh
Subject: RE: [R] memory usage grows too fast

rowMeans(dataMatrix == "02") must
 (a) make a logical matrix the dimensions of dataMatrix in which to put the result of dataMatrix == "02" (4 bytes/logical element), and
 (b) make a double precision matrix (8 bytes/element) the size of that logical matrix, because rowMeans uses some C code that only works on doubles.

apply(dataMatrix, 1, function(x) length(x[x == "02"]) / ncol(dataMatrix))

never has to make any copies of the entire matrix. It extracts a row at a time, and when it is done with the row, the memory used for working on the row is available for other uses. Note that it would probably be a tad faster if it were changed to

apply(dataMatrix, 1, function(x) sum(x == "02")) / ncol(dataMatrix)

as sum(logicalVector) is the same as length(x[logicalVector]) and there is no need to compute ncol(dataMatrix) more than once.

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com

-Original Message-
From: Ping-Hsun Hsieh [mailto:hsi...@ohsu.edu]
Sent: Friday, May 15, 2009 9:58 AM
To: Peter Alspach; William Dunlap; hadley wickham
Cc: r-help@r-project.org
Subject: RE: [R] memory usage grows too fast

Thanks for Peter, William, and Hadley's help. Your code is much more concise than mine. :P [...]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
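One way to get most of rowMeans()'s speed without materialising a full logical and double copy of the whole matrix is to process the rows in blocks, which bounds the size of the temporaries. This is a sketch of that idea, not something proposed in the thread; the block size and the "02" pattern are only illustrative:

occurrence_rate <- function(dataMatrix, pattern = "02", block = 10000) {
  n <- nrow(dataMatrix)
  out <- numeric(n)
  for (s in seq(1, n, by = block)) {
    idx <- s:min(s + block - 1, n)
    # only a block-sized logical matrix is ever allocated at once
    out[idx] <- rowMeans(dataMatrix[idx, , drop = FALSE] == pattern)
  }
  out
}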
[R] memory usage grows too fast
Hi All,

I have a 1000x100 matrix. The calculation I would like to do is actually very simple: for each row, calculate the frequency of a given pattern. For example, a toy dataset is as follows.

Col1 Col2 Col3 Col4
01   02   02   00   => Freq of "02" is 0.5
02   02   02   01   => Freq of "02" is 0.75
00   02   01   01
...

My code to find the pattern "02" is quite simple:

OccurrenceRate_Fun <- function(dataMatrix)
{
  tmp <- NULL
  tmpMatrix <- apply(dataMatrix, 1, match, "02")
  for (i in 1:ncol(tmpMatrix))
  {
    tmpRate <- table(tmpMatrix[, i])[[1]] / nrow(tmpMatrix)
    tmp <- c(tmp, tmpRate)
  }
  rm(tmpMatrix)
  rm(tmpRate)
  return(tmp)
  gc()
}

The problem is that the memory usage grows very fast and is hard to handle on machines with less RAM. Could anyone please give me some comments on how to reduce the space complexity of this calculation?

Thanks,
Mike

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] memory usage grows too fast
On Thu, May 14, 2009 at 6:21 PM, Ping-Hsun Hsieh hsi...@ohsu.edu wrote:

Hi All, I have a 1000x100 matrix. The calculation I would like to do is actually very simple: for each row, calculate the frequency of a given pattern. [...] The problem is the memory usage grows very fast and is hard to handle on machines with less RAM. Could anyone please give me some comments on how to reduce the space complexity of this calculation?

rowMeans(dataMatrix == "02") ?

Hadley

-- http://had.co.nz/

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
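On the toy data from the original post, that one-liner gives exactly the frequencies asked for; a small check with the matrix typed in by hand:

m <- matrix(c("01", "02", "02", "00",
              "02", "02", "02", "01",
              "00", "02", "01", "01"),
            nrow = 3, byrow = TRUE)
rowMeans(m == "02")   # 0.50 0.75 0.25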
[R] R memory usage and size limits
I have a general question about R's usage or memory and what limits exist on the size of datasets it can deal with. My understanding was that all object in a session are held in memory. This implies that you're limited in the size of datasets that you can process by the amount of memory you've got access to (be it physical or paging). Is this true? Or does R store objects on disk and page them in as parts are needed in the way that SAS does? Are there 64 bit versions of R that can therefore deal with much larger objects? Many thanks. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R memory usage and size limits
Please read ?Memory-limits and the R-admin manual for basic information. On Thu, 5 Feb 2009, Tom Quarendon wrote: I have a general question about R's usage or memory and what limits exist on the size of datasets it can deal with. My understanding was that all object in a session are held in memory. This implies that you're limited in the size of datasets that you can process by the amount of memory you've got access to (be it physical or paging). Is this true? Or does R store objects on disk and page them in as parts are needed in the way that SAS does? That's rather a false dichotomy: paging uses the disk, so the distinction is if R implemented its own virtual memory system or uses the OS's one (the latter). There are also interfaces to DBMSs for use with large datasets: see the R-data manual and also look at the package list in the FAQ. Are there 64 bit versions of R that can therefore deal with much larger objects? Yes, there have been 64-bit versions of R for many years, and they are in routine use on very large problems. Many thanks. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Brian D. Ripley, rip...@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
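The DBMS route mentioned above can look roughly like the following, using DBI with the RSQLite backend; the file, table, and column names here are made up for illustration:

library(RSQLite)                       # loads DBI as well
con <- dbConnect(SQLite(), dbname = "bigdata.sqlite")
# load the flat file into the database once (this step could also be done in chunks)
dbWriteTable(con, "big", read.csv("mydata.csv"))
# afterwards, pull in only the rows and columns actually needed for each analysis,
# letting the database do the filtering instead of holding everything in R's memory
subset_df <- dbGetQuery(con, "SELECT * FROM big WHERE value > 100")
dbDisconnect(con)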
[R] Problems with R memory usage on Linux
Hello all,

I'm working with a large data-set, and upgraded my RAM to 4GB to help with the memory use. I've got a 32-bit kernel with 64GB memory support compiled in. gnome-system-monitor and free both show the full 4GB as being available.

In R I was doing some processing and I got the following message (when collecting 100 307200*8 dataframes into a single data-frame, for plotting):

Error: cannot allocate vector of size 2.3 Mb

So I checked the R memory usage:

$ ps -C R -o size
   SZ
3102548

I tried removing some objects and running gc(). R then shows much less memory being used:

$ ps -C R -o size
   SZ
2732124

which should give me an extra 300MB in R. I still get the same error about R being unable to allocate another 2.3MB, even though I deleted well over 2.3MB of objects...

Any suggestions as to how to get around this? Is the only way to use all 4GB in R to use a 64-bit kernel?

Thanks all,
B. Bogart

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Problems with R memory usage on Linux
See ?Memory-size On Wed, 15 Oct 2008, B. Bogart wrote: Hello all, I'm working with a large data-set, and upgraded my RAM to 4GB to help with the mem use. I've got a 32bit kernel with 64GB memory support compiled in. gnome-system-monitor and free both show the full 4GB as being available. In R I was doing some processing and I got the following message (when collecting 100 307200*8 dataframes into a single data-frame (for plotting): Error: cannot allocate vector of size 2.3 Mb So I checked the R memory usage: $ ps -C R -o size SZ 3102548 I tried removing some objects and running gc() R then shows much less memory being used: $ ps -C R -o size SZ 2732124 Which should give me an extra 300MB in R. I still get the same error about R being unable to allocate another 2.3MB. I deleted well over 2.3MB of objects... Any suggestions as to get around this? Is the only way to use all 4GB in R to use a 64bit kernel? Thanks all, B. Bogart __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Problems with R memory usage on Linux
Doesn't work.

\misiek

Prof Brian Ripley wrote:
See ?Memory-size
On Wed, 15 Oct 2008, B. Bogart wrote: [...]

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Problems with R memory usage on Linux
Or ?Memory-limits (and the posting guide of course). On Wed, 15 Oct 2008, Prof Brian Ripley wrote: See ?Memory-size On Wed, 15 Oct 2008, B. Bogart wrote: Hello all, I'm working with a large data-set, and upgraded my RAM to 4GB to help with the mem use. I've got a 32bit kernel with 64GB memory support compiled in. gnome-system-monitor and free both show the full 4GB as being available. In R I was doing some processing and I got the following message (when collecting 100 307200*8 dataframes into a single data-frame (for plotting): Error: cannot allocate vector of size 2.3 Mb So I checked the R memory usage: $ ps -C R -o size SZ 3102548 I tried removing some objects and running gc() R then shows much less memory being used: $ ps -C R -o size SZ 2732124 Which should give me an extra 300MB in R. I still get the same error about R being unable to allocate another 2.3MB. I deleted well over 2.3MB of objects... Any suggestions as to get around this? Is the only way to use all 4GB in R to use a 64bit kernel? Thanks all, B. Bogart __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] reducing memory usage WAS: Problems with R memory usage on Linux
Hello,

I have read the R memory pages. I realized after my post that I would not have enough memory to accomplish this task.

The command I'm using to convert the list into a data-frame is as such:

som <- do.call(rbind, somlist)

where som is the dataframe resulting from combining all the dataframes in somlist.

Is there a way I can remove each item from the list and gc() once it has been collected into the som data frame? That way the memory usage should stay about the same, rather than doubling or tripling.

Any other suggestions on reducing memory usage? (I'm already running blackbox and a single terminal to do the job.)

I do have enough memory to store the somlist twice over, but the do.call bails before it's done, so I suppose it uses a workspace, meaning I need more than 2x the space of the somlist to collect it? Is there another function that does the same thing but only uses 2x the size of somlist in memory?

Thanks for your help,

Prof Brian Ripley wrote:
Or ?Memory-limits (and the posting guide of course).
On Wed, 15 Oct 2008, Prof Brian Ripley wrote:
See ?Memory-size
On Wed, 15 Oct 2008, B. Bogart wrote: [...]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
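One way to do roughly what is asked above, peeling elements off the list as they are consumed so their memory becomes collectable, is sketched below. This is not an answer given in the thread, only an illustration under the assumption that somlist holds the data frames; it trades speed for memory, since repeated rbind() is slow, and pre-allocating the full-size result and filling it block by block is the other common approach:

som <- somlist[[1]]
somlist[[1]] <- NULL                 # drop the element so its memory can be reclaimed
while (length(somlist) > 0) {
  som <- rbind(som, somlist[[1]])    # grow the result by one piece
  somlist[[1]] <- NULL               # release the piece just consumed
  gc()                               # optional: encourage R to return freed pages sooner
}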
[R] Memory usage
Hello,

I have to aggregate a data.frame of 16MB (object size). After some minutes I get the error message "cannot allocate vector of size 64.5MB". My computer has 4GB of physical memory under Windows Vista. I have tested the same command on another computer with the same OS and 2GB RAM: in nearly 2 sec I get the result without problems.

Thanks

buch <- read.delim("Y2006_1.csv", sep=";", as.is=TRUE, header=TRUE, dec=",")
ana01 <- aggregate(buch[, c("VALUELW", "LZLW", "SZLW")],
                   by=data.frame(buch$PRODGRP, buch$LAND1, buch$KUNDE1), sum)

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
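No reply appears in this thread, but for plain group sums a lighter-weight alternative to aggregate() is rowsum(), which does the per-group summing in C with fewer intermediate copies. A sketch, assuming the column names from the post:

grp <- interaction(buch$PRODGRP, buch$LAND1, buch$KUNDE1, drop = TRUE)
ana01 <- rowsum(buch[, c("VALUELW", "LZLW", "SZLW")], group = grp)
# each row of ana01 is one PRODGRP x LAND1 x KUNDE1 combination; the group label
# ends up in the row names rather than in separate columns as aggregate() would give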