Re: [R] Memory usage in read.csv()

2010-01-20 Thread nabble . 30 . miller_2555
Hi Jim & Gabor -

   Apparently, it was most likely a hardware issue (shortly after
sending my last e-mail, the computer promptly died). After buying a
new system and restoring, the script runs fine. Thanks for your help!

On Tue, Jan 19, 2010 at 2:02 PM, jim holtman <jholt...@gmail.com> wrote:
 I read vmstat data in just fine without any problems.  Here is an
 example of how I do it:

 VMstat <- read.table('vmstat.txt', header=TRUE, as.is=TRUE)

 vmstat.txt looks like this:

 date time r b w swap free re mf pi po fr de sr intr syscalls cs user sys id
 07/27/05 00:13:06 0 0 0 27755440 13051648 20 86 0 0 0 0 0 456 2918 1323 0 1 99
 07/27/05 00:13:36 0 0 0 27755280 13051480 11 53 0 0 0 0 0 399 1722 1411 0 1 99
 07/27/05 00:14:06 0 0 0 27753952 13051248 18 88 0 0 0 0 0 424 1259 1254 0 1 99
 07/27/05 00:14:36 0 0 0 27755304 13051496 17 85 0 0 0 0 0 430 1029 1246 0 1 99
 07/27/05 00:15:06 0 0 0 27755064 13051232 41 278 0 1 1 0 0 452 2047 1386 0 1 99
 07/27/05 00:15:36 0 0 0 27753824 13040720 125 1039 0 0 0 0 0 664 4097 1901 3 2 95
 07/27/05 00:16:06 0 0 0 27754472 13027000 15 91 0 0 0 0 0 432 1160 1273 0 1 99
 07/27/05 00:16:36 0 0 0 27754568 13027104 17 85 0 0 0 0 0 416 1058 1271 0 1 99

 Have you tried a smaller portion of data?

 Here is what it took to read in a file with 85K lines:

 > system.time(vmstat <- read.table('c:/vmstat.txt', header=TRUE))
   user  system elapsed
   2.01    0.01    2.03
 > str(vmstat)
 'data.frame':   85680 obs. of  20 variables:
  $ date    : Factor w/ 2 levels "07/27/05","07/28/05": 1 1 1 1 1 1 1 1 1 1 ...
  $ time    : Factor w/ 2856 levels "00:00:26","00:00:56",..: 27 29 31 33 35 37 39 41 43 45 ...
  $ r       : int  0 0 0 0 0 0 0 0 0 0 ...
  $ b       : int  0 0 0 0 0 0 0 0 0 0 ...
  $ w       : int  0 0 0 0 0 0 0 0 0 0 ...
  $ swap    : int  27755440 27755280 27753952 27755304 27755064 27753824 27754472 27754568 27754560 27754704 ...
  $ free    : int  13051648 13051480 13051248 13051496 13051232 13040720 13027000 13027104 13027096 13027240 ...
  $ re      : int  20 11 18 17 41 125 15 17 13 12 ...
  $ mf      : int  86 53 88 85 278 1039 91 85 69 51 ...
  $ pi      : int  0 0 0 0 0 0 0 0 0 0 ...
  $ po      : int  0 0 0 0 1 0 0 0 0 1 ...
  $ fr      : int  0 0 0 0 1 0 0 0 0 1 ...
  $ de      : int  0 0 0 0 0 0 0 0 0 0 ...
  $ sr      : int  0 0 0 0 0 0 0 0 0 0 ...
  $ intr    : int  456 399 424 430 452 664 432 416 425 432 ...
  $ syscalls: int  2918 1722 1259 1029 2047 4097 1160 1058 1198 1727 ...
  $ cs      : int  1323 1411 1254 1246 1386 1901 1273 1271 1268 1477 ...
  $ user    : int  0 0 0 0 0 3 0 0 0 0 ...
  $ sys     : int  1 1 1 1 1 2 1 1 1 1 ...
  $ id      : int  99 99 99 99 99 95 99 99 99 99 ...




[R] Memory usage in read.csv()

2010-01-19 Thread nabble . 30 . miller_2555
I'm sure this has gotten some attention before, but I have two CSV
files generated from vmstat and free that are roughly 6-8 Mb (about
80,000 lines) each. When I try to use read.csv(), R allocates all
available memory (about 4.9 Gb) when loading the files, which is over
300 times the size of the raw data.  Here are the scripts used to
generate the CSV files as well as the R code:

Scripts (run for roughly a 24-hour period):
vmstat -ant 1 | awk '$0 !~ /(proc|free)/ {FS=" "; OFS=","; print strftime("%F %T %Z"),$6,$7,$12,$13,$14,$15,$16,$17;}' > ~/vmstat_20100118_133845.o;
free -ms 1 | awk '$0 ~ /Mem\:/ {FS=" "; OFS=","; print strftime("%F %T %Z"),$2,$3,$4,$5,$6,$7}' > ~/memfree_20100118_140845.o;

R code:
infile.vms <- "~/vmstat_20100118_133845.o";
infile.mem <- "~/memfree_20100118_140845.o";
vms.colnames <- c("time","r","b","swpd","free","inact","active","si","so","bi","bo","in","cs","us","sy","id","wa","st");
vms.colclass <- c("character",rep("integer",length(vms.colnames)-1));
mem.colnames <- c("time","total","used","free","shared","buffers","cached");
mem.colclass <- c("character",rep("integer",length(mem.colnames)-1));
vmsdf <- (read.csv(infile.vms,header=FALSE,colClasses=vms.colclass,col.names=vms.colnames));
memdf <- (read.csv(infile.mem,header=FALSE,colClasses=mem.colclass,col.names=mem.colnames));

I am running R v2.10.0 on a 64-bit machine with Fedora 10 (Linux
version 2.6.27.41-170.2.117.fc10.x86_64) with 6 Gb of memory. There
are no other significant programs running, and `rm()` followed by
`gc()` successfully frees the memory (followed by swap-ins once other
programs seek to use previously cached information swapped to disk).
I've incorporated the memory-saving suggestions in the `read.csv()`
manual page, excluding the limit on the lines read (which shouldn't
really be necessary here since we're only talking about < 20 Mb of raw
data). Any suggestions, or is the read.csv() code known to have memory
leak / overcommit issues?
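
For comparison, here is a minimal sketch of the same kind of read on a synthetic file, so the in-memory size can be checked against the size on disk. The data, the temporary file and the choice of `nrows` and `comment.char` are assumptions for illustration, not the poster's actual files or settings.

```r
# Build a synthetic 80,000-line CSV shaped like the memfree log (made-up data).
n <- 80000
tmp <- tempfile(fileext = ".csv")
fake <- data.frame(
  time    = format(as.POSIXct("2010-01-18 14:08:45", tz = "UTC") + seq_len(n)),
  total   = 6000L,
  used    = sample(1000:5000, n, replace = TRUE),
  free    = sample(1000:5000, n, replace = TRUE),
  shared  = 0L,
  buffers = sample(0:500, n, replace = TRUE),
  cached  = sample(0:2000, n, replace = TRUE)
)
write.table(fake, tmp, sep = ",", row.names = FALSE, col.names = FALSE,
            quote = FALSE)

mem.colnames <- c("time", "total", "used", "free", "shared", "buffers", "cached")
mem.colclass <- c("character", rep("integer", length(mem.colnames) - 1))

# colClasses, nrows and comment.char = "" are the memory hints listed on the
# read.table help page.
memdf <- read.csv(tmp, header = FALSE,
                  colClasses = mem.colclass, col.names = mem.colnames,
                  nrows = n, comment.char = "")

print(dim(memdf))
print(object.size(memdf), units = "Mb")  # size of the data frame in memory
print(file.info(tmp)$size / 2^20)        # size of the file on disk, in Mb
```

If the two sizes are of the same order on a healthy machine, the read itself is unlikely to be the cause of multi-gigabyte allocations.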

Thanks

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] [Lattice] panel.levelplot - shrink argument to highlight absolute z-values

2010-01-09 Thread nabble . 30 . miller_2555
Hi -

I have a levelplot with positive and negative z-values. I'd like
to scale the levelplot rectangles proportional to the *absolute*
z-values to highlight the z-value extremes (while retaining the color
difference to track the positive/negative attribute). I've likely
missed something in the documentation, but have been at it a couple
days without finding a solution (other than patching the
panel.levelplot routine). Is there a way to do this? If not, can we
add the functionality?

By way of a very simple example, I'd like to make the deep-colored
cells bigger in the following example (please note that I am using the
`expression` version of the levelplot() function with a data frame
instead of the `matrix` version in my particular application).

levelplot(matrix(runif(21),nrow=7,ncol=3)-0.5,
          panel=function(x,y,z,subscript,...) {
            panel.levelplot(x,y,z,shrink=c(0,2),...)})
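
One possible approach, sketched under assumptions, is a custom panel function that draws each cell itself and sizes it by |z| while taking the fill color from the signed z. The name `panel.absshrink`, the synthetic data frame and the fallback defaults for `at` and `col.regions` are hypothetical; this is not the built-in `shrink` behavior of `panel.levelplot`.

```r
library(lattice)

set.seed(1)
grid <- expand.grid(x = 1:7, y = 1:3)
grid$z <- runif(nrow(grid)) - 0.5

# Hypothetical panel function: cell side length is proportional to |z|,
# while the fill color still follows the signed z value.
panel.absshrink <- function(x, y, z, subscripts,
                            at = pretty(z),
                            col.regions = trellis.par.get("regions")$col,
                            ...) {
  x <- as.numeric(x)[subscripts]
  y <- as.numeric(y)[subscripts]
  z <- as.numeric(z)[subscripts]
  size <- abs(z) / max(abs(z))  # 0..1 of a full cell
  panel.rect(x = x, y = y, width = size, height = size,
             col = level.colors(z, at = at, col.regions = col.regions),
             border = "transparent")
}

p <- levelplot(z ~ x * y, data = grid, panel = panel.absshrink)
print(p)
```

The scaling here is relative to the largest |z| in the panel; a fixed reference value could be substituted if panels need to be comparable.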


Thanks!



Re: [R] Feature request for as.Date() function

2009-11-26 Thread nabble . 30 . miller_2555
On Thu, Nov 26, 2009 at 12:08 AM, jim holtman <jholt...@gmail.com> wrote:
 An easy way is just to write your own function that will accept "NA",
 convert it to NA and then call as.Date.


I have written such a function, which has provided the temporary
workaround mentioned. ( I am not that lazy yet :-) )
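
A wrapper of the kind discussed could look like the following sketch. The name `as_date_na` and the `na.strings` default are hypothetical; the poster's actual function is not shown in the thread.

```r
# Hypothetical helper (not part of base R): turn "NA"-like strings into
# real missing values, then hand the rest to as.Date().
as_date_na <- function(x, na.strings = c("NA", ""), ...) {
  x <- as.character(x)
  x[x %in% na.strings] <- NA_character_
  as.Date(x, ...)
}

dates <- as_date_na(c("2009-11-25", "NA", "2009-11-26"))
print(dates)
```

Extra arguments such as `format` pass through to `as.Date()` unchanged, so existing calls only need the function name swapped.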

 R is a functional language, so write some functions.  Don't try to
 overload existing functions with new options that may break a lot of
 existing code.  If you have special requirements, then adapt your code
 to them.  You would probably have to wait around for a long time
 before a new option got in, so it is easier to create your own.

I do not mind waiting for the additional functionality (and it is no
longer an immediate need given the workaround). I was attempting to
contribute to the continued enhancement of an open source project.
Since the as.Date() function already defines standard unambiguous
formats, and since "NA" (and "NaN", "Inf", etc.) are not ambiguous
within the transform to their numeric counterparts, it stands to
reason that this is logical behaviour for this function.

I also doubt this enhancement would break moderate-to-well-designed code since:
 (1) Existing code would enact a stop() condition based on the
current implementation, forcing error-handling, if any.
 (2) Converting "NA" (and "NaN", "Inf", etc.) is not ambiguous.
Coders feeding such strings should expect their numeric counterparts.
In all likelihood, coders would convert these strings manually in
error-handling code anyway.

I have my solution, but wanted to better the project for use by other
community members. The R Core Development Team is welcome to accept or
ignore the suggestion. I do appreciate the time to discuss this topic,
but will consider the matter closed for my part.

Thanks.



[R] Feature request for as.Date() function

2009-11-25 Thread nabble . 30 . miller_2555
Hello -

I have a csv file with a few date columns. Some of the records have an
"NA" character string instead of the date. When I attempt to use
read.csv() and typecast the columns using colClasses, I receive the
following error:
Error in charToDate(x) :
  character string is not in a standard unambiguous format

Similarly, the following command produces the same error:
as.Date("NA")

However, as.Date(NA) performs as documented.

Can we enhance the as.Date() function to convert "NA" strings into an NA
value prior to type conversion?
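
The distinction between the character string "NA" and the missing value NA can be reproduced with a short sketch. The variable names are illustrative only, and the error is captured with `tryCatch()` so that the script keeps running.

```r
# The string "NA" is ordinary text, so as.Date() tries to parse it as a date.
res_string <- tryCatch(as.Date("NA"),
                       error = function(e) conditionMessage(e))

# The missing value NA is converted to a missing Date without complaint.
res_na <- as.Date(NA)

print(res_string)
print(res_na)
```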

Thanks!



Re: [R] Feature request for as.Date() function

2009-11-25 Thread nabble . 30 . miller_2555
On Wed, Nov 25, 2009 at 2:56 PM, jim holtman <jholt...@gmail.com> wrote:
 Seems to work fine in my testing:

 Similarly, the following command produces the same error:
    as.Date("NA")

 However, as.Date(NA) performs as documented.

 Can we enhance the as.Date() function to convert "NA" strings into NA
 value prior to type conversion?

I sincerely appreciate the help, but with all due respect, I have read
the posting guide and did provide the minimal code necessary to
reproduce the desired feature. To reiterate, I would like to be able
to feed the character string "NA" to the as.Date() function to yield
the same result as `as.Date(NA)`. Please advise if testing the
following does not yield an error:
 as.Date("NA");

This may or may not aid the read.csv() error message in my particular
code (for which a workaround has already been identified).

Thank you.



Re: [R] Feature request for as.Date() function

2009-11-25 Thread nabble . 30 . miller_2555
An easy way is just to write your own function that will accept "NA",
convert it to NA and then call as.Date.

R is a functional language, so write some functions.  Don't try to
overload existing functions with new options that may break a lot of
existing code.  If you have special requirements, then adapt your code
to them.  You would probably have to wait around for a long time
before a new option got in, so it is easier to create your own.








-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



Re: [R] Applying do.call to a data.frame using function arguments

2009-08-26 Thread nabble . 30 . miller_2555
On Wed, Aug 26, 2009 at 12:47 PM, hadley wickham -
 I think you're missing some quotes:
 cat(do.call(paste,c(x2,sep=','))[1], "\n")

Thanks - the strings are actually substrings of larger strings
(specifically, SQL statements), which will wrap with the leading and
trailing quotes (though I should have pointed this out in my original
post).
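
For readers without the original post, the idiom under discussion can be sketched as follows. The contents of `x2` are an assumption, since the thread does not show the actual data frame.

```r
# Hypothetical stand-in for the poster's data frame.
x2 <- data.frame(id = 1:3, name = c("a", "b", "c"),
                 stringsAsFactors = FALSE)

# A data frame is a list of columns, so c(x2, sep = ",") yields the argument
# list for paste(): one argument per column plus the separator.
rows <- do.call(paste, c(x2, sep = ","))

cat(rows[1], "\n")
```

Each element of `rows` is one row of the data frame collapsed into a single comma-separated string.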



[R] Accessing list object from within a function as a list element

2009-07-20 Thread nabble . 30 . miller_2555
Hi -

I have a list (call it 'mylist') with the following elements: (i) a
function (call it 'myfunc' and expressed as 'mylist$myfunc') and (ii)
a variable (call it 'myvar' and expressed as 'mylist$myvar'). Since I
use mylist as a pseudo-class (I assign mylist to multiple different R
objects), I would like to access the mylist R object from within the
'myfunc' function to use 'myvar.' Here is a simple example:
mylist <- list();
mylist$myvar <- "~/file.out";
mylist$myfunc <- function (mymsg="hello world") {
cat(mymsg,file=mylist$myvar); };

If I perform the following:
myclassobj_1 <- myclassobj_2 <- myclassobj_3 <- mylist;
myclassobj_1$myvar <- "~/file_1.out";
myclassobj_2$myvar <- "~/file_2.out";
myclassobj_3$myvar <- "~/file_3.out";

I cannot use myclassobj_1$myfunc() as it will place "hello world" into
~/file.out instead of ~/file_1.out.

Unless I want to make the myclassobj_? lists an argument to the
'myfunc' function, I need an object reference to the appropriate
'mylist.' For instance:
mylist$myfunc <- function (mymsg="hello world") { objref <-
<function returning container list>; cat(mymsg,file=objref$myvar); };

I had some success with the sys.call() function, though the
implementation is sloppy and inconsistent as it relies on the
environment frame in which the call is made as opposed to an attribute
of the 'myfunc' function.

Any suggestions as to this specific issue or preferred implementation
of user-defined classes in R?
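
One common way to get such a self-reference, sketched here under assumptions, is to build each object as an environment through a constructor, so that the function is a closure over the object it belongs to. The constructor name `new_logger` and the use of temporary files are hypothetical.

```r
# Hypothetical constructor: every call creates a fresh environment that holds
# both the variable and the function, so myfunc() always sees its own myvar.
new_logger <- function(myvar) {
  force(myvar)
  self <- environment()
  myfunc <- function(mymsg = "hello world") {
    cat(mymsg, "\n", sep = "", file = self$myvar, append = TRUE)
    invisible(self)
  }
  self
}

obj1 <- new_logger(tempfile())
obj2 <- new_logger(tempfile())

obj1$myfunc("to file 1")
obj2$myfunc()

print(readLines(obj1$myvar))
print(readLines(obj2$myvar))
```

Unlike lists, environments are not copied on assignment, so `obj3 <- obj1` would make a second name for the same object rather than an independent copy.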

Thanks!



[R] Passing arguments to forked children

2009-07-11 Thread nabble . 30 . miller_2555
Hi -

I have attempted to use the fork::fork() function to perform
parallel processing. However, the child R function called needs to
know a given set of parameters to complete its task. Specifically, I
iterate through a vector, and output values based on the elements of
that vector to a database. The output strings contain elements of the
iterated vector. I mocked-up the following code as an example (NOTE:
WHILE NOT SPECIFICALLY DANGEROUS, THIS CODE MAXED OUT THE LIMIT OF MY
SYSTEM'S FORKS -- this means that, if you run this code, no additional
processes on your system may start until you kill the parent R
session! BE VERY CAREFUL IF YOU DECIDE TO EXECUTE THIS CODE --
obviously, I do not recommend it. Presumably, an infinite recursion
scenario arose, so you are just left with a *lot* of R sessions. Also,
as each R session has equal access to stdin, you cannot reliably type
commands into a given R session to terminate it -- so definitely don't
run in a CLI environment -- at least you can kill the parent window
running R in a GUI environment). In any case, here is the code:

# -- BEGIN CODE
library(fork);

myforksub <- function(mymsg='default') {
    cat(mymsg,sep='\n');
    exit();
}

myforkparent <- function(n=10, mymsg='') {
    mypid <- c();
    for (i in 1:n) {
        mypid <- c(mypid, fork(myforksub(mymsg)));
    }
    # wait(NULL) apparently does not wait for all children to finish
    for (i in 1:n) {
        wait(mypid[i]);
    }
}

myforkparent(mymsg='new');

# -- END CODE

Obviously, 'fork(myforksub)' will work fine, but myforksub cannot
access the mymsg variable containing the 'new' value. How can I amend
the above without having to resort to socket connections to pass
information?
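
As a point of comparison only, the sketch below passes a per-child argument using the base `parallel` package rather than the `fork` package from the post. It assumes a Unix-like system, since `mcparallel()` relies on forking, and the messages passed in are made up.

```r
library(parallel)

# The child returns its result instead of printing it, so the parent can
# collect the values once the children have finished.
myforksub <- function(mymsg = "default") {
  paste("child", Sys.getpid(), "got:", mymsg)
}

myforkparent <- function(msgs) {
  # One forked child per message; each closure carries its own value of m.
  jobs <- lapply(msgs, function(m) mcparallel(myforksub(m)))
  # mccollect() waits for the listed jobs and returns their results.
  mccollect(jobs)
}

res <- myforkparent(c("new", "newer", "newest"))
print(unname(unlist(res)))
```

The number of children is bounded by the length of `msgs`, which avoids the runaway forking described in the warning.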

While this question is specific to the 'fork' non-standard package, I
thought a few people here would be familiar enough with its use to
offer a suggestion or two.

Thanks!
