Re: [R] Memory usage in read.csv()
Hi Jim Gabor -

Apparently, it was most likely a hardware issue (shortly after sending my last e-mail, the computer promptly died). After buying a new system and restoring, the script runs fine. Thanks for your help!

On Tue, Jan 19, 2010 at 2:02 PM, jim holtman wrote:
> I read vmstat data in just fine without any problems. Here is an example of how I do it:
>
>     VMstat <- read.table('vmstat.txt', header=TRUE, as.is=TRUE)
>
> vmstat.txt looks like this:
>
>     date time r b w swap free re mf pi po fr de sr intr syscalls cs user sys id
>     07/27/05 00:13:06 0 0 0 27755440 13051648 20 86 0 0 0 0 0 456 2918 1323 0 1 99
>     07/27/05 00:13:36 0 0 0 27755280 13051480 11 53 0 0 0 0 0 399 1722 1411 0 1 99
>     07/27/05 00:14:06 0 0 0 27753952 13051248 18 88 0 0 0 0 0 424 1259 1254 0 1 99
>     07/27/05 00:14:36 0 0 0 27755304 13051496 17 85 0 0 0 0 0 430 1029 1246 0 1 99
>     07/27/05 00:15:06 0 0 0 27755064 13051232 41 278 0 1 1 0 0 452 2047 1386 0 1 99
>     07/27/05 00:15:36 0 0 0 27753824 13040720 125 1039 0 0 0 0 0 664 4097 1901 3 2 95
>     07/27/05 00:16:06 0 0 0 27754472 13027000 15 91 0 0 0 0 0 432 1160 1273 0 1 99
>     07/27/05 00:16:36 0 0 0 27754568 13027104 17 85 0 0 0 0 0 416 1058 1271 0 1 99
>
> Have you tried a smaller portion of data?
>
> Here is what it took to read in a file with 85K lines:
>
>     system.time(vmstat <- read.table('c:/vmstat.txt', header=TRUE))
>        user  system elapsed
>        2.01    0.01    2.03
>
>     str(vmstat)
>     'data.frame': 85680 obs. of 20 variables:
>      $ date    : Factor w/ 2 levels "07/27/05","07/28/05": 1 1 1 1 1 1 1 1 1 1 ...
>      $ time    : Factor w/ 2856 levels "00:00:26","00:00:56",..: 27 29 31 33 35 37 39 41 43 45 ...
>      $ r       : int 0 0 0 0 0 0 0 0 0 0 ...
>      $ b       : int 0 0 0 0 0 0 0 0 0 0 ...
>      $ w       : int 0 0 0 0 0 0 0 0 0 0 ...
>      $ swap    : int 27755440 27755280 27753952 27755304 27755064 27753824 27754472 27754568 27754560 27754704 ...
>      $ free    : int 13051648 13051480 13051248 13051496 13051232 13040720 13027000 13027104 13027096 13027240 ...
>      $ re      : int 20 11 18 17 41 125 15 17 13 12 ...
>      $ mf      : int 86 53 88 85 278 1039 91 85 69 51 ...
>      $ pi      : int 0 0 0 0 0 0 0 0 0 0 ...
>      $ po      : int 0 0 0 0 1 0 0 0 0 1 ...
>      $ fr      : int 0 0 0 0 1 0 0 0 0 1 ...
>      $ de      : int 0 0 0 0 0 0 0 0 0 0 ...
>      $ sr      : int 0 0 0 0 0 0 0 0 0 0 ...
>      $ intr    : int 456 399 424 430 452 664 432 416 425 432 ...
>      $ syscalls: int 2918 1722 1259 1029 2047 4097 1160 1058 1198 1727 ...
>      $ cs      : int 1323 1411 1254 1246 1386 1901 1273 1271 1268 1477 ...
>      $ user    : int 0 0 0 0 0 3 0 0 0 0 ...
>      $ sys     : int 1 1 1 1 1 2 1 1 1 1 ...
>      $ id      : int 99 99 99 99 99 95 99 99 99 99 ...
>
> On Tue, Jan 19, 2010 at 9:25 AM, nabble.30.miller_2...@spamgourmet.com wrote:
> > I'm sure this has gotten some attention before, but I have two CSV files generated from vmstat and free that are roughly 6-8 MB (about 80,000 lines) each. When I try to use read.csv(), R allocates all available memory (about 4.9 GB) when loading the files, which is over 300 times the size of the raw data. Here are the scripts used to generate the CSV files as well as the R code:
> >
> > Scripts (run for roughly a 24-hour period):
> >
> >     vmstat -ant 1 | awk '$0 !~ /(proc|free)/ {FS=" "; OFS=","; print strftime("%F %T %Z"),$6,$7,$12,$13,$14,$15,$16,$17;}' > ~/vmstat_20100118_133845.o;
> >     free -ms 1 | awk '$0 ~ /Mem\:/ {FS=" "; OFS=","; print strftime("%F %T %Z"),$2,$3,$4,$5,$6,$7}' > ~/memfree_20100118_140845.o;
> >
> > R code:
> >
> >     infile.vms <- "~/vmstat_20100118_133845.o";
> >     infile.mem <- "~/memfree_20100118_140845.o";
> >     vms.colnames <- c("time","r","b","swpd","free","inact","active","si","so","bi","bo","in","cs","us","sy","id","wa","st");
> >     vms.colclass <- c("character",rep("integer",length(vms.colnames)-1));
> >     mem.colnames <- c("time","total","used","free","shared","buffers","cached");
> >     mem.colclass <- c("character",rep("integer",length(mem.colnames)-1));
> >     vmsdf <- (read.csv(infile.vms,header=FALSE,colClasses=vms.colclass,col.names=vms.colnames));
> >     memdf <- (read.csv(infile.mem,header=FALSE,colClasses=mem.colclass,col.names=mem.colnames));
> >
> > I am running R v2.10.0 on a 64-bit machine with Fedora 10 (Linux version 2.6.27.41-170.2.117.fc10.x86_64) with 6 GB of memory. There are no other significant programs running, and `rm()` followed by `gc()` successfully frees the memory (followed by swap-ins after other programs seek to use previously cached information swapped to disk). I've incorporated the memory-saving suggestions in the `read.csv()` manual page, excluding the limit on the lines read (which shouldn't really be necessary here since we're only talking about 20 MB of raw data).
> >
> > Any suggestions, or is the read.csv() code known to have memory leak/overcommit issues?
> >
> > Thanks

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
[R] Memory usage in read.csv()
I'm sure this has gotten some attention before, but I have two CSV files generated from vmstat and free that are roughly 6-8 MB (about 80,000 lines) each. When I try to use read.csv(), R allocates all available memory (about 4.9 GB) when loading the files, which is over 300 times the size of the raw data. Here are the scripts used to generate the CSV files as well as the R code:

Scripts (run for roughly a 24-hour period):

    vmstat -ant 1 | awk '$0 !~ /(proc|free)/ {FS=" "; OFS=","; print strftime("%F %T %Z"),$6,$7,$12,$13,$14,$15,$16,$17;}' > ~/vmstat_20100118_133845.o;
    free -ms 1 | awk '$0 ~ /Mem\:/ {FS=" "; OFS=","; print strftime("%F %T %Z"),$2,$3,$4,$5,$6,$7}' > ~/memfree_20100118_140845.o;

R code:

    infile.vms <- "~/vmstat_20100118_133845.o";
    infile.mem <- "~/memfree_20100118_140845.o";
    vms.colnames <- c("time","r","b","swpd","free","inact","active","si","so","bi","bo","in","cs","us","sy","id","wa","st");
    vms.colclass <- c("character",rep("integer",length(vms.colnames)-1));
    mem.colnames <- c("time","total","used","free","shared","buffers","cached");
    mem.colclass <- c("character",rep("integer",length(mem.colnames)-1));
    vmsdf <- (read.csv(infile.vms,header=FALSE,colClasses=vms.colclass,col.names=vms.colnames));
    memdf <- (read.csv(infile.mem,header=FALSE,colClasses=mem.colclass,col.names=mem.colnames));

I am running R v2.10.0 on a 64-bit machine with Fedora 10 (Linux version 2.6.27.41-170.2.117.fc10.x86_64) with 6 GB of memory. There are no other significant programs running, and `rm()` followed by `gc()` successfully frees the memory (followed by swap-ins after other programs seek to use previously cached information swapped to disk). I've incorporated the memory-saving suggestions in the `read.csv()` manual page, excluding the limit on the lines read (which shouldn't really be necessary here since we're only talking about 20 MB of raw data).

Any suggestions, or is the read.csv() code known to have memory leak/overcommit issues?
Thanks
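One way to check whether read.csv() itself is inflating memory is to compare the size of the file on disk with the size of the resulting data frame. The following is only a sketch on made-up stand-in data (the temporary file, the three column names, and the row count are assumptions, not the poster's files); it applies the colClasses, nrows, and comment.char hints from the read.table help page.

```r
# Build a small stand-in CSV (hypothetical data, not the poster's files).
tmp <- tempfile(fileext = ".csv")
n <- 1000L
writeLines(
  sprintf("2010-01-18 13:%02d:%02d EST,%d,%d",
          (seq_len(n) %/% 60L) %% 60L, seq_len(n) %% 60L,
          seq_len(n), seq_len(n) * 2L),
  tmp
)

# Declare column names and classes up front, as in the original post.
col_names <- c("time", "free", "inact")
col_class <- c("character", rep("integer", length(col_names) - 1))

# nrows and comment.char = "" are the remaining memory hints from ?read.table.
df <- read.csv(tmp, header = FALSE, colClasses = col_class,
               col.names = col_names, nrows = n, comment.char = "")

# Compare bytes on disk with bytes held by the data frame.
raw_bytes <- file.size(tmp)
obj_bytes <- as.numeric(object.size(df))
ratio <- obj_bytes / raw_bytes
cat("file:", raw_bytes, "bytes; data frame:", obj_bytes, "bytes\n")
```

If the ratio stays small on a sample like this while the full run still exhausts memory, the cause is more likely outside read.csv() than inside it.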
[R] [Lattice] panel.levelplot - shrink argument to highlight absolute z-values
Hi -

I have a levelplot with positive and negative z-values. I'd like to scale the levelplot rectangles proportional to the *absolute* z-values to highlight the z-value extremes (while retaining the color difference to track the positive/negative attribute). I've likely missed something in the documentation, but have been at it a couple of days without finding a solution (other than patching the panel.levelplot routine). Is there a way to do this? If not, can we add the functionality?

By way of a very simple example, I'd like to make the deep colors bigger in the following example (please note that I am using the `expression` version of the levelplot() function with a data frame instead of the `matrix` version in my particular application).

    levelplot(matrix(runif(21),nrow=7,ncol=3)-0.5,panel=function(x,y,z,subscript, ...) { panel.levelplot(x,y,z,shrink=c(0,2),...)})

Thanks!
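Short of patching panel.levelplot, one possible workaround is a custom panel function that draws each cell with panel.rect(), sizing it by |z| and coloring it by signed z via level.colors(). This is a sketch only: `abs_scale` and `panel.abslevel` are hypothetical names, the 0.2 to 1 size range is arbitrary, and the code assumes the cells lie on a grid with unit spacing.

```r
library(lattice)

# Hypothetical helper (not part of lattice): map |z| linearly onto [lo, hi].
abs_scale <- function(z, lo = 0.2, hi = 1) {
  a <- abs(z)
  rng <- range(a, na.rm = TRUE)
  if (diff(rng) == 0) return(rep(hi, length(z)))
  lo + (hi - lo) * (a - rng[1]) / diff(rng)
}

# Hypothetical panel function: one rectangle per cell, sized by |z| and
# colored by signed z. Assumes cells sit on a grid with unit spacing.
panel.abslevel <- function(x, y, z, subscripts, at = pretty(z),
                           col.regions = trellis.par.get("regions")$col,
                           ...) {
  x <- as.numeric(x)[subscripts]
  y <- as.numeric(y)[subscripts]
  z <- as.numeric(z)[subscripts]
  size <- abs_scale(z)
  panel.rect(x = x, y = y, width = size, height = size,
             col = level.colors(z, at = at, col.regions = col.regions),
             border = "transparent")
}

# Formula interface with a data frame, as in the poster's application.
set.seed(1)
d <- expand.grid(x = 1:7, y = 1:3)
d$z <- runif(nrow(d)) - 0.5
p <- levelplot(z ~ x * y, data = d, panel = panel.abslevel)
# print(p) draws the plot on the active graphics device.
```

Keeping the scaling in its own function makes it easy to swap the linear map for, say, a square-root map if the extremes dominate too strongly.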
Re: [R] Feature request for as.Date() function
On Thu, Nov 26, 2009 at 12:08 AM, jim holtman wrote:
> An easy way is just to write your own function that will accept "NA", convert it to NA and then call as.Date.

I have written such a function, which has provided the temporary workaround mentioned. (I am not that lazy yet :-) )

> R is a functional language, so write some functions. Don't try to overload existing functions with new options that may break a lot of existing code. If you have special requirements, then adapt your code to them. You would probably have to wait around for a long time before a new option got in, so it is easier to create your own.

I do not mind waiting for the additional functionality (and it is no longer an immediate need given the workaround). I was attempting to contribute to the continued enhancement of an open source project. Since the as.Date() function already defines standard unambiguous formats, and since "NA" (and "NaN", "Inf", etc.) are not ambiguous within the transform to their numeric counterparts, it stands to reason that this is logical behaviour for this function. I also doubt this enhancement would break moderate-to-well-designed code since:

(1) Existing code would enact a stop() condition based on the current implementation, forcing error-handling, if any.
(2) Converting "NA" (and "NaN", "Inf", etc.) is not ambiguous. Coders feeding such strings should expect their numeric counterparts. In all likelihood, coders would convert these strings manually in error-handling code anyway.

I have my solution, but wanted to better the project for use by other community members. The R Core Development Team is welcome to accept or ignore the suggestion. I do appreciate the time to discuss this topic, but will consider the matter closed for my part.

Thanks.
[R] Feature request for as.Date() function
Hello -

I have a csv file with a few date columns. Some of the records have an "NA" character string instead of the date. When I attempt to use read.csv() and typecast the columns using colClasses, I receive the following error:

    Error in charToDate(x) : character string is not in a standard unambiguous format

Similarly, the following command produces the same error:

    as.Date("NA")

However, as.Date(NA) performs as documented. Can we enhance the as.Date() function to convert "NA" strings into NA values prior to type conversion?

Thanks!
Re: [R] Feature request for as.Date() function
On Wed, Nov 25, 2009 at 2:56 PM, jim holtman wrote:
> Seems to work fine in my testing:
>
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> > Similarly, the following command produces the same error: as.Date("NA")
> > However, as.Date(NA) performs as documented. Can we enhance the as.Date() function to convert "NA" strings into NA values prior to type conversion?

I sincerely appreciate the help, but with all due respect, I have read the posting guide and did provide the minimal code necessary to reproduce the desired feature. To reiterate, I would like to be able to feed the character string "NA" to the as.Date() function to yield the same result as `as.Date(NA)`. Please advise if testing the following does not yield an error:

    as.Date("NA");

This may or may not aid the read.csv() error message in my particular code (for which a workaround has already been identified).

Thank you.
Re: [R] Feature request for as.Date() function
An easy way is just to write your own function that will accept "NA", convert it to NA and then call as.Date.

R is a functional language, so write some functions. Don't try to overload existing functions with new options that may break a lot of existing code. If you have special requirements, then adapt your code to them. You would probably have to wait around for a long time before a new option got in, so it is easier to create your own.

On Wed, Nov 25, 2009 at 4:40 PM, nabble.30.miller_2...@spamgourmet.com wrote:
> On Wed, Nov 25, 2009 at 2:56 PM, jim holtman wrote:
> > Seems to work fine in my testing:
> >
> > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> >
> > > Similarly, the following command produces the same error: as.Date("NA")
> > > However, as.Date(NA) performs as documented. Can we enhance the as.Date() function to convert "NA" strings into NA values prior to type conversion?
>
> I sincerely appreciate the help, but with all due respect, I have read the posting guide and did provide the minimal code necessary to reproduce the desired feature. To reiterate, I would like to be able to feed the character string "NA" to the as.Date() function to yield the same result as `as.Date(NA)`. Please advise if testing the following does not yield an error:
>
>     as.Date("NA");
>
> This may or may not aid the read.csv() error message in my particular code (for which a workaround has already been identified).
>
> Thank you.

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
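The wrapper suggested above might look something like the following. This is a sketch, not the function the poster actually wrote: the name `safe_as_date` and the default set of strings treated as missing are assumptions.

```r
# Hypothetical wrapper: turn placeholder strings into real NA values
# before handing the vector to as.Date().
safe_as_date <- function(x, na_strings = c("NA", "NaN", ""), ...) {
  x <- as.character(x)
  # Ignore surrounding whitespace when matching the placeholder strings.
  x[trimws(x) %in% na_strings] <- NA_character_
  as.Date(x, ...)
}

dates <- safe_as_date(c("2009-11-25", "NA", " NA ", NA))
print(dates)
```

Extra arguments such as `format` pass straight through to as.Date(), so the wrapper can stand in for it wherever a column may contain literal "NA" strings.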
Re: [R] Applying do.call to a data.frame using function arguments
On Wed, Aug 26, 2009 at 12:47 PM, hadley wickham wrote:
> I think you're missing some quotes:
>
>     cat(do.call(paste,c(x2,sep=','))[1], "\n")

Thanks - the strings are actually substrings of larger strings (specifically, SQL statements), which will wrap with the leading and trailing quotes (though I should have pointed this out in my original post).
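For readers without the earlier messages, here is a small sketch of what the do.call(paste, ...) idiom does to a data frame. The contents of `x2` below are invented for illustration; the original `x2` is not shown in this part of the thread.

```r
# Hypothetical stand-in for the thread's x2.
x2 <- data.frame(a = c("x", "y"), b = c("1", "2"), stringsAsFactors = FALSE)

# c(x2, sep = ",") is a list of the columns plus a 'sep' element, so
# do.call() ends up calling paste(a = ..., b = ..., sep = ","),
# which pastes the columns together row by row.
rows <- do.call(paste, c(x2, sep = ","))

cat(rows[1], "\n")
```

Each element of `rows` is one row of the data frame joined with commas, which is why the result can be dropped into a larger string such as an SQL statement.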
[R] Accessing list object from within a function as a list element
Hi -

I have a list (call it 'mylist') with the following elements: (i) a function (call it 'myfunc' and expressed as 'mylist$myfunc') and (ii) a variable (call it 'myvar' and expressed as 'mylist$myvar'). Since I use mylist as a pseudo-class (I assign mylist to multiple different R objects), I would like to access the mylist R object from within the 'myfunc' function to use 'myvar'. Here is a simple example:

    mylist <- list();
    mylist$myvar <- "~/file.out";
    mylist$myfunc <- function (mymsg="hello world") { cat(mymsg,file=mylist$myvar); };

If I perform the following:

    myclassobj_1 <- myclassobj_2 <- myclassobj_3 <- mylist;
    myclassobj_1$myvar <- "~/file_1.out";
    myclassobj_2$myvar <- "~/file_2.out";
    myclassobj_3$myvar <- "~/file_3.out";

I cannot use myclassobj_1$myfunc() as it will place "hello world" into ~/file.out instead of ~/file_1.out. Unless I want to make the myclassobj_? lists an argument to the 'myfunc' function, I need an object reference to the appropriate 'mylist'. For instance:

    mylist$myfunc <- function (mymsg="hello world") {
      objref <- <function returning container list>;
      cat(mymsg,file=objref$myvar);
    };

I had some success with the sys.call() function, though the implementation is sloppy and inconsistent as it relies on the environment frame in which the call is made as opposed to an attribute of the 'myfunc' function.

Any suggestions as to this specific issue or preferred implementation of user-defined classes in R?

Thanks!
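One common answer to this kind of question is to build each object as an environment through a constructor function, so that the method is a closure that can always see its own object. The sketch below is an assumption about what would fit here, not something proposed in the thread; `new_myobj` and `self` are hypothetical names, and temporary files stand in for the poster's paths.

```r
# Hypothetical constructor: each call creates a fresh environment that
# holds both the data (myvar) and the method (myfunc).
new_myobj <- function(myvar) {
  force(myvar)
  self <- environment()  # this call's environment acts as the object
  myfunc <- function(mymsg = "hello world") {
    # 'self' is found through the closure, so no sys.call() tricks are needed.
    cat(mymsg, "\n", sep = "", file = self$myvar, append = TRUE)
    invisible(self)
  }
  self
}

obj1 <- new_myobj(tempfile())
obj2 <- new_myobj(tempfile())

obj1$myfunc()      # writes to obj1's file only
obj2$myfunc("hi")  # writes to obj2's file only
```

Unlike copies of a list, environments are not duplicated on assignment, so changing `obj1$myvar` later is seen by `obj1$myfunc()` without passing the object as an argument.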
[R] Passing arguments to forked children
Hi -

I have attempted to use the fork::fork() function to perform parallel processing. However, the child R function called needs to know a given set of parameters to complete its task. Specifically, I iterate through a vector, and output values based on the elements of that vector to a database. The output strings contain elements of the iterated vector.

I mocked up the following code as an example (NOTE: WHILE NOT SPECIFICALLY DANGEROUS, THIS CODE MAXED OUT THE LIMIT OF MY SYSTEM'S FORKS -- this means that, if you run this code, no additional processes on your system may start until you kill the parent R session! BE VERY CAREFUL IF YOU DECIDE TO EXECUTE THIS CODE -- obviously, I do not recommend it. Presumably, an infinite recursion scenario arose, so you are just left with a *lot* of R sessions. Also, as each R session has equal access to stdin, you cannot reliably type commands into a given R session to terminate it -- so definitely don't run it in a CLI environment -- at least you can kill the parent window running R in a GUI environment). In any case, here is the code:

    # -- BEGIN CODE
    library(fork);
    myforksub <- function(mymsg='default') {
      cat(mymsg,sep='\n');
      exit();
    }
    myforkparent <- function(n=10, mymsg='') {
      mypid <- c();
      for (i in 1:n) {
        mypid <- c(mypid, fork(myforksub(mymsg)));
      }
      # wait(NULL) apparently does not wait for all children to finish
      for (i in 1:n) {
        wait(mypid[i]);
      }
    }
    myforkparent(mymsg='new');
    # -- END CODE

Obviously, 'fork(myforksub)' will work fine, but myforksub cannot access the mymsg variable containing the 'new' value. How can I amend the above without having to resort to socket connections to pass information? While this question is specific to the 'fork' non-standard package, I thought a few people here would be familiar enough with its use to offer a suggestion or two.

Thanks!
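Since the post says that passing a bare function ('fork(myforksub)') works, one plausible way to carry arguments along is to wrap the call in a zero-argument closure, so the function itself is handed over rather than the result of calling it. The sketch below demonstrates only the closure part in base R and does not fork anything: `run_child` is a hypothetical stand-in for a launcher that expects a function, and `make_child` is an invented helper name.

```r
# Worker from the post, minus the fork-specific exit() call.
myforksub <- function(mymsg = "default") {
  cat(mymsg, sep = "\n")
  invisible(mymsg)
}

# Hypothetical stand-in for a launcher that expects a zero-argument
# function. It simply calls the function in the current process.
run_child <- function(child) child()

# Build a zero-argument closure that remembers its own mymsg.
make_child <- function(mymsg) {
  force(mymsg)  # evaluate now, so later changes cannot leak in
  function() myforksub(mymsg)
}

children <- lapply(c("new", "newer"), make_child)
results <- vapply(children, run_child, character(1))
```

Because fork(myforksub(mymsg)) in the original evaluates myforksub(mymsg) in the parent before fork() ever sees it, a closure like `make_child("new")` is the piece that would be passed instead. On Unix-alikes, base R's parallel package also offers mcparallel() and mccollect(), which fork to evaluate an expression and return its value.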