[R] fitting mixed models to censored data?
Hi,

I'm trying to figure out if there are any packages allowing one to fit mixed models (or non-linear mixed models) to data that includes censoring. I've done some searching already on CRAN and through the mailing list archives, but haven't discovered anything. Since I may well have done a poor job searching, I thought I'd ask here prior to giving up.

I understand that SAS's proc nlmixed can accommodate censoring (though proc mixed apparently can't), so if I can't find something available in R, I'll have to break down and use that. Please, save me from having to use SAS!

Thanks much,
Doug

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] fitting mixed models to censored data?
Hi Bert,

Yes, I am always wary when one software package offers something that others do not. The censoring I'm faced with (at present) isn't as complicated as with much 'survival' data. I'm trying to analyze assay data and have a lower limit of detection (LLD) to contend with. Once the level of the analyte gets low enough it can't be accurately quantitated, hence all that is reported is that the level is less than some value (the LLD). So I'm not worried about all the complex assumptions that go along with censoring in clinical trials, etc.

Thanks,
Doug

On Mon, 23 Apr 2007, Bert Gunter wrote:

> Douglas:
>
> AFAIK, this is a subject area of active current research. Diggle, Heagerty, Liang, and Zeger, 2002 (ANALYSIS OF LONGITUDINAL DATA) say on p.316:
>
> "An emerging consensus is that analysis of data with potentially informative dropouts necessarily involves assumptions which are difficult, or even impossible, to check from the observed data."
>
> This was ca 1994, I believe, so I don't know whether this view is still held among experts (which I am not). But if it is, you may do well to be careful of whatever SAS does even if you do have to go running off to it.
>
> Cheers,
> Bert Gunter
> Genentech Nonclinical Statistics
Re: [R] fitting mixed models to censored data?
Hi Bill,

Thanks for your reply. The first place I looked was in the survival package since it can obviously handle censored data. However, I don't have any particular desire to restrict myself to standard survival models just because I have some censoring. Frailties appear to fit in nicely with the types of models typically used with survival data, but that's not the only kind of model I'd like to look at.

Thanks,
Doug

On Mon, 23 Apr 2007, Pikounis, Bill [CNTUS] wrote:

> Doug,
>
> In perhaps similar situations where there are clusters of measurements due to repeated time or space on an individual subject or experimental unit, I have used the survreg() function from the survival library. You can specify left, right, and/or interval censoring within a data set through Surv(), and so I have used left censoring for the LOD observations.
>
> I was just focused on marginal or population-averaged estimation, so the use of cluster() in the argument for survreg() and the robust option in survreg() to get sandwich error estimates was sufficient for me.
>
> Depending on your needs to evaluate random effects, frailty() in the survival package -- which can be used with survreg() or coxph() -- is another alternative to explore, I believe.
>
> Hope that helps,
> Bill
> Nonclinical Statistics, Centocor R&D
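For readers who want to try Bill's suggestion, here is a minimal sketch of a left-censored Gaussian regression with cluster-robust standard errors. The data are simulated and every variable name (id, dose, y_rep, observed, lld) is an assumption made up for illustration; this is a population-averaged fit, not the mixed model Doug was asking for.

```r
library(survival)

# Simulated assay data: 30 subjects, 4 measurements each (all names hypothetical)
set.seed(1)
n_subj <- 30
n_rep <- 4
dat <- data.frame(id = rep(seq_len(n_subj), each = n_rep),
                  dose = rep(c(0.5, 1, 2, 4), times = n_subj))
subj <- rnorm(n_subj, sd = 0.5)                 # subject-level shift
dat$y <- 1 + 0.8 * log(dat$dose) + subj[dat$id] + rnorm(nrow(dat), sd = 0.3)

# Values below the lower limit of detection are only known to be < lld
lld <- 0.6
dat$observed <- dat$y >= lld                    # FALSE means left-censored
dat$y_rep <- pmax(dat$y, lld)                   # what the lab would report

# Left-censored normal model; cluster(id) requests sandwich standard errors
fit <- survreg(Surv(y_rep, observed, type = "left") ~ log(dose) + cluster(id),
               data = dat, dist = "gaussian")
summary(fit)
```

The cluster() term only adjusts the standard errors for within-subject correlation; it does not estimate a random-effect variance.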
Re: [R] cannot turn some columns in a data frame into factors
You need to create a new object and assign it to 'df', so you'd do something like this:

  df <- sapply(factors, function (name) {
    pos <- match(name, df.names)
    factor(df[[pos]])
  })

Doug

On Thu, 11 May 2006, Sam Steingold wrote:

> * jim holtman [2006-05-11 12:27:39 -0400]:
>
>> try '<<-' as the assignment to make it global.
>>
>>   df[[pos]] <<- factor(df[[pos]])
>
> nothing changed -- I observe the exact same behaviour:
>
>   Month ( 1 ): TRUE
>   factors: FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
>
>> On 5/11/06, Sam Steingold wrote:
>>> Hi,
>>> I have a data frame df and a list of names of columns that I want to turn into factors:
>>>
>>>   df.names <- attr(df, "names")
>>>   sapply(factors, function (name) {
>>>     pos <- match(name, df.names)
>>>     if (is.na(pos)) stop(paste(name, ": no such column\n"))
>>>     df[[pos]] <- factor(df[[pos]])
>>>     cat(name, "(", pos, "):", is.factor(df[[pos]]), "\n")
>>>   })
>>>   cat("factors:", sapply(df, is.factor), "\n")
>>>
>>> the output is:
>>>
>>>   Month ( 1 ): TRUE
>>>   factors: FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
>>>
>>> i.e., there is a column named Month (the 1st column), and it is indeed turned into a factor inside sapply(), but after that it is numerical again! what am I doing wrong?
>
> --
> Sam Steingold (http://www.podval.org/~sds) on Fedora Core release 5 (Bordeaux)
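The root cause is that the assignment inside the anonymous function changes a local copy of df, which is discarded when the function returns. A minimal sketch of one common alternative, assigning the converted columns back in a single step (the data frame and column names below are invented for illustration):

```r
# Hypothetical data frame; only 'Month' echoes the thread
df <- data.frame(Month = c(1, 2, 2, 12),
                 Year  = c(2005, 2005, 2006, 2006),
                 Value = c(0.3, 1.2, 0.7, 2.1))
factors <- c("Month", "Year")

# Fail early if a requested column does not exist
stopifnot(all(factors %in% names(df)))

# lapply() returns a list of factors; assigning it to df[factors]
# replaces those columns in the data frame itself
df[factors] <- lapply(df[factors], factor)

sapply(df, is.factor)
```

Because the assignment happens at the top level rather than inside a function, no global assignment operator is needed.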
Re: [R] is there a formatted output in R?
You really need to learn how to do some searching, as you seem to be constantly asking questions you can answer yourself:

  help.search("sprintf")

On Fri, 10 Mar 2006, Michael wrote:

> something like sprintf in C? so I can do:
>
>   print(sprintf("the correct result is %3.4f\n", myresult));
>
> Also, I am desperately looking for a clear console screen function in R...
>
> thanks a lot!
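For completeness, a short sketch of formatted output with sprintf(); the value of myresult is an assumption. Note that print() shows the escaped string, while cat() interprets the newline.

```r
myresult <- pi   # stand-in value for illustration

# sprintf() returns a character string; it does not print by itself
msg <- sprintf("the correct result is %3.4f\n", myresult)

print(msg)  # shows the string with quotes and a literal \n
cat(msg)    # writes the text and actually breaks the line
```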
[R] can I do this with read.table??
Hi,

I'm trying to figure out if there's an automated way to get read.table to read in my data and *not* convert the character columns into anything, just leave them alone. What I'm referring to as 'character columns' are columns in the data that are quoted. For columns of alphabetic strings (that aren't TRUE or FALSE) I can suppress conversion to factor with as.is=TRUE, but what I'd like to stop is the conversion of quoted numbers of the form "01","02",..., into numeric form.

By an 'automated way', I mean one that does not involve me having to know which columns in the data are the ones I want kept as they are.

This doesn't seem like an unreasonable thing to want to do. After all, say I've got the data.frame:

  A <- data.frame(a=1:3, b=I(c("01","02","03")))

I can export this to a text file with the simple command

  write.table(A, "A.txt", sep="\t", row.names=FALSE, quote=TRUE)

but I cannot find an equally simple mechanism for reading this data back in from A.txt that allows me to reconstruct my data.frame 'A'. Is this an unreasonable thing to expect?

Thanks,
Doug
Re: [R] can I do this with read.table??
I did read the help page, very carefully.

The colClasses argument can be used if I want to stop and look through every data set to see which column I need to protect. But that's what I said that I don't want to do.

As for 'as.is', I wish it did what you suggest, but it doesn't. If one reads carefully, as.is protects a character vector from conversion to a *factor*, but not from conversion to numeric/logical.

Doug

On Sun, 26 Feb 2006, Kjetil Brinchmann Halvorsen wrote:

> Douglas Grove wrote:
>> Hi, I'm trying to figure out if there's an automated way to get read.table to read in my data and *not* convert the character columns into anything, just leave them alone.
>
> Did you read the help page? What about argument as.is=TRUE? See also argument colClasses.
>
> Kjetil
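One possible workaround, offered only as a sketch and not as anything proposed in the thread: peek at the first data row with quoting disabled, treat any field wrapped in double quotes as a character column, and pass the result to colClasses. It assumes a tab-separated file with a header line and that the first data row is representative (no NA or unquoted empty fields).

```r
# Recreate the example file from the thread in a temporary location
A <- data.frame(a = 1:3, b = I(c("01", "02", "03")))
path <- tempfile(fileext = ".txt")
write.table(A, path, sep = "\t", row.names = FALSE, quote = TRUE)

# Read the first data row with quote = "" so the quote characters survive
first <- scan(path, what = "character", sep = "\t", quote = "",
              skip = 1, nlines = 1, quiet = TRUE)

# Quoted fields stay character; NA lets read.table choose the type as usual
classes <- ifelse(grepl('^".*"$', first), "character", NA)

B <- read.table(path, header = TRUE, sep = "\t", colClasses = classes)
str(B)
```

The column b comes back as plain character rather than with the AsIs class, so B matches A in content but not in every attribute.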
Re: [R] Selecting data frame components by name - do you know a shorter way?
So you want to create a subset of a data frame with components name1 name2 name3 ...?

  dframe[, c("name1","name2","name3",...)]

will do that.

Doug

On Fri, 20 Jan 2006, Michael Reinecke wrote:

> Hi! I suspect there must be an easy way to access components of a data frame by name, i.e. the input should look like name1 name2 name3 ... and the output be a data frame of those components with the corresponding names. I've been trying for hours, but only found the long way to do it (which is not feasible, since I have lots of components to select):
>
>   dframe[names(dframe)=="name1" | dframe=="name2" | dframe=="name3"]
>
> Do you know a shortcut?
>
> Michael
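A concrete version of the same idea, using the built-in mtcars data as a stand-in for dframe (the choice of columns is arbitrary):

```r
# Names of the components to keep, held in a character vector
wanted <- c("mpg", "cyl", "wt")

# Index the columns by name; drop = FALSE keeps a data frame
# even if only one name is requested
sub <- mtcars[, wanted, drop = FALSE]
head(sub, 3)
```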
Re: [R] 'x' must be numeric
It's much more helpful if you show the actual command you used.

Presumably you have a data frame 'd' and you've done hist(d), and 'hist' has complained because d is not numeric: d is a data frame that *contains* a numeric vector. You need to give hist() that numeric vector, which you can do in many ways, including: d$V1, d[,"V1"] and d[,1]

Doug

On Fri, 20 Jan 2006, Naiara S. Pinto wrote:

> Hello all,
>
> I am importing data from a txt file and try to get a histogram, I get the message: Error in hist: 'x' must be numeric. When I use mode R returns "list". However when I use str I get:
>
>   `data.frame': 456 obs. of 1 variable:
>    $ V1: num 0.6344 0.4516 0.0968 0.7634 0.7957 ...
>
> My file consists of one column only (no headers) and I can't figure out why I am getting this error message. Why does this happen?
>
> Thanks!
> Naiara.
>
> Naiara S. Pinto
> Ecology, Evolution and Behavior
> 1 University Station A6700
> Austin, TX, 78712
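A small sketch of the distinction, using the five values visible in the quoted str() output as stand-in data; plot = FALSE is used only so the example runs without a graphics device.

```r
# A one-column data frame like the one read.table() would return
d <- data.frame(V1 = c(0.6344, 0.4516, 0.0968, 0.7634, 0.7957))

mode(d)           # a data frame is a list of columns
is.numeric(d)     # FALSE: the container is not numeric
is.numeric(d$V1)  # TRUE: the column inside it is

# Pass the numeric vector, not the data frame
h <- hist(d$V1, plot = FALSE)
h$counts
```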
Re: [R] discovery (was: data.frame to character)
Help pages are useful, you should try them, e.g. ?pi or ?LETTERS

> How can one discover or list all available built-in objects?
>
> On Jun 10, 2005, at 7:23 AM, Muhammad Subianto wrote:
>
>>   L3 <- LETTERS[1:3]
>>   L10 <- LETTERS[1:10]
>
> LETTERS is apparently a built-in character vector. ls() and objects() only list the ones I've created. Is there a function that lists all available built-in objects? For example, pi is another built-in, but e is not. A means to list them would be nice.
>
> Regards,
> - Robert
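As a sketch of one way to answer Robert's question: ls() accepts an environment, so the base environment can be listed and filtered down to non-function objects. Restricting the search to the base package is an assumption; other attached packages can be inspected the same way.

```r
# All visible names in the base package
base_names <- ls(baseenv())

# Flag which of them are functions
is_fun <- vapply(base_names,
                 function(n) is.function(get(n, envir = baseenv())),
                 logical(1))

# What remains are built-in data objects such as pi and LETTERS
builtins <- base_names[!is_fun]
builtins
```

The mathematical constant e is not among them; exp(1) gives its value.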
Re: [R] problem with dir() in R-2.1.0?
The new version of R has begun enforcing rules on regular expressions. Your pattern is not a valid regular expression, hence it no longer works. The meaning of '*' is with respect to a preceding character, hence it is ill-defined without one.

On Mon, 25 Apr 2005, Ye, Bin wrote:

> Hi,
>
> I always used
>
>   dir(pattern="*.RData")
>
> in all the earlier versions of R (1.8, 1.9, 2.0.1). The error message is as below:
>
>   Error in list.files(path, pattern, all.files, full.names, recursive) :
>     invalid 'pattern' regular expression
>
> Does anyone have an idea what's going on? How should I define the pattern I need in R-2.1.0?
>
> Thanks!
> Bin
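To answer the "how should I define the pattern" part, here is a sketch of two equivalent approaches: writing the regular expression directly, or converting the shell-style wildcard with glob2rx() from the utils package. The temporary directory and file names are made up for the demonstration.

```r
# Set up a scratch directory with a few files
tmp <- tempfile("rdata-demo")
dir.create(tmp)
invisible(file.create(file.path(tmp, c("a.RData", "b.txt", "RData.csv"))))

# A regular expression: a literal dot, then RData, anchored at the end
found <- dir(tmp, pattern = "\\.RData$")

# Or translate the glob into a regular expression
rx <- glob2rx("*.RData")
found_glob <- dir(tmp, pattern = rx)

found
found_glob
```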
Re: [R] How about a mascot for R?
When I think of New Zealand I think Rabbit :)

How 'bout something like the Monty Python rabbit from the Holy Grail (nasty pointy teeth..., look at the bones!)

Doug
Re: [R] inverse function of order()
An alternate method that saves having to use order() again is

  r[o] <- r

Doug

On Mon, 2004-10-04 at 15:21, Wolfram Fischer wrote:

> I have:
>
>   d <- sample(10:100, 9)
>   o <- order(d)
>   r <- d[o]
>
> How can I get d (in the original order), knowing only r and o?
>
> Thanks - Wolfram
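A quick check of both routes, as a sketch: inverting the permutation with a second order() call, and the indexed assignment suggested above (done on a copy here so that r itself is left untouched).

```r
set.seed(1)
d <- sample(10:100, 9)
o <- order(d)
r <- d[o]            # sorted values

# Route 1: order(o) is the inverse permutation of o
via_order <- r[order(o)]

# Route 2: put the i-th sorted value back at position o[i]
restored <- r
restored[o] <- r

identical(via_order, d)
identical(restored, d)
```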
Re: [R] alternate rank method
I agree. These are obvious extensions to the options provided now by rank. I didn't suggest this as I am not a contributor and don't feel comfortable asking others to do more work :)

Thanks,
Doug

On Tue, 29 Jun 2004, Martin Maechler wrote:

> "Torsten" == Torsten Hothorn, on Mon, 28 Jun 2004 10:59:26 +0200 (CEST), writes:
>
>> On Fri, 25 Jun 2004, Douglas Grove wrote:
>>> I should have specified an additional constraint: I'm going to need to use this repeatedly on large vectors (length 10^6), so something efficient is needed.
>>
>> give function `irank' in package `exactRankTests' a try.
>
> As an answer to Torsten (who got it already orally) and Gabor's original tricky suggestions:
>
> I strongly believe this should happen in the same C code on which R's base rank() function works and already implements the *averaging* of ties. Doing the analog of changing average(..) to min(..) or max(..) shouldn't be hard and certainly will be more efficient than the workarounds posted here.
>
> Patches welcome... since otherwise I'm not sure I'll get there in time.
>
> Martin
Re: [R] ties in runif() output
On Sat, 26 Jun 2004, Prof Brian Ripley wrote:

> On Fri, 25 Jun 2004, Douglas Grove wrote:
>
>> I get ties in output from runif() when I generate as few as 10^5 variates and get quite a lot when I generate 10^6. Is this expected??
>
> It should have been.
>
>> I haven't seen any duplication with rnorm(10^6), but see varying amounts of duplication using rexp(), rbeta() and rgamma(). I would have thought that there'd be enough precision that one wouldn't get ties until generating samples larger than this..
>
> Did you do the calculations? Please do so. There are about 2e9 possible values of the standard generators.

I know little about the limitations of random number generation and didn't realize that only 2e9 values were obtainable. I could have done the math myself had I known.

Thanks very much for your help,
Doug

>   qbirthday(classes=2e9)
>   [1] 52655
>
> Statisticians ought to know about the birthday problem!
>
> (rnorm is different because the default generator uses two uniforms, deliberately to increase the precision.)
>
>>   set.seed(222)
>>   sum(duplicated(runif(10^5)))
>>   [1] 4
>
> That's unusually high, BTW.
>
>>   sum(duplicated(runif(10^6)))
>>   [1] 140
>
> --
> Brian D. Ripley, Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
> University of Oxford, 1 South Parks Road, Oxford OX1 3TG, UK
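The calculation Prof Ripley alludes to can be sketched as follows. The figure of 2e9 distinct values is taken from his message; the pair-count formula is the standard birthday-problem approximation, added here as an illustration rather than something stated in the thread.

```r
n_classes <- 2e9   # number of distinct generator values, per the message above

# Sample size at which the chance of at least one tie reaches 50%
n50 <- qbirthday(classes = n_classes)
n50

# Expected number of coinciding pairs among n uniform draws from N values:
# choose(n, 2) / N
expected_pairs <- function(n, N) n * (n - 1) / (2 * N)

expected_pairs(1e5, n_classes)
expected_pairs(1e6, n_classes)
```

On this approximation a handful of ties at 10^5 draws and a few hundred at 10^6 are the right order of magnitude, which is consistent with the counts reported in the thread.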
[R] alternate rank method
Hi,

I'm wondering if anyone can point me to a function that will allow me to do a ranking that treats ties differently than rank() provides for? I'd like a method that will assign to the elements of each tie group the largest rank.

An example: for the vector 'v', I'd like the method to return 'rv'

   v: 1 2 3 3 3 4 5 5 6 7
  rv: 1 2 5 5 5 6 8 8 9 10

Thanks,
Doug Grove
Re: [R] alternate rank method
I should have specified an additional constraint: I'm going to need to use this repeatedly on large vectors (length 10^6), so something efficient is needed.

On Fri, 25 Jun 2004, Sundar Dorai-Raj wrote:

> Douglas Grove wrote:
>
>> Hi, I'm wondering if anyone can point me to a function that will allow me to do a ranking that treats ties differently than rank() provides for? I'd like a method that will assign to the elements of each tie group the largest rank.
>>
>> An example: for the vector 'v', I'd like the method to return 'rv'
>>
>>    v: 1 2 3 3 3 4 5 5 6 7
>>   rv: 1 2 5 5 5 6 8 8 9 10
>>
>> Thanks, Doug Grove
>
> How about
>
>   rv <- rowSums(outer(v, v, ">="))
>
> Adapted from Prof. Ripley's reply in the following thread:
> http://finzi.psych.upenn.edu/R/Rhelp02/archive/31993.html
>
> HTH,
> --sundar
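A sketch comparing the outer() suggestion with the ties.method argument that rank() gained in later versions of R. The outer() approach builds an n-by-n matrix, so it is not practical for vectors of length 10^6, whereas rank() works on the sorted vector.

```r
v <- c(1, 2, 3, 3, 3, 4, 5, 5, 6, 7)

# For each element, count how many elements are less than or equal to it
rv_outer <- rowSums(outer(v, v, ">="))

# Same result without the quadratic memory cost
rv_rank <- rank(v, ties.method = "max")

rv_outer
rv_rank
```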
[R] ties in runif() output
I get ties in output from runif() when I generate as few as 10^5 variates and get quite a lot when I generate 10^6. Is this expected??

I haven't seen any duplication with rnorm(10^6), but see varying amounts of duplication using rexp(), rbeta() and rgamma(). I would have thought that there'd be enough precision that one wouldn't get ties until generating samples larger than this..

  set.seed(222)
  sum(duplicated(runif(10^5)))
  [1] 4
  sum(duplicated(runif(10^6)))
  [1] 140

  platform i686-pc-linux-gnu
  arch     i686
  os       linux-gnu
  system   i686, linux-gnu
  status   Patched
  major    1
  minor    9.0
  year     2004
  month    04
  day      13
  language R

Thanks,
Doug Grove
Re: [R] predict function
You can't use this anymore. The function predict() has a method for loess objects, but there is no longer an available function called predict.loess. So just replace predict.loess with predict.

On Fri, 13 Feb 2004, Thomas Jagoe wrote:

> I am using R to do a loess normalisation procedure. In 1.5.1 I used the following commands to normalise the variable logratio, over a 2d surface (defined by coordinates x and y):
>
>   array <- read.table("121203B_QCnew.txt", header=T, sep="\t")
>   array$logs555 <- log(array$s555)/log(2)
>   array$logs647 <- log(array$s647)/log(2)
>   array$logratio <- array$logs555 - array$logs647
>   array$logav <- (array$logs555 + array$logs647)/2
>   library(modreg)
>   loess2d <- loess(logratio~x+y, data=array)
>   array$logratio2DLoeNorm <- array$logratio - predict.loess(loess2d, array)
>
> However in 1.8.1 all goes well until the last step when I get an error:
>
>   Error: couldn't find function predict.loess
>
> Can anyone help?
>
> Thomas
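A self-contained sketch of the corrected last step, using simulated spot data in place of the poster's file (the grid, the trend, and the name spots are assumptions). In current versions of R, loess() lives in the stats package, so no library() call is needed.

```r
# Simulated 12 x 12 array with a smooth spatial trend plus noise
set.seed(42)
spots <- expand.grid(x = 1:12, y = 1:12)
spots$logratio <- 0.05 * spots$x - 0.03 * spots$y +
  rnorm(nrow(spots), sd = 0.2)

# Fit the 2-d surface, then subtract the fitted values via the generic
loess2d <- loess(logratio ~ x + y, data = spots)
spots$logratio2DLoeNorm <- spots$logratio - predict(loess2d, spots)

summary(spots$logratio2DLoeNorm)
```

Calling the generic predict() lets R dispatch to the loess method, which works whether or not the method itself is exported.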
Re: [R] Windows Memory Issues
On Sat, 6 Dec 2003, Prof Brian Ripley wrote:

> I think you misunderstand how R uses memory. gc() does not free up all the memory used for the objects it frees, and repeated calls will free more. Don't speculate about how memory management works: do your homework!

Are you saying that consecutive calls to gc() will free more memory than a single call, or am I misunderstanding? Reading ?gc and ?Memory I don't see anything about this mentioned. Where should I be looking to find more comprehensive info on R's memory management?? I'm not writing any packages, just would like to have a better handle on efficiently using memory as it is usually the limiting factor with R. FYI, I'm running R 1.8.1 and RedHat 9 on a P4 with 2GB of RAM in case there is any platform specific info that may be applicable.

Thanks,

Doug Grove
Statistical Research Associate
Fred Hutchinson Cancer Research Center

> In any case, you are using an outdated version of R, and your first course of action should be to compile up R-devel and try that, as there have been improvements to memory management under Windows. You could also try compiling using the native malloc (and that *is* described in the INSTALL file) as that has different compromises.
>
> On Sat, 6 Dec 2003, Richard Pugh wrote:
>
>> Hi all,
>>
>> I am currently building an application based on R 1.7.1 (+ compiled C/C++ code + MySql + VB). I am building this application to work on 2 different platforms (Windows XP Professional (500mb memory) and Windows NT 4.0 with service pack 6 (1gb memory)). This is a very memory intensive application performing sophisticated operations on large matrices (typically 5000x1500 matrices).
>>
>> I have run into some issues regarding the way R handles its memory, especially on NT. In particular, R does not seem able to recollect some of the memory used following the creation and manipulation of large data objects. For example, I have a function which receives a (large) numeric matrix, matches against more data (maybe imported from MySql) and returns a large list structure for further analysis. A typical call may look like this:
>>
>>   myInputData <- matrix(sample(1:100, 750, T), nrow=5000)
>>   myPortfolio <- createPortfolio(myInputData)
>>
>> It seems I can only repeat this code process 2/3 times before I have to restart R (to get the memory back). I use the same object names (myInputData and myPortfolio) each time, so I am not creating more large objects.
>>
>> I think the problems I have are illustrated with the following example from a small R session:
>>
>>   # Memory usage for Rgui process = 19,800k
>>   testData <- matrix(rnorm(1000), 1000)  # Create big matrix
>>   # Memory usage for Rgui process = 254,550k
>>   rm(testData)
>>   # Memory usage for Rgui process = 254,550k
>>   gc()
>>            used (Mb) gc trigger  (Mb)
>>   Ncells 369277  9.9     667722  17.9
>>   Vcells  87650  0.7   24286664 185.3
>>   # Memory usage for Rgui process = 20,200k
>>
>> In the above code, R cannot recollect all memory used, so the memory usage increases from 19,800k to 20,200k. However, the following example is more typical of the environments I use:
>>
>>   # Memory 128,100k
>>   myTestData <- matrix(rnorm(1000), 1000)
>>   # Memory 357,272k
>>   rm(myTestData)
>>   # Memory 357,272k
>>   gc()
>>             used (Mb) gc trigger  (Mb)
>>   Ncells  478197 12.8     818163  21.9
>>   Vcells 9309525 71.1   31670210 241.7
>>   # Memory 279,152k
>>
>> Here, the memory usage increases from 128,100k to 279,152k.
>>
>> Could anyone point out what I could do to rectify this (if anything), or generally what strategy I could take to improve this?
>>
>> Many thanks,
>> Rich.
>> Mango Solutions
>
> --
> Brian D. Ripley, Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
> University of Oxford, 1 South Parks Road, Oxford OX1 3TG, UK
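For anyone wanting to watch this from inside R rather than from the Windows task manager, here is a small sketch. The matrix size is an assumption chosen to be modest; the numbers reported will differ by platform and R version, so none are shown.

```r
# Allocate a matrix of one million doubles (roughly 8 Mb)
x <- matrix(rnorm(1e6), nrow = 1000)
print(object.size(x), units = "Mb")

# Remove the only reference, then ask for a collection twice
rm(x)
first  <- gc()
second <- gc()

# gc() returns a matrix; the 'used' column counts cells still in use,
# which is not the same thing as the process size the OS reports
first
second
```

Comparing the two results shows whether a repeated call changes the figures on a given system, which is the question raised in the reply above.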
Re: [R] Kmeans again
> I'm sorry to insist but I still think there is something wrong with the function kmeans. For instance, let's try the same small example:
>
>   dados <- matrix(c(-1,0,2,2.5,7,9,0,3,0,6,1,4),6,2)
>
> I will choose observations 3 and 4 for initial centers and just one iteration. The results are
>
>   A <- kmeans(dados, dados[c(3,4),], 1)
>   A
>   $cluster
>   [1] 1 1 1 1 2 2
>   $centers
>      [,1] [,2]
>   1 0.875 2.75
>   2 8.000 2.50
>   $withinss
>   [1] 38.9375  6.5000
>   $size
>   [1] 4 2
>
> If I do it by hand, after one iteration, the results are
>
>   $cluster
>   [1] 1 2 1 2 1 2
>
> So I think that something is wrong with the function kmeans; probably the initial centers given by the user are not being taken into account.

Andy Liaw already gave an example where he specified two different starting values and kmeans gave different results after 1 iteration, so clearly your hypothesis is incorrect. Either your calculations are wrong or you are calculating the wrong formulae. It is very doubtful that anything is wrong with kmeans.

Doug Grove
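The hand calculation can be checked with a few lines of R. This is a sketch of a single plain assign-then-update step using squared Euclidean distance, which is an assumption about what "by hand" meant; it is not a description of the algorithm kmeans() runs internally, which may reassign points differently.

```r
dados <- matrix(c(-1, 0, 2, 2.5, 7, 9, 0, 3, 0, 6, 1, 4), 6, 2)
centers <- dados[c(3, 4), ]          # observations 3 and 4 as starting centres

# Squared distance from every observation to each centre (6 x 2 matrix)
d2 <- sapply(1:2, function(k) rowSums(sweep(dados, 2, centers[k, ])^2))

# Assignment step: nearest centre for each observation
cluster <- apply(d2, 1, which.min)

# Update step: each centre becomes the mean of its assigned observations
new_centers <- rbind(colMeans(dados[cluster == 1, , drop = FALSE]),
                     colMeans(dados[cluster == 2, , drop = FALSE]))

d2
cluster
new_centers
```

The distance matrix makes it easy to see, row by row, which centre each observation is closer to and to compare that with the hand-computed assignment quoted above.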
[R] removing leading/trailing blanks
Hi,

What's the best way of dropping leading or trailing blanks from a character string? The only thing I can think of is using sub() to replace blanks with null strings, but I don't know if there is a better way (I also don't know how to represent the trailing blank in a regular expression).

Thanks,
Doug Grove
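A sketch of the regular-expression approach the question hints at: "^" anchors a match at the start of the string and "$" at the end, so one gsub() call can strip both sides. The helper name trim is made up for illustration.

```r
# Remove whitespace at the start or at the end of each string
trim <- function(x) gsub("^[[:space:]]+|[[:space:]]+$", "", x)

x <- c("  leading", "trailing   ", "  both  ", "in side")
trim(x)
```

Blanks inside a string are left alone because neither alternative can match there.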
Re: [R] dataframe subsetting behaviour
> Douglas Grove writes:
>
>> Hi,
>>
>> I'm trying to understand a behaviour that I have encountered and can't fathom. Here's some code I will use to illustrate the behaviour:
>>
>>   # start with some data frame "a" having some named columns
>>   a <- data.frame(a=rep(1,3),c=rep(2,3),d=rep(3,3),e=rep(4,3))
>>
>>   # create a subset of the original data frame, but include a
>>   # name "b" that is not present in my original data frame
>>   b <- a[,c("a","b","c")]
>>
>>   ## Up until now no errors are issued, but the following commands
>>   ## will give the error shown:
>>   b[1,]    ## Error in x[[j]] : subscript out of bounds
>>   b[1,2]   ## Error in "names<-.default"(*tmp*, value = cols) :
>>            ##   names attribute must be the same length as the vector
>>
>> Can anyone explain to me the meaning of these error messages in terms of what R is actually doing? These error messages had me baffled and it took me hours to track down that the source of the error was an incorrect column name in my data frame subsetting.
>
> Looks like a (semi-)bug. Indexing outside of the data frame creates a column which is really the single value NULL, e.g.
>
>   dput(a[,4:5])
>   structure(list(e = c(4, 4, 4), NA = NULL), .Names = c("e", NA),
>   row.names = c("1", "2", "3"), class = "data.frame")
>
> This will print because the format.data.frame called inside print.data.frame will recycle the NULL and give you
>
>   a[,4:5]
>     e   NA
>   1 4 NULL
>   2 4 NULL
>   3 4 NULL
>
> However, it confuses the h*ck out of "[.data.frame"
>
>   (a[,4:5])[2]
>   Error in "[.data.frame"((a[, 4:5]), 2) : undefined columns selected
>   (a[,4:5])[,2]
>   NULL
>   (a[,4:5])[,1]
>   [1] 4 4 4
>
> and also the examples you found. However, the main issue is that you have managed to construct a corrupt data frame. So indexing outside the array should probably either give an error or return a column of NA.

Yes, it would be nice if trying to index outside the data frame generated an error; that is what happens in Splus (at least the version I have access to: 6.0 Release 1 for Linux 2.2.12).

> --
> Peter Dalgaard, Dept. of Biostatistics, University of Copenhagen
> Blegdamsvej 3, 2200 Cph. N, Denmark
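Whatever a given version of R does with an unknown column name, the mistake can be caught before subsetting. A defensive sketch using the data frame from the thread (the variable names wanted and missing_cols are invented here):

```r
a <- data.frame(a = rep(1, 3), c = rep(2, 3), d = rep(3, 3), e = rep(4, 3))
wanted <- c("a", "b", "c")

# Which requested names are not columns of the data frame?
missing_cols <- setdiff(wanted, names(a))
if (length(missing_cols) > 0) {
  message("not in data frame: ", paste(missing_cols, collapse = ", "))
}

# Subset using only the names that actually exist
b <- a[, intersect(wanted, names(a)), drop = FALSE]
b
```

Depending on the use case, stop() may be a better choice than message() so that a misspelled name halts the script instead of being silently dropped.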