[R] Why is it happening?
Dear all,

I ran the following calculations in R:

  x = vector(length = 4)
  x[1] = 1
  x[2] = 3
  x[3] = 123456789123456
  x[4] = -9876543219876
  as.integer(x)
  [1]  1  3 NA NA
  Warning message:
  NAs introduced by coercion

What went wrong?

Thanks and regards,

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Why is it happening?
Note that current implementations of R use 32-bit integers for integer vectors, so the range of representable integers is restricted to about +/-2*10^9: doubles (http://stat.ethz.ch/R-manual/R-patched/library/base/html/double.html) can hold much larger integers exactly.

hth
d

On 2011-11-26 at 13:05, Christofer Bogaso wrote:
> Dear all,
>
> I ran the following calculations in R:
>
>   x = vector(length = 4)
>   x[1] = 1
>   x[2] = 3
>   x[3] = 123456789123456
>   x[4] = -9876543219876
>   as.integer(x)
>   [1]  1  3 NA NA
>   Warning message:
>   NAs introduced by coercion
>
> What went wrong?
>
> Thanks and regards,
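The limit described in the reply can be checked directly. The following is a minimal sketch using only base R; the variable names (`int_max`, `xi`, `exact`) are made up for illustration.

```r
# Largest value an R integer vector can hold (32-bit signed): 2^31 - 1
int_max <- .Machine$integer.max

# Numbers typed like this are stored as doubles, not integers
x <- c(1, 3, 123456789123456, -9876543219876)

# Values outside +/- int_max become NA under as.integer() (normally with a warning)
xi <- suppressWarnings(as.integer(x))

# Doubles represent whole numbers exactly up to 2^53, so x can simply stay double
exact <- x == round(x) & abs(x) < 2^53
```

If whole-number arithmetic is all that is needed, leaving the vector as double avoids the coercion altogether.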
[R] Constrained linear regression
Dear all,

I need to run a simple linear regression such that:

  y = b0 + b1*x1 + (1-b1)*x2 + e

which I know I can fit with:

  lm(y ~ I(x1 - x2) + offset(x2))

However, I also need to restrict the coefficient b1 to be between 0 and 1. Is there any way to include such a restriction in the linear regression estimation? I saw a suggestion involving the function solve.QP, but I did not really understand that method.

Thanks in advance,
Julia
Re: [R] Constrained linear regression
Sounds like it is or could be considered a mixtures problem. Check out the FlexMix package, which looks like it should do exactly what you want. (But maybe not, so look carefully.)

-- Bert

On Sat, Nov 26, 2011 at 6:10 AM, Julia Lira julia.l...@hotmail.co.uk wrote:
> Dear all,
>
> I need to run a simple linear regression such that:
>
>   y = b0 + b1*x1 + (1-b1)*x2 + e
>
> which I know I can fit with:
>
>   lm(y ~ I(x1 - x2) + offset(x2))
>
> However, I also need to restrict the coefficient b1 to be between 0 and 1. Is there any way to include such a restriction in the linear regression estimation? I saw a suggestion involving the function solve.QP, but I did not really understand that method.
>
> Thanks in advance,
> Julia

--
Bert Gunter
Genentech Nonclinical Biostatistics
Internal Contact Info: Phone: 467-7374
Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
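For a single bounded coefficient, one alternative to the packages mentioned in this thread is to minimise the residual sum of squares directly with box constraints. This is only a sketch, not the method either poster proposed: it uses base R's optim() with method "L-BFGS-B", and the data are simulated purely for illustration.

```r
set.seed(1)                                   # simulated data, illustration only
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 0.5 + 0.3 * x1 + 0.7 * x2 + rnorm(n, sd = 0.1)

# Residual sum of squares for y = b0 + b1*x1 + (1 - b1)*x2 + e
rss <- function(par) {
  b0 <- par[1]
  b1 <- par[2]
  sum((y - b0 - b1 * x1 - (1 - b1) * x2)^2)
}

# Box constraint 0 <= b1 <= 1; b0 is left unbounded
fit <- optim(c(0, 0.5), rss, method = "L-BFGS-B",
             lower = c(-Inf, 0), upper = c(Inf, 1))
b1_hat <- fit$par[2]

# Unconstrained fit from the original post, for comparison
coef(lm(y ~ I(x1 - x2) + offset(x2)))
```

When the unconstrained estimate already lies inside [0, 1], both fits should agree closely; otherwise the constrained estimate ends up on the nearest bound.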
[R] dir.create() does not create directory
Hello,

I am running Windows 7 and R-2.13 in StatET. When I try to create a directory it does not print any errors, but if I check outside Eclipse whether it exists, or do a refresh in Eclipse, the directory has not been created. The strange thing is that it happens only in some sub-folders... On the other hand, when I use the normal Windows Explorer in one of these sub-folders and create a folder, it works.

Any ideas? I also tried the mode=777 option but still have the same problem.

Cheers,
syrvn

--
View this message in context: http://r.789695.n4.nabble.com/dir-create-does-not-create-directory-tp4110517p4110517.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] dir.create() does not create directory
I'm not a Windows man, but have you tried in the R CLI or GUI rather than Eclipse? That would help narrow down the problem.

Also, if you could provide a minimal example for those who have Windows boxes that'd be great, though admittedly it sounds hard here. As an outline, something like:

  sessionInfo()
  setwd()
  list.files()
  dir.create()
  list.files()

should suffice.

Michael

On Nov 26, 2011, at 10:07 AM, syrvn ment...@gmx.net wrote:
> Hello,
>
> I am running Windows 7 and R-2.13 in StatET. When I try to create a directory it does not print any errors, but if I check outside Eclipse whether it exists, or do a refresh in Eclipse, the directory has not been created. The strange thing is that it happens only in some sub-folders... On the other hand, when I use the normal Windows Explorer in one of these sub-folders and create a folder, it works.
>
> Any ideas? I also tried the mode=777 option but still have the same problem.
>
> Cheers,
> syrvn
>
> --
> View this message in context: http://r.789695.n4.nabble.com/dir-create-does-not-create-directory-tp4110517p4110517.html
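The outline above can be turned into a small self-checking script. This is a sketch only: the target path under tempdir() is hypothetical, and the real sub-folder from the report would go in its place.

```r
# Hypothetical target: a nested folder under the session's temporary directory
target <- file.path(tempdir(), "results", "run1")

# dir.create() returns TRUE or FALSE invisibly, so capture the value instead of
# assuming success. recursive = TRUE also creates missing parent folders.
ok <- dir.create(target, recursive = TRUE, showWarnings = TRUE)

# Check from within R, and print the absolute path that was actually used,
# in case the working directory is not the one expected
created <- file.exists(target) && isTRUE(file.info(target)$isdir)
getwd()
normalizePath(target)
```

As far as the help page for dir.create() goes, `mode` is described as applying to Unix-alikes and is read as an octal mode (for example "0777"), so `mode=777` is unlikely to change anything on Windows.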
[R] cumsum in 3d arrays
Hello!

Is it possible to apply cumsum() along the 3rd dimension of a 3D array? Something like the Matlab function cumsum(A, dim), which returns the cumulative sum of the elements along the dimension of A specified by scalar dim.

Thanks in advance,
Željka

--
View this message in context: http://r.789695.n4.nabble.com/cumsum-in-3d-arrays-tp4110470p4110470.html
[R] Time series merge?
I have two time series

  a <- ts(1:10, start=c(1,6), end=c(2,5), frequency=10)
  b <- ts(1:5, start=c(2,1), end=c(2,5), frequency=10)

Obviously 'b' is a subset of 'a'. I want a single index value indicating where the start of 'b' lines up with the start of 'a'. So in this simple example I would expect an index of 5.

I was playing with 'merge'. But, for a 'ts' object this does not produce anything that is useful:

  merge(a,b)
    x
  1 1
  2 2
  3 3
  4 4
  5 5

I get the same answer if I use 'merge(b,a)' so I don't know how to convert this result to something useful.

So then I decided to use 'xts'. But the conversion fails:

  ax <- as.xts(a)
  Error in as.xts.ts(a) : could not convert index to appropriate type

For this simple example I could code it myself using a simple for loop, but if I add capability to handle missing dates, different frequencies, etc. it gets complicated very fast. It seems that 'xts' has more extensive date handling facilities than 'ts', but I am stuck since it doesn't look like I can convert from 'ts' to 'xts'.

Thanks in advance for your suggestions.

Kevin
Re: [R] Time series merge?
On Sat, Nov 26, 2011 at 10:55 AM, Kevin Burton rkevinbur...@charter.net wrote:
> I have two time series
>
>   a <- ts(1:10, start=c(1,6), end=c(2,5), frequency=10)
>   b <- ts(1:5, start=c(2,1), end=c(2,5), frequency=10)
>
> Obviously 'b' is a subset of 'a'. I want a single index value indicating where the start of 'b' lines up with the start of 'a'. So in this simple example I would expect an index of 5.
>
> I was playing with 'merge'. But, for a 'ts' object this does not produce anything that is useful:
>
>   merge(a,b)

Try this:

  library(zoo)
  m <- merge(a = as.zoo(a), b = as.zoo(b))
  m

or to get a ts object back:

  as.ts(m)

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
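The zoo merge above returns the two series aligned side by side. If only the alignment index is wanted, a base-R sketch along the same lines is possible with ts.union(), which pads the shorter series with NA. The names `u`, `pos_in_a` and `offset` are made up for illustration.

```r
a <- ts(1:10, start = c(1, 6), end = c(2, 5), frequency = 10)
b <- ts(1:5,  start = c(2, 1), end = c(2, 5), frequency = 10)

# Align both series on a common time axis; column 1 is a, column 2 is b
u <- ts.union(a, b)

# First row of the union in which each series has a value
first_a <- which(!is.na(u[, 1]))[1]
first_b <- which(!is.na(u[, 2]))[1]

# Position of b's first observation within a (1-based), and the offset from a's start
pos_in_a <- first_b - first_a + 1
offset   <- pos_in_a - 1
```

For the two series in the question the offset is 5, which is the value the original post expected, and the 1-based position is 6.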
Re: [R] cumsum in 3d arrays
zloncaric zloncaric at biologija.unios.hr writes:
> Is it possible to apply cumsum() along the 3rd dimension of a 3D array? Something like the Matlab function cumsum(A, dim), which returns the cumulative sum of the elements along the dimension of A specified by scalar dim.

Check out the combination of apply and cumsum.
Re: [R] plot xy data
On Nov 25, 2011, at 11:27 PM, sutada Mungpakdee wrote:
> Hi,
>
> Does anyone know how to get the correct plot? I have used this R script (as below), so I expect the plot to be based on the x axis, but the result was the opposite. Any suggestion will be great.

Your question doesn't make clear what you expected and what you are seeing. I also do not see why you added library(IRanges) because I see nothing from that package in the code. We cannot run it because the data is not provided and you made no effort to construct a data.frame that would match the attributes of the real data. It would be better of course to call your data something other than dog.

> library(IRanges)
> data <- read.table(file="~/q20snpref/illusmp454merbed", sep="\t", header=F)
> colnames(data) <- c("Scaffold", "sca_position", "coverage")
> depth <- mean(data[,"coverage"])
> #depth now has the mean (overall) coverage
> #set the bin-size
> window <- 10001
> rangefrom <- 0
> rangeto <- length(data[,"sca_position"])
> data.10kb <- runmed(data[,"coverage"], k=window)
> png(file="cov_10k.png", width=1000, height=1000)
> plot(x=data.10kb[rangefrom:rangeto], y=data[rangefrom:rangeto,"sca_position"], pch=".", cex=1, xlab="depth", ylab="bp_position", type="p")

If you want to swap the roles of data.10kb (AKA coverage) and sca_position then just reverse the x and y assignments.

> dev.off()
>
> Best regards,
> Sutada

David Winsemius, MD
West Hartford, CT
Re: [R] cumsum in 3d arrays
On Nov 26, 2011, at 9:32 AM, zloncaric wrote:
> Hello!
>
> Is it possible to apply cumsum() along the 3rd dimension of a 3D array? Something like the Matlab function cumsum(A, dim), which returns the cumulative sum of the elements along the dimension of A specified by scalar dim.

`apply` lets you choose which dimension gets selected. Perhaps:

  apply(mat, 3, cumsum)

(This is pretty basic stuff so you should probably be reading, or at least skimming somewhat more thoroughly than you have so far, the Introduction to R document, and there is also the R for Matlab document by Bob Muenchen ... and a compendium of equivalencies by Hiebeler at: www.math.umaine.edu/~hiebeler/comp/matlabR.html )

--
David Winsemius, MD
West Hartford, CT
Re: [R] cumsum in 3d arrays
On Nov 26, 2011, at 11:24 AM, David Winsemius wrote:
> On Nov 26, 2011, at 9:32 AM, zloncaric wrote:
>> Hello!
>>
>> Is it possible to apply cumsum() along the 3rd dimension of a 3D array? Something like the Matlab function cumsum(A, dim), which returns the cumulative sum of the elements along the dimension of A specified by scalar dim.
>
> `apply` lets you choose which dimension gets selected. Perhaps:
>
>   apply(mat, 3, cumsum)

Or perhaps

  apply(mat, 1:2, cumsum)

> (This is pretty basic stuff so you should probably be reading, or at least skimming somewhat more thoroughly than you have so far, the Introduction to R document, and there is also the R for Matlab document by Bob Muenchen ... and a compendium of equivalencies by Hiebeler at: www.math.umaine.edu/~hiebeler/comp/matlabR.html )

--
David Winsemius, MD
West Hartford, CT
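The two apply() calls suggested in this thread give different results, so a small worked case may help. In this sketch the array `A` is made up; the point is that apply(A, 1:2, cumsum) cumulates along the third dimension but returns it as the first dimension of the result, which aperm() can move back. apply(A, 3, cumsum), by contrast, cumulates within each slice.

```r
A <- array(1:24, dim = c(2, 3, 4))

# Cumulative sum along the 3rd dimension: apply over the other two margins.
# apply() places the output of cumsum first, so the result has dim c(4, 2, 3).
cs <- apply(A, 1:2, cumsum)

# Move the cumulated dimension back to position 3, giving dim c(2, 3, 4)
cs3 <- aperm(cs, c(2, 3, 1))

# Check one fibre against a direct cumsum
all(cs3[1, 2, ] == cumsum(A[1, 2, ]))
```

After the aperm() step, `cs3[i, j, k]` holds the sum of `A[i, j, 1:k]`, which is the layout the Matlab call in the question would give.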
Re: [R] Time series merge?
Seems to work fine. Thank you.

-----Original Message-----
From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
Sent: Saturday, November 26, 2011 10:11 AM
To: Kevin Burton
Cc: r-help@r-project.org
Subject: Re: [R] Time series merge?

On Sat, Nov 26, 2011 at 10:55 AM, Kevin Burton rkevinbur...@charter.net wrote:
> I have two time series
>
>   a <- ts(1:10, start=c(1,6), end=c(2,5), frequency=10)
>   b <- ts(1:5, start=c(2,1), end=c(2,5), frequency=10)
>
> Obviously 'b' is a subset of 'a'. I want a single index value indicating where the start of 'b' lines up with the start of 'a'. So in this simple example I would expect an index of 5.
>
> I was playing with 'merge'. But, for a 'ts' object this does not produce anything that is useful:
>
>   merge(a,b)

Try this:

  library(zoo)
  m <- merge(a = as.zoo(a), b = as.zoo(b))
  m

or to get a ts object back:

  as.ts(m)

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
[R] how big (in RAM and/or disk storage) is each of these objects in a list?
Greetings, friends (and others :) )

We generated a bunch of results and saved them in an RData file. We can open, use, all is well, except that the size of the saved file is quite a bit larger than we expected. I suspect there's something floating about in there that one of the packages we are using puts in, such as a spare copy of a data frame that is saved in some subtle way that has escaped my attention.

Consider a list of objects. Are there ways to do these things:

1. ask R how much memory is used by the things inside the list?

2. Does as.expression(anObject) print everything in there? Or, is there a better way to convert each thing to text or some other format that you can actually read line by line to see what is in there, to see everything?

If there's no giant hidden data frame floating about, I figure I'll have to convert symmetric matrices to lower triangles or such to save space. Unless R already is automatically saving a matrix in that way but just showing me the full matrix, which I suppose is possible. If you have other ideas about general ways to make saved objects smaller, I'm open for suggestions.

--
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas
[R] SPSS - R
I'm an SPSS user trying to make the transition to R. Can someone help me translate the following SPSS code into R?:

  GLM Total_tp1 Total_tp2 WITH Age Sex
    /WSFACTOR=Time 2 Repeated
    /METHOD=SSTYPE(3)
    /CRITERIA=ALPHA(.05)
    /WSDESIGN= Time
    /DESIGN= Age Sex Age*Sex.

Also, can anyone recommend any resources to help SPSS users learn to do things in R?

Thanks,
-kristi
Re: [R] Need some vectorizing help
Thank you very much David - R is so rich, the easy way can be hard to find. Just to close this out for others, the final solution I used was:

  Peak2Return <- function(v) {
    S <- cummax(v)
    L <- which((v == S) & (diff(c(0, v)) > 0))
    R <- sapply(v[L], function(x, S) { which(x < S)[1] }, S)

Now you have L for the left index, and R for the corresponding right index. If there is no right index due to the curve, the R value is NA.

On 11/24/2011 7:35 AM, David Winsemius wrote:
> On Nov 24, 2011, at 4:52 AM, Scott Tetrick wrote:
>> So I have a problem that I'm trying to get through, and I just can't seem to get it to run very fast in R. What I'm trying to do is to find in a vector a local peak, then the next time that value is crossed later. I don't care about peaks that may be lower than this first one - they can be ignored. I've tried some sapply methods along the way, but they all are slower. The best solution I have is a loop, and I just know there are smart R folks that could help me eliminate it.
>
> It looks as though you are reinventing the function:
>
>   ?cummax
>
>> Peak2Return <- function(v) {
>>   Q <- (1:m)[diff(v) < 0]                          # find all the peaks
>>   L <- Q[c(TRUE, v[Q[-1]] > v[Q[-length(Q)]])]     # eliminate lower peaks
>>   R <- sapply(L, function(x, v) { ((x+1):length(v))[v[x] < v[(x+1):m]][1] }, v)   # find the next crossing
>>   out <- data.frame(peak=L, Return=R)
>>   out
>> }
>>
>> Thanks in advance!
>
> David Winsemius, MD
> West Hartford, CT
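The cummax() approach described in this message can be written out as a complete function. The version below is a sketch reconstructed from the description, not the poster's verified code: the comparison operators are assumptions, the name `peak_to_return` is made up, and the test vector is invented.

```r
peak_to_return <- function(v) {
  S <- cummax(v)                              # running maximum of v
  # positions where v sits on the running maximum and has just risen
  L <- which(v == S & diff(c(0, v)) > 0)
  # for each peak value, the first position where the running maximum exceeds it;
  # NA when the value is never exceeded again
  R <- vapply(v[L], function(x) which(S > x)[1], integer(1))
  data.frame(peak = L, Return = R)
}

res <- peak_to_return(c(1, 3, 2, 5, 4, 6))
res
```

For this input the peaks are at positions 1, 2, 4 and 6, with returns at 2, 4, 6 and NA. Note that diff(c(0, v)) treats the first element as a rise only when it is positive, which carries over from the posted code.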
Re: [R] computationally singular error with mice()
Hi Josh,

Thanks for the kind reminder about posting the data frame. My data frame contains lots of categorical variables, which seems to be problematic. For instance,

  dobstatus  edu         mrext
  married    highschool  yes, full time

Do you know how to specify the imputation methods and the visitSequence so that those categorical variables are not involved in the imputation process?

Thank you.
Fei

--
View this message in context: http://r.789695.n4.nabble.com/computationally-singular-error-with-mice-tp4109583p4110776.html
Re: [R] how big (in RAM and/or disk storage) is each of these objects in a list?
On 11-11-26 1:41 PM, Paul Johnson wrote:
> We generated a bunch of results and saved them in an RData file. We can open, use, all is well, except that the size of the saved file is quite a bit larger than we expected. I suspect there's something floating about in there that one of the packages we are using puts in, such as a spare copy of a data frame that is saved in some subtle way that has escaped my attention.
>
> Consider a list of objects. Are there ways to do these things:
>
> 1. ask R how much memory is used by the things inside the list?

You can use object.size, but read the man page: it is not a completely well-defined question.

> 2. Does as.expression(anObject) print everything in there? Or, is there a better way to convert each thing to text or some other format that you can actually read line by line to see what is in there, to see everything?

No, as.expression won't necessarily work. save(..., ascii=TRUE) will show you everything, but it's not designed to be readable. Probably the most useful function is str().

> If there's no giant hidden data frame floating about, I figure I'll have to convert symmetric matrices to lower triangles or such to save space. Unless R already is automatically saving a matrix in that way but just showing me the full matrix, which I suppose is possible. If you have other ideas about general ways to make saved objects smaller, I'm open for suggestions.

You could try different compression methods (see ?save), but probably the best idea is to identify the things that you didn't mean to include, and don't include those. A common way this happens is objects like functions or formulas that carry their environment with them.

Duncan Murdoch
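Both points in this reply can be tried on a toy case. The sketch below uses only base R; the list `res` and the helper `make_fun` are hypothetical. It shows object.size() applied per list element, and a closure whose reported size is small while its serialized form, which is what save() writes, is large because the enclosing environment comes along.

```r
# Hypothetical list of results, for illustration
res <- list(small = 1:10,
            big   = matrix(0, 1000, 1000),
            text  = letters)

# Approximate in-memory size of each element, in bytes, largest first
sizes <- sapply(res, object.size)
sort(sizes, decreasing = TRUE)

# A function created inside another function keeps that function's environment,
# so saving it also saves everything that was defined there.
make_fun <- function() {
  hidden <- numeric(1e6)          # large object the returned function never uses
  function(x) x + 1
}
f <- make_fun()

object.size(f)                            # environments are not included in this figure
n_bytes <- length(serialize(f, NULL))     # the environment is written out as well
n_bytes
```

Comparing object.size() with the serialized length for each element is one rough way to spot objects that drag an environment into the saved file.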
Re: [R] SPSS - R
If you know SPSS already why not learn R modeling syntax and do this yourself? If ALPHA(.05) implies that you are using stepwise variable selection, note that this is an invalid statistical technique.

Frank

Kristi Shoemaker wrote
> I'm an SPSS user trying to make the transition to R. Can someone help me translate the following SPSS code into R?:
>
>   GLM Total_tp1 Total_tp2 WITH Age Sex
>     /WSFACTOR=Time 2 Repeated
>     /METHOD=SSTYPE(3)
>     /CRITERIA=ALPHA(.05)
>     /WSDESIGN= Time
>     /DESIGN= Age Sex Age*Sex.
>
> Also, can anyone recommend any resources to help SPSS users learn to do things in R?
>
> Thanks,
> -kristi

-----
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: http://r.789695.n4.nabble.com/SPSS-R-tp4110995p4111006.html
Re: [R] SPSS - R
Perhaps this website and the associated book will be of help: http://r4stats.com/

Michael

On Nov 26, 2011, at 11:08 AM, Kristi Shoemaker kristi.shoema...@yahoo.com wrote:
> I'm an SPSS user trying to make the transition to R. Can someone help me translate the following SPSS code into R?:
>
>   GLM Total_tp1 Total_tp2 WITH Age Sex
>     /WSFACTOR=Time 2 Repeated
>     /METHOD=SSTYPE(3)
>     /CRITERIA=ALPHA(.05)
>     /WSDESIGN= Time
>     /DESIGN= Age Sex Age*Sex.
>
> Also, can anyone recommend any resources to help SPSS users learn to do things in R?
>
> Thanks,
> -kristi
Re: [R] SPSS - R
Dear Kristi,

I assume that this is a repeated-measures ANOVA with one within-subjects factor (Time) and two between-subjects factors (Age and Sex, which are crossed). If Age is numeric, and not a factor, then the type-III tests that you requested don't test sensible hypotheses.

In any event, if my guess is right about the design, then you can use the Anova() function in the car package for an equivalent analysis. See the repeated-measures example in ?Anova (for the O'Brien and Kaiser data).

You've already had an answer to the more general question.

I hope this helps,
John

John Fox
Senator William McMaster Professor of Social Statistics
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
http://socserv.mcmaster.ca/jfox

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of Kristi Shoemaker
Sent: November-26-11 11:08 AM
To: r-help@r-project.org
Subject: [R] SPSS - R

> I'm an SPSS user trying to make the transition to R. Can someone help me translate the following SPSS code into R?:
>
>   GLM Total_tp1 Total_tp2 WITH Age Sex
>     /WSFACTOR=Time 2 Repeated
>     /METHOD=SSTYPE(3)
>     /CRITERIA=ALPHA(.05)
>     /WSDESIGN= Time
>     /DESIGN= Age Sex Age*Sex.
>
> Also, can anyone recommend any resources to help SPSS users learn to do things in R?
>
> Thanks,
> -kristi
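The approach described in this reply, a multivariate linear model passed to Anova() from the car package, might look roughly like the sketch below. Everything about the data is an assumption made for illustration: the values are simulated, and Age is treated as a two-level factor, which may not match the real data. The car call is skipped when the package is not installed.

```r
set.seed(42)                                   # simulated data, illustration only
n   <- 40
dat <- data.frame(
  Age = factor(rep(c("young", "old"), each = n / 2)),   # assumed to be a factor
  Sex = factor(rep(c("F", "M"), times = n / 2))
)
dat$Total_tp1 <- rnorm(n, 50, 10)
dat$Total_tp2 <- dat$Total_tp1 + rnorm(n, 2, 5)

# Sum-to-zero contrasts, commonly recommended when type-III tests are requested
op <- options(contrasts = c("contr.sum", "contr.poly"))

# Multivariate linear model: one response column per time point
mod <- lm(cbind(Total_tp1, Total_tp2) ~ Age * Sex, data = dat)

# Within-subjects design: a single factor, Time, with two levels
idata <- data.frame(Time = factor(c("tp1", "tp2")))

if (requireNamespace("car", quietly = TRUE)) {
  print(car::Anova(mod, idata = idata, idesign = ~Time, type = "III"))
}
options(op)                                    # restore the previous contrasts
```

The `idata` and `idesign` arguments describe the repeated-measures structure across the response columns, in the same way as the O'Brien and Kaiser example referred to in the reply.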
Re: [R] how big (in RAM and/or disk storage) is each of these objects in a list?
On Sat, 26 Nov 2011 12:41:08 -0600 Paul Johnson pauljoh...@gmail.com wrote:
> Greetings, friends (and others :) )
>
> We generated a bunch of results and saved them in an RData file. We can open, use, all is well, except that the size of the saved file is quite a bit larger than we expected. I suspect there's something floating about in there that one of the packages we are using puts in, such as a spare copy of a data frame that is saved in some subtle way that has escaped my attention.
>
> Consider a list of objects. Are there ways to do these things:
>
> 1. ask R how much memory is used by the things inside the list?
>
> 2. Does as.expression(anObject) print everything in there? Or, is there a better way to convert each thing to text or some other format that you can actually read line by line to see what is in there, to see everything?
>
> If there's no giant hidden data frame floating about, I figure I'll have to convert symmetric matrices to lower triangles or such to save space. Unless R already is automatically saving a matrix in that way but just showing me the full matrix, which I suppose is possible. If you have other ideas about general ways to make saved objects smaller, I'm open for suggestions.

As an initial step, what is the result of running ls() with your RData file loaded? You should get a list of what is in memory. Using RData files can be as space-efficient or costly as the user's habits. Did you use save() or the save.image() command to produce the file? The save.image() command stashes what is in memory, and if you've run a number of experimental procedures that did not pan out and you did not discard the results with rm(), they were saved to the RData file along with the information you did want, a procedure rather like filing away all your work in a file drawer and then emptying the waste basket into the drawer as well.

If you save the data with ascii = TRUE as an option, you can troll through the file and read what you saved.
JWD
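The difference between saving named objects and saving everything can be measured on a toy workspace. In this sketch the objects `keep` and `scrap` and the temporary file names are made up for illustration; save(), load() and file.size() are base R.

```r
keep  <- data.frame(id = 1:1000, value = rnorm(1000))   # result worth saving
scrap <- matrix(rnorm(1e6), 1000)                       # leftover from an abandoned experiment

f_sel <- tempfile(fileext = ".RData")
f_all <- tempfile(fileext = ".RData")

save(keep, file = f_sel)             # only the named object
save(keep, scrap, file = f_all)      # roughly what save.image() would sweep up here

size_sel <- file.size(f_sel)
size_all <- file.size(f_all)

# Inspect what a file contains without touching the current workspace
e <- new.env()
loaded <- load(f_all, envir = e)
sapply(mget(loaded, envir = e), object.size)
```

Loading into a separate environment and listing object sizes is one way to find the "waste basket" items in an existing file before deciding what to drop.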
Re: [R] Missing data?
Why do you need to use a frequency attribute for these data? The point of the zoo/xts line of time series implementations is that the time stamps are carried through for each observation (unlike ts) and can be irregular. Both classes exist precisely to avoid being forced into a frequency attribute.

As far as setting up the time elements, wouldn't this work? Change the start date to get weeks on any desired day:

  d <- seq.Date(from = as.Date("2011-11-26"), by = -7, length.out = 100)
  xts(rep(NA, length(d)), d)

You can avoid the OHLC formatting of to.weekly if you want with the OHLC = FALSE parameter. And if you want to index it by the first of the week rather than the last, just try this:

  time(x) <- time(x) - 6

Michael

On Tue, Nov 22, 2011 at 6:50 PM, Kevin Burton rkevinbur...@charter.net wrote:
> Void of any other suggestions this approach makes sense but for my case I think I need to use zoo objects rather than xts. If I sequence the data generally I don't know if there will be 365 days in the year or 366. So I have to sequence the dates as:
>
>   seq(from=as.Date("2011-01-01"), to=as.Date("2011-12-31"), by="day")
>
> If I use this sequence with xts I get:
>
>   ds <- xts(NA, seq(from=as.Date("2011-01-01"), to=as.Date("2011-12-31"), by="day"))
>   Error in xts(NA, seq(from = as.Date("2011-01-01"), to = as.Date("2011-12-31"), :
>     NROW(x) must match length(order.by)
>
> If I leave the 'data' empty I don't get the error but if I try to assign an individual item (fill as appropriate)
>
>   ds <- xts(, seq(from=as.Date("2011-01-01"), to=as.Date("2011-12-31"), by="day"))
>   ds["2011-12-24"] <- 10
>   ds
>   Error in structure(coredata(x), names = x.attr$dimnames[[1]]) :
>     'names' attribute [365] must be the same length as the vector [358]
>
> So now I need to remember that I have not filled in all of the data.
> Also simple dereferencing gives:
>
>   ds[1]
>   Error in `[.xts`(ds, 1) : subscript out of bounds
>
> With zoo I am able to create a time-series where all of the data is initially NA:
>
>   ds <- zoo(NA, seq(from=as.Date("2011-01-01"), to=as.Date("2011-12-31"), by="day"))
>
> So I can fill the data as appropriate and the remaining slots will have NA. I may be new with xts but I cannot see a way of creating a useable 'blank' time-series.
>
> Also with xts it seems like the frequency is ignored.
>
>   ds <- xts(1:365, seq(from=as.Date("2011-01-01"), to=as.Date("2011-12-31"), by="day"), frequency=52)
>   frequency(ds)
>   [1] 1
>
> Whereas zoo remembers the frequency setting
>
>   ds <- zoo(1:365, seq(from=as.Date("2011-01-01"), to=as.Date("2011-12-31"), by="day"), frequency=52)
>   frequency(ds)
>   [1] 52
>
> But since the ultimate goal is to get the time-series in a 'ts' format (as many functions require 'ts') it seems like even zoo has problems:
>
>   as.ts(ds)
>   Time Series:
>   Start = c(14975, 1)
>   End = c(15339, 1)
>   Frequency = 52
>   [1] 1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
>   [42] NA NA NA NA NA NA NA NA NA NA NA 2 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
>   [83] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 3 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
>   [124] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 4 NA NA NA NA NA NA NA
>   [165] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
>   [206] . . . . . .
>
> So the conversion from zoo to ts maintained the frequency but I am not sure how it decided on the start and end values. Also the conversion seemed to change the data. Notice that every period (52 entries) the original data is maintained. In other words if ds is the original zoo time series then ds[1] is 1 and ds[2] is 2 etc.
> The converted time-series keeps ds[1] but inserts 51 NA's then adds ds[2] etc till the end of the series. That is not what the initial data was. The conversion is inserting data of its own.
>
> The conversion to ts from xts seems better behaved:
>
>   ds <- xts(1:365, seq(from=as.Date("2011-01-01"), to=as.Date("2011-12-31"), by="day"), frequency=52)
>   as.ts(ds)
>   Time Series:
>   Start = 1
>   End = 365
>   Frequency = 1
>   [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42
>   [43] 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84
>   [85] 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126
>   [127] 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141
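The point made at the top of this reply, that zoo and xts carry a time stamp per observation and so need no frequency attribute, can be illustrated with a blank daily series. This is a sketch under assumptions: the variable names are made up, and the zoo part runs only if that package is installed.

```r
# Daily dates for a whole year; seq() works out leap years by itself
days_2011 <- seq(as.Date("2011-01-01"), as.Date("2011-12-31"), by = "day")
days_2012 <- seq(as.Date("2012-01-01"), as.Date("2012-12-31"), by = "day")
length(days_2011)   # 365
length(days_2012)   # 366

# A "blank" series: one NA per date, with single days filled in later by date
if (requireNamespace("zoo", quietly = TRUE)) {
  ds <- zoo::zoo(rep(NA_real_, length(days_2011)), days_2011)
  d  <- as.Date("2011-12-24")
  window(ds, start = d, end = d) <- 10
  print(sum(!is.na(zoo::coredata(ds))))
}
```

Supplying one NA per date, rather than a single NA, is the same idea as the rep(NA, length(d)) call in the reply, applied to a daily index.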
Re: [R] Time series merge?
Try xts(tsObj, order.by = index(tsObj))

On Nov 26, 2011 10:57 AM, Kevin Burton rkevinbur...@charter.net wrote:

I have two time series

a <- ts(1:10, start=c(1,6), end=c(2,5), frequency=10)
b <- ts(1:5, start=c(2,1), end=c(2,5), frequency=10)

Obviously 'b' is a subset of 'a'. I want a single index value indicating where the start of 'b' lines up with the start of 'a'. So in this simple example I would expect an index of 5. I was playing with 'merge', but for a 'ts' object this does not produce anything useful:

merge(a,b)
  x
1 1
2 2
3 3
4 4
5 5

I get the same answer if I use 'merge(b,a)', so I don't know how to convert this result to something useful. So then I decided to use 'xts'. But the conversion fails:

ax <- as.xts(a)
Error in as.xts.ts(a) : could not convert index to appropriate type

For this simple example I could code it myself using a simple for loop, but if I add the capability to handle missing dates, different frequencies, etc., it gets complicated very fast. It seems that 'xts' has more extensive date handling facilities than 'ts', but I am stuck since it doesn't look like I can convert from 'ts' to 'xts'. Thanks in advance for your suggestions. Kevin
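For the specific question of where 'b' starts inside 'a', a conversion to xts may not be needed at all. The following is only a sketch in base R; it assumes both series share the same frequency and that 'b' does not start before 'a'. The names `offset`, `pos` and `overlap` are mine, not from the thread.

```r
a <- ts(1:10, start = c(1, 6), end = c(2, 5), frequency = 10)
b <- ts(1:5,  start = c(2, 1), end = c(2, 5), frequency = 10)

# tsp() returns c(start time, end time, frequency) in time units
stopifnot(frequency(a) == frequency(b))

# Number of observations of 'a' that precede the first observation of 'b'.
# round() guards against floating point error in the time arithmetic.
offset <- round((tsp(b)[1] - tsp(a)[1]) * frequency(a))

# 1-based position in 'a' of the first observation of 'b'
pos <- offset + 1

# The slice of 'a' that covers the same time span as 'b'
overlap <- window(a, start = start(b), end = end(b))
```

For this example `offset` is 5 (the value expected in the question), which corresponds to position 6 when counting from 1 as R does.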
Re: [R] Missing data?
On Tue, Nov 22, 2011 at 6:50 PM, Kevin Burton rkevinbur...@charter.net wrote:

Void of any other suggestions this approach makes sense, but for my case I think I need to use zoo objects rather than xts. If I sequence the data, generally I don't know if there will be 365 days in the year or 366. So I have to sequence the dates as:

seq(from=as.Date("2011-01-01"), to=as.Date("2011-12-31"), by="day")

If I use this sequence with xts I get:

ds <- xts(NA, seq(from=as.Date("2011-01-01"), to=as.Date("2011-12-31"), by="day"))
Error in xts(NA, seq(from = as.Date("2011-01-01"), to = as.Date("2011-12-31"), : NROW(x) must match length(order.by)

If I leave the 'data' empty I don't get the error, but then it fails when I try to assign an individual item (fill as appropriate):

ds <- xts(, seq(from=as.Date("2011-01-01"), to=as.Date("2011-12-31"), by="day"))
ds["2011-12-24"] <- 10
ds
Error in structure(coredata(x), names = x.attr$dimnames[[1]]) : 'names' attribute [365] must be the same length as the vector [358]

So now I need to remember that I have not filled in all of the data. Also simple dereferencing gives:

ds[1]
Error in `[.xts`(ds, 1) : subscript out of bounds

With zoo I am able to create a time series where all of the data is initially NA:

ds <- zoo(NA, seq(from=as.Date("2011-01-01"), to=as.Date("2011-12-31"), by="day"))

So I can fill the data as appropriate and the remaining slots will have NA. I may be new with xts but I cannot see a way of creating a usable 'blank' time series. Also, with xts it seems like the frequency is ignored.

ds <- xts(1:365, seq(from=as.Date("2011-01-01"), to=as.Date("2011-12-31"), by="day"), frequency=52)
frequency(ds)
[1] 1

Whereas zoo remembers the frequency setting:

ds <- zoo(1:365, seq(from=as.Date("2011-01-01"), to=as.Date("2011-12-31"), by="day"), frequency=52)
frequency(ds)
[1] 52

But since the ultimate goal is to get the time series in a 'ts' format (as many functions require 'ts'), it seems like even zoo has problems:

The problem is that you seem to want a fixed number of periods per year, but a year contains neither exactly 52 weeks nor a constant 365 days. You are going to have to give up something, since your apparent criteria conflict with reality. For example, you could use months, in which case there are exactly 12, or you could put more than 7 days into the first or last week of the year so that there are exactly 52 weeks in a year but they don't all have the same number of days, etc.

-- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com
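The monthly option mentioned above can be illustrated in base R. This is only a sketch: the daily values in `x` are placeholders, the monthly mean is an arbitrary choice of summary, and the variable names are mine.

```r
dates <- seq(from = as.Date("2011-01-01"), to = as.Date("2011-12-31"), by = "day")
x <- seq_along(dates)            # placeholder daily values 1, 2, ..., 365

# Collapse to one value per calendar month (here the monthly mean)
month_id <- format(dates, "%Y-%m")
monthly  <- tapply(x, month_id, mean)

# Exactly 12 periods per year, so a regular 'ts' can hold it without NA padding
ms <- ts(as.numeric(monthly), start = c(2011, 1), frequency = 12)
```

Because every year has 12 months, the same code works unchanged for leap years, which is the point of giving up the daily resolution.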
Re: [R] computationally singular error with mice()
Hi Fei, I wouldn't worry too much about categorical variables for mice. Mice would use logistic regression for binary variables and polytomous logistic regression for categorical variables with more than 2 levels. However, you should not include factors with a lot of levels, say more than 30, in imputation models, because that would require a lot of dummy variables. Another thing: do not exclude variables that you will use in the substantive analysis. Otherwise, estimation would be biased. Weidong

On Sat, Nov 26, 2011 at 12:07 PM, Fei fayechen0...@hotmail.com wrote:

Hi Josh, Thanks for the kind reminder about posting the data frame. My data frame contains lots of categorical variables, which seems to be problematic. For instance, dob status edu mrext married highschool yes, full time Do you know how to specify the imputation methods and the visitSequence so that those categorical variables are not involved in the imputation process? Thank you. Fei -- View this message in context: http://r.789695.n4.nabble.com/computationally-singular-error-with-mice-tp4109583p4110776.html Sent from the R help mailing list archive at Nabble.com.
[R] Question about randomForest
I've been using the R package randomForest but there is an aspect I cannot work out the meaning of. After calling the randomForest function, the returned object contains an element called predicted, which is the prediction obtained using all the trees (at least that's my understanding). I've checked that this prediction set has the error rate reported by err.rate. However, if I send the training data back into the predict.randomForest function, I get a different result from the stored set of predictions. This is true for both classification and regression. The predictions obtained this way also have a much lower error rate and perform very well (suspiciously well...) on measures such as AUC. My understanding is that the two predictions above should be the same. Since they are not, I must be misunderstanding something. Any ideas what's going on?
Re: [R] computationally singular error with mice()
Hi Weidong, Thank you for the clear explanation. You are right, it is not the categorical variables that are causing the trouble. It might be the relatively small sample size that is causing the problem, given so many variables. I excluded some variables that are not essential to the analyses I am going to conduct, and the commands then ran successfully. Thank you. -- View this message in context: http://r.789695.n4.nabble.com/computationally-singular-error-with-mice-tp4109583p4111304.html Sent from the R help mailing list archive at Nabble.com.
[R] simplify source code
Hi, I would like to shorten

mod1 <- nls(ColName2 ~ ColName1, data = table, ...)
mod2 <- nls(ColName3 ~ ColName1, data = table, ...)
mod3 <- nls(ColName4 ~ ColName1, data = table, ...)
...

Is there something like

cols = c(ColName2,ColName3,ColName4,...)
for i in ...
mod[i-1] <- nls(ColName[i] ~ ColName1, data = table, ...)

I am looking forward to your help. Christof
[R] append to PDF file
Hi, is there a way to append a plot as PDF to an existing PDF file? savePlot does not seem to offer this possibility. Christof
Re: [R] append to PDF file
PDF files contain information at the end of them and so you cannot append without altering the file (universally true for PDF). Perhaps pdf() your plots and use external tools to convert the PDFs to .ps then re-merge. Might not be the best way, but an effective one. Ken Hutchison
Re: [R] append to PDF file
There is the 'pdftk' (PDF tool kit) that you will find on the web that will do the job. I have used it to both combine and split out the pages in the PDF file. -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? Tell me what you want to do, not how you want to do it.
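If all the plots are produced in one R session, merging can sometimes be avoided by sending them to a single pdf() device, where each new plot starts a new page. A minimal sketch, with a made-up file name and made-up plots:

```r
out <- file.path(tempdir(), "plots.pdf")

# One device, several plots: each high-level plot call starts a new page
pdf(out, onefile = TRUE)
plot(1:10, main = "first plot")
hist(c(1, 2, 2, 3, 3, 3), main = "second plot")
dev.off()

# Combining with an already existing file still has to happen outside R,
# for example with pdftk on the command line (file names are placeholders):
#   pdftk existing.pdf plots.pdf cat output combined.pdf
```

This does not append to an existing file; it only reduces the number of files that need merging afterwards.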
[R] data.table merge equivalent for all.x
Dear all, I'm trying to use data.table to summarise a table and merge it to another table. Here is what I would like to do, but by using data.table() in a proper way.

library(data.table)
tab1 <- data.table(ID = 11:20, A = rnorm(10), D = 1:10, key = "ID")
tab2 <- data.table(ID2 = 1:10, D = rep(1:5, 2), B = rnorm(10), key = "ID2")
junk <- aggregate(tab2[, B], by = list(D = tab2[, D]), FUN = sum)
merge(tab1, junk, by = "D", all.x = TRUE)

This is my attempt using data.table():

junk <- tab2[, mean(B), by = D]
tab1[junk]

Best regards, Thierry
Re: [R] computationally singular error with mice()
Hi Fei,

On Sat, Nov 26, 2011 at 9:07 AM, Fei fayechen0...@hotmail.com wrote:

Hi Josh, Thanks for the kind reminder of posting the dataframe on. My dataframe contains lots of categorical variables, which seems to be problematic. For instance, dob status edu mrext married highschool yes, full time

Still not exactly a usable dataset, but here is a snippet of code I used:

## Multiple Imputation Model ##

## specify the predictor matrix for the imputation
pred.matrix <- rbind(
  VFQRoleDifficulties1 = c(0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1),
  MOODVision1 = c(1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1),
  MOODImpact1 = c(1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1),
  [snip]
  SocialFunctioning1 = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1),
  RoleEmotional1 = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1),
  MentalHealth1 = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0))

## set the column names to the column names of the data (this is a square matrix)
colnames(pred.matrix) <- colnames(dat)

## Set the methods used to impute each variable
imp.method <- c(
  VFQRoleDifficulties1 = "pmm",
  MOODVision1 = "pmm",
  MOODImpact1 = "pmm",
  [snip]
  SocialFunctioning1 = "pmm",
  RoleEmotional1 = "pmm",
  MentalHealth1 = "pmm"
)

## Create multiply imputed dataset
datimp <- mice(data = dat, m = 500, method = imp.method, predictorMatrix = pred.matrix, seed = 1, print = FALSE)

Basically you can write a k x k matrix where k is the number of variables in your dataset. This controls which variables are used in the imputation model for each variable (a row of all 0s would mean no variables). You can also pass a character vector of length k controlling the method used for each variable. You can also control the order in which mice visits the variables.

Cheers, Josh

Do you know how to specify the imputation methods and the visitSequence so that those categorical variables are not involved in the imputation process? Thank you. Fei -- View this message in context: http://r.789695.n4.nabble.com/computationally-singular-error-with-mice-tp4109583p4110776.html Sent from the R help mailing list archive at Nabble.com.

-- Joshua Wiley Ph.D. Student, Health Psychology Programmer Analyst II, ATS Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/
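Typing the k x k matrix by hand gets tedious, so it can also be built programmatically. The sketch below uses a small made-up data frame (`dat`, `score`, `age`, `edu`, `status` are all hypothetical) and only constructs the two arguments; the mice() call itself is left as a comment because it needs the mice package. Whether excluding the factors is statistically wise is a separate question.

```r
# Hypothetical data: two numeric variables and two factors
dat <- data.frame(score  = c(1.2, NA, 3.1, 2.2),
                  age    = c(30, 41, NA, 25),
                  edu    = factor(c("highschool", "college", NA, "college")),
                  status = factor(c("married", "single", "single", NA)))
k <- ncol(dat)

# Rows are the variables being imputed, columns are the candidate predictors
pred <- matrix(1, nrow = k, ncol = k, dimnames = list(names(dat), names(dat)))
diag(pred) <- 0                      # a variable never predicts itself

is_cat <- vapply(dat, is.factor, logical(1))
pred[, is_cat] <- 0                  # do not use the factors as predictors

# One method per variable; "" leaves that variable unimputed
meth <- ifelse(is_cat, "", "pmm")

# With the mice package installed, these would be passed as
#   mice(dat, method = meth, predictorMatrix = pred, seed = 1, printFlag = FALSE)
```

Setting a column to 0 removes that variable as a predictor everywhere, while the empty method string skips imputing it, so both pieces are needed to keep a variable out of the process entirely.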
Re: [R] simplify source code
Hi: Here's one way you could do it. I manufactured some fake data with a simple model to illustrate. This assumes you are using the same model formula with the same starting values and remaining arguments for each response.

dg <- data.frame(x = 1:10, y1 = sort(abs(rnorm(10))), y2 = sort(abs(rnorm(10))), y3 = sort(abs(rnorm(10))))

# Model: y = b0 + b1 exp(x/theta)
vars <- c('y1', 'y2', 'y3')

# Function to create the model formula by plugging in the
# response y and run the model
mfun <- function(y) {
  form <- as.formula(paste(y, 'cbind(1, exp(x/th))', sep = ' ~ '))
  nls(form, data = dg, start = list(th = 0.3), algorithm = 'plinear')
}

# Generate a list of model objects:
mlist <- lapply(vars, mfun)

# To see what they contain:
str(mlist[[1]])
str(summary(mlist[[1]]))

# Extract a few features from each:
# The first two return matrices, the third returns a list
do.call(rbind, lapply(mlist, function(m) coef(m)))
do.call(rbind, lapply(mlist, function(m) deviance(m)))
lapply(mlist, function(m) summary(m)$cov.unscaled)

To get more control over the output format, the plyr package can come in handy. For example, to get data frames for the first two extractions above, one would do

library('plyr')
ldply(mlist, function(m) coef(m))
ldply(mlist, function(m) deviance(m))
# ldply() means list input, data frame output (ld).
# For the third extraction, one has a list input and a list output:
llply(mlist, function(m) summary(m)$cov.unscaled)

HTH, Dennis
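For readers who prefer the explicit loop from the question, the same idea can be written with a for loop and a named list. This is a sketch on made-up data; the model `a + b * ColName1` is just a placeholder that nls can fit reliably, and `tab`, `mods` and `slopes` are names I chose.

```r
set.seed(1)
tab <- data.frame(ColName1 = 1:10)
tab$ColName2 <- 1 + 2.0 * tab$ColName1 + rnorm(10, sd = 0.1)
tab$ColName3 <- 2 + 0.5 * tab$ColName1 + rnorm(10, sd = 0.1)
tab$ColName4 <- 3 - 1.0 * tab$ColName1 + rnorm(10, sd = 0.1)

cols <- c("ColName2", "ColName3", "ColName4")
mods <- list()
for (col in cols) {
  # Build the formula "ColNameK ~ a + b * ColName1" from the column name
  form <- as.formula(paste(col, "~ a + b * ColName1"))
  mods[[col]] <- nls(form, data = tab, start = list(a = 0, b = 1))
}

# Pull one coefficient out of every fitted model
slopes <- sapply(mods, function(m) coef(m)[["b"]])
```

Storing the fits in a list indexed by column name replaces the separate mod1, mod2, mod3 objects, so later code can refer to `mods[["ColName3"]]` directly.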
Re: [R] Question about randomForest
Hi Matthew, The error rate reported by randomForest is the prediction error based on out-of-bag (OOB) data. It therefore differs from the prediction error on the original data: each tree is built from a bootstrap sample (containing roughly 63% of the distinct original observations), and the OOB prediction for an observation uses only the trees that did not see it. The OOB error rate is thus likely to be higher than the prediction error on the original data, as you observed. Weidong
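The share of the data that a single tree sees can be checked with a short simulation in base R. This sketch does not use randomForest itself; it only draws bootstrap samples and counts the distinct observations, with an arbitrary `n` and number of repetitions.

```r
set.seed(42)
n <- 1000

# Share of distinct observations that end up in one bootstrap sample
in_bag_share <- replicate(200, length(unique(sample.int(n, n, replace = TRUE))) / n)

mean(in_bag_share)   # simulated average share
1 - exp(-1)          # large-sample limit of 1 - (1 - 1/n)^n
```

The remaining observations are out of bag for that tree. As far as I know, calling predict() on a randomForest object without newdata returns these out-of-bag predictions, whereas passing the training data as newdata runs every observation through all trees, including the ones that were trained on it.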
Re: [R] data.table merge equivalent for all.x
Hi: There may well be a more efficient way to do this, but here's one take.

library('data.table')

# Want to merge by D in the end, so set D as part of the key:
t1 <- data.table(ID = 11:20, A = rnorm(10), D = 1:10, key = 'ID,D')
t2 <- data.table(ID2 = 1:10, D = rep(1:5, 2), B = rnorm(10), key = 'ID2,D')

# The j expression produces sums of B (the non-key variable) for each D group
# .SD denotes 'sub-data'. The result 'junk' is a data table.
junk <- t2[, lapply(.SD, sum), by = D]
tables() # junk has no key

# set a key for junk so that it can be merged
setkey(junk, 'D')

# t1 and junk have a common key variable D, so the left join is
merge(t1, junk, by = 'D', all.x = TRUE)

# check against
t1
junk

HTH, Dennis
[R] Boxplot
I'm trying to do the second case among Jim's suggestions. I used Bert's suggestion and it works great. I would also like to ask if anyone is familiar with a package for making box plots. I would like to bin my data points at defined X intervals and display a boxplot for each bin on the same chart. In Stata, there is a tool for making these, and it varies the width of the boxplot based on the number of points in each plot. I am hoping there is a similar tool for R. Thank you, Jeffrey

Date: Tue, 22 Nov 2011 18:51:05 +1100 From: j...@bitwrit.com.au To: johjeff...@hotmail.com CC: r-help@r-project.org Subject: Re: [R] Binned line plot

On 11/22/2011 04:29 PM, Jeffrey Joh wrote:

I have a scatter plot with 10000 points. I would like to add a line that bins every 50 points and connects the average of each bin. I'm looking for something similar to line type m in Stata. With this dataset of 10000 points, I would also like to bin the data and make boxplots at certain intervals, so that I have a set of boxplots to represent each bin. I would also like the width of each box to be proportional to the number of points in each bin. How can I make these plots? Is there a simple package to use?

Hi Jeffrey, There are three possibilities that come to mind:
1) You want to bin the points based on their order in the data frame.
2) You want to bin the points based on the x or y values of the coordinates.
3) You want to bin the points based on the x _and_ y values of the coordinates.

Number 1 is trivial and has already been answered (assume a two column data frame of coordinates named xypoints).

# first point - set up a loop to get a vector of averages
meanx <- rep(0, 200)
meany <- rep(0, 200)
for(index in 1:200) {
  start <- 1 + 50 * (index - 1)
  meanx[index] <- mean(xypoints[start:(start + 49), "x"])
  meany[index] <- mean(xypoints[start:(start + 49), "y"])
}
plot(meanx, meany, type = "l")

Number 2 requires that you sort the pairs based on the value of the one you want, then apply the same process as 1 to the sorted pairs.

Number 3 is somewhat more difficult. I don't do this much, and some of the people who do map analysis will probably come up with a much better method. Find the most extreme point. Find the 49 points closest to that point to constitute group 1. Remove those points from the data frame. Go back to the first step if there are any points left. You will end up with 200 groups of points that are spatially grouped. Get the centroids and plot as above. Another wild guess from Jim
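Case 2 (binning on the x values) can be written without an explicit loop by sorting first and then averaging within consecutive groups of 50. A sketch on simulated data; `xypoints` here is made up, and the off-screen `pdf(NULL)` device is only there so the code also runs non-interactively.

```r
set.seed(1)
xypoints <- data.frame(x = runif(10000), y = rnorm(10000))   # made-up data

# Case 2: order the rows by x, then average consecutive groups of 50
sorted <- xypoints[order(xypoints$x), ]
bin    <- (seq_len(nrow(sorted)) - 1) %/% 50 + 1             # 1, 1, ..., 2, 2, ...
meanx  <- tapply(sorted$x, bin, mean)
meany  <- tapply(sorted$y, bin, mean)

pdf(NULL)   # off-screen device; drop this line for interactive use
plot(xypoints, pch = ".", col = "grey")
lines(meanx, meany, col = "red")
dev.off()
```

Because the rows are sorted by x before grouping, the bin means in `meanx` increase from left to right and the connecting line does not double back.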
Re: [R] Boxplot
On Nov 27, 2011, at 12:15 AM, Jeffrey Joh wrote:

I'm trying to do the second case among Jim's suggestions. I used Bert's suggestion and it works great. I would also like to ask if anyone is familiar with a package for making box plots. I would like to bin my data points at defined X intervals and display a boxplot for each bin on the same chart.

Combining `cut` (to define the intervals) and `boxplot` should be fairly straightforward.

In Stata, there is a tool for making these, and it varies the width of the boxplot based on the number of points in each plot.

We have a tool for that, too. Study `quantile` a bit, to automatically pick cutpoints that will divide the data into approximately equal groups. (I use the `cut2` function in the Hmisc package, because it is integrated with `rms`, which I use all the time, and because its defaults for cut()-ting are more to my liking. It also has a g= parameter that automates the cut( ..., quantile(...)) processing.)

I am hoping there is a similar tool for R. Thank you, Jeffrey

David Winsemius, MD West Hartford, CT
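The `cut` plus `boxplot` combination can be sketched in base R as follows. The data are simulated, the break points are arbitrary, and `varwidth = TRUE` is the boxplot argument that ties box width to group size; none of these specific choices come from the thread.

```r
set.seed(1)
x <- runif(500, 0, 10)
y <- x + rnorm(500)                       # made-up data

# Fixed-width bins on x
bins <- cut(x, breaks = seq(0, 10, by = 2), include.lowest = TRUE)

# Alternatively, quantile-based bins with roughly equal counts
qbins <- cut(x, breaks = quantile(x, probs = seq(0, 1, by = 0.2)),
             include.lowest = TRUE)

pdf(NULL)   # off-screen device; drop this line for interactive use
# varwidth = TRUE draws box widths proportional to the square root of group size
boxplot(y ~ bins, varwidth = TRUE, xlab = "x interval", ylab = "y")
dev.off()
```

With fixed-width bins the counts differ between boxes, so `varwidth` is informative; with the quantile-based `qbins` all boxes would come out at nearly the same width.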
[R] sqldf if iif
Dear all, I have problems with the iif function using the sqldf library. I counted the abundance (Num) of different SPECIES at two moments (esf), saving the information in two tables (esf50, esf100):

esf50
SAMPLE SPECIES Num esf
1289   diso1   44  50
1289   diso2   5   50
1289   diso3   1   50

esf100
SAMPLE SPECIES Num esf
1289   diso1   82  100
1289   diso2   13  100
1289   diso3   2   100
1289   diso4   3   100

I would like to subtract column Num between the two moments, considering only the changes, therefore I use the conditional iif:

var100 <- sqldf("select esf100.SAMPLE, esf100.SPECIES, esf100.Num, esf100.esf, iif esf100.Num - esf50.Num =0, esf100.Num-esf50.Num, esf100.Num as PIPAS from esf100 left join esf50 on esf100.SAMPLE = esf50.SAMPLE and esf100.SPECIES = esf50.SPECIES")

I think the structure is right because the SQL query runs OK in Access. Is the iif syntax the problem? Thanks in advance. Best wishes, Carlos Rivera
[R] generating a vector of y_t = \sum_{i = 1}^t (alpha^i * x_{t - i + 1})
Dear R-help, I have been trying really hard to generate the following vector efficiently, given the data (x) and a parameter (alpha). Let y be the output vector; the aim is to produce y in at most half the time used by the loop example below.

y[1] = alpha * x[1]
y[2] = alpha^2 * x[1] + alpha * x[2]
y[3] = alpha^3 * x[1] + alpha^2 * x[2] + alpha * x[3]
...

Below are the methods I have tried, which failed miserably. Some are just totally ridiculous, so feel free to have a laugh, but I would appreciate it if someone could give me a hint. Otherwise I guess I'll have to give Rcpp a try.

library(rbenchmark)

## Benchmark the recursion functions
loopRec <- function(x, alpha){
  n <- length(x)
  y <- double(n)
  for(i in 1:n){
    y[i] <- sum(cumprod(rep(alpha, i)) * rev(x[1:i]))
  }
  y
}
loopRec(c(1, 2, 3), 0.5)

## This is a crazy solution, but worth giving it a try.
vecRec <- function(x, alpha){
  n <- length(x)
  exp.mat <- matrix(rep(x, each = n), nc = n, byrow = TRUE)
  up.mat <- matrix(eval(parse(text = paste(c(, paste(paste(paste(rep(0, , 0:(n - 1), ), sep = ), paste(cumprod(rep(, alpha, ,, n:1, )), sep = ) , sep = ,), collapse = ,), ), sep = ))), nc = n, byrow = TRUE)
  colSums(up.mat * exp.mat)
}
vecRec(c(1, 2, 3), 0.5)

## Sweep is slow, shouldn't use it.
matRec <- function(x, alpha){
  n <- length(x)
  exp.mat <- matrix(rep(x, each = n), nc = n, byrow = TRUE)
  up.mat <- sweep(matrix(cumprod(rep(alpha, n)), nc = n, nr = n, byrow = TRUE), 1, c(1, cumprod(rep(1/alpha, n - 1))), FUN = "*")
  up.mat[lower.tri(up.mat)] <- 0
  colSums(up.mat * exp.mat)
}
matRec(c(1, 2, 3), 0.5)

matRec2 <- function(x, alpha){
  n <- length(x)
  exp.mat <- matrix(rep(x, each = n), nc = n, byrow = TRUE)
  up.mat1 <- matrix(cumprod(rep(alpha, n)), nc = n, nr = n, byrow = TRUE)
  up.mat2 <- matrix(c(1, cumprod(rep(1/alpha, n - 1))), nc = n, nr = n)
  up.mat <- up.mat1 * up.mat2
  up.mat[lower.tri(up.mat)] <- 0
  colSums(up.mat * exp.mat)
}
matRec2(c(1, 2, 3), 0.5)

## Check whether the values are correct
all.equal(loopRec(1:1000, 0.5), vecRec(1:1000, 0.5))
all.equal(loopRec(1:1000, 0.5), matRec(1:1000, 0.5))
all.equal(loopRec(1:1000, 0.5), matRec2(1:1000, 0.5))

## Benchmark the functions.
benchmark(loopRec(1:1000, 0.5), vecRec(1:1000, 0.5), matRec(1:1000, 0.5), matRec2(1:1000, 0.5), replications = 50, order = "relative")

Thank you very much for your help.
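One observation that may help: the sum satisfies the recursion y[t] = alpha * x[t] + alpha * y[t-1] with y[0] = 0, which is a first-order recursive filter. The sketch below uses stats::filter from base R for this; `filtRec` is a name I made up, and `loopRec` is repeated from the post only so the two can be compared.

```r
loopRec <- function(x, alpha) {   # reference implementation from the post
  n <- length(x)
  y <- double(n)
  for (i in 1:n) y[i] <- sum(cumprod(rep(alpha, i)) * rev(x[1:i]))
  y
}

# y[t] = alpha * x[t] + alpha * y[t - 1], i.e. a first-order recursive filter
filtRec <- function(x, alpha) {
  as.numeric(stats::filter(alpha * x, filter = alpha, method = "recursive"))
}

x <- c(1, 2, 3)
filtRec(x, 0.5)
isTRUE(all.equal(loopRec(1:200, 0.5), filtRec(1:200, 0.5)))
```

The recursion needs one multiplication and one addition per element, so its cost grows linearly in the length of x, whereas the loop and matrix versions grow quadratically. I have not benchmarked it against the functions above.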
[R] nnet plot
Good evening. Once again I ask the community for help; as I am new at this, I have some basic questions. I have been looking for packages on neural networks, and the two I found that seem to be the most used are neuralnet and nnet. From my tests, and correct me if I'm wrong, neuralnet only accepts numeric values as input. I did a little test with the iris data:

library(neuralnet)
Species.numeric <- as.numeric(iris$Species)
iris.df <- data.frame(iris, Species.numeric)
net <- neuralnet(Species.numeric ~ Sepal.Width + Sepal.Length + Petal.Width + Petal.Length, iris.df, hidden = 2)
options(device = "windows")
plot(net)
net

I think the nnet library supports all types of data.

library(nnet)
RN <- nnet(Species ~ ., data = iris, size = 3, rang = 0.1, decay = 0.01, maxit = 20)
plot(RN)

My question is how this package handles all the input attributes, and how I can draw a sketch of the network similar to the one from neuralnet, or alternatively how I can pass all the attributes to neuralnet without transforming them to numeric. Is there a more effective package for neural networks? Thank you -- View this message in context: http://r.789695.n4.nabble.com/nnet-plot-tp4111620p4111620.html Sent from the R help mailing list archive at Nabble.com.
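One common way to give a factor to a function that wants only numeric columns is to expand it into 0/1 indicator columns instead of integer codes. A sketch in base R with the built-in iris data; `dummies` and `iris.num` are names chosen here, and the neuralnet formula in the comment is an assumption about how the result might be used, not something tested here.

```r
# Expand the factor into 0/1 indicator columns (one per level)
dummies <- model.matrix(~ Species - 1, data = iris)
colnames(dummies) <- levels(iris$Species)

# Numeric measurements plus the three indicator columns
iris.num <- cbind(iris[, 1:4], dummies)

# All columns are now numeric, so a formula such as
#   setosa + versicolor + virginica ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width
# could be handed to neuralnet() (assuming that package is installed)
```

Unlike `as.numeric(iris$Species)`, the indicator coding does not impose an artificial ordering 1 < 2 < 3 on the species.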
Re: [R] computationally singular error with mice()
Hi Josh, You opened up the black box for me. Now I know the right way to go. Thank you so much! Best, Fei
Re: [R] sqldf if iif
sqldf uses the SQLite database by default for backend processing. The iif function is specific to the SQL syntax of the Jet database engine (which underlies MS Access). You could read up on SQLite syntax, or you could avoid nonstandard SQL syntax, retrieve the data into a data frame, and use R code to do your logical merging into one column.

Jeff Newmiller
DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
Sent from my phone. Please excuse my brevity.

Carlos Rivera limnoriv...@gmail.com wrote:

Dear all,

I have a problem with the iif function when using the sqldf library. I counted the abundance (Num) of different SPECIES at two moments (esf) and saved the information in two tables (esf50, esf100):

esf50
  SAMPLE SPECIES Num esf
  1289diso1 44 50
  1289diso2 5 50
  1289diso3 1 50
  diso1 44 50
  diso2 5 50
  diso3 1 50

esf100
  SAMPLE SPECIES Num esf
  1289diso1 82 100
  1289diso2 13 100
  1289diso3 2 100
  1289diso4 3 100
  diso1 82 100
  diso2 13 100
  diso3 2 100
  diso4 3 100

I would like to subtract column Num between the two moments, considering only the changes, so I use a conditional:

  var100 <- sqldf("select esf100.SAMPLE, esf100.SPECIES, esf100.Num, esf100.esf, iif esf100.Num - esf50.Num =0, esf100.Num-esf50.Num, esf100.Num as PIPAS from esf100 left join esf50 on esf100.SAMPLE = esf50.SAMPLE and esf100.SPECIES = esf50.SPECIES")

I think the structure is right because the SQL query runs fine in Access. Is the iif syntax the problem?

Thanks in advance.

Best wishes,
Carlos Rivera
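The two suggestions in the reply above can be sketched as follows. This is only an illustration under stated assumptions: the data frames are rebuilt by hand with SAMPLE and SPECIES as separate columns (loosely based on the quoted tables), and the rule "subtract when a matching esf50 row exists, otherwise keep the esf100 count" is a guess at the intent, since the condition in the quoted query is not fully legible. Num50 and var100.sql are hypothetical names.

```r
# Illustrative data only (assumed layout: SAMPLE and SPECIES are separate columns).
esf50 <- data.frame(SAMPLE = 1289,
                    SPECIES = c("diso1", "diso2", "diso3"),
                    Num = c(44, 5, 1), esf = 50,
                    stringsAsFactors = FALSE)
esf100 <- data.frame(SAMPLE = 1289,
                     SPECIES = c("diso1", "diso2", "diso3", "diso4"),
                     Num = c(82, 13, 2, 3), esf = 100,
                     stringsAsFactors = FALSE)

# Base R route: a left join with merge(), then ifelse() for the conditional.
prev <- esf50[, c("SAMPLE", "SPECIES", "Num")]
names(prev)[3] <- "Num50"   # hypothetical name for the earlier count
var100 <- merge(esf100, prev, by = c("SAMPLE", "SPECIES"), all.x = TRUE)
# Assumed rule: subtract when an esf50 count exists, else keep the esf100 count.
var100$PIPAS <- ifelse(is.na(var100$Num50), var100$Num, var100$Num - var100$Num50)
var100$Num50 <- NULL
print(var100)

# SQLite route: CASE WHEN ... END in place of iif (runs only if sqldf is installed).
if (requireNamespace("sqldf", quietly = TRUE)) {
  library(sqldf)
  var100.sql <- sqldf("
    select a.SAMPLE, a.SPECIES, a.Num, a.esf,
           case when b.Num is null then a.Num else a.Num - b.Num end as PIPAS
    from esf100 a
    left join esf50 b
      on a.SAMPLE = b.SAMPLE and a.SPECIES = b.SPECIES")
  print(var100.sql)
}
```

With these illustrative values, PIPAS comes out as 38, 8, 1 and 3; diso4 has no esf50 row, so its esf100 count is kept. If the intended condition is different (for example, keeping only non-negative differences), only the ifelse() test and the CASE WHEN clause need to change.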
[R] tikzDevice and sans serif
Aloha all, I haven't been able to find how to choose the font used by tikzDevice. My first tries have all been set with a serif font and I'd like to have them use the sans serif font instead. I've looked through the documentation and googled a bit without success. Is this possible? Can someone point me to instructions? All the best, Tom -- Thomas S. Dye http://www.tsdye.com