Re: [R] different results in MASS's mca and SAS's corresp
When I came to David's comment, I understood the theory, but not the numbers in his answer. I wanted to see the MASS mca answers match up with SAS, and the example did not (yet). I am inclined to write, O yea of little faith. David showed perfectly well that when the results of the two algorithms are put on the same scale (z-scored) then they are identical to 4th decimal place (except for a change of sign in the second dimension, which is of no import). Surely that is miracle-enough? Mark. -- View this message in context: http://r.789695.n4.nabble.com/different-results-in-MASS-s-mca-and-SAS-s-corresp-tp3261494p3262600.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Legend outside the plot? xpd?
On Feb 6, 2011, at 2:41 AM, Matt Cooper wrote: Hi All, BG: Will try be brief. I'd like 3 graphs on a page (below each other mfrow=c(3,1)), saved to pdf. The three plot data on the same subject so I'm having one legend, to the right of the center graph. I'm using mar=c(5,15,4,15) to bring the sides in so that the graphs are square and not stretched wide. To have the graph to the side I'm thinking xpd=T. Each graph has a number of points and lines overlaid, so plot ();lines(),lines(),lines() etc. The data is somewhat of a subset of a larger set, so I'm limiting what is being displayed. The lines plot trends of the wider data. P: With xpd=T, the lines are going right out of the graph boxes to the outer limit of the plot boundaries, as would be the intended behaviour of xpd. Goes without saying that this is undesirable. Q: Is there a better way to achieve what I want? At this stage I have xpd=T in the par(), and xpd=F in the plot() commands and all the line commands, just so I can get the legend to actually print... Why not just use xpd=T inside legend()? Aside: Given the way I've done this, the lines() are all clipped RIGHT at the limit of the plot box. So given the plot is called first these lines leave little coloured dashes on the black of the plot box. To sort this I've called box() at the end. This all seems very redundant? Thanks Matt -- mattcst...@gmail.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Legend outside the plot? xpd?
On Sun, Feb 6, 2011 at 3:56 PM, David Winsemius dwinsem...@comcast.netwrote: Why not just use xpd=T inside legend()? David Winsemius, MD West Hartford, CT I had a feeling I'd look stupid after sending that. Thanks, works great. For some reason I thought if xpd would have to be set to T at the par() level to ever be able to get anything outside of a plot square. Alas. Thanks David Matt [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] manipulate dataframe
Hello, Can someone give me hint to change a data.frame. I want to split a column in more columns depending on the value of a other column. Thanks for the reaction, Andre Example: dat x1 x2 1 1 a 2 1 b 3 1 c 4 2 d 5 2 e 6 2 f 7 3 g 8 3 h 9 3 i in dur d1 d2 d3 1 a d g 2 b e h 3 c f i [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] anova() interpretation and error message
Hi there, I have a data frame as listed below: Ca.P.Biomass.A P Biomass 1 334.5567 0.287 2 737.5400 0.571 3 894.5300 0.639 4 782.3800 0.5836667 5 857.5900 0.600 6 829.2700 0.588 I have fit the data using logistic, Michaelis–Menten, and linear model, they all give significance. fm1 - nls(Biomass~SSlogis(P, phi1, phi2, phi3), data=Ca.P.Biomass.A) fm2 - nls(Biomass~SSmicmen(P, phi1, phi2), data=Ca.P.Biomass.A) fm3 - lm(Biomass~P, data = Ca.P.Biomass.A) I hope to compare the difference among the three models, and I using anova(). As for the example here, the three models seem not have significant difference. However, I am confused by the negative df in the following ANOVA table. And my question is how to interpret the results, if the Pr 0.05. anova(fm1,fm2,fm3) Analysis of Variance Table Model 1: Biomass ~ SSlogis(P, phi1, phi2, phi3) Model 2: Biomass ~ SSmicmen(P, phi1, phi2) Model 3: Biomass ~ P Res.Df Res.Sum Sq Df Sum Sq F value Pr(F) 1 3 0.00063741 2 4 0.00087249 -1 -0.00023508 1.1064 0.3701 3 4 0.00142751 0 0. And when the argument position changed, the anova() give different results. It seems the anova() compare the first model with all other models. anova(fm2,fm1,fm3) Analysis of Variance Table Model 1: Biomass ~ SSmicmen(P, phi1, phi2) Model 2: Biomass ~ SSlogis(P, phi1, phi2, phi3) Model 3: Biomass ~ P Res.Df Res.Sum Sq Df Sum Sq F value Pr(F) 1 4 0.00087249 2 3 0.00063741 1 0.00023508 1.1064 0.3701 3 4 0.00142751 -1 -0.00079010 3.7187 0.1494 When I put the fm3, a linear model, in the first position, and two nls model following it, anova() give the following error message. It seems abnormal. anova(fm3,fm1,fm2) Analysis of Variance Table Response: Biomass Df Sum Sq Mean Sq F valuePr(F) P 1 0.081163 0.081163 227.43 0.0001127 *** Residuals 4 0.001428 0.000357 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Warning message: In anova.lmlist(object, ...) : models with response c(NULL, NULL) removed because response differs from model 1 Any suggestions and comments will be really appreciated. Thanks in advance. Regards, Jinsong __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] boot() versus loop, and statistics option
Hello R users I am quite new to bootstrapping. Now, having some data x, R: set.seed(1234) R: x - runif(300) I want to bootstrap simple statistics, mean and quantiles (.025, .975). Currently, I run a loop R: res - as.data.frame(matrix(ncol = 3, dimnames = list(NULL, ...c(M, Lo, Hi R: for (i in 1:100) { ...y - x[sample(1:length(x), length(x), repl = T)] ...res[i, ] - c(mean(y), quantile(y, c(0.025, 0.975))) ...} and then apply mean() R: apply(res, 2, mean) M Lo Hi 0.49377715 0.03089873 0.98120235 to get the indices of interest. I found the package 'boot' with the function of the same name. I tried to replicate my tiny simulation using this code: R: library(boot) R: myfun - function(x) { ...return(c(mean(x), quantile(x, c(0.025, 0.975 ...} R: boot(x, myfun, 100, sim = parametric) PARAMETRIC BOOTSTRAP Call: boot(data = x, statistic = myfun, R = 100, sim = parametric) Bootstrap Statistics : original biasstd. error t1* 0.48925194 0 0 t2* 0.02806586 0 0 t3* 0.98335435 0 0 The outcome looks quite similar to what my loop returned, so that would be fine. Yet, there is three things I don't understand: (1) I have to use the option 'sim=parametric'. If I don't use this option the function (provided via the statistic option) requires a second argument, which -- according to '?boot' will be a vector of indices, frequencies or weights which define the bootstrap sample. What is that? Or is my simulation simply parametric? Why? (2) What are the advantages and/or disadvantages of 'boot()' over my loop? (3) Can I in principle use 'boot()' to return all of the 100 different data vectors used in the loop, or does 'boot()' by default return already-calculated statistics? Thanks for hints and help, *S* -- Sascha Vieweg, saschav...@gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Confidence interval based on MLE
Hi there, I have fitted a sample (with size 20) to a normal and/or logistic distribution using fitdistr() in MASS or fitdist() in fitdistrplus package. It's easy to get the parameter estimates. Now, I hope to report the confidence interval for those parameter estimates. However, I don't find a function that could give the confidence interval in R. I hope to write a function, however, I don't find some detailed information on the CI based on MLE. Would you please to give me some hints on the CI calculation based on MLE? Any suggestions will be really appreciated. Thanks in advance. Regards, Jinsong __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] boot() versus loop, and statistics option
Package boot is support software for a book: have you consulted it? It answers all your questions, and has copious examples. On Sun, 6 Feb 2011, Sascha Vieweg wrote: Hello R users I am quite new to bootstrapping. Now, having some data x, R: set.seed(1234) R: x - runif(300) I want to bootstrap simple statistics, mean and quantiles (.025, .975). Currently, I run a loop R: res - as.data.frame(matrix(ncol = 3, dimnames = list(NULL, ...c(M, Lo, Hi R: for (i in 1:100) { ...y - x[sample(1:length(x), length(x), repl = T)] ...res[i, ] - c(mean(y), quantile(y, c(0.025, 0.975))) ...} and then apply mean() R: apply(res, 2, mean) M Lo Hi 0.49377715 0.03089873 0.98120235 to get the indices of interest. I found the package 'boot' with the function of the same name. I tried to replicate my tiny simulation using this code: R: library(boot) R: myfun - function(x) { ...return(c(mean(x), quantile(x, c(0.025, 0.975 ...} R: boot(x, myfun, 100, sim = parametric) PARAMETRIC BOOTSTRAP Call: boot(data = x, statistic = myfun, R = 100, sim = parametric) Bootstrap Statistics : original biasstd. error t1* 0.48925194 0 0 t2* 0.02806586 0 0 t3* 0.98335435 0 0 The outcome looks quite similar to what my loop returned, so that would be fine. Yet, there is three things I don't understand: (1) I have to use the option 'sim=parametric'. If I don't use this option the function (provided via the statistic option) requires a second argument, which -- according to '?boot' will be a vector of indices, frequencies or weights which define the bootstrap sample. What is that? Or is my simulation simply parametric? Why? (2) What are the advantages and/or disadvantages of 'boot()' over my loop? (3) Can I in principle use 'boot()' to return all of the 100 different data vectors used in the loop, or does 'boot()' by default return already-calculated statistics? Thanks for hints and help, *S* -- Sascha Vieweg, saschav...@gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Brian D. Ripley, rip...@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] manipulate dataframe
Here is one way of doing it: x x1 x2 row 1 1 a 1 2 1 b 2 3 1 c 3 4 2 d 1 5 2 e 2 6 2 f 3 7 3 g 1 8 3 h 2 9 3 i 3 # create indices for new table x$row - ave(x$x1, x$x1, FUN=seq_along) # create output matrix result - matrix('', max(x$row), max(x$x1)) result[cbind(x$row, x$x1)] - x$x2 result [,1] [,2] [,3] [1,] a d g [2,] b e h [3,] c f i On Sun, Feb 6, 2011 at 6:13 AM, André de Boer rnie...@gmail.com wrote: Hello, Can someone give me hint to change a data.frame. I want to split a column in more columns depending on the value of a other column. Thanks for the reaction, Andre Example: dat x1 x2 1 1 a 2 1 b 3 1 c 4 2 d 5 2 e 6 2 f 7 3 g 8 3 h 9 3 i in dur d1 d2 d3 1 a d g 2 b e h 3 c f i [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] manipulate dataframe
Hi André, try this: df1 - data.frame(x1 = rep(1:3, each=3), x2=letters[1:9]) dfs - split(df1, df1$x1) df2 - data.frame(sapply(dfs, FUN=[[, x2)) colnames(df2) - paste(d, unique(df1$x1), sep=) df2 HTH Patrick Am 06.02.2011 12:13, schrieb André de Boer: Hello, Can someone give me hint to change a data.frame. I want to split a column in more columns depending on the value of a other column. Thanks for the reaction, Andre Example: dat x1 x2 1 1 a 2 1 b 3 1 c 4 2 d 5 2 e 6 2 f 7 3 g 8 3 h 9 3 i in dur d1 d2 d3 1 a d g 2 b e h 3 c f i [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Confidence interval based on MLE
Jinsong Zhao jszhao at yeah.net writes: Hi there, I have fitted a sample (with size 20) to a normal and/or logistic distribution using fitdistr() in MASS or fitdist() in fitdistrplus package. It's easy to get the parameter estimates. Now, I hope to report the confidence interval for those parameter estimates. However, I don't find a function that could give the confidence interval in R. I hope to write a function, however, I don't find some detailed information on the CI based on MLE. Would you please to give me some hints on the CI calculation based on MLE? Well, for the normal distribution I believe that the standard-error- based confidence intervals are the same as those based on the MLE, but in general I would suggest something along these lines: library(bbmle) z - rnorm(20) m - mle2(z~dnorm(mean=mu,sd=sd),start=list(mu=0,sd=1),data=data.frame(z)) Warning message: In dnorm(x, mean, sd, log) : NaNs produced confint(m) Profiling... 2.5 % 97.5 % mu -0.07880835 0.985382 sd 0.87314467 1.633600 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] manipulate dataframe
another option is to use reshape() x-data.frame(x1=rep(1:3,each=3),x2=letters[1:9]) x$id-rep(1:3,3) dur-reshape(x,timevar=x1,idvar=id,direction=wide) dur-dur[,-1] colnames(dur) - paste(d, unique(x$x1), sep=) dur cheers, Victor _ Em 06/02/2011 12:50, Patrick Hausmann patrick.hausm...@uni-bremen.de escreveu: Hi André, try this: df1 - data.frame(x1 = rep(1:3, each=3), x2=letters[1:9]) dfs - split(df1, df1$x1) df2 - data.frame(sapply(dfs, FUN=[[, x2)) colnames(df2) - paste(d, unique(df1$x1), sep=) df2 HTH Patrick Am 06.02.2011 12:13, schrieb André de Boer: Hello, Can someone give me hint to change a data.frame. I want to split a column in more columns depending on the value of a other column. Thanks for the reaction, Andre Example: dat x1 x2 1 1 a 2 1 b 3 1 c 4 2 d 5 2 e 6 2 f 7 3 g 8 3 h 9 3 i in dur d1 d2 d3 1 a d g 2 b e h 3 c f i [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] anova() interpretation and error message
See comments inline. On 2011-02-06 03:17, Jinsong Zhao wrote: Hi there, I have a data frame as listed below: Ca.P.Biomass.A P Biomass 1 334.5567 0.287 2 737.5400 0.571 3 894.5300 0.639 4 782.3800 0.5836667 5 857.5900 0.600 6 829.2700 0.588 I have fit the data using logistic, Michaelis–Menten, and linear model, they all give significance. fm1- nls(Biomass~SSlogis(P, phi1, phi2, phi3), data=Ca.P.Biomass.A) fm2- nls(Biomass~SSmicmen(P, phi1, phi2), data=Ca.P.Biomass.A) fm3- lm(Biomass~P, data = Ca.P.Biomass.A) I hope to compare the difference among the three models, and I using anova(). As for the example here, the three models seem not have significant difference. However, I am confused by the negative df in the following ANOVA table. And my question is how to interpret the results, if the Pr 0.05. anova(fm1,fm2,fm3) Analysis of Variance Table Model 1: Biomass ~ SSlogis(P, phi1, phi2, phi3) Model 2: Biomass ~ SSmicmen(P, phi1, phi2) Model 3: Biomass ~ P Res.Df Res.Sum Sq Df Sum Sq F value Pr(F) 1 3 0.00063741 2 4 0.00087249 -1 -0.00023508 1.1064 0.3701 3 4 0.00142751 0 0. Read the Details section in help(anova.lm) to see why you get negative DF values. The reason is simply that 3 - 4 = -1. Compare anova(fm1, fm2) with anova(fm2, fm1) That's why I prefer to list my models in order of increasing complexity. Note also that for interpretation models should be nested. Yours aren't. And when the argument position changed, the anova() give different results. It seems the anova() compare the first model with all other models. anova(fm2,fm1,fm3) Analysis of Variance Table Model 1: Biomass ~ SSmicmen(P, phi1, phi2) Model 2: Biomass ~ SSlogis(P, phi1, phi2, phi3) Model 3: Biomass ~ P Res.Df Res.Sum Sq Df Sum Sq F value Pr(F) 1 4 0.00087249 2 3 0.00063741 1 0.00023508 1.1064 0.3701 3 4 0.00142751 -1 -0.00079010 3.7187 0.1494 When I put the fm3, a linear model, in the first position, and two nls model following it, anova() give the following error message. It seems abnormal. Not so strange. If you look at methods(anova) you will see that there are different functions for different classes of model. So, if your first model is of class nls, then anova.nls will be used; for a first model of class lm anova.lm is used. The main thing is not to mix models of different class. If you want a linear model to compare with the nonlinear ones, use nls() to estimate the linear model. But you still have the problem of non-nestedness. anova(fm3,fm1,fm2) Analysis of Variance Table Response: Biomass Df Sum Sq Mean Sq F valuePr(F) P 1 0.081163 0.081163 227.43 0.0001127 *** Residuals 4 0.001428 0.000357 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Warning message: In anova.lmlist(object, ...) : models with response c(NULL, NULL) removed because response differs from model 1 Any suggestions and comments will be really appreciated. Thanks in advance. After the above comments, I have one more: You have 6 observations and you want to fit relatively complex models whose estimated coefficients will be *extremely* sensitive to *one* of your observations (I assume that you have looked at a plot of the data). The only way this could make any sense is if there is well established theory that specifies one particular model and you want to check that your data are at least not obviously inconsistent with that theory. Fishing through several models with six data points makes no sense. I think that the best you can conclude from your data is that Biomass probably increases with P. Peter Ehlers Regards, Jinsong __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] random interaction effect in lmer
Hi dears while modeling an interaction random effect in lmer i receive the instantaneous error message ldlM4-lmer(ldl~rt*cd4+age+rf+pharmac+factor(hcv)+ + hivdur+(rt:cd4|id),na.action=na.omit,REML=F) *Warning message: In mer_finalize(ans) : false convergence (8) * I think the matter lies in syntax, 'cause i sistematically receive the same message even when changing response... PS: the above model runs quite well in lme and only rarely I recive : * Errore in lme.formula(fixed = ldl ~ rt + cd4 + age + rf + pharmac + factor(hcv) + : nlminb problem, convergence error code = 1 message = iteration limit reached without convergence (9) * ..reason for which I switch into lmer Thank u in advance foe any hints...;) -- *Little u can do against ignorance,it will always disarm u: is the 2nd principle of thermodinamics made manifest, ...entropy in expansion.**But setting order is the real quest 4 truth, ..and the mission of a (temporally) wise dude. * [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Multiplying elements of vectors
Super Phil!! Thank you very much!! Mariana On Sat, Feb 5, 2011 at 8:54 PM, Phil Spector spec...@stat.berkeley.edu wrote: If the seq(5,205) was a typo, and should have been seq(5,20,5), then what you're looking for is the outer product of x and y: x = seq(5,20,5) y = seq(5,20,5) x %o% y [,1] [,2] [,3] [,4] [1,] 25 50 75 100 [2,] 50 100 150 200 [3,] 75 150 225 300 [4,] 100 200 300 400 outer(x,y) [,1] [,2] [,3] [,4] [1,] 25 50 75 100 [2,] 50 100 150 200 [3,] 75 150 225 300 [4,] 100 200 300 400 The outer() function will accepts a FUN= argument which defaults to '*'. - Phil Spector Statistical Computing Facility Department of Statistics UC Berkeley spec...@stat.berkeley.edu On Sat, 5 Feb 2011, Mariana Martinez-Morales wrote: Hi guys: Sorry if this question is very basic. I’m learning basic matrix and vectors multiplication to develop a population matrix model for plants. I’m trying to multiply the elements of two vectors (each of the “x” values by each of the “y” values) to obtain a square matrix of xy values. f.e. x-seq(5,205) y-seq(5,20,5) stages-c(“Sdl”, “Juv”, “Ad1”, “Ad2”) If I just multiply xy as a matrix xy-matrix(x,y,nrow=4,ncol=4,dimnames=list(stages,stages)) I obtain this xy Sdl Juv A1 A2 Sdl 5 10 15 20 Juv 5 10 15 20 A1 5 10 15 20 A2 5 10 15 20 but what I want to obtain is this matrix Sdl Juv A1 A2 Sdl 25 50 75 100 Juv 50 100 150 200 A1 75 50 225 300 A2 100 200 300 400 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Help with integrating R and c/c++
Hi, I have been using R for close to two years now and have grown quite comfortable with the language. I am presently trying to implement an optimization routine in R (Newton Rhapson). I have some R functions that calculate the gradient and hessian (pre requisite matrices) fairly efficiently. Now, I have to call this function iteratively until some convergance criterion is reached. I think the standard method of doing this in most programming languages is a while loop. However, I know R can get pretty slow when you use loops. In order to make this efficient, I want to transfer this part of my code to a more efficient programming language like c++ or c. However, I have been trying to learn this all day without any luck. I found a package called Rcpp that makes this easier. However, it seems some functional knowledge of writing R packages is a pre requisite. I tried to follow the standard manual for doing this, but could not find a simple example to get me started. I know I am supposed to make a cpp file and put it some where before it can be called from R, but I'm confused as to how this can be done. My requirement is to start with a parameter vector, update it according to the gradient and hessian, check if the parameter satisfies some convergance criterion and continue doing this until it does. Is there a way to efficiently do this through an R function (replicate?). The problem is that the number of iterations is not fixed. If there is no function in R, is there a way I can quickly use Rcpp or some thing to have this last part of my code in a C or C++ program which repeatedly calls my R functions for updating the parameters? -- Thanks in advance, Rohit Mob: 91 9819926213 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] random interaction effect in lmer
Hello Federico, You should try sending this to the mixed models mailing list (link below). Also, it would probably help to know what the data looks like. With the information you provide, it's hard to say what the problem could be. r-sig-mixed-mod...@r-project.org Best, ~Jason On 2011.02.06, Federico Bonofiglio wrote: Hi dears while modeling an interaction random effect in lmer i receive the instantaneous error message ldlM4-lmer(ldl~rt*cd4+age+rf+pharmac+factor(hcv)+ + hivdur+(rt:cd4|id),na.action=na.omit,REML=F) *Warning message: In mer_finalize(ans) : false convergence (8) * I think the matter lies in syntax, 'cause i sistematically receive the same message even when changing response... PS: the above model runs quite well in lme and only rarely I recive : * Errore in lme.formula(fixed = ldl ~ rt + cd4 + age + rf + pharmac + factor(hcv) + : nlminb problem, convergence error code = 1 message = iteration limit reached without convergence (9) * ..reason for which I switch into lmer Thank u in advance foe any hints...;) -- *Little u can do against ignorance,it will always disarm u: is the 2nd principle of thermodinamics made manifest, ...entropy in expansion.**But setting order is the real quest 4 truth, ..and the mission of a (temporally) wise dude. * [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Jason W. Morgan Ph.D. Candidate Department of Political Science *The Ohio State University* 154 North Oval Mall Columbus, Ohio 43210 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Help with integrating R and c/c++
On Feb 6, 2011, at 10:28 AM, Rohit Pandey wrote: Hi, I have been using R for close to two years now and have grown quite comfortable with the language. I am presently trying to implement an optimization routine in R (Newton Rhapson). I have some R functions that calculate the gradient and hessian (pre requisite matrices) fairly efficiently. Now, I have to call this function iteratively until some convergance criterion is reached. I think the standard method of doing this in most programming languages is a while loop. However, I know R can get pretty slow when you use loops. In order to make this efficient, I want to transfer this part of my code to a more efficient programming language like c++ or c. However, I have been trying to learn this all day without any luck. I found a package called Rcpp that makes this easier. However, it seems some functional knowledge of writing R packages is a pre requisite. I tried to follow the standard manual for doing this, but could not find a simple example to get me started. I know I am supposed to make a cpp file and put it some where before it can be called from R, but I'm confused as to how this can be done. Dirk Eddelbuettel has 8 vignettes linked from his Rcpp page: http://dirk.eddelbuettel.com/code/rcpp.html And reading further you will see about 20 packages written with Rcpp. And there is a mailing list link which I had originally sought on the main Mailing List page (unsuccessfully): http://lists.r-forge.r-project.org/pipermail/rcpp-devel/ -- David. My requirement is to start with a parameter vector, update it according to the gradient and hessian, check if the parameter satisfies some convergance criterion and continue doing this until it does. Is there a way to efficiently do this through an R function (replicate?). The problem is that the number of iterations is not fixed. If there is no function in R, is there a way I can quickly use Rcpp or some thing to have this last part of my code in a C or C++ program which repeatedly calls my R functions for updating the parameters? -- Thanks in advance, Rohit Mob: 91 9819926213 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Help with integrating R and c/c++
On Sun, Feb 6, 2011 at 10:56 AM, David Winsemius dwinsem...@comcast.net wrote: On Feb 6, 2011, at 10:28 AM, Rohit Pandey wrote: Hi, I have been using R for close to two years now and have grown quite comfortable with the language. I am presently trying to implement an optimization routine in R (Newton Rhapson). I have some R functions that calculate the gradient and hessian (pre requisite matrices) fairly efficiently. Now, I have to call this function iteratively until some convergance criterion is reached. I think the standard method of doing this in most programming languages is a while loop. However, I know R can get pretty slow when you use loops. In order to make this efficient, I want to transfer this part of my code to a more efficient programming language like c++ or c. However, I have been trying to learn this all day without any luck. I found a package called Rcpp that makes this easier. However, it seems some functional knowledge of writing R packages is a pre requisite. I tried to follow the standard manual for doing this, but could not find a simple example to get me started. I know I am supposed to make a cpp file and put it some where before it can be called from R, but I'm confused as to how this can be done. Dirk Eddelbuettel has 8 vignettes linked from his Rcpp page: http://dirk.eddelbuettel.com/code/rcpp.html And reading further you will see about 20 packages written with Rcpp. Including one called RcppDE which does exactly what you want to do, implementing an optimizer in C++ to be called from R. Another package to look at is RcppArmadillo which provides a linkage between Rcpp and the Armadillo C++ linear algebra library (although it still doesn't do as much as I would like it to for writing optimizers). And there is a mailing list link which I had originally sought on the main Mailing List page (unsuccessfully): http://lists.r-forge.r-project.org/pipermail/rcpp-devel/ -- David. My requirement is to start with a parameter vector, update it according to the gradient and hessian, check if the parameter satisfies some convergance criterion and continue doing this until it does. Is there a way to efficiently do this through an R function (replicate?). The problem is that the number of iterations is not fixed. If there is no function in R, is there a way I can quickly use Rcpp or some thing to have this last part of my code in a C or C++ program which repeatedly calls my R functions for updating the parameters? -- Thanks in advance, Rohit Mob: 91 9819926213 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Help with integrating R and c/c++
On 02/06/2011 07:28 AM, Rohit Pandey wrote: Hi, I have been using R for close to two years now and have grown quite comfortable with the language. I am presently trying to implement an optimization routine in R (Newton Rhapson). I have some R functions that calculate the gradient and hessian (pre requisite matrices) fairly efficiently. Now, I have to call this function iteratively until some convergance criterion is reached. I think the standard method of doing this in most programming languages is a while loop. However, I know R can get pretty slow when you use loops. In order to make this efficient, I want to Have you looked at ?optim or ?optimize or perhaps ?nlm ? The bulk of the computation is in repeatedly evaluating the function and its gradient, rather than the machinery to iterate to convergence. Maybe these parts of your code can be written more efficiently? Martin transfer this part of my code to a more efficient programming language like c++ or c. However, I have been trying to learn this all day without any luck. I found a package called Rcpp that makes this easier. However, it seems some functional knowledge of writing R packages is a pre requisite. I tried to follow the standard manual for doing this, but could not find a simple example to get me started. I know I am supposed to make a cpp file and put it some where before it can be called from R, but I'm confused as to how this can be done. My requirement is to start with a parameter vector, update it according to the gradient and hessian, check if the parameter satisfies some convergance criterion and continue doing this until it does. Is there a way to efficiently do this through an R function (replicate?). The problem is that the number of iterations is not fixed. If there is no function in R, is there a way I can quickly use Rcpp or some thing to have this last part of my code in a C or C++ program which repeatedly calls my R functions for updating the parameters? -- Computational Biology Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: M1-B861 Telephone: 206 667-2793 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Help with integrating R and c/c++
On 6 February 2011 at 20:58, Rohit Pandey wrote: | Hi, | | I have been using R for close to two years now and have grown quite | comfortable with the language. I am presently trying to implement an | optimization routine in R (Newton Rhapson). I have some R functions that | calculate the gradient and hessian (pre requisite matrices) fairly | efficiently. Now, I have to call this function iteratively until some | convergance criterion is reached. I think the standard method of doing this | in most programming languages is a while loop. However, I know R can get | pretty slow when you use loops. In order to make this efficient, I want to | transfer this part of my code to a more efficient programming language like | c++ or c. However, I have been trying to learn this all day without any | luck. I found a package called Rcpp that makes this easier. However, it | seems some functional knowledge of writing R packages is a pre requisite. I What gave you that impression? Here is a counter-example, using the packages inline (for cxxfunction) and Rcpp: R library(inline) R src - 'std::cout Hello C++_From_R World std::endl; return(Rcpp::wrap(42));' R rohit - cxxfunction(signature(), src, plugin=Rcpp) R rohit() Hello C++_From_R World [1] 42 R This compiled, linked and loaded a C++ routine built from the two-statement program submitted as character variable. The Rcpp documentation, including its eight vignettes, is full of other examples. Start with Rcpp-introduction and maybe the Rcpp-FAQ. | tried to follow the standard manual for doing this, but could not find a | simple example to get me started. I know I am supposed to make a cpp file | and put it some where before it can be called from R, but I'm confused as to | how this can be done. | | My requirement is to start with a parameter vector, update it according to | the gradient and hessian, check if the parameter satisfies some convergance | criterion and continue doing this until it does. Is there a way to | efficiently do this through an R function (replicate?). The problem is that | the number of iterations is not fixed. If there is no function in R, is | there a way I can quickly use Rcpp or some thing to have this last part of | my code in a C or C++ program which repeatedly calls my R functions for | updating the parameters? Give the example above a first try, and then read some more. The archives of the rcpp-devel (CC'ed; post there for follow-ups after subscribing) list are full of examples, and the CRAN page for Rcpp lists almost two dozen other packages using Rcpp giving you plenty of examples should you want to write a package using Rcpp. Several of these packages do optimization giving you examples of you to pass parameters etc pp. Hope this helps, Dirk | | -- | Thanks in advance, | Rohit | Mob: 91 9819926213 | | [[alternative HTML version deleted]] | | __ | R-help@r-project.org mailing list | https://stat.ethz.ch/mailman/listinfo/r-help | PLEASE do read the posting guide http://www.R-project.org/posting-guide.html | and provide commented, minimal, self-contained, reproducible code. -- Dirk Eddelbuettel | e...@debian.org | http://dirk.eddelbuettel.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] How to use a value of an aboject in a line command?
This is my first time here at R Forum! I am a new user of R. I am very happy with this fabulous software! I know how to use greps, fors, seeds, %in%, paste, etc. But I need to know how to use a value of an aboject in a line command? Example: i [1] TP53 TP53 V1 V2 V3 1 1 TP53 1.1 2 2 TP53 1.2 3 3 TP53 1.3 I would like to do a t.test of TP53 x control. But this is a gene list. My line command would be result = t.test( i[,3], control[,3]). But using this, the test will be in the object i and I would like to do in the object which is value of i. I though to use `print(i)`, like Shell, to modify my test line command, but i does not works. Do I was clear? I need to do it because the test will be done in a list of genes (for i in genes). Thank you very much, regards, Leandro Colli [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Applying 'cbind/rbind' among different list object
Hi, I am wondering whether we can apply 'cbind/rbind' on many **equivalent** list objects. For example please consider following: list1 - list2 - vector(list, length=2); names(list1) - names(list2) - c(a, b) list1[[1]] - matrix(1:25, 5) list1[[2]] - matrix(2:26, 5) list2[[1]] - 10:14 list2[[2]] - 11:15 list1 $a [,1] [,2] [,3] [,4] [,5] [1,]16 11 16 21 [2,]27 12 17 22 [3,]38 13 18 23 [4,]49 14 19 24 [5,]5 10 15 20 25 $b [,1] [,2] [,3] [,4] [,5] [1,]27 12 17 22 [2,]38 13 18 23 [3,]49 14 19 24 [4,]5 10 15 20 25 [5,]6 11 16 21 26 list2 $a [1] 10 11 12 13 14 $b [1] 11 12 13 14 15 Here I want to rbind these 2 list-s according to a and b i.e. I want to get another list of length 2 where each element will be matrix with (6x5) dimension. How can I do that? Thanks for your time [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Fortran and long integers
Hi all, I'm hoping someone more knowledgeable in Fortran than I can chime in with opinion. I'm the maintainer of the flashClust package that implements fast hierarchical clustering. The fortran code fails when the number of clustered objects is larger than about 46300. My guess is that this is because the code uses the following construct: IOFFSET=J+(I-1)*N-(I*(I+1))/2 where N is the number of clustered objects and I, J can vary between 1 and N. The result is used as index to access an array (namely the distance structure). When N is more than 46341 (or 2^16/sqrt(2)), the expressions I*(I+1) and (I-1)*N can overflow and turn negative, messing up the indexing and crashing the code and the entire R session with a segmentation fault. My solution is to turn the integers into double precision's, calculate the index and convert back as follows: XI = DBLE(I) IOFFSET=J+NINT( (XI-1)*N - (XI*(XI+1))/2) I'm wondering if there's a better way, something along the lines of unsigned and/or long integers that are available in C? Thanks, Peter __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Applying 'cbind/rbind' among different list object
will this do it for you: lapply(seq(length(list1)), function(i)rbind(list1[[i]], list2[[i]])) [[1]] [,1] [,2] [,3] [,4] [,5] [1,]16 11 16 21 [2,]27 12 17 22 [3,]38 13 18 23 [4,]49 14 19 24 [5,]5 10 15 20 25 [6,] 10 11 12 13 14 [[2]] [,1] [,2] [,3] [,4] [,5] [1,]27 12 17 22 [2,]38 13 18 23 [3,]49 14 19 24 [4,]5 10 15 20 25 [5,]6 11 16 21 26 [6,] 11 12 13 14 15 On Sun, Feb 6, 2011 at 1:32 PM, B. Jonathan B. Jonathan bkheijonat...@gmail.com wrote: Hi, I am wondering whether we can apply 'cbind/rbind' on many **equivalent** list objects. For example please consider following: list1 - list2 - vector(list, length=2); names(list1) - names(list2) - c(a, b) list1[[1]] - matrix(1:25, 5) list1[[2]] - matrix(2:26, 5) list2[[1]] - 10:14 list2[[2]] - 11:15 list1 $a [,1] [,2] [,3] [,4] [,5] [1,] 1 6 11 16 21 [2,] 2 7 12 17 22 [3,] 3 8 13 18 23 [4,] 4 9 14 19 24 [5,] 5 10 15 20 25 $b [,1] [,2] [,3] [,4] [,5] [1,] 2 7 12 17 22 [2,] 3 8 13 18 23 [3,] 4 9 14 19 24 [4,] 5 10 15 20 25 [5,] 6 11 16 21 26 list2 $a [1] 10 11 12 13 14 $b [1] 11 12 13 14 15 Here I want to rbind these 2 list-s according to a and b i.e. I want to get another list of length 2 where each element will be matrix with (6x5) dimension. How can I do that? Thanks for your time [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] using character vector as input argument to setkey (data.table pakcage)
Dear R helpers, I wonder how to use a character vector as an input argument to setkey (data.table package). The following works: library(data.table) test.dt - data.table(expand.grid(a=1:30,b=LETTERS),c=seq(30*26)) setkey(test.dt,a,b) I like a similar function, but can accept c('a','b') as an input argument as below setkey.wanted(test.dt,c('a','b')) Your help will be highly appreciated. -Sean [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Help with integrating R and c/c++
Hi Rohit, It is fine if you are using the Newton's method just as an excuse to learn Rcpp. If, however, your main goal is to develop a package to implement Newton's method, then you need to stop and look at the numerous optimization packages available in R. A good place to start would be the optimx package, which unifies a large number of optimiaztion tools under one umbrella. Hope this helps, Ravi. Ravi Varadhan, Ph.D. Assistant Professor, Division of Geriatric Medicine and Gerontology School of Medicine Johns Hopkins University Ph. (410) 502-2619 email: rvarad...@jhmi.edu - Original Message - From: Rohit Pandey rohitpandey...@gmail.com Date: Sunday, February 6, 2011 11:33 am Subject: [R] Help with integrating R and c/c++ To: r-help@r-project.org Hi, I have been using R for close to two years now and have grown quite comfortable with the language. I am presently trying to implement an optimization routine in R (Newton Rhapson). I have some R functions that calculate the gradient and hessian (pre requisite matrices) fairly efficiently. Now, I have to call this function iteratively until some convergance criterion is reached. I think the standard method of doing this in most programming languages is a while loop. However, I know R can get pretty slow when you use loops. In order to make this efficient, I want to transfer this part of my code to a more efficient programming language like c++ or c. However, I have been trying to learn this all day without any luck. I found a package called Rcpp that makes this easier. However, it seems some functional knowledge of writing R packages is a pre requisite. I tried to follow the standard manual for doing this, but could not find a simple example to get me started. I know I am supposed to make a cpp file and put it some where before it can be called from R, but I'm confused as to how this can be done. My requirement is to start with a parameter vector, update it according to the gradient and hessian, check if the parameter satisfies some convergance criterion and continue doing this until it does. Is there a way to efficiently do this through an R function (replicate?). The problem is that the number of iterations is not fixed. If there is no function in R, is there a way I can quickly use Rcpp or some thing to have this last part of my code in a C or C++ program which repeatedly calls my R functions for updating the parameters? -- Thanks in advance, Rohit Mob: 91 9819926213 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list PLEASE do read the posting guide and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to use a value of an aboject in a line command?
On Sun, Feb 06, 2011 at 04:31:43PM -0200, Leandro Colli wrote: This is my first time here at R Forum! I am a new user of R. I am very happy with this fabulous software! I know how to use greps, fors, seeds, %in%, paste, etc. But I need to know how to use a value of an aboject in a line command? Example: i [1] TP53 TP53 V1 V2 V3 1 1 TP53 1.1 2 2 TP53 1.2 3 3 TP53 1.3 I would like to do a t.test of TP53 x control. But this is a gene list. My line command would be result = t.test( i[,3], control[,3]). But using this, the test will be in the object i and I would like to do in the object which is value of i. I though to use `print(i)`, like Shell, to modify my test line command, but i does not works. Do I was clear? I need to do it because the test will be done in a list of genes (for i in genes). If i understand correctly, then what you ask for is result = t.test( get(i)[,3], control[,3]) However, get() is only rarely needed. If you have a list of tables, you can have them all in a single list variable and use a loop over the list. I mean something like the following tab - list() tab[[1]] - data.frame(V1=1:3, V2=AA, V3=seq(1.1, 1.3, by=0.1)) tab[[2]] - data.frame(V1=1:3, V2=BB, V3=seq(2.1, 2.3, by=0.1)) tab[[3]] - data.frame(V1=1:3, V2=CC, V3=seq(3.1, 3.3, by=0.1)) for (j in 1:3) { # here any command using tab[[j]] may be used # using print() for simplicity print(tab[[j]]) } See chapter 6 Lists and data frames of R-intro.pdf available at http://cran.r-project.org/manuals.html Petr Savicky. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] aggregate function - na.action
On Fri, Feb 4, 2011 at 6:54 PM, Ista Zahn iz...@psych.rochester.edu wrote: However, I don't think you've told us what you're actually trying to accomplish... I'm trying to aggregate the y value of a big data set which has several x's and a y. I'm using an abstracted example for many reasons. Partially, I'm using an abstracted example to comply with the posting guidelines of having a reproducible example. I'm really aggregating some incredibly boring and complex customer data for an undisclosed client. As it turns out, Aggregate will not work when some of x's are NA, unless you convert them to factors, with NA's included. In my case, the data is so big that doing the conversions causes other memory problems, and renders some of my numeric values useless for other calculations. My real data looks more like this (except with a few more categories and records): set.seed(100) library(plyr) dat=data.frame( x1=sample(c(NA,'m','f'), 2e6, replace=TRUE), x2=sample(c(NA, 1:10), 2e6, replace=TRUE), x3=sample(c(NA,letters[1:5]), 2e6, replace=TRUE), x4=sample(c(NA,T,F), 2e6, replace=TRUE), x5=sample(c(NA,'active','inactive','deleted','resumed'), 2e6, replace=TRUE), x6=sample(c(NA, 1:10), 2e6, replace=TRUE), x7=sample(c(NA,'married','divorced','separated','single','etc'), 2e6, replace=TRUE), x8=sample(c(NA,T,F), 2e6, replace=TRUE), y=trunc(rnorm(2e6)*1), stringsAsFactors=F) str(dat) ## The control total sum(dat$y, na.rm=T) ## The aggregate total sum(aggregate(dat$y, dat[,1:8], sum, na.rm=T)$x) ## The ddply total sum(ddply(dat, .(x1,x2,x3,x4,x5,x6,x7,x8), function(x) {data.frame(y.sum=sum(x$y,na.rm=TRUE))})$y.sum) ddply worked a little better than I expected at first, but it slows to a crawl or has runs out of memory too often for me to invest the time learning how to use it. Just now it worked for 1m records, and it was just a bit slower than aggregate. But for the 2m example it hasn't finished calculating. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] very basic HLM question
Thank you for your reply and sorry for my ambiguity. I computed: summary(anova1 - aov(math ~ as.factor(schoolid), data=nels88)) ICC1(anova1) ICC1 comes from the multilevel package. I have found an article where it is pointed out that with this formula: 34.011/(34.011+72.256) [1] 0.3200523 I should get practically an identical result than using ICC1. I have used the database of that article and I have got an identical result. But with the dataset I am using (schools) it doesn't work. This is the example of the article mentioned above: library(multilevel) base(bh1996) summary(lmer(WBEING ~ 1 + (1|GRP), data=bh1996)) 0.035801/(0.035801+0.789497) summary(test - aov(WBEING~as.factor(GRP),data=bh1996)) ICC1(test) I have attached the small database where this equality doesn't exist. Thank you in advance. On 2/5/2011 11:07 PM, Paul Johnson wrote: 2011/2/5 Sebastián Dazasebastian.d...@gmail.com: Hi everyone, I need to get a between-component variance (e.g. random effects Anova), but using lmer I don't get the same results (variance component) than using random effects Anova. I am using a database of students, clustered on schools (there is not the same number of students by school). According to the ICC1 command, the interclass correlation is .44 ICC1(anova1) [1] 0.4414491 If you don't tell us exactly what model you are calculating in anova1, how would we guess if there is something wrong? Similarly, I get this ICC1 Error: object 'ICC1' not found so it must mean you've loaded a package or written a function, which you've not shown us. I googled my way to a package called multilevel that has ICC1, and its code for ICC1 shows a formula that does not match the one you used to calculate ICC from lmer. function (object) { MOD- summary(object) MSB- MOD[[1]][1, 3] MSW- MOD[[1]][2, 3] GSIZE- (MOD[[1]][2, 1] + (MOD[[1]][1, 1] + 1))/(MOD[[1]][1, 1] + 1) OUT- (MSB - MSW)/(MSB + ((GSIZE - 1) * MSW)) return(OUT) } I'm not saying that's right or wrong, just not obviously identical to the formula you proposed. However, I cannot get the same ICC from the lmer output: anova2- lmer(math ~ 1 + (1|schoolid), data=nels88) summary(anova2- lmer(math ~ 1 + (1|schoolid), data=nels88)) Instead, do this (same thing, fits model only once): anova2- lmer(math ~ 1 + (1|schoolid), data=nels88) summary(anova2) Note that lmer is going to estimate a normally distributed random effect for each school, as well as an individual observation random effect (usual error term) that is assumed independent of the school-level effect. What is anova1 estimating? -- Sebastián Daza sebastian.d...@gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Subsampling out of site*abundance matrix
Hello, How can I randomly sample individuals within a sites from a site (row) X species abundance (column) data frame or matrix? As an example, the matrix abund2 made below. # (sorry, Im a newbie and this is the only way I know to get an example on here) abund1 -c(150, 300, 0, 360, 150, 300, 0, 240, 150, 0, 60, 0, 150, 0, 540, 0, 0, 300, 0, 240, 300, 300, 0, 360, 300, 0, 600, 0) abund2 - matrix(data=abund1, nrow=4, ncol=7) colnames(abund2) - c(spA, spB, spC, spD, spa, spF, spG) rownames(abund2)-c(site1, site2, site3, site4) # abund2 spA spB spC spD spa spF spG site1 150 150 150 150 0 300 300 site2 300 300 0 0 300 300 0 site3 0 0 60 540 0 0 600 site4 360 240 0 0 240 360 0 How can I make a random subsample of 100 individuals from the abundances given for each site? This is probably really easy. Thanks. Bubba -- View this message in context: http://r.789695.n4.nabble.com/Subsampling-out-of-site-abundance-matrix-tp3263148p3263148.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] aggregate function - na.action
Try to use formula notation and use na.action=na.pass It is all described in the help(aggregate) У Няд, 06/02/2011 у 14:54 -0600, Gene Leynes піша: On Fri, Feb 4, 2011 at 6:54 PM, Ista Zahn iz...@psych.rochester.edu wrote: However, I don't think you've told us what you're actually trying to accomplish... I'm trying to aggregate the y value of a big data set which has several x's and a y. I'm using an abstracted example for many reasons. Partially, I'm using an abstracted example to comply with the posting guidelines of having a reproducible example. I'm really aggregating some incredibly boring and complex customer data for an undisclosed client. As it turns out, Aggregate will not work when some of x's are NA, unless you convert them to factors, with NA's included. In my case, the data is so big that doing the conversions causes other memory problems, and renders some of my numeric values useless for other calculations. My real data looks more like this (except with a few more categories and records): set.seed(100) library(plyr) dat=data.frame( x1=sample(c(NA,'m','f'), 2e6, replace=TRUE), x2=sample(c(NA, 1:10), 2e6, replace=TRUE), x3=sample(c(NA,letters[1:5]), 2e6, replace=TRUE), x4=sample(c(NA,T,F), 2e6, replace=TRUE), x5=sample(c(NA,'active','inactive','deleted','resumed'), 2e6, replace=TRUE), x6=sample(c(NA, 1:10), 2e6, replace=TRUE), x7=sample(c(NA,'married','divorced','separated','single','etc'), 2e6, replace=TRUE), x8=sample(c(NA,T,F), 2e6, replace=TRUE), y=trunc(rnorm(2e6)*1), stringsAsFactors=F) str(dat) ## The control total sum(dat$y, na.rm=T) ## The aggregate total sum(aggregate(dat$y, dat[,1:8], sum, na.rm=T)$x) ## The ddply total sum(ddply(dat, .(x1,x2,x3,x4,x5,x6,x7,x8), function(x) {data.frame(y.sum=sum(x$y,na.rm=TRUE))})$y.sum) ddply worked a little better than I expected at first, but it slows to a crawl or has runs out of memory too often for me to invest the time learning how to use it. Just now it worked for 1m records, and it was just a bit slower than aggregate. But for the 2m example it hasn't finished calculating. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Fund style categorizing
I've got a fund performance report for the last year,in which one column states their management style,i.e. long/short,market neutral. I want to categorize all the fund into subgroups by their management style.How can I do that? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] aggregate function - na.action
By the way, thanks for sending that formula, it's quite thoughtful of you to send an answer with an actual working line of code! When I experimented with ddply earlier last week I couldn't figure out the syntax for a single line aggregation, so it's good to have this example. I will likely use it for other things. On Fri, Feb 4, 2011 at 6:54 PM, Ista Zahn iz...@psych.rochester.edu wrote: oops. For clarity, that should have been sum(ddply(dat, .(x1,x2,x3,x4), function(x){data.frame(y.sum=sum(x$y, na.rm=TRUE))})$y.sum) -Ista On Fri, Feb 4, 2011 at 7:52 PM, Ista Zahn iz...@psych.rochester.edu wrote: Hi again, On Fri, Feb 4, 2011 at 7:18 PM, Gene Leynes gleyne...@gmail.com wrote: Ista, Thank you again. I had figured that out... and was crafting another message when you replied. The NAs do come though on the variable that is being aggregated, However, they do not come through on the categorical variable(s). The aggregate function must be converting the data frame variables to factors, with the default omit=NA parameter. The help on aggregate says: na.action A function which indicates what should happen when the data contain NA values. The default is to ignore missing values in the given variables. By data it must only refer to the aggregated variable, and not the categorical variables. I thought it referred to both, because I thought it referred to the data argument, which is the underlying data frame. I think the proper way to accomplish this would be to recast my x (categorical) variables as factors. Yes, that would work. This is not feasible for me due to other complications. Also, (imho) the help should be more clear about what the na.action modifies. So, unless someone has a better idea, I guess I'm out of luck? Well, you can use ddply from the plyr package: library(plyr) # may need to install first. sum(ddply(dat, .(x1,x2,x3,x4), function(x){data.frame(y.sum=sum(x$y, na.rm=TRUE))})$y) However, I don't think you've told us what you're actually trying to accomplish... Best, Ista On Fri, Feb 4, 2011 at 6:05 PM, Ista Zahn iz...@psych.rochester.edu wrote: Hi, On Fri, Feb 4, 2011 at 6:33 PM, Gene Leynes gleyne...@gmail.com wrote: Thank you both for the thoughtful (and funny) replies. I agree with both of you that sum is the one picking up aggregate. Although I didn't mention it, I did realize that in the first place. Also, thank you Phil for pointing out that aggregate only accepts a formula value in more recent versions! I actually thought that was an older feature, but I must be thinking of other functions. I still don't see why these two values are not the same! It seems like a bug to me No, not a bug (see below). set.seed(100) dat=data.frame( + x1=sample(c(NA,'m','f'), 100, replace=TRUE), + x2=sample(c(NA, 1:10), 100, replace=TRUE), + x3=sample(c(NA,letters[1:5]), 100, replace=TRUE), + x4=sample(c(NA,T,F), 100, replace=TRUE), + y=sample(c(rep(NA,5), rnorm(95 sum(dat$y, na.rm=T) [1] 0.0815244116598 sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.pass, na.rm=T)$y) [1] -4.45087666247 Because in the first one you are only removing missing data in dat$y. In the second one you are removeing all rows that contain missing data in any of the columns. all.equal(sum(na.omit(dat)$y), sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.pass, na.rm=T)$y)) [1] TRUE Best, Ista On Fri, Feb 4, 2011 at 4:18 PM, Ista Zahn iz...@psych.rochester.edu wrote: Sorry, I didn't see Phil's reply, which is better than mine anyway. -Ista On Fri, Feb 4, 2011 at 5:16 PM, Ista Zahn iz...@psych.rochester.edu wrote: Hi, Please see ?na.action (just kidding!) So it seems to me the problem is that you are passing na.rm to the sum function. So there is no missing data for the na.action argument to operate on! Compare sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.fail)$y) sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.pass)$y) sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.omit)$y) Best, Ista On Fri, Feb 4, 2011 at 4:07 PM, Gene Leynes gleyne...@gmail.com wrote: Can someone please tell me what is up with na.action in aggregate? My (somewhat) reproducible example: (I say somewhat because some lines wouldn't run in a separate session, more below) set.seed(100) dat=data.frame( x1=sample(c(NA,'m','f'), 100, replace=TRUE), x2=sample(c(NA, 1:10), 100, replace=TRUE), x3=sample(c(NA,letters[1:5]), 100, replace=TRUE), x4=sample(c(NA,T,F), 100, replace=TRUE), y=sample(c(rep(NA,5),
[R] function optimization
Dear all, this is my first time, and just begin to use R. But I've a question about optimization using optimx library. It could sound stupid by I'm a bit affraid with the problem, because anything I try, anything Error. The procedure was: optimx(par=10,71,1,fn=(Prey*Provisioning)/Risk, control=list(maximize)) well the problem lies in the initial value parameters, again and again, R tells me that it is not possible to optimize the function at this values, even though these are real estimations. Any suggestion? Pablo -- View this message in context: http://r.789695.n4.nabble.com/function-optimization-tp3263244p3263244.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] function optimization
garciap garciap at usal.es writes: this is my first time, and just begin to use R. But I've a question about optimization using optimx library. It could sound stupid by I'm a bit affraid with the problem, because anything I try, anything Error. The procedure was: optimx(par=10,71,1,fn=(Prey*Provisioning)/Risk, control=list(maximize)) well the problem lies in the initial value parameters, again and again, R tells me that it is not possible to optimize the function at this values, even though these are real estimations. I'm afraid there are quite a few things wrong with your approach. * parameters need to be a **numeric vector** (e.g. c(10,71,1)) rather than characters * you haven't defined a function that takes the vector of parameters as an argument and returns a goodness-of-fit measure or other objective function value. This is the biggest problem (the rest are just basic misunderstandings of R syntax, easily corrected). * if you want the function to maximize you need control=list(maximize=TRUE). * I think you have not given us exactly the command you entered. If I enter what you have typed above, the unmatched quotation mark before 'maximize' means that R doesn't even accept my command: it gives me a 'continuation character' (+) telling me that it is waiting for further input. It's very important to specify *exactly* the commands you typed; cutting and pasting from the R session (or better, keeping a script file and copying from there) is a good way to ensure this. I would suggest that you work carefully through the examples in the optimx help page, making sure that you understand them (or if you get totally stuck there, you can post to the help list with a *specific* question about what you haven't understood). I might also recommend that you try starting with the optim() help page, which is a little bit simpler than the optimx() page. optimx has better optimization routines, but for getting a basic understanding of how R does optimization optim() should suffice. Don't forget to read the posting guide before you come back with more questions. good luck, Ben Bolker __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] aggregate function - na.action
Try 'data.table' package. It took 3 seconds to aggregate the 500K levels: Is this what you were after? # note the characters are converted to factors that 'data.table' likes dat=data.frame( +x1=sample(c(NA,'m','f'), 2e6, replace=TRUE), +x2=sample(c(NA, 1:10), 2e6, replace=TRUE), +x3=sample(c(NA,letters[1:5]), 2e6, replace=TRUE), +x4=sample(c(NA,T,F), 2e6, replace=TRUE), +x5=sample(c(NA,'active','inactive','deleted','resumed'), 2e6, + replace=TRUE), +x6=sample(c(NA, 1:10), 2e6, replace=TRUE), +x7=sample(c(NA,'married','divorced','separated','single','etc'), + 2e6, replace=TRUE), +x8=sample(c(NA,T,F), 2e6, replace=TRUE), +y=trunc(rnorm(2e6)*1)) str(dat) 'data.frame': 200 obs. of 9 variables: $ x1: Factor w/ 2 levels f,m: NA NA 2 NA NA NA NA 1 1 1 ... $ x2: int 4 5 3 10 10 7 1 1 3 5 ... $ x3: Factor w/ 5 levels a,b,c,d,..: 3 2 1 2 1 5 1 1 2 1 ... $ x4: logi NA TRUE TRUE NA FALSE NA ... $ x5: Factor w/ 4 levels active,deleted,..: 4 3 3 2 2 1 1 NA 3 3 ... $ x6: int NA 2 7 2 1 9 NA 1 1 9 ... $ x7: Factor w/ 5 levels divorced,etc,..: 1 3 5 NA 2 3 1 2 2 2 ... $ x8: logi NA NA NA FALSE FALSE FALSE ... $ y : num 3066 -13237 -7840 9728 1596 ... require(data.table) dat - data.table(dat) system.time(result - dat[, sum(y), by = list(x1,x2,x3,x4,x5,x6,x7,x8)]) user system elapsed 3.110.163.26 str(result) Classes ‘data.table’ and 'data.frame': 568594 obs. of 9 variables: $ x1: Factor w/ 2 levels f,m: NA NA NA NA NA NA NA NA NA NA ... $ x2: int NA NA NA NA NA NA NA NA NA NA ... $ x3: Factor w/ 5 levels a,b,c,d,..: NA NA NA NA NA NA NA NA NA NA ... $ x4: logi NA NA NA NA NA NA ... $ x5: Factor w/ 4 levels active,deleted,..: NA NA NA NA NA NA NA NA NA NA ... $ x6: int NA NA NA NA NA NA NA NA NA NA ... $ x7: Factor w/ 5 levels divorced,etc,..: NA NA NA 1 1 1 2 2 2 3 ... $ x8: logi NA FALSE TRUE NA FALSE TRUE ... $ V1: num 6641 -18158 3 -11202 -14437 ... On Sun, Feb 6, 2011 at 3:54 PM, Gene Leynes gleyne...@gmail.com wrote: On Fri, Feb 4, 2011 at 6:54 PM, Ista Zahn iz...@psych.rochester.edu wrote: However, I don't think you've told us what you're actually trying to accomplish... I'm trying to aggregate the y value of a big data set which has several x's and a y. I'm using an abstracted example for many reasons. Partially, I'm using an abstracted example to comply with the posting guidelines of having a reproducible example. I'm really aggregating some incredibly boring and complex customer data for an undisclosed client. As it turns out, Aggregate will not work when some of x's are NA, unless you convert them to factors, with NA's included. In my case, the data is so big that doing the conversions causes other memory problems, and renders some of my numeric values useless for other calculations. My real data looks more like this (except with a few more categories and records): set.seed(100) library(plyr) dat=data.frame( x1=sample(c(NA,'m','f'), 2e6, replace=TRUE), x2=sample(c(NA, 1:10), 2e6, replace=TRUE), x3=sample(c(NA,letters[1:5]), 2e6, replace=TRUE), x4=sample(c(NA,T,F), 2e6, replace=TRUE), x5=sample(c(NA,'active','inactive','deleted','resumed'), 2e6, replace=TRUE), x6=sample(c(NA, 1:10), 2e6, replace=TRUE), x7=sample(c(NA,'married','divorced','separated','single','etc'), 2e6, replace=TRUE), x8=sample(c(NA,T,F), 2e6, replace=TRUE), y=trunc(rnorm(2e6)*1), stringsAsFactors=F) str(dat) ## The control total sum(dat$y, na.rm=T) ## The aggregate total sum(aggregate(dat$y, dat[,1:8], sum, na.rm=T)$x) ## The ddply total sum(ddply(dat, .(x1,x2,x3,x4,x5,x6,x7,x8), function(x) {data.frame(y.sum=sum(x$y,na.rm=TRUE))})$y.sum) ddply worked a little better than I expected at first, but it slows to a crawl or has runs out of memory too often for me to invest the time learning how to use it. Just now it worked for 1m records, and it was just a bit slower than aggregate. But for the 2m example it hasn't finished calculating. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Subsampling out of site*abundance matrix
On Feb 6, 2011, at 3:25 PM, B77S wrote: Hello, How can I randomly sample individuals within a sites from a site (row) X species abundance (column) data frame or matrix? As an example, the matrix abund2 made below. # (sorry, Im a newbie and this is the only way I know to get an example on here) abund1 -c(150, 300, 0, 360, 150, 300, 0, 240, 150, 0, 60, 0, 150, 0, 540, 0, 0, 300, 0, 240, 300, 300, 0, 360, 300, 0, 600, 0) abund2 - matrix(data=abund1, nrow=4, ncol=7) colnames(abund2) - c(spA, spB, spC, spD, spa, spF, spG) rownames(abund2)-c(site1, site2, site3, site4) Perfect. Best submission of an example by a newbie in weeks. # abund2 spA spB spC spD spa spF spG site1 150 150 150 150 0 300 300 site2 300 300 0 0 300 300 0 site3 0 0 60 540 0 0 600 site4 360 240 0 0 240 360 0 How can I make a random subsample of 100 individuals from the abundances given for each site? samptbl - apply(abund2, 1, function(x) sample(colnames(abund2), 100, prob=x, replace=TRUE) ) samptbl site1 site2 site3 site4 [1,] spG spa spD spF [2,] spF spF spG spB [3,] spF spB spC spA [4,] spD spa spG spA [5,] spF spa spD spa [6,] spA spB spD spF [7,] spA spF spD spA [8,] spG spF spG spa [9,] spF spF spG spa [10,] spG spB spD spA Snipped apply() always transposes the results when called with row margins. The t() function would fix this if it needed to be arranged with rows by site. You could check by further apply-(cation) of table to the columns: apply(samptbl, 2, table) $site1 spA spB spC spD spF spG 8 13 6 13 32 28 $site2 spa spA spB spF 25 31 25 19 $site3 spC spD spG 9 51 40 $site4 spa spA spB spF 22 27 19 32 This is probably really easy. Thanks. Bubba -- View this message in context: http://r.789695.n4.nabble.com/Subsampling-out-of-site-abundance-matrix-tp3263148p3263148.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Fund style categorizing
?aggregate ?tapply data.table package example data and output would have been useful Sent from my iPad On Feb 6, 2011, at 15:02, Ning Cheng wakinchaue...@gmail.com wrote: I've got a fund performance report for the last year,in which one column states their management style,i.e. long/short,market neutral. I want to categorize all the fund into subgroups by their management style.How can I do that? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Applying 'cbind/rbind' among different list object
Another possibility is mapply(rbind,list1,list2,SIMPLIFY=FALSE) - Phil Spector Statistical Computing Facility Department of Statistics UC Berkeley spec...@stat.berkeley.edu On Sun, 6 Feb 2011, jim holtman wrote: will this do it for you: lapply(seq(length(list1)), function(i)rbind(list1[[i]], list2[[i]])) [[1]] [,1] [,2] [,3] [,4] [,5] [1,]16 11 16 21 [2,]27 12 17 22 [3,]38 13 18 23 [4,]49 14 19 24 [5,]5 10 15 20 25 [6,] 10 11 12 13 14 [[2]] [,1] [,2] [,3] [,4] [,5] [1,]27 12 17 22 [2,]38 13 18 23 [3,]49 14 19 24 [4,]5 10 15 20 25 [5,]6 11 16 21 26 [6,] 11 12 13 14 15 On Sun, Feb 6, 2011 at 1:32 PM, B. Jonathan B. Jonathan bkheijonat...@gmail.com wrote: Hi, I am wondering whether we can apply 'cbind/rbind' on many **equivalent** list objects. For example please consider following: list1 - list2 - vector(list, length=2); names(list1) - names(list2) - c(a, b) list1[[1]] - matrix(1:25, 5) list1[[2]] - matrix(2:26, 5) list2[[1]] - 10:14 list2[[2]] - 11:15 list1 $a [,1] [,2] [,3] [,4] [,5] [1,] 1 6 11 16 21 [2,] 2 7 12 17 22 [3,] 3 8 13 18 23 [4,] 4 9 14 19 24 [5,] 5 10 15 20 25 $b [,1] [,2] [,3] [,4] [,5] [1,] 2 7 12 17 22 [2,] 3 8 13 18 23 [3,] 4 9 14 19 24 [4,] 5 10 15 20 25 [5,] 6 11 16 21 26 list2 $a [1] 10 11 12 13 14 $b [1] 11 12 13 14 15 Here I want to rbind these 2 list-s according to a and b i.e. I want to get another list of length 2 where each element will be matrix with (6x5) dimension. How can I do that? Thanks for your time [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Legend outside the plot? xpd?
you can use layout to setup 4 plot areas; 3 for the graphs and then along one on the right for the legend Sent from my iPad On Feb 6, 2011, at 2:41, Matt Cooper mattcst...@gmail.com wrote: Hi All, BG: Will try be brief. I'd like 3 graphs on a page (below each other mfrow=c(3,1)), saved to pdf. The three plot data on the same subject so I'm having one legend, to the right of the center graph. I'm using mar=c(5,15,4,15) to bring the sides in so that the graphs are square and not stretched wide. To have the graph to the side I'm thinking xpd=T. Each graph has a number of points and lines overlaid, so plot ();lines(),lines(),lines() etc. The data is somewhat of a subset of a larger set, so I'm limiting what is being displayed. The lines plot trends of the wider data. P: With xpd=T, the lines are going right out of the graph boxes to the outer limit of the plot boundaries, as would be the intended behaviour of xpd. Goes without saying that this is undesirable. Q: Is there a better way to achieve what I want? At this stage I have xpd=T in the par(), and xpd=F in the plot() commands and all the line commands, just so I can get the legend to actually print... Aside: Given the way I've done this, the lines() are all clipped RIGHT at the limit of the plot box. So given the plot is called first these lines leave little coloured dashes on the black of the plot box. To sort this I've called box() at the end. This all seems very redundant? Thanks Matt -- mattcst...@gmail.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] aggregate function - na.action
Hi: There's definitely something amiss with aggregate() here since similar functions from other packages can reproduce your 'control' sum. I expect ddply() will have some timing issues because of all the subgrouping in your data frame, but data.table did very well and the summaryBy() function in the doBy package did OK: library(data.table) library(doBy) # Utility function to remove missing values when computing the sum # for use in summaryBy() f - function(x) sum(x, na.rm = TRUE) system.time({ + dt - data.table(dat) + setkey(dt, x1, x2, x3, x4, x5, x6, x7, x8) + udt - dt[, list(y = sum(y, na.rm = TRUE)), + by = 'x1, x2, x3, x4, x5, x6, x7, x8'] + }) user system elapsed 7.521.569.14 system.time( + udb - summaryBy(y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8, +data = dat, FUN = f) +) user system elapsed 83.820.83 85.21 # To verify against the control sum: sum(udt$y) [1] -5611158 sum(udb$y) [1] -5611158 Notice the difference in the number of rows of uda in comparison to the other two: uda - aggregate(y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8, data = dat, FUN = f) sum(uda$y) [1] 6396661 dim(uda) [1] 77321 9 dim(udt) [1] 568353 9 dim(udb) [1] 568353 9 I used four different approaches to aggregate(), excluding the one above: aggregate(y ~ ., data = dat, FUN = f) aggregate(y ~ ., data = dat, FUN = sum, na.rm = TRUE) aggregate(y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8, data = dat, FUN = sum, na.rm = TRUE) aggregate(y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8, data = dat, FUN = sum, na.action= na.pass) All yielded the same answer as in uda. data.table has some distinct advantages in situations such as this - see its vignette and FAQ for details. HTH, Dennis On Sun, Feb 6, 2011 at 12:54 PM, Gene Leynes gleyne...@gmail.com wrote: On Fri, Feb 4, 2011 at 6:54 PM, Ista Zahn iz...@psych.rochester.edu wrote: However, I don't think you've told us what you're actually trying to accomplish... I'm trying to aggregate the y value of a big data set which has several x's and a y. I'm using an abstracted example for many reasons. Partially, I'm using an abstracted example to comply with the posting guidelines of having a reproducible example. I'm really aggregating some incredibly boring and complex customer data for an undisclosed client. As it turns out, Aggregate will not work when some of x's are NA, unless you convert them to factors, with NA's included. In my case, the data is so big that doing the conversions causes other memory problems, and renders some of my numeric values useless for other calculations. My real data looks more like this (except with a few more categories and records): set.seed(100) library(plyr) dat=data.frame( x1=sample(c(NA,'m','f'), 2e6, replace=TRUE), x2=sample(c(NA, 1:10), 2e6, replace=TRUE), x3=sample(c(NA,letters[1:5]), 2e6, replace=TRUE), x4=sample(c(NA,T,F), 2e6, replace=TRUE), x5=sample(c(NA,'active','inactive','deleted','resumed'), 2e6, replace=TRUE), x6=sample(c(NA, 1:10), 2e6, replace=TRUE), x7=sample(c(NA,'married','divorced','separated','single','etc'), 2e6, replace=TRUE), x8=sample(c(NA,T,F), 2e6, replace=TRUE), y=trunc(rnorm(2e6)*1), stringsAsFactors=F) str(dat) ## The control total sum(dat$y, na.rm=T) ## The aggregate total sum(aggregate(dat$y, dat[,1:8], sum, na.rm=T)$x) ## The ddply total sum(ddply(dat, .(x1,x2,x3,x4,x5,x6,x7,x8), function(x) {data.frame(y.sum=sum(x$y,na.rm=TRUE))})$y.sum) ddply worked a little better than I expected at first, but it slows to a crawl or has runs out of memory too often for me to invest the time learning how to use it. Just now it worked for 1m records, and it was just a bit slower than aggregate. But for the 2m example it hasn't finished calculating. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] aggregate function - na.action
There's definitely something amiss with aggregate() here since similar functions from other packages can reproduce your 'control' sum. I expect ddply() will have some timing issues because of all the subgrouping in your data frame, but data.table did very well and the summaryBy() function in the doBy package did OK: Well, if you use the right plyr function, it works just fine: system.time(count(dat, c(x1, x2, x3, x4, x4, x5, x6, x7, x8), y)) # user system elapsed # 9.754 1.314 11.073 Which illustrates something that I've believed for a while about data.table - it's not the indexing that speed things up, it's the custom data structure. If you use ddply with data frames, it's slow because data frames are slow. I think the right way to resolve this is to to make data frames more efficient, perhaps using some kind of mutable interface where necessary for high-performance operations. Hadley -- Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University http://had.co.nz/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] questions about counting numbers
Hello R-helpers, I have a question about counting numbers. Here is a simple example. a=c(2, 3, 3,4) table(a) a 2 3 4 1 2 1 so, I can to create another variables that has the corresponding counting numbers. In this case, I want to have: b=c(1,2,2,1) Is there any way coding for this ? Thanks for helps! Carrie-- [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] questions about counting numbers
Hi Carrie, Try x - rle(a) rep(x$lengths, x$lengths) [1] 1 2 2 1 HTH, Jorge On Sun, Feb 6, 2011 at 8:21 PM, Carrie Li wrote: Hello R-helpers, I have a question about counting numbers. Here is a simple example. a=c(2, 3, 3,4) table(a) a 2 3 4 1 2 1 so, I can to create another variables that has the corresponding counting numbers. In this case, I want to have: b=c(1,2,2,1) Is there any way coding for this ? Thanks for helps! Carrie-- [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] questions about counting numbers
It's not quite clear what the OP really wanted. A more general solution may be: a=c(4, 3, 2, 3) ta - table(a) ta a 2 3 4 1 2 1 ta[as.character(a)] a 4 3 2 3 1 2 1 2 HTH Ray Brownrigg On Mon, 07 Feb 2011, Jorge Ivan Velez wrote: Hi Carrie, Try x - rle(a) rep(x$lengths, x$lengths) [1] 1 2 2 1 HTH, Jorge On Sun, Feb 6, 2011 at 8:21 PM, Carrie Li wrote: Hello R-helpers, I have a question about counting numbers. Here is a simple example. a=c(2, 3, 3,4) table(a) a 2 3 4 1 2 1 so, I can to create another variables that has the corresponding counting numbers. In this case, I want to have: b=c(1,2,2,1) Is there any way coding for this ? Thanks for helps! Carrie-- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] aggregate function - na.action
On Feb 6, 2011, at 7:41 PM, Hadley Wickham wrote: There's definitely something amiss with aggregate() here since similar functions from other packages can reproduce your 'control' sum. I expect ddply() will have some timing issues because of all the subgrouping in your data frame, but data.table did very well and the summaryBy() function in the doBy package did OK: Well, if you use the right plyr function, it works just fine: system.time(count(dat, c(x1, x2, x3, x4, x4, x5, x6, x7, x8), y)) # user system elapsed # 9.754 1.314 11.073 Which illustrates something that I've believed for a while about data.table - it's not the indexing that speed things up, it's the custom data structure. If you use ddply with data frames, it's slow because data frames are slow. I think the right way to resolve this is to to make data frames more efficient, perhaps using some kind of mutable interface where necessary for high-performance operations. Data.frames are also fat. Simply adding a single new column to a dataset bordering on large (5 million rows by 200 columns) requires more memory than even twice the size of the full dataframe. (Paging ensues on a Mac with 24GB.) Unless, of course, there is a more memory- efficient strategy than: dfrm$newcol - with(dfrm, func(variables) ). The table() operation on the other hand is blazingly fast and requires practically no memory. -- David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] questions about counting numbers
Many thanks! all methods work well! thanks again! On Sun, Feb 6, 2011 at 8:42 PM, Ray Brownrigg ray.brownr...@ecs.vuw.ac.nzwrote: It's not quite clear what the OP really wanted. A more general solution may be: a=c(4, 3, 2, 3) ta - table(a) ta a 2 3 4 1 2 1 ta[as.character(a)] a 4 3 2 3 1 2 1 2 HTH Ray Brownrigg On Mon, 07 Feb 2011, Jorge Ivan Velez wrote: Hi Carrie, Try x - rle(a) rep(x$lengths, x$lengths) [1] 1 2 2 1 HTH, Jorge On Sun, Feb 6, 2011 at 8:21 PM, Carrie Li wrote: Hello R-helpers, I have a question about counting numbers. Here is a simple example. a=c(2, 3, 3,4) table(a) a 2 3 4 1 2 1 so, I can to create another variables that has the corresponding counting numbers. In this case, I want to have: b=c(1,2,2,1) Is there any way coding for this ? Thanks for helps! Carrie-- [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] questions about counting numbers
?ave a [1] 2 3 3 4 cbind(a, ave(a, a, FUN=length)) a [1,] 2 1 [2,] 3 2 [3,] 3 2 [4,] 4 1 On Sun, Feb 6, 2011 at 8:21 PM, Carrie Li carrieands...@gmail.com wrote: Hello R-helpers, I have a question about counting numbers. Here is a simple example. a=c(2, 3, 3,4) table(a) a 2 3 4 1 2 1 so, I can to create another variables that has the corresponding counting numbers. In this case, I want to have: b=c(1,2,2,1) Is there any way coding for this ? Thanks for helps! Carrie-- [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] delete rows
Hello. I came across your response in an R forum and could use your help. I have a data set with 472 rows. I want to delete rows 416 through 472. The name of my data set is MERGE. I am an extreme R novice. How do I write a script to accomplish this? Thank you. --- Christopher H. Porter, M.A., M.Ed. Director, Undergraduate Recruitment College of Engineering and Science Clemson University 106B Holtzendorff Hall (864) 656-7870 (864) 656-1327 - Fax AIM: ClemsonCES http://www.clemson.edu/ces/psu/ [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Finding non-normal distributions per row of data frame?
Hugo - thanks for the link, but that's a bit beyond me right now. I have ideas of the distribution for some 'nuggets', but others might be new. I hope that there's an easier way to pick them out than going after pre-defined patterns... Greg - I use bioconductor for a lot of processing, but I'm not aware of anything that does this job in particular. Still looking though. Thanks for your input. -- View this message in context: http://r.789695.n4.nabble.com/Finding-non-normal-distributions-per-row-of-data-frame-tp3259439p3263461.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Subsampling out of site*abundance matrix
I figured there would be an even more straightforward way, but that works David, thanks. There has to be a way to get the output I want/need (see below). I tried to bind or merge the elements of apply(samptbl, 2, table) but with no success. I could probably make a for loop with a merge statement, it would work.. but I'm guessing unnecessary and just plain ugly. ## what I want/need spA spB spC spD spa spF spG site1 813 613 320 28 site2 31 25 0 0 25 19 0 site300 9 510 040 site4 27 19 0 022 32 0 If you know, I'd appreciate it.. thanks again for the help. David Winsemius wrote: On Feb 6, 2011, at 3:25 PM, B77S wrote: Hello, How can I randomly sample individuals within a sites from a site (row) X species abundance (column) data frame or matrix? As an example, the matrix abund2 made below. # (sorry, Im a newbie and this is the only way I know to get an example on here) abund1 -c(150, 300, 0, 360, 150, 300, 0, 240, 150, 0, 60, 0, 150, 0, 540, 0, 0, 300, 0, 240, 300, 300, 0, 360, 300, 0, 600, 0) abund2 - matrix(data=abund1, nrow=4, ncol=7) colnames(abund2) - c(spA, spB, spC, spD, spa, spF, spG) rownames(abund2)-c(site1, site2, site3, site4) Perfect. Best submission of an example by a newbie in weeks. # abund2 spA spB spC spD spa spF spG site1 150 150 150 150 0 300 300 site2 300 300 0 0 300 300 0 site3 0 0 60 540 0 0 600 site4 360 240 0 0 240 360 0 How can I make a random subsample of 100 individuals from the abundances given for each site? samptbl - apply(abund2, 1, function(x) sample(colnames(abund2), 100, prob=x, replace=TRUE) ) samptbl site1 site2 site3 site4 [1,] spG spa spD spF [2,] spF spF spG spB [3,] spF spB spC spA [4,] spD spa spG spA [5,] spF spa spD spa [6,] spA spB spD spF [7,] spA spF spD spA [8,] spG spF spG spa [9,] spF spF spG spa [10,] spG spB spD spA Snipped apply() always transposes the results when called with row margins. The t() function would fix this if it needed to be arranged with rows by site. You could check by further apply-(cation) of table to the columns: apply(samptbl, 2, table) $site1 spA spB spC spD spF spG 8 13 6 13 32 28 $site2 spa spA spB spF 25 31 25 19 $site3 spC spD spG 9 51 40 $site4 spa spA spB spF 22 27 19 32 This is probably really easy. Thanks. Bubba -- View this message in context: http://r.789695.n4.nabble.com/Subsampling-out-of-site-abundance-matrix-tp3263148p3263148.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- View this message in context: http://r.789695.n4.nabble.com/Subsampling-out-of-site-abundance-matrix-tp3263148p3263488.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] delete rows
Hi, On Sun, Feb 6, 2011 at 6:16 PM, Christopher Porter cpor...@clemson.edu wrote: Hello. I came across your response in an R forum and could use your help. I have a data set with 472 rows. I want to delete rows 416 through 472. The name of my data set is MERGE. Try this: MERGE2 - MERGE[-c(416:472), ] Cheers, Josh I am an extreme R novice. How do I write a script to accomplish this? Thank you. --- Christopher H. Porter, M.A., M.Ed. Director, Undergraduate Recruitment College of Engineering and Science Clemson University 106B Holtzendorff Hall (864) 656-7870 (864) 656-1327 - Fax AIM: ClemsonCES http://www.clemson.edu/ces/psu/ [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://www.joshuawiley.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] delete rows
?subset On Mon, Feb 7, 2011 at 7:46 AM, Christopher Porter cpor...@clemson.edu wrote: Hello. I came across your response in an R forum and could use your help. I have a data set with 472 rows. I want to delete rows 416 through 472. The name of my data set is MERGE. I am an extreme R novice. How do I write a script to accomplish this? Thank you. --- Christopher H. Porter, M.A., M.Ed. Director, Undergraduate Recruitment College of Engineering and Science Clemson University 106B Holtzendorff Hall (864) 656-7870 (864) 656-1327 - Fax AIM: ClemsonCES http://www.clemson.edu/ces/psu/ [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] delete rows
On Sun, Feb 6, 2011 at 6:16 PM, Christopher Porter cpor...@clemson.eduwrote: Hello. I came across your response in an R forum and could use your help. I have a data set with 472 rows. I want to delete rows 416 through 472. The name of my data set is MERGE. I am an extreme R novice. How do I write a script to accomplish this? By first reading An Introduction to R before posting such basic questions on this list?? -- Bert Thank you. --- Christopher H. Porter, M.A., M.Ed. Director, Undergraduate Recruitment College of Engineering and Science Clemson University 106B Holtzendorff Hall (864) 656-7870 (864) 656-1327 - Fax AIM: ClemsonCES http://www.clemson.edu/ces/psu/ [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Bert Gunter Genentech Nonclinical Biostatistics [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Subsampling out of site*abundance matrix
On Feb 6, 2011, at 9:35 PM, B77S wrote: I figured there would be an even more straightforward way, but that works David, thanks. I am rather puzzled. What prior experience with computing would lead you to believe that one line of code was not a straightforward method to do multinomial sampling from 4 different sets of probabilities??? There has to be a way to get the output I want/need (see below). I tried to bind or merge the elements of apply(samptbl, 2, table) but with no success. I could probably make a for loop with a merge statement, it would work.. but I'm guessing unnecessary and just plain ugly. ## what I want/need spA spB spC spD spa spF spG site1 813 613 320 28 site2 31 25 0 0 25 19 0 site300 9 510 040 site4 27 19 0 022 32 0 If you know, I'd appreciate it.. thanks again for the help. In order to keep the zero entries straight, there needs to be some sort of structure that records the fact that there were not occurrences in a column. Using a factor construct is the method R uses for that purpose. Factors work better in data.frames: set.seed(123) samptbl - apply(abund2, 1, function(x) sample(colnames(abund2), 100, prob=x, replace=TRUE) ) sampdf - as.data.frame(samptbl) sampdf[[1]] - factor( sampdf[[1]], levels= colnames(abund2) ) sampdf[[2]] - factor( sampdf[[2]], levels= colnames(abund2) ) sampdf[[3]] - factor( sampdf[[3]], levels= colnames(abund2) ) sampdf[[4]] - factor( sampdf[[4]], levels= colnames(abund2) ) sapply(sampdf, table) site1 site2 site3 site4 spA1420 031 spB1230 020 spC 8 0 7 0 spD13 041 0 spa 020 019 spF2630 030 spG27 052 0 Again, the t() function would flip that: t( sapply(sampdf, table) ) spA spB spC spD spa spF spG site1 14 12 8 13 0 26 27 site2 20 30 0 0 20 30 0 site3 0 0 7 41 0 0 52 site4 31 20 0 0 19 30 0 -- david. David Winsemius wrote: On Feb 6, 2011, at 3:25 PM, B77S wrote: Hello, How can I randomly sample individuals within a sites from a site (row) X species abundance (column) data frame or matrix? As an example, the matrix abund2 made below. # (sorry, Im a newbie and this is the only way I know to get an example on here) abund1 -c(150, 300, 0, 360, 150, 300, 0, 240, 150, 0, 60, 0, 150, 0, 540, 0, 0, 300, 0, 240, 300, 300, 0, 360, 300, 0, 600, 0) abund2 - matrix(data=abund1, nrow=4, ncol=7) colnames(abund2) - c(spA, spB, spC, spD, spa, spF, spG) rownames(abund2)-c(site1, site2, site3, site4) Perfect. Best submission of an example by a newbie in weeks. # abund2 spA spB spC spD spa spF spG site1 150 150 150 150 0 300 300 site2 300 300 0 0 300 300 0 site3 0 0 60 540 0 0 600 site4 360 240 0 0 240 360 0 How can I make a random subsample of 100 individuals from the abundances given for each site? samptbl - apply(abund2, 1, function(x) sample(colnames(abund2), 100, prob=x, replace=TRUE) ) samptbl site1 site2 site3 site4 [1,] spG spa spD spF [2,] spF spF spG spB [3,] spF spB spC spA [4,] spD spa spG spA [5,] spF spa spD spa [6,] spA spB spD spF [7,] spA spF spD spA [8,] spG spF spG spa [9,] spF spF spG spa [10,] spG spB spD spA Snipped apply() always transposes the results when called with row margins. The t() function would fix this if it needed to be arranged with rows by site. You could check by further apply-(cation) of table to the columns: apply(samptbl, 2, table) $site1 spA spB spC spD spF spG 8 13 6 13 32 28 $site2 spa spA spB spF 25 31 25 19 $site3 spC spD spG 9 51 40 $site4 spa spA spB spF 22 27 19 32 This is probably really easy. Thanks. Bubba -- View this message in context: http://r.789695.n4.nabble.com/Subsampling-out-of-site-abundance-matrix-tp3263148p3263148.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- View this message in context: http://r.789695.n4.nabble.com/Subsampling-out-of-site-abundance-matrix-tp3263148p3263488.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do
Re: [R] Fortran and long integers
Peter Langfelder wrote: Hi all, I'm hoping someone more knowledgeable in Fortran than I can chime in with opinion. I'm the maintainer of the flashClust package that implements fast hierarchical clustering. The fortran code fails when the number of clustered objects is larger than about 46300. My guess is that this is because the code uses the following construct: 2-byte (16 bit) signed integers would have a range from -32768 to +37267. So, it looks like you may be using 2-byte integers and 46,300 would definitely cause an overflow with 16-bit integers. I haven't used Fortran for a long time, but there could be a compiler switch that forces all 2-byte integers, or a specific declaration that says I, J, N, IOFFSET are only 2-byte (16-bit) integers. I'm guess, but you might try a specification like INTEGER*4 I, J, N, IOFFSET assuming INTEGER*4 is legal with your Fortran compiler: http://gcc.gnu.org/onlinedocs/gfortran/Old_002dstyle-kind-specifications.html#Old_002dstyle-kind-specifications efg Earl F Glynn Overland Park, KS __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Giving vector of colors to line in plots
Hey all, I can't for the life of me figure out what I'm missing here. I'm trying to change the color of the line in a time series type plot. I can change the point colors and symbols no problem, but for some reason the colors do not get passed to the lines, regardless of if I do type=b or type=l. The sample code I'm using is below. Any help would be greatly appreciated. Also, please CC me, as I only get daily summaries of the mailing list. Thanks, Brian ## Changing plot attributes through the plot set.seed(33) x - rpois(7,lambda=7) y - rpois(7,lambda=5) cols.x - c(rep(black,2),rep(red,3),rep(black,2)) cols.y - c(rep(blue,3),rep(yellow,2),rep(blue,2)) points.x - c(rep(x,2),rep(O,3),rep(x,2)) points.y - c(rep(8,3),rep(17,2),rep(8,2)) plot(x,col=cols.x,pch=points.x,type=b,ylim=c(0,15)) points(y,col=cols.y,pch=points.y,type=b) __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Fortran and long integers
Earl F Glynn wrote: 2-byte (16 bit) signed integers would have a range from -32768 to +37267. So, it looks like you may be using 2-byte integers and 46,300 would definitely cause an overflow with 16-bit integers. I haven't used Fortran for a long time, but there could be a compiler switch that forces all 2-byte integers, or a specific declaration that says I, J, N, IOFFSET are only 2-byte (16-bit) integers. I'm guess, but you might try a specification like INTEGER*4 I, J, N, IOFFSET assuming INTEGER*4 is legal with your Fortran compiler: The overflow is not caused by 16 bits integers. I'm quite sure the OP is using 32 bit integers. The overflow is caused by the multiplication N*(i-1) and/or i*(i+1). In Fortran there's not much you can do about this unless your compiler supports larger integers. A pity that fortran doesn't have a posint. Either your solution with doubles or a small C function looks like the way out. Berend -- View this message in context: http://r.789695.n4.nabble.com/Fortran-and-long-integers-tp3263054p3263605.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Fortran and long integers
Hi, Does it alleviate things if you rewrite the sums to avoid large products? For I even: J+I*(N-I/2)-(N-I/2) For I odd: J+I*(N-(I+1)/2)-(N-(I+1)/2)+(I+1)/2 Hope it helps, Tsjerk On Mon, Feb 7, 2011 at 7:21 AM, Berend Hasselman b...@xs4all.nl wrote: Earl F Glynn wrote: 2-byte (16 bit) signed integers would have a range from -32768 to +37267. So, it looks like you may be using 2-byte integers and 46,300 would definitely cause an overflow with 16-bit integers. I haven't used Fortran for a long time, but there could be a compiler switch that forces all 2-byte integers, or a specific declaration that says I, J, N, IOFFSET are only 2-byte (16-bit) integers. I'm guess, but you might try a specification like INTEGER*4 I, J, N, IOFFSET assuming INTEGER*4 is legal with your Fortran compiler: The overflow is not caused by 16 bits integers. I'm quite sure the OP is using 32 bit integers. The overflow is caused by the multiplication N*(i-1) and/or i*(i+1). In Fortran there's not much you can do about this unless your compiler supports larger integers. A pity that fortran doesn't have a posint. Either your solution with doubles or a small C function looks like the way out. Berend -- View this message in context: http://r.789695.n4.nabble.com/Fortran-and-long-integers-tp3263054p3263605.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Tsjerk A. Wassenaar, Ph.D. post-doctoral researcher Molecular Dynamics Group * Groningen Institute for Biomolecular Research and Biotechnology * Zernike Institute for Advanced Materials University of Groningen The Netherlands __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Fortran and long integers
Tsjerk Wassenaar wrote: Hi, Does it alleviate things if you rewrite the sums to avoid large products? For I even: J+I*(N-I/2)-(N-I/2) Shouldn't that be J+I*(N-I/2)-(N+I/2) ? Berend -- View this message in context: http://r.789695.n4.nabble.com/Fortran-and-long-integers-tp3263054p3263647.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Fortran and long integers
Yes, thnx... Typo :$ On Mon, Feb 7, 2011 at 8:23 AM, Berend Hasselman b...@xs4all.nl wrote: Tsjerk Wassenaar wrote: Hi, Does it alleviate things if you rewrite the sums to avoid large products? For I even: J+I*(N-I/2)-(N-I/2) Shouldn't that be J+I*(N-I/2)-(N+I/2) ? Berend -- View this message in context: http://r.789695.n4.nabble.com/Fortran-and-long-integers-tp3263054p3263647.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Tsjerk A. Wassenaar, Ph.D. post-doctoral researcher Molecular Dynamics Group * Groningen Institute for Biomolecular Research and Biotechnology * Zernike Institute for Advanced Materials University of Groningen The Netherlands __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Image to plot
Dear all I would like to plot the contents of a matrix as an Image. I found this code here http://www.phaget4.org/R/image_matrix.html but this is not only what I want. Instead of having inside every cell the color of the cell it would be nice to have also the arithmetic value over the background color. Having only the color sometimes does not make it clear to understand what is the value each cell has .. so I was thinking to combine colors and text inside every cell. I would like to thank you in advance for your help Best Regards Alex. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] lattice: Synchronizing key and lines; or: why is lty used as lwd in legend?
Dear latticists, Getting legend and graph line types synchronized has been a challenge in lattice; the last example below that works looks ridiculously ugly for such a simple job, I am sure there is a more elegant solution. When trying this out, I found a feature which I found confusing: why are line types (lty) translated to line widths (lwd) in the legend? See attached pdf in case this should not be reproducible on your computer. Dieter Library(lattice) os = OrchardSprays[OrchardSprays$rowpos 4,] os$rowpos = as.factor(os$rowpos) # Set it globally # This is confusing: why is lty used as lwd in legend?? trellis.par.set(superpose.line = list(col=c(blue,red,green), lwd = 2, lty=c(1,4,6))) xyplot(decrease ~ treatment, os, groups = rowpos, type = a, auto.key = list(space = right, points = FALSE, lines = TRUE)) dev.off() # To me, this seems to be the most consistent method if it worked. s= simpleTheme(lty=c(1,4,6),col=c(red,blue,green),lwd=3) xyplot(decrease ~ treatment, os, groups = rowpos, par.settings=s, type = a, auto.key = list(space = right, points = FALSE, lines = TRUE)) # !!! There must be an easier way to synchronize legend and plot xyplot(decrease~treatment,data=os,groups=rowpos, type = a,par.settings=s, key = list(text=list(levels(os$rowpos)), lines=list(lty=s$superpose.line$lty, lwd=s$superpose.line$lwd, col= s$superpose.line$col))) R version 2.12.1 (2010-12-16) Platform: i386-pc-mingw32/i386 (32-bit) locale: [1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252 [3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C [5] LC_TIME=German_Germany.1252 attached base packages: [1] stats graphics grDevices datasets utils methods base other attached packages: [1] lattice_0.19-17 loaded via a namespace (and not attached): [1] grid_2.12.1 tools_2.12.1 lwdAsLty.pdf Description: lwdAsLty.pdf __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Subsampling out of site*abundance matrix
hehe... very true sir; I apologize, that was very straightforward. Thank you for your time. -- View this message in context: http://r.789695.n4.nabble.com/Subsampling-out-of-site-abundance-matrix-tp3263148p3263598.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] FW: multivariate regression
#I have got 3 dependent variables: Y-matrix(c(3,5,6,3,4,2,4,5,3,2,3,5,6,3,4,2,4,5,3,2,3,5,6,3,4,2,4,5,3,2), nrow = 10, ncol=3, byrow=TRUE) #I've got one independent variable: X-matrix(c(42,54,67,76,45,76,54,87,34,65), nrow = 10, ncol=1, byrow=TRUE) summary(lm(Y~X)) and the result is as below: Response Y1 : Call: lm(formula = Y1 ~ X) Residuals: Min 1Q Median 3Q Max -1.5040 -0.8838 -0.3960 1.1174 2.1162 Coefficients: Estimate Std. Error t value Pr(|t|) (Intercept) 4.435071.70369 2.603 0.0315 * X -0.012250.02742 -0.447 0.6668 --- Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1 Residual standard error: 1.401 on 8 degrees of freedom Multiple R-squared: 0.02435,Adjusted R-squared: -0.09761 F-statistic: 0.1997 on 1 and 8 DF, p-value: 0.6668 Response Y2 : Call: lm(formula = Y2 ~ X) Residuals: Min 1Q Median 3Q Max -1.4680 -0.8437 -0.2193 0.9050 1.9960 Coefficients: Estimate Std. Error t value Pr(|t|) (Intercept) 1.379941.50111 0.9190.385 X0.038670.02416 1.6010.148 Residual standard error: 1.235 on 8 degrees of freedom Multiple R-squared: 0.2426, Adjusted R-squared: 0.1479 F-statistic: 2.562 on 1 and 8 DF, p-value: 0.1481 Response Y3 : Call: lm(formula = Y3 ~ X) Residuals: Min 1Q Median 3Q Max -1.7689 -0.7316 -0.1943 1.1448 2.0933 Coefficients: Estimate Std. Error t value Pr(|t|) (Intercept) 4.389131.70626 2.5720.033 * X -0.011490.02746 -0.4180.687 --- Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1 Residual standard error: 1.403 on 8 degrees of freedom Multiple R-squared: 0.0214, Adjusted R-squared: -0.1009 F-statistic: 0.175 on 1 and 8 DF, p-value: 0.6867 There are 3 F statistics, R2 and p-values. But I want just one R2 and pvalue for my multivariate regression model. Date: Fri, 4 Feb 2011 08:23:39 -0500 From: jsor...@grecc.umaryland.edu To: denizsigi...@hotmail.com; r-help@r-project.org Subject: Re: [R] multivariate regression Please help us help you. Follow the posting rules and send us a copy of your code and output. John John Sorkin Chief Biostatistics and Informatics Univ. of Maryland School of Medicine Division of Gerontology and Geriatric Medicine jsor...@grecc.umaryland.edu -Original Message- From: Deniz SIGIRLI denizsigi...@hotmail.com To: r-help@r-project.org Sent: 2/4/2011 7:54:56 AM Subject: [R] multivariate regression How can I run multivariate linear regression in R (I have got 3 dependent variables and only 1 independent variable)? I tried lm function, but it gave different R2 and p values for every dependent variable. I need one R2 and p value for the model. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Confidentiality Statement: This email message, including any attachments, is for ...{{dropped:5}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.