[R] Background fill and border for a legend in dotplot
Dear R help group I've been working on this plot for a while now and now getting around to the minor adjusments. I would like to be able to put a border and background fill around the legend in this plot. I understand the legend 'bty' should do this have this capablity but not sure how the syntax works in this case ## initalise library(lattice) library(latticeExtra) # for mergedTrellisLegendGrob() ##read the data to a variable # Cal_dat - read.table(Calibration2.dat,header = TRUE,sep = \t,) ## set up plotting colours # col.pat-c(violet,cyan,green,red,blue,black,yellow) sym.pat-c(19,20,21) ##set up the plot key # key1 - draw.key(list(text=list(levels(Cal_dat$Commodity)), title=Ore type, points=list(pch=22, cex=1.3, fill=col.pat, col=black)), draw = FALSE) key2 - draw.key(list(text=list(levels(factor(Cal_dat$Year))), title=Year, points = list(pch = c(21, 22, 23), cex=1.3, col=black)), draw = FALSE) mkey - mergedTrellisLegendGrob(list(fun = key2), list(fun = key1), vertical = TRUE ) ##set some parameters for the plot # trellis.par.set( dot.line=list(col = grey90, lty=dashed), axis.line=list(col = grey50), axis.text=list(col =grey50, cex=0.8), panel.background=list(col=transparent), par.xlab.text= list(col=grey50), ) ## Create the dot plot # with(Cal_dat, dotplot(reorder(paste(Mine,Company), Resc_Gt) ~ Resc_Gt, fill_var = Commodity, pch_var = factor(Year), cex=1.2, pch = c(21, 22, 23), col = black, fill = col.pat, aspect = 2.0, alpha=0.6, legend = list(inside = list(fun = mkey,corner = c(0.95, 0.01))), scales = list(x = list(log = 10)), xscale.components = xscale.components.log10ticks, origin = 0, type = c(p,a), main = Mineral resources, xlab= Total tonnes (billions), panel = function(x, y, ..., subscripts, fill, pch, fill_var, pch_var) { pch - pch[pch_var[subscripts]] fill - fill[fill_var[subscripts]] panel.dotplot(x, y, pch = pch, fill = fill, ...) })) panel = function(x, y, ..., subscripts, fill, pch, fill_var, pch_var) { pch - pch[pch_var[subscripts]] fill - fill[fill_var[subscripts]] panel.dotplot(x, y, pch = pch, fill = fill, ...) } http://r.789695.n4.nabble.com/file/n3785003/Calibration2.dat Calibration2.dat -- View this message in context: http://r.789695.n4.nabble.com/Background-fill-and-border-for-a-legend-in-dotplot-tp3785003p3785003.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Urgente
- This mail is in HTML. Some elements may be ommited in plain text. - Meu nome é Frank P, eu trabalhei com o Banco SNS em Holland.I habe Você contactado por conta de parente falecido com os nossos Banj e que foi deixado sem pretensões. Por favor, me envia um email com um número de telefone e também indica o melhor momento que eu possa chamá-lo por telefone. Isto irá permitir-me dar mais informações sobre este assunto. Por favor, note que eu estou atualmente na Inglaterra, em estudos adicionais para o meu trabalho. Regards Frank P www.snsbank.nl 447741782385 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Question about BIC of two different regression models? how should we compare two regression models?
Andra Isan andra_isan at yahoo.com writes: Hi All, In order to compare two different logistic regressions, I think I need to compare them based on their BIC values, but I am not sure if the smaller BIC would mean a better model or the reverse is true? Thanks a lot,Andra Smaller (i.e. lower value) BIC is always better (even if BIC happens to be negative, as can happen in some cases; i.e. BIC=-1002 is better than BIC=-1000, BIC=1000 is better than BIC=1002). I would suggest however that (a) there are better venues for this question (e.g. stats.stackexchange.com), since it's a stats and not an R question; (b) it might be a good idea to review a stats text, or even http://en.wikipedia.org/wiki/Bayesian_information_criterion , since this is a pretty basic question. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] two question about plot
Hi Re: [R] two question about plot The help for boxplot offers suggestions for both those things. You may be particularly interested in: names: group labels which will be printed under each boxplot. Can be a character vector or an expression (see plotmath). add: logical, if true _add_ boxplot to current plot. Sarah It could be also worth to consult also bwplot from lattice or especially ggplot2 package for plotting. Regards Petr On Thu, Sep 1, 2011 at 1:31 PM, Jie TANG totang...@gmail.com wrote: 1) how to modify the the tickment of x-axis or y-axis. boxplot(data[,1:5]) the tickment in x-axis in V1 V2 V3 V4 V5 ,I want to be some name for example name-c(1day,2day,3day,4day,5day) 2) how to overlap two plot into one figure? plot(data[1:5]) boxplot(newdata[,1:5]) ? -- TANG Jie -- Sarah Goslee http://www.functionaldiversity.org __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] ggplot2 to create a square plot
On 09/01/2011 09:02 PM, baptiste auguie wrote: Hi, Are you after this? last_plot() + opts(aspect.ratio=1) Even better (I once got correct by Hadley for using aspect.ratio, but this was plotting spatial data...) last_plot() + coord_equal() cheers, Paul Also, see https://github.com/hadley/ggplot2/wiki/Themes for some settings re: plot margins. HTH, baptiste On 1 September 2011 05:18, Alaios ala...@yahoo.com wrote: Dear all, I am using ggplot with geom_tile to print as an image a matrix I have. My matrix is a squared one of 512*512 cells. The code that does that is written below print(v + geom_tile(aes(fill=dB))+ opts(axis.text.x=theme_text(size=20),axis.text.y=theme_text(size=20), axis.title.x=theme_text(size=25) , axis.title.y=theme_text(size=25), legend.title=theme_text(size=25,hjust=-0.4) , legend.text=theme_text(size=20)) + scale_x_continuous('km') + scale_y_continuous('km')) as you can see from the picture below http://imageshack.us/photo/my-images/171/backupf.jpg/ this squared matrix is printed a bit squeezed with the height being bigger than the width. Would be possible somehow to print that plot by keeping the square-look of the matrix in the plot? Of course the other elements like axis and legend will make the over all plot to not be square but I do not care as the blue and red region forms a square. I would like to thank you in advance for your help B.R Alex [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Paul Hiemstra, Ph.D. Global Climate Division Royal Netherlands Meteorological Institute (KNMI) Wilhelminalaan 10 | 3732 GK | De Bilt | Kamer B 3.39 P.O. Box 201 | 3730 AE | De Bilt tel: +31 30 2206 494 http://intamap.geo.uu.nl/~paul http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] ggplot2 to create a square plot
Dear Dennis, I wouldl ike to thank you for your answer. That did it work. how can I know remove the space between the axis labels and the image (blue-red area)? Right now things look like are floating a bit. I want the x,y labels to reduce the overall gap. I would like to thank you in advance for your help B.R Alex From: Dennis Murphy djmu...@gmail.com Cc: R-help@r-project.org R-help@r-project.org Sent: Thursday, September 1, 2011 3:18 PM Subject: Re: [R] ggplot2 to create a square plot Hi: Dear Dennis, I would like to thank you for your reply. I also checked the web sites that you gave me, it is hard to find everything about ggplot2 at one place with concrete examples that help you understand directly what you are plotting. There is work in progress to alleviate that problem, but it is still being worked on and it may be a little while before it becomes publicly available. That could mean anywhere from a few days to a few months. I don't know what is the current status wrt the project. As you have already mentioned ggsave can save the image as I want to and that is what I am using now. One minor issue (as you have also mentioned is to remove they gray border between the x and y legend and the red and blue area. See scale_continuous and look at the expand = argument; something like scale_x_continuous(expand = c(0, 0)) + scale_y_continuous(expand = c(0, 0)) might be what you need, but try it for yourself. HTH, Dennis Thus I checked the website and tried by applying the options to remove it. Unfortunately I ended up with a full list of more arguments print(v + geom_tile(aes(fill=dB))+ opts(axis.text.x=theme_text(size=20),axis.text.y=theme_text(size=20), axis.title.x=theme_text(size=25) , axis.title.y=theme_text(size=25), legend.title=theme_text(size=25,hjust=-0.4) , legend.text=theme_text(size=20) , panel.background=theme_blank() , panel.margin=unit(100,lines) , panel.grid.major=theme_line(size=0.1) , plot.margin=unit(c(0,0,0,0),lines) ) + scale_x_continuous('km') + scale_y_continuous('km') ) so far I have not remove that extra space.. Do you have any suggestion of how I can remove it? I would like to thank all for their time. B.R Alex From: Dennis Murphy djmu...@gmail.com Cc: R-help@r-project.org R-help@r-project.org Sent: Wednesday, August 31, 2011 9:34 PM Subject: Re: [R] ggplot2 to create a square plot Hi: I'd suggest using ggsave(); in particular, see its height = and width = arguments. If you have some time, you could look at some examples of ggplot2 themes: https://github.com/hadley/ggplot2/wiki/themes and some examples of how to use various opts(): https://github.com/hadley/ggplot2/wiki/%2Bopts%28%29-List These can be useful if you need to reduce the amount of space around the plot or reposition the legend to the top or bottom to get more horizontal space for the plot. This sometimes is a germane issue when the plot is intended to be square. HTH, Dennis Dear all, I am using ggplot with geom_tile to print as an image a matrix I have. My matrix is a squared one of 512*512 cells. The code that does that is written below print(v + geom_tile(aes(fill=dB))+ opts(axis.text.x=theme_text(size=20),axis.text.y=theme_text(size=20), axis.title.x=theme_text(size=25) , axis.title.y=theme_text(size=25), legend.title=theme_text(size=25,hjust=-0.4) , legend.text=theme_text(size=20)) + scale_x_continuous('km') + scale_y_continuous('km') ) as you can see from the picture below http://imageshack.us/photo/my-images/171/backupf.jpg/ this squared matrix is printed a bit squeezed with the height being bigger than the width. Would be possible somehow to print that plot by keeping the square-look of the matrix in the plot? Of course the other elements like axis and legend will make the over all plot to not be square but I do not care as the blue and red region forms a square. I would like to thank you in advance for your help B.R Alex [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Automatic Recoding
Thank you both for your replies. I've tried it with a small sample of the data and it works perfectly. I have no idea yet how it works but I will spend some time to figure it out. Thank you! Thomas -- View this message in context: http://r.789695.n4.nabble.com/Automatic-Recoding-tp3784043p3785565.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] calling R from PHP error
Hello, I am having the following error while calling an R script through PHP. /usr/local/bin/R: line 227: /kk/Programs/R-2.13.0/etc/ldpaths: Permission denied ERROR: R_HOME ('/kk/Programs/R-2.13.0') not found I had compiled R from source and placed the generated R shell script in /usr/local/bin. Can you give me an insight of how to give permission to access the ldpaths file and why is the R_HOME tree not found? Thank you and regards [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Re : P values for vglm(zibinomial) function in VGAM
Estimation is realized by MLE, estimators are asymptotically normal Try this reg-vglm(...) p.value-1-pnorm(abs(coef(reg)/sqrt(diag(vcov(reg) Justin BEM BP 1917 Yaoundé Tél (237) 76043774 De : suuz suuz_b...@hotmail.com À : r-help@r-project.org Envoyé le : Mardi 23 Août 2011 15h04 Objet : [R] P values for vglm(zibinomial) function in VGAM Hi , I know this question has been asked twice in the past but to my knowldege, it still hasn't been solved. I am doing a zero inflated binomial model using the VGAM package, I need to obtain p values for my Tvalues in the vglm output. code is as follows mod2=vglm(dmat~Season+Diel+Tidal.phase+Tidal.cycle,zibinomial, data=mp1) summary(mod2) Call: vglm(formula = dmat ~ Season + Diel + Tidal.phase + Tidal.cycle, family = zibinomial, data = mp1) Pearson Residuals: Min 1Q Median 3Q Max logit(phi) -3.6496 0.273679 0.285619 0.296763 1.0974 logit(mu) -6.3631 -0.029027 -0.020786 -0.011719 80.6051 Coefficients: Value Std. Error t value (Intercept):1 2.365835 0.029142 81.18334 (Intercept):2 -3.182376 0.050054 -63.57944 SeasonSpring -0.080840 0.054201 -1.49147 SeasonSummer 0.204781 0.049936 4.10083 SeasonWinter 0.385078 0.043874 8.77692 DielE -0.079190 0.079859 -0.99163 DielM 0.071607 0.074620 0.95963 DielN 0.132377 0.036419 3.63488 Tidal.phaseNT -0.252715 0.054053 -4.67536 Tidal.phaseST 0.145777 0.045554 3.20010 Tidal.cycleF 0.114808 0.044897 2.55713 Tidal.cycleH -0.074224 0.048063 -1.54428 Tidal.cycleL -0.049681 0.047717 -1.04116 Number of linear predictors: 2 Names of linear predictors: logit(phi), logit(mu) Dispersion Parameter for zibinomial family: 1 Log-likelihood: -7831.693 on 30569 degrees of freedom Number of Iterations: 13 I have tried the suggestions pt(coef(summary(mod2)), 30569, lower.tail = TRUE, log.p = FALSE) Value Std. Error t value (Intercept):1 0.9910021735 0.5116242 1.00e+00 (Intercept):2 0.0007310901 0.5199600 0.00e+00 SeasonSpring 0.4677849543 0.5216125 6.792427e-02 SeasonSummer 0.5811276264 0.5199133 9.999794e-01 SeasonWinter 0.6499089525 0.5174974 1.00e+00 DielE 0.4684409796 0.5318250 1.606939e-01 DielM 0.5285424555 0.5297410 8.313752e-01 DielN 0.5526565030 0.5145256 9.998607e-01 Tidal.phaseNT 0.4002451101 0.5215532 1.473475e-06 Tidal.phaseST 0.5579507181 0.5181669 9.993124e-01 Tidal.cycleF 0.5457011061 0.5179053 9.947207e-01 Tidal.cycleH 0.4704164848 0.5191670 6.126507e-02 Tidal.cycleL 0.4801883607 0.5190291 1.489050e-01 pt(coef(mod2), 30569) (Intercept):1 (Intercept):2 SeasonSpring SeasonSummer SeasonWinter 0.9910021735 0.0007310901 0.4677849543 0.5811276264 0.6499089525 DielE DielM DielN Tidal.phaseNT Tidal.phaseST 0.4684409796 0.5285424555 0.5526565030 0.4002451101 0.5579507181 Tidal.cycleF Tidal.cycleH Tidal.cycleL 0.5457011061 0.4704164848 0.4801883607 But these seem to be giving strange results, those which are clearly more different to the basline levels (Autumn, D, bl and E) are coming up as least significant. Perhaps it is my interpretation. I couldnt follow 2*min(pt(coef(summary(mod2)), 30569), 1-pt(coef(summary(mod2)), 30569)) ?? Does anyone know what I might be doing wrong or how to go about the last code? I have read as much as I can find on VGAM and zib but have ran out of ideas. I apologise if this appears to be a beginners question. many thanks in advance -- View this message in context: http://r.789695.n4.nabble.com/P-values-for-vglm-zibinomial-function-in-VGAM-tp3762858p3762858.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Using capture.output within a function
Dear R-users I'm running a maximum likelihood procedure using the spg package. I'd like to save some output produced in each iteration to a file, but if I put the capture.output() within the function I get the following message; Error in spg(par = startval, fn = loglik, gr = NULL, method = 3, lower = lo, : Failure in initial function evaluation!Error in -fn(par, ...) : invalid argument to unary operator I have considered putting the capture.output() after the function, but there are some issues with R stalling on me so I'd like that the output is saved for each iteration and not only at completion. Any suggestions on how to get this done would be much appreciated. Kristian Lind *Below an example of what I'm trying to do...* loglik - function(w){ state - c(b_1 = 0, b_2 = 0, a = 0) #declaring ODEs Kristian -function(t, state, w){ with(as.list(c(state, w)), { db_1 = -((w[1]+w[8])*b_1+(w[2]+w[6]*w[8] +w[7]*w[9])*b_2+0.5*(b_1)^2+w[6]*b_1*b_2+0.5* ((w[6])^2+(w[7])^2)*(b_2)^2) db_2 = -w[3]*b_2+1 da = w[1]*w[4]*b_1+(w[2]*w[4]+w[3]*w[5])*b_2 list(c(db_1, db_2, da)) }) } # time making a sequence from t to T evaluated at each delta seq(t, T, by = delta) times - seq(0, 10, by = 0.5) outmat - ode(y = state, times = times, func = Kristian, parms = w) print(w) print(outmat) . . . f-rep(NA, 1) f[1] - 1/(T-1)*sum(log(pJ$p)-log(pJ$J)) f capture.output(outmat, file = spgoutput.txt, append = TRUE) } fit - spg(fn =loglik, ...) [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Maximum Likelihood using optim()
Dear mailing list, I would like to use the optim() command in order to maximize the logged likelihood of the following function, where p is the parameter of interest and should be constrained between 0 and positive infinity. y = 1/2 * ((te - x)/(te - tc))^p x and y are given by x - c(5.18, 6.28, 7.00, 7.08, 7.54, 7.90, 8.24, 8.64, 12.17, 12.89, 14.27, 15.38, 15.80, 16.46, 20.41, 21.27, 22.91) y - c(0.63, 0.64, 0.66, 0.68, 0.69, 0.71, 0.73, 0.75, 0.76, 0.78, 0.8, 0.81, 0.83, 0.85, 0.86, 0.88, 0.9) te and tc are fixed at 5 and 25. What is a good way to achieve this? Thanks a lot. Alrik Thiem Department of Humanities, Social and Political Sciences ETH Zurich [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] [R-pkgs] hydroTSM 0.3-0 and hydroGOF 0.3-0
Dear R users and hydrological/environmental community, I'm glad to announce that a major (and recommended) update for the packages hydroTSM and hydroGOF are now available on CRAN: -) hydroTSM: http://cran.r-project.org/web/packages/hydroTSM/ -) hydroGOF: http://cran.r-project.org/web/packages/hydroGOF/ ### # hydroTSM v0.3-0 # ### hydroTSM is a package for management, analysis, interpolation and plotting of time series used in hydrology and related environmental sciences. This new release collects feedback received since the first public release of the package (~ 1 year ago). Major changes are related to improved plotting of time series, better and faster support for zoo objects, and new features in some functions. A full list of changes can be found on: http://www.rforge.net/hydroTSM/news.html ### # hydroGOF v0.3-0 # ### hydroGOF is a package implementing both statistical and graphical goodness-of-fit measures between observed and simulated values, mainly oriented to be used during the calibration, validation, and application of hydrological/environmental models. Major changes in this new release are related to improved plotting of simulated vs observed values and a new vignette. A full list of changes can be found on: http://www.rforge.net/hydroGOF/news.html ### # Related Links # ### http://meetingorganizer.copernicus.org/EGU2010/EGU2010-13008.pdf http://www.slideshare.net/hzambran/egu2010-ra-statisticalenvironmentfordoinghydrologicalanalysis-9095709 http://www.r-project.org/conferences/useR-2009/slides/Zambrano+Bigiarini.pdf ### Bugs / comments / questions / collaboration of any kind are very welcomed, and in particular, datasets that can be included in the packages for academic purposes. Kind regards, Mauricio Zambrano-Bigiarini -- === FLOODS Action Land Management and Natural Hazards Unit Institute for Environment and Sustainability (IES) European Commission, Joint Research Centre (JRC) webinfo: http://floods.jrc.ec.europa.eu/ === DISCLAIMER:\ The views expressed are purely those of th...{{dropped:7}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] betareg question - keeping the mean fixed?
Thanks for your response, that does work, however, it is still not quite what want. I would like to tell betareg what the mean is (in my case, 0.5) and force it to use that value. Is this possible? -- View this message in context: http://r.789695.n4.nabble.com/betareg-question-keeping-the-mean-fixed-tp3783303p3785683.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Using capture.output within a function
On 11-09-02 5:24 AM, Kristian Lind wrote: Dear R-users I'm running a maximum likelihood procedure using the spg package. I'd like to save some output produced in each iteration to a file, but if I put the capture.output() within the function I get the following message; Error in spg(par = startval, fn = loglik, gr = NULL, method = 3, lower = lo, : Failure in initial function evaluation!Error in -fn(par, ...) : invalid argument to unary operator It looks as though you put capture.output() last in your function, so the result of the function is the result of the capture.output call, not the function value. Duncan Murdoch I have considered putting the capture.output() after the function, but there are some issues with R stalling on me so I'd like that the output is saved for each iteration and not only at completion. Any suggestions on how to get this done would be much appreciated. Kristian Lind *Below an example of what I'm trying to do...* loglik- function(w){ state- c(b_1 = 0, b_2 = 0, a = 0) #declaring ODEs Kristian-function(t, state, w){ with(as.list(c(state, w)), { db_1 = -((w[1]+w[8])*b_1+(w[2]+w[6]*w[8] +w[7]*w[9])*b_2+0.5*(b_1)^2+w[6]*b_1*b_2+0.5* ((w[6])^2+(w[7])^2)*(b_2)^2) db_2 = -w[3]*b_2+1 da = w[1]*w[4]*b_1+(w[2]*w[4]+w[3]*w[5])*b_2 list(c(db_1, db_2, da)) }) } # time making a sequence from t to T evaluated at each delta seq(t, T, by = delta) times- seq(0, 10, by = 0.5) outmat- ode(y = state, times = times, func = Kristian, parms = w) print(w) print(outmat) . . . f-rep(NA, 1) f[1]- 1/(T-1)*sum(log(pJ$p)-log(pJ$J)) f capture.output(outmat, file = spgoutput.txt, append = TRUE) } fit- spg(fn =loglik, ...) [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Advice on large data structures
i would suggest that if you want to use R that you get a 64-bit version with 24GB of memory to start. if your data is a numeric matrix, you will need 8GB for a single copy. Do you really need it all in memory at once, or can you partition the problem? Can you use a database to access the portion you need at any time? If you only need one, or two, columns at a time, then the use of a database storing the columns might work. You probably need some more analysis on exactly how you want to solve your problem understanding the limitations of the system. Sent from my iPad On Sep 2, 2011, at 1:13, Worik R wor...@gmail.com wrote: Friends I am starting on a (section of the) project where I need to build a matrix with on the order of 5 million rows and 200 columns I am wondering if I can stay in R. I need to do rollapply type operations on the columns, including some that will be functions of (windows of) two columns. I have been looking at the ff and bigmemory packages but am not sure that they will do. Before I get too deep can some one offer some wisdom about what the best direction to go would be? Switching to C/C++ is definitely an option if it is all too hard cheers Worik [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Using capture.output within a function
Yes, that's true. It works as it's supposed to now. Thank you. Kristian Lind 2011/9/2 Duncan Murdoch murdoch.dun...@gmail.com On 11-09-02 5:24 AM, Kristian Lind wrote: Dear R-users I'm running a maximum likelihood procedure using the spg package. I'd like to save some output produced in each iteration to a file, but if I put the capture.output() within the function I get the following message; Error in spg(par = startval, fn = loglik, gr = NULL, method = 3, lower = lo, : Failure in initial function evaluation!Error in -fn(par, ...) : invalid argument to unary operator It looks as though you put capture.output() last in your function, so the result of the function is the result of the capture.output call, not the function value. Duncan Murdoch I have considered putting the capture.output() after the function, but there are some issues with R stalling on me so I'd like that the output is saved for each iteration and not only at completion. Any suggestions on how to get this done would be much appreciated. Kristian Lind *Below an example of what I'm trying to do...* loglik- function(w){ state- c(b_1 = 0, b_2 = 0, a = 0) #declaring ODEs Kristian-function(t, state, w){ with(as.list(c(state, w)), { db_1 = -((w[1]+w[8])*b_1+(w[2]+w[6]***w[8] +w[7]*w[9])*b_2+0.5*(b_1)^2+w[**6]*b_1*b_2+0.5* ((w[6])^2+(w[7])^2)*(b_2)^2) db_2 = -w[3]*b_2+1 da = w[1]*w[4]*b_1+(w[2]*w[4]+w[3]***w[5])*b_2 list(c(db_1, db_2, da)) }) } # time making a sequence from t to T evaluated at each delta seq(t, T, by = delta) times- seq(0, 10, by = 0.5) outmat- ode(y = state, times = times, func = Kristian, parms = w) print(w) print(outmat) . . . f-rep(NA, 1) f[1]- 1/(T-1)*sum(log(pJ$p)-log(pJ$**J)) f capture.output(outmat, file = spgoutput.txt, append = TRUE) } fit- spg(fn =loglik, ...) [[alternative HTML version deleted]] __** R-help@r-project.org mailing list https://stat.ethz.ch/mailman/**listinfo/r-helphttps://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/** posting-guide.html http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] If NA Problem!
Hi All! Please find code and the respective lists below. My problem: I specify the case that lilwin[[p]] is not an NA and want the code found in iwish to be returned ONLY for that case. Why do I get a list of length 2 (and why is NULL the first element)? I understand that the code below is quite senseless. I have run into a problem while working on a large project and wanted to simplify it in order for it to be more understandable and accessible. If I should not be using the if function, please let me know what I should be doing instead. I know that I must use the for function for my project. The thing I most want to understand is how, after specifying a certain condition, one may save certain data that occurs when that condition is met. I hope I have been clear enough! Thank you very much for your help! Anna biglist-list(a=1:4,b=2:6) lilwin-list(x=NA,y=2) lilloss-list(m=1,n=3) biglist$a [1] 1 2 3 4 $b [1] 2 3 4 5 6 lilwin$x [1] NA $y [1] 2 lilloss$m [1] 1 $n [1] 3 iwish-list() for(p in 1:length(biglist)){ if(is.na(lilwin[[p]])==F) iwish[p]-list(biglist[[p]][lilwin[[p]]]) } iwish[[1]] NULL [[2]] [1] 3 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Background fill and border for a legend in dotplot
Hi: Try this: key1 - draw.key(list(text=list(levels(Cal_dat$Commodity)), title=Ore type, border = TRUE, background = 'ivory', points=list(pch=22, cex=1.3, fill=col.pat, col=black)), draw = FALSE) key2 - draw.key(list(text=list(levels(factor(Cal_dat$Year))), title=Year, border = TRUE, background = 'ivory', points = list(pch = c(21, 22, 23), cex=1.3, col=black)), draw = FALSE) mkey - mergedTrellisLegendGrob(list(fun = key2), list(fun = key1), vertical = TRUE ) Now rerun your dotplot; from the result I got, you may need to do some positional tweaking and may well want to change the background color of the legend to something else.. HTH, Dennis On Thu, Sep 1, 2011 at 4:47 PM, markm0705 markm0...@gmail.com wrote: Dear R help group I've been working on this plot for a while now and now getting around to the minor adjusments. I would like to be able to put a border and background fill around the legend in this plot. I understand the legend 'bty' should do this have this capablity but not sure how the syntax works in this case ## initalise library(lattice) library(latticeExtra) # for mergedTrellisLegendGrob() ##read the data to a variable # Cal_dat - read.table(Calibration2.dat,header = TRUE,sep = \t,) ## set up plotting colours # col.pat-c(violet,cyan,green,red,blue,black,yellow) sym.pat-c(19,20,21) ##set up the plot key # key1 - draw.key(list(text=list(levels(Cal_dat$Commodity)), title=Ore type, points=list(pch=22, cex=1.3, fill=col.pat, col=black)), draw = FALSE) key2 - draw.key(list(text=list(levels(factor(Cal_dat$Year))), title=Year, points = list(pch = c(21, 22, 23), cex=1.3, col=black)), draw = FALSE) mkey - mergedTrellisLegendGrob(list(fun = key2), list(fun = key1), vertical = TRUE ) ##set some parameters for the plot # trellis.par.set( dot.line=list(col = grey90, lty=dashed), axis.line=list(col = grey50), axis.text=list(col =grey50, cex=0.8), panel.background=list(col=transparent), par.xlab.text= list(col=grey50), ) ## Create the dot plot # with(Cal_dat, dotplot(reorder(paste(Mine,Company), Resc_Gt) ~ Resc_Gt, fill_var = Commodity, pch_var = factor(Year), cex=1.2, pch = c(21, 22, 23), col = black, fill = col.pat, aspect = 2.0, alpha=0.6, legend = list(inside = list(fun = mkey,corner = c(0.95, 0.01))), scales = list(x = list(log = 10)), xscale.components = xscale.components.log10ticks, origin = 0, type = c(p,a), main = Mineral resources, xlab= Total tonnes (billions), panel = function(x, y, ..., subscripts, fill, pch, fill_var, pch_var) { pch - pch[pch_var[subscripts]] fill - fill[fill_var[subscripts]] panel.dotplot(x, y, pch = pch, fill = fill, ...) })) panel = function(x, y, ..., subscripts, fill, pch, fill_var, pch_var) { pch - pch[pch_var[subscripts]] fill - fill[fill_var[subscripts]] panel.dotplot(x, y, pch = pch, fill = fill, ...) } http://r.789695.n4.nabble.com/file/n3785003/Calibration2.dat Calibration2.dat -- View this message in context: http://r.789695.n4.nabble.com/Background-fill-and-border-for-a-legend-in-dotplot-tp3785003p3785003.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] merge some columns
Dear all, I would like to know how to merge columns like: Input file: V1 V2 V3 V4 V5 V6 1 G A G G G G 2 A A G A A G Desired output file: V1 V2 V3 1 G/A G/G G/G 2 A/A G/A A/G So for every 2 consecutive columns merge their content into one. Thanks in advance. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] If NA Problem!
Anna Dunietz wrote on 09/02/2011 07:16:45 AM: Hi All! Please find code and the respective lists below. My problem: I specify the case that lilwin[[p]] is not an NA and want the code found in iwish to be returned ONLY for that case. Why do I get a list of length 2 (and why is NULL the first element)? The list is length 2 because when p=2 in your for loop you save the output to iwish[2]. Since nothing was saved to iwish[1] (when lilwin[[1]] was NA) that gives a NULL value for iwish[1]. If you want to allow the length of iwish to be shorter, you need to change the way you save your output. For example, iwish - list() count - 0 for(p in 1:length(biglist)){ if(is.na(lilwin[[p]])==F) { count - count + 1 iwish[count] - list(biglist[[p]][lilwin[[p]]]) } } Jean I understand that the code below is quite senseless. I have run into a problem while working on a large project and wanted to simplify it in order for it to be more understandable and accessible. If I should not be using the if function, please let me know what I should be doing instead. I know that I must use the for function for my project. The thing I most want to understand is how, after specifying a certain condition, one may save certain data that occurs when that condition is met. I hope I have been clear enough! Thank you very much for your help! Anna biglist-list(a=1:4,b=2:6) lilwin-list(x=NA,y=2) lilloss-list(m=1,n=3) biglist$a [1] 1 2 3 4 $b [1] 2 3 4 5 6 lilwin$x [1] NA $y [1] 2 lilloss$m [1] 1 $n [1] 3 iwish-list() for(p in 1:length(biglist)){ if(is.na(lilwin[[p]])==F) iwish[p]-list(biglist[[p]][lilwin[[p]]]) } iwish[[1]] NULL [[2]] [1] 3 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Question about BIC of two different regression models? how should we compare two regression models?
I believe when using BIC one needs to compare nested models, i.e. , when comparing models A and B one must make sure that model A contains all the parameters of model B and additionally A contains one or more extra parameter beyond those in B. Further the comparison of BICs requires that models A and B be run on the same data. Thus if we have model A y=age+sex, model B y=age, if some subjects are missing data on their sex, model A would be run on a subset of the data used when running model B. In this case the comparison of BICs would not be valid. John John David Sorkin M.D., Ph.D. Chief, Biostatistics and Informatics University of Maryland School of Medicine Division of Gerontology Baltimore VA Medical Center 10 North Greene Street GRECC (BT/18/GR) Baltimore, MD 21201-1524 (Phone) 410-605-7119 (Fax) 410-605-7913 (Please call phone number above prior to faxing) Ben Bolker bbol...@gmail.com 9/2/2011 2:35 AM Andra Isan andra_isan at yahoo.com writes: Hi All, In order to compare two different logistic regressions, I think I need to compare them based on their BIC values, but I am not sure if the smaller BIC would mean a better model or the reverse is true? Thanks a lot,Andra Smaller (i.e. lower value) BIC is always better (even if BIC happens to be negative, as can happen in some cases; i.e. BIC=-1002 is better than BIC=-1000, BIC=1000 is better than BIC=1002). I would suggest however that (a) there are better venues for this question (e.g. stats.stackexchange.com), since it's a stats and not an R question; (b) it might be a good idea to review a stats text, or even http://en.wikipedia.org/wiki/Bayesian_information_criterion , since this is a pretty basic question. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Confidentiality Statement: This email message, including any attachments, is for th...{{dropped:6}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] merge some columns
Hi: Here's one approach: d - read.table(textConnection( V1 V2 V3 V4 V5 V6 1 G A G G G G 2 A A G A A G), header = TRUE, stringsAsFactors = FALSE) closeAllConnections() # Create two vectors of variable names, one for odd numbered, # one for even numbered vars1 - names(d)[seq_along(names(d)) %% 2 == 1] vars2 - names(d)[seq_along(names(d)) %% 2 == 0] # Apply the paste sequentially to corresponding pairs # in vars1 and vars2; get() is used to get the data associated # with the variable names in vars1 and vars2 d2 - sapply(seq_along(vars1), function(j) with(d, paste(get(vars1[j]), get(vars2[j]), sep = '/'))) # Convert to data frame: d2 - as.data.frame(d2, stringsAsFactors = FALSE) str(d2) HTH, Dennis On Fri, Sep 2, 2011 at 5:34 AM, Joao Fadista joao.fadi...@med.lu.se wrote: Dear all, I would like to know how to merge columns like: Input file: V1 V2 V3 V4 V5 V6 1 G A G G G G 2 A A G A A G Desired output file: V1 V2 V3 1 G/A G/G G/G 2 A/A G/A A/G So for every 2 consecutive columns merge their content into one. Thanks in advance. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Automatic Recoding
On Sep 2, 2011, at 4:14 AM, thomas.chesney wrote: Thank you both for your replies. I've tried it with a small sample of the data and it works perfectly. I have no idea yet how it works but I will spend some time to figure it out. When you get around to putting in the time to figure it out, the solution to start with would be Dunlap's match strategy. Mine is really a perversion of an aspect of how factors work but isn't really how you are supposed to use them. -- David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Weights using Survreg
Thank you for your reply, it has been helpful. Do you know if the parameters estimators are MLE estimators? One more question: In my case study I have failures that occured on different objects that have different age and length, could I use weight to find the estimates of a weibull law and so to find the probabilty of failure per unit of length for example? Thank you very much again for your help, Boris -- View this message in context: http://r.789695.n4.nabble.com/Weights-using-Survreg-tp3781803p3785931.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Bandwith selectors for multivariate local regression
Dear all, I'm trying to do a multivariate local regression whit the locfit instruction, such as:Y=m(Z,X,W)+u However, I have a problem at the moment of calculte the bandwith of the regression. If I had a univarite local regression model I could use the instruction regband. However, this command only works in models with only one predictor. Somebody knows how can I do that? Please help I completely lose. Thanks in advance [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Standard errors of sexual dimorphism?
Hello! I am working on a manuscript on sexual dimorphism in an aquatic invertebrate, where we have estimated sexual dimorphism (SD) for 7 different traits in four populations (a total of 28 SD-estimates). We have used the following formula for estimating SD: 100 * (mean male trait value - mean female trait value)/overall trait mean). Then, we have used these SD-estimates to perform a GLM against other interesting variables, such as the intersexual genetic correlations for each of the traits. Here are my questions: 1. Is there any procedure in R you would recommend that takes in to account the sampling variance of the SD-estimates, rather than using the mean value of each (which is supposed to reduce error and increase Type I-error rates? 2. Is there a procedure to estimate SE for ratios such as this SD-estimate? 3. The data in these GLM:s might not be entirely statistically non-independent (i. e. intersexual genetic correlations). Can you recommend any R-procedure (package) that can deal with this problem (e. g. bootstrapping or resampling)? Many thanks in advance for input! Erik Svensson -- View this message in context: http://r.789695.n4.nabble.com/Standard-errors-of-sexual-dimorphism-tp3785770p3785770.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] merge some columns
On Sep 2, 2011, at 8:34 AM, Joao Fadista wrote: Dear all, I would like to know how to merge columns like: Input file: V1 V2 V3 V4 V5 V6 1 G A G G G G 2 A A G A A G Looked like an mapply-type problem: with(dat, mapply(paste, list(V1, V3, V5), list(V2, V4, V6), MoreArgs=list(sep=/) ) ) [,1] [,2] [,3] [1,] G/A G/G G/G [2,] A/A G/A A/G Desired output file: V1 V2 V3 1 G/A G/G G/G 2 A/A G/A A/G So for every 2 consecutive columns merge their content into one. Thanks in advance. [[alternative HTML version deleted]] -- David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] merge some columns
On Sep 2, 2011, at 9:30 AM, David Winsemius wrote: On Sep 2, 2011, at 8:34 AM, Joao Fadista wrote: Dear all, I would like to know how to merge columns like: Input file: V1 V2 V3 V4 V5 V6 1 G A G G G G 2 A A G A A G Looked like an mapply-type problem: with(dat, mapply(paste, list(V1, V3, V5), list(V2, V4, V6), MoreArgs=list(sep=/) ) ) There is a further refinement that is possible that will result in naming of the columns made possible by the behavior of the USE.NAMES feature of mapply. From the help page: use names if the first ... argument has names, or if it is a character vector, use that character vector as the names; with(dat, mapply(paste, list(V1 =V1, V2=V3, V3=V5), list(V2, V4, V6), MoreArgs=list(sep=/) ) ) V1V2V3 [1,] G/A G/G G/G [2,] A/A G/A A/G [,1] [,2] [,3] [1,] G/A G/G G/G [2,] A/A G/A A/G Desired output file: V1 V2 V3 1 G/A G/G G/G 2 A/A G/A A/G So for every 2 consecutive columns merge their content into one. Thanks in advance. [[alternative HTML version deleted]] -- David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Maximum Likelihood using optim()
I think the following pdf will explain the details of how to use the optim function. http://www.unc.edu/~monogan/computing/r/MLE_in_R.pdf Hope that helps, Sam On Fri, Sep 2, 2011 at 7:06 AM, Thiem Alrik th...@sipo.gess.ethz.ch wrote: Dear mailing list, I would like to use the optim() command in order to maximize the logged likelihood of the following function, where p is the parameter of interest and should be constrained between 0 and positive infinity. y = 1/2 * ((te - x)/(te - tc))^p x and y are given by x - c(5.18, 6.28, 7.00, 7.08, 7.54, 7.90, 8.24, 8.64, 12.17, 12.89, 14.27, 15.38, 15.80, 16.46, 20.41, 21.27, 22.91) y - c(0.63, 0.64, 0.66, 0.68, 0.69, 0.71, 0.73, 0.75, 0.76, 0.78, 0.8, 0.81, 0.83, 0.85, 0.86, 0.88, 0.9) te and tc are fixed at 5 and 25. What is a good way to achieve this? Thanks a lot. Alrik Thiem Department of Humanities, Social and Political Sciences ETH Zurich [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] !!!function to do the knn!!!
really thxs to David Winsemius.. this websits helps a lot, -- View this message in context: http://r.789695.n4.nabble.com/function-to-do-the-knn-tp3781137p3786251.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] post hoc testing of glmer in lme4
I have a mixed model with a binomial response, four factor variables and one random factor. m1=glmer(nbhf.hour~Season+Diel+Tidal.phase+Tidal.cycle+(1|POD.ID),family=binomial,data =bl1,control=list(msVerbose=TRUE)) I have really need to try and find a post hoc test for this model and finding the pariwise comparisons, note the dataset is unbalanced. I had read many questions on this and there doesn't seem to be an acceptable/agreeable answer although perhaps this was some time ago and the question can be answered? Any help is greatly appreciated. Thanks in advance -- View this message in context: http://r.789695.n4.nabble.com/post-hoc-testing-of-glmer-in-lme4-tp3786201p3786201.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] !!!function to do the knn!!!
ths a lot, david. it helps a lot -- View this message in context: http://r.789695.n4.nabble.com/function-to-do-the-knn-tp3781137p3786253.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Question about BIC of two different regression models? how should we compare two regression models?
On 09/02/2011 08:48 AM, John Sorkin wrote: I believe when using BIC one needs to compare nested models This is wrong. Hypothesis tests rely on nested models; information criteria do not. -- Patrick Breheny Assistant Professor Department of Biostatistics Department of Statistics University of Kentucky __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] If NA Problem!
On 2011-09-02 05:16, Anna Dunietz wrote: Hi All! Please find code and the respective lists below. My problem: I specify the case that lilwin[[p]] is not an NA and want the code found in iwish to be returned ONLY for that case. Why do I get a list of length 2 (and why is NULL the first element)? I understand that the code below is quite senseless. I have run into a problem while working on a large project and wanted to simplify it in order for it to be more understandable and accessible. If I should not be using the if function, please let me know what I should be doing instead. I know that I must use the for function for my project. The thing I most want to understand is how, after specifying a certain condition, one may save certain data that occurs when that condition is met. I hope I have been clear enough! Thank you very much for your help! Anna biglist-list(a=1:4,b=2:6) lilwin-list(x=NA,y=2) lilloss-list(m=1,n=3) biglist$a [1] 1 2 3 4 $b [1] 2 3 4 5 6 lilwin$x [1] NA $y [1] 2 lilloss$m [1] 1 $n [1] 3 iwish-list() for(p in 1:length(biglist)){ if(is.na(lilwin[[p]])==F) iwish[p]-list(biglist[[p]][lilwin[[p]]]) } iwish[[1]] NULL [[2]] [1] 3 Jean has given you one fix. Here's another (see ?'c'): iwish-list() for(p in seq_along(biglist)){ if(!is.na(lilwin[[p]])) iwish - c(iwish, biglist[[p]][lilwin[[p]]]) } BTW, it's not a good idea to use 'F' instead of FALSE and the negation operator is usually a better way to test. Peter Ehlers [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Question about BIC of two different regression models? how should we compare two regression models?
On Fri, 2 Sep 2011, Patrick Breheny wrote: On 09/02/2011 08:48 AM, John Sorkin wrote: I believe when using BIC one needs to compare nested models This is wrong. Hypothesis tests rely on nested models; information criteria do not. Actually, this is off-topic on this list. But blanket statements are often themselves untrue: there are hypothesis tests of non-nested models (most famously due to Cox, 1961), and Akaike explicitly considered only nested models in his paper introducing AIC. Certainly criteria such as AIC and BIC (in the sense of Schwarz: there are several criteria of that name) can be used with non-nested models but are much sharper tools for nested models. -- Patrick Breheny Assistant Professor Department of Biostatistics Department of Statistics University of Kentucky __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Brian D. Ripley, rip...@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Question about BIC of two different regression models? how should we compare two regression models?
Inline: On Fri, Sep 2, 2011 at 8:09 AM, Patrick Breheny patrick.breh...@uky.edu wrote: On 09/02/2011 08:48 AM, John Sorkin wrote: I believe when using BIC one needs to compare nested models This is wrong. Hypothesis tests rely on nested models; information criteria do not. Yes, indeed. It may additionally be worth noting what has has been noted on this list before: the actual definition of such criteria is given only up to a constant, so different **software** may give different answers on the same data. Hence be sure to compare results using the same software or make any necessary additive adjustments based on the details of how the software does the calculation when results from different software are being compared. Cheers, Bert -- Patrick Breheny Assistant Professor Department of Biostatistics Department of Statistics University of Kentucky __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Men by nature long to get on to the ultimate truths, and will often be impatient with elementary studies or fight shy of them. If it were possible to reach the ultimate truths without the elementary studies usually prefixed to them, these would not be preparatory studies but superfluous diversions. -- Maimonides (1135-1204) Bert Gunter Genentech Nonclinical Biostatistics __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Mann Kendall Test for Trend
Hi there, I'm trying to apply the Mann Kendall test for trend analysis of a time series. I have downloaded and installed the package Kendall and subsequently loaded it into the software. My time series is a .txt file with 2 columns - column 1 is the year (1985 - 2009) and column 2 is the corresponding entry variable. According to the R guidelines, the call should be: MannKendall(x) [whereby x is a data vector, usually a time series] As such, I've loaded in my file 'data.txt' and then called on: MannKendall(data), to which I get the following: Error in Kendall(1:length(x), x) : length(x)3 What do I need to do to get beyond this highly annoying error? I've tried MannKendall(1:27(data), data), but then keep getting this: Error in Kendall(1:length(x), x) : attempt to apply non-function Any help greatly received! S -- View this message in context: http://r.789695.n4.nabble.com/Mann-Kendall-Test-for-Trend-tp3786392p3786392.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] previous monday date
Hello, I'm attempting to return the date (in form '%Y-%m-%d') of the Monday previous to the current date. For example: since it is 2011-09-02 today, I would expect 2011-08-29 to be the return value. I found the following in: http://www.mail-archive.com/r-help@r-project.org/msg144184.html Start quote from link: prevmonday - function(x) 7 * floor(as.numeric(x-1+4) / 7) + as.Date(1-4) For example, prevmonday(Sys.Date()) [1] 2011-08-15 prevmonday(prevmonday(Sys.Date())) [1] 2011-08-15 End quote from link. But when I do it I get: prevmonday - function(x) 7 * floor(as.numeric(x-1+4) / 7) + as.Date(1-4) prevmonday(Sys.Date()) Error in as.Date.numeric(1 - 4) : 'origin' must be supplied I've tried setting the 'origin' argument in as.Date() in different ways, but it returns inaccurate results. Thanks, Ben [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Advice on large data structures
On 09/01/2011 10:13 PM, Worik R wrote: I am starting on a (section of the) project where I need to build a matrix with on the order of 5 million rows and 200 columns I am wondering if I can stay in R. I need to do rollapply type operations on the columns, including some that will be functions of (windows of) two columns. Perhaps useful to you -- I recently added WINDOW FUNCTION support to PL/R*. Currently this new feature is only available in git master, but within a few days I will push a new release. You can download the source from git here if you want: https://github.com/jconway/plr The official docs have not been updated yet, but see the pre-release docs here (specifically chapter 9): http://www.joeconway.com/plr/doc/plr-git-US.pdf HTH, Joe *PL/R allows you to execute R functions from within a PostgreSQL database -- Joe Conway credativ LLC: http://www.credativ.us Linux, PostgreSQL, and general Open Source Training, Service, Consulting, 24x7 Support __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Question about BIC of two different regression models? how should we compare two regression models?
On 09/02/2011 11:26 AM, Prof Brian Ripley wrote: This is wrong. Hypothesis tests rely on nested models; information criteria do not. Actually, this is off-topic on this list. But blanket statements are often themselves untrue: there are hypothesis tests of non-nested models (most famously due to Cox, 1961), and Akaike explicitly considered only nested models in his paper introducing AIC. Certainly criteria such as AIC and BIC (in the sense of Schwarz: there are several criteria of that name) can be used with non-nested models but are much sharper tools for nested models. Good point; my remark was only meant to refer to the simple case of logistic regression in the original post, and certainly should not be construed as a blanket statement applying to all possible hypothesis tests of all possible models. -- Patrick Breheny Assistant Professor Department of Biostatistics Department of Statistics University of Kentucky __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Mann Kendall Test for Trend
I don't have the Kendall package at my fingers, but it seems like you have some deeper trouble with R syntax if you are writing things like MannKendall(1:27(data),data) when you know that MannKendall only takes one argument. Can you verify that your data object actually has a full set of data to it? (use some combination of length(),nrow(),dput(),head(),tail() etc) For a general tip, data() is a function to R so its not the best idea for an object name. Outside chance that might be what's causing your problem. Michael Weylandt On Fri, Sep 2, 2011 at 11:35 AM, ScottM scott.mcgr...@abdn.ac.uk wrote: Hi there, I'm trying to apply the Mann Kendall test for trend analysis of a time series. I have downloaded and installed the package Kendall and subsequently loaded it into the software. My time series is a .txt file with 2 columns - column 1 is the year (1985 - 2009) and column 2 is the corresponding entry variable. According to the R guidelines, the call should be: MannKendall(x) [whereby x is a data vector, usually a time series] As such, I've loaded in my file 'data.txt' and then called on: MannKendall(data), to which I get the following: Error in Kendall(1:length(x), x) : length(x)3 What do I need to do to get beyond this highly annoying error? I've tried MannKendall(1:27(data), data), but then keep getting this: Error in Kendall(1:length(x), x) : attempt to apply non-function Any help greatly received! S -- View this message in context: http://r.789695.n4.nabble.com/Mann-Kendall-Test-for-Trend-tp3786392p3786392.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] previous monday date
On Sep 2, 2011, at 10:35 AM, Ben qant wrote: Hello, I'm attempting to return the date (in form '%Y-%m-%d') of the Monday previous to the current date. For example: since it is 2011-09-02 today, I would expect 2011-08-29 to be the return value. I found the following in: http://www.mail-archive.com/r-help@r-project.org/msg144184.html Start quote from link: prevmonday - function(x) 7 * floor(as.numeric(x-1+4) / 7) + as.Date(1-4) For example, prevmonday(Sys.Date()) [1] 2011-08-15 prevmonday(prevmonday(Sys.Date())) [1] 2011-08-15 End quote from link. But when I do it I get: prevmonday - function(x) 7 * floor(as.numeric(x-1+4) / 7) + as.Date(1-4) prevmonday(Sys.Date()) Error in as.Date.numeric(1 - 4) : 'origin' must be supplied I've tried setting the 'origin' argument in as.Date() in different ways, but it returns inaccurate results. Thanks, Ben If memory serves, this is because Gabor used the version of as.Date() from his 'zoo' package in that post, which does not require an origin to be specified, whereas the default as.Date() function in R's base package does: prevmonday - function(x) 7 * floor(as.numeric(x-1+4) / 7) + as.Date(1-4) prevmonday(Sys.Date()) Error in as.Date.numeric(1 - 4) : 'origin' must be supplied require(zoo) Loading required package: zoo Attaching package: 'zoo' The following object(s) are masked from 'package:base': as.Date prevmonday(Sys.Date()) [1] 2011-08-29 # Remove 'zoo' to use the base function detach(package:zoo) prevmonday(Sys.Date()) Error in as.Date.numeric(1 - 4) : 'origin' must be supplied # Fix the function to use base::as.Date() prevmonday - function(x) 7 * floor(as.numeric(x-1+4) / 7) + as.Date(1-4, origin = 1970-01-01) prevmonday(Sys.Date()) [1] 2011-08-29 See ?as.Date HTH, Marc Schwartz __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Mann Kendall Test for Trend
Michael, Cheers for the input - still learning syntax as I go - very new to the use of programming language and having to self teach as part of my PhD. The input data was fine - it was the call on the R website which was confusing me - by defining the location of the actual data in the second column and calling result - MannKendall(filename[,2]), it worked without issue. Cheers again though, Scott McGrane MA (Hons), MRes SAGES Theme 1 PhD Student Northern Rivers Institute St Mary's Building University of Aberdeen http://www.abdn.ac.uk/nri From: R. Michael Weylandt michael.weyla...@gmail.com [via R] [ml-node+3786437-1846790326-252...@n4.nabble.com] Sent: 02 September 2011 04:47 PM To: Mcgrane, Scott Subject: Re: Mann Kendall Test for Trend I don't have the Kendall package at my fingers, but it seems like you have some deeper trouble with R syntax if you are writing things like MannKendall(1:27(data),data) when you know that MannKendall only takes one argument. Can you verify that your data object actually has a full set of data to it? (use some combination of length(),nrow(),dput(),head(),tail() etc) For a general tip, data() is a function to R so its not the best idea for an object name. Outside chance that might be what's causing your problem. Michael Weylandt On Fri, Sep 2, 2011 at 11:35 AM, ScottM [hidden email]/user/SendEmail.jtp?type=nodenode=3786437i=0 wrote: Hi there, I'm trying to apply the Mann Kendall test for trend analysis of a time series. I have downloaded and installed the package Kendall and subsequently loaded it into the software. My time series is a .txt file with 2 columns - column 1 is the year (1985 - 2009) and column 2 is the corresponding entry variable. According to the R guidelines, the call should be: MannKendall(x) [whereby x is a data vector, usually a time series] As such, I've loaded in my file 'data.txt' and then called on: MannKendall(data), to which I get the following: Error in Kendall(1:length(x), x) : length(x)3 What do I need to do to get beyond this highly annoying error? I've tried MannKendall(1:27(data), data), but then keep getting this: Error in Kendall(1:length(x), x) : attempt to apply non-function Any help greatly received! S -- View this message in context: http://r.789695.n4.nabble.com/Mann-Kendall-Test-for-Trend-tp3786392p3786392.html Sent from the R help mailing list archive at Nabble.com. __ [hidden email]/user/SendEmail.jtp?type=nodenode=3786437i=1 mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ [hidden email]/user/SendEmail.jtp?type=nodenode=3786437i=2 mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. If you reply to this email, your message will be added to the discussion below: http://r.789695.n4.nabble.com/Mann-Kendall-Test-for-Trend-tp3786392p3786437.html To unsubscribe from Mann Kendall Test for Trend, click herehttp://r.789695.n4.nabble.com/template/NamlServlet.jtp?macro=unsubscribe_by_codenode=3786392code=c2NvdHQubWNncmFuZUBhYmRuLmFjLnVrfDM3ODYzOTJ8OTM1MjEwNDY5. The University of Aberdeen is a charity registered in Scotland, No SC013683. -- View this message in context: http://r.789695.n4.nabble.com/Mann-Kendall-Test-for-Trend-tp3786392p3786488.html Sent from the R help mailing list archive at Nabble.com. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] previous monday date
I didn't sort out the issue in my email below but here is a (not very R'ish) solution: pm = function(x) { + for(i in 1:7){ + if(format(as.Date(Sys.Date()-i),'%w') == 1){ + d = Sys.Date() - i; + } + } + d + } pm(Sys.Date()) [1] 2011-08-29 On Fri, Sep 2, 2011 at 9:35 AM, Ben qant ccqu...@gmail.com wrote: Hello, I'm attempting to return the date (in form '%Y-%m-%d') of the Monday previous to the current date. For example: since it is 2011-09-02 today, I would expect 2011-08-29 to be the return value. I found the following in: http://www.mail-archive.com/r-help@r-project.org/msg144184.html Start quote from link: prevmonday - function(x) 7 * floor(as.numeric(x-1+4) / 7) + as.Date(1-4) For example, prevmonday(Sys.Date()) [1] 2011-08-15 prevmonday(prevmonday(Sys.Date())) [1] 2011-08-15 End quote from link. But when I do it I get: prevmonday - function(x) 7 * floor(as.numeric(x-1+4) / 7) + as.Date(1-4) prevmonday(Sys.Date()) Error in as.Date.numeric(1 - 4) : 'origin' must be supplied I've tried setting the 'origin' argument in as.Date() in different ways, but it returns inaccurate results. Thanks, Ben [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] previous monday date
Oh OK, missed that. Here is a solution using base: (already posted) I didn't sort out the issue in my email below but here is a (not very R'ish) solution: pm = function(x) { + for(i in 1:7){ + if(format(as.Date(Sys.Date()- i),'%w') == 1){ + d = Sys.Date() - i; + } + } + d + } pm(Sys.Date()) [1] 2011-08-29 On Fri, Sep 2, 2011 at 9:59 AM, Marc Schwartz marc_schwa...@me.com wrote: On Sep 2, 2011, at 10:35 AM, Ben qant wrote: Hello, I'm attempting to return the date (in form '%Y-%m-%d') of the Monday previous to the current date. For example: since it is 2011-09-02 today, I would expect 2011-08-29 to be the return value. I found the following in: http://www.mail-archive.com/r-help@r-project.org/msg144184.html Start quote from link: prevmonday - function(x) 7 * floor(as.numeric(x-1+4) / 7) + as.Date(1-4) For example, prevmonday(Sys.Date()) [1] 2011-08-15 prevmonday(prevmonday(Sys.Date())) [1] 2011-08-15 End quote from link. But when I do it I get: prevmonday - function(x) 7 * floor(as.numeric(x-1+4) / 7) + as.Date(1-4) prevmonday(Sys.Date()) Error in as.Date.numeric(1 - 4) : 'origin' must be supplied I've tried setting the 'origin' argument in as.Date() in different ways, but it returns inaccurate results. Thanks, Ben If memory serves, this is because Gabor used the version of as.Date() from his 'zoo' package in that post, which does not require an origin to be specified, whereas the default as.Date() function in R's base package does: prevmonday - function(x) 7 * floor(as.numeric(x-1+4) / 7) + as.Date(1-4) prevmonday(Sys.Date()) Error in as.Date.numeric(1 - 4) : 'origin' must be supplied require(zoo) Loading required package: zoo Attaching package: 'zoo' The following object(s) are masked from 'package:base': as.Date prevmonday(Sys.Date()) [1] 2011-08-29 # Remove 'zoo' to use the base function detach(package:zoo) prevmonday(Sys.Date()) Error in as.Date.numeric(1 - 4) : 'origin' must be supplied # Fix the function to use base::as.Date() prevmonday - function(x) 7 * floor(as.numeric(x-1+4) / 7) + as.Date(1-4, origin = 1970-01-01) prevmonday(Sys.Date()) [1] 2011-08-29 See ?as.Date HTH, Marc Schwartz [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Mann Kendall Test for Trend
I'd suggest that you look into using a time series class to organize your data rather than just keeping levels and times next to each other. This will also make it alot easier to work with your data in specific time-series sort of ways. I really like the xts class and you can get to it with something like the following install.package(xts) library(xts) x = read.table(HOWEVER YOU NEED TO READ THE DATA) x = xts(x[,2],as.Date(x[,1], format = %Y)) The syntax to the last line is to construct an xts using the second column of the data as the levels of the time series while the first reads the years, coverts them to date objects (you supply the format, here its just a year number)) and puts the thing together. Then, I think you can just run MannKendall(x) Hope this helps, Michael Weylandt PS -- There are all sorts of wonderful introductions to R for people with a variety of programming backgrounds. If you want a recommendation, just say a little more about any programming experience you may or may not have had and I'm sure someone could recommend a few good ones. On Fri, Sep 2, 2011 at 12:00 PM, ScottM scott.mcgr...@abdn.ac.uk wrote: Michael, Cheers for the input - still learning syntax as I go - very new to the use of programming language and having to self teach as part of my PhD. The input data was fine - it was the call on the R website which was confusing me - by defining the location of the actual data in the second column and calling result - MannKendall(filename[,2]), it worked without issue. Cheers again though, Scott McGrane MA (Hons), MRes SAGES Theme 1 PhD Student Northern Rivers Institute St Mary's Building University of Aberdeen http://www.abdn.ac.uk/nri From: R. Michael Weylandt michael.weyla...@gmail.com [via R] [ ml-node+3786437-1846790326-252...@n4.nabble.com] Sent: 02 September 2011 04:47 PM To: Mcgrane, Scott Subject: Re: Mann Kendall Test for Trend I don't have the Kendall package at my fingers, but it seems like you have some deeper trouble with R syntax if you are writing things like MannKendall(1:27(data),data) when you know that MannKendall only takes one argument. Can you verify that your data object actually has a full set of data to it? (use some combination of length(),nrow(),dput(),head(),tail() etc) For a general tip, data() is a function to R so its not the best idea for an object name. Outside chance that might be what's causing your problem. Michael Weylandt On Fri, Sep 2, 2011 at 11:35 AM, ScottM [hidden email]/user/SendEmail.jtp?type=nodenode=3786437i=0 wrote: Hi there, I'm trying to apply the Mann Kendall test for trend analysis of a time series. I have downloaded and installed the package Kendall and subsequently loaded it into the software. My time series is a .txt file with 2 columns - column 1 is the year (1985 - 2009) and column 2 is the corresponding entry variable. According to the R guidelines, the call should be: MannKendall(x) [whereby x is a data vector, usually a time series] As such, I've loaded in my file 'data.txt' and then called on: MannKendall(data), to which I get the following: Error in Kendall(1:length(x), x) : length(x)3 What do I need to do to get beyond this highly annoying error? I've tried MannKendall(1:27(data), data), but then keep getting this: Error in Kendall(1:length(x), x) : attempt to apply non-function Any help greatly received! S -- View this message in context: http://r.789695.n4.nabble.com/Mann-Kendall-Test-for-Trend-tp3786392p3786392.html Sent from the R help mailing list archive at Nabble.com. __ [hidden email]/user/SendEmail.jtp?type=nodenode=3786437i=1 mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ [hidden email]/user/SendEmail.jtp?type=nodenode=3786437i=2 mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. If you reply to this email, your message will be added to the discussion below: http://r.789695.n4.nabble.com/Mann-Kendall-Test-for-Trend-tp3786392p3786437.html To unsubscribe from Mann Kendall Test for Trend, click here http://r.789695.n4.nabble.com/template/NamlServlet.jtp?macro=unsubscribe_by_codenode=3786392code=c2NvdHQubWNncmFuZUBhYmRuLmFjLnVrfDM3ODYzOTJ8OTM1MjEwNDY5 . The University of Aberdeen is a charity registered in Scotland, No SC013683. -- View this message in context: http://r.789695.n4.nabble.com/Mann-Kendall-Test-for-Trend-tp3786392p3786488.html
Re: [R] previous monday date
On Fri, Sep 2, 2011 at 9:59 AM, Marc Schwartz marc_schwa...@me.com wrote: On Sep 2, 2011, at 10:35 AM, Ben qant wrote: Hello, I'm attempting to return the date (in form '%Y-%m-%d') of the Monday previous to the current date. For example: since it is 2011-09-02 today, I would expect 2011-08-29 to be the return value. I found the following in: http://www.mail-archive.com/r-help@r-project.org/msg144184.html Start quote from link: prevmonday - function(x) 7 * floor(as.numeric(x-1+4) / 7) + as.Date(1-4) For example, prevmonday(Sys.Date()) [1] 2011-08-15 prevmonday(prevmonday(Sys.Date())) [1] 2011-08-15 End quote from link. But when I do it I get: prevmonday - function(x) 7 * floor(as.numeric(x-1+4) / 7) + as.Date(1-4) prevmonday(Sys.Date()) Error in as.Date.numeric(1 - 4) : 'origin' must be supplied I've tried setting the 'origin' argument in as.Date() in different ways, but it returns inaccurate results. Thanks, Ben If memory serves, this is because Gabor used the version of as.Date() from his 'zoo' package in that post, which does not require an origin to be specified, whereas the default as.Date() function in R's base package does: prevmonday - function(x) 7 * floor(as.numeric(x-1+4) / 7) + as.Date(1-4) prevmonday(Sys.Date()) Error in as.Date.numeric(1 - 4) : 'origin' must be supplied require(zoo) Loading required package: zoo Attaching package: 'zoo' The following object(s) are masked from 'package:base': as.Date prevmonday(Sys.Date()) [1] 2011-08-29 # Remove 'zoo' to use the base function detach(package:zoo) prevmonday(Sys.Date()) Error in as.Date.numeric(1 - 4) : 'origin' must be supplied # Fix the function to use base::as.Date() prevmonday - function(x) 7 * floor(as.numeric(x-1+4) / 7) + as.Date(1-4, origin = 1970-01-01) prevmonday(Sys.Date()) [1] 2011-08-29 See ?as.Date HTH, Marc Schwartz On Sep 2, 2011, at 11:02 AM, Ben qant wrote: Oh OK, missed that. Here is a solution using base: (already posted) I didn't sort out the issue in my email below but here is a (not very R'ish) solution: pm = function(x) { + for(i in 1:7){ + if(format(as.Date(Sys.Date()- i),'%w') == 1){ + d = Sys.Date() - i; + } + } + d + } pm(Sys.Date()) [1] 2011-08-29 It occurs to me that another solution would be: as.Date(cut(Sys.Date(), weeks)) [1] 2011-08-29 This uses cut.Date() to create a vector that contains the beginning of weekly intervals for each date value, with the week beginning on a Monday by default. Note that cut.Date() returns a factor, not a Date class object, hence the additional coercion of the result. See ?cut.Date So if you want that in a function wrapper, you could do: prev.monday - function(x) as.Date(cut(x, weeks)) prev.monday(Sys.Date()) [1] 2011-08-29 HTH, Marc __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Avoiding for Loop for moving average
Hello, I need to calculate a moving average and an exponentially weighted moving average over a fairly large data set (500K rows). Doing this in a for loop works nicely, but is slow. ewma - data$col[1] N - dim(data)[1] for(i in 2:N){ data$ewma - alpha * data$ewma[i-1] + (1-alpha) * data$value[i] } Since the moving average accumulates as we move through the data, I'm not sure on the best/fastest way to do this. Does anyone have any suggestions on how to avoid a loop doing this? -- Noah Silverman UCLA Department of Statistics 8117 Math Sciences Building #8208 Los Angeles, CA 90095 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Avoiding for Loop for moving average
Have you looked at SMA/EMA from the TTR package? That's a pretty quick implementation. runmean from caTools is even better for the SMA but I don't think there's an easy way to turn that into an EWMA. Hope this helps, Michael Weylandt On Fri, Sep 2, 2011 at 12:43 PM, Noah Silverman noahsilver...@ucla.eduwrote: Hello, I need to calculate a moving average and an exponentially weighted moving average over a fairly large data set (500K rows). Doing this in a for loop works nicely, but is slow. ewma - data$col[1] N - dim(data)[1] for(i in 2:N){ data$ewma - alpha * data$ewma[i-1] + (1-alpha) * data$value[i] } Since the moving average accumulates as we move through the data, I'm not sure on the best/fastest way to do this. Does anyone have any suggestions on how to avoid a loop doing this? -- Noah Silverman UCLA Department of Statistics 8117 Math Sciences Building #8208 Los Angeles, CA 90095 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Hints for Data Clustering
Dear All, I will be confronted (relatively soon) with the following problem: given a set of known statistical indicators {s_i} , i=1,2...N for a N countries I would like to be able to do some data clustering i.e. determining the best way to partition the N countries according to their known properties, encoded by the {s_i} set of indicators for those countries. Some properties of these countries may be categorical or anyway non-numerical variables (e.g. the fact of belonging/not belonging to a certain group; joining/not joining a certain treaty etc...). I have seen some data clustering examples, but without categorical variables and I wonder if this is an inherent limitation of the methodology (on the top of my head, I would not know how to define the distance between non-numerical variables). Any suggestions about the general methodology and R packages/code snippets is really appreciated. And also: do the units in which I express a statistical indicator play a role? For instance: for 2 given countries I could have the average age of the population, the average life expectancy and the average income per year in thousands of dollars. This would give rise e.g. to (40,72,26) and (44,75,36), but if I measure the average income in dollars, then I would get (40,72,26000) (44,75,36000). Would the units that I choose for an indicator impact on the clustering results? They should not, in my view, since the income does not change whichever way I express it, but I am not sure about the algorithm results. Many thanks Lorenzo __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Avoiding for Loop for moving average
On Fri, Sep 2, 2011 at 11:47 AM, R. Michael Weylandt michael.weyla...@gmail.com wrote: Have you looked at SMA/EMA from the TTR package? That's a pretty quick implementation. runmean from caTools is even better for the SMA but I don't think there's an easy way to turn that into an EWMA. SMA still calls Fortran code, so that's why it's slower than caTools::runmean. I've moved the EMA code to C, so it's about as fast as it can be. Noah, use EMA's ratio argument to replicate your for loop. Hope this helps, Michael Weylandt Best, -- Joshua Ulrich | FOSS Trading: www.fosstrading.com On Fri, Sep 2, 2011 at 12:43 PM, Noah Silverman noahsilver...@ucla.eduwrote: Hello, I need to calculate a moving average and an exponentially weighted moving average over a fairly large data set (500K rows). Doing this in a for loop works nicely, but is slow. ewma - data$col[1] N - dim(data)[1] for(i in 2:N){ data$ewma - alpha * data$ewma[i-1] + (1-alpha) * data$value[i] } Since the moving average accumulates as we move through the data, I'm not sure on the best/fastest way to do this. Does anyone have any suggestions on how to avoid a loop doing this? -- Noah Silverman UCLA Department of Statistics 8117 Math Sciences Building #8208 Los Angeles, CA 90095 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Avoiding for Loop for moving average
Joshua, Thanks for the tip. I need to roll my own code on this. But perhaps I can borrow some code from the package you mentioned. Is the package just performing the loop, but in a faster language? -- Noah Silverman UCLA Department of Statistics 8117 Math Sciences Building #8208 Los Angeles, CA 90095 On Sep 2, 2011, at 9:58 AM, Joshua Ulrich wrote: On Fri, Sep 2, 2011 at 11:47 AM, R. Michael Weylandt michael.weyla...@gmail.com wrote: Have you looked at SMA/EMA from the TTR package? That's a pretty quick implementation. runmean from caTools is even better for the SMA but I don't think there's an easy way to turn that into an EWMA. SMA still calls Fortran code, so that's why it's slower than caTools::runmean. I've moved the EMA code to C, so it's about as fast as it can be. Noah, use EMA's ratio argument to replicate your for loop. Hope this helps, Michael Weylandt Best, -- Joshua Ulrich | FOSS Trading: www.fosstrading.com On Fri, Sep 2, 2011 at 12:43 PM, Noah Silverman noahsilver...@ucla.eduwrote: Hello, I need to calculate a moving average and an exponentially weighted moving average over a fairly large data set (500K rows). Doing this in a for loop works nicely, but is slow. ewma - data$col[1] N - dim(data)[1] for(i in 2:N){ data$ewma - alpha * data$ewma[i-1] + (1-alpha) * data$value[i] } Since the moving average accumulates as we move through the data, I'm not sure on the best/fastest way to do this. Does anyone have any suggestions on how to avoid a loop doing this? -- Noah Silverman UCLA Department of Statistics 8117 Math Sciences Building #8208 Los Angeles, CA 90095 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Avoiding for Loop for moving average
There is a recent blog post by Dirk Eddelbeutel on how to do something similar using his Rcpp package and C++, with massive time improvements. http://dirk.eddelbuettel.com/blog/ On 9/2/2011 12:43 PM, Noah Silverman wrote: Hello, I need to calculate a moving average and an exponentially weighted moving average over a fairly large data set (500K rows). Doing this in a for loop works nicely, but is slow. ewma- data$col[1] N- dim(data)[1] for(i in 2:N){ data$ewma- alpha * data$ewma[i-1] + (1-alpha) * data$value[i] } Since the moving average accumulates as we move through the data, I'm not sure on the best/fastest way to do this. Does anyone have any suggestions on how to avoid a loop doing this? -- Noah Silverman UCLA Department of Statistics 8117 Math Sciences Building #8208 Los Angeles, CA 90095 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Avoiding for Loop for moving average
On Fri, Sep 2, 2011 at 12:06 PM, Noah Silverman noahsilver...@ucla.edu wrote: Joshua, Thanks for the tip. I need to roll my own code on this. But perhaps I can borrow some code from the package you mentioned. Is the package just performing the loop, but in a faster language? As I said, the function is in C. You could also use the compiler package to compile your pure R function for a 3-4x speedup. Best, -- Joshua Ulrich | FOSS Trading: www.fosstrading.com -- Noah Silverman UCLA Department of Statistics 8117 Math Sciences Building #8208 Los Angeles, CA 90095 On Sep 2, 2011, at 9:58 AM, Joshua Ulrich wrote: On Fri, Sep 2, 2011 at 11:47 AM, R. Michael Weylandt michael.weyla...@gmail.com wrote: Have you looked at SMA/EMA from the TTR package? That's a pretty quick implementation. runmean from caTools is even better for the SMA but I don't think there's an easy way to turn that into an EWMA. SMA still calls Fortran code, so that's why it's slower than caTools::runmean. I've moved the EMA code to C, so it's about as fast as it can be. Noah, use EMA's ratio argument to replicate your for loop. Hope this helps, Michael Weylandt Best, -- Joshua Ulrich | FOSS Trading: www.fosstrading.com On Fri, Sep 2, 2011 at 12:43 PM, Noah Silverman noahsilver...@ucla.eduwrote: Hello, I need to calculate a moving average and an exponentially weighted moving average over a fairly large data set (500K rows). Doing this in a for loop works nicely, but is slow. ewma - data$col[1] N - dim(data)[1] for(i in 2:N){ data$ewma - alpha * data$ewma[i-1] + (1-alpha) * data$value[i] } Since the moving average accumulates as we move through the data, I'm not sure on the best/fastest way to do this. Does anyone have any suggestions on how to avoid a loop doing this? -- Noah Silverman UCLA Department of Statistics 8117 Math Sciences Building #8208 Los Angeles, CA 90095 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Advice on large data structures
Along the lines of one of Jim's suggestions, if you have some basic MySQL knowledge check out the RMySQL package. I use it to convert / partition a matrix similar to yours to R objects and it works fine. Hope this helps, A. On Fri, 2 Sep 2011 06:33:13 -0400 Jim Holtman jholt...@gmail.com wrote: i would suggest that if you want to use R that you get a 64-bit version with 24GB of memory to start. if your data is a numeric matrix, you will need 8GB for a single copy. Do you really need it all in memory at once, or can you partition the problem? Can you use a database to access the portion you need at any time? If you only need one, or two, columns at a time, then the use of a database storing the columns might work. You probably need some more analysis on exactly how you want to solve your problem understanding the limitations of the system. Sent from my iPad On Sep 2, 2011, at 1:13, Worik R wor...@gmail.com wrote: Friends I am starting on a (section of the) project where I need to build a matrix with on the order of 5 million rows and 200 columns I am wondering if I can stay in R. I need to do rollapply type operations on the columns, including some that will be functions of (windows of) two columns. I have been looking at the ff and bigmemory packages but am not sure that they will do. Before I get too deep can some one offer some wisdom about what the best direction to go would be? Switching to C/C++ is definitely an option if it is all too hard cheers Worik [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Hints for Data Clustering
Look at the function daisy in the package cluster. require(cluster) ?daisy Jean Lorenzo Isella wrote on 09/02/2011 11:50:04 AM: Dear All, I will be confronted (relatively soon) with the following problem: given a set of known statistical indicators {s_i} , i=1,2...N for a N countries I would like to be able to do some data clustering i.e. determining the best way to partition the N countries according to their known properties, encoded by the {s_i} set of indicators for those countries. Some properties of these countries may be categorical or anyway non-numerical variables (e.g. the fact of belonging/not belonging to a certain group; joining/not joining a certain treaty etc...). I have seen some data clustering examples, but without categorical variables and I wonder if this is an inherent limitation of the methodology (on the top of my head, I would not know how to define the distance between non-numerical variables). Any suggestions about the general methodology and R packages/code snippets is really appreciated. And also: do the units in which I express a statistical indicator play a role? For instance: for 2 given countries I could have the average age of the population, the average life expectancy and the average income per year in thousands of dollars. This would give rise e.g. to (40,72,26) and (44,75,36), but if I measure the average income in dollars, then I would get (40,72,26000) (44,75,36000). Would the units that I choose for an indicator impact on the clustering results? They should not, in my view, since the income does not change whichever way I express it, but I am not sure about the algorithm results. Many thanks Lorenzo [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Avoiding for Loop for moving average
Thanks Joshua, I really like the example given in the blog post that Abhijit pointed me to. Doing it in C++ using the Inline seems like an easy way to get a massive improvement in speed without the hassle of writing a package. I'm working on coding that now. -- Noah Silverman UCLA Department of Statistics 8117 Math Sciences Building #8208 Los Angeles, CA 90095 On Sep 2, 2011, at 10:32 AM, Joshua Ulrich wrote: On Fri, Sep 2, 2011 at 12:06 PM, Noah Silverman noahsilver...@ucla.edu wrote: Joshua, Thanks for the tip. I need to roll my own code on this. But perhaps I can borrow some code from the package you mentioned. Is the package just performing the loop, but in a faster language? As I said, the function is in C. You could also use the compiler package to compile your pure R function for a 3-4x speedup. Best, -- Joshua Ulrich | FOSS Trading: www.fosstrading.com -- Noah Silverman UCLA Department of Statistics 8117 Math Sciences Building #8208 Los Angeles, CA 90095 On Sep 2, 2011, at 9:58 AM, Joshua Ulrich wrote: On Fri, Sep 2, 2011 at 11:47 AM, R. Michael Weylandt michael.weyla...@gmail.com wrote: Have you looked at SMA/EMA from the TTR package? That's a pretty quick implementation. runmean from caTools is even better for the SMA but I don't think there's an easy way to turn that into an EWMA. SMA still calls Fortran code, so that's why it's slower than caTools::runmean. I've moved the EMA code to C, so it's about as fast as it can be. Noah, use EMA's ratio argument to replicate your for loop. Hope this helps, Michael Weylandt Best, -- Joshua Ulrich | FOSS Trading: www.fosstrading.com On Fri, Sep 2, 2011 at 12:43 PM, Noah Silverman noahsilver...@ucla.eduwrote: Hello, I need to calculate a moving average and an exponentially weighted moving average over a fairly large data set (500K rows). Doing this in a for loop works nicely, but is slow. ewma - data$col[1] N - dim(data)[1] for(i in 2:N){ data$ewma - alpha * data$ewma[i-1] + (1-alpha) * data$value[i] } Since the moving average accumulates as we move through the data, I'm not sure on the best/fastest way to do this. Does anyone have any suggestions on how to avoid a loop doing this? -- Noah Silverman UCLA Department of Statistics 8117 Math Sciences Building #8208 Los Angeles, CA 90095 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Including only a subset of the levels of a factor XXXX
On Sep 1, 2011, at 21:11 , R. Michael Weylandt wrote: Dropping all occurences of a factor does not drop that level. This actually turns out to be much more useful than it first might appear, but if you really need to get around it, it can be done. ...most expediently by using factor(), as others have pointed out. Or droplevels() for data frames. We had the converse issue just the other day (Aug 30) when someone had problems with showing zero frequencies in xtabs, which turned out to be caused by the tabulated data _not_ being factors, hence not containing information about which values could have been there but wasn't. The behavior of subsetting operators is so as to make things like tables and barplots consistent across subsets, but there are cases where you want the extra levels dropped. However, the default is as it is, because it is easier to drop levels than to reinstate them. Neither is impossible, of course. -- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com Døden skal tape! --- Nordahl Grieg __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Avoiding for Loop for moving average
The 'filter' function should be able to do what you want efficiently. On 02/09/2011 18:06, Noah Silverman wrote: Joshua, Thanks for the tip. I need to roll my own code on this. But perhaps I can borrow some code from the package you mentioned. Is the package just performing the loop, but in a faster language? -- Noah Silverman UCLA Department of Statistics 8117 Math Sciences Building #8208 Los Angeles, CA 90095 On Sep 2, 2011, at 9:58 AM, Joshua Ulrich wrote: On Fri, Sep 2, 2011 at 11:47 AM, R. Michael Weylandt michael.weyla...@gmail.com wrote: Have you looked at SMA/EMA from the TTR package? That's a pretty quick implementation. runmean from caTools is even better for the SMA but I don't think there's an easy way to turn that into an EWMA. SMA still calls Fortran code, so that's why it's slower than caTools::runmean. I've moved the EMA code to C, so it's about as fast as it can be. Noah, use EMA's ratio argument to replicate your for loop. Hope this helps, Michael Weylandt Best, -- Joshua Ulrich | FOSS Trading: www.fosstrading.com On Fri, Sep 2, 2011 at 12:43 PM, Noah Silvermannoahsilver...@ucla.eduwrote: Hello, I need to calculate a moving average and an exponentially weighted moving average over a fairly large data set (500K rows). Doing this in a for loop works nicely, but is slow. ewma- data$col[1] N- dim(data)[1] for(i in 2:N){ data$ewma- alpha * data$ewma[i-1] + (1-alpha) * data$value[i] } Since the moving average accumulates as we move through the data, I'm not sure on the best/fastest way to do this. Does anyone have any suggestions on how to avoid a loop doing this? -- Noah Silverman UCLA Department of Statistics 8117 Math Sciences Building #8208 Los Angeles, CA 90095 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Patrick Burns pbu...@pburns.seanet.com twitter: @portfolioprobe http://www.portfolioprobe.com/blog http://www.burns-stat.com (home of 'Some hints for the R beginner' and 'The R Inferno') __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Classifying large text corpora using R
Dear everyone, I am new to R, and I am looking at doing text classification on a huge collection of documents (500,000) which are distributed among 300 classes (so basically, this is my training data). Would someone please be kind enough to let me know about the R packages to use and their scalability (time and space)? I am very new to R and do not know of the right packages to use. I started off by trying to use the tm package (http://cran.r-project.org/package=tm) for pre-processing and FSelector (http://cran.r-project.org/web/packages/FSelector/index.html) package for feature selection - but both of these are incredibly slow and completely unusable for my task. So the question is what are the right packages to use (for pre-processing, feature selection, and classification)? Please consider the fact that I may be dealing with data of millions of dimensions which may not even fit in memory. I posted on this issue twice (http://r.789695.n4.nabble.com/Entropy-based-feature-selection-in-R-td3708056.html , http://r.789695.n4.nabble.com/R-s-handling-of-high-dimensional-data-td3741758.html) but did not get any response. This is a very critical piece of my research and I have been struggling with this issue for a long time. Please consider helping me out, directly or by pointing me to any other software/website that you think may be more appropriate. Many thanks in advance. -- View this message in context: http://r.789695.n4.nabble.com/Classifying-large-text-corpora-using-R-tp3786787p3786787.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Chemical Names in Data Frames
Greetings - I am working on some data that contain chemical names with air concentrations, and I am creating a data frame with date/time and each chemical having its own column. However, these are organic chemicals (e.g. 1-butene, 2,3,4-trimethylbenzene etc). The package I am going to be using the data with is openair, and many of the great functions require you to specify a column name which does not seem to work with improper column names- e.g. smoothTrend(mydata, pollutant=1-Butene and smoothTrend(mydata, pollutant=mydata[,1-Butene]) I was wondering if there was a function to automatically convert these chemical names (with all sorts of numbers and minuses in the beginning) to something openair can handle? Or am I going to be stuck recoding several hundred chemical names in my database? VR Jim James T. Durant, MSPH CIH Emergency Response Coordinator US Agency for Toxic Substances and Disease Registry Atlanta, GA 30341 770-378-1695 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Chemical Names in Data Frames
On Sep 2, 2011, at 3:13 PM, Durant, James T. (ATSDR/DTEM/PRMSB) wrote: Greetings - I am working on some data that contain chemical names with air concentrations, and I am creating a data frame with date/time and each chemical having its own column. However, these are organic chemicals (e.g. 1-butene, 2,3,4-trimethylbenzene etc). The package I am going to be using the data with is openair, and many of the great functions require you to specify a column name which does not seem to work with improper column names- e.g. smoothTrend(mydata, pollutant=1-Butene and smoothTrend(mydata, pollutant=mydata[,1- Butene]) I was wondering if there was a function to automatically convert these chemical names (with all sorts of numbers and minuses in the beginning) to something openair can handle? Or am I going to be stuck recoding several hundred chemical names in my database? Try using back-ticks on the invalid names, rather than quotes. -- David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Chemical Names in Data Frames
?make.names perhaps. On Fri, Sep 2, 2011 at 4:13 PM, Durant, James T. (ATSDR/DTEM/PRMSB) h...@cdc.gov wrote: Greetings - I am working on some data that contain chemical names with air concentrations, and I am creating a data frame with date/time and each chemical having its own column. However, these are organic chemicals (e.g. 1-butene, 2,3,4-trimethylbenzene etc). The package I am going to be using the data with is openair, and many of the great functions require you to specify a column name which does not seem to work with improper column names- e.g. smoothTrend(mydata, pollutant=1-Butene and smoothTrend(mydata, pollutant=mydata[,1-Butene]) I was wondering if there was a function to automatically convert these chemical names (with all sorts of numbers and minuses in the beginning) to something openair can handle? Or am I going to be stuck recoding several hundred chemical names in my database? VR Jim James T. Durant, MSPH CIH Emergency Response Coordinator US Agency for Toxic Substances and Disease Registry Atlanta, GA 30341 770-378-1695 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Chemical Names in Data Frames
possibly with unique = TRUE: make.names(c(', ')) [1] X. X. make.names(c(', '), unique=TRUE) [1] X. X..1 Spencer On 9/2/2011 12:28 PM, Gustavo Carvalho wrote: ?make.names perhaps. On Fri, Sep 2, 2011 at 4:13 PM, Durant, James T. (ATSDR/DTEM/PRMSB) h...@cdc.gov wrote: Greetings - I am working on some data that contain chemical names with air concentrations, and I am creating a data frame with date/time and each chemical having its own column. However, these are organic chemicals (e.g. 1-butene, 2,3,4-trimethylbenzene etc). The package I am going to be using the data with is openair, and many of the great functions require you to specify a column name which does not seem to work with improper column names- e.g. smoothTrend(mydata, pollutant=1-Butene and smoothTrend(mydata, pollutant=mydata[,1-Butene]) I was wondering if there was a function to automatically convert these chemical names (with all sorts of numbers and minuses in the beginning) to something openair can handle? Or am I going to be stuck recoding several hundred chemical names in my database? VR Jim James T. Durant, MSPH CIH Emergency Response Coordinator US Agency for Toxic Substances and Disease Registry Atlanta, GA 30341 770-378-1695 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Spencer Graves, PE, PhD President and Chief Technology Officer Structure Inspection and Monitoring, Inc. 751 Emerson Ct. San José, CA 95126 ph: 408-655-4567 web: www.structuremonitoring.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] How to keep the same class?
Hello Please see the example below class(testX) [1] matrix class(testX[1,]) [1] numeric Why not matrix? What am I missing here? Is there a way to keep the same class? The reason for the question is that I want to implement a k-step ahead prediction for my own routines and R wrecks does not seem to like [1,] as shown below. predict(fit10,testX[1,]) Error in knnregTrain(train = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, : dims of 'test' and 'train differ predict(fit10,testX[1:2,]) [1] 81.00 76.36 Many thanks Ed [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to keep the same class?
Try this: predict(fit10,testX[1,,drop = FALSE]) On Fri, Sep 2, 2011 at 5:05 PM, Eduardo M. A. M.Mendes emammen...@gmail.com wrote: Hello Please see the example below class(testX) [1] matrix class(testX[1,]) [1] numeric Why not matrix? What am I missing here? Is there a way to keep the same class? The reason for the question is that I want to implement a k-step ahead prediction for my own routines and R wrecks does not seem to like [1,] as shown below. predict(fit10,testX[1,]) Error in knnregTrain(train = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, : dims of 'test' and 'train differ predict(fit10,testX[1:2,]) [1] 81.00 76.36 Many thanks Ed [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40 S 49° 16' 22 O __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to keep the same class?
On Sep 2, 2011, at 3:05 PM, Eduardo M. A. M.Mendes wrote: Hello Please see the example below class(testX) [1] matrix class(testX[1,]) [1] numeric Why not matrix? What am I missing here? Is there a way to keep the same class? The reason for the question is that I want to implement a k-step ahead prediction for my own routines and R wrecks does not seem to like [1,] as shown below. predict(fit10,testX[1,]) Error in knnregTrain(train = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, : dims of 'test' and 'train differ predict(fit10,testX[1:2,]) [1] 81.00 76.36 Many thanks Ed Ed, See: http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-do-my-matrices-lose-dimensions_003f and then use: predict(fit10, testX[1, , drop = FALSE]) HTH, Marc Schwartz __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to keep the same class?
Many thanks to all for the reply. I do apologize for bothering the list with a FAQ but I have to confess that, although I read Faq in the past, I did not remember to do it again. Cheers Ed -Original Message- From: Marc Schwartz [mailto:marc_schwa...@me.com] Sent: Friday, September 02, 2011 5:16 PM To: Eduardo M. A. M.Mendes Cc: r-help@r-project.org Subject: Re: [R] How to keep the same class? On Sep 2, 2011, at 3:05 PM, Eduardo M. A. M.Mendes wrote: Hello Please see the example below class(testX) [1] matrix class(testX[1,]) [1] numeric Why not matrix? What am I missing here? Is there a way to keep the same class? The reason for the question is that I want to implement a k-step ahead prediction for my own routines and R wrecks does not seem to like [1,] as shown below. predict(fit10,testX[1,]) Error in knnregTrain(train = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, : dims of 'test' and 'train differ predict(fit10,testX[1:2,]) [1] 81.00 76.36 Many thanks Ed Ed, See: http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-do-my-matrices-lose-dimensi ons_003f and then use: predict(fit10, testX[1, , drop = FALSE]) HTH, Marc Schwartz __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] conditional replacement of character strings in vectors
Hello, I have a dataframe that looks like this: a b NA Honduras China NA NA Sudan Japan NA NA Mexico NA Mexico I would like to replace the NA values in column b with the non-NA values in column a. I have tried a number of techniques, (if, ifelse) but I must have the logic wrong. Thanks -- Josh [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Hessian Matrix Issue
Dear All, I am running a simulation to obtain coverage probability of Wald type confidence intervals for my parameter d in a function of two parameters (mu,d). I am optimizing it using optim method L-BFGS-B to obtain MLE. As, I want to invert the Hessian matrix to get Standard errors of the two parameter estimates. However, my Hessian matrix at times becomes non-invertible that is it is no more positive definite and I get the following error msg: Error in solve.default(ac$hessian) : system is computationally singular: reciprocal condition number = 6.89585e-21 Thank you Following is the code I am running I would really appreciate your comments and suggestions: #Start Code #option to trace /recover error #options(error = recover) #Sample Size n-30 mu-5 size- 2 #true values of parameter d d.true-1+mu/size d.true #true value of zero inflation index phi= 1+log(d)/(1-d) z.true-1+(log(d.true)/(1-d.true)) z.true # Allocating space for simulation vectors and setting counters for simulation counter-0 iter-1 lower.d-numeric(iter) upper.d-numeric(iter) #set.seed(987654321) #begining of simulation loop for (i in 1:iter){ r.NB-rnbinom(n, mu = mu, size = size) y-sort(r.NB) iter.num-i print(y) print(iter.num) #empirical estimates or sample moments xbar-mean(y) variance-(sum((y-xbar)^2))/length(y) dbar-variance/xbar #sample estimate of proportion of zeros and zero inflation index pbar-length(y[y==0])/length(y) ### Simplified function # NegBin-function(th){ mu-th[1] d-th[2] n-length(y) arg1-n*mean(y)*ifelse(mu = 0, log(mu),0) #arg1-n*mean(y)*log(mu) #arg2-n*log(d)*((mean(y))+mu/(d-1)) arg2-n*ifelse(d=0, log(d), 0)*((mean(y))+mu/ifelse((d-1)= 0, (d-1), 0.001)) aa-numeric(length(max(y))) a-numeric(length(y)) for (i in 1:n) { for (j in 1:y[i]){ aa[j]-ifelse(((j-1)*(d-1))/mu 0,log(1+((j-1)*(d-1))/mu),0) #aa[j]-log(1+((j-1)*(d-1))/mu) #print(aa[j]) } a[i]-sum(aa) #print(a[i]) } a arg3-sum(a) llh-arg1+arg2+arg3 if(! is.finite(llh)) llh-1e+20 -llh } ac-optim(NegBin,par=c(xbar,dbar),method=L-BFGS-B,hessian=TRUE,lower= c(0,1) ) ac print(ac$hessian) muhat-ac$par[1] dhat-ac$par[2] zhat- 1+(log(dhat)/(1-dhat)) infor-solve(ac$hessian) var.dhat-infor[2,2] se.dhat-sqrt(var.dhat) var.muhat-infor[1,1] se.muhat-sqrt(var.muhat) var.func-dhat*muhat var.func d.prime-cbind(dhat,muhat) se.var.func-d.prime%*%infor%*%t(d.prime) se.var.func lower.d[i]-dhat-1.96*se.dhat upper.d[i]-dhat+1.96*se.dhat if(lower.d[i] = d.true d.true= upper.d[i]) counter -counter+1 } counter covg.prob-counter/iter covg.prob __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Platform of image
Dear R users, When I Save Workspace... and then reopen it, my platform switches from 64-bit to 32-bit, i.e. the Gui switches between these: R version 2.13.1 (2011-07-08) Copyright (C) 2011 The R Foundation for Statistical Computing ISBN 3-900051-07-0 Platform: x86_64-pc-mingw32/x64 (64-bit) R version 2.13.1 (2011-07-08) Copyright (C) 2011 The R Foundation for Statistical Computing ISBN 3-900051-07-0 Platform: i386-pc-mingw32/i386 (32-bit) How come? This is a Windows x64 system. John Welsh, Ph.D. Associate Professor Molecular and Cancer Biology Vaccine Research Institute of San Diego 10835 Road to the Cure San Diego, CA 92121 Phone: (858) 581-3960 ex.248 Email: jwe...@sdibr.org __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] misclassification rate
Hi users I'm student who is struggling with basic R programming. Would you please help me with this problem. My english is bad I hope that my question is clear: I have a matrix in wich there are two colmns( yp, yt) Yp: predicted values from my model. yt: true values ( my dependante variable y is a categorical;3 modalities (0,1,2) I don't know how to procede to calculate the misclassification rate and the error Types. Thank you for answring Doussa -- View this message in context: http://r.789695.n4.nabble.com/misclassification-rate-tp3787075p3787075.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Parameters in Gamma Frailty model
Dear all, I'm new to frailty model. I have a question on the output from 'survival' pack. Below is the output. What does gamma1,2,3 refer to? How do I calculate joint hazard function or marginal hazard function using info below? Many thanks! Call: coxph(formula = surv ~ as.factor(tibia) + frailty(as.factor(bdcat)), data = try) n=877 (1 observation deleted due to missingness) coef se(coef) se2 Chisq DF p as.factor(tibia)2 0.214 0.1260.125 2.89 1.00 0.0890 frailty(as.factor(bdcat)) 10.24 1.65 0.0038 exp(coef) exp(-coef) lower .95 upper .95 as.factor(tibia)2 1.238 0.808 0.968 1.58 gamma:1 0.716 1.398 0.496 1.03 gamma:2 1.036 0.965 0.750 1.43 gamma:3 1.248 0.801 0.901 1.73 Iterations: 10 outer, 27 Newton-Raphson Variance of random effect= 0.0648 I-likelihood = -1756.4 Degrees of freedom for terms= 1.0 1.6 Rsquare= 0.016 (max possible= 0.982 ) Likelihood ratio test= 14.3 on 2.64 df, p=0.00171 Wald test= 11.5 on 2.64 df, p=0.00668 -- View this message in context: http://r.789695.n4.nabble.com/Parameters-in-Gamma-Frailty-model-tp3787013p3787013.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] conditional replacement of character strings in vectors
Your data frame didn't come across legibly, try sending it in plain text using the dput() command. That said, I'd guess you want something like this: d[is.na(d$a),a] - d[is.na(d$b),b] The idea is that is.na(d$a) selects only those rows where column a is NA and then moves b values into a for only those rows. Write back with the dput() data frame if this doesn't work. Hope this helps, Michael Weylandt On Fri, Sep 2, 2011 at 3:51 PM, Josh Tewksbury tewk...@uw.edu wrote: Hello, I have a dataframe that looks like this: a b NA Honduras China NA NA Sudan Japan NA NA Mexico NA Mexico I would like to replace the NA values in column b with the non-NA values in column a. I have tried a number of techniques, (if, ifelse) but I must have the logic wrong. Thanks -- Josh [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] conditional replacement of character strings in vectors
On Sep 2, 2011, at 3:51 PM, Josh Tewksbury wrote: Hello, I have a dataframe that looks like this: a b NA Honduras China NA NA Sudan Japan NA NA Mexico NA Mexico I would like to replace the NA values in column b with the non-NA values in column a. I have tried a number of techniques, (if, ifelse) but I must have the logic wrong. Mangled data but no matter: dfrm$b[is.na(dfrm$b)] - dfrm$a[is.na(dfrm$b)] (Learn to post in plain text. This is a plain text list.) David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] ROCR package question for evaluating two regression models
Hello All, I have used logistic regression glm in R and I am evaluating two models both learned with glm but with different predictors. model1 - glm (Y ~ x4+ x5+ x6+ x7, data = dat, family = binomial(link=logit))model2 - glm (Y~ x1 + x2 +x3 , data = dat, family = binomial(link=logit)) and I would like to compare these two models based on the prediction that I get from each model: pred1 = predict(model1, test.data, type = response)pred2 = predict(model2, test.data, type = response) I have used ROCR package to compare them:pr1 = prediction(pred1,test.y)pf1 = performance(pr1, measure = prec, x.measure = rec) plot(pf1) which cutoff this plot is based on? pr2 = prediction(pred2,test.y)pf2 = performance(pr2, measure = prec, x.measure = rec)pf2_roc = performance(pr2,measure=err)plot(pf2) First of all, I would like to use cutoff = 0.5 and plot the ROC, precision-recall curves based on that cutoff value. In other words, how to define a cut off value in performance function?For example, in pf2_roc = performance(pr2,measure=err), when I do plot(pf2_roc), it plots for every single cutoff point. I only want to have one cut off point, is there any way to do that?Second, I would like to see the performance of the two models based on the above measures on the same plot so the comparison would be easier. In other words, how can I plot (pf1, pf2) and compare them together?plot(pf1, pf2) would give me an error as follows:Error in as.double(x) : cannot coerce type 'S4' to vector of type 'double' Could you please help me with that? Thanks a lot,Andra [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Platform of image
You don't say how you re-open it, but I would guess you are double-clicking the .RData file and letting Windows look up which program to run. If so, you can either open the 64-bit GUI directly and use File Open Data to open your data, or you can change which version of the R GUI program is linked to .RData files in Windows. --- Jeff Newmiller The . . Go Live... DCN:jdnew...@dcn.davis.ca.us Basics: ##.#. ##.#. Live Go... Live: OO#.. Dead: OO#.. Playing Research Engineer (Solar/Batteries O.O#. #.O#. with /Software/Embedded Controllers) .OO#. .OO#. rocks...1k --- Sent from my phone. Please excuse my brevity. John Welsh jwe...@sdibr.org wrote: Dear R users, When I Save Workspace... and then reopen it, my platform switches from 64-bit to 32-bit, i.e. the Gui switches between these: R version 2.13.1 (2011-07-08) Copyright (C) 2011 The R Foundation for Statistical Computing ISBN 3-900051-07-0 Platform: x86_64-pc-mingw32/x64 (64-bit) R version 2.13.1 (2011-07-08) Copyright (C) 2011 The R Foundation for Statistical Computing ISBN 3-900051-07-0 Platform: i386-pc-mingw32/i386 (32-bit) How come? This is a Windows x64 system. John Welsh, Ph.D. Associate Professor Molecular and Cancer Biology Vaccine Research Institute of San Diego 10835 Road to the Cure San Diego, CA 92121 Phone: (858) 581-3960 ex.248 Email: jwe...@sdibr.org _ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] conditional replacement of character strings in vectors
On Sep 2, 2011, at 7:51 PM, R. Michael Weylandt wrote: Your data frame didn't come across legibly, try sending it in plain text using the dput() command. That said, I'd guess you want something like this: d[is.na(d$a),a] - d[is.na(d$b),b] One of the rare instances where I disagree with Michael. The row index on the right hand side must be the same as the row index on the left hand side. The idea is that is.na(d$a) selects only those rows where column a is NA and then moves b values into a for only those rows. Right, that is the idea. Write back with the dput() data frame if this doesn't work. Hope this helps, Michael Weylandt On Fri, Sep 2, 2011 at 3:51 PM, Josh Tewksbury tewk...@uw.edu wrote: Hello, I have a dataframe that looks like this: a b NA Honduras China NA NA Sudan Japan NA NA Mexico NA Mexico I would like to replace the NA values in column b with the non-NA values in column a. I have tried a number of techniques, (if, ifelse) but I must have the logic wrong. Thanks -- Josh David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] conditional replacement of character strings in vectors
Y'all are both absolutely right -- I just skimmed the problem and wrote an answer all too briefly that both didn't do what was asked and did what it did instead entirely incorrectly. The syntax suggested by David's email of 7:52 is what I meant to get at... My apologies for any confusion this caused to the OP or anyone else following the thread, Michael On Fri, Sep 2, 2011 at 9:03 PM, David Winsemius dwinsem...@comcast.netwrote: On Sep 2, 2011, at 7:51 PM, R. Michael Weylandt wrote: Your data frame didn't come across legibly, try sending it in plain text using the dput() command. That said, I'd guess you want something like this: d[is.na(d$a),a] - d[is.na(d$b),b] One of the rare instances where I disagree with Michael. The row index on the right hand side must be the same as the row index on the left hand side. The idea is that is.na(d$a) selects only those rows where column a is NA and then moves b values into a for only those rows. Right, that is the idea. Write back with the dput() data frame if this doesn't work. Hope this helps, Michael Weylandt On Fri, Sep 2, 2011 at 3:51 PM, Josh Tewksbury tewk...@uw.edu wrote: Hello, I have a dataframe that looks like this: a b NA Honduras China NA NA Sudan Japan NA NA Mexico NA Mexico I would like to replace the NA values in column b with the non-NA values in column a. I have tried a number of techniques, (if, ifelse) but I must have the logic wrong. Thanks -- Josh David Winsemius, MD West Hartford, CT [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] confusion matrix
hi users I have a data frame in with there are two colomns real values and predicted ones (for a dichotomic response). How can i obtain a confusion matrix (miscalssification rat and errors)? The costs are egal. Thanks -- View this message in context: http://r.789695.n4.nabble.com/confusion-matrix-tp3787363p3787363.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] problem in applying function in data subset (with a level) - using plyr or other alternative are also welcome
Dear R experts. I might be missing something obvious. I have been trying to fix this problem for some weeks. Please help. #data ped - c(rep(1, 4), rep(2, 3), rep(3, 3)) y - rnorm(10, 8, 2) # variable set 1 M1a - sample (c(1, 2,3), 10, replace= T) M1b - sample (c(1, 2,3), 10, replace= T) M1aP1 - sample (c(1, 2,3), 10, replace= T) M1bP2 - sample (c(1, 2,3), 10, replace= T) # variable set 2 M2a - sample (c(1, 2,3), 10, replace= T) M2b - sample (c(1, 2,3), 10, replace= T) M2aP1 - sample (c(1, 2,3), 10, replace= T) M2bP2 - sample (c(1, 2,3), 10, replace= T) # variable set 3 M3a - sample (c(1, 2,3), 10, replace= T) M3b - sample (c(1, 2,3), 10, replace= T) M3aP1 - sample (c(1, 2,3), 10, replace= T) M3bP2 - sample (c(1, 2,3), 10, replace= T) mydf - data.frame (ped, M1a,M1b,M1aP1,M1bP2, M2a,M2b,M2aP1,M2bP2, M3a,M3b,M3aP1,M3bP2, y) # functions and further calculations mmat - matrix (c(M1a,M2a,M3a,M1b,M2b,M3b,M1aP1,M2aP1,M3aP1, M1bP2,M2bP2,M3bP2), ncol = 4) # first function myfun - function(x) { x- as.vector(x) ot1 - ifelse(mydf[x[1]] == mydf[x[3]], 1, -1) ot2 - ifelse(mydf[x[2]] == mydf[x[4]], 1, -1) qt - ot1 + ot2 return(qt) } qt - apply(mmat, 1, myfun) ydv - c((y - mean(y))^2) qtd - data.frame(ped, ydv, qt) # second function myfun2 - function(dataframe) { vydv - sum(ydv)*0.25 sumD - sum(ydv * qt) Rt - vydv / sumD return(Rt) } # using plyr require(plyr) dfsumd1 - ddply(mydf,.(mydf$ped),myfun2) Here are 2 issues: (1) The output just one, I need the output for all three set of variables (as listed above) (2) all three values of dfsumd is returning to same for all level of ped: 1,2, 3 Means that the function is applied to whole dataset but only replicated in output !!! I tried with plyr not being lazy but due to my limited R knowledge, If you have a different suggestion, you are welcome too. Thank you in advance... Maya [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] problem in applying function in data subset (with a level) - using plyr or other alternative are also welcome
On Sep 2, 2011, at 11:18 PM, Maya Joshi wrote: Dear R experts. I might be missing something obvious. I have been trying to fix this problem for some weeks. Please help. #data ped - c(rep(1, 4), rep(2, 3), rep(3, 3)) y - rnorm(10, 8, 2) # variable set 1 M1a - sample (c(1, 2,3), 10, replace= T) M1b - sample (c(1, 2,3), 10, replace= T) M1aP1 - sample (c(1, 2,3), 10, replace= T) M1bP2 - sample (c(1, 2,3), 10, replace= T) # variable set 2 M2a - sample (c(1, 2,3), 10, replace= T) M2b - sample (c(1, 2,3), 10, replace= T) M2aP1 - sample (c(1, 2,3), 10, replace= T) M2bP2 - sample (c(1, 2,3), 10, replace= T) # variable set 3 M3a - sample (c(1, 2,3), 10, replace= T) M3b - sample (c(1, 2,3), 10, replace= T) M3aP1 - sample (c(1, 2,3), 10, replace= T) M3bP2 - sample (c(1, 2,3), 10, replace= T) mydf - data.frame (ped, M1a,M1b,M1aP1,M1bP2, M2a,M2b,M2aP1,M2bP2, M3a,M3b,M3aP1,M3bP2, y) # functions and further calculations mmat - matrix (c(M1a,M2a,M3a,M1b,M2b,M3b,M1aP1,M2aP1,M3aP1, M1bP2,M2bP2,M3bP2), ncol = 4) # first function myfun - function(x) { x- as.vector(x) ot1 - ifelse(mydf[x[1]] == mydf[x[3]], 1, -1) You really ought to explain what you are trying to do. This code will compare two lists. The list from mydf[x2]] will be the column mydf[M2b] which will have as its first element the four element vector assigned above. I am guessing that was not what you wanted. Notice this simple case using the [ function as you are attempting throws an error: list(M2b = c(1,2,3)) == list(M2a = c(1,2,3)) Error in list(M2b = c(1, 2, 3)) == list(M2a = c(1, 2, 3)) : comparison of these types is not implemented' So ... now that your code has failed to explain what you wanted why don't your try some natural language explanations. Notice that the [[ function is generally what people want when extracting from lists: list(M2b = c(1,2,3))[[M2b]] == list(M2a = c(1,2,3))[[M2a]] [1] TRUE TRUE TRUE ot2 - ifelse(mydf[x[2]] == mydf[x[4]], 1, -1) qt - ot1 + ot2 return(qt) } qt - apply(mmat, 1, myfun) ydv - c((y - mean(y))^2) qtd - data.frame(ped, ydv, qt) # second function myfun2 - function(dataframe) { vydv - sum(ydv)*0.25 sumD - sum(ydv * qt) Rt - vydv / sumD return(Rt) } # using plyr require(plyr) dfsumd1 - ddply(mydf,.(mydf$ped),myfun2) Here are 2 issues: (1) The output just one, I need the output for all three set of variables (as listed above) An incredibly vague description. (2) all three values of dfsumd is returning to same for all level of ped: 1,2, 3 I sympathize with those forced to adopt the English language, but that is the standard this decade. So givne your apparent difficulties, you need to exert more effort at making explicit what is _supposed_ to be returned. Means that the function is applied to whole dataset but only replicated in output !!! I tried with plyr not being lazy but due to my limited R knowledge, If you have a different suggestion, you are welcome too. Thank you in advance... Maya [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Platform of image
On Fri, 2 Sep 2011, Jeff Newmiller wrote: You don't say how you re-open it, but I would guess you are double-clicking the .RData file and letting Windows look up which program to run. If so, you can either open the 64-bit GUI directly and use File Open Data to open your data, or you can change which version of the R GUI program is linked to .RData files in Windows. See the rw-FAQ Q2.29 for why (and how to change it). Jeff Newmiller John Welsh jwe...@sdibr.org wrote: Dear R users, When I Save Workspace... and then reopen it, my platform switches from 64-bit to 32-bit, i.e. the Gui switches between these: R version 2.13.1 (2011-07-08) Copyright (C) 2011 The R Foundation for Statistical Computing ISBN 3-900051-07-0 Platform: x86_64-pc-mingw32/x64 (64-bit) R version 2.13.1 (2011-07-08) Copyright (C) 2011 The R Foundation for Statistical Computing ISBN 3-900051-07-0 Platform: i386-pc-mingw32/i386 (32-bit) How come? This is a Windows x64 system. John Welsh, Ph.D. Associate Professor Molecular and Cancer Biology Vaccine Research Institute of San Diego 10835 Road to the Cure San Diego, CA 92121 Phone: (858) 581-3960 ex.248 Email: jwe...@sdibr.org -- Brian D. Ripley, rip...@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.