[R] Trouble with performing post hoc analysis Tukey for lme model using ghlt
Hi, I am new to R and I am trying to perform a post hoc tukey test using the multcomp package's ghlt for a lme model. My first attempt at doing so gave me an output, HOWEVER I have tried to do this again, it keeps coming up with the error: (Error in contrMat(table(mf[[nm]]), type = types[pm]) : less than two groups) My model is looking at effect of incubation temperature (3 groups) on PHA with hen as a my random factor. Temperature treatment has been made factorial (Tempfact). my code is as follows: library(multcomp) PHA - lme(DiffPHA48 ~ Tempfact, random= ~1|Hen, na.action=na.omit) summary(glht(PHA, linfct=mcp(Tempfact = Tukey))) Error in contrMat(table(mf[[nm]]), type = types[pm]) : less than two groups I am confused why it would work the first time and now everyother time it comes up with this error. Can anyone please help me?? Thanks! -- View this message in context: http://r.789695.n4.nabble.com/Trouble-with-performing-post-hoc-analysis-Tukey-for-lme-model-using-ghlt-tp3849982p3849982.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Adding axis to an ellipse: ellipse package
On 28/09/11 04:53, Antoine wrote: Dear list members, This might be a silly question but I just can't figure it out. I am using the ellipse package on covariance matrices. I would simply like to plot my ellipses WITH its two axis ploted as well. These axis represents the 2 eigen vectors of my matrix and it is important that I can graphically show them. Is there an easy way to do so? Pretty easy. * Calculate your eigenvalues and vectors using eigen(). * Rescale the vectors by multiplying them by the square root of the corresponding eigenvalues, and by the constant corresponding to the confidence level --- e.g. sqrt(qchisq(0.95,2)) for the usual 95% confidence level. * Add these vectors to the ellipse plot, using segments(). E.g. if your rescaled eigenvectors are v1 and v2, just do: segments(-v1[1],-v1[2],v1[1],v1[2]) and segments(-v2[1],-v2[2],v2[1],v2[2]) cheers, Rolf Turner __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Negative Quartile
Hello, I have a doubt, but it is more statistic than just about R: How the people deal usually with negative percentile and quartile? In my concrete case, I want to know the distribution of an error, so the nearest it is to 0 the better (I think it would be optimun to have the nearest values to 0 in the lower percentiles). The best result would be making the percentile of the absolute value (without sign), but the sing is also important, and I need to keep it. This results in the big negative values to be in the lower percentiles, which are usually understanded as the best cases, since we are working with an error. My question is, what is the usual thing done in this cases? keep the sign, and read the lower percentiles as bad cases also (being the optimal ones the center ones) or try to somehow get the signless distribution keeping the signs somehow? And if this is the case, how can this be done? Thanks for your time, Best regards __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Data transformation cleaning
Seems your questions belong to rule mining for frequent item sets. check arules package Weidong Gu On Tue, Sep 27, 2011 at 11:13 PM, pip56789 pd...@virginia.edu wrote: Hi, I have a few methodological and implementation questions for ya'll. Thank you in advance for your help. I have a dataset that reflects people's preference choices. I want to see if there's any kind of clustering effect among certain preference choices (e.g. do people who pick choice A also pick choice D). I have a data set that has one record per user ID, per preference choice. It's a long form of a data set that looks like this: ID | Page 123 | Choice A 123 | Choice B 456 | Choice A 456 | Choice B ... I thought that I should do the following 1. Make the data set wide, counting the observations so the data looks like this: ID | Count of Preference A | Count of Preference B 123 | 1 | 1 ... Using table1 - dcast(data,ID ~ Page,fun.aggregate=length,value_var='Page' ) 2. Create a correlation matrix of preferences cor(table2[,-1]) How would I restrict my correlation to show preferences that met a minimum sample threshold? Can you confirm if the two following commands do the same thing? What would I do from here (or am I taking the wrong approach) table1 - dcast(data,Page ~ Page,fun.aggregate=length,value_var='Page' ) table2 - with(data, table(Page,Page)) many thanks, Peter -- View this message in context: http://r.789695.n4.nabble.com/Data-transformation-cleaning-tp3849889p3849889.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Plotting Lines Through multiple groups
Hi I have data in the following format Cort Day Animal 230 1 273 1 240 2 271 2 342 2 303 2 244 2 200 3 241 3 282 3 344 3 etc. It is measured across time(day) however no every individual is measured the same number of times. All I want to do is plot the Raw data and then run a line connecting the data points of each individual animal, for the example above there would be three lines as there are three animals. I've tried, gplots, lattice and car but I'm not really figuring it out, any help would be greatly appreciated! Thanks Clara -- View this message in context: http://r.789695.n4.nabble.com/Plotting-Lines-Through-multiple-groups-tp3850099p3850099.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Adding axis to an ellipse: ellipse package
Thanks a lot Rolf, I knew it would be possible to do it that way but I was just curious to know if the ellipse package was offering such a feature. Thanks again for you help! Antoine Le 28 sept. 11 à 09:30, Rolf Turner-3 [via R] a écrit : On 28/09/11 04:53, Antoine wrote: Dear list members, This might be a silly question but I just can't figure it out. I am using the ellipse package on covariance matrices. I would simply like to plot my ellipses WITH its two axis ploted as well. These axis represents the 2 eigen vectors of my matrix and it is important that I can graphically show them. Is there an easy way to do so? Pretty easy. * Calculate your eigenvalues and vectors using eigen(). * Rescale the vectors by multiplying them by the square root of the corresponding eigenvalues, and by the constant corresponding to the confidence level --- e.g. sqrt(qchisq(0.95,2)) for the usual 95% confidence level. * Add these vectors to the ellipse plot, using segments(). E.g. if your rescaled eigenvectors are v1 and v2, just do: segments(-v1[1],-v1[2],v1[1],v1[2]) and segments(-v2[1],-v2[2],v2[1],v2[2]) cheers, Rolf Turner __ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. If you reply to this email, your message will be added to the discussion below: http://r.789695.n4.nabble.com/Adding-axis-to-an-ellipse-ellipse-package-tp3847954p3850233.html To unsubscribe from Adding axis to an ellipse: ellipse package, click here. -- View this message in context: http://r.789695.n4.nabble.com/Adding-axis-to-an-ellipse-ellipse-package-tp3847954p3850237.html Sent from the R help mailing list archive at Nabble.com. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Matrix and list indices
Thanks Michael it works! Have to say it is amazing what you can do in R with a few lines (a line in this case) of code. Fernando From: R. Michael Weylandt [mailto:michael.weyla...@gmail.com] Sent: 27. september 2011 15:43 To: Cabrera, Fernando Álvarez Subject: Re: [R] Matrix and list indices Untested, I believe this should work, though you might need to modify for floating point funny business in testing the equalities: my_list - list( earth=array(c(0,0,45,0,0,45,0,45),dim=c(2,2,2)), mars=array(c(8:1),dim=c(2,2,2))) my_list$earth[my_list$earth==0] - my_list$mars[my_list$earth==0] Hope this helps, Michael Weylandt On Tue, Sep 27, 2011 at 8:49 AM, fernando.cabr...@nordea.commailto:fernando.cabr...@nordea.com wrote: Hi guys, I am trying to replace all elements of earth that are equal to zero with their corresponding elements in mars. I can do the replace with a bunch of for-loops, but I don't think this is the R way of doing things. my_list - list( earth=array(c(0,0,45,0,0,45,0,45),dim=c(2,2,2)), mars=array(c(8:1),dim=c(2,2,2))) my_list for (i in c(1:2)) { for (j in c(1:2)) { for (k in c(1:2)) { if (my_list$earth[i,j,k] == 0) { my_list$earth[i,j,k] - my_list$mars[i,j,k] } } } } my_list Do you guys have any suggestions for getting rid of the ugly for-loops? Many thanks, Fernando Álvarez Nordea e-Markets Strandgade 3 PO Box 850 DK-0900 Copenhagen C Denmark Tel.: +45 33 33 32 67tel:%2B45%2033%2033%2032%2067 Mobile: +45 61 55 27 54tel:%2B45%2061%2055%2027%2054 This transmission is intended solely for the person or entity to whom it is addressed. It may contain privileged and confidential information. If you are not the intended recipient, please be notified that any dissemination, distribution or copying is strictly prohibited. If you have received this transmission by mistake, please let us know and then delete it from your system. P Please consider the impact on the environment before printing this e-mail. [[alternative HTML version deleted]] __ R-help@r-project.orgmailto:R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Envfit, inconsistant result?
rodrock rodrigo.vargasgaete at gmail.com writes: I am using the envfit function over an ordination of floristic data. The problem is that every time that I run it changes the results. Sometimes dramatically, selecting variables that the first time were not significant. I do not get what could be the problem or if is normal given the permutations are different. # the NMDS ordination gap_flor_NMDS_chord - metaMDS(gaps_flor, distance = euclid, k = 2, trymax = 20, autotransform =TRUE, noshare = 0.1, wascores = TRUE, expand = TRUE, trace = 1, plot = FALSE, old.wa = FALSE, zerodist = add) # the Envfit calculation exp_flor1 - envfit(gap_flor_NMDS_chord, explain1, permu = 999, na.rm=T) Rodrigo, Do you re-run your NMDS when the envfit results change? The results of NMDS may change between two runs. You should first check that the NMDS results are nearly identical. This you can do with procrustes() function in vegan: plot(procrustes(gap_flor_NMDS1, gap_flor_NMDS2)) where the arguments gap_flor_NMDS1 and gap_flor_NMDS2 are two NMDS results that gave you different envfit results. Use of plot() above helps you to avoid looking at tiny differences in a numeric results: if you cannot see a difference in the procrustes plot, there is no difference you need to worry about. Only worry about envfit differences if the NMDS results are practically identical. I'm surprised if metaMDS didn't issue any warning on the combination of arguments you used in metaMDS: having no.share 0 and distance = euclid sounds dubious, and I have tried to catch those cases and issue an informative warning. You could also try higher trymax to get more consistent NMDS results. Cheers, Jari Oksanen __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Plotting Lines Through multiple groups
if it was me, i'd do library(ggplot2) df - read.table(pipe(pbpaste), header=T, sep='\t') df$Animus=factor(df$Animal, labels=c(Tom, Dick, Harry)) qplot(Day, Cort, data = df, geom=line, colour=Animus) On Sep 28, 2011, at 8:03 AM, clara_eco wrote: Hi I have data in the following format Cort Day Animal 230 1 273 1 240 2 271 2 342 2 303 2 244 2 200 3 241 3 282 3 344 3 etc. It is measured across time(day) however no every individual is measured the same number of times. All I want to do is plot the Raw data and then run a line connecting the data points of each individual animal, for the example above there would be three lines as there are three animals. I've tried, gplots, lattice and car but I'm not really figuring it out, any help would be greatly appreciated! Thanks Clara -- View this message in context: http://r.789695.n4.nabble.com/Plotting-Lines-Through-multiple-groups-tp3850099p3850099.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Fitting a GLM: Problems with ns date functions
Hello, I am attempting to use R as part of a time-series analysis investigating the influence of meteorological factors on health outcomes. The test csv dta file that i am working with contains a complete daily set of the variables ‘DATE’, ‘ADMINS’, ‘NOX’, ‘TEMP’ across a 5 year period (1827 days). Within attempting to fit a GLM, I believe that I am having difficulty with the ns function, although problems could also be arising through date functions in R. Attached is my code and following errors, as a new starter within the world of R any help would be much appreciated. ## [1] Original Analysis: FAILS ## packages loaded for use within my wider data analysis library(splines) library(xtable) library(tsModel) library(lattice) library(mda) library(gam) ## splines: For date there are 4 seasons across 5 years ## splines: For temp cold and warm seasons are 6 months long fit - glm(J00_99 ~ NOX_LIN + ns(DATE_B, 4 * 5) + ns(TEMP_LIN, 6), data = data, family = poisson) ### Error in (1 - h) * qs[i] : non-numeric argument to binary operator pr - predict(fit, type = terms) ### Error in predict(fit, type = terms) : object 'fit' not found ## [2] Attempt 2: FAILS # So I have also been informed that when working with date i should use the chron # package to convert the dates within my data file to a usable format. This could be causing such problems library(chron) dates-as.POSIXct(strptime(data[, DATE], format = %d/%m/%Y, GMT)) fit - glm(ADMINS ~ NOX + ns(date, 4 * 5) + ns(TEMP, 6), data = data, family = poisson) ## Error in as.vector(x, mode) : cannot coerce type 'closure' to vector of type 'any' -- View this message in context: http://r.789695.n4.nabble.com/Fitting-a-GLM-Problems-with-ns-date-functions-tp3850379p3850379.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] OT: (quasi-?) separation in a logistic GLM
I know that this is a quite old post but I am dealing with a similar warning message and, also after reading all the posts here, I am still in doubt with what I should do with my analysis. I have a dataset where the binary response variable is sex, and the predictors are several variables (they are percentage). I know that one of this variables, hcp, predicts quite well the sex of the individuals given that the ca. 85% of females have hcp0 while ca. 85% of females have hcp=0. I don't believe that it does matter, but all the values are very low (the maximum value of a predictor doesn't exceed 0.5) and most of predictors show a very left-skewed distribution (log-normal). In any case, I believe that it could be said that quasi-separation is occurring. When I run the saturated model I get the warning message: * glm.sat1-glm(sex~hwp+hcp+hnp+twp+d1lp*d2dp+hwp:hcp+hwp:hnp+hwp:twp+hcp:hnp+hcp:twp+hnp:twp,data=dati,family=binomial) Mensajes de aviso perdidos glm.fit: fitted probabilities numerically 0 or 1 occurred summary(glm.sat1) Call: glm(formula = sex ~ hwp + hcp + hnp + twp + d1lp * d2dp + hwp:hcp + hwp:hnp + hwp:twp + hcp:hnp + hcp:twp + hnp:twp, family = binomial, data = dati) Deviance Residuals: Min 1Q Median 3Q Max -2.6838 -0.1439 0.2699 0.5694 3.8421 Coefficients: Estimate Std. Error z value Pr(|z|) (Intercept) 8.940 4.046 2.210 0.027125 * hwp-4.897 4.550 -1.076 0.281793 hcp -414.796113.780 -3.646 0.000267 *** hnp-3.689 6.692 -0.551 0.581487 twp 5.450 2.566 2.124 0.033657 * d1lp -22.436 13.837 -1.621 0.104926 d2dp -21.076 9.266 -2.274 0.022940 * d1lp:d2dp 65.102 35.394 1.839 0.065864 . hwp:hcp 313.514823.782 0.381 0.703516 hwp:hnp -46.572 79.467 -0.586 0.557834 hwp:twp 8.252 21.503 0.384 0.701156 hcp:hnp -1721.320 1925.766 -0.894 0.371409 hcp:twp -94.153459.710 -0.205 0.837721 hnp:twp16.598 38.548 0.431 0.666769 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 (Dispersion parameter for binomial family taken to be 1) Null deviance: 581.51 on 425 degrees of freedom Residual deviance: 281.98 on 412 degrees of freedom (41 observations deleted due to missingness) AIC: 309.98 Number of Fisher Scoring iterations: 8 ** The estimates of hcp are very high and its Std.Error are also quite high. If I go further with the analyses and use the step procedure I finally get a candidate minimal adequate model: step.1-step(glm.sat1) Start: AIC=309.98 sex ~ hwp + hcp + hnp + twp + d1lp * d2dp + hwp:hcp + hwp:hnp + hwp:twp + hcp:hnp + hcp:twp + hnp:twp Df DevianceAIC - hcp:twp1 282.02 308.02 - hwp:hcp1 282.11 308.11 - hwp:twp1 282.13 308.13 - hnp:twp1 282.17 308.17 - hwp:hnp1 282.32 308.32 - hcp:hnp1 282.90 308.90 none 281.98 309.98 - d1lp:d2dp 1 285.43 311.43 Step: AIC=308.02 sex ~ hwp + hcp + hnp + twp + d1lp + d2dp + d1lp:d2dp + hwp:hcp + hwp:hnp + hwp:twp + hcp:hnp + hnp:twp Df DevianceAIC - hwp:hcp1 282.13 306.13 - hwp:twp1 282.14 306.14 - hnp:twp1 282.18 306.18 - hwp:hnp1 282.43 306.43 - hcp:hnp1 283.07 307.07 none 282.02 308.02 - d1lp:d2dp 1 285.43 309.43 Step: AIC=306.13 sex ~ hwp + hcp + hnp + twp + d1lp + d2dp + d1lp:d2dp + hwp:hnp + hwp:twp + hcp:hnp + hnp:twp Df DevianceAIC - hwp:twp1 282.24 304.24 - hnp:twp1 282.27 304.27 - hwp:hnp1 282.53 304.53 - hcp:hnp1 283.25 305.25 none 282.13 306.13 - d1lp:d2dp 1 285.46 307.46 Step: AIC=304.24 sex ~ hwp + hcp + hnp + twp + d1lp + d2dp + d1lp:d2dp + hwp:hnp + hcp:hnp + hnp:twp Df DevianceAIC - hnp:twp1 282.35 302.35 - hwp:hnp1 282.83 302.83 - hcp:hnp1 283.36 303.36 none 282.24 304.24 - d1lp:d2dp 1 285.56 305.56 Step: AIC=302.35 sex ~ hwp + hcp + hnp + twp + d1lp + d2dp + d1lp:d2dp + hwp:hnp + hcp:hnp Df DevianceAIC - hwp:hnp1 283.06 301.06 - hcp:hnp1 283.37 301.37 none 282.35 302.35 - d1lp:d2dp 1 285.71 303.71 - twp1 306.17 324.17 Step: AIC=301.06 sex ~ hwp + hcp + hnp + twp + d1lp + d2dp + d1lp:d2dp + hcp:hnp Df DevianceAIC - hcp:hnp1 284.36 300.36 none 283.06 301.06 - d1lp:d2dp 1 286.26 302.26 - hwp1 286.96 302.96 - twp1 307.98 323.98 Step: AIC=300.36 sex ~ hwp + hcp + hnp + twp + d1lp + d2dp + d1lp:d2dp Df DevianceAIC none 284.36 300.36 - hnp1 287.07 301.07 - d1lp:d2dp 1 287.80 301.80 - hwp1 288.08 302.08 - twp1 308.39 322.39 - hcp1 484.12 498.12 Hubo 40
[R] apply lm function to dataset split by two variables
Dear all, I am not fluent in R and am struggling to 1) apply a lm to a weight-size dataset, thus the model has to run separately for each species, each year; 2) extract coefs, r-squared, n, etc. The data look like this: yearsps cm w 200950 16 22 200950 17 42 200950 18 45 200951 15 45 200951 16 53 200951 17 73 201050 15 22 201050 16 41 201050 16 21 201050 17 36 201051 15 43 201051 16 67 201051 17 79 The following script works for data from a single year, but I don't find a way to subset the data by sps AND year and get the function running: f - function(data) lm(log(w) ~ log(cm+0.5), data = data) v - lapply(split(data, data$sps), f) and then I can extract the data with this script from Peter Solymos (although I do not get the number of points used in the analysis): myFun - function(lm) { out - c(lm$coefficients[1], lm$coefficients[2], length(lm$run1$model$y), summary(lm)$coefficients[2,2], pf(summary(lm)$fstatistic[1], summary(lm)$fstatistic[2], summary(lm)$fstatistic[3], lower.tail = FALSE), summary(lm)$r.squared) names(out) - c(intercept,slope,n,slope.SE,p.value,r.squared) return(out)} results - list() for (i in 1:length(v)) results[[names(v)[i]]] - myFun(v[[i]]) as.data.frame(results) I have checked the plyr package, but the example that fits my data best uses a for loop and I would like to avoid these. I have also tried the following (among many other options) without results: v-tapply(data$w,list(data$cm,data$year),f) Error in is.function(FUN) : 'FUN' is missing Any ideas? Thanks for your help, Elena [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] metaMDS
On Fri, 2011-09-23 at 12:43 -0500, Jean V Adams wrote: Lineth Contreras wrote on 09/23/2011 11:35:10 AM: Hello R-user community, I am applying the function metaMDS. However, I would like to know if there is any option to export the data I got from the axis as a data frame. I have tried as.data.frame.list but is not working. Any suggestion? Thank you in advance for your help, Lineth When you say the data I got from the axis do you mean the coordinates contained in the $points of the resulting object? If so, something like this should work (using the example provide in ?metaMDS): You would be better off with the scores() method for metaMDS objects: data(dune) sol - metaMDS(dune) scrs - scores(sol) `scrs` is a matrix: class(scrs) [1] matrix which can be exported via say `write.csv()`: write.csv(scrs, filenames.csv) If you want a data frame in R, then SCRS - as.data.frame(scrs) will work, but there is little reason to convert to a data frame just to export it out of R. HTH G data(dune) library(MASS) sol - metaMDS(dune) df - as.data.frame(sol$points) Jean [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% Dr. Gavin Simpson [t] +44 (0)20 7679 0522 ECRC, UCL Geography, [f] +44 (0)20 7679 0565 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/ UK. WC1E 6BT. [w] http://www.freshwaters.org.uk %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Plotting Lines Through multiple groups
Hi Clara, are there repeated measures of an animal on the same day? Otherwise I did not get the point of the level of aggregation. Using lattice you can do sth like xyplot(Cort~Day,groups=Animal,type=c(p,a),data=df) but with your given data this just connects the raw points hth. Am 28.09.2011 08:03, schrieb clara_eco: Hi I have data in the following format Cort Day Animal 230 1 273 1 240 2 271 2 342 2 303 2 244 2 200 3 241 3 282 3 344 3 etc. It is measured across time(day) however no every individual is measured the same number of times. All I want to do is plot the Raw data and then run a line connecting the data points of each individual animal, for the example above there would be three lines as there are three animals. I've tried, gplots, lattice and car but I'm not really figuring it out, any help would be greatly appreciated! Thanks Clara -- View this message in context: http://r.789695.n4.nabble.com/Plotting-Lines-Through-multiple-groups-tp3850099p3850099.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Eik Vettorazzi Institut für Medizinische Biometrie und Epidemiologie Universitätsklinikum Hamburg-Eppendorf Martinistr. 52 20246 Hamburg T ++49/40/7410-58243 F ++49/40/7410-57790 -- Pflichtangaben gemäß Gesetz über elektronische Handelsregister und Genossenschaftsregister sowie das Unternehmensregister (EHUG): Universitätsklinikum Hamburg-Eppendorf; Körperschaft des öffentlichen Rechts; Gerichtsstand: Hamburg Vorstandsmitglieder: Prof. Dr. Jörg F. Debatin (Vorsitzender), Dr. Alexander Kirstein, Joachim Prölß, Prof. Dr. Dr. Uwe Koch-Gromus __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Password protected R Repository
Hi, I've set up a very simple R repository. Just a single source library. Everything works fine. I can install the package on my client using: install.packages(repos='http://www.myServer.se/myRepo/', pkgs='myLib', dep=TRUE) However, I want to protect the repo, so I use a .htaccess, placed directly under 'myRepo' on the server. I use 'Authentication Basic' and 'require valid-user'. I've tried a few things. From the obvious: install.packages(repos=getURL('http://www.myServer.se/myRepo', userpwd='user:password'), pkgs='myLib', dep=TRUE) To the more elaborate: h [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Password protected R Repository
stefan.petersson at inizio.se writes: Hi, I've set up a very simple R repository. Just a single source library. Everything works fine. I can install the package on my client using: install.packages(repos='http://www.myServer.se/myRepo/', pkgs='myLib', dep=TRUE) However, I want to protect the repo, so I use a .htaccess, placed directly under 'myRepo' on the server. I use 'Authentication Basic' and 'require valid-user'. I've tried a few things. From the obvious: install.packages(repos=getURL('http://www.myServer.se/myRepo', userpwd='user:password'), pkgs='myLib', dep=TRUE) To the more elaborate: h [[alternative HTML version deleted]] I add this myself, since some strange 'alternative HTML version deleted' thingy cut my post short. Here is the rest: To the more elaborate: h - getCurlHandle(header = TRUE, userpwd = user:password, netrc = TRUE, followlocation = TRUE ) install.packages(getURL(http://www.myServer.se/myRepo/;, verbose = TRUE, curl = h ), pkgs='myLib', dep=TRUE ) But it's not working. The last call is complaining of a missing index.html. And if I put one under myRepo, I get connected to that page, but install.packages can't go further to the src directory on the server. This is what I get: Installing package(s) into ‘/usr/local/lib/R/site-library’ (as ‘lib’ is unspecified) Warning: unable to access index for repository HTTP/1.1 301 Moved Permanently Date: Wed, 28 Sep 2011 09:20:25 GMT Server: Apache Location: http://www.myServer.se/myRepo/ Vary: Accept-Encoding Content-Length: 235 Content-Type: text/html; charset=iso-8859-1 HTTP/1.1 403 Forbidden Date: Wed, 28 Sep 2011 09:20:25 GMT Server: Apache Vary: Accept-Encoding Content-Length: 208 Content-Type: text/html; charset=iso-8859-1 !DOCTYPE HTML PUBLIC -//IETF//DTD HTML 2.0//EN htmlhead title403 Forbidden/title /headbody h1Forbidden/h1 pYou don't have permission to access /smisc/ on this server./p /body/html /src/contrib Warning message: In getDependencies(pkgs, dependencies, available, lib) : package ‘smisc’ is not available (for R version 2.13.1) __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Password protected R Repository
Who said that RCurl::getURL worked with install.packages? (At least, I assume this is from RCurl: you did not mention it.) install.packages() first calls available.packages(), and that uses download.file to get the PACKAGES[.gz] file. It then calls download.file to get the packages. So please read the help for download.file (as the help pages say), and try the solutions described there. On Wed, 28 Sep 2011, Stefan Petersson wrote: stefan.petersson at inizio.se writes: Hi, I've set up a very simple R repository. Just a single source library. Everything works fine. I can install the package on my client using: install.packages(repos='http://www.myServer.se/myRepo/', pkgs='myLib', dep=TRUE) However, I want to protect the repo, so I use a .htaccess, placed directly under 'myRepo' on the server. I use 'Authentication Basic' and 'require valid-user'. I've tried a few things. From the obvious: *None* of this is 'obvious', and none of it is reproducible. install.packages(repos=getURL('http://www.myServer.se/myRepo', userpwd='user:password'), pkgs='myLib', dep=TRUE) To the more elaborate: h [[alternative HTML version deleted]] I add this myself, since some strange 'alternative HTML version deleted' thingy cut my post short. Here is the rest: To the more elaborate: h - getCurlHandle(header = TRUE, userpwd = user:password, netrc = TRUE, followlocation = TRUE ) install.packages(getURL(http://www.myServer.se/myRepo/;, verbose = TRUE, curl = h ), pkgs='myLib', dep=TRUE ) But it's not working. The last call is complaining of a missing index.html. And if I put one under myRepo, I get connected to that page, but install.packages can't go further to the src directory on the server. This is what I get: Installing package(s) into ‘/usr/local/lib/R/site-library’ (as ‘lib’ is unspecified) Warning: unable to access index for repository HTTP/1.1 301 Moved Permanently Date: Wed, 28 Sep 2011 09:20:25 GMT Server: Apache Location: http://www.myServer.se/myRepo/ Vary: Accept-Encoding Content-Length: 235 Content-Type: text/html; charset=iso-8859-1 HTTP/1.1 403 Forbidden Date: Wed, 28 Sep 2011 09:20:25 GMT Server: Apache Vary: Accept-Encoding Content-Length: 208 Content-Type: text/html; charset=iso-8859-1 !DOCTYPE HTML PUBLIC -//IETF//DTD HTML 2.0//EN htmlhead title403 Forbidden/title /headbody h1Forbidden/h1 pYou don't have permission to access /smisc/ on this server./p /body/html /src/contrib Warning message: In getDependencies(pkgs, dependencies, available, lib) : package ‘smisc’ is not available (for R version 2.13.1) __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Brian D. Ripley, rip...@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Printing an xtable with type = html
I have been playing around with producing tables using xtable and the type = html argument when printing. For example, if xtbl is the output of a dataframe which has been run through xtable, using the command: print(xtbl, type = html, html.table.attributes = border = '1', align = 'center') I would be interested to see other examples of the use of xtable to produce html. There is a whole vignette on using xtable to produce all sorts of tables for incorporation into a TeX document but I have found no examples of producing html with any table attributes. Ideally xtable should be able to access a css file but I don't see any mechanism for doing that. Perhaps someone can enlighten me. David Scott -- _ David Scott Department of Statistics The University of Auckland, PB 92019 Auckland 1142,NEW ZEALAND Phone: +64 9 923 5055, or +64 9 373 7599 ext 85055 Email: d.sc...@auckland.ac.nz, Fax: +64 9 373 7018 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Data transformation cleaning
On 09/28/2011 01:13 PM, pip56789 wrote: Hi, I have a few methodological and implementation questions for ya'll. Thank you in advance for your help. I have a dataset that reflects people's preference choices. I want to see if there's any kind of clustering effect among certain preference choices (e.g. do people who pick choice A also pick choice D). I have a data set that has one record per user ID, per preference choice. It's a long form of a data set that looks like this: ID | Page 123 | Choice A 123 | Choice B 456 | Choice A 456 | Choice B ... I thought that I should do the following 1. Make the data set wide, counting the observations so the data looks like this: ID | Count of Preference A | Count of Preference B 123 | 1 | 1 ... Using table1- dcast(data,ID ~ Page,fun.aggregate=length,value_var='Page' ) 2. Create a correlation matrix of preferences cor(table2[,-1]) How would I restrict my correlation to show preferences that met a minimum sample threshold? Can you confirm if the two following commands do the same thing? What would I do from here (or am I taking the wrong approach) table1- dcast(data,Page ~ Page,fun.aggregate=length,value_var='Page' ) table2- with(data, table(Page,Page)) Hi Peter, An easy way to visualize set intersections is the intersectDiagram function in the plotrix package. This will display the counts or percentages of each type of intersection. Your data could be passed like this: choices-data.frame(IDs=sample(1:20,50,TRUE), sample(LETTERS[1:4],50,TRUE)) library(plotrix) intersectDiagram(choices) This example is a bit messy, as it will generate quite a few repeated choices that will be ignored by intersectDiagram, but it should give you the idea. Jim __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] apply lm function to dataset split by two variables
Hi: Here's one way to do it with the plyr package: dd - read.table(textConnection( yearsps cm w 200950 16 22 200950 17 42 200950 18 45 200951 15 45 200951 16 53 200951 17 73 201050 15 22 201050 16 41 201050 16 21 201050 17 36 201051 15 43 201051 16 67 201051 17 79), header = TRUE) closeAllConnections() library('plyr') # Input a data frame, output a list of lm objects modlist - dlply(dd, .(sps, year), function(d) lm(w ~ cm, data = d)) # For use in plyr's ldply() function, the utility function should # return a data frame. We save some effort in simple linear regression # by noting that the two-sided p-value of the t-test of zero slope is the # same as that of the overall F test: extractfun - function(m) { cf - coef(m) tinfo - summary(m)$coefficients[2, c(2, 4)] r2 - summary(m)$r.squared data.frame(intercept = cf[1], slope = cf[2], n = length(resid(m)), slope.se = tinfo[1], pval = tinfo[2], Rsq = r2) } # Take a list (of models) as input and output a data frame: ldply(modlist, extractfun) sps year intercept slope n slope.se pval Rsq 1 50 2009 -159.1667 11.5 3 4.907477 0.2567749 0.8459488 2 50 2010 -82. 7.0 4 7.141428 0.4303481 0.3245033 3 51 2009 -167. 14.0 3 3.464102 0.1544210 0.9423077 4 51 2010 -225. 18.0 3 3.464102 0.1210377 0.9642857 HTH, Dennis On Wed, Sep 28, 2011 at 1:41 AM, Elena Guijarro elena.guija...@vi.ieo.es wrote: Dear all, I am not fluent in R and am struggling to 1) apply a lm to a weight-size dataset, thus the model has to run separately for each species, each year; 2) extract coefs, r-squared, n, etc. The data look like this: year sps cm w 2009 50 16 22 2009 50 17 42 2009 50 18 45 2009 51 15 45 2009 51 16 53 2009 51 17 73 2010 50 15 22 2010 50 16 41 2010 50 16 21 2010 50 17 36 2010 51 15 43 2010 51 16 67 2010 51 17 79 The following script works for data from a single year, but I don't find a way to subset the data by sps AND year and get the function running: f - function(data) lm(log(w) ~ log(cm+0.5), data = data) v - lapply(split(data, data$sps), f) and then I can extract the data with this script from Peter Solymos (although I do not get the number of points used in the analysis): myFun - function(lm) { out - c(lm$coefficients[1], lm$coefficients[2], length(lm$run1$model$y), summary(lm)$coefficients[2,2], pf(summary(lm)$fstatistic[1], summary(lm)$fstatistic[2], summary(lm)$fstatistic[3], lower.tail = FALSE), summary(lm)$r.squared) names(out) - c(intercept,slope,n,slope.SE,p.value,r.squared) return(out)} results - list() for (i in 1:length(v)) results[[names(v)[i]]] - myFun(v[[i]]) as.data.frame(results) I have checked the plyr package, but the example that fits my data best uses a for loop and I would like to avoid these. I have also tried the following (among many other options) without results: v-tapply(data$w,list(data$cm,data$year),f) Error in is.function(FUN) : 'FUN' is missing Any ideas? Thanks for your help, Elena [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] apply lm function to dataset split by two variables
At 09:41 28/09/2011, Elena Guijarro wrote: Dear all, I am not fluent in R and am struggling to 1) apply a lm to a weight-size dataset, thus the model has to run separately for each species, each year; 2) extract coefs, r-squared, n, etc. The data look like this: yearsps cm w 200950 16 22 200950 17 42 200950 18 45 200951 15 45 200951 16 53 200951 17 73 201050 15 22 201050 16 41 201050 16 21 201050 17 36 201051 15 43 201051 16 67 201051 17 79 The following script works for data from a single year, but I don't find a way to subset the data by sps AND year and get the function running: I think lmList from the nlme package does this for you. It comes with some other helpful extractors or you can write your own as you have done. Personally I would return a list rather than a vector but that is a matter of taste. f - function(data) lm(log(w) ~ log(cm+0.5), data = data) v - lapply(split(data, data$sps), f) and then I can extract the data with this script from Peter Solymos (although I do not get the number of points used in the analysis): myFun - function(lm) { out - c(lm$coefficients[1], lm$coefficients[2], length(lm$run1$model$y), summary(lm)$coefficients[2,2], pf(summary(lm)$fstatistic[1], summary(lm)$fstatistic[2], summary(lm)$fstatistic[3], lower.tail = FALSE), summary(lm)$r.squared) names(out) - c(intercept,slope,n,slope.SE,p.value,r.squared) return(out)} results - list() for (i in 1:length(v)) results[[names(v)[i]]] - myFun(v[[i]]) as.data.frame(results) I have checked the plyr package, but the example that fits my data best uses a for loop and I would like to avoid these. I have also tried the following (among many other options) without results: v-tapply(data$w,list(data$cm,data$year),f) Error in is.function(FUN) : 'FUN' is missing Any ideas? Thanks for your help, Elena [[alternative HTML version deleted]] Michael Dewey i...@aghmed.fsnet.co.uk http://www.aghmed.fsnet.co.uk/home.html __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Problems using the 'HPloglik' function in the SDE package
Hello all, I am trying to produce the closed-form Ait-Sahalia approximation to the log-likelihood function of a diffusion process but the function arguments I am passing to the function result in NaN being returned. I've checked the 'Transform','S' and 'M' functions and the problem doesn't seem to be with these. Does anyone have any ideas why this isn't working? My code is reproduced below for info. Thanks in advance. Regards, Steven ## simulate sample of diffusion process to be estimated # define drift and diffusion coefficients for SDE d - expression(x^(-1) - 1 + x - x^2) s - expression(x^1.3) s.x - expression(1.3*(x^0.3)) # simulate process using the Milstein approximation X - sde.sim(t0=0,T=100,X0=1,N=1000,drift=d,sigma=s,sigma.x=s.x,method=milstein) ## define the Lamperti transform function Transform - function(t,x,theta) (1/(theta[5]*(theta[6]-1)))*(x^(1-theta[6])) ## define the diffusion function S - function(t,x,theta) theta[5]*(x^theta[6]) ## define the transformed drift function and it's first six derivatives m0 - function(t,x,theta) { b - theta[5]*(theta[6]-1) p1 - (theta[6]+1)/(theta[6]-1) p2 - theta[6]/(theta[6]-1) p3 - (2-theta[6])/(1-theta[6]) (-1/theta[5])*(theta[1]*((b*x)^p1) - theta[2]*((b*x)^p2) + theta[3]*b*x - theta[4]*((b*x)^p3) - 0.5*theta[6]*(theta[5]^2)*((b*x)^(-1))) } m1 - function(t,x,theta) { b - theta[5]*(theta[6]-1) p1 - (theta[6]+1)/(theta[6]-1) p2 - theta[6]/(theta[6]-1) p3 - (2-theta[6])/(1-theta[6]) (-b/theta[5])*(theta[1]*(p1*(b*x)^(p1-1)) - p2*theta[2]*((b*x)^(p2-1)) + theta[3] - p3*theta[4]*((b*x)^(p3-1)) + 0.5*theta[6]*(theta[5]^2)*((b*x)^(-2))) } m2 - function(t,x,theta) { b - theta[5]*(theta[6]-1) p1 - (theta[6]+1)/(theta[6]-1) p2 - theta[6]/(theta[6]-1) p3 - (2-theta[6])/(1-theta[6]) (-(b^2)/theta[5])*(theta[1]*(p1*(p1-1)*(b*x)^(p1-2)) - p2*(p2-1)*theta[2]*((b*x)^(p2-2)) - p3*(p3-1)*theta[4]*((b*x)^(p3-2)) - theta[6]*(theta[5]^2)*((b*x)^(-3))) } m3 - function(t,x,theta) { b - theta[5]*(theta[6]-1) p1 - (theta[6]+1)/(theta[6]-1) p2 - theta[6]/(theta[6]-1) p3 - (2-theta[6])/(1-theta[6]) (-(b^3)/theta[5])*(theta[1]*(p1*(p1-1)*(p1-2)*(b*x)^(p1-3)) - p2*(p2-1)*(p2-2)*theta[2]*((b*x)^(p2-3)) - p3*(p3-1)*(p3-2)*theta[4]*((b*x)^(p3-3)) + 3*theta[6]*(theta[5]^2)*((b*x)^(-4))) } m4 - function(t,x,theta) { b - theta[5]*(theta[6]-1) p1 - (theta[6]+1)/(theta[6]-1) p2 - theta[6]/(theta[6]-1) p3 - (2-theta[6])/(1-theta[6]) (-(b^4)/theta[5])*(theta[1]*(p1*(p1-1)*(p1-2)*(p1-3)*(b*x)^(p1-4)) - p2*(p2-1)*(p2-2)*(p2-3)*theta[2]*((b*x)^(p2-4)) - p3*(p3-1)*(p3-2)*(p3-3)*theta[4]*((b*x)^(p3-4)) - 12*theta[6]*(theta[5]^2)*((b*x)^(-5))) } m5 - function(t,x,theta) { b - theta[5]*(theta[6]-1) p1 - (theta[6]+1)/(theta[6]-1) p2 - theta[6]/(theta[6]-1) p3 - (2-theta[6])/(1-theta[6]) (-(b^5)/theta[5])*(theta[1]*(p1*(p1-1)*(p1-2)*(p1-3)*(p1-4)*(b*x)^(p1-5)) - p2*(p2-1)*(p2-2)*(p2-3)*(p2-4)*theta[2]*((b*x)^(p2-5)) - p3*(p3-1)*(p3-2)*(p3-3)*(p3-4)*theta[4]*((b*x)^(p3-5)) + 60*theta[6]*(theta[5]^2)*((b*x)^(-6))) } m6 - function(t,x,theta) { b - theta[5]*(theta[6]-1) p1 - (theta[6]+1)/(theta[6]-1) p2 - theta[6]/(theta[6]-1) p3 - (2-theta[6])/(1-theta[6]) (-(b^6)/theta[5])*(theta[1]*(p1*(p1-1)*(p1-2)*(p1-3)*(p1-4)*(p1-5)*(b*x)^(p1-6)) - p2*(p2-1)*(p2-2)*(p2-3)*(p2-4)*(p2-5)*theta[2]*((b*x)^(p2-6)) - p3*(p3-1)*(p3-2)*(p3-3)*(p3-4)*(p3-5)*theta[4]*((b*x)^(p3-6)) - 360*theta[6]*(theta[5]^2)*((b*x)^(-7))) } # now create list consisting of m0..m6 M - list(m0,m1,m2,m3,m4,m5,m6) ## use HPloglik function to calculate log-likelihood function at the true parameter values HPloglik(X,c(1,1,1,1,1,1.3),Moments,Transform,S) -- S [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] how to solve a simple discrete Bayesian Belief Network?
Can somebody save-me? Thanks in advance! #R script: #trying to find out how solve a discrete Bayesian Belief Network. #option: using 'catnet' package #BEGIN library(catnet) cnet - cnNew(nodes = c(a, b, c), cats = list(c(1, 2), c(1, 2), c(1, 2)), parents = list(NULL, c(1), c(1, 2)), probs = list(c(0.2, 0.8), list(c(0.6, 0.4), c(0.4, 0.6)), list(list(c(0.3, 0.7), c(0.7, 0.3)), list(c(0.9, 0.1), c(0.1, 0.9) #what is the probability of a=1? cnNodeMarginalProb(cnet,node=1)[1] #0.2 #what is the probability of b=2? cnNodeMarginalProb(cnet,node=2)[2] #0.56 #what is the probability of c=1? cnNodeMarginalProb(cnet,node=3)[1] #0.428 #but how can I answer questions like: #what is the probability of a=1 given that c=1? i.e. P(a=1|c=1)? #or what is the probability of a=1 given that b=2 and c=1? i.e. P(a=1|b=2,c=1)? #END -- Marcio Pupin Mello Survey Engineer PhD candidate in Remote Sensing National Institute for Space Research (INPE) - Brazil Laboratory of Remote Sensing in Agriculture and Forestry (LAF) www.dsr.inpe.br/~mello __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Not saving plot
Hi I got a strange problem. I got this code: poc-function(USER=nana,REGISTRY=nana,...){ require(RMySQL) require(R2HTML) require(lattice) ... (removed the dataprocessing) toReturn=capture.output(HTML(as.title(En liten rapport!),file=,align=center)) fileDir-paste(/home/joel/tmpR/rikssvikt/,USER,sep=); fileName-tempfile(pattern=UTT1, tmpdir=); try(png(paste(fileDir,fileName,.png,sep=))); barchart(toPlot,xlab=Antal körningar); dev.off(); gfn -paste(filemanager?USERID=,USER,file=,fileName,.png,sep=); tmp-capture.output(HTMLInsertGraph(file=,GraphFileName=gfn)); toReturn-paste(toReturn,substr(tmp,1,nchar(tmp)-8)); return(paste(div,toReturn[2],/div,sep=)) } When I just run the code without its being in the function in works and saves the plot at the designated location. But when I instead try to run the code when its inside the function it dossent save the plot at all. Anyone know why that is? //joel -- View this message in context: http://r.789695.n4.nabble.com/Not-saving-plot-tp3850981p3850981.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Negative Quartile
This mailing list is about R help, not basic statistics help. Your message is full of what appear to be misconceptions, though it could be just something basic that affects the whole thing. You should consult with a statistician locally for most effective communication. --- Jeff Newmiller The . . Go Live... DCN:jdnew...@dcn.davis.ca.us Basics: ##.#. ##.#. Live Go... Live: OO#.. Dead: OO#.. Playing Research Engineer (Solar/Batteries O.O#. #.O#. with /Software/Embedded Controllers) .OO#. .OO#. rocks...1k --- Sent from my phone. Please excuse my brevity. Damian Abalo wardamo.z...@gmail.com wrote: Hello, I have a doubt, but it is more statistic than just about R: How the people deal usually with negative percentile and quartile? In my concrete case, I want to know the distribution of an error, so the nearest it is to 0 the better (I think it would be optimun to have the nearest values to 0 in the lower percentiles). The best result would be making the percentile of the absolute value (without sign), but the sing is also important, and I need to keep it. This results in the big negative values to be in the lower percentiles, which are usually understanded as the best cases, since we are working with an error. My question is, what is the usual thing done in this cases? keep the sign, and read the lower percentiles as bad cases also (being the optimal ones the center ones) or try to somehow get the signless distribution keeping the signs somehow? And if this is the case, how can this be done? Thanks for your time, Best regards _ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Asking an old housekeeping question...
My bad. I know I've asked this some time back but I'm afraid I've lost the post along the way and am not having much luck searching the net. Don't want to pad my R setup with orphan packages. Is there a small snippet of code or even a single command that would allow me to check the packages I do use for their dependencies and then I can chuck those that don't turn up as wanted on the journey. Thanks in advance. Later days... -- Brian Lunergan Nepean, Ontario Canada __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Asking an old housekeeping question...
??dependencies shows up (on my system) tools::package.dependencies tools::dependsOnPkgs The first looks for dependencies; the second for reverse dependencies. -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Brian Lunergan Sent: 28 September 2011 13:55 To: R_Help List Subject: [R] Asking an old housekeeping question... My bad. I know I've asked this some time back but I'm afraid I've lost the post along the way and am not having much luck searching the net. Don't want to pad my R setup with orphan packages. Is there a small snippet of code or even a single command that would allow me to check the packages I do use for their dependencies and then I can chuck those that don't turn up as wanted on the journey. Thanks in advance. Later days... -- Brian Lunergan Nepean, Ontario Canada __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. *** This email and any attachments are confidential. Any use...{{dropped:8}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Not saving plot
Seems to have something todo with barchart. Cuz if I just change it to plot it works. But I want to use barchart due to the graphical advantages so do anyone have an ide. -- View this message in context: http://r.789695.n4.nabble.com/Not-saving-plot-tp3850981p3851255.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Not saving plot
On Sep 28, 2011, at 8:19 AM, Joel wrote: Hi I got a strange problem. I got this code: poc-function(USER=nana,REGISTRY=nana,...){ require(RMySQL) require(R2HTML) require(lattice) ... (removed the dataprocessing) toReturn=capture.output(HTML(as.title(En liten rapport!),file=,align=center)) fileDir-paste(/home/joel/tmpR/rikssvikt/,USER,sep=); fileName-tempfile(pattern=UTT1, tmpdir=); try(png(paste(fileDir,fileName,.png,sep=))); barchart(toPlot,xlab=Antal körningar); dev.off(); gfn -paste(filemanager?USERID=,USER,file=,fileName,.png,sep=); tmp-capture.output(HTMLInsertGraph(file=,GraphFileName=gfn)); toReturn-paste(toReturn,substr(tmp,1,nchar(tmp)-8)); return(paste(div,toReturn[2],/div,sep=)) } When I just run the code without its being in the function in works and saves the plot at the designated location. But when I instead try to run the code when its inside the function it dossent save the plot at all. Anyone know why that is? Hi, Might this be related to R FAQ 7.22? http://cran.r-project.org/doc/FAQ/R-FAQ.html Cheers, Ben //joel -- View this message in context: http://r.789695.n4.nabble.com/Not-saving-plot-tp3850981p3850981.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Ben Tupper Bigelow Laboratory for Ocean Sciences 180 McKown Point Rd. P.O. Box 475 West Boothbay Harbor, Maine 04575-0475 http://www.bigelow.org __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Password protected R Repository
Prof Brian Ripley ripley at stats.ox.ac.uk writes: Who said that RCurl::getURL worked with install.packages? (At least, I assume this is from RCurl: you did not mention it.) install.packages() first calls available.packages(), and that uses download.file to get the PACKAGES[.gz] file. It then calls download.file to get the packages. So please read the help for download.file (as the help pages say), and try the solutions described there. On Wed, 28 Sep 2011, Stefan Petersson wrote: The helpfiles for 'download.file' was not that helpful. But maybe it's just me not being able to read them correctly. I tried to call install.packages with the 'method=wget', and hoped for a username and password dialog. But no luck. Other than that, I see no arguments that relates to my problem under ?download.file. Which btw is 'installing an R library from a password protected URL (Apache Basic Authentication)'. Actually, nobody said that RCurl::getURL would work with install.packages, but from what was written in a post somewhere I jumped to the (false) conclusion that it would work. That's why I tried it. Any hints would be greatly appreciated! __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Not saving plot
On Wed, 28 Sep 2011, Joel wrote: Seems to have something todo with barchart. Cuz if I just change it to plot it works. But I want to use barchart due to the graphical advantages so do anyone have an ide. Joel, Perhaps. Do you want to save your figure to disk as a .pdf, .jpg, or .png? If so, you need to tell R that's what you want. For example, jpg(mybarchart.jpg) now enter your barchart creation command dev.off() This sequence directs the output of the figure from the screen to the named file. Afterwards you need to return the display to the screen or any other plotting command will be saved in that one file, overwriting whatever it contains. Rich __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How does the survfit.coxph calculate the survival value?
The answers to your questions consume an entire chapter of my book, so it is difficult to write a reasonable email response. When there are time dependent covariates, the creation and interpretation of a survival curve is particularly subtle. The survfit.coxph routine calculates the usual estimates: Breslow, Fleming-Harrington, or Kalbfleisch-Prentice; these differ in how they handle ties but the numerical results are usually very close. A treatise on the formulas, theory, and derivation of these is beyond the scope of this message. Terry Therneau --- begin included message --- I am in an urgent using R to fit the cox proportional model. My data is a data frame including 100 individuals which are software stress tests. There are time-to-failure and six covariates meaning the system resource (every 4 minutes). Is it possible to use R to fit the Cox model? I used *coxph* to estimated the coefficients for both time-independent and time-dependent covariates. And next I need to calculated the survival and hazard rate for cox model with time-independent and that with time-dependent covariates. I see many people use *survfit.coxph *to get the survival. Could anybody please tell me how is the survival value calculated? For example, what kind of procedure. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Not saving plot
You sir are a hero! It works now thanks to your fast and accurate answer. Thanks for the help. //joel -- View this message in context: http://r.789695.n4.nabble.com/Not-saving-plot-tp3850981p3851331.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] dump.frames, debugger and the ... argument
Dear List, i run R non interactively and use option(error = dump.frames) to reconstruct the cause of errors for debugging. One of the functions in the call stack however uses the ... (dot dot dot) argument. When I run debugger() and try to browse to this functions' environment I get the following error message: Error in get(.obj, envir = dump[[.selection]]) : argument ... is missing, with no default Some googleing showed that this seems to be a more common problem but I was not able to find any solution. Any ideas? Jannis __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] survival analysis: interval censored data
You have still not given me enough information to reproduce your problem. Why doesn't it include all years? I have no way of knowing, since we have no data. --- begin included message -- halo david when I use type= 'interval' Call: survfit(formula = Surv(ingreso, fecha, estado, type = interval) ~ categoria) and when I use just Call: survfit(formula = Surv(ingreso, fecha, estado) ~ categoria) I don t know why when I use type = interval it does not survival calculed for very year regards __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Download statistics for a package
Hi, How can I get information on how many times a particular package has been downloaded from CRAN? Thanks, Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, Division of Geriatric Medicine and Gerontology School of Medicine Johns Hopkins University Ph. (410) 502-2619 email: rvarad...@jhmi.edumailto:rvarad...@jhmi.edu [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] GLMER Syntax Question
I was wondering is someone can explain me the differences between these (random slopes and intercept) models model1 - glmer(output~A+B+C+(A+B+C | F) ) model2 - glmer(output~A+B+C+(A | F)+(B | F)+(C | F) Thanks. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] GLMER Syntax Question
Ubuntu Diego ubuntu.diego at gmail.com writes: I was wondering is someone can explain me the differences between these (random slopes and intercept) models model1 - glmer(output~A+B+C+(A+B+C | F) ) model2 - glmer(output~A+B+C+(A | F)+(B | F)+(C | F) You should ask this question on the r-sig-mixed-models mailing list. sincerely Ben Bolker __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] survival analysis: interval censored data
hallo terry: I attached araceae data set, when I use this: surara-survfit(Surv(time,time2,event)~categoria) Call: survfit(formula = Surv(time, time2, event) ~ categoria) records n.max n.start events median 0.95LCL 0.95UCL categoria=C 94 63 0 21 NA NA NA categoria=E 111 77 0 26 NA NA NA summary(surara) Call: survfit(formula = Surv(time, time2, event) ~ categoria) categoria=C time n.risk n.event entered censored survival std.err lower 95% CI upper 95% CI 2006 63 5 0 23 0.921 0.0341 0.856 0.990 2007 35 2 30 1 0.868 0.0483 0.778 0.968 2008 62 5 1 3 0.798 0.0536 0.700 0.910 2009 55 4 0 5 0.740 0.0570 0.636 0.861 2010 46 5 0 41 0.660 0.0611 0.550 0.791 categoria=E time n.risk n.event entered censored survival std.err lower 95% CI upper 95% CI 2005 71 7 3 0 0.901 0.0354 0.835 0.973 2006 67 2 0 22 0.875 0.0391 0.801 0.955 2007 43 1 36 1 0.854 0.0432 0.774 0.943 2008 77 5 0 8 0.799 0.0469 0.712 0.896 2009 64 4 1 3 0.749 0.0502 0.657 0.854 2010 58 7 0 51 0.658 0.0545 0.560 0.774 but whe I included type=interval, suraraint-survfit(Surv(time,time2,event,type='interval')~categoria) # falta arreglar lo del intervalo!!! summary(suraraint) Call: survfit(formula = Surv(time, time2, event, type = interval) ~ categoria) categoria=C time n.risk n.event survival std.err lower 95% CI upper 95% CI 2004 95.00 13.14 0.862 0.0354 0.795 0.934 2007 31.86 7.19 0.667 0.0695 0.544 0.818 2008 1.67 1.67 0.000 NaN NA NA categoria=E time n.risk n.event survival std.err lower 95% CI upper 95% CI 2004 112.0 18.47 0.835 0.0351 0.769 0.907 2005 40.5 1.06 0.813 0.0401 0.738 0.896 2007 37.5 7.46 0.651 0.0620 0.540 0.785 it does not survival calculed for very year I have a one-year interval between each census De: Terry Therneau thern...@mayo.edu Para: Ruth Arias rueu...@yahoo.es CC: r-help@r-project.org Enviado: miércoles 28 de septiembre de 2011 16:00 Asunto: Re: survival analysis: interval censored data You have still not given me enough information to reproduce your problem. Why doesn't it include all years? I have no way of knowing, since we have no data. --- begin included message -- halo david when I use type= 'interval' Call: survfit(formula = Surv(ingreso, fecha, estado, type = interval) ~ categoria) and when I use just Call: survfit(formula = Surv(ingreso, fecha, estado) ~ categoria) I don t know why when I use type = interval it does not survival calculed for very year regardstimefamilia genero categoria time2 event 2004Araceae Anthurium C 20100 2004Araceae PhilodendronC 20061 2004Araceae Anthurium C 20100 2004Araceae Anthurium C 20100 2004Araceae Anthurium C 20100 2004Araceae PhilodendronC 20060 2004Araceae PhilodendronC 20060 2004Araceae Anthurium C 20100 2004Araceae Anthurium C 20090 2004Araceae Anthurium C 20060 2004Araceae Anthurium C 20060 2004Araceae PhilodendronC 20060 2004Araceae PhilodendronC 20060 2004Araceae Anthurium C 20100 2004Araceae PhilodendronC 20101 2004Araceae PhilodendronC 20060 2004Araceae PhilodendronC 20060 2004Araceae PhilodendronC 20080 2004Araceae Araceae C 20060 2004Araceae Anthurium C 20100 2004Araceae Anthurium C 20081 2004Araceae PhilodendronC 20070 2004Araceae Anthurium C 20080 2004Araceae Anthurium C 20100 2004Araceae Anthurium C 20101 2004Araceae PhilodendronE 20080 2004Araceae Anthurium E 20060 2004Araceae Anthurium E 20091 2004Araceae Anthurium E 20060 2004Araceae Anthurium E 20081 2004Araceae Anthurium E 20100 2004Araceae Anthurium E 20100 2004Araceae Anthurium E 20100 2004Araceae Anthurium E 20100 2004Araceae Anthurium
Re: [R] Password protected R Repository
On 28.09.2011 15:51, Stefan Petersson wrote: Prof Brian Ripleyripleyat stats.ox.ac.uk writes: Who said that RCurl::getURL worked with install.packages? (At least, I assume this is from RCurl: you did not mention it.) install.packages() first calls available.packages(), and that uses download.file to get the PACKAGES[.gz] file. It then calls download.file to get the packages. So please read the help for download.file (as the help pages say), and try the solutions described there. On Wed, 28 Sep 2011, Stefan Petersson wrote: The helpfiles for 'download.file' was not that helpful. But maybe it's just me not being able to read them correctly. Yes, looks like this is the case. I tried to call install.packages with the 'method=wget', and hoped for a username and password dialog. But no luck. The help page says if proper values are stored in the configuration file for wget, so why do you expect a dialog? Best, Uwe Ligges Other than that, I see no arguments that relates to my problem under ?download.file. Which btw is 'installing an R library from a password protected URL (Apache Basic Authentication)'. Actually, nobody said that RCurl::getURL would work with install.packages, but from what was written in a post somewhere I jumped to the (false) conclusion that it would work. That's why I tried it. Any hints would be greatly appreciated! __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Wilcox test and data collection
Dear Contributors I have a problem with the collection of data from the results of a test. I need to perform a comparative test over groups of data , recall the value of the pvalue and create a table. My problem is in the way to replicate the analysis over and over again over subsets of data according to a condition. I have this database, called y: gg t1 t2d 40 1 1 2 50 2 2 1 45 1 3 1 49 2 1 1 5 2 1 3 40 1 1 2 where gg takes values from 1 to 100, t1 and t2 have values in (1,2,3) and d in (0,1,2,3) I want to perform tests on the values of gg according to the conditions that d==0 , compare values of gg when t1==1 with values of gg when t1==3 d==1 , compare values of gg when t1==1 with values of gg when t1==3 d==2 , compare values of gg when t1==1 with values of gg when t1==3 .. then d==0 , compare values of gg when t2==1 with values of gg when t2==3 d==1... then collect the data of a statistics and create a table. The procedure i followed is to create sub datasets called m0,m1,m2,m3 corresponding to the values of d, i.e. m0- y[y$d==0,c(7,17,18,19)] m1- y[y$d==1,c(7,17,18,19)] m2- y[y$d==2,c(7,17,18,19)] m3- y[y$d==3,c(7,17,18,19)] then perform the test as follows: x1-wilcox.test(m0[m0$t1==1,1],m0[m0$t1==3,1],correct=FALSE, exact=FALSE, conf.int=TRUE,alternative = c(g)) #ABC ID x2- wilcox.test(m1[m1$t1==1,1],m1[m1$t1==3,1],correct=FALSE, exact=FALSE, conf.int=TRUE,alternative = c(g)) x3- wilcox.test(m2[m2$t1==1,1],m2[m2$t1==3,1],correct=FALSE, exact=FALSE, conf.int=TRUE,alternative = c(g)) x4- wilcox.test(m3[m3$t1==1,1],m3[m3$t1==3,1],correct=FALSE, exact=FALSE, conf.int=TRUE,alternative = c(g)) each of these tests will create an object, say x and then I extract the value statistics using x$statistics. How to automatize this? Thank you for any help you can provide. Francesca [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Download statistics for a package
On 28.09.2011 16:10, Ravi Varadhan wrote: Hi, How can I get information on how many times a particular package has been downloaded from CRAN? You cannot since I guess not even all mirrors will have logs and there is no aggregation of mirror logs happening nor is it clear if we can use mirror logs at all legally in all countries. Uwe Thanks, Ravi. --- Ravi Varadhan, Ph.D. Assistant Professor, Division of Geriatric Medicine and Gerontology School of Medicine Johns Hopkins University Ph. (410) 502-2619 email: rvarad...@jhmi.edumailto:rvarad...@jhmi.edu [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] how to get old packages to work on R 2.12.1
Many thanks for all the suggestions. For me, the update.packages command will not work behind the firewall at my workplace. But the respondents have given me more than enough suggestions to work with. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] dump.frames, debugger and the ... argument
On 28/09/2011 9:59 AM, Jannis wrote: Dear List, i run R non interactively and use option(error = dump.frames) to reconstruct the cause of errors for debugging. One of the functions in the call stack however uses the ... (dot dot dot) argument. When I run debugger() and try to browse to this functions' environment I get the following error message: Error in get(.obj, envir = dump[[.selection]]) : argument ... is missing, with no default Some googleing showed that this seems to be a more common problem but I was not able to find any solution. Any ideas? Well, we're just over a month from a new release. Post a reproducible example and this will likely get fixed. Duncan Murdoch __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Question concerning Box.test
Well, you could to the following to throw out the NAs on a column basis before being passed to Box.test apply(P, 2, function(x) Box.test(na.omit(x))$p.value) and that probably will work for you. This will only throw out the NAs in each column and won't cancel across columns. However, all the standard points about throwing away data apply: specifically, you should ask yourself why you have NA returns on a certain day -- perhaps that is itself deeply significant -- and make sure the pattern is random (enough). Hope this helps, Michael Weylandt On Wed, Sep 28, 2011 at 5:11 AM, Samir Benzerfa benze...@gmx.ch wrote: Many many thanks! My code looked like this: *apply(P,2, FUN=Box.test(P)$p.value)* So the problem was that I didnt include function(x) before Box.test(x)$p.value and that I used my object P in the brackets instead of x. It works fine now. ** ** I have one last question regarding this problem: The help file for Box.test states that Missing values are not handled. Any idea what that technically means? Does it mean that NAs are included in the calculations or does R just skip missing values? Since I have many NA Values in my data I would like R to exclude these values temporarily for the calculations. Specifically Id like R to consider each column without any NAs before performing the Box.test for that column and then pass to the next one, such that I get the test results for each column as if there where no NAs in it. I tried to do this with *na.rm=T or na.action= *as an argument for the function apply, but R refuses (error message: *error in FUN( ); unused argument na.rm=T*). I know that I could clean my data from NAs by using na.omit or na.exclude, but this cancels all rows which include an NA and since I have about 3000 columns (stocks) and each one includes several different NAs I would end up with very few data. Any hints for this issue? ** ** Best, S.B. ** ** ** ** *Von:* R. Michael Weylandt [mailto:michael.weyla...@gmail.com] *Gesendet:* Mittwoch, 28. September 2011 00:39 *An:* Samir Benzerfa; r-help *Betreff:* Re: [R] Question concerning Box.test ** ** Plaintext data looks like this: P - structure(list(X77.BANK = c(0, 0, 0, 0.003181659, -0.006386799, 0.028028724, -0.015347692, -0.015910002, 0.00322897, -0.013062473, 0, -0.03809005, 0.021189299, -0.003460532, -0.010550182, 0.01744389, 0.010139631, -0.01709, 0.010299957, 0), A...A.MATERIAL = c(0, 0, 0, -0.008194479, -0.008352074, 0.008352074, 0.004116566, 0.01608682, 0.019305155, -0.011479818, 0, -0.011791525, -0.00804272, -0.008194479, -0.012589127, 0.016705694, 0, 0.012120633, 0.023271342, -0.007619397 )), .Names = c(X77.BANK, A...A.MATERIAL), row.names = c(NA, 20L), class = data.frame) Can you provide your test code exactly? The following both work for me: R apply(P,2,Box.test) $X77.BANK Box-Pierce test data: newX[, i] X-squared = 3.1825, df = 1, p-value = 0.07443 $A...A.MATERIAL Box-Pierce test data: newX[, i] X-squared = 0.3258, df = 1, p-value = 0.5682 R apply(P,2,function(x) Box.test(x)$p.value) X77.BANK A...A.MATERIAL 0.07443097 0.56815070 Michael On Tue, Sep 27, 2011 at 5:42 PM, Samir Benzerfa benze...@gmx.ch wrote:** ** Please find the plain text output in the attachment. Yes, Im sorry! I apologize for the confusion of rows and columns. Actually, I want to perform the test for each column and not row. *Von:* R. Michael Weylandt [mailto:michael.weyla...@gmail.com] *Gesendet:* Dienstag, 27. September 2011 17:51 *An:* Samir Benzerfa; r-help *Betreff:* Re: [R] Question concerning Box.test Send this again using dput() to give a plain text output and I'll look at it. Also, I think you should probably look into the difference between a row and a column. Michael On Tue, Sep 27, 2011 at 11:48 AM, Samir Benzerfa benze...@gmx.ch wrote:* *** Many thanks for your hint. I tried regular apply now. However, it still doesnt work. Function apply works fine with other regular functions like sum or mean. But for the function Box.test(x, ) it gives me the following error message: *Error in Box.test( ) : * * x is not a vector or univariate time series* * * For simplicity, I tried to do the test with a simple 2x20 Matrix for 2 stocks (see below), but it still does not work. It works well if I do the test individually for each row à Box.test(x[,1], ) and Box.test(x[,2], )** ** BANK.ABC ABC.MATERIAL 1 0.00.0 2 0.00.0 3 0.00.0 4 0.003181659 -0.008194479 5 -0.006386799 -0.008352074 6 0.0280287240.008352074 7 -0.0153476920.004116566 8 -0.0159100020.016086820 9 0.0032289700.019305155 10 -0.013062473 -0.011479818 11 0.0
Re: [R] inset one map on top of another map
Thank you, Ray! That worked. Thank you also, Yihui. I tried your subplot() suggestion, but couldn't get it to work with a map() on a map() Jean Ray Brownrigg ray.brownr...@ecs.vuw.ac.nz wrote on 09/27/2011 07:26:34 PM: On Wed, 28 Sep 2011, Jean V Adams wrote: I want to overlay a small inset map on top of another map, but I can't figure out how to do it. For example, here are two different maps: # map 1 - Ohio map(state, region= ohio) # map 2 - US with Ohio darkened map(state) map(state, region=ohio, fill=T, add=T) I would like to add map 2 as a small inset in the corner of map 1. I have tried: map(state, region= ohio) par(new=TRUE, mar=c(3, 3, 15, 15)) map(state) map(state, region=ohio, fill=T, add=T) but this seems to erase map 1 and replace it with a full size version of map 2. I can successfully overlay an unrelated plot using similar code: map(state, region= ohio) par(new=TRUE, mar=c(3, 3, 15, 15)) plot(1:10, 1:10) So, there must be something about the maps() function that I'm tripping over. I think the problem is that map() 'insists' (in some sense) on a clean frame so it can get the aspect ratio right. Any suggestions? How about something like: map(state, region= ohio, xlim=c(-85, -80), ylim=c(38, 42)) par(usr=c(-216, -66, 24, 144)) # you should be able to 'automate' this calculation map(state, add=T) map(state, region=ohio, fill=T, add=T) HTH Ray Brownrigg I am using R for Windows 2.13.0 and the maps package version 2.1-5. Jean `�.,, (((� `�.,, (((� `�.,, (((� Jean V. Adams Statistician U.S. Geological Survey Great Lakes Science Center 223 East Steinfest Road Antigo, WI 54409 USA [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] dump.frames, debugger and the ... argument
Well, its 2.10, a slightly old version. I know that I am advised to upgrade to the current version. Unfortunately the installed version on the machine I am using is out of my reach. The reproducible example would be: dummyFunct = function(a, ...) { b = 2* a c = d # should cause error return(b) } options(error = quote(dump.frames(dumpto = last.dump, to.file = TRUE))) dummyFunct(1) load(last.dump.rda) debugger(last.dump) # selection of call 1 causes: # Error in get(.obj, envir = dump[[.selection]]) : # argument ... is missing, with no default R version 2.10.1 (2009-12-14) x86_64-unknown-linux-gnu attached base packages: [1] stats graphics grDevices utils datasets methods base loaded via a namespace (and not attached): [1] tools_2.10.1 --- Prof Brian Ripley rip...@stats.ox.ac.uk schrieb am Mi, 28.9.2011: Von: Prof Brian Ripley rip...@stats.ox.ac.uk Betreff: Re: [R] dump.frames, debugger and the ... argument An: Duncan Murdoch murdoch.dun...@gmail.com CC: Jannis bt_jan...@yahoo.de Datum: Mittwoch, 28. September, 2011 15:40 Uhr On Wed, 28 Sep 2011, Duncan Murdoch wrote: On 28/09/2011 9:59 AM, Jannis wrote: Dear List, i run R non interactively and use option(error = dump.frames) to reconstruct the cause of errors for debugging. One of the functions in the call stack however uses the ... (dot dot dot) argument. When I run debugger() and try to browse to this functions' environment I get the following error message: Error in get(.obj, envir = dump[[.selection]]) : argument ... is missing, with no default Some googleing showed that this seems to be a more common problem but I was not able to find any solution. Any ideas? Well, we're just over a month from a new release. Post a reproducible example and this will likely get fixed. Looks like this is exactly problem fixed in r54480 | ripley | 2011-02-18 13:42:03 + (Fri, 18 Feb 2011) | 1 line use tryCatch to ensure that what is there can be examined Just what version of R is this? Duncan Murdoch __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Brian D. Ripley, rip...@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] removing outliers in non-normal distributions
Hello, I'm seeking ideas on how to remove outliers from a non-normal distribution predictor variable. We wish to reset points deemed outliers to a truncated value that is less extreme. (I've seen many posts requesting outlier removal systems. It seems like most of the replies center around why do you want to remove them, you shouldn't remove them, it depends, etc. so I've tried to add a lot of notes below in an attempt to answer these questions in advance.) Currently we Winsorize using the quantile function to get the new high and low values to set the outliers to on the high end and low end (this is summarized legacy code that I am revisiting): #Get the truncated values for resetting: lowq = quantile(dat,probs=perc_low,na.rm=TRUE) hiq = quantile(dat,probs=perc_hi,na.rm=TRUE) #resetting the highest and lowest values with the truncated values: dat[lowqdat] = lowq dat[hiqdat] = hiq What I don't like about this is that it always truncates values (whether they truly are outliers or not) and the perc_low and perc_hi settings are arbitrary. I'd like to be more intelligent about it. Notes: 1) Ranking has already been explored and is not an option at this time. 2) Reminder: these factors are almost always distributed non-normally. 3) For reason I won't get into here, I have to do this pragmatically. I can't manually inspect the data each time I remove outliers. 4) I will be removing outliers from candidate predictor variables. Predictors variable distributions all look very different from each other, so I can't make any generalizations about them. 5) As #4 above indicates, I am building and testing predictor variables for use in a regression model. 6) The predictor variable outliers are usually somewhat informative, but their extremeness is a result of the predictor variable calculation. I think extremeness takes away from the information that would otherwise be available (outlier effect). So I want to remove some, but not all, of their extremeness. For example, percent change of a small number: from say 0.001 to 500. Yes, we want to know that it changed a lot, but 49,999,900% is not helpful and masks otherwise useful information. I'd like to hear your ideas. Thanks in advance! Regards, Ben [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] fGarch - Fitting and APARCH-Modell with fixed delta
Hi there, I'm trying to fit a GJR-GARCH Model using fGarch. I wanted to try that by fitting an APARCH model with a fixed delta of 2 and a non-fixed gamma. So I was simply trying to use: spec - garchFit(~aparch(1,1),data=garchSim(),delta=2) coef(spec) And sometimes, it's working like a charm and delta is indeed exactly 2 in the resulting coefficient vector. Frequently, though, the resulting delta coefficient has some other value, usually somewhere between 0 and 2. Any ideas why? Am I doing something wrong or could this be a bug in fGarch? -- View this message in context: http://r.789695.n4.nabble.com/fGarch-Fitting-and-APARCH-Modell-with-fixed-delta-tp3850702p3850702.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] autocorrelation problem with cointegration
Dear All, I am looking for a cointegration relationship between Spot and Future Price of commodites. The problem i am facing follows: 1. After estimating by Engle-Grranger Method, i found that the residuals are stationary at their level I (o), which is required to fulfill the cointegration test. But the autocorrelation problem arises, as DW statistics is signficantly low 0.50-0.88 for various commodities. My question is shall i go ahead with the results or not. 2. When i use Johansens Method i found at least one cointegrtion relation. But i am confused with lag selection criteria. I use VAR to select the lagselection criteria. But there is autocorrelation problem with the lags it is providing for AIC. Whether i should take first difference of the price level to estimate the VAR, then how to use the same lag selection criteria, when i am using price series in levels to estimate the cointegration by johansen method. Looking forward for your help With sincere regards, Upananda -- View this message in context: http://r.789695.n4.nabble.com/autocorrelation-problem-with-cointegration-tp3851336p3851336.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] GAMs in R : How to put the new data into the model?
For example: GAMs and after stepwise regression: cod-gam(newCO~RH+s(solar,bs=cr)+windspeed+s(transport,bs=cr),family=gaussian (link=log),groupD,methods=REML) I used 10 year meterorology data (2000-2010) to form equation of concentration of carbon monoxide. NOW, I have 2011 meteorology data, I want to use the above GAMs to get the predict value of concentration of carbon monoxide. How can I put the 2011 data into this gam and get the expected value of concentration of Carbon monoxide?? -- View this message in context: http://r.789695.n4.nabble.com/GAMs-in-R-How-to-put-the-new-data-into-the-model-tp3849851p3850473.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] cointegration test
Dear All, I am looking for a cointegration relationship between Spot and Future Price of commodites. The problem i am facing follows: 1. After estimating by Engle-Grranger Method, i found that the residuals are stationary at their level I (o), which is required to fulfill the cointegration test. But the autocorrelation problem arises, as DW statistics is signficantly low 0.50-0.88 for various commodities. My question is shall i go ahead with the results or not. 2. When i use Johansens Method i found at least one cointegrtion relation. But i am confused with lag selection criteria. I use VAR to select the lagselection criteria. But there is autocorrelation problem with the lags it is providing for AIC. Whether i should take first difference of the price level to estimate the VAR, then how to use the same lag selection criteria, when i am using price series in levels to estimate the cointegration by johansen method. Looking forward for your help With sincere regards, Upananda -- You may delay, but time will not. Research Scholar alternative mail id: up...@iitkgp.ac.in Department of HSS, IIT KGP KGP [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] read.csv behaviour
On 09/28/2011 09:23 AM, Mehmet Suzen wrote: This might be obvious but I was wondering if anyone knows quick and easy way of writing out a CSV file with varying row lengths, ideally an initial data read from a CSV file which has the same format. See example below. I found it quite strange that R cannot write it in one go, so one must append blocks or post-process the file, is this true? (even Ruby can do it!!) Otherwise it puts ,, or similar for missing column values in the shorter length rows and fill=FALSE option do not work! I don't want to post-process if possible. See this post: http://r.789695.n4.nabble.com/Re-read-csv-trap-td3301924.html Example that generated Error! writeLines(c(A,B,C,D, 1,a,b,c, 2,f,g,c, 3,a,i,j, 4,a,b,c, 5,d,e,f, 6,g,h,i,j,k,l,m,n), con=file(test.csv)) read.csv(test.csv) try(read.csv(test.csv,fill=FALSE)) Hi Mehmet, The example doesn't need to call file, writeLines does it for you. It worked for me: writeLines(c(A,B,C,D, 1,a,b,c, 2,f,g,c, 3,a,i,j, 4,a,b,c, 5,d,e,f, 6,g,h,i,j,k,l,m,n), con=test.csv) and to get the original object back, use: readLines(test.csv) The reason you can't use read.csv is that it returns a data frame, and that object can't have elements of unequal length. If you want an object with elements of unequal length, try: as.list(readLines(test.csv)) Jim __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] generic argument passing to plot function
Hello I am trying to write a function that would plot timeseries easily... I aim at plotting two time-series on 2 different y axis sharing the same x-axis I succeded in doing so: plotTimeSerie(x,y,y2,[a lot of other args]){ ... plot() axis.POSIXct(side=1) #here I build the x-axis points() #here I plot the first time serie related to y-axis ... axis(side=2,[some args]) text(side=2,text=,[some args]) #here I build the y-left axis and annotate with text ... points() #here I plot the first time serie related to y-right axis axis(side=4,[some args]) text(side=2,text=,[some args]) #here I build the y-right axis and annotate with text } My problem is that I would like the user to be able to specify any parameters he wishes for the axis/points functions. How can I make plotTimeSerie handling any parameters and pass them into the specified functions (points,axis...)? -- View this message in context: http://r.789695.n4.nabble.com/generic-argument-passing-to-plot-function-tp3851599p3851599.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] number of items to replace is not a multiple of replacement length
Please help with this error message drugbook is an 885 x 32 dataframe names(drugbook) [1] DRUG1 DRUG2 DRUG3 DRUG4 DRUG5 [6] DRUG6 DRUG7 DRUG8 DRUG9 DRUG10 [11] DRUG11 DRUG12 DRUG13 DRUG14 DRUG15 [16] DRUG16 DrugDose1 DrugDose2 DrugDose3 DrugDose4 [21] DrugDose5 DrugDose6 DrugDose7 DrugDose8 DrugDose9 [26] DrugDose10 DrugDose11 DrugDose12 DrugDose13 DrugDose14 [31] DrugDose15 DrugDose16 Equivs is a dataframe holding information for a certain subclass of drugs head(Equivs) Name Equiv 1 Alprazolam 0.5 2 Bromazepam 5.5 3 Chlordiazepoxide 25.0 4 Clobazam 20.0 5 Clonazepam 0.5 6 Clorazepate 15.0 What I'm trying to is for each element of drugbook, get the value in Equivs$Equiv, and multiply it by the corresponding drug dose, which will be in the n+16 th column on in the same row of drugbook. Not all of the grabbed names from Drugbook will be in the dataframe Equivs, though. So sometimes the == will return a null. i.e. DrugDose16 is the dose of Drug 16 this I think should work: ( I did try apply and got knotted, so stopped) mydosequivalents=list() for( c in 1:16) { for (r in 1:885){ name-as.character(drugbook[r,c]); equivalent-Equivs$Equiv[Equivs$Name==name]; dosequivalent-drugbook[r,c+16] * equivalent; if (length(dosequivalent)0) mydosequivalents[r][c]-dosequivalent else (mydosequivalents[r][c]-0) } } But it is throwing an error, and I can't figure out why. I remedied a previous error by checking the length of the equivalent. warnings() 50: In mydosequivalents[c][r] - dosequivalent : number of items to replace is not a multiple of replacement length mydosequivalents ends up being full of NULL and zeros Thanks in advance for any help Ross -- View this message in context: http://r.789695.n4.nabble.com/number-of-items-to-replace-is-not-a-multiple-of-replacement-length-tp3851265p3851265.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Rle function to expand for many samples
Dear R experts, code: m-read.table(test.txt,sep='\t',header=TRUE,colClasses=c('character','integer','integer','rep('numeric',150)) s-data.frame(c(rle(m$Sample1)[[2]],rle(m$Sample2)[[2]],rle(m$Sample3)[[2]]),c(rle(m$Sample1)[[1]],rle(m$Sample2)[[1]],rle(m$Sample3)[[1]])) names(s)=c(Values,Probes) #Suppose the test file looks like with ofcourse more samples with values: Chr Start End Sample1 Sample2 Sample3 chr29896633 9896683 0 0 0 chr29896639 9896690 0 0 0 chr214314039143140980 -0.35 0 chr214404467144045020 -0.35 0.32 chr21442171814421777-0.43 -0.35 0.32 chr21603171016031769-0.43 -0.35 0.32 chr21603617816036237-0.43 -0.35 0.45 chr21604866516048724-0.43 -0.35 0.45 chr237491676374917350 0 0.45 chr237702947377030090 0 0 s Values Probes 10.00 4 2 -0.43 4 30.00 2 40.00 2 5 -0.35 6 60.00 2 70.00 3 80.32 3 90.45 3 10 0.00 1 Here s is nothing but sumarising similiar values with Probes giving the count of similiar values. How to I expand rle function to consider 150 samples here I have just shown for 3 samples other than manually expanding for the rest 147 samples? I have the rest of the code for my purpose just stuck with this step only. waiting for your reply, Thanks, Suji -- View this message in context: http://r.789695.n4.nabble.com/Rle-function-to-expand-for-many-samples-tp3851676p3851676.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to Store the executed values in a dataframe rle function
Here one approach: x - read.table(textConnection(Chr start end sample1 sample2 + chr2 9896633 9896683 0 0 + chr2 9896639 9896690 0 0 + chr2 14314039 14314098 0 -0.35 + chr2 14404467 14404502 0 -0.35 + chr2 14421718 14421777 -0.43 -0.35 + chr2 16031710 16031769 -0.43 -0.35 + chr2 16036178 16036237 -0.43 -0.35 + chr2 16048665 16048724 -0.43 -0.35 + chr2 37491676 37491735 0 0 + chr2 37702947 37703009 0 0), header = TRUE, as.is = TRUE) closeAllConnections() result - lapply(c('sample1', 'sample2'), function(.samp){ + # split by breaks in the values + .grps - split(x, cumsum(c(0, diff(x[[.samp]]) != 0))) + + # combine the list of dataframes + .range - do.call(rbind, lapply(.grps, function(.set){ + # create a dataframe of the results + data.frame(Sample = .samp +, Chr = .set$Chr[1L] +, Start = min(.set$start) +, End = max(.set$end) +, Values = .set[[.samp]][1L] +, Probes = nrow(.set) +) + })) + }) # put the list of dataframes together result - do.call(rbind, result) result Sample ChrStart End Values Probes 0 sample1 chr2 9896633 14404502 0.00 4 1 sample1 chr2 14421718 16048724 -0.43 4 2 sample1 chr2 37491676 37703009 0.00 2 01 sample2 chr2 9896633 9896690 0.00 2 11 sample2 chr2 14314039 16048724 -0.35 6 21 sample2 chr2 37491676 37703009 0.00 2 On Mon, Sep 26, 2011 at 10:30 AM, sujitha virith...@gmail.com wrote: Hi group, This is how my test file looks like: Chr start end sample1 sample2 chr2 9896633 9896683 0 0 chr2 9896639 9896690 0 0 chr2 14314039 14314098 0 -0.35 chr2 14404467 14404502 0 -0.35 chr2 14421718 14421777 -0.43 -0.35 chr2 16031710 16031769 -0.43 -0.35 chr2 16036178 16036237 -0.43 -0.35 chr2 16048665 16048724 -0.43 -0.35 chr2 37491676 37491735 0 0 chr2 37702947 37703009 0 0 This is the output that I am expecting: Sample Chr Start End Values Probes sample1 chr2 9896633 14404502 0 4 sample1 chr2 14421718 16048724 -0.43 4 sample1 chr2 37491676 37703001 0 2 sample2 chr2 9896633 9896690 0 2 sample2 chr2 14314039 16048724 -0.35 6 sample2 chr2 37491676 37703009 0 2 Here the Chr value is same but can be any other value aswell so unique among the similar values. The Start for the first line would be the least value until values are similiar (4) then the end would be highest value. The values is the unique value among the common values and probes is number of similar values. Code: m-read.table(test.txt,sep='\t',header=TRUE,colClasses=c('character','integer','integer','numeric','numeric')) #reading the test file s-data.frame(c(rle(m$Sample1)[[2]],rle(m$Sample2)[[2]]),c(rle(m$Sample1)[[1]],rle(m$Sample2)[[1]])) # to get the last 2 columns names(s)=c(Values,Probes) G=1 for(i in 1:length(s$Probes)){ + if(G==1){first-unique(m$Chr[G:s$Probes[i]]) + second-min(m$Start[G:s$Probes[i]]) + third-max(m$End[G:s$Probes[i]]) + c-cbind(first,second,third,s$Values[i],s$Probes[i]) + print (c) + G=(G+s$Probes[i])} + else if((G-1) length(m$Sample1)) { + first-unique(m$Chr[G:(G+s$Probes[i]-1)]) + second-min(m$Start[G:(G+s$Probes[i]-1)]) + third-max(m$End[G:(G+s$Probes[i]-1)]) + c-cbind(first,second,third,s$Values[i],s$Probes[i]) + print (c) + G=(G+s$Probes[i])} + else { + G=1 + first-unique(m$Chr[G:s$Probes[i]]) + second-min(m$Start[G:s$Probes[i]]) + third-max(m$End[G:s$Probes[i]]) + c-cbind(first,second,third,s$Values[i],s$Probes[i]) + print (c) + G=(G+s$Probes[i])} + } so the output is: first second third [1,] chr2 9896633 14404502 0 4 first second third [1,] chr2 14421718 16048724 -0.43 4 first second third [1,] chr2 37491676 37703009 0 2 first second third [1,] chr2 9896633 9896690 0 2 first second third [1,] chr2 14314039 16048724 -0.35 6 first second third [1,] chr2 37491676 37703009 0 2 I get almost the required output but just need 3 modifications to this code: 1) Since this is just a small part of the file (with 2 samples), but my actual file has 150 samples, so how do I write rle function for that? 2) How do I store all the executed c values as a dataframe (here I am just printing the values)? 3) How do I include sample name in execution? Waiting for your reply , Thanks, Suji -- View this message in context: http://r.789695.n4.nabble.com/How-to-Store-the-executed-values-in-a-dataframe-rle-function-tp3843944p3843944.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve?
[R] Handling missing values
Hi everyone, I've got a question concerning missing values in R. I'm looking for an R code which removes all NA values in my data.frame. I found many answers about how to remove rows or columns which contain NA's. My problem however is a bit different. I have a list of vectors. Each vector contains some NA Values that I want to remove. So, at the end all vectors will have different dimensions. My final goal is to apply a function(x) to all of the vectors. Any hints how to do this? Should I change the class=data.frame of my list of vectors? Below I made a simple example how my table looks like and how my result should look like. My table: V1 V2 1 1.2 2.1 2 1.3 2.2 3 NA 4.5 4 NA 1.5 5 1.9 NA My intended result: V1 V2 1.2 2.1 1.3 2.2 1.9 4.5 1.5 Many thanks in advance best regards S.B. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] using the system command
Hi, I started playing around with a function for using StatTransfer (version 10) for importing data. This started as a simple task but it's not working and so now I'm very frustrated. I'm using R version 2.13 on Windows 7. The function, called fn.importData, is: function(file = NULL, type = NULL){ ## ## create statTransfer command file - stcmd ## ext - switch(type, sas = sas7bdat, excel = xls) tmp - paste(copy C:\\Temp\\, file, ., ext, r c:\\temp\\, file, .rdata -T , file, \n\nquit, sep = ) cat(tmp, file = c:/temp/transfer.stcmd, append = FALSE) ## ## transfer using StatTransfer ## system(paste('c:\\program files\\statTransfer10\\st.exe', 'c:\\temp\\transfer.stcmd'), wait = FALSE) ## ## load ## inpt - paste(c:\\temp\\, file, .RData, sep = ) ##return(inpt) load(inpt, .GlobalEnv) } My call using a sas file (vicks.sasa7bdat) is: fn.importData(file = vicks, type = sas) The StatTransfer command file is created and StatTransfer is called correctly. The translation from vicks.sas7bdat to vicks.RData occurs as it should - I can look at them in the temp directory. But I get an error message for the load function: Error in readChar(con, 5L, useBytes = TRUE) : cannot open the connection In addition: Warning message: In readChar(con, 5L, useBytes = TRUE) : cannot open compressed file 'c:\temp\vicks.RData', probable reason 'No such file or directory' If I uncomment the return(inpt) statement in the above function and use the following at the command line: load(fn.importData(file = vicks, type = sas)) then everything works fine. But of course I don't want to do this. Any suggestions as to what I'm doing wrong? Why the error message? Also, the StatTransfer program sometimes fails to close so if I repeatedly call my function I'll have multiple occurrences of StatTransfer running. How can I force it to close after the transfer? Thanks, Walt Walter R. Paczkowski, Ph.D. Data Analytics Corp. 44 Hamilton Lane Plainsboro, NJ 08536 (V) 609-936-8999 (F) 609-936-3733 w...@dataanalyticscorp.com www.dataanalyticscorp.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] fGarch - Fitting and APARCH-Modell with fixed delta
Hi there, an answer from someone who's not an expert in the field but used to play around with time series: The action of simulating a process using with input parameters then estimating the parameters is not an invariant, especially when the process involves nonlinearities. Did you try increasing simulation length and the burn-in to see if you get more stable results ? JC 2011/9/28 Bogey a5462...@nepwk.com: Hi there, I'm trying to fit a GJR-GARCH Model using fGarch. I wanted to try that by fitting an APARCH model with a fixed delta of 2 and a non-fixed gamma. So I was simply trying to use: spec - garchFit(~aparch(1,1),data=garchSim(),delta=2) coef(spec) And sometimes, it's working like a charm and delta is indeed exactly 2 in the resulting coefficient vector. Frequently, though, the resulting delta coefficient has some other value, usually somewhere between 0 and 2. Any ideas why? Am I doing something wrong or could this be a bug in fGarch? -- View this message in context: http://r.789695.n4.nabble.com/fGarch-Fitting-and-APARCH-Modell-with-fixed-delta-tp3850702p3850702.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] removing outliers in non-normal distributions
Dear Ben, not specifically an R-related response, but the best philosophy of defining outliers, as far as I'm concerned, you'll find in Davies and Gather, The identification of multiple outliers, JASA 1993. The idea is that you can only properly define what an outlier is relative to a reference distributional shape. Note that the reference distributional shape is *not* what you believe the underlying distribution is, but rather a device to define outliers as points that clearly exceed the extremes that are normally to be expected under the reference shape. If your reference distributional shape is the normal, you need to set up a robust outlier identification rule that has a low probability to find *any* outlier if the data rally come from a normal. Basically declaring everything outside median +/- c*MAD will work (Hampel identifier), but c needs to depend on the size of the dataset in order to calibrate it so that for example under the normal the probability not to find any outlier is 0.05 (or whatever you want; note that it is always left to the user and to some extent arbitrary to specify a borderline). Some values for c are given in Davies and Gather. There are some slightly more efficient alternatives but the Hampel identifier is simple and still good. The same principle can be applied (although this is not done in the cited paper) if your reference distribution is different. This may for example make sense if you have skew data and you want a skew outlier identification rule. It should be more or less straightforward to adapt the ideas to a lognormal reference distribution. For others, I'm not sure if literature exists but maybe. A lot has happened since 1993. Hope this helps, Christian On Wed, 28 Sep 2011, Ben qant wrote: Hello, I'm seeking ideas on how to remove outliers from a non-normal distribution predictor variable. We wish to reset points deemed outliers to a truncated value that is less extreme. (I've seen many posts requesting outlier removal systems. It seems like most of the replies center around why do you want to remove them, you shouldn't remove them, it depends, etc. so I've tried to add a lot of notes below in an attempt to answer these questions in advance.) Currently we Winsorize using the quantile function to get the new high and low values to set the outliers to on the high end and low end (this is summarized legacy code that I am revisiting): #Get the truncated values for resetting: lowq = quantile(dat,probs=perc_low,na.rm=TRUE) hiq = quantile(dat,probs=perc_hi,na.rm=TRUE) #resetting the highest and lowest values with the truncated values: dat[lowqdat] = lowq dat[hiqdat] = hiq What I don't like about this is that it always truncates values (whether they truly are outliers or not) and the perc_low and perc_hi settings are arbitrary. I'd like to be more intelligent about it. Notes: 1) Ranking has already been explored and is not an option at this time. 2) Reminder: these factors are almost always distributed non-normally. 3) For reason I won't get into here, I have to do this pragmatically. I can't manually inspect the data each time I remove outliers. 4) I will be removing outliers from candidate predictor variables. Predictors variable distributions all look very different from each other, so I can't make any generalizations about them. 5) As #4 above indicates, I am building and testing predictor variables for use in a regression model. 6) The predictor variable outliers are usually somewhat informative, but their extremeness is a result of the predictor variable calculation. I think extremeness takes away from the information that would otherwise be available (outlier effect). So I want to remove some, but not all, of their extremeness. For example, percent change of a small number: from say 0.001 to 500. Yes, we want to know that it changed a lot, but 49,999,900% is not helpful and masks otherwise useful information. I'd like to hear your ideas. Thanks in advance! Regards, Ben [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. *** --- *** Christian Hennig University College London, Department of Statistical Science Gower St., London WC1E 6BT, phone +44 207 679 1698 chr...@stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] how to get old packages to work on R 2.12.1
Joe, Most firewall issues can be resolved by entering setInternet2(use = TRUE) prior to the update.packges call. Rich On Wed, Sep 28, 2011 at 11:23 AM, Joseph Boyer joseph.g.bo...@gsk.comwrote: Many thanks for all the suggestions. For me, the update.packages command will not work behind the firewall at my workplace. But the respondents have given me more than enough suggestions to work with. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.htmlhttp://www.r-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] using the system command
system(wait=FALSE, command) means to return to R without waiting for the command to finish. E.g., on Linux with the following function f - function (seconds, wait) { tf - tempfile() on.exit(unlink(tf)) system(sprintf((sleep %d ; date) '%s', seconds, tf), wait = wait) readLines(tf) } we get f(3, wait=FALSE) character(0) f(3, wait=TRUE) [1] Wed Sep 28 10:23:33 PDT 2011 In 1 calls to f(seconds=0, wait=FALSE), 9915 resulted in character(0), meaning that the shell's processing of file had created the file but date had not yet put any text into it, 81 resuled in an error from readLines that the file did not not exist yet, and 4 succeeded in reading the date from the file. Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com -Original Message- From: Data Analytics Corp. [mailto:w...@dataanalyticscorp.com] Sent: Wednesday, September 28, 2011 10:19 AM To: William Dunlap Subject: Re: [R] using the system command Hi William, When I use wait = TRUE I get a lot of intermediate commands from StatTransfer. For example, if the files it's transferring to already exist, it shows repeated messages asking if I want to overwrite. wait = FALSE seems to avoid this. Thanks, Walt Walter R. Paczkowski, Ph.D. Data Analytics Corp. 44 Hamilton Lane Plainsboro, NJ 08536 (V) 609-936-8999 (F) 609-936-3733 w...@dataanalyticscorp.com www.dataanalyticscorp.com _ On 9/28/2011 1:09 PM, William Dunlap wrote: Why do you use wait=FALSE in the call to system()? Do things work differently without it? Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Data Analytics Corp. Sent: Wednesday, September 28, 2011 9:05 AM To: r-help@r-project.org Subject: [R] using the system command Hi, I started playing around with a function for using StatTransfer (version 10) for importing data. This started as a simple task but it's not working and so now I'm very frustrated. I'm using R version 2.13 on Windows 7. The function, called fn.importData, is: function(file = NULL, type = NULL){ ## ## create statTransfer command file - stcmd ## ext- switch(type, sas = sas7bdat, excel = xls) tmp- paste(copy C:\\Temp\\, file, ., ext, r c:\\temp\\, file, .rdata -T , file, \n\nquit, sep = ) cat(tmp, file = c:/temp/transfer.stcmd, append = FALSE) ## ## transfer using StatTransfer ## system(paste('c:\\program files\\statTransfer10\\st.exe', 'c:\\temp\\transfer.stcmd'), wait = FALSE) ## ## load ## inpt- paste(c:\\temp\\, file, .RData, sep = ) ##return(inpt) load(inpt, .GlobalEnv) } My call using a sas file (vicks.sasa7bdat) is: fn.importData(file = vicks, type = sas) The StatTransfer command file is created and StatTransfer is called correctly. The translation from vicks.sas7bdat to vicks.RData occurs as it should - I can look at them in the temp directory. But I get an error message for the load function: Error in readChar(con, 5L, useBytes = TRUE) : cannot open the connection In addition: Warning message: In readChar(con, 5L, useBytes = TRUE) : cannot open compressed file 'c:\temp\vicks.RData', probable reason 'No such file or directory' If I uncomment the return(inpt) statement in the above function and use the following at the command line: load(fn.importData(file = vicks, type = sas)) then everything works fine. But of course I don't want to do this. Any suggestions as to what I'm doing wrong? Why the error message? Also, the StatTransfer program sometimes fails to close so if I repeatedly call my function I'll have multiple occurrences of StatTransfer running. How can I force it to close after the transfer? Thanks, Walt Walter R. Paczkowski, Ph.D. Data Analytics Corp. 44 Hamilton Lane Plainsboro, NJ 08536 (V) 609-936-8999 (F) 609-936-3733 w...@dataanalyticscorp.com www.dataanalyticscorp.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the
Re: [R] Handling missing values
On Sep 28, 2011, at 11:49 AM, Samir Benzerfa wrote: Hi everyone, I've got a question concerning missing values in R. I'm looking for an R code which removes all NA values in my data.frame. I found many answers about how to remove rows or columns which contain NA's. My problem however is a bit different. I have a list of vectors. Each vector contains some NA Values that I want to remove. So, at the end all vectors will have different dimensions. Then it will no longer be a data.frame. My final goal is to apply a function(x) to all of the vectors. If the function has an na.rm= argument tha tyou cna set to TRUE, it may not be necessary to do all of this. Any hints how to do this? Should I change the class=data.frame of my list of vectors? Below I made a simple example how my table looks like and how my result should look like. lis.tbl - lapply(tbl, function(x) x[!is,na(x)] ) You can then use lapply to get your results from your function: yourres - lapply(lis.tbl, yourfunc) My table: V1 V2 1 1.2 2.1 2 1.3 2.2 3 NA 4.5 4 NA 1.5 5 1.9 NA My intended result: V1 V2 1.2 2.1 1.3 2.2 1.9 4.5 1.5 Many thanks in advance best regards S.B. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Password protected R Repository
The helpfiles for 'download.file' was not that helpful. But maybe it's just me not being able to read them correctly. Yes, looks like this is the case. I tried to call install.packages with the 'method=wget', and hoped for a username and password dialog. But no luck. The help page says if proper values are stored in the configuration file for wget, so why do you expect a dialog? Best, Uwe Ligges Well, I expect a dialog because when I use (for example) ncftp without a conf file, I get a dialog asking me for site, usr and pwd. For me, not being a black belt R user, it's not so strange expecting a dialog of some kind when a usr/pwd is required. But maybe that's just me... And I misunderstood the helpfile. I thought it referred to proxy servers alone, and that it didn't concern Apache Authentication. My error. Thanks for clearing that up. My misunderstanding - Method ‘wget’ can be used with proxy firewalls which require user/password authentication if proper values are stored in the configuration file for ‘wget’. So, I added 'http_user=usr' and 'http_passwd=pwd' to my /etc/wgetrc and run: install.packages(repos='http://www.myServer.se/myRepo/', method='wget', pkgs='myLib', dep=TRUE) Bob is my uncle! __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] compute probabilities on a Bayesian Network (SOLVED)
I studied a little (I mean a lot) and found out a solution. Then I decided to post here aiming to help people who are interested... The solution uses the gRain package... but because one dependence (the graph package) has been removed from the CRAN repository you have to use the follow script to install gRain package (I tested it on Windows 7): #run to install gRain and graph packages chooseBioCmirror() #then choose the nearest one setRepositories() #then use Ctrl key to select all install.packages(c(graph,gRain),dependencies=TRUE)) Then you can run the code to build the Bayesian Network structure of the figure in http://www.dsr.inpe.br/~mello/1727/BNgrapmodel.png #run to build the Bayesian Network shown in Figure library(gRain) v.L-cptable(~L,values=c(0.35,(1-0.35)),levels=c(T,F),normalize=T,smooth=0) v.R-cptable(~R,values=c(0.3,(1-0.3)),levels=c(T,F),normalize=T,smooth=0) v.H-cptable(~H|D,values=c(0.75,(1-0.75),0.2,(1-0.2)),levels=c(T,F),normalize=T,smooth=0) v.D-cptable(~D,values=c(0.05,(1-0.05)),levels=c(T,F),normalize=T,smooth=0) v.C-cptable(~C|S,values=c(0.9,(1-0.9),0.12,(1-0.12)),levels=c(T,F),normalize=T,smooth=0) v.S-cptable(~S|D:H:R:L,values=c(0.60,(1-0.60),0.65,(1-0.65),0.62,(1-0.62),0.67,(1-0.67),0.61,(1-0.61),0.64,(1-0.64),0.61,(1-0.61),0.70,(1-0.70),0.05,(1-0.05),0.09,(1-0.09),0.07,(1-0.07),0.10,(1-0.10),0.06,(1-0.06),0.08,(1-0.08),0.07,(1-0.07),0.12,(1-0.12)),levels=c(T,F),normalize=T,smooth=0) CPT-compileCPT(list(v.L,v.R,v.H,v.D,v.C,v.S)) BN-grain(CPT,smooth=0) #answering the 1st question: # what is the probability of S=T given C=T, L=T, R=F, H=F and D=F? scenaria.q1-list(nodes=c(C,L,R,H,D),states=c(T,T,F,F,F)) querygrain(setFinding(BN,nodes=scenaria.q1$nodes,states=scenaria.q1$states),nodes=S) #answering the 2nd question: # what is the probability of S=T given C=F, L=F, R=T, H=T, and D=T? scenaria.q2-list(nodes=c(C,L,R,H,D),states=c(F,F,T,T,T)) querygrain(setFinding(BN,nodes=scenaria.q2$nodes,states=scenaria.q2$states),nodes=S) #answering the 3rd question: # what is the probability of what is the probability of S=T when my only one evidence is that C=T? scenaria.q3-list(nodes=c(C),states=c(T)) querygrain(setFinding(BN,nodes=scenaria.q3$nodes,states=scenaria.q3$states),nodes=S) On 26/09/2011 18:42, Marcio Pupin Mello wrote: Deal R Users, I'm trying to find out how can I compute probabilities on a Bayesian Network using R. The Bayesian Network I modelled is shown at http://www.dsr.inpe.br/~mello/1727/BNgrapmodel.png and I'd like to know how can I proceed (commands on R) to answer questions like: (1) what is the probability of S=T given C=T, L=T, R=F, H=F and D=F; or (2) what is the probability of S=T given C=F, L=F, R=T, H=T, and D=T; or even (3) what is the probability of S=T when my only one evidence is that C=T (i.e I don't know the other variables values). I used Microsoft Belief Networks to compute the results but I don't know how to do it with R. Looking forward to receiving replies! Thanks in advance, Marcio Pupin Mello -- Marcio Pupin Mello Survey Engineer Ph.D candidate in Remote Sensing National Institute for Space Research (INPE) - Brazil Laboratory of Remote Sensing in Agriculture and Forestry (LAF) www.dsr.inpe.br/~mello __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] how to solve a simple discrete Bayesian Belief Network?
I found a solution using the gRain package... see (but using another example).. I studied a little (I mean a lot) and found out a solution. Then I decided to post here aiming to help people who are interested... The solution uses the gRain package... but because one dependence (the graph package) has been removed from the CRAN repository you have to use the follow script to install gRain package (I tested it on Windows 7): #run to install gRain and graph packages chooseBioCmirror() #then choose the nearest one setRepositories() #then use Ctrl key to select all install.packages(c(graph,gRain),dependencies=TRUE)) Then you can run the code to build the Bayesian Network structure of the figure in http://www.dsr.inpe.br/~mello/1727/BNgrapmodel.png #run to build the Bayesian Network shown in Figure library(gRain) v.L-cptable(~L,values=c(0.35,(1-0.35)),levels=c(T,F),normalize=T,smooth=0) v.R-cptable(~R,values=c(0.3,(1-0.3)),levels=c(T,F),normalize=T,smooth=0) v.H-cptable(~H|D,values=c(0.75,(1-0.75),0.2,(1-0.2)),levels=c(T,F),normalize=T,smooth=0) v.D-cptable(~D,values=c(0.05,(1-0.05)),levels=c(T,F),normalize=T,smooth=0) v.C-cptable(~C|S,values=c(0.9,(1-0.9),0.12,(1-0.12)),levels=c(T,F),normalize=T,smooth=0) v.S-cptable(~S|D:H:R:L,values=c(0.60,(1-0.60),0.65,(1-0.65),0.62,(1-0.62),0.67,(1-0.67),0.61,(1-0.61),0.64,(1-0.64),0.61,(1-0.61),0.70,(1-0.70),0.05,(1-0.05),0.09,(1-0.09),0.07,(1-0.07),0.10,(1-0.10),0.06,(1-0.06),0.08,(1-0.08),0.07,(1-0.07),0.12,(1-0.12)),levels=c(T,F),normalize=T,smooth=0) CPT-compileCPT(list(v.L,v.R,v.H,v.D,v.C,v.S)) BN-grain(CPT,smooth=0) #answering the 1st question: # what is the probability of S=T given C=T, L=T, R=F, H=F and D=F? scenaria.q1-list(nodes=c(C,L,R,H,D),states=c(T,T,F,F,F)) querygrain(setFinding(BN,nodes=scenaria.q1$nodes,states=scenaria.q1$states),nodes=S) #answering the 2nd question: # what is the probability of S=T given C=F, L=F, R=T, H=T, and D=T? scenaria.q2-list(nodes=c(C,L,R,H,D),states=c(F,F,T,T,T)) querygrain(setFinding(BN,nodes=scenaria.q2$nodes,states=scenaria.q2$states),nodes=S) #answering the 3rd question: # what is the probability of what is the probability of S=T when my only one evidence is that C=T? scenaria.q3-list(nodes=c(C),states=c(T)) querygrain(setFinding(BN,nodes=scenaria.q3$nodes,states=scenaria.q3$states),nodes=S) On 28/09/2011 09:03, Marcio Pupin Mello wrote: Can somebody save-me? Thanks in advance! #R script: #trying to find out how solve a discrete Bayesian Belief Network. #option: using 'catnet' package #BEGIN library(catnet) cnet - cnNew(nodes = c(a, b, c), cats = list(c(1, 2), c(1, 2), c(1, 2)), parents = list(NULL, c(1), c(1, 2)), probs = list(c(0.2, 0.8), list(c(0.6, 0.4), c(0.4, 0.6)), list(list(c(0.3, 0.7), c(0.7, 0.3)), list(c(0.9, 0.1), c(0.1, 0.9) #what is the probability of a=1? cnNodeMarginalProb(cnet,node=1)[1] #0.2 #what is the probability of b=2? cnNodeMarginalProb(cnet,node=2)[2] #0.56 #what is the probability of c=1? cnNodeMarginalProb(cnet,node=3)[1] #0.428 #but how can I answer questions like: #what is the probability of a=1 given that c=1? i.e. P(a=1|c=1)? #or what is the probability of a=1 given that b=2 and c=1? i.e. P(a=1|b=2,c=1)? #END -- Marcio Pupin Mello Survey Engineer Ph.D candidate in Remote Sensing National Institute for Space Research (INPE) - Brazil Laboratory of Remote Sensing in Agriculture and Forestry (LAF) www.dsr.inpe.br/~mello __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] number of items to replace is not a multiple of replacement length
dunner wrote on 09/28/2011 08:43:32 AM: Please help with this error message drugbook is an 885 x 32 dataframe names(drugbook) [1] DRUG1 DRUG2 DRUG3 DRUG4 DRUG5 [6] DRUG6 DRUG7 DRUG8 DRUG9 DRUG10 [11] DRUG11 DRUG12 DRUG13 DRUG14 DRUG15 [16] DRUG16 DrugDose1 DrugDose2 DrugDose3 DrugDose4 [21] DrugDose5 DrugDose6 DrugDose7 DrugDose8 DrugDose9 [26] DrugDose10 DrugDose11 DrugDose12 DrugDose13 DrugDose14 [31] DrugDose15 DrugDose16 Equivs is a dataframe holding information for a certain subclass of drugs head(Equivs) Name Equiv 1 Alprazolam 0.5 2 Bromazepam 5.5 3 Chlordiazepoxide 25.0 4 Clobazam 20.0 5 Clonazepam 0.5 6 Clorazepate 15.0 What I'm trying to is for each element of drugbook, get the value in Equivs$Equiv, and multiply it by the corresponding drug dose, which will be in the n+16 th column on in the same row of drugbook. Not all of the grabbed names from Drugbook will be in the dataframe Equivs, though. So sometimes the == will return a null. i.e. DrugDose16 is the dose of Drug 16 this I think should work: ( I did try apply and got knotted, so stopped) mydosequivalents=list() for( c in 1:16) { for (r in 1:885){ name-as.character(drugbook[r,c]); equivalent-Equivs$Equiv[Equivs$Name==name]; dosequivalent-drugbook[r,c+16] * equivalent; if (length(dosequivalent)0) mydosequivalents[r][c]-dosequivalent else (mydosequivalents[r][c]-0) } } But it is throwing an error, and I can't figure out why. I remedied a previous error by checking the length of the equivalent. I think your error is coming from the way that you are trying to populate the list using [r][c]. Are you trying to have mydosequivalents be a list of r vectors each with c elements? warnings() 50: In mydosequivalents[c][r] - dosequivalent : number of items to replace is not a multiple of replacement length mydosequivalents ends up being full of NULL and zeros Thanks in advance for any help Ross Another way: # using my example data drugbook - data.frame(DRUG1=letters[1:5], DRUG2=letters[6:10], DrugDose1=1:5, DrugDose2=6:10) Equivs - data.frame(Name=letters[-c(2, 8)], Equiv=100*(1:24)) mydosequivalents - apply(drugbook[, 1:2], 2, function(x) Equivs$Equiv[match(x, Equivs$Name)])*drugbook[, 3:4] mydosequivalents[is.na(mydosequivalents)] - 0 # this should work for your real data mydosequivalents - apply(drugbook[, 1:16], 2, function(x) Equivs$Equiv[match(x, Equivs$Name)])*drugbook[, 17:32] mydosequivalents[is.na(mydosequivalents)] - 0 Jean [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to Store the executed values in a dataframe rle function
The solution that I sent will handle the 150 different samples; just list the column names in the argument to the top 'lapply'. You don't need the 'rle' in my approach. On Wed, Sep 28, 2011 at 2:13 PM, viritha k virith...@gmail.com wrote: Hi, This is the code that I wrote for 3 samples: code: m-read.table(test.txt,sep='\t',header=TRUE,colClasses=c('character','integer','integer','numeric','numeric','numeric')) s-data.frame(c(rle(m$Sample1)[[2]],rle(m$Sample2)[[2]],rle(m$Sample3)[[2]]),c(rle(m$Sample1)[[1]],rle(m$Sample2)[[1]],rle(m$Sample3)[[1]])) names(s)=c(Values,Probes) c-data.frame(Sample=character(s$Probes),Chr=character(s$Probes),Start=numeric(s$Probes),End=numeric(s$Probes),Values=numeric(s$Probes),Probes=numeric(s$Probes),stringsAsFactors=FALSE) G=1 n=4 for(i in 1:length(s$Probes)){ + if(G==1){c[i,1]-names(m[n]) + c[i,2]-unique(m$Chr[G:s$Probes[i]]) + c[i,3]-min(m$Start[G:s$Probes[i]]) + c[i,4]-max(m$End[G:s$Probes[i]]) + c[i,]-cbind(c[i,1],c[i,2],c[i,3],c[i,4],s$Values[i],s$Probes[i]) + G=(G+s$Probes[i])} + else if((G-1) length(m$Sample1)) { + c[i,1]-names(m[n]) + c[i,2]-unique(m$Chr[G:(G+s$Probes[i]-1)]) + c[i,3]-min(m$Start[G:(G+s$Probes[i]-1)]) + c[i,4]-max(m$End[G:(G+s$Probes[i]-1)]) + c[i,]-cbind(c[i,1],c[i,2],c[i,3],c[i,4],s$Values[i],s$Probes[i]) + G=(G+s$Probes[i])} + else { + G=1 + n=n+1 + c[i,1]-names(m[n]) + c[i,2]-unique(m$Chr[G:s$Probes[i]]) + c[i,3]-min(m$Start[G:s$Probes[i]]) + c[i,4]-max(m$End[G:s$Probes[i]]) + c[i,]-cbind(c[i,1],c[i,2],c[i,3],c[i,4],s$Values[i],s$Probes[i]) + G=(G+s$Probes[i])}} c Sample Chr Start End Values Probes 1 Sample1 chr2 9896633 14404502 0 4 2 Sample1 chr2 14421718 16048724 -0.43 4 3 Sample1 chr2 37491676 37703009 0 2 4 Sample2 chr2 9896633 9896690 0 2 5 Sample2 chr2 14314039 16048724 -0.35 6 6 Sample2 chr2 37491676 37703009 0 2 7 Sample3 chr2 9896633 14314098 0 3 8 Sample3 chr2 14404467 16031769 0.32 3 9 Sample3 chr2 16036178 37491735 0.45 3 10 Sample3 chr2 37702947 37703009 0 1 The problem that I am facing is for expanding rle function for values and probes. Defintely your code looks simpler, but I would like to read the file by just giving the name of the file as written in my code because my original file contains 150 samples,but how to use lapply or rle function for 150 such samples, if my file contain 150 samples similiar to sample1 and sample2. waiting for your reply, Thanks, Suji On Wed, Sep 28, 2011 at 11:37 AM, jim holtman jholt...@gmail.com wrote: Here one approach: x - read.table(textConnection(Chr start end sample1 sample2 + chr2 9896633 9896683 0 0 + chr2 9896639 9896690 0 0 + chr2 14314039 14314098 0 -0.35 + chr2 14404467 14404502 0 -0.35 + chr2 14421718 14421777 -0.43 -0.35 + chr2 16031710 16031769 -0.43 -0.35 + chr2 16036178 16036237 -0.43 -0.35 + chr2 16048665 16048724 -0.43 -0.35 + chr2 37491676 37491735 0 0 + chr2 37702947 37703009 0 0), header = TRUE, as.is = TRUE) closeAllConnections() result - lapply(c('sample1', 'sample2'), function(.samp){ + # split by breaks in the values + .grps - split(x, cumsum(c(0, diff(x[[.samp]]) != 0))) + + # combine the list of dataframes + .range - do.call(rbind, lapply(.grps, function(.set){ + # create a dataframe of the results + data.frame(Sample = .samp + , Chr = .set$Chr[1L] + , Start = min(.set$start) + , End = max(.set$end) + , Values = .set[[.samp]][1L] + , Probes = nrow(.set) + ) + })) + }) # put the list of dataframes together result - do.call(rbind, result) result Sample Chr Start End Values Probes 0 sample1 chr2 9896633 14404502 0.00 4 1 sample1 chr2 14421718 16048724 -0.43 4 2 sample1 chr2 37491676 37703009 0.00 2 01 sample2 chr2 9896633 9896690 0.00 2 11 sample2 chr2 14314039 16048724 -0.35 6 21 sample2 chr2 37491676 37703009 0.00 2 On Mon, Sep 26, 2011 at 10:30 AM, sujitha virith...@gmail.com wrote: Hi group, This is how my test file looks like: Chr start end sample1 sample2 chr2 9896633 9896683 0 0 chr2 9896639 9896690 0 0 chr2 14314039 14314098 0 -0.35 chr2 14404467 14404502 0 -0.35 chr2 14421718 14421777 -0.43 -0.35 chr2 16031710 16031769 -0.43 -0.35 chr2 16036178 16036237 -0.43 -0.35 chr2 16048665 16048724 -0.43 -0.35 chr2 37491676 37491735 0 0 chr2 37702947 37703009 0 0 This is the output that I am expecting: Sample Chr Start End Values Probes sample1 chr2 9896633 14404502 0 4 sample1 chr2 14421718 16048724 -0.43 4 sample1 chr2 37491676 37703001 0 2 sample2 chr2 9896633 9896690 0 2 sample2 chr2 14314039 16048724 -0.35 6 sample2 chr2 37491676 37703009 0 2 Here the Chr value is same but
[R] survexp with large dataframes
Hello, and thank you in advance. I would like to capture the expected survival from a coxph model for subjects in an observational study with recurrent events, but the survexp statement is failing due to memory. I am using R version 2.13.1 (2011-07-08) on Windows XP. My objective is to plot the fitted survival with the Kaplan-Meier plot. Below is the code with output and [unfortunately] errors. Is there something wrong in my use of cluster in generating the proportional hazards model, or is there some syntax to pass it into survexp? Mike dim(dev) [1] 899876 25 mod1 - coxph(Surv(begin.cp, end.cp, event) + ~ age.sex + + plan_type + + uw_load + + cluster(mbr_key) + ,data=dev + ) summary(mod1) Call: coxph(formula = Surv(begin.cp, end.cp, event) ~ age.sex + plan_type + uw_load + cluster(mbr_key), data = dev) n= 899876, number of events= 753324 coef exp(coef) se(coef) robust se z Pr(|z|) age.sex19-34_MALE -0.821944 0.439576 0.005529 0.023298 -35.280 2e-16 *** age.sex35-49_FEMALE 0.058776 1.060537 0.004201 0.018477 3.181 0.00147 ** age.sex35-49_MALE -0.515590 0.597148 0.004634 0.019986 -25.798 2e-16 *** age.sex50-64_FEMALE 0.190940 1.210386 0.004350 0.020415 9.353 2e-16 *** age.sex50-64_MALE -0.127514 0.880281 0.004487 0.021431 -5.950 2.68e-09 *** age.sexCHILD_CHILD -0.327522 0.720707 0.004238 0.017066 -19.192 2e-16 *** plan_typeLOW-0.165735 0.847270 0.002443 0.011080 -14.958 2e-16 *** uw_load1-50 0.215122 1.240014 0.006437 0.029189 7.370 1.71e-13 *** uw_load101-250 0.551042 1.735060 0.003993 0.018779 29.344 2e-16 *** uw_load251+ 0.981660 2.668884 0.003172 0.017490 56.126 2e-16 *** uw_load51-1000.413464 1.512046 0.006216 0.027877 14.832 2e-16 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 exp(coef) exp(-coef) lower .95 upper .95 age.sex19-34_MALE 0.4396 2.27490.42000.4601 age.sex35-49_FEMALE1.0605 0.94291.02281.0996 age.sex35-49_MALE 0.5971 1.67460.57420.6210 age.sex50-64_FEMALE1.2104 0.82621.16291.2598 age.sex50-64_MALE 0.8803 1.13600.84410.9180 age.sexCHILD_CHILD 0.7207 1.38750.69700.7452 plan_typeLOW 0.8473 1.18030.82910.8659 uw_load1-501.2400 0.80641.17111.3130 uw_load101-250 1.7351 0.57631.67241.8001 uw_load251+2.6689 0.37472.57892.7620 uw_load51-100 1.5120 0.66141.43161.5970 Concordance= 0.643 (se = 0 ) Rsquare= 0.205 (max possible= 1 ) Likelihood ratio test= 206724 on 11 df, p=0 Wald test= 9207 on 11 df, p=0 Score (logrank) test = 246358 on 11 df, p=0, Robust = 4574 p=0 (Note: the likelihood ratio and score tests assume independence of observations within a cluster, the Wald and robust score tests do not). dev.fit - survexp( ~ 1, ratetable=mod1, data=dev) Error in survexp.cfit(cbind(as.numeric(X), R), Y, conditional, FALSE, : cannot allocate memory block of size 15.2 Gb __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] svd (via MFA): All coordinates fall on straight lines?
Applying the svd function to my data by way of the FactoMineR package's MFA function: dfmfa - MFA(df, group=c(2,96), type=c(n,c)) the result is that all my data points fall on one of 8 straight parallel lines when projected onto any two axes, e.g., points(dfmfa$ind$coord[, c(1, 2)]) Furthermore, spot-checks indicate that for any pair of axes other than (1, 2) the lines are either horizontal or vertical. Is all this telling me something about my data, and if so, what? Or is it telling me I should be setting some parameters to nondefault values, and if so, which parameters and what values? Thanks, Mark Pundurs Data Analyst - Traffic Nokia Location Commerce, Chicago The information contained in this communication may be C...{{dropped:8}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Robust covariance matrix with NeweyWest()
Dear R-users, I would like to compute a robust covariance matrix of two series of realizations of random variables: ###Begin Example### data - cbind(rnorm(100), rnorm(100)) model - lm(data ~ 1) vcov(model) library(sandwich) NeweyWest(model) #produces an error ###End Example### NeweyWest() produces an error but sandwich(), vcovHAC(), kernHAC, weave(),... do not produce any errors. It seems that the model object does not fit in that special case. Nevertheless, the problem is that I need the robust version of the covariance matrix according to Newey and West (1987, 1994). Any ideas or suggestions to solve the problem? Kind regards, Andy __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to Store the executed values in a dataframe rle function
Hi, This is the code that I wrote for 3 samples: code: m-read.table(test.txt,sep='\t',header=TRUE,colClasses=c('character','integer','integer','numeric','numeric','numeric')) s-data.frame(c(rle(m$Sample1)[[2]],rle(m$Sample2)[[2]],rle(m$Sample3)[[2]]),c(rle(m$Sample1)[[1]],rle(m$Sample2)[[1]],rle(m$Sample3)[[1]])) names(s)=c(Values,Probes) c-data.frame(Sample=character(s$Probes),Chr=character(s$Probes),Start=numeric(s$Probes),End=numeric(s$Probes),Values=numeric(s$Probes),Probes=numeric(s$Probes),stringsAsFactors=FALSE) G=1 n=4 for(i in 1:length(s$Probes)){ + if(G==1){c[i,1]-names(m[n]) + c[i,2]-unique(m$Chr[G:s$Probes[i]]) + c[i,3]-min(m$Start[G:s$Probes[i]]) + c[i,4]-max(m$End[G:s$Probes[i]]) + c[i,]-cbind(c[i,1],c[i,2],c[i,3],c[i,4],s$Values[i],s$Probes[i]) + G=(G+s$Probes[i])} + else if((G-1) length(m$Sample1)) { + c[i,1]-names(m[n]) + c[i,2]-unique(m$Chr[G:(G+s$Probes[i]-1)]) + c[i,3]-min(m$Start[G:(G+s$Probes[i]-1)]) + c[i,4]-max(m$End[G:(G+s$Probes[i]-1)]) + c[i,]-cbind(c[i,1],c[i,2],c[i,3],c[i,4],s$Values[i],s$Probes[i]) + G=(G+s$Probes[i])} + else { + G=1 + n=n+1 + c[i,1]-names(m[n]) + c[i,2]-unique(m$Chr[G:s$Probes[i]]) + c[i,3]-min(m$Start[G:s$Probes[i]]) + c[i,4]-max(m$End[G:s$Probes[i]]) + c[i,]-cbind(c[i,1],c[i,2],c[i,3],c[i,4],s$Values[i],s$Probes[i]) + G=(G+s$Probes[i])}} c Sample ChrStart End Values Probes 1 Sample1 chr2 9896633 14404502 0 4 2 Sample1 chr2 14421718 16048724 -0.43 4 3 Sample1 chr2 37491676 37703009 0 2 4 Sample2 chr2 9896633 9896690 0 2 5 Sample2 chr2 14314039 16048724 -0.35 6 6 Sample2 chr2 37491676 37703009 0 2 7 Sample3 chr2 9896633 14314098 0 3 8 Sample3 chr2 14404467 16031769 0.32 3 9 Sample3 chr2 16036178 37491735 0.45 3 10 Sample3 chr2 37702947 37703009 0 1 The problem that I am facing is for expanding rle function for values and probes. Defintely your code looks simpler, but I would like to read the file by just giving the name of the file as written in my code because my original file contains 150 samples,but how to use lapply or rle function for 150 such samples, if my file contain 150 samples similiar to sample1 and sample2. waiting for your reply, Thanks, Suji On Wed, Sep 28, 2011 at 11:37 AM, jim holtman jholt...@gmail.com wrote: Here one approach: x - read.table(textConnection(Chr start end sample1 sample2 + chr2 9896633 9896683 0 0 + chr2 9896639 9896690 0 0 + chr2 14314039 14314098 0 -0.35 + chr2 14404467 14404502 0 -0.35 + chr2 14421718 14421777 -0.43 -0.35 + chr2 16031710 16031769 -0.43 -0.35 + chr2 16036178 16036237 -0.43 -0.35 + chr2 16048665 16048724 -0.43 -0.35 + chr2 37491676 37491735 0 0 + chr2 37702947 37703009 0 0), header = TRUE, as.is = TRUE) closeAllConnections() result - lapply(c('sample1', 'sample2'), function(.samp){ + # split by breaks in the values + .grps - split(x, cumsum(c(0, diff(x[[.samp]]) != 0))) + + # combine the list of dataframes + .range - do.call(rbind, lapply(.grps, function(.set){ + # create a dataframe of the results + data.frame(Sample = .samp +, Chr = .set$Chr[1L] +, Start = min(.set$start) +, End = max(.set$end) +, Values = .set[[.samp]][1L] +, Probes = nrow(.set) +) + })) + }) # put the list of dataframes together result - do.call(rbind, result) result Sample ChrStart End Values Probes 0 sample1 chr2 9896633 14404502 0.00 4 1 sample1 chr2 14421718 16048724 -0.43 4 2 sample1 chr2 37491676 37703009 0.00 2 01 sample2 chr2 9896633 9896690 0.00 2 11 sample2 chr2 14314039 16048724 -0.35 6 21 sample2 chr2 37491676 37703009 0.00 2 On Mon, Sep 26, 2011 at 10:30 AM, sujitha virith...@gmail.com wrote: Hi group, This is how my test file looks like: Chr start end sample1 sample2 chr2 9896633 9896683 0 0 chr2 9896639 9896690 0 0 chr2 14314039 14314098 0 -0.35 chr2 14404467 14404502 0 -0.35 chr2 14421718 14421777 -0.43 -0.35 chr2 16031710 16031769 -0.43 -0.35 chr2 16036178 16036237 -0.43 -0.35 chr2 16048665 16048724 -0.43 -0.35 chr2 37491676 37491735 0 0 chr2 37702947 37703009 0 0 This is the output that I am expecting: Sample Chr Start End Values Probes sample1 chr2 9896633 14404502 0 4 sample1 chr2 14421718 16048724 -0.43 4 sample1 chr2 37491676 37703001 0 2 sample2 chr2 9896633 9896690 0 2 sample2 chr2 14314039 16048724 -0.35 6 sample2 chr2 37491676 37703009 0 2 Here the Chr value is same but can be any other value aswell so unique among the similar values. The Start for the first line would be the least value until values are similiar (4) then the end would be highest value. The values is the unique value among the common values and probes is number of similar values.
Re: [R] How to Store the executed values in a dataframe rle function
Hi Jim, Thanks for the reply, ok but I dont want to use textConnection and paste each line but want the input to be read from a file like m-read.table(test.txt,sep='\t',header=TRUE,colClasses=c('character','integer','integer','numeric','numeric'). So how do I incorporate that in your code. Thanks, Suji On Wed, Sep 28, 2011 at 2:40 PM, jim holtman jholt...@gmail.com wrote: The solution that I sent will handle the 150 different samples; just list the column names in the argument to the top 'lapply'. You don't need the 'rle' in my approach. On Wed, Sep 28, 2011 at 2:13 PM, viritha k virith...@gmail.com wrote: Hi, This is the code that I wrote for 3 samples: code: m-read.table(test.txt,sep='\t',header=TRUE,colClasses=c('character','integer','integer','numeric','numeric','numeric')) s-data.frame(c(rle(m$Sample1)[[2]],rle(m$Sample2)[[2]],rle(m$Sample3)[[2]]),c(rle(m$Sample1)[[1]],rle(m$Sample2)[[1]],rle(m$Sample3)[[1]])) names(s)=c(Values,Probes) c-data.frame(Sample=character(s$Probes),Chr=character(s$Probes),Start=numeric(s$Probes),End=numeric(s$Probes),Values=numeric(s$Probes),Probes=numeric(s$Probes),stringsAsFactors=FALSE) G=1 n=4 for(i in 1:length(s$Probes)){ + if(G==1){c[i,1]-names(m[n]) + c[i,2]-unique(m$Chr[G:s$Probes[i]]) + c[i,3]-min(m$Start[G:s$Probes[i]]) + c[i,4]-max(m$End[G:s$Probes[i]]) + c[i,]-cbind(c[i,1],c[i,2],c[i,3],c[i,4],s$Values[i],s$Probes[i]) + G=(G+s$Probes[i])} + else if((G-1) length(m$Sample1)) { + c[i,1]-names(m[n]) + c[i,2]-unique(m$Chr[G:(G+s$Probes[i]-1)]) + c[i,3]-min(m$Start[G:(G+s$Probes[i]-1)]) + c[i,4]-max(m$End[G:(G+s$Probes[i]-1)]) + c[i,]-cbind(c[i,1],c[i,2],c[i,3],c[i,4],s$Values[i],s$Probes[i]) + G=(G+s$Probes[i])} + else { + G=1 + n=n+1 + c[i,1]-names(m[n]) + c[i,2]-unique(m$Chr[G:s$Probes[i]]) + c[i,3]-min(m$Start[G:s$Probes[i]]) + c[i,4]-max(m$End[G:s$Probes[i]]) + c[i,]-cbind(c[i,1],c[i,2],c[i,3],c[i,4],s$Values[i],s$Probes[i]) + G=(G+s$Probes[i])}} c Sample ChrStart End Values Probes 1 Sample1 chr2 9896633 14404502 0 4 2 Sample1 chr2 14421718 16048724 -0.43 4 3 Sample1 chr2 37491676 37703009 0 2 4 Sample2 chr2 9896633 9896690 0 2 5 Sample2 chr2 14314039 16048724 -0.35 6 6 Sample2 chr2 37491676 37703009 0 2 7 Sample3 chr2 9896633 14314098 0 3 8 Sample3 chr2 14404467 16031769 0.32 3 9 Sample3 chr2 16036178 37491735 0.45 3 10 Sample3 chr2 37702947 37703009 0 1 The problem that I am facing is for expanding rle function for values and probes. Defintely your code looks simpler, but I would like to read the file by just giving the name of the file as written in my code because my original file contains 150 samples,but how to use lapply or rle function for 150 such samples, if my file contain 150 samples similiar to sample1 and sample2. waiting for your reply, Thanks, Suji On Wed, Sep 28, 2011 at 11:37 AM, jim holtman jholt...@gmail.com wrote: Here one approach: x - read.table(textConnection(Chr start end sample1 sample2 + chr2 9896633 9896683 0 0 + chr2 9896639 9896690 0 0 + chr2 14314039 14314098 0 -0.35 + chr2 14404467 14404502 0 -0.35 + chr2 14421718 14421777 -0.43 -0.35 + chr2 16031710 16031769 -0.43 -0.35 + chr2 16036178 16036237 -0.43 -0.35 + chr2 16048665 16048724 -0.43 -0.35 + chr2 37491676 37491735 0 0 + chr2 37702947 37703009 0 0), header = TRUE, as.is = TRUE) closeAllConnections() result - lapply(c('sample1', 'sample2'), function(.samp){ + # split by breaks in the values + .grps - split(x, cumsum(c(0, diff(x[[.samp]]) != 0))) + + # combine the list of dataframes + .range - do.call(rbind, lapply(.grps, function(.set){ + # create a dataframe of the results + data.frame(Sample = .samp +, Chr = .set$Chr[1L] +, Start = min(.set$start) +, End = max(.set$end) +, Values = .set[[.samp]][1L] +, Probes = nrow(.set) +) + })) + }) # put the list of dataframes together result - do.call(rbind, result) result Sample ChrStart End Values Probes 0 sample1 chr2 9896633 14404502 0.00 4 1 sample1 chr2 14421718 16048724 -0.43 4 2 sample1 chr2 37491676 37703009 0.00 2 01 sample2 chr2 9896633 9896690 0.00 2 11 sample2 chr2 14314039 16048724 -0.35 6 21 sample2 chr2 37491676 37703009 0.00 2 On Mon, Sep 26, 2011 at 10:30 AM, sujitha virith...@gmail.com wrote: Hi group, This is how my test file looks like: Chr start end sample1 sample2 chr2 9896633 9896683 0 0 chr2 9896639 9896690 0 0 chr2 14314039 14314098 0 -0.35 chr2 14404467 14404502 0 -0.35 chr2 14421718 14421777 -0.43 -0.35
[R] Error: could not find function
Dear all I have written a function, calibrateCES, in which a function is called, BPGC, which was written in the enclosure of calibrateCES. Until the most recent version of my program, calibrateCES always used BPGC as planned. Following a modification which does not affect BPGC directly (although it does affect one of the arguments of BPGC), I now get the following error message: Error: could not find function BPGC When I defined BPGC in calibrateCES, the problem disappears, but this is definitely not the solution I want, because I want BPGC to be available for other functions as well. Laurent Franckx Uniting expertise from different fields of technology enhances the development of innovative methods for sustainable production. Join the third edition of the international congress 'Innovation for Sustainable Production 2012' May, 6-9, 2012 - Bruges (Belgium) http://www.i-sup2012.org --- This e-mail, any attachments and the information it contains are confidential and meant only for the use of the addressee(s) only. Access to this e-mail by anyone other than the addressee(s) is unauthorized. If you are not the intended addressee (or responsible for delivery of the message to such person), you may not use, copy, distribute or deliver to anyone this message (or any part of its contents) or take any action in reliance on it. In such case, you should destroy this message and notify the sender immediately. If you have received this e-mail in error, please notify us immediately by e-mail or telephone and delete the e-mail from any computer. All reasonable precautions have been taken to ensure no viruses are present in this e-mail and its attachments. As our company cannot accept responsibility for any loss or damage arising from the use of this e-mail or attachments we recommend that you subject these to your virus checking procedures prior to use. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Error: could not find function
Then don't define BPGC inside another function. --- Jeff Newmiller The . . Go Live... DCN:jdnew...@dcn.davis.ca.us Basics: ##.#. ##.#. Live Go... Live: OO#.. Dead: OO#.. Playing Research Engineer (Solar/Batteries O.O#. #.O#. with /Software/Embedded Controllers) .OO#. .OO#. rocks...1k --- Sent from my phone. Please excuse my brevity. Franckx Laurent laurent.fran...@vito.be wrote: Dear all I have written a function, calibrateCES, in which a function is called, BPGC, which was written in the enclosure of calibrateCES. Until the most recent version of my program, calibrateCES always used BPGC as planned. Following a modification which does not affect BPGC directly (although it does affect one of the arguments of BPGC), I now get the following error message: Error: could not find function BPGC When I defined BPGC in calibrateCES, the problem disappears, but this is definitely not the solution I want, because I want BPGC to be available for other functions as well. Laurent Franckx Uniting expertise from different fields of technology enhances the development of innovative methods for sustainable production. Join the third edition of the international congress 'Innovation for Sustainable Production 2012' May, 6-9, 2012 - Bruges (Belgium) http://www.i-sup2012.org --- This e-mail, any attachments and the information it contains are confidential and meant only for the use of the addressee(s) only. Access to this e-mail by anyone other than the addressee(s) is unauthorized. If you are not the intended addressee (or responsible for delivery of the message to such person), you may not use, copy, distribute or deliver to anyone this message (or any part of its contents) or take any action in reliance on it. In such case, you should destroy this message and notify the sender immediately. If you have received this e-mail in error, please notify us immediately by e-mail or telephone and delete the e-mail from any computer. All reasonable precautions have been taken to ensure no viruses are present in this e-mail and its attachments. As our company cannot accept responsibility for any loss or damage arising from the use of this e-mail or attachments we recommend that you subject these to your virus checking procedures prior to use. _ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Robust covariance matrix with NeweyWest()
On Wed, 28 Sep 2011, Andreas Klein wrote: Dear R-users, I would like to compute a robust covariance matrix of two series of realizations of random variables: ###Begin Example### data - cbind(rnorm(100), rnorm(100)) model - lm(data ~ 1) vcov(model) library(sandwich) NeweyWest(model) #produces an error ###End Example### NeweyWest() produces an error but sandwich(), vcovHAC(), kernHAC, weave(),... do not produce any errors. It seems that the model object does not fit in that special case. Short version: This is/was a bug. A fixed version has just been released on the CRAN master site. Medium version: Until binary packages are available (if needed), you can work around the bug by computing the truncation lag by hand. NeweyWest(model, lag = floor(bwNeweyWest(model, weights = 1))) Long version: In the bandwidth selection -- function bwNeweyWest() -- estimating/score functions pertaining to intercepts are omitted by default (following Newey West's advice). This is, of course, unless there is only an intercept in which case its estimating/score function is used. The code handled this correctly in models with one intercept but not in mlm objects which can have two intercept scores (and no other regressors), as in your case. Fixed now. Best, Z Nevertheless, the problem is that I need the robust version of the covariance matrix according to Newey and West (1987, 1994). Any ideas or suggestions to solve the problem? Kind regards, Andy __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Wilcox test and data collection
Francesca Pancotto wrote on 09/28/2011 10:23:17 AM: Dear Contributors I have a problem with the collection of data from the results of a test. I need to perform a comparative test over groups of data , recall the value of the pvalue and create a table. My problem is in the way to replicate the analysis over and over again over subsets of data according to a condition. I have this database, called y: gg t1 t2d 40 1 1 2 50 2 2 1 45 1 3 1 49 2 1 1 5 2 1 3 40 1 1 2 where gg takes values from 1 to 100, t1 and t2 have values in (1,2,3) and d in (0,1,2,3) I want to perform tests on the values of gg according to the conditions that d==0 , compare values of gg when t1==1 with values of gg when t1==3 d==1 , compare values of gg when t1==1 with values of gg when t1==3 d==2 , compare values of gg when t1==1 with values of gg when t1==3 .. then d==0 , compare values of gg when t2==1 with values of gg when t2==3 d==1... then collect the data of a statistics and create a table. The procedure i followed is to create sub datasets called m0,m1,m2,m3 corresponding to the values of d, i.e. m0- y[y$d==0,c(7,17,18,19)] m1- y[y$d==1,c(7,17,18,19)] m2- y[y$d==2,c(7,17,18,19)] m3- y[y$d==3,c(7,17,18,19)] then perform the test as follows: x1-wilcox.test(m0[m0$t1==1,1],m0[m0$t1==3,1],correct=FALSE, exact=FALSE, conf.int=TRUE,alternative = c(g)) #ABC ID x2- wilcox.test(m1[m1$t1==1,1],m1[m1$t1==3,1],correct=FALSE, exact=FALSE, conf.int=TRUE,alternative = c(g)) x3- wilcox.test(m2[m2$t1==1,1],m2[m2$t1==3,1],correct=FALSE, exact=FALSE, conf.int=TRUE,alternative = c(g)) x4- wilcox.test(m3[m3$t1==1,1],m3[m3$t1==3,1],correct=FALSE, exact=FALSE, conf.int=TRUE,alternative = c(g)) each of these tests will create an object, say x and then I extract the value statistics using x$statistics. How to automatize this? Thank you for any help you can provide. Francesca y - data.frame(gg=sample(100), t1=sample(1:3, 100, TRUE), t2=sample(1:3, 100, TRUE), d=sample(0:3, 100, TRUE)) yd - split(y, y$d) sapply(yd, function(df) wilcox.test(df[df$t1==1, 1], df[df$t1==3, 1], correct=FALSE, exact=FALSE, conf.int=TRUE, alternative=g)[c(estimate, p.value)]) sapply(yd, function(df) wilcox.test(df[df$t2==1, 1], df[df$t2==3, 1], correct=FALSE, exact=FALSE, conf.int=TRUE, alternative=g)[c(estimate, p.value)]) Jean [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] compare proportions
Hi all, I asked this yesterday, but hadn't got any response yet. Understand this is not pure R technical question, but more of statistical. With many statistical experts in the list, I would appreciate any suggestions... Many thanks John -- Hi, I have a seemingly simple proportional test. here is the question I am trying to answer: There is a test running each day in the lab, the test comes out as either positive or negative. So at the end of each month, we can calculate a positive rate in that month as the proportion of positive test results. The data look like: Month # positive # total tests positive rate January 24 205 11.7% February 31 234 13.2% March 26 227 11.5% : : : August 42 241 17.4% The total # of positive before August is 182, and the total # of tests before August is 1526. It appears that from January to July, the positive monthly rate is between 11% to 13%, the rate in August is up around 17%. So the question is whether is up in August is statistically significant? I can think of 3 ways to do this test: 1. Use binom.test(), set “p” as the average positive rate between January and July (=182/1526): binom.test(42,241,182/1526) Exact binomial test data: 42 and 241 number of successes = 42, number of trials = 241, p-value = 0.0125 alternative hypothesis: true probability of success is not equal to 0.1192661 95 percent confidence interval: 0.1285821 0.2281769 sample estimates: probability of success 0.1742739 2. Use prop.test(), where I compare the average positive rate between January July with the positive rate in August: prop.test(c(182,42),c(1526,241)) 2-sample test for equality of proportions with continuity correction data: c(182, 42) out of c(1526, 241) X-squared = 5.203, df = 1, p-value = 0.02255 alternative hypothesis: two.sided 95 percent confidence interval: -0.107988625 -0.002026982 sample estimates: prop 1 prop 2 0.1192661 0.1742739 3. Use prop.test(), where I compare the average MONTHLY positive rate between January July with the positive rate in August. The average monthly # of positives is 182/7=26, the average monthly $ of total tests is 1526/7=218: prop.test(c(26,42),c(218,241)) 2-sample test for equality of proportions with continuity correction data: c(26, 42) out of c(218, 241) X-squared = 2.3258, df = 1, p-value = 0.1272 alternative hypothesis: two.sided 95 percent confidence interval: -0.12375569 0.01374008 sample estimates: prop 1 prop 2 0.1192661 0.1742739 As you can see that the method 3 gave insignificant p value compared to method 1 2. While I understand each method is testing different hypothesis, but for the question I am trying to answer (does August have higher positive rate compare to earlier months?), which method is more relevant? Or I should consider some regression techniques, then what type of regression is appropriate? Thanks for any suggestions, John __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] survival analysis: interval censored data
On Sep 28, 2011, at 10:56 AM, Ruth Arias wrote: hallo terry: I attached araceae data set, The usual survival analysis via the Kaplan-Meier method only make estimates at the time of events. When you tabulate your data, you see that there were no events for the missing (starting) time rows in those categories during the intervals that you are questioning as missing: xtabs( ~ time+time2+categoria+event, data=araceae) , , categoria = C, event = 0 time2 time 2005 2006 2007 2008 2009 2010 20040 23131 22 2005000000 200700004 19 2008000000 2009000000 , , categoria = E, event = 0 time2 time 2005 2006 2007 2008 2009 2010 20040 22073 21 2005001100 200700000 29 2008000000 2009000001 , , categoria = C, event = 1 time2 time 2005 2006 2007 2008 2009 2010 2004052303 2005000000 2007000232 2008000010 2009000000 , , categoria = E, event = 1 time2 time 2005 2006 2007 2008 2009 2010 2004721134 2005000100 2007000313 2008000000 2009000000 when I use this: surara-survfit(Surv(time,time2,event)~categoria) Call: survfit(formula = Surv(time, time2, event) ~ categoria) records n.max n.start events median 0.95LCL 0.95UCL categoria=C 9463 0 21 NA NA NA categoria=E 11177 0 26 NA NA NA summary(surara) Call: survfit(formula = Surv(time, time2, event) ~ categoria) categoria=C time n.risk n.event entered censored survival std.err lower 95% CI upper 95% CI 2006 63 5 0 230.921 0.0341 0.8560.990 2007 35 2 3010.868 0.0483 0.7780.968 2008 62 5 130.798 0.0536 0.7000.910 2009 55 4 050.740 0.0570 0.6360.861 2010 46 5 0 410.660 0.0611 0.5500.791 categoria=E time n.risk n.event entered censored survival std.err lower 95% CI upper 95% CI 2005 71 7 300.901 0.0354 0.8350.973 2006 67 2 0 220.875 0.0391 0.8010.955 2007 43 1 3610.854 0.0432 0.7740.943 2008 77 5 080.799 0.0469 0.7120.896 2009 64 4 130.749 0.0502 0.6570.854 2010 58 7 0 510.658 0.0545 0.5600.774 You see that your first survfit object is offering a simple sum of 'time2' columns of that tabulation as its 'n.event' values. It's 'n.risk' tabulation is not taking note of whether a case started in any particular prior interval. The n.risk sum appears to be the sum of persons surviving from the prior year less any decedents plus any entrants as reflected in future events on that rowYou notice that there are missing years even in that report: 2004,2005 for category C and 2004 for category E since there are no events in columns for those 'time2' values. but whe I included type=interval, suraraint- survfit(Surv(time,time2,event,type='interval')~categoria) # falta arreglar lo del intervalo!!! summary(suraraint) Call: survfit(formula = Surv(time, time2, event, type = interval) ~ categoria) categoria=C time n.risk n.event survival std.err lower 95% CI upper 95% CI 2004 95.00 13.140.862 0.03540.7950.934 2007 31.867.190.667 0.06950.5440.818 2008 1.671.670.000 NaN NA NA categoria=E time n.risk n.event survival std.err lower 95% CI upper 95% CI 2004 112.0 18.470.835 0.03510.7690.907 2005 40.51.060.813 0.04010.7380.896 2007 37.57.460.651 0.06200.5400.785 The second object's n.event, when Surv() was constructed with type=interval, has values based on the starting 'time' rows, but I am unable to deduce the estimating algorithm. I remember Therneau saying it wasn't a simple algorithm. The 2008 row in category C has one entry of 1 in the next year and there were no censoring for C- entrants in that year. Why the n.event is 1.67 I cannot say, but at least the n.event does not exceed the n.risk. The code or a copy of Therneau and Grambsch would be sensible places to look for answer by
Re: [R] dump.frames, debugger and the ... argument
On 28/09/2011 12:25 PM, Jannis wrote: dummyFunct = function(a, ...) { b = 2* a c = d # should cause error return(b) } options(error = quote(dump.frames(dumpto = last.dump, to.file = TRUE))) dummyFunct(1) load(last.dump.rda) debugger(last.dump) # selection of call 1 causes: # Error in get(.obj, envir = dump[[.selection]]) : # argument ... is missing, with no defa Looks like Brian was right, it has been fixed. Duncan Murdoch __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Adding axis to an ellipse: ellipse package
On 28/09/11 20:34, Antoine wrote: Thanks a lot Rolf, I knew it would be possible to do it that way but I was just curious to know if the ellipse package was offering such a feature. Thanks again for you help! Well, since the help for ellipse makes no mention of such a facility it is *highly* unlikely that there would be one. And in general in R, if there is no obvious built-in way of accomplishing a task, it is probably faster to roll your own rather than expending time on fruitless searching. (I have a vague recollection of there being a fortune along similar lines, but I can't find it.) cheers, Rolf Turner __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] survival analysis: interval censored data
David, Thanks for taking time to answer this. When there are interval censored data, computation of the MLE first finds a minimal number of time points such that every interval with a death contains one of those points. Those are the points at which the final curve can have a jump. The algorithm, more or less is: 1. On the real line, put a pair of parenthesis down for each interval with an event ( at the left end ) at the right. 2. Now scan down the line for () pairs (pairs with nothing else in the enclosed interval): the center of each will be one of the jump points. With slight variations for left truncation. Terry T. On Wed, 2011-09-28 at 16:33 -0400, David Winsemius wrote: On Sep 28, 2011, at 10:56 AM, Ruth Arias wrote: hallo terry: I attached araceae data set, The usual survival analysis via the Kaplan-Meier method only make estimates at the time of events. When you tabulate your data, you see that there were no events for the missing (starting) time rows in those categories during the intervals that you are questioning as missing: xtabs( ~ time+time2+categoria+event, data=araceae) , , categoria = C, event = 0 time2 time 2005 2006 2007 2008 2009 2010 20040 23131 22 2005000000 200700004 19 2008000000 2009000000 , , categoria = E, event = 0 time2 time 2005 2006 2007 2008 2009 2010 20040 22073 21 2005001100 200700000 29 2008000000 2009000001 , , categoria = C, event = 1 time2 time 2005 2006 2007 2008 2009 2010 2004052303 2005000000 2007000232 2008000010 2009000000 , , categoria = E, event = 1 time2 time 2005 2006 2007 2008 2009 2010 2004721134 2005000100 2007000313 2008000000 2009000000 when I use this: surara-survfit(Surv(time,time2,event)~categoria) Call: survfit(formula = Surv(time, time2, event) ~ categoria) records n.max n.start events median 0.95LCL 0.95UCL categoria=C 9463 0 21 NA NA NA categoria=E 11177 0 26 NA NA NA summary(surara) Call: survfit(formula = Surv(time, time2, event) ~ categoria) categoria=C time n.risk n.event entered censored survival std.err lower 95% CI upper 95% CI 2006 63 5 0 230.921 0.0341 0.8560.990 2007 35 2 3010.868 0.0483 0.7780.968 2008 62 5 130.798 0.0536 0.7000.910 2009 55 4 050.740 0.0570 0.6360.861 2010 46 5 0 410.660 0.0611 0.5500.791 categoria=E time n.risk n.event entered censored survival std.err lower 95% CI upper 95% CI 2005 71 7 300.901 0.0354 0.8350.973 2006 67 2 0 220.875 0.0391 0.8010.955 2007 43 1 3610.854 0.0432 0.7740.943 2008 77 5 080.799 0.0469 0.7120.896 2009 64 4 130.749 0.0502 0.6570.854 2010 58 7 0 510.658 0.0545 0.5600.774 You see that your first survfit object is offering a simple sum of 'time2' columns of that tabulation as its 'n.event' values. It's 'n.risk' tabulation is not taking note of whether a case started in any particular prior interval. The n.risk sum appears to be the sum of persons surviving from the prior year less any decedents plus any entrants as reflected in future events on that rowYou notice that there are missing years even in that report: 2004,2005 for category C and 2004 for category E since there are no events in columns for those 'time2' values. but whe I included type=interval, suraraint- survfit(Surv(time,time2,event,type='interval')~categoria) # falta arreglar lo del intervalo!!! summary(suraraint) Call: survfit(formula = Surv(time, time2, event, type = interval) ~ categoria) categoria=C time n.risk n.event survival std.err lower 95% CI upper 95% CI 2004 95.00 13.140.862 0.03540.7950.934 2007 31.867.190.667 0.06950.5440.818 2008
[R] Multiplying a list of matrices with a vector
Hi all! I have a list of matrices and I want to multiply the ith element of the list with the ith element of a another vector. That is, LL - list(A=diag(3),B=diag(3),C=diag(3)) vec - 1:3 for(i in 1:3) + { + LL[[i]] - LL[[i]]*vec[i] + } LL $A [,1] [,2] [,3] [1,]100 [2,]010 [3,]001 $B [,1] [,2] [,3] [1,]200 [2,]020 [3,]002 $C [,1] [,2] [,3] [1,]300 [2,]030 [3,]003 Is there another (more efficient) way of doing this without using loops? I found an older entry in this list with a similar problem, where the list elements were always multiplied with the same number, e.g., lapply(LL, function(x) x*3) and I was looking for something similar. Best, Franco __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Multiplying a list of matrices with a vector
Untested: mapply('*', LL, vec) Those should be backticks but I can't type them on my phone Michael On Sep 28, 2011, at 6:04 PM, Mendolia, Franco fmendo...@mcw.edu wrote: Hi all! I have a list of matrices and I want to multiply the ith element of the list with the ith element of a another vector. That is, LL - list(A=diag(3),B=diag(3),C=diag(3)) vec - 1:3 for(i in 1:3) + { + LL[[i]] - LL[[i]]*vec[i] + } LL $A [,1] [,2] [,3] [1,]100 [2,]010 [3,]001 $B [,1] [,2] [,3] [1,]200 [2,]020 [3,]002 $C [,1] [,2] [,3] [1,]300 [2,]030 [3,]003 Is there another (more efficient) way of doing this without using loops? I found an older entry in this list with a similar problem, where the list elements were always multiplied with the same number, e.g., lapply(LL, function(x) x*3) and I was looking for something similar. Best, Franco __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Testing for arguments in a function
On Mon, Sep 26, 2011 at 2:39 PM, Gene Leynes gley...@gmail.com wrote: I don't understand how this function can subset by i when i is missing ## My function: myfun = function(vec, i){ ret = vec[i] ret } ## My data: i = 10 vec = 1:100 ## Expected input and behavior: myfun(vec, i) ## Missing an argument, but error is not caught! ## How is subsetting even possible here??? myfun(vec) Hello, Gene: It seems to me the discussion of your question launches off into a more abstract direction than we need here. I've found it is wise to name arguments to functions differently than variables in the environment, so you don't have the function looking for i outside itself. And you can put each variable to a ridiculous default to force an error when that option is not given. NAN is nothing here, just a string that has no meaning, so we get an error as you said you wanted. myfun - function(ivec, ii=NAN){ ivec[ii] } myfun(1:40,10) works myfun(1:40) Produces Error in myfun(1:40) : object 'NAN' not found If you are happy enough to just plop out an error, there's no need to worry. Note the function can be written more succinctly than you originally had it, and you are generally advised to use - rather than = by the R developers. -- Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] PCA: prcomp rotations
Hi all, I think I may be confused by different people/programs using the word rotation differently. Does prcomp not perform rotations by default? If I understand it correctly retx=TRUE returns ordinated data, that I can plot for individual samples (prcomp()$x: which is the scaled and centered (rotated?) data multiplied by loadings). What does it mean that the data is rotated from the ?prcomp description? Is this referring to the data matrix orientation (i.e. looking at differences among samples (columns) based on variables (rows) vs. differences among variables (columns) based on samples(rows))? Thank you, Colin Wahl Graduate student, Western Washington University code background: I am looking at the ordination of abiotic stream variables between different sampling locations. abiot.pca=prcomp(all24[, c(10, 13:18)], retx=TRUE, center=TRUE, scale= TRUE) summary(abiot.pca) Importance of components: PC1PC2PC3PC4 PC5 PC6 PC7 Standard deviation 1.5925 1.0814 1.0697 0.9536 0.76624 0.68444 0.43037 Proportion of Variance 0.3623 0.1671 0.1635 0.1299 0.08387 0.06692 0.02646 Cumulative Proportion 0.3623 0.5294 0.6928 0.8227 0.90662 0.97354 1.0 loadings[,1:3] PC1 PC2 PC3 avg.dmax -0.1879223 0.55792480 -0.04962935 scond.med -0.4327223 0.04779998 -0.43369560 docon.med 0.1976094 -0.30384127 0.67222550 cb.per 0.4134302 -0.17550281 -0.40171318 gc.per 0.4136933 -0.26997129 -0.39960398 gf.per 0.3419349 0.63917223 0.16174661 fine.per -0.5285840 -0.28616200 0.10168155 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Accessing data via url
hi I think this already a closed topic since you figure it out yourself. but you can always try to fetch data from Google docs (first make the spreadsheet public) and then writing this snippet: library(RCurl) u=https://docs.google.com/spreadsheet/pub?hl=en_UShl=en_USkey=0As6HUAxhy0Q7dDB0bjh4T2RyS3pIQkdXdGc2U0ZZc3csingle=truegid=0output=csv; content = getBinaryURL(u, ssl.verifypeer = FALSE) tmp = tempfile() write(rawToChar(content), file = tmp) mydata- read.csv(gzfile(tmp)) mydata it works for me and hopefully it'll work nor newbies like me too :) -- View this message in context: http://r.789695.n4.nabble.com/Accessing-data-via-url-tp3178094p3853451.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Plotting Lines Through multiple groups
Thank you Eik Vettorazzi-2 That was exactly what I wanted! Regards Clara -- View this message in context: http://r.789695.n4.nabble.com/Plotting-Lines-Through-multiple-groups-tp3850099p3853753.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to Store the executed values in a dataframe rle function
I only used textConnection for the sample data. Just put your file name in the read.table; e.g., x-read.table(test.txt,sep='\t',header=TRUE,colClasses=c('character','integer','integer','numeric','numeric')) as you have in your email. I used 'x' in my code, so I replaced your 'm' with 'x'. Try it and see if it works; no reason it shouldn't. On Wed, Sep 28, 2011 at 3:03 PM, viritha k virith...@gmail.com wrote: Hi Jim, Thanks for the reply, ok but I dont want to use textConnection and paste each line but want the input to be read from a file like m-read.table(test.txt,sep='\t',header=TRUE,colClasses=c('character','integer','integer','numeric','numeric'). So how do I incorporate that in your code. Thanks, Suji On Wed, Sep 28, 2011 at 2:40 PM, jim holtman jholt...@gmail.com wrote: The solution that I sent will handle the 150 different samples; just list the column names in the argument to the top 'lapply'. You don't need the 'rle' in my approach. On Wed, Sep 28, 2011 at 2:13 PM, viritha k virith...@gmail.com wrote: Hi, This is the code that I wrote for 3 samples: code: m-read.table(test.txt,sep='\t',header=TRUE,colClasses=c('character','integer','integer','numeric','numeric','numeric')) s-data.frame(c(rle(m$Sample1)[[2]],rle(m$Sample2)[[2]],rle(m$Sample3)[[2]]),c(rle(m$Sample1)[[1]],rle(m$Sample2)[[1]],rle(m$Sample3)[[1]])) names(s)=c(Values,Probes) c-data.frame(Sample=character(s$Probes),Chr=character(s$Probes),Start=numeric(s$Probes),End=numeric(s$Probes),Values=numeric(s$Probes),Probes=numeric(s$Probes),stringsAsFactors=FALSE) G=1 n=4 for(i in 1:length(s$Probes)){ + if(G==1){c[i,1]-names(m[n]) + c[i,2]-unique(m$Chr[G:s$Probes[i]]) + c[i,3]-min(m$Start[G:s$Probes[i]]) + c[i,4]-max(m$End[G:s$Probes[i]]) + c[i,]-cbind(c[i,1],c[i,2],c[i,3],c[i,4],s$Values[i],s$Probes[i]) + G=(G+s$Probes[i])} + else if((G-1) length(m$Sample1)) { + c[i,1]-names(m[n]) + c[i,2]-unique(m$Chr[G:(G+s$Probes[i]-1)]) + c[i,3]-min(m$Start[G:(G+s$Probes[i]-1)]) + c[i,4]-max(m$End[G:(G+s$Probes[i]-1)]) + c[i,]-cbind(c[i,1],c[i,2],c[i,3],c[i,4],s$Values[i],s$Probes[i]) + G=(G+s$Probes[i])} + else { + G=1 + n=n+1 + c[i,1]-names(m[n]) + c[i,2]-unique(m$Chr[G:s$Probes[i]]) + c[i,3]-min(m$Start[G:s$Probes[i]]) + c[i,4]-max(m$End[G:s$Probes[i]]) + c[i,]-cbind(c[i,1],c[i,2],c[i,3],c[i,4],s$Values[i],s$Probes[i]) + G=(G+s$Probes[i])}} c Sample Chr Start End Values Probes 1 Sample1 chr2 9896633 14404502 0 4 2 Sample1 chr2 14421718 16048724 -0.43 4 3 Sample1 chr2 37491676 37703009 0 2 4 Sample2 chr2 9896633 9896690 0 2 5 Sample2 chr2 14314039 16048724 -0.35 6 6 Sample2 chr2 37491676 37703009 0 2 7 Sample3 chr2 9896633 14314098 0 3 8 Sample3 chr2 14404467 16031769 0.32 3 9 Sample3 chr2 16036178 37491735 0.45 3 10 Sample3 chr2 37702947 37703009 0 1 The problem that I am facing is for expanding rle function for values and probes. Defintely your code looks simpler, but I would like to read the file by just giving the name of the file as written in my code because my original file contains 150 samples,but how to use lapply or rle function for 150 such samples, if my file contain 150 samples similiar to sample1 and sample2. waiting for your reply, Thanks, Suji On Wed, Sep 28, 2011 at 11:37 AM, jim holtman jholt...@gmail.com wrote: Here one approach: x - read.table(textConnection(Chr start end sample1 sample2 + chr2 9896633 9896683 0 0 + chr2 9896639 9896690 0 0 + chr2 14314039 14314098 0 -0.35 + chr2 14404467 14404502 0 -0.35 + chr2 14421718 14421777 -0.43 -0.35 + chr2 16031710 16031769 -0.43 -0.35 + chr2 16036178 16036237 -0.43 -0.35 + chr2 16048665 16048724 -0.43 -0.35 + chr2 37491676 37491735 0 0 + chr2 37702947 37703009 0 0), header = TRUE, as.is = TRUE) closeAllConnections() result - lapply(c('sample1', 'sample2'), function(.samp){ + # split by breaks in the values + .grps - split(x, cumsum(c(0, diff(x[[.samp]]) != 0))) + + # combine the list of dataframes + .range - do.call(rbind, lapply(.grps, function(.set){ + # create a dataframe of the results + data.frame(Sample = .samp + , Chr = .set$Chr[1L] + , Start = min(.set$start) + , End = max(.set$end) + , Values = .set[[.samp]][1L] + , Probes = nrow(.set) + ) + })) + }) # put the list of dataframes together result - do.call(rbind, result) result Sample Chr Start End Values Probes 0 sample1 chr2 9896633 14404502 0.00 4 1 sample1 chr2 14421718 16048724 -0.43 4 2 sample1 chr2 37491676 37703009 0.00 2 01 sample2 chr2 9896633 9896690 0.00
Re: [R] htmlParse hangs or crashes
Hi, I been having the same problem all the afternoon, and I just realize that only happens when I use the R64 and is not crashing using the 32 bits version. This must be a bug in this R version. I hope this could helps. -- View this message in context: http://r.789695.n4.nabble.com/htmlParse-hangs-or-crashes-tp3792285p3853858.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] survival analysis: interval censored data
Thanks for taking time to answer this De: Terry Therneau thern...@mayo.edu Para: David Winsemius dwinsem...@comcast.net .org Enviado: miércoles 28 de septiembre de 2011 23:03 Asunto: Re: [R] survival analysis: interval censored data David, Thanks for taking time to answer this. When there are interval censored data, computation of the MLE first finds a minimal number of time points such that every interval with a death contains one of those points. Those are the points at which the final curve can have a jump. The algorithm, more or less is: 1. On the real line, put a pair of parenthesis down for each interval with an event ( at the left end ) at the right. 2. Now scan down the line for () pairs (pairs with nothing else in the enclosed interval): the center of each will be one of the jump points. With slight variations for left truncation. Terry T. On Wed, 2011-09-28 at 16:33 -0400, David Winsemius wrote: On Sep 28, 2011, at 10:56 AM, Ruth Arias wrote: hallo terry: I attached araceae data set, The usual survival analysis via the Kaplan-Meier method only make estimates at the time of events. When you tabulate your data, you see that there were no events for the missing (starting) time rows in those categories during the intervals that you are questioning as missing: xtabs( ~ time+time2+categoria+event, data=araceae) , , categoria = C, event = 0 time2 time 2005 2006 2007 2008 2009 2010 2004 0 23 1 3 1 22 2005 0 0 0 0 0 0 2007 0 0 0 0 4 19 2008 0 0 0 0 0 0 2009 0 0 0 0 0 0 , , categoria = E, event = 0 time2 time 2005 2006 2007 2008 2009 2010 2004 0 22 0 7 3 21 2005 0 0 1 1 0 0 2007 0 0 0 0 0 29 2008 0 0 0 0 0 0 2009 0 0 0 0 0 1 , , categoria = C, event = 1 time2 time 2005 2006 2007 2008 2009 2010 2004 0 5 2 3 0 3 2005 0 0 0 0 0 0 2007 0 0 0 2 3 2 2008 0 0 0 0 1 0 2009 0 0 0 0 0 0 , , categoria = E, event = 1 time2 time 2005 2006 2007 2008 2009 2010 2004 7 2 1 1 3 4 2005 0 0 0 1 0 0 2007 0 0 0 3 1 3 2008 0 0 0 0 0 0 2009 0 0 0 0 0 0 when I use this: surara-survfit(Surv(time,time2,event)~categoria) Call: survfit(formula = Surv(time, time2, event) ~ categoria) records n.max n.start events median 0.95LCL 0.95UCL categoria=C 94 63 0 21 NA NA NA categoria=E 111 77 0 26 NA NA NA summary(surara) Call: survfit(formula = Surv(time, time2, event) ~ categoria) categoria=C time n.risk n.event entered censored survival std.err lower 95% CI upper 95% CI 2006 63 5 0 23 0.921 0.0341 0.856 0.990 2007 35 2 30 1 0.868 0.0483 0.778 0.968 2008 62 5 1 3 0.798 0.0536 0.700 0.910 2009 55 4 0 5 0.740 0.0570 0.636 0.861 2010 46 5 0 41 0.660 0.0611 0.550 0.791 categoria=E time n.risk n.event entered censored survival std.err lower 95% CI upper 95% CI 2005 71 7 3 0 0.901 0.0354 0.835 0.973 2006 67 2 0 22 0.875 0.0391 0.801 0.955 2007 43 1 36 1 0.854 0.0432 0.774 0.943 2008 77 5 0 8 0.799 0.0469 0.712 0.896 2009 64 4 1 3 0.749 0.0502 0.657 0.854 2010 58 7 0 51 0.658 0.0545 0.560 0.774 You see that your first survfit object is offering a simple sum of 'time2' columns of that tabulation as its 'n.event' values. It's 'n.risk' tabulation is not taking note of whether a case started in any particular prior interval. The n.risk sum appears to be the sum of persons surviving from the prior year less any decedents plus any entrants as reflected in future events on that row You notice that there are missing years even in that report: 2004,2005 for category C and 2004 for category E since there are no events in columns for those 'time2' values. but whe I included type=interval, suraraint- survfit(Surv(time,time2,event,type='interval')~categoria) # falta [[elided Yahoo spam]] summary(suraraint) Call: survfit(formula = Surv(time, time2, event, type = interval) ~
Re: [R] How does the survfit.coxph calculate the survival value?
Dear Professor Terry Therneau, Sorry for another naive question about R. Some one suggested me to use mathematica to analyze Cox proportional hazard model. For example estimate the coefficients, and calculate the survival and hazard function. But I think that may be not possible, especially that all the six covariates in my model are time-dependent. Could you please give me any opinion on this? Koshihaku -- View this message in context: http://r.789695.n4.nabble.com/How-does-the-survfit-coxph-calculate-the-survival-value-tp3847179p3853994.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How does the survfit.coxph calculate the survival value?
Thank you very much! I am analyzing software stress test data using R, and fiting data by coxph. I think the survfit.coxph can give the estimated baseline survival S_0(t_i), and we can calculated the survival S(t_i) by using S_0(t_i) and the x(t_i). Then can I use the function lambda={S(t_(i-1))-S(t_i)}/S(t_(i-1)) to calculate the hazard at time t_i ? Koshihaku -- View this message in context: http://r.789695.n4.nabble.com/How-does-the-survfit-coxph-calculate-the-survival-value-tp3847179p3853923.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.