Re: [R] Optimisation and NaN Errors using clm() and clmm()
On 15 April 2013 13:18, Thomas thomasfox...@aol.com wrote:

Dear List,

I am using both the clm() and clmm() functions from the R package 'ordinal'. I am fitting an ordinal dependent variable with 5 categories to 9 continuous predictors, all of which have been normalised (mean subtracted, then divided by the standard deviation), using a probit link function. From this global model I am generating a confidence set of 200 models using clm() and the 'glmulti' R package. This produces these errors:

model.2.10 <- glmulti(as.factor(dependent) ~ predictor_1*predictor_2*predictor_3*predictor_4*predictor_5*predictor_6*predictor_7*predictor_8*predictor_9,
    data = database, fitfunc = clm, link = "probit", method = "g",
    crit = "aicc", confsetsize = 200, marginality = TRUE)
...
After 670 generations:
Best model: as.factor(dependent)~1+predictor_1+predictor_2+predictor_3+predictor_4+predictor_5+predictor_6+predictor_8+predictor_9+predictor_4:predictor_3+predictor_6:predictor_2+predictor_8:predictor_5+predictor_9:predictor_1+predictor_9:predictor_4+predictor_9:predictor_5+predictor_9:predictor_6
Crit= 183.716706496392
Mean crit= 202.022138576506
Improvements in best and average IC have been below the specified goals. Algorithm is declared to have converged. Completed.
There were 24 warnings (use warnings() to see them)

warnings()
Warning messages:
1: optimization failed: step factor reduced below minimum
2: optimization failed: step factor reduced below minimum
3: optimization failed: step factor reduced below minimum
etc.

I am then re-fitting each of the 200 models with the clmm() function, with 2 random factors (family nested within order).
I get this error in a few of the re-fitted models:

model.2.glmm.2 <- clmm(as.factor(dependent) ~ 1 + predictor_1 + predictor_2 + predictor_3 + predictor_6 + predictor_7 + predictor_8 + predictor_9 + predictor_6:predictor_2 + predictor_7:predictor_2 + predictor_7:predictor_3 + predictor_8:predictor_2 + predictor_9:predictor_1 + predictor_9:predictor_2 + predictor_9:predictor_3 + predictor_9:predictor_6 + predictor_9:predictor_7 + predictor_9:predictor_8 + (1|order/family), link = "probit", data = database)

summary(model.2.glmm.2)
Cumulative Link Mixed Model fitted with the Laplace approximation

formula: as.factor(dependent) ~ 1 + predictor_1 + predictor_2 + predictor_3 + predictor_6 + predictor_7 + predictor_8 + predictor_9 + predictor_6:predictor_2 + predictor_7:predictor_2 + predictor_7:predictor_3 + predictor_8:predictor_2 + predictor_9:predictor_1 + predictor_9:predictor_2 + predictor_9:predictor_3 + predictor_9:predictor_6 + predictor_9:predictor_7 + predictor_9:predictor_8 + (1 | order/family)
data: database

 link   threshold nobs logLik AIC    niter    max.grad cond.H
 probit flexible  103  -65.56 173.13 58(3225) 8.13e-06 4.3e+03

Random effects:
             Var       Std.Dev
family:order 7.493e-11 8.656e-06
order        1.917e-12 1.385e-06
Number of groups: family:order 12, order 4

Coefficients:
                        Estimate Std. Error z value Pr(>|z|)
predictor_1              0.40802    0.78685   0.519   0.6041
predictor_2              0.02431    0.26570   0.092   0.9271
predictor_3             -0.84486    0.32056  -2.636   0.0084 **
predictor_6              0.65392    0.34348   1.904   0.0569 .
predictor_7              0.71730    0.29596   2.424   0.0154 *
predictor_8             -1.37692    0.75660  -1.820   0.0688 .
predictor_9              0.15642    0.28969   0.540   0.5892
predictor_2:predictor_6 -0.46880    0.18829  -2.490   0.0128 *
predictor_2:predictor_7  4.97365    0.82692   6.015 1.80e-09 ***
predictor_3:predictor_7 -1.13192    0.46639  -2.427   0.0152 *
predictor_2:predictor_8 -5.52913    0.88476  -6.249 4.12e-10 ***
predictor_1:predictor_9  4.28519         NA      NA       NA
predictor_2:predictor_9 -0.26558    0.10541  -2.520   0.0117 *
predictor_3:predictor_9 -1.49790         NA      NA       NA
predictor_6:predictor_9 -1.31538         NA      NA       NA
predictor_7:predictor_9 -4.41998         NA      NA       NA
predictor_8:predictor_9  3.99709         NA      NA       NA
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Threshold coefficients:
    Estimate Std. Error z value
0|1  -0.2236     0.3072  -0.728
1|2   1.4229     0.3634   3.915
(211 observations deleted due to missingness)

Warning message:
In sqrt(diag(vc)[1:npar]) : NaNs produced

This warning is due to a (near) singular variance-covariance matrix of the model parameters, which in turn is due to the model converging to a boundary solution: both random-effect variance parameters are essentially zero. If you exclude the random terms and refit the model with clm(), the variance-covariance matrix will probably be well defined and standard errors can be computed.

Another thing: you are fitting 17 regression parameters plus 2 random-effect terms (which in the end do not count) to only 103 observations. I would be worried about overfitting, or perhaps even non-fitting. I would also be concerned about the 211 incomplete observations, and I would be careful with automatic model selection/averaging etc. on incomplete data (though I don't know how, or if, glmulti actually deals with that).
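The advice above can be sketched in a few lines. This is not the poster's data: it uses the 'wine' data set shipped with the 'ordinal' package, and simply illustrates checking the random-effect variance and refitting with clm() when the clmm() fit sits on the zero boundary.

```r
## Hedged sketch (not the original data): fit a cumulative link mixed model,
## inspect the random-effect variance, and refit without random terms when
## the variance estimate is (near) zero.
library(ordinal)

fit_mixed <- clmm(rating ~ temp + contact + (1 | judge),
                  data = wine, link = "probit")
VarCorr(fit_mixed)  # random-effect variance; near zero => boundary solution

## Fixed-effects-only refit: the Hessian is typically well conditioned,
## so standard errors can be computed.
fit_fixed <- clm(rating ~ temp + contact, data = wine, link = "probit")
summary(fit_fixed)
```

With the random terms dropped, the coefficient table should show finite standard errors throughout, which is the diagnostic Rune describes.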
Re: [R] Sorting data.frame and again sorting within data.frame
Dear Sir, Thanks a lot for your valuable input and guidance. Regards, Katherine

--- On Mon, 15/4/13, Jeff Newmiller jdnew...@dcn.davis.ca.us wrote:

Yes, that would be because she converted to Date on the fly in her example, and so apparently did not need this reminder.
--- Jeff Newmiller (Sent from my phone. Please excuse my brevity.)

David Winsemius dwinsem...@comcast.net wrote:

On Apr 14, 2013, at 11:01 PM, Katherine Gobin wrote:

Dear R forum, I have a data.frame as defined below:

df = data.frame(names = c("C", "A", "A", "B", "C", "B", "A", "B", "C"),
                dates = c("4/15/2013", "4/13/2013", "4/15/2013", "4/13/2013",
                          "4/13/2013", "4/15/2013", "4/14/2013", "4/14/2013",
                          "4/14/2013"),
                values = c(10, 31, 31, 17, 11, 34, 102, 47, 29))

df
  names     dates values
1     C 4/15/2013     10
2     A 4/13/2013     31
3     A 4/15/2013     31
4     B 4/13/2013     17
5     C 4/13/2013     11
6     B 4/15/2013     34
7     A 4/14/2013    102
8     B 4/14/2013     47
9     C 4/14/2013     29

I need to sort df first on names in increasing order and then further on dates in decreasing order.

[David Winsemius replied:] So far no one has pointed out that these are not really Dates in the R sense and will not sort correctly if any of the proposed methods are applied to sequences that extend beyond 6 months, i.e., once dates from October onward appear. You would be advised to convert to real Date-classed variables.
?strptime
?as.Date

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
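To make David's point concrete, here is a base-R sketch of the conversion he recommends, using the data frame from the question: convert the strings with as.Date(), then a single order() call gives names ascending and dates descending.

```r
df <- data.frame(names = c("C","A","A","B","C","B","A","B","C"),
                 dates = c("4/15/2013","4/13/2013","4/15/2013","4/13/2013",
                           "4/13/2013","4/15/2013","4/14/2013","4/14/2013",
                           "4/14/2013"),
                 values = c(10, 31, 31, 17, 11, 34, 102, 47, 29))

## Real Date objects sort chronologically, also past October:
df$dates <- as.Date(df$dates, format = "%m/%d/%Y")

## names ascending, dates descending (negate the numeric date values):
sorted <- df[order(df$names, -as.numeric(df$dates)), ]
sorted
```

String dates like "10/1/2013" would sort before "4/13/2013" lexically, which is exactly the trap David warns about; Date-classed values avoid it.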
Re: [R] Overlay two stat_ecdf() plots
Hi,

Do you mean ecdf? If yes, just use the add option in plot():

plot(ecdf(rnorm(100, 1, 2)))
plot(ecdf(rnorm(100, 2, 2)), add = TRUE, col = 2)

If not, please specify where ecdf_stat or stat_ecdf comes from, since you indicate they are the same function.

Regards, Petr

-----Original Message-----
From: Robin Mjelle
Subject: [R] Overlay two stat_ecdf() plots

I want to plot two ecdf plots in the same graph. I have two input tables with one column each:

Targets <- read.table("/media/...", sep = "", header = TRUE)
NonTargets <- read.table("/media/...", sep = "", header = TRUE)

head(Targets)
        V1
1 3.160514
2 6.701948
3 4.093844
4 1.992014
5 1.604751
6 2.076802

head(NonTargets)
         V1
1  3.895934
2  1.990506
3 -1.746919
4 -3.451477
5  5.156554
6  1.195109

Targets.m <- melt(Targets)
head(Targets.m)
  variable    value
1       V1 3.160514
2       V1 6.701948
3       V1 4.093844
4       V1 1.992014
5       V1 1.604751
6       V1 2.076802

NonTargets.m <- melt(NonTargets)
head(NonTargets.m)
  variable     value
1       V1  3.895934
2       V1  1.990506
3       V1 -1.746919
4       V1 -3.451477
5       V1  5.156554
6       V1  1.195109

How do I proceed to plot them in one graph using stat_ecdf()?
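Since the question asked about ggplot2's stat_ecdf() specifically, here is a hedged sketch (assuming a ggplot2 version that provides stat_ecdf): stack both samples into one data frame with a grouping column and map colour to the group, rather than overlaying two separate plots.

```r
library(ggplot2)

## Simulated stand-ins for the Targets / NonTargets columns:
set.seed(1)
df <- rbind(data.frame(group = "Targets",    value = rnorm(100, 1, 2)),
            data.frame(group = "NonTargets", value = rnorm(100, 2, 2)))

## One stat_ecdf layer draws two ECDF curves, distinguished by colour:
p <- ggplot(df, aes(x = value, colour = group)) + stat_ecdf()
print(p)
```

With real data, `df` would be built by binding the two read.table() results with a `group` label, so no melt() step is needed.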
[R] Create function from string
Dear list,

I am trying to create a function from a string, and have so far solved it with eval(parse()). This works well also when using the newly created function as an argument to another function. The trouble starts when I want to use it with parLapply. Below is a much simplified example:

fstring = "x+2"
FUN = function(x) eval(parse(text = fstring))
FUN(3)

FUN2 = function(y, func) y + func(y)
FUN2(3, FUN)

# I can also pass FUN as an argument to FUN2 when using foreach and parallel:
library(parallel)
library(foreach)
cl = makeCluster(2, outfile = "")
ylist = list(1:3, 4:6)
result = foreach(i = 1:2) %dopar% { FUN2(ylist[[i]], FUN) }

# But when I change to parLapply (actually parLapplyLB), fstring is not found anymore:
parLapply(cl, as.list(1:4), FUN2, func = FUN)

I assume there is a problem with environments; the question is how to solve this. The cleanest would be to substitute fstring with its content in FUN, but I did not figure out how. substitute() or bquote() do not seem to do the trick, although I might not have tried them in the right way. Any suggestions how to solve this, either how to substitute correctly, or to completely avoid the eval(parse())?

Thanks, Jon

--
Jon Olav Skøien
Joint Research Centre - European Commission
Institute for Environment and Sustainability (IES)
Land Resource Management Unit
Via Fermi 2749, TP 440, I-21027 Ispra (VA), ITALY
jon.sko...@jrc.ec.europa.eu

Disclaimer: Views expressed in this email are those of the individual and do not necessarily represent official views of the European Commission.
Re: [R] Create function from string
Is this what you are looking for?

FUN = eval(bquote(function(x) .(parse(text = fstring)[[1]])))
FUN
function (x)
x + 2
FUN(3)
[1] 5

On Apr 16, 2013, at 09:50 , Jon Olav Skoien wrote:
> [quoted message trimmed]
--
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com
Re: [R] Create function from string
Thanks a lot, that seems to do exactly what I need!

Best wishes,
Jon

On 16-Apr-13 10:21, peter dalgaard wrote:
> Is this what you are looking for?
> FUN = eval(bquote(function(x) .(parse(text = fstring)[[1]])))
> [quoted message trimmed]
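For completeness, the parLapply case also works with the original eval(parse()) formulation if fstring is exported to the workers with clusterExport(); Peter's bquote() version avoids that by baking the expression into the function body. A sketch of both, using only the base 'parallel' package:

```r
library(parallel)

fstring <- "x+2"
FUN2 <- function(y, func) y + func(y)

## Variant 1 (Peter's answer): the parsed expression is fixed into FUN's
## body, so the function is self-contained and needs no fstring on workers.
FUN <- eval(bquote(function(x) .(parse(text = fstring)[[1]])))

## Variant 2: keep eval(parse()), but export fstring to each worker so the
## lookup inside FUNp succeeds there too.
FUNp <- function(x) eval(parse(text = fstring))

cl <- makeCluster(2)
res1 <- parLapply(cl, as.list(1:4), FUN2, func = FUN)
clusterExport(cl, "fstring")
res2 <- parLapply(cl, as.list(1:4), FUN2, func = FUNp)
stopCluster(cl)

unlist(res1)  # 4 6 8 10, since FUN2(y) = y + (y + 2)
```

Variant 2 assumes FUNp is defined at top level, so its environment maps to each worker's global environment, where clusterExport() placed fstring.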
Re: [R] HMM Package parameter estimation
I think it's your starting values for the initial state probability distribution, i.e. c(1,1,1)/3, that cause the problem. They seem to drop you into some sort of local maximum/stationary point, a long way from the global maximum. Try, e.g., c(4,2,1)/7; this gives me:

hmmFit$hmm$emissionProbs
      symbols
states         1          2
     1 0.9385018 0.06149819
     2 0.7883591 0.21164092
     3 0.2279287 0.77207131

hmmFit$hmm$transProbs
    to
from         1         2         3
   1 0.6925055 0.1239590 0.1835355
   2 0.2537700 0.5780679 0.1681621
   3 0.2455462 0.1190872 0.6353666

which look to be in reasonable agreement with the true values. Note though that states 2 and 3 have been swapped. This happens.

cheers,
Rolf Turner

On 16/04/13 13:13, Richard Philip wrote:

Hi, I am having difficulties estimating the parameters of a HMM using the HMM package. I have simulated a sequence of observations from a known HMM. When I estimate the parameters of a HMM using these simulated observations, the parameters are not at all close to the known ones. I realise the estimated parameters are not going to be exactly the same as the known/true parameters, but these are nowhere close. Below is the code I used. Any ideas or possible suggestions regarding this issue would be greatly appreciated.

library(HMM)

## DECLARE PARAMETERS OF THE KNOWN MODEL
states = c(1,2,3)
symbols = c(1,2)
startProb = c(0.5,0.25,0.25)
transProb = matrix(c(0.8,0.05,0.15, 0.2,0.6,0.2, 0.2,0.3,0.5), 3, 3, TRUE)
emissionProb = matrix(c(0.9,0.1, 0.2,0.8, 0.7,0.3), 3, 2, TRUE)

# CREATE THE KNOWN MODEL
hmmTrue = initHMM(states, symbols, startProb, transProb, emissionProb)

# SIMULATE 1000 OBSERVATIONS OF THE KNOWN MODEL
observation = simHMM(hmmTrue, 1000)
obs = observation$observation

# ESTIMATE A MODEL USING THE OBSERVATIONS GENERATED FROM THE KNOWN MODEL
hmmInit = initHMM(states, symbols, c(1/3,1/3,1/3))
hmmFit = baumWelch(hmmInit, obs)

# The parameters of hmmTrue and hmmFit are not at all alike; why is this?
Re: [R] ZA unit root test lag order selection
Hi Anonymous,

There are different methods to select lags in unit root tests; the two you mention are not fundamentally wrong and belong to the standard methods used, even if IC-based selection is perhaps now the preferred solution. Note there is some work by Perron and Ng on a refined selection criterion with better properties; unfortunately it hasn't been implemented in R as far as I know.

You are not mentioning the package you use (nor the code?); I guess you use urca? In this case, you could extract AIC/BIC with:

library(urca)
data(nporg)
gnp <- na.omit(nporg[, "gnp.r"])
za.gnp <- ur.za(gnp, model = "both", lag = 2)
summary(za.gnp)
logLik.ur.za <- function(object, ...) logLik(object@testreg) ## necessary as AIC not directly implemented
AIC(za.gnp)

Best,
Matthieu

I was wondering if anyone could help with choosing the optimal lag length for the ZA test. Two lag order selection methods are commonly used in the literature: 1) the ZA paper recommends running the test with the maximum number of lags, then reducing the lag order sequentially until the longest lag is statistically significant; 2) one could also use AIC, SBC, or other criteria to choose the lag order. I am using an annual series with 22 observations. Which of the above lag order selection procedures would be correct to apply?
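Building on Matthieu's logLik trick above, IC-based lag selection can be sketched as a loop over candidate lag orders on the same urca example data. Caveat: the test regression's effective sample shrinks slightly as lags are added, so the AIC comparison across lag orders is only approximate.

```r
library(urca)
data(nporg)
gnp <- na.omit(nporg[, "gnp.r"])

## AIC is not implemented for ur.za directly; delegate to the test regression
## (same trick as in the message above).
logLik.ur.za <- function(object, ...) logLik(object@testreg)

lags <- 1:4
aics <- sapply(lags, function(k) AIC(ur.za(gnp, model = "both", lag = k)))
best.lag <- lags[which.min(aics)]
best.lag  # lag order minimising AIC among the candidates
```

The sequential general-to-specific rule from the ZA paper would instead inspect the t-statistic of the longest augmentation lag in `summary(za@testreg)` and step down until it is significant.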
[R] assistant
Dear Sir/Ma,

I am Adelabu A.A., one of the R users from Nigeria. When I run a coxph command, the warning below is generated; I have tried a few ideas but cannot get past it. Kindly assist:

cox1 <- coxph(Surv(tmonth, status) ~ sex + age + marital + sumassure, X)
Warning message:
In fitter(X, Y, strats, offset, init, control, weights = weights, :
  Ran out of iterations and did not converge

summary(cox1, conf.int = 0.95, exact = TRUE)
Call:
coxph(formula = Surv(tmonth, status) ~ sex + age + marital + sumassure, data = X)

  n= 5958, number of events= 316

                           coef exp(coef)  se(coef)      z Pr(>|z|)
sex                  -1.418e-01 8.678e-01 1.743e-01 -0.814  0.41593
age                   1.492e-03 1.001e+00 1.369e-03  1.090  0.27593
marital               4.283e-01 1.535e+00 2.201e-01  1.946  0.05165 .
sumassure1           -1.553e+01 1.795e-07 3.699e+03 -0.004  0.99665
sumassure10          -7.357e-01 4.792e-01 1.156e+00 -0.636  0.52467
sumassure100         -1.556e+01 1.747e-07 5.556e+02 -0.028  0.97766
sumassure1000800     -1.549e+01 1.874e-07 4.062e+03 -0.004  0.99696
sumassure1002000     -1.564e+01 1.607e-07 2.408e+03 -0.006  0.99482
sumassure1008000     -1.562e+01 1.650e-07 3.225e+03 -0.005  0.99614
sumassure1008        -1.541e+01 2.028e-07 7.148e+03 -0.002  0.99828
sumassure1014673.1   -1.543e+01 1.988e-07 6.218e+03 -0.002  0.99802
sumassure101737.03    1.186e+00 3.275e+00 1.418e+00  0.836  0.40288
sumassure101850.55    1.054e+00 2.870e+00 1.418e+00  0.743  0.45731
sumassure102000       4.671e-02 1.048e+00 1.416e+00  0.033  0.97369
sumassure102         -1.525e+01 2.375e-07 3.578e+03 -0.004  0.99660
sumassure1027251.36  -1.568e+01 1.557e-07 3.699e+03 -0.004  0.99662
sumassure1035360.53  -1.542e+01 2.015e-07 6.961e+03 -0.002  0.99823
sumassure1043436.77  -1.547e+01 1.905e-07 5.366e+03 -0.003  0.99770
sumassure1043438.77  -1.547e+01 1.908e-07 1.981e+03 -0.008  0.99377
sumassure10482402.52 -1.567e+01 1.560e-07 3.699e+03 -0.004  0.99662
sumassure105000      -1.556e+01 1.755e-07 3.493e+03 -0.004  0.99645
sumassure105         -1.562e+01 1.644e-07 3.293e+03 -0.005  0.99622
sumassure1052631.57  -1.555e+01 1.764e-07 4.870e+03 -0.003  0.99745
sumassure1056363.94  -1.498e+01 3.123e-07 7.384e+03 -0.002  0.99838
sumassure1059480     -1.555e+01 1.763e-07 1.589e+03 -0.010  0.99219
sumassure1073559.38  -1.551e+01 1.842e-07 5.238e+03 -0.003  0.99764
sumassure108000       2.147e+00 8.558e+00 1.420e+00  1.512  0.13056
sumassure108         -2.121e+00 1.200e-01 1.226e+00 -1.730  0.08367 .
sumassure1080        -1.532e+01 2.215e-07 2.657e+03 -0.006  0.99540
sumassure1081137.2   -1.534e+01 2.182e-07 4.921e+03 -0.003  0.99751
sumassure108591.75    1.126e+00 3.084e+00 1.418e+00  0.794  0.42705
sumassure110         -1.553e+01 1.803e-07 3.699e+03 -0.004  0.99665
sumassure11121828.15 -1.526e+01 2.351e-07 6.735e+03 -0.002  0.99819
sumassure111417.02    1.042e+00 2.836e+00 1.418e+00  0.735  0.46240
sumassure1116251.83   2.065e+00 7.889e+00 1.421e+00  1.454  0.14595
sumassure1122821.57  -1.567e+01 1.569e-07 3.699e+03 -0.004  0.99662
...

Concordance= 0.781  (se = 0.018 )
Rsquare= 0.119   (max possible= 0.577 )
Likelihood ratio test= 752.4  on 526 df,   p=2.95e-10
Wald test            = 655.2  on 526 df,   p=0.000101
Score (logrank) test = 2262  on 526 df,   p=0

The sumassure is the sum-assured amount of a policy holder in insurance.
Re: [R] HMM Package parameter estimation
It seems that providing other starting values does indeed get the iterations going. However, more worrisome is that the 3-state model does not seem to converge, even when upping the number of iterations. Below I also run a 2-state model on the same data for comparison (I have added set.seed statements to make the output exactly reproducible). For the 2- and 3-state models I get (see code below):

fm2
Convergence info: Log likelihood converged to within tol. (relative change)
'log Lik.' -585.7628 (df=5)
AIC: 1181.526
BIC: 1206.064

fm3
Convergence info: 'maxit' iterations reached in EM without convergence.
'log Lik.' -585.5807 (df=11)
AIC: 1193.161
BIC: 1247.147

hth,
Ingmar

set.seed(11)
# SIMULATE 1000 OBSERVATIONS OF THE KNOWN MODEL
observation = simHMM(hmmTrue, 1000)
obs = observation$observation

# ESTIMATE A MODEL USING THE OBSERVATIONS GENERATED FROM THE KNOWN MODEL
hmmInit = initHMM(states, symbols, c(4,2,1)/7)
hmmFit = baumWelch(hmmInit, obs, maxI = 200)

library(depmixS4)
m2 <- depmix(obs ~ 1, family = multinomial("identity"), ns = 2, nt = 1000)
set.seed(12)
fm2 <- fit(m2)
m3 <- depmix(obs ~ 1, family = multinomial("identity"), ns = 3, nt = 1000)
set.seed(13)
fm3 <- fit(m3)

On Tue, Apr 16, 2013 at 11:53 AM, Rolf Turner rolf.tur...@xtra.co.nz wrote:
> [quoted message trimmed]
Re: [R] assistant
Looks like sumassure is treated as categorical. This sort of thing is usually a data error; it happens if one of the values cannot be converted to numeric: O instead of 0, comma instead of period, etc. Check summary(X), or, to investigate more specifically, things like

x <- X$sumassure
table(x[is.na(as.numeric(as.character(x)))])

(the as.character() matters if sumassure was read in as a factor; as.numeric() on a factor would otherwise return the underlying codes rather than NAs).

-pd

On Apr 16, 2013, at 11:31 , Adelabu Ahmmed wrote:
> [quoted message trimmed]
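A tiny base-R illustration of the diagnostic above, on a made-up vector (the real data are not available here): locate entries that fail numeric conversion, then repair the two error types mentioned.

```r
## Toy stand-in for X$sumassure; "2,5" (comma for period) and "O" (letter
## O for zero) are the kinds of entries described above.
x <- c("1000", "2,5", "O", "750")

## Which entries cannot be converted to numeric?
bad <- x[is.na(suppressWarnings(as.numeric(x)))]
bad  # "2,5" "O"

## Repair and convert:
x.clean <- sub(",", ".", x, fixed = TRUE)  # comma -> decimal point
x.clean[x.clean == "O"] <- "0"             # letter O -> zero
sumassure.num <- as.numeric(x.clean)
sumassure.num  # 1000 2.5 0 750
```

Once every value converts cleanly, sumassure enters the coxph() formula as a single numeric covariate instead of hundreds of factor levels.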
[R] Splitting the Elements of character vector
Dear R forum,

I have a data.frame

df = data.frame(currency_type = c("EURO_o_n", "EURO_o_n", "EURO_1w", "EURO_1w",
                                  "USD_o_n", "USD_o_n", "USD_1w", "USD_1w"),
                rates = c(0.47, 0.475, 0.461, 0.464, 1.21, 1.19, 1.41, 1.43))

df
  currency_type rates
1      EURO_o_n 0.470
2      EURO_o_n 0.475
3       EURO_1w 0.461
4       EURO_1w 0.464
5       USD_o_n 1.210
6       USD_o_n 1.190
7        USD_1w 1.410
8        USD_1w 1.430

I need to split the values appearing under currency_type to obtain the following data.frame, in the original order:

currency tenor rates
EURO     o_n   0.470
EURO     o_n   0.475
EURO     1w    0.461
EURO     1w    0.464
USD      o_n   1.210
USD      o_n   1.190
USD      1w    1.410
USD      1w    1.430

Basically I need to split the currency names and tenors. I tried

strsplit(df$currency_type, "_")
Error in strsplit(df$currency_type, "_") : non-character argument

Kindly guide,
Katherine
Re: [R] Splitting the Elements of character vector
On Tue, Apr 16, 2013 at 8:38 AM, Katherine Gobin katherine_go...@yahoo.com wrote:
> [quoted message trimmed]

Try sub:

with(df, data.frame(
    currency = sub("_.*", "", currency_type),
    tenor = sub("^[^_]*_", "", currency_type),
    rates)
)

--
Statistics Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
Re: [R] Splitting the Elements of character vector
Hi,

Try:

library(stringr)
df = data.frame(currency_type = c("EURO_o_n", "EURO_o_n", "EURO_1w", "EURO_1w", "USD_o_n", "USD_o_n", "USD_1w", "USD_1w"), rates = c(0.47, 0.475, 0.461, 0.464, 1.21, 1.19, 1.41, 1.43), stringsAsFactors = FALSE)
df$currency <- unlist(lapply(str_split(df[,1], "_"), `[`, 1))
df$tenor <- unlist(lapply(str_split(df[,1], "_"), function(x) {paste(x[-1], collapse = "_")}))
df[, c(3, 4, 2)]
#   currency tenor rates
# 1     EURO   o_n 0.470
# 2     EURO   o_n 0.475
# 3     EURO    1w 0.461
# 4     EURO    1w 0.464
# 5      USD   o_n 1.210
# 6      USD   o_n 1.190
# 7      USD    1w 1.410
# 8      USD    1w 1.430

A.K.

----- Original Message -----
From: Katherine Gobin katherine_go...@yahoo.com
To: r-help@r-project.org
Sent: Tuesday, April 16, 2013 8:38 AM
Subject: [R] Splitting the Elements of character vector
[original question snipped]
Re: [R] Splitting the Elements of character vector
Hi,

You can also do this with stringr's word() after replacing the first underscore with a space:

library(stringr)
df2 <- data.frame(currency = word(str_replace(df[,1], "_", " "), 1),
                  tenor = word(str_replace(df[,1], "_", " "), 2),
                  rates = df$rates, stringsAsFactors = FALSE)
df2
#   currency tenor rates
# 1     EURO   o_n 0.470
# 2     EURO   o_n 0.475
# 3     EURO    1w 0.461
# 4     EURO    1w 0.464
# 5      USD   o_n 1.210
# 6      USD   o_n 1.190
# 7      USD    1w 1.410
# 8      USD    1w 1.430

A.K.

----- Original Message -----
[earlier messages in the thread snipped]
Re: [R] converting blank cells to NAs
Hi,

I am not sure about the problem. If your non-numeric vector is like "a,b,,d,e,,f":

library(stringr)
vec1 <- unlist(str_split(readLines(textConnection("a,b,,d,e,,f")), ","))
vec1[vec1 == ""] <- NA
vec1
# [1] "a" "b" NA  "d" "e" NA  "f"

If this doesn't work, please provide an example vector.

A.K.

Thanks for the response. That seems to do the trick as far as replacing the empty cells with NA; however, the problem remains that the vector is not numeric. This was the reason I wanted to replace the empty cells with NAs in the first place. Forcing the vector with as.numeric afterwards doesn't seem to work either; I get nonsensical results.
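Nonsensical results from as.numeric() are the classic symptom of a factor: as.numeric() applied to a factor returns the internal level codes, not the printed values. A hedged sketch of the usual fix, going through character first:

```r
# a numeric-looking vector that was read in as a factor
f <- factor(c("10", "20", "", "40"))

# wrong: as.numeric() on a factor yields the level codes (1, 2, ...)
wrong <- as.numeric(f)

# right: convert to character, blank out empties, then coerce
x <- as.character(f)
x[x == ""] <- NA
right <- as.numeric(x)
right
# [1] 10 20 NA 40
```

If the vector came from read.table/read.csv, setting stringsAsFactors = FALSE (or na.strings = "") at read time avoids the detour entirely.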
[R] R process slow down after an amount of time
Hi R users,

I have noticed that R gets slower if a process with a loop runs for a while. Is that normal? Let's say I have some code which produces an output file after each loop run. After 10, 15 or 20 loop runs, the time between the created files increases strongly. Is there maybe some data filling up memory?

Chris

--
View this message in context: http://r.789695.n4.nabble.com/R-process-slow-down-after-a-amount-of-time-tp4664358.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] R process slow down after an amount of time
On Apr 16, 2013, at 9:52 AM, Chris82 rubenba...@gmx.de wrote:
[original question snipped]

Possibly, but if I were to put money on it, I'd guess there's an ever-expanding object problem:

x <- NULL
for (i in 1:1e6) x <- c(x, rnorm(1))

which is not-so-secretly quadratic and should instead be:

x <- rnorm(1e6)

Perhaps a small reproducible example would help us help you.

Michael
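To make the cost concrete, here is a small illustration of growing a vector inside a loop versus preallocating it; both produce the same values, but the growing version copies the whole vector on every iteration:

```r
n <- 1e4

# growing: c() copies the entire vector each time -> O(n^2) work overall
grow <- function(n) {
  x <- NULL
  for (i in 1:n) x <- c(x, i)
  x
}

# preallocated: write into an existing slot -> O(n) work overall
prealloc <- function(n) {
  x <- numeric(n)
  for (i in 1:n) x[i] <- i
  x
}

identical(as.numeric(grow(n)), prealloc(n))  # same result, very different cost
```

Wrapping each call in system.time() shows the gap widening rapidly as n grows, which matches the "output files arrive ever more slowly" symptom.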
Re: [R] use of simulate.Arima (forecast package)
Hello,

The help page is pretty clear, I think. You have to pass an object of class 'Arima', 'ar' or 'ets' to simulate.Arima. See, for instance, the second example in the help page for ?Arima, and extend it like this:

set.seed(6816)
lines(simulate(air.model, nsim = 48), col = "red")

Hope this helps,

Rui Barradas

Em 15-04-2013 15:13, Stefano Sofia escreveu:

I would like to simulate some SARIMA models, e.g. a SARIMA (1,0,1)(1,0,1)[4] process. I installed the package 'forecast', where the function simulate.Arima should do what I am trying to do. I am not able to understand how it works. Could somebody help me with an example?

thank you
Stefano Sofia

IMPORTANT NOTICE: This e-mail message is intended to be received only by persons entitled to receive the confidential information it may contain. E-mail messages to clients of Regione Marche may contain information that is confidential and legally privileged. Please do not read, copy, forward, or store this message unless you are an intended recipient of it. If you have received this message in error, please forward it to the sender and delete it completely from your computer system.
Re: [R] ZA unit root test lag order selection
Dear Matthieu,

Many thanks for your reply. I was not sure of the best way forward in selecting lag length. Eventually I wrote a function that carries out serial correlation tests and AIC-based lag length selection. I used the urca package. Here is what I came up with in the end:

zamod.A = ur.za(x, model = "intercept", lag = j)
AIC(eval(attributes(zamod.A)$testreg))  # for lag order selection
bgtest(attributes(zamod.A)$testreg, order = 3)$p.value  # to test for serial correlation

The final decision on the best model comes from examining the results of the AIC and BG tests.

Thanks,
Rosh

--
View this message in context: http://r.789695.n4.nabble.com/ZA-unit-root-test-lag-order-selection-tp4664183p4664350.html
Sent from the R help mailing list archive at Nabble.com.
[R] 10% off Intro R training from RStudio: NYC May 13-14, SF May 20-21
Hi all,

At RStudio, we're hosting our Introduction to R Workshop this May in two locations. As an R-help subscriber, we're offering 10% off!

* Intro to data science with R (http://goo.gl/bplg3) May 13-14 New York City
* Intro to data science with R (http://goo.gl/VCUFL) May 20-21 San Francisco Bay Area

What will you learn? Practical skills for visualizing, transforming, and modeling data in R. During this two-day course, you will learn how to explore and understand data as well as how to do basic programming in R. Our courses incorporate a mix of lectures and hands-on learning. Expect to learn about a topic and then immediately put it into practice with a small example. Plenty of help will be available if you get stuck. You can read more about our training philosophy at http://www.rstudio.com/training/philosophy.html

To see prices, precise locations and to register:

* for the NY course: http://rstudio-nyc.eventbrite.com/
* for the SF course: http://rstudio-bay.eventbrite.com/

We have limited discounts for students (66% off) and academics (33% off) - please contact j...@rstudio.com for details. To thank the R-help community for being such a great resource, we'd also like to offer all R-help subscribers a 10% discount. Just enter "rhelpftw" as a promotional code to get 10% off!

Regards,
Hadley

PS. Would you like us to offer these courses (or others!) in your area? Please let us know at http://www.rstudio.com/training/workshops/

--
Chief Scientist, RStudio
http://had.co.nz/
[R] varSelRF help
# this is my data set
library(varSelRF)
data_set <- data.frame(x0 = c(1,1,0,0), x1 = c(1,1,0,0), x2 = c(1,1,0,0), x3 = c(1,1,0,0), x4 = c(1,1,0,0))
# this is my target
target <- c(1,1,0,0)
rf.vs1 <- varSelRF(data_set, as.factor(target), ntree = 500, ntreeIterat = 300, vars.drop.frac = 0.2)
rf.vs1
rf.vs1[[3]]

It is giving me only 2 significant variables, but I am expecting 5 significant variables. Please help. I am a newbie.

Regards,
Thabung
[R] Spatial Ananlysis: zero.policy=TRUE doesn't work for no neighbour regions??
Hello,

I'm new to R and to spatial analysis, and have a problem trying to create a spatial weights matrix.

I use the following code to create the neighbours list:

library(maptools)
library(spdep)
library(rgdal)
location_County <- readShapePoly()
proj4string(location_County) <- CRS("+proj=longlat +ellps=WGS84")
location_nbq <- poly2nb(location_County)
summary(location_nbq)

And get this output:

Neighbour list object:
Number of regions: 3109
Number of nonzero links: 18246
Percentage nonzero weights: 0.1887671
Average number of links: 5.868768
4 regions with no links: 35 689 709 881
Link number distribution:
   0    1    2    3    4    5    6    7    8    9   10   11   13   14
   4   29   40   94  283  616 1045  703  228   51   12    2    1    1
29 least connected regions: 45 49 587 645 844 853 1206 1286 1391 1416 1456 1478 1485 1545 1546 1548 1558 1612 1621 1663 1672 1675 1760 1794 1795 2924 2925 2952 3107 with 1 link
1 most connected region: 1385 with 14 links

As there are some regions without neighbours in my data, I use the following code to create the weights matrix:

W_Matrix <- nb2listw(location_nbq, style = "W", zero.policy = TRUE)
W_Matrix

And get this output:

Fehler in print.listw(list(style = "W", neighbours = list(c(23L, 31L, 42L :
  regions with no neighbours found, use zero.policy=TRUE
(Error in print.listw(...): regions with no neighbours found, use zero.policy=TRUE)

As I use zero.policy=TRUE, I just don't understand what I'm doing wrong... My question is: how can I create a weights matrix allowing for no-neighbour areas?

Thanks,
Michael

--
View this message in context: http://r.789695.n4.nabble.com/Spatial-Ananlysis-zero-policy-TRUE-doesn-t-work-for-no-neighbour-regions-tp4664367.html
Sent from the R help mailing list archive at Nabble.com.
[R] Help needed with data format required for package VIF
Hello,

Could somebody perhaps assist with my dilemma concerning the package VIF? The examples are not very clear (the data is stored internally). I wish to read a .csv file (header = TRUE) and run VIF, but I get nonsensical output. I have downloaded the boston.csv file (from the referring website). How do I run the example using this file format directly (say, using read.table)?

Any help is greatly appreciated.

Regards,
Jacob
Re: [R] matching multiple fields from a matrix
Hi Arun,

This is excellent and elegant. I thought there had to be a relatively simple way to do this. Thank you very much.

Jeremy

From: arun kirshna [via R] [mailto:ml-node+s789695n4664328...@n4.nabble.com]
Sent: Monday, April 15, 2013 10:34 PM
To: Crowley, Jeremy
Subject: Re: matching multiple fields from a matrix

Hi,

Maybe this helps:

dat1 <- read.table(text="
site1 depth1 year1 site2 depth2 year2
10 30 1860 NA NA NA
NA NA NA 50 30 1860
10 20 1850 11 20 1850
11 25 1950 12 25 1960
10 NA 1870 12 30 1960
11 25 1880 15 22 1890
14 22 1890 14 25 1880
", sep="", header=TRUE, stringsAsFactors=FALSE)
res <- merge(dat1[,1:3], dat1[,4:6], by.x=c("depth1","year1"), by.y=c("depth2","year2"))
names(res)[1:2] <- gsub("\\d+", "", names(res))[1:2]
na.omit(res)
#   depth year site1 site2
# 1    20 1850    10    11
# 2    22 1890    14    15
# 3    25 1880    11    14
# 4    30 1860    10    50

A.K.

----- Original Message -----
From: jercrowley
To: r-help@r-project.org
Sent: Monday, April 15, 2013 5:07 PM
Subject: [R] matching multiple fields from a matrix

I have been trying many ways to match 2 separate fields in a matrix. Here is a simplified version of the matrix:

site1 depth1 year1 site2 depth2 year2
   10     30  1860    NA     NA    NA
   NA     NA    NA    50     30  1860

Basically I am trying to identify the sites which have a common year and depth from 2 datasets. What I would like to do is match all of the year1 field to the year2 field and the depth1 field to the depth2 field. Then I would like to output site1, site2, depth, and year. I have been trying if loops, which(), isTRUE(), etc. but I have not come up with anything that works. Any help would be greatly appreciated.

Jeremy

--
View this message in context: http://r.789695.n4.nabble.com/matching-multiple-fields-from-a-matrix-tp4664309.html
Sent from the R help mailing list archive at Nabble.com.
--
View this message in context: http://r.789695.n4.nabble.com/matching-multiple-fields-from-a-matrix-tp4664309p4664376.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] 10% off Intro R training from RStudio: NYC May 13-14, SF May 20-21
Hadley:

I don't think this is appropriate. Think of what it would be like if everyone shilled their R training and consulting wares here.

Bert

Sent from my iPhone -- please excuse typos.

On Apr 16, 2013, at 8:09 AM, Hadley Wickham h.wick...@gmail.com wrote:
[advertisement snipped]
Re: [R] Overlay two stat_ecdf() plots
On Apr 16, 2013, at 12:45 AM, PIKAL Petr wrote:

Hi

Do you mean ecdf? If yes, just use the add option in plot:

plot(ecdf(rnorm(100, 1, 2)))
plot(ecdf(rnorm(100, 2, 2)), add = TRUE, col = 2)

If not, please specify where ecdf_stat or stat_ecdf comes from which, as you indicate, are the same function.

It has the appearance of a ggplot2 function, so I think this student has not yet grasped that there needs to be a call to ggplot to set up the data framework to which `stat_ecdf` will then be added (with the overloaded + operator) as a layer.

Regards
Petr

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of Robin Mjelle
Sent: Monday, April 15, 2013 1:10 PM
To: r-help@r-project.org
Subject: [R] Overlay two stat_ecdf() plots

I want to plot two ecdf plots in the same graph. I have two input tables with one column each:

Targets <- read.table("/media/", sep="", header=T)
NonTargets <- read.table("/media/...", sep="", header=T)

head(Targets)
        V1
1 3.160514
2 6.701948
3 4.093844
4 1.992014
5 1.604751
6 2.076802

head(NonTargets)
         V1
1  3.895934
2  1.990506
3 -1.746919
4 -3.451477
5  5.156554
6  1.195109

Targets.m <- melt(Targets)
head(Targets.m)
  variable    value
1       V1 3.160514
2       V1 6.701948
3       V1 4.093844
4       V1 1.992014
5       V1 1.604751
6       V1 2.076802

NonTargets.m <- melt(NonTargets)
head(NonTargets.m)
  variable     value
1       V1  3.895934
2       V1  1.990506
3       V1 -1.746919
4       V1 -3.451477
5       V1  5.156554
6       V1  1.195109

How do I proceed to plot them in one graph using stat_ecdf()?

David Winsemius
Alameda, CA, USA
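For completeness, a sketch of the ggplot2 route the student was presumably after (simulated data in place of the original files; the column names here are illustrative): stack both samples into one long-form data frame with a grouping column, map that group to colour, and add a single stat_ecdf() layer.

```r
library(ggplot2)

# two samples stacked long-form, with a label column distinguishing them
dat <- data.frame(
  value = c(rnorm(100, 1, 2), rnorm(100, 2, 2)),
  group = rep(c("Targets", "NonTargets"), each = 100)
)

# one stat_ecdf() layer draws one ECDF per colour group
p <- ggplot(dat, aes(x = value, colour = group)) + stat_ecdf()
p
```

This is the idiomatic ggplot2 pattern for overlays generally: reshape to long form and map the distinguishing variable to an aesthetic, rather than adding one layer per data frame.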
Re: [R] odfWeave: Some questions about potential formatting options
Hi Milan and Max, Thanks to each of you for your reply to my post. Thus far, I've managed to find answers to some of the questions I asked initially. I am now able to control the justification of the leftmost column in my tables, as well as to add borders to the top and bottom. I also downloaded Milan's revised version of odfWeave at the link below, and found that it does a nice job of controlling column widths. http://nalimilan.perso.neuf.fr/transfert/odfWeave.tar.gz There are some other things I'm still struggling with though. 1. Is it possible to get odfTableCaption and odfFigureCaption to make the titles they produce bold? I understand it might be possible to accomplish this by changing something in the styles but am not sure what. If someone can give me a hint, I can likely do the rest. 2. Is there any way to get odfFigureCaption to put titles at the top of the figure instead of the bottom? I've noticed that odfTableCaption is able to do this but apparently not odfFigureCaption. 3. Is it possible to add special characters to the output? Below is a sample Kaplan-Meier analysis. There's a footnote in there that reads Note: X2(1) = xx.xx, p = .. Is there any way to make the X a lowercase Chi and to superscript the 2? I did quite a bit of digging on this topic. It sounds like it might be difficult, especially if one is using Windows as I am. 
Thanks,
Paul

## Get data ##

## Load packages
require(survival)
require(MASS)

## Sample analysis
attach(gehan)
gehan.surv <- survfit(Surv(time, cens) ~ treat, data = gehan, conf.type = "log-log")
print(gehan.surv)
survTable <- summary(gehan.surv)$table
survTable <- data.frame(Treatment = rownames(survTable), survTable, row.names = NULL)
survTable <- subset(survTable, select = -c(records, n.max))

## odfWeave ##

## Load odfWeave
require(odfWeave)

## Modify StyleDefs
currentDefs <- getStyleDefs()
currentDefs$firstColumn$type <- "Table Column"
currentDefs$firstColumn$columnWidth <- "5 cm"
currentDefs$secondColumn$type <- "Table Column"
currentDefs$secondColumn$columnWidth <- "3 cm"
currentDefs$ArialCenteredBold$fontSize <- "10pt"
currentDefs$ArialNormal$fontSize <- "10pt"
currentDefs$ArialCentered$fontSize <- "10pt"
currentDefs$ArialHighlight$fontSize <- "10pt"
currentDefs$ArialLeftBold <- currentDefs$ArialCenteredBold
currentDefs$ArialLeftBold$textAlign <- "left"
currentDefs$cgroupBorder <- currentDefs$lowerBorder
currentDefs$cgroupBorder$topBorder <- "0.0007in solid #000000"
setStyleDefs(currentDefs)

## Modify ImageDefs
imageDefs <- getImageDefs()
imageDefs$dispWidth <- 5.5
imageDefs$dispHeight <- 5.5
setImageDefs(imageDefs)

## Modify Styles
currentStyles <- getStyles()
currentStyles$figureFrame <- "frameWithBorders"
setStyles(currentStyles)

## Set odt table styles
tableStyles <- tableStyles(survTable, useRowNames = FALSE, header = "")
tableStyles$headerCell[1,] <- "cgroupBorder"
tableStyles$header[,1] <- "ArialLeftBold"
tableStyles$text[,1] <- "ArialNormal"
tableStyles$cell[2,] <- "lowerBorder"

## Weave odt source file
fp <- "N:/Studies/HCRPC1211/Report/odfWeaveTest/"
inFile <- paste(fp, "testWeaveIn.odt", sep = "")
outFile <- paste(fp, "testWeaveOut.odt", sep = "")
odfWeave(inFile, outFile)

## Contents of .odt source file ##

Here is a sample Kaplan-Meier table.
<<testKMTable, echo=FALSE, results=xml>>=
odfTableCaption("A Sample Kaplan-Meier Analysis Table")
odfTable(survTable, useRowNames = FALSE, digits = 3,
         colnames = c("Treatment", "Number", "Events", "Median", "95% LCL", "95% UCL"),
         colStyles = c("firstColumn", "secondColumn", "secondColumn", "secondColumn", "secondColumn", "secondColumn"),
         styles = tableStyles)
odfCat("Note: X2(1) = xx.xx, p = .")
@

Here is a sample Kaplan-Meier graph.

<<testKMFig, echo=FALSE, fig=TRUE>>=
odfFigureCaption("A Sample Kaplan-Meier Analysis Graph", label = "Figure")
plot(gehan.surv, xlab = "Time", ylab = "Survivorship")
@
[R] Strange error with log-normal models
Hi,

I have some data that, when plotted, looks very close to a log-normal distribution. My goal is to build a regression model to test how this variable responds to several independent variables. To do this, I want to use the fitdistr tool from the MASS package to see how well my data fits the actual distribution, and also build a generalized linear model using the glm command.

The summary of my data is:

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.0000  0.0000  0.0000  0.8617  0.8332 55.5600

So, no missing values, no negative values. When I try to use the fitdistr command, I get an error that I don't understand:

m <- fitdistr(y, densfun = "lognormal")
Error in fitdistr(y, densfun = "lognormal") :
  need positive values to fit a log-Normal

When I try to build a simple model, I also get an error:

l <- glm(y ~ x, family = gaussian(link = "log"))
Error in eval(expr, envir, enclos) :
  cannot find valid starting values: please specify some

Can anyone offer some suggestions?

Thanks!

--
Noah Silverman, M.S.
UCLA Department of Statistics
8117 Math Sciences Building
Los Angeles, CA 90095
[R] the joy of spreadsheets (off-topic)
Given that we occasionally run into problems with comparing Excel results to R results, and other spreadsheet-induced errors, I thought this might be of interest. http://www.nextnewdeal.net/rortybomb/researchers-finally-replicated-reinhart-rogoff-and-there-are-serious-problems The punchline: If this error turns out to be an actual mistake Reinhart-Rogoff made, well, all I can hope is that future historians note that one of the core empirical points providing the intellectual foundation for the global move to austerity in the early 2010s was based on someone accidentally not updating a row formula in Excel. Ouch. (Note: I know nothing about the site, the author of the article, or the study in question. I was pointed to it by someone else. But if true: highly problematic.) Sarah -- Sarah Goslee http://www.functionaldiversity.org __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Strange error with log-normal models
On 16/04/2013 1:19 PM, Noah Silverman wrote:
[original question snipped, up to:]

 When I try to use the fitdistr command, I get an error that I don't understand:

 m <- fitdistr(y, densfun = "lognormal")
 Error in fitdistr(y, densfun = "lognormal") :
   need positive values to fit a log-Normal

You have zeros in your data. The lognormal distribution never takes on the value zero. If they are zero because of rounding (e.g. 0.001 would be recorded as zero), and there aren't too many of them, then replacing the zeros with a small positive value (e.g. half the smallest non-zero value) might make sense.

But your median is zero, so at least half of your observations are zero. You need to come up with a better model than lognormal.

Duncan Murdoch
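Duncan's rounding caveat can be sketched as follows (illustrative data; whether this substitution is statistically defensible depends on why the zeros arose):

```r
library(MASS)  # for fitdistr

set.seed(1)
y <- c(rep(0, 5), rlnorm(95))  # a few rounded-to-zero values among lognormal data

# replace zeros with half the smallest non-zero value, as suggested
eps <- min(y[y > 0]) / 2
y2 <- ifelse(y == 0, eps, y)

fit <- fitdistr(y2, densfun = "lognormal")
fit$estimate  # meanlog and sdlog
```

With a zero-heavy sample like the poster's (median zero), a two-part or zero-inflated model would be the better route, as Duncan notes.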
[R] efficiently diff two data frames
Dear all, What is the quickest and most efficient way to diff two data frames, so as to obtain a vector of indices (or logical) for rows/columns that differ in the two data frames? For example,
Xe <- head(mtcars)
Xf <- head(mtcars)
Xf[2:4, 3:5] <- 55
all.equal(Xe, Xf)
[1] "Component 3: Mean relative difference: 0.6863118"
[2] "Component 4: Mean relative difference: 0.4728435"
[3] "Component 5: Mean relative difference: 14.23546"
I could use all.equal(), but it only returns human-readable info that cannot be easily used programmatically. It also gives no info on the rows. Another way would be to:
require(prob)
setdiff(Xe, Xf)
                mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4 Wag  21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710     22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
But again this doesn't return subsetting indices, nor any info on the columns. Any suggestions on how to approach this? Regards, Liviu
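One possible approach (a sketch, assuming both data frames have identical dimensions and columns that can be coerced to a common matrix type): compare them element-wise and let which(..., arr.ind = TRUE) return the row/column positions that differ.

```r
Xe <- head(mtcars)
Xf <- head(mtcars)
Xf[2:4, 3:5] <- 55

# Element-wise comparison; arr.ind = TRUE gives (row, col) index pairs
diffs <- which(as.matrix(Xe) != as.matrix(Xf), arr.ind = TRUE)
diffs                                   # positions that differ
unique(rownames(diffs))                 # row names that differ
colnames(Xe)[unique(diffs[, "col"])]    # column names that differ
```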
Re: [R] how to change the date into an interval of date?
Hi, Please check your dput(). By using your dput() output, I am getting:
$patient_id
[1] 2 2 2 2 3 3 3 3
$responsed_at
[1] 14755 14797 14835 14883 14755 14789 14826 14857
$number
[1] 1 2 3 4 1 2 3 4
$score
[1] 1 1 2 3 1 5 4 5
$.Names
[1] "patient_id" "responsed_at" "number" "scores"
$row.names
[1] NA -8
$class
[1] "data.frame"
It looks like the image you shared also showed the same output. I am not using RStudio, so I don't know what is wrong.
# the dput should be:
dat1 <- structure(list(patient_id = c(2,2,2,2,3,3,3,3),
    responsed_at = c(14755,14797,14835,14883,14755,14789,14826,14857),
    number = c(1,2,3,4,1,2,3,4), score = c(1,1,2,3,1,5,4,5)),
    .Names = c("patient_id", "responsed_at", "number", "scores"),
    row.names = c(NA, -8L), class = "data.frame")
dat1
#  patient_id responsed_at number scores
#1          2        14755      1      1
#2          2        14797      2      1
#3          2        14835      3      2
#4          2        14883      4      3
#5          3        14755      1      1
#6          3        14789      2      5
#7          3        14826      3      4
#8          3        14857      4      5
library(zoo)
dat1$responsed_at <- as.Date(dat1$responsed_at)
dat1
#  patient_id responsed_at number scores
#1          2   2010-05-26      1      1
#2          2   2010-07-07      2      1
#3          2   2010-08-14      3      2
#4          2   2010-10-01      4      3
#5          3   2010-05-26      1      1
#6          3   2010-06-29      2      5
#7          3   2010-08-05      3      4
#8          3   2010-09-05      4      5
str(dat1)
#'data.frame': 8 obs. of 4 variables:
# $ patient_id  : num 2 2 2 2 3 3 3 3
# $ responsed_at: Date, format: "2010-05-26" "2010-07-07" ...
# $ number      : num 1 2 3 4 1 2 3 4
# $ scores      : num 1 1 2 3 1 5 4 5
A.K.
From: GUANGUAN LUO guanguan...@gmail.com To: arun smartpink...@yahoo.com Sent: Tuesday, April 16, 2013 10:49 AM Subject: Re: how to change the date into an interval of date? hi,
dput(head(data,8))
structure(list(patient_id = c(2,2,2,2,3,3,3,3), responsed_at = c(14755,14797,14835,14883,14755,14789,14826,14857), number = c(1,2,3,4,1,2,3,4), score = c(1,1,2,3,1,5,4,5), .Names = c("patient_id", "responsed_at", "number", "scores"), class = "data.frame"))
like this? I use RStudio; there are 4 windows, and the window of results is the output. 2013/4/16 arun smartpink...@yahoo.com HI, Please dput() your dataset as in my previous reply.
This is an image, and it is twice or thrice the work for me to convert it to a readable form. http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example Also, you didn't answer my question: I didn't understand which one is the window of results and which one is your tables. From: GUANGUAN LUO guanguan...@gmail.com To: smartpink...@yahoo.com Sent: Tuesday, April 16, 2013 10:10 AM Subject: Re: how to change the date into an interval of date?
   patient_id number response_id session_id responsed_at login clinique_basdai.fatigue
1           2      1          77          2        14755  3002                       4
2           2      2        1258         61        14797  3002                       5
3           2      3        2743        307        14835  3002                       5
4           2      4        4499        562        14883  3002                       6
5           2      5        6224        809        14916  3002                       4
6           2      6        7708       1024        14949  3002                       3
7           2      7        9475       1224        14985  3002                       3
8           2      8       11362       1458        15020  3002                       4
9           2      9       13417       1688        15055  3002                       5
10          2     10       15365       1959        15090  3002                       4
11          2     11       17306       2211        15126  3002                       5
12          2     12       19073       2449        15160  3002                       3
13          2     13       20679       2677        15193  3002                       5
14          2     14       22294       2883        15228  3002                       5
15          2     15       24097       3082        15265  3002                       5
16          2     16       25670       3304        15299  3002                       5
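Coming back to the subject-line question — turning a date into an interval of dates — a minimal sketch (assuming, as in the thread, that responsed_at holds day counts since 1970-01-01; the monthly binning is only an illustration, since the poster never specified the interval width):

```r
dat1 <- data.frame(patient_id   = c(2,2,2,2,3,3,3,3),
                   responsed_at = c(14755,14797,14835,14883,14755,14789,14826,14857),
                   number       = c(1,2,3,4,1,2,3,4),
                   scores       = c(1,1,2,3,1,5,4,5))

# numeric day counts -> Date (base R needs an explicit origin)
dat1$responsed_at <- as.Date(dat1$responsed_at, origin = "1970-01-01")

# bin each date into a monthly interval (cut.Date also accepts "week", "quarter", ...)
dat1$interval <- cut(dat1$responsed_at, breaks = "month")
head(dat1)
```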
[R] Path Diagram
Hi All, Apologies if this has been answered somewhere else, but I have been searching for an answer all day and not been able to find one. I am trying to plot a path diagram for a CFA I have run. I have installed Rgraphviz and run the following:
pathDiagram(cfa, min.rank='item1, item2, item3, item4, item5, item6, item7, item8, item9, item10, item11, item12', max.rank='SMP, AAAS', file='documents')
I get the following message and output: Running dot -Tpdf -o documents.pdf documents.dot
digraph cfa {
  rankdir=LR;
  size="8,8";
  node [fontname=Helvetica fontsize=14 shape=box];
  edge [fontname=Helvetica fontsize=10];
  center=1;
  {rank=min item1 item2 item3 item4 item5 item6 item7 item8 item9 item10 item11 item12}
  {rank=max SMP AAAS}
  SMP [shape=ellipse]
  AAAS [shape=ellipse]
  SMP -> item1 [label=smp0];
  SMP -> item3 [label=smp1];
  SMP -> item4 [label=smp2];
  SMP -> item6 [label=smp3];
  SMP -> item8 [label=smp4];
  SMP -> item10 [label=smp5];
  SMP -> item11 [label=smp6];
  AAAS -> item2 [label=aaas0];
  AAAS -> item5 [label=aaas1];
  AAAS -> item7 [label=aaas2];
  AAAS -> item9 [label=aaas3];
  AAAS -> item12 [label=aaas4];
}
How do I get to see the graph? Many thanks, Laura Laura Thomas PhD Student - Sport and Exercise Psychology Department of Sport and Exercise Penglais Campus Aberystwyth University Aberystwyth 01970621947 l...@aber.ac.uk www.aber.ac.uk/en/sport-exercise/
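On the question "How do I get to see the graph?": the line Running dot -Tpdf -o documents.pdf documents.dot shows that pathDiagram() already rendered the diagram to documents.pdf via Graphviz. A minimal sketch for opening it from R (assuming the file was written to the current working directory):

```r
# Open the PDF that Graphviz produced in the system's default viewer.
# "documents.pdf" comes from the file='documents' argument above.
browseURL("documents.pdf")
```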
[R] I don't understand the 'order' function
I thought I've understood the 'order' function, using simple examples like: order(c(5,4,-2)) [1] 3 2 1 However, I arrived to the following example: order(c(2465, 2255, 2085, 1545, 1335, 1210, 920, 210, 210, 505, 1045)) [1] 8 9 10 7 11 6 5 4 3 2 1 and I was completely perplexed! Shouldn't the output vector be 11 10 9 8 7 6 4 1 2 3 5 ? Do I have a damaged version of R? I became still more astonished when I used the sort function and got the right answer: sort(c(2465, 2255, 2085, 1545, 1335, 1210, 920, 210, 210, 505, 1045)) [1] 210 210 505 920 1045 1210 1335 1545 2085 2255 2465 since 'sort' documentation claims to be using 'order' to establish the right order. Please help me to understand all this! Thanks, -Sergio. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] testInstalledBasic / testInstalledPackages
On Apr 16, 2013, at 11:44 AM, Trina Patel trinarpa...@gmail.com wrote: Hi, I installed R 3.0.0 on a Windows 2008 Server. When I submitted the following code in R64,
library(tools)
testInstalledBasic(scope = "devel")
I get the following message in the R Console:
library(tools)
testInstalledBasic(scope = "devel")
running tests of consistency of as/is.* creating ‘isas-tests.R’ running code in ‘isas-tests.R’ comparing ‘isas-tests.Rout’ to ‘isas-tests.Rout.save’ ...2550a2551 running tests of random deviate generation -- fails occasionally running code in ‘p-r-random-tests.R’ comparing ‘p-r-random-tests.Rout’ to ‘p-r-random-tests.Rout.save’ ... OK running tests of primitives running code in ‘primitives.R’ running regexp regression tests running code in ‘utf8-regex.R’ running tests to possibly trigger segfaults creating ‘no-segfault.R’ running code in ‘no-segfault.R’ Warning message: running command 'diff -bw C:\Users\TRINA_~1\AppData\Local\Temp\Rtmp2FwZXW\Rdiffa1a88562f12b C:\Users\TRINA_~1\AppData\Local\Temp\Rtmp2FwZXW\Rdiffb1a8848c57620' had status 1
When I compare isas-tests.Rout to isas-tests.Rout.save, as well as the two diff files listed above, it seems that there is one extra empty line in isas-tests.Rout.save. Is there any way to fix this error without modifying the isas-tests.Rout.save file? Next I submitted the following code,
testInstalledPackages(scope = "base")
and got the message below in my R console:
testInstalledPackages(scope = "base")
Testing examples for package ‘base’ Testing examples for package ‘tools’ comparing ‘tools-Ex.Rout’ to ‘tools-Ex.Rout.save’ ... 621c621 [1] 0cce1e42ef3fb133940946534fcf8896 --- [1] eb723b61539feef013de476e68b5c50a
When comparing the files tools-Ex.Rout and tools-Ex.Rout.save, it seems this difference indicates an error in the md5sums for the file C:\Program Files\R\R-3.0.0\COPYING. Does this indicate a problem with my installation?
Looking at the file C:\Program Files\R\R-3.0.0\MD5 leads me to suspect there might be an error in the test itself. Thanks for the help! See: http://cran.r-project.org/doc/manuals/r-release/R-admin.html#Testing-a-Windows-Installation from the R Installation and Administration Manual. Try running:
Sys.setenv(LC_COLLATE = "C", LANGUAGE = "en")
before you run the tests. You might also want to have a look at: https://github.com/marcschwartz/R-IQ-OQ Regards, Marc Schwartz
Re: [R] I don't understand the 'order' function
Hi Julio, On Tue, Apr 16, 2013 at 1:51 PM, Julio Sergio julioser...@gmail.com wrote: I thought I've understood the 'order' function, using simple examples like: order(c(5,4,-2)) [1] 3 2 1 However, I arrived to the following example: order(c(2465, 2255, 2085, 1545, 1335, 1210, 920, 210, 210, 505, 1045)) [1] 8 9 10 7 11 6 5 4 3 2 1 and I was completely perplexed! Shouldn't the output vector be 11 10 9 8 7 6 4 1 2 3 5 ? Do I have a damaged version of R? Your version of R is fine; your understanding is damaged. :) order() returns the element indices for each position. So in your example, the sorted version of the vector would have element 8 in the first place, element 9 in the second place, and element 1 in the last place. order() is not the same as rank(). See:
x <- c(2465, 2255, 2085, 1545, 1335, 1210, 920, 210, 210, 505, 1045)
order(x)
x[order(x)]
rank(x) # what you seem to expect
Sarah -- Sarah Goslee http://www.functionaldiversity.org
Re: [R] I don't understand the 'order' function
Hello, Inline. Em 16-04-2013 18:51, Julio Sergio escreveu: I thought I've understood the 'order' function, using simple examples like: order(c(5,4,-2)) [1] 3 2 1 However, I arrived to the following example: order(c(2465, 2255, 2085, 1545, 1335, 1210, 920, 210, 210, 505, 1045)) [1] 8 9 10 7 11 6 5 4 3 2 1 and I was completely perplexed! Shouldn't the output vector be 11 10 9 8 7 6 4 1 2 3 5 ? No, why should it? Try assigning the output of order and see what happens to the vector.
x <- c(2465, 2255, 2085, 1545, 1335, 1210, 920, 210, 210, 505, 1045)
(o <- order(x))
x[o] # Allright
Hope this helps, Rui Barradas Do I have a damaged version of R? I became still more astonished when I used the sort function and got the right answer: sort(c(2465, 2255, 2085, 1545, 1335, 1210, 920, 210, 210, 505, 1045)) [1] 210 210 505 920 1045 1210 1335 1545 2085 2255 2465 since 'sort' documentation claims to be using 'order' to establish the right order. Please help me to understand all this! Thanks, -Sergio.
Re: [R] I don't understand the 'order' function
Hi,
vec1 <- c(2465, 2255, 2085, 1545, 1335, 1210, 920, 210, 210, 505, 1045)
vec1[order(vec1)]
#[1] 210 210 505 920 1045 1210 1335 1545 2085 2255 2465
order(vec1)
#[1] 8 9 10 7 11 6 5 4 3 2 1
sort(vec1, index.return = TRUE)
#$x
#[1] 210 210 505 920 1045 1210 1335 1545 2085 2255 2465
#$ix
# [1] 8 9 10 7 11 6 5 4 3 2 1
A.K. - Original Message - From: Julio Sergio julioser...@gmail.com To: r-h...@stat.math.ethz.ch Cc: Sent: Tuesday, April 16, 2013 1:51 PM Subject: [R] I don't understand the 'order' function I thought I've understood the 'order' function, using simple examples like: order(c(5,4,-2)) [1] 3 2 1 However, I arrived to the following example: order(c(2465, 2255, 2085, 1545, 1335, 1210, 920, 210, 210, 505, 1045)) [1] 8 9 10 7 11 6 5 4 3 2 1 and I was completely perplexed! Shouldn't the output vector be 11 10 9 8 7 6 4 1 2 3 5 ? Do I have a damaged version of R? I became still more astonished when I used the sort function and got the right answer: sort(c(2465, 2255, 2085, 1545, 1335, 1210, 920, 210, 210, 505, 1045)) [1] 210 210 505 920 1045 1210 1335 1545 2085 2255 2465 since 'sort' documentation claims to be using 'order' to establish the right order. Please help me to understand all this! Thanks, -Sergio.
Re: [R] I don't understand the 'order' function
On 16/04/2013 1:51 PM, Julio Sergio wrote: I thought I've understood the 'order' function, using simple examples like: order(c(5,4,-2)) [1] 3 2 1 However, I arrived to the following example: order(c(2465, 2255, 2085, 1545, 1335, 1210, 920, 210, 210, 505, 1045)) [1] 8 9 10 7 11 6 5 4 3 2 1 and I was completely perplexed! Shouldn't the output vector be 11 10 9 8 7 6 4 1 2 3 5 ? Do I have a damaged version of R? You are probably confusing order() and rank(). What we want is that x[order(x)] is in increasing order. This is the inverse permutation of what rank(x) gives, so (if there are no ties) rank(x)[order(x)] and order(x)[rank(x)] should both give 1:length(x). Duncan I became still more astonished when I used the sort function and got the right answer: sort(c(2465, 2255, 2085, 1545, 1335, 1210, 920, 210, 210, 505, 1045)) [1] 210 210 505 920 1045 1210 1335 1545 2085 2255 2465 since 'sort' documentation claims to be using 'order' to establish the right order. Please help me to understand all this! Thanks, -Sergio. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
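Duncan's inverse-permutation claim can be checked directly; a sketch using a tie-free variant of the poster's vector (the second 210 changed to 211, since ties make rank() return fractional values):

```r
x <- c(2465, 2255, 2085, 1545, 1335, 1210, 920, 210, 211, 505, 1045)

identical(x[order(x)], sort(x))          # TRUE: order() gives sorting indices
all(rank(x)[order(x)] == seq_along(x))   # TRUE: rank and order are inverse permutations
all(order(x)[rank(x)] == seq_along(x))   # TRUE
```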
Re: [R] the joy of spreadsheets (off-topic)
When in doubt, assume the spreadsheet is wrong. I suggested this to someone having a problem with R vs Excel results a while ago. When I checked back with him -- there was a spreadsheet error. I think a t-shirt with the motto "Friends don't let friends use spreadsheets" [1] sounds like a good idea. Unfortunately I am not artistic enough to do a design. 1. Slight paraphrase of J. D. Cryer's statement http://homepage.cs.uiowa.edu/~jcryer/JSMTalk2001.pdf John Kane Kingston ON Canada -Original Message- From: sarah.gos...@gmail.com Sent: Tue, 16 Apr 2013 13:25:57 -0400 To: r-help@r-project.org Subject: [R] the joy of spreadsheets (off-topic) Given that we occasionally run into problems with comparing Excel results to R results, and other spreadsheet-induced errors, I thought this might be of interest. http://www.nextnewdeal.net/rortybomb/researchers-finally-replicated-reinhart-rogoff-and-there-are-serious-problems The punchline: "If this error turns out to be an actual mistake Reinhart-Rogoff made, well, all I can hope is that future historians note that one of the core empirical points providing the intellectual foundation for the global move to austerity in the early 2010s was based on someone accidentally not updating a row formula in Excel." Ouch. (Note: I know nothing about the site, the author of the article, or the study in question. I was pointed to it by someone else. But if true: highly problematic.) Sarah -- Sarah Goslee http://www.functionaldiversity.org
Re: [R] 10% off Intro R training from RStudio: NYC May 13-14, SF May 20-21
-Original Message- From: gunter.ber...@gene.com Sent: Tue, 16 Apr 2013 09:43:14 -0700 To: h.wick...@gmail.com Subject: Re: [R] 10% off Intro R training from RStudio: NYC May 13-14, SF May 20-21 Hadley: I don't think this is appropriate. Think of what it would be like if everyone shilled their R training and consulting wares here. They do. John Kane Kingston ON Canada Bert Sent from my iPhone -- please excuse typos. On Apr 16, 2013, at 8:09 AM, Hadley Wickham h.wick...@gmail.com wrote: Hi all, At RStudio, we're hosting our Introduction to R Workshop this May in two locations. As an R-help subscriber, we're offering 10% off! * Intro to data science with R (http://goo.gl/bplg3) May 13-14 New York City * Intro to data science with R (http://goo.gl/VCUFL) May 20-21 San Francisco Bay Area What will you learn? Practical skills for visualizing, transforming, and modeling data in R. During this two-day course, you will learn how to explore and understand data as well as how to do basic programming in R. Our courses incorporate a mix of lectures and hands-on learning. Expect to learn about a topic and then immediately put it into practice with a small example. Plenty of help will be available if you get stuck. You can read more about our training philosophy at http://www.rstudio.com/training/philosophy.html To see prices, precise locations and to register: * for the NY course: http://rstudio-nyc.eventbrite.com/ * for the SF course: http://rstudio-bay.eventbrite.com/ We have limited discounts for students (66% off) and academics (33% off) - please contact j...@rstudio.com for details. To thank the R-help community for being such a great resource, we'd also like to offer all R-help subscribers a 10% discount. Just enter rhelpftw as a promotional code get 10% off! Regards, Hadley PS. Would you like us to offer these courses (or others!) in your area? 
Please let us know at http://www.rstudio.com/training/workshops/ -- Chief Scientist, RStudio http://had.co.nz/
Re: [R] I don't understand the 'order' function
Julio Sergio juliosergio at gmail.com writes: I thought I've understood the 'order' function, using simple examples like: Thanks to you all!... As Sarah said, what was damaged was my understanding ( ;-) )... and as Duncan said, I was confusing 'order' with 'rank', thanks! Now I understand the 'order' function. -Sergio __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] 10% off Intro R training from RStudio: NYC May 13-14, SF May 20-21
On Tue, Apr 16, 2013 at 5:43 PM, Bert Gunter gunter.ber...@gene.com wrote: Hadley: I don't think this is appropriate. Think of what it would be like if everyone shilled their R training and consulting wares here. Everyone does, don't they? A search on Nabble shows up regular postings from XLSolutions, Mango used to post (not seen anything in a while) and Revo sneak the odd commercial in Dave Smith's updates. I don't see anything about non-commercial postings being banned from R-help, but they do seem to be against the spirit of R-help. I suspect commercials sneak in under under 'announcements' in the R-help documentation: R-help: The ‘main’ R mailing list, for [...] announcements (not covered by ‘R-announce’ or ‘R-packages’, see above) As with everything R, if it bothers the maintainers, then they'll put a stop to it. We users matter not... Barry __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] I don't understand the 'order' function
[See in-line below] On 16-Apr-2013 17:51:41 Julio Sergio wrote: I thought I've understood the 'order' function, using simple examples like: order(c(5,4,-2)) [1] 3 2 1 However, I arrived to the following example: order(c(2465, 2255, 2085, 1545, 1335, 1210, 920, 210, 210, 505, 1045)) [1] 8 9 10 7 11 6 5 4 3 2 1 and I was completely perplexed! Shouldn't the output vector be 11 10 9 8 7 6 4 1 2 3 5 ? Do I have a damaged version of R? I think the simplest explanation can be given as:
S <- c(2465, 2255, 2085, 1545, 1335, 1210, 920, 210, 210, 505, 1045)
cbind(Index = 1:length(S), S, Order = order(S), Sort = sort(S))
      Index    S Order Sort
 [1,]     1 2465     8  210
 [2,]     2 2255     9  210
 [3,]     3 2085    10  505
 [4,]     4 1545     7  920
 [5,]     5 1335    11 1045
 [6,]     6 1210     6 1210
 [7,]     7  920     5 1335
 [8,]     8  210     4 1545
 [9,]     9  210     3 2085
[10,]    10  505     2 2255
[11,]    11 1045     1 2465
showing that the value of 'order' for any one of the numbers is the Index (position) of that number in the original series, placed in the position that number occupies in the sorted series. (With a tie for S[8] = S[9] = 210). For example: which one of S occurs in 5th position in the sorted series? It is the 11th of S (1045). I became still more astonished when I used the sort function and got the right answer: sort(c(2465, 2255, 2085, 1545, 1335, 1210, 920, 210, 210, 505, 1045)) [1] 210 210 505 920 1045 1210 1335 1545 2085 2255 2465 since 'sort' documentation claims to be using 'order' to establish the right order. Indeed, once you have order(S), you know which element of S to put in each position of the sorted order: S[order(S)] [1] 210 210 505 920 1045 1210 1335 1545 2085 2255 2465 Does this help to explain it? Ted. Please help me to understand all this! Thanks, -Sergio.
- E-Mail: (Ted Harding) ted.hard...@wlandres.net Date: 16-Apr-2013 Time: 19:12:21 This message was sent by XFMail
Re: [R] efficiently diff two data frames
Hello, Maybe Petr Savicky's answer in the link https://stat.ethz.ch/pipermail/r-help/2012-February/304830.html can lead you to what you want. I've changed his function a bit in order to return a logical vector over the rows, where differing rows return TRUE.
setdiffDF2 <- function(A, B){
  f <- function(X, Y) !duplicated(rbind(Y, X))[nrow(Y) + 1:nrow(X)]
  ix1 <- f(A, B)
  ix2 <- f(B, A)
  ix1 & ix2
}
ix <- setdiffDF2(Xe, Xf)
Xe[ix, ]
Xf[ix, ]
Note that this gives no information on the columns. Hope this helps, Rui Barradas Em 16-04-2013 18:42, Liviu Andronic escreveu: Dear all, What is the quickest and most efficient way to diff two data frames, so as to obtain a vector of indices (or logical) for rows/columns that differ in the two data frames? For example, Xe <- head(mtcars) Xf <- head(mtcars) Xf[2:4, 3:5] <- 55 all.equal(Xe, Xf) [1] "Component 3: Mean relative difference: 0.6863118" [2] "Component 4: Mean relative difference: 0.4728435" [3] "Component 5: Mean relative difference: 14.23546" I could use all.equal(), but it only returns human-readable info that cannot be easily used programmatically. It also gives no info on the rows. Another way would be to: require(prob) setdiff(Xe, Xf) mpg cyl disp hp drat wt qsec vs am gear carb Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 But again this doesn't return subsetting indices, nor any info on the columns. Any suggestions on how to approach this? Regards, Liviu
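A self-contained check of the setdiffDF2() idea, with the last line written explicitly as ix1 & ix2, on the Xe/Xf example from the original post:

```r
# Rows where two equally shaped data frames differ: a row counts as
# different only if it appears in neither data frame's row set.
setdiffDF2 <- function(A, B) {
  f <- function(X, Y) !duplicated(rbind(Y, X))[nrow(Y) + 1:nrow(X)]
  f(A, B) & f(B, A)
}

Xe <- head(mtcars)
Xf <- head(mtcars)
Xf[2:4, 3:5] <- 55

which(setdiffDF2(Xe, Xf))   # rows 2, 3 and 4 differ
```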
Re: [R] 10% off Intro R training from RStudio: NYC May 13-14, SF May 20-21
Hi Bert: given what Hadley and Rstudio have provided to the R-community, what's the big deal of letting people know about a class. It's the ideal place to send the notice. and yes, as Barry and John said, every other commercial entity does send to the R-list. Mark On Tue, Apr 16, 2013 at 2:11 PM, Barry Rowlingson b.rowling...@lancaster.ac.uk wrote: On Tue, Apr 16, 2013 at 5:43 PM, Bert Gunter gunter.ber...@gene.com wrote: Hadley: I don't think this is appropriate. Think of what it would be like if everyone shilled their R training and consulting wares here. Everyone does, don't they? A search on Nabble shows up regular postings from XLSolutions, Mango used to post (not seen anything in a while) and Revo sneak the odd commercial in Dave Smith's updates. I don't see anything about non-commercial postings being banned from R-help, but they do seem to be against the spirit of R-help. I suspect commercials sneak in under under 'announcements' in the R-help documentation: R-help: The main R mailing list, for [...] announcements (not covered by R-announce or R-packages, see above) As with everything R, if it bothers the maintainers, then they'll put a stop to it. We users matter not... Barry __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] the joy of spreadsheets (off-topic)
What a terrific article. Thanks for sharing! The more we critically examine how research is actually done the more frightened we become. Frank -- Frank E Harrell Jr Professor and Chairman School of Medicine Department of Biostatistics Vanderbilt University __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] 10% off Intro R training from RStudio: NYC May 13-14, SF May 20-21
Hi Bert, We are following the mailing list guidelines to the best of our knowledge (e.g. http://r.789695.n4.nabble.com/R-development-master-class-NYC-Dec-12-13-td4037031.html#a4038699). It's our belief (as shared by others) that advertising our courses falls under the general aegis of helping people learn R. Our goal is for RStudio to be a net positive to R the community. We support the R foundation, R user groups, do a lot of teaching for free, and develop a lot of open-source software like the RStudio IDE, shiny, ggplot2 and devtools. Public courses help fuel our development and hence benefit the R community. Hadley On Tue, Apr 16, 2013 at 11:43 AM, Bert Gunter gunter.ber...@gene.com wrote: Hadley: I don't think this is appropriate. Think of what it would be like if everyone shilled their R training and consulting wares here. Bert Sent from my iPhone -- please excuse typos. On Apr 16, 2013, at 8:09 AM, Hadley Wickham h.wick...@gmail.com wrote: Hi all, At RStudio, we're hosting our Introduction to R Workshop this May in two locations. As an R-help subscriber, we're offering 10% off! * Intro to data science with R (http://goo.gl/bplg3) May 13-14 New York City * Intro to data science with R (http://goo.gl/VCUFL) May 20-21 San Francisco Bay Area What will you learn? Practical skills for visualizing, transforming, and modeling data in R. During this two-day course, you will learn how to explore and understand data as well as how to do basic programming in R. Our courses incorporate a mix of lectures and hands-on learning. Expect to learn about a topic and then immediately put it into practice with a small example. Plenty of help will be available if you get stuck. 
You can read more about our training philosophy at http://www.rstudio.com/training/philosophy.html To see prices, precise locations and to register: * for the NY course: http://rstudio-nyc.eventbrite.com/ * for the SF course: http://rstudio-bay.eventbrite.com/ We have limited discounts for students (66% off) and academics (33% off) - please contact j...@rstudio.com for details. To thank the R-help community for being such a great resource, we'd also like to offer all R-help subscribers a 10% discount. Just enter rhelpftw as a promotional code get 10% off! Regards, Hadley PS. Would you like us to offer these courses (or others!) in your area? Please let us know at http://www.rstudio.com/training/workshops/ -- Chief Scientist, RStudio http://had.co.nz/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Chief Scientist, RStudio http://had.co.nz/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] avoid losing data.frame attributes on cbind()
Dear all, How should I add several variables to a data frame without losing the attributes of the df? Consider the following:

require(Hmisc)
Xa <- iris
label(Xa, self=TRUE) <- "Some df label"
str(Xa)
'data.frame': 150 obs. of 5 variables:
 $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
 - attr(*, "label")= chr "Some df label"

Xb <- round(iris[,1:2])
names(Xb) <- c("var1", "var2")
Xc <- cbind(Xa, Xb)  # the attribute is now gone
str(Xc)
'data.frame': 150 obs. of 7 variables:
 $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ var1 : num 5 5 5 5 5 5 5 5 4 5 ...
 $ var2 : num 4 3 3 3 4 4 3 3 3 3 ...

In such cases, when I want to plug some variables from the 2nd df into the 1st df, how should I proceed without losing the attributes of the 1st data frame? And, if possible, I'm looking for something nicer than:

for(i in names(Xb)) Xa[, i] <- Xb[, i]

Regards, Liviu __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] the joy of spreadsheets (off-topic)
I tend to live in fear that some spreadsheet calculating a drug dose for me will use my telephone number rather than my weight. John Kane Kingston ON Canada -Original Message- From: f.harr...@vanderbilt.edu Sent: Tue, 16 Apr 2013 13:20:46 -0500 To: r-h...@stat.math.ethz.ch Subject: Re: [R] the joy of spreadsheets (off-topic) What a terrific article. Thanks for sharing! The more we critically examine how research is actually done the more frightened we become. Frank -- Frank E Harrell Jr Professor and Chairman School of Medicine Department of Biostatistics Vanderbilt University __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Model ranking (AICc, BIC, QIC) with coxme regression
Hi, I'm actually trying to rank a set of candidate models with an information criterion (AICc, QIC, BIC). The problem I have is that I use mixed-effect Cox regression, which is only available with the package {coxme} (see the example below).

# Model 1
spring.cox <- coxme(Surv(start, stop, Real_rand) ~ strata(Paired) + R4 + R3 + R2 + (R3|Individual), spring)

I've already found some explanations in this forum on how to compute QIC for a coxph object (see the following lines, thanks to M. Basille), but it doesn't work on coxme...

QIC <- function(mod, ...) UseMethod("QIC")
QIC.coxph <- function(mod, details = FALSE) {
  trace <- sum(diag(solve(mod$naive.var) %*% mod$var))
  quasi <- mod$loglik[2]
  return(-2*quasi + 2*trace)
}

The only thing that I can't find in the coxme output to use these previous commands is naive.var, which we can obtain in a coxph regression by specifying robust=TRUE in the argument list:

spring.cox <- coxph(Surv(start, stop, Real_rand) ~ strata(Paired) + R4 + R3 + R2, spring, robust=TRUE)

But coxph doesn't allow inclusion of a random term such as (R3|Individual), and that's why I have to use coxme. I found a new update on R-forge to improve {coxme} (r-forge.r-project.org/scm/viewvc.php/pkg/R/dredge.R?view=log&root=mumin), but I did not understand how it all works and I'm not sure it fixes my problem... Is there someone that can help me with that? Rémi Lesmerises, biol. M.Sc., Candidat Ph.D. en Biologie Université du Québec à Rimouski 300, allée des Ursulines Rimouski, Qc., G5L 3A1 Tél.: 1 800 511-3382 #1241 remilesmeri...@yahoo.ca [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
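For ranking by AICc specifically (rather than QIC), a hand-rolled criterion is straightforward, since coxme fits report a log-likelihood. This is only a sketch, under the assumption that logLik() on the fitted coxme object returns the integrated log-likelihood with a "df" attribute; QIC proper still needs the naive variance, which coxme does not expose.

```r
# AICc from a fitted coxme model (sketch; 'fit' is a coxme object and 'n' is
# whatever sample size you consider appropriate for the small-sample term).
aicc_coxme <- function(fit, n) {
  ll <- logLik(fit)              # assumed: integrated log-likelihood
  k  <- attr(ll, "df")           # assumed: parameter count reported by coxme
  -2 * as.numeric(ll) + 2 * k + 2 * k * (k + 1) / (n - k - 1)
}
```

The last term is the usual finite-sample correction AICc = AIC + 2k(k+1)/(n-k-1); models could then be ranked by calling aicc_coxme() on each candidate.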
Re: [R] I don't understand the 'order' function
I think Duncan said that order and rank were inverses (if there are no ties). order() has period 2, so order(order(x)) is also rank(x) if there are no ties. E.g.,

> data.frame(x, o1=order(x), o2=order(order(x)), o3=order(order(order(x))),
+            o4=order(order(order(order(x)))), rank=rank(x))
      x o1 o2 o3 o4 rank
1  2465  8 11  8 11 11.0
2  2255  9 10  9 10 10.0
3  2085 10  9 10  9  9.0
4  1545  7  8  7  8  8.0
5  1335 11  7 11  7  7.0
6  1210  6  6  6  6  6.0
7   920  5  4  5  4  4.0
8   210  4  1  4  1  1.5
9   210  3  2  3  2  1.5
10  505  2  3  2  3  3.0
11 1045  1  5  1  5  5.0

Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Julio Sergio Sent: Tuesday, April 16, 2013 11:10 AM To: r-h...@stat.math.ethz.ch Subject: Re: [R] I don't understand the 'order' function Julio Sergio juliosergio at gmail.com writes: I thought I've understood the 'order' function, using simple examples like: Thanks to you all!... As Sarah said, what was damaged was my understanding ( ;-) )... and as Duncan said, I was confusing 'order' with 'rank', thanks! Now I understand the 'order' function. -Sergio __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
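Bill's point can be checked directly on a tiny tie-free vector:

```r
x <- c(30, 10, 20)  # no ties
order(x)            # 2 3 1 : the indices that would sort x
rank(x)             # 3 1 2 : the position each value takes after sorting
order(order(x))     # 3 1 2 : applying order twice recovers rank (tie-free case)
stopifnot(identical(order(order(x)), as.integer(rank(x))))
```

With ties (like the two 210s in the table above) rank() averages to 1.5 while order(order(x)) must still pick whole positions, which is exactly where the two functions diverge.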
Re: [R] need help with R
Of course, but you should carefully read the guidelines (see bottom of post), and it is a good idea to read Reproducibility https://github.com/hadley/devtools/wiki/Reproducibility and http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example for some useful suggestions on how to describe a problem and lay out your code. Welcome. John Kane Kingston ON Canada -Original Message- From: sam.ting...@gmail.com Sent: Mon, 15 Apr 2013 18:09:25 +1200 To: r-help@r-project.org Subject: [R] need help with R hey there can i email questions to this address to get help with using R thanks sam [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] avoid losing data.frame attributes on cbind()
HI, Not sure if this helps:

library(plyr)
res <- mutate(Xa, var1=round(Sepal.Length), var2=round(Sepal.Width))
str(res)
#'data.frame': 150 obs. of 7 variables:
# $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
# $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
# $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
# $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
# $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
# $ var1 : num 5 5 5 5 5 5 5 5 4 5 ...
# $ var2 : num 4 3 3 3 4 4 3 3 3 3 ...
# - attr(*, "label")= chr "Some df label"

A.K.

- Original Message - From: Liviu Andronic landronim...@gmail.com To: r-help r-h...@stat.math.ethz.ch Cc: Sent: Tuesday, April 16, 2013 2:24 PM Subject: [R] avoid losing data.frame attributes on cbind()
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
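A more generic alternative (a sketch, not from the thread): cbind() the two data frames as usual, then copy back any data-frame-level attributes from the first one (everything except names, row.names and class, which cbind manages itself).

```r
# Hypothetical helper: cbind that preserves data-frame-level attributes
# such as the "label" attribute that Hmisc's label() sets.
cbind_keep_attrs <- function(a, b) {
  out <- cbind(a, b)
  keep <- setdiff(names(attributes(a)), c("names", "row.names", "class"))
  for (at in keep) attr(out, at) <- attr(a, at)
  out
}

Xa <- iris
attr(Xa, "label") <- "Some df label"   # plain attr(); label(Xa, self=TRUE) sets the same slot
Xb <- data.frame(var1 = round(iris[, 1]), var2 = round(iris[, 2]))
Xc <- cbind_keep_attrs(Xa, Xb)
attr(Xc, "label")                      # the label survives the cbind
```

This avoids the column-by-column for() loop and works for any number of added columns.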
Re: [R] Path Diagram
Dear Laura, This works for me. Is dot on your system path? Best, John --- John Fox Senator McMaster Professor of Social Statistics Department of Sociology McMaster University Hamilton, Ontario, Canada -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-bounces@r- project.org] On Behalf Of Laura Thomas Sent: Tuesday, April 16, 2013 1:17 PM To: r-help@r-project.org Subject: [R] Path Diagram Hi All, Apologies if this has been answered somewhere else, but I have been searching for an answer all day and not been able to find one. I am trying to plot a path diagram for a CFA I have run, I have installed Rgraphviz and run the following:

pathDiagram(cfa, min.rank='item1, item2, item3, item4, item5, item6, item7, item8, item9, item10, item11, item12', max.rank='SMP, AAAS', file='documents')

I get the following message and output:

Running dot -Tpdf -o documents.pdf documents.dot
digraph "cfa" {
  rankdir=LR;
  size="8,8";
  node [fontname="Helvetica" fontsize=14 shape=box];
  edge [fontname="Helvetica" fontsize=10];
  center=1;
  {rank=min "item1" "item2" "item3" "item4" "item5" "item6" "item7" "item8" "item9" "item10" "item11" "item12"}
  {rank=max "SMP" "AAAS"}
  "SMP" [shape=ellipse]
  "AAAS" [shape=ellipse]
  "SMP" -> "item1" [label="smp0"];
  "SMP" -> "item3" [label="smp1"];
  "SMP" -> "item4" [label="smp2"];
  "SMP" -> "item6" [label="smp3"];
  "SMP" -> "item8" [label="smp4"];
  "SMP" -> "item10" [label="smp5"];
  "SMP" -> "item11" [label="smp6"];
  "AAAS" -> "item2" [label="aaas0"];
  "AAAS" -> "item5" [label="aaas1"];
  "AAAS" -> "item7" [label="aaas2"];
  "AAAS" -> "item9" [label="aaas3"];
  "AAAS" -> "item12" [label="aaas4"];
}

How do I get to see the graph? Many thanks, Laura Laura Thomas PhD Student- Sport and Exercise Psychology Department of Sport and Exercise Penglais Campus Aberystywth University Aberystwyth 01970621947 l...@aber.ac.uk www.aber.ac.uk/en/sport-exercise/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting- guide.html and provide commented, minimal, self-contained, reproducible code.
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Problem with handling of attributes in xmlToList in XML package
Are my emails getting through? 2013/4/14 santiago gil sg.c...@gmail.com: Hello all, I have a problem with the way attributes are dealt with in the function xmlToList(), and I haven't been able to figure it out for days now. Say I have a document (produced by nmap) like this:

mydoc <- '<host starttime="1365204834" endtime="1365205860"><status state="up" reason="echo-reply" reason_ttl="127"/>
<address addr="XXX.XXX.XXX.XXX" addrtype="ipv4"/>
<ports><port protocol="tcp" portid="135"><state state="open" reason="syn-ack" reason_ttl="127"/><service name="msrpc" product="Microsoft Windows RPC" ostype="Windows" method="probed" conf="10"><cpe>cpe:/o:microsoft:windows</cpe></service></port>
<port protocol="tcp" portid="139"><state state="open" reason="syn-ack" reason_ttl="127"/><service name="netbios-ssn" method="probed" conf="10"/></port>
</ports> <times srtt="647" rttvar="71" to="10"/>
</host>'

I want to store this as a list of lists, so I do:

mytree <- xmlTreeParse(mydoc)
myroot <- xmlRoot(mytree)
mylist <- xmlToList(myroot)

Now my problem is that when I want to fetch the attributes of the services running on each port, the behavior is not consistent:

mylist[["ports"]][[1]][["service"]]$.attrs["name"]
   name
"msrpc"
mylist[["ports"]][[2]][["service"]]$.attrs["name"]
Error in trash_list[["ports"]][[2]][["service"]]$.attrs :
  $ operator is invalid for atomic vectors

I understand that the way they are defined in the document is not the same, but I think there still should be consistent behavior. I've tried many combinations of parameters for xmlTreeParse() but nothing has helped me. I can't find a way to call up the name of the service consistently, regardless of whether the node has children or not. Any tips? All the best, S.G. -- --- http://barabasilab.neu.edu/people/gil/ -- --- http://barabasilab.neu.edu/people/gil/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] 10% off Intro R training from RStudio: NYC May 13-14, SF May 20-21
On Apr 16, 2013, at 12:43, Bert Gunter gunter.ber...@gene.com wrote: Hadley: I don't think this is appropriate. Think of what it would be like if everyone shilled their R training and consulting wares here. Echoing others, this seems an accepted practice on the lists, endorsed at least in one instance by Peter Dalgaard: https://stat.ethz.ch/pipermail/r-help/2011-November/295496.html. Similarly, Dirk has sent brief announcements for Rcpp training on the Rcpp list: http://permalink.gmane.org/gmane.comp.lang.r.rcpp/2334 I believe on R-SIG-HPC as well, but I don't have a link handy. List moderators will -- and have -- stepped in when it gets spammy: http://comments.gmane.org/gmane.comp.lang.r.hpc/1338 Michael [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Problem with handling of attributes in xmlToList in XML package
Yes. This is the third such copy. You can view them all in the Archive, starting with the first one: https://stat.ethz.ch/pipermail/r-help/2013-April/351504.html On Apr 16, 2013, at 11:49 AM, santiago gil wrote: Are my emails getting through? David Winsemius Alameda, CA, USA __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Problem with handling of attributes in xmlToList in XML package
Hi, Santiago: Yes, your e-mail has been received. I'm sorry, I can't solve your question. Regards. Eva --- On Tue, 16/4/13, santiago gil sg.c...@gmail.com wrote: From: santiago gil sg.c...@gmail.com Subject: Re: [R] Problem with handling of attributes in xmlToList in XML package To: r-help@r-project.org Date: Tuesday, 16 April 2013, 20:49 Are my emails getting through? [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Strange error with log-normal models
On Wed, Apr 17, 2013 at 5:19 AM, Noah Silverman noahsilver...@ucla.edu wrote: Hi, I have some data, that when plotted looks very close to a log-normal distribution. My goal is to build a regression model to test how this variable responds to several independent variables. [snip] When I try to build a simple model, I also get an error:

l <- glm(y ~ x, family = gaussian(link = "log"))
Error in eval(expr, envir, enclos) :
  cannot find valid starting values: please specify some

Duncan has described the problems with the lognormal. I will just point out that this 'simple model' is not lognormal. It is a model with normal errors and log link, i.e. y ~ N(mu, sigma^2), log(mu) = x beta. -thomas -- Thomas Lumley Professor of Biostatistics University of Auckland [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] converting blank cells to NAs
On Apr 16, 2013, at 6:38 AM, arun wrote: Hi, I am not sure about the problem. If your non-numeric vector is like: "a,b,,d,e,,f"

vec1 <- unlist(str_split(readLines(textConnection("a,b,,d,e,,f")), ","))
vec1[vec1 == ""] <- NA
vec1
#[1] "a" "b" NA  "d" "e" NA  "f"

If this doesn't work, please provide an example vector. A.K.

Thanks for the response. That seems to do the trick as far as replacing the empty cells with NA; however, the problem remains that the vector is not numeric. This was the reason I wanted to replace the empty cells with NAs in the first place. Forcing the vector with as.numeric afterwards doesn't seem to work either; I get nonsensical results.

In R there are actually multiple versions of NA, and in the case of character objects the reserved name is `NA_character_`, not NA. You can also use `is.na<-`:

# Method: `is.na<-`
vec <- sample(c(letters[1:5], ""), 20, replace=TRUE)
is.na(vec) <- vec == ""
vec

# Method: assign NA_character_
vec <- sample(c(letters[1:5], ""), 20, replace=TRUE)
vec[vec == ""] <- NA_character_
vec

-- David Winsemius Alameda, CA, USA __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
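Putting the two steps together for the numeric case (a small self-contained sketch; the vector is made up): once the blanks are real NAs, as.numeric() converts cleanly instead of producing warnings for the empty strings.

```r
vec <- c("1.5", "", "3", "", "7.25")   # character data with blank cells
vec[vec == ""] <- NA                   # blanks become real NAs first
num <- as.numeric(vec)                 # then the conversion is clean
num
# [1] 1.50   NA 3.00   NA 7.25
stopifnot(identical(num, c(1.5, NA, 3, NA, 7.25)))
```

If as.numeric() still gives "nonsensical results" after this, the column was probably read as a factor; converting via as.numeric(as.character(f)) is the usual fix in that case.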
[R] Singular design matrix in rq
Quantreggers: I'm trying to run rq() on a dataset I posted at: https://docs.google.com/file/d/0B8Kij67bij_ASUpfcmJ4LTFEUUk/edit?usp=sharing (it's a 1500kb csv file named singular.csv) and am getting the following error:

mydata <- read.csv("singular.csv")
fit_spl <- rq(raw_data[,1] ~ bs(raw_data[,i], df=15), tau=1)
Error in rq.fit.br(x, y, tau = tau, ...) : Singular design matrix

Any ideas what might be causing this or, more importantly, suggestions for how to solve it? I'm just trying to fit a smoothed hull to the top of the data cloud (hence the large df). Thanks! --jonathan -- Jonathan A. Greenberg, PhD Assistant Professor Global Environmental Analysis and Remote Sensing (GEARS) Laboratory Department of Geography and Geographic Information Science University of Illinois at Urbana-Champaign 607 South Mathews Avenue, MC 150 Urbana, IL 61801 Phone: 217-300-1924 http://www.geog.illinois.edu/~jgrn/ AIM: jgrn307, MSN: jgrn...@hotmail.com, Gchat: jgrn307, Skype: jgrn3007 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
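Two things commonly trigger this error and are worth trying first (a sketch, not a diagnosis of this particular dataset; the column indices and df are illustrative): a spline basis with more degrees of freedom than the distinct x-values can support, and tau = 1 itself, which asks for the extreme upper envelope and is numerically fragile.

```r
library(quantreg)
library(splines)

# Illustrative: y in column 1, x in column 2 of the posted csv.
x <- mydata[, 2]
y <- mydata[, 1]
ok <- is.finite(x) & is.finite(y)          # drop NAs/Inf that break the basis

# Fewer spline df, and a tau just under 1 for a near-upper-hull fit.
fit_spl <- rq(y[ok] ~ bs(x[ok], df = 8), tau = 0.99)
```

If df = 8 works but df = 15 does not, stepping df up until the fit breaks shows how much flexibility the data actually supports.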
Re: [R] Problem with handling of attributes in xmlToList in XML package
Hi, On Apr 16, 2013, at 2:49 PM, santiago gil wrote: 2013/4/14 santiago gil sg.c...@gmail.com: Hello all, I have a problem with the way attributes are dealt with in the function xmlToList(), and I haven't been able to figure it out for days now.

I have not used xmlToList(), but I find what you try below works if you specify useInternalNodes = TRUE in your invocation of xmlTreeParse. Often that is the solution for many issues with xml. Also, I have found it best to write a relatively generic getter style function. So, in the example below I have written a function called getPortAttr - it will get attributes for the child node you name. I used your example as the defaults: "service" is the child to query and "name" is the attribute to retrieve from that child. It's a heck of a lot easier to write a function than building the longish parse strings with lots of [[this]][[and]][[that]] stuff, and it is reusable to boot. Cheers, Ben

library(XML)
mydoc <- '<host starttime="1365204834" endtime="1365205860">
<status state="up" reason="echo-reply" reason_ttl="127"/>
<address addr="XXX.XXX.XXX.XXX" addrtype="ipv4"/>
<ports>
<port protocol="tcp" portid="135">
<state state="open" reason="syn-ack" reason_ttl="127"/>
<service name="msrpc" product="Microsoft Windows RPC" ostype="Windows" method="probed" conf="10">
<cpe>cpe:/o:microsoft:windows</cpe>
</service>
</port>
<port protocol="tcp" portid="139">
<state state="open" reason="syn-ack" reason_ttl="127"/>
<service name="netbios-ssn" method="probed" conf="10"/>
</port>
</ports>
<times srtt="647" rttvar="71" to="10"/>
</host>'

mytree <- xmlTreeParse(mydoc, useInternalNodes = TRUE)
myroot <- xmlRoot(mytree)
myports <- myroot[["ports"]]["port"]

getPortAttr <- function(x, child = "service", attr = "name") {
  kid <- x[[child]]
  att <- xmlAttrs(kid)[[attr]]
  att
}

portNames <- sapply(myports, getPortAttr)
# portNames
#        port        port
#     "msrpc" "netbios-ssn"

portReason <- sapply(myports, getPortAttr, child = "state", attr = "reason")
# portReason
#      port      port
# "syn-ack" "syn-ack"

Ben Tupper Bigelow Laboratory for Ocean Sciences 60 Bigelow Drive, P.O. Box 380 East Boothbay, Maine 04544 http://www.bigelow.org __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Strange error with log-normal models
@Duncan, You make a very good point. Somehow I overlooked that 0 is not positive. I guess that rules out the log-normal model. My challenge here is finding the right model for this data. Originally it was a nice count of students, relatively easy to model with a zero-inflated Poisson model, and the resulting residuals seemed reasonable. However, I was then instructed to change the count of students to a rate, calculated as students/population (each school has its own population). This is no longer a count variable but a proportion between 0 and 1. This rate (students/population) is no longer Poisson, but is certainly not normal either. So, I'm a bit lost as to the appropriate distribution to represent it. Any thoughts? -- Noah Silverman, M.S. UCLA Department of Statistics 8117 Math Sciences Building Los Angeles, CA 90095 On Apr 16, 2013, at 12:44 PM, Thomas Lumley tlum...@uw.edu wrote: On Wed, Apr 17, 2013 at 5:19 AM, Noah Silverman noahsilver...@ucla.edu wrote: Hi, I have some data, that when plotted looks very close to a log-normal distribution. My goal is to build a regression model to test how this variable responds to several independent variables. [snip] When I try to build a simple model, I also get an error: l <- glm(y ~ x, family = gaussian(link = "log")) Error in eval(expr, envir, enclos) : cannot find valid starting values: please specify some Duncan has described the problems with the lognormal. I will just point out that this 'simple model' is not lognormal. It is a model with normal errors and log link, ie. y ~ N(mu, sigma^2), log(mu) = x beta -thomas -- Thomas Lumley Professor of Biostatistics University of Auckland [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R process slow down after a amount of time
On 16/04/13 15:52, Chris82 wrote: Hi R users, I have noticed that R is getting slower if a process with a loop runs for a while. Is that normal? Let's say I have code which produces an output file after each loop run. Now, after 10, 15 or 20 loop runs, the time between the created files is strongly increasing. Is there maybe some data that fills up memory? Chris

I tried your idea but I don't find any time difference. Could you be more explicit? Marc

# loop time ##
tm <- rep(Sys.time(), 1000)
k <- 1
for (i in 1:1e7) {
  if (i %% 10000 == 0) {
    tm[k] <- Sys.time()
    k <- k + 1
  }
}
plot(1:999, diff(tm), bty="n", type="l", ylim=c(0, 0.05))

-- __ Marc Girondot, Pr Laboratoire Ecologie, Systématique et Evolution Equipe de Conservation des Populations et des Communautés CNRS, AgroParisTech et Université Paris-Sud 11, UMR 8079 Bâtiment 362 91405 Orsay Cedex, France Tel: 33 1 (0)1.69.15.72.30 Fax: 33 1 (0)1.69.15.73.53 e-mail: marc.giron...@u-psud.fr Web: http://www.ese.u-psud.fr/epc/conservation/Marc.html Skype: girondot __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
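One way to test Chris's suspicion that accumulating data is the culprit (a sketch, not from the thread): log the memory usage reported by gc() at intervals during the loop and look for an upward trend alongside the slowdown.

```r
# Record total MB in use (across both of R's heaps) on each pass;
# a steadily climbing curve suggests objects are accumulating.
mem <- numeric(50)
for (i in seq_along(mem)) {
  # ... one unit of the real per-iteration work would go here ...
  mem[i] <- sum(gc()[, 2])   # column 2 of gc()'s summary matrix is "(Mb)" used
}
plot(mem, type = "l", xlab = "iteration", ylab = "MB in use")
```

If the curve climbs, the usual suspects are growing a vector or data frame inside the loop (cure: preallocate, as Marc's tm example does) or keeping file connections/objects alive across iterations.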
Re: [R] Strange error with log-normal models
Noah, You might want to look at beta regression, using the betareg package on CRAN. There is a JSS paper here that you might find helpful: http://www.jstatsoft.org/v34/i02/paper along with the vignettes for the package: http://cran.r-project.org/web/packages/betareg/vignettes/betareg.pdf http://cran.r-project.org/web/packages/betareg/vignettes/betareg-ext.pdf Regards, Marc Schwartz On Apr 16, 2013, at 3:20 PM, Noah Silverman noahsilver...@ucla.edu wrote: @Duncan, You make a very good point. Somehow I overlooked that 0 is not positive. I guess that rules out the log normal model. My challenge here is finding the right model for this data. Originally it was a nice count of students. Relatively easy to model with a zero inflated Poisson model. The resulting residuals seemed reasonable. However, I was then instructed to change the count of students to a rate which was calculated as students / population (Each school has its own population.)) This is now no longer a count variable, but a proportion between 0 and 1. This rate (students/population) is no longer Poisson, but is certainly not normal either. So, I'm a bit lost as to the appropriate distribution to represent it. Any thoughts? -- Noah Silverman, M.S. UCLA Department of Statistics 8117 Math Sciences Building Los Angeles, CA 90095 On Apr 16, 2013, at 12:44 PM, Thomas Lumley tlum...@uw.edu wrote: On Wed, Apr 17, 2013 at 5:19 AM, Noah Silverman noahsilver...@ucla.edu wrote: Hi, I have some data, that when plotted looks very close to a log-normal distribution. My goal is to build a regression model to test how this variable responds to several independent variables. [snip] When I try to build a simple model, I also get an error: l - glm(y~ x, family=gaussian(link=log)) Error in eval(expr, envir, enclos) : cannot find valid starting values: please specify some Duncan has described the problems with the lognormal. I will just point out that this 'simple model' is not lognormal. 
It is a model with normal errors and a log link, i.e.

    y ~ N(mu, sigma^2),  log(mu) = x beta

-thomas

-- Thomas Lumley Professor of Biostatistics University of Auckland

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
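A minimal sketch of the beta-regression route Marc suggests, on simulated stand-in data (the `betareg` package is assumed installed; `rate`, `x`, and the coefficients here are invented for illustration). Note that betareg() needs the response strictly inside (0,1), so exact zeros — likely in the students/population rates — must be handled first, e.g. by the (y*(n-1) + 0.5)/n shrinkage of Smithson & Verkuilen, or by a zero-inflated approach instead.

```r
# Beta regression sketch for a (0,1) rate outcome; data are simulated
# placeholders, not the poster's school data.
library(betareg)

set.seed(1)
n  <- 200
x  <- rnorm(n)
mu <- plogis(-1 + 0.5 * x)              # mean on the (0,1) scale
rate <- rbeta(n, mu * 20, (1 - mu) * 20)  # precision phi = 20

fit <- betareg(rate ~ x)   # logit link for the mean by default
summary(fit)
```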
Re: [R] avoid losing data.frame attributes on cbind()
Hi, Another method would be:

Xc <- Xa
Xc$var1 <- NA; Xc$var2 <- NA
Xc[] <- append(as.list(Xa), as.list(Xb))
str(Xc)
#'data.frame': 150 obs. of 7 variables:
# $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
# $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
# $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
# $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
# $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
# $ var1 : num 5 5 5 5 5 5 5 5 4 5 ...
# $ var2 : num 4 3 3 3 4 4 3 3 3 3 ...
# - attr(*, "label")= chr "Some df label"

A.K.

----- Original Message -----
From: arun smartpink...@yahoo.com
To: Liviu Andronic landronim...@gmail.com
Cc: R help r-help@r-project.org
Sent: Tuesday, April 16, 2013 2:40 PM
Subject: Re: [R] avoid losing data.frame attributes on cbind()

HI, Not sure if this helps:

library(plyr)
res <- mutate(Xa, var1 = round(Sepal.Length), var2 = round(Sepal.Width))
str(res)
#'data.frame': 150 obs. of 7 variables:
# $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
# $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
# $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
# $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
# $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
# $ var1 : num 5 5 5 5 5 5 5 5 4 5 ...
# $ var2 : num 4 3 3 3 4 4 3 3 3 3 ...
#- attr(*, "label")= chr "Some df label"

A.K.

----- Original Message -----
From: Liviu Andronic landronim...@gmail.com
To: r-help r-h...@stat.math.ethz.ch
Sent: Tuesday, April 16, 2013 2:24 PM
Subject: [R] avoid losing data.frame attributes on cbind()

Dear all, How should I add several variables to a data frame without losing the attributes of the df? Consider the following:

require(Hmisc)
Xa <- iris
label(Xa, self=T) <- "Some df label"
str(Xa)
'data.frame': 150 obs. of 5 variables:
 $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
 - attr(*, "label")= chr "Some df label"

Xb <- round(iris[, 1:2])
names(Xb) <- c("var1", "var2")
Xc <- cbind(Xa, Xb)  # the attribute is now gone
str(Xc)
'data.frame': 150 obs. of 7 variables:
 $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ var1 : num 5 5 5 5 5 5 5 5 4 5 ...
 $ var2 : num 4 3 3 3 4 4 3 3 3 3 ...

In such cases, when I want to plug some variables from the 2nd df into the 1st df, how should I proceed without losing the attributes of the 1st data frame? And, if possible, I'm looking for something nicer than:

for(i in names(Xb)) Xa[, i] <- Xb[, i]

Regards, Liviu

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
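A base-R sketch of yet another option, without Hmisc or plyr: since cbind() rebuilds the data frame and drops its data-frame-level attributes, one can simply copy the attribute back afterwards (the `"label"` attribute here is set with plain attr() rather than Hmisc's label(), to keep the example dependency-free).

```r
# cbind() drops the data frame's own attributes; copy them back afterwards.
Xa <- iris
attr(Xa, "label") <- "Some df label"   # stand-in for Hmisc's label()

Xb <- round(iris[, 1:2])
names(Xb) <- c("var1", "var2")

Xc <- cbind(Xa, Xb)
attr(Xc, "label") <- attr(Xa, "label") # restore what cbind() dropped

attr(Xc, "label")
# [1] "Some df label"
```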
Re: [R] avoid losing data.frame attributes on cbind()
Just to add:

Xc[] <- append(Xa, Xb)  # should also work
str(Xc)
#'data.frame': 150 obs. of 7 variables:
# $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
# $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
# $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
# $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
# $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
# $ var1 : num 5 5 5 5 5 5 5 5 4 5 ...
# $ var2 : num 4 3 3 3 4 4 3 3 3 3 ...
# - attr(*, "label")= chr "Some df label"

A.K.

----- Original Message -----
From: arun smartpink...@yahoo.com
To: Liviu Andronic landronim...@gmail.com
Cc: R help r-help@r-project.org
Sent: Tuesday, April 16, 2013 4:45 PM
Subject: Re: [R] avoid losing data.frame attributes on cbind()

[snip]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Strange error with log-normal models
On Apr 16, 2013, at 22:20 , Noah Silverman wrote:

[snip] This rate (students/population) is no longer Poisson, but is certainly not normal either. So, I'm a bit lost as to the appropriate distribution to represent it. Any thoughts?

Off the cuff: Could it be more natural to model as a ZIP with log(pop) as an offset on the log-lambda scale?

-- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School, Solbjerg Plads 3, 2000 Frederiksberg, Denmark. Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
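A sketch of Peter's offset suggestion, assuming the `pscl` package for the zero-inflated Poisson fit: keep the response as a count and put log(population) in as an offset, so the count part of the model is effectively for the rate. The data frame `schooldata` and predictors `x1`, `x2` are hypothetical placeholders, not from the thread.

```r
# ZIP with log(population) as an offset on the log-lambda scale
# ('schooldata', x1, x2 are invented names for illustration).
library(pscl)

fit <- zeroinfl(students ~ x1 + x2 + offset(log(population)) | 1,
                data = schooldata, dist = "poisson")
summary(fit)
```

The part after `|` specifies the zero-inflation model; `| 1` keeps it to an intercept only.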
Re: [R] testInstalledBasic / testInstalledPackages
Hi Marc, Thank you for the links to all the resources, I will be sure to review them in detail. As for running

Sys.setenv(LC_COLLATE = "C", LANGUAGE = "en")

I'm sorry that I forgot to mention that I did set the above environment variables as specified, both within R, as suggested in your email, and also by adding them as system environment variables (as required on our Windows 2008 Server environment).

Thanks again, Trina Patel

On Tue, Apr 16, 2013 at 10:52 AM, Marc Schwartz marc_schwa...@me.com wrote:

On Apr 16, 2013, at 11:44 AM, Trina Patel trinarpa...@gmail.com wrote:

Hi, I installed R 3.0.0 on a Windows 2008 Server. When I submitted the following code in R64,

library(tools)
testInstalledBasic(scope = "devel")

I get the following message in the R Console:

running tests of consistency of as/is.*
  creating ‘isas-tests.R’
  running code in ‘isas-tests.R’
  comparing ‘isas-tests.Rout’ to ‘isas-tests.Rout.save’ ...2550a2551
running tests of random deviate generation -- fails occasionally
  running code in ‘p-r-random-tests.R’
  comparing ‘p-r-random-tests.Rout’ to ‘p-r-random-tests.Rout.save’ ... OK
running tests of primitives
  running code in ‘primitives.R’
running regexp regression tests
  running code in ‘utf8-regex.R’
running tests to possibly trigger segfaults
  creating ‘no-segfault.R’
  running code in ‘no-segfault.R’
Warning message:
running command 'diff -bw C:\Users\TRINA_~1\AppData\Local\Temp\Rtmp2FwZXW\Rdiffa1a88562f12b C:\Users\TRINA_~1\AppData\Local\Temp\Rtmp2FwZXW\Rdiffb1a8848c57620' had status 1

When I compare the isas-tests.Rout to isas-tests.Rout.save, as well as the two diff files listed above, it seems that there is one extra empty line in isas-tests.Rout.save. Is there any way to fix this error without modifying the isas-tests.Rout.save file?

Next I submitted the following code,

testInstalledPackages(scope = "base")

and got the message below in my R console:

Testing examples for package ‘base’
Testing examples for package ‘tools’
  comparing ‘tools-Ex.Rout’ to ‘tools-Ex.Rout.save’ ... 621c621
[1] 0cce1e42ef3fb133940946534fcf8896
---
[1] eb723b61539feef013de476e68b5c50a

When comparing the files tools-Ex.Rout and tools-Ex.Rout.save, it seems this difference indicates an error in the md5sums for the file C:\Program Files\R\R-3.0.0\COPYING. Does this indicate a problem with my installation? Looking at the file C:\Program Files\R\R-3.0.0\MD5 leads me to suspect there might be an error in the test itself. Thanks for the help!

See: http://cran.r-project.org/doc/manuals/r-release/R-admin.html#Testing-a-Windows-Installation from the R Installation and Administration Manual.

Try running:

Sys.setenv(LC_COLLATE = "C", LANGUAGE = "en")

before you run the tests. You might also want to have a look at: https://github.com/marcschwartz/R-IQ-OQ

Regards, Marc Schwartz

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
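The checksum comparison in question is the kind done by tools::md5sum(); a quick self-check that it behaves as expected, using a temporary file whose contents give a well-known MD5:

```r
# tools::md5sum() computes the same kind of checksums the installation
# tests compare; verify it on a file with known contents.
f <- tempfile()
writeLines("test", f)     # file contents are "test\n"
unname(tools::md5sum(f))
# [1] "d8e8fca2dc0f896fd7cb4cb0031ba249"
```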
Re: [R] the joy of spreadsheets (off-topic)
On 04/17/2013 03:25 AM, Sarah Goslee wrote: ... Ouch. (Note: I know nothing about the site, the author of the article, or the study in question. I was pointed to it by someone else. But if true: highly problematic.) Sarah

There seem to be three major problems described here, and only one is marginally related to Excel (and similar spreadsheets). Cherry-picking data is all too common. Almost anyone who reviews papers for publication will have encountered it, and there are excellent books describing examples that have had great influence on public policy. Similarly, applying obscure and sometimes inappropriate statistical methods that produce the desired results when nothing else will appears with depressing frequency.

The final point does relate to Excel and any application that hides what is going on from the casual observer. I will treasure this URL to give to anyone who chastises my moaning when I have to perform some task in Excel. It is not an error in the application (although these certainly exist) but a salutary caution to those who think that if a reasonable-looking number appears in a cell, it must be the correct answer. I have found not one but two such errors in the simple calculation of an age from the date of birth and date of death.

Jim

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Understanding why a GAM can't suppress an intercept
Dear List, I've just tried to specify a GAM without an intercept -- I've got one of the (rare) cases where it is appropriate for E(y) -> 0 as X -> 0. Naively running a GAM with -1 appended to the formula and then calling predict.gam, I see that the model isn't behaving as expected. I don't understand why this would be. Google turns up this old R-help thread: http://r.789695.n4.nabble.com/GAM-without-intercept-td4645786.html

Simon writes:

Smooth terms are constrained to sum to zero over the covariate values. This is an identifiability constraint designed to avoid confounding with the intercept (particularly important if you have more than one smooth). If you remove the intercept from your model altogether (m2) then the smooth will still sum to zero over the covariate values, which in your case will mean that the smooth is quite a long way from the data. When you include the intercept (m1) then the intercept is effectively shifting the constrained curve up towards the data, and you get a nice fit.

Why? I haven't read Simon's book in great detail, though I have read Ruppert et al.'s Semiparametric Regression. I don't see a reason why a penalized spline model shouldn't equal the intercept (or zero) when all of the regressors equal zero. Is anyone able to help with a bit of intuition? Or relevant passages from a good description of why this would be the case? Furthermore, why does the -1 formula specification work if it doesn't work as intended by, for example, lm?

Many thanks, Andrew

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
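A numerical look at the constraint Simon describes (mgcv is a recommended package that ships with R; the data here are simulated). After mgcv's identifiability reparameterisation, each smooth's model-matrix columns are centred over the observed covariate values, so the fitted smooth sums to zero over the data no matter what the coefficients are — which is why dropping the intercept cannot pull the curve toward the data:

```r
# The smooth's model-matrix columns sum to (numerically) zero over the data.
library(mgcv)

set.seed(1)
d <- data.frame(x = runif(100))
d$y <- d$x^2 + rnorm(100, sd = 0.1)

m <- gam(y ~ s(x), data = d)
X <- model.matrix(m)             # column 1 is the intercept
max(abs(colSums(X[, -1])))       # effectively zero for the smooth columns
```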
Re: [R] I don't understand the 'order' function
William Dunlap wdunlap at tibco.com writes: I think Duncan said that order and rank were inverses (if there are no ties). order() has period 2 so order(order(x)) is also rank(x) if there are no ties. E.g., Thanks William! This is very interesting. So, applying order two times I can have a rank index for each element. Thanks, -Sergio. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
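A short worked example of Bill's point, with no ties: order(x) gives the positions that sort x, and applying order() to that permutation inverts it, which is exactly the rank vector.

```r
x <- c(10, 3, 7)
order(x)         # 2 3 1  (positions that sort x)
order(order(x))  # 3 1 2  (inverse permutation = the ranks)
rank(x)          # 3 1 2
```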
Re: [R] Understanding why a GAM can't have an intercept
please delete this thread -- wrong title

On 04/16/2013 02:35 PM, Andrew Crane-Droesch wrote: [snip]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Singular design matrix in rq
Have you looked at the result of bs(raw_data[,i], df=15)? If there are not many unique values in the input there will be a lot of NaN's in the output (because there are repeated knots) and those NaN's will cause rq() to give that message. E.g.,

d <- data.frame(y=sin(1:100), x4=rep(1:4,each=25), x50=rep(1:50,each=2))
rq(data=d, y ~ bs(x4, df=15), tau=.8)  # using x50 works
Error in rq.fit.br(x, y, tau = tau, ...) : Singular design matrix
with(d, bs(x4, df=15))
       1 2 3 4 5 6 7 8 9 10 11  12  13  14  15
  [1,] 0 0 1 0 0 0 0 0 0  0  0   0   0   0   0
  [2,] 0 0 1 0 0 0 0 0 0  0  0   0   0   0   0
  [3,] 0 0 1 0 0 0 0 0 0  0  0   0   0   0   0
...
 [98,] 0 0 0 0 0 0 0 0 0  0  0 NaN NaN NaN NaN
 [99,] 0 0 0 0 0 0 0 0 0  0  0 NaN NaN NaN NaN
[100,] 0 0 0 0 0 0 0 0 0  0  0 NaN NaN NaN NaN
attr(,"degree")
[1] 3
attr(,"knots")
7.692308% 15.38462% 23.07692% 30.76923% 38.46154%
        1         1         1         2         2
46.15385% 53.84615% 61.53846% 69.23077% 76.92308%
        2         3         3         3         4
84.61538% 92.30769%
        4         4
attr(,"Boundary.knots")
[1] 1 4
attr(,"intercept")
[1] FALSE
attr(,"class")
[1] "bs" "basis" "matrix"

Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Jonathan Greenberg
Sent: Tuesday, April 16, 2013 12:58 PM
To: r-help; Roger Koenker
Subject: [R] Singular design matrix in rq

Quantreggers: I'm trying to run rq() on a dataset I posted at: https://docs.google.com/file/d/0B8Kij67bij_ASUpfcmJ4LTFEUUk/edit?usp=sharing (it's a 1500kb csv file named singular.csv) and am getting the following error:

mydata <- read.csv("singular.csv")
fit_spl <- rq(raw_data[,1] ~ bs(raw_data[,i], df=15), tau=1)
Error in rq.fit.br(x, y, tau = tau, ...) : Singular design matrix

Any ideas what might be causing this or, more importantly, suggestions for how to solve this? I'm just trying to fit a smoothed hull to the top of the data cloud (hence the large df). Thanks!

--jonathan

-- Jonathan A. Greenberg, PhD Assistant Professor Global Environmental Analysis and Remote Sensing (GEARS) Laboratory Department of Geography and Geographic Information Science University of Illinois at Urbana-Champaign 607 South Mathews Avenue, MC 150 Urbana, IL 61801 Phone: 217-300-1924 http://www.geog.illinois.edu/~jgrn/ AIM: jgrn307, MSN: jgrn...@hotmail.com, Gchat: jgrn307, Skype: jgrn3007

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
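A quick pre-check along the lines Bill describes, before asking bs() for a large df: the number of unique covariate values bounds how many knots make sense, and a df within that range gives a clean basis (splines ships with base R).

```r
# Few unique x values cannot support many quantile-based knots.
library(splines)

x4 <- rep(1:4, each = 25)
length(unique(x4))   # only 4 unique values -- far fewer than df = 15 needs

B <- bs(x4, df = 3)  # a df the data can support gives a clean basis
any(is.nan(B))
# [1] FALSE
```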
[R] plot 2 y axis
Hi, I want to plot two variables on the same graph but with two y axes, just like what you can do in Excel. I searched online and it seems you cannot achieve that in ggplot. So is there any way I can do it nicely in base plot? Suppose my data looks like this:

Weight Height Date
0.1    0.3    1
0.2    0.4    2
0.3    0.8    3
0.6    1      4

I want to have Date as the x axis, Weight as the left y axis and Height as the right y axis. Thanks.

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Problem with handling of attributes in xmlToList in XML package
I apologize for the multiple posting then, it's just that I received those emails saying that my post was awaiting approval and more than four days went by without news. Sorry for the lack of patience.

Thank you very much, Ben. Indeed that's how I've been doing it so far, but I have accrued too many reasons not to work with the XML object any more and move all my coding to a list formulation. I wonder what you mean with "[...] but I find what you try below works if you specify useInternalNodes = TRUE in your invocation of xmlTreeParse". Actually, the output error that I included happens when I use useInternalNodes=TRUE (my bad). If I use useInternalNodes=FALSE I get

mylist[["ports"]][[2]][["service"]]$.attrs["name"]
NULL

The useInternalNodes clause has proven fatally dangerous for me before. If I parse a tree with useInternalNodes=TRUE, save the workspace, close R and reopen it, load the workspace and try to read the tree, it will completely crash my computer, which has already cost me too many lost days of work. On the other hand, useInternalNodes=FALSE will result in any xml operation being ridiculously slow. So the intention was to move everything to a more R-friendly object like a list. Any tips?

Best, Santiago

2013/4/16 Ben Tupper btup...@bigelow.org:

Hi, On Apr 16, 2013, at 2:49 PM, santiago gil wrote: [snip]

I have not used xmlToList(), but I find what you try below works if you specify useInternalNodes = TRUE in your invocation of xmlTreeParse. Often that is the solution for many issues with xml. Also, I have found it best to write a relatively generic getter style function. So, in the example below I have written a function called getPortAttr - it will get attributes for the child node you name. I used your example as the defaults: "service" is the child to query and "name" is the attribute to retrieve from that child. It's a heck of a lot easier to write a function than building the longish parse strings with lots of [[this]][[and]][[that]] stuff, and it is reusable to boot.

Cheers, Ben

library(XML)
mydoc <- '<host starttime="1365204834" endtime="1365205860">
  <status state="up" reason="echo-reply" reason_ttl="127"/>
  <address addr="XXX.XXX.XXX.XXX" addrtype="ipv4"/>
  <ports>
    <port protocol="tcp" portid="135">
      <state state="open" reason="syn-ack" reason_ttl="127"/>
      <service name="msrpc" product="Microsoft Windows RPC" ostype="Windows" method="probed" conf="10">
        <cpe>cpe:/o:microsoft:windows</cpe>
      </service>
    </port>
    <port protocol="tcp" portid="139">
      <state state="open" reason="syn-ack" reason_ttl="127"/>
      <service name="netbios-ssn" method="probed" conf="10"/>
    </port>
  </ports>
  <times srtt="647" rttvar="71" to="10"/>
</host>'

mytree <- xmlTreeParse(mydoc, useInternalNodes = TRUE)
myroot <- xmlRoot(mytree)
myports <- myroot[["ports"]]["port"]

getPortAttr <- function(x, child = "service", attr = "name") {
  kid <- x[[child]]
  att <- xmlAttrs(kid)[[attr]]
  att
}

portNames <- sapply(myports, getPortAttr)
# portNames
#        port          port
#     "msrpc" "netbios-ssn"
portReason <- sapply(myports, getPortAttr, child = "state", attr = "reason")
# portReason
#      port      port
# "syn-ack" "syn-ack"

Say I have a document (produced by nmap) like the mydoc above. I want to store this as a list of lists, so I do:

mytree <- xmlTreeParse(mydoc)
myroot <- xmlRoot(mytree)
mylist <- xmlToList(myroot)

Now my problem is that when I want to fetch the attributes of the services running on each port, the behavior is not consistent:

mylist[["ports"]][[1]][["service"]]$.attrs["name"]
   name
"msrpc"
mylist[["ports"]][[2]][["service"]]$.attrs["name"]
Error in trash_list[["ports"]][[2]][["service"]]$.attrs :
  $ operator is invalid for atomic vectors

I understand that the way they are defined in the document is not the same, but I think there still should be a consistent behavior. I've tried many combinations of parameters for xmlTreeParse() but nothing has helped me. I can't find a way to call up the name of the service consistently regardless of whether the node has children or not. Any tips?

All the best, S.G.

-- --- http://barabasilab.neu.edu/people/gil/ --
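One more option worth noting alongside Ben's getter function (assumes the XML package, with `mydoc` as defined in the thread): once the document is parsed to internal nodes, XPath gives uniform access to attributes whether or not a node has children, which sidesteps the `$.attrs` inconsistency entirely.

```r
# XPath-based attribute lookup: same call works for nodes with and
# without children.
library(XML)

doc <- xmlParse(mydoc)  # parses to internal nodes
xpathSApply(doc, "//service", xmlGetAttr, "name")
# [1] "msrpc"       "netbios-ssn"
```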
Re: [R] plot 2 y axis
On 04/17/2013 08:35 AM, Ye Lin wrote:

Hi, I want to plot two variables on the same graph but with two y axes, just like what you can do in Excel. I searched online and it seems you cannot achieve that in ggplot. So is there any way I can do it nicely in base plot? Suppose my data looks like this:

Weight Height Date
0.1    0.3    1
0.2    0.4    2
0.3    0.8    3
0.6    1      4

I want to have Date as the x axis, Weight as the left y axis and Height as the right y axis.

Hi Ye Lin, Try this (yldat is your data above as a data frame):

library(plotrix)
twoord.plot(yldat$Date, yldat$Height, yldat$Weight,
            lylim = c(0, 1.04), rylim = c(0, 0.61),
            xtickpos = 1:4, xticklab = 1:4)

That should get you started.

Jim

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
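For completeness, the same plot can be done in base graphics with no packages, by overlaying a second plot with par(new = TRUE) and drawing the second scale with axis(4):

```r
# Two y axes in base graphics: overlay a second plot, right-hand axis.
d <- data.frame(Date   = 1:4,
                Weight = c(0.1, 0.2, 0.3, 0.6),
                Height = c(0.3, 0.4, 0.8, 1.0))

par(mar = c(5, 4, 4, 4) + 0.1)    # leave room for the right-hand axis
plot(d$Date, d$Weight, type = "b", xlab = "Date", ylab = "Weight")
par(new = TRUE)                   # overlay the second series
plot(d$Date, d$Height, type = "b", lty = 2,
     axes = FALSE, xlab = "", ylab = "")
axis(4)                           # right-hand axis for Height
mtext("Height", side = 4, line = 3)
```

The usual caveat applies: two-axis plots are easy to misread, since the relative scaling of the axes is arbitrary.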
Re: [R] Strange error with log-normal models
peter dalgaard pdalgd at gmail.com writes: On Apr 16, 2013, at 22:20 , Noah Silverman wrote: My challenge here is finding the right model for this data. Originally it was a nice count of students. Relatively easy to model with a zero inflated Poisson model. The resulting residuals seemed reasonable. [snip] Off the cuff: Could it be more natural to model as a ZIP with log(pop) as an offset on the log-lambda scale? I agree. This was cross-posted to StackOverflow (broken URL: http://stackoverflow.com/questions/16046726/ regression-for-a-rate-variable-in-r ), where I made that suggestion. I don't know that cross-posting to r-help lists and StackOverflow is anywhere expressly forbidden (cross-posting *among* r lists is ruled out in the Posting Guide), but I'd prefer people didn't (because of this kind of wasted/duplicated effort). __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Q-Q Plot for comparing two unequal data sets
Hello All, Would anyone be able to help me understand how R computes a quantile-quantile plot for comparing two data samples with unequal sample sizes? Normally, the procedure should be to rearrange the larger data sample into n equally-spaced parts using interpolation, where n is the sample size of the smaller sample, and then plot the matching data pairs. I tried using different plotting position formulas for the interpolation but cannot reproduce what R is plotting. Thanks in advance. Regards Janh
[R] failed to download vegan
Hello, This is Elaine. I am using R 3.0 to download package vegan but failed. The warning message is: package vegan successfully unpacked and MD5 sums checked Warning: unable to move temporary installation C:\Users\elaine\Documents\R\win-library\3.0\file16c82da53b1b\vegan to C:\Users\elaine\Documents\R\win-library\3.0\vegan I cannot find the folder \file16c82da53b1b\ below C:\Users\elaine\Documents\R\win-library\3.0 Please kindly help how to download or move the vegan to the folder it should be in. Thank you very much Elaine
Re: [R] Q-Q Plot for comparing two unequal data sets
On Apr 16, 2013, at 20:12, Janh Anni annij...@gmail.com wrote: Hello All, Would anyone be able to help me understand how R computes a quantile-quantile plot for comparing two data samples with unequal sample sizes? Normally, the procedure should be to rearrange the larger data sample into n equally-spaced parts using interpolation, where n is the sample size of the smaller sample, and then plot the matching data pairs. I tried using different plotting position formulas for the interpolation but cannot reproduce what R is plotting. Thanks in advance. If you type qqplot at the prompt you'll be given the code and can review it for yourself. It's also available online for your viewing pleasure. http://svn.r-project.org/R/trunk/src/library/stats/R/qqplot.R It seems the key is the approx (linear interpolation) function, but you can work out the details. Michael Regards Janh
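For reference, the core of what qqplot() does with unequal sample sizes can be sketched in a few lines (a simplification of the qqplot.R source linked above):

```r
# Simplified sketch of qqplot()'s handling of unequal sample sizes
set.seed(1)
x <- rnorm(100)  # larger sample
y <- rnorm(25)   # smaller sample
sx <- sort(x)
sy <- sort(y)
# The larger sorted sample is linearly interpolated down to the
# length of the smaller one before plotting:
sx <- approx(seq_along(sx), sx, n = length(sy))$y
plot(sx, sy)     # essentially the point set qqplot(x, y) draws
```

So the "plotting positions" are simply the indices 1..n of the larger sorted sample, interpolated onto n equally spaced points with approx().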
Re: [R] failed to download vegan
Hello All, I manually moved vegan.zip to C:\Users\elaine\Documents\R\win-library\3.0\vegan and then unzipped the file there. After that, require(vegan) worked. Elaine On Wed, Apr 17, 2013 at 8:56 AM, Elaine Kuo elaine.kuo...@gmail.com wrote: Hello, This is Elaine. I am using R 3.0 to download package vegan but failed. The warning message is: package vegan successfully unpacked and MD5 sums checked Warning: unable to move temporary installation C:\Users\elaine\Documents\R\win-library\3.0\file16c82da53b1b\vegan to C:\Users\elaine\Documents\R\win-library\3.0\vegan I cannot find the folder \file16c82da53b1b\ below C:\Users\elaine\Documents\R\win-library\3.0 Please kindly help how to download or move the vegan to the folder it should be in. Thank you very much Elaine
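For anyone hitting the same "unable to move temporary installation" warning on Windows (often caused by antivirus software locking the temporary folder), a sketch of retrying the install with an explicit library path is below; the path is Elaine's from this thread and would need adjusting:

```r
# Sketch: point install.packages() at the personal library explicitly
# (close other R sessions first; antivirus locks are a common cause)
libdir <- "C:/Users/elaine/Documents/R/win-library/3.0"
install.packages("vegan", lib = libdir)
library(vegan, lib.loc = libdir)
```

Manually unzipping the binary package, as Elaine did, works too, but a clean install.packages() run also registers the package metadata properly.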
Re: [R] Problem with handling of attributes in xmlToList in XML package
Hi, On Apr 16, 2013, at 6:39 PM, santiago gil wrote: Thank you very much, Ben. Indeed that's how I've been doing it so far, but I have accrued too many reasons not to work with the XML object any more and move all my coding to a list formulation. I wonder what you mean with [...] but I find what you try below works if you specify useInternalNodes = TRUE in your invocation of xmlTreeParse. Actually, the output error that I included happens when I use useInternalNodes=TRUE (my bad). My bad right back at you. It doesn't work here now (and didn't before, I guess). I can't explain why xmlToList splits the two nodes so differently. That's another good reason for me to shy away from it. If I use useInternalNodes=FALSE I get

mylist[["ports"]][[2]][["service"]]$.attrs["name"]
NULL

The useInternalNodes clause has proven fatally dangerous for me before. If I parse a tree with useInternalNodes=TRUE, save the workspace, close R and reopen it, load the workspace and try to read the tree, it will completely crash my computer, which has already cost me too many lost days of work. On the other hand, useInternalNodes=FALSE will result in any xml operation being ridiculously slow. So the intention was to move everything to a more R-friendly object like a list. My experience with the XML package seems to be quite different from yours regarding useInternalNodes = TRUE/FALSE. I get satisfactory and stable performance with useInternalNodes = TRUE, so your experience is very puzzling to me. I never save workspaces - heck, I'm not sure what XML does with the external pointers in that case. Can you save an address and expect to get the same address later? Instead I save the xml-formed data using saveXML, which dumps to a nicely formed text file. I guess I'm not much help! You might want to contact the maintainer of XML with a small example, such as the one you posted. He has been very responsive and helpful to me in the past.
Cheers, Ben Best, Santiago 2013/4/16 Ben Tupper btup...@bigelow.org: Hi, On Apr 16, 2013, at 2:49 PM, santiago gil wrote: 2013/4/14 santiago gil sg.c...@gmail.com: Hello all, I have a problem with the way attributes are dealt with in the function xmlToList(), and I haven't been able to figure it out for days now. I have not used xmlToList(), but I find what you try below works if you specify useInternalNodes = TRUE in your invocation of xmlTreeParse. Often that is the solution for many issues with xml. Also, I have found it best to write a relatively generic getter-style function. So, in the example below I have written a function called getPortAttr - it will get attributes for the child node you name. I used your example as the defaults: "service" is the child to query and "name" is the attribute to retrieve from that child. It's a heck of a lot easier to write a function than building the longish parse strings with lots of [[this]][[and]][[that]] stuff, and it is reusable to boot. Cheers, Ben

library(XML)
mydoc <- '<host starttime="1365204834" endtime="1365205860">
<status state="up" reason="echo-reply" reason_ttl="127"/>
<address addr="XXX.XXX.XXX.XXX" addrtype="ipv4"/>
<ports>
<port protocol="tcp" portid="135">
<state state="open" reason="syn-ack" reason_ttl="127"/>
<service name="msrpc" product="Microsoft Windows RPC" ostype="Windows" method="probed" conf="10"><cpe>cpe:/o:microsoft:windows</cpe>
</service>
</port>
<port protocol="tcp" portid="139">
<state state="open" reason="syn-ack" reason_ttl="127"/>
<service name="netbios-ssn" method="probed" conf="10"/>
</port>
</ports>
<times srtt="647" rttvar="71" to="10"/>
</host>'

mytree <- xmlTreeParse(mydoc, useInternalNodes = TRUE)
myroot <- xmlRoot(mytree)
myports <- myroot[["ports"]]["port"]
getPortAttr <- function(x, child = "service", attr = "name") {
  kid <- x[[child]]
  att <- xmlAttrs(kid)[[attr]]
  att
}
portNames <- sapply(myports, getPortAttr)
# portNames
#        port          port
#     "msrpc" "netbios-ssn"
portReason <- sapply(myports, getPortAttr, child = "state", attr = "reason")
# portReason
#      port      port
# "syn-ack" "syn-ack"

Say I have a document
(produced by nmap) like this:

mydoc <- '<host starttime="1365204834" endtime="1365205860">
<status state="up" reason="echo-reply" reason_ttl="127"/>
<address addr="XXX.XXX.XXX.XXX" addrtype="ipv4"/>
<ports>
<port protocol="tcp" portid="135">
<state state="open" reason="syn-ack" reason_ttl="127"/>
<service name="msrpc" product="Microsoft Windows RPC" ostype="Windows" method="probed" conf="10"><cpe>cpe:/o:microsoft:windows</cpe></service>
</port>
<port protocol="tcp" portid="139">
<state state="open" reason="syn-ack" reason_ttl="127"/>
<service name="netbios-ssn" method="probed" conf="10"/>
</port>
</ports>
<times srtt="647" rttvar="71" to="10"/>
</host>'

I want to store this as a list of lists, so I do:

mytree <- xmlTreeParse(mydoc)
myroot <- xmlRoot(mytree)
mylist <- xmlToList(myroot)

Now my problem is that when I want to fetch the attributes of the services running on each port, the behavior is not
Re: [R] Q-Q Plot for comparing two unequal data sets
Hello Michael, Thanks for that information. Regards Janh On Tue, Apr 16, 2013 at 9:13 PM, Michael Weylandt michael.weyla...@gmail.com wrote: On Apr 16, 2013, at 20:12, Janh Anni annij...@gmail.com wrote: Hello All, Would anyone be able to help me understand how R computes a quantile-quantile plot for comparing two data samples with unequal sample sizes? Normally, the procedure should be to rearrange the larger data sample into n equally-spaced parts using interpolation, where n is the sample size of the smaller sample, and then plot the matching data pairs. I tried using different plotting position formulas for the interpolation but cannot reproduce what R is plotting. Thanks in advance. If you type qqplot at the prompt you'll be given the code and can review it for yourself. It's also available online for your viewing pleasure. http://svn.r-project.org/R/trunk/src/library/stats/R/qqplot.R It seems the key is the approx (linear interpolation) function, but you can work out the details. Michael Regards Janh
Re: [R] R question
Hi Philippos, Try this:

dat1 <- read.csv("Validation_data_set3.csv", sep = ",", stringsAsFactors = FALSE)  # converted to csv
str(dat1)
#'data.frame': 12573 obs. of 17 variables:
# $ Removed.AGC : num 65.67 46.17 41.26 14.09 5.38 ...
# $ Removed.SST : chr "46.1658" "41.2566" "14.0931" ...
# $ Removed.Kurtosis : num NA NA NA NA 5.38 ...
# $ Removed.Skewness : num 65.67 46.17 41.26 14.09 5.38 ...
# $ Removed.QC17999 : num 65.67 46.17 41.26 14.09 5.38 ...
# $ Removed.QC16200 : num 65.67 46.17 41.26 14.09 5.38 ...
# $ Removed.SST.AGC : chr "46.1658" "41.2566" "14.0931" ...
# $ Removed.Kurtosis.Skewness : num NA NA NA NA 5.38 ...
# $ Removed.AGC.QC16200 : num 65.67 46.17 41.26 14.09 5.38 ...
# $ Removed.AGC.QC17999 : num 65.67 46.17 41.26 14.09 5.38 ...
# $ Removed.AGC.QC17999.3.stdevs : num 65.67 46.17 41.26 14.09 5.38 ...
# $ Removed.AGC.QC17999.less.than.1 : num 65.67 46.17 41.26 14.09 5.38 ...
# $ Removed.SST.AGC.QC17999 : chr "46.1658" "41.2566" "14.0931" ...
# $ Removed.SST.AGC.QC16200 : chr "46.1658" "41.2566" "14.0931" ...
# $ Removed.SST.AGC.Kurtosis.Skewness : chr ...
# $ Removed.SST.AGC.Kurtosis.Skewness.QC17999: chr ...
# $ Removed.SST.AGC.Kurtosis.Skewness.QC16200: chr ...

# Found these characters in columns that are not numeric
do.call(rbind, lapply(dat1, function(x) {x1 <- x[is.character(x)]; x1[grepl("\\#", x1)]}))
#                                           [,1]      [,2]      [,3]
# Removed.SST                               #DIV/0!   #DIV/0!   #DIV/0!
# Removed.SST.AGC                           #DIV/0!   #DIV/0!   #DIV/0!
# Removed.SST.AGC.QC17999                   #DIV/0!   #DIV/0!   #DIV/0!
# Removed.SST.AGC.QC16200                   #DIV/0!   #DIV/0!   #DIV/0!
# Removed.SST.AGC.Kurtosis.Skewness         #DIV/0!   #DIV/0!   #DIV/0!
# Removed.SST.AGC.Kurtosis.Skewness.QC17999 #DIV/0!   #DIV/0!   #DIV/0!
# Removed.SST.AGC.Kurtosis.Skewness.QC16200 #DIV/0!   #DIV/0!   #DIV/0!
#                                           [,4]
# Removed.SST                               #DIV/0!
# Removed.SST.AGC                           #DIV/0!
# Removed.SST.AGC.QC17999                   #DIV/0!
# Removed.SST.AGC.QC16200                   #DIV/0!
# Removed.SST.AGC.Kurtosis.Skewness         #DIV/0!
# Removed.SST.AGC.Kurtosis.Skewness.QC17999 #DIV/0!
# Removed.SST.AGC.Kurtosis.Skewness.QC16200 #DIV/0!
dat2 <- as.data.frame(sapply(dat1, function(x) {
  x[is.character(x)][grep("\\#", x[is.character(x)])] <- NA
  x1 <- as.numeric(x)
}))
str(dat2)
#'data.frame': 12573 obs. of 17 variables:
# $ Removed.AGC : num 65.67 46.17 41.26 14.09 5.38 ...
# $ Removed.SST : num NA 46.17 41.26 14.09 5.38 ...
# $ Removed.Kurtosis : num NA NA NA NA 5.38 ...
# $ Removed.Skewness : num 65.67 46.17 41.26 14.09 5.38 ...
# $ Removed.QC17999 : num 65.67 46.17 41.26 14.09 5.38 ...
# $ Removed.QC16200 : num 65.67 46.17 41.26 14.09 5.38 ...
# $ Removed.SST.AGC : num NA 46.17 41.26 14.09 5.38 ...
# $ Removed.Kurtosis.Skewness : num NA NA NA NA 5.38 ...
# $ Removed.AGC.QC16200 : num 65.67 46.17 41.26 14.09 5.38 ...
# $ Removed.AGC.QC17999 : num 65.67 46.17 41.26 14.09 5.38 ...
# $ Removed.AGC.QC17999.3.stdevs : num 65.67 46.17 41.26 14.09 5.38 ...
# $ Removed.AGC.QC17999.less.than.1 : num 65.67 46.17 41.26 14.09 5.38 ...
# $ Removed.SST.AGC.QC17999 : num NA 46.17 41.26 14.09 5.38 ...
# $ Removed.SST.AGC.QC16200 : num NA 46.17 41.26 14.09 5.38 ...
# $ Removed.SST.AGC.Kurtosis.Skewness : num NA NA NA NA 5.38 ...
# $ Removed.SST.AGC.Kurtosis.Skewness.QC17999: num NA NA NA NA 5.38 ...
# $ Removed.SST.AGC.Kurtosis.Skewness.QC16200: num NA NA NA NA 5.38 ...
head(dat2, 3)
#   Removed.AGC Removed.SST Removed.Kurtosis Removed.Skewness Removed.QC17999
# 1     65.6738          NA               NA          65.6738         65.6738
# 2     46.1658     46.1658               NA          46.1658         46.1658
# 3     41.2566     41.2566               NA          41.2566         41.2566
#   Removed.QC16200 Removed.SST.AGC Removed.Kurtosis.Skewness Removed.AGC.QC16200
# 1         65.6738              NA                        NA             65.6738
# 2         46.1658         46.1658                        NA             46.1658
# 3         41.2566         41.2566                        NA             41.2566
#   Removed.AGC.QC17999 Removed.AGC.QC17999.3.stdevs
# 1             65.6738                      65.6738
# 2             46.1658                      46.1658
# 3             41.2566
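An alternative to this post-hoc replacement is to tell read.csv() up front which strings mean missing. A sketch, assuming the same file as above and that "#DIV/0!" is the only offending Excel artifact:

```r
# Sketch: map Excel's "#DIV/0!" to NA at read time so the affected
# columns are parsed as numeric directly, skipping the cleanup step
dat2 <- read.csv("Validation_data_set3.csv",
                 na.strings = c("NA", "#DIV/0!"),
                 stringsAsFactors = FALSE)
str(dat2)  # the formerly chr columns should now come in as num with NAs
```

If other Excel error strings (e.g. "#N/A", "#VALUE!") are present, they would need to be added to na.strings as well.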
[R] Unsubscribe please
Verstuurd vanaf mijn iPad Bert Verleysen 00 32 (0)477 874 272 www.beverconsult.be
[R] Change the default resolution for plotting figures?
Hi, I want to save a plot in the windows device as png, and the default resolution is 72 dpi. Is it possible to increase the default resolution to, for example, 300 dpi? I have thought of using png(..., res=300), but the problem is that the figure produced this way looks different from the one shown in the windows device. One notable difference is that some ticks on the x axis are missing. Therefore I would rather produce the figure in a window device and then save it as a png. Unfortunately the device window has no option to change the resolution. Little information can be found so far. Any ideas are appreciated! Best, Jing
Re: [R] normalizePath
Maybe something is wrong with your R_LIBS (it should be R_LIBS=dir/R-3.0.0/lib64/).
Re: [R] Change the default resolution for plotting figures?
I have been using the following so far without having any problems:

dev.copy(png, "sample.png", width = 8, height = 10, units = "in", res = 500)
dev.off()

On Tue, Apr 16, 2013 at 6:32 PM, jt...@mappi.helsinki.fi wrote: Hi, I want to save a plot in the windows device as png and the default resolution is 72 dpi. Is it possible to increase the default resolution to for example 300 dpi? I have thought of using function png(..., res=300), but the problem is that the figure produced this way looks different than the one shown in the windows device. One notable difference is the missing of some ticks in the x axis. Therefore I would rather produce the figure in a window device and then save it as a png. Unfortunately in the device window there is no such option to change the resolution. Little information can be found so far. Any ideas are appreciated! Best, Jing
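On the original question of ticks disappearing at res=300: png() interprets width and height in pixels by default, so raising res shrinks the figure's physical size and R drops ticks to fit. A sketch that fixes the physical size in inches while raising only the resolution (filename is illustrative):

```r
# Sketch: keep the figure the same physical size as a typical 7x7 inch
# screen device, so tick placement should match the on-screen plot,
# and raise only the pixel density
png("myplot.png", width = 7, height = 7, units = "in", res = 300)
plot(1:10)
dev.off()
```

This writes the plot directly to file; dev.copy() as above is the route if you want to snapshot an existing screen device instead.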
Re: [R] Unsubscribe please
Hi, Do it yourself: https://stat.ethz.ch/mailman/listinfo/r-help Hint: bottom of the page (To unsubscribe from R-help). Regards, Pascal On 04/17/2013 06:33 AM, Bert Verleysen (beverconsult) wrote: Verstuurd vanaf mijn iPad Bert Verleysen 00 32 (0)477 874 272 www.beverconsult.be
Re: [R] Merge
Hi Farnoosh, You can use either ?merge() or ?join():

DataA <- read.table(text = "ID v1
1 10
2 1
3 22
4 15
5 3
6 6
7 8", sep = "", header = TRUE)

DataB <- read.table(text = "ID v2
2 yes
5 no
7 yes", sep = "", header = TRUE, stringsAsFactors = FALSE)

merge(DataA, DataB, by = "ID", all.x = TRUE)
#   ID v1   v2
# 1  1 10 <NA>
# 2  2  1  yes
# 3  3 22 <NA>
# 4  4 15 <NA>
# 5  5  3   no
# 6  6  6 <NA>
# 7  7  8  yes

library(plyr)
join(DataA, DataB, by = "ID", type = "left")
#   ID v1   v2
# 1  1 10 <NA>
# 2  2  1  yes
# 3  3 22 <NA>
# 4  4 15 <NA>
# 5  5  3   no
# 6  6  6 <NA>
# 7  7  8  yes

A.K. From: farnoosh sheikhi farnoosh...@yahoo.com To: smartpink...@yahoo.com smartpink...@yahoo.com Sent: Wednesday, April 17, 2013 12:52 AM Subject: Merge Hi Arun, I want to merge a data set with another data frame with 2 columns and keep the sample size of DataA.

DataA      DataB      DataCombine
ID v1      ID v2      ID v1 v2
1  10      2  yes     1  10 NA
2  1       5  no      2  1  yes
3  22      7  yes     3  22 NA
4  15                 4  15 NA
5  3                  5  3  no
6  6                  6  6  NA
7  8                  7  8  yes

Thanks a lot for your help and time.
[R] Transformation of a variable in a dataframe
Hi, I have a dataframe with two variables, A and B. I transform the two variables and name them C and D, and save them in a dataframe dfcd. However, I wonder why I can't call them by dfcd$C and dfcd$D? Thanks, Miao

A <- c(1,2,3)
B <- c(4,6,7)
dfab <- data.frame(A, B)
C <- dfab["A"]*2
D <- dfab["B"]*3
dfcd <- data.frame(C, D)
dfcd
  A  B
1 2 12
2 4 18
3 6 21
dfcd$C
NULL
dfcd$A
[1] 2 4 6
Re: [R] Transformation of a variable in a dataframe
Hi, Because a column name already exists for C and D:

colnames(C)
[1] "A"
colnames(D)
[1] "B"

One possibility:

A <- c(1,2,3)
B <- c(4,6,7)
dfab <- data.frame(A, B)
C <- dfab$A*2
D <- dfab$B*3
dfcd <- data.frame(C, D)
dfcd
  C  D
1 2 12
2 4 18
3 6 21
dfcd$C
[1] 2 4 6

HTH, Pascal On 04/17/2013 02:33 PM, jpm miao wrote: Hi, I have a dataframe with two variables A, B. I transform the two variables and name them as C, D and save them in a dataframe dfcd. However, I wonder why can't I call them by dfcd$C and dfcd$D? Thanks, Miao A=c(1,2,3) B=c(4,6,7) dfab<-data.frame(A,B) C=dfab["A"]*2 D=dfab["B"]*3 dfcd<-data.frame(C,D)
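A variant that keeps the single-bracket extraction from the original post but renames the columns afterwards (one more way to get the same result) might look like:

```r
A <- c(1, 2, 3)
B <- c(4, 6, 7)
dfab <- data.frame(A, B)
# Single-bracket extraction returns one-column data frames that keep
# the old names ("A", "B"), so rename the assembled frame explicitly:
dfcd <- setNames(data.frame(dfab["A"] * 2, dfab["B"] * 3), c("C", "D"))
dfcd$C  # [1] 2 4 6
```

The underlying point is that dfab["A"] is a data frame carrying its column name along, while dfab$A is a plain vector with no name, which is why Pascal's $-based version names the columns C and D as intended.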