[R] R command to open a file browser on Windows and Mac?
Folks: Is there an easy function to open a Finder window (on Mac) or a Windows Explorer window (on Windows) given an input folder? A lot of times I want to be able to see my working directory via a file browser. Is there a good R hack to do this? --j

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R command to open a file browser on Windows and Mac?
On 03/08/2015 11:19 AM, Jonathan Greenberg wrote:
> Folks: Is there an easy function to open a finder window (on mac) or windows explorer window (on windows) given an input folder? A lot of times I want to be able to see via a file browser my working directory. Is there a good R hack to do this?

On Windows, shell.exec(dir) will open Explorer at that directory. (It'll do something else if dir isn't a directory name, or has spaces in it without quotes, so you need to be a little careful.) On OS X, system2("open", dir) should do the same.

Duncan Murdoch
[R] Households per Census block
Folks, I am using the UScensus2010 package and I am trying to figure out the number of households per census block. There are a number of possible data downloads in the package, but apparently I am not smart enough to figure out which data set is appropriate and which functions to use. Any help or pointers or links would be greatly appreciated. Thanks for your time, Best, KW
Re: [R] R command to open a file browser on Windows and Mac?
Set your path with setwd("my_path") and then use file.choose(). You could have gotten this information sooner with a simple online search. Mark

R. Mark Sharp, Ph.D.
Director of Primate Records Database
Southwest National Primate Research Center
Texas Biomedical Research Institute
P.O. Box 760549, San Antonio, TX 78245-0549
Telephone: (210) 258-9476
e-mail: msh...@txbiomed.org

On Aug 3, 2015, at 10:19 AM, Jonathan Greenberg j...@illinois.edu wrote:
> Folks: Is there an easy function to open a finder window (on mac) or windows explorer window (on windows) given an input folder? A lot of times I want to be able to see via a file browser my working directory. Is there a good R hack to do this? --j
Re: [R] Using R to fit a curve to a dataset using a specific equation
Use Reply-All to keep the discussion on the list. I suggested reading about nls (not just how to do it in R) because you requested R2. It was not clear that you were aware that there are strong reasons to suspect that R2 is misleading when applied to nls results. That is why nls() does not provide it automatically. But R2 is easily computed from the model results:

GossSS <- sum((dta$Gossypol - mean(dta$Gossypol))^2)
R2 <- deviance(dta.nls)/GossSS
R2
[1] 0.6318866

As for ggplot, just add the line we created before to the points plot:

library(ggplot2)
xval <- seq(0, 10, length.out=200)
yval <- predict(dta.nls, data.frame(Damage_cm=xval))
ggplot() + geom_point(data=dta, aes(x=Damage_cm, y=Gossypol)) +
  geom_line(aes(x=xval, y=yval))

David Carlson

From: Michael Eisenring [mailto:michael.eisenr...@gmx.ch] Sent: Saturday, August 1, 2015 5:33 PM To: David L Carlson Subject: Aw: RE: [R] Using R to fit a curve to a dataset using a specific equation

Hello and thank you very much for your help! I just started to read up on non-linear least squares in The R Book. (I am totally new to the topic so I didn't even know where to look in the book.) I have three last questions: In the R Book they say how to describe a model. In my case it would be something like: 'The model y ~ y0 + a * (1 - b^x) had y0= 1303.45 (386.15 standard error), a= and b= . The model explained ??% of the total variation in y.' My question is: where do I find the percentage of total variation the model explains? It does not say that in the book. Is there something similar to an R^2 value or a p-value? My last question: is it possible to use ggplot2 for plotting the whole model? Thanks a lot. Mike

Gesendet: Samstag, 01. August 2015 um 13:49 Uhr Von: David L Carlson dcarl...@tamu.edu An: Michael Eisenring michael.eisenr...@gmx.ch, r-help@r-project.org Betreff: RE: [R] Using R to fit a curve to a dataset using a specific equation

I can get you started, but you should really read up on non-linear least squares.
Calling your data frame dta (since data is a function):

plot(Gossypol~Damage_cm, dta)
# Looking at the plot, 0 is a plausible estimate for y0;
# a+y0 is the asymptote, so estimate about 4000;
# b is between 0 and 1, so estimate .5
dta.nls <- nls(Gossypol~y0+a*(1-b^Damage_cm), dta,
               start=list(y0=0, a=4000, b=.5))
xval <- seq(0, 10, length.out=200)
lines(xval, predict(dta.nls, data.frame(Damage_cm=xval)))
profile(dta.nls, alpha= .05)

===
Number of iterations to convergence: 3
Achieved convergence tolerance: 1.750586e-06
attr(,"summary")
Formula: Gossypol ~ y0 + a * (1 - b^Damage_cm)
Parameters:
       Estimate   Std. Error t value  Pr(>|t|)
y0 1303.4529432 386.1515684 3.37550 0.0013853 **
a  2796.0464520 530.4140959 5.27144 2.5359e-06 ***
b     0.4939111   0.1809687 2.72926 0.0085950 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1394.375 on 53 degrees of freedom
Number of iterations to convergence: 3
Achieved convergence tolerance: 1.750586e-06

David Carlson
Dept of Anthropology
Texas A&M, College Station, TX 77843

From: R-help [r-help-boun...@r-project.org] on behalf of Michael Eisenring [michael.eisenr...@gmx.ch] Sent: Saturday, August 01, 2015 10:17 AM To: r-help@r-project.org Subject: [R] Using R to fit a curve to a dataset using a specific equation

Hi there, I would like to use a specific equation to fit a curve to one of my data sets (attached):

dput(data)
structure(list(Gossypol = c(1036.331811, 4171.427741, 6039.995102, 5909.068158, 4140.242559, 4854.985845, 6982.035521, 6132.876396, 948.2418407, 3618.448997, 3130.376482, 5113.942098, 1180.171957, 1500.863038, 4576.787021, 5629.979049, 3378.151945, 3589.187889, 2508.417927, 1989.576826, 5972.926124, 2867.610671, 450.7205451, 1120.955, 3470.09352, 3575.043632, 2952.931863, 349.0864019, 1013.807628, 910.8879471, 3743.331903, 3350.203452, 592.3403778, 1517.045807, 1504.491931, 3736.144027, 2818.419785, 723.885643, 1782.864308, 1414.161257, 3723.629772, 3747.076592, 2005.919344, 4198.569251,
2228.522959, 3322.115942, 4274.324792, 720.9785449, 2874.651764, 2287.228752, 5654.858696, 1247.806111, 1247.806111, 2547.326207, 2608.716056, 1079.846532), Treatment = structure(c(2L, 3L, 4L, 5L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 1L), .Label = c("C", "1c_2d", "3c_2d", "9c_2d", "1c_7d"), class = "factor"), Damage_cm = c(0.4955, 1.516, 4.409, 3.2665, 0.491, 2.3035, 3.51, 1.8115, 0, 0.4435, 1.573, 1.8595, 0, 0.142, 2.171, 4.023, 4.9835, 0, 0.6925, 1.989, 5.683, 3.547, 0, 0.756, 2.129, 9.437, 3.211, 0, 0.578, 2.966, 4.7245, 1.8185, 0, 1.0475, 1.62, 5.568, 9.7455, 0, 0.8295, 2.411, 7.272, 4.516, 0, 0.4035, 2.974, 8.043, 4.809, 0, 0.6965, 1.313, 5.681, 3.474, 0, 0.5895, 2.559, 0)), .Names =
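For readers following along without the attached data, the fitting approach in this thread can be sketched end to end on simulated values. The data frame below is invented for illustration (its parameters are merely chosen near the estimates reported above), not the poster's Gossypol data:

```r
# Simulated stand-in for the poster's data.
set.seed(7)
x <- runif(60, 0, 10)
y <- 1300 + 2800 * (1 - 0.5^x) + rnorm(60, sd = 300)
d <- data.frame(Damage_cm = x, Gossypol = y)

# Same model form and starting values as in the thread.
fit <- nls(Gossypol ~ y0 + a * (1 - b^Damage_cm), data = d,
           start = list(y0 = 0, a = 4000, b = 0.5))
coef(fit)   # estimates should land near the simulated y0, a, b

# One common (if debatable for nls, as the thread discusses) pseudo-R^2:
# 1 minus residual SS over total SS.
pseudoR2 <- 1 - deviance(fit) / sum((d$Gossypol - mean(d$Gossypol))^2)
```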
Re: [R] R command to open a file browser on Windows and Mac?
And for completeness, on Linux:

system(paste0("xdg-open ", getwd()))

There's a function in a package somewhere that hides the system dependencies of opening things with the appropriate application, and if you pass a folder/directory to it I reckon it will open it in Explorer/Finder/Nautilus/xfm/This Month's Linux File Browser as appropriate. But I can't remember the name of the function or the package. Barry

On Mon, Aug 3, 2015 at 4:19 PM, Jonathan Greenberg j...@illinois.edu wrote:
> Folks: Is there an easy function to open a finder window (on mac) or windows explorer window (on windows) given an input folder? A lot of times I want to be able to see via a file browser my working directory. Is there a good R hack to do this? --j
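Pulling the three answers in this thread together, a minimal cross-platform sketch might look like the following. The helper name open_dir is hypothetical (it is not the forgotten package function Barry mentions); the per-OS commands are the ones given above:

```r
# Hypothetical helper combining the suggestions in this thread:
# shell.exec() on Windows, "open" on OS X, "xdg-open" elsewhere.
open_dir <- function(dir = getwd()) {
  if (!dir.exists(dir)) stop("not a directory: ", dir)
  switch(Sys.info()[["sysname"]],
    Windows = shell.exec(dir),
    Darwin  = system2("open", shQuote(dir)),
    system2("xdg-open", shQuote(dir))  # assume a Linux desktop otherwise
  )
  invisible(dir)
}
```

shQuote() guards against the spaces-in-path problem Duncan notes for shell.exec().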
Re: [R] Using R to fit a curve to a dataset using a specific equation
Your question is more statistics than R and I'm not qualified to offer an opinion. You should be able to find someone locally to help you. The Cross Validated website is also a useful resource. David

From: Michael Eisenring [mailto:michael.eisenr...@gmx.ch] Sent: Monday, August 3, 2015 10:37 AM To: David L Carlson Cc: r-help Subject: Aw: RE: RE: [R] Using R to fit a curve to a dataset using a specific equation

Hi David, thank you for your help. It makes sense to me that the R2 is very misleading in a non-linear regression; the same is true for the p-values. My question then is: how can I present the results of my curve and quantify its goodness if R2 and p-values are misleading? Thanks a lot, Mike
[R-es] Less than 1000
Dear colleagues: I am trying to make some plots, and when I ask it to run the last line, the machine tells me:

geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use 'method = x' to change the smoothing method.

The line in question is:

ggplot(data = dat, aes(x=srt, y=d_t, col=Detec)) +
  geom_point(aes(shape=Detec)) +
  geom_smooth(span=0.65, aes(group=1)) +
  scale_colour_manual(values=c("black","red")) +
  ggtitle("Curva Suavizada con Intervalo de Confianza + Detectados y No Detectados")

The data are 348 observations of 6 variables. Can anyone help me by telling me what I should change so that this message stops appearing? I consulted the ggplot help in the RStudio support pages but did not find what I am looking for. Thanks very much in advance. MANOLO MÁRQUEZ P.

___ R-help-es mailing list R-help-es@r-project.org https://stat.ethz.ch/mailman/listinfo/r-help-es
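A minimal sketch of the usual fix (with simulated data standing in for dat): naming the smoothing method explicitly in geom_smooth() stops ggplot2 from printing the method="auto" message, since there is no longer an automatic choice to report:

```r
library(ggplot2)
# Simulated stand-in for the 348-row data frame in the post.
set.seed(1)
dat <- data.frame(srt = runif(348), d_t = rnorm(348),
                  Detec = factor(sample(c("Si", "No"), 348, replace = TRUE)))
p <- ggplot(dat, aes(x = srt, y = d_t, col = Detec)) +
  geom_point(aes(shape = Detec)) +
  geom_smooth(method = "loess", span = 0.65, aes(group = 1))  # explicit method
```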
Re: [R-es] help with survival analysis
Hello: What do you mean by "the BMI until the event"? As for not using a survival time, you are not the only one; this article does not use a time either: http://www.ncbi.nlm.nih.gov/pubmed/8970394 Regards.

On Sun, 2 Aug 2015 19:19:45 +0200 JM ARBONES marbo...@unizar.es wrote:

Hello everyone,

- I am studying the effect of two genotypes (~treatments) on the onset of metabolic syndrome (MetS), with longitudinal data collected at years 0, 7, 10, 15, 20 and 25.
- I have built a data frame with the following variables:
  MetS: metabolic syndrome (yes=1, no=0)
  bmi: body mass index (BMI) at conversion to MetS+. For those who remain MetS-, this variable gives the BMI at censoring (drop-out, or end of study at year 25).
  bmi0: BMI at baseline (categorical, levels = normal/overweight/obese)
  apoE4: genotype of interest (E4, no-E4)
- My hypothesis is that the genotype~MetS interaction depends on baseline BMI. Specifically, individuals who are overweight at baseline and carry the E4 genotype convert to MetS+ at lower BMI values than those with the no-E4 genotype. This would not happen in the normal and obese groups.
- I have created Surv objects, but instead of using the time to event (MetS+) I am using the BMI until the event. The plots that come out of the survival analysis would seem to confirm my hypothesis, but I do not know whether what I am doing is a statistical aberration. Nor do I know whether the Cox regression coefficients make sense when the time variable is not used. Could anyone 1) tell me whether what I am doing makes sense, and 2) how to interpret the results (Cox regression and plots)? In case anyone is willing to answer, here is a link to the data (.Rdata) and the script I used in the analysis:
https://www.dropbox.com/s/d96itird8ms42yx/dataframe.Rdata?dl=0

sapply(levels(df0$bmi0), function(x) {
  # SURVIVAL CURVE
  dfx = filter(df0, bmi0 == x)
  surv2 = Surv(dfx$bmi, dfx$MetS)
  km2 = survfit(surv2 ~ dfx$apoe4)  ## start.time=20, type='kaplan'
  plot(km2, lty = 2:1, xlim = c(20, 41), xlab = 'BMI at onset', main = x, mark.time = F)
  legend('bottomleft', c('E4', 'no-E4'), lty = 2:1)
  cox = list(coxph(surv2 ~ relevel(dfx$apoe4, ref = 'no-E4')))
})

sapply(levels(df0$bmi0), function(x) {
  # CUMULATIVE HAZARDS
  dfx = filter(df0, bmi0 == x)
  surv2 = Surv(dfx$bmi, dfx$MetS)
  km2 = survfit(surv2 ~ dfx$apoe4)
  plot(km2, lty = 2:1, xlim = c(20, 41), xlab = 'BMI at onset', main = x, mark.time = F, fun = 'cumhaz')
  legend('topleft', c('E4', 'no-E4'), lty = 2:1)
})

Many thanks and best regards, Jose Miguel

---
Jose Miguel Arbones-Mainar, PhD
Unidad de Investigación Traslacional
Instituto Aragones de Ciencias de la Salud
Hospital Universitario Miguel Servet
Pº Isabel la Católica, 1-3, 50009 Zaragoza (Spain)
Tel: +34 976 769 565  Fax: +34 976 769 566
www.adipofat.com

___ R-help-es mailing list R-help-es@r-project.org https://stat.ethz.ch/mailman/listinfo/r-help-es
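For readers without the Dropbox data, the construction under discussion can be reproduced on simulated values. This is only a sketch of the mechanics (BMI substituted for time in Surv()); it takes no position on the statistical validity question raised in the thread, and df0 here is invented:

```r
library(survival)
# Simulated stand-in for the posted data frame.
set.seed(3)
df0 <- data.frame(
  bmi   = runif(100, 20, 41),
  MetS  = rbinom(100, 1, 0.5),
  apoe4 = factor(sample(c("E4", "no-E4"), 100, replace = TRUE))
)

s  <- Surv(df0$bmi, df0$MetS)      # "time" axis is BMI, as in the post
km <- survfit(s ~ df0$apoe4)       # Kaplan-Meier style curves over BMI
cx <- coxph(s ~ relevel(df0$apoe4, ref = "no-E4"))
```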
[R] Faster text search in document database than with grep?
I have a database of text documents (letter sequences): several thousand documents with approx. 1000-2000 letters each. I need to find exact matches of short 3-15 letter sequences in those documents. Without any regexp patterns, the search for one 3-15 letter word takes on the order of 1 s, so for a database with several thousand documents it is on the order of hours. The naive approach would be to use mcmapply, but then on standard hardware I am still in the same order of magnitude, and since R is an interactive programming environment this isn't a solution I would go for. But aren't there faster algorithmic solutions? Can anyone point me to an implementation available in R? Thank you, Witold -- Witold Eryk Wolski
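One low-effort speedup worth trying before parallelism, sketched here on simulated documents: since the queries are literal words, fixed = TRUE lets grepl() bypass the regex engine entirely. (For heavy-duty exact matching of many short patterns at once, Aho-Corasick style matchers such as Biostrings::matchPDict() on Bioconductor are another avenue; the corpus below is invented.)

```r
# Simulated corpus: 2000 documents of 1500 letters each.
set.seed(42)
docs <- vapply(seq_len(2000), function(i)
  paste(sample(LETTERS, 1500, replace = TRUE), collapse = ""),
  character(1))

word <- "QXZ"
hits <- grepl(word, docs, fixed = TRUE)  # literal match, no regex overhead
sum(hits)                                # documents containing the word
```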
Re: [R] Splitting lines in R script
Hi Steven, In general, the command line must be incomplete (in your case, ending with a trailing minus sign) for the interpreter to take the next line as a continuation. Jim

On Sun, Aug 2, 2015 at 9:05 PM, Steven Yen sye...@gmail.com wrote:
> I have a line containing a summation of four components.
>
> # This works OK:
> p <- pbivnorm(bb,dd,tau) + pbivnorm(aa,cc,tau) -
>   pbivnorm(aa,dd,tau) - pbivnorm(bb,cc,tau)
>
> # This produces unintended results without warning:
> p <- pbivnorm(bb,dd,tau) + pbivnorm(aa,cc,tau)
>   - pbivnorm(aa,dd,tau) - pbivnorm(bb,cc,tau)
>
> Is there a general rule of thumb for line breaks? Thank you.
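The rule of thumb can be demonstrated with plain arithmetic: end a broken line with an operator, so the parser knows more input is coming.

```r
# Trailing operator: the expression is incomplete, so the next line
# is read as a continuation.
x <- 1 + 2 +
  3          # x is 6

# Leading operator: the first line is already complete, so "+ 3" is
# parsed as a separate (discarded) expression.
y <- 1 + 2
  + 3        # y is 3, silently
stopifnot(x == 6, y == 3)
```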
Re: [R] Natural Smoothing B-splines
More effective searching than browsing or digging manually through the documentation. Almost surely I'll find what I need! Thanks very much, mggl
[R] NCDF_arrays
Dear R users, I am working with ncdf data using the variables time (1-365), lon (longitude), lat (latitude), and a daily temperature variable. After setting the parameters for the model, I am able to calculate the output for each lon-lat grid point. The model works well with one ncdf file (TabsD29) (1). Now I want to include more than one ncdf file (2). All the ncdf files have the same variables except for the temperature variable. However, I get wrong output arrays; I think my approach to calculating the mean is wrong? Many thanks, Sibylle

nlon <- length(lon)
nlat <- length(lat)
nday <- length(TabsD29[1,1,])
Tmin <- 10.
GDDmax <- 145
DOYstart <- 1

(1)
Teffs <- pmax(TabsD29 - Tmin, array(0., dim=c(nlon, nlat, nday)))

(2)
Teff <- pmax(mean(as.numeric(TabsD80+TabsD81+TabsD82+TabsD83+TabsD84+TabsD85+TabsD86+TabsD87+TabsD88+TabsD89+TabsD90+TabsD91+TabsD92+TabsD93+TabsD94+TabsD95+TabsD96+TabsD97+TabsD98+TabsD99+TabsD20+TabsD21+TabsD22+TabsD23+TabsD24+TabsD25+TabsD26+TabsD27+TabsD28+TabsD29)) - Tmin, array(0., dim=c(nlon, nlat, nday)), na.rm=TRUE)
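One likely culprit in (2) is that mean() collapses its input to a single number rather than averaging element-wise. A hedged sketch of one way to take an element-wise mean across several arrays of identical dimensions, with simulated stand-ins for the TabsD* objects:

```r
# Toy dimensions standing in for nlon, nlat, nday.
set.seed(1)
dims <- c(4, 3, 5)
arrs <- replicate(30, array(rnorm(prod(dims)), dim = dims), simplify = FALSE)

Tmin  <- 10
Tmean <- Reduce(`+`, arrs) / length(arrs)  # element-wise mean, keeps dims
Teff  <- pmax(Tmean - Tmin, 0)             # same shape as each input array
stopifnot(all(dim(Teff) == dims))
```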
Re: [R] vectorized sub, gsub, grep, etc.
Interesting. I know of no practical use for such a function. If the first target were "abb", sub() would return "aBb", failing to replace the second 'b'. I find it hard to believe that's the desired functionality. Writing a looped regex function in Rcpp makes the most sense for speed. Using the Boost C++ regex library (http://gallery.rcpp.org/articles/boost-regular-expressions/) or a C++ wrapper for PCRE (https://gist.github.com/abicky/58ea79b01d9e394d5076) are two solutions, but pure Rcpp would be ideal to avoid external software dependencies. Cheers, Adam

On Sun, Aug 2, 2015 at 9:42 PM, John Thaden jjtha...@flash.net wrote:
> Adam, The original posting gave a function sub2 whose aim differs both from your functions' aim and from the intent of mgsub() in the qdap package: "Here is code to apply a different pattern and replacement for every target."
>
> # Example
> X    <- c("ab", "cd", "ef")
> patt <- c("b", "cd", "a")
> repl <- c("B", "CD", "A")
>
> The first pattern ("b") and the first replacement ("B") therefore apply only to the first target ("ab"), the second to the second, etc. The function achieves its aim, giving the correct answer "aB", "CD", "ef". mgsub() satisfies a different need, testing all targets for matches with any pattern in the vector of patterns and, if a match is found, replacing the matched target with the replacement value corresponding to the matched pattern. It, too, achieves its aim, giving a different (but also correct) answer "AB", "CD", "ef". Regards, -John
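For reference, the elementwise behavior John describes can be written in one line with mapply(). This is a sketch of the idea, not necessarily the original sub2 from the thread:

```r
# Apply the i-th pattern/replacement pair to the i-th target only.
sub2 <- function(pattern, replacement, x)
  mapply(sub, pattern, replacement, x, USE.NAMES = FALSE)

X    <- c("ab", "cd", "ef")
patt <- c("b", "cd", "a")
repl <- c("B", "CD", "A")
sub2(patt, repl, X)   # "aB" "CD" "ef"
```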
Re: [R] R error
On 03 Aug 2015, at 18:00, Hood, Kyle (CDC/OCOO/OCIO/ITSO) (CTR) y...@cdc.gov wrote:
> Good afternoon, I recently received a ticket from a customer to upgrade from 3.1.1 to 3.2.1. After the upgrade, when he tries to install a package he receives the error below. Could you please advise as to what is wrong? Thank you.

It's not too easy to tell given the number of ways large networked installs can be configured, but the logic is that if the R installation directory is write-protected (which is usually a good thing), the packages go into a subdirectory of the user's home dir. The output suggests that R believes this is \\cdc.gov\private\M328\ygv7, but apparently that doesn't exist, since it tries to create \\cdc.gov\private, which it can't. Apart from that, try digging around in https://cran.r-project.org/bin/windows/base/rw-FAQ.html -pd

> Kyle
> --- Please select a CRAN mirror for use in this session ---
> Warning in install.packages(NULL, .libPaths()[1L], dependencies = NA, type = type) :
>   'lib = "C:/Program Files/R/R-3.2.1/library"' is not writable
> Error in install.packages(NULL, .libPaths()[1L], dependencies = NA, type = type) :
>   unable to create '\\cdc.gov\private\M328\ygv7/R/win-library/3.2'
> In addition: Warning message:
> In dir.create(userdir, recursive = TRUE) :
>   cannot create dir '\\cdc.gov\private', reason 'Permission denied'
-- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School, Solbjerg Plads 3, 2000 Frederiksberg, Denmark. Phone: (+45) 38153501. Email: pd@cbs.dk Priv: pda...@gmail.com
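A hedged sketch of one interim workaround while the home-directory mapping is sorted out: point R at a local, writable library explicitly. The path below is illustrative, not from the original report:

```r
# Create a writable package library on a local drive and put it first
# on the search path; install.packages() then defaults to it.
libdir <- "C:/Temp/Rlib"                        # illustrative local path
dir.create(libdir, recursive = TRUE, showWarnings = FALSE)
.libPaths(c(libdir, .libPaths()))
# install.packages("somepackage")   # "somepackage" is a placeholder
```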
Re: [R] Environmental Data Connector v1.3
Hi Dan, thanks for your response. The setwd is coded somewhere in the EDC.get function. I guess I could try to alter the code, but I assume this package should work as is.
[R] scaling variables consecutively and independently
Hello everyone, I am very new to R and I'm having some trouble. I have around 110 datasets, each made up of around 100 variables. I am trying to z-score the scores in each column, but independently of each other (each column independent of the others). The problem is that there are just too many variables in each dataset to compute individually. I figured there should be some type of loop that would scale the scores in each column and then move on to the next, but I haven't been able to find anything about this. Can anyone help? Thanks so much! -RW
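A minimal sketch of the loop being asked for: scale() already z-scores each column independently, so one lapply() over the datasets covers everything. The list of small data frames below is a simulated stand-in for the ~110 real ones:

```r
# Simulated stand-in for the ~110 datasets of ~100 variables each.
set.seed(1)
datasets <- replicate(3, as.data.frame(matrix(rnorm(200), ncol = 10)),
                      simplify = FALSE)

# scale() z-scores column by column; lapply walks the datasets.
scaled <- lapply(datasets, function(d) as.data.frame(scale(d)))

# every column now has mean ~0 and sd 1, independently of the others
stopifnot(all(abs(colMeans(scaled[[1]])) < 1e-8))
```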
Re: [R] Households per Census block
Hi Anthony and Keith Weintraub, Here is a way to do what you are asking using the UScensus2010 packages:

## latest version of the package, not yet on CRAN
install.packages("UScensus2010", repos="http://R-Forge.R-project.org")
library(UScensus2010)
install.blk()
library(UScensus2010blk)

### You will want the H0010001 variable (see help(alabama.blk10))
### Other variables are also available
### You can use the new api function in UScensus2010 to get arbitrary
### variables from SF1 and acs
data(states.names)
head(states.names)
state.blk.housing <- vector("list", length(states.names))
## notice this could be greatly sped up using library(parallel)
## with mclapply
## This will be somewhat slow b/c of so much spatial data
for(i in 1:length(states.names)){
  data(list=paste(states.names[i], "blk10", sep="."))
  temp <- get(paste(states.names[i], "blk10", sep="."))
  # unique b/c more shapefiles than fips
  state.blk.housing[[i]] <- unique(temp@data[, c("fips", "H0010001")])
  print(i)
  rm(list=paste(states.names[i], "blk10", sep="."))
}

###
# alternatively, using the US Census API function in the new UScensus2010 package
###
## Get all states' fips codes
data(countyfips)
state.fips <- unique(substr(countyfips$fips, 1, 2))
head(state.fips)
length(state.fips)  ## will be 51 = 50 (states) + 1 (DC)
## You will need a census key
key <- "YOUR KEY HERE"
housing <- CensusAPI2010(c("H0010001"), state.fips=state.fips,
                         level=c("block"), key, summaryfile=c("sf1"))

Best, -- Zack
-----
Zack W. Almquist
Assistant Professor
Department of Sociology and School of Statistics
Affiliate, Minnesota Population Center
University of Minnesota

On Mon, Aug 3, 2015 at 12:43 PM, Anthony Damico ajdam...@gmail.com wrote:
> hi, ccing the package maintainer. one alternative is to pull the HU100 variable directly from the census bureau's summary files: that variable starts at position 328 and ends at 336. just modify this loop and you'll get a table with one-record-per-census-block in every state.
> https://github.com/davidbrae/swmap/blob/master/how%20to%20map%20the%20consumer%20expenditure%20survey.R#L104
>
> (1) line 134: change the very last -9 to 9
> (2) line 137: between pop100 and intptlat, add an hu100
>
> summary file docs: http://www.census.gov/prod/cen2010/doc/sf1.pdf#page=18

On Mon, Aug 3, 2015 at 11:55 AM, Keith S Weintraub kw1...@gmail.com wrote:
> Folks, I am using the UScensus2010 package and I am trying to figure out the number of households per census block. There are a number of possible data downloads in the package but apparently I am not smart enough to figure out which data-set is appropriate and what functions to use. Any help or pointers or links would be greatly appreciated. Thanks for your time, Best, KW
Re: [R] Environmental Data Connector v1.3
Hi Robert: I didn't see this until Dan sent me something offline; I apologize for the problem. Yes, the function should work as is, and a lot of the EDC is Java. I have forwarded your email to the people who did the coding. But in the meantime, can you do two things for me to help us in the debugging:

1. Can you send me the result of the command Sys.getenv("EDC_HOME")?
2. Can you send the result of ls -l on that directory?

I suggest that at this point we do this offline, as I doubt these details would be of interest to the list as a whole. If we find a solution I will post it. I will add that we don't have the resources to test on all versions of all OSes, and at times changes in R have required us to change our code; this could be one such instance. But yes, the error message suggests that the code can't write the necessary temp files to whatever directory it is trying. Thanks, -Roy

On Aug 3, 2015, at 11:36 AM, Robert in SA ri.william...@outlook.com wrote:
> Hi Dan, thanks for your response. The setwd is coded somewhere in the EDC.get function. I guess I could try alter the code but I assume this package should work as is.

** The contents of this message do not reflect any position of the U.S. Government or NOAA.
** Roy Mendelssohn Supervisory Operations Research Analyst NOAA/NMFS Environmental Research Division Southwest Fisheries Science Center ***Note new address and phone*** 110 Shaffer Road Santa Cruz, CA 95060 Phone: (831)-420-3666 Fax: (831) 420-3980 e-mail: roy.mendelss...@noaa.gov www: http://www.pfeg.noaa.gov/ Old age and treachery will overcome youth and skill. From those who have been given much, much will be expected the arc of the moral universe is long, but it bends toward justice -MLK Jr. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Environmental Data Connector v1.3
During installation EDC_HOME was set to /home/robert/EDC and the directory definitely exists.

Are you sure that EDC_HOME is set now? What do you get from the following command?

Sys.getenv("EDC_HOME")

If that is set to something other than "", what do you get from

getwd()
setwd(Sys.getenv("EDC_HOME"))
getwd()

If it was not set, do things work better if you first do

Sys.setenv(EDC_HOME = "/home/robert/EDC")

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Mon, Aug 3, 2015 at 8:12 AM, Robert in SA ri.william...@outlook.com wrote: Hello. I have successfully installed EDC v1.3 on Linux Ubuntu 14.04. I am running a 64-bit machine with R 3.2.1 via RStudio. I have tried example1 <- EDC.get(1) after loading the ncdf and EDCR libraries, from both the R terminal and RStudio, and get the following result: Error in setwd(paste(Sys.getenv("EDC_HOME"), sep = "")) : cannot change working directory. During installation EDC_HOME was set to /home/robert/EDC and the directory definitely exists. Does anyone have any suggestions? -- View this message in context: http://r.789695.n4.nabble.com/Environmental-Data-Connector-v1-3-tp4710686.html Sent from the R help mailing list archive at Nabble.com.
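The checks above can be rolled into one guarded sketch; "/home/robert/EDC" is the path reported in this thread and is only an assumption on any other machine:

```r
Sys.setenv(EDC_HOME = "/home/robert/EDC")   # what the installer should have done
edc_home <- Sys.getenv("EDC_HOME")

if (nzchar(edc_home) && dir.exists(edc_home)) {
  old <- setwd(edc_home)   # the call EDC.get() fails on; succeeds only if the dir exists
  setwd(old)               # restore the previous working directory
} else {
  message("EDC_HOME is unset or does not name an existing directory")
}
```

If the `else` branch fires even though the directory exists, the variable was set in the shell but not in the environment R inherited, which points back at the installer.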
Re: [R] vectorized sub, gsub, grep, etc.
sub() has practical uses though gsub() may have more. This function was what I needed at the time. Of course the gsub() version is also possible. Sent from Yahoo Mail on Android
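As an illustration of what a vectorized sub() can look like, here is a minimal sketch using base Vectorize(); the name vsub and the example strings are mine, not from this thread:

```r
# sub() is vectorized over x but takes a single pattern/replacement;
# Vectorize() maps it element-wise over all three arguments.
vsub <- Vectorize(sub, c("pattern", "replacement", "x"))
res <- vsub(pattern = c("a", "b"), replacement = c("A", "B"), x = c("cat", "bat"))
# unname(res) is c("cAt", "Bat"): each pattern/replacement pair
# is applied to the corresponding element of x.
```

The same wrapper works for gsub(); Vectorize() is just mapply() under the hood, so it adds convenience rather than speed.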
[R] R error
Good afternoon,

I recently received a ticket from a customer to upgrade from 3.1.1 to 3.2.1. After the upgrade, when he tries to install a package he receives the error below. Could you please advise as to what is wrong? Thank you.

Kyle

--- Please select a CRAN mirror for use in this session ---
Warning in install.packages(NULL, .libPaths()[1L], dependencies = NA, type = type) :
  'lib = "C:/Program Files/R/R-3.2.1/library"' is not writable
Error in install.packages(NULL, .libPaths()[1L], dependencies = NA, type = type) :
  unable to create '\\cdc.gov\private\M328\ygv7/R/win-library/3.2'
In addition: Warning message:
In dir.create(userdir, recursive = TRUE) :
  cannot create dir '\\cdc.gov\private', reason 'Permission denied'
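The warnings show both the system library (under Program Files) and the roaming-profile path (\\cdc.gov\...) are unwritable for this user. One sketch of a workaround, with an illustrative local path, is a personal library the user can write to:

```r
# Illustrative only: pick a local, writable folder for packages.
locallib <- file.path(tempdir(), "Rlibs")   # in practice e.g. "C:/Users/kyle/Rlibs"
dir.create(locallib, recursive = TRUE, showWarnings = FALSE)
.libPaths(c(locallib, .libPaths()))         # put it first on the search path
# install.packages("somePackage", lib = locallib)   # then install there
```

A persistent variant is to set R_LIBS_USER to a local path in the user's Renviron file; see help(Startup). Whether that is acceptable on a managed CDC machine is a question for the local IT policy, not something this sketch can settle.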
Re: [R] Environmental Data Connector v1.3
Why are you using paste()? Why not just

setwd(Sys.getenv("EDC_HOME"))

Dan

Daniel Nordlund, PhD
Research and Data Analysis Division
Services Enterprise Support Administration
Washington State Department of Social and Health Services

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Robert in SA
Sent: Monday, August 03, 2015 8:12 AM
To: r-help@r-project.org
Subject: [R] Environmental Data Connector v1.3

Hello. I have successfully installed EDC v1.3 on Linux Ubuntu 14.04. I am running a 64-bit machine with R 3.2.1 via RStudio. I have tried example1 <- EDC.get(1) after loading the ncdf and EDCR libraries, from both the R terminal and RStudio, and get the following result: Error in setwd(paste(Sys.getenv("EDC_HOME"), sep = "")) : cannot change working directory. During installation EDC_HOME was set to /home/robert/EDC and the directory definitely exists. Does anyone have any suggestions? -- View this message in context: http://r.789695.n4.nabble.com/Environmental-Data-Connector-v1-3-tp4710686.html Sent from the R help mailing list archive at Nabble.com.
[R] Matching posterior probabilities from poLCA
Hi all, I'm a newbie to R with a question about poLCA. When you run a latent class analysis in poLCA it generates a value for each respondent giving their posterior probability of 'belonging' to each latent class. These are stored as a matrix in the element 'posterior'. I would like to create a dataframe which contains each respondent's unique ID number (which is stored as a variable in the dataframe used for poLCA) and their *matched* posterior probability from the 'posterior' matrix. I would then like to write this dataframe to a csv file for use in another program. I know this is possible, but I just can't seem to get it right (blame my incompetence with R). Any help would be very warmly appreciated. Best wishes. Robert de Vries
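A sketch of the merge being asked for, with a mocked fit object since no poLCA run is available here; the assumption (worth verifying against the poLCA documentation) is that row i of the posterior matrix corresponds to row i of the dataframe passed to poLCA:

```r
# Mocked stand-in for a poLCA fit: $posterior is the respondent-by-class
# posterior probability matrix the post describes (2 respondents, 2 classes).
fit <- list(posterior = matrix(c(0.8, 0.2,
                                 0.1, 0.9),
                               nrow = 2, byrow = TRUE))
dat <- data.frame(ID = c(101, 102))   # the dataframe handed to poLCA

# Bind IDs to the matching posterior rows, then write out.
out <- data.frame(ID = dat$ID, fit$posterior)
names(out) <- c("ID", paste0("class", seq_len(ncol(fit$posterior))))
# write.csv(out, "posteriors.csv", row.names = FALSE)
```

If the data contain missing values, check how the fitted model handled them (dropped rows would break the row-for-row matching assumed above).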
[R] Environmental Data Connector v1.3
Hello. I have successfully installed EDC v1.3 on Linux Ubuntu 14.04. I am running a 64-bit machine with R 3.2.1 via RStudio. I have tried

example1 <- EDC.get(1)

after loading the ncdf and EDCR libraries, from both the R terminal and RStudio, and get the following result:

Error in setwd(paste(Sys.getenv("EDC_HOME"), sep = "")) : cannot change working directory

During installation EDC_HOME was set to /home/robert/EDC and the directory definitely exists. Does anyone have any suggestions?

-- View this message in context: http://r.789695.n4.nabble.com/Environmental-Data-Connector-v1-3-tp4710686.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] Households per Census block
hi, ccing the package maintainer. one alternative is to pull the HU100 variable directly from the census bureau's summary files: that variable starts at position 328 and ends at 336. just modify this loop and you'll get a table with one record per census block in every state.

https://github.com/davidbrae/swmap/blob/master/how%20to%20map%20the%20consumer%20expenditure%20survey.R#L104

(1) line 134: change the very last -9 to 9
(2) line 137: between pop100 and intptlat, add an hu100

summary file docs: http://www.census.gov/prod/cen2010/doc/sf1.pdf#page=18

On Mon, Aug 3, 2015 at 11:55 AM, Keith S Weintraub kw1...@gmail.com wrote: Folks, I am using the UScensus2010 package and I am trying to figure out the number of households per census block. There are a number of possible data downloads in the package, but apparently I am not smart enough to figure out which data set is appropriate and what functions to use. Any help or pointers or links would be greatly appreciated. Thanks for your time, Best, KW
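The fixed-width extraction described above can be sketched with read.fwf(); here a tiny fabricated record stands in for the downloaded summary file, with HU100 occupying columns 328-336 as stated in the post:

```r
# Fake one-line geo record: 327 filler characters, then HU100 = 42 in cols 328-336.
rec <- paste0(strrep("x", 327), "000000042")

df <- read.fwf(textConnection(rec),
               widths    = c(327, 9),            # skip field, then HU100
               col.names = c("lead", "hu100"))
hu100 <- df$hu100
```

On the real files you would add a width (and name) per field you need, following the record layout in the SF1 documentation linked above, and loop over the per-state downloads as the github script does.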
Re: [R] About nls.
Hi

Please keep conversation on list; somebody may have a better idea. Other comments in line.

-----Original Message-----
From: Jianling Fan [mailto:fanjianl...@gmail.com]
Sent: Friday, July 31, 2015 4:46 PM
To: PIKAL Petr
Subject: Re: [R] About nls.

Hello, Petr,

Thanks for your help. That works but it changes my model. And I think that's not the main problem. From my data, (den1/R1+den2+den3+den4+den5/R5) is always < 1, which makes (1/(den1/R1+den2+den3+den4+den5/R5)-1) > 0.

It is not true unless I have different data from yours.

with(dat, (den1/0.9+den2+den3+den4+den5/23))
 [1]  0.466  0.747  0.976  1.073  1.110  0.380
 [7]  0.480  0.850  0.880  1.000  0.480  0.890
[13]  0.980  0.990  1.000  0.200  0.390  0.690
[19]  0.990  1.000  6.0652174  9.9065217 13.3434783 16.5782609
[25] 18.8021739 19.8130435

with(dat, (1/(den1/.9+den2+den3+den4+den5/23)-1) > 0)
 [1]  TRUE  TRUE  TRUE FALSE FALSE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE
[13]  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE
[25] FALSE FALSE

str(dat)
'data.frame': 26 obs. of 7 variables:
 $ Depth: int 20 40 60 80 100 15 30 45 60 120 ...
 $ lnd  : num 3 3.69 4.09 4.38 4.61 ...
 $ den1 : num 0.419 0.672 0.878 0.966 0.999 ...
 $ den2 : num 0 0 0 0 0 0.38 0.48 0.85 0.88 1 ...
 $ den3 : num 0 0 0 0 0 0 0 0 0 0 ...
 $ den4 : num 0 0 0 0 0 0 0 0 0 0 ...
 $ den5 : num 0 0 0 0 0 0 0 0 0 0 ...
dput(dat)
structure(list(Depth = c(20L, 40L, 60L, 80L, 100L, 15L, 30L, 45L, 60L, 120L, 15L, 30L, 45L, 60L, 120L, 15L, 30L, 45L, 60L, 120L, 10L, 30L, 50L, 70L, 90L, 110L), lnd = c(2.995732, 3.688879, 4.094345, 4.382027, 4.60517, 2.70805, 3.401197, 3.806662, 4.094345, 4.787492, 2.70805, 3.401197, 3.806662, 4.094345, 4.787492, 2.70805, 3.401197, 3.806662, 4.094345, 4.787492, 2.302585, 3.401197, 3.912023, 4.248495, 4.49981, 4.70048), den1 = c(0.419, 0.6725, 0.878, 0.966, 0.999, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), den2 = c(0, 0, 0, 0, 0, 0.38, 0.48, 0.85, 0.88, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), den3 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.48, 0.89, 0.98, 0.99, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), den4 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.2, 0.39, 0.69, 0.99, 1, 0, 0, 0, 0, 0, 0), den5 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 139.5, 227.85, 306.9, 381.3, 432.45, 455.7)), .Names = c("Depth", "lnd", "den1", "den2", "den3", "den4", "den5"), class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24", "25", "26"))

You can check if this is the same as you have. That is why it is preferable to use dput() for sending data.

Cheers
Petr

So, the nls should work in this case. But I don't know why it does not. Thanks!

Regards,
Julian

On 31 July 2015 at 00:54, PIKAL Petr petr.pi...@precheza.cz wrote: Hi, I am not an expert, but the problem seems to me that (den1/R1+den2+den3+den4+den5/R5)-1 sometimes gives value 0 and sometimes negative. In these cases the value of log(1/result) is NA or Inf, and nls can not handle this. I did not check where nls2 comes from, so I used nls and removed the -1 from your formula, which resulted in some final values.
fit1 <- nls(lnd ~ log(1/(den1/R1+den2+den3+den4+den5/R5))/c + log(d50),
            start = c(R1=0.9, R5=23, c=-1.1, d50=10), data = test)
coef(fit)
          A           B
  6.9720965  -0.0272203
coef(fit1)
         R1          R5           c         d50
  0.9622249 416.1272498  -0.7178156  73.6017161

Cheers
Petr

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Jianling Fan
Sent: Thursday, July 30, 2015 9:51 PM
To: r-help@r-project.org
Subject: [R] About nls.

Hello,

I am trying to do an nls regression with R, but I always get an error: Error in numericDeriv(form[[3L]], names(ind), env) : Missing value or an infinity produced when evaluating the model. I googled it and found someone said it is because of improper start values. I have tried many times but cannot solve it. Can anyone help me? Thanks a lot!

My code is:

fit1 <- nls2(lnd ~ log(1/(den1/R1+den2+den3+den4+den5/R5)-1)/c + log(d50),
             start = c(R1=0.9, R5=23, c=-1.1, d50=10), data = SWrt)

data (SWrt) is:

   Depth      lnd   den1 den2 den3 den4 den5
1     20 2.995732 0.4190 0.00 0.00 0.00 0.00
2     40 3.688879 0.6725 0.00 0.00 0.00 0.00
3     60 4.094345 0.8780 0.00 0.00 0.00 0.00
4     80 4.382027 0.9660 0.00 0.00 0.00 0.00
5    100 4.605170 0.9990 0.00 0.00 0.00 0.00
6     15 2.708050 0.0000 0.38 0.00 0.00 0.00
7     30 3.401197 0.0000 0.48 0.00 0.00 0.00
8     45 3.806662 0.0000 0.85 0.00 0.00 0.00
9     60 4.094345 0.0000 0.88 0.00 0.00 0.00
10   120 4.787492 0.0000 1.00 0.00 0.00 0.00
11    15 2.708050 0.0000 0.00 0.48 0.00 0.00
12    30 3.401197 0.0000 0.00
Re: [R] Faster text search in document database than with grep?
Dear Duncan,

This is a model of the data I work with:

database <- replicate(5, paste(sample(letters, rexp(1, 1/500), rep=TRUE), collapse=""))
words <- replicate(1, paste(sample(letters, rexp(1, 1/70), rep=TRUE), collapse=""))
NumberOfWords <- 10
system.time(lapply(words[1:NumberOfWords], grep, database))
   user  system elapsed
  5.002   0.003   5.005

The model reproduces the running times I have to cope with. Using grep in this context is rather naive, and I am wondering if there are better solutions available in R.

On 3 August 2015 at 15:13, Duncan Murdoch murdoch.dun...@gmail.com wrote: On 03/08/2015 5:25 AM, Witold E Wolski wrote: I have a database of text documents (letter sequences), several thousand documents with approx. 1000-2000 letters each. I need to find exact matches of short 3-15 letter sequences in those documents. Without any regexp patterns, the search for one 3-15 letter word takes on the order of 1 s, so for a database with several thousand documents it's on the order of hours. The naive approach would be to use mcmapply, but then on standard hardware I am still in the same order, and since R is an interactive programming environment this isn't a solution I would go for. But aren't there faster algorithmic solutions? Can anyone point me to an implementation available in R?

You haven't shown us what you did, but it sounds far slower than I'd expect. I just used the code below to set up a database of 1 documents of 2000 letters each, and searching those documents for "abc" takes about 70 milliseconds:

database <- replicate(1, paste(sample(letters, 2000, rep=TRUE), collapse=""))
grep("abc", database, fixed=TRUE)

Duncan Murdoch

-- Witold Eryk Wolski
Re: [R] Faster text search in document database than with grep?
On 03/08/2015 5:25 AM, Witold E Wolski wrote: I have a database of text documents (letter sequences), several thousand documents with approx. 1000-2000 letters each. I need to find exact matches of short 3-15 letter sequences in those documents. Without any regexp patterns, the search for one 3-15 letter word takes on the order of 1 s, so for a database with several thousand documents it's on the order of hours. The naive approach would be to use mcmapply, but then on standard hardware I am still in the same order, and since R is an interactive programming environment this isn't a solution I would go for. But aren't there faster algorithmic solutions? Can anyone point me to an implementation available in R?

You haven't shown us what you did, but it sounds far slower than I'd expect. I just used the code below to set up a database of 1 documents of 2000 letters each, and searching those documents for "abc" takes about 70 milliseconds:

database <- replicate(1, paste(sample(letters, 2000, rep=TRUE), collapse=""))
grep("abc", database, fixed=TRUE)

Duncan Murdoch
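Duncan's test can be extended into a small multi-word baseline; fixed = TRUE bypasses the regex engine and is the usual first speed lever for exact-substring search. Sizes here are toy and the word list is made up:

```r
set.seed(1)
# Toy corpus: 100 "documents" of 2000 random letters each.
database <- replicate(100, paste(sample(letters, 2000, replace = TRUE), collapse = ""))
words <- c("abc", "zqx")

# One pass per word; grepl(..., fixed = TRUE) does plain substring matching
# and returns a logical vector, which which() turns into document indices.
hits <- lapply(words, function(w) which(grepl(w, database, fixed = TRUE)))
names(hits) <- words
```

For many thousands of words, a multi-pattern matcher (e.g. an Aho-Corasick implementation, or stringi's fixed-pattern search functions) should scale better than one grepl() pass per word; that is a suggestion to investigate, not something benchmarked here.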
Re: [R] scaling variables consecutively and independently
On Aug 3, 2015, at 1:06 PM, Ram09 wrote:

Yes, I've been using the scale function but I don't know how to write a line of code that will scale the scores in each variable independently of each other instead of as a whole.

You do not appear to be reading the help page for `scale`.

In other words, how can I get the scale function to standardize all the scores in one variable (column), then move on to the next, and so on for the whole dataset, without having to tediously type out the same line of code for each variable?

This is the first line of text in the help page:

'scale' is generic function whose default method centers and/or scales the columns of a numeric matrix.

If you do not want the result as a matrix then you can use lapply on a dataframe.

-- David Winsemius, Alameda, CA, USA
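Following the advice above, a minimal sketch with toy data showing both routes: the matrix result from scale(), and lapply() when a data frame should come back out:

```r
df <- data.frame(a = c(1, 2, 3), b = c(10, 20, 40))

# Route 1: scale() already works column by column, but returns a matrix.
zmat <- scale(df)

# Route 2: lapply() applies scale() to each column and keeps a data frame.
zdf <- as.data.frame(lapply(df, function(col) as.numeric(scale(col))))
```

Each column of `zdf` now has mean 0 and standard deviation 1, computed from that column alone, which is exactly the "each column independent of the others" behavior asked for.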
Re: [R] How to simulate informative censoring in a Cox PH model?
Hi Greg

The copulas concept seems a nicely simple way of simulating event times that are subject to informative censoring (in contrast to the double Cox model approach I use). The correlation between the marginal uniform random variables you speak of reminded me that my approach should also induce this correlation, just in a different way. Similarly, I should also observe zero correlation between the event times from my outcome model and the censoring times. Unfortunately this was not the case; to cut a long story short, I was inadvertently generating my independent censoring times from a model that depended on covariates in the outcome model. This now explains the mixed results I rather laboriously attempted to describe previously.

Re-running some scenarios with my new error-free code, I can now clearly observe the points you have been making, that is, informative censoring only leads to bias if the covariates in the censoring model are not in the outcome model. Indeed, I can choose the common (to both models) treatment effect to be vastly different (with all other effects the same) and have no bias, yet small differences in the effect of the censoring-model Z (not in the outcome model) lead to moderate biases.

I am still somewhat confused by the other approach to this problem, where I have seen journal articles assuming an outcome model for the censored subjects, i.e. an outcome model for the unobserved event times. Under this approach, the definition of informative censoring appears to be that the observed and unobserved outcome models are different. This approach also makes sense to me: censoring merely loses precision of the parameter estimators due to reduced events, but does not lead to bias. However, the concept of correlated event and censoring times does not even present itself here?
Thanks
Dan

On Fri, Jul 31, 2015 at 5:06 PM, Greg Snow 538...@gmail.com wrote:

Daniel,

Basically just responding to your last paragraph (the others are interesting, but I think that you are learning as much as anyone and I don't currently have any other suggestions).

I am not an expert on copulas, so this is a basic understanding; you should learn more about them if you choose to use them. The main idea of a copula is that it is a bivariate or multivariate distribution where all the variables have uniform marginal distributions but the variables are not independent of each other. How I would suggest using them: choose a copula and generate random points from a bivariate copula, then put those (uniform) values into the inverse CDF (quantile) function for the Weibull (or other distribution), one giving the event time, the other the censoring time. This will give you times that (marginally) come from the distributions of interest but are not independent (so would be considered informative censoring). Repeat this with different levels of relationship in the copula to see how much difference it makes in your simulations.

On Thu, Jul 30, 2015 at 2:02 PM, Daniel Meddings dpmeddi...@gmail.com wrote: Thanks Greg once more for taking the time to reply. I certainly agree that this is not a simple set-up, although it is realistic I think. In short you are correct about model mis-specification being the key to producing more biased estimates under informative than under non-informative censoring. After looking again at my code and trying various things, I realize that the key factor leading the informative and non-informative censoring data to give rise to the same biased estimates is how I generate my Z_i variable, and also the magnitude of the Z_i coefficient in both the event and informative-censoring models. In the example I gave, I generated Z_i (I think of this as a poor-prognosis variable) from a beta distribution so that it ranged from 0-1.
The biased estimates for beta_t_1 (I think of this as the effect of a treatment on survival) were approximately 1.56 when the true value was -1. What I forgot to mention was that fitting a Cox model with 1,000,000 subjects to the full data (i.e. no censoring at all) arguably gives the best treatment-effect estimate possible, given that the effects of Z_i and Z_i*Treat_i are not in the model. This best possible estimate turns out to be 1.55; i.e. the example I gave just so happens to be such that even with 25-27% censoring, the estimates obtained are almost the best that can be attained.

My guess is that the informative censoring does not bias the estimate more than non-informative censoring because the only variable not accounted for in the model is Z_i, which does not have a large enough effect beta_t_2 and/or beta_c_2, or perhaps because Z_i only has a narrow range, which does not permit the current beta_t_2 value to do any damage. To investigate the beta_t_2 and/or beta_c_2 issue I changed beta_c_2 from 2 to 7 and beta_c_0 from 0.2 to -1.2, and beta_d_0 from
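Greg's copula recipe quoted above can be sketched in base R with no copula package: correlated normals give dependent uniforms via pnorm(), and qweibull() maps them to event and censoring times. The correlation, shapes, and scales below are illustrative choices, not values from this discussion:

```r
set.seed(42)
n   <- 5000
rho <- 0.6   # copula correlation: strength of the informative censoring

# Gaussian copula: correlated standard normals -> uniform marginals.
z1 <- rnorm(n)
z2 <- rho * z1 + sqrt(1 - rho^2) * rnorm(n)
u_event  <- pnorm(z1)
u_censor <- pnorm(z2)

# Feed the (dependent) uniforms through Weibull quantile functions.
t_event  <- qweibull(u_event,  shape = 1.5, scale = 10)
t_censor <- qweibull(u_censor, shape = 1.5, scale = 12)

# Observed data: earlier of the two times, plus an event indicator.
time   <- pmin(t_event, t_censor)
status <- as.numeric(t_event <= t_censor)   # 1 = event observed
```

Setting rho = 0 recovers non-informative censoring with the same marginals, so the bias attributable to the dependence can be isolated by varying rho alone across simulation scenarios.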
Re: [R] scaling variables consecutively and independently
On Aug 3, 2015, at 11:42 AM, Ram09 wrote:

Hello Everyone,

So I am very new to R and I'm having some trouble. I basically have around 110 datasets, each one made up of around 100 variables. I am trying to z-score the scores in each column, but independently of each other (each column independent of the others). The problem is that there are just too many variables in each dataset to compute individually. I figured there should be some type of loop that would scale the scores in each column and then move on to the next, but I haven't been able to find anything about this. Can anyone help? Thanks so much!

Perhaps you are looking for the 'scale' function?

-- David Winsemius, Alameda, CA, USA
Re: [R] scaling variables consecutively and independently
Yes, I've been using the scale function but I don't know how to write a line of code that will scale the scores in each variable independently of each other instead of as a whole. In other words, how can I get the scale function to standardize all the scores in one variable (column), then move on to the next, and so on for the whole dataset, without having to tediously type out the same line of code for each variable?

-RW

-- View this message in context: http://r.789695.n4.nabble.com/scaling-variables-consecutively-and-independently-tp4710702p4710709.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] Environmental Data Connector v1.3
On Aug 3, 2015, at 8:12 AM, Robert in SA ri.william...@outlook.com wrote: Hello. I have successfully installed EDC v1.3 on Linux Ubuntu 14.04. I am running a 64-bit machine with R 3.2.1 via RStudio. I have tried example1 <- EDC.get(1) after loading the ncdf and EDCR libraries, from both the R terminal and RStudio, and get the following result: Error in setwd(paste(Sys.getenv("EDC_HOME"), sep = "")) : cannot change working directory. During installation EDC_HOME was set to /home/robert/EDC and the directory definitely exists. Does anyone have any suggestions?

Bringing this back on list so that the solution is on record: the problem is that EDC_HOME was not set when the install was done. We are looking into whether this is a problem with the installer. In the meantime, if this problem pops up again, use:

Sys.setenv(EDC_HOME = "/your/EDC/home/directory")

Many thanks to Bill Dunlap of TIBCO for help in solving this. He also notes: "Yes. You can set EDC_HOME in your .Rprofile or .Renviron files. See help(Startup) for details."

-Roy