Re: [R] R Help
Better ask for local help if you can't reduce your code to a minimal example, so that we can easily understand what you are looking for.

On 02.03.2015 10:38, Rami Alzebdieh wrote:
> Dear Sir, I started using R 3 months ago, and I am still learning,

Same for me after more than 16 years.

Best,
Uwe Ligges
[R] R Help
Dear Sir,

I started using R 3 months ago, and I am still learning. I have a project in which I am using R. A friend helped me build the code for it, and it works perfectly, but I need to make a small change. It looks very simple, but for me it's very complicated. I have included the code below and hope you can help me with this problem; I have highlighted (with ## comments) exactly what I need to change. The project calculates market- and industry-weighted returns for each company, based on the date levels.

sync = read.csv("country-14.csv", header=TRUE)
id.country = 14
sync = sync[sync$country!="country" & sync$country==id.country, -c(2,5)]
sync$price=as.numeric(as.character(sync$price))
sync$mv=as.numeric(as.character(sync$mv))
attach(sync)

# Calculate returns and add to the dataset
n.comp = nlevels(as.factor(as.character(sync$company_name)))
comp.names = levels(as.factor(as.character(sync$company_name)))
data = vector("list", n.comp)
for(i in 1:n.comp){
  temp = sync[sync$company_name==comp.names[i],]
  data[[i]] = cbind(temp, c(NA, diff(temp$price)/temp$price[1:(length(temp$price)-1)]))
}
sync = do.call(rbind, data)
names(sync)[7] = "returns"
detach(sync)
attach(sync)

# Fill industry_code column
industry_code=rep(NA,dim(sync)[1])
for(i in 1:dim(sync)[1]){
  if(nchar(as.character(company_code[i])) == 3){
    industry_code[i] = as.numeric(substr(as.character(company_code[i]),1,1))
  } else {
    industry_code[i] = as.numeric(substr(as.character(company_code[i]),1,2))
  }
  print((i/dim(sync)[1])*100)
}
sync = cbind(sync,as.factor(industry_code))
names(sync)[8] = "industry_code"
detach(sync)
attach(sync)

# Calculate market weighted returns and add to the dataset
market_returns = rep(NA,dim(sync)[1])
industry_returns = rep(NA,dim(sync)[1])
for(i in 1:nlevels(date)){
  data = sync[date==levels(date)[i],]
  data$company_name = as.factor(as.character(data$company_name))
  for(m in 1:nlevels(data$company_name)){
    index1 = data$company_name == levels(data$company_name)[m]
    index2 = date==levels(date)[i] & company_name==levels(data$company_name)[m]
    market_returns[index2] = (sum(data$returns*(data$mv/sum(data$mv,na.rm=TRUE)),na.rm=TRUE) -
      (data$returns[index1]*(data$mv[index1]/sum(data$mv,na.rm=TRUE))))/(nlevels(data$company_name)-1)
    ## This is what I need to change: instead of using the number of companies in the
    ## dataset (nlevels(data$company_name)), I need to use the number of non-NA return
    ## values (data$returns). (This code calculates returns at the date level, as you can see above.)
  }
  print(i/nlevels(date))
}
sync = cbind(sync,market_returns)
names(sync)[9] = "market_returns"
detach(sync)
attach(sync)

# Calculate industry weighted returns and add to the dataset
for(i in 1:nlevels(date)){
  for(k in 1:nlevels(as.factor(as.character(industry_code)))){
    data1 = sync[date==levels(date)[i] & industry_code==levels(as.factor(as.character(industry_code)))[k],]
    data1$company_name = as.factor(as.character(data1$company_name))
    for(l in 1:nlevels(data1$company_name)){
      index3 = data1$company_name == levels(data1$company_name)[l]
      index4 = date==levels(date)[i] & company_name==levels(data1$company_name)[l]
      industry_returns[index4] = (sum(data1$returns*(data1$mv/sum(data1$mv,na.rm=TRUE)),na.rm=TRUE) -
        (data1$returns[index3]*(data1$mv[index3]/sum(data1$mv,na.rm=TRUE))))/(nlevels(data1$company_name)-1)
      ## This also needs to change: instead of using the number of companies in the
      ## dataset (nlevels(data1$company_name)), I need to use the number of non-NA return
      ## values (data1$returns). (This code calculates returns at the date and industry level, as you can see above.)
    }
  }
  print(i/nlevels(date))
}
sync = cbind(sync,industry_returns)
names(sync)[10] = "industry_returns"
detach(sync)
attach(sync)

year = apply(as.matrix(sync$date),1,function(x) as.factor(substr(as.character(x),7,10)))
sync = cbind(sync,as.factor(year))
names(sync)[11] = "year"
sync = sync[sync$year!=1999,]
sync$year = as.factor(as.character(sync$year))
detach(sync)
attach(sync)

year = as.factor(as.character(year))
industry_code = as.factor(as.character(industry_code))
comp.per.ind = rep(NA, dim(sync)[1])
for(i in 1:nlevels(year)){
  for(j in 1:nlevels(industry_code)){
    index = year==levels(year)[i] & industry_code==levels(industry_code)[j]
    data = sync[index,]
    comp.per.ind[index] = nlevels(as.factor(as.character(data$company_name)))
  }
}
sync = cbind(sync,as.factor(comp.per.ind))
names(sync)[12] = "comp.per.ind"
detach(sync)
attach(sync)
write.csv(sync,paste("Returns_data",id.country,".csv",sep=""))

Thank you for your help,
Rami Alzebdieh

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide
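A minimal sketch of the requested change, assuming each company appears at most once per date: in both highlighted lines the denominator nlevels(data$company_name) - 1 would become sum(!is.na(data$returns)) - 1 (and sum(!is.na(data1$returns)) - 1 in the industry loop). The toy data below is hypothetical, and the inner loop over companies is replaced by one vectorized expression that applies the same formula to every row of a single date.

```r
# Hypothetical data for one date: company B has a missing return
data <- data.frame(company_name = factor(c("A", "B", "C", "D")),
                   returns = c(0.02, NA, -0.01, 0.03),
                   mv = c(100, 50, 200, 150))

# Count of non-NA returns (3 here), instead of nlevels(data$company_name) (4 here)
n_valid <- sum(!is.na(data$returns))

# Market-value weights, as in the original code
w <- data$mv / sum(data$mv, na.rm = TRUE)
total <- sum(data$returns * w, na.rm = TRUE)

# Same formula as the original inner loop, computed for every company at once:
# total weighted return minus the company's own contribution, divided by (n_valid - 1)
market_returns <- (total - data$returns * w) / (n_valid - 1)
market_returns
```

Whether the - 1 should stay (it accounts for removing the company's own contribution) and whether companies with an NA return should also be dropped from the market-value total, i.e. sum(data$mv[!is.na(data$returns)]), depends on the definition being used; treat both as assumptions.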
Re: [R-es] PERMUTATIONS IN R
Hi David,

Are you sure you are looking for combinations? I think what you are looking for is this:

MuestraS <- c(1, 1, 1, 1, 0, 1, 1, 0, 1, 1)
library(combinat)
resPer <- permn(MuestraS)
matresPer <- matrix(unlist(resPer), nrow=factorial(length(MuestraS)), ncol=length(MuestraS), byrow=TRUE)
head(matresPer)

Be careful: the object resPer is a list of 3,628,800 elements (609.1 Mb), whereas as a matrix it takes only 276.9 Mb.

Regards,
Carlos Ortega
www.qualityexcellence.es

On 2 March 2015, 22:51, David Contreras davidcontrera...@gmail.com wrote:
> Good afternoon, friends. A few days ago I asked some questions and was able to clear up the doubts I had at the time.

___ R-help-es mailing list R-help-es@r-project.org https://stat.ethz.ch/mailman/listinfo/r-help-es
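Because MuestraS contains only two zeros, most of the 10! = 3,628,800 permutations returned by permn() are repeats: there are only choose(10, 2) = 45 distinct arrangements, each appearing 2! * 8! = 80,640 times. A lighter sketch (not from the thread) builds the distinct arrangements directly with base R's combn(), by choosing the positions of the zeros, and then draws a simple random sample with replacement using sample(); the sample size of 5 and the seed are arbitrary choices for illustration.

```r
MuestraS <- c(1, 1, 1, 1, 0, 1, 1, 0, 1, 1)
n <- length(MuestraS)
k <- sum(MuestraS == 0)              # number of zeros (2 here)

# Each column of zero_pos is one choice of positions for the zeros
zero_pos <- combn(n, k)
arrangements <- t(apply(zero_pos, 2, function(p) {
  v <- rep(1, n)                     # start from all ones ...
  v[p] <- 0                          # ... and place the zeros
  v
}))
dim(arrangements)                    # 45 10

set.seed(123)                        # arbitrary seed, for reproducibility
idx <- sample(nrow(arrangements), size = 5, replace = TRUE)
arrangements[idx, ]                  # simple random sample with replacement
```

unique(matresPer) would give the same 45 arrangements (possibly in a different order), but only after building the full 3,628,800-row matrix.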
[R-es] PERMUTATIONS IN R
Good afternoon, friends,

A few days ago I asked some questions and was able to clear up the doubts I had at the time. Now I need your help with the following. I have a dichotomous (binary) vector with the following values, which came out of some earlier processing:

MuestraS
[1] 1 1 1 1 0 1 1 0 1 1

Now I need to find all the possible combinations that can be made with these elements, in order to then take a simple random sample with replacement and select some of the possible samples obtained. I would appreciate any help with this.

Regards,
DC.
Re: [R] How to decide weight in WLS model in R ?
Angela:

These are statistical, not R, issues I believe, and you appear to be out of your depth statistically here. I suggest you talk to a local statistical resource or, if you can't find such help, post on a statistical site like stats.stackexchange.com.

Cheers,
Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge is certainly not wisdom."
Clifford Stoll

On Mon, Mar 2, 2015 at 12:13 PM, Yan Wu yanwu1...@gmail.com wrote:
> Hi, I would like to know how to decide the weight in a WLS model in R?
[R] How to decide weight in WLS model in R ?
Hi,

I would like to know how to choose the weights in a WLS model in R. For example, with the pipeline data from faraway, I am trying to fit the regression model Lab ~ Field, which has non-constant variance, and I want to use weights to account for it. How should the weights in the WLS model be chosen?

For the pipeline data, they split the range of Field into 12 groups of size 9. Within each group, they compute the variance of Lab (varlab) and the mean of Field (meanfield). In addition, they suppose that the variance of the response is related to the predictor by var(Lab) = a*Field^b, so a and b can be estimated by regressing log(varlab) on log(meanfield). But how do I then determine the weights in a WLS fit of Lab on Field in R?

I guess this may require the function varConstPower() (from nlme) in the example above. Could you please explain how to use varConstPower() in R?

I would appreciate answers to these two questions. Thanks!

Angela
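One way to turn the variance model var(Lab) = a*Field^b into WLS weights is to make the weights inversely proportional to the fitted variance, w = 1/(a*Field^b), and pass them to lm(). The sketch below is hypothetical: it simulates stand-in data instead of using faraway's pipeline data, and the values of a, b and the slope are made up; the grouping into 12 groups of 9 follows the procedure described in the question.

```r
set.seed(42)
n <- 108
Field <- runif(n, 5, 85)              # hypothetical stand-in for pipeline$Field
a_true <- 0.05; b_true <- 1.5         # hypothetical variance parameters
Lab <- 1.2 * Field + rnorm(n, sd = sqrt(a_true * Field^b_true))

# 1. Sort by Field and form 12 groups of 9 consecutive observations
grp <- integer(n)
grp[order(Field)] <- rep(1:12, each = 9)
varlab <- as.vector(tapply(Lab, grp, var))
meanfield <- as.vector(tapply(Field, grp, mean))

# 2. Estimate a and b from log(varlab) = log(a) + b * log(meanfield)
vfit <- lm(log(varlab) ~ log(meanfield))
a_hat <- exp(unname(coef(vfit)[1]))
b_hat <- unname(coef(vfit)[2])

# 3. WLS fit with weights equal to the inverse of the modelled variance
w <- 1 / (a_hat * Field^b_hat)
wls <- lm(Lab ~ Field, weights = w)
summary(wls)$coefficients
```

Only the relative weights matter for the coefficient estimates, so weights = 1/Field^b_hat gives the same fit. As an alternative that estimates the variance parameter jointly with the regression, nlme::gls(Lab ~ Field, weights = varPower(form = ~ Field)) models the variance as sigma^2 * |Field|^(2*delta), so b corresponds to 2*delta; varConstPower() instead uses sigma^2 * (c + |Field|^delta)^2.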
Re: [R] Installed R to Windows, but having trouble with read.csv
On 02/03/2015 11:53 AM, Mello Cavallo, Alice wrote:
> I copied the file into the bin folder of R ...
> perf_data <- read.csv("PerfResultsCSv.csv")
> Error in file(file, "rt") : cannot open the connection
> In addition: Warning message:
> In file(file, "rt") :
>   cannot open file 'PerfResultsCSv.csv': No such file or directory
> I also installed and loaded the RWeka package, but still cannot open it. It seems like I am having some issue with directories... I am new to R; any help appreciated.

Setting the filename using the file.choose() function is usually easiest, i.e.

f <- file.choose()
perf_data <- read.csv(f)

You'll get the usual file-chooser dialog (at least in Windows and OS X; not sure about Linux) when file.choose() runs, and can navigate to the file. The full name, including the path, will be saved in f.

Duncan Murdoch
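A relative file name such as "PerfResultsCSv.csv" is looked up in the working directory reported by getwd(), which is usually not R's bin folder; that is why copying the file there did not help. The sketch below (the temporary folder and the small data frame are made up for illustration) shows the two usual fixes: pass the full path, or setwd() to the file's folder.

```r
getwd()   # relative file names are looked up here

# Hypothetical demo file in a temporary folder
demo_dir <- tempdir()
path <- file.path(demo_dir, "PerfResultsCSv.csv")
write.csv(data.frame(run = 1:3, ms = c(12.5, 11.0, 13.2)), path, row.names = FALSE)
file.exists(path)   # TRUE

# Fix 1: give read.csv() the full path
perf_data <- read.csv(path)

# Fix 2: make the file's folder the working directory, then restore the old one
old_wd <- setwd(demo_dir)
perf_data2 <- read.csv("PerfResultsCSv.csv")
setwd(old_wd)
```

On Windows, a full path can be written with forward slashes, e.g. read.csv("C:/Users/alice/Documents/PerfResultsCSv.csv") (a hypothetical location), or with doubled backslashes.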
[R] Looping and break
Hello,

I apologize for bringing up next and break in loops, given that there is so much about them on the net, but I've tried numerous examples found using Google and just can't seem to get this to work.

This is a simple version of what I am doing with matrices, but it shows the issue. I need the loop indexed by n to perform a calculation on the variable total. But if total is greater than 8, it should go to the next iteration of the loop indexed by a. For example, it runs condition a = 1 for n = 1 to 50, but within n, if total is greater than 8, it goes to the next condition of a, which would be a = 2, and so on.

for (a in 1:3){
  if (a == 1) { b <- c(1:5) }
  if (a == 2) { b <- c(1:5) }
  if (a == 3) { b <- c(1:5) }
  for (n in 1:50){
    if (n 15) next
    total <- 2*b
    if (total > 8) next
  }
}

Any help would be greatly appreciated.

Thanks,
Scott

--
View this message in context: http://r.789695.n4.nabble.com/Looping-and-break-tp4704093.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R-es] PERMUTATIONS IN R
Many thanks for your timely and precise answers. I used parts of both code snippets, and so far everything is working perfectly. Thanks again; I will be in touch if any other problem comes up.

Regards,
DC.

On 2 March 2015, 17:50, Carlos Ortega c...@qualityexcellence.es wrote:
> Hi David, are you sure you are looking for combinations?
Re: [R] Looping and break
On 03/03/15 15:04, Jeff Newmiller wrote:
> Your example is decidedly not expressed in R, though it looks like you tried. Can you provide the hand-computed result that you are trying to obtain?
>
> Note that the reason you cannot find anything about next or break in R is that they don't exist.

Point of order, Mr. Chairman, but they ***do*** exist. See e.g. ?next (which actually takes you to the help for Control Flow).

> There are generally alternative ways to accomplish the kinds of things you might want to accomplish without them, and those alternatives often don't involve explicit loops at all.

Otherwise I concur with everything you say.

cheers,

Rolf

--
Rolf Turner
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276
Home phone: +64-9-480-4619
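As Rolf notes, R does have next and break (see ?Control). For the behaviour Scott describes, break is the one that leaves the inner n loop so that the outer loop carries on with the next value of a; next only skips ahead to the next n. In the sketch below, total <- a * n is a made-up stand-in for the real calculation. If total is a vector, as with total <- 2*b in the original, if () needs a single TRUE/FALSE, so the test should be wrapped in any() or all() (since R 4.2.0 a condition of length greater than one is an error).

```r
stop_at <- integer(3)
for (a in 1:3) {
  for (n in 1:50) {
    total <- a * n            # hypothetical stand-in for the real calculation
    if (total > 8) break      # leave the n loop; the a loop goes on to the next a
  }
  stop_at[a] <- n             # n keeps the value it had when break ran
}
stop_at                       # 9 5 3

# Loop-free equivalent for this toy rule: the first n with a * n > 8
floor(8 / (1:3)) + 1          # 9 5 3
```

The last line illustrates Jeff's point that such loops can often be replaced by a direct calculation, here solving a * n > 8 for the smallest n.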
Re: [R] Looping and break
Sigh. To be positive is to be wrong at the top of one's lungs. Next I will be told R has a goto statement.

---
Jeff Newmiller
DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
Go Live... Live Go... Playing with rocks...1k
---
Sent from my phone. Please excuse my brevity.

On March 2, 2015 6:23:57 PM PST, Rolf Turner r.tur...@auckland.ac.nz wrote:
> Point of order, Mr. Chairman, but they ***do*** exist. See e.g. ?next (which actually takes you to the help for Control Flow).
Re: [R] Sum upto every twelth cell in a column
Here is an implementation.

(x <- c(23,35,22,11,10,1,14,15,13,15,17,16,154,13,24,25,25,25,25,25,22,11,15,15))
[1] 23 35 22 11 10 1 14 15 13 15 17 16 154 13 24 25 25 25 25 25 22 11 15 15
(y <- c(0,cumsum(x)))
[1] 0 23 58 80 91 101 102 116 131 144 159 176 192 346 359 383 408 433 458 483 508 530 541 556
[25] 571
(y[seq(13,length(y),12)] - y[seq(1,length(y)-12,12)])
[1] 192 379

--
View this message in context: http://r.789695.n4.nabble.com/Sum-upto-every-twelth-cell-in-a-column-tp4704007p4704087.html
Sent from the R help mailing list archive at Nabble.com.
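Two base-R alternatives to the cumsum() approach: a block index with tapply(), which also copes with a final incomplete block, and colSums() on a 12-row matrix, which assumes length(x) is a multiple of 12. Both give 192 and 379 for the data above.

```r
x <- c(23,35,22,11,10,1,14,15,13,15,17,16,154,13,24,25,25,25,25,25,22,11,15,15)

# Block index: 1 for elements 1-12, 2 for elements 13-24, ...
block <- (seq_along(x) - 1) %/% 12 + 1
tapply(x, block, sum)            # works even if the last block is incomplete

# Fill a 12-row matrix column by column and sum each column
colSums(matrix(x, nrow = 12))    # assumes length(x) is a multiple of 12
```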
Re: [R] Looping and break
Your example is decidedly not expressed in R, though it looks like you tried. Can you provide the hand-computed result that you are trying to obtain?

Note that the reason you cannot find anything about next or break in R is that they don't exist. There are generally alternative ways to accomplish the kinds of things you might want to accomplish without them, and those alternatives often don't involve explicit loops at all.

Jeff Newmiller

On March 2, 2015 4:11:21 PM PST, Scott Colwell scolw...@uoguelph.ca wrote:
> Hello, I apologize for bringing up next and break in loops, given that there is so much on the net about it, but I've tried numerous examples found using Google and just can't seem to get this to work.
Re: [R] Looping and break
On 03/03/15 16:08, Jeff Newmiller wrote:
> Sigh. To be positive is to be wrong at the top of one's lungs. Next I will be told R has a goto statement.

I am ***positive*** that it hasn't! :-) Well, 99.999% confident. Although I guess it's not inconceivable that some misguided nerd might construct one. In R all things are possible. It'd be tough, though, since statements are not identified/identifiable in R, so it would be hard to tell the code, uh, where to go.

cheers,

Rolf
[R] vectorize data string analysis
Hello All,

I have to admit that I am not that good when it comes to vectorizing a function, so I need some insight. Is the case below one where vectorization can be used to improve speed?

Below the function is some sample data; as you can see, it is not delimited. However, the record length is 220 characters, so I wrote the following code to delimit the data set /r. The function works, and I get a dataset that can then be inserted into a MySQL data table. However, the actual data set has 518,000 records, so the number of characters is 518000 * 220, and it takes R hours to parse it using the function I have written. Can this be vectorized, or is this a loop deal?

Best Regards,
Glenn

library(stringr)  # provides str_sub()

#' FNMA Factor
#'
#' This function parses the FNMA factor file for loading
#' into a database table; the FNMA factor file is non-delimited
#' @param filepath A character vector specifying a data directory
#' @param length of the line A numeric value equal to the length of a line
#' @export
FNMAFactor <- function(filepath = character()){
  callpath <- paste(filepath, "mbsfact.txt", sep = "")
  returnpath <- paste(filepath, "factor.txt", sep = "")
  data <- readLines(con = callpath)
  numchar <- nchar(data, type = "chars")
  start <- c(seq(1, numchar, 220))
  end <- c(seq(220, numchar, 220))
  for(i in 1:length(start)){
    write(str_sub(data, start[i], end[i]), file = returnpath, append = TRUE)
  }
}

31365EJ46 CI125483 2003473100OCT0303103340610.1548980406.500030197040112180MULTIPLE POOL 0070147FNMS 06.500 CI125483070170096031371KMA6 CL254253 1304570700OCT0310156865640.778566.30102030132357MULTIPLE POOL 0067230FNMS 06.000 CL254253067150333031371RE44 CL259455 0983651400OCT0303447615880.3504916406.500050102050132357MULTIPLE POOL 0070200FNMS 06.500 CL259455070450340031376KBB1 CL357434 2505145900OCT0325021294240.9987958905.90103090133359MULTIPLE POOL 0055000FNMS 05.000 CL357434055000358031385XE52 WS56 3651248300OCT0333344198060.9132273504.575050103050133356MEGA POOL ** NOT AN ACTIVE SERVICER ** 0052440FNAR 04.595 WS560031385XLL9 WS555731
00013439369600OCT03000129242191330.9616685505.360080103040133352MEGA POOL ** NOT AN ACTIVE SERVICER ** 0075160FNAR 05.368 WS5557310031390XG87 CI659123 0208856500OCT0301136251660.5440346206.80102080117179WASHINGTON MUTUAL BANK, FA 19850 PLUMMER STREET CHATSWORTH CA91311069210FNMS 06.000 CI659123069090165031403BTR4 CL744060 0770371700OCT0307694084860.9987496805.90103080133356MULTIPLE POOL 0053920FNMS 05.000 CL7440600031403GND0 LB748388 0952312900OCT0309512089400.9988407604.525090103080133358DLJ MORTGAGE CAPITAL INC. ELEVEN MADISON AVENUE NEW YORK NY10010058430FNAR XX.XXX LB7483880031403GNG3 LB748391 0715661500OCT0307007212290.9791238304.379090103080133358DLJ MORTGAGE CAPITAL INC. ELEVEN MADISON AVENUE NEW YORK NY10010056530FNAR XX.XXX LB74839100
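Most of the time is probably spent in the loop itself: each of the roughly 518,000 iterations calls str_sub() on the whole 114-million-character string and calls write(..., append = TRUE), which reopens the output file every time. substring() is vectorized over its start and end positions, so all records can be cut in one call and written with a single writeLines(). The sketch below assumes, as the original code does, that mbsfact.txt is one undelimited stream; split_fixed_width() and fnma_factor_vec() are hypothetical names, and file.path() replaces the paste(..., sep = "") calls.

```r
# Split one long, undelimited string into fixed-width records in a single call
split_fixed_width <- function(s, width = 220L) {
  n <- nchar(s)
  if (n == 0) return(character(0))
  starts <- seq(1L, n, by = width)
  ends <- pmin(starts + width - 1L, n)   # the last record may be shorter
  substring(s, starts, ends)             # vectorized over starts/ends
}

# Hypothetical rewrite of FNMAFactor(): read once, split once, write once
fnma_factor_vec <- function(dir, width = 220L) {
  raw <- readLines(file.path(dir, "mbsfact.txt"))
  records <- split_fixed_width(paste(raw, collapse = ""), width)
  writeLines(records, file.path(dir, "factor.txt"))
  invisible(records)
}

split_fixed_width("abcdefgh", width = 3L)   # "abc" "def" "gh"
```

If the field widths within each 220-character record are known, utils::read.fwf() could then split the records into columns as well.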
Re: [R] Sorting data frame by prepared order
Here is an implementation.

t <- data.frame(x=c(1,1,1,1,1,2,2,2,2,2), y=c("a","a","a","b","b","a","a","b","b","b"))
t
   x y
1  1 a
2  1 a
3  1 a
4  1 b
5  1 b
6  2 a
7  2 a
8  2 b
9  2 b
10 2 b

assignSeq <- function(test) {
  temp <- test[order(test$x),]
  InC <- numeric(nrow(test))
  inD <- unique(test$x)
  countAll <- 0
  for (i in 1:length(inD)) {
    countA <- 0
    countB <- 0
    for (j in 1:dim(temp[temp$x==inD[i],])[1]) {
      countAll <- countAll + 1
      if (temp$y[countAll] == "a") {
        InC[countAll] <- 2*countA
        countA <- countA + 1
      } else {
        InC[countAll] <- 2*countB + 1
        countB <- countB + 1
      }
    }
  }
  temp$seq <- InC
  return(temp)
}
d <- assignSeq(t)
d[order(d$x,d$seq),-3]
   x y
1  1 a
4  1 b
2  1 a
5  1 b
3  1 a
6  2 a
8  2 b
7  2 a
9  2 b
10 2 b

--
View this message in context: http://r.789695.n4.nabble.com/Sorting-data-frame-by-prepared-order-tp4704038p4704092.html
Sent from the R help mailing list archive at Nabble.com.
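A loop-free sketch of the same ordering, assuming y only takes the values "a" and "b": ave() numbers the rows within each (x, y) group, and the key 2*(k - 1) + (y == "b") reproduces the seq values computed by assignSeq(), so the a's get 0, 2, 4, ... and the b's get 1, 3, 5, ... within each x. The data frame is called df here to avoid masking base::t().

```r
df <- data.frame(x = c(1,1,1,1,1,2,2,2,2,2),
                 y = c("a","a","a","b","b","a","a","b","b","b"),
                 stringsAsFactors = FALSE)

# Running index 1, 2, 3, ... within each (x, y) group, in original row order
k <- ave(seq_len(nrow(df)), df$x, df$y, FUN = seq_along)

# a's: 0, 2, 4, ...; b's: 1, 3, 5, ...
key <- 2 * (k - 1) + (df$y == "b")

res <- df[order(df$x, key), ]
res
```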
[R] table
Hello List,

I am trying to obtain a table containing absolute and relative frequencies, but it must be done by strata. Each stratum has to contain totals and subtotals, with the subtotals summing to the total of the stratum above in the same column. As this may be somewhat vague, here is an example of such a table:

data <- data.frame(Provincial=rep(c("Prov1","Prov2","Prov1","Prov3"),10),
                   Municipios=rep(c("Mun1","Mun2","Mun3","Mun4"),10),
                   unit=rep(c("unit1","unit2","unit3","unit4"),10))

Variable                  N    %
Province (i)
  Municipalities (j)
    Health units (k)
∑i, ∑j, ∑k
... and so on, i = 1 to 16
∑i, ∑j, ∑k

If you could help me obtain a function to produce such a table, I would appreciate it very much.

Best, and thank you.

maicel monzon MD. MSc.
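One possible sketch of such a stratified table in base R, using a hypothetical recursive helper strata_rows(): each level of the first variable gets a row with its count N and percentage, followed by the rows of the next variable within it, so the subtotals of each stratum add up to the total of the row above. Percentages are taken relative to the grand total here, which is an assumption; indentation in Label stands in for the hierarchy.

```r
data <- data.frame(Provincial = rep(c("Prov1","Prov2","Prov1","Prov3"), 10),
                   Municipios = rep(c("Mun1","Mun2","Mun3","Mun4"), 10),
                   unit = rep(c("unit1","unit2","unit3","unit4"), 10),
                   stringsAsFactors = FALSE)

# Hypothetical helper: one row per level of vars[1], each followed
# recursively by the rows for vars[-1] within that level
strata_rows <- function(df, vars, total = nrow(df), depth = 0) {
  if (length(vars) == 0) return(NULL)
  v <- vars[1]
  rows <- NULL
  for (g in sort(unique(as.character(df[[v]])))) {
    sub <- df[as.character(df[[v]]) == g, , drop = FALSE]
    this <- data.frame(Level = v,
                       Label = paste0(strrep("  ", depth), g),
                       N = nrow(sub),
                       Pct = round(100 * nrow(sub) / total, 1),
                       stringsAsFactors = FALSE)
    rows <- rbind(rows, this, strata_rows(sub, vars[-1], total, depth + 1))
  }
  rows
}

tab <- rbind(data.frame(Level = "Total", Label = "Total", N = nrow(data), Pct = 100,
                        stringsAsFactors = FALSE),
             strata_rows(data, c("Provincial", "Municipios", "unit")))
print(tab, row.names = FALSE)
```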
[R] rms package: error with Glm
Dear R-help,

I'm getting an error with Glm (from the rms package) when the equivalent model fitted with glm does not give an error. This is using rms 4.3-0 in R 3.1.1. An example is shown below. I have set the seed value, but the error is not specific to this seed value.

Thanks for any help anyone can give.

Mark Seeto

library(rms)
set.seed(1)
n <- 100 # sample size
beta0 <- 3.7
beta1 <- 1.5
beta2 <- 0.9
beta3 <- 0.5
rate.x1 <- 2
mean.x2 <- 1
sd.x2 <- 2
nu <- 1.3
d <- data.frame(x1 = rexp(n, rate = rate.x1), x2 = rnorm(n, mean.x2, sd.x2))
d$y <- rgamma(n, shape = nu, rate = nu/exp(beta0 + beta1*d$x1 + beta2*d$x2 + beta3*d$x2^2))
glm(y ~ x1 + x2 + I(x2^2), family = Gamma(link=log), data=d) # No error
Glm(y ~ x1 + pol(x2, 2), family = Gamma(link=log), data=d) # Error shown below
## Error in glm.fit(x = X[, "Intercept", drop = FALSE], y = Y, weights = weights,  :
##   NA/NaN/Inf in 'x'
## In addition: Warning message:
## step size truncated due to divergence
Re: [R] How to speed up a double loop?
Hi

I do not see much logic in your removal of outliers. You can easily find which values differ from the previous one by more than 15:

myts[c(FALSE, abs(diff(myts$x)) > 15), ]

but I did not understand why you keep the values in rows 8 and 10. Your example can be solved by

myts$y[myts$x > 15] <- 1
myts$x <- myts$x * (myts$x < 15)

but it probably is not what you want.

Cheers
Petr

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of jeff6868
Sent: Monday, March 02, 2015 12:11 PM
To: r-help@r-project.org
Subject: [R] How to speed up a double loop?
[R] How to speed up a double loop?
Dear R-users,

I would like to speed up a double loop I developed for detecting and removing outliers in my whole data.frame. The idea is to remove values that differ too much from the previous value. If such a value is detected, the test must be done on at most the next 10 values following the last correct one (and an index is put in another column). It works well on a small data frame, but far too slowly for my real data frame with 500,000 rows. Here's a fake data example and the double loop:

myts <- data.frame(x = c(1,2,50,40,30,40,100,1,50,1,2,3,3,5,4), y = NA)
for (jj in 1:(nrow(myts) - 10)) {
  for (nn in (jj + 1):(jj + 10)) {
    if (!is.na(myts[jj, 1]) && !is.na(myts[nn, 1]) &&
        abs(myts[nn, 1] - myts[jj, 1]) > 15) {
      myts[nn, 2] <- 1   # mark the outlier
      myts[nn, 1] <- NA  # and remove it
    }
  }
}

Can somebody explain to me how I can speed this up easily? I have heard about vectorization, but I don't really understand how it works.

--
View this message in context: http://r.789695.n4.nabble.com/How-to-speed-up-a-double-loop-tp4704054.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] Sorting data frame by prepared order
Hi,

Maybe this is the beginning of a solution?

test <- data.frame(x = c(1,1,1,1,1,1,2,2,2,2,2,2),
                   y = c("a","a","a","b","b","b","a","a","b","b","b","a"))
test[order(test$x), ]
out <- split(test, test$x)
for (i in 1:length(out)) {
  foo <- unique(out[[i]][, 2])
  out[[i]][, 2] <- rep(foo, (nrow(out[[i]]) / (length(foo))))
}

It seems to work when each group defined by your first column has an even number of rows, but odd lengths are still a problem. That could perhaps be solved by adding fake rows that you remove afterwards (marked with a specific index, for example).
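As an alternative sketch, assuming the goal is to alternate the y values within each x group (a, b, a, b, ...), the rows can be reordered instead of having the y column overwritten. Numbering each occurrence of a value within its group with ave() and sorting on that number sidesteps the even/odd length problem: surplus values simply end up at the end of their group, and any other columns move with their rows.

```r
test <- data.frame(x = c(1,1,1,1,1,1,2,2,2,2,2,2),
                   y = c("a","a","a","b","b","b","a","a","b","b","b","a"),
                   stringsAsFactors = FALSE)

# occurrence number of each y value within its x group (1st "a", 2nd "a", ...)
occ <- ave(seq_along(test$y), test$x, test$y, FUN = seq_along)

# sort by group, then occurrence number, then y: a, b, a, b, ... within each x
sorted <- test[order(test$x, occ, test$y), ]
sorted$y  # alternates "a", "b" in both groups
```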
Re: [R] Firefox not showing R help.
On 01/03/2015 17:21, Duncan Murdoch wrote:
On 01/03/2015 10:13 AM, Ista Zahn wrote:
I think this might be due to the removal of the -remote option in Firefox. Some discussion and details are available at https://lists.gnu.org/archive/html/emacs-orgmode/2015-02/msg00946.html

Yes, that's it. R uses -remote if isLocal is TRUE; I had thought it wasn't. The -remote arg is also used for mozilla and opera; is it needed there?

R 3.1.3RC has been updated to work with Firefox 36.0.

Duncan Murdoch

Best,
Ista

On Sun, Mar 1, 2015 at 7:14 AM, Duncan Murdoch murdoch.dun...@gmail.com wrote:
On 28/02/2015 6:43 PM, Rolf Turner wrote:
See inline below.
On 01/03/15 10:59, Duncan Murdoch wrote:
On 28/02/2015 4:10 PM, Rolf Turner wrote:

Firefox recently updated itself on my laptop. Now when I ask for R help (e.g. ?plot) I just get my home page, and no help. If I do ?plot again after Firefox has opened its window, I just get yet another Firefox window, opened to my home page. (I have my preferences set to "When Firefox starts: Show my home page", as I always have had in the past.)

I would guess that browseURL() won't work for any URL. Is that right?

Yes, that is correct. E.g. if I do browseURL("http://www.r-project.org/") I get taken to my home page rather than to the R home page.

What does getOption("browser") give you in R?

/usr/bin/firefox

If it is just a character string (e.g. "xdg-open" is what I get in Ubuntu), does it work from your command line, outside of R? E.g. for me that test would be

xdg-open http://www.r-project.org

I tried

/usr/bin/firefox http://www.r-project.org/

from the Linux command line and was taken to the R home page, seamlessly. I also tried

xdg-open http://www.r-project.org/

and that worked equally well. Finally I tried options(browser = "xdg-open") and then ?plot and BINGO!!! the HTML help came up as requested. So I have a working solution to my problem. But I *really* don't understand why changing the browser from /usr/bin/firefox to xdg-open made a difference (since there appears to be no difference at the Linux command line). Anyway, thanks very much for solving my problem.

I believe browseURL will quote the URL, i.e. it would execute

/usr/bin/firefox "http://www.r-project.org/"

Perhaps Firefox is confused by the quotes? Doesn't seem likely...

Duncan Murdoch

cheers,
Rolf

If that doesn't work, but you can figure out a command-line way to open a particular URL, change getOption("browser") to use that.

Duncan Murdoch

The Firefox that I am currently running is (according to Firefox Help -> About Firefox) version 36.0.

Can anyone suggest how I can get my HTML R help back?

For what it's worth: I am using Linux, Fedora 17. (Yes, I know it's elderly, but then so am I. :-) ) Also, in case it has any relevance:

sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_NZ.utf8        LC_NUMERIC=C
 [3] LC_TIME=en_NZ.utf8         LC_COLLATE=en_NZ.utf8
 [5] LC_MONETARY=en_NZ.utf8     LC_MESSAGES=en_NZ.utf8
 [7] LC_PAPER=en_NZ.utf8        LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_NZ.utf8  LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] spatstat_1.40-0.064 misc_0.0-16

loaded via a namespace (and not attached):
[1] abind_1.4-0     deldir_0.1-7    goftest_1.0-2   grid_3.1.2
[5] lattice_0.20-29 Matrix_1.1-4    mgcv_1.8-3      nlme_3.1-118
[9] polyclip_1.3-1  tensor_1.5      tools_3.1.2

--
Brian D. Ripley, rip...@stats.ox.ac.uk
Emeritus Professor of Applied Statistics, University of Oxford
1 South Parks Road, Oxford OX1 3TG, UK
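Until a patched R is installed, the workaround Rolf found can be made permanent. A minimal sketch, assuming a Linux desktop where xdg-open is available (for example from xdg-utils) and that you keep startup settings in ~/.Rprofile:

```r
# Hypothetical ~/.Rprofile snippet: route help pages and browseURL() through
# xdg-open, if it is on the PATH, instead of calling Firefox directly.
if (nzchar(Sys.which("xdg-open"))) {
  options(browser = "xdg-open")
}
```

The Sys.which() guard leaves the browser option untouched on systems without xdg-open.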
Re: [R] RWebdriver and RSelenium returns the same error while trying to connect the java server
Is this a very recent version of Firefox? Have a look at the thread "Re: [R] Firefox not showing R help". It could be related, in which case you might want to try the up-to-the-minute latest version of R-patched (aka 3.1.3RC). Otherwise, as usual, you may need to talk with the relevant package maintainers.

-pd

On 01 Mar 2015, at 21:09, Ista Zahn istaz...@gmail.com wrote:
Do you have a server running? Can you connect to localhost directly from a browser?

Best,
Ista

On Sun, Mar 1, 2015 at 12:35 PM, PO SU rhelpmaill...@163.com wrote:
Dear expeRts,

When I use RWebdriver and RSelenium:

> require(RSelenium)
Loading required package: RSelenium
> remDr <- remoteDriver(remoteServerAddr = "localhost"
+ , port =
+ , browserName = "firefox"
+ )
> remDr$open()
[1] "Connecting to remote server"
Error: Summary: UnknownError
Detail: An unknown server-side error occurred while processing the command.
class: org.openqa.selenium.firefox.NotConnectedException

From the command line, it shows "unable to connect 127.0.0.1 7055 after 45000ms". I don't know why; does anybody happen to know?

--
PO SU
mail: desolato...@163.com
Majored in Statistics from SJTU

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com
Re: [R] Dummy variable in ARIMA
Have a look at the caschrono package. There's an excellent associated book by the author of the package, Yves Aragon, but it's in French; if you don't read French, the package documentation is very clear.

José

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Mikael Olai Milhøj
Sent: 26 February 2015 16:03
To: r-help@r-project.org
Subject: [R] Dummy variable in ARIMA

Hi all

I have been searching on the web in vain. I want to include a dummy variable in my ARIMA model. Let's say that I want to make an AR(1) model for X including a dummy variable which should be 1 for observations 4, 5 and 6 and zero otherwise (let's say that there are 50 observations in total). How do I do that? This does the trick but seems inefficient:

dummy <- c(rep(0, 3), rep(1, 3), rep(0, 44))

Thx in advance

Best regards
/Mikael
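For the question as posed, base R's arima() (package stats) already accepts external regressors through its xreg argument, so an AR(1) with a dummy can be fitted directly. A minimal sketch; the series X is simulated purely for illustration, and building the dummy with %in% is just a compact equivalent of the rep() version in the question.

```r
n <- 50
# 1 for observations 4, 5 and 6, 0 otherwise
dummy <- as.numeric(seq_len(n) %in% 4:6)

# Illustrative data only: an AR(1) series with a level shift at 4:6
set.seed(1)
X <- arima.sim(model = list(ar = 0.5), n = n) + 3 * dummy

# AR(1) with the dummy as an exogenous regressor
fit <- arima(X, order = c(1, 0, 0), xreg = dummy)
coef(fit)  # ar1, intercept and the dummy's coefficient
```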
[R] readHTMLTable() in XML package
I'm having trouble pulling down data from a website with the code below: I keep encountering the same error, but it occurs on different pages. The code loops through a website and grabs the data from the HTML table on each page. The error appears on different pages at different times, and I'm not sure of the root cause.

Error in readHTMLTable(readLines(url), which = 1, header = TRUE) :
  error in evaluating the argument 'doc' in selecting a method for function 'readHTMLTable':

library(XML)
for (i in 1:1000) {
  url <- paste(paste('http://games.crossfit.com/scores/leaderboard.php?stage=5&sort=0&page=', i, sep = ''),
               '&division=1&region=0&numberperpage=100&competition=0&frontpage=0&expanded=1&year=15&full=1&showtoggles=0&hidedropdowns=0&showathleteac=1&is_mobile=0',
               sep = '')
  tmp <- readHTMLTable(readLines(url), which = 1, header = TRUE)
  names(tmp) <- gsub("\\n", "", names(tmp))
  names(tmp) <- gsub(" +", "", names(tmp))
  tmp[] <- lapply(tmp, function(x) gsub("\\n", "", x))
  if (i == 1) {
    dat <- tmp
  } else {
    dat <- rbind(dat, tmp)
  }
  cat('Grabbing data from page', i, '\n')
}

Thanks,
Harold
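Errors that hit different pages on different runs often point to transient network or server failures rather than to the parsing code, though that is an assumption here. One common pattern is to retry each page a few times and to collect the pages in a list that is combined once at the end, instead of growing dat with rbind() inside the loop. A minimal sketch with a hypothetical helper with_retry(); the network call appears only in comments so the block runs offline.

```r
# Hypothetical helper: call f() up to 'tries' times, pausing 'wait' seconds
# between failures; re-signal the last error if every attempt fails.
with_retry <- function(f, tries = 3, wait = 2) {
  for (attempt in seq_len(tries)) {
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    message("Attempt ", attempt, " failed: ", conditionMessage(result))
    if (attempt < tries) Sys.sleep(wait)
  }
  stop(result)
}

# Offline demonstration: a function that fails once, then succeeds
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 2) stop("temporary failure")
  "ok"
}
res <- with_retry(flaky, tries = 3, wait = 0)

# Usage with the original loop (not run here):
# pages <- lapply(1:1000, function(i) {
#   url <- ...  # build the page URL as in the code above
#   with_retry(function() readHTMLTable(readLines(url), which = 1, header = TRUE))
# })
# dat <- do.call(rbind, pages)
```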
[R] numbering consecutive rows based on length criteria
Using this dataset:

dat <- read.table(textConnection("day noRes.Q wRes.Q
1 237074.41 215409.41
2 2336240.20 164835.16
3 84855.42 357062.72
4 76993.48 386326.78
5 73489.47 307144.09
6 70246.96 75885.75
7 69630.09 74054.33
8 66714.78 70071.80
9 122296.90 66579.08
10 63502.71 65811.37
11 63401.84 64795.12
12 63387.84 64401.14
13 63186.10 64163.95
14 63160.74 63468.25
15 60471.15 60719.15
16 58235.63 57655.14
17 58089.73 58061.34
18 57846.39 57357.89
19 57839.42 56495.69
20 57740.06 56219.97
21 58068.57 55810.91
22 58358.34 56437.81
23 76284.90 73722.92
24 105138.31 100729.00
25 147203.03 178079.38
26 109996.02 13.95
27 91424.20 87391.56
28 89065.91 87196.69
29 86628.74 84809.07
30 79357.60 77555.62"), header = TRUE)

I'm attempting to generate a column that continuously numbers consecutive rows where wRes.Q is greater than noRes.Q. To that end, I've come up with the following:

dat$flg <- dat$wRes.Q > dat$noRes.Q
dat$cnt <- with(dat, ave(integer(length(flg)), flg, FUN = seq_along))

The problem with dat$cnt is that it doesn't start over at 1 when a 'new' group of either TRUE or FALSE is encountered. Thus, row 9's cnt value should start over at 1, as should dat$cnt[10], and dat$cnt[11] should equal 2, etc. (the desired result is shown below).

In the larger dataset I'm working with (6,000 rows), there are blocks of rows where the number of consecutive rows with dat$flg == TRUE exceeds 100. My goal is to plot these blocks of rows as polygons in a time series plot. If, for the small example provided, the number of consecutive rows with dat$flg == TRUE is greater than or equal to 5 (the 2 blocks of rows satisfying this criterion in this small example are rows 3-8 and 10-15), is there a way to add a column that uniquely numbers these blocks of rows?

I'd like to end up with the following, which shows the correct cnt column and a column called plygn that is my ultimate goal:

dat
# day    noRes.Q    wRes.Q   flg cnt plygn
#   1  237074.41 215409.41 FALSE   1    NA
#   2 2336240.20 164835.16 FALSE   2    NA
#   3   84855.42 357062.72  TRUE   1     1
#   4   76993.48 386326.78  TRUE   2     1
#   5   73489.47 307144.09  TRUE   3     1
#   6   70246.96  75885.75  TRUE   4     1
#   7   69630.09  74054.33  TRUE   5     1
#   8   66714.78  70071.80  TRUE   6     1
#   9  122296.90  66579.08 FALSE   1    NA
#  10   63502.71  65811.37  TRUE   1     2
#  11   63401.84  64795.12  TRUE   2     2
#  12   63387.84  64401.14  TRUE   3     2
#  13   63186.10  64163.95  TRUE   4     2
#  14   63160.74  63468.25  TRUE   5     2
#  15   60471.15  60719.15  TRUE   6     2
#  16   58235.63  57655.14 FALSE   1    NA
#  17   58089.73  58061.34 FALSE   2    NA
#  18   57846.39  57357.89 FALSE   3    NA
#  19   57839.42  56495.69 FALSE   4    NA
#  20   57740.06  56219.97 FALSE   5    NA
#  21   58068.57  55810.91 FALSE   6    NA
#  22   58358.34  56437.81 FALSE   7    NA
#  23   76284.90  73722.92 FALSE   8    NA
#  24  105138.31 100729.00 FALSE   9    NA
#  25  147203.03 178079.38  TRUE   1    NA
#  26  109996.02     13.95  TRUE   2    NA
#  27   91424.20  87391.56 FALSE   1    NA
#  28   89065.91  87196.69 FALSE   2    NA
#  29   86628.74  84809.07 FALSE   3    NA
#  30   79357.60  77555.62 FALSE   4    NA

Thanks,
Eric
Re: [R] numbering consecutive rows based on length criteria
On Mar 2, 2015, at 11:43 AM, Morway, Eric emor...@usgs.gov wrote:
I'm attempting to generate a column that continuously numbers consecutive rows where wRes.Q is greater than noRes.Q.

Hi,

See ?rle

unlist(sapply(rle(with(dat, wRes.Q > noRes.Q))$lengths, seq))
 [1] 1 2 1 2 3 4 5 6 1 1 2 3 4 5 6 1 2 3 4 5 6 7 8 9 1 2 1 2 3 4

cbind() the result above to your data frame.

Regards,

Marc Schwartz
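The same rle() idea can also produce the plygn column. A minimal base-R sketch: the flg vector is typed in by hand to match the flg column of the desired output in the question, and min_len = 5 is the threshold mentioned there. With the real data one would start from dat$flg <- dat$wRes.Q > dat$noRes.Q instead.

```r
# flg typed in to match the flg column of the desired output
flg <- c(rep(FALSE, 2), rep(TRUE, 6), FALSE, rep(TRUE, 6),
         rep(FALSE, 9), rep(TRUE, 2), rep(FALSE, 4))
min_len <- 5                       # minimum run length that counts as a block

r <- rle(flg)                      # run lengths and run values
cnt <- sequence(r$lengths)         # 1, 2, ... restarting at every new run
long_run <- r$values & r$lengths >= min_len
# number the qualifying TRUE runs 1, 2, ...; every other run gets NA
plygn <- rep(ifelse(long_run, cumsum(long_run), NA), r$lengths)

# With the real data: dat$cnt <- cnt; dat$plygn <- plygn
```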
Re: [R] numbering consecutive rows based on length criteria
Dear Eric,

Here is a solution using the plyr package.

library(plyr)
dat$flg <- dat$wRes.Q > dat$noRes.Q
dat$group <- cumsum(c(0, abs(diff(dat$flg))))
ddply(dat, "group", function(x) {
  if (x$flg[1] && nrow(x) >= 5) {
    x$plygn <- seq_along(x$group)
  } else {
    x$plygn <- NA
  }
  x
})

Best regards,

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey

2015-03-02 18:43 GMT+01:00 Morway, Eric emor...@usgs.gov:
I'm attempting to generate a column that continuously numbers consecutive rows where wRes.Q is greater than noRes.Q.
[R] Installed R to Windows, but having trouble with read.csv
I copied the file into the bin folder of R ...

> perf_data <- read.csv("PerfResultsCSv.csv")
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
  cannot open file 'PerfResultsCSv.csv': No such file or directory

I also installed and loaded the RWeka package, but still cannot open the file. It seems like I am having some issue with directories... I am new to R; any help appreciated.

Thanks,
Alice
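R resolves a bare file name such as "PerfResultsCSv.csv" against the current working directory reported by getwd(), not against the folder where R itself is installed, so copying the file into R's bin folder usually does not help. A minimal sketch of how to check this and read the file by its full path; the file written to tempdir() is a made-up stand-in for the real one.

```r
# Where R looks for files given without a path
getwd()

# Made-up stand-in for the real file, written to a temporary folder
csv_path <- file.path(tempdir(), "PerfResultsCSv.csv")
write.csv(data.frame(run = 1:3, seconds = c(1.2, 1.5, 1.1)), csv_path,
          row.names = FALSE)

file.exists("PerfResultsCSv.csv")  # usually FALSE: the file is not in getwd()
file.exists(csv_path)              # TRUE: the full path points at the file

# Read with the full path (forward slashes also work on Windows).
# Alternatives: setwd("C:/path/to/folder") first (hypothetical path),
# or read.csv(file.choose()) to pick the file interactively.
perf_data <- read.csv(csv_path)
```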
Re: [R] readHTMLTable() in XML package
This somewhat simpler rvest code does the trick for me:

library(rvest)
library(dplyr)

i <- 1:10
urls <- paste0('http://games.crossfit.com/scores/leaderboard.php?stage=5',
  '&sort=0&division=1&region=0&numberperpage=100&competition=0&frontpage=0',
  '&expanded=1&year=15&full=1&showtoggles=0&hidedropdowns=0&showathleteac=1',
  '&is_mobile=0&page=', i)

results_table <- function(url) {
  url %>% html %>% html_table(fill = TRUE) %>% .[[1]]
}
results <- lapply(urls, results_table)
out <- results %>% bind_rows()

Hadley

On Mon, Mar 2, 2015 at 10:00 AM, Doran, Harold hdo...@air.org wrote:
I'm having trouble pulling down data from a website with my code below as I keep encountering the same error, but the error occurs on different pages.

--
http://had.co.nz/
Re: [R] How to speed up a double loop?
Hi Petr,

Thanks for your reply. Actually, it's not what I'm looking for. The aim is not simply to remove each value > 15. In my loop, I consider the first numeric value of my column as correct. Then I want to test the second value. If its absolute difference from the previous correct one is < 15, it's a new correct one, but if it's > 15, it's a wrong one. If it's a wrong one, the loop has to test the third one to check whether it is still > 15 from the last correct value (the first one). A value becomes correct again when its difference from the last correct one goes under 15 (and this value then becomes the new correct one, and so on for the rest of the column). My loop already does the trick, but I just want to speed it up (or find another, faster way to do the job).

I hope it's more understandable now!
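Because each decision depends on the last value accepted as correct, this rule is sequential and cannot be replaced directly by a vectorised diff() as in Petr's suggestion. A single pass over plain vectors, remembering the last correct value, avoids both the inner 10-step loop and repeated data.frame indexing, which is usually the slow part. A minimal sketch of the rule as described above; flag_jumps() is a hypothetical name, and the 10-value window of the original loop is not enforced (an assumption).

```r
# Hypothetical helper: single pass, remembering the last accepted value.
# A value is rejected when it differs from that value by more than 'threshold'.
flag_jumps <- function(x, threshold = 15) {
  flag <- rep(NA_real_, length(x))   # mirrors the original y column (NA or 1)
  ref <- NA_real_                    # last value accepted as correct
  for (i in seq_along(x)) {
    if (is.na(x[i])) next
    if (is.na(ref) || abs(x[i] - ref) <= threshold) {
      ref <- x[i]                    # accepted: becomes the new reference
    } else {
      flag[i] <- 1                   # jump too large: mark as outlier
      x[i] <- NA                     # and remove it, as the original loop does
    }
  }
  data.frame(x = x, y = flag)
}

myts <- data.frame(x = c(1,2,50,40,30,40,100,1,50,1,2,3,3,5,4), y = NA)
res <- flag_jumps(myts$x)
which(res$y == 1)  # 3 4 5 6 7 9
```

On the example data this flags the same rows as the original double loop.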
[R] R crashes when I run rgeos::gDistance
Hi,

This is my first post to R-help. I'm having trouble getting rgeos to work. Info on the server and packages I'm using:

$ uname -a
Linux some-server.somewhere.com 2.6.32-431.11.2.el6.x86_64 #1 SMP Tue Mar 25 19:59:55 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
$ R --version
R version 3.0.2 (2013-09-25) -- "Frisbee Sailing"
Copyright (C) 2013 The R Foundation for Statistical Computing
Platform: x86_64-redhat-linux-gnu (64-bit)
$ lsb_release -a
LSB Version: :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Distributor ID: CentOS
Description: CentOS release 6.5 (Final)
Release: 6.5
Codename: Final
$ rpm -qa | grep geos
geos-devel-3.4.2-1.rhel6.x86_64
geos-3.4.2-1.rhel6.x86_64
$ rpm -qa | grep gdal
gdal-1.9.2-6.rhel6.x86_64
gdal-libs-1.9.2-6.rhel6.x86_64
gdal-devel-1.9.2-6.rhel6.x86_64
gdal-java-1.9.2-6.rhel6.x86_64
$ R -q
> library(rgeos)
rgeos version: 0.3-8, (SVN revision 460)
 GEOS runtime version: 3.4.2-CAPI-1.8.2 r3921
 Polygon checking: TRUE
> example(gDistance)

gDstnc> pt1 = readWKT("POINT(0.5 0.5)")
gDstnc> pt2 = readWKT("POINT(2 2)")
gDstnc> p1 = readWKT("POLYGON((0 0,1 0,1 1,0 1,0 0))")
gDstnc> p2 = readWKT("POLYGON((2 0,3 1,4 0,2 0))")
gDstnc> gDistance(pt1,pt2)
R: GeometryComponentFilter.cpp:34: virtual void geos::geom::GeometryComponentFilter::filter_ro(const geos::geom::Geometry*): Assertion `0' failed.
Aborted (core dumped)

I'd like to be able to use the gDistance function. What should I do to fix this? Please let me know if any additional information would be helpful.

Thank you for your time,
Adrian
Re: [R] Installed R to Windows, but having trouble with read.csv
Is this file in your working directory? (To see your working directory, use getwd().) If not, put it there.

2015-03-02 11:53 GMT-05:00 Mello Cavallo, Alice <mel...@wit.edu>:
I copied the file into the bin folder of R ...

perf_data <- read.csv("PerfResultsCSv.csv")
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
  cannot open file 'PerfResultsCSv.csv': No such file or directory

I also installed and loaded the RWeka package, but still cannot open it. It seems like I am having some issue with directories... I am new to R; any help is appreciated.

Thanks,
Alice
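A short sketch of the check suggested above, plus two ways to point read.csv() at the file. The folder used here is tempdir(), and the small data frame written into it is made up purely so the example runs anywhere; in practice csv_dir would be whatever folder actually holds PerfResultsCSv.csv.

```r
# Stand-in location for the CSV (an assumption for this demo only).
csv_dir  <- tempdir()
csv_file <- "PerfResultsCSv.csv"

# Demo only: write a small made-up file so the example is self-contained.
write.csv(data.frame(run = 1:3, value = c(1.5, 2.0, 2.5)),
          file.path(csv_dir, csv_file), row.names = FALSE)

getwd()                 # the folder read.csv() searches for a bare file name
file.exists(csv_file)   # FALSE unless the file sits in getwd()

# Option 1: give the full path; this works from any working directory.
perf_data <- read.csv(file.path(csv_dir, csv_file))

# Option 2: change the working directory, then use the bare file name.
old_wd <- setwd(csv_dir)    # setwd() returns the previous directory
perf_data2 <- read.csv(csv_file)
setwd(old_wd)               # restore the original working directory
```

On Windows, read.csv(file.choose()) is another option: it opens a file dialog and passes the selected path to read.csv().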