Re: [R] Can R read Word fonts and comments?
It's my understanding that docx and xlsx files are zipped containers that hold their data in XML files. You should try unzipping one and examining it with a viewer. You may then be able to use pkg:XML. -- David

> On Jul 1, 2016, at 3:13 PM, Bert Gunter wrote:
>
> No, sorry -- all I would do is search. -- Bert
>
> Bert Gunter
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
>
> On Fri, Jul 1, 2016 at 2:33 PM, John wrote:
>> Yes, I have done some searching (e.g., tm, markdown, etc.), but I can't find this function. If you know any package that works for this purpose, that would be quite helpful. Thanks, John
>>
>> 2016-06-28 16:50 GMT-07:00 Bert Gunter:
>>> Did you try searching before posting here? -- e.g., a web search or on rseek.org? Cheers, Bert
>>>
>>> On Tue, Jun 28, 2016 at 3:53 PM, John wrote:
>>>> Hi,
>>>> From time to time I highlight Word documents with red/blue color or italic/bold fonts, and I also add comments to a file. Is there a package/function to let R extract the italic/bold blue/red words and comments from a docx/doc file?
>>>> I am aware that there are a few packages that read Word files, but I don't know which one is able to do it.
>>>> Thanks, John

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
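David's suggestion can be tried in a few lines. A minimal, self-contained sketch: since a real .docx is just a zip of XML parts (with comments in word/comments.xml), the code below builds a tiny stand-in container, then unzips and reads the comments part back out. The file names and the one-element comments XML are made up for illustration; base R's zip() relies on a system zip utility being present, and pkg:XML (or xml2) would do proper parsing instead of the grepl() used here.

```r
# .docx/.xlsx files are zip containers of XML parts. Build a tiny
# stand-in container with a minimal comments part (hypothetical
# content), then unzip it and pull the comment text back out.
d <- file.path(tempdir(), "word")
dir.create(d, showWarnings = FALSE, recursive = TRUE)
writeLines('<comments><comment author="John">check this</comment></comments>',
           file.path(d, "comments.xml"))

zipfile <- file.path(tempdir(), "example.zip")
old <- setwd(tempdir())           # zip() paths are relative to the cwd
zip(zipfile, "word/comments.xml") # needs a system zip utility
setwd(old)

out <- tempfile()
unzip(zipfile, exdir = out)       # same step you'd apply to a real .docx
xml <- readLines(file.path(out, "word", "comments.xml"))
# With pkg:XML or xml2 you would parse the nodes; a crude check suffices here:
grepl("check this", xml)
```

For a real document, the part to extract is "word/comments.xml" (comments) and "word/document.xml" (runs with bold/italic/color properties).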
Re: [R] How to identify runs or clusters of events in time
See below.

On Fri, 1 Jul 2016, Mark Shanks wrote:

> Hi,
>
> Imagine the two problems:
>
> 1) You have an event that occurs repeatedly over time. You want to identify periods when the event occurs more frequently than the base rate of occurrence. Ideally, you don't want to have to specify the period (e.g., break into months), so the analysis can be sensitive to scenarios such as many events happening only between, e.g., June 10 and June 15 -- even though the overall number of events for the month may not be much greater than usual. Similarly, there may be a cluster of events that occur from March 28 to April 3. Ideally, you want to pull out the base rate of occurrence and highlight only the periods when the frequency is less or greater than the base rate.

A good place to start is:

Siegmund, D. O., N. R. Zhang, and B. Yakir. "False discovery rate for scanning statistics." Biometrika 98.4 (2011): 979-985.

and

Aldous, David. Probability Approximations via the Poisson Clumping Heuristic. Vol. 77. Springer Science & Business Media, 2013.

A nice illustration of how scan statistics can be used is:

Aberdein, Jody, and David Spiegelhalter. "Have London's roads become more dangerous for cyclists?" Significance 10.6 (2013): 46-48.

> 2) Events again occur repeatedly over time in an inconsistent way. However, this time, the event has positive or negative outcomes -- such as a spot check of conformity to regulations. You again want to know whether there is a group of negative outcomes close together in time. This analysis should take into account the negative outcomes as well, though. E.g., if from June 10 to June 15 you get 5 negative outcomes and no positive outcomes, it should be flagged. On the other hand, if from June 10 to June 15 you get 5 negative outcomes interspersed between many positive outcomes, it should be ignored.
>
> I'm guessing that there is some statistical approach designed to look at these types of issues. What is it called?

"Scan statistic" is a good search term. "Poisson clumping", too.

> What package in R implements it? I basically just need to know where to start.

There are some R packages. CRAN has packages SNscan and graphscan, which sound like they might interest you. My Bioconductor package geneRxCluster:

http://bioconductor.org/packages/release/bioc/html/geneRxCluster.html

seeks clusters in a binary sequence, as described in detail at

http://bioinformatics.oxfordjournals.org/content/30/11/1493

HTH,
Chuck
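To get a feel for what a scan statistic does before reaching for SNscan or graphscan, here is a minimal base-R sketch (not taken from any of those packages): slide a fixed-width window over the event times, count events per window, and compare the largest count with what a homogeneous Poisson base rate would produce. The event times are simulated, with a cluster deliberately injected.

```r
set.seed(1)
# Event times over one year: a constant base rate plus an injected
# cluster of 15 extra events between day 160 and day 166
times <- sort(c(runif(50, 0, 365), runif(15, 160, 166)))

window <- 7  # window width in days
# Count events inside the 7-day window starting at each event time
counts <- sapply(times, function(t) sum(times >= t & times < t + window))
scan.stat <- max(counts)  # the scan statistic: the busiest window

# Expected events per window under a homogeneous Poisson model
lambda <- length(times) * window / 365
# Naive (uncorrected) tail probability for the busiest window; real
# scan statistics adjust for the many overlapping windows examined
ppois(scan.stat - 1, lambda, lower.tail = FALSE)
```

The uncorrected p-value is optimistic, which is exactly the multiple-testing problem the Siegmund/Zhang/Yakir paper and the Poisson clumping heuristic address.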
Re: [R] [FORGED] turning data in list format into a matrix format
Your question is filled with, uh, infelicities. See inline below.

On 02/07/16 07:53, Marietta Suarez wrote:

> As of now, I simulate 6 variables and manipulate 2 of them. The way my syntax is written my final database

Do you mean "data set"?

> is in list mode. I need all of it to be in just 1 database

Do you mean data frame?

> any help would be MUCH appreciated. Here's my syntax:
>
> fun=function(n, k,rep){
>   #prepare to store data
>   data=matrix(0,nrow=10*k, ncol=6)
>   db=vector("list",length=rep)  #here's the problem
>   for (j in 1:rep){
>     for (i in 1:k){
>       #generate data under normal, skewed, and logistic distributions here
>       data[,1]<-seq(1,nrow(data),1)
>       data[,2]<-rep(i,nrow(data))
>       data[,3]<-rep(n,nrow(data))
>       data[,4]<-rnorm(n, 100, 15)
>       data[,5]<-rsnorm(n, 100, 15, 1)

There is no such function as "rsnorm" in base R or in the standard packages. Where does this function come from? Do not expect your respondents to be psychic.

>       data[,6]<-rlogis(n, 100, 15)
>     }
>     db[[j]]<-data
>   }
>   DataReturn<<-db

Do *NOT* use "<<-"! Especially if you actually mean "<-"!!! What is the point of "DataReturn" anyway? It's the same thing as "db". Why confuse things by introducing a new name?

>   return(DataReturn)
>   #DataReturn <- data.frame(matrix(unlist(DataReturn), nrow=rep, byrow=T))
>   #here's where I tried to solve it unsuccessfully
> }

In what sense is this unsuccessful? It returns a data frame. What is wrong with this data frame? What did you *want* it to look like? A *reproducible* "toy" example showing what you want to get would be helpful.

A wild guess is that you want your reps "stacked" into a single data frame. In which case something like

do.call(rbind, db)

may be what you want.

Finally, do you really want the result to be a data frame? Since all values dealt with are numeric, you might as well use a matrix, and save on the overheads that data frames involve.

cheers,

Rolf Turner

--
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276
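Rolf's do.call(rbind, db) suggestion, in a self-contained toy form. This is a sketch of the same list-of-matrices shape the poster builds, with plain rnorm() standing in for the rsnorm() call (which comes from a contributed package, not base R):

```r
# Build a list of per-rep matrices, then stack them into one object
rep.count <- 3; k <- 2
db <- vector("list", length = rep.count)
for (j in seq_len(rep.count)) {
  db[[j]] <- cbind(id   = seq_len(10 * k),  # row id within the rep
                   rep  = j,                # which rep this block is
                   norm = rnorm(10 * k, 100, 15))
}

# One matrix with rep.count * 10 * k rows; wrap in data.frame() only
# if you actually need data-frame semantics
stacked <- do.call(rbind, db)
dim(stacked)
```

do.call() passes every element of db as a separate argument to rbind(), which is why it stacks all the reps at once regardless of how many there are.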
Re: [R] Anybody know the cause of this?
I don't have a clue, but I suspect that those who might would be helped by your providing the output of the sessionInfo() command, plus perhaps other relevant info on your computing environment.

Cheers,
Bert

Bert Gunter
"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Fri, Jul 1, 2016 at 12:56 PM, James Plante wrote:
> Just upgraded to R 3.3.1; when I updated the packages on CRAN, I got a BUNCH of warning messages like the ones below:
>
> 2016-07-01 14:44:19.840 R[369:3724] IMKClient Stall detected, *please Report* your user scenario attaching a spindump (or sysdiagnose) that captures the problem - (imkxpc_windowLevelWithReply:) block performed very slowly (2.40 secs).
> starting httpd help server ... done
> > spindump()
> Error: could not find function "spindump"
> In addition: Warning message:
> In library() :
>   library '/Users/jimplante/Library/R/3.3/library' contains no packages
> 2016-07-01 14:47:32.774 R[369:3724] IMKClient Stall detected, *please Report* your user scenario attaching a spindump (or sysdiagnose) that captures the problem - (imkxpc_attributesForCharacterIndex:reply:) block performed very slowly (1.42 secs).
>
> This is not interfering with any of my work, but I'm curious to know what it's warning me about. Two more questions: 1) What's a spindump; and 2) What's a sysdiagnose? And finally, how would I get either one of those, and to whom would I send them?
>
> Thanks,
> Jim
Re: [R] Can R read Word fonts and comments?
No, sorry -- all I would do is search.

-- Bert

Bert Gunter
"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Fri, Jul 1, 2016 at 2:33 PM, John wrote:
> Yes, I have done some searching (e.g., tm, markdown, etc.), but I can't find this function. If you know any package that works for this purpose, that would be quite helpful.
>
> Thanks,
> John
>
> 2016-06-28 16:50 GMT-07:00 Bert Gunter:
>> Did you try searching before posting here? -- e.g., a web search or on rseek.org?
>>
>> Cheers,
>> Bert
>>
>> On Tue, Jun 28, 2016 at 3:53 PM, John wrote:
>>> Hi,
>>>
>>> From time to time I highlight Word documents with red/blue color or italic/bold fonts, and I also add comments to a file. Is there a package/function to let R extract the italic/bold blue/red words and comments from a docx/doc file?
>>>
>>> I am aware that there are a few packages that read Word files, but I don't know which one is able to do it.
>>>
>>> Thanks,
>>> John
Re: [R] trouble double looping to generate data for a meta-analysis
Hint: It's much more efficient not to loop: generate the random data in a single call, only once -- then take your samples. (This can often be done even with different distribution parameters, as in many cases these are also vectorized.)

Example:

## 1000 random samples of size 100
> set.seed(1122)
> samps.norm <- matrix(rnorm(1e5), nrow = 100)
> dim(samps.norm)
[1]  100 1000
## This was instantaneous on my machine.

Cheers,
Bert

Bert Gunter
"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Fri, Jul 1, 2016 at 10:28 AM, Marietta Suarez wrote:
> i'm trying to generate data for a meta-analysis. 1- generate data following a normal distribution, 2- generate data following a skewed distribution, 3- generate data following a logistic distribution. i need to loop this because the # of studies in each meta will be either 10 or 15. k or total number of studies in the meta will be 5. i need to loop twice to repeat this process 10 times. database should be 3 columns (distributions) by 65 rows x 10 reps.
>
> here's my code, not sure what's not working:
>
> library(fGarch)
>
> #n reps =10
> rep=10
>
> #begin function here, need to vary n and k, when k=2 n=10, when k=3 n=15
> fun=function(n, k){
>   #prepare to store data
>   data=matrix(0,nrow=10*k, ncol=3)
>   db=matrix(0,nrow=650, ncol=3)
>   for (j in 1:rep)
>   {
>     for (i in 1:k)
>     {
>       #generate data under normal, skewed, and logistic distributions here
>       data[,1]=rnorm(n, 100, 15)
>       data[,2]=rsnorm(n, 100, 15, 1)
>       data[,3]=rlogis(n, 100, 15)
>     }
>     [j]=db
>   }
> }
>
> save=fun(10,2)
>
> Please help!!!
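Bert's hint applied to the poster's three-distribution setup, as a sketch: one call per distribution fills a whole column, with no loops at all. Since rsnorm() comes from the contributed fGarch package (not base R), a shifted exponential stands in here as a made-up right-skewed draw with the same mean and sd:

```r
set.seed(1122)
# 650 rows = 10 reps of a 65-row study block, 3 columns, no for() loops.
# The "skewed" column is a stand-in: rexp() has mean 1 and sd 1, so
# 100 + 15*(rexp(n) - 1) is right-skewed with mean 100 and sd 15.
n.total <- 650
db <- cbind(normal   = rnorm(n.total, 100, 15),
            skewed   = 100 + 15 * (rexp(n.total) - 1),
            logistic = rlogis(n.total, 100, 15))
dim(db)  # 650 rows x 3 columns, generated in three vectorized calls
```

If rep/study labels are needed, add them the same way, e.g. rep = rep(1:10, each = 65) as an extra cbind() column.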
Re: [R] Can R read Word fonts and comments?
Yes, I have done some searching (e.g., tm, markdown, etc.), but I can't find this function. If you know any package that works for this purpose, that would be quite helpful.

Thanks,
John

2016-06-28 16:50 GMT-07:00 Bert Gunter:
> Did you try searching before posting here? -- e.g., a web search or on rseek.org?
>
> Cheers,
> Bert
>
> Bert Gunter
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
>
> On Tue, Jun 28, 2016 at 3:53 PM, John wrote:
>> Hi,
>>
>> From time to time I highlight Word documents with red/blue color or italic/bold fonts, and I also add comments to a file. Is there a package/function to let R extract the italic/bold blue/red words and comments from a docx/doc file?
>>
>> I am aware that there are a few packages that read Word files, but I don't know which one is able to do it.
>>
>> Thanks,
>> John
Re: [R-es] Resumen de R-help-es, Vol 89, Envío 1
Thanks, Olivier; it helped me a lot.

2016-07-01 5:00 GMT-05:00:
> Send messages for the R-help-es list to r-help-es@r-project.org
>
> To subscribe or unsubscribe via the web, visit
> https://stat.ethz.ch/mailman/listinfo/r-help-es
> or send an email with the text "help" in the subject or body to:
> r-help-es-requ...@r-project.org
>
> You can reach the list owner at: r-help-es-ow...@r-project.org
>
> Today's topics:
>
> 1. Repeating data in a table (Novvier Marco Uscuchagua Cornelio)
> 2. Re: Repeating data in a table (Javier Marcuzzi)
> 3. Re: Repeating data in a table (Olivier Nuñez)
>
> --
>
> Message: 1
> Date: Thu, 30 Jun 2016 12:15:34 -0500
> From: Novvier Marco Uscuchagua Cornelio
> To: r-help-es@r-project.org
> Subject: [R-es] Repeating data in a table
> Message-ID:
[R] Anybody know the cause of this?
Just upgraded to R 3.3.1; when I updated the packages on CRAN, I got a BUNCH of warning messages like the ones below:

2016-07-01 14:44:19.840 R[369:3724] IMKClient Stall detected, *please Report* your user scenario attaching a spindump (or sysdiagnose) that captures the problem - (imkxpc_windowLevelWithReply:) block performed very slowly (2.40 secs).
starting httpd help server ... done
> spindump()
Error: could not find function "spindump"
In addition: Warning message:
In library() :
  library '/Users/jimplante/Library/R/3.3/library' contains no packages
2016-07-01 14:47:32.774 R[369:3724] IMKClient Stall detected, *please Report* your user scenario attaching a spindump (or sysdiagnose) that captures the problem - (imkxpc_attributesForCharacterIndex:reply:) block performed very slowly (1.42 secs).

This is not interfering with any of my work, but I'm curious to know what it's warning me about. Two more questions: 1) What's a spindump; and 2) What's a sysdiagnose? And finally, how would I get either one of those, and to whom would I send them?

Thanks,
Jim
[R] turning data in list format into a matrix format
As of now, I simulate 6 variables and manipulate 2 of them. The way my syntax is written, my final database is in list mode. I need all of it to be in just 1 database. Any help would be MUCH appreciated. Here's my syntax:

fun=function(n, k,rep){
  #prepare to store data
  data=matrix(0,nrow=10*k, ncol=6)
  db=vector("list",length=rep)  #here's the problem
  for (j in 1:rep){
    for (i in 1:k){
      #generate data under normal, skewed, and logistic distributions here
      data[,1]<-seq(1,nrow(data),1)
      data[,2]<-rep(i,nrow(data))
      data[,3]<-rep(n,nrow(data))
      data[,4]<-rnorm(n, 100, 15)
      data[,5]<-rsnorm(n, 100, 15, 1)
      data[,6]<-rlogis(n, 100, 15)
    }
    db[[j]]<-data
  }
  DataReturn<<-db
  return(DataReturn)
  #DataReturn <- data.frame(matrix(unlist(DataReturn), nrow=rep, byrow=T))
  #here's where I tried to solve it unsuccessfully
}
Re: [R] Difficulty subsetting data frames using logical operators
> On Jul 1, 2016, at 2:11 AM, Giles Bischoff wrote:
>
> So, I uploaded a data set via my directory using the command data <- data.frame(read.csv("hw1_data.csv")) and then tried to subset that data using logical operators. Specifically, I was trying to make it so that I got all the rows in which the values for "Ozone" (a column in the data set) were greater than 31 (I was trying to get the mean of all said values). Then, I tried using the command data[ , "Ozone">31]. Additionally, I had trouble getting it so that I had all the rows where all the values in "Ozone">31 & "Temp">90 simultaneously. There were some NA values in both of those columns, so that might be it. If someone could help me to figure out how to remove those values, that'd be great as well. I'm using a Mac (OS X) with the latest version of R (3.1.2, I think??).
>
> Here is some of the code I used:

Bad idea to use `data` as a data-object name. There is an R function by that name. Look at ?data and see the fortune:

fortunes::fortune("dog")

The threat that it will "clash" is not actually correct, but it is guaranteed to deliver error messages that do not make sense and result in code that will be difficult to read.
>> data <- data.frame(read.csv("hw1_data.csv"))
>> data
>    Ozone Solar.R Wind Temp Month Day
> 1     41     190  7.4   67     5   1
> 2     36     118  8.0   72     5   2
> 3     12     149 12.6   74     5   3
> 4     18     313 11.5   62     5   4
> 5     NA      NA 14.3   56     5   5
> 6     28      NA 14.9   66     5   6
> 7     23     299  8.6   65     5   7
> 8     19      99 13.8   59     5   8
> 9      8      19 20.1   61     5   9
> 10    NA     194  8.6   69     5  10
> 11     7      NA  6.9   74     5  11
> 12    16     256  9.7   69     5  12
> 13    11     290  9.2   66     5  13
> 14    14     274 10.9   68     5  14
> 15    18      65 13.2   58     5  15
> 16    14     334 11.5   64     5  16
> 17    34     307 12.0   66     5  17
> 18     6      78 18.4   57     5  18
> 19    30     322 11.5   68     5  19
> 20    11      44  9.7   62     5  20
> 21     1       8  9.7   59     5  21
> 22    11     320 16.6   73     5  22
> 23     4      25  9.7   61     5  23
> 24    32      92 12.0   61     5  24
> 25    NA      66 16.6   57     5  25
> 26    NA     266 14.9   58     5  26
> 27    NA      NA  8.0   57     5  27
> 28    23      13 12.0   67     5  28
> 29    45     252 14.9   81     5  29
> 30   115     223  5.7   79     5  30
> 31    37     279  7.4   76     5  31
> 32    NA     286  8.6   78     6   1
> 33    NA     287  9.7   74     6   2
> 34    NA     242 16.1   67     6   3
> 35    NA     186  9.2   84     6   4
> 36    NA     220  8.6   85     6   5
> 37    NA     264 14.3   79     6   6
> 38    29     127  9.7   82     6   7
> 39    NA     273  6.9   87     6   8
> 40    71     291 13.8   90     6   9
> 41    39     323 11.5   87     6  10
> 42    NA     259 10.9   93     6  11
> 43    NA     250  9.2   92     6  12
> 44    23     148  8.0   82     6  13
> 45    NA     332 13.8   80     6  14
> 46    NA     322 11.5   79     6  15
> 47    21     191 14.9   77     6  16
> 48    37     284 20.7   72     6  17
> 49    20      37  9.2   65     6  18
> 50    12     120 11.5   73     6  19
> 51    13     137 10.3   76     6  20
> 52    NA     150  6.3   77     6  21
> 53    NA      59  1.7   76     6  22
> 54    NA      91  4.6   76     6  23
> 55    NA     250  6.3   76     6  24
> 56    NA     135  8.0   75     6  25
> 57    NA     127  8.0   78     6  26
> 58    NA      47 10.3   73     6  27
> 59    NA      98 11.5   80     6  28
> 60    NA      31 14.9   77     6  29
> 61    NA     138  8.0   83     6  30
> 62   135     269  4.1   84     7   1
> 63    49     248  9.2   85     7   2
> 64    32     236  9.2   81     7   3
> 65    NA     101 10.9   84     7   4
> 66    64     175  4.6   83     7   5
> 67    40     314 10.9   83     7   6
> 68    77     276  5.1   88     7   7
> 69    97     267  6.3   92     7   8
> 70    97     272  5.7   92     7   9
> 71    85     175  7.4   89     7  10
> 72    NA     139  8.6   82     7  11
> 73    10     264 14.3   73     7  12
> 74    27     175 14.9   81     7  13
> 75    NA     291 14.9   91     7  14
> 76     7      48 14.3   80     7  15
> 77    48     260  6.9   81     7  16
> 78    35     274 10.3   82     7  17
> 79    61     285  6.3   84     7  18
> 80    79     187  5.1   87     7  19
> 81    63     220 11.5   85     7  20
> 82    16       7  6.9   74     7  21
> 83    NA     258  9.7   81     7  22
> 84    NA     295 11.5   82     7  23
> 85    80     294  8.6   86     7  24
> 86   108     223  8.0   85     7  25
> 87    20      81  8.6   82     7  26
> 88    52      82 12.0   86     7  27
> 89    82     213  7.4   88     7  28
> 90    50     275  7.4   86     7  29
> 91    64     253  7.4   83     7  30
> 92
Re: [R] trouble double looping to generate data for a meta-analysis
You might look at the package wakefield for data generation.

-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Marietta Suarez
Sent: Friday, July 01, 2016 1:28 PM
To: r-help@r-project.org
Subject: [R] trouble double looping to generate data for a meta-analysis

i'm trying to generate data for a meta-analysis. 1- generate data following a normal distribution, 2- generate data following a skewed distribution, 3- generate data following a logistic distribution. i need to loop this because the # of studies in each meta will be either 10 or 15. k or total number of studies in the meta will be 5. i need to loop twice to repeat this process 10 times. database should be 3 columns (distributions) by 65 rows x 10 reps.

here's my code, not sure what's not working:

library(fGarch)

#n reps =10
rep=10

#begin function here, need to vary n and k, when k=2 n=10, when k=3 n=15
fun=function(n, k){
  #prepare to store data
  data=matrix(0,nrow=10*k, ncol=3)
  db=matrix(0,nrow=650, ncol=3)
  for (j in 1:rep)
  {
    for (i in 1:k)
    {
      #generate data under normal, skewed, and logistic distributions here
      data[,1]=rnorm(n, 100, 15)
      data[,2]=rsnorm(n, 100, 15, 1)
      data[,3]=rlogis(n, 100, 15)
    }
    [j]=db
  }
}

save=fun(10,2)

Please help!!!
[R] trouble double looping to generate data for a meta-analysis
i'm trying to generate data for a meta-analysis. 1- generate data following a normal distribution, 2- generate data following a skewed distribution, 3- generate data following a logistic distribution. i need to loop this because the # of studies in each meta will be either 10 or 15. k or total number of studies in the meta will be 5. i need to loop twice to repeat this process 10 times. database should be 3 columns (distributions) by 65 rows x 10 reps.

here's my code, not sure what's not working:

library(fGarch)

#n reps =10
rep=10

#begin function here, need to vary n and k, when k=2 n=10, when k=3 n=15
fun=function(n, k){
  #prepare to store data
  data=matrix(0,nrow=10*k, ncol=3)
  db=matrix(0,nrow=650, ncol=3)
  for (j in 1:rep)
  {
    for (i in 1:k)
    {
      #generate data under normal, skewed, and logistic distributions here
      data[,1]=rnorm(n, 100, 15)
      data[,2]=rsnorm(n, 100, 15, 1)
      data[,3]=rlogis(n, 100, 15)
    }
    [j]=db
  }
}

save=fun(10,2)

Please help!!!
Re: [R] Difficulty subsetting data frames using logical operators
You may need to re-read the Intro to R.

data[data$Ozone > 31, ]

or

subset(data, Ozone > 31)

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Fri, Jul 1, 2016 at 5:11 AM, Giles Bischoff wrote:
> So, I uploaded a data set via my directory using the command data <- data.frame(read.csv("hw1_data.csv")) and then tried to subset that data using logical operators. Specifically, I was trying to make it so that I got all the rows in which the values for "Ozone" (a column in the data set) were greater than 31 (I was trying to get the mean of all said values). Then, I tried using the command data[ , "Ozone">31]. Additionally, I had trouble getting it so that I had all the rows where all the values in "Ozone">31 & "Temp">90 simultaneously. There were some NA values in both of those columns, so that might be it. If someone could help me to figure out how to remove those values, that'd be great as well. I'm using a Mac (OS X) with the latest version of R (3.1.2, I think??).
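The two forms above, extended to cover the NA values the poster asked about. This sketch uses R's built-in airquality data set, which has the same Ozone/Temp columns (and the same NAs) as the posted listing; hw1_data.csv itself is assumed unavailable here:

```r
df <- airquality  # built-in stand-in for read.csv("hw1_data.csv")

# Logical subscripting: wrap the condition in which() so NA rows are
# dropped rather than returned as all-NA rows
high <- df[which(df$Ozone > 31), ]

# The mean the poster wanted, ignoring missing values
mean(df$Ozone[df$Ozone > 31], na.rm = TRUE)

# subset() drops rows where the condition is NA automatically;
# combine conditions with &, not with quoted column names
hot <- subset(df, Ozone > 31 & Temp > 90)

# Or discard incomplete rows up front
cc <- df[complete.cases(df$Ozone, df$Temp), ]
```

Note that data[ , "Ozone" > 31] fails because "Ozone" > 31 compares the string "Ozone" to 31; the condition must index rows (before the comma) and reference the column as data$Ozone.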
[R] C stack error in as.vector() starting in R 3.3.0
Apologies for the long post. This is an issue I have been struggling with, and I have tried to be as complete, to the point, and reproducible as possible.

In documenting a package with roxygen2, I have come across an error that does not occur in R 3.2.4 revised, but does occur in R 3.3.0 and 3.3.1. Using traceback() and debug(), I've traced the error to a call made to as.vector(x, "character") that seems to get stuck in a loop which culminates in this error:

Error: C stack usage 7970892 is too close to the limit

The object that causes this error is of a signature type for a method. With some playing around, I've been able to work out that the error is actually associated with the method that roxygen2 creates when doing its magic. Something happens when this method definition, with its associated environment, is in the workspace that causes the error. At this point, I should stress again that the error does NOT occur in R 3.2.4 revised or earlier, but does occur in R 3.3.0 and 3.3.1. I have also tested this with several versions of roxygen2, and that does not make a difference. Thus, my ultimate question is: what has changed in R 3.3.0 that would lead to this, so that the roxygen2 maintainers can correct it?

As a test, I created a signature identical to the one that causes the error and tried the as.vector() command that would generate the loop. I don't get the error with my command-line-generated object, nor with the object that is extracted from the method definition. However, when I load the method definition into the workspace, the error will happen on either my command-line-generated object or the one extracted from the method definition, and will not stop happening until I restart R.

I have reported this error as an issue on the roxygen2 GitHub repository, and it has been crossposted by another user who had a similar experience with a different package on the devtools repository.
Those posts, which contain more information, are here:

https://github.com/klutometis/roxygen/issues/475
https://github.com/hadley/devtools/issues/1234

Below is the result of sessionInfo() and the output of a session demonstrating the effect. The files used are in a zip file here:

https://github.com/klutometis/roxygen/files/335417/error.test.rdata.files.zip

> sessionInfo()
R version 3.3.0 (2016-05-03)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.5 (El Capitan)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] tools_3.3.0

> rm(list = ls())
>
> # Create test class and method
> setClass(Class = "gtypes", slots = c(loci = "data.frame", ploidy = "numeric"), package = "roxygen_devtest")
> setGeneric("nInd", function(x, ...) standardGeneric("nInd"), package = "adegenet")
[1] "nInd"
> setMethod("nInd", "gtypes", function(x, ...) nrow(x@loci) / x@ploidy)
[1] "nInd"
>
> test.method <- getMethod("nInd", "gtypes")
> str(test.method)
Formal class 'MethodDefinition' [package "methods"] with 4 slots
  ..@ .Data  :function (x, ...)
  ..@ target :Formal class 'signature' [package "methods"] with 3 slots
  .. .. ..@ .Data  : chr "gtypes"
  .. .. ..@ names  : chr "x"
  .. .. ..@ package: chr "roxygen_devtest"
  ..@ defined:Formal class 'signature' [package "methods"] with 3 slots
  .. .. ..@ .Data  : chr "gtypes"
  .. .. ..@ names  : chr "x"
  .. .. ..@ package: chr "roxygen_devtest"
  ..@ generic: atomic [1:1] nInd
  .. ..- attr(*, "package")= chr "adegenet"
>
> # No error:
> as.vector(test.method@defined, "character")
[1] "gtypes"
>
> # This is the method that generates the error
> load("problem.method.rdata")
>
> # Notice the difference in the environments of the two functions:
> test.method@.Data
function (x, ...)
nrow(x@loci)/x@ploidy
> problem.method@.Data
function (x, ...)
nrow(x@loci)/x@ploidy
>
> # Swap the problem function for the one I created:
> problem.method@.Data <- test.method@.Data
> save(problem.method, file = "fixed.method.rdata")
>
> # Here's the error (both with my method and the original):
> as.vector(test.method@defined, "character")
Error: C stack usage 7970892 is too close to the limit
> as.vector(problem.method@defined, "character")
Error: C stack usage 7970892 is too close to the limit

Restarting R session...

> # *** Restart R for this to work ***
> rm(list = ls())
>
> load("fixed.method.rdata")
> as.vector(problem.method@defined, "character")
[1] "gtypes"

*Eric Archer, Ph.D.*
Southwest Fisheries Science Center
NMFS, NOAA
8901 La Jolla Shores Drive
La Jolla, CA 92037 USA
858-546-7121 (work)
858-546-7003 (FAX)
Marine Mammal Genetics Group: swfsc.noaa.gov/mmtd-mmgenetics
ETP Cetacean Assessment Program: swfsc.noaa.gov/mmtd-etp
https://github/ericarcher

"*The universe doesn't care what you believe. The wonderful thing about science is that it doesn't ask for your faith, it just asks for your eyes.*"
Re: [R] How to identify runs or clusters of events in time
Mark, I did something similar a couple of years ago by coding non-events as 0, positive events as +1, and negative events as -1, then summing the values through time. In my case the patterns showed up quite clearly, and I used other criteria to define the actual periods. Clint

Clint Bowman            INTERNET: cl...@ecy.wa.gov
Air Quality Modeler     INTERNET: cl...@math.utah.edu
Department of Ecology   VOICE: (360) 407-6815
PO Box 47600            FAX: (360) 407-7534
Olympia, WA 98504-7600
USPS: PO Box 47600, Olympia, WA 98504-7600
Parcels: 300 Desmond Drive, Lacey, WA 98503-1274

On Fri, 1 Jul 2016, Mark Shanks wrote: Hi, Imagine the two problems: 1) You have an event that occurs repeatedly over time. You want to identify periods when the event occurs more frequently than the base rate of occurrence. Ideally, you don't want to have to specify the period (e.g., break into months), so the analysis can be sensitive to scenarios such as many events happening only between, e.g., June 10 and June 15 - even though the overall number of events for the month may not be much greater than usual. Similarly, there may be a cluster of events that occur from March 28 to April 3. Ideally, you want to pull out the base rate of occurrence and highlight only the periods when the frequency is less or greater than the base rate. 2) Events again occur repeatedly over time in an inconsistent way. However, this time, the event has positive or negative outcomes - such as a spot check of conformity to regulations. You again want to know whether there is a group of negative outcomes close together in time. This analysis should take the positive outcomes into account as well, though. E.g., if from June 10 to June 15 you get 5 negative outcomes and no positive outcomes, it should be flagged. On the other hand, if from June 10 to June 15 you get 5 negative outcomes interspersed between many positive outcomes, it should be ignored. I'm guessing that there is some statistical approach designed to look at these types of issues. 
What is it called? What package in R implements it? I basically just need to know where to start. Thanks, Mark [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
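Clint's coding trick can be sketched in a few lines of base R. This is a sketch with made-up data, not code from the thread; the rule for declaring a flagged period (here, a drop over a 7-day window) is an arbitrary illustration, since, as Clint notes, the actual cutoff criteria are a separate decision:

```r
set.seed(7)
## made-up daily outcomes: 0 = no event, +1 = positive, -1 = negative
outcome <- sample(c(-1, 0, 1), size = 120, replace = TRUE,
                  prob = c(0.2, 0.5, 0.3))

## the running sum drifts downward through stretches dominated by
## negative outcomes, so clusters show up as sustained declines
score <- cumsum(outcome)

## e.g. flag days where the score sits well below its value 7 days earlier
drop7   <- score - c(rep(0, 7), head(score, -7))
flagged <- which(drop7 <= -4)
```

Plotting `score` against time (e.g. `plot(score, type = "l")`) makes the clusters visible by eye, which is how this kind of diagnostic is usually read.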
[R] RODBC Error - first argument is not an open ODBC Channel
Hi, Sincere apologies if my mail is inappropriate for this mailbox; please ignore it if so. *Issue* I am getting the error "*first argument is not an open RODBC channel*" when I publish my application on IIS. It runs perfectly under Visual Studio development mode, and the script runs fine on the R console too, but I get the error once it is published to IIS. *Code Snippet:*

library(RODBC)
conna <- odbcConnect("XXX", uid = 'sa', pwd = 'x')
coreKPI <- sqlFetch(conna, "vw_GetFields")
odbcClose(conna)

I have tried using odbcDriverConnect instead of odbcConnect and providing the complete DB connection string, but the issue persists. I have also set the PATH variable to point to 32-bit R, and tried a few other options suggested on Stack Overflow, but no luck. I have been struggling for 3 days now to get this going. Please advise. Thanks in anticipation. Warm Regards Gaurav [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
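One common cause in this situation (an assumption here, since the DSN setup is not shown) is that a User DSN configured under the developer's account is not visible to the IIS application-pool identity, or that the DSN was created in the 64-bit ODBC administrator while 32-bit R needs a 32-bit DSN. A minimal sketch that fails fast when the channel does not open; the server, database, and driver names are hypothetical placeholders:

```r
library(RODBC)

## Hypothetical connection string; a System DSN (32-bit, matching the
## R build) is visible to service accounts such as the IIS app pool
conna <- odbcDriverConnect(
  "driver={SQL Server};server=MYSERVER;database=MYDB;uid=sa;pwd=x")

## odbcDriverConnect() returns -1 (not an RODBC channel) on failure,
## which is exactly what triggers "not an open RODBC channel" downstream
if (!inherits(conna, "RODBC")) {
  stop("connection failed; check that the DSN/driver is visible to the IIS account")
}
coreKPI <- sqlFetch(conna, "vw_GetFields")
odbcClose(conna)
```

Checking the return value before calling sqlFetch turns the downstream channel error into an immediate, diagnosable failure.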
[R] Difficulty subsetting data frames using logical operators
So, I uploaded a data set from my directory using the command data <- data.frame(read.csv("hw1_data.csv")) and then tried to subset that data using logical operators. Specifically, I was trying to get all the rows in which the values for "Ozone" (a column in the data set) were greater than 31 (I was trying to get the mean of all such values). I tried using the command data[ , "Ozone">31]. Additionally, I had trouble selecting all the rows where "Ozone">31 & "Temp">90 simultaneously. There were some NA values in both of those columns, so that might be it. If someone could help me figure out how to remove those values, that'd be great as well. I'm using a Mac (OS X) with the latest version of R (3.1.2, I think?). Here is some of the code I used:

> data <- data.frame(read.csv("hw1_data.csv"))
> data
    Ozone Solar.R Wind Temp Month Day
1      41     190  7.4   67     5   1
2      36     118  8.0   72     5   2
3      12     149 12.6   74     5   3
4      18     313 11.5   62     5   4
5      NA      NA 14.3   56     5   5
6      28      NA 14.9   66     5   6
7      23     299  8.6   65     5   7
8      19      99 13.8   59     5   8
9       8      19 20.1   61     5   9
10     NA     194  8.6   69     5  10
... (remaining rows omitted; the pasted output was truncated)
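The rows pasted above look like R's built-in airquality data, so a sketch of the usual base-R fixes can use it directly (an assumption; substitute your own data object). The core problem is that "Ozone" > 31 compares the string "Ozone" to 31 rather than the column, and that NA comparisons propagate into the index:

```r
## subset() evaluates column names inside the data frame and
## silently drops rows where the condition is NA
sub <- subset(airquality, Ozone > 31 & Temp > 90)
mean(sub$Ozone)

## equivalent bracket indexing: which() discards the NAs that
## Ozone > 31 produces for missing values
keep <- which(airquality$Ozone > 31 & airquality$Temp > 90)
mean(airquality$Ozone[keep])

## for the single-condition mean, na.rm handles the remaining NAs
mean(airquality$Ozone[airquality$Ozone > 31], na.rm = TRUE)
```

Either form avoids the NA rows that plain `data[data$Ozone > 31, ]` would drag in.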
Re: [R] Understanding and predict round-off errors sign on simple functions
Thank you for all your answers; I will take a look at the 'propagate' package. PS: this is the first time I am participating in a mailing list, so I hope I am replying to the right emails. -Original Message- From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of DIGHE, NILESH [AG/2362] Sent: jeudi, 30 juin 2016 14:02 To: Marc Schwartz; Bert Gunter Cc: R-help Subject: Re: [R] Understanding and predict round-off errors sign on simple functions Using the "runmean" function from the caTools package within your SMA function appears to solve the issue. Please see details below. library(caTools) > dput(m) structure(c(-0.626453810742332, 0.183643324222082, -0.835628612410047, 1.59528080213779, 0.329507771815361, -0.820468384118015, 0.487429052428485, 0.738324705129217, 0.575781351653492, -0.305388387156356, 3.51178116845085, 2.38984323641143, 1.3787594194582, -0.2146998871775, 3.12493091814311, 1.95506639098477, 1.98380973690105, 2.9438362106853, 2.82122119509809, 2.59390132121751, 5.91897737160822, 5.78213630073107, 5.07456498336519, 3.01064830413663, 5.61982574789471, 4.943871260471, 4.84420449329467, 3.52924761610073, 4.52184994489138, 5.4179415601997), .Dim = c(10L, 3L)) > dput(SMA) function (x, n = 10, ...) 
{
    ma <- runmean(x, n)
    if (!is.null(dim(ma))) {
        colnames(ma) <- "SMA"
    }
    return(ma)
}

mma <- apply(m, 2, SMA, n = 1)
results <- mma - m
> dput(results) structure(c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), .Dim = c(10L, 3L)) Nilesh -Original Message- From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Marc Schwartz Sent: Wednesday, June 29, 2016 1:07 PM To: Bert Gunter Cc: R-help Subject: Re: [R] Understanding and predict round-off errors sign on simple functions Hi, Just to augment Bert's comments, I presume that you are aware of the relevant R FAQ: https://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f That you had an expectation of the difference being 0 suggested to me that you might not be, but my apologies if that is not the case. That being said, there are some higher-precision CRAN packages that may offer some additional functionality, with the potential limitations that Bert references below. More information is available in the Numerical Mathematics CRAN Task View: https://cran.r-project.org/web/views/NumericalMathematics.html In addition, with the caveat that I have not used it, there is the 'propagate' package on CRAN that may be relevant to what you want to be able to anticipate, at some level: https://cran.r-project.org/web/packages/propagate/index.html It has not been updated in a while, and there are some notes from the CRAN package checks that suggest the maintainer may not be active at this point. Regards, Marc > On Jun 29, 2016, at 10:13 AM, Bert Gunter wrote: > > I am certainly no expert, but I would assume that: > > 1. Roundoff errors depend on the exact numerical libraries and > versions that are used, and so general language comparisons are > impossible without that information; > > 2. 
Roundoff errors depend on the exact calculations being done and > machine precision and are very complicated to determine > > So I would say the answer to your questions is no. > > But you should probably address such a question to a numerical analyst > for an authoritative answer. Maybe try stats.stackexchange.com . > > -- Bert > > Bert Gunter > > "The trouble with having an open mind is that people keep coming along > and sticking things into it." > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) > > > On Wed, Jun 29, 2016 at 2:55 AM, Sirhc via R-help wrote: >> Hi, >> >> >> >> Maybe it is a basic thing, but I would like to know if we can >> anticipate the sign of round-off errors. >> >> >> >> Here is an example : >> >> >> >> # numerical matrix >> >> m <- matrix(data=cbind(rnorm(10, 0), rnorm(10, 2), rnorm(10, 5)), >> nrow=10, >> ncol=3) >> >> >> >>> m >> >>[,1] [,2] [,3] >> >> [1,] 0.4816247 1.1973502 3.855641 >> >> [2,] -1.2174937 0.7356427 4.393279 >> >> [3,] 0.8504074 2.5286509 2.689196 >> >> [4,] 1.8048642 1.8580804 6.665237 >> >> [5,] -0.6749397 1.0944277 4.838608 >> >> [6,] 0.8252034 1.5595268 3.681695 >> >> [7,] 1.3002208 0.9582693 4.561577 >> >> [8,] 1.6950923 3.5677921 6.005078 >> >> [9,] 0.6509285 0.9025964 5.082288 >> >> [10,] -0.5676040 1.3281102 4.446451 >> >> >> >> #weird moving average of period 1 ! >> >> mma <- apply(m, 2, SMA, n=1) >> >> >> >>> mma >> >>[,1] [,2] [,3] >> >> [1,] NA NA NA >> >> [2,] -1.2174937 0.7356427 4.393279 >> >> [3,] 0.8504074 2.5286509 2.689196 >> >> [4,] 1.8048642 1.8580804 6.665237 >> >> [5,] -0.6749397 1.0944277 4.838608 >> >> [6,] 0.8252034
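The FAQ entry Marc links to can be seen in two lines of base R. This small sketch (not from the thread) shows why exact comparison of doubles misleads and how all.equal() compares with a numeric tolerance instead:

```r
x <- 0.1 + 0.2
x == 0.3                   # FALSE: both sides are nearest-double approximations
isTRUE(all.equal(x, 0.3))  # TRUE: equal up to a numeric tolerance

## the actual discrepancy is on the order of one unit in the last place
x - 0.3
```

The sign of such a discrepancy is exactly the kind of thing that, as Bert says, depends on the particular operations and libraries involved, so it is not predictable from the expression alone.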
Re: [R] [FORGED] Splitting data.frame into a list of small data.frames given indices
Inline. Cheers, Bert Bert Gunter "The trouble with having an open mind is that people keep coming along and sticking things into it." -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) On Fri, Jul 1, 2016 at 7:40 AM, Witold E Wolski wrote: > Hi William, > > I tested plyr's dlply function, and it seems to have an O(N * > log(R)) complexity (tested for R=N), so I do not know if N is the > number of rows or the number of categories. > > For the data.frame example with 2e5 rows and 2e5 categories it is > approx. 10 times faster than split. Still, it is 10 seconds on an > i7-5930K 3.5GHz Intel. > > It would be nice if the documentation contained runtime > complexity information, and if the documentation of base package functions > pointed to the function that should be used instead. It would, indeed -- but these things are very dependent on exact data context, the hardware in use, the OS, etc. Moreover, how could base documentation possibly keep track of what all other packages are doing?! -- that seems unreasonable on the face of it. I know that from time to time R docs do mention base algorithmic complexity (e.g. ?sort and the data.table package, I believe), but generally it is preferable to omit such details, imho: access to a function should be through its relatively fixed API, while the underlying machinery may be subject to considerable change. Obviously, there are circumstances where something still could (and perhaps should) be said about efficiency -- and R docs often do say it -- but I think the level of detail you request is unrealistic and might often even mislead. Obviously, just my opinion, so contrary views welcome. 
Cheers, Bert > > Thanks > > > > > On 29 June 2016 at 16:13, William Dunlap wrote: >> I won't go into why splitting data.frames (or factors) uses time >> proportional to the number of input rows times the number of >> levels in the splitting factor, but you will get much better mileage >> if you call split individually on each 'atomic' (numeric, character, ...) >> variable and use mapply on the resulting lists. >> >> The plyr and dplyr packages were developed to deal with this >> sort of problem. Check them out. >> >> >> Bill Dunlap >> TIBCO Software >> wdunlap tibco.com >> >> On Wed, Jun 29, 2016 at 6:21 AM, Witold E Wolski wrote: >>> >>> Hi, >>> >>> Here is an complete example which shows the the complexity of split or >>> by is O(n^2) >>> >>> nrows <- c(1e3,5e3, 1e4 ,5e4, 1e5 ,2e5) >>> res<-list() >>> >>> for(i in nrows){ >>> dum <- data.frame(x = runif(i,1,1000), y=runif(i,1,1000)) >>> res[[length(res)+1]]<-(system.time(x<- split(dum, 1:nrow(dum >>> } >>> res <- do.call("rbind",res) >>> plot(nrows^2, res[,"elapsed"]) >>> >>> And I can't see a reason why this has to be so slow. >>> >>> >>> cheers >>> >>> >>> >>> >>> >>> >>> >>> On 29 June 2016 at 12:00, Rolf Turner wrote: >>> > On 29/06/16 21:16, Witold E Wolski wrote: >>> >> >>> >> It's the inverse problem to merging a list of data.frames into a large >>> >> data.frame just discussed in the "performance of do.call("rbind")" >>> >> thread >>> >> >>> >> I would like to split a data.frame into a list of data.frames >>> >> according to first column. >>> >> This SEEMS to be easily possible with the function base::by. However, >>> >> as soon as the data.frame has a few million rows this function CAN NOT >>> >> BE USED (except you have A PLENTY OF TIME). >>> >> >>> >> for 'by' runtime ~ nrow^2, or formally O(n^2) (see benchmark below). >>> >> >>> >> So basically I am looking for a similar function with better >>> >> complexity. 
>>> >> >>> >> >>> >> > nrows <- c(1e5,1e6,2e6,3e6,5e6) >>> >>> >>> >>> timing <- list() >>> >>> for(i in nrows){ >>> >> >>> >> + dum <- peaks[1:i,] >>> >> + timing[[length(timing)+1]] <- system.time(x<- by(dum[,2:3], >>> >> INDICES=list(dum[,1]), FUN=function(x){x}, simplify = FALSE)) >>> >> + } >>> >>> >>> >>> names(timing)<- nrows >>> >>> timing >>> >> >>> >> $`1e+05` >>> >>user system elapsed >>> >>0.050.000.05 >>> >> >>> >> $`1e+06` >>> >>user system elapsed >>> >>1.482.984.46 >>> >> >>> >> $`2e+06` >>> >>user system elapsed >>> >>7.25 11.39 18.65 >>> >> >>> >> $`3e+06` >>> >>user system elapsed >>> >> 16.15 25.81 41.99 >>> >> >>> >> $`5e+06` >>> >>user system elapsed >>> >> 43.22 74.72 118.09 >>> > >>> > >>> > I'm not sure that I follow what you're doing, and your example is not >>> > reproducible, since we have no idea what "peaks" is, but on a toy >>> > example >>> > with 5e6 rows in the data frame I got a timing result of >>> > >>> >user system elapsed >>> > 0.379 0.025 0.406 >>> > >>> > when I applied split(). Is this adequately fast? Seems to me that if >>> > you >>> > want to split something, split() would be a good place to start. >>> > >>> > cheers, >>> > >>> > Rolf Turner >>> > >>> > -- >>> > Technical
Re: [R] Column product
Hi, I think this is substantially less ugly:

A <- matrix(1:15, nrow=5, byrow=F); A
a <- c(1,2,3)
B <- sweep(A, 2, a, "^")
apply(B, 1, prod)

You could combine it into one line if you wanted, but I find it clearer as two:

> apply(sweep(A, 2, a, "^"), 1, prod)
[1]   47916  169344  421824  889056 1687500

Sarah On Fri, Jul 1, 2016 at 10:15 AM, Steven Yen wrote: > A is a 5 x 3 matrix and a is a 3-vector. I would like to exponentiate A[,1] to > a[1], A[,2] to a[2], and A[,3] to a[3], and obtain the product of the > resulting columns, as in line 3. > > I also accomplish this with lines 4 and 5. I would like to have rowProducts(B), > but there is none, so I came up with something ugly in line > 5--exponentiating the row sums of logs. Is there a more elegant way than > line 5 or, better yet, lines 4 and 5 together? Thanks. > > A<-matrix(1:15,nrow=5,byrow=F); A > a<-c(1,2,3) > (A[,1]^a[1])*(A[,2]^a[2])*(A[,3]^a[3]) > B<-t(t(A)^a); B > exp(rowSums(log(B))) > > Result: > > > A<-matrix(1:15,nrow=5,byrow=F); A
>      [,1] [,2] [,3]
> [1,]    1    6   11
> [2,]    2    7   12
> [3,]    3    8   13
> [4,]    4    9   14
> [5,]    5   10   15
> > a<-c(1,2,3) > > (A[,1]^a[1])*(A[,2]^a[2])*(A[,3]^a[3]) > [1] 47916 169344 421824 889056 1687500 > > B<-t(t(A)^a); B
>      [,1] [,2] [,3]
> [1,]    1   36 1331
> [2,]    2   49 1728
> [3,]    3   64 2197
> [4,]    4   81 2744
> [5,]    5  100 3375
> > exp(rowSums(log(B))) > [1] 47916 169344 421824 889056 1687500 > > __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
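For the "row products in one step" wish, a matrix-product form may also be worth a look. This is a sketch, not from the thread, and like the log trick it is only valid while all entries of A are strictly positive:

```r
A <- matrix(1:15, nrow = 5)
a <- c(1, 2, 3)

## log(A) %*% a gives, per row, sum_j a[j] * log(A[i, j]);
## exponentiating recovers prod_j A[i, j]^a[j] in one expression
res <- drop(exp(log(A) %*% a))

## same answer as the sweep/apply version (which also handles
## zeros and negative entries, unlike the log form)
res2 <- apply(sweep(A, 2, a, "^"), 1, prod)
```

The `%*%` form replaces both the transpose trick and the explicit rowSums of logs with a single linear-algebra call.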
Re: [R] [FORGED] Splitting data.frame into a list of small data.frames given indices
Hi William, I tested plyr's dlply function, and it seems to have an O(N * log(R)) complexity (tested for R=N), so I do not know if N is the number of rows or the number of categories. For the data.frame example with 2e5 rows and 2e5 categories it is approx. 10 times faster than split. Still, it is 10 seconds on an i7-5930K 3.5GHz Intel. It would be nice if the documentation contained runtime complexity information, and if the documentation of base package functions pointed to the function that should be used instead. Thanks On 29 June 2016 at 16:13, William Dunlap wrote: > I won't go into why splitting data.frames (or factors) uses time > proportional to the number of input rows times the number of > levels in the splitting factor, but you will get much better mileage > if you call split individually on each 'atomic' (numeric, character, ...) > variable and use mapply on the resulting lists. > > The plyr and dplyr packages were developed to deal with this > sort of problem. Check them out. > > > Bill Dunlap > TIBCO Software > wdunlap tibco.com > > On Wed, Jun 29, 2016 at 6:21 AM, Witold E Wolski wrote: >> >> Hi, >> >> Here is a complete example which shows that the complexity of split or >> by is O(n^2) >> >> nrows <- c(1e3, 5e3, 1e4, 5e4, 1e5, 2e5) >> res <- list() >> >> for(i in nrows){ >> dum <- data.frame(x = runif(i,1,1000), y = runif(i,1,1000)) >> res[[length(res)+1]] <- system.time(x <- split(dum, 1:nrow(dum))) >> } >> res <- do.call("rbind", res) >> plot(nrows^2, res[,"elapsed"]) >> >> And I can't see a reason why this has to be so slow. >> >> >> cheers >> >> >> >> >> >> >> >> On 29 June 2016 at 12:00, Rolf Turner wrote: >> > On 29/06/16 21:16, Witold E Wolski wrote: >> >> >> >> It's the inverse problem to merging a list of data.frames into a large >> >> data.frame just discussed in the "performance of do.call("rbind")" >> >> thread >> >> >> >> I would like to split a data.frame into a list of data.frames >> >> according to first column. 
>> >> This SEEMS to be easily possible with the function base::by. However, >> >> as soon as the data.frame has a few million rows this function CAN NOT >> >> BE USED (except you have A PLENTY OF TIME). >> >> >> >> for 'by' runtime ~ nrow^2, or formally O(n^2) (see benchmark below). >> >> >> >> So basically I am looking for a similar function with better >> >> complexity. >> >> >> >> >> >> > nrows <- c(1e5,1e6,2e6,3e6,5e6) >> >>> >> >>> timing <- list() >> >>> for(i in nrows){ >> >> >> >> + dum <- peaks[1:i,] >> >> + timing[[length(timing)+1]] <- system.time(x<- by(dum[,2:3], >> >> INDICES=list(dum[,1]), FUN=function(x){x}, simplify = FALSE)) >> >> + } >> >>> >> >>> names(timing)<- nrows >> >>> timing >> >> >> >> $`1e+05` >> >>user system elapsed >> >>0.050.000.05 >> >> >> >> $`1e+06` >> >>user system elapsed >> >>1.482.984.46 >> >> >> >> $`2e+06` >> >>user system elapsed >> >>7.25 11.39 18.65 >> >> >> >> $`3e+06` >> >>user system elapsed >> >> 16.15 25.81 41.99 >> >> >> >> $`5e+06` >> >>user system elapsed >> >> 43.22 74.72 118.09 >> > >> > >> > I'm not sure that I follow what you're doing, and your example is not >> > reproducible, since we have no idea what "peaks" is, but on a toy >> > example >> > with 5e6 rows in the data frame I got a timing result of >> > >> >user system elapsed >> > 0.379 0.025 0.406 >> > >> > when I applied split(). Is this adequately fast? Seems to me that if >> > you >> > want to split something, split() would be a good place to start. >> > >> > cheers, >> > >> > Rolf Turner >> > >> > -- >> > Technical Editor ANZJS >> > Department of Statistics >> > University of Auckland >> > Phone: +64-9-373-7599 ext. 88276 >> >> >> >> -- >> Witold Eryk Wolski >> >> __ >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. 
> > -- Witold Eryk Wolski __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
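As Rolf's timing suggests, splitting by a grouping column with relatively few levels is fast; the quadratic blow-up in the benchmark comes from splitting on 1:nrow(dum), i.e. one group per row. A small sketch (not from the thread, made-up data) of the by-first-column task the question actually describes:

```r
set.seed(1)
df <- data.frame(g = sample(letters[1:4], 1e4, replace = TRUE),
                 x = runif(1e4), y = runif(1e4))

## split() on the grouping column returns a named list of data.frames,
## one element per level of g, with all rows accounted for
parts <- split(df, df$g)

names(parts)                            # the group labels
sum(vapply(parts, nrow, integer(1)))    # rows are partitioned, none lost
```

With a realistic number of groups this runs in well under a second even for millions of rows, which matches Rolf's observation.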
Re: [R] Logistic regression and robust standard errors
On Fri, 1 Jul 2016, Faradj Koliev wrote: Dear Achim Zeileis, Many thanks for your quick and informative answer. I'm sure that the vcovCL should work; however, I experience some problems. > coeftest(model, vcov=vcovCL(model, cluster=mydata$ID)) First I got this error: Error in vcovCL(model, cluster = mydata$ID) : length of 'cluster' does not match number of observations This probably means that not all rows of "mydata" have been used for the estimation of the model, e.g., due to missing values or something like that. Hence, the mismatch seems to occur. After checking the observations I got this error: Error in vcovCL(model, cluster = mydata$ID) : object 'tets' not found Called from: vcovCL(model, cluster = mydata$ID) Browse[1]> The variable "tets" is presumably one of the regressors in your "model" and apparently it cannot be found when extracting the model scores. Maybe you haven't stored the model frame and have deleted the data. Hard to say without a simple reproducible example. Z What can I do to fix it? What am I doing wrong now? 1 jul 2016 kl. 14:57 skrev Achim Zeileis: On Fri, 1 Jul 2016, Faradj Koliev wrote: Dear all, I use the 'polr' command (library: MASS) to estimate an ordered logistic regression. My model: summary( model<- polr(y ~ x1+x2+x3+x4+x1*x2 ,data=mydata, Hess = TRUE)) But how do I get robust clustered standard errors? I've tried coeftest(resA, vcov=vcovHC(resA, cluster=lipton$ID)) The vcovHC() function currently does not (yet) have a "cluster" argument. We are working on it but it's not finished yet. As an alternative I include the vcovCL() function below that computes the usual simple clustered sandwich estimator. This can be applied to "polr" objects and plugged into coeftest(). So coeftest(resA, vcov=vcovCL(resA, cluster=lipton$ID)) should work. and summary(a <- robcov(model,mydata$ID)). The robcov() function does in principle what you want, but I'm not sure whether it works with polr(). But for sure it works with lrm() from the "rms" package. 
Hope that helps, Z

vcovCL <- function(object, cluster = NULL, adjust = NULL)
{
  stopifnot(require("sandwich"))

  ## cluster specification
  if(is.null(cluster)) cluster <- attr(object, "cluster")
  if(is.null(cluster)) stop("no 'cluster' specification found")
  cluster <- factor(cluster)

  ## estimating functions and dimensions
  ef <- estfun(object)
  n <- NROW(ef)
  k <- NCOL(ef)
  if(n != length(cluster))
    stop("length of 'cluster' does not match number of observations")
  m <- length(levels(cluster))

  ## aggregate estimating functions by cluster and compute meat
  ef <- sapply(levels(cluster), function(i) colSums(ef[cluster == i, , drop = FALSE]))
  ef <- if(NCOL(ef) > 1L) t(ef) else matrix(ef, ncol = 1L)
  mt <- crossprod(ef)/n

  ## bread
  br <- try(bread(object), silent = TRUE)
  if(inherits(br, "try-error")) br <- vcov(object) * n

  ## put together sandwich
  vc <- 1/n * (br %*% mt %*% br)

  ## adjustment
  if(is.null(adjust)) adjust <- class(object)[1L] == "lm"
  adj <- if(adjust) m/(m - 1L) * (n - 1L)/(n - k) else m/(m - 1L)

  ## return
  return(adj * vc)
}

> Neither works for me. So I wonder what am I doing wrong here? > > All suggestions are welcome - thank you! > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
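For readers finding this thread later: releases of the sandwich package after this exchange (2.4-0 onward) export a vcovCL() with a cluster argument, so the hand-rolled function above may no longer be needed. A sketch with a plain lm fit and an arbitrary made-up clustering; it assumes the sandwich and lmtest packages are installed:

```r
library(sandwich)  # provides vcovCL() in recent versions
library(lmtest)    # provides coeftest()

m  <- lm(dist ~ speed, data = cars)
cl <- rep(1:10, each = 5)   # hypothetical cluster ids: 10 clusters of 5

## clustered sandwich covariance, then cluster-robust coefficient tests
V <- vcovCL(m, cluster = cl)
coeftest(m, vcov = V)
```

The same vcov = vcovCL(...) pattern plugs into coeftest() for other model classes that provide estfun()/bread() methods.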
[R] Column product
A is a 5 x 3 matrix and a is a 3-vector. I would like to exponentiate A[,1] to a[1], A[,2] to a[2], and A[,3] to a[3], and obtain the product of the resulting columns, as in line 3. I also accomplish this with lines 4 and 5. I would like to have rowProducts(B), but there is none, so I came up with something ugly in line 5--exponentiating the row sums of logs. Is there a more elegant way than line 5 or, better yet, lines 4 and 5 together? Thanks.

A<-matrix(1:15,nrow=5,byrow=F); A
a<-c(1,2,3)
(A[,1]^a[1])*(A[,2]^a[2])*(A[,3]^a[3])
B<-t(t(A)^a); B
exp(rowSums(log(B)))

Result:

> A<-matrix(1:15,nrow=5,byrow=F); A
     [,1] [,2] [,3]
[1,]    1    6   11
[2,]    2    7   12
[3,]    3    8   13
[4,]    4    9   14
[5,]    5   10   15
> a<-c(1,2,3)
> (A[,1]^a[1])*(A[,2]^a[2])*(A[,3]^a[3])
[1]   47916  169344  421824  889056 1687500
> B<-t(t(A)^a); B
     [,1] [,2] [,3]
[1,]    1   36 1331
[2,]    2   49 1728
[3,]    3   64 2197
[4,]    4   81 2744
[5,]    5  100 3375
> exp(rowSums(log(B)))
[1]   47916  169344  421824  889056 1687500
>

[[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Logistic regression and robust standard errors
Dear Achim Zeileis, Many thanks for your quick and informative answer. I'm sure that the vcovCL should work; however, I experience some problems. > coeftest(model, vcov=vcovCL(model, cluster=mydata$ID)) First I got this error: Error in vcovCL(model, cluster = mydata$ID) : length of 'cluster' does not match number of observations After checking the observations I got this error: Error in vcovCL(model, cluster = mydata$ID) : object 'tets' not found Called from: vcovCL(model, cluster = mydata$ID) Browse[1]> What can I do to fix it? What am I doing wrong now? > 1 jul 2016 kl. 14:57 skrev Achim Zeileis: > > On Fri, 1 Jul 2016, Faradj Koliev wrote: > >> Dear all, >> >> I use the 'polr' command (library: MASS) to estimate an ordered logistic >> regression. >> >> My model: summary( model<- polr(y ~ x1+x2+x3+x4+x1*x2 ,data=mydata, Hess = >> TRUE)) >> >> But how do I get robust clustered standard errors? >> I've tried coeftest(resA, vcov=vcovHC(resA, cluster=lipton$ID)) > > The vcovHC() function currently does not (yet) have a "cluster" argument. We > are working on it but it's not finished yet. > > As an alternative I include the vcovCL() function below that computes the > usual simple clustered sandwich estimator. This can be applied to "polr" > objects and plugged into coeftest(). So > > coeftest(resA, vcov=vcovCL(resA, cluster=lipton$ID)) > > should work. > >> and summary(a <- robcov(model,mydata$ID)). > > The robcov() function does in principle what you want, but I'm not sure whether > it works with polr(). But for sure it works with lrm() from the "rms" package. 
> > Hope that helps, > Z > > vcovCL <- function(object, cluster = NULL, adjust = NULL) > { > stopifnot(require("sandwich")) > > ## cluster specification > if(is.null(cluster)) cluster <- attr(object, "cluster") > if(is.null(cluster)) stop("no 'cluster' specification found") > cluster <- factor(cluster) > > ## estimating functions and dimensions > ef <- estfun(object) > n <- NROW(ef) > k <- NCOL(ef) > if(n != length(cluster)) >stop("length of 'cluster' does not match number of observations") > m <- length(levels(cluster)) > > ## aggregate estimating functions by cluster and compute meat > ef <- sapply(levels(cluster), function(i) colSums(ef[cluster == i, , >drop = FALSE])) > ef <- if(NCOL(ef) > 1L) t(ef) else matrix(ef, ncol = 1L) > mt <- crossprod(ef)/n > > ## bread > br <- try(bread(object), silent = TRUE) > if(inherits(br, "try-error")) br <- vcov(object) * n > > ## put together sandwich > vc <- 1/n * (br %*% mt %*% br) > > ## adjustment > if(is.null(adjust)) adjust <- class(object)[1L] == "lm" > adj <- if(adjust) m/(m - 1L) * (n - 1L)/(n - k) else m/(m - 1L) > > ## return > return(adj * vc) > } > > >> Neither works for me. So I wonder what am I doing wrong here? >> >> >> All suggestions are welcome ? thank you! >> [[alternative HTML version deleted]] >> >> __ >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Merge several datasets into one
Hi Lily,

I think the code below should work:

f <- list.files("D:/output/test/your folder name", full.names = TRUE, recursive = TRUE)
files <- grep("\\.csv$", f)
files_merge <- data.frame()
for (i in seq_along(f[files])) {
  data <- read.csv(file = f[files][i], header = TRUE, sep = ",")
  files_merge <- rbind(files_merge, data)
}

Best,
Lida

On Thu, Jun 30, 2016 at 2:26 PM, lily li wrote:
> Hi R users,
>
> I'd like to ask how to merge several datasets into one in R? I put
> these csv files in one folder, and use the lapply function, but it says
> that it cannot open file 'xx.csv'. These files have different names, but end
> with the .csv extension, and the files have the same header. Thanks for your
> help.
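For reference, the grep() step can be folded into list.files() via its pattern argument, and do.call(rbind, ...) avoids growing a data frame inside a loop. A self-contained sketch using a temporary directory (the file names here are made up):

```r
dir <- tempfile()
dir.create(dir)
write.csv(data.frame(a = 1:2), file.path(dir, "one.csv"), row.names = FALSE)
write.csv(data.frame(a = 3:4), file.path(dir, "two.csv"), row.names = FALSE)

## pattern = "\\.csv$" filters while listing, so no separate grep() is needed
f <- list.files(dir, pattern = "\\.csv$", full.names = TRUE, recursive = TRUE)
merged <- do.call(rbind, lapply(f, read.csv))
nrow(merged)  # 4
```

Binding all pieces at once is also noticeably faster than repeated rbind() inside a for loop, which copies the accumulated data frame on every iteration.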
Re: [R-es] Repeating data in a table
Dear all,

But the initial question says that the data are in a data.frame and he wants to go from days to hours, repeating. I understand that it is something like:

Day1 ......... 2
Day2 ......... 4

Repeating each day across 24 hours does not account for leap-year days (he says he has 5 years of information). Logically, I am not going to compute 2 x 24 + 4 x 24, or 2/24 + 4/24 (multiplying the daily value by 24, or assigning the same amount to every hour). It would be a different matter if the problem were to build an hourly table spanning five years and apply a function that computes something. Something does not add up for me.

Javier Rubén Marcuzzi

From: Carlos J. Gil Bellosta
Sent: Friday, 1 July 2016 9:42
To: Olivier Nuñez

___
R-help-es mailing list
R-help-es@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-help-es
Re: [R] Logistic regression and robust standard errors
On Fri, 1 Jul 2016, Faradj Koliev wrote:

> Dear all,
>
> I use the "polr" command (library: MASS) to estimate an ordered logistic
> regression.
>
> My model: summary( model<- polr(y ~ x1+x2+x3+x4+x1*x2 ,data=mydata, Hess =
> TRUE))
>
> But how do I get robust clustered standard errors?
> I've tried coeftest(resA, vcov=vcovHC(resA, cluster=lipton$ID))

The vcovHC() function currently does not (yet) have a "cluster" argument. We
are working on it but it's not finished yet.

As an alternative I include the vcovCL() function below that computes the
usual simple clustered sandwich estimator. This can be applied to "polr"
objects and plugged into coeftest(). So

coeftest(resA, vcov=vcovCL(resA, cluster=lipton$ID))

should work.

> and summary(a <- robcov(model,mydata$ID)).

The robcov() function does in principle what you want, but I'm not sure
whether it works with polr(). But for sure it works with lrm() from the
"rms" package.

Hope that helps,
Z

vcovCL <- function(object, cluster = NULL, adjust = NULL)
{
  stopifnot(require("sandwich"))

  ## cluster specification
  if(is.null(cluster)) cluster <- attr(object, "cluster")
  if(is.null(cluster)) stop("no 'cluster' specification found")
  cluster <- factor(cluster)

  ## estimating functions and dimensions
  ef <- estfun(object)
  n <- NROW(ef)
  k <- NCOL(ef)
  if(n != length(cluster))
    stop("length of 'cluster' does not match number of observations")
  m <- length(levels(cluster))

  ## aggregate estimating functions by cluster and compute meat
  ef <- sapply(levels(cluster), function(i) colSums(ef[cluster == i, ,
    drop = FALSE]))
  ef <- if(NCOL(ef) > 1L) t(ef) else matrix(ef, ncol = 1L)
  mt <- crossprod(ef)/n

  ## bread
  br <- try(bread(object), silent = TRUE)
  if(inherits(br, "try-error")) br <- vcov(object) * n

  ## put together sandwich
  vc <- 1/n * (br %*% mt %*% br)

  ## adjustment
  if(is.null(adjust)) adjust <- class(object)[1L] == "lm"
  adj <- if(adjust) m/(m - 1L) * (n - 1L)/(n - k) else m/(m - 1L)

  ## return
  return(adj * vc)
}

> Neither works for me. So I wonder what am I doing wrong here?
>
> All suggestions are welcome, thank you!
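To see what the aggregation step inside vcovCL() does, here is the sum-scores-by-cluster idea in isolation, on a toy score matrix (all names and values here are made up for illustration; this is not code from the thread):

```r
set.seed(42)
ef <- matrix(rnorm(12), nrow = 6, ncol = 2)   # toy estimating functions: 6 obs, 2 parameters
cluster <- factor(c("a", "a", "b", "b", "c", "c"))

## sum the scores within each cluster, exactly as vcovCL() does with sapply/colSums
ef_sum <- t(sapply(levels(cluster), function(i)
  colSums(ef[cluster == i, , drop = FALSE])))
dim(ef_sum)  # 3 2 -- one row of summed scores per cluster

## the "meat" of the clustered sandwich is then built from the summed scores
mt <- crossprod(ef_sum) / nrow(ef)
```

Summing within clusters before taking the cross-product is what allows observations in the same cluster to be arbitrarily correlated while clusters remain independent.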
Re: [R-es] Repeating data in a table
And a "classic" solution:

iris[rep(1:nrow(iris), each = 24),]

Best regards,

Carlos J. Gil Bellosta
http://www.datanalytics.com

On 1 July 2016 at 11:35, Olivier Nuñez wrote:
> A solution with the data.table package:
>
>> require(data.table)
>> tabla=data.table(dia=1:10,y=rnorm(10))
>> tabla
>     dia           y
>  1:   1 -1.04816325
>  2:   2 -0.23554981
>  3:   3  1.79809995
>  4:   4  0.07578478
>  5:   5 -1.38710527
>  6:   6  2.18929038
>  7:   7  0.52330030
>  8:   8 -0.34695695
>  9:   9 -0.10357643
> 10:  10 -0.76800351
>> tabla[,.(hora=1:24),by=.(dia,y)]
>      dia          y hora
>   1:   1 -1.0481632    1
>   2:   1 -1.0481632    2
>   3:   1 -1.0481632    3
>   4:   1 -1.0481632    4
>   5:   1 -1.0481632    5
>  ---
> 236:  10 -0.7680035   20
> 237:  10 -0.7680035   21
> 238:  10 -0.7680035   22
> 239:  10 -0.7680035   23
> 240:  10 -0.7680035   24
>
> Best regards. Olivier
>
> ----- Original message -----
> From: "Novvier Marco Uscuchagua Cornelio"
> To: r-help-es@r-project.org
> Sent: Thursday, 30 June 2016 19:15:34
> Subject: [R-es] Repeating data in a table
>
> Good day friends,
> I have a table with records of daily data and I would like to convert it to
> hours, that is, repeat each row 24 times. I can do it manually, but it is a
> 5-year record and it would take forever. Do you know the fastest method?
> Thanks in advance.
> Regards, Marco.
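The rep()-based idiom above extends naturally to attaching an explicit hour column in base R; a small sketch with made-up daily data:

```r
daily <- data.frame(dia = 1:3, y = c(2, 4, 6))

## repeat each daily row 24 times, then label hours 1..24 within each day
hourly <- daily[rep(seq_len(nrow(daily)), each = 24), ]
hourly$hora <- rep(1:24, times = nrow(daily))

nrow(hourly)  # 72
```

Note that each = 24 repeats every row as a contiguous block (day 1 x 24, then day 2 x 24, ...), which is why the within-day hour counter rep(1:24, times = ...) lines up with it.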
[R] Logistic regression and robust standard errors
Dear all,

I use the "polr" command (library: MASS) to estimate an ordered logistic regression.

My model: summary( model<- polr(y ~ x1+x2+x3+x4+x1*x2 ,data=mydata, Hess = TRUE))

But how do I get robust clustered standard errors? I've tried coeftest(resA, vcov=vcovHC(resA, cluster=lipton$ID)) and summary(a <- robcov(model,mydata$ID)). Neither works for me. So I wonder what am I doing wrong here?

All suggestions are welcome, thank you!
Re: [R] Merge several datasets into one
Hello,

Maybe something like this:

fls <- list.files(pattern = "\\.csv$")
dat.list <- lapply(fls, read.csv)
dat <- do.call(rbind, dat.list)

(Note that pattern takes a regular expression, not a glob, so "\\.csv$" is safer than "*.csv".)

Hope this helps,
Rui Barradas

Citando lily li:
> Hi R users,
>
> I'd like to ask how to merge several datasets into one in R? I put
> these csv files in one folder, and use the lapply function, but it says
> that it cannot open file 'xx.csv'. These files have different names, but end
> with the .csv extension, and the files have the same header. Thanks for your
> help.