Re: [R] Can R read Word fonts and comments?

2016-07-01 Thread David Winsemius
It’s my understanding that docx and xlsx files are zipped containers that have 
their data in XML files. You should try unzipping one and examining it with a 
viewer. You may then be able to use pkg:XML.
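A minimal sketch of that approach (my own illustration, not a tested recipe): after `unzip("notes.docx", files = "word/document.xml")` on a hypothetical file, the extracted part can be parsed with pkg:XML. Here an inline WordprocessingML fragment stands in for a real document.xml so the code is self-contained.

```r
## docx is a zip container; word/document.xml holds the text.
## Bold runs are w:r elements whose run properties (w:rPr) contain w:b.
library(XML)

xml <- '<w:document
  xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body><w:p>
    <w:r><w:rPr><w:b/></w:rPr><w:t>flagged</w:t></w:r>
    <w:r><w:t>ordinary text</w:t></w:r>
  </w:p></w:body></w:document>'
doc <- xmlParse(xml, asText = TRUE)

ns   <- c(w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main")
bold <- xpathSApply(doc, "//w:r[w:rPr/w:b]//w:t", xmlValue, namespaces = ns)
bold   # "flagged"
```

Comments live in a separate part, word/comments.xml, which can be unzipped and parsed the same way; italic (w:i) and color (w:color) runs have analogous XPath tests.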

— 
David.

> On Jul 1, 2016, at 3:13 PM, Bert Gunter  wrote:
> 
> No, sorry -- all I would do is search.
> 
> -- Bert
> 
> 
> Bert Gunter
> 
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> 
> 
> On Fri, Jul 1, 2016 at 2:33 PM, John  wrote:
>> Yes, I have done some search (e.g., tm, markdown, etc), but I can't find
>> this function.
>> If you know any package that works for this purpose, that would be quite
>> helpful.
>> Thanks,
>> 
>> John
>> 
>> 2016-06-28 16:50 GMT-07:00 Bert Gunter :
>>> 
>>> Did you try searching before posting here? -- e.g. a web search or on
>>> rseek.org ?
>>> 
>>> Cheers,
>>> Bert
>>> 
>>> 
>>> Bert Gunter
>>> 
>>> "The trouble with having an open mind is that people keep coming along
>>> and sticking things into it."
>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>> 
>>> 
>>> On Tue, Jun 28, 2016 at 3:53 PM, John  wrote:
 Hi,

 From time to time I highlight the Word documents with red/blue color or
 italic/bold fonts, and I also add comments to a file. Is there a
 package/function to let R extract the italic/bold blue/red words and
 comments from a docx/doc file?

 I am aware that there are a few packages reading Word, but don't know
 which one is able to do it.

 Thanks,

 John

 
 __
 R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
>> 
>> 
> 


Re: [R] How to identify runs or clusters of events in time

2016-07-01 Thread Charles C. Berry


See below

On Fri, 1 Jul 2016, Mark Shanks wrote:


Hi,


Imagine the two problems:


1) You have an event that occurs repeatedly over time. You want to 
identify periods when the event occurs more frequently than the base 
rate of occurrence. Ideally, you don't want to have to specify the 
period (e.g., break into months), so the analysis can be sensitive to 
scenarios such as many events happening only between, e.g., June 10 and 
June 15 - even though the overall number of events for the month may not 
be much greater than usual. Similarly, there may be a cluster of events 
that occur from March 28 to April 3. Ideally, you want to pull out the 
base rate of occurrence and highlight only the periods when the 
frequency is less or greater than the base rate.




A good place to start is:

Siegmund, D. O., N. R. Zhang, and B. Yakir. "False discovery rate
for scanning statistics." Biometrika 98.4 (2011): 979-985.

and

Aldous, David. Probability approximations via the Poisson clumping 
heuristic. Vol. 77. Springer Science & Business Media, 2013.


---

A nice illustration of how scan statistics can be used is:

Aberdein, Jody, and David Spiegelhalter. "Have London's roads
become more dangerous for cyclists?." Significance 10.6 (2013):
46-48.




2) Events again occur repeatedly over time in an inconsistent way. 
However, this time, the event has positive or negative outcomes - such 
as a spot check of conformity to regulations. You again want to know 
whether there is a group of negative outcomes close together in time. 
This analysis should take into account the negative outcomes as well 
though. E.g., if from June 10 to June 15 you get 5 negative outcomes and 
no positive outcomes it should be flagged. On the other hand, if from 
June 10 to June 15 you get 5 negative outcomes interspersed between many 
positive outcomes it should be ignored.



I'm guessing that there is some statistical approach designed to look at 
these types of issues. What is it called?


`Scan statistic' is a good search term. `Poisson clumping', too.

What package in R implements it? I basically just need to know where to 
start.





There are some R packages.

CRAN has packages SNscan and graphscan, which sound like they 
might interest you.


My BioConductor package geneRxCluster:

http://bioconductor.org/packages/release/bioc/html/geneRxCluster.html

seeks clusters in a binary sequence as described in detail at

http://bioinformatics.oxfordjournals.org/content/30/11/1493
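As a rough illustration of the idea (a toy sketch of my own, not the API of any of the packages above): slide a fixed window over the event dates, count events per window, and flag windows that are improbable under a Poisson null. The real methods also handle the multiple testing across overlapping windows, which this sketch ignores.

```r
## Toy fixed-window scan statistic on simulated event dates.
set.seed(1)
days   <- sort(sample(1:365, 60, replace = TRUE))  # event dates (day of year)
w      <- 7                                        # window width in days
starts <- 1:(365 - w + 1)
counts <- sapply(starts, function(s) sum(days >= s & days < s + w))
lambda <- length(days) * w / 365                   # base rate per window

## ppois(k - 1, lambda, lower.tail = FALSE) is P(X >= k) under the null;
## flag windows whose count is improbably high
hot <- starts[ppois(counts - 1, lambda, lower.tail = FALSE) < 0.01]
```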

HTH,

Chuck



Re: [R] [FORGED] turning data in list format into a matrix format

2016-07-01 Thread Rolf Turner


Your question is filled with uh, infelicities.  See inline below.

On 02/07/16 07:53, Marietta Suarez wrote:

As of now, I simulate 6 variables and manipulate 2 of them. The way my
syntax is written my final database


   Do you mean "data set"?

is in list mode. I need all of it to be

in just 1 database


   Do you mean data frame?

any help would be MUCH appreciated. Here's my syntax:


fun=function(n, k,rep){
  #prepare to store data
  data=matrix(0,nrow=10*k, ncol=6)
  db=vector("list",length=rep) #here's the problem

  for (j in 1:rep){
for (i in 1:k){
  #generate data under normal, skewed, and logistic distributions here
  data[,1]<-seq(1,nrow(data),1)
  data[,2]<-rep(i,nrow(data))
  data[,3]<-rep(n,nrow(data))
  data[,4]<-rnorm(n, 100, 15)
  data[,5]<-rsnorm(n, 100, 15, 1)


   There is no such function as "rsnorm" in base R or in the standard
   packages.  Where does this function come from?  Do not expect
   your respondents to be psychic.


  data[,6]<-rlogis(n, 100, 15)
}
db[[j]]<-data
  }
  DataReturn<<-db


   Do *NOT* use "<<-"!  Especially if you actually mean "<-"!!!

   What is the point of "DataReturn" anyway?  It's the same thing
   as "db".  Why confuse things by introducing a new name?


  return(DataReturn)
  #DataReturn <- data.frame(matrix(unlist(DataReturn), nrow=rep, byrow=T))
  #here's where I tried to solve it unsuccessfully
}


In what sense is this unsuccessful?  It returns a data frame.  What is 
wrong with this data frame?  What did you *want* it to look like?


A *reproducible* "toy" example showing what you want to get would be 
helpful.


A wild guess is that you want your reps "stacked" into a single data 
frame.  In which case something like


   do.call(rbind,db)

may be what you want.

Finally, do you really want the result to be a data frame?  Since all 
values dealt with are numeric, you might as well use a matrix, and save 
on the overheads that data frames involve.
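A toy illustration of the stacking idea (made-up numbers, not the poster's simulation):

```r
## two "replications", each a 2 x 3 matrix
db <- list(matrix(1:6, nrow = 2), matrix(7:12, nrow = 2))
stacked <- do.call(rbind, db)
dim(stacked)   # 4 rows, 3 columns

## a replication index keeps rows identifiable after stacking
rep_id  <- rep(seq_along(db), vapply(db, nrow, integer(1)))
stacked <- cbind(rep = rep_id, stacked)
```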


cheers,

Rolf Turner

--
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276



Re: [R] Anybody know the cause of this?

2016-07-01 Thread Bert Gunter
I don't have a clue, but I suspect that those who might would be
helped by your providing the output of the sessionInfo() command, plus
perhaps other relevant info on your computing environment.

Cheers,

Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Fri, Jul 1, 2016 at 12:56 PM, James Plante  wrote:
> Just upgraded to R 3.3.1; when I updated the packages on CRAN, I got a BUNCH 
> of warning messages like the ones below:
>
> 2016-07-01 14:44:19.840 R[369:3724] IMKClient Stall detected, *please Report* 
> your user scenario attaching a spindump (or sysdiagnose) that captures the 
> problem - (imkxpc_windowLevelWithReply:) block performed very slowly (2.40 
> secs).
> starting httpd help server ... done
>> spindump()
> Error: could not find function "spindump"
> In addition: Warning message:
> In library() :
>   library ‘/Users/jimplante/Library/R/3.3/library’ contains no packages
> 2016-07-01 14:47:32.774 R[369:3724] IMKClient Stall detected, *please Report* 
> your user scenario attaching a spindump (or sysdiagnose) that captures the 
> problem - (imkxpc_attributesForCharacterIndex:reply:) block performed very 
> slowly (1.42 secs).
>>
>
> This is not interfering with any of my work, but I’m curious to know what 
> it’s warning me about.
> Two more questions: 1) What’s a spindump; and 2) What’s a sysdiagnose?
> And finally, how would I get either one of those, and to whom would I send 
> them?
>
> Thanks,
>
> Jim


Re: [R] Can R read Word fonts and comments?

2016-07-01 Thread Bert Gunter
No, sorry -- all I would do is search.

-- Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Fri, Jul 1, 2016 at 2:33 PM, John  wrote:
> Yes, I have done some search (e.g., tm, markdown, etc), but I can't find
> this function.
> If you know any package that works for this purpose, that would be quite
> helpful.
> Thanks,
>
> John
>
> 2016-06-28 16:50 GMT-07:00 Bert Gunter :
>>
>> Did you try searching before posting here? -- e.g. a web search or on
>> rseek.org ?
>>
>> Cheers,
>> Bert
>>
>>
>> Bert Gunter
>>
>> "The trouble with having an open mind is that people keep coming along
>> and sticking things into it."
>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>
>>
>> On Tue, Jun 28, 2016 at 3:53 PM, John  wrote:
>> > Hi,
>> >
>> >From time to time I highlight the word documents with red/blue color
>> > or
>> > italic/bold fonts, and I also add comments to a file. Is there a
>> > package/function to let R extract the italic/bold blue/red words and
>> > comments from a docx/doc file?
>> >
>> >I am aware that there are a few packages reading Word, but don't know
>> > which one is able to do it.
>> >
>> >Thanks,
>> >
>> > John
>> >
>> >
>
>



Re: [R] trouble double looping to generate data for a meta-analysis

2016-07-01 Thread Bert Gunter
Hint: It's much more efficient not to loop, but to generate all the
random data in a single call, only once -- then take your samples from
it. (This can often even be done with different distribution
parameters, as in many cases these can also be vectorized.)

Example:

## 1000 random samples of size 100

> set.seed(1122)
> samps.norm <- matrix(rnorm(1e5),nrow = 100 )
> dim(samps.norm)
[1]  100 1000

## This was instantaneous on my machine.
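The parameter point can be pushed a little further; here is a sketch with made-up per-study means, not the poster's actual design:

```r
## one rnorm() call for all k studies; the per-study means are recycled
## column-wise via rep(..., each = n)
set.seed(1122)
n <- 100; k <- 4
means <- c(90, 100, 110, 120)        # toy per-study means
samps <- matrix(rnorm(n * k, mean = rep(means, each = n), sd = 15), nrow = n)
colMeans(samps)                      # one sample mean per study
```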

Cheers,
Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Fri, Jul 1, 2016 at 10:28 AM, Marietta Suarez  wrote:
> i'm trying to generate data for a meta analysis. 1- generate data following
> a normal distribution, 2- generate data following a skewed distribution, 3-
> generate data following a logistic distribution. i need to loop this
> because the # of studies in each meta will be either 10 or 15. k or total
> number of studies in the meta will be 5. i need to loop twice to repeat
> this process 10 times. database should be 3 columns (distributions) by 65
> rows x 10 reps
>
>
> here's my code, not sure what's not working:
> library(fGarch)
>
> #n reps =10
> rep=10
>
> #begin function here, need to vary n and k, when k=2 n=10, when k3 n=15
> fun=function(n, k){
>
>   #prepare to store data
>   data=matrix(0,nrow=10*k, ncol=3)
>   db=matrix(0,nrow=650, ncol=3)
>
>   for (j in 1:rep)
>   {
> for (i in 1:k)
> {
>   #generate data under normal, skewed, and logistic distributions here
>
>   data[,1]=rnorm(n, 100, 15)
>   data[,2]=rsnorm(n, 100, 15, 1)
>   data[,3]=rlogis(n, 100, 15)
> }
>   [j]=db
>   }
> }
>
> save=fun(10,2)
>
> Please help!!!
>
>



Re: [R] Can R read Word fonts and comments?

2016-07-01 Thread John
Yes, I have done some search (e.g., tm, markdown, etc), but I can't find
this function.
If you know any package that works for this purpose, that would be quite
helpful.
Thanks,

John

2016-06-28 16:50 GMT-07:00 Bert Gunter :

> Did you try searching before posting here? -- e.g. a web search or on
> rseek.org ?
>
> Cheers,
> Bert
>
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Tue, Jun 28, 2016 at 3:53 PM, John  wrote:
> > Hi,
> >
> >From time to time I highlight the word documents with red/blue color
> or
> > italic/bold fonts, and I also add comments to a file. Is there a
> > package/function to let R extract the italic/bold blue/red words and
> > comments from a docx/doc file?
> >
> >I am aware that there are a few packages reading Word, but don't know
> > which one is able to do it.
> >
> >Thanks,
> >
> > John
> >
> >
>




Re: [R-es] R-help-es Digest, Vol 89, Issue 1

2016-07-01 Thread Novvier Marco Uscuchagua Cornelio
Thanks, Olivier, it has helped me a lot.

2016-07-01 5:00 GMT-05:00 :

> Send submissions for the R-help-es list to
> r-help-es@r-project.org
>
> To subscribe or unsubscribe via the web:
> https://stat.ethz.ch/mailman/listinfo/r-help-es
>
> Or by e-mail, by sending a message with the text "help" in the
> subject or in the body to:
> r-help-es-requ...@r-project.org
>
> You can contact the list administrator by writing to:
> r-help-es-ow...@r-project.org
>
> If you reply to any content of this message, please edit the
> subject line so that the text is more specific than:
> "Re: Contents of R-help-es digest...". Also, please include in
> your reply only those parts of the message you are responding to.
>
>
> Today's topics:
>
>    1. Repeating data in a table (Novvier Marco Uscuchagua Cornelio)
>    2. Re: Repeating data in a table (Javier Marcuzzi)
>    3. Re: Repeating data in a table (Olivier Nuñez)
>
>
> --
>
> Message: 1
> Date: Thu, 30 Jun 2016 12:15:34 -0500
> From: Novvier Marco Uscuchagua Cornelio 
> To: r-help-es@r-project.org
> Subject: [R-es] Repeating data in a table
> Message-ID:
> 

[R] Anybody know the cause of this?

2016-07-01 Thread James Plante
Just upgraded to R 3.3.1; when I updated the packages on CRAN, I got a BUNCH of 
warning messages like the ones below:

2016-07-01 14:44:19.840 R[369:3724] IMKClient Stall detected, *please Report* 
your user scenario attaching a spindump (or sysdiagnose) that captures the 
problem - (imkxpc_windowLevelWithReply:) block performed very slowly (2.40 
secs).
starting httpd help server ... done
> spindump()
Error: could not find function "spindump"
In addition: Warning message:
In library() :
  library ‘/Users/jimplante/Library/R/3.3/library’ contains no packages
2016-07-01 14:47:32.774 R[369:3724] IMKClient Stall detected, *please Report* 
your user scenario attaching a spindump (or sysdiagnose) that captures the 
problem - (imkxpc_attributesForCharacterIndex:reply:) block performed very 
slowly (1.42 secs).
> 

This is not interfering with any of my work, but I’m curious to know what it’s 
warning me about. 
Two more questions: 1) What’s a spindump; and 2) What’s a sysdiagnose? 
And finally, how would I get either one of those, and to whom would I send them?

Thanks,

Jim

[R] turning data in list format into a matrix format

2016-07-01 Thread Marietta Suarez
As of now, I simulate 6 variables and manipulate 2 of them. The way my
syntax is written my final database is in list mode. I need all of it to be
in just 1 database. any help would be MUCH appreciated. Here's my syntax:

fun=function(n, k,rep){
  #prepare to store data
  data=matrix(0,nrow=10*k, ncol=6)
  db=vector("list",length=rep) #here's the problem

  for (j in 1:rep){
for (i in 1:k){
  #generate data under normal, skewed, and logistic distributions here
  data[,1]<-seq(1,nrow(data),1)
  data[,2]<-rep(i,nrow(data))
  data[,3]<-rep(n,nrow(data))
  data[,4]<-rnorm(n, 100, 15)
  data[,5]<-rsnorm(n, 100, 15, 1)
  data[,6]<-rlogis(n, 100, 15)
}
db[[j]]<-data
  }
  DataReturn<<-db
  return(DataReturn)
  #DataReturn <- data.frame(matrix(unlist(DataReturn), nrow=rep, byrow=T))
  #here's where I tried to solve it unsuccessfully
}




Re: [R] Difficulty subsetting data frames using logical operators

2016-07-01 Thread David Winsemius

> On Jul 1, 2016, at 2:11 AM, Giles Bischoff  wrote:
> 
> So, I uploaded a data set via my directory using the command data <-
> data.frame(read.csv("hw1_data.csv")) and then tried to subset that data
> using logical operators. Specifically, I was trying to make it so that I
> got all the rows in which the values for "Ozone" (a column in the data set)
> were greater than 31 (I was trying to get the mean of all said values).
> Then, I tried using the command data[ , "Ozone">31]. Additionally, I had
> trouble getting it so that I had all the rows where all the values in
> "Ozone">31 & "Temp">90 simultaneously. There were some NA values in both of
> those columns, so that might be it. If someone could help me to figure out
> how to remove those values, that'd be great as well. I'm using a Mac (OS X)
> with the latest version of R (3.1.2. I think??).
> 
> Here is some of the code I used:
> 

Bad idea to use `data` as a data-object name. There is an R function by that
name.

Look at ?data and see the fortune:

fortunes::fortune("dog")

The threat that it will "clash" is not actually correct, but it is guaranteed 
to deliver error messages that do not make sense and result in code that will 
be difficult to read.
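For what it's worth, the homework file has the same columns as R's built-in `airquality` data, so the subsetting (including the NA handling the poster asked about) can be sketched without the CSV. Note the comparison goes on the column, not on the quoted name "Ozone":

```r
## use a name that does not shadow the data() function
aq <- airquality

## rows with Ozone > 31; comparisons with NA give NA, so drop those first
high <- aq[!is.na(aq$Ozone) & aq$Ozone > 31, ]
mean(high$Ozone)

## both conditions at once, NA rows excluded explicitly
both <- aq[with(aq, !is.na(Ozone) & !is.na(Temp) & Ozone > 31 & Temp > 90), ]
```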

>> data <- data.frame(read.csv("hw1_data.csv"))
>> data
>Ozone Solar.R Wind Temp Month Day
> 1  41 190  7.4   67 5   1
> 2  36 118  8.0   72 5   2
> 3  12 149 12.6   74 5   3
> 4  18 313 11.5   62 5   4
> 5  NA  NA 14.3   56 5   5
> 6  28  NA 14.9   66 5   6
> 7  23 299  8.6   65 5   7
> 8  19  99 13.8   59 5   8
> 9   8  19 20.1   61 5   9
> 10 NA 194  8.6   69 5  10
> 11  7  NA  6.9   74 5  11
> 12 16 256  9.7   69 5  12
> 13 11 290  9.2   66 5  13
> 14 14 274 10.9   68 5  14
> 15 18  65 13.2   58 5  15
> 16 14 334 11.5   64 5  16
> 17 34 307 12.0   66 5  17
> 18  6  78 18.4   57 5  18
> 19 30 322 11.5   68 5  19
> 20 11  44  9.7   62 5  20
> 21  1   8  9.7   59 5  21
> 22 11 320 16.6   73 5  22
> 23  4  25  9.7   61 5  23
> 24 32  92 12.0   61 5  24
> 25 NA  66 16.6   57 5  25
> 26 NA 266 14.9   58 5  26
> 27 NA  NA  8.0   57 5  27
> 28 23  13 12.0   67 5  28
> 29 45 252 14.9   81 5  29
> 30115 223  5.7   79 5  30
> 31 37 279  7.4   76 5  31
> 32 NA 286  8.6   78 6   1
> 33 NA 287  9.7   74 6   2
> 34 NA 242 16.1   67 6   3
> 35 NA 186  9.2   84 6   4
> 36 NA 220  8.6   85 6   5
> 37 NA 264 14.3   79 6   6
> 38 29 127  9.7   82 6   7
> 39 NA 273  6.9   87 6   8
> 40 71 291 13.8   90 6   9
> 41 39 323 11.5   87 6  10
> 42 NA 259 10.9   93 6  11
> 43 NA 250  9.2   92 6  12
> 44 23 148  8.0   82 6  13
> 45 NA 332 13.8   80 6  14
> 46 NA 322 11.5   79 6  15
> 47 21 191 14.9   77 6  16
> 48 37 284 20.7   72 6  17
> 49 20  37  9.2   65 6  18
> 50 12 120 11.5   73 6  19
> 51 13 137 10.3   76 6  20
> 52 NA 150  6.3   77 6  21
> 53 NA  59  1.7   76 6  22
> 54 NA  91  4.6   76 6  23
> 55 NA 250  6.3   76 6  24
> 56 NA 135  8.0   75 6  25
> 57 NA 127  8.0   78 6  26
> 58 NA  47 10.3   73 6  27
> 59 NA  98 11.5   80 6  28
> 60 NA  31 14.9   77 6  29
> 61 NA 138  8.0   83 6  30
> 62135 269  4.1   84 7   1
> 63 49 248  9.2   85 7   2
> 64 32 236  9.2   81 7   3
> 65 NA 101 10.9   84 7   4
> 66 64 175  4.6   83 7   5
> 67 40 314 10.9   83 7   6
> 68 77 276  5.1   88 7   7
> 69 97 267  6.3   92 7   8
> 70 97 272  5.7   92 7   9
> 71 85 175  7.4   89 7  10
> 72 NA 139  8.6   82 7  11
> 73 10 264 14.3   73 7  12
> 74 27 175 14.9   81 7  13
> 75 NA 291 14.9   91 7  14
> 76  7  48 14.3   80 7  15
> 77 48 260  6.9   81 7  16
> 78 35 274 10.3   82 7  17
> 79 61 285  6.3   84 7  18
> 80 79 187  5.1   87 7  19
> 81 63 220 11.5   85 7  20
> 82 16   7  6.9   74 7  21
> 83 NA 258  9.7   81 7  22
> 84 NA 295 11.5   82 7  23
> 85 80 294  8.6   86 7  24
> 86108 223  8.0   85 7  25
> 87 20  81  8.6   82 7  26
> 88 52  82 12.0   86 7  27
> 89 82 213  7.4   88 7  28
> 90 50 275  7.4   86 7  29
> 91 64 253  7.4   83 7  30
> 92 

Re: [R] trouble double looping to generate data for a meta-analysis

2016-07-01 Thread Federman, Douglas
You might look at the Wakefield package for data generation.

-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Marietta Suarez
Sent: Friday, July 01, 2016 1:28 PM
To: r-help@r-project.org
Subject: [R] trouble double looping to generate data for a meta-analysis

i'm trying to generate data for a meta analysis. 1- generate data following
a normal distribution, 2- generate data following a skewed distribution, 3-
generate data following a logistic distribution. i need to loop this
because the # of studies in each meta will be either 10 or 15. k or total
number of studies in the meta will be 5. i need to loop twice to repeat
this process 10 times. database should be 3 columns (distributions) by 65
rows x 10 reps


here's my code, not sure what's not working:
library(fGarch)

#n reps =10
rep=10

#begin function here, need to vary n and k, when k=2 n=10, when k3 n=15
fun=function(n, k){

  #prepare to store data
  data=matrix(0,nrow=10*k, ncol=3)
  db=matrix(0,nrow=650, ncol=3)

  for (j in 1:rep)
  {
for (i in 1:k)
{
  #generate data under normal, skewed, and logistic distributions here

  data[,1]=rnorm(n, 100, 15)
  data[,2]=rsnorm(n, 100, 15, 1)
  data[,3]=rlogis(n, 100, 15)
}
  [j]=db
  }
}

save=fun(10,2)

Please help!!!





[R] trouble double looping to generate data for a meta-analysis

2016-07-01 Thread Marietta Suarez
i'm trying to generate data for a meta analysis. 1- generate data following
a normal distribution, 2- generate data following a skewed distribution, 3-
generate data following a logistic distribution. i need to loop this
because the # of studies in each meta will be either 10 or 15. k or total
number of studies in the meta will be 5. i need to loop twice to repeat
this process 10 times. database should be 3 columns (distributions) by 65
rows x 10 reps


here's my code, not sure what's not working:
library(fGarch)

#n reps =10
rep=10

#begin function here, need to vary n and k, when k=2 n=10, when k3 n=15
fun=function(n, k){

  #prepare to store data
  data=matrix(0,nrow=10*k, ncol=3)
  db=matrix(0,nrow=650, ncol=3)

  for (j in 1:rep)
  {
for (i in 1:k)
{
  #generate data under normal, skewed, and logistic distributions here

  data[,1]=rnorm(n, 100, 15)
  data[,2]=rsnorm(n, 100, 15, 1)
  data[,3]=rlogis(n, 100, 15)
}
  [j]=db
  }
}

save=fun(10,2)

Please help!!!




Re: [R] Difficulty subsetting data frames using logical operators

2016-07-01 Thread jim holtman
You may need to re-read the Intro to R.

data[data$Ozone > 31,]

or

subset(data, Ozone > 31)
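On the NA part of the question: comparisons with NA yield NA, which is what breaks naive indexing. Two common idioms, illustrated here on the built-in `airquality` data (which matches the poster's columns):

```r
df <- airquality
## na.rm drops the NAs produced by "NA > 31" when averaging
mean(df$Ozone[df$Ozone > 31], na.rm = TRUE)
## subset() treats an NA condition as "drop the row"
subset(df, Ozone > 31 & Temp > 90)
```

With bracket indexing, an NA in the condition keeps a row full of NAs, which is why `df[df$Ozone > 31, ]` surprises people; `subset()` avoids that.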


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Fri, Jul 1, 2016 at 5:11 AM, Giles Bischoff 
wrote:

> So, I uploaded a data set via my directory using the command data <-
> data.frame(read.csv("hw1_data.csv")) and then tried to subset that data
> using logical operators. Specifically, I was trying to make it so that I
> got all the rows in which the values for "Ozone" (a column in the data set)
> were greater than 31 (I was trying to get the mean of all said values).
> Then, I tried using the command data[ , "Ozone">31]. Additionally, I had
> trouble getting it so that I had all the rows where all the values in
> "Ozone">31 & "Temp">90 simultaneously. There were some NA values in both of
> those columns, so that might be it. If someone could help me to figure out
> how to remove those values, that'd be great as well. I'm using a Mac (OS X)
> with the latest version of R (3.1.2. I think??).
>
> Here is some of the code I used:
>
> >data <- data.frame(read.csv("hw1_data.csv"))
> > data
> Ozone Solar.R Wind Temp Month Day
> 1  41 190  7.4   67 5   1
> 2  36 118  8.0   72 5   2
> 3  12 149 12.6   74 5   3
> 4  18 313 11.5   62 5   4
> 5  NA  NA 14.3   56 5   5
> 6  28  NA 14.9   66 5   6
> 7  23 299  8.6   65 5   7
> 8  19  99 13.8   59 5   8
> 9   8  19 20.1   61 5   9
> 10 NA 194  8.6   69 5  10
> 11  7  NA  6.9   74 5  11
> 12 16 256  9.7   69 5  12
> 13 11 290  9.2   66 5  13
> 14 14 274 10.9   68 5  14
> 15 18  65 13.2   58 5  15
> 16 14 334 11.5   64 5  16
> 17 34 307 12.0   66 5  17
> 18  6  78 18.4   57 5  18
> 19 30 322 11.5   68 5  19
> 20 11  44  9.7   62 5  20
> 21  1   8  9.7   59 5  21
> 22 11 320 16.6   73 5  22
> 23  4  25  9.7   61 5  23
> 24 32  92 12.0   61 5  24
> 25 NA  66 16.6   57 5  25
> 26 NA 266 14.9   58 5  26
> 27 NA  NA  8.0   57 5  27
> 28 23  13 12.0   67 5  28
> 29 45 252 14.9   81 5  29
> 30115 223  5.7   79 5  30
> 31 37 279  7.4   76 5  31
> 32 NA 286  8.6   78 6   1
> 33 NA 287  9.7   74 6   2
> 34 NA 242 16.1   67 6   3
> 35 NA 186  9.2   84 6   4
> 36 NA 220  8.6   85 6   5
> 37 NA 264 14.3   79 6   6
> 38 29 127  9.7   82 6   7
> 39 NA 273  6.9   87 6   8
> 40 71 291 13.8   90 6   9
> 41 39 323 11.5   87 6  10
> 42 NA 259 10.9   93 6  11
> 43 NA 250  9.2   92 6  12
> 44 23 148  8.0   82 6  13
> 45 NA 332 13.8   80 6  14
> 46 NA 322 11.5   79 6  15
> 47 21 191 14.9   77 6  16
> 48 37 284 20.7   72 6  17
> 49 20  37  9.2   65 6  18
> 50 12 120 11.5   73 6  19
> 51 13 137 10.3   76 6  20
> 52 NA 150  6.3   77 6  21
> 53 NA  59  1.7   76 6  22
> 54 NA  91  4.6   76 6  23
> 55 NA 250  6.3   76 6  24
> 56 NA 135  8.0   75 6  25
> 57 NA 127  8.0   78 6  26
> 58 NA  47 10.3   73 6  27
> 59 NA  98 11.5   80 6  28
> 60 NA  31 14.9   77 6  29
> 61 NA 138  8.0   83 6  30
> 62135 269  4.1   84 7   1
> 63 49 248  9.2   85 7   2
> 64 32 236  9.2   81 7   3
> 65 NA 101 10.9   84 7   4
> 66 64 175  4.6   83 7   5
> 67 40 314 10.9   83 7   6
> 68 77 276  5.1   88 7   7
> 69 97 267  6.3   92 7   8
> 70 97 272  5.7   92 7   9
> 71 85 175  7.4   89 7  10
> 72 NA 139  8.6   82 7  11
> 73 10 264 14.3   73 7  12
> 74 27 175 14.9   81 7  13
> 75 NA 291 14.9   91 7  14
> 76  7  48 14.3   80 7  15
> 77 48 260  6.9   81 7  16
> 78 35 274 10.3   82 7  17
> 79 61 285  6.3   84 7  18
> 80 79 187  5.1   87 7  19
> 81 63 220 11.5   85 7  20
> 82 16   7  6.9   74 7  21
> 83 NA 258  9.7   81 7  22
> 84 NA 295 11.5   82 7  23
> 85 80 294  8.6   86 7  24
> 86108 223  8.0   85 7  25
> 87 20  81  8.6   82 7  26
> 88 52  82 12.0   86 7  27
> 89 82 213  7.4   88 7  28
> 90 50 275  7.4   86 7  29
> 91 64 253  7.4   83 7  30
> 92 59 254  9.2   81 7  31
> 93 39  83  6.9   81 8   1
> 94  9  24 13.8   

[R] C stack error in as.vector() starting in R 3.3.0

2016-07-01 Thread Eric Archer - NOAA Federal
Apologies for the long post. This is an issue I have been struggling with
and I have tried to be as complete, to the point, and reproducible as
possible.

In documenting a package with roxygen2, I have come across an error that
does not occur in R 3.2.4 revised, but does occur in R 3.3.0 and 3.3.1.
Using traceback() and debug(), I've traced the error to a call made to
as.vector(x, "character") that seems to get stuck in a loop which
culminates in this error:

Error: C stack usage  7970892 is too close to the limit

The object that causes this error is the signature of a method. With
some playing around, I've been able to work out that the error is actually
associated with the method that roxygen2 creates when doing its magic:
something about having this method definition, with its associated
environment, in the workspace triggers the error.

At this point, I should stress again that the error does NOT occur in R
3.2.4 revised or earlier, but does occur in R 3.3.0 and 3.3.1. I have also
tested this with several versions of roxygen2 and that does not make a
difference. Thus, my ultimate question is: what has changed in R 3.3.0 that
would lead to this, so that the roxygen2 maintainers can correct it?

As a test, I created a signature identical to the one that causes the error
and tried the as.vector() command that would generate the loop. I don't get
the error in my command-line-generated object, nor with the object that is
extracted from the method definition. However, when I load the method
definition into the workspace, the error will happen on either my
command-line-generated object or the one extracted from the method
definition, and will not stop happening until I restart R.

I have reported this error as an issue on the roxygen2 GitHub repository
and it has been crossposted by another user who had a similar experience
with a different package on the devtools repository. Those posts, which
contain more information, are here:

https://github.com/klutometis/roxygen/issues/475
https://github.com/hadley/devtools/issues/1234

Below is the result of sessionInfo() and the output of a session
demonstrating the effect. The files used are in a zip file here:
https://github.com/klutometis/roxygen/files/335417/error.test.rdata.files.zip

 > sessionInfo()
R version 3.3.0 (2016-05-03)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.5 (El Capitan)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_3.3.0

> rm(list = ls())
>
> # Create test class and method
> setClass(Class = "gtypes", slots = c(loci = "data.frame", ploidy =
"numeric"), package = "roxygen_devtest")
> setGeneric("nInd", function(x, ...) standardGeneric("nInd"), package =
"adegenet")
[1] "nInd"
> setMethod("nInd", "gtypes", function(x, ...) nrow(x@loci) / x@ploidy)
[1] "nInd"
>
> test.method <- getMethod("nInd", "gtypes")
> str(test.method)
Formal class 'MethodDefinition' [package "methods"] with 4 slots
  ..@ .Data  :function (x, ...)
  ..@ target :Formal class 'signature' [package "methods"] with 3 slots
  .. .. ..@ .Data  : chr "gtypes"
  .. .. ..@ names  : chr "x"
  .. .. ..@ package: chr "roxygen_devtest"
  ..@ defined:Formal class 'signature' [package "methods"] with 3 slots
  .. .. ..@ .Data  : chr "gtypes"
  .. .. ..@ names  : chr "x"
  .. .. ..@ package: chr "roxygen_devtest"
  ..@ generic: atomic [1:1] nInd
  .. ..- attr(*, "package")= chr "adegenet"
>
> # No error:
> as.vector(test.method@defined, "character")
[1] "gtypes"
>
> # This is the method that generates the error
> load("problem.method.rdata")
>
> # Notice the difference in the environments of the two functions:
> test.method@.Data
function (x, ...)
nrow(x@loci)/x@ploidy
> problem.method@.Data
function (x, ...)
nrow(x@loci)/x@ploidy

>
> # Swap the problem function for the one I created:
> problem.method@.Data <- test.method@.Data
> save(problem.method, file = "fixed.method.rdata")
>
> # Here's the error (both with my method and the original):
> as.vector(test.method@defined, "character")
Error: C stack usage  7970892 is too close to the limit
> as.vector(problem.method@defined, "character")
Error: C stack usage  7970892 is too close to the limit

Restarting R session...

> # *** Restart R for this to work ***
> rm(list = ls())
>
> load("fixed.method.rdata")
> as.vector(problem.method@defined, "character")
[1] "gtypes"



*Eric Archer, Ph.D.*
Southwest Fisheries Science Center
NMFS, NOAA
8901 La Jolla Shores Drive
La Jolla, CA 92037 USA
858-546-7121 (work)
858-546-7003 (FAX)

Marine Mammal Genetics Group: swfsc.noaa.gov/mmtd-mmgenetics
ETP Cetacean Assessment Program: swfsc.noaa.gov/mmtd-etp
https://github/ericarcher

"*The universe doesn't care what you believe. The wonderful thing about
science is that it doesn't ask for your faith, it just asks for your
eyes.*" -

Re: [R] How to identify runs or clusters of events in time

2016-07-01 Thread Clint Bowman

Mark,

I did something similar a couple of years ago by coding non-events as 0, 
positive events as +1 and negative events as -1 then summing the value 
through time.  In my case the patterns showed up quite clearly and I used 
other criteria to define the actual periods.
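The coding-and-summing idea Clint describes can be sketched in a few lines
(object names here are invented for illustration; the real criteria for
defining periods would come from the data):

```r
# Code each day as +1 (positive event), -1 (negative event), or 0 (no
# event), then take the cumulative sum through time. Clusters of same-sign
# events show up as sustained drifts in the running score.
set.seed(1)
day <- 1:100
outcome <- integer(100)            # 0 = no event on that day
outcome[sample(day, 15)] <- 1      # some positive events
outcome[sample(day, 10)] <- -1     # some negative events

score <- cumsum(outcome)
# plot(day, score, type = "s") makes runs of negative outcomes visible
# as downward stretches of the curve.
```

Each step of the score moves by at most one, so a steep stretch can only
come from many consecutive same-sign events.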


Clint

Clint Bowman            INTERNET:   cl...@ecy.wa.gov
Air Quality Modeler     INTERNET:   cl...@math.utah.edu
Department of Ecology   VOICE:      (360) 407-6815
PO Box 47600            FAX:        (360) 407-7534
Olympia, WA 98504-7600

USPS:   PO Box 47600, Olympia, WA 98504-7600
Parcels:300 Desmond Drive, Lacey, WA 98503-1274

On Fri, 1 Jul 2016, Mark Shanks wrote:


Hi,


Imagine the two problems:


1) You have an event that occurs repeatedly over time. You want to identify 
periods when the event occurs more frequently than the base rate of occurrence. 
Ideally, you don't want to have to specify the period (e.g., break into 
months), so the analysis can be sensitive to scenarios such as many events 
happening only between, e.g., June 10 and June 15 - even though the overall 
number of events for the month may not be much greater than usual. Similarly, 
there may be a cluster of events that occur from March 28 to April 3. Ideally, 
you want to pull out the base rate of occurrence and highlight only the periods 
when the frequency is less or greater than the base rate.


2) Events again occur repeatedly over time in an inconsistent way. However, 
this time, the event has positive or negative outcomes - such as a spot check 
of conformity to regulations. You again want to know whether there is a group 
of negative outcomes close together in time. This analysis should take the 
positive outcomes into account as well, though. E.g., if from June 10 to June 15 
you get 5 negative outcomes and no positive outcomes it should be flagged. On 
the other hand, if from June 10 to June 15 you get 5 negative outcomes 
interspersed between many positive outcomes it should be ignored.


I'm guessing that there is some statistical approach designed to look at these 
types of issues. What is it called? What package in R implements it? I 
basically just need to know where to start.


Thanks,


Mark

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] RODBC Error - first argument is not an open ODBC Channel

2016-07-01 Thread Gaurav Virmani
Hi,

Sincere apologies if my mail is inappropriate for this mailbox. Please
ignore the mail.

*Issue*

I am getting the error "*first argument is not an open RODBC channel*" when I
publish my application on IIS. It runs perfectly under Visual Studio
development mode, and the script runs fine in the R console too. But I get
the error once it is published to IIS.

*Code Snippet:*

library(RODBC)
conna <- odbcConnect("XXX",uid='sa',pwd='x')

coreKPI<-sqlFetch(conna,"vw_GetFields")
odbcClose(conna)

I have tried using odbcDriverConnect instead of odbcConnect and providing
the complete DB connection string, but the issue persists. Also, I have set
the PATH variable to point to 32-bit R, and tried a few other options
suggested on Stack Overflow, but no luck. I have been struggling for 3 days
now to get this going.
Please advise.

Thanks in anticipation.

Warm Regards
Gaurav

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Difficulty subsetting data frames using logical operators

2016-07-01 Thread Giles Bischoff
So, I uploaded a data set via my directory using the command data <-
data.frame(read.csv("hw1_data.csv")) and then tried to subset that data
using logical operators. Specifically, I was trying to make it so that I
got all the rows in which the values for "Ozone" (a column in the data set)
were greater than 31 (I was trying to get the mean of all said values).
Then, I tried using the command data[ , "Ozone">31]. Additionally, I had
trouble getting all the rows where "Ozone">31 & "Temp">90 hold
simultaneously. There were some NA values in both of those columns, so that
might be the problem. If someone could help me figure out how to remove
those values, that'd be great as well. I'm using a Mac (OS X)
with the latest version of R (3.1.2. I think??).
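For what it's worth, the usual row-subsetting idiom can be sketched like
this (a guess at what is wanted, not code from the original post; the
built-in airquality data set appears to match the values shown below, so it
is used as a stand-in for hw1_data.csv):

```r
# The condition belongs in the ROW index (before the comma), not the
# column index, and NAs must be handled explicitly.
data <- airquality                       # stand-in for read.csv("hw1_data.csv")

high <- data[!is.na(data$Ozone) & data$Ozone > 31, ]
mean(high$Ozone)                         # mean of all Ozone values > 31

# Both conditions at once; subset() treats rows where the condition is
# NA as not selected, so they are dropped automatically:
both <- subset(data, Ozone > 31 & Temp > 90)
```

Writing data[ , "Ozone">31] instead selects *columns* by the scalar
comparison "Ozone" > 31, which is why it doesn't do what was intended.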

Here is some of the code I used:

>data <- data.frame(read.csv("hw1_data.csv"))
> data
Ozone Solar.R Wind Temp Month Day
1  41 190  7.4   67 5   1
2  36 118  8.0   72 5   2
3  12 149 12.6   74 5   3
4  18 313 11.5   62 5   4
5  NA  NA 14.3   56 5   5
6  28  NA 14.9   66 5   6
7  23 299  8.6   65 5   7
8  19  99 13.8   59 5   8
9   8  19 20.1   61 5   9
10 NA 194  8.6   69 5  10
11  7  NA  6.9   74 5  11
12 16 256  9.7   69 5  12
13 11 290  9.2   66 5  13
14 14 274 10.9   68 5  14
15 18  65 13.2   58 5  15
16 14 334 11.5   64 5  16
17 34 307 12.0   66 5  17
18  6  78 18.4   57 5  18
19 30 322 11.5   68 5  19
20 11  44  9.7   62 5  20
21  1   8  9.7   59 5  21
22 11 320 16.6   73 5  22
23  4  25  9.7   61 5  23
24 32  92 12.0   61 5  24
25 NA  66 16.6   57 5  25
26 NA 266 14.9   58 5  26
27 NA  NA  8.0   57 5  27
28 23  13 12.0   67 5  28
29 45 252 14.9   81 5  29
30 115 223  5.7   79 5  30
31 37 279  7.4   76 5  31
32 NA 286  8.6   78 6   1
33 NA 287  9.7   74 6   2
34 NA 242 16.1   67 6   3
35 NA 186  9.2   84 6   4
36 NA 220  8.6   85 6   5
37 NA 264 14.3   79 6   6
38 29 127  9.7   82 6   7
39 NA 273  6.9   87 6   8
40 71 291 13.8   90 6   9
41 39 323 11.5   87 6  10
42 NA 259 10.9   93 6  11
43 NA 250  9.2   92 6  12
44 23 148  8.0   82 6  13
45 NA 332 13.8   80 6  14
46 NA 322 11.5   79 6  15
47 21 191 14.9   77 6  16
48 37 284 20.7   72 6  17
49 20  37  9.2   65 6  18
50 12 120 11.5   73 6  19
51 13 137 10.3   76 6  20
52 NA 150  6.3   77 6  21
53 NA  59  1.7   76 6  22
54 NA  91  4.6   76 6  23
55 NA 250  6.3   76 6  24
56 NA 135  8.0   75 6  25
57 NA 127  8.0   78 6  26
58 NA  47 10.3   73 6  27
59 NA  98 11.5   80 6  28
60 NA  31 14.9   77 6  29
61 NA 138  8.0   83 6  30
62 135 269  4.1   84 7   1
63 49 248  9.2   85 7   2
64 32 236  9.2   81 7   3
65 NA 101 10.9   84 7   4
66 64 175  4.6   83 7   5
67 40 314 10.9   83 7   6
68 77 276  5.1   88 7   7
69 97 267  6.3   92 7   8
70 97 272  5.7   92 7   9
71 85 175  7.4   89 7  10
72 NA 139  8.6   82 7  11
73 10 264 14.3   73 7  12
74 27 175 14.9   81 7  13
75 NA 291 14.9   91 7  14
76  7  48 14.3   80 7  15
77 48 260  6.9   81 7  16
78 35 274 10.3   82 7  17
79 61 285  6.3   84 7  18
80 79 187  5.1   87 7  19
81 63 220 11.5   85 7  20
82 16   7  6.9   74 7  21
83 NA 258  9.7   81 7  22
84 NA 295 11.5   82 7  23
85 80 294  8.6   86 7  24
86 108 223  8.0   85 7  25
87 20  81  8.6   82 7  26
88 52  82 12.0   86 7  27
89 82 213  7.4   88 7  28
90 50 275  7.4   86 7  29
91 64 253  7.4   83 7  30
92 59 254  9.2   81 7  31
93 39  83  6.9   81 8   1
94  9  24 13.8   81 8   2
95 16  77  7.4   82 8   3
96 78  NA  6.9   86 8   4
97 35  NA  7.4   85 8   5
98 66  NA  4.6   87 8   6
 99 122 255  4.0   89 8   7
100  89 229 10.3   90 8   8
101 110 207  8.0   90 8   9
102  NA 222  8.6   92 8  10
103  NA 137 11.5   86 8  11
104  44 192 11.5   86 8  12
105  28 273 11.5   82 8  13
106  65 157  9.7   80 8  14
107  NA  64 11.5   79 8  15
108  22  71 10.3   

Re: [R] Understanding and predict round-off errors sign on simple functions

2016-07-01 Thread Sirhc via R-help
Thank you for all your answers; I will take a look at the 'propagate'
package.

PS: this is the first time I am participating in a mailing list; I hope I am
replying to the right emails.

-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of DIGHE,
NILESH [AG/2362]
Sent: jeudi, 30 juin 2016 14:02
To: Marc Schwartz ; Bert Gunter

Cc: R-help 
Subject: Re: [R] Understanding and predict round-off errors sign on simple
functions

Using the "runmean" function from the caTools package within your SMA
function appears to solve the issue.  Please see details below.

library(caTools)

> dput(m)
structure(c(-0.626453810742332, 0.183643324222082, -0.835628612410047,
1.59528080213779, 0.329507771815361, -0.820468384118015, 0.487429052428485,
0.738324705129217, 0.575781351653492, -0.305388387156356, 3.51178116845085,
2.38984323641143, 1.3787594194582, -0.2146998871775, 3.12493091814311,
1.95506639098477, 1.98380973690105, 2.9438362106853, 2.82122119509809,
2.59390132121751, 5.91897737160822, 5.78213630073107, 5.07456498336519,
3.01064830413663, 5.61982574789471, 4.943871260471, 4.84420449329467,
3.52924761610073, 4.52184994489138, 5.4179415601997), .Dim = c(10L,
3L))


> dput(SMA)
function (x, n = 10, ...)
{
ma <- runmean(x, n)
if (!is.null(dim(ma))) {
colnames(ma) <- "SMA"
}
return(ma)
}


mma <- apply(m, 2, SMA, n=1)

results<-mma-m

> dput(results)
structure(c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), .Dim = c(10L, 3L))


Nilesh
-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Marc
Schwartz
Sent: Wednesday, June 29, 2016 1:07 PM
To: Bert Gunter
Cc: R-help
Subject: Re: [R] Understanding and predict round-off errors sign on simple
functions

Hi,

Just to augment Bert's comments, I presume that you are aware of the
relevant R FAQ:

 
https://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-
numbers-are-equal_003f
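The canonical illustration from that FAQ entry looks like this (a sketch,
not part of the original message):

```r
# 0.1 and 0.2 have no exact binary floating-point representation, so two
# "equal" quantities can differ by round-off.
x <- 0.1 + 0.2
x == 0.3                   # FALSE
x - 0.3                    # a tiny positive number, on the order of 5.6e-17
isTRUE(all.equal(x, 0.3))  # TRUE: compare with a tolerance instead
```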

That you had an expectation of the difference being 0 suggested to me that
you might not be, but my apologies if that is not the case.

That being said, there are some higher precision CRAN packages that may
offer some additional functionality, with the potential limitations that
Bert references below. More information is available in the Numerical
Mathematics CRAN Task View:

  https://cran.r-project.org/web/views/NumericalMathematics.html

In addition, with the caveat that I have not used it, there is the
'propagate' package on CRAN that may be relevant to what you want to be able
to anticipate, at some level:

  https://cran.r-project.org/web/packages/propagate/index.html

It has not been updated in a while and there are some notes for the CRAN
package checks, that suggest that the maintainer may not be active at this
point.

Regards,

Marc


> On Jun 29, 2016, at 10:13 AM, Bert Gunter  wrote:
> 
> I am certainly no expert, but I would assume that:
> 
> 1. Roundoff errors depend on the exact numerical libraries and 
> versions that are used, and so general language comparisons are 
> impossible without that information;
> 
> 2. Roundoff errors depend on the exact calculations being done and 
> machine precision and are very complicated to determine
> 
> So I would say the answer to your questions is no.
> 
> But you should probably address such a question to a numerical analyst 
> for an authoritative answer. Maybe try stats.stackexchange.com  .
> 
> -- Bert
> 
> Bert Gunter
> 
> "The trouble with having an open mind is that people keep coming along 
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> 
> 
> On Wed, Jun 29, 2016 at 2:55 AM, Sirhc via R-help 
wrote:
>> Hi,
>> 
>> 
>> 
>> Maybe it is a basic thing, but I would like to know if we can
>> anticipate the sign of round-off errors.
>> 
>> 
>> 
>> Here is an example :
>> 
>> 
>> 
>> # numerical matrix
>> 
>> m <- matrix(data=cbind(rnorm(10, 0), rnorm(10, 2), rnorm(10, 5)), 
>> nrow=10,
>> ncol=3)
>> 
>> 
>> 
>>> m
>> 
>>[,1]  [,2] [,3]
>> 
>> [1,]  0.4816247 1.1973502 3.855641
>> 
>> [2,] -1.2174937 0.7356427 4.393279
>> 
>> [3,]  0.8504074 2.5286509 2.689196
>> 
>> [4,]  1.8048642 1.8580804 6.665237
>> 
>> [5,] -0.6749397 1.0944277 4.838608
>> 
>> [6,]  0.8252034 1.5595268 3.681695
>> 
>> [7,]  1.3002208 0.9582693 4.561577
>> 
>> [8,]  1.6950923 3.5677921 6.005078
>> 
>> [9,]  0.6509285 0.9025964 5.082288
>> 
>> [10,] -0.5676040 1.3281102 4.446451
>> 
>> 
>> 
>> #weird moving average of period 1 !
>> 
>> mma <- apply(m, 2, SMA, n=1)
>> 
>> 
>> 
>>> mma
>> 
>>[,1]  [,2] [,3]
>> 
>> [1,] NANA   NA
>> 
>> [2,] -1.2174937 0.7356427 4.393279
>> 
>> [3,]  0.8504074 2.5286509 2.689196
>> 
>> [4,]  1.8048642 1.8580804 6.665237
>> 
>> [5,] -0.6749397 1.0944277 4.838608
>> 
>> [6,]  0.8252034 

Re: [R] [FORGED] Splitting data.frame into a list of small data.frames given indices

2016-07-01 Thread Bert Gunter
Inline.

Cheers,
Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Fri, Jul 1, 2016 at 7:40 AM, Witold E Wolski  wrote:
> Hi William,
>
> I tested plyr's dlply function, and it seems to have O(N * log(R))
> complexity (tested for R = N), so I do not know whether N is the number
> of rows or the number of categories.
>
> For the data.frame example with 2e5 rows and 2e5 categories it is
> approx. 10 times faster than split. Still, it is 10 seconds on an
> i7-5930K 3.5GHz Intel.
>
> It would be nice if the documentation contained runtime-complexity
> information, and if the documentation of base package functions pointed
> to the functions that should be used instead.

It would, indeed -- but these things are very dependent on exact data
context, the hardware in use, the OS, etc.  Moreover, how could base
documentation possibly keep track of what all other packages are
doing?! -- that seems unreasonable on the face of it.  I know that
from time to time R docs do mention base algorithmic complexity (e.g.
?sort and the data.table package, I believe), but generally it is
preferable to omit such details, imho: access to a function should be
through its relatively fixed API, while the underlying machinery may
be subject to considerable change.  Obviously, there are circumstances
where something still could (and perhaps should) be said about
efficiency -- and R docs often do say it -- but I think the level of
detail you request is unrealistic and might often even mislead.

Obviously, just my opinion, so contrary views welcome.

Cheers,
Bert


>
> Thanks
>
>
>
>
> On 29 June 2016 at 16:13, William Dunlap  wrote:
>> I won't go into why splitting data.frames (or factors) uses time
>> proportional to the number of input rows times the number of
>> levels in the splitting factor, but you will get much better mileage
>> if you call split individually on each 'atomic' (numeric, character, ...)
>> variable and use mapply on the resulting lists.
>>
>> The plyr and dplyr packages were developed to deal with this
>> sort of problem.  Check them out.
>>
>>
>> Bill Dunlap
>> TIBCO Software
>> wdunlap tibco.com
>>
>> On Wed, Jun 29, 2016 at 6:21 AM, Witold E Wolski  wrote:
>>>
>>> Hi,
>>>
>>> Here is a complete example which shows that the complexity of split or
>>> by is O(n^2)
>>>
>>> nrows <- c(1e3,5e3, 1e4 ,5e4, 1e5 ,2e5)
>>> res<-list()
>>>
>>> for(i in nrows){
>>>   dum <- data.frame(x = runif(i,1,1000), y=runif(i,1,1000))
>>>   res[[length(res)+1]]<-(system.time(x<- split(dum, 1:nrow(dum))))
>>> }
>>> res <- do.call("rbind",res)
>>> plot(nrows^2, res[,"elapsed"])
>>>
>>> And I can't see a reason why this has to be so slow.
>>>
>>>
>>> cheers
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On 29 June 2016 at 12:00, Rolf Turner  wrote:
>>> > On 29/06/16 21:16, Witold E Wolski wrote:
>>> >>
>>> >> It's the inverse problem to merging a list of data.frames into a large
>>> >> data.frame just discussed in the "performance of do.call("rbind")"
>>> >> thread
>>> >>
>>> >> I would like to split a data.frame into a list of data.frames
>>> >> according to first column.
>>> >> This SEEMS to be easily possible with the function base::by. However,
>>> >> as soon as the data.frame has a few million rows this function CAN NOT
>>> >> BE USED (unless you have PLENTY OF TIME).
>>> >>
>>> >> for 'by' runtime ~ nrow^2, or formally O(n^2)  (see benchmark below).
>>> >>
>>> >> So basically I am looking for a similar function with better
>>> >> complexity.
>>> >>
>>> >>
>>> >>  > nrows <- c(1e5,1e6,2e6,3e6,5e6)
>>> >>>
>>> >>> timing <- list()
>>> >>> for(i in nrows){
>>> >>
>>> >> + dum <- peaks[1:i,]
>>> >> + timing[[length(timing)+1]] <- system.time(x<- by(dum[,2:3],
>>> >> INDICES=list(dum[,1]), FUN=function(x){x}, simplify = FALSE))
>>> >> + }
>>> >>>
>>> >>> names(timing)<- nrows
>>> >>> timing
>>> >>
>>> >> $`1e+05`
>>> >>user  system elapsed
>>> >>0.050.000.05
>>> >>
>>> >> $`1e+06`
>>> >>user  system elapsed
>>> >>1.482.984.46
>>> >>
>>> >> $`2e+06`
>>> >>user  system elapsed
>>> >>7.25   11.39   18.65
>>> >>
>>> >> $`3e+06`
>>> >>user  system elapsed
>>> >>   16.15   25.81   41.99
>>> >>
>>> >> $`5e+06`
>>> >>user  system elapsed
>>> >>   43.22   74.72  118.09
>>> >
>>> >
>>> > I'm not sure that I follow what you're doing, and your example is not
>>> > reproducible, since we have no idea what "peaks" is, but on a toy
>>> > example
>>> > with 5e6 rows in the data frame I got a timing result of
>>> >
>>> >user  system elapsed
>>> >   0.379 0.025 0.406
>>> >
>>> > when I applied split().  Is this adequately fast? Seems to me that if
>>> > you
>>> > want to split something, split() would be a good place to start.
>>> >
>>> > cheers,
>>> >
>>> > Rolf Turner
>>> >
>>> > --
>>> > Technical 

Re: [R] Column product

2016-07-01 Thread Sarah Goslee
Hi,

I think this is substantially less ugly:


A <- matrix(1:15,nrow=5,byrow=F); A
a <- c(1,2,3)

B <- sweep(A, 2, a, "^")
apply(B, 1, prod)

You could combine it into one line if you wanted, but I find it clearer as two:

> apply(sweep(A, 2, a, "^"), 1, prod)
[1]   47916  169344  421824  889056 1687500
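For strictly positive A, the same row products can also be written as a
single matrix multiplication, since prod_j A[,j]^a[j] = exp(log(A) %*% a)
(a sketch not taken from the thread):

```r
# Row products of column-wise powers as one matrix product. Only valid
# for positive entries of A, since it goes through log().
A <- matrix(1:15, nrow = 5)
a <- c(1, 2, 3)
drop(exp(log(A) %*% a))   # matches 47916 169344 421824 889056 1687500 up to round-off
```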


Sarah


On Fri, Jul 1, 2016 at 10:15 AM, Steven Yen  wrote:
> A is a 5 x 3 matrix and a is a 3-vector. I like to exponentiate A[,1] to
> a[1], A[,2] to a[2], and A[,3] to a[3], and obtain the product of the
> resulting columns, as in line 3.
>
> I also accomplish this with lines 4 and 5. I like to have rowProducts(B)
> but there is no such function, so I came up with something ugly in line
> 5--exponentiating the row sums of log. Is there a more elegant way than
> line 5 or, better yet, lines 4 and 5 together? Thanks.
>
> A<-matrix(1:15,nrow=5,byrow=F); A
> a<-c(1,2,3)
> (A[,1]^a[1])*(A[,2]^a[2])*(A[,3]^a[3])
> B<-t(t(A)^a); B
> exp(rowSums(log(B)))
>
> Result:
>
>  > A<-matrix(1:15,nrow=5,byrow=F); A
>      [,1] [,2] [,3]
> [1,]    1    6   11
> [2,]    2    7   12
> [3,]    3    8   13
> [4,]    4    9   14
> [5,]    5   10   15
>  > a<-c(1,2,3)
>  > (A[,1]^a[1])*(A[,2]^a[2])*(A[,3]^a[3])
> [1]   47916  169344  421824  889056 1687500
>  > B<-t(t(A)^a); B
>      [,1] [,2] [,3]
> [1,]    1   36 1331
> [2,]    2   49 1728
> [3,]    3   64 2197
> [4,]    4   81 2744
> [5,]    5  100 3375
>  > exp(rowSums(log(B)))
> [1]   47916  169344  421824  889056 1687500
>  >

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] [FORGED] Splitting data.frame into a list of small data.frames given indices

2016-07-01 Thread Witold E Wolski
Hi William,

I tested plyr's dlply function, and it seems to have O(N * log(R))
complexity (tested for R = N), so I do not know whether N is the number
of rows or the number of categories.

For the data.frame example with 2e5 rows and 2e5 categories it is
approx. 10 times faster than split. Still, it is 10 seconds on an
i7-5930K 3.5GHz Intel.

It would be nice if the documentation contained runtime-complexity
information, and if the documentation of base package functions pointed
to the functions that should be used instead.

Thanks




On 29 June 2016 at 16:13, William Dunlap  wrote:
> I won't go into why splitting data.frames (or factors) uses time
> proportional to the number of input rows times the number of
> levels in the splitting factor, but you will get much better mileage
> if you call split individually on each 'atomic' (numeric, character, ...)
> variable and use mapply on the resulting lists.
>
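A sketch of what I take Bill's column-wise suggestion to mean (object
names invented for illustration): split each atomic column on its own,
then stitch the pieces back together with mapply().

```r
# Splitting atomic vectors is much cheaper than splitting a whole
# data.frame, so split column by column and reassemble per group.
n   <- 1e4
dum <- data.frame(g = seq_len(n), x = runif(n), y = runif(n))

xs <- split(dum$x, dum$g)   # fast: each split() call sees an atomic vector
ys <- split(dum$y, dum$g)
parts <- mapply(function(x, y) data.frame(x = x, y = y),
                xs, ys, SIMPLIFY = FALSE)
length(parts)               # one small data.frame per group
```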
> The plyr and dplyr packages were developed to deal with this
> sort of problem.  Check them out.
>
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Wed, Jun 29, 2016 at 6:21 AM, Witold E Wolski  wrote:
>>
>> Hi,
>>
>> Here is a complete example which shows that the complexity of split or
>> by is O(n^2)
>>
>> nrows <- c(1e3,5e3, 1e4 ,5e4, 1e5 ,2e5)
>> res<-list()
>>
>> for(i in nrows){
>>   dum <- data.frame(x = runif(i,1,1000), y=runif(i,1,1000))
>>   res[[length(res)+1]]<-(system.time(x<- split(dum, 1:nrow(dum))))
>> }
>> res <- do.call("rbind",res)
>> plot(nrows^2, res[,"elapsed"])
>>
>> And I can't see a reason why this has to be so slow.
>>
>>
>> cheers
>>
>>
>>
>>
>>
>>
>>
>> On 29 June 2016 at 12:00, Rolf Turner  wrote:
>> > On 29/06/16 21:16, Witold E Wolski wrote:
>> >>
>> >> It's the inverse problem to merging a list of data.frames into a large
>> >> data.frame just discussed in the "performance of do.call("rbind")"
>> >> thread
>> >>
>> >> I would like to split a data.frame into a list of data.frames
>> >> according to first column.
>> >> This SEEMS to be easily possible with the function base::by. However,
>> >> as soon as the data.frame has a few million rows this function CAN NOT
>> >> BE USED (unless you have PLENTY OF TIME).
>> >>
>> >> for 'by' runtime ~ nrow^2, or formally O(n^2)  (see benchmark below).
>> >>
>> >> So basically I am looking for a similar function with better
>> >> complexity.
>> >>
>> >>
>> >>  > nrows <- c(1e5,1e6,2e6,3e6,5e6)
>> >>>
>> >>> timing <- list()
>> >>> for(i in nrows){
>> >>
>> >> + dum <- peaks[1:i,]
>> >> + timing[[length(timing)+1]] <- system.time(x<- by(dum[,2:3],
>> >> INDICES=list(dum[,1]), FUN=function(x){x}, simplify = FALSE))
>> >> + }
>> >>>
>> >>> names(timing)<- nrows
>> >>> timing
>> >>
>> >> $`1e+05`
>> >>user  system elapsed
>> >>0.050.000.05
>> >>
>> >> $`1e+06`
>> >>user  system elapsed
>> >>1.482.984.46
>> >>
>> >> $`2e+06`
>> >>user  system elapsed
>> >>7.25   11.39   18.65
>> >>
>> >> $`3e+06`
>> >>user  system elapsed
>> >>   16.15   25.81   41.99
>> >>
>> >> $`5e+06`
>> >>user  system elapsed
>> >>   43.22   74.72  118.09
>> >
>> >
>> > I'm not sure that I follow what you're doing, and your example is not
>> > reproducible, since we have no idea what "peaks" is, but on a toy
>> > example
>> > with 5e6 rows in the data frame I got a timing result of
>> >
>> >user  system elapsed
>> >   0.379 0.025 0.406
>> >
>> > when I applied split().  Is this adequately fast? Seems to me that if
>> > you
>> > want to split something, split() would be a good place to start.
>> >
>> > cheers,
>> >
>> > Rolf Turner
>> >
>> > --
>> > Technical Editor ANZJS
>> > Department of Statistics
>> > University of Auckland
>> > Phone: +64-9-373-7599 ext. 88276
>>
>>
>>
>> --
>> Witold Eryk Wolski
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>



-- 
Witold Eryk Wolski

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Logistic regression and robust standard errors

2016-07-01 Thread Achim Zeileis

On Fri, 1 Jul 2016, Faradj Koliev wrote:


Dear Achim Zeileis, 
Many thanks for your quick and informative answer. 

I'm sure that vcovCL should work; however, I experience some problems. 


> coeftest(model, vcov=vcovCL(model, cluster=mydata$ID))

First I got this error: 

Error in vcovCL(model, cluster = mydata$ID) : 
  length of 'cluster' does not match number of observations


This probably means that not all rows of "mydata" have been used for the 
estimation of the model, e.g., due to missing values or something like 
that. Hence, the mismatch seems to occur.
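A minimal sketch of one way to line the two up (my assumption, not
something from the thread), using lm() as a stand-in for polr():

```r
# Rebuild the cluster variable from the rows actually used in the fit,
# so its length matches the number of observations after NA handling.
mydata <- data.frame(y  = c(1, 2, NA, 4, 5),
                     x  = c(2, NA, 6, 8, 10),
                     ID = c("a", "a", "b", "b", "b"))
model <- lm(y ~ x, data = mydata)          # rows 2 and 3 are dropped

used <- as.integer(rownames(model.frame(model)))
cl   <- mydata$ID[used]                    # now length(cl) == nobs(model)
```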



After checking the observations I got this error: 

Error in vcovCL(model, cluster = mydata$ID) : object 'tets' not found
Called from: vcovCL(model, cluster = mydata$ID)
Browse[1]> 


The variable "tets" is presumably one of the regressors in your "model" 
and apparently it cannot be found when extracting the model scores. Maybe 
you haven't stored the model frame and have since deleted the data.


Hard to say without a simple reproducible example.
Z


What can I do to fix it? What am I doing wrong now? 





  1 jul 2016 kl. 14:57 skrev Achim Zeileis
  :

On Fri, 1 Jul 2016, Faradj Koliev wrote:

  Dear all,

  I use the 'polr' command (library: MASS) to estimate an
  ordered logistic regression.

  My model:   summary( model<- polr(y ~ x1+x2+x3+x4+x1*x2
  ,data=mydata, Hess = TRUE))

  But how do I get robust clustered standard errors?
  I've tried coeftest(resA, vcov=vcovHC(resA,
  cluster=lipton$ID))


The vcovHC() function currently does not (yet) have a "cluster"
argument. We are working on it but it's not finished yet.

As an alternative I include the vcovCL() function below that computes
the usual simple clustered sandwich estimator. This can be applied to
"polr" objects and plugged into coeftest(). So

coeftest(resA, vcov=vcovCL(resA, cluster=lipton$ID))

should work.

  and summary(a <- robcov(model,mydata$ID)).


The robcov() function does in principle what you want by I'm not sure
whether it works with polr(). But for sure it works with lrm() from
the "rms" package.

Hope that helps,
Z

vcovCL <- function(object, cluster = NULL, adjust = NULL)
{
 stopifnot(require("sandwich"))

 ## cluster specification
 if(is.null(cluster)) cluster <- attr(object, "cluster")
 if(is.null(cluster)) stop("no 'cluster' specification found")
 cluster <- factor(cluster)

 ## estimating functions and dimensions
 ef <- estfun(object)
 n <- NROW(ef)
 k <- NCOL(ef)
 if(n != length(cluster))
   stop("length of 'cluster' does not match number of observations")
 m <- length(levels(cluster))

 ## aggregate estimating functions by cluster and compute meat
 ef <- sapply(levels(cluster), function(i) colSums(ef[cluster == i, ,
   drop = FALSE]))
 ef <- if(NCOL(ef) > 1L) t(ef) else matrix(ef, ncol = 1L)
 mt <- crossprod(ef)/n

 ## bread
 br <- try(bread(object), silent = TRUE)
 if(inherits(br, "try-error")) br <- vcov(object) * n

 ## put together sandwich
 vc <- 1/n * (br %*% mt %*% br)

 ## adjustment
 if(is.null(adjust)) adjust <- class(object)[1L] == "lm"
 adj <- if(adjust) m/(m - 1L) * (n - 1L)/(n - k) else m/(m - 1L)

 ## return
 return(adj * vc)
}


  Neither works for me. So I wonder what am I doing wrong
  here?


  All suggestions are welcome ? thank you!
  [[alternative HTML version deleted]]

  __
  R-help@r-project.org mailing list -- To UNSUBSCRIBE and
  more, see
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained,
  reproducible code.






[R] Column product

2016-07-01 Thread Steven Yen
A is a 5 x 3 matrix and a is a 3-vector. I'd like to raise A[,1] to the 
power a[1], A[,2] to a[2], and A[,3] to a[3], and obtain the product of 
the resulting columns, as in line 3.

I can also accomplish this with lines 4 and 5. I'd like to have 
rowProducts(B), but no such function exists, so I came up with something 
ugly in line 5: exponentiating the row sums of the logs. Is there a more 
elegant way than line 5 or, better yet, than lines 4 and 5 together? Thanks.

A<-matrix(1:15,nrow=5,byrow=F); A
a<-c(1,2,3)
(A[,1]^a[1])*(A[,2]^a[2])*(A[,3]^a[3])
B<-t(t(A)^a); B
exp(rowSums(log(B)))

Result:

 > A<-matrix(1:15,nrow=5,byrow=F); A
      [,1] [,2] [,3]
 [1,]    1    6   11
 [2,]    2    7   12
 [3,]    3    8   13
 [4,]    4    9   14
 [5,]    5   10   15
 > a<-c(1,2,3)
 > (A[,1]^a[1])*(A[,2]^a[2])*(A[,3]^a[3])
 [1]   47916  169344  421824  889056 1687500
 > B<-t(t(A)^a); B
      [,1] [,2] [,3]
 [1,]    1   36 1331
 [2,]    2   49 1728
 [3,]    3   64 2197
 [4,]    4   81 2744
 [5,]    5  100 3375
 > exp(rowSums(log(B)))
 [1]   47916  169344  421824  889056 1687500
 >
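
For the record, two shorter alternatives (a sketch using the same A and a as above): apply() with prod() gives the row products directly, and because the desired quantity is prod_j A[,j]^a[j] = exp(log(A) %*% a), a single matrix product also works:

```r
A <- matrix(1:15, nrow = 5)
a <- c(1, 2, 3)

## row products of B, without the log/exp detour
B <- t(t(A)^a)
apply(B, 1, prod)
# [1]   47916  169344  421824  889056 1687500

## or skip B entirely: one matrix product computes the same power-product
exp(as.vector(log(A) %*% a))
```

The apply() call is the direct answer to the "rowProducts" wish; the matrix-product form avoids building B at all.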




Re: [R] Logistic regression and robust standard errors

2016-07-01 Thread Faradj Koliev
Dear Achim Zeileis, 

Many thanks for your quick and informative answer. 

I’m sure that vcovCL() should work; however, I am experiencing some problems. 


> coeftest(model, vcov=vcovCL(model, cluster=mydata$ID))

First I got this error: 

Error in vcovCL(model, cluster = mydata$ID) : 
  length of 'cluster' does not match number of observations

After checking the observations I got this error: 

Error in vcovCL(model, cluster = mydata$ID) : object 'tets' not found
Called from: vcovCL(model, cluster = mydata$ID)
Browse[1]> 

What can I do to fix it? What am I doing wrong now? 





> On 1 Jul 2016, at 14:57, Achim Zeileis  wrote:
> 
> On Fri, 1 Jul 2016, Faradj Koliev wrote:
> 
>> Dear all, 
>> 
>> I use the ”polr” command (library: MASS) to estimate an ordered logistic 
>> regression.
>> 
>> My model:   summary( model<- polr(y ~ x1+x2+x3+x4+x1*x2 ,data=mydata, Hess = 
>> TRUE))
>> 
>> But how do I get robust clustered standard errors? 
>> I’ve tried coeftest(resA, vcov=vcovHC(resA, cluster=lipton$ID))
> 
> The vcovHC() function currently does not (yet) have a "cluster" argument. We 
> are working on it but it's not finished yet.
> 
> As an alternative I include the vcovCL() function below that computes the 
> usual simple clustered sandwich estimator. This can be applied to "polr" 
> objects and plugged into coeftest(). So
> 
> coeftest(resA, vcov=vcovCL(resA, cluster=lipton$ID))
> 
> should work.
> 
>> and summary(a <- robcov(model,mydata$ID)).
> 
> The robcov() function does in principle what you want, but I'm not sure whether 
> it works with polr(). But it certainly works with lrm() from the "rms" package.
> 
> Hope that helps,
> Z
> 
> vcovCL <- function(object, cluster = NULL, adjust = NULL)
> {
>  stopifnot(require("sandwich"))
> 
>  ## cluster specification
>  if(is.null(cluster)) cluster <- attr(object, "cluster")
>  if(is.null(cluster)) stop("no 'cluster' specification found")
>  cluster <- factor(cluster)
> 
>  ## estimating functions and dimensions
>  ef <- estfun(object)
>  n <- NROW(ef)
>  k <- NCOL(ef)
>  if(n != length(cluster))
>stop("length of 'cluster' does not match number of observations")
>  m <- length(levels(cluster))
> 
>  ## aggregate estimating functions by cluster and compute meat
>  ef <- sapply(levels(cluster), function(i) colSums(ef[cluster == i, ,
>drop = FALSE]))
>  ef <- if(NCOL(ef) > 1L) t(ef) else matrix(ef, ncol = 1L)
>  mt <- crossprod(ef)/n
> 
>  ## bread
>  br <- try(bread(object), silent = TRUE)
>  if(inherits(br, "try-error")) br <- vcov(object) * n
> 
>  ## put together sandwich
>  vc <- 1/n * (br %*% mt %*% br)
> 
>  ## adjustment
>  if(is.null(adjust)) adjust <- class(object)[1L] == "lm"
>  adj <- if(adjust) m/(m - 1L) * (n - 1L)/(n - k) else m/(m - 1L)
> 
>  ## return
>  return(adj * vc)
> }
> 
> 
>> Neither works for me. So I wonder what am I doing wrong here?
>> 
>> 
>> All suggestions are welcome – thank you!



Re: [R] Merge several datasets into one

2016-07-01 Thread Lida Zeighami
Hi Lily,

I think the code below should work:

f <- list.files("D:/output/test/your foldername",
                full.names = TRUE, recursive = TRUE)

## keep only .csv files (escape the dot and anchor the pattern at the end)
files <- grep("\\.csv$", f)

files_merge <- data.frame()
for (i in seq_along(f[files])) {
  data <- read.csv(file = f[files][i], header = TRUE, sep = ",")
  files_merge <- rbind(files_merge, data)
}

Best,
Lida

On Thu, Jun 30, 2016 at 2:26 PM, lily li  wrote:

> Hi R users,
>
> I'd like to ask that how to merge several datasets into one in R? I put
> these csv files in one folder, and use the lapply function, but it says
> that cannot open file 'xx.csv'. These files have different names, but end
> with .csv extension, and the files have the same header. Thanks for your
> help.
>



Re: [R-es] Repetir datos en una tabla

2016-07-01 Thread Javier Marcuzzi
Dear all,

But the initial question says the data are in a data.frame and that they want to 
convert them from daily to hourly, repeating rows.

As I understand it, for example:
Day1………2
Day2………4

Repeating each day 24 times does not account for leap years (they say they have 
5 years of data).
Obviously I am not going to compute 2 x 24 + 4 x 24, or 2/24 + 4/24 (multiplying 
the daily value by 24, or assigning the same amount to every hour).

It is a different matter if the problem is to build an hourly table spanning five 
years and apply a function that computes something.

Something here does not add up for me.

Javier Rubén Marcuzzi

From: Carlos J. Gil Bellosta 
Sent: Friday, July 1, 2016 9:42
To: Olivier Nuñez

___
R-help-es mailing list
R-help-es@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-help-es

Re: [R] Logistic regression and robust standard errors

2016-07-01 Thread Achim Zeileis

On Fri, 1 Jul 2016, Faradj Koliev wrote:

Dear all, 



I use the ”polr” command (library: MASS) to estimate an ordered logistic regression.

My model:   summary( model<- polr(y ~ x1+x2+x3+x4+x1*x2 ,data=mydata, Hess = 
TRUE))

But how do I get robust clustered standard errors? 


I’ve tried coeftest(resA, vcov=vcovHC(resA, cluster=lipton$ID))


The vcovHC() function currently does not (yet) have a "cluster" argument. 
We are working on it but it's not finished yet.


As an alternative I include the vcovCL() function below that computes the 
usual simple clustered sandwich estimator. This can be applied to "polr" 
objects and plugged into coeftest(). So


coeftest(resA, vcov=vcovCL(resA, cluster=lipton$ID))

should work.


and summary(a <- robcov(model,mydata$ID)).


The robcov() function does in principle what you want, but I'm not sure 
whether it works with polr(). But it certainly works with lrm() from the 
"rms" package.


Hope that helps,
Z

vcovCL <- function(object, cluster = NULL, adjust = NULL)
{
  stopifnot(require("sandwich"))

  ## cluster specification
  if(is.null(cluster)) cluster <- attr(object, "cluster")
  if(is.null(cluster)) stop("no 'cluster' specification found")
  cluster <- factor(cluster)

  ## estimating functions and dimensions
  ef <- estfun(object)
  n <- NROW(ef)
  k <- NCOL(ef)
  if(n != length(cluster))
stop("length of 'cluster' does not match number of observations")
  m <- length(levels(cluster))

  ## aggregate estimating functions by cluster and compute meat
  ef <- sapply(levels(cluster), function(i) colSums(ef[cluster == i, ,
drop = FALSE]))
  ef <- if(NCOL(ef) > 1L) t(ef) else matrix(ef, ncol = 1L)
  mt <- crossprod(ef)/n

  ## bread
  br <- try(bread(object), silent = TRUE)
  if(inherits(br, "try-error")) br <- vcov(object) * n

  ## put together sandwich
  vc <- 1/n * (br %*% mt %*% br)

  ## adjustment
  if(is.null(adjust)) adjust <- class(object)[1L] == "lm"
  adj <- if(adjust) m/(m - 1L) * (n - 1L)/(n - k) else m/(m - 1L)

  ## return
  return(adj * vc)
}



Neither works for me. So I wonder what I am doing wrong here?


All suggestions are welcome – thank you!




Re: [R-es] Repetir datos en una tabla

2016-07-01 Thread Carlos J. Gil Bellosta
And a "classic" solution:

iris[rep(1:nrow(iris), each = 24),]
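
If the hour itself is needed as a column (as in the data.table output quoted below), the same rep() idea extends in base R — a sketch, assuming a daily data.frame like Olivier's tabla:

```r
tabla <- data.frame(dia = 1:10, y = rnorm(10))

## repeat each daily row 24 times and attach the hour 1..24
tabla_horas <- tabla[rep(seq_len(nrow(tabla)), each = 24), ]
tabla_horas$hora <- rep(1:24, times = nrow(tabla))

nrow(tabla_horas)  # 240 = 10 days x 24 hours
```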

Best regards,

Carlos J. Gil Bellosta
http://www.datanalytics.com

On July 1, 2016 at 11:35, Olivier Nuñez  wrote:
> A solution with the data.table package:
>
>> require(data.table)
>> tabla=data.table(dia=1:10,y=rnorm(10))
>> tabla
>     dia           y
>  1:   1 -1.04816325
>  2:   2 -0.23554981
>  3:   3  1.79809995
>  4:   4  0.07578478
>  5:   5 -1.38710527
>  6:   6  2.18929038
>  7:   7  0.52330030
>  8:   8 -0.34695695
>  9:   9 -0.10357643
> 10:  10 -0.76800351
>> tabla[,.(hora=1:24),by=.(dia,y)]
>      dia          y hora
>   1:   1 -1.0481632    1
>   2:   1 -1.0481632    2
>   3:   1 -1.0481632    3
>   4:   1 -1.0481632    4
>   5:   1 -1.0481632    5
>  ---
> 236:  10 -0.7680035   20
> 237:  10 -0.7680035   21
> 238:  10 -0.7680035   22
> 239:  10 -0.7680035   23
> 240:  10 -0.7680035   24
>
>
> Un saludo. Olivier
>
> - Original message -
> From: "Novvier Marco Uscuchagua Cornelio" 
> To: r-help-es@r-project.org
> Sent: Thursday, June 30, 2016 19:15:34
> Subject: [R-es] Repetir datos en una tabla
>
> Good day friends,
> I have a table with daily data records and I would like to convert it to
> hourly, that is, repeat each row 24 times. I can do it manually, but it is a
> 5-year record and that would take forever. Do you know the fastest method?
> Thanks in advance.
> Regards, Marco.
>



[R] Logistic regression and robust standard errors

2016-07-01 Thread Faradj Koliev
Dear all, 


I use ”polr” command (library: MASS) to estimate an ordered logistic regression.

My model:   summary( model<- polr(y ~ x1+x2+x3+x4+x1*x2 ,data=mydata, Hess = 
TRUE))

But how do I get robust clustered standard errors? 

I’ve tried   coeftest(resA, vcov=vcovHC(resA, cluster=lipton$ID)) and 
summary(a <- robcov(model,mydata$ID)). Neither works for me. So I wonder what 
I am doing wrong here? 


All suggestions are welcome – thank you! 

Re: [R] Merge several datasets into one

2016-07-01 Thread ruipbarradas
Hello,

Maybe something like this.

fls <- list.files(pattern = "\\.csv$")  # pattern is a regex: escape the dot, anchor at the end
dat.list <- lapply(fls, read.csv)
dat <- do.call(rbind, dat.list)

Hope this helps,

Rui Barradas
 

Citando lily li :

> Hi R users,
>
> I'd like to ask that how to merge several datasets into one in R? I put
> these csv files in one folder, and use the lapply function, but it says
> that cannot open file 'xx.csv'. These files have different names, but end
> with .csv extension, and the files have the same header. Thanks for your
> help.
>

 
