Re: [R] Using PCA to filter a series

2014-10-04 Thread Jonathan Thayn
This is exactly what I was looking for. Thank you.


Jonathan Thayn




On Oct 3, 2014, at 10:32 AM, David L Carlson wrote:

> You can reconstruct the data from the first component. Here's an example 
> using singular value decomposition on the original data matrix:
> 
>> d <- cbind(d1, d2, d3, d4)
>> d.svd <- svd(d)
>> new <- d.svd$u[,1] * d.svd$d[1]
> 
> new is basically your cp1. If we multiply it by each of the loadings, we can 
> create reconstructed values based on the first component:
> 
>> dnew <- sapply(d.svd$v[,1], function(x) new * x)
>> round(head(dnew), 1)
>  [,1]  [,2]  [,3]  [,4]
> [1,] 119.3 134.1 135.7 134.6
> [2,] 104.2 117.2 118.6 117.6
> [3,] 109.7 123.3 124.8 123.8
> [4,] 109.3 122.9 124.3 123.3
> [5,] 105.8 119.0 120.4 119.4
> [6,] 111.5 125.4 126.9 125.8
>> head(d)
>  d1  d2  d3  d4
> [1,] 113 138 138 134
> [2,] 108 115 120 115
> [3,] 105 127 129 120
> [4,] 103 127 129 120
> [5,] 109 119 120 117
> [6,] 115 126 126 123
> 
>> diag(cor(d, dnew))
> [1] 0.9233742 0.9921703 0.9890085 0.9910287
> 
> Since you want a single variable to stand for all four, you could scale new 
> to the mean:
> 
>> newd <- new*mean(d.svd$v[,1])
>> head(newd)
> [1] 130.9300 114.3972 120.3884 119.9340 116.1588 122.3983
> 
> -
> David L Carlson
> Department of Anthropology
> Texas A&M University
> College Station, TX 77840-4352
> 
> 
> 
> -Original Message-
> From: Jonathan Thayn [mailto:jth...@ilstu.edu] 
> Sent: Thursday, October 2, 2014 11:11 PM
> To: David L Carlson
> Cc: r-help@r-project.org
> Subject: Re: [R] Using PCA to filter a series
> 
> I suppose I could calculate the eigenvectors directly and not worry about 
> centering the time-series, since they essentially the same range to begin 
> with:
> 
> vec <- eigen(cor(cbind(d1,d2,d3,d4)))$vector
> cp <- cbind(d1,d2,d3,d4)%*%vec
> cp1 <- cp[,1]
> 
> I guess there is no way to reconstruct the original input data using just the 
> first component, though, is there? Not the original data in it entirety, just 
> one time-series that we representative of the general pattern. Possibly 
> something like the following, but with just the first component:
> 
> o <- cp%*%solve(vec)
> 
> Thanks for your help. It's been a long time since I've played with PCA.
> 
> Jonathan Thayn
> 
> 
> 
> 
> On Oct 2, 2014, at 4:59 PM, David L Carlson wrote:
> 
>> I think you want to convert your principal component to the same scale as 
>> d1, d2, d3, and d4. But the "original space" is a 4-dimensional space in 
>> which d1, d2, d3, and d4 are the axes, each with its own mean and standard 
>> deviation. Here are a couple of possibilities
>> 
>> # plot original values for comparison
>>> matplot(cbind(d1, d2, d3, d4), pch=20, col=2:5)
>> # standardize the pc scores to the grand mean and sd
>>> new1 <- scale(pca$scores[,1])*sd(c(d1, d2, d3, d4)) + mean(c(d1, d2, d3, 
>>> d4))
>>> lines(new1)
>> # Use least squares regression to predict the row means for the original 
>> four variables
>>> new2 <- predict(lm(rowMeans(cbind(d1, d2, d3, d4))~pca$scores[,1]))
>>> lines(new2, col="red")
>> 
>> -
>> David L Carlson
>> Department of Anthropology
>> Texas A&M University
>> College Station, TX 77840-4352
>> 
>> 
>> 
>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
>> Behalf Of Don McKenzie
>> Sent: Thursday, October 2, 2014 4:39 PM
>> To: Jonathan Thayn
>> Cc: r-help@r-project.org
>> Subject: Re: [R] Using PCA to filter a series
>> 
>> 
>> On Oct 2, 2014, at 2:29 PM, Jonathan Thayn  wrote:
>> 
>>> Hi Don. I would like to "de-rotate� the first component back to its 
>>> original state so that it aligns with the original time-series. My goal is 
>>> to create a �cleaned�, or a �model� time-series from which noise has been 
>>> removed. 
>> 
>> Please cc the list with replies. It�s considered courtesy plus you�ll get 
>> more help that way than just from me.
>> 
>> Your goal sounds almost metaphorical, at least to me.  Your first axis 
>> �aligns� with the original time series already in that it captures the 
>> dominant variation
>> across all four. Beyond that, there are many approaches to signal/noise 
>> relations within time-series analysis. I am not a good source of help on 
>> these, and you probably need a statistical consult (locally?), which is not 
>> the function of this list.
>> 
>>> 
>>> 
>>> Jonathan Thayn
>>> 
>>> 
>>> 
>>> On Oct 2, 2014, at 2:33 PM, Don McKenzie  wrote:
>>> 
 
 On Oct 2, 2014, at 12:18 PM, Jonathan Thayn  wrote:
 
> I have four time-series of similar data. I would  like to combine these 
> into a single, clean time-series. I could simply find the mean of each 
> time period, but I think that using principal components analysis should 
> extract the most salient pattern and ignore some of the noise. I can 
> compute components using princomp
> 
> 
> d1 <- c(113, 108, 105, 103, 109, 115, 115, 102, 102, 11

Re: [R] R Markdown (Rstudio) Limit Results in knit Pdf

2014-10-04 Thread Michael Friendly
This is a knitr-specific question, and you are probably better off 
posting to the stackoverflow knitr questions site,

http://stackoverflow.com/questions/tagged/knitr

Nonetheless, here is what I use to add an output.lines options to chunk 
output.  This works for me using LaTeX output;  you can try it with 
markdown...


  # knitr hook function to allow an output.lines option
  # e.g.,
  #   output.lines=12 prints lines 1:12 ...
  #   output.lines=1:12 does the same
  #   output.lines=3:15 prints lines ... 3:15 ...
  #   output.lines=-(1:8) removes lines 1:8 and prints ... 9:n ...
  #   No allowance for anything but a consecutive range of lines

  hook_output <- knit_hooks$get("output")
  knit_hooks$set(output = function(x, options) {
 lines <- options$output.lines
 if (is.null(lines)) {
   return(hook_output(x, options))  # pass to default hook
 }
 x <- unlist(strsplit(x, "\n"))
 more <- "..."
 if (length(lines)==1) {# first n lines
   if (length(x) > lines) {
 # truncate the output, but add 
 x <- c(head(x, lines), more)
   }
 } else {
   x <- c(if (abs(lines[1])>1 | lines[1]<0) more else NULL,
  x[lines],
  if (length(x)>lines[abs(length(lines))]) more else NULL
 )
 }
 # paste these lines together
 x <- paste(c(x, ""), collapse = "\n")
 hook_output(x, options)
   })



On 10/4/2014 11:08 AM, Teis M. Kristensen wrote:

Hi all,

I am writing here because I would like to limit the number of lines that are 
produced from a function when I knit my markdown document in R.

The code is written down as following and gives 50+ lines of data when run. My 
goal is to only have 9 lines of code produced by the sedist function.

```{r, results=1:9}
sedist(FILENAME, method="correlation")
```

I have tried using {r, message=1:9}, {r, Hide=1:9} and similar.

Please let me if you have a solution.

Best,
Teis Moeller Kristensen
School of Communication and Information
Rutgers University
Office ANX A - 103


[[alternative HTML version deleted]]




--
Michael Friendly Email: friendly AT yorku DOT ca
Professor, Psychology Dept. & Chair, Quantitative Methods
York University  Voice: 416 736-2100 x66249 Fax: 416 736-5814
4700 Keele StreetWeb:   http://www.datavis.ca
Toronto, ONT  M3J 1P3 CANADA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Inference() Function Insisting That I Use ANOVA Versus Two-Sided Hypothesis Test; R/RStudio

2014-10-04 Thread Jason Eyerly
Hello All.

I'm trying to use a custom function called Inference() as seen in the code 
below. There's no documentation for the function, but it is from my DASI class 
in Coursera. According to the feedback I have received, I am using the function 
properly. I'm trying to do a two-sided hypothesis test between my class 
variable and my wordsum variable. However, the function/R/R Studio keep 
insisting I do an ANOVA test. This doesn't work for me since I'm trying to 
reject the null, and create a confidence interval between the difference of two 
independent means. I've looked at the function, but as I'm no R expert, I don't 
see anything out of the ordinary. Any help is greatly appreciated.

load(url("http://bit.ly/dasi_gss_ws_cl";))
source("http://bit.ly/dasi_inference";)

summary(gss)
by(gss$wordsum, gss$class, mean)
boxplot(gss$wordsum ~ gss$class)

gss_clean = na.omit(subset(gss, class == "WORKING" | class =="LOWER"))

inference(y = gss_clean$wordsum, x = gss_clean$class, est = "mean", type = 
"ht", 
  null = 0, alternative = "twosided", method = "theoretical�)

Returns:

Response variable: numerical, Explanatory variable: categorical
Error: Use alternative = 'greater' for ANOVA or chi-square test.
In addition: Warning message:
Ignoring null value since it's undefined for ANOVA.

Best Regards,
Jason Eyerly
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] get names of glm and related families from an object

2014-10-04 Thread Michael Friendly
In a function I'm writing, I want to handle a variety of families of 
glm() and related models
including MASS::glm.nb() and hopefully countreg::hurdle(), zeroinfl().  
A central part of the function

is a switch() call

family <- object$family$family
switch(family,
"binomial" = {
},
"quasibinomial" = {
},
"poisson" = {
},
"quasipoisson" = {
},
"gaussian" = {
}
)

But I discovered that the object$family$family slot doesn't work the 
same way with glm.nb --

I get, e.g.,
> nmes.nbin2$family$family
[1] "Negative Binomial(1.2354)"
and the countreg functions don't return a family component.

I think I have to also use class(object), and a more complicated 
computation of family,

but maybe there's a simpler way?

-Michael

--
Michael Friendly Email: friendly AT yorku DOT ca
Professor, Psychology Dept. & Chair, Quantitative Methods
York University  Voice: 416 736-2100 x66249 Fax: 416 736-5814
4700 Keele StreetWeb:   http://www.datavis.ca
Toronto, ONT  M3J 1P3 CANADA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Rolling window linear regression

2014-10-04 Thread Bert Gunter
Use ?loess instead.

-- Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
Clifford Stoll




On Sat, Oct 4, 2014 at 12:09 AM, Grace Shi <1104271...@qq.com> wrote:
>
> I have to do roll regression based on the Daily data. I use the past three
> weeks of daily returns as the estimation window and the regression is
> estimated rolling forward one week at a time generating time series
> estimates of beta. I know I should use the rollapply in zoo package. but I
> am not sure how to  do.
>
> data example:
>
>  stockday week  y  x
>  "1"  2009-01-02 2009-01  0.89  2.45
>  "1"  2009-01-03 2009-01  1.21  1.90
>  "1"  2009-01-04 2009-01  0.12  0.89
>  "1"  2009-01-05 2009-01  1.45  2.78
>  "1"  2009-01-06 2009-01  1.98  0.98
>  "1"  2009-01-09 2009-02  3.34  1.23
>  "1"  2009-01-10 2009-02  0.12  0.89
>  "1"  2009-01-11 2009-02  1.45  2.78
>  "1"  2009-01-13 2009-02  1.98  0.98
>  "1"  2009-01-16 2009-03  3.38  0.93
>  "1"  2009-01-17 2009-03  6.56  3.90
>  "1"  2009-01-18 2009-03  5.09  3.45
>  "1"  2009-01-19 2009-03  5.89  3.78
>
>
>
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question about range of letters

2014-10-04 Thread Robert Baer


On 10/4/2014 8:21 AM, Nia Gupta wrote:

Hello,

I have a column with a bunch of letters. I would like to keep some of these 
letters (A,C,D,L) and turn the rest into 'X'.

I have tried using ifelse with '|' in between the argument but it didn't work 
nor did 4 separate ifelse statements.

Example, I currently have:
LettersABCDE
I would like to have:
LettersAXCDX
Thank you

[[alternative HTML version deleted]]

try:
let = sample(LETTERS[1:5],100,replace=TRUE)
let
let1 =  ifelse(let %in% c('A','C','D'),let,'X')
let1


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question about range of letters

2014-10-04 Thread Ben Bolker
Nia Gupta  yahoo.com> writes:

> 
> Hello, 
 
> I have a column with a bunch of letters. I would like to keep some
> of these letters (A,C,D,L) and turn the rest into 'X'.
 
> I have tried using ifelse with '|' in between the argument but it
> didn't work nor did 4 separate ifelse statements.
 
> Example, I currently have:
> Letters    A    B    C    D    E
> I would like to have:
> Letters    A    X    C    D    X
> Thank you

  %in% will be helpful

 Letters <- LETTERS[1:5]
 targets <- c("A","C","D","L")
 Letters[!Letters %in% targets] <- "X"

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] R Markdown (Rstudio) Limit Results in knit Pdf

2014-10-04 Thread Teis M. Kristensen
Hi all,

I am writing here because I would like to limit the number of lines that are 
produced from a function when I knit my markdown document in R.

The code is written down as following and gives 50+ lines of data when run. My 
goal is to only have 9 lines of code produced by the sedist function.

```{r, results=1:9}
sedist(FILENAME, method="correlation")
```

I have tried using {r, message=1:9}, {r, Hide=1:9} and similar. 

Please let me if you have a solution.

Best,
Teis Moeller Kristensen
School of Communication and Information
Rutgers University
Office ANX A - 103


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Question about range of letters

2014-10-04 Thread Nia Gupta
Hello, 

I have a column with a bunch of letters. I would like to keep some of these 
letters (A,C,D,L) and turn the rest into 'X'. 

I have tried using ifelse with '|' in between the argument but it didn't work 
nor did 4 separate ifelse statements. 

Example, I currently have:
Letters    A    B    C    D    E
I would like to have:
Letters    A    X    C    D    X
Thank you

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Rolling window linear regression

2014-10-04 Thread Grace Shi

I have to do roll regression based on the Daily data. I use the past three
weeks of daily returns as the estimation window and the regression is
estimated rolling forward one week at a time generating time series
estimates of beta. I know I should use the rollapply in zoo package. but I
am not sure how to  do.

data example:

 stockday week  y  x
 "1"  2009-01-02 2009-01  0.89  2.45
 "1"  2009-01-03 2009-01  1.21  1.90
 "1"  2009-01-04 2009-01  0.12  0.89
 "1"  2009-01-05 2009-01  1.45  2.78
 "1"  2009-01-06 2009-01  1.98  0.98
 "1"  2009-01-09 2009-02  3.34  1.23
 "1"  2009-01-10 2009-02  0.12  0.89
 "1"  2009-01-11 2009-02  1.45  2.78
 "1"  2009-01-13 2009-02  1.98  0.98
 "1"  2009-01-16 2009-03  3.38  0.93
 "1"  2009-01-17 2009-03  6.56  3.90
 "1"  2009-01-18 2009-03  5.09  3.45
 "1"  2009-01-19 2009-03  5.89  3.78

 


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] merge by time, certain value if 5 min before and after an "event"

2014-10-04 Thread Dagmar
Thank you Jean, Petr, Terry, William and everyone else who thought about 
my problem.
It is sooo good that this mailing list exists!

I solved my problem using Petr's suggestion, that didn't seem so 
complicated and worked fine for me.
Thanks again and have a great weekend,
Dagmar



Am 02.10.2014 um 23:38 schrieb Adams, Jean:
> Thanks, Dagmar.
>
> So, shouldn't row 3 with a time of 09:51:01 be "low" and not "high"?
>
> Jean
>
> On Thu, Oct 2, 2014 at 4:25 PM, Dagmar  > wrote:
>
> Dear Jean and all,
>
> I want all lines to be "low", but during times 9:55 - 10:05 a.m
> (i.e. a
> timespan of 10 min) I want them to be "high".
> In my real data "low" and "high" refer to "lowtide" and "hightide" in
> the waddensea and I want to assign the location of my animal at
> the time
> it was taken to the tide (that means, there was water not only exactly
> at 10:00 (as taken from official data) but also 5 min before and
> after).
>
> I hope that is more understandable, if not ask again. Thanks for
> trying
> to help,
> Dagmar
>
> Am 02.10.2014 um 23:01 schrieb Adams, Jean:
> > Dagmar,
> >
> > Can you explain more fully why rows 1, 2, and 5 in your result are
> > "low" and rows 3 and 4 are "high"?  It is not clear to me from the
> > information you have provided.
> >
> > > result[c(1, 2, 5), ]
> > Timestamp location Event
> > 1 24.09.2012 09:05:011 low
> > 2 24.09.2012 09:49:502 low
> > 5 24.09.2012 10:05:105 low
> >
> > > result[3:4, ]
> > Timestamp location Event
> > 3 24.09.2012 09:51:013  high
> > 4 24.09.2012 10:04:501  high
> >
> > Jean
> >
> >
> > On Thu, Oct 2, 2014 at 8:13 AM, Dagmar  
> > >> wrote:
> >
> > Hello! I hope someone can help me. It would save me days of
> work.
> > Thanks in
> > advance!
> > I have two dataframes which look like these:
> >
> >
> > myframe <- data.frame (Timestamp=c("24.09.2012 09:00:00",
> "24.09.2012
> > 10:00:00",
> > "24.09.2012 11:00:00"), Event=c("low","high","low") )
> > myframe
> >
> >
> > mydata <- data.frame ( Timestamp=c("24.09.2012 09:05:01",
> "24.09.2012
> > 09:49:50", "24.09.2012 09:51:01", "24.09.2012 10:04:50",
> "24.09.2012
> > 10:05:10")
> > , location=c("1","2","3","1","5") )
> > mydata
> >
> >
> > # I want to merge them by time so I have a dataframe which looks
> > like this
> > in the end (i.e. "Low"  during 5 min before and after "high" )
> >
> > result <- data.frame ( Timestamp=c("24.09.2012 09:05:01",
> "24.09.2012
> > 09:49:50", "24.09.2012 09:51:01", "24.09.2012 10:04:50",
> "24.09.2012
> > 10:05:10")
> > , location=c("1","2","3","1","5") ,
> > Event=c("low", "low","high","high","low"))
> > result
> >
> > Anyone knows how do merge them?
> > Best regards,
> > Dagmar
> >
> > __
> > R-help@r-project.org 
> >
> mailing list
> >https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible
> code.
> >
> >
>
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org  mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.