Re: [R] Simple spectral analysis

2007-01-09 Thread Earl F. Glynn
"Georg Hoermann" <[EMAIL PROTECTED]> wrote in message 
news:[EMAIL PROTECTED]
> Peter Dalgaard wrote:
>> Earl F. Glynn wrote:

> Thanks a lot for the help. I will post the script when its ready
> (an introduction for our biology students to time series, just 8 hours)

I've been working with one of our labs here to find "cyclic" genes from 
microarray data.  The supplement for the paper we published is here (I 
haven't bothered making the code general enough for a package yet):

http://research.stowers-institute.org/efg/2005/LombScargle/index.htm



Normally we only have about 20 to 50 points in our time series for each 
gene.  With missing data a problem, I used a "Lomb-Scargle" approach to find 
the periodicity.  With Fourier analysis, one must impute any missing data 
points, but with Lomb-Scargle you just process the data you have without any 
imputation.



Perhaps you or your students would be interested in the "Numerical 
Experiments" on this page 
http://research.stowers-institute.org/efg/2005/LombScargle/supplement/NumericalExperiments/index.htm



I was curious how well the Lomb-Scargle technique would work with your 
data -- see the R code below.   Normally the Lomb-Scargle periodogram shows 
a single peak when there is a single dominant frequency.  The Peak 
Significance curve for all your data is a difficult to interpret, and I'm 
not sure the statistical tests are valid (without some tweaks) for your size 
dataset.



I took a random sample of 50 of your ~3000 data points and analyzed those --  
see the second code block below.  [For 50 data points I know all the 
assumptions are "good enough" for the statistics being computed.]  The 
periodogram here shows a single peak for period 365.6 days, which has good 
statistical significance.  Other subset samples can show harmonic 
frequencies, sometimes.





# efg, 9 Jan 2007

air = read.csv("http://www.hydrology.uni-kiel.de/~schorsch/air_temp.csv";)
#air <- read.csv("air_temp.csv")

TempAirC <- air$T_air
Time <- as.Date(air$Date, "%d.%m.%Y")
N <- length(Time)

# Lomb-Scargle code
source("http://research.stowers-institute.org/efg/2005/LombScargle/R/LombScargle.R";)
MAXSPD <<- 1500
unit <<- "day"
M <- N# Usually use factor of 2 or 4, but with large N use 1 instead

# Look at test frequencies corresponding to periods of 200 days to 500 days: 
f = 1/T
TestFrequencies <- (1/500) + (1/200 - 1/500) * (1:M / M)

# Use Horne & Baliunas' estimate of independent frequencies
Nindependent <- NHorneBaliunas(length(Time))  # valid for this size?

# Fairly slow with this large dataset

ComputeAndPlotLombScargle(as.numeric(Time), TempAirC,
  TestFrequencies, Nindependent,
  "Air Temperature [C]")





# Could get good results with fewer points too, say 50 chosen at random

MAXSPD <<- 25
TempAirC <- air$T_air
Time <- as.Date(air$Date, "%d.%m.%Y")

set.seed(19)  # For reproducible results
RandomSet <- sample(1:length(Time), 50)
TempAirC <- TempAirC[RandomSet]
Time <-Time[RandomSet]

N <- length(Time)
M <- 4 * N# Usually use factor of 2 or 4

# Look at test frequencies corresponding to periods of 200 days to 500 days: 
f = 1/T
TestFrequencies <- (1/500) + (1/200 - 1/500) * (1:M / M)

# Use Horne & Baliunas' estimate of independent frequencies
Nindependent <- NHorneBaliunas(length(Time))

# Very fast to compute for only 50 points

ComputeAndPlotLombScargle(as.numeric(Time), TempAirC,
  TestFrequencies, Nindependent,
  "Air Temperature [C]")





efg



Earl F. Glynn

Scientific Programmer

Stowers Institute

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Simple spectral analysis

2007-01-08 Thread Petr Pikal
Hi

without beeing specific in spectrum analysis you will get frequencies 
and spectral densities fro spectrum()

>From help page

An object of class "spec", which is a list containing at least the 
following components: 

freq vector of frequencies at which the spectral density is 
estimated. (Possibly approximate Fourier frequencies.) The units are 
the reciprocal of cycles per unit time (and not per observation 
spacing): see Details below. 
spec Vector (for univariate series) or matrix (for multivariate 
series) of estimates of the spectral density at frequencies 
corresponding to freq. 



This is the important part:

**The result is returned invisibly if plot is true.**

So if you call

spectrum(data) you will get plot but in case

sp <- spectrum(data)

you will get also object sp which has above mentioned components. 
Actual periods are obtainable by

n/sp$freq

HTH
Petr

On 8 Jan 2007 at 17:12, Georg Hoermann wrote:

Date sent:  Mon, 08 Jan 2007 17:12:34 +0100
From:   Georg Hoermann <[EMAIL PROTECTED]>
To: r-help@stat.math.ethz.ch
Subject:    [R] Simple spectral analysis

> Hello world,
> 
> I am actually trying to transfer a lecture from Statistica to
> R and I ran into problems with spectral analysis, I think I
> just don't get it 8-(
> (The posting from "FFT, frequs, magnitudes, phases" from 2005
> did not enlighten me)
> 
> As a starter for the students I have a 10year data set of air 
> temperature with daily values  and I try to
> get a periodogram where the annual period (365 days) should be clearly
> visible (in statistica I can get the frequencies and the period). I
> tried the spectrum() and pgram() functions, but did not find a way
> through... The final aim would be to get the periodogram (and the
> residuals from the reassembled data set...)
> 
> Thanks and greetings,
> Georg
> 
> The data set:
> 
> air =
> read.csv("http://www.hydrology.uni-kiel.de/~schorsch/air_temp.csv";)
> airtemp = ts(T_air, start=c(1989,1), freq = 365) plot(airtemp)
> 
> 
> -- 
> Georg Hoermann, Dep. of Hydrology, Ecology, Kiel University, Germany
> +49/431/23761412, mo: +49/171/4995884, icq:348340729, skype: ghoermann
> 
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html and provide commented,
> minimal, self-contained, reproducible code.

Petr Pikal
[EMAIL PROTECTED]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Simple spectral analysis

2007-01-08 Thread Georg Hoermann
Peter Dalgaard wrote:
> Earl F. Glynn wrote:
>>   
> The defaults for detrending and tapering could be involved. Putting, 
> e.g., detrend=F gives me a spectrum with substantially higher 
> low-frequency components.
> 
> But what was the problem in the first place?
> 
understanding how this things work in R compared to other packages 8-)

Thanks a lot for the help. I will post the script when its ready
(an introduction for our biology students to time series, just 8 hours)

Georg



-- 
Georg Hoermann, Dep. of Hydrology, Ecology, Kiel University, Germany
+49/431/23761412, mo: +49/171/4995884, icq:348340729, skype: ghoermann

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Simple spectral analysis

2007-01-08 Thread Peter Dalgaard
Earl F. Glynn wrote:
> "Georg Hoermann" <[EMAIL PROTECTED]> wrote in message 
> news:[EMAIL PROTECTED]
>   
>> The data set:
>>
>> air = read.csv("http://www.hydrology.uni-kiel.de/~schorsch/air_temp.csv";)
>> airtemp = ts(T_air, start=c(1989,1), freq = 365)
>> plot(airtemp)
>> 
>
> Maybe this will get you started using fft or spectrum -- I'm not sure why 
> the spectrum answer is only close:
>   
The defaults for detrending and tapering could be involved. Putting, 
e.g., detrend=F gives me a spectrum with substantially higher 
low-frequency components.

But what was the problem in the first place?

spec.pgram(airtemp,xlim=c(0,10))

abline(v=1:10,col="red")


shows a strong peak at 1 and maybe a weak peak at 2, and the other 
integer frequencies less pronounced. This seems reasonably in tune with

> x <- (1:3652)/365

> summary(lm(air$T_air ~ sin(2*pi*x)+cos(2*pi*x)+  sin(4*pi*x)+cos(4*pi*x) +  
> sin(6*pi*x)+cos(6*pi*x)+x))

Call:

lm(formula = air$T_air ~ sin(2 * pi * x) + cos(2 * pi * x) + 

sin(4 * pi * x) + cos(4 * pi * x) + sin(6 * pi * x) + cos(6 * 

pi * x) + x)

Residuals:

 Min   1Q   Median   3Q  Max 

-16.3109  -2.3317  -0.1080   2.2063  10.6249 

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept)  9.679040.11267  85.909   <2e-16 ***

sin(2 * pi * x) -2.645540.07967 -33.208   <2e-16 ***

cos(2 * pi * x) -7.735200.07938 -97.443   <2e-16 ***

sin(4 * pi * x)  0.929670.07948  11.696   <2e-16 ***

cos(4 * pi * x)  0.139820.07938   1.761   0.0783 .  

sin(6 * pi * x)  0.133200.07945   1.676   0.0937 .  

cos(6 * pi * x)  0.144800.07938   1.824   0.0682 .  

x   -0.237730.01952 -12.179   <2e-16 ***

---

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 


Residual standard error: 3.393 on 3644 degrees of freedom

Multiple R-Squared: 0.7486, Adjusted R-squared: 0.7482 

F-statistic:  1550 on 7 and 3644 DF,  p-value: < 2.2e-16 





> air = read.csv("http://www.hydrology.uni-kiel.de/~schorsch/air_temp.csv";)
>
> TempAirC <- air$T_air
> Time <- as.Date(air$Date, "%d.%m.%Y")
> N <- length(Time)
>
> oldpar <- par(mfrow=c(4,1))
> plot(TempAirC ~ Time)
>
> # Using fft
> transform <- fft(TempAirC)
>
> # Extract DC component from transform
> dc <- Mod(transform[1])/N
>
> periodogram <- round( Mod(transform)^2/N, 3)
>
> # Drop first element, which is the mean
> periodogram <- periodogram[-1]
>
> # keep first half up to Nyquist limit
> periodogram <- periodogram[1:(N/2)]
>
> # Approximate number of data points in single cycle:
> print( N / which(max(periodogram) == periodogram) )
>
> # plot spectrum against Fourier Frequency index
> plot(periodogram, col="red", type="o",
>  xlab="Fourier Frequency Index", xlim=c(0,25),
>  ylab="Periodogram",
>  main="Periodogram derived from 'fft'")
>
> # Using spectrum
> s <- spectrum(TempAirC, taper=0, detrend=FALSE, col="red",
>   main="Spectral Density")
>
> plot(log(s$spec) ~ s$freq, col="red", type="o",
>  xlab="Fourier Frequency", xlim=c(0.0, 0.005),
>  ylab="Log(Periodogram)",
>  main="Periodogram from 'spectrum'")
>
> cat("Max frequency\n")
> maxfreq <- s$freq[ which(max(s$spec) == s$spec) ]
>
> # Period will be 1/frequency:
> cat("Corresponding period\n")
> print(1/maxfreq)
>
> par(oldpar)
>
>
>
> efg
>
> Earl F. Glynn
> Scientific Programmer
> Stowers Institute
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Simple spectral analysis

2007-01-08 Thread Earl F. Glynn
"Georg Hoermann" <[EMAIL PROTECTED]> wrote in message 
news:[EMAIL PROTECTED]
> The data set:
>
> air = read.csv("http://www.hydrology.uni-kiel.de/~schorsch/air_temp.csv";)
> airtemp = ts(T_air, start=c(1989,1), freq = 365)
> plot(airtemp)

Maybe this will get you started using fft or spectrum -- I'm not sure why 
the spectrum answer is only close:

air = read.csv("http://www.hydrology.uni-kiel.de/~schorsch/air_temp.csv";)

TempAirC <- air$T_air
Time <- as.Date(air$Date, "%d.%m.%Y")
N <- length(Time)

oldpar <- par(mfrow=c(4,1))
plot(TempAirC ~ Time)

# Using fft
transform <- fft(TempAirC)

# Extract DC component from transform
dc <- Mod(transform[1])/N

periodogram <- round( Mod(transform)^2/N, 3)

# Drop first element, which is the mean
periodogram <- periodogram[-1]

# keep first half up to Nyquist limit
periodogram <- periodogram[1:(N/2)]

# Approximate number of data points in single cycle:
print( N / which(max(periodogram) == periodogram) )

# plot spectrum against Fourier Frequency index
plot(periodogram, col="red", type="o",
 xlab="Fourier Frequency Index", xlim=c(0,25),
 ylab="Periodogram",
 main="Periodogram derived from 'fft'")

# Using spectrum
s <- spectrum(TempAirC, taper=0, detrend=FALSE, col="red",
  main="Spectral Density")

plot(log(s$spec) ~ s$freq, col="red", type="o",
 xlab="Fourier Frequency", xlim=c(0.0, 0.005),
 ylab="Log(Periodogram)",
 main="Periodogram from 'spectrum'")

cat("Max frequency\n")
maxfreq <- s$freq[ which(max(s$spec) == s$spec) ]

# Period will be 1/frequency:
cat("Corresponding period\n")
print(1/maxfreq)

par(oldpar)



efg

Earl F. Glynn
Scientific Programmer
Stowers Institute

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Simple spectral analysis

2007-01-08 Thread Georg Hoermann
Hello world,

I am actually trying to transfer a lecture from Statistica to
R and I ran into problems with spectral analysis, I think I
just don't get it 8-(
(The posting from "FFT, frequs, magnitudes, phases" from 2005
did not enlighten me)

As a starter for the students I have a 10year data set of air 
temperature with daily values  and I try to
get a periodogram where the annual period (365 days) should be clearly
visible (in statistica I can get the frequencies and the period).
I tried the spectrum() and pgram() functions, but
did not find a way through... The final aim would be to
get the periodogram (and the residuals from the reassembled data set...)

Thanks and greetings,
Georg

The data set:

air = read.csv("http://www.hydrology.uni-kiel.de/~schorsch/air_temp.csv";)
airtemp = ts(T_air, start=c(1989,1), freq = 365)
plot(airtemp)


-- 
Georg Hoermann, Dep. of Hydrology, Ecology, Kiel University, Germany
+49/431/23761412, mo: +49/171/4995884, icq:348340729, skype: ghoermann

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.