date:20120129

Re: [R] question about ... questions/ code

2012-01-29 Thread R. Michael Weylandt

It's standard form to cc the list so that replies and experience get pooled.

This sounds like a problem with your data set to be honest. Can you
post a link to it? If it's not publicly available, there's probably
nothing that we (having no access to it) can do to help interpret.

If you know the structure of the excel file, the colnames argument of
read.table will let you say what you want the names to be once they're
in R: that could be of help.

Michael

On Mon, Jan 30, 2012 at 12:02 AM, Nicole Marie Ford  wrote:
> Michael,
>
> I am sorry if I was not clear.
>
> The code book is a list of questions (over 800) from the Russia Barometer 
> survey from 2009.  I am sure it exists somewhere in hardcopy, however I 
> downloaded it from UK archive, and is in Excel.
>
> Other datasets I have used list their variables.  Further, they tend to be 
> intuitive, meaning, for example, question 9 is listed as variable Q9 if I 
> were to do names(dataset),  etc.  I recode them later for my own use.  But 
> this layout makes it easier to find which named variable in the dataset goes 
> to which question.  Clearly, that is not what is happening here.
>
> Does that make sense?
>
> ~Nicole
>
>
>
> - Original Message -
> From: "R. Michael Weylandt" 
> To: "Nicole Marie Ford" 
> Cc: "r-help" 
> Sent: Sunday, January 29, 2012 9:51:37 PM
> Subject: Re: [R] question about ... questions/ code
>
> Perhaps I'm misunderstanding you, but it doesn't sound like this is
> much of an R question at all: what is "the code book"? If it's an
> actual (dead tree) book, I don't think there's much you can do in R to
> automate identifications; if it's an online API, you *might* be able
> to rig a matching algorithm.
>
> Still, hopefully if you say a little more about your data source, it
> might be possible to help -- it doesn't sound like a pleasant
> situation at all...
>
> Michael
>
> Here's a (very untested) shot at matching known levels to columns of
> levels, but it's assuming there's going to be a perfect match, and
> that your book encodes them the same way as the data source etc.
>
> LEVS <- lapply(RB09, levels)
> thingToMatch = c("A", "b", "c")
> which(sapply(LEVS, function(x,y) all(match(x,y, nomatch = 0L) > 0L),
> thingToMatch))
>
>
> On Sun, Jan 29, 2012 at 12:31 PM, Nicole Marie Ford  wrote:
>> Hello,
>>
>> I have a dataset which I am calling RB09.
>>
>> I am trying to match the questions in the code book with variable codes.
>>
>> It is not very intuitive.
>>
>> example:
>>
>>  names(RB09)
>>  [1] "ea1"        "eaf1"       "eaf1a"      "eaf2"       "eaf2_7"
>>  [6] "eaf3"       "eafimpun"   "eafunpun"   "evimpmar"   "evfutpro"
>>  [11] "ecjoh"      "eaf4a"      "eaf5"       "eaf6a"      "eaf6b"
>>  [16] "eaf6c"      "eaf6d"      "eaf6e"      "eaf7a"      "eaf7b"
>>
>> (there are over 800 of these)
>>
>> questions looks like this:
>>
>> B16a. Most people in this country
>>        Trusts
>>        Neutral
>>        Does not trust
>> however, there is no variable B16a.  there is one that is "ssb16a" but as 
>> you can see:
>>
>>> levels(RB09$ss16a)
>> [1] "yes"       "no"        "dont know" "na"        "dk"
>>
>> The levels are not the same.  SO I don't think this is correctly matched.
>>
>> Is there an easy way to find out what -for example- which question "eaf6c"
>> goes to?
>>
>> Also, I know there is a way to search key words and find the variables which 
>> match.  I have done this before and can't find the code.
>>
>> Any direction would help.  Thanks.
>>
>> ~Nicole
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Calculate a function repeatedly over sections of a ts object

2012-01-29 Thread R. Michael Weylandt

Sorry, that last line should read:

FUN=function(z){
  lz <- length(z)
  SDF(z,method="lag window",
window=taper(type="parzen",n.sample=lz,cutoff= 2*sqrt(lz)), npad=2*lz)
}

On Sun, Jan 29, 2012 at 11:29 PM, R. Michael Weylandt
 wrote:
> It's customary to keep the list cc'd.
>
> I can't run your code without the data, but it does seem to me that
> your problem is in the FUN argument, as you guess.
>
> You have:
>
> FUN=function(z) SDF(adezoo,method="lag window",
> window=taper(type="parzen",n.sample=n.d,cutoff=(2*sqrt(n.d))),
> npad=2*n.d)
>
> But this function doesn't actually act on it's argument: you tell it
> to accept something called "z" but then it never gets told to do
> anything to "z". Perhaps you meant
>
> FUN=function(z) SDF(z,method="lag window",
> window=taper(type="parzen",n.sample=n.d,cutoff=(2*sqrt(n.d))),
> npad=2*n.d)
>
> I also worry about your use of "n.d"; are you sure you don't want to
> use the length of the rolling window? Something more like:
>
> FUN=function(z){
>   lz <- length(z)
>   SDF(z,method="lag window",
> window=taper(type="parzen",n.sample=lz,cutoff= 2*sqrt(lz)),
> npad=2*nlz)
> }
>
> Does that fix it?
>
> Michael
>
> On Fri, Jan 27, 2012 at 1:06 PM, Jorge Molinos  wrote:
>> Hi Michael,
>>
>> Sorry, I've been trying to use rollapply with my function but it seems I 
>> can't get it to work properly. The function seems to be dividing the time 
>> series accordingly (every 1) and using the correct length for the time 
>> window (10 years) but when I look at the results all of them are the same 
>> for all the subseries which doesn't make sense. The problem has to be within 
>> the FUN argument though I cannot figure out what it is. Would you mind 
>> checking on the code to see if you can spot where is the problem?
>>
>> adets<-ts(adeery$DA,c(adeery$Year[1],adeery$Day[1]),frequency=365)
>>
>> adezoo<-as.zoo(adets)
>>
>> n.d<-length(adets)
>>
>> especlist<-rollapply(adezoo, width=3650, FUN=function(z) 
>> SDF(adezoo,method="lag window",
>>    window=taper(type="parzen",n.sample=n.d,cutoff=(2*sqrt(n.d))),
>>    npad=2*n.d), by = 365, align="left")
>>
>>
>> And these are, for example, the SDF values at the last day for each 10-y 
>> subseries (all the same though they should be different as I have it verify 
>> by doing the SDF step by step using the same values for the arguments within 
>> the function):
>>
>>         especlist1.7048
>> 1978(20)    1.998068e-06
>> 1979(20)    1.998068e-06
>> 1980(20)    1.998068e-06
>> 1981(20)    1.998068e-06
>> 1982(20)    1.998068e-06
>> 1983(20)    1.998068e-06
>> 1984(20)    1.998068e-06
>> 1985(20)    1.998068e-06
>> 1986(20)    1.998068e-06
>> 1987(20)    1.998068e-06
>>
>> Thanks a lot.
>>
>> Jorge
>>
>>
>> 
>> From: R. Michael Weylandt [michael.weyla...@gmail.com]
>> Sent: 26 January 2012 21:00
>> To: Jorge Molinos
>> Cc: r-help@R-project.org
>> Subject: Re: [R] Calculate a function repeatedly over sections of a ts object
>>
>> I'm not sure if it's easily doable with a ts class, but the rollapply
>> function in the zoo package will do this easily. (Also, I find zoo to
>> be a much more natural time-series workflow than ts so it might make
>> the rest of your life easier as well)
>>
>> Michael
>>
>> On Thu, Jan 26, 2012 at 2:24 PM, Jorge Molinos  wrote:
>>>
>>> Hi,
>>>
>>> I want to apply a function (in my case SDF; package “sapa”) repeatedly over 
>>> discrete sections of a daily time series object by sliding a time window of 
>>> constant length (e.g. 10 consecutive years or 1825 days) over the entire ts 
>>> at increments of 1 time unit (e.g. 1 year or 365 days). So for example, the 
>>> first SDF would be calculated for the daily values of my variable recorded 
>>> between years 1 to 5, SDF2 to those for years 2 to 6 and so on until the 
>>> total length of the series is covered. How can I implement this into a R 
>>> script? Any help is much appreciated.
>>>
>>> Jorge
>>> __
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Calculate a function repeatedly over sections of a ts object

2012-01-29 Thread R. Michael Weylandt

It's customary to keep the list cc'd.

I can't run your code without the data, but it does seem to me that
your problem is in the FUN argument, as you guess.

You have:

FUN=function(z) SDF(adezoo,method="lag window",
window=taper(type="parzen",n.sample=n.d,cutoff=(2*sqrt(n.d))),
npad=2*n.d)

But this function doesn't actually act on it's argument: you tell it
to accept something called "z" but then it never gets told to do
anything to "z". Perhaps you meant

FUN=function(z) SDF(z,method="lag window",
window=taper(type="parzen",n.sample=n.d,cutoff=(2*sqrt(n.d))),
npad=2*n.d)

I also worry about your use of "n.d"; are you sure you don't want to
use the length of the rolling window? Something more like:

FUN=function(z){
   lz <- length(z)
   SDF(z,method="lag window",
window=taper(type="parzen",n.sample=lz,cutoff= 2*sqrt(lz)),
npad=2*nlz)
}

Does that fix it?

Michael

On Fri, Jan 27, 2012 at 1:06 PM, Jorge Molinos  wrote:
> Hi Michael,
>
> Sorry, I've been trying to use rollapply with my function but it seems I 
> can't get it to work properly. The function seems to be dividing the time 
> series accordingly (every 1) and using the correct length for the time window 
> (10 years) but when I look at the results all of them are the same for all 
> the subseries which doesn't make sense. The problem has to be within the FUN 
> argument though I cannot figure out what it is. Would you mind checking on 
> the code to see if you can spot where is the problem?
>
> adets<-ts(adeery$DA,c(adeery$Year[1],adeery$Day[1]),frequency=365)
>
> adezoo<-as.zoo(adets)
>
> n.d<-length(adets)
>
> especlist<-rollapply(adezoo, width=3650, FUN=function(z) 
> SDF(adezoo,method="lag window",
>    window=taper(type="parzen",n.sample=n.d,cutoff=(2*sqrt(n.d))),
>    npad=2*n.d), by = 365, align="left")
>
>
> And these are, for example, the SDF values at the last day for each 10-y 
> subseries (all the same though they should be different as I have it verify 
> by doing the SDF step by step using the same values for the arguments within 
> the function):
>
>         especlist1.7048
> 1978(20)    1.998068e-06
> 1979(20)    1.998068e-06
> 1980(20)    1.998068e-06
> 1981(20)    1.998068e-06
> 1982(20)    1.998068e-06
> 1983(20)    1.998068e-06
> 1984(20)    1.998068e-06
> 1985(20)    1.998068e-06
> 1986(20)    1.998068e-06
> 1987(20)    1.998068e-06
>
> Thanks a lot.
>
> Jorge
>
>
> 
> From: R. Michael Weylandt [michael.weyla...@gmail.com]
> Sent: 26 January 2012 21:00
> To: Jorge Molinos
> Cc: r-help@R-project.org
> Subject: Re: [R] Calculate a function repeatedly over sections of a ts object
>
> I'm not sure if it's easily doable with a ts class, but the rollapply
> function in the zoo package will do this easily. (Also, I find zoo to
> be a much more natural time-series workflow than ts so it might make
> the rest of your life easier as well)
>
> Michael
>
> On Thu, Jan 26, 2012 at 2:24 PM, Jorge Molinos  wrote:
>>
>> Hi,
>>
>> I want to apply a function (in my case SDF; package “sapa”) repeatedly over 
>> discrete sections of a daily time series object by sliding a time window of 
>> constant length (e.g. 10 consecutive years or 1825 days) over the entire ts 
>> at increments of 1 time unit (e.g. 1 year or 365 days). So for example, the 
>> first SDF would be calculated for the daily values of my variable recorded 
>> between years 1 to 5, SDF2 to those for years 2 to 6 and so on until the 
>> total length of the series is covered. How can I implement this into a R 
>> script? Any help is much appreciated.
>>
>> Jorge
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] ColorBrewer question

2012-01-29 Thread R. Michael Weylandt

I believe you need to use the scale_fill_brewer since fill is the
color of the bars while color is the outside of the bars in
ggplot2-speak:

E.g., with built-in data (it's polite to provide yours so that your
minimal working example is working):

data(diamonds)
ggplot(diamonds, aes(clarity)) + geom_bar(aes(fill = clarity, color = clarity))

# Note the borders are now changed but the fill is the same
ggplot(diamonds, aes(clarity)) + geom_bar(aes(fill = clarity, color =
clarity)) + scale_color_brewer(pal = "Blues")

# Now the fill is changed, but you probably want to drop the border
coloring since it's hideous against the blues
ggplot(diamonds, aes(clarity)) + geom_bar(aes(fill = clarity, color =
clarity)) + scale_fill_brewer(pal = "Blues")

# So lovely
ggplot(diamonds, aes(clarity)) + geom_bar(aes(fill = clarity)) +
scale_fill_brewer(pal = "Blues")

Michael

On Sun, Jan 29, 2012 at 12:21 PM, Mario Giesel  wrote:
> Hello, R friends,
>
> I'm trying to change colors of my horizontal bars so that they show a 
> sequence.
> I chose the ColorBrewer palette "Blues". However the resulting plot doesn't 
> show any changes to the default.
> I tried several places of "+ scale_colour_brewer(type="seq", pal = "Blues")" 
> with no effect.
> This is my code:
>
> p <- ggplot(data, aes(x = gender))  + 
> scale_y_continuous("",formatter="percent") + xlab("Gender") + coord_flip() +  
>    scale_colour_brewer(type="seq", pal = "Blues")
> p+geom_bar(aes(fill=pet),colour='black',position='fill')
>
>
> Any ideas welcome.
> Thanks,
>  Mario
>        [[alternative HTML version deleted]]
>
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] question about ... questions/ code

2012-01-29 Thread R. Michael Weylandt

Perhaps I'm misunderstanding you, but it doesn't sound like this is
much of an R question at all: what is "the code book"? If it's an
actual (dead tree) book, I don't think there's much you can do in R to
automate identifications; if it's an online API, you *might* be able
to rig a matching algorithm.

Still, hopefully if you say a little more about your data source, it
might be possible to help -- it doesn't sound like a pleasant
situation at all...

Michael

Here's a (very untested) shot at matching known levels to columns of
levels, but it's assuming there's going to be a perfect match, and
that your book encodes them the same way as the data source etc.

LEVS <- lapply(RB09, levels)
thingToMatch = c("A", "b", "c")
which(sapply(LEVS, function(x,y) all(match(x,y, nomatch = 0L) > 0L),
thingToMatch))

On Sun, Jan 29, 2012 at 12:31 PM, Nicole Marie Ford  wrote:
> Hello,
>
> I have a dataset which I am calling RB09.
>
> I am trying to match the questions in the code book with variable codes.
>
> It is not very intuitive.
>
> example:
>
>  names(RB09)
>  [1] "ea1"        "eaf1"       "eaf1a"      "eaf2"       "eaf2_7"
>  [6] "eaf3"       "eafimpun"   "eafunpun"   "evimpmar"   "evfutpro"
>  [11] "ecjoh"      "eaf4a"      "eaf5"       "eaf6a"      "eaf6b"
>  [16] "eaf6c"      "eaf6d"      "eaf6e"      "eaf7a"      "eaf7b"
>
> (there are over 800 of these)
>
> questions looks like this:
>
> B16a. Most people in this country
>        Trusts
>        Neutral
>        Does not trust
> however, there is no variable B16a.  there is one that is "ssb16a" but as you 
> can see:
>
>> levels(RB09$ss16a)
> [1] "yes"       "no"        "dont know" "na"        "dk"
>
> The levels are not the same.  SO I don't think this is correctly matched.
>
> Is there an easy way to find out what -for example- which question "eaf6c"
> goes to?
>
> Also, I know there is a way to search key words and find the variables which 
> match.  I have done this before and can't find the code.
>
> Any direction would help.  Thanks.
>
> ~Nicole
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] apply lm() to each row of a matrix

2012-01-29 Thread Bert Gunter

Reread ?lm and note that the lhs can be a matrix. I believe this is
exactly what you want.

-- Bert

On Sun, Jan 29, 2012 at 2:05 PM, Martin Batholdy
 wrote:
> Hi,
>
>
> I would like to fit lm-models to a matrix with 'samples' of a dependent 
> variable (each row represents one sample of the dependent variable).
> The independent variable is a vector that stays the same:
>
>
> y <- c(1:10)
> x <- matrix(rnorm(5*10,0,1), 5, 10)
>
>
>
> now I would like to avoid looping over the rows, since my original matrix is 
> much larger;
>
>
>
> for(t in 1:dim(x)[1]) {
>
>        print(lm(y ~ x[t,]))
>
> }
>
>
> Is there a time-efficient way to do this?
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] apply lm() to each row of a matrix

2012-01-29 Thread R. Michael Weylandt

If it's a simple one variable OLS regression and you only need
regression coefficients, you'll probably get best performance by
hard-coding the closed form solutions. apply() might help a little
(since it's a very good loop) but ultimately you'll be best served by
deciding exactly what you want and calculating that.

If you feel more comfortable setting up the regression yourself, you
can eliminate R's work in setting up the regression model & go
straight to the lm.fit() workhorse inside of lm().

Perhaps you can say a little more about what exactly you need?

Michael

On Sun, Jan 29, 2012 at 5:05 PM, Martin Batholdy
 wrote:
> Hi,
>
>
> I would like to fit lm-models to a matrix with 'samples' of a dependent 
> variable (each row represents one sample of the dependent variable).
> The independent variable is a vector that stays the same:
>
>
> y <- c(1:10)
> x <- matrix(rnorm(5*10,0,1), 5, 10)
>
>
>
> now I would like to avoid looping over the rows, since my original matrix is 
> much larger;
>
>
>
> for(t in 1:dim(x)[1]) {
>
>        print(lm(y ~ x[t,]))
>
> }
>
>
> Is there a time-efficient way to do this?
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] apply lm() to each row of a matrix

2012-01-29 Thread Martin Batholdy

Hi,


I would like to fit lm-models to a matrix with 'samples' of a dependent 
variable (each row represents one sample of the dependent variable).
The independent variable is a vector that stays the same:


y <- c(1:10)
x <- matrix(rnorm(5*10,0,1), 5, 10)



now I would like to avoid looping over the rows, since my original matrix is 
much larger;



for(t in 1:dim(x)[1]) {

print(lm(y ~ x[t,]))

}


Is there a time-efficient way to do this?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] percentage from density()

2012-01-29 Thread William Dunlap

If v is your original data,
   v <- c(-20, rep(0,98), 20)
why not use
   mean( -20 < v & v < 2)
as your estimate of the probability that v is in (-20,2)?

Estimating a density is like taking the derivative
of a smooth of the empirical distribution function,
so why not eliminate the middleman instead of integrating
the estimated density?  Any difference between the two
methods tells more about the smoothing used than about
the data involved.  (Not that I am any sort of expert
in this matter.)

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 
> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
> Behalf Of Greg Snow
> Sent: Saturday, January 28, 2012 8:12 PM
> To: Duke; r-help@r-project.org
> Subject: Re: [R] percentage from density()
> 
> If you use logspline estimation (logspline package) instead of kernel density 
> estimation then this is
> simple as there are cumulative area functions for logspline fits.
> 
> If you need to do this with kernel density estimates then you can just find 
> the area over your region
> for the kernel centered at each data point and average those values together 
> to get the area under the
> entire density estimate.
> 
> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
> Behalf Of Duke
> Sent: Friday, January 27, 2012 3:45 PM
> To: r-help@r-project.org
> Subject: [R] percentage from density()
> 
> Hi folks,
> 
> I know that density function will give a estimated density for a give
> dataset. Now from that I want to have a percentage estimation for a
> certain range. For examle:
> 
>  > y = density(c(-20,rep(0,98),20))
>  > plot(y, xlim=c(-4,4))
> 
> Now if I want to know the percentage of data lying in (-20,2). Basically
> it should be the area of the curve from -20 to 2. Anybody knows a simple
> function to do it?
> 
> Thanks,
> 
> D.
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] formula error inside function

2012-01-29 Thread Hugh Morgan


Hi,

I have what I suppose is the same problem as this.  I am using the 
linear mixed model function lme, and this does not seems to take the 
attribute "model=TRUE" at the end of the function.


Is there a more general way of solving this problem?
Is my description of the problem below correct (from my understanding of 
cran.r-project.org/doc/contrib/Fox-Companion/appendix-scope.pdf)?


Using the test script:

calculate_mixed_model_p <- function() {
  dataObj=read.csv('dataMini.csv', header=TRUE, sep=",", dec=".")
  colnames(dataObj)
  attach(dataObj)
  library(nlme)

  formulaGenotype = test_variable~Genotype + Gender
  formulaNull = test_variable~Gender
  finalModelGenotype = lme(formulaGenotype, random=~1|Date, dataObj, 
na.action="na.omit", method="ML", keep.data = TRUE)
  finalModelNull = lme(formulaNull, random=~1|Date, dataObj, 
na.action="na.omit", method="ML")

  anovaModel = anova (finalModelGenotype,finalModelNull)
  print(anovaModel)
}

Fails with:

Error in eval(expr, envir, enclos) : object 'formulaGenotype' not found

I THINK function lme(...) constructs an object (finalModelGenotype) that 
has as part of it a link (pointer?) to object formulaGenotype.  During 
construction this is in the function scope as it was passed to it.  When 
finalModelGenotype is later passed to function anova(...) the link is 
still there but as the lme(...) scope no longer exists the link is now 
broken.


Any help greatly apperitiated,

Hugh

PS, I tried to make this script self contained, and generated the data 
object with the following lines.  It looks identical when you print it, 
but the lme function fails with error at [2].  If someone was to tell me 
what I am doing wrong I may be able to post easier scripts.


[1]
  
dataObj=data.frame(test_variable=c("23.0","20.2","23.8","25.6","24.6","22.7","27.7","27.5","23.5","22.8","22.3","20.9","26.6","23.8","24.5","26.8","23.2","29.9","23.3","22.5","22.2","27.2","28.1","24.5","22.7","20.7","26.2","27.1","22.0","22.2","26.7","28.5","22.2","22.1","25.3","21.7","29.3"),

Gender=c("Female","Female","Male","Male","Male","Female","Male","Male","Female","Female","Female","Female","Male","Male","Male","Male","Female","Male","Female","Female","Female","Male","Male","Male","Female","Female","Male","Male","Female","Female","Male","Male","Female","Female","Female","Female","Male"),

Genotype=c("10028","10028","10028","10028","10028","10028","10028","10028","10028","10028","10028","10028","10028","10028","10028","10028","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0"),

Assay.Date=c("01/07/2009","01/07/2009","01/07/2009","01/07/2009","01/07/2009","01/07/2009","01/07/2009","01/07/2009","01/07/2009","07/07/2010","07/07/2010","07/07/2010","01/07/2009","07/07/2010","07/07/2010","07/07/2010","02/06/2010","02/06/2010","02/06/2010","02/06/2010","02/06/2010","02/06/2010","02/06/2010","02/06/2010","17/06/2010","17/06/2010","17/06/2010","17/06/2010","16/06/2010","16/06/2010","16/06/2010","16/06/2010","22/06/2010","22/06/2010","22/06/2010","22/06/2010","22/06/2010"),

Weight=c("9.9","9.5","9.9","10","9.9","9.8","10.2","10.4","9.9","9.8","9.9","9.5","9.8","9.5","9.8","9.9","9.5","10","9.8","9.5","9.7","10","10.2","9.9","9.9","9.5","10","10","9.8","9.9","10.2","10.1","9.8","9.9","10.2","9.8","10")

  )

[2]

Error in `rownames<-`(`*tmp*`, value = c("1", "2", "3", "4", "5", "6",  :
  attempt to set rownames on object with no dimensions
In addition:Warning message:
In Ops.factor(y[revOrder], Fitted) : - not meaningful for factors



On 01/25/2012 01:25 PM, Terry Therneau wrote:

I want use survfit() and basehaz() inside a function, but it doesn't
work. Could you take a look at this problem. Thanks for your help.

Your problem has to do with environments, and these lines

fmla<- as.formula("Surv(time, event) ~ Z1 + Z2")
BaseFun<- function(x){
 start.coxph<- coxph(x, phmmd)
  ...
 survfit(start.coxph)
 }
Basefun(fmla)

The survfit routine needs to reconstruct the model matrix, and by
default in R this is done in the context where the model formula was
first defined.  Unfortunately this is outside the function, leading to
problems -- your argument "x" is is unknown in the outer envirnoment.
The solution is to add "model=TRUE" to the coxph call so that the model
frame is saved and survfit doesn't have to do reconstruction.

If you think this should work as is, well, so do I.  I spent a lot of
time on this issue a few months ago and finally threw in the towel.  The
interaction of environments with model.frame and model.matrix is subtle
and far from obvious.  (Just to be clear: I didn't say "broken".  Each
aspect of the process has well thought out reasons.)  The standard
modeling functions lm, glm, etc changed their defaults from model=F to
model=T at some point.  This costs some space&  memory, but coxph may
need to do the same.

Terry T

__
R-help@r-projec

Re: [R] How I assign the result of a plot to a variable?

2012-01-29 Thread baptiste auguie

Try this,

plot(1:10)
img <- grid::grid.cap()
# grid.raster(img)
stream <- col2rgb(img)
write.table(stream, file="dam.txt", row.names = FALSE,
col.names = FALSE)

(you'll have to restore the dimensions of the matrix once you've read
the rgb values for each pixel)

HTH,

baptiste

On 30 January 2012 08:27, Ajay Askoolum  wrote:
> I can write a plot to a files of a given format using this:
>
> x<-sample(c(1:100),10)
> bmp("c:/mygraph.bmp")
> plot(x)
> dev.off()
>
> and then show the image file in another application. This application can 
> also display the image from the stream of numbers that define the image.
>
> How I can get the plot as a stream of numbers?
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] How I assign the result of a plot to a variable?

2012-01-29 Thread Ajay Askoolum

I can write a plot to a files of a given format using this:

x<-sample(c(1:100),10)
bmp("c:/mygraph.bmp")
plot(x)
dev.off()

and then show the image file in another application. This application can also 
display the image from the stream of numbers that define the image.

How I can get the plot as a stream of numbers?

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] How do I turn NA's to zeroes when a combination lacks one element?

2012-01-29 Thread C_Crown

Thank you so very much for your help. That worked perfectly. Yes, it's
homework, but hopefully what I'm learning with the homework I'll be able to
apply to my own research in the future, so I'd really like to learn the
program. And by the by, the "someone" was a classmate. Thanks again! 

Crystal

--
View this message in context: 
http://r.789695.n4.nabble.com/How-do-I-turn-NA-s-to-zeroes-when-a-combination-lacks-one-element-tp4338725p4338869.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] How do I turn NA's to zeroes when a combination lacks one element?

2012-01-29 Thread Mitchell Maltenfort

?ifelse

Type that in and you'll be in good shape.

On 1/29/12, C_Crown  wrote:
> Hi all, I am  very new to R. I am taking a course and am trying to complete
> my first assignment.
> For the assignment I have to get the program to generate different color
> combinations possible for a sample size of 55 with the probabilities of each
> color being chosen as: .24, .24, .16, .20, .13, .14. Here is what I've come
> up with...
>
> sample.size<- 55
> MM.probability<- c(.24, .24, .16, .20, .13, .14)
> MM.color<- c('Bl','Br','G','O','R','Y')
> mmtable<- matrix(nrow = 1000, ncol = 6)
> for(i in 1:1000){
> combinations<- sample(MM.color, sample.size, replace = T, prob =
> MM.probability)
> mmtable[i,]<-table(combinations)
> colnames(mmtable)<- c("Bl","Br","G","O","R","Y")
> }
>
> I feel like it should work, but every time I run it, it usually only goes so
> far (maybe to row 350, or 450, sometimes it completes with no problem)
> before I start getting "NA" in every column of every row. I also get this
> error message "Error in mmtable[i, ] <- table(combinations) :  number of
> items to replace is not a multiple of replacement length"
> Someone suggested that it is because the program is coming upon a
> combination that is missing one of the colors, so I'd have to instruct it to
> put a zero in place of the missing color so the simulation can continue,
> which completely makes sense. But I've been trying and can't figure out how
> to do it. I tried "mmtable[is.na(mmtable)]<-0", but then I just get zeroes
> everywhere instead of NA's, and "mmtable[na.omit(mmtable)]" gives me the
> same as no instruction at all. Do you know what the right notation would be?
> I also need to use the script to determine the probability of getting the
> combination with proportions: .182, .164, .309, .145, .091, .1090. I've also
> tried a few things for this and am coming up with nothing. Sorry if this is
> a really simple question, I am sure most of you could do this in your sleep,
> but it's all very new to me. Thanks in advance.
>
> -C
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/How-do-I-turn-NA-s-to-zeroes-when-a-combination-lacks-one-element-tp4338725p4338725.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Sent from my mobile device


Ersatzistician and Chutzpahthologist
I can answer any question.  "I don't know" is an answer. "I don't know yet"
is a better answer.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Help joincount.test

2012-01-29 Thread Roger Bivand

CD  gmail.com> writes:

> 
> Hello,
> I'm trying to analyse the spatial organization of different fields planted
> with different varieties (each field has only one variety), but I have
> problems trying to understand the results of the test I did.

> Can somebody help me understanding what really means this "same colour
> statistic"?

It would be sensible to access the reference on the help page of 
joincount.test:

Cliff, A. D., Ord, J. K. 1981 Spatial processes, Pion, p. 20.

and possibly also that on the help page of joincount.multi:

Upton, G., Fingleton, B. 1985 Spatial data analysis by example: point 
pattern and quatitative data, Wiley, pp. 158--170.

If you change the construction of the spatial weights, you should not be 
surprised if test results also change.

You might get more response on the R-sig-geo list.

Roger

> Thanks a lot,
> Camille
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] How do I turn NA's to zeroes when a combination lacks one element?

2012-01-29 Thread Berend Hasselman


On 29-01-2012, at 18:18, C_Crown wrote:

> Hi all, I am  very new to R. I am taking a course and am trying to complete
> my first assignment. 
> For the assignment I have to get the program to generate different color
> combinations possible for a sample size of 55 with the probabilities of each
> color being chosen as: .24, .24, .16, .20, .13, .14. Here is what I've come
> up with... 
> 
> sample.size<- 55
> MM.probability<- c(.24, .24, .16, .20, .13, .14)
> MM.color<- c('Bl','Br','G','O','R','Y')
> mmtable<- matrix(nrow = 1000, ncol = 6)
> for(i in 1:1000){
> combinations<- sample(MM.color, sample.size, replace = T, prob =
> MM.probability)
> mmtable[i,]<-table(combinations)
> colnames(mmtable)<- c("Bl","Br","G","O","R","Y")
> }
> 
> I feel like it should work, but every time I run it, it usually only goes so
> far (maybe to row 350, or 450, sometimes it completes with no problem)
> before I start getting "NA" in every column of every row. I also get this
> error message "Error in mmtable[i, ] <- table(combinations) :  number of
> items to replace is not a multiple of replacement length"
> Someone suggested that it is because the program is coming upon a
> combination that is missing one of the colors, so I'd have to instruct it to
> put a zero in place of the missing color so the simulation can continue,
> which completely makes sense. But I've been trying and can't figure out how
> to do it.

This is homework. "Someone's"  guess as to what is causing the problem is 
correct.
However:

Create mmtable as a matrix with all 0's and give the columns the names of 
colors.
Like this

mmtable  <- matrix(0, nrow = 1000, ncol = 6, dimnames=list(c(),MM.color))

Then in the loop replace the lines 

mmtable[i,]<-table(combinations)
colnames(mmtable)<- c("Bl","Br","G","O","R","Y")

with

z <- table(combinations)
mmtable[i,names(z)] <- z

The second line stores the entries in z in the columns with the same column 
name and leaves the others alone.

Since you initialized mmtable with 0's you won't need to replace NA's with 0.

Berend

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] question about ... questions/ code

2012-01-29 Thread Nicole Marie Ford

Hello,

I have a dataset which I am calling RB09.  

I am trying to match the questions in the code book with variable codes.  

It is not very intuitive.

example:

 names(RB09)
  [1] "ea1""eaf1"   "eaf1a"  "eaf2"   "eaf2_7"
  [6] "eaf3"   "eafimpun"   "eafunpun"   "evimpmar"   "evfutpro"  
 [11] "ecjoh"  "eaf4a"  "eaf5"   "eaf6a"  "eaf6b" 
 [16] "eaf6c"  "eaf6d"  "eaf6e"  "eaf7a"  "eaf7b" 

(there are over 800 of these)

questions looks like this:

B16a. Most people in this country
Trusts  
Neutral 
Does not trust  
however, there is no variable B16a.  there is one that is "ssb16a" but as you 
can see:

> levels(RB09$ss16a)
[1] "yes"   "no""dont know" "na""dk"   

The levels are not the same.  SO I don't think this is correctly matched.

Is there an easy way to find out what -for example- which question "eaf6c"
goes to?  

Also, I know there is a way to search key words and find the variables which 
match.  I have done this before and can't find the code.

Any direction would help.  Thanks.

~Nicole

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Horizontal stacked 100% bars with ggplot2

2012-01-29 Thread Mario Giesel

Thanks, Carlos,
 
 your solution looks nice, however, there are 4 bars instead of 2 i.e. bars are 
not stacked.
Meanwhile I got a solution for that task.
So thanks for sharing your version of it!
 
Good luck,
 Mario



Von: Carlos Ortega 

Cc: "r-help@r-project.org"  
Gesendet: 23:02 Samstag, 28.Januar 2012
Betreff: Re: [R] Horizontal stacked 100% bars with ggplot2


Hello,

If it helps...:



Lines <- "pet gender
dog male
dog female
dog male
cat female
cat female
cat male
"

d.f <- read.table(textConnection(Lines), header=T, as.is = TRUE)

d.tab<-table(d.f$pet, d.f$gender)
d.f.tab<-as.data.frame(table(d.f$pet, d.f$gender))
names(d.f.tab)<-c('pet','gender', 'Freq')


library(lattice)
histogram(
  ~ Freq | pet *gender, data=d.f.tab, 
  groups=gender, stack=T, horizontal=T
  )


Regards,
Carlos Ortega
www.qualityexcellence.es





Hello, R friends,
>
>I'm trying to crack this nut:
>
>
>Example Data.
>
>pet    gender
>dog    male
>dog    female
>dog    male
>cat    female
>cat    female
>cat    male
>
>Plot Task.
>
>Horizontal 100% bars where
>y axis shows gender factor (male vs. female)
>and x axis shows percentage of kind of pets (dog vs. cat)
>so that % dogs + % cats are stacked in 1 bar and sum up to 100% (for each 
>gender group 1 bar).
>
>How can this be done with ggplot2?
>
[[elided Yahoo spam]]
>
>Mario
>
>       [[alternative HTML version deleted]]
>
>
>__
>R-help@r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.
>
>
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Data Structure to Code

2012-01-29 Thread Gabor Grothendieck

On Sun, Jan 29, 2012 at 11:36 AM, Ajay Askoolum  wrote:
> Thank you. I need some clarification.
>
> dput(AirPassengers)
>
> gives:
>
> structure(c(112, 118, 132, 129, 121, 135, 148, 148, 136, 119,
> 104, 118, 115, 126, 141, 135, 125, 149, 170, 170, 158, 133, 114,
> 140, 145, 150, 178, 163, 172, 178, 199, 199, 184, 162, 146, 166,
> 171, 180, 193, 181, 183, 218, 230, 242, 209, 191, 172, 194, 196,
> 196, 236, 235, 229, 243, 264, 272, 237, 211, 180, 201, 204, 188,
> 235, 227, 234, 264, 302, 293, 259, 229, 203, 229, 242, 233, 267,
> 269, 270, 315, 364, 347, 312, 274, 237, 278, 284, 277, 317, 313,
> 318, 374, 413, 405, 355, 306, 271, 306, 315, 301, 356, 348, 355,
> 422, 465, 467, 404, 347, 305, 336, 340, 318, 362, 348, 363, 435,
> 491, 505, 404, 359, 310, 337, 360, 342, 406, 396, 420, 472, 548,
> 559, 463, 407, 362, 405, 417, 391, 419, 461, 472, 535, 622, 606,
> 508, 461, 390, 432), .Tsp = c(1949, 1960.917, 12), class = "ts")
>
> and AirPassengers looks as follows:
>
> AirPassengers
>  Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
> 1949 112 118 132 129 121 135 148 148 136 119 104 118
> 1950 115 126 141 135 125 149 170 170 158 133 114 140
> 1951 145 150 178 163 172 178 199 199 184 162 146 166
> 1952 171 180 193 181 183 218 230 242 209 191 172 194
> 1953 196 196 236 235 229 243 264 272 237 211 180 201
> 1954 204 188 235 227 234 264 302 293 259 229 203 229
> 1955 242 233 267 269 270 315 364 347 312 274 237 278
> 1956 284 277 317 313 318 374 413 405 355 306 271 306
> 1957 315 301 356 348 355 422 465 467 404 347 305 336
> 1958 340 318 362 348 363 435 491 505 404 359 310 337
> 1959 360 342 406 396 420 472 548 559 463 407 362 405
> 1960 417 391 419 461 472 535 622 606 508 461 390 432
>
> Where are the column headers in the result of dput(AirPassengers)?
>
> I guess it must be something to do with the '12' in the final argument of 
> .Tsp.
>
> So, If I just wanted 'Jan' ... 'Jun' and specified the following:
>

AirPassengers is a "ts" class object.  "ts" class objects represent
regularly spaced series.  If the frequency of the regularly spaced
series is 12 then the elements of the series are assumed to represent
monthly values -- that assumption is where the month names magically
come from.

If you want to take the first 6 months of each year then the resulting
series is no longer regularly spaced since some elements are now one
month later than the prior element and others are 6 months later.
Not being regularly spaced we can't use "ts" any more.  We could use
"zoo" class from the zoo package which is intended to represent series
which are not necessarily regularly spaced:

library(zoo)
z <- as.zoo(AirPassengers)
z6 <- z[cycle(z) <= 6]  # first 6 mos in each yr

# optionally use yearmon class for the time index
time(z6) <- yearmon(time(z6))

Read the vignettes and help files/reference manual in the zoo package
for more info.
http://cran.r-project.org/package=zoo

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Drawing sample

2012-01-29 Thread Jorge I Velez

Take a look at

?sample

HTH,
Jorge.-


On Sun, Jan 29, 2012 at 11:21 AM, Ron Michael <> wrote:

> Dear all, here I need to draw all possible samples of size 2, from a
> population, which is characterized by c(1,4,56, 3). Sampling will be done
> with replacement. Is there any direct R function to faciliate this darwing?
>
> Thanks for your help
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Need very fast application of 'diff' - ideas?

2012-01-29 Thread Martin Morgan

On 01/29/2012 08:12 AM, Kevin Ummel wrote:

Sorry, guys. I'm not active on the listserve, so my last post was held by the 
moderator until after Dirk's solution was posted.

Excellent stuff.

Is 'diff' really the bottleneck in your calculations? I would have said 
diff was in the class of 'fast' R calculations, so would expect many 
other steps in a real analysis, including a poorly constructed input, to 
be much more expensive.

Since speed is apparently of the essence, it makes sense to create a 
shared library and load that, rather than re-compiling it (via inline) 
each time.

The calculation is very straight-forward in C. It makes sense to use the 
'.Call' interface to avoid copying on the way in and out, and other R 
overhead of the '.C' interface. A simple solution, assuming the correct 
argument type (numeric; the original post talks about integer values but 
the values actually used (floor(x)) are numeric and presumably in a 
speed-is-of-the-essence application the data would be created as the the 
type of interest), no NAs, non-0 length input, etc., is (like Hans' 
solution, using the .C interface), in file cdiff.c:

#include 

SEXP cdiff(SEXP x)
{
const int len = Rf_length(x) - 1;
SEXP ans = Rf_allocVector(REALSXP, len);
double *ansp = REAL(ans), *xp = REAL(x);
for (int i = 0; i < len; ++i)
ansp[i] = xp[i + 1] - xp[i];
return ans;
}

I doubt that, with appropriate optimization flags for the compiler,  use 
of [] vs. pointer arithmetic would make a difference. With compilation as

R CMD SHLIB cdiff.c

One would probably want to compile this with a high optimization, e.g., 
from the 'Writing R Extensions' manual section 5.5

MAKEFLAGS="CFLAGS=-O3" R CMD SHLIB cdiff.c

Use as

dyn.load("cdiff.so")
## ...
x = rnorm(1000)
ans <- .Call("cdiff", x)

For me I get

> dyn.load("cdiff.so")
> system.time(x <- rnorm(1000))
   user  system elapsed
  1.577   0.015   1.594
> system.time(ans0 <- diff(x))
   user  system elapsed
  0.842   0.110   0.952
> system.time(ans1 <- .Call("cdiff", x))
   user  system elapsed
  0.024   0.020   0.044
> identical(ans0, ans1)
[1] TRUE

Note that just creating random data already dominates the calculation, 
which is why diff seems such an unlikely candidate for a bottleneck. I 
would guess that obtaining memory for the answer is the big bottleneck 
in cdiff (or Rcpp); one could work around this but then violate R's 
memory model. That I can write C code that is 20x faster than 'diff' 
comes at a significant price, in terms of error checking and, yes, 
development time.

The timing for python in the original post doesn't capture the full cost 
of the calculation; it should include the cost of the subset and view 
construction (or whatever the efficient pythonic paradigm is)

Martin

thanks,
kevin

On Jan 29, 2012, at 8:37 AM, R. Michael Weylandt wrote:

Have you not followed your own thread? Dirk is Mr. Rcpp himself and he
gives an implementation that gives you 25x improvement here as well as
tips for getting even more out of it:

http://tolstoy.newcastle.edu.au/R/e17/help/12/01/2471.html

Michael

On Sat, Jan 28, 2012 at 12:28 PM, Kevin Ummel  wrote:

Thanks. I've played around with pure R solutions. The fastest re-write of diff 
(for the 1 lag case) I can seem to find is this:

diff2 = function(x) {
  y = c(x,NA) - c(NA,x)
  y[2:length(x)]
}

#Compiling via 'cmpfun' doesn't seem to help (or hurt):
require(compiler)
diff2 = cmpfun(diff2)

But that only gets ~10% improvement over default 'diff' on my machine. Still 
too slow for my particular application.

I'm inclined towards Michael's suggestion of inline+Rcpp (or some other use of 
C under the hood).

Could someone show me how to go about doing that?

Thanks!
Kevin

On Jan 28, 2012, at 9:14 AM, Peter Langfelder wrote:

ehm... this doesn't take very many ideas.

x = runif(n=10e6, min=0, max=1000)
x = round(x)

system.time( {
  y = x[-1] - x[-length(x)]
})

I get about 0.5 seconds on my old laptop.

HTH

Peter

On Fri, Jan 27, 2012 at 4:15 PM, Kevin Ummel  wrote:

Hi everyone,

Speed is the key here.

I need to find the difference between a vector and its one-period lag (i.e. the 
difference between each value and the subsequent one in the vector). Let's say 
the vector contains 10 million random integers between 0 and 1,000. The 
solution vector will have 9,999,999 values, since their is no lag for the 1st 
observation.

In R we have:

#Set up input vector
x = runif(n=10e6, min=0, max=1000)
x = round(x)

#Find one-period difference
y = diff(x)

Question is: How can I get the 'diff(x)' part as fast as absolutely possible? I 
queried some colleagues who work with other languages, and they provided 
equivalent solutions in Python and Clojure that, on their machines, appear to 
be potentially much faster (I've put the code below in case anyone is 
interested). However, they mentioned that the overhead in passing the data 
between languages could kill any improveme

[R] ColorBrewer question

2012-01-29 Thread Mario Giesel

Hello, R friends,
 
I'm trying to change colors of my horizontal bars so that they show a sequence.
I chose the ColorBrewer palette "Blues". However the resulting plot doesn't 
show any changes to the default.
I tried several places of "+ scale_colour_brewer(type="seq", pal = "Blues")" 
with no effect.
This is my code:
 
p <- ggplot(data, aes(x = gender))  + 
scale_y_continuous("",formatter="percent") + xlab("Gender") + coord_flip() +    
 scale_colour_brewer(type="seq", pal = "Blues")
p+geom_bar(aes(fill=pet),colour='black',position='fill') 
 
 
Any ideas welcome.
Thanks,
 Mario
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] How do I turn NA's to zeroes when a combination lacks one element?

2012-01-29 Thread C_Crown

Hi all, I am  very new to R. I am taking a course and am trying to complete
my first assignment. 
For the assignment I have to get the program to generate different color
combinations possible for a sample size of 55 with the probabilities of each
color being chosen as: .24, .24, .16, .20, .13, .14. Here is what I've come
up with... 
 
sample.size<- 55
MM.probability<- c(.24, .24, .16, .20, .13, .14)
MM.color<- c('Bl','Br','G','O','R','Y')
mmtable<- matrix(nrow = 1000, ncol = 6)
for(i in 1:1000){
combinations<- sample(MM.color, sample.size, replace = T, prob =
MM.probability)
mmtable[i,]<-table(combinations)
colnames(mmtable)<- c("Bl","Br","G","O","R","Y")
}

I feel like it should work, but every time I run it, it usually only goes so
far (maybe to row 350, or 450, sometimes it completes with no problem)
before I start getting "NA" in every column of every row. I also get this
error message "Error in mmtable[i, ] <- table(combinations) :  number of
items to replace is not a multiple of replacement length"
Someone suggested that it is because the program is coming upon a
combination that is missing one of the colors, so I'd have to instruct it to
put a zero in place of the missing color so the simulation can continue,
which completely makes sense. But I've been trying and can't figure out how
to do it. I tried "mmtable[is.na(mmtable)]<-0", but then I just get zeroes
everywhere instead of NA's, and "mmtable[na.omit(mmtable)]" gives me the
same as no instruction at all. Do you know what the right notation would be?
I also need to use the script to determine the probability of getting the
combination with proportions: .182, .164, .309, .145, .091, .1090. I've also
tried a few things for this and am coming up with nothing. Sorry if this is
a really simple question, I am sure most of you could do this in your sleep,
but it's all very new to me. Thanks in advance. 

-C

--
View this message in context: 
http://r.789695.n4.nabble.com/How-do-I-turn-NA-s-to-zeroes-when-a-combination-lacks-one-element-tp4338725p4338725.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Drawing sample

2012-01-29 Thread David Winsemius

On Jan 29, 2012, at 11:21 AM, Ron Michael wrote:

Dear all, here I need to draw all possible samples of size 2, from a  
population, which is characterized by c(1,4,56, 3).

> combn(c(1,4,56, 3), 2)  # an enumeration of possible tuples
 [,1] [,2] [,3] [,4] [,5] [,6]
[1,]11144   56
[2,]4   563   5633

Sampling will be done with replacement. Is there any direct R  
function to faciliate this darwing?

Here is an implementation of such a sample from that matrix.

> combn(c(1,4,56, 3), 2)[ , sample(1:6, 10, replace=TRUE)]
 [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]   5644141111 1
[2,]33   5643443   56 3

--
David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Data Structure to Code

2012-01-29 Thread David Winsemius



On Jan 29, 2012, at 11:36 AM, Ajay Askoolum wrote:


Thank you. I need some clarification.

dput(AirPassengers)

gives:

structure(c(112, 118, 132, 129, 121, 135, 148, 148, 136, 119,
104, 118, 115, 126, 141, 135, 125, 149, 170, 170, 158, 133, 114,
140, 145, 150, 178, 163, 172, 178, 199, 199, 184, 162, 146, 166,
171, 180, 193, 181, 183, 218, 230, 242, 209, 191, 172, 194, 196,
196, 236, 235, 229, 243, 264, 272, 237, 211, 180, 201, 204, 188,
235, 227, 234, 264, 302, 293, 259, 229, 203, 229, 242, 233, 267,
269, 270, 315, 364, 347, 312, 274, 237, 278, 284, 277, 317, 313,
318, 374, 413, 405, 355, 306, 271, 306, 315, 301, 356, 348, 355,
422, 465, 467, 404, 347, 305, 336, 340, 318, 362, 348, 363, 435,
491, 505, 404, 359, 310, 337, 360, 342, 406, 396, 420, 472, 548,
559, 463, 407, 362, 405, 417, 391, 419, 461, 472, 535, 622, 606,
508, 461, 390, 432), .Tsp = c(1949, 1960.917, 12), class =  
"ts")


and AirPassengers looks as follows:

AirPassengers
 Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1949 112 118 132 129 121 135 148 148 136 119 104 118
1950 115 126 141 135 125 149 170 170 158 133 114 140
1951 145 150 178 163 172 178 199 199 184 162 146 166
1952 171 180 193 181 183 218 230 242 209 191 172 194
1953 196 196 236 235 229 243 264 272 237 211 180 201
1954 204 188 235 227 234 264 302 293 259 229 203 229
1955 242 233 267 269 270 315 364 347 312 274 237 278
1956 284 277 317 313 318 374 413 405 355 306 271 306
1957 315 301 356 348 355 422 465 467 404 347 305 336
1958 340 318 362 348 363 435 491 505 404 359 310 337
1959 360 342 406 396 420 472 548 559 463 407 362 405
1960 417 391 419 461 472 535 622 606 508 461 390 432

Where are the column headers in the result of dput(AirPassengers)?


Where? Nowhere is where. They were constructed by print.ts as needed,  
but they are not part of the structure. When you are in an interactive  
session the material you see after you type anobjects name is not the  
"object" but rather the result of print(object).


?print.ts

Echoing Michael  My impression is that people get so frustrated by  
trying to understand the conventions of the ts-class that they give up  
and use "zoo"-classed objects.





I guess it must be something to do with the '12' in the final  
argument of .Tsp.


So, If I just wanted 'Jan' ... 'Jun' and specified the following:



structure(c(112, 118, 132, 129, 121, 135, 148, 148, 136, 119,
104, 118, 115, 126, 141, 135, 125, 149, 170, 170, 158, 133, 114,
140, 145, 150, 178, 163, 172, 178, 199, 199, 184, 162, 146, 166,
171, 180, 193, 181, 183, 218, 230, 242, 209, 191, 172, 194, 196,
196, 236, 235, 229, 243, 264, 272, 237, 211, 180, 201, 204, 188,
235, 227, 234, 264, 302, 293, 259, 229, 203, 229, 242, 233, 267,
269, 270, 315, 364, 347, 312, 274, 237, 278, 284, 277, 317, 313,
318, 374, 413, 405, 355, 306, 271, 306, 315, 301, 356, 348, 355,
422, 465, 467, 404, 347, 305, 336, 340, 318, 362, 348, 363, 435,
491, 505, 404, 359, 310, 337, 360, 342, 406, 396, 420, 472, 548,
559, 463, 407, 362, 405, 417, 391, 419, 461, 472, 535, 622, 606,
508, 461, 390, 432), .Tsp = c(1949, 1972, 6), class = "ts")


Why does this generate " invalid time series parameters specified"?


Because (1972-1949+1)*6 is not large enough to hold the vector of  
values.


--
David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Drawing sample

2012-01-29 Thread Ron Michael

Dear all, here I need to draw all possible samples of size 2, from a 
population, which is characterized by c(1,4,56, 3). Sampling will be done with 
replacement. Is there any direct R function to faciliate this darwing?
 
Thanks for your help

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Data Structure to Code

2012-01-29 Thread R. Michael Weylandt

You said something about extracting January to June -- that's not so
possible with the default ts (best I understand it): ts() only allows
regular time-series so you can't go jan to jun at 1 month intervals
and then jump to jan again (6 months). If you want irregular time
series, check out the zoo package.

You're getting an "invalid parameters" error because the time window
specified is too short for the length of data given -- the .Tsp
attribute gives the start, end, and number of samples per year. This
works:

structure(c(112, 118, 132, 129, 121, 135, 148, 148, 136, 119,
104, 118, 115, 126, 141, 135, 125, 149, 170, 170, 158, 133, 114,
140, 145, 150, 178, 163, 172, 178, 199, 199, 184, 162, 146, 166,
171, 180, 193, 181, 183, 218, 230, 242, 209, 191, 172, 194, 196,
196, 236, 235, 229, 243, 264, 272, 237, 211, 180, 201, 204, 188,
235, 227, 234, 264, 302, 293, 259, 229, 203, 229, 242, 233, 267,
269, 270, 315, 364, 347, 312, 274, 237, 278, 284, 277, 317, 313,
318, 374, 413, 405, 355, 306, 271, 306, 315, 301, 356, 348, 355,
422, 465, 467, 404, 347, 305, 336, 340, 318, 362, 348, 363, 435,
491, 505, 404, 359, 310, 337, 360, 342, 406, 396, 420, 472, 548,
559, 463, 407, 362, 405, 417, 391, 419, 461, 472, 535, 622, 606,
508, 461, 390, 432), .Tsp = c(1949, 1972.833, 6), class =
"ts") ## Note it ends later

Also, it's probably not a good idea to emulate the output of dput() to
set up new input. That's what constructor functions -- like ts()  or
zoo() --  are for.

Michael

On Sun, Jan 29, 2012 at 11:36 AM, Ajay Askoolum  wrote:
> Thank you. I need some clarification.
>
> dput(AirPassengers)
>
> gives:
>
> structure(c(112, 118, 132, 129, 121, 135, 148, 148, 136, 119,
> 104, 118, 115, 126, 141, 135, 125, 149, 170, 170, 158, 133, 114,
> 140, 145, 150, 178, 163, 172, 178, 199, 199, 184, 162, 146, 166,
> 171, 180, 193, 181, 183, 218, 230, 242, 209, 191, 172, 194, 196,
> 196, 236, 235, 229, 243, 264, 272, 237, 211, 180, 201, 204, 188,
> 235, 227, 234, 264, 302, 293, 259, 229, 203, 229, 242, 233, 267,
> 269, 270, 315, 364, 347, 312, 274, 237, 278, 284, 277, 317, 313,
> 318, 374, 413, 405, 355, 306, 271, 306, 315, 301, 356, 348, 355,
> 422, 465, 467, 404, 347, 305, 336, 340, 318, 362, 348, 363, 435,
> 491, 505, 404, 359, 310, 337, 360, 342, 406, 396, 420, 472, 548,
> 559, 463, 407, 362, 405, 417, 391, 419, 461, 472, 535, 622, 606,
> 508, 461, 390, 432), .Tsp = c(1949, 1960.917, 12), class = "ts")
>
> and AirPassengers looks as follows:
>
> AirPassengers
>  Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
> 1949 112 118 132 129 121 135 148 148 136 119 104 118
> 1950 115 126 141 135 125 149 170 170 158 133 114 140
> 1951 145 150 178 163 172 178 199 199 184 162 146 166
> 1952 171 180 193 181 183 218 230 242 209 191 172 194
> 1953 196 196 236 235 229 243 264 272 237 211 180 201
> 1954 204 188 235 227 234 264 302 293 259 229 203 229
> 1955 242 233 267 269 270 315 364 347 312 274 237 278
> 1956 284 277 317 313 318 374 413 405 355 306 271 306
> 1957 315 301 356 348 355 422 465 467 404 347 305 336
> 1958 340 318 362 348 363 435 491 505 404 359 310 337
> 1959 360 342 406 396 420 472 548 559 463 407 362 405
> 1960 417 391 419 461 472 535 622 606 508 461 390 432
>
> Where are the column headers in the result of dput(AirPassengers)?
>
> I guess it must be something to do with the '12' in the final argument of
> .Tsp.
>
> So, If I just wanted 'Jan' ... 'Jun' and specified the following:
>
>
> structure(c(112, 118, 132, 129, 121, 135, 148, 148, 136, 119,
> 104, 118, 115, 126, 141, 135, 125, 149, 170, 170, 158, 133, 114,
> 140, 145, 150, 178, 163, 172, 178, 199, 199, 184, 162, 146, 166,
> 171, 180, 193, 181, 183, 218, 230, 242, 209, 191, 172, 194, 196,
> 196, 236, 235, 229, 243, 264, 272, 237, 211, 180, 201, 204, 188,
> 235, 227, 234, 264, 302, 293, 259, 229, 203, 229, 242, 233, 267,
> 269, 270, 315, 364, 347, 312, 274, 237, 278, 284, 277, 317, 313,
> 318, 374, 413, 405, 355, 306, 271, 306, 315, 301, 356, 348, 355,
> 422, 465, 467, 404, 347, 305, 336, 340, 318, 362, 348, 363, 435,
> 491, 505, 404, 359, 310, 337, 360, 342, 406, 396, 420, 472, 548,
> 559, 463, 407, 362, 405, 417, 391, 419, 461, 472, 535, 622, 606,
> 508, 461, 390, 432), .Tsp = c(1949, 1972, 6), class = "ts")
>
>
> Why does this generate " invalid time series parameters specified"?
>
>
>
>
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Need very fast application of 'diff' - ideas?

2012-01-29 Thread R. Michael Weylandt

Sorry about that -- Gmail threaded things by arrival in my mailbox,
not by timestamp (surprisingly)

Rcpp really is one of the coolest new things in the R ecosystem --
hope it works well for you.

M

On Sun, Jan 29, 2012 at 11:12 AM, Kevin Ummel  wrote:
> Sorry, guys. I'm not active on the listserve, so my last post was held by the 
> moderator until after Dirk's solution was posted.
>
> Excellent stuff.
>
> thanks,
> kevin
>
> On Jan 29, 2012, at 8:37 AM, R. Michael Weylandt wrote:
>
>> Have you not followed your own thread? Dirk is Mr. Rcpp himself and he
>> gives an implementation that gives you 25x improvement here as well as
>> tips for getting even more out of it:
>>
>> http://tolstoy.newcastle.edu.au/R/e17/help/12/01/2471.html
>>
>> Michael
>>
>> On Sat, Jan 28, 2012 at 12:28 PM, Kevin Ummel  wrote:
>>> Thanks. I've played around with pure R solutions. The fastest re-write of 
>>> diff (for the 1 lag case) I can seem to find is this:
>>>
>>> diff2 = function(x) {
>>>  y = c(x,NA) - c(NA,x)
>>>  y[2:length(x)]
>>> }
>>>
>>> #Compiling via 'cmpfun' doesn't seem to help (or hurt):
>>> require(compiler)
>>> diff2 = cmpfun(diff2)
>>>
>>> But that only gets ~10% improvement over default 'diff' on my machine. 
>>> Still too slow for my particular application.
>>>
>>> I'm inclined towards Michael's suggestion of inline+Rcpp (or some other use 
>>> of C under the hood).
>>>
>>> Could someone show me how to go about doing that?
>>>
>>> Thanks!
>>> Kevin
>>>
>>> On Jan 28, 2012, at 9:14 AM, Peter Langfelder wrote:
>>>
 ehm... this doesn't take very many ideas.


 x = runif(n=10e6, min=0, max=1000)
 x = round(x)

 system.time( {
  y = x[-1] - x[-length(x)]
 })

 I get about 0.5 seconds on my old laptop.

 HTH

 Peter


 On Fri, Jan 27, 2012 at 4:15 PM, Kevin Ummel  wrote:
> Hi everyone,
>
> Speed is the key here.
>
> I need to find the difference between a vector and its one-period lag 
> (i.e. the difference between each value and the subsequent one in the 
> vector). Let's say the vector contains 10 million random integers between 
> 0 and 1,000. The solution vector will have 9,999,999 values, since their 
> is no lag for the 1st observation.
>
> In R we have:
>
> #Set up input vector
> x = runif(n=10e6, min=0, max=1000)
> x = round(x)
>
> #Find one-period difference
> y = diff(x)
>
> Question is: How can I get the 'diff(x)' part as fast as absolutely 
> possible? I queried some colleagues who work with other languages, and 
> they provided equivalent solutions in Python and Clojure that, on their 
> machines, appear to be potentially much faster (I've put the code below 
> in case anyone is interested). However, they mentioned that the overhead 
> in passing the data between languages could kill any improvements. I 
> don't have much experience integrating other languages, so I'm hoping the 
> community has some ideas about how to approach this particular problem...
>
> Many thanks,
> Kevin
>
> In iPython:
>
> In [3]: import numpy as np
> In [4]: arr = np.random.randint(0, 1000, (1000,1)).astype("int16")
> In [5]: arr1 = arr[1:].view()
> In [6]: timeit arr2 = arr1 - arr[:-1]
> 10 loops, best of 3: 20.1 ms per loop
>
> In Clojure:
>
> (defn subtract-lag
>  [n]
>  (let [v (take n (repeatedly rand))]
>    (time (dorun (map - v (cons 0 v))
>
>
>
>
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>>>
>>> __
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Data Structure to Code

2012-01-29 Thread Ajay Askoolum

Thank you. I need some clarification.

dput(AirPassengers)

gives:

structure(c(112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 
104, 118, 115, 126, 141, 135, 125, 149, 170, 170, 158, 133, 114, 
140, 145, 150, 178, 163, 172, 178, 199, 199, 184, 162, 146, 166, 
171, 180, 193, 181, 183, 218, 230, 242, 209, 191, 172, 194, 196, 
196, 236, 235, 229, 243, 264, 272, 237, 211, 180, 201, 204, 188, 
235, 227, 234, 264, 302, 293, 259, 229, 203, 229, 242, 233, 267, 
269, 270, 315, 364, 347, 312, 274, 237, 278, 284, 277, 317, 313, 
318, 374, 413, 405, 355, 306, 271, 306, 315, 301, 356, 348, 355, 
422, 465, 467, 404, 347, 305, 336, 340, 318, 362, 348, 363, 435, 
491, 505, 404, 359, 310, 337, 360, 342, 406, 396, 420, 472, 548, 
559, 463, 407, 362, 405, 417, 391, 419, 461, 472, 535, 622, 606, 
508, 461, 390, 432), .Tsp = c(1949, 1960.917, 12), class = "ts")

and AirPassengers looks as follows:

AirPassengers
 Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1949 112 118 132 129 121 135 148 148 136 119 104 118
1950 115 126 141 135 125 149 170 170 158 133 114 140
1951 145 150 178 163 172 178 199 199 184 162 146 166
1952 171 180 193 181 183 218 230 242 209 191 172 194
1953 196 196 236 235 229 243 264 272 237 211 180 201
1954 204 188 235 227 234 264 302 293 259 229 203 229
1955 242 233 267 269 270 315 364 347 312 274 237 278
1956 284 277 317 313 318 374 413 405 355 306 271 306
1957 315 301 356 348 355 422 465 467 404 347 305 336
1958 340 318 362 348 363 435 491 505 404 359 310 337
1959 360 342 406 396 420 472 548 559 463 407 362 405
1960 417 391 419 461 472 535 622 606 508 461 390 432

Where are the column headers in the result of dput(AirPassengers)?

I guess it must be something to do with the '12' in the final argument of .Tsp.

So, If I just wanted 'Jan' ... 'Jun' and specified the following:



structure(c(112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 
104, 118, 115, 126, 141, 135, 125, 149, 170, 170, 158, 133, 114, 
140, 145, 150, 178, 163, 172, 178, 199, 199, 184, 162, 146, 166, 
171, 180, 193, 181, 183, 218, 230, 242, 209, 191, 172, 194, 196, 
196, 236, 235, 229, 243, 264, 272, 237, 211, 180, 201, 204, 188, 
235, 227, 234, 264, 302, 293, 259, 229, 203, 229, 242, 233, 267, 
269, 270, 315, 364, 347, 312, 274, 237, 278, 284, 277, 317, 313, 
318, 374, 413, 405, 355, 306, 271, 306, 315, 301, 356, 348, 355, 
422, 465, 467, 404, 347, 305, 336, 340, 318, 362, 348, 363, 435, 
491, 505, 404, 359, 310, 337, 360, 342, 406, 396, 420, 472, 548, 
559, 463, 407, 362, 405, 417, 391, 419, 461, 472, 535, 622, 606, 
508, 461, 390, 432), .Tsp = c(1949, 1972, 6), class = "ts")


Why does this generate " invalid time series parameters specified"?
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Need very fast application of 'diff' - ideas?

2012-01-29 Thread Kevin Ummel

Sorry, guys. I'm not active on the listserve, so my last post was held by the 
moderator until after Dirk's solution was posted.

Excellent stuff.

thanks,
kevin

On Jan 29, 2012, at 8:37 AM, R. Michael Weylandt wrote:

> Have you not followed your own thread? Dirk is Mr. Rcpp himself and he
> gives an implementation that gives you 25x improvement here as well as
> tips for getting even more out of it:
> 
> http://tolstoy.newcastle.edu.au/R/e17/help/12/01/2471.html
> 
> Michael
> 
> On Sat, Jan 28, 2012 at 12:28 PM, Kevin Ummel  wrote:
>> Thanks. I've played around with pure R solutions. The fastest re-write of 
>> diff (for the 1 lag case) I can seem to find is this:
>> 
>> diff2 = function(x) {
>>  y = c(x,NA) - c(NA,x)
>>  y[2:length(x)]
>> }
>> 
>> #Compiling via 'cmpfun' doesn't seem to help (or hurt):
>> require(compiler)
>> diff2 = cmpfun(diff2)
>> 
>> But that only gets ~10% improvement over default 'diff' on my machine. Still 
>> too slow for my particular application.
>> 
>> I'm inclined towards Michael's suggestion of inline+Rcpp (or some other use 
>> of C under the hood).
>> 
>> Could someone show me how to go about doing that?
>> 
>> Thanks!
>> Kevin
>> 
>> On Jan 28, 2012, at 9:14 AM, Peter Langfelder wrote:
>> 
>>> ehm... this doesn't take very many ideas.
>>> 
>>> 
>>> x = runif(n=10e6, min=0, max=1000)
>>> x = round(x)
>>> 
>>> system.time( {
>>>  y = x[-1] - x[-length(x)]
>>> })
>>> 
>>> I get about 0.5 seconds on my old laptop.
>>> 
>>> HTH
>>> 
>>> Peter
>>> 
>>> 
>>> On Fri, Jan 27, 2012 at 4:15 PM, Kevin Ummel  wrote:
 Hi everyone,

 Speed is the key here.

 I need to find the difference between a vector and its one-period lag 
 (i.e. the difference between each value and the subsequent one in the 
 vector). Let's say the vector contains 10 million random integers between 
 0 and 1,000. The solution vector will have 9,999,999 values, since their 
 is no lag for the 1st observation.

 In R we have:

 #Set up input vector
 x = runif(n=10e6, min=0, max=1000)
 x = round(x)

 #Find one-period difference
 y = diff(x)

 Question is: How can I get the 'diff(x)' part as fast as absolutely 
 possible? I queried some colleagues who work with other languages, and 
 they provided equivalent solutions in Python and Clojure that, on their 
 machines, appear to be potentially much faster (I've put the code below in 
 case anyone is interested). However, they mentioned that the overhead in 
 passing the data between languages could kill any improvements. I don't 
 have much experience integrating other languages, so I'm hoping the 
 community has some ideas about how to approach this particular problem...

 Many thanks,
 Kevin

 In iPython:

 In [3]: import numpy as np
 In [4]: arr = np.random.randint(0, 1000, (1000,1)).astype("int16")
 In [5]: arr1 = arr[1:].view()
 In [6]: timeit arr2 = arr1 - arr[:-1]
 10 loops, best of 3: 20.1 ms per loop

 In Clojure:

 (defn subtract-lag
  [n]
  (let [v (take n (repeatedly rand))]
(time (dorun (map - v (cons 0 v))

[[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
>> 
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] r-help; weibull distribution

2012-01-29 Thread David Winsemius



On Jan 29, 2012, at 7:38 AM, Christopher Kelvin wrote:


 Please, Help me,
How do I generate data from the weibull distribution if the data  
contain both failure and interval censored,
For example, I want to generate n=100, shape=2 and scale =4 with 30%  
interval censored.

What about right censoring
Thank you
[[alternative HTML version deleted]]


Here's my theory about why your two other identical questions in the  
last two days have not been answered, either. This and your other  
question about how to use optim() to estimate weibull parameters look  
like homework and I'm guessing people are honoring the "no homework"  
guidelines laid out in the Posting Guide. Have you read it?


If it's not homework, you would appear more credible if you honored  
the suggestion in the Posting Guide that you provide background  
regarding your professional/academic position and  describe the tasks  
you are setting for yourself.


--

David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Data Structure to Code

2012-01-29 Thread Florent D.

see dump() or dput().

On Sun, Jan 29, 2012 at 10:56 AM, Ajay Askoolum  wrote:
> Given:
>
> data(AirPassengers)
>
> I get a ts data structure AirPassengers in the workspace.
>
> How can I generate the code that can create that structure? That is, given an 
> example of a data structure, is there a way to generate the code that can 
> greate that structure?
>
>
> Alternatively, is there a reference that provides a list of dta structures 
> together with a full list of theor respective attributes?
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Data Structure to Code

2012-01-29 Thread R. Michael Weylandt

dput()

Michael

On Sun, Jan 29, 2012 at 10:56 AM, Ajay Askoolum  wrote:
> Given:
>
> data(AirPassengers)
>
> I get a ts data structure AirPassengers in the workspace.
>
> How can I generate the code that can create that structure? That is, given an 
> example of a data structure, is there a way to generate the code that can 
> greate that structure?
>
>
> Alternatively, is there a reference that provides a list of dta structures 
> together with a full list of theor respective attributes?
>
>        [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Data Structure to Code

2012-01-29 Thread Ajay Askoolum

Given:

data(AirPassengers)

I get a ts data structure AirPassengers in the workspace.

How can I generate the code that can create that structure? That is, given an 
example of a data structure, is there a way to generate the code that can 
greate that structure?


Alternatively, is there a reference that provides a list of dta structures 
together with a full list of theor respective attributes?

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Need very fast application of 'diff' - ideas?

2012-01-29 Thread R. Michael Weylandt

Have you not followed your own thread? Dirk is Mr. Rcpp himself and he
gives an implementation that gives you 25x improvement here as well as
tips for getting even more out of it:

http://tolstoy.newcastle.edu.au/R/e17/help/12/01/2471.html

Michael

On Sat, Jan 28, 2012 at 12:28 PM, Kevin Ummel  wrote:
> Thanks. I've played around with pure R solutions. The fastest re-write of 
> diff (for the 1 lag case) I can seem to find is this:
>
> diff2 = function(x) {
>  y = c(x,NA) - c(NA,x)
>  y[2:length(x)]
> }
>
> #Compiling via 'cmpfun' doesn't seem to help (or hurt):
> require(compiler)
> diff2 = cmpfun(diff2)
>
> But that only gets ~10% improvement over default 'diff' on my machine. Still 
> too slow for my particular application.
>
> I'm inclined towards Michael's suggestion of inline+Rcpp (or some other use 
> of C under the hood).
>
> Could someone show me how to go about doing that?
>
> Thanks!
> Kevin
>
> On Jan 28, 2012, at 9:14 AM, Peter Langfelder wrote:
>
>> ehm... this doesn't take very many ideas.
>>
>>
>> x = runif(n=10e6, min=0, max=1000)
>> x = round(x)
>>
>> system.time( {
>>  y = x[-1] - x[-length(x)]
>> })
>>
>> I get about 0.5 seconds on my old laptop.
>>
>> HTH
>>
>> Peter
>>
>>
>> On Fri, Jan 27, 2012 at 4:15 PM, Kevin Ummel  wrote:
>>> Hi everyone,
>>>
>>> Speed is the key here.
>>>
>>> I need to find the difference between a vector and its one-period lag (i.e. 
>>> the difference between each value and the subsequent one in the vector). 
>>> Let's say the vector contains 10 million random integers between 0 and 
>>> 1,000. The solution vector will have 9,999,999 values, since their is no 
>>> lag for the 1st observation.
>>>
>>> In R we have:
>>>
>>> #Set up input vector
>>> x = runif(n=10e6, min=0, max=1000)
>>> x = round(x)
>>>
>>> #Find one-period difference
>>> y = diff(x)
>>>
>>> Question is: How can I get the 'diff(x)' part as fast as absolutely 
>>> possible? I queried some colleagues who work with other languages, and they 
>>> provided equivalent solutions in Python and Clojure that, on their 
>>> machines, appear to be potentially much faster (I've put the code below in 
>>> case anyone is interested). However, they mentioned that the overhead in 
>>> passing the data between languages could kill any improvements. I don't 
>>> have much experience integrating other languages, so I'm hoping the 
>>> community has some ideas about how to approach this particular problem...
>>>
>>> Many thanks,
>>> Kevin
>>>
>>> In iPython:
>>>
>>> In [3]: import numpy as np
>>> In [4]: arr = np.random.randint(0, 1000, (1000,1)).astype("int16")
>>> In [5]: arr1 = arr[1:].view()
>>> In [6]: timeit arr2 = arr1 - arr[:-1]
>>> 10 loops, best of 3: 20.1 ms per loop
>>>
>>> In Clojure:
>>>
>>> (defn subtract-lag
>>>  [n]
>>>  (let [v (take n (repeatedly rand))]
>>>    (time (dorun (map - v (cons 0 v))
>>>
>>>
>>>
>>>
>>>
>>>        [[alternative HTML version deleted]]
>>>
>>> __
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] r-help; weibull distribution

2012-01-29 Thread Christopher Kelvin

 Please, Help me,
How do I generate data from the weibull distribution if the data contain both 
failure and interval censored,
For example, I want to generate n=100, shape=2 and scale =4 with 30% interval 
censored. 
What about right censoring
Thank you 
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] package does not have a NAMESPACE

2012-01-29 Thread Uwe Ligges




On 27.01.2012 19:51, Ondřej Mikula wrote:

Dear r-helpers,

I have a trouble with a package downloaded from sourceforge.net
(namely the package 'kopls'). I installed it from the local zip file
with the expected result

utils:::menuInstallLocal()

package ‘kopls’ successfully unpacked and MD5 sums checked
but when I tried to load it I obtained the following error message

require(kopls)

Loading required package: kopls
Failed with error:  ‘package ‘kopls’ does not have a NAMESPACE and
should be re-installed’

I tried to contact its developers but with no reaction. Could anybody
give me an advice how to solve the problem on my own?
Many thanks in advance

Ondrej Mikula




Obviously the package does not have an explicit namespace and the 
Windows binary was build with an outdated version of R (otherwise R had 
added a default namespace).


Hence: Install from the sources (as described in the manual R 
Installation and Administration).


Uwe Ligges

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] wireframe_box_axis

2012-01-29 Thread David Winsemius



On Jan 28, 2012, at 6:44 PM, h...@komarac.biologija.unios.hr wrote:


Thank you very much for help. But would you be kind to show me on this
simple example how to obtain x, y and z axis. Namely, it is not  
problem to

avoid box with "par.box = list(col="transparent")" but to keep axis?

library(lattice)
x <- seq(-pi, pi, len = 20)
y <- seq(-pi, pi, len = 20)
g <- expand.grid(x = x, y = y)
g$z <- sin(sqrt(g$x^2 + g$y^2))
wireframe(z ~ x * y, g, drape = TRUE,
perspective = FALSE,
aspect = c(3,1), colorkey = FALSE,
par.box = list(col=NA))

On this plot there are wireframe with arrows, I need axis with tick  
marks

and no arrows (and of course no box.)


I didn't notice until this morning that the OP had not copied the list  
with this follow-up question. This is what I sent in reply. If it is  
incorrect on hte issue of it not being possible to draw axis lines, I  
would like to be corrected.


--- sent to "hack" 
As shown in the second example in the cloud/wireframe help page (which  
happened to be the one I was using earlier to test the par.box  
strategy in the regrettable absence of a working example from you):


... , scales=list(arrows=FALSE),  # to remove the arrows

And that example certainly looks like the second example, anyway. What  
could be your additional goal is drawing not only axis ticks and  
labels but also the portions of the box that would be associated with  
the ticks and labels. The only instance in the archives I could find  
in 2003 was a statement by Sarkar that such an options was not  
available as of that time.

--- end quote 


I did find a later message from Sarkar (circa 2006) that  said there  
were plans to allow hiding of box elements in the back, but I really  
did my best to review the help pages and look at the box.3d

 parameters and could not find how to do that.

--
David.




Thanks in advance!!!







On Jan 28, 2012, at 5:06 PM, Branimir Hackenberger wrote:


Dear all!

I am using wireframe function from lattice package.


Is it possible to remove box around the plot but to  keep axis (x,
y, z)?


First hit on a search "make box transparent lattice sarkar":

http://finzi.psych.upenn.edu/R/Rhelp02/archive/17797.html

 ... , par.box = list(col="transparent") ,

--

David Winsemius, MD
West Hartford, CT






David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Error from Brugs "'r for windows gui front-end has stopped working''

2012-01-29 Thread Jin Minming

Thanks.

I guess it may be due to the software conflict in my computers as there are the 
same software packages which have been installed in these two laptops.  Anyway, 
I can still use my desktop with Windows XP system. 

Jim


--- On Sat, 28/1/12, Uwe Ligges  wrote:

> From: Uwe Ligges 
> Subject: Re: [R] Error from Brugs "'r for windows gui front-end has stopped 
> working''
> To: "Jin Minming" 
> Cc: r-help@r-project.org
> Date: Saturday, 28 January, 2012, 18:45
> 
> 
> On 25.01.2012 18:50, Jin Minming wrote:
> > Hello Uwe,
> >
> > Yes. These packages are what I used for the test in
> Windows 7 and vista system:
> >   R-2.14.1 in 32-bit
> >   BRugs version 0.7-5
> >   OpenBUGS version (3.2.1)
> 
> On a Win 7 32 bit with the same software as described above,
> I cannot 
> reproduce a crash (except for using a different path since
> the once you 
> are using does not work for me, since OpenBUGS cannot deal
> with spaces 
> in the paths which is the case under German versions of
> Windows).
> Hence my suggestion to ask on some BUGS mailing list if
> anybody can 
> reproduce your problems.
> 
> Uwe Ligges
> 
> 
> >
> > Thanks,
> >
> > Wei
> >
> >
> > --- On Wed, 25/1/12, Uwe Ligges 
> wrote:
> >
> >> From: Uwe Ligges
> >> Subject: Re: [R] Error from Brugs "'r for windows
> gui front-end has stopped working''
> >> To: "Jin Minming"
> >> Cc: r-help@r-project.org
> >> Date: Wednesday, 25 January, 2012, 11:09
> >> I just tried all 100 iterations both
> >> under WinXP and Windows Server 2008
> >> of your script under 32-bit R (which I assume you
> are using
> >> when talking
> >> about the old BRugs below).
> >>
> >> Please confirm:
> >> R-2.14.1 in 32-bit
> >> BRugs version 0.7-5
> >> OpenBUGS version (3.2.1)
> >>
> >> I do not have any Win7 machine available before
> Friday.
> >>
> >> Uwe Ligges
> >>
> >>
> >> On 25.01.2012 09:55, Jin Minming wrote:
> >>> I forgot to add the R codes. The following are
> the
> >> codes I used for this problem:
> >>>
> >>>
> >>> ###    Step by step example:
> >> ###
> >>> library(BRugs) # loading BRugs
> >>>
> >>> ## Now setting the working directory to the
> examples'
> >> one:
> >>> setwd(options()$OpenBUGSExamples)
> >>>
> >>> ## add a simple loop
> >>> for (mn in 1:100){
> >>>
> >>> ## some usual steps (like clicking in
> WinBUGS):
> >>> modelCheck("Ratsmodel.txt")
> >>    # check model file
> >>> modelData("Ratsdata.txt")
> >>      # read data file
> >>> modelCompile(numChains=2)
> >>      # compile model with 2 chains
> >>> modelInits(rep("Ratsinits.txt", 2))  #
> read init
> >> data file
> >>> modelUpdate(1000)
> >>            # burn in
> >>> samplesSet(c("alpha0", "alpha"))
> >>     # alpha0 and alpha should
> be monitored
> >>> modelUpdate(1000)
> >>            # 1000
> more iterations
> >> 
> >>>
> >>> samplesStats("*")
> >>            # the
> summarized results
> >>>
> >>> ## some plots
> >>> #samplesHistory("*", mfrow = c(4, 2)) # plot
> the
> >> chain,
> >>> #samplesDensity("alpha")
> >>        # plot the densities,
> >>> #samplesBgr("alpha[1:6]")
> >>       # plot the bgr
> statistics, and
> >>> #samplesAutoC("alpha[1:6]", 1)
> >>    # plot autocorrelations of 1st chain
> >>>
> >>> # force to print the current loop number
> >>> flush.console()
> >>> print(mn)
> >>> }
> >>>
> >>>
> >>>
> >>>
> >>> --- On Tue, 24/1/12, Jin Minming
> >> wrote:
> >>>
>  From: Jin Minming
>  Subject: [R] Error from Brugs "'r for
> windows gui
> >> front-end has stopped working''
>  To: r-help@r-project.org
>  Date: Tuesday, 24 January, 2012, 20:08
>  Dear all,
> 
>  I just try to run the example in R brugs
> packages,
> >> but not
>  only once. Loop is added in this example.
> After
> >> several
>  times (7, 11, or other random number),
> there is an
> >> error
>  message "r for windows gui front-end has
> stopped
> >> working".
>  This happened in two laptops with windows 7
> and
> >> vista. I
>  have tried different versions of R (2.14.1
> and
> >> 2.13) and
>  Brugs packages (0.7.1 and 0.7-5). But it
> works well
> >> in a
>  computer installed with Windows XP system.
> 
>  Anyone has some idea how to avoid this
> crash in
> >> Windows
>  Vista or 7 system?
> 
>  Thanks,
> 
>  Jim
> 
> 
> __
>  R-help@r-project.org
>  mailing list
>  https://stat.ethz.ch/mailman/listinfo/r-help
>  PLEASE do read the posting guide 
>  http://www.R-project.org/posting-guide.html
>  and provide commented, minimal,
> self-contained,
> >> reproducible
>  code.
> 
> >>>
> >>> __
> >>> R-help@r-project.org
> >> mailing list
> >>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>> PLEASE do read the posting guide 
> >>> http://www.R-project.org/posting-guide.html
> >>> and provide commented, minimal,
> self-contained,
> >> reproducible code.
> >>
>

_

Re: [R] r-help; weibull parameter estimate

2012-01-29 Thread peter dalgaard


On Jan 29, 2012, at 12:17 , Christopher Kelvin wrote:

> Hello,
> If i write a function as below using log of weibull distribution i do not get 
> the required 
> 
> results in estimating the parameters what do i do, please

Presumably find and fix the error in your likelihood function!

> z2 <- function(p)-sum(dweibull(x,p[1],p[2],log=TRUE))
> z2(c(2,1))
[1] 1359.169
> z2(c(2,2))
[1] 634.8413
> z(c(2,1))
[1] 736251.1
> z(c(2,2))
[1] 184012
> optim(c(.5,.5),z2)
$par
[1] 1.971611 1.938388

$value
[1] 633.9709

$counts
function gradient 
  79   NA 

$convergence
[1] 0

$message
NULL


> 
> a/b * (t/b)^a-1 * exp(-t/b)^a
> 
> 
> n=500
> x<-rweibull(n,2,2)
> z<-function(p) {(-n*log(p[1])+n*log(p[2])-
> (p[1]-1)*sum(log(x))+(p[1]-1)*log(p[2])+(sum(x/p[2])^(p[1]))  )}
> zz<-optim(c(0.5,0.5),z)
> zz
>   [[alternative HTML version deleted]]
> 
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] question on model.matrix

2012-01-29 Thread Daniel Negusse

> 
> 
> I am new to this group - joined today. I am in because i have an assignment 
> to do using R => microarray analysis using affy (bioconductor). 
> 
> while reading some tutorials, i came across this and i am stuck. i want to 
> understand it and would appreciate if anyone can tell me. 
> 
> design <- model.matrix(~ -1+factor(c(1,1,2,2,3,3)))
> 
> can someone break down this code and explain to me what the "~", and the 
> "-1+factor" are doing? sometimes i see codes without some of these things. 
> 
> thanks, would appreciate if you can explain it in the simplest way you can. i 
> am a pure molecular biologist and trying to learn this new monster called R. 
> :-) 
> 
> thanks, 
> 
> daniel


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] r-help; weibull parameter estimate

2012-01-29 Thread Christopher Kelvin

Hello,
If i write a function as below using log of weibull distribution i do not get 
the required 

results in estimating the parameters what do i do, please

a/b * (t/b)^a-1 * exp(-t/b)^a


n=500
x<-rweibull(n,2,2)
z<-function(p) {(-n*log(p[1])+n*log(p[2])-
(p[1]-1)*sum(log(x))+(p[1]-1)*log(p[2])+(sum(x/p[2])^(p[1]))  )}
zz<-optim(c(0.5,0.5),z)
zz
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Help joincount.test

2012-01-29 Thread CD

Hello,
I'm trying to analyse the spatial organization of different fields planted
with different varieties (each field has only one variety), but I have
problems trying to understand the results of the test I did.
To do this, I created different neighbourhood matrix. For example, for the
first matrix, fields are considered as neighbours if they are distant from 0
to 1000 meters from each others (independently on the variety). In the
second matrix, I consider fields as neighbours if they are distant from 0 to
3000 m.
Then I test the spatial organization with the joincount.test function in R.
The help page says "The BB join count test for spatial autocorrelation using
a spatial weights matrix in weights list form for testing whether
same-colour joins occur more frequently than would be expected if the zones
were labelled in a spatially random way."

When I take 0 < d < 1000 m, the test gives (for variety A) : same colour
statistic = 4.5, with expectation = 2.6 (highly significative). I thought
this meant that one field of variety A had an average of 4.5 neighbours of
the same variety A, while the expected number of neighbours of the same
variety was 2.6. This means that for this variety and this range of
distances, the fields are strongly organized.
But, when 0 < d < 3000 m, the test gives : same colour statistic = 2.3, with
expectation = 2.5. Because of this result, I'm not sure any more that the
same colour statistic gives a number of neighbours of the same variety,
because if it was the case, this number could have increased with increasing
d, but it could not have decreased with increasing d as I found here.

Can somebody help me understanding what really means this "same colour
statistic"?
Thanks a lot,
Camille

--
View this message in context: 
http://r.789695.n4.nabble.com/Help-joincount-test-tp4337880p4337880.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Why does the order of terms in a formula translate into different models/ model matrices?

2012-01-29 Thread peter dalgaard

On Jan 29, 2012, at 02:42 , Ben Bolker wrote:
>> 
> 
> (quoting removed to make Gmane happy)
> 
> AFAICS, this is a bug.
> 
> 
>  I think so too, although I haven't got my head around it yet.
> 
>  Chuck, are you willing to post a summary of this to r-devel
> for discussion ... and/or post a bug report?

You have to be very specific about what the bug is, if any... I.e., which 
precisely are the rules that are broken by the current behavior? 

Preferably also suggest a fix --- the corner cases of model.matrix and friends 
is some of the more impenetrable code in the R sources.

Notice that order dependent parametrization of terms is not a bug per se, nor 
is the automatic switch to dummy coding of factors. Consider these cases:

d <- cbind(expand.grid(a=c("a","b"),b=c("X","Y"),c=c("U","W")),x=1:8)
model.matrix(~ a:b + a:c, d)
model.matrix(~ a:c + a:b, d)
model.matrix(~ a:b + a:x, d)
model.matrix(~ a:x + a:b, d)

and notice that the logic applying to "x" is the same as that applying to "c". 

The crux seems to be that the model ~a:c contains the model ~a whereas ~a:x 
does not, and hence the rationale for _not_ expanding a subsequent a:b term to 
dummies (namely that ~a is "already in" the model) fails. 

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] For Loop Error

2012-01-29 Thread Petr Savicky

On Sat, Jan 28, 2012 at 10:25:47AM -0800, Melrose2012 wrote:
> Hi Again,
> 
> I am writing a 'for loop' to create a matrix of randomly sampled colors. 
> I've written this loop in matlab and it works fine.  I then tried to do it
> in R and apparently there is something wrong with my syntax b/c every time I
> run the script, the for loop "blows up" at a different point in the loop.
> 
> As you can see, I ask R to clear my workspace each time, so there shouldn't
> be any variables in my workspace that are messing this up.
> 
> rm(list=ls())
> 
> # Sample Size
> size <- 53  
> # Probability of each color based on company
> P.company <- c(.14,.14,.16,.13,.20,.24)
> # Names of colors
> color <- c('Br','Y','G','R','O','Bl')
> # Make an empty matrix that will be filled in by for loop
> table.combos <- matrix(nrow = 1, ncol = 6);
> 
> # For loop will run through choosing a random bag of 53 M&Ms 1 times
> # and create a table enumerating the number of each color in each of these
> 1 bags
> for(i in 1:1) {
>   combos <- sample(color,size, replace = TRUE, prob = P.company)
>   table.combos[i, ] <- table(combos)
>   colnames(table.combos)<-c("Br","Y","G","R","O","Bl")
> }
> 
> Every time the loop blows up, I get back this error:
> 
> Error in table.combos[i, ] <- table(combos) : 
>   number of items to replace is not a multiple of replacement length

Hi.

Rui Barradas already pointed out that the problem is with samples,
which do not contain all colors. Forcing table() to produce zero
frequencies of colors, which do not occur, a factor may be used.
Try, for example

  size <- 53
  P.company <- c(.14,.14,.16,.13,.20,.24)
  color <- c('Br','Y','G','R','O','Bl')
  table.combos <- matrix(nrow = 1, ncol = 6);
  colnames(table.combos) <- color
 
  for(i in 1:1) {
combos <- sample(color,size, replace = TRUE, prob = P.company)
table.combos[i, ] <- table(factor(combos, levels=color))
  }

  which(table.combos==0, arr.ind=TRUE)

 row col
   [1,]  588   1
   [2,] 3005   1
   [3,] 4535   1
   [4,] 8314   1
   [5,] 7654   2
   [6,] 8607   3
   [7,] 1790   4
   [8,] 1980   4
   [9,] 2991   4
  [10,] 4550   4
  [11,] 5868   4
  [12,] 6704   4
  [13,] 8769   4
  [14,] 8823   4
  [15,] 7861   5

Hope this helps.

Petr Savicky.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Please help, a question about the R function arima, thanks a lot!

2012-01-29 Thread blackkettle

Dear everyone,

I have met a problem when using the R function arima.

The following is the coding.

> www <- "http://www.massey.ac.nz/~pscowper/ts/pounds_nz.dat";
> x <- read.table(www, header = T)
> x.ts <- ts(x, st = 1991, fr = 4)
> x.ma <- arima(x.ts, order = c(0, 0, 1))
> x.ma
> acf(x.ma$res[-1])


I was wondering whether somone here could enlighten me how x.ma$res is
calculated in the previous example of  R  ?

Thank you so much! 

--
View this message in context: 
http://r.789695.n4.nabble.com/Please-help-a-question-about-the-R-function-arima-thanks-a-lot-tp4337535p4337535.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] For Loop Error

2012-01-29 Thread Rui Barradas

Hello,

> 
> Every time the loop blows up, I get back this error:
> 
> Error in table.combos[i, ] <- table(combos) :
>   number of items to replace is not a multiple of replacement length
> 

This is because not all colors are present in that one sample that breaks
the code.

> 
> There is no apparent consistency on which "i" in the loop I get this
> error.  
> 
There shouldn't be, every time you run the loop, 'sample' will take
different values. To get
reproducibility you would need 'set.seed'.

A solution could be based on the following.
It's important to sort, like the difference between 'tc1' and 'tc2' proves.
It's 'tc2' you want.

Note that I've reduced the sample size to 10.
Note also that I initialize the matrices to zeros, not NA's.


color <- sort(c('Br','Y','G','R','O','Bl'))

tc1 <- matrix(0, nrow = 1, ncol = 6)
tc2 <- matrix(0, nrow = 1, ncol = 6)

colnames(tc1) <- color
colnames(tc2) <- color

set.seed(123)
for(i in 1) {
  combos <- sample(color, 10, replace = TRUE, prob = P.company)
  tc1[i, unique(combos)] <- table(combos)
  tc2[i, sort(unique(combos))] <- table(combos)
}

table(combos)
tc1
tc2


I hope this helps.

Rui Barradas


--
View this message in context: 
http://r.789695.n4.nabble.com/For-Loop-Error-tp4336585p4337469.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

47 matches

Mail list logo