Re: [R] Random assignment

2010-10-15 Thread Dennis Murphy
Hi:

I don't believe you've provided quite enough information just yet...

On Fri, Oct 15, 2010 at 2:22 AM, John Haart anothe...@me.com wrote:

 Dear List,

 I am doing some simulation in R and need basic help!

 I have a list of animal families for which I know the number of species in
 each family.

 I am working under the assumption that a species has a 7.48% chance of
 being at risk.


Is this over all families, or within a particular family? If the former, why
does a distinction of family matter?


 I want to simulate the number of species expected to be at risk under a
 random binomial distribution with 10,000 randomizations.


I guess you've stated the p, but what's the n? The number of species in each
family? If you're simulating on a family by family basis, then it would seem
that a binomial(nspecies, 0.0748) distribution would be the reference.
Assuming you have multiple families, do you want separate simulations per
family, or do you want to do some sort of weighting (perhaps proportional to
size) over all families? The latter is doable, but it would require a
two-stage simulation: one to randomly select a family and then to randomly
select a species.
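
To make the per-family option concrete, a minimal sketch (assuming a data
frame fam with columns F = family name and SP = number of species, as in
your read.table idea):

p <- 0.0748
nsim <- 10000
set.seed(1)
## one row per randomization, one column per family
sims <- sapply(fam$SP, function(n) rbinom(nsim, size = n, prob = p))
colnames(sims) <- fam$F
colMeans(sims)   # simulated expected number at risk, per family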

Dennis



 I am relatively new to this field and would greatly appreciate an
 idiot-proof response, i.e. how should the data be entered into R? I was
 thinking of using read.table, header = T, where the table has F = Family
 Name, and SP = Number of species in that family?

 John

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Help with R

2010-10-15 Thread Philipp Pagel
On Fri, Oct 15, 2010 at 09:57:21AM +0200, Muteba Mwamba, John wrote:

 FATAL ERROR: unable to restore saved data in .RDATA

Without more information it's hard to know what exactly went wrong.

Anyway, the message most likely means that the .RData file got
corrupted. Deleting it should solve the problem. Note that this means
that you will get an empty workspace and have to recreate whatever
data was in it before.

 I decided to uninstall the copy (R 2.11.0) and installed a new
 version (2.11.1) but I'm still receiving the same message. When I
 click OK the window closes.

Re-installation of R will most likely not fix this (unless a change in
the format of the .RData files had occurred - but to my knowledge no
such thing has happened, recently.)

cu
Philipp

-- 
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
Maximus-von-Imhof-Forum 3
85354 Freising, Germany
http://webclu.bio.wzw.tum.de/~pagel/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Help with R

2010-10-15 Thread peter dalgaard

On Oct 15, 2010, at 12:37 , Philipp Pagel wrote:

 On Fri, Oct 15, 2010 at 09:57:21AM +0200, Muteba Mwamba, John wrote:
 
 FATAL ERROR: unable to restore saved data in .RDATA
 
 Without more information it's hard to know what exactly went wrong.
 
 Anyway, the message most likely means that the .RData file got
 corrupted. Deleting it should solve the problem. Note that this means
 that you will get an empty workspace and have to recreate whatever
 data was in it before.

Renaming the file is a better idea. Then at least you have some chance of being 
able to load() it into an R Session and recover things from the workspace. No 
guarantees, though.
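
A minimal sketch of that recovery attempt (assuming the file sits in R's
startup directory):

file.rename(".RData", "RData-damaged")  # so R no longer loads it at startup
e <- new.env()
load("RData-damaged", envir = e)        # may still fail if truly corrupted
ls(e)                                   # whatever could be recovered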

-pd

 
 I decided to uninstall the copy (R 2.11.0) and installed a new
 version (2.11.1) but I'm still receiving the same message. When I
 click OK the window closes.
 
 Re-installation of R will most likely not fix this (unless a change in
 the format of the .RData files had occurred - but to my knowledge no
 such thing has happened, recently.)
 
 cu
   Philipp
 
 -- 
 Dr. Philipp Pagel
 Lehrstuhl für Genomorientierte Bioinformatik
 Technische Universität München
 Wissenschaftszentrum Weihenstephan
 Maximus-von-Imhof-Forum 3
 85354 Freising, Germany
 http://webclu.bio.wzw.tum.de/~pagel/
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Help with R

2010-10-15 Thread Petr PIKAL
Sometimes such a message appears when you try to open an .RData file in an
environment where the packages that were in use when the file was created are
not installed. Then it is possible to fix it just by installing the necessary
packages. Without the whole story it is impossible to say what the real cause
of the error is.

Regards
Petr

r-help-boun...@r-project.org napsal dne 15.10.2010 12:45:30:

 
 On Oct 15, 2010, at 12:37 , Philipp Pagel wrote:
 
  On Fri, Oct 15, 2010 at 09:57:21AM +0200, Muteba Mwamba, John wrote:
  
  FATAL ERROR: unable to restore saved data in .RDATA
  
  Without more information it's hard to know what exactly went wrong.
  
  Anyway, the message most likely means that the .RData file got
  corrupted. Deleting it should solve the problem. Note that this means
  that you will get an empty workspace and have to recreate whatever
  data was in it before.
 
 Renaming the file is a better idea. Then at least you have some chance of
 being able to load() it into an R session and recover things from the
 workspace. No guarantees, though.
 
 -pd
 
  
  I decided to uninstall the copy (R 2.11.0) and installed a new
  version (2.11.1) but I'm still receiving the same message. When I
  click OK the window closes.
  
  Re-installation of R will most likely not fix this (unless a change in
  the format of the .RData files had occurred - but to my knowledge no
  such thing has happened, recently.)
  
  cu
 Philipp
  
  -- 
  Dr. Philipp Pagel
  Lehrstuhl für Genomorientierte Bioinformatik
  Technische Universität München
  Wissenschaftszentrum Weihenstephan
  Maximus-von-Imhof-Forum 3
  85354 Freising, Germany
  http://webclu.bio.wzw.tum.de/~pagel/
  
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
 -- 
 Peter Dalgaard
 Center for Statistics, Copenhagen Business School
 Solbjerg Plads 3, 2000 Frederiksberg, Denmark
 Phone: (+45)38153501
 Email: pd@cbs.dk  Priv: pda...@gmail.com
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] creating 'all' sum contrasts

2010-10-15 Thread Michael Hopkins


OK, my last question didn't get any replies so I am going to try and ask a 
different way.

When I generate contrasts with contr.sum() for a 3 level categorical variable I 
get the 2 orthogonal contrasts:

 contr.sum( c(1,2,3) )
  [,1] [,2]
1    1    0
2    0    1
3   -1   -1

This provides the contrasts 1-3 and 2-3 as expected.  But I also want it to
create 1-2 (i.e. (1-3) - (2-3)).  So in general I want all possible
orthogonal contrasts - think of it as the contrasts for all pairwise
comparisons between the levels.

Are there any options for contrast() or other functions/libraries that will
allow me to do this automatically?  I could go through and create new columns,
but I am using this for complex multi-factor experiments with varying levels
per factor and fitting the models within other functions (e.g. regsubsets()), so
an automatic solution using what is already available would be far preferable.


Michael Hopkins
Algorithm and Statistical Modelling Expert
 
Upstream
23 Old Bond Street
London
W1S 4PZ


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Random assignment

2010-10-15 Thread John Haart
Hi Denis and list

Thanks for this, and sorry for not providing enough information.

First let me put the study into a bit more context:

I know the number of species at risk in each family; what I am asking is: "Is
risk random according to family, or do certain families have a disproportionate
number of at-risk species?"

My idea was to randomly allocate risk to the families based on the criteria
below (binomial(nspecies, 0.0748)), then compare this to the true data and
see if there was a significant difference.

So in answer to your questions (assuming my method is correct!):

 Is this over all families, or within a particular family? If the former, why
 does a distinction of family matter?

Within a particular family - this is because I am looking to see if risk in
the observed data set is random with respect to family, so this will provide the
baseline to compare against.

 I guess you've stated the p, but what's the n? The number of species in each
 family?

This varies widely; for instance, I have some families that are monotypic
(with 1 species) and other families with 100+ species.


 Assuming you have multiple families, do you want separate simulations per
 family, or do you want to do some sort of weighting (perhaps proportional to
 size) over all families?

I am assuming I want some sort of weighting. This is because I want to
calculate the number of species expected to be at risk in EACH family under the
random binomial distribution (assuming every species has a 7.48% chance of
being at risk).

Thanks

John




On 15 Oct 2010, at 11:19, Dennis Murphy wrote:

Hi:

I don't believe you've provided quite enough information just yet...

On Fri, Oct 15, 2010 at 2:22 AM, John Haart anothe...@me.com wrote:

 Dear List,
 
 I am doing some simulation in R and need basic help!
 
 I have a list of animal families for which i know the number of species in
 each family.
 
 I am working under the assumption that a species has a 7.48% chance of
 being at risk.
 

Is this over all families, or within a particular family? If the former, why
does a distinction of family matter?

 
 I want to simulate the number of species expected to be at risk under a
 random binomial distribution with 10,000 randomizations.
 

I guess you've stated the p, but what's the n? The number of species in each
family? If you're simulating on a family by family basis, then it would seem
that a binomial(nspecies, 0.0748) distribution would be the reference.
Assuming you have multiple families, do you want separate simulations per
family, or do you want to do some sort of weighting (perhaps proportional to
size) over all families? The latter is doable, but it would require a
two-stage simulation: one to randomly select a family and then to randomly
select a species.

Dennis


 
 I am relatively new to this field and would greatly appreciate an
 idiot-proof response, i.e. how should the data be entered into R? I was
 thinking of using read.table, header = T, where the table has F = Family
 Name, and SP = Number of species in that family?
 
 John
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Random assignment

2010-10-15 Thread Michael Bedward
Hi John,

The word species attracted my attention :)

Like Dennis, I'm not sure I understand your idea properly. In
particular, I don't see what you need the simulation for.

If family F has Fn species, your random expectation is that p * Fn of
them will be at risk (p = 0.0748). The variance on that expectation
will be p * (1-p) * Fn.

If you do your simulation that's the result you'll get.  Perhaps to
initially identify families with disproportionate observed extinction
rates all you need is the dbinom function?
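
For instance, a quick tail-probability check along those lines (made-up
counts: n species in a family, k of them observed at risk):

n <- 80; k <- 12; p <- 0.0748
pbinom(k - 1, n, p, lower.tail = FALSE)   # P(X >= k) under the binomial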

Michael


On 15 October 2010 22:29, John Haart anothe...@me.com wrote:
 Hi Denis and list

 Thanks for this , and sorry for not providing enough information

 First let me put the study into a bit more context : -

 I know the number of species at risk in each family; what I am asking is: "Is
 risk random according to family, or do certain families have a
 disproportionate number of at-risk species?"

 My idea was to randomly allocate risk to the families based on the criteria
 below (binomial(nspecies, 0.0748)), then compare this to the true data
 and see if there was a significant difference.

 So in answer to your questions (assuming my method is correct!):

 Is this over all families, or within a particular family? If the former, why
 does a distinction of family matter?

 Within a particular family - this is because I am looking to see if risk in
 the observed data set is random with respect to family, so this will provide
 the baseline to compare against.

 I guess you've stated the p, but what's the n? The number of species in each
 family?

 This varies widely; for instance, I have some families that are monotypic
 (with 1 species) and other families with 100+ species.


 Assuming you have multiple families, do you want separate simulations per
 family, or do you want to do some sort of weighting (perhaps proportional to
 size) over all families?

 I am assuming I want some sort of weighting. This is because I want to
 calculate the number of species expected to be at risk in EACH family under
 the random binomial distribution (assuming every species has a 7.48% chance
 of being at risk).

 Thanks

 John




 On 15 Oct 2010, at 11:19, Dennis Murphy wrote:

 Hi:

 I don't believe you've provided quite enough information just yet...

 On Fri, Oct 15, 2010 at 2:22 AM, John Haart anothe...@me.com wrote:

 Dear List,

 I am doing some simulation in R and need basic help!

 I have a list of animal families for which i know the number of species in
 each family.

 I am working under the assumption that a species has a 7.48% chance of
 being at risk.


 Is this over all families, or within a particular family? If the former, why
 does a distinction of family matter?


 I want to simulate the number of species expected to be at risk under a
 random binomial distribution with 10,000 randomizations.


 I guess you've stated the p, but what's the n? The number of species in each
 family? If you're simulating on a family by family basis, then it would seem
 that a binomial(nspecies, 0.0748) distribution would be the reference.
 Assuming you have multiple families, do you want separate simulations per
 family, or do you want to do some sort of weighting (perhaps proportional to
 size) over all families? The latter is doable, but it would require a
 two-stage simulation: one to randomly select a family and then to randomly
 select a species.

 Dennis



 I am relatively new to this field and would greatly appreciate an
 idiot-proof response, i.e. how should the data be entered into R? I was
 thinking of using read.table, header = T, where the table has F = Family
 Name, and SP = Number of species in that family?

 John

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Create Arrays

2010-10-15 Thread Gerrit Eichner

Hi, Doug,

maybe


HH <- c(0.88, 0.72, 0.89, 0.93, 1.23, 0.86, 0.98, 0.85, 1.23)
TT <- c(7.14, 7.14, 7.49, 8.14, 7.14, 7.32, 7.14, 7.14, 7.14)
columnnumbers <- c(0, 0, 0, 3, 0, 0, 0, 2, 0)

TMP <- lapply( seq( columnnumbers),
   function( i, CN, M) {
     if( CN[i] == 0) as.matrix( M[, i]) else
       matrix( -1, nrow( M), CN[i])
   }, CN = columnnumbers, M = rbind( HH, TT))

do.call( cbind, TMP)



gets close to what you want (after some adaptation, of course).


 HTH  --  Gerrit

-
AOR Dr. Gerrit Eichner   Mathematical Institute, Room 212
gerrit.eich...@math.uni-giessen.de   Justus-Liebig-University Giessen
Tel: +49-(0)641-99-32104  Arndtstr. 2, 35392 Giessen, Germany
Fax: +49-(0)641-99-32109http://www.uni-giessen.de/cms/eichner

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Random assignment

2010-10-15 Thread John Haart
Hi Michael,

Thanks for this - the reason I am following this approach is that it appeared
in a paper I was reading, and I thought it was an interesting angle to take.

The paper is 

Vamosi & Wilson, 2008. Nonrandom extinction leads to elevated loss of
angiosperm evolutionary history. Ecology Letters, (2008) 11: 1047–1053.

and the specific method I am following states:

 We calculated the number of species expected to be at risk in each family 
 under a random binomial distribution in 10 000 randomizations [generated 
 using R version 2.6.0 (R Development Team 2007)] assuming every species has a 
 7.48% chance of being at risk. 

I guess the reason I am doing the simulation is that I am not hugely
statistically minded and the paper was asking the same question I am interested
in answering :).

So following your approach -

 if family F has Fn species, your random expectation is that p * Fn of
 them will be at risk (p = 0.0748). The variance on that expectation
 will be p * (1-p) * Fn.


Family F = Bromeliaceae, with Fn = 80, p = 0.0748
random expectation = p*Fn = 0.0748*80 = 5.984
variance = p * (1-p) * Fn = (0.0748*0.9252)*80 = 5.5363968
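
Or, doing the same arithmetic in R:

p <- 0.0748; Fn <- 80
p * Fn             # 5.984 expected at risk
p * (1 - p) * Fn   # 5.5363968 variance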

So the random expectation is that the Bromeliaceae will have 6 species at risk,
if risk is assigned randomly?

So if I do this for all the families, it will be the same as doing the
simulation experiment outlined in the method above?

Thanks

John




On 15 Oct 2010, at 12:49, Michael Bedward wrote:

Hi John,

The word species attracted my attention :)

Like Dennis, I'm not sure I understand your idea properly. In
particular, I don't see what you need the simulation for.

If family F has Fn species, your random expectation is that p * Fn of
them will be at risk (p = 0.0748). The variance on that expectation
will be p * (1-p) * Fn.

If you do your simulation that's the result you'll get.  Perhaps to
initially identify families with disproportionate observed extinction
rates all you need is the dbinom function?

Michael


On 15 October 2010 22:29, John Haart anothe...@me.com wrote:
 Hi Denis and list
 
 Thanks for this , and sorry for not providing enough information
 
 First let me put the study into a bit more context : -
 
 I know the number of species at risk in each family; what I am asking is: "Is
 risk random according to family, or do certain families have a
 disproportionate number of at-risk species?"

 My idea was to randomly allocate risk to the families based on the criteria
 below (binomial(nspecies, 0.0748)), then compare this to the true data
 and see if there was a significant difference.

 So in answer to your questions (assuming my method is correct!):

 Is this over all families, or within a particular family? If the former, why
 does a distinction of family matter?

 Within a particular family - this is because I am looking to see if risk in
 the observed data set is random with respect to family, so this will provide
 the baseline to compare against.

 I guess you've stated the p, but what's the n? The number of species in each
 family?

 This varies widely; for instance, I have some families that are monotypic
 (with 1 species) and other families with 100+ species.


 Assuming you have multiple families, do you want separate simulations per
 family, or do you want to do some sort of weighting (perhaps proportional to
 size) over all families?

 I am assuming I want some sort of weighting. This is because I want to
 calculate the number of species expected to be at risk in EACH family under
 the random binomial distribution (assuming every species has a 7.48% chance
 of being at risk).
 
 Thanks
 
 John
 
 
 
 
 On 15 Oct 2010, at 11:19, Dennis Murphy wrote:
 
 Hi:
 
 I don't believe you've provided quite enough information just yet...
 
 On Fri, Oct 15, 2010 at 2:22 AM, John Haart anothe...@me.com wrote:
 
 Dear List,
 
 I am doing some simulation in R and need basic help!
 
 I have a list of animal families for which i know the number of species in
 each family.
 
 I am working under the assumption that a species has a 7.48% chance of
 being at risk.
 
 
 Is this over all families, or within a particular family? If the former, why
 does a distinction of family matter?
 
 
 I want to simulate the number of species expected to be at risk under a
 random binomial distribution with 10,000 randomizations.
 
 
 I guess you've stated the p, but what's the n? The number of species in each
 family? If you're simulating on a family by family basis, then it would seem
 that a binomial(nspecies, 0.0748) distribution would be the reference.
 Assuming you have multiple families, do you want separate simulations per
 family, or do you want to do some sort of weighting (perhaps proportional to
 size) over all families? The latter is doable, but it would require a
 two-stage simulation: one to randomly select a family and then to randomly
 select a species.
 
 Dennis
 
 
 
 I am relatively new to this field and would 

Re: [R] Nonlinear Regression Parameter Shared Across Multiple Data Sets

2010-10-15 Thread Jared Blashka
Looking at the source for nlrob, it looks like it saves the coefficients
from the results of running an nls and then passes those coefficients back
into the next nls request. The issue that it's running into is that nls
returns the coefficients as upper, LOGEC501, LOGEC502, and LOGEC503, rather
than just upper and a vector named LOGEC50. Does anyone know a way to
restructure the formula/start parameter so that coef returns a vector
instead of each element individually? Right now, I've had to 'hack' nlrob so
it recombines similarly named elements into a vector, but was wondering if
there was a way to accomplish the end goal without those measures.
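
For reference, a self-contained sketch (simulated data, just to show the
coef() naming) of the shared-upper fit discussed below:

set.seed(42)
all <- do.call(rbind, lapply(1:3, function(i) {
  X <- seq(-11, -6, length = 20)
  data.frame(dset = i, X = X,
             Y = 1/(1 + 10^(X - c(-9, -8.5, -8)[i])) + rnorm(20, sd = 0.02))
}))
fit <- nls(Y ~ upper/(1 + 10^(X - LOGEC50[dset])), data = all,
           start = list(upper = 1, LOGEC50 = c(-8.5, -8.5, -8.5)))
coef(fit)   # named upper, LOGEC501, LOGEC502, LOGEC503 - not a LOGEC50 vector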

Thanks,
Jared

On Wed, Oct 13, 2010 at 3:14 PM, Jared Blashka evilamaran...@gmail.comwrote:

 As an addendum to my question, I'm attempting to apply the solution to the
 robust non-linear regression function nlrob from the robustbase package, and
 it doesn't work in that situation. I'm getting

 allRobustFit <- nlrob(Y ~ (upper)/(1+10^(X-LOGEC50[dset])), data = all,
     start = list(upper = max(all$Y), LOGEC50 = c(-8.5, -8.5, -8.5)))
 Error in nls(formula, data = data, start = start, algorithm = algorithm,  :
   parameters without starting value in 'data': LOGEC50

 I'm guessing this is because the nlrob function doesn't know what to do
 with a vector for a start value. Am I correct and is there another method of
 using nlrob in the same way?

 Thanks,
 Jared

 On Tue, Oct 12, 2010 at 8:58 AM, Jared Blashka evilamaran...@gmail.comwrote:

 Thanks so much! It works great.

 I had thought the way to do it relied on combining the data sets, but I
 couldn't figure out how to alter the formula to work with the combination.

 Jared


 On Tue, Oct 12, 2010 at 7:07 AM, Keith Jewell k.jew...@campden.co.ukwrote:


 Jared Blashka evilamaran...@gmail.com wrote in message
 news:aanlktinffmudugqnkudvr=fmf0wrrtsbjxjexuki_...@mail.gmail.com...
  I'm working with 3 different data sets and applying this non-linear
  regression formula to each of them.
 
  nls(Y ~ (upper)/(1+10^(X-LOGEC50)), data=std_no_outliers,
  start=list(upper=max(std_no_outliers$Y),LOGEC50=-8.5))
 
  Previously, all of the regressions were calculated in Prism, but I'd
 like
  to
  be able to automate the calculation process in a script, which is why
 I'm
  trying to move to R. The issue I'm running into is that previously, in
  Prism, I was able to calculate a shared value for a constraint so that
 all
  three data sets shared the same value, but have other constraints
  calculated
  separately. So Prism would figure out what single value for the
 constraint
  in question would work best across all three data sets. For my formula,
  each
  data set needs it's own LOGEC50 value, but the upper value should be
 the
  same across the 3 sets. Is there a way to do this within R, or with a
  package I'm not aware of, or will I need to write my own nls function
 to
  work with multiple data sets, because I've got no idea where to start
 with
  that.
 
  Thanks,
  Jared
 
  [[alternative HTML version deleted]]
 
 An approach which works for me (code below to illustrate principle, not
 tried...)

 1) combine all three data sets into one dataframe with a column (e.g.
 dset) indicating data set (1, 2 or 3)

 2) express your formula with upper as single valued and LOGEC50 as a
 vector indexed by dset, e.g.
 Y ~ upper/(1+10^(X-LOGEC50[dset]))

 3) in the start list, make LOGEC50 a vector, e.g. using -8.5 as start for
 all three LOGEC50 values
   start = list(upper = max(std_no_outliers$Y), LOGEC50 = c(-8.5, -8.5, -8.5))

 Hope that helps,

 Keith J

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.





[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] scores for a new observation from PCAgrid() in pcaPP

2010-10-15 Thread Kari Ruohonen
Hi,
I am trying to compute scores for a new observation based on a previously
computed PCA by the PCAgrid() function in the pcaPP package. My data has
more variables than observations.

Here is an imaginary data set to show the case:
 library(MASS)   # for mvrnorm
 library(pcaPP)  # for PCAgrid, used below
 n.samples <- 30
 n.bins <- 1000
 x.sim <- rep(0, n.bins)
 V.sim <- diag(n.bins)
 mtx <- array(dim = c(n.samples, n.bins))
 for(i in 1:n.samples) mtx[i,] <- mvrnorm(1, x.sim, V.sim)

With prcomp() I can do the following:

 pc.pr2 <- prcomp(mtx, scale = TRUE)
 newscr.pr2 <- scale(t(mtx[1,]), pc.pr2$center, pc.pr2$scale) %*% pc.pr2$rotation

The latter computes the scores for the first row of mtx. I can verify
that the scores are the same as computed originally by comparing with

 pc.pr2$x[1,]   # prints the scores for the first observation

Now, if I tried the same with PCAgrid() as follows:

 pc.pp2 <- PCAgrid(mtx, k = min(dim(mtx)), scale = mad)
 newscr.pp2 <- scale(t(mtx[1,]), pc.pp2$center, pc.pp2$scale) %*% pc.pp2$loadings

The newscr.pp2 do not match the scores in the pc.pp2 object as can be
verified by comparing with:
 pc.pp2$x[1,] 

I wonder what I am missing? Or is it so that for the grid method such
computation of scores from the loadings and original observations is not
possible?

For the case p < n, i.e. when there are more observations than variables,
the scores computed from loadings and the scores from the model object
match also for the PCAgrid() method, i.e. the behaviour described above
seems to relate to cases where p > n.

Many thanks for any help,
Kari

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] merging and working with BIG data sets. Is sqldf the best way??

2010-10-15 Thread Gabor Grothendieck
On Fri, Oct 15, 2010 at 6:14 AM, Chris Howden
ch...@trickysolutions.com.au wrote:
 Thanks for the advice Gabor,

 I was indeed not starting and finishing with sqldf(), which was why it was
 not working for me. Please forgive a blatantly obvious mistake.


 I have tried what you suggested and unfortunately R is still having problems
 doing the join. The problem seems to be one of memory, since I am receiving
 the error message below when I run the natural join using sqldf:
 Error: cannot allocate vector of size 250.0 Mb
 Timing stopped at: 32.8 0.48 34.79


Specify an external database so it uses an external, rather than in-memory,
database.  See example 6b (also shown in several other examples) on the home
page:

   sqldf(c("create...", "create...", "select..."), dbname = tempfile())
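
For instance, a sketch (assuming two data frames df1 and df2 to be joined):

library(sqldf)
joined <- sqldf("select * from df1 natural join df2", dbname = tempfile())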

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Random assignment

2010-10-15 Thread Michael Bedward
Hi John,

I haven't read that particular paper but in answer to your question...

 So if I do this for all the families, it will be the same as doing the
 simulation experiment outlined in the method above?

Yes :)

Michael


On 15 October 2010 23:18, John Haart anothe...@me.com wrote:
 Hi Michael,

 Thanks for this - the reason I am following this approach is that it appeared
 in a paper I was reading, and I thought it was an interesting angle to take.

 The paper is

 Vamosi & Wilson, 2008. Nonrandom extinction leads to elevated loss of
 angiosperm evolutionary history. Ecology Letters, (2008) 11: 1047–1053.

 and the specific method i am following states :-

 We calculated the number of species expected to be at risk in each family 
 under a random binomial distribution in 10 000 randomizations [generated 
 using R version 2.6.0 (R Development Team 2007)] assuming every species has 
 a 7.48% chance of being at risk.

 I guess the reason I am doing the simulation is that I am not hugely
 statistically minded and the paper was asking the same question I am
 interested in answering :).

 So following your approach -

 if family F has Fn species, your random expectation is that p * Fn of
 them will be at risk (p = 0.0748). The variance on that expectation
 will be p * (1-p) * Fn.


 Family F = Bromeliaceae, with Fn = 80, p = 0.0748
 random expectation = p*Fn = 0.0748*80 = 5.984
 variance = p * (1-p) * Fn = (0.0748*0.9252)*80 = 5.5363968

 So the random expectation is that the Bromeliaceae will have 6 species at
 risk, if risk is assigned randomly?

 So if I do this for all the families, it will be the same as doing the
 simulation experiment outlined in the method above?

 Thanks

 John




 On 15 Oct 2010, at 12:49, Michael Bedward wrote:

 Hi John,

 The word species attracted my attention :)

 Like Dennis, I'm not sure I understand your idea properly. In
 particular, I don't see what you need the simulation for.

 If family F has Fn species, your random expectation is that p * Fn of
 them will be at risk (p = 0.0748). The variance on that expectation
 will be p * (1-p) * Fn.

 If you do your simulation that's the result you'll get.  Perhaps to
 initially identify families with disproportionate observed extinction
 rates all you need is the dbinom function?

 Michael


 On 15 October 2010 22:29, John Haart anothe...@me.com wrote:
 Hi Denis and list

 Thanks for this , and sorry for not providing enough information

 First let me put the study into a bit more context : -

 I know the number of species at risk in each family; what I am asking is:
 "Is risk random according to family, or do certain families have a
 disproportionate number of at-risk species?"

 My idea was to randomly allocate risk to the families based on the criteria
 below (binomial(nspecies, 0.0748)), then compare this to the true data
 and see if there was a significant difference.

 So in answer to your questions (assuming my method is correct!):

 Is this over all families, or within a particular family? If the former, why
 does a distinction of family matter?

 Within a particular family - this is because I am looking to see if risk in
 the observed data set is random with respect to family, so this will provide
 the baseline to compare against.

 I guess you've stated the p, but what's the n? The number of species in each
 family?

 This varies widely; for instance, I have some families that are monotypic
 (with 1 species) and other families with 100+ species.


 Assuming you have multiple families, do you want separate simulations per
 family, or do you want to do some sort of weighting (perhaps proportional to
 size) over all families?

 I am assuming I want some sort of weighting. This is because I want to
 calculate the number of species expected to be at risk in EACH family under
 the random binomial distribution (assuming every species has a 7.48% chance
 of being at risk).

 Thanks

 John




 On 15 Oct 2010, at 11:19, Dennis Murphy wrote:

 Hi:

 I don't believe you've provided quite enough information just yet...

 On Fri, Oct 15, 2010 at 2:22 AM, John Haart anothe...@me.com wrote:

 Dear List,

 I am doing some simulation in R and need basic help!

 I have a list of animal families for which i know the number of species in
 each family.

 I am working under the assumption that a species has a 7.48% chance of
 being at risk.


 Is this over all families, or within a particular family? If the former, why
 does a distinction of family matter?


 I want to simulate the number of species expected to be at risk under a
 random binomial distribution with 10,000 randomizations.


 I guess you've stated the p, but what's the n? The number of species in each
 family? If you're simulating on a family by family basis, then it would seem
 that a binomial(nspecies, 0.0748) distribution would be the reference.
 Assuming you have multiple families, do you want separate simulations per
 family, or do you 

Re: [R] fast rowCumsums wanted for calculating the cdf

2010-10-15 Thread Douglas Bates
Although I know there is another message in this thread I am replying
to this message to be able to include the whole discussion to this
point.

Gregor mentioned the possibility of extending the compiled code for
cumsum so that it would handle the matrix case.  The work by Dirk
Eddelbuettel and Romain Francois on developing the Rcpp package make
it surprisingly easy to create compiled code for this task.  I am
copying the Rcpp-devel list on this in case one of the readers of that
list has time to create a sample implementation before I can get to
it.  It's midterm season and I have to tackle the stack of blue books
on my desk before doing fun things like writing code.

On Fri, Oct 15, 2010 at 2:23 AM, Joshua Wiley jwiley.ps...@gmail.com wrote:
 Hi,

 You might look at Reduce().  It seems faster.  I converted the matrix
 to a list in an incredibly sloppy way (which you should not emulate)
 because I cannot think of  the simple way.

 probs <- t(matrix(rep(1:10000000), nrow=10)) # matrix with row-wise probabilities
 F <- matrix(0, nrow=nrow(probs), ncol=ncol(probs));
 F[,1] <- probs[,1,drop=TRUE];
 add <- function(x) {Reduce(`+`, x, accumulate = TRUE)}


 system.time(F.slow <- t(apply(probs, 1, cumsum)))
   user  system elapsed
  36.758   0.416  42.464

 system.time(for (cc in 2:ncol(F)) {
 +  F[,cc] <- F[,cc-1,drop=TRUE] + probs[,cc,drop=TRUE];
 + })
   user  system elapsed
  0.980   0.196   1.328

 system.time(add(list(probs[,1], probs[,2], probs[,3], probs[,4], probs[,5], 
 probs[,6], probs[,7], probs[,8], probs[,9], probs[,10])))
   user  system elapsed
  0.420   0.072   0.539


 .539 seconds for 10 vectors each with 1 million elements does not seem
 bad.  For 30,000 each, on my system it took .014 seconds, which for
 200,000 iterations, I estimated as:

 (200000 * .014)/60/60
 [1] 0.778

 (and this is on a stone age system with a single core processor and
 only 756MB of RAM)

 It should not be difficult to get the list back to a matrix.

 Cheers,

 Josh



 On Thu, Oct 14, 2010 at 11:51 PM, Gregor mailingl...@gmx.at wrote:
 Dear all,

 Maybe the easiest solution: Is there anything that speaks against 
 generalizing
 cumsum from base to cope with matrices (as is done in matlab)? E.g.:

 cumsum(Matrix, 1)
 equivalent to
 apply(Matrix, 1, cumsum)

 The main advantage could be optimized code if the Matrix is extremely
 non-square (e.g. 100,000x10), but the summation is done over the short side
 (in this case 10). apply would practically yield a loop over 100,000 elements,
 and vectorization w.r.t. the long side (loop over 10 elements) provides
 considerable efficiency gains.

 Many regards,
 Gregor




 On Tue, 12 Oct 2010 10:24:53 +0200
 Gregor mailingl...@gmx.at wrote:

 Dear all,

 I am struggling with a (currently) cost-intensive problem: calculating the
 (non-normalized) cumulative distribution function, given the 
 (non-normalized)
 probabilities. something like:

 probs <- t(matrix(rep(1:100), nrow=10)) # matrix with row-wise probabilities
 F <- t(apply(probs, 1, cumsum)) # SLOOOW!

 One (already faster, but for sure not ideal) solution - thanks to Henrik 
 Bengtsson:

 F <- matrix(0, nrow=nrow(probs), ncol=ncol(probs));
 F[,1] <- probs[,1,drop=TRUE];
 for (cc in 2:ncol(F)) {
   F[,cc] <- F[,cc-1,drop=TRUE] + probs[,cc,drop=TRUE];
 }

 In my case, probs is a (30,000 x 10) matrix, and I need to iterate this
 step around 200,000 times, so speed is crucial. I currently can make sure
 to have no NAs, but in order to extend matrixStats, this could be a
 nontrivial issue.

 Any ideas for speeding up this - probably routine - task?

 Thanks in advance,
 Gregor

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




 --
 Joshua Wiley
 Ph.D. Student, Health Psychology
 University of California, Los Angeles
 http://www.joshuawiley.com/

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] split data with missing data condition

2010-10-15 Thread Jumlong Vongprasert
Dear all
I have data like this:
  x  y
  [1,] 59.74889  3.1317081
  [2,] 38.77629  1.7102589
  [3,]   NA  2.2312962
  [4,] 32.35268  1.3889621
  [5,] 74.01394  1.5361227
  [6,] 34.82584  1.1665412
  [7,] 42.72262  2.7870875
  [8,] 70.54999  3.3917257
  [9,] 59.37573  2.6763249
 [10,] 68.87422  1.9697770
 [11,] 19.00898  2.0584415
 [12,] 60.27915  2.5365194
 [13,] 50.76850  2.3943836
 [14,]   NA  2.2862790
 [15,] 39.01229  1.7924957

and I want to split the data into two data sets: one of non-missing rows and
one of missing rows.
How can I do this?
Many Thanks.
Jumlong


-- 
Jumlong Vongprasert
Institute of Research and Development
Ubon Ratchathani Rajabhat University
Ubon Ratchathani
THAILAND
34000

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] creating 'all' sum contrasts

2010-10-15 Thread Berwin A Turlach
G'day Michael,

On Fri, 15 Oct 2010 12:09:07 +0100
Michael Hopkins hopk...@upstreamsystems.com wrote:

 OK, my last question didn't get any replies so I am going to try and
 ask a different way.
 
 When I generate contrasts with contr.sum() for a 3 level categorical
 variable I get the 2 orthogonal contrasts:
 
  contr.sum( c(1,2,3) )
   [,1] [,2]
 1    1    0
 2    0    1
 3   -1   -1

These two contrasts are *not* orthogonal.

 This provides the contrasts 1-3 and 2-3 as expected.  But I also
 want it to create 1-2 (i.e. (1-3) - (2-3)).  So in general I want
 all possible orthogonal contrasts - think of it as the contrasts for
 all pairwise comparisons between the levels.

You have to decide what you want.  The contrasts for all pairwise
comparisons between the levels, or all possible orthogonal contrasts?

The latter is actually not well defined.  For a factor with p levels,
there would be p-1 orthogonal contrasts, which are only identifiable up to
rotation, hence there are infinitely many such sets. But there are p(p-1)/2
pairwise comparisons. So unless p=2, you have to decide what you want.

 Are there are any options for contrast() or other functions/libraries
 that will allow me to do this automatically?

Look at package multcomp, in particular functions glht and mcp, these
might help.
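
For example, a minimal sketch (assuming a fitted model mod containing a
factor fac; the "Tukey" specification gives all pairwise comparisons):

library(multcomp)
summary(glht(mod, linfct = mcp(fac = "Tukey")))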

Cheers,

Berwin

== Full address 
Berwin A Turlach  Tel.: +61 (8) 6488 3338 (secr)
School of Maths and Stats (M019)+61 (8) 6488 3383 (self)
The University of Western Australia   FAX : +61 (8) 6488 1028
35 Stirling Highway   
Crawley WA 6009e-mail: ber...@maths.uwa.edu.au
Australiahttp://www.maths.uwa.edu.au/~berwin

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] split data with missing data condition

2010-10-15 Thread Dimitris Rizopoulos

you can do the following:

mat <- cbind(x = runif(15, 50, 70), y = rnorm(15, 2))
mat[sample(15, 2), "x"] <- NA

na.x <- is.na(mat[, 1])
mat[na.x, ]
mat[!na.x, ]


I hope it helps.

Best,
Dimitris


On 10/15/2010 2:45 PM, Jumlong Vongprasert wrote:

Dear all
I have data like this:
   x  y
   [1,] 59.74889  3.1317081
   [2,] 38.77629  1.7102589
   [3,]   NA  2.2312962
   [4,] 32.35268  1.3889621
   [5,] 74.01394  1.5361227
   [6,] 34.82584  1.1665412
   [7,] 42.72262  2.7870875
   [8,] 70.54999  3.3917257
   [9,] 59.37573  2.6763249
  [10,] 68.87422  1.9697770
  [11,] 19.00898  2.0584415
  [12,] 60.27915  2.5365194
  [13,] 50.76850  2.3943836
  [14,]   NA  2.2862790
  [15,] 39.01229  1.7924957

and I want to split the data into two data sets: one of non-missing rows and
one of missing rows.
How can I do this?
Many Thanks.
Jumlong




--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
Web: http://www.erasmusmc.nl/biostatistiek/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] split data with missing data condition

2010-10-15 Thread jim holtman
Try this:

 a <- read.table(textConnection("x  y
+  59.74889  3.1317081
+  38.77629  1.7102589
+        NA  2.2312962
+  32.35268  1.3889621
+  74.01394  1.5361227
+  34.82584  1.1665412
+  42.72262  2.7870875
+  70.54999  3.3917257
+  59.37573  2.6763249
+  68.87422  1.9697770
+  19.00898  2.0584415
+  60.27915  2.5365194
+  50.76850  2.3943836
+        NA  2.2862790
+  39.01229  1.7924957"), header=TRUE)
 a <- as.matrix(a)
 # good data
 a.good <- a[complete.cases(a),, drop=FALSE]
 a.bad  <- a[!complete.cases(a),, drop=FALSE]
 a.good
             x        y
 [1,] 59.74889 3.131708
 [2,] 38.77629 1.710259
 [3,] 32.35268 1.388962
 [4,] 74.01394 1.536123
 [5,] 34.82584 1.166541
 [6,] 42.72262 2.787088
 [7,] 70.54999 3.391726
 [8,] 59.37573 2.676325
 [9,] 68.87422 1.969777
[10,] 19.00898 2.058441
[11,] 60.27915 2.536519
[12,] 50.76850 2.394384
[13,] 39.01229 1.792496
 a.bad
      x        y
[1,] NA 2.231296
[2,] NA 2.286279



On Fri, Oct 15, 2010 at 8:45 AM, Jumlong Vongprasert
jumlong.u...@gmail.com wrote:
 Dear all
 I have data like this:
              x          y
  [1,] 59.74889  3.1317081
  [2,] 38.77629  1.7102589
  [3,]       NA  2.2312962
  [4,] 32.35268  1.3889621
  [5,] 74.01394  1.5361227
  [6,] 34.82584  1.1665412
  [7,] 42.72262  2.7870875
  [8,] 70.54999  3.3917257
  [9,] 59.37573  2.6763249
  [10,] 68.87422  1.9697770
  [11,] 19.00898  2.0584415
  [12,] 60.27915  2.5365194
  [13,] 50.76850  2.3943836
  [14,]       NA  2.2862790
  [15,] 39.01229  1.7924957

 and I want to split the data into two data sets: one of non-missing rows and
 one of missing rows.
 How can I do this?
 Many Thanks.
 Jumlong


 --
 Jumlong Vongprasert
 Institute of Research and Development
 Ubon Ratchathani Rajabhat University
 Ubon Ratchathani
 THAILAND
 34000

        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Color individual leaf labels in dendrogram

2010-10-15 Thread Kennedy

Hi,

I have performed a clustering of a matrix and plotted the result with
pltree. See code below. I want to color the labels of the leaves
individually. For example, I want the label "Node 2" to be plotted in
red. How do I do this?

Sincerely 

Henrik


  library(cluster)

  D <- matrix(nr=4, nc=4)
  rownames(D) <- c("Node 1", "Node 2", "Node 3", "Node 4")
  D[1,] <- c(0, .6, .1, .7)
  D[2,] <- c(.6, .0, .3, .9)
  D[3,] <- c(.1, .3, 0, .9)
  D[4,] <- c(.7, .9, .9, 0)

  C <- agnes(D, diss = T, method = "complete")

  pltree(C)
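
One possible route (an untested sketch): convert to a dendrogram and set a
per-leaf label colour with dendrapply(), e.g.

  colLab <- function(n) {
    if (is.leaf(n) && attr(n, "label") == "Node 2") {
      attr(n, "nodePar") <- c(attr(n, "nodePar"), list(lab.col = "red"))
    }
    n
  }
  dend <- dendrapply(as.dendrogram(as.hclust(C)), colLab)
  plot(dend)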

-- 
View this message in context: 
http://r.789695.n4.nabble.com/Color-individual-leaf-labels-in-dendrogram-tp2996982p2996982.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] split data with missing data condition

2010-10-15 Thread Petr PIKAL
Hi

r-help-boun...@r-project.org napsal dne 15.10.2010 15:00:46:

 you can do the following:
 
 mat <- cbind(x = runif(15, 50, 70), y = rnorm(15, 2))
 mat[sample(15, 2), "x"] <- NA
 
 na.x <- is.na(mat[, 1])
 mat[na.x, ]
 mat[!na.x, ]

Or if you have missing data in several columns and you want to select only
those rows which are complete, use complete.cases:

mat[complete.cases(mat),]
mat[!complete.cases(mat),]

Regards
Petr

 
 
 I hope it helps.
 
 Best,
 Dimitris
 
 
 On 10/15/2010 2:45 PM, Jumlong Vongprasert wrote:
  Dear all
  I have data like this:
 x  y
 [1,] 59.74889  3.1317081
 [2,] 38.77629  1.7102589
 [3,]   NA  2.2312962
 [4,] 32.35268  1.3889621
 [5,] 74.01394  1.5361227
 [6,] 34.82584  1.1665412
 [7,] 42.72262  2.7870875
 [8,] 70.54999  3.3917257
 [9,] 59.37573  2.6763249
[10,] 68.87422  1.9697770
[11,] 19.00898  2.0584415
[12,] 60.27915  2.5365194
[13,] 50.76850  2.3943836
[14,]   NA  2.2862790
[15,] 39.01229  1.7924957
 
  and I want to split the data into two data sets: one of non-missing rows
  and one of missing rows.
  How can I do this?
  Many Thanks.
  Jumlong
 
 
 
 -- 
 Dimitris Rizopoulos
 Assistant Professor
 Department of Biostatistics
 Erasmus University Medical Center
 
 Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
 Tel: +31/(0)10/7043478
 Fax: +31/(0)10/7043014
 Web: http://www.erasmusmc.nl/biostatistiek/
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] [Rcpp-devel] fast rowCumsums wanted for calculating the cdf

2010-10-15 Thread Romain Francois

Hi,

This would probably deserve some abstraction, we had C++ versions of 
apply in our TODO list for some time, but here is a shot:



require( Rcpp )
require( inline )


f.Rcpp <- cxxfunction( signature( x = "matrix" ), '

    NumericMatrix input( x ) ;
    NumericMatrix output = clone<NumericMatrix>( input ) ;

    int nr = input.nrow(), nc = input.ncol() ;
    NumericVector tmp( nr );
    for( int i = 0; i < nc; i++ ){
        tmp = tmp + input.column(i) ;
        NumericMatrix::Column target( output, i ) ;
        std::copy( tmp.begin(), tmp.end(), target.begin() ) ;
    }
    return output ;

', plugin = "Rcpp" )

f.R <- function( x ){
    t(apply(x, 1, cumsum)) # SLOOOW!
}



probs <- t(matrix(rep(1:100), nrow=10)) # matrix with row-wise probabilities
stopifnot( all.equal( f.R( probs ), f.Rcpp( probs ) ) )

require( rbenchmark )
probs <- t(matrix(rep(1:10000000), nrow=10)) # matrix with row-wise probabilities


bm <- benchmark(
    f.R( probs ),
    f.Rcpp( probs ),
    columns = c("test", "elapsed", "relative", "user.self", "sys.self"),
    order = "relative",
    replications = 10)
print( bm )


I get this on my iMac with R 2.12.0 and Rcpp as just released to CRAN.

$ Rscript cumsum.R
           test elapsed  relative user.self sys.self
2 f.Rcpp(probs)   4.614   1.00000     4.375    0.239
1    f.R(probs)  68.645  14.87755    67.765    0.877

When we implement apply in C++, we will probably leverage loop 
unrolling to achieve greater performance.


Romain

On 15/10/10 14:34, Douglas Bates wrote:

Although I know there is another message in this thread I am replying
to this message to be able to include the whole discussion to this
point.

Gregor mentioned the possibility of extending the compiled code for
cumsum so that it would handle the matrix case.  The work by Dirk
Eddelbuettel and Romain Francois on developing the Rcpp package make
it surprisingly easy to create compiled code for this task.  I am
copying the Rcpp-devel list on this in case one of the readers of that
list has time to create a sample implementation before I can get to
it.  It's midterm season and I have to tackle the stack of blue books
on my desk before doing fun things like writing code.

On Fri, Oct 15, 2010 at 2:23 AM, Joshua Wileyjwiley.ps...@gmail.com  wrote:

Hi,

You might look at Reduce().  It seems faster.  I converted the matrix
to a list in an incredibly sloppy way (which you should not emulate)
because I cannot think of  the simple way.


probs <- t(matrix(rep(1:10000000), nrow=10)) # matrix with row-wise probabilities
F <- matrix(0, nrow=nrow(probs), ncol=ncol(probs));
F[,1] <- probs[,1,drop=TRUE];
add <- function(x) {Reduce(`+`, x, accumulate = TRUE)}


system.time(F.slow <- t(apply(probs, 1, cumsum)))

   user  system elapsed
  36.758   0.416  42.464


system.time(for (cc in 2:ncol(F)) {

+  F[,cc] <- F[,cc-1,drop=TRUE] + probs[,cc,drop=TRUE];
+ })
   user  system elapsed
  0.980   0.196   1.328


system.time(add(list(probs[,1], probs[,2], probs[,3], probs[,4], probs[,5], 
probs[,6], probs[,7], probs[,8], probs[,9], probs[,10])))

   user  system elapsed
  0.420   0.072   0.539




.539 seconds for 10 vectors each with 1 million elements does not seem
bad.  For 30,000 each, on my system it took .014 seconds, which for
200,000 iterations, I estimated as:


(200000 * .014)/60/60

[1] 0.778

(and this is on a stone age system with a single core processor and
only 756MB of RAM)

It should not be difficult to get the list back to a matrix.

Cheers,

Josh



On Thu, Oct 14, 2010 at 11:51 PM, Gregormailingl...@gmx.at  wrote:

Dear all,

Maybe the easiest solution: Is there anything that speaks against generalizing
cumsum from base to cope with matrices (as is done in matlab)? E.g.:

cumsum(Matrix, 1)
equivalent to
apply(Matrix, 1, cumsum)

The main advantage could be optimized code if the Matrix is extremely
non-square (e.g. 100,000x10), but the summation is done over the short side (in
this case 10). apply would practically yield a loop over 100,000 elements, and
vectorization w.r.t. the long side (loop over 10 elements) provides
considerable efficiency gains.

Many regards,
Gregor




On Tue, 12 Oct 2010 10:24:53 +0200
Gregormailingl...@gmx.at  wrote:


Dear all,

I am struggling with a (currently) cost-intensive problem: calculating the
(non-normalized) cumulative distribution function, given the (non-normalized)
probabilities. something like:

probs <- t(matrix(rep(1:100), nrow=10)) # matrix with row-wise probabilities
F <- t(apply(probs, 1, cumsum)) # SLOOOW!

One (already faster, but for sure not ideal) solution - thanks to Henrik 
Bengtsson:

F <- matrix(0, nrow=nrow(probs), ncol=ncol(probs));
F[,1] <- probs[,1,drop=TRUE];
for (cc in 2:ncol(F)) {
   F[,cc] <- F[,cc-1,drop=TRUE] + probs[,cc,drop=TRUE];
}

In my case, probs is a (30,000 x 10) matrix, and I need to iterate this step
around 200,000 times, so speed is crucial. I currently can make sure to have no
NAs, but in order to extend matrixStats, this could be a nontrivial issue.

Any ideas for speeding up this 

Re: [R] interaction contrasts

2010-10-15 Thread Kay Cichini

..by some (extensive) trial and error, reordering the contrast matrix and the
reference level, I figured it out myself.
For anyone who might find this helpful when searching for a similar contrast in
the future, this should be the right one:

c2 <- rbind("fac2-effect in A" = c(0,1,0,0,0,0,0,0),
            "fac2-effect in B" = c(0,1,0,0,0,1,0,0),
            "fac2-effect in C" = c(0,1,0,0,0,0,1,0),
            "fac2-effect in D" = c(0,1,0,0,0,0,0,1),
            "fac2-effect, A*B" = c(0,0,0,0,0,1,0,0),
            "fac2-effect, A*C" = c(0,0,0,0,0,0,1,0),
            "fac2-effect, A*D" = c(0,0,0,0,0,0,0,1),
            "fac2-effect, B*C" = c(0,0,0,0,0,-1,1,0),
            "fac2-effect, B*D" = c(0,0,0,0,0,-1,0,1),
            "fac2-effect, C*D" = c(0,0,0,0,0,0,-1,1))

summary(glht(mod,c2))


Kay Cichini wrote:
 
 hello,
 
 I was recently asking the list for help with some interaction contrasts
 (see below), for which I had to change the reference level of the model on
 the fly (I read a post that this is possible in multcomp).
 
 If someone has a clue how this is coded in multcomp's glht(), please point
 me there.
 
 yours,
 kay
 
 
 Kay Cichini wrote:
 
 hello list,
 
 I'd very much appreciate help with setting up the
 contrast for a 2-factorial crossed design.
 
 here is a toy example:
 
 library(multcomp)
 
 dat <- data.frame(fac1 = gl(4, 8, labels = LETTERS[1:4]),
                   fac2 = rep(c("I","II"), 16), y = rnorm(32, 1, 1))
 
 mod <- lm(y ~ fac1*fac2, data = dat)
 
 ## the contrasts i'm interested in:
 
 c1 <- rbind("fac2-effect in A"=c(0,1,0,0,0,0,0,0),
   "fac2-effect in B"=c(0,1,0,0,0,1,0,0),
   "fac2-effect in C"=c(0,1,0,0,0,0,1,0),
   "fac2-effect in D"=c(0,1,0,0,0,0,0,1),
  "fac2-effect, A*B"=c(0,0,0,0,0,1,0,0),
   "fac2-effect, A*C"=c(0,0,0,0,0,0,1,0),
   "fac2-effect, A*D"=c(0,0,0,0,0,0,0,1))
 
 summary(glht(mod,c1))
 
 ## now i want to add the remaining combinations 
 ## fac2, B*C
 ## fac2, B*D
 ## fac2, C*D
 ## to the simultaneous tests to see whether the effects 
 ## of fac2 within the levels of fac1 differ between  
 ## each combination of the levels of fac1, or not ??
 
 thanks for any advice!
 
 yours,
 kay
 
 
 
 
 


-

Kay Cichini
Postgraduate student
Institute of Botany
Univ. of Innsbruck


-- 
View this message in context: 
http://r.789695.n4.nabble.com/interaction-contrasts-tp2993845p2996987.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Error in DGEList function

2010-10-15 Thread Ying Ye

Hi!

I am a new R user and have no clue of this error (see below) while  
using edgeR package:


 Y <- clade_reads
 y <- Y[,c(g1,g2)]
 grouping <- c( rep(1,length(g1)), rep(2,length(g2)) )
 size <- apply(y, 2, sum)
 d <- DGEList(data = y, group = grouping, lib.size = size)
Error in DGEList(data = y, group = grouping, lib.size = size) :
  unused argument(s) (data = y)


And g1 and g2 are two defined groups. Could anyone kindly interpret  
this?


Many thanks!

mikecrux

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Random assignment

2010-10-15 Thread Michael Bedward
Hello again John,

I was going to suggest that you just use qbinom to generate the
expected number of extinctions. For example, for the family with 80
spp the central 95% expectation is:

qbinom(c(0.025, 0.975), 80, 0.0748)

which gives 2 - 11 spp.
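And if you do want the 10,000 randomizations from the original post, a
minimal sketch for a single family (the 80-species family again):

sims <- rbinom(10000, size = 80, prob = 0.0748)  # simulated at-risk counts
quantile(sims, c(0.025, 0.975))                  # should agree closely with qbinom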

If you wanted to look across a large number of families you'd need
to deal with multiple comparison error, but as a quick first look it
might be helpful.

However, I've just got a copy of the paper and it seems that the
authors are calculating something different to a simple binomial
expectation: they are differentiating between high-risk (red listed)
and low-risk species within a family. They state that this equation
(expressed here in R-ese)...

choose(N, R) * p^R * b^(N - R)

...gives the probability of an entire family becoming extinct, where
N is number of spp in family; R is number of those that are red
listed; p is extinction probability for red list spp (presumably over
some period but I haven't read the paper properly yet); b is
extinction probability for other spp.
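As a quick sanity check of that equation, a sketch (all values invented for
illustration):

# probability the whole family goes extinct, per the equation above
pFamExtinct <- function(N, R, p, b) choose(N, R) * p^R * b^(N - R)
pFamExtinct(N = 80, R = 10, p = 0.5, b = 0.01)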

Then, in their simulations they hold b constant but play around with a
range of values for p.

So this sounds a bit different to what you originally posted as your
objective (?)


Michael



On 15 October 2010 22:49, Michael Bedward michael.bedw...@gmail.com wrote:
 Hi John,

 The word species attracted my attention :)

 Like Dennis, I'm not sure I understand your idea properly. In
 particular, I don't see what you need the simulation for.

 If family F has Fn species, your random expectation is that p * Fn of
 them will be at risk (p = 0.0748). The variance on that expectation
 will be p * (1-p) * Fn.

 If you do your simulation that's the result you'll get.  Perhaps to
 initially identify families with disproportionate observed extinction
 rates all you need is the dbinom function?

 Michael


 On 15 October 2010 22:29, John Haart anothe...@me.com wrote:
 Hi Denis and list

 Thanks for this , and sorry for not providing enough information

 First let me put the study into a bit more context : -

 I know the number of species at risk in each family; what i am asking is: "Is 
 risk random according to family, or do certain families have a 
 disproportionate number of at-risk species?"

 My idea was to randomly allocate risk to the families based on the criteria 
 below (binomial(nspecies, 0.0748)) and then compare this to the true data 
 and see if there was a significant difference.

 So in answer to your questions, (assuming my method is correct !)

 Is this over all families, or within a particular family? If the former, why
 does a distinction of family matter?

 Within a particular family  - this is because i am looking to see if risk in 
 the observed data set is random in respect to family so this will provide 
 the baseline to compare against.

 I guess you've stated the p, but what's the n? The number of species in each
 family?

 This varies largely, for instance i have some families that are monotypic  
 (with 1 species) and then i have other families with 100+ species


 Assuming you have multiple families, do you want separate simulations per
 family, or do you want to do some sort of weighting (perhaps proportional to
 size) over all families?

 I am assuming i want some sort of weighting. This is because i am wanting to 
 calculate the number of species expected to be at risk in EACH family under 
 the random binomial distribution (assuming every species has a 7.48% chance 
 of being at risk).

 Thanks

 John




 On 15 Oct 2010, at 11:19, Dennis Murphy wrote:

 Hi:

 I don't believe you've provided quite enough information just yet...

 On Fri, Oct 15, 2010 at 2:22 AM, John Haart anothe...@me.com wrote:

 Dear List,

 I am doing some simulation in R and need basic help!

 I have a list of animal families for which i know the number of species in
 each family.

 I am working under the assumption that a species has a 7.48% chance of
 being at risk.


 Is this over all families, or within a particular family? If the former, why
 does a distinction of family matter?


 I want to simulate the number of species expected to be at risk under a
 random binomial distribution with 10,000 randomizations.


 I guess you've stated the p, but what's the n? The number of species in each
 family? If you're simulating on a family by family basis, then it would seem
 that a binomial(nspecies, 0.0748) distribution would be the reference.
 Assuming you have multiple families, do you want separate simulations per
 family, or do you want to do some sort of weighting (perhaps proportional to
 size) over all families? The latter is doable, but it would require a
 two-stage simulation: one to randomly select a family and then to randomly
 select a species.

 Dennis



  I am relatively new to this field and would greatly appreciate an
  idiot-proof response, i.e. how should the data be entered into R? I was
  thinking of using read.table, header = T, where the table has F = Family
  Name, and SP = Number of species in that family?

Re: [R] Error in DGEList function

2010-10-15 Thread Martin Morgan
On 10/15/2010 06:17 AM, Ying Ye wrote:
 Hi!
 
 I am a new R user and have no clue of this error (see below) while using
 edgeR package:

edgeR is a Bioconductor package, so please subscribe to the Bioconductor
list and ask there.

http://bioconductor.org/help/mailing-list/

include the output of the sessionInfo() command after library(edgeR)
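For what it's worth, one guess at the immediate cause (an assumption about
the edgeR API, not verified against your version): DGEList takes the count
matrix through its counts argument, not data, so the following may work:

d <- DGEList(counts = y, group = grouping, lib.size = size)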

Martin

 
 Y <- clade_reads
 y <- Y[,c(g1,g2)]
 grouping <- c( rep(1,length(g1)), rep(2,length(g2)) )
 size <- apply(y, 2, sum)
 d <- DGEList(data = y, group = grouping, lib.size = size)
 Error in DGEList(data = y, group = grouping, lib.size = size) :
   unused argument(s) (data = y)
 
 
 And g1 and g2 are two defined groups. Could anyone kindly interpret this?
 
 Many thanks!
 
 mikecrux
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] p-values from coxph?

2010-10-15 Thread Öhagen Patrik

Dear List,

I each iteration of a simulation study, I would like to save the p-value 
generated by coxph. I fail to see how to adress the p-value. Do I have to 
calculate it myself from the Wald Test statistic?


Cheers, Paddy

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Color individual leaf labels in dendrogram

2010-10-15 Thread Bryan Hanson
Henrik, there is an easily adaptable example in this thread:
http://r.789695.n4.nabble.com/coloring-leaves-in-a-hclust-or-dendrogram-plot-tt795496.html#a795497

HTH. Bryan
*
Bryan Hanson
Professor of Chemistry & Biochemistry
DePauw University, Greencastle IN USA



On 10/15/10 9:05 AM, Kennedy henrik.aldb...@gmail.com wrote:

 
 Hi,
 
 I have performed a clustering of a matrix and plotted the result with
 pltree. See code below. I want to color the labels of the leafs
 individually. For example I want the label name Node 2 to be plotted in
 red. How do I do this?
 
 Sincerely 
 
 Henrik
 
 
   library(cluster)
 
   D <- matrix(nr=4, nc=4)
   rownames(D) <- c("Node 1","Node 2","Node 3","Node 4")
   D[1,] <- c(0,.6,.1,.7)
   D[2,] <- c(.6,.0,.3,.9)
   D[3,] <- c(.1,.3,0,.9)
   D[4,] <- c(.7,.9,.9,0)
 
   C <- agnes(D, diss=TRUE, method="complete")
 
   pltree(C)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Create Arrays

2010-10-15 Thread dpender

Hi Gerrit,

Almost it, but I need to insert M[,i] as well as matrix(-1, nrow(M),
CN[i]) when CN[i] != 0.

I know this is not correct but can something like the following be done?

HH <- c(0.88, 0.72, 0.89, 0.93, 1.23, 0.86, 0.98, 0.85, 1.23)
TT <- c(7.14, 7.14, 7.49, 8.14, 7.14, 7.32, 7.14, 7.14, 7.14)
c <- c(0, 0, 0, 2, 0, 0, 0, 2, 0)

TMP <- lapply( seq(c),
function( i, CN, M) {
 if( CN[i] == 0) as.matrix( M[, i]) else
  (matrix( -1, nrow( M), CN[i])  as.matrix( M[, i]))
 }, CN = c, M = rbind( HH, TT))

do.call( cbind, TMP)
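One guess at what the juxtaposition above is meant to do - keep M[,i] and
prepend the block of -1 columns with cbind() - would be (a sketch, not
necessarily the intended semantics):

TMP <- lapply(seq_along(c), function(i, CN, M) {
  if (CN[i] == 0) as.matrix(M[, i]) else
    cbind(matrix(-1, nrow(M), CN[i]), as.matrix(M[, i]))  # -1 columns, then the data column
}, CN = c, M = rbind(HH, TT))
do.call(cbind, TMP)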

Doug

-- 
View this message in context: 
http://r.789695.n4.nabble.com/Create-Arrays-tp2996706p2997060.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Many Thanks

2010-10-15 Thread Jumlong Vongprasert

 Dear R-help mailing list and software development team.
After using R for a few weeks, I was exposed to the best of the program.
In addition, the R-help mailing list is a great assist to new users.
I do my job as I want and get great support from the R-help mailing list.
Thanks R-help mailing list.
Thanks software development team.
Jumlong

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] boxplot issue

2010-10-15 Thread Jonas Josefsson

Hi!

I am trying to produce a graph which shows overlap in latitude for a 
number of species.


I have a dataframe which looks as follows

species1,species2,species3,species4.
minlat  6147947,612352,627241,6112791
maxlat 7542842,723423,745329,7634921


I want to produce a graph with one horizontal bar for each species where 
minlat sets minimum value and maxlat sets maximum value for the bar. I 
want all bars to be stacked on top of eachother to show where I have 
overlap.


I have tried using boxplot but it only produces the standard mean and sd 
of the two values...


Thanks!

Jonas Josefsson

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] [Debian][NetBeans] setting NetBeans to work with JRI libraries

2010-10-15 Thread filip

I am having hard time properly setting NetBeans to work with JRI libs
(http://rosuda.org/JRI/). Most of the instructions I have found so far
are written for Eclipse or Windows (or both).

I have set java.library.path variable in config: customize:VM
arguments field, by specifying
-Djava.library.path=/home/filip/Pobrane/JRI/. When I run the
application I get the following:

Cannot find JRI native library!
Please make sure that the JRI native library is in a directory listed
in java.library.path.

java.lang.UnsatisfiedLinkError: /home/fb/Pobrane/JRI/libjri.so:
libR.so: cannot open shared object file: No such file or directory
(...)

So I ran:

locate libR

with the following result:

/usr/local/lib/R/lib/libRblas.so
/usr/local/lib/R/lib/libRlapack.so

However this command:
ls /usr/local/lib/R/lib/

lists:

libRblas.so  libRlapack.so  libR.so

I have tried various customisations to the config, copying the
libraries and other desperate moves. I can properly compile and run the
rtest.java and rtest2.java under the examples section of my JRI
installation via the ./run script.

It would seem that NetBeans has a problem loading libraries
that reference other libraries (libjri is loaded, but not libR). I
have compiled R 2.12.0 with --with-blas=-lgoto2 --enable-BLAS-shlib
--enable-R-shlib (shared libraries, shared BLAS). Could the BLAS
be a problem? 


-- 
while(!succeed) { try(); }

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] tessellation from biological data in spatstat

2010-10-15 Thread Dr Richard Mort

Hi,

I'm new to this mailing list so apologies if this is too basic. I have 
confocal images 512x512 from which I have extracted x,y positions of the 
coordiates of labelled cells exported from ImageJ as a.csv file. I also 
have images that define an underlying pattern in the tissue defined as 
areas of different pixel values 0 or 255 (also 512x512) I've exported 
these images as .txt files. I'd like to use spatstat to look at how the 
individual cells plot onto these patterns.


#I can import my x,y data and do basic stats with spatstat by doing the 
following:


library(spatstat)
data01 <- read.csv("250810.csv", skip=1)
x <- data01[[2]]
y <- data01[[3]]
data02 <- ppp(x, y, xrange=c(0,512), yrange=c(0,512))
unitname(data02) <- c("pixel", "pixels")
plot(data02)
#etc etc

#I can also import my text images and convert them to a tessellation 
using the following:


a <- read.table("FOLLICLES.txt")
win <- owin(c(0,512), c(0,512))
unitname(win) <- c("pixel", "pixels")
b <- as.matrix(a, xrange=c(0,512), yrange=c(0,512))
unitname(b) <- c("pixel", "pixels")
c <- im(b, xcol=seq(ncol(b)), yrow=seq(nrow(b)), unitname="pixels")
d <- tess(image = c)
plot(d)

#I can then plot my ppp pattern on top of my tessellation using:

plot(d)
plot(data02, add=TRUE)

#So far so good, but when I run for example:

qb <- quadratcount(data02, tess = d)

#I get the following error:

Warning message:
In quadratcount.ppp(data02, tess = d) :
 Tessellation does not contain all the points of X

#When I investigate further the following is returned for each object:

> d
Tessellation
Tessellation is determined by a factor-valued image with 2 levels
window: binary image mask
512 x 512 pixel array (ny, nx)
enclosing rectangle: [0.5, 512.5] x [0.5, 512.5] pixels 
> data02

planar point pattern: 254 points
window: rectangle = [0, 512] x [0, 512] units 

#I don't understand why my tessellation is a different size but even 
taking this into account my ppp should plot inside the window and does 
when i do plot(data02, add=TRUE). I've spent ages playing around with 
redefining the windows but I must be missing something (probably obvious).
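A guess at a fix (assumption: im() indexes pixels at 1..512, so the
tessellation's mask window becomes [0.5, 512.5]): give the point pattern the
tessellation's own window instead of the [0, 512] rectangle:

data02 <- ppp(x, y, window = as.owin(d))  # reuse the tessellation's window
qb <- quadratcount(data02, tess = d)      # should no longer warn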


Any help would be appreciated, I can provide files.
Kind regards
Richard

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] How to extract parameter estimates of variance function from lme fit

2010-10-15 Thread Hoai Thu Thai

Dear R-Users,

I have a question concerning extraction of parameter estimates of 
variance function from lme fit.


To fit my simulated data, we use varConstPower ( constant plus power 
variance function).

fm <- lme(UPDRS~time, data=data.simula, random=~time, method="ML",
weights=varConstPower(fixed=list(power=1)))
I extract the results of this function by using the following code:
y <- summary(fm)
x <- y$modelStruct$varStruct
> x
Variance function structure of class varConstPower representing
       const        power 
1.184535e-15 1.000000e+00 

These are the constant and power that appear in the summary(fm), and 
that I want to extract.

However, I got a different value of const when extracting from x:

> x$const
    const 
-34.36943 

Does anyone know why there is such difference and how to extract the 
expected value (1.184535e-15) ?


Thanks in advance for your help.

Thu


---

THAI Hoai Thu
INSERM U738 - Université Paris 7
16 rue Henri Huchard
75018 Paris, FRANCE
Tel: 01 57 27 75 39
Email: hoai-thu.t...@inserm.fr

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] boxplot issue

2010-10-15 Thread David Winsemius

A) you hijacked another thread.

On Oct 15, 2010, at 9:50 AM, Jonas Josefsson wrote:


Hi!

I am trying to produce a graph which shows overlap in latitude for a  
number of species.


I have a dataframe which looks as follows

   species1,species2,species3,species4.
minlat  6147947,612352,627241,6112791
maxlat 7542842,723423,745329,7634921

B) This looks like a data input snafu. It appears that the commas in  
the original data were not properly specified as delimiters.




I want to produce a graph with one horizontal bar for each species  
where minlat sets minimum value and maxlat sets maximum value for  
the bar. I want all bars to be stacked on top of eachother to show  
where I have overlap.


I have tried using boxplot but it only produces the standard mean  
and sd of the two values...


C) boxplot is designed to handle raw data and then pass plotting  
responsibility off to the bxp function. Once you address your data  
input issues you can use bxp.
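For the bar-per-species picture itself, a plain segments() sketch may be
simpler than bxp() (data transcribed from the post, treated as numeric):

sp <- c("species1", "species2", "species3", "species4")
minlat <- c(6147947, 612352, 627241, 6112791)
maxlat <- c(7542842, 723423, 745329, 7634921)
plot(range(minlat, maxlat), c(0.5, 4.5), type = "n",
     xlab = "latitude", ylab = "", yaxt = "n")
segments(minlat, 1:4, maxlat, 1:4, lwd = 8)  # one horizontal bar per species
axis(2, at = 1:4, labels = sp, las = 1)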


--
David


Thanks!

Jonas Josefsson


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to extract parameter estimates of variance function from lme fit

2010-10-15 Thread Henrique Dallazuanna
Try this:

coef(fm$modelStruct$varStruct, uncons = FALSE)
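The two numbers in the question disagree because the varStruct stores its
coefficients on an unconstrained (log) scale; uncons = FALSE back-transforms
them. A quick arithmetic check:

exp(-34.36943)  # = 1.184535e-15, the const reported by summary(fm)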

On Fri, Oct 15, 2010 at 11:42 AM, Hoai Thu Thai hoai-thu.t...@inserm.frwrote:

 Dear R-Users,

 I have a question concerning extraction of parameter estimates of variance
 function from lme fit.

 To fit my simulated data, we use varConstPower ( constant plus power
 variance function).

  fm <- lme(UPDRS~time, data=data.simula, random=~time, method="ML",
 weights=varConstPower(fixed=list(power=1)))
  I extract the results of this function by using the following code:
  y <- summary(fm)
  x <- y$modelStruct$varStruct
  > x
  Variance function structure of class varConstPower representing
         const        power 
  1.184535e-15 1.000000e+00 

  These are the constant and power that appear in the summary(fm), and that
  I want to extract.
  However, I got a different value of const when extracting from x:

  > x$const
      const 
  -34.36943 

 Does anyone know why there is such difference and how to extract the
 expected value (1.184535e-15) ?

 Thanks in advance for your help.

 Thu


 ---

 THAI Hoai Thu
 INSERM U738 - Université Paris 7
 16 rue Henri Huchard
 75018 Paris, FRANCE
 Tel: 01 57 27 75 39
 Email: hoai-thu.t...@inserm.fr

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40 S 49° 16' 22 O

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to extract parameter estimates of variance function from lme fit

2010-10-15 Thread Hoai Thu Thai

Thank you Henrique!! It works.

Thu


On 15/10/2010 16:53, Henrique Dallazuanna wrote:

coef(fm$modelStruct$varStruct, uncons = FALSE)


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] p-values from coxph?

2010-10-15 Thread David Winsemius


On Oct 15, 2010, at 9:21 AM, Öhagen Patrik wrote:



Dear List,

I each iteration of a simulation study, I would like to save the p- 
value generated by coxph. I fail to see how to adress the p-value.  
Do I have to calculate it myself from the Wald Test statistic?


No. And the most important reason is that would not give you the same  
value as is print()-ed by coxph().


If you ask for str(print(coxph(...))) you get NULL (after the
side-effect of printing). The print function only produces
side-effects. On the other hand you can use the summary function and it
gives you a richer set of output.  Using the first example on the help
page for coxph:


> str(summary(coxph(Surv(time, status) ~ x + strata(sex), test1)))
List of 12
 $ call: language coxph(formula = Surv(time, status) ~ x +  
strata(sex), data = test1)

 $ fail: NULL
 $ na.action   : NULL
 $ n   : int 7
 $ loglik  : num [1:2] -3.87 -3.33
 $ coefficients: num [1, 1:5] 0.802 2.231 0.822 0.976 0.329
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr "x"
  .. ..$ : chr [1:5] "coef" "exp(coef)" "se(coef)" "z" ...
 $ conf.int: num [1, 1:4] 2.231 0.448 0.445 11.18
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr "x"
  .. ..$ : chr [1:4] "exp(coef)" "exp(-coef)" "lower .95" "upper .95"
 $ logtest : Named num [1:3] 1.087 1 0.297
  ..- attr(*, "names")= chr [1:3] "test" "df" "pvalue"
 $ sctest  : Named num [1:3] 1.051 1 0.305
  ..- attr(*, "names")= chr [1:3] "test" "df" "pvalue"
 $ rsq : Named num [1:2] 0.144 0.669
  ..- attr(*, "names")= chr [1:2] "rsq" "maxrsq"
 $ waldtest: Named num [1:3] 0.95 1 0.329
  ..- attr(*, "names")= chr [1:3] "test" "df" "pvalue"
 $ used.robust : logi FALSE

So the fifth element of the "coefficients" leaf of the list structure has  
the same p-value as that print()-ed.


Try:

> summary(fit)$coefficients[5]
[1] 0.3292583

(It does seem to me that the name for that leaf of the fit object is  
not particularly in accord with what I would have considered  
"coefficients", but I am really in no solid position to criticize  
Terry Therneau, to whom we all owe a great deal of gratitude.)
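For the simulation loop in the original question, something along these
lines should then work (a sketch; the data-generating step is invented):

library(survival)
pvals <- replicate(1000, {
  sim <- data.frame(time = rexp(50), status = rbinom(50, 1, 0.7), x = rnorm(50))
  # 5th element of the coefficient matrix is the p-value, as above
  summary(coxph(Surv(time, status) ~ x, data = sim))$coefficients[5]
})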



--
David.
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Many Thanks

2010-10-15 Thread Andrew Miles
And thank YOU for taking the time to express your gratitude.  I'm sure  
all those who regularly take the time to contribute to the list  
appreciate the appreciation.


Andrew Miles

On Oct 15, 2010, at 9:49 AM, Jumlong Vongprasert wrote:


Dear R-help mailing list and software development team.
After using R for a few weeks, I was exposed to the best of the  
program.

In addition, the R-help mailing list is a great assist to new users.
I do my job as I want and get great support from the R-help mailing  
list.

Thanks R-help mailing list.
Thanks software development team.
Jumlong

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] interaction contrasts

2010-10-15 Thread David Winsemius
Should you need to do it again, you may want to look at the relevel  
function. I suppose that would meet the definition of some versions of  
on the fly but once I have a model, rerunning with a different  
factor leveling is generally pretty painless.
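A minimal sketch, reusing the toy objects from the quoted post below:

dat$fac1 <- relevel(dat$fac1, ref = "B")  # make B the reference level
mod2 <- lm(y ~ fac1 * fac2, data = dat)   # refit; contrasts are now against B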


--
David.


On Oct 15, 2010, at 9:09 AM, Kay Cichini wrote:



..by some (extensive) trial and error reordering the contrast matrix  
and the

reference level
i figured it out myself -
for anyone who might find this helpful searching for a similar  
contrast in

the future:
this should be the right one:

c2 <- rbind("fac2-effect in A"=c(0,1,0,0,0,0,0,0),
 "fac2-effect in B"=c(0,1,0,0,0,1,0,0),
 "fac2-effect in C"=c(0,1,0,0,0,0,1,0),
 "fac2-effect in D"=c(0,1,0,0,0,0,0,1),
"fac2-effect, A*B"=c(0,0,0,0,0,1,0,0),
 "fac2-effect, A*C"=c(0,0,0,0,0,0,1,0),
 "fac2-effect, A*D"=c(0,0,0,0,0,0,0,1),
"fac2-effect, B*C"=c(0,0,0,0,0,-1,1,0),
"fac2-effect, B*D"=c(0,0,0,0,0,-1,0,1),
"fac2-effect, C*D"=c(0,0,0,0,0,0,-1,1))

summary(glht(mod,c2))


Kay Cichini wrote:


hello,

i was shortly asking the list for help with some interaction  
contrasts

(see below) for which
i had to change the reference level of the model on the fly (i  
read a

post that this is possible in
multcomp).

if someone has a clue how this is coded in multcomp; glht() -  
please point

me there.

yours,
kay


Kay Cichini wrote:


hello list,

i'd very much appreciate help with setting up the
contrast for a 2-factorial crossed design.

here is a toy example:

library(multcomp)

dat <- data.frame(fac1=gl(4,8,labels=LETTERS[1:4]),
    fac2=rep(c("I","II"),16), y=rnorm(32,1,1))

mod <- lm(y~fac1*fac2, data=dat)

## the contrasts i'm interested in:

c1 <- rbind("fac2-effect in A"=c(0,1,0,0,0,0,0,0),
 "fac2-effect in B"=c(0,1,0,0,0,1,0,0),
 "fac2-effect in C"=c(0,1,0,0,0,0,1,0),
 "fac2-effect in D"=c(0,1,0,0,0,0,0,1),
"fac2-effect, A*B"=c(0,0,0,0,0,1,0,0),
 "fac2-effect, A*C"=c(0,0,0,0,0,0,1,0),
 "fac2-effect, A*D"=c(0,0,0,0,0,0,0,1))

summary(glht(mod,c1))

## now i want to add the remaining combinations
## fac2, B*C
## fac2, B*D
## fac2, C*D
## to the simultaneous tests to see whether the effects
## of fac2 within the levels of fac1 differ between
## each combination of the levels of fac1, or not ??

thanks for any advice!

yours,
kay









-

Kay Cichini
Postgraduate student
Institute of Botany
Univ. of Innsbruck


--
View this message in context: 
http://r.789695.n4.nabble.com/interaction-contrasts-tp2993845p2996987.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Nonlinear Regression Parameter Shared Across Multiple DataSets

2010-10-15 Thread Keith Jewell
Hi,

I've had to do something like that before. It seems to be a feature of nls 
(in R, but not as I recall in Splus) that it accepts a list with vector 
components as 'start' values, but flattens the result values to a single 
vector.

I can't spend much time explaining, but here's a fragment of code that might 
get you started:
---
# Fit the nls, correct coef names lost by nls
val <- nls(formula=formulaIn, data=DataList, start=tCoefs, control=control)
CoefList <- list()                            # initialise CoefList
for (aName in names(tCoefs)) {                # for each vector of coefficients
  tvec <- get(aName, envir=val$m$getEnv())    # get it from the nls environment
  names(tvec) <- names(tCoefs[[aName]])       # correct its names
  assign(aName, tvec, envir=val$m$getEnv())   # return it to nls
  CoefList[[aName]] <- tvec                   # store in CoefList
}

As I recall,
tCoefs is a list of starting values that can have vector components
CoefList ends up as a similar structure and named list of result values
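For reference, a self-contained sketch of the underlying vector-parameter
nls() call itself (data and parameter values invented):

set.seed(1)
dset <- rep(1:3, each = 30)                      # which data set each row belongs to
X <- runif(90, -10, -7)
Y <- 1.2/(1 + 10^(X - c(-8.7, -8.4, -8.9)[dset])) + rnorm(90, 0, 0.02)
all <- data.frame(X, Y, dset)
fit <- nls(Y ~ upper/(1 + 10^(X - LOGEC50[dset])), data = all,
           start = list(upper = max(all$Y), LOGEC50 = c(-8.5, -8.5, -8.5)))
coef(fit)  # upper, LOGEC501, LOGEC502, LOGEC503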

hth

Keith J

Jared Blashka evilamaran...@gmail.com wrote in message 
news:aanlktimprowm-mne9n_bu8ty=jylitq4ewhpph1hy...@mail.gmail.com...
 Looking at the source for nlrob, it looks like it saves the coefficients
 from the results of running an nls and then passes those coefficients back
 into the next nls request. The issue that it's running into is that nls
 returns the coefficients as upper, LOGEC501, LOGEC502, and LOGEC503, 
 rather
 than just upper and a vector named LOGEC50. Does anyone know a way to
 restructure the formula/start parameter so that coef returns a vector
 instead of each element individually? Right now, I've had to 'hack' nlrob 
 so
 it recombines similarly named elements into a vector, but was wondering if
 there was a way to accomplish the end goal without those measures.

 Thanks,
 Jared

 On Wed, Oct 13, 2010 at 3:14 PM, Jared Blashka 
 evilamaran...@gmail.comwrote:

 As an addendum to my question, I'm attempting to apply the solution to 
 the
 robust non-linear regression function nlrob from the robustbase package, 
 and
 it doesn't work in that situation. I'm getting

 allRobustFit <- nlrob(Y ~ (upper)/(1+10^(X-LOGEC50[dset])), data=all,
 start=list(upper=max(all$Y), LOGEC50=c(-8.5,-8.5,-8.5)))
 Error in nls(formula, data = data, start = start, algorithm = algorithm,
  :
   parameters without starting value in 'data': LOGEC50

 I'm guessing this is because the nlrob function doesn't know what to do
 with a vector for a start value. Am I correct and is there another method 
 of
 using nlrob in the same way?

 Thanks,
 Jared

 On Tue, Oct 12, 2010 at 8:58 AM, Jared Blashka 
 evilamaran...@gmail.comwrote:

 Thanks so much! It works great.

 I had thought the way to do it relied on combining the data sets, but I
 couldn't figure out how to alter the formula to work with the 
 combination.

 Jared


 On Tue, Oct 12, 2010 at 7:07 AM, Keith Jewell 
 k.jew...@campden.co.ukwrote:


 Jared Blashka evilamaran...@gmail.com wrote in message
 news:aanlktinffmudugqnkudvr=fmf0wrrtsbjxjexuki_...@mail.gmail.com...
  I'm working with 3 different data sets and applying this non-linear
  regression formula to each of them.
 
  nls(Y ~ (upper)/(1+10^(X-LOGEC50)), data=std_no_outliers,
  start=list(upper=max(std_no_outliers$Y),LOGEC50=-8.5))
 
  Previously, all of the regressions were calculated in Prism, but I'd
 like
  to
  be able to automate the calculation process in a script, which is why
 I'm
  trying to move to R. The issue I'm running into is that previously, 
  in
  Prism, I was able to calculate a shared value for a constraint so 
  that
 all
  three data sets shared the same value, but have other constraints
  calculated
  separately. So Prism would figure out what single value for the
 constraint
  in question would work best across all three data sets. For my 
  formula,
  each
  data set needs it's own LOGEC50 value, but the upper value should be
 the
  same across the 3 sets. Is there a way to do this within R, or with a
  package I'm not aware of, or will I need to write my own nls function
 to
  work with multiple data sets, because I've got no idea where to start
 with
  that.
 
  Thanks,
  Jared
 
  [[alternative HTML version deleted]]
 
 An approach which works for me (code below to illustrate principle, not
 tried...)

 1) combine all three data sets into one dataframe with a column (e.g.
 dset) indicating data set (1, 2 or 3)

 2) express your formula with upper as single valued and LOGEC50 as a
 vector
 indexed by dset e.g.
 Y ~ upper/(1+10^(X-LOGEC50[dset]))

 3) in the start list, make LOGEC50 a vector e.g. using -8.5 as start 
 for
 all
 three LOGEC50 values
   start =
 list(upper=max(std_no_outliers$Y), LOGEC50=c(-8.5, -8.5, -8.5))

 Hope that helps,

 Keith J

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 

Re: [R] specify data frame by name

2010-10-15 Thread Greg Snow
Also look at the get function; it may be a bit more straightforward (and safer 
if there is any risk of someone specifying 'rm(ls())' as a data frame name).
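A minimal sketch of the get() version (same toy objects as in the quoted
post below):

dfm <- data.frame(x = 1:10)
testfunc <- function(data = "dfm") {
  dat <- get(data, envir = parent.frame())  # look the name up in the caller
  sum(dat$x)
}
print(testfunc())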

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
 project.org] On Behalf Of darckeen
 Sent: Thursday, October 14, 2010 11:53 PM
 To: r-help@r-project.org
 Subject: Re: [R] specify data frame by name
 
 
 nvm, i figured it out.
 
 
  dfm <- data.frame(x=1:10)
  
  testfunc <- function(data="dfm")
  {
    dat <- eval.parent(as.symbol(data))
    sum(dat$x)
  }
  
  print(testfunc())
 
 --
 View this message in context: http://r.789695.n4.nabble.com/specify-
 data-frame-by-name-tp2996534p2996541.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] nominal response model

2010-10-15 Thread Mauricio Romero
Is there a way to estimate a nominal response model?

 

To be more specific let's say I want to calibrate:

\pi_{v}(\theta_j)=\frac{e^{\xi_{v}+\lambda_{v}\theta_j}}{\sum_{h=1}^m e^{\xi_{h}+\lambda_{h}\theta_j}}

 

Where $\theta_j$ is the dependent variable and I need to estimate
$\xi_{h}$ and $\lambda_{h}$ for $h \in \{1,\ldots,m\}$.
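For concreteness, a sketch that just evaluates those category probabilities
(parameter values invented; this is the computation, not the estimation):

# pi[j, v] = exp(xi_v + lambda_v * theta_j) / sum_h exp(xi_h + lambda_h * theta_j)
nominalProbs <- function(theta, xi, lambda) {
  eta <- outer(theta, lambda) + matrix(xi, length(theta), length(xi), byrow = TRUE)
  exp(eta) / rowSums(exp(eta))
}
nominalProbs(theta = c(-1, 0, 1), xi = c(0, 0.5, -0.2), lambda = c(1, 0.8, 1.2))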

 

Thank you,

 

 

Mauricio Romero 

Quantil S.A.S.

Cel: 3112231150

www.quantil.com.co

 

It is from the earth that we must find our substance; it is on the earth
that we must find solutions to the problems that promise to destroy all life
here

 


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] help with an unbalanced split plot

2010-10-15 Thread Eugenio Larios
Hi Dennis,

The first thing I did with my data was to explore it with 6 graphs
(wet-high, med, and solo-; dry-high, med, and solo-) and gave me very
interesting patterns: seed size in wet treatments is either negatively
correlated (high and medium densities) or flat (solo). But dry treatments
are all positively correlated! There is a very interesting switch there.

I also figured out why I can't do three way interactions. I explored the
structure of my data with str(mydata) and it shows that water treatment has
three levels when it should have just two. Then I went back to the excel
sheet, tried to sort the data by water treatment and I discovered a single
data point from the wet treatment sticking out by itself. That is why R
reads three levels and since it is only one point, there cannot be any stats
of course.

thanks
E

On Thu, Oct 14, 2010 at 9:27 PM, Dennis Murphy djmu...@gmail.com wrote:

 Hi:

 On Thu, Oct 14, 2010 at 7:50 PM, Eugenio Larios 
 elari...@email.arizona.edu wrote:

 Hi Dennis,

 thank you very much for your help, I really appreciate it.

 I forgot to say about the imbalance, yes. I only explained the original
 set up, sorry. Let me explain.

 It is because in the process of the experiment which lasted 3 months I
 lost individuals within the plots and I actually ended up losing 2 whole
 plots (one dry and one wet) and some other individuals in other plots.


 That still leaves you balanced at the plot level :)  Fortunately, you have
 enough replication. If you have missing subplots within the remaining plots,
 that would be another source of imbalance at the subplot level, but you
 should have enough subplots to be able to estimate all of the interactions
 unless an entire treatment in one set of plots was missing.

 It's worth graphing your data to anticipate which effects/interactions
 should be significant; graphs involving the spatial configuration of the
 plots and subplots would also be worthwhile.


 My study system has this special feature that allows me to track parental
 seed sizes in plants germinated in the field, a persistent ring that stays
 attached to the root even when the plant has germinated, so some of the
 plants I lost did not have this ring anymore. It happens sometimes but most
 of the time they have it. Also, some plants disappeared probably due to
 predation, etc That made my experiment imbalanced.


 That's common. No big deal.


 Do you think that will change the analysis? Also, do you think I should
 use least squares ANOVA  (perhaps type III due to the imbalance?) instead of
 LMM? What about the random effects that my blocking has created?


 Actually, with unbalanced data it's to your advantage to use lme() over
 ANOVA. Just don't place too much importance on the p-values of tests; even
 the degrees of freedom are debatable. With unbalanced data, it's hard to
 predict what the sampling distribution of a given statistic will actually
 be, so the p-values aren't as trustworthy.

 You mentioned that you couldn't fit a three-way interaction; given your
 data configuration, that shouldn't happen.

 (1) Get two-way tables of water * density, one for the counts and one for
 the averages, something like

 with(mydata, table(water, density))
 aggregate(log(fitness) ~ water + density, data = mydata, FUN = mean, na.rm
 = TRUE)

 In the first table, unless you have very low frequencies in some category,
 your data 'density' should be enough to estimate all the main effects and
 interactions of interest. The second table is to check that you don't have
 NaNs or missing cells, etc.


 I am new to R-help website so I wrote you this message to your email but I
 would like to post it on the R website, do you know how?


 Wag answer: I hope so, since I managed to view and respond to your message
 :)

 More seriously, in gmail, the window that opens to produce replies has an
 option 'Reply to all'. I don't know if your e-mail client at UofA has that
 feature, but if not, you could always cc R-help and put the e-mail address
 in by hand if necessary. Most mailers are smart enough to auto-complete an
 address as you type in the name, so you could see if that applies on your
 system.

 I keep a separate account for R-help because of the traffic volume - if you
 intend to subscribe to the list, you might want to do the same. It's not
 unusual for 75-100 e-mails a weekday to enter your inbox...


 Thanks again!

 Eugenio


 On Thu, Oct 14, 2010 at 5:34 PM, Dennis Murphy djmu...@gmail.com wrote:

 Hi:

 On Thu, Oct 14, 2010 at 3:58 PM, Eugenio Larios 
 elari...@email.arizona.edu wrote:

 Hi Everyone,

 I am trying to analyze a split plot experiment in the field that was
 arranged like this:
 I am trying to measure the fitness consequences of seed size.

 Factors (X):
 *Seed size*: a continuous variable, normally distributed.
 *Water*: Categorical Levels- wet and dry.
 *Density*: Categorical Levels- high, medium and solo
 *Plot*: Counts from 1 to 20
 The *response variable *(Y) was 

Re: [R] creating 'all' sum contrasts

2010-10-15 Thread Michael Hopkins

On 15 Oct 2010, at 13:55, Berwin A Turlach wrote:

 G'day Michael,
 

Hi Berwin

Thanks for the reply

 On Fri, 15 Oct 2010 12:09:07 +0100
 Michael Hopkins hopk...@upstreamsystems.com wrote:
 
 OK, my last question didn't get any replies so I am going to try and
 ask a different way.
 
 When I generate contrasts with contr.sum() for a 3 level categorical
 variable I get the 2 orthogonal contrasts:
 
  contr.sum( c(1,2,3) )
    [,1] [,2]
  1    1    0
  2    0    1
  3   -1   -1
 
 These two contrasts are *not* orthogonal.
 
I'm surprised.  Can you please tell me how you calculated that?

 This provides the contrasts 1-3 and 2-3 as expected.  But I also
 want it to create 1-2 (i.e. 1-3 - 2-3).  So in general I want
 all possible orthogonal contrasts - think of it as the contrasts for
 all pairwise comparisons between the levels.
 
 You have to decide what you want.  The contrasts for all pairwise
 comparaisons between the levels or all possible orthogonal contrasts?
 

Well the pairwise contrasts are the most important as I am looking for evidence 
of whether they are zero (i.e. no difference between levels) or not.  But see 
my above comment about orthogonality.

 The latter is actually not well defined.  For a factor with p levels,
 there would be p-1 orthogonal contrasts, which are only identifiable up to
 rotation, hence infinitely many such sets. But there are p(p-1)
 pairwise comparisons. So unless p=2, you have to decide what you want.
 
Well of course the pairwise comparisons are bi-directional so in fact only 
p(p-1)/2 are of interest to me.

 Are there are any options for contrast() or other functions/libraries
 that will allow me to do this automatically?
 
 Look at package multcomp, in particular functions glht and mcp, these
 might help.
 
Thanks I will have a look.  

But I want to be able to do this transparently within lm() using regsubsets() 
etc as I am collecting large quantities of summary stats from all possible 
models to use with a model choice criterion based upon true Bayesian model 
probabilities.
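For the record, a sketch of the multcomp route Berwin mentions (toy
one-factor model; this yields the p(p-1)/2 pairwise contrasts, though not
inside regsubsets()):

library(multcomp)
dat <- data.frame(f = gl(3, 10, labels = c("a", "b", "c")), y = rnorm(30))
mod <- lm(y ~ f, data = dat)
summary(glht(mod, linfct = mcp(f = "Tukey")))  # b-a, c-a, c-b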

 Cheers,
 
   Berwin
 
 == Full address 
 Berwin A Turlach  Tel.: +61 (8) 6488 3338 (secr)
 School of Maths and Stats (M019)+61 (8) 6488 3383 (self)
 The University of Western Australia   FAX : +61 (8) 6488 1028
 35 Stirling Highway   
 Crawley WA 6009e-mail: ber...@maths.uwa.edu.au
 Australiahttp://www.maths.uwa.edu.au/~berwin




Michael Hopkins
Algorithm and Statistical Modelling Expert
 
Upstream
23 Old Bond Street
London
W1S 4PZ

Mob +44 0782 578 7220
DL   +44 0207 290 1326
Fax  +44 0207 290 1321

hopk...@upstreamsystems.com
www.upstreamsystems.com
 



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] fast rowCumsums wanted for calculating the cdf

2010-10-15 Thread William Dunlap
 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org] On Behalf Of Joshua Wiley
 Sent: Friday, October 15, 2010 12:23 AM
 To: Gregor
 Cc: r-help@r-project.org
 Subject: Re: [R] fast rowCumsums wanted for calculating the cdf
 
 Hi,
 
 You might look at Reduce().  It seems faster.  I converted the matrix
 to a list in an incredibly sloppy way (which you should not emulate)
 because I cannot think of  the simple way.
 
   probs <- t(matrix(rep(1:1000000), nrow=10)) # matrix with 
  row-wise probabilities
   F <- matrix(0, nrow=nrow(probs), ncol=ncol(probs));
   F[,1] <- probs[,1,drop=TRUE];
   add <- function(x) {Reduce(`+`, x, accumulate = TRUE)}
  
  
   system.time(F.slow <- t(apply(probs, 1, cumsum)))
     user  system elapsed
   36.758   0.416  42.464
  
   system.time(for (cc in 2:ncol(F)) {
  +  F[,cc] <- F[,cc-1,drop=TRUE] + probs[,cc,drop=TRUE];
  + })
     user  system elapsed
    0.980   0.196   1.328
  
   system.time(add(list(probs[,1], probs[,2], probs[,3], 
  probs[,4], probs[,5], probs[,6], probs[,7], probs[,8], 
  probs[,9], probs[,10])))
     user  system elapsed
    0.420   0.072   0.539

One way to avoid the typing of
   list(probs[,1], probs[,2], ...)
is to use split(probs, col(probs)).  However, split() is
surprisingly slow in this case.

Another approach is to use a matrix multiply, by a ncol(probs)
by ncol(probs) matrix with 0's in lower triangle and 1's
elsewhere.  Here are 4 approaches I've seen mentioned,
made into functions that output the matrix requested:

f1 <- function (x) t(apply(x, 1, cumsum))
f2 <- function (x){
    if (ncol(x) > 1) 
        for (i in 2:ncol(x)) x[, i] <- x[, i] + x[, i - 1]
    x
}
f3 <- function (x) 
    matrix(unlist(Reduce(`+`, split(x, col(x)), accumulate = TRUE)), 
        ncol = ncol(x))
f4 <- function (x) x %*% outer(seq_len(ncol(x)), seq_len(ncol(x)), "<=")

I tested it as follows
   probs <- matrix(runif(1e7), ncol=10, nrow=1e6)
   system.time(v1 <- f1(probs))
     user  system elapsed 
    19.45    0.25   16.78 
   system.time(v2 <- f2(probs))
     user  system elapsed 
     0.68    0.03    0.79 
   system.time(v3 <- f3(probs))
     user  system elapsed 
    38.25    0.24   34.47 
   system.time(v4 <- f4(probs))
     user  system elapsed 
     0.70    0.05    0.56 
   identical(v1,v2) && identical(v1,v3) && identical(v1,v4)
  [1] TRUE

If you have a fancy optimized/multithreaded BLAS linked
to your version of R there you may find that %*% is
much faster (I'm using the precompiled R for Windows).
As ncol(x) gets bigger the %*% approach probably loses
its advantage.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 

 
 
  0.539 seconds for 10 vectors each with 1 million elements does not seem
  bad.  For 10 vectors of 30,000 elements each, on my system it took 0.014
  seconds, which for 200,000 iterations, I estimated as:
  
   (200000*.014)/60/60
  [1] 0.7777778
 
 (and this is on a stone age system with a single core processor and
 only 756MB of RAM)
 
 It should not be difficult to get the list back to a matrix.
 
 Cheers,
 
 Josh
 
 
 
 On Thu, Oct 14, 2010 at 11:51 PM, Gregor mailingl...@gmx.at wrote:
  Dear all,
 
  Maybe the easiest solution: Is there anything that speaks 
 against generalizing
  cumsum from base to cope with matrices (as is done in matlab)? E.g.:
 
  cumsum(Matrix, 1)
  equivalent to
  apply(Matrix, 1, cumsum)
 
   The main advantage could be optimized code if the Matrix is 
  extremely non-square
   (e.g. 100,000 x 10), but the summation is done over the short 
  side (in this case 10).
  apply would practically yield a loop over 100,000 elements, 
 and vectorization w.r.t.
  the long side (loop over 10 elements) provides considerable 
 efficiency gains.
 
  Many regards,
  Gregor
 
 
 
 
  On Tue, 12 Oct 2010 10:24:53 +0200
  Gregor mailingl...@gmx.at wrote:
 
  Dear all,
 
  I am struggling with a (currently) cost-intensive problem: 
 calculating the
  (non-normalized) cumulative distribution function, given 
 the (non-normalized)
  probabilities. something like:
 
   probs <- t(matrix(rep(1:100), nrow=10)) # matrix with 
  row-wise probabilities
   F <- t(apply(probs, 1, cumsum)) # SLOOOW!
  
   One (already faster, but for sure not ideal) solution - 
  thanks to Henrik Bengtsson:
  
   F <- matrix(0, nrow=nrow(probs), ncol=ncol(probs));
   F[,1] <- probs[,1,drop=TRUE];
   for (cc in 2:ncol(F)) {
     F[,cc] <- F[,cc-1,drop=TRUE] + probs[,cc,drop=TRUE];
   }
 
  In my case, probs is a (30,000 x 10) matrix, and i need to 
 iterate this step around
  200,000 times, so speed is crucial. I currently can make 
 sure to have no NAs, but
  in order to extend matrixStats, this could be a nontrivial issue.
 
  Any ideas for speeding up this - probably routine - task?
 
  Thanks in advance,
  Gregor
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
 

[R] Multi-line graph?

2010-10-15 Thread barnhillec

Hi,

I am relatively new to R but not to graphing, which I used to do in Excel
and a few other environments on the job. I'm going back to school for a PhD
and am teaching myself R beforehand. So I hope this question is not
unacceptably ignorant but I have perused every entry level document I can
find and so far I'm out of luck.

I'm trying to graph some simple music psychology data. Columns are musical
intervals, rows are the initials of the subjects. Numbers are in beats per
minute (this is the value at which they hear the melodic interval split into
two streams). So here's my table:

   Tenth Fifth Third
GG   112   152   168
EC   100   120   140
SQ   160   184    NA
SK   120   100   180

I want a multi-line graph where the intervals are on the X axis, and the y
axis is the beats per minute, and each subject has a different line. In
Excel this would be no problem but I am having trouble in R.

The only way I can figure out how to plot this in R is if the columns or
rows are taken as variables. But the variable is beats per minute. Any
suggestions?

I appreciate the help. -Eric
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Multi-line-graph-tp2997402p2997402.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Multi-line graph?

2010-10-15 Thread Dieter Menne


barnhillec wrote:
 
 I'm trying to graph some simple music psychology data. Columns are musical
 intervals, rows are the initials of the subjects. Numbers are in beats per
 minute (this is the value at which they hear the melodic interval split
 into two streams). So here's my table:
 
    Tenth Fifth Third
 GG   112   152   168
 EC   100   120   140
 SQ   160   184    NA
 SK   120   100   180
 
 I want a multi-line graph where the intervals are on the X axis, and the y
 axis is the beats per minute, and each subject has a different line. 
 
 
The most difficult part of it (at least that's what my students think) is
getting the data into the right format. If you have the chance to start
with the data in the long format, use it. What you need is:

init interval beats
GG   Tenth      112

In this case, reformatting almost works with the default version of the melt
function in package reshape.

It's good that you supplied a data example, but in general it is better if
you could provide it in a copy-and-paste format as shown below. I needed
more time to reformat the data than to write the rest.


Dieter


library(lattice)
library(reshape)
dt = data.frame(
   init  = c("GG","EC","SQ","SK"),
   Tenth = c(112,100,160,120),
   Fifth = c(152,120,184,100),
   Third = c(168,140,NA,180))

# The data should look like this
#init interval beats
#GG   Tenth      112
#dtlong = melt(dt) # almost does it, but column name "variable" is ugly
dtlong = melt(dt, variable_name="interval") # better
dtlong
   
xyplot(value~interval, groups=init, data=dtlong, type="l", 
auto.key=list(lines=TRUE))





-- 
View this message in context: 
http://r.789695.n4.nabble.com/Multi-line-graph-tp2997402p2997435.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Multi-line graph?

2010-10-15 Thread William Revelle

?matplot

e.g.,

copy your data to the clipboard then

library(psych)
my.data <- read.clipboard()

> my.data

   Tenth Fifth Third
GG   112   152   168
EC   100   120   140
SQ   160   184    NA
SK   120   100   180

matplot(t(my.data), type="b")

Bill



At 10:27 AM -0700 10/15/10, barnhillec wrote:

Hi,

I am relatively new to R but not to graphing, which I used to do in Excel
and a few other environments on the job. I'm going back to school for a PhD
and am teaching myself R beforehand. So I hope this question is not
unacceptably ignorant but I have perused every entry level document I can
find and so far I'm out of luck.

I'm trying to graph some simple music psychology data. Columns are musical
intervals, rows are the initials of the subjects. Numbers are in beats per
minute (this is the value at which they hear the melodic interval split into
two streams). So here's my table:

   Tenth Fifth Third
GG   112   152   168
EC   100   120   140
SQ   160   184    NA
SK   120   100   180

I want a multi-line graph where the intervals are on the X axis, and the y
axis is the beats per minute, and each subject has a different line. In
Excel this would be no problem but I am having trouble in R.

The only way I can figure out how to plot this in R is if the columns or
rows are taken as variables. But the variable is beats per minute. Any
suggestions?

I appreciate the help. -Eric
--
View this message in context: 
http://r.789695.n4.nabble.com/Multi-line-graph-tp2997402p2997402.html

Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Problem with merging two zoo objects

2010-10-15 Thread Megh Dal
Dear all, I have following 2 zoo objects. However when I try to merge those 2 
objects into one, nothing is coming as intended. Please see below the objects 
as well as the merged object:


> dat11
  V2   V3   V4   V5
2010-10-15 13:43:54 73.8 73.8 73.8 73.8
2010-10-15 13:44:15 73.8 73.8 73.8 73.8
2010-10-15 13:45:51 73.8 73.8 73.8 73.8
2010-10-15 13:46:21 73.8 73.8 73.8 73.8
2010-10-15 13:47:27 73.8 73.8 73.8 73.8
2010-10-15 13:47:54 73.8 73.8 73.8 73.8
2010-10-15 13:49:51 73.7 73.7 73.7 73.7
> dat22
  V2   V3   V4   V5
2010-10-15 12:09:12 74.0 74.0 74.0 74.0
2010-10-15 12:09:33 73.9 73.9 73.9 73.9
2010-10-15 12:20:36 74.0 74.0 74.0 74.0
2010-10-15 12:30:36 74.0 74.0 74.0 74.0
2010-10-15 12:41:03 73.7 73.7 73.7 73.7
> merge(dat11, dat22)
V2.dat11 V3.dat11 V4.dat11 V5.dat11 V2.dat22 V3.dat22 
V4.dat22 V5.dat22
2010-10-15 12:09:12   NA   NA   NA   NA   NA   NA   
NA   NA
2010-10-15 12:09:33   NA   NA   NA   NA   NA   NA   
NA   NA
2010-10-15 13:43:54   NA   NA   NA   NA   NA   NA   
NA   NA
2010-10-15 13:44:15   NA   NA   NA   NA   NA   NA   
NA   NA
2010-10-15 13:45:51   NA   NA   NA   NA   NA   NA   
NA   NA
2010-10-15 13:46:21   NA   NA   NA   NA   NA   NA   
NA   NA
2010-10-15 13:47:27   NA   NA   NA   NA   NA   NA   
NA   NA
2010-10-15 13:47:54   NA   NA   NA   NA   NA   NA   
NA   NA
2010-10-15 13:49:51   NA   NA   NA   NA   NA   NA   
NA   NA
Warning messages:
1: In MATCH(x, x) == seq_len(length(x)) :
  longer object length is not a multiple of shorter object length
2: In MATCH(x, x) == seq_len(length(x)) :
  longer object length is not a multiple of shorter object length

If somebody could point out where I went wrong, it would be really great.

Thanks

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Overlaying two png?

2010-10-15 Thread steven mosher
I have a program that creates a PNG file using RgoogleMaps with an extent
(lonmin, lonmax, latmin, latmax).
I also have a contour plot of the same location, same extent, same sized
(height/width) PNG file.

I'm looking for a way to make the contour semi transparent and overlay it on
the google map ( hybrid map)

Since I have 7000 of these to do, an automated process is desired (grin).

Any pointers in the right direction ?
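
One possible approach (a sketch, untested on the real tiles; map.png and
contour.png are placeholder file names, and it assumes both images have the
same pixel size and at least 3 colour channels):

library(png)   # readPNG(); rasterImage() is in base graphics (R >= 2.11)

map     <- readPNG("map.png")
contour <- readPNG("contour.png")

## rebuild the contour with a 50% alpha channel so it is semi-transparent
h <- dim(contour)[1]; w <- dim(contour)[2]
contour_rgba <- array(c(contour[, , 1:3], rep(0.5, h * w)),
                      dim = c(h, w, 4))

## draw the map, then the semi-transparent contour on top; wrap this in a
## loop over the 7000 file pairs to automate it
png("overlay.png", width = ncol(map), height = nrow(map))
par(mar = rep(0, 4), xaxs = "i", yaxs = "i")
plot(0:1, 0:1, type = "n", axes = FALSE, xlab = "", ylab = "")
rasterImage(map,          0, 0, 1, 1)
rasterImage(contour_rgba, 0, 0, 1, 1)
dev.off()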

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem with merging two zoo objects

2010-10-15 Thread Dieter Menne


Megh wrote:
 
 Dear all, I have following 2 zoo objects. However when I try to merge
 those 2 objects into one, nothing is coming as intended. Please see below
 the objects as well as the merged object:
 
 
 merge(dat11, dat22)
 V2.dat11 V3.dat11 V4.dat11 V5.dat11 V2.dat22 V3.dat22
 V4.dat22 V5.dat22
 2010-10-15 12:09:12   NA   NA   NA   NA   NA   NA 
  
 NA   NA
 
Since the simulated example works, it must have to do with your data. Try
str(dat11) and str(dat22); maybe something strange has crept in.

Dieter


library(zoo)
x.date <- as.Date(paste(2003, 02, c(1, 3, 7, 9, 14), sep = "-"))
x <- zoo(matrix(1:10, ncol = 2), x.date)
x
str(x)

y.date <- as.Date(paste(2006, 02, c(1, 3, 7, 9, 14), sep = "-"))
y <- zoo(matrix(11:20, ncol = 2), y.date)
y
str(y)

-- 
View this message in context: 
http://r.789695.n4.nabble.com/Problem-with-merging-two-zoo-objects-tp2997472p2997494.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem with merging two zoo objects

2010-10-15 Thread Achim Zeileis



On Fri, 15 Oct 2010, Megh Dal wrote:


Dear all, I have the following 2 zoo objects. However, when I try to merge
those 2 objects into one, nothing comes out as intended. Please see below the
objects as well as the merged object:


> dat11
                      V2   V3   V4   V5
2010-10-15 13:43:54 73.8 73.8 73.8 73.8
2010-10-15 13:44:15 73.8 73.8 73.8 73.8
2010-10-15 13:45:51 73.8 73.8 73.8 73.8
2010-10-15 13:46:21 73.8 73.8 73.8 73.8
2010-10-15 13:47:27 73.8 73.8 73.8 73.8
2010-10-15 13:47:54 73.8 73.8 73.8 73.8
2010-10-15 13:49:51 73.7 73.7 73.7 73.7

> dat22
                      V2   V3   V4   V5
2010-10-15 12:09:12 74.0 74.0 74.0 74.0
2010-10-15 12:09:33 73.9 73.9 73.9 73.9
2010-10-15 12:20:36 74.0 74.0 74.0 74.0
2010-10-15 12:30:36 74.0 74.0 74.0 74.0
2010-10-15 12:41:03 73.7 73.7 73.7 73.7

> merge(dat11, dat22)
                    V2.dat11 V3.dat11 V4.dat11 V5.dat11 V2.dat22 V3.dat22 V4.dat22 V5.dat22
2010-10-15 12:09:12       NA       NA       NA       NA       NA       NA       NA       NA
2010-10-15 12:09:33       NA       NA       NA       NA       NA       NA       NA       NA
2010-10-15 13:43:54       NA       NA       NA       NA       NA       NA       NA       NA
2010-10-15 13:44:15       NA       NA       NA       NA       NA       NA       NA       NA
2010-10-15 13:45:51       NA       NA       NA       NA       NA       NA       NA       NA
2010-10-15 13:46:21       NA       NA       NA       NA       NA       NA       NA       NA
2010-10-15 13:47:27       NA       NA       NA       NA       NA       NA       NA       NA
2010-10-15 13:47:54       NA       NA       NA       NA       NA       NA       NA       NA
2010-10-15 13:49:51       NA       NA       NA       NA       NA       NA       NA       NA
Warning messages:
1: In MATCH(x, x) == seq_len(length(x)) :
  longer object length is not a multiple of shorter object length
2: In MATCH(x, x) == seq_len(length(x)) :
  longer object length is not a multiple of shorter object length

If somebody could point out where I went wrong, it would be really great.


merge() does a cbind() (among some more general computations); I guess you
want rbind().


Try rbind(dat11, dat22).

hth,
Z
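
A small illustration of the difference, on made-up series (not the poster's
data):

library(zoo)
z1 <- zoo(1:3, as.Date("2010-10-15") + 0:2)
z2 <- zoo(4:5, as.Date("2010-10-18") + 0:1)
merge(z1, z2)   # cbind-style: 5 rows, 2 columns, NA where a series has no value
rbind(z1, z2)   # rbind-style: one series with all 5 observations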


Thanks

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem with merging two zoo objects

2010-10-15 Thread Gabor Grothendieck
On Fri, Oct 15, 2010 at 2:20 PM, Megh Dal megh700...@yahoo.com wrote:
 Dear all, I have the following 2 zoo objects. However, when I try to merge
 those 2 objects into one, nothing comes out as intended. Please see below
 the objects as well as the merged object:


 dat11
                      V2   V3   V4   V5
 2010-10-15 13:43:54 73.8 73.8 73.8 73.8
 2010-10-15 13:44:15 73.8 73.8 73.8 73.8
 2010-10-15 13:45:51 73.8 73.8 73.8 73.8
 2010-10-15 13:46:21 73.8 73.8 73.8 73.8
 2010-10-15 13:47:27 73.8 73.8 73.8 73.8
 2010-10-15 13:47:54 73.8 73.8 73.8 73.8
 2010-10-15 13:49:51 73.7 73.7 73.7 73.7
 dat22
                      V2   V3   V4   V5
 2010-10-15 12:09:12 74.0 74.0 74.0 74.0
 2010-10-15 12:09:33 73.9 73.9 73.9 73.9
 2010-10-15 12:20:36 74.0 74.0 74.0 74.0
 2010-10-15 12:30:36 74.0 74.0 74.0 74.0
 2010-10-15 12:41:03 73.7 73.7 73.7 73.7
 merge(dat11, dat22)
                     V2.dat11 V3.dat11 V4.dat11 V5.dat11 V2.dat22 V3.dat22 V4.dat22 V5.dat22
 2010-10-15 12:09:12       NA       NA       NA       NA       NA       NA       NA       NA
 2010-10-15 12:09:33       NA       NA       NA       NA       NA       NA       NA       NA
 2010-10-15 13:43:54       NA       NA       NA       NA       NA       NA       NA       NA
 2010-10-15 13:44:15       NA       NA       NA       NA       NA       NA       NA       NA
 2010-10-15 13:45:51       NA       NA       NA       NA       NA       NA       NA       NA
 2010-10-15 13:46:21       NA       NA       NA       NA       NA       NA       NA       NA
 2010-10-15 13:47:27       NA       NA       NA       NA       NA       NA       NA       NA
 2010-10-15 13:47:54       NA       NA       NA       NA       NA       NA       NA       NA
 2010-10-15 13:49:51       NA       NA       NA       NA       NA       NA       NA       NA
 Warning messages:
 1: In MATCH(x, x) == seq_len(length(x)) :
  longer object length is not a multiple of shorter object length
 2: In MATCH(x, x) == seq_len(length(x)) :
  longer object length is not a multiple of shorter object length

 If somebody could point out where I went wrong, it would be really great.


If I try it, it works properly, so there is likely something wrong with your
dat11 and dat22 objects. If you provide the problem reproducibly, one might
be able to say more.

> Lines1 <- "Date Time V2   V3   V4   V5
+ 2010-10-15 13:43:54 73.8 73.8 73.8 73.8
+ 2010-10-15 13:44:15 73.8 73.8 73.8 73.8
+ 2010-10-15 13:45:51 73.8 73.8 73.8 73.8
+ 2010-10-15 13:46:21 73.8 73.8 73.8 73.8
+ 2010-10-15 13:47:27 73.8 73.8 73.8 73.8
+ 2010-10-15 13:47:54 73.8 73.8 73.8 73.8
+ 2010-10-15 13:49:51 73.7 73.7 73.7 73.7"

> Lines2 <- "Date Time V2   V3   V4   V5
+ 2010-10-15 12:09:12 74.0 74.0 74.0 74.0
+ 2010-10-15 12:09:33 73.9 73.9 73.9 73.9
+ 2010-10-15 12:20:36 74.0 74.0 74.0 74.0
+ 2010-10-15 12:30:36 74.0 74.0 74.0 74.0
+ 2010-10-15 12:41:03 73.7 73.7 73.7 73.7"

> library(zoo)
> dat1 <- read.zoo(textConnection(Lines1), header = TRUE,
+ index = list(1, 2), FUN = function(d, t) as.POSIXct(paste(d, t)))
Warning messages:
1: closing unused connection 8 (Lines2)
2: closing unused connection 7 (Lines1)
3: closing unused connection 5 (Lines2)
4: closing unused connection 4 (Lines1)
5: closing unused connection 3 (Lines2)
> dat2 <- read.zoo(textConnection(Lines2), header = TRUE,
+ index = list(1, 2), FUN = function(d, t) as.POSIXct(paste(d, t)))
> merge(dat1, dat2)
                    V2.dat1 V3.dat1 V4.dat1 V5.dat1 V2.dat2 V3.dat2 V4.dat2 V5.dat2
2010-10-15 12:09:12      NA      NA      NA      NA    74.0    74.0    74.0    74.0
2010-10-15 12:09:33      NA      NA      NA      NA    73.9    73.9    73.9    73.9
2010-10-15 12:20:36      NA      NA      NA      NA    74.0    74.0    74.0    74.0
2010-10-15 12:30:36      NA      NA      NA      NA    74.0    74.0    74.0    74.0
2010-10-15 12:41:03      NA      NA      NA      NA    73.7    73.7    73.7    73.7
2010-10-15 13:43:54    73.8    73.8    73.8    73.8      NA      NA      NA      NA
2010-10-15 13:44:15    73.8    73.8    73.8    73.8      NA      NA      NA      NA
2010-10-15 13:45:51    73.8    73.8    73.8    73.8      NA      NA      NA      NA
2010-10-15 13:46:21    73.8    73.8    73.8    73.8      NA      NA      NA      NA
2010-10-15 13:47:27    73.8    73.8    73.8    73.8      NA      NA      NA      NA
2010-10-15 13:47:54    73.8    73.8    73.8    73.8      NA      NA      NA      NA
2010-10-15 13:49:51    73.7    73.7    73.7    73.7      NA      NA      NA      NA

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem with merging two zoo objects

2010-10-15 Thread Megh Dal
Hi Gabor, please see the attached files, which are in text format. I opened
them in Excel and then used the clipboard to load them into R. It is still
really unclear to me what to do.

Also, can you please elaborate on the term index = list(1, 2), FUN =
function(d, t) as.POSIXct(paste(d, t)) from your previous mail? The help
says: "If FUN is specified then read.zoo calls FUN with the index as the
first argument." I really could not connect your syntax with the help.

--- On Sat, 10/16/10, Gabor Grothendieck ggrothendi...@gmail.com wrote:

 From: Gabor Grothendieck ggrothendi...@gmail.com
 Subject: Re: [R] Problem with merging two zoo objects
 To: Megh Dal megh700...@yahoo.com
 Cc: r-h...@stat.math.ethz.ch
 Date: Saturday, October 16, 2010, 12:11 AM
 On Fri, Oct 15, 2010 at 2:20 PM, Megh Dal megh700...@yahoo.com wrote:
  Dear all, I have the following 2 zoo objects. However, when I try to merge
  those 2 objects into one, nothing comes out as intended. Please see below
  the objects as well as the merged object:
 
 
   dat11
                        V2   V3   V4   V5
   2010-10-15 13:43:54 73.8 73.8 73.8 73.8
   2010-10-15 13:44:15 73.8 73.8 73.8 73.8
   2010-10-15 13:45:51 73.8 73.8 73.8 73.8
   2010-10-15 13:46:21 73.8 73.8 73.8 73.8
   2010-10-15 13:47:27 73.8 73.8 73.8 73.8
   2010-10-15 13:47:54 73.8 73.8 73.8 73.8
   2010-10-15 13:49:51 73.7 73.7 73.7 73.7
   dat22
                        V2   V3   V4   V5
   2010-10-15 12:09:12 74.0 74.0 74.0 74.0
   2010-10-15 12:09:33 73.9 73.9 73.9 73.9
   2010-10-15 12:20:36 74.0 74.0 74.0 74.0
   2010-10-15 12:30:36 74.0 74.0 74.0 74.0
   2010-10-15 12:41:03 73.7 73.7 73.7 73.7
   merge(dat11, dat22)
                       V2.dat11 V3.dat11 V4.dat11 V5.dat11 V2.dat22 V3.dat22 V4.dat22 V5.dat22
   2010-10-15 12:09:12       NA       NA       NA       NA       NA       NA       NA       NA
   2010-10-15 12:09:33       NA       NA       NA       NA       NA       NA       NA       NA
   2010-10-15 13:43:54       NA       NA       NA       NA       NA       NA       NA       NA
   2010-10-15 13:44:15       NA       NA       NA       NA       NA       NA       NA       NA
   2010-10-15 13:45:51       NA       NA       NA       NA       NA       NA       NA       NA
   2010-10-15 13:46:21       NA       NA       NA       NA       NA       NA       NA       NA
   2010-10-15 13:47:27       NA       NA       NA       NA       NA       NA       NA       NA
   2010-10-15 13:47:54       NA       NA       NA       NA       NA       NA       NA       NA
   2010-10-15 13:49:51       NA       NA       NA       NA       NA       NA       NA       NA
   Warning messages:
   1: In MATCH(x, x) == seq_len(length(x)) :
    longer object length is not a multiple of shorter object length
   2: In MATCH(x, x) == seq_len(length(x)) :
    longer object length is not a multiple of shorter object length
 
   If somebody could point out where I went wrong, it would be really great.
 
 
 If I try it then it works properly so there is likely something wrong
 with your dat11 and dat22 objects.  If you provide the problem
 reproducibly one might be able to say more.
 
 > Lines1 <- "Date Time V2   V3   V4   V5
 + 2010-10-15 13:43:54 73.8 73.8 73.8 73.8
 + 2010-10-15 13:44:15 73.8 73.8 73.8 73.8
 + 2010-10-15 13:45:51 73.8 73.8 73.8 73.8
 + 2010-10-15 13:46:21 73.8 73.8 73.8 73.8
 + 2010-10-15 13:47:27 73.8 73.8 73.8 73.8
 + 2010-10-15 13:47:54 73.8 73.8 73.8 73.8
 + 2010-10-15 13:49:51 73.7 73.7 73.7 73.7"
 
 > Lines2 <- "Date Time V2   V3   V4   V5
 + 2010-10-15 12:09:12 74.0 74.0 74.0 74.0
 + 2010-10-15 12:09:33 73.9 73.9 73.9 73.9
 + 2010-10-15 12:20:36 74.0 74.0 74.0 74.0
 + 2010-10-15 12:30:36 74.0 74.0 74.0 74.0
 + 2010-10-15 12:41:03 73.7 73.7 73.7 73.7"
 
 > library(zoo)
 > dat1 <- read.zoo(textConnection(Lines1), header = TRUE,
 + index = list(1, 2), FUN = function(d, t) as.POSIXct(paste(d, t)))
 Warning messages:
 1: closing unused connection 8 (Lines2)
 2: closing unused connection 7 (Lines1)
 3: closing unused connection 5 (Lines2)
 4: closing unused connection 4 (Lines1)
 5: closing unused connection 3 (Lines2)
 > dat2 <- read.zoo(textConnection(Lines2), header = TRUE,
 + index = list(1, 2), FUN = function(d, t) as.POSIXct(paste(d, t)))
 > merge(dat1, dat2)
                     V2.dat1 V3.dat1 V4.dat1 V5.dat1 V2.dat2 V3.dat2 V4.dat2 V5.dat2
 2010-10-15 12:09:12      NA      NA      NA      NA    74.0    74.0    74.0    74.0
 2010-10-15 12:09:33      NA      NA      NA      NA    73.9    73.9    73.9    73.9
 2010-10-15 12:20:36      NA      NA      NA      NA    74.0    74.0    74.0    74.0
 2010-10-15 12:30:36      NA      NA      NA      NA    74.0    74.0    74.0    74.0
 2010-10-15 12:41:03      NA      NA      NA      NA    73.7    73.7    73.7    73.7
 2010-10-15 13:43:54    73.8    73.8    73.8    73.8      NA      NA      NA      NA
 2010-10-15 13:44:15    73.8    73.8    73.8    73.8      NA      NA      NA      NA
 2010-10-15 13:45:51    73.8    73.8    73.8    73.8      NA      NA      NA      NA
 

Re: [R] using apply function and storing output

2010-10-15 Thread Dennis Murphy
Hi:

You need to give a function for rollapply() to apply :)

Here's my toy example:

d <- as.data.frame(matrix(rpois(30, 5), nrow = 10))
library(zoo)
d1 <- zoo(d)   # uses row numbers as index

# rolling means of 3 in each subseries (columns)
> rollmean(d1, 3)
        V1       V2       V3
2 3.000000 5.333333 4.333333
3 3.333333 4.333333 4.000000
4 2.666667 3.666667 3.333333
5 4.333333 3.666667 3.333333
6 5.000000 5.000000 5.000000
7 4.666667 4.666667 4.000000
8 5.666667 3.666667 4.000000
9 5.000000 3.000000 2.666667

# create new function
> cv <- function(x) sd(x)/mean(x)

# use new function in rollapply():
> rollapply(d1, 3, FUN = cv)
         V1        V2        V3
2 0.3333333 0.1082532 0.5329387
3 0.1732051 0.4803845 0.6614378
4 0.2165064 0.5677271 0.4582576
5 0.7418193 0.5677271 0.4582576
6 0.6000000 0.3464102 0.4000000
7 0.7525467 0.4948717 0.6614378
8 0.8882158 0.5677271 0.6614378
9 1.0583005 0.3333333 0.2165064

This is a simplistic example, but it should get you started.

HTH,
Dennis
On Fri, Oct 15, 2010 at 2:00 AM, David A. dasol...@hotmail.com wrote:

  Hi Dennis and Dimitris,

 thanks for your answers. I am trying the rollmean() function and also the
 rollapply() function because I also want to calculate CV for the values.
 For this I created a co.var() function. I am having problems using them.

 co.var <- function(x)(
 + sd(x)/mean(x)
 + )


 > dim(mydata)
 [1] 1710  244

 xxx <- rollmean(mydata, 3, by = 3)

 works fine and creates a vector which I will transform into a matrix. I
 still have to find out how to store the output in my previously created
 570x244 0's matrix in an ordered way.

 but, since the examples in the help page say it's the same, I tried

 > xxx <- rollapply(mydata, 3, mean, by = 3)
 Error in UseMethod("rollapply") :
   no applicable method for 'rollapply' applied to an object of class
 c('matrix', 'integer', 'numeric')

 and, with my created function...

 > xxx <- rollapply(ord_raw_filt.df, 3, FUN = co.var, by = 3)
 Error in UseMethod("rollapply") :
   no applicable method for 'rollapply' applied to an object of class
 c('matrix', 'integer', 'numeric')

 Can you help me with the error?

 Dave

 --
 Date: Fri, 15 Oct 2010 00:45:08 -0700
 Subject: Re: [R] using apply function and storing output
 From: djmu...@gmail.com
 To: dasol...@hotmail.com
 CC: r-help@r-project.org


 Hi:

 Look into the rollmean() function in package zoo.

 HTH,
 Dennis

 On Fri, Oct 15, 2010 at 12:34 AM, David A. dasol...@hotmail.com wrote:


 Hi list,

 I have a 1710x244 matrix of numerical values and I would like to calculate
 the mean of every group of three consecutive values per column to obtain a
 new matrix of 570x244.  I could get it done using a for loop but how can I
 do that using apply functions?
 In addition to this, do I have to initialize a 570x244 matrix with 0's to
 store the calculated values or can the output matrix be generated while
 calculating the mean values?

 Cheers,

 Dave

[[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] scaling on normlized data set

2010-10-15 Thread hosnaw

Hi,

I am using R and tried to normalize the data within each sample group using
RMA. When I tried to import all the normalized expression data as a single
text file and make a boxplot, it showed discrepancies among the sample
groups. I tried to scale them or re-normalize them again so that they can be
used for further analysis. Unfortunately, I did not manage it using the
affyPLM package. Would you please help me out with this problem?

Thanks in advance.

Best regards,
H. Nawaz
-- 
View this message in context: 
http://r.789695.n4.nabble.com/scaling-on-normlized-data-set-tp2996771p2996771.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] bwplot change whiskers position to percentile 5 and P95

2010-10-15 Thread Christophe Bouffioux
Hi David,
More info

Thanks a lot
Christophe

##
library(Hmisc)
library(lattice)
library(fields)
library(gregmisc)
library(quantreg)

> str(sasdata03_a)
'data.frame':   109971 obs. of  6 variables:
 $ jaar      : Factor w/ 3 levels "2006","2007",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Cat_F     : Factor w/ 6 levels "a","d","g","h",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ montant_oa: num  8087 1960 1573 11502 3862 ...
 $ nombre_cas: int  519 140 111 780 262 125 79 334 97 503 ...
 $ montant_i : num  221 221 221 221 221 ...
 $ nombre_i  : int  12 12 12 12 12 12 12 12 12 12 ...
> aggregate.table(sasdata03_a$nombre_cas, sasdata03_a$jaar,
                  sasdata03_a$Cat_F, nobs)
        a    d    g    h    i    k
2006 7568 1911 5351 7430 7377 7217
2007 7499 1914 5378 7334 7275 7138
2008 7524 1953 5488 7292 7192 7130
> d2008 <- sasdata03_a[sasdata03_a$Cat_F == "d" & sasdata03_a$jaar == "2008", ]
> quantile(d2008$nombre_cas, probs = c(0.95))
   95%
3630.2

#bb#
#   #
#  graph on real data   #
#   #
#bb#

bwplot(jaar ~ nombre_cas | Cat_F, xlab = "", ylab = "",
   data = sasdata03_a, xlim = c(-100, 3800), layout = c(2, 3),
   X = sasdata03_a$nombre_i,
   pch = "|",
   stats = boxplot2.stats,
   main = "Fig. 1: P95 d 2008 = 3630",

   par.settings = list(
     plot.symbol = list(alpha = 1, col = "transparent", cex = 1, pch = 20)),
   panel = function(x, y, ..., X, subscripts){
     panel.grid(v = -1, h = 0)
     panel.bwplot(x, y, ..., subscripts = subscripts)
     X <- X[subscripts]
     X <- tapply(X, y, unique)
     Y <- tapply(y, y, unique)
     tg <- table(y)
     panel.points(X, Y, cex = 3, pch = "|", col = "red")
     panel.text((3700*0.8), (Y - 0.15), labels = paste("N=", tg))
   })


#Simulated data 1  #


ex <- data.frame(v1 = log(abs(rt(180, 3)) + 1),
                 v2 = rep(c(2007, 2006, 2005), 60),
                 z  = rep(c("a", "b", "c", "d", "e", "f"), e = 30))

ex2 <- data.frame(v1b = log(abs(rt(18, 3)) + 1),
                  v2 = rep(c(2007, 2006, 2005), 6),
                  z  = rep(c("a", "b", "c", "d", "e", "f"), e = 3))
ex3 <- merge(ex, ex2, by = c("v2", "z"))

#
#  #
#graph on simulated data   #
#  #
#

bwplot(as.factor(v2) ~ v1 | as.factor(z), data = ex3, layout = c(3, 2),
  X = ex3$v1b,
  pch = "|", main = "Boxplot.stats",
  stats = boxplot2.stats,
  par.settings = list(
    plot.symbol = list(alpha = 1, col = "transparent", cex = 1, pch = 20)),
  panel = function(x, y, ..., X, subscripts){
    panel.grid(v = -1, h = 0)
    panel.bwplot(x, y, ..., subscripts = subscripts)
    X <- X[subscripts]
    xmax <- max(x)
    X <- tapply(X, y, unique)
    Y <- tapply(y, y, unique)
    tg <- table(y)
    panel.points(X, Y, cex = 3, pch = "|", col = "red")
    panel.text((xmax - xmax*0.1), (Y - 0.15), labels = paste("N=", tg))
  })
##



On Thu, Oct 14, 2010 at 5:22 PM, David Winsemius dwinsem...@comcast.netwrote:


 On Oct 14, 2010, at 11:07 AM, Christophe Bouffioux wrote:

 Hi,

 I have tried your proposition, and it works properly on the simulated
 data, but not on my real data, and I do not see any explanations, this is
 weird, i have no more ideas to explore the problem


 You should:
 a) provide the code that you used to call boxplot
 b) offer str(sasdata03_a), since summary will often obscure
 class-related problems
 c) consider running:
 sasdata03_a$Cat_F <- factor(sasdata03_a$Cat_F) # to get rid of the empty
 factor level




 so here i give some information on my data, nothing special actually,

 Christophe

 > summary(sasdata03_a)
    jaar            Cat_F         montant_oa         nombre_cas
  2006:36854   a      :22591   Min.   :  -112.8   Min.   :   -5.0
  2007:36538   h      :22056   1st Qu.:  1465.5   1st Qu.:  104.0
  2008:36579   i      :21844   Median :  4251.5   Median :  307.0
               k      :21485   Mean   : 13400.0   Mean   :  557.5
               g      :16217   3rd Qu.: 13648.5   3rd Qu.:  748.0
               d      : 5778   Max.   : 534655.6  Max.   :13492.0
               (Other):    0
    montant_i          nombre_i
  Min.   :   33.22   Min.   :  1.00
  1st Qu.:   37.80   1st Qu.:  1.00
  Median :   50.00   Median :  5.00
  Mean   : 1172.17   Mean   : 44.87
  3rd Qu.:  458.67   3rd Qu.: 29.00
  Max.   :17306.73   Max.   :689.00

 > is.data.frame(sasdata03_a)
 [1] TRUE

 The code of the function:

 boxplot2.stats <- function (x, coef = 1.5, do.conf = TRUE, do.out = TRUE)
 {
    if (coef < 0)
        stop("'coef' must not be negative")
    nna <- !is.na(x)
    n <- sum(nna)
    stats <- stats::fivenum(x, na.rm = TRUE)
    stats[c(1, 5)] <- quantile(x, probs = c(0.05, 0.95))
    iqr <- diff(stats[c(2, 4)])
    if (coef == 0)
        do.out <- FALSE
    else {

Re: [R] RJava help

2010-10-15 Thread lord12

public class my_convolve
{
    public static void main(String[] args)
    {

    }
    public static void convolve()
    {
        System.out.println("Hello");
    }
}


library(rJava)
.jinit(classpath="C:/Documents and Settings/GV/workspace/Test/bin",
parameters="-Xmx512m")
.jcall("my_convolve", method="convolve")

Error in .jcall("my_convolve", method = "convolve") :
  method convolve with signature ()V not found



-- 
View this message in context: 
http://r.789695.n4.nabble.com/RJava-help-tp2995886p2996914.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Time vs Concentration Graphs by ID

2010-10-15 Thread Anh Nguyen
I tried that too, it doesn't work because of the way I wrote the code.
Listing y as free or not giving it a limit makes the scale go from -0.5 to
0.5, which is useless. This is what my code looks like now (it's S-Plus
code, btw)-- I'll try reading up on lattices in R to see if I can come up
with something better.

par(mfrow = c(2,2))
unique.id <- unique(d1$ID)
unique.id <- sort(unique.id)
for(j in unique.id){
temp.id <- d1[d1$ID==j,]
unique.dose <- unique(temp.id$DOSE)
plot(0, 0, type="n",
xlab="Time (hr)", ylab="Concentration (ug/L)",
xlim=c(0,32), ylim=c(0,200),
main=paste("ID", j)
)
for(i in unique.dose){
temp.subanddose <- temp.id[temp.id$DOSE==5,]
points(temp.subanddose$TAD,
temp.subanddose$DV, col=1, pch=1)
points(temp.subanddose$TAD,
temp.subanddose$IPRE, type="l", col=1)}
for(i in unique.dose){
temp.subanddose <- temp.id[temp.id$DOSE==7,]
points(temp.subanddose$TAD,
temp.subanddose$DV, col=2, pch=2)
points(temp.subanddose$TAD,
temp.subanddose$IPRE, type="l", col=2)}
for(i in unique.dose){
temp.subanddose <- temp.id[temp.id$DOSE==10,]
points(temp.subanddose$TAD,
temp.subanddose$DV, col=3, pch=3)
points(temp.subanddose$TAD,
temp.subanddose$IPRE, type="l", col=3)}
key(text=list(c("5 mg", "7 mg", "10 mg")), lines=list(type=rep
("l", 3), col=1:3), border=T, corner=c(1,1))
}



On Fri, Oct 15, 2010 at 1:31 AM, David Winsemius dwinsem...@comcast.netwrote:


 On Oct 15, 2010, at 3:46 AM, Anh Nguyen wrote:

  Hello Dennis,

 That's a very good suggestion. I've attached a template here as a .png
 file,
 I hope you can view it. This is what I've managed to achieve in S-Plus (we
 use S-Plus at work but I also use R because there's some very good R
 packages for PK data that I want to take advantage of that is not
 available
 in S-Plus). The only problem with this is, unfortunately, I cannot figure
 out how make the scale non-uniform and I hope to fix that.


 That would be easy if your efforts, which I have not yet seen, were in
 lattice. If that were the case, then adding this would solve your problem:

 scales = list(y = list(relation = "free"))

 --
 David

 My data looks
 like this:

 ID Dose Time Conc Pred ...
 1  5    0     0    0
 1  5    0.5   6    8
 1  5    1    16   20
 ...
 1  7    0     0    0
 1  7    0.5  10   12
 1  7    1    20   19
 ...
 1  10   3    60   55
 ...
 2  5    12    4    2
 ...
 etc


 I don't care if it's ggplot or something else as long as it looks like how
 I
 envisioned.




 On Fri, Oct 15, 2010 at 12:22 AM, Dennis Murphy djmu...@gmail.com
 wrote:

  I don't recall that you submitted a reproducible example to use as a
 template for assistance. Ista was kind enough to offer a potential
 solution,
 but it was an abstraction based on the limited information provided in
 your
 previous mail. If you need help, please provide an example data set that
 illustrates the problems you're encountering and what you hope to achieve
 -
 your chances of a successful resolution will be much higher when you do.
 BTW, there's a dedicated newsgroup for ggplot2:
 look for the mailing list link at  http://had.co.nz/ggplot2/

 HTH,
 Dennis


 On Thu, Oct 14, 2010 at 10:02 PM, Anh Nguyen eataban...@gmail.com
 wrote:

  I found 2 problems with this method:

 - There is only one line for predicted dose at 5 mg.
 - The different doses are 5, 7, and 10 mg but somehow there is a legend
   for 5,6,7,8,9,10.
 - Is there a way to make the line smooth?
 - The plots are also getting a little crowded and I was wondering if
   there is a way to split it into 2 or more pages?

 Thanks for your help.

 On Thu, Oct 14, 2010 at 8:09 PM, Ista Zahn iz...@psych.rochester.edu

 wrote:


  Hi,
 Assuming the data is in a data.frame named D, something like

  library(ggplot2) # may need install.packages("ggplot2") first
  ggplot(D, aes(x=Time, y=Concentration, color=Dose)) +
  geom_point() +
  geom_line(aes(y = PredictedConcentration, group=1)) +
  facet_wrap(~ID, scales="free", ncol=3)

 should do it.

 -Ista
 On Thu, Oct 14, 2010 at 10:25 PM, thaliagoo eataban...@gmail.com

 wrote:


 Hello-- I have a data for small population who took 1 drug at 3

 different

 doses. I have the actual drug concentrations as well as predicted
 concentrations by my model. This is what I'm looking for:

 - Time vs Concentration by ID (individual plots), with each subject
 occupying 1 plot -- there is to be 9 plots per page (3x3)
 - Observed drug concentration is made up of points, and predicted drug
 concentration is a curve without points. Points and curve will be the

 same

 color for each dose. Different doses will have different colors.
 - A legend to specify which color correlates to which dose.
 - Axes should be different for each individual (as some individual

 will

 

Re: [R] split data with missing data condition

2010-10-15 Thread Henrique Dallazuanna
Try this:

split(as.data.frame(DF), is.na(DF$x))
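
For example (a sketch; DF is assumed to be the matrix quoted below), the two
pieces can then be picked out of the resulting list by name:

parts    <- split(as.data.frame(DF), is.na(DF$x))
complete <- parts[["FALSE"]]   # rows where x is observed
missing  <- parts[["TRUE"]]    # rows where x is NA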


On Fri, Oct 15, 2010 at 9:45 AM, Jumlong Vongprasert jumlong.u...@gmail.com
 wrote:

 Dear all
 I have data like this:
  x  y
  [1,] 59.74889  3.1317081
  [2,] 38.77629  1.7102589
  [3,]   NA  2.2312962
  [4,] 32.35268  1.3889621
  [5,] 74.01394  1.5361227
  [6,] 34.82584  1.1665412
  [7,] 42.72262  2.7870875
  [8,] 70.54999  3.3917257
  [9,] 59.37573  2.6763249
  [10,] 68.87422  1.9697770
  [11,] 19.00898  2.0584415
  [12,] 60.27915  2.5365194
  [13,] 50.76850  2.3943836
  [14,]   NA  2.2862790
  [15,] 39.01229  1.7924957

 and I want to split the data into two sets: a data set of the non-missing
 rows and a data set of the missing rows.
 How can I do this?
 Many Thanks.
 Jumlong


 --
 Jumlong Vongprasert
 Institute of Research and Development
 Ubon Ratchathani Rajabhat University
 Ubon Ratchathani
 THAILAND
 34000

[[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] using apply function and storing output

2010-10-15 Thread David A.

Thanks Dennis,

I don't think it was a problem of not feeding in a function for rollapply(), 
because I was using mean() and my co.var() function in the FUN argument. 
The key part seems to be the transformation that zoo() does to the matrix. If I 
do the same transformation to my original matrix, the rollapply() function now 
works fine.

Thanks again

David

Date: Fri, 15 Oct 2010 03:00:27 -0700
Subject: Re: [R] using apply function and storing output
From: djmu...@gmail.com
To: dasol...@hotmail.com
CC: r-help@r-project.org

Hi:

You need to give a function for rollapply() to apply :)

Here's my toy example:

d <- as.data.frame(matrix(rpois(30, 5), nrow = 10))
library(zoo)
d1 <- zoo(d)   # uses row numbers as index


# rolling means of 3 in each subseries (columns)
> rollmean(d1, 3)
        V1       V2       V3
2 3.000000 5.333333 4.333333
3 3.333333 4.333333 4.000000
4 2.666667 3.666667 3.333333
5 4.333333 3.666667 3.333333
6 5.000000 5.000000 5.000000
7 4.666667 4.666667 4.000000
8 5.666667 3.666667 4.000000
9 5.000000 3.000000 2.666667

# create new function
> cv <- function(x) sd(x)/mean(x)

# use new function in rollapply():
> rollapply(d1, 3, FUN = cv)
         V1        V2        V3
2 0.3333333 0.1082532 0.5329387
3 0.1732051 0.4803845 0.6614378
4 0.2165064 0.5677271 0.4582576
5 0.7418193 0.5677271 0.4582576
6 0.6000000 0.3464102 0.4000000
7 0.7525467 0.4948717 0.6614378
8 0.8882158 0.5677271 0.6614378
9 1.0583005 0.3333333 0.2165064

This is a simplistic example, but it should get you started.

HTH,
Dennis
On Fri, Oct 15, 2010 at 2:00 AM, David A. dasol...@hotmail.com wrote:






Hi Dennis and Dimitris,

thanks for your answers. I am trying the rollmean() function and also the 
rollapply() function because I also want to calculate CV for the values.
For this I created a co.var() function. I am having problems using them.


co.var <- function(x)(
+ sd(x)/mean(x)
+ )


> dim(mydata)
[1] 1710  244

xxx <- rollmean(mydata, 3, by = 3)

works fine and creates a vector which I will transform into a matrix. I still
have to find out how to store the output in my previously created 570x244 0's
matrix in an ordered way.


but, since the examples in the help page say it's the same, I tried

> xxx <- rollapply(mydata, 3, mean, by = 3)
Error in UseMethod("rollapply") :
  no applicable method for 'rollapply' applied to an object of class
c('matrix', 'integer', 'numeric')


and, with my created function...

> xxx <- rollapply(ord_raw_filt.df, 3, FUN = co.var, by = 3)
Error in UseMethod("rollapply") :
  no applicable method for 'rollapply' applied to an object of class
c('matrix', 'integer', 'numeric')


Can you help me with the error?

Dave

Date: Fri, 15 Oct 2010 00:45:08 -0700
Subject: Re: [R] using apply function and storing output
From: djmu...@gmail.com

To: dasol...@hotmail.com
CC: r-help@r-project.org

Hi:


Look into the rollmean() function in package zoo.

HTH,
Dennis

On Fri, Oct 15, 2010 at 12:34 AM, David A. dasol...@hotmail.com wrote:




Hi list,



I have a 1710x244 matrix of numerical values and I would like to calculate the 
mean of every group of three consecutive values per column to obtain a new 
matrix of 570x244.  I could get it done using a for loop but how can I do that 
using apply functions?



In addition to this, do I have to initialize a 570x244 matrix with 0's to
store the calculated values or can the output matrix be generated while
calculating the mean values?



Cheers,



Dave



[[alternative HTML version deleted]]




  

  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Problem using BRugs

2010-10-15 Thread Sally Luo
Hi R users,

I am trying to call openbugs from R.  And I got the following error message:

~
model is syntactically correct
expected the collection operator c error pos 8 (error on line 1)
variable ww is not defined in model or in data set
[1] C:\\DOCUME~1\\maomao\\LOCALS~1\\Temp\\RtmpqJk9R3/inits1.txt
Initializing chain 1: model must be compiled before initial values loaded
model must be initialized before updating
model must be initialized before DIC an be monitored
Error in samplesSet(parametersToSave) :
  model must be initialized before monitors used
~~

I did define variable ww in my data and model (they are listed below).  I
am not sure if this is due to some errors in my code (please see below) or
because openbugs cannot handle the model I am using.  In my model, y[i] also
depends on all other y[j]s.  Could you help me figure out the problem and
hopefully get the code to work?

Many thanks for your help.   --- Maomao

~~
data <- list(y, cap2, pol2, cap1, pol1, g, wo, wd, ww, mu, tau)

inits <- function() {list(beta=beta0, rho_o=rho_o_0, rho_d=rho_d_0,
rho_w=rho_w_0)}

parameters <- c("beta", "rho_o", "rho_d", "rho_w")

probit.sim <- BRugsFit(data, inits, parameters,
modelFile="spatial.openbugs.txt", numChains=1, nIter=2000)



# my model

model {

for (i in 1:676) {

y[i] ~ dbern(p[i])

wwy[i] <- inprod(ww[i, 1:676], y[])

woy[i] <- inprod(wo[i, 1:676], y[])

wdy[i] <- inprod(wd[i, 1:676], y[])

probit(p[i]) <- rho_o * woy[i] + rho_d * wdy[i] + rho_w * wwy[i] + beta[1] +
beta[2] * cap2[i] + beta[3] * pol2[i] + beta[4] * cap1[i] + beta[5] *
pol1[i] + beta[6] * g[i] + e[i]

}

# Priors

for (j in 1:6) {

beta[1:6] ~ dmnorm(mu[1:6], tau[1:6, 1:6])

   }

rho_o ~ dunif(-1,1)

rho_d ~ dunif(-1,1)

rho_w ~ dunif(-1,1)

for (i in 1:676) {

e[i] ~ dnorm(0, 1)

}

}
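
One possible culprit, offered only as a guess: the data list above is built
without names, so the R objects may never get matched to the BUGS variable
names such as ww. A named version would be:

data <- list(y = y, cap2 = cap2, pol2 = pol2, cap1 = cap1, pol1 = pol1,
             g = g, wo = wo, wd = wd, ww = ww, mu = mu, tau = tau)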

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Time vs Concentration Graphs by ID

2010-10-15 Thread Anh Nguyen
Thank you for the very helpful tips. Just one last question:
- In the lattice method, how can I plot TIME vs OBSconcentration and TIME vs
PREDconcentration in one graph (per individual)? You said in lattice you
would replace 'smooth' by 'l' in the type = argument of xyplot() that just
means now the OBSconc is replaced by a line instead of points and PREDconc
is not plotted, right?
-- I tried to superimpose 2 lattices but that won't work. I can keep the
x-scale the same but not the y-scale. Ex: max pred conc = 60 and max obs
conc = 80 and thus for this case, there would be 2 different y-scales
superimposed on one another.
- the ggplot2 method works, just the weird 5,6,7,8,9,10 mg legends (there
are only 5,7,10 mg doses) but it's nothing that photoshop can't take care
of.

This is very cool, I think I understand it a lot better now. It was a lot
easier than what I was doing before, that's for sure. Thanks!

On Fri, Oct 15, 2010 at 2:32 AM, Dennis Murphy djmu...@gmail.com wrote:

 Hi:

 To get the plots precisely as you have given them in your png file, you're
 most likely going to have to use base graphics, especially if you want a
 separate legend in each panel. Packages ggplot2 and lattice have more
 structured ways of constructing such graphs, so you give up a bit of freedom
 in the details of plot construction to get nicer default configurations.

 Perhaps you've written a panel function in S-PLUS (?) to produce each graph
 in your png file - if so, you could simply add a couple of lines to
 determine the max y-value and from that the limits of the y scale. It
 shouldn't be at all difficult to make such modifications. Since R base
 graphics is very similar to base graphics in S-PLUS, you could possibly get
 away with popping your S-PLUS code directly into R and see how far that
 takes you.

 Lattice is based on Trellis graphics, but the syntax in lattice has changed
 a fair bit vis a vis Trellis as the package has developed. ggplot2 is a more
 recent graphics system in R predicated on the grammar of graphics exposited
 by Leland Wilkinson.

 For my example, I've modified the Theophylline data set in package nlme,
 described in Pinheiro & Bates (2000), Mixed Effects Models in S and S-PLUS,
 Springer. The original data set has eleven unique doses - I combined them
 into three intervals and converted them to factor with cut(). I also created
 four groups of Subjects and put them into a variable ID. The output data
 frame is called theo. I didn't fit the nlme models to the subjects -
 instead, I was lazy and used loess smoothing instead. The code to generate
 the data frame is given below; this is what we mean by a 'reproducible
 example', something that can be copied and pasted into an R session.

 # Use modified version of Theophylline data in package nlme

 library(nlme)
 theo <- Theoph
 theo <- subset(theo, Dose > 3.5)
 theo$dose <- cut(theo$Dose, breaks = c(3, 4.5, 5, 6), labels = c('4.25',
 '4.8', '5.5'))
 theo <- as.data.frame(theo)
 theo$ID <- with(theo, ifelse(Subject %in% c(6, 7, 12), 1,
   ifelse(Subject %in% c(2, 8, 10), 2,
   ifelse(Subject %in% c(4, 11, 5), 3, 4) )))
 # ID is used for faceting, dose for color and shape

 # lattice version:

 xyplot(conc ~ Time | factor(ID), data = theo, col.line = 1:3,
   pch = 1:3, col = 1:3, groups = dose, type = c('p', 'smooth'),
   scales = list(y = list(relation = 'free')),
   auto.key = list(corner = c(0.93, 0.4), lines = TRUE, points = TRUE,
   text = levels(theo$dose)) )

 # ggplot2 version:
 # scales = 'free_y' allows independent y scales per panel
 g <- ggplot(theo, aes(x = Time, y = conc, shape = dose, colour = dose,
   group = Subject))
 g + geom_point() + geom_smooth(method = 'loess', se = FALSE) +
 facet_wrap( ~ ID, ncol = 2, scales = 'free_y') +
 opts(legend.position = c(0.9, 0.4))

 This is meant to give you some indication how the two graphics systems work
 - it's a foundation, not an end product. There are a couple of ways I could
 have adjusted the y-scales in the lattice graphs (either directly in the
 scales = ... part or to use a prepanel function for loess), but since you're
 not likely to use loess in your plots, it's not an important consideration.

 Both ggplot2 and lattice place the legend outside the plot area by default;
 I've illustrated a couple of ways to pull it into the plot region FYI.

 One other thing: if your data set contains fitted values from your PK
 models for each subject * dose combination, the loess smoothing is
 unnecessary. In ggplot2, you would use geom_line(aes(y = pred), ...) in
 place of geom_smooth(), and in lattice you would replace 'smooth' by 'l' in
 the type = argument of xyplot().

 HTH,
 Dennis



 On Fri, Oct 15, 2010 at 12:46 AM, Anh Nguyen eataban...@gmail.com wrote:

 Hello Dennis,

 That's a very good suggestion. I've attached a template here as a .png
 file, I hope you can view it. This 

[R] feed cut() output into goodness-of-fit tests

2010-10-15 Thread Andrei Zorine
Hello,
My question is: assuming I have cut() my sample and am looking at the
table() of it, how can I compute probabilities for the bins? Do I have
to parse the table's names() to fetch the bin endpoints and pass them to
the p[distr-name] functions? I really don't want to type the arguments to
those functions by hand (nor copy and paste them).

> x.fr <- table(cut(x, 10))
> x.fr

(0.0617,0.549]   (0.549,1.04]    (1.04,1.52]    (1.52,2.01]     (2.01,2.5]
            16             28             26             18              6
    (2.5,2.99]    (2.99,3.48]    (3.48,3.96]    (3.96,4.45]    (4.45,4.94]
             3              2              0              0              1

> names(x.fr)
 [1] "(0.0617,0.549]" "(0.549,1.04]"   "(1.04,1.52]"    "(1.52,2.01]"
 [5] "(2.01,2.5]"     "(2.5,2.99]"     "(2.99,3.48]"    "(3.48,3.96]"
 [9] "(3.96,4.45]"    "(4.45,4.94]"


--

Andrei Zorine

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem with merging two zoo objects

2010-10-15 Thread Megh

I have compared dat11 and x using the str() function; however, I did not
find any drastic difference:


> str(dat11)
‘zoo’ series from 2010-10-15 13:43:54 to 2010-10-15 13:49:51
  Data: num [1:7, 1:4] 73.8 73.8 73.8 73.8 73.8 73.8 73.7 73.8 73.8 73.8 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:7] "7" "6" "5" "4" ...
  ..$ : chr [1:4] "V2" "V3" "V4" "V5"
  Index:  POSIXlt[1:7], format: "2010-10-15 13:43:54" "2010-10-15 13:44:15"
"2010-10-15 13:45:51" "2010-10-15 13:46:21" "2010-10-15 13:47:27"
"2010-10-15 13:47:54" ...
> str(x)
‘zoo’ series from 2003-02-01 to 2003-02-14
  Data: int [1:5, 1:2] 1 2 3 4 5 6 7 8 9 10
  Index: Class 'Date'  num [1:5] 12084 12086 12090 12092 12097

Thanks,
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Problem-with-merging-two-zoo-objects-tp2997472p2997510.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem with merging two zoo objects

2010-10-15 Thread Achim Zeileis

On Fri, 15 Oct 2010, Megh Dal wrote:

Hi Gabor, please see the attached files, which are in text format. I opened
them in Excel and then used the clipboard to load them into R. It is still
really unclear to me what to do.


I've read both files using read.zoo():

R> z1 <- read.zoo("dat1.txt", sep = ",", header = TRUE,
+    format = "%m/%d/%Y %H:%M:%S", tz = "")
Warning message:
In zoo(rval3, ix) :
  some methods for “zoo” objects do not work if the index entries in
'order.by' are not unique

R> z2 <- read.zoo("dat2.txt", sep = ",", header = TRUE,
+    format = "%m/%d/%Y %H:%M:%S", tz = "")


Then, merge() does not work because some time indexes are not unique (and
it would be unclear how these should be matched):


R> merge(z1, z2)
Error in merge.zoo(z1, z2) :
  series cannot be merged with non-unique index entries in a series

However, you can remove the duplicates, e.g., by computing averages:

R> z1a <- aggregate(z1, time(z1), mean)

Then

R> merge(z1a, z2)

works.

Also, can you please elaborate on the term index = list(1, 2), FUN =
function(d, t) as.POSIXct(paste(d, t)) from your previous mail? The help
says: "If FUN is specified then read.zoo calls FUN with the index as the
first argument." I really could not connect your syntax with the help.


The way Gabor read the data, the index was in two separate columns 
(columns 1 and 2). Hence, he specified

  index = list(1, 2)
and then provided a function that would return a POSIXct object when 
called with two arguments

  FUN(column1, column2)

hth,
Z


--- On Sat, 10/16/10, Gabor Grothendieck ggrothendi...@gmail.com wrote:


From: Gabor Grothendieck ggrothendi...@gmail.com
Subject: Re: [R] Problem with merging two zoo objects
To: Megh Dal megh700...@yahoo.com
Cc: r-h...@stat.math.ethz.ch
Date: Saturday, October 16, 2010, 12:11 AM
On Fri, Oct 15, 2010 at 2:20 PM, Megh Dal megh700...@yahoo.com wrote:
 Dear all, I have the following 2 zoo objects. However, when I try to merge
 those 2 objects into one, nothing comes out as intended. Please see below
 the objects as well as the merged object:


  dat11
                       V2   V3   V4   V5
  2010-10-15 13:43:54 73.8 73.8 73.8 73.8
  2010-10-15 13:44:15 73.8 73.8 73.8 73.8
  2010-10-15 13:45:51 73.8 73.8 73.8 73.8
  2010-10-15 13:46:21 73.8 73.8 73.8 73.8
  2010-10-15 13:47:27 73.8 73.8 73.8 73.8
  2010-10-15 13:47:54 73.8 73.8 73.8 73.8
  2010-10-15 13:49:51 73.7 73.7 73.7 73.7
  dat22
                       V2   V3   V4   V5
  2010-10-15 12:09:12 74.0 74.0 74.0 74.0
  2010-10-15 12:09:33 73.9 73.9 73.9 73.9
  2010-10-15 12:20:36 74.0 74.0 74.0 74.0
  2010-10-15 12:30:36 74.0 74.0 74.0 74.0
  2010-10-15 12:41:03 73.7 73.7 73.7 73.7
  merge(dat11, dat22)
                      V2.dat11 V3.dat11 V4.dat11 V5.dat11 V2.dat22 V3.dat22 V4.dat22 V5.dat22
  2010-10-15 12:09:12       NA       NA       NA       NA       NA       NA       NA       NA
  2010-10-15 12:09:33       NA       NA       NA       NA       NA       NA       NA       NA
  2010-10-15 13:43:54       NA       NA       NA       NA       NA       NA       NA       NA
  2010-10-15 13:44:15       NA       NA       NA       NA       NA       NA       NA       NA
  2010-10-15 13:45:51       NA       NA       NA       NA       NA       NA       NA       NA
  2010-10-15 13:46:21       NA       NA       NA       NA       NA       NA       NA       NA
  2010-10-15 13:47:27       NA       NA       NA       NA       NA       NA       NA       NA
  2010-10-15 13:47:54       NA       NA       NA       NA       NA       NA       NA       NA
  2010-10-15 13:49:51       NA       NA       NA       NA       NA       NA       NA       NA
  Warning messages:
  1: In MATCH(x, x) == seq_len(length(x)) :
   longer object length is not a multiple of shorter object length
  2: In MATCH(x, x) == seq_len(length(x)) :
   longer object length is not a multiple of shorter object length

 If somebody could point out where I went wrong, it would be really great.


If I try it then it works properly so there is likely something wrong
with your dat11 and dat22 objects.  If you provide the problem
reproducibly one might be able to say more.

> Lines1 <- "Date Time V2   V3   V4   V5
+ 2010-10-15 13:43:54 73.8 73.8 73.8 73.8
+ 2010-10-15 13:44:15 73.8 73.8 73.8 73.8
+ 2010-10-15 13:45:51 73.8 73.8 73.8 73.8
+ 2010-10-15 13:46:21 73.8 73.8 73.8 73.8
+ 2010-10-15 13:47:27 73.8 73.8 73.8 73.8
+ 2010-10-15 13:47:54 73.8 73.8 73.8 73.8
+ 2010-10-15 13:49:51 73.7 73.7 73.7 73.7"

> Lines2 <- "Date Time V2   V3   V4   V5
+ 2010-10-15 12:09:12 74.0 74.0 74.0 74.0
+ 2010-10-15 12:09:33 73.9 73.9 73.9 73.9
+ 2010-10-15 12:20:36 74.0 74.0 74.0 74.0
+ 2010-10-15 12:30:36 74.0 74.0 74.0 74.0
+ 2010-10-15 12:41:03 73.7 73.7 73.7 73.7"

> library(zoo)
> dat1 <- read.zoo(textConnection(Lines1), header = TRUE,
+ index = list(1, 2), FUN = function(d, t) as.POSIXct(paste(d, t)))
Warning messages:
1: closing unused connection 8 (Lines2)
2: closing unused connection 7 (Lines1)
3: closing unused connection 5 (Lines2)
4: closing unused connection 4 (Lines1)
5: closing unused connection 3

[R] Beginner question on bar plot

2010-10-15 Thread steven mosher
I've read a number of examples on doing a multiple bar plot, but can't seem
to grasp how they work or how to get my data into the proper form.

I have two variables holding the same factor.

The variables were created using a cut command; the following simulates that:

A <- 1:100
B <- 1:100
A[30:60] <- 43
Acut <- cut(A, breaks=c(0,10,45,120), labels=c("low","med","high"))
Bcut <- cut(B, breaks=c(0,10,45,120), labels=c("low","med","high"))

What I want to do is create a barplot with 3 groups of side-by-side bars:

group 1 = low:  the two bars would be the count for Acut and the count
for Bcut
group 2 = med:  the two bars again would be the counts for this factor
level in Acut and Bcut
group 3 = high: like the above two.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Time vs Concentration Graphs by ID

2010-10-15 Thread Ista Zahn
On Fri, Oct 15, 2010 at 7:29 AM, Anh Nguyen eataban...@gmail.com wrote:
 Thank you for the very helpful tips. Just one last question:
 - In the lattice method, how can I plot TIME vs OBSconcentration and TIME vs
 PREDconcentration in one graph (per individual)? You said in lattice you
 would replace 'smooth' by 'l' in the type = argument of xyplot() that just
 means now the OBSconc is replaced by a line instead of points and PREDconc
 is not plotted, right?
 -- I tried to superimpose 2 lattices but that won't work. I can keep the
 x-scale the same but not the y-scale. Ex: max pred conc = 60 and max obs
 conc = 80 and thus for this case, there would be 2 different y-scales
 superimposed on one another.
 - the ggplot2 method works, just the weird 5,6,7,8,9,10 mg legends (there
 are only 5,7,10 mg doses) but it's nothing that photoshop can't take care
 of.

No need to resort to photoshop... see the breaks argument of ?scale_x_continuous

-Ista


 This is very cool, I think I understand it a lot better now. It was a lot
 easier than what I was doing before, that's for sure. Thanks!

 On Fri, Oct 15, 2010 at 2:32 AM, Dennis Murphy djmu...@gmail.com wrote:

 Hi:

 To get the plots precisely as you have given them in your png file, you're
 most likely going to have to use base graphics, especially if you want a
 separate legend in each panel. Packages ggplot2 and lattice have more
 structured ways of constructing such graphs, so you give up a bit of freedom
 in the details of plot construction to get nicer default configurations.

 Perhaps you've written a panel function in S-PLUS (?) to produce each graph
 in your png file - if so, you could simply add a couple of lines to
 determine the max y-value and from that the limits of the y scale. It
 shouldn't be at all difficult to make such modifications. Since R base
 graphics is very similar to base graphics in S-PLUS, you could possibly get
 away with popping your S-PLUS code directly into R and see how far that
 takes you.

 Lattice is based on Trellis graphics, but the syntax in lattice has changed
 a fair bit vis a vis Trellis as the package has developed. ggplot2 is a more
 recent graphics system in R predicated on the grammar of graphics exposited
 by Leland Wilkinson.

 For my example, I've modified the Theophylline data set in package nlme,
 described in Pinheiro & Bates (2000), Mixed Effects Models in S and S-PLUS,
 Springer. The original data set has eleven unique doses - I combined them
 into three intervals and converted them to factor with cut(). I also created
 four groups of Subjects and put them into a variable ID. The output data
 frame is called theo. I didn't fit the nlme models to the subjects -
 instead, I was lazy and used loess smoothing instead. The code to generate
 the data frame is given below; this is what we mean by a 'reproducible
 example', something that can be copied and pasted into an R session.

 # Use modified version of Theophylline data in package nlme

 library(nlme)
 theo <- Theoph
 theo <- subset(theo, Dose > 3.5)
 theo$dose <- cut(theo$Dose, breaks = c(3, 4.5, 5, 6), labels = c('4.25',
 '4.8', '5.5'))
 theo <- as.data.frame(theo)
 theo$ID <- with(theo, ifelse(Subject %in% c(6, 7, 12), 1,
                        ifelse(Subject %in% c(2, 8, 10), 2,
                        ifelse(Subject %in% c(4, 11, 5), 3, 4) )))
 # ID is used for faceting, dose for color and shape

 # lattice version:

 xyplot(conc ~ Time | factor(ID), data = theo, col.line = 1:3,
            pch = 1:3, col = 1:3, groups = dose, type = c('p', 'smooth'),
            scales = list(y = list(relation = 'free')),
            auto.key = list(corner = c(0.93, 0.4), lines = TRUE, points =
 TRUE,
                            text = levels(theo$dose)) )

 # ggplot2 version:
 # scales = 'free_y' allows independent y scales per panel
 g <- ggplot(theo, aes(x = Time, y = conc, shape = dose, colour = dose,
                         group = Subject))
 g + geom_point() + geom_smooth(method = 'loess', se = FALSE) +
     facet_wrap( ~ ID, ncol = 2, scales = 'free_y') +
     opts(legend.position = c(0.9, 0.4))

 This is meant to give you some indication how the two graphics systems work
 - it's a foundation, not an end product. There are a couple of ways I could
 have adjusted the y-scales in the lattice graphs (either directly in the
 scales = ... part or to use a prepanel function for loess), but since you're
 not likely to use loess in your plots, it's not an important consideration.

 Both ggplot2 and lattice place the legend outside the plot area by default;
 I've illustrated a couple of ways to pull it into the plot region FYI.

 One other thing: if your data set contains fitted values from your PK
 models for each subject * dose combination, the loess smoothing is
 unnecessary. In ggplot2, you would use geom_line(aes(y = pred), ...) in
 place of geom_smooth(), and in lattice you would replace 'smooth' by 'l' in
 the type = argument of xyplot().

 HTH,
 Dennis



 On Fri, Oct 15, 2010 

Re: [R] Problem with merging two zoo objects

2010-10-15 Thread Gabor Grothendieck
On Fri, Oct 15, 2010 at 3:22 PM, Megh Dal megh700...@yahoo.com wrote:
 Hi Gabor, please see the attached files, which are in text format. I opened
 them in Excel and then used the clipboard to load them into R. It is still
 really unclear to me what to do.

 Also, can you please elaborate on the term index = list(1, 2), FUN =
 function(d, t) as.POSIXct(paste(d, t)) from your previous mail? The help
 says: "If FUN is specified then read.zoo calls FUN with the index as the
 first argument." I really could not connect your syntax with the help.

1. It works for me with the files you supplied. Note that you have
some duplicate times in your first file so I aggregated them using
only the last of any set of duplicates:

> library(zoo)
> dat11 <- read.zoo("dal1.csv", aggregate = function(x) tail(x, 1),
+ sep = ",", header = TRUE, tz = "", format = "%m/%d/%Y %H:%M:%S")
> dat22 <- read.zoo("dal2.csv",
+ sep = ",", header = TRUE, tz = "", format = "%m/%d/%Y %H:%M:%S")
> m <- merge(dat11, dat22)
> str(m)
‘zoo’ series from 2010-10-15 09:00:24 to 2010-10-15 13:49:51
  Data: num [1:361, 1:8] 74.3 74.3 74.3 74.2 74.2 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:8] "data.open.dat11" "data.high.dat11" "data.low.dat11"
"data.close.dat11" ...
  Index:  POSIXct[1:361], format: "2010-10-15 09:00:24" "2010-10-15
09:01:15" ...


> packageDescription("zoo")$Version
[1] "1.6-4"

2. You seem to be using an old version of zoo.  With the latest
version of zoo on CRAN, 1.6-4, the index.column= documentation in
help(read.zoo) does document the list construction.

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] feed cut() output into goodness-of-fit tests

2010-10-15 Thread Ista Zahn
On Fri, Oct 15, 2010 at 10:22 AM, Andrei Zorine zoav1...@gmail.com wrote:
 Hello,
 My question is assuming I have cut()'ed my sample and look at the
 table() of it, how can I compute probabilities for the bins?

I actually don't know what you mean by this (my own ignorance probably).

 Do I have
 to parse table's names() to fetch bin endpoints

For equal-width bins you can use

seq(min(x), max(x), by = (max(x) - min(x))/10)

HTH,
Ista
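
A sketch of one way to avoid parsing names() altogether: compute the breaks
yourself, bin with those breaks, then take differences of the CDF at the
same breaks (pexp() below is only a stand-in for whatever reference
distribution is being tested, and x is an assumed sample):

x      <- rexp(100)                      # assumed sample
breaks <- seq(min(x), max(x), length.out = 11)
x.fr   <- table(cut(x, breaks, include.lowest = TRUE))

p <- diff(pexp(breaks))                  # bin probabilities under the model
p <- p / sum(p)                          # renormalise over the observed range
chisq.test(as.vector(x.fr), p = p)       # goodness-of-fit on the binned counts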

to pass them to
 p[distr-name] functions? i really don't want to input arguments to PDF
 functions by hand (nor copy-and-paste way).

 > x.fr <- table(cut(x, 10))
 > x.fr

 (0.0617,0.549]   (0.549,1.04]    (1.04,1.52]    (1.52,2.01]     (2.01,2.5]
             16             28             26             18              6
     (2.5,2.99]    (2.99,3.48]    (3.48,3.96]    (3.96,4.45]    (4.45,4.94]
              3              2              0              0              1

 > names(x.fr)
  [1] "(0.0617,0.549]" "(0.549,1.04]"   "(1.04,1.52]"    "(1.52,2.01]"
  [5] "(2.01,2.5]"     "(2.5,2.99]"     "(2.99,3.48]"    "(3.48,3.96]"
  [9] "(3.96,4.45]"    "(4.45,4.94]"


 --

 Andrei Zorine

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem with merging two zoo objects

2010-10-15 Thread Megh

Thanks Gabor for pointing to my old version. However, I have one more
question: why is the argument tz= sitting there? As you are not passing any
explicit value for it, I am assuming it is redundant. Without any tz
argument, I got the following:


> head(read.zoo(file="f:/dat1.txt", header=TRUE, sep=",",
+   format = "%m/%d/%Y %H:%M:%S"))
           data.open data.high data.low data.close
2010-10-15      73.7      73.7     73.7       73.7
2010-10-15      73.8      73.8     73.8       73.8
2010-10-15      73.8      73.8     73.8       73.8
2010-10-15      73.8      73.8     73.8       73.8
2010-10-15      73.8      73.8     73.8       73.8
2010-10-15      73.8      73.8     73.8       73.8
Warning messages:
1: In zoo(rval3, ix) :
  some methods for “zoo” objects do not work if the index entries in
‘order.by’ are not unique
2: In zoo(rval, x.index[i]) :
  some methods for “zoo” objects do not work if the index entries in
‘order.by’ are not unique
> packageDescription("zoo")
Package: zoo
Version: 1.6-4
Date: 2010-07-09
Title: Z's ordered observations
Author: Achim Zeileis, Gabor Grothendieck, Felix Andrews
Maintainer: Achim Zeileis achim.zeil...@r-project.org
Description: An S3 class with methods for totally ordered indexed
        observations. It is particularly aimed at irregular time series
        of numeric vectors/matrices and factors. zoo's key design goals
        are independence of a particular index/date/time class and
        consistency with ts and base R by providing methods to extend
        standard generics.
Depends: R (>= 2.10.0), stats
Suggests: coda, chron, DAAG, fCalendar, fSeries, fts, its, lattice,
        strucchange, timeDate, timeSeries, tis, tseries, xts
Imports: stats, utils, graphics, grDevices, lattice (>= 0.18-1)
LazyLoad: yes
License: GPL-2


Clearly the format argument is not working properly: the time component is
missing. Why is that?
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Problem-with-merging-two-zoo-objects-tp2997472p2997662.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Beginner question on bar plot

2010-10-15 Thread Peter Alspach
Tena koe Steven

cutData <- rbind(summary(Acut), summary(Bcut))
barplot(cutData, beside=TRUE)

should get you started.  The challenge, as you identify, is to get the data 
into the appropriate form and the simple approach I have used may not work for 
your real data.
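
For instance, a sketch built on the simulated Acut and Bcut from the question
quoted below (the legend is my own addition):

cutData <- rbind(A = summary(Acut), B = summary(Bcut))
barplot(cutData, beside = TRUE, legend.text = rownames(cutData),
        xlab = "Level", ylab = "Count")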

HTH 


Peter Alspach

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
 project.org] On Behalf Of steven mosher
 Sent: Saturday, 16 October 2010 9:01 a.m.
 To: r-help
 Subject: [R] Beginner question on bar plot
 
 I've read a number of examples on doing a multiple bar plot, but cant
 seem
 to grasp
 how they work or how to get my data into the proper form.
 
  I have two variables holding the same factor levels.
 
  The variables were created using a cut command. The following simulates
  that:
 
  A <- 1:100
  B <- 1:100
  A[30:60] <- 43
  Acut <- cut(A, breaks=c(0,10,45,120), labels=c("low","med","high"))
  Bcut <- cut(B, breaks=c(0,10,45,120), labels=c("low","med","high"))
 
 What I want to do is create a barplot with  3 groups of side by side
 bars
 
  group 1 = low, and the two bars would be the counts for this level in Acut
  and in Bcut
  group 2 = med, and the two bars again would be the counts for this factor
  level in Acut and Bcut
  group 3 = high, and likewise.
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-
 guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Recovering x/y coordinates from a scatterplot image

2010-10-15 Thread Rob James
Do I recall correctly that there is an R package that can take an image, 
and help one estimate the x/y coordinates?  I can't find the package, 
thought it was an R-tool, but would appreciate any leads.


Thanks,

Rob

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] creating 'all' sum contrasts

2010-10-15 Thread Jonathan Christensen
Michael,

Let c_1 and c_2 be vectors representing contrasts. Then c_1 and c_2
are orthogonal if and only if the inner product is 0. In your example,
you have vectors (1,0,-1) and (0,1,-1). The inner product is 1, so
they are not orthogonal. It's impossible to have more than n-1 mutually
orthogonal contrasts for a factor with n levels, a result from basic
linear algebra: contrasts sum to zero, so they lie in an (n-1)-dimensional
subspace.

You can get all possible pairwise contrasts, which is different from
orthogonal contrasts (in fact, it's only possible to have floor(n/2)
orthogonal pairwise contrasts). This is probably not the easiest way,
but it works:

n <- 10
M <- matrix(0, nrow = n, ncol = n * (n - 1) / 2)
comb <- combn(n, 2)
M[cbind(comb[1, ], 1:(n * (n - 1) / 2))] <- 1
M[cbind(comb[2, ], 1:(n * (n - 1) / 2))] <- -1

M is then a matrix containing all pairwise contrasts for n levels of a factor.
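
A quick check with n = 3 (my own illustration; the columns come out as the
pairwise contrasts 1-2, 1-3 and 2-3):

n <- 3
M <- matrix(0, nrow = n, ncol = n * (n - 1) / 2)
comb <- combn(n, 2)
M[cbind(comb[1, ], seq_len(ncol(M)))] <- 1
M[cbind(comb[2, ], seq_len(ncol(M)))] <- -1
M
#      [,1] [,2] [,3]
# [1,]    1    1    0
# [2,]   -1    0    1
# [3,]    0   -1   -1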

Hope that helps,

Jonathan


On Fri, Oct 15, 2010 at 10:30 AM, Michael Hopkins
hopk...@upstreamsystems.com wrote:

 On 15 Oct 2010, at 13:55, Berwin A Turlach wrote:

 G'day Michael,


 Hi Berwin

 Thanks for the reply

 On Fri, 15 Oct 2010 12:09:07 +0100
 Michael Hopkins hopk...@upstreamsystems.com wrote:

 OK, my last question didn't get any replies so I am going to try and
 ask a different way.

 When I generate contrasts with contr.sum() for a 3 level categorical
 variable I get the 2 orthogonal contrasts:

  > contr.sum(c(1, 2, 3))
    [,1] [,2]
  1    1    0
  2    0    1
  3   -1   -1

 These two contrasts are *not* orthogonal.

 I'm surprised.  Can you please tell me how you calculated that.

 This provides the contrasts 1-3 and 2-3 as expected.  But I also
 want it to create 1-2 (i.e. 1-3 - 2-3).  So in general I want
 all possible orthogonal contrasts - think of it as the contrasts for
 all pairwise comparisons between the levels.

 You have to decide what you want.  The contrasts for all pairwise
 comparaisons between the levels or all possible orthogonal contrasts?


 Well the pairwise contrasts are the most important as I am looking for 
 evidence of whether they are zero (i.e. no difference between levels) or not. 
  But see my above comment about orthogonality.

 The latter is actually not well defined.  For a factor with p levels,
 there would be p-1 orthogonal contrasts, which are only identifiable up to
 rotation, hence infinitely many such sets. But there are p(p-1) ordered
 pairwise comparisons. So unless p=2, you have to decide what you want.

 Well of course the pairwise comparisons are bi-directional so in fact only 
 p(p-1)/2 are of interest to me.

 Are there are any options for contrast() or other functions/libraries
 that will allow me to do this automatically?

 Look at package multcomp, in particular functions glht and mcp, these
 might help.

 Thanks I will have a look.

 But I want to be able to do this transparently within lm() using 
 regsubsets() etc as I am collecting large quantities of summary stats from 
 all possible models to use with a model choice criterion based upon true 
 Bayesian model probabilities.

 Cheers,

       Berwin

 == Full address 
 Berwin A Turlach                      Tel.: +61 (8) 6488 3338 (secr)
 School of Maths and Stats (M019)            +61 (8) 6488 3383 (self)
 The University of Western Australia   FAX : +61 (8) 6488 1028
 35 Stirling Highway
 Crawley WA 6009                e-mail: ber...@maths.uwa.edu.au
 Australia                        http://www.maths.uwa.edu.au/~berwin




 Michael Hopkins
 Algorithm and Statistical Modelling Expert

 Upstream
 23 Old Bond Street
 London
 W1S 4PZ

 Mob +44 0782 578 7220
 DL   +44 0207 290 1326
 Fax  +44 0207 290 1321

 hopk...@upstreamsystems.com
 www.upstreamsystems.com




        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem with merging two zoo objects

2010-10-15 Thread Gabor Grothendieck
On Fri, Oct 15, 2010 at 4:27 PM, Megh megh700...@yahoo.com wrote:

 Thanks Gabor for pointing to my old version. However I got one more question
 why the argument tz= is sitting there? As you are not passing any explicit

It would otherwise assume Date class.

 > str(read.zoo(file="dal1.csv", header=TRUE, sep=",",
 +   format = "%m/%d/%Y %H:%M:%S", aggregate = mean))
‘zoo’ series from 2010-10-15 to 2010-10-15
  Data: num [1, 1:4] 73.7 73.7 73.7 73.7
 - attr(*, "dimnames")=List of 2
  ..$ : chr "2010-10-15"
  ..$ : chr [1:4] "data.open" "data.high" "data.low" "data.close"
  Index: Class 'Date'  num 14897


-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] feed cut() output into goodness-of-fit tests

2010-10-15 Thread Phil Spector

Andrei -
Looking inside the code for cut, it looks like you could retrieve 
the breaks as follows:


getbreaks = function(x, nbreaks){
  nb = nbreaks + 1
  dx = diff(rx <- range(x, na.rm = TRUE))  # note: rx is assigned inside diff()
  seq.int(rx[1] - dx/1000, rx[2] + dx/1000, length.out = nb)
}

The dx/1000 is what makes cut()'s breaks slightly different from
a simple call to seq().
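
The recovered breaks can then go straight into a p[distr-name] function; a
sketch of mine, assuming a hypothetical exponential sample and model:

set.seed(1)
x <- rexp(100)                           # hypothetical sample
br <- getbreaks(x, 10)                   # the same breaks cut(x, 10) used
p <- diff(pexp(br, rate = 1/mean(x)))    # model probability of each bin
round(p / sum(p), 3)                     # renormalized over the observed range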


- Phil Spector
 Statistical Computing Facility
 Department of Statistics
 UC Berkeley
 spec...@stat.berkeley.edu



On Fri, 15 Oct 2010, Andrei Zorine wrote:


Hello,
My question is assuming I have cut()'ed my sample and look at the
table() of it, how can I compute probabilities for the bins? Do I have
to parse table's names() to fetch bin endpoints to pass them to
p[distr-name] functions? i really don't want to input arguments to PDF
functions by hand (nor copy-and-paste way).


x.fr <- table(cut(x, 10))
x.fr

(0.0617,0.549]   (0.549,1.04]    (1.04,1.52]    (1.52,2.01]     (2.01,2.5]
            16             28             26             18              6
    (2.5,2.99]    (2.99,3.48]    (3.48,3.96]    (3.96,4.45]    (4.45,4.94]
             3              2              0              0              1

names(x.fr)
 [1] "(0.0617,0.549]" "(0.549,1.04]"   "(1.04,1.52]"    "(1.52,2.01]"
 [5] "(2.01,2.5]"     "(2.5,2.99]"     "(2.99,3.48]"    "(3.48,3.96]"
 [9] "(3.96,4.45]"    "(4.45,4.94]"


--

Andrei Zorine

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] using optimize with two unknowns, e.g. to parameterize a distribution with given confidence interval

2010-10-15 Thread David LeBauer
Hi,

I would like to write a function that finds parameters of a log-normal
distribution with a 1-alpha CI of (x_lcl, x_ucl):

However, I don't know how to optimize for the two unknown parameters.

Here is my unsuccessful attempt to find a lognormal distribution with
a 90%CI of 1,20:

prior <- function(x_lcl, x_ucl, alpha, mean, var) {
  a <- (plnorm(x_lcl, mean, var) - (alpha/2))^2
  b <- (plnorm(x_ucl, mean, var) - (1-alpha/2))^2
  return(a+b)
  }

optimize(fn=prior, interval = c(-5, 100), 1, 20)

I understand that this problem has a closed form solution, but I would
like to make this a general function.
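
One possible sketch (mine, not from the post): optim() minimizes over a
parameter vector, so both lognormal parameters can be fitted at once, and
exponentiating the second component keeps sdlog positive:

prior2 <- function(par, x_lcl, x_ucl, alpha) {
  mlog <- par[1]
  slog <- exp(par[2])               # exp() keeps sdlog positive in the search
  (plnorm(x_lcl, mlog, slog) - alpha/2)^2 +
    (plnorm(x_ucl, mlog, slog) - (1 - alpha/2))^2
}
fit <- optim(c(0, 0), prior2, x_lcl = 1, x_ucl = 20, alpha = 0.1)
c(meanlog = fit$par[1], sdlog = exp(fit$par[2]))
# analytic check: meanlog = log(20)/2, sdlog = log(20)/(2 * qnorm(0.95))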

Thanks,

David

-- 
David LeBauer, PhD
Energy Biosciences Institute
University of Illinois Urbana-Champaign
1206 W. Gregory Drive
Urbana, IL  61801, U.S.A.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] using apply function and storing output

2010-10-15 Thread Gabor Grothendieck
On Fri, Oct 15, 2010 at 6:46 AM, David A. dasol...@hotmail.com wrote:

 Thanks Dennis,

 I don't think it was a problem of not feeding in a function for rollapply(), 
 because I was using mean() and my co.var() function in the FUN argument.
 The key part seems to be the transformation that zoo() does to the matrix. If 
 I do the same transformation to my original matrix, the rollapply() function 
 now works fine.


rollapply has ts and zoo methods so you have to pass it an object of
one those classes:

> methods(rollapply)
[1] rollapply.ts*  rollapply.zoo*

   Non-visible functions are asterisked
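
So a plain matrix needs wrapping first; a minimal sketch with hypothetical
data:

library(zoo)
m <- matrix(rnorm(50), ncol = 5)          # hypothetical plain matrix
rollapply(zoo(m), width = 3, FUN = mean)  # rolling mean, column by column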


-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Dealing with Non-Standard Hours

2010-10-15 Thread Clint Bowman
A data set I obtained has the hours running from 01 through 24 
rather than the conventional 00 through 23.  My favorite, strptime, 
balks at hour 24.


I thought it would be easy to correct but it must be too late on 
Friday for my brain and caffeine isn't helping.


TIA for a hint,

Clint

--
Clint BowmanINTERNET:   cl...@ecy.wa.gov
Air Quality Modeler INTERNET:   cl...@math.utah.edu
Department of Ecology   VOICE:  (360) 407-6815
PO Box 47600FAX:(360) 407-7534
Olympia, WA 98504-7600

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Recovering x/y coordinates from a scatterplot image

2010-10-15 Thread Barry Rowlingson
On Fri, Oct 15, 2010 at 9:46 PM, Rob James aetiolo...@gmail.com wrote:
 Do I recall correctly that there is an R package that can take an image, and
 help one estimate the x/y coordinates?  I can't find the package, thought it
 was an R-tool, but would appreciate any leads.

 Thanks,

 I don't know of an R package for this, but there are various programs to
do it, none of whose names I can ever remember on the one occasion every
six years when I need one. Some searching on the internet found this one:

http://plotdigitizer.sourceforge.net/

but doubtless there are others!

Barry

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] compiling and testing R-2.12.0

2010-10-15 Thread Kjetil Halvorsen
I downloaded the tarball for R-2.12.0 and ran
./configure
make

without problems.

Then
make test
...which have now been running for more than an hour, and seems to
have stalled at:
comparing 'reg-plot-latin1.ps' to './reg-plot-latin1.ps.save' ... OK
make[3]: Leaving directory `/home/kjetil/R/R-2.12.0/tests'
make[2]: Leaving directory `/home/kjetil/R/R-2.12.0/tests'
make[2]: Entering directory `/home/kjetil/R/R-2.12.0/tests'
running tests of Internet and socket functions
  expect some differences
make[3]: Entering directory `/home/kjetil/R/R-2.12.0/tests'
running code in 'internet.R' ...

at which point it has been for about an hour!

¿Any ideas?
I am running in a shell from within emacs23, on ubuntu 10.10  maverick.

Kjetil

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Data Parameter extract

2010-10-15 Thread joeman3285

Awesome!  It worked.  Thank you both for your help.
-joe
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Data-Parameter-extract-tp2996369p2997761.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Dealing with Non-Standard Hours

2010-10-15 Thread jim holtman
You could have posted an example of your data.  You can use 'sub' to
substitute one set of characters for another in your data.  There are
other ways of doing it if we had an example of your data.
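
For the common case where hour 24 means midnight of the following day, one
possible sketch (the timestamp format here is my assumption, not from the
post):

x <- c("2010-10-15 23", "2010-10-15 24")   # hypothetical stamps, "%Y-%m-%d %H"
is24 <- grepl(" 24$", x)
t <- as.POSIXct(strptime(sub(" 24$", " 00", x), format = "%Y-%m-%d %H"))
t[is24] <- t[is24] + 24 * 3600             # shift the relabeled hours forward a day
t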

On Fri, Oct 15, 2010 at 5:55 PM, Clint Bowman cl...@ecy.wa.gov wrote:
 A data set I obtained has the hours running from 01 through 24 rather than
 the conventional 00 through 23.  My favorite, strptime, balks at hour 24.

 I thought it would be easy to correct but it must be too late on Friday for
 my brain and caffeine isn't helping.

 TIA for a hint,

 Clint

 --
 Clint Bowman                    INTERNET:       cl...@ecy.wa.gov
 Air Quality Modeler             INTERNET:       cl...@math.utah.edu
 Department of Ecology           VOICE:          (360) 407-6815
 PO Box 47600                    FAX:            (360) 407-7534
 Olympia, WA 98504-7600

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] compiling and testing R-2.12.0

2010-10-15 Thread Patrick Connolly
On Fri, 15-Oct-2010 at 07:41PM -0300, Kjetil Halvorsen wrote:

| I downloaded the tarball for R-2-12.0, made
| ./configure
| make
| 
| without problems.
| 
| Then
| make test
| ...which have now been running for more than an hour, and seems to
| have stalled at:
| comparing 'reg-plot-latin1.ps' to './reg-plot-latin1.ps.save' ... OK
| make[3]: Leaving directory `/home/kjetil/R/R-2.12.0/tests'
| make[2]: Leaving directory `/home/kjetil/R/R-2.12.0/tests'
| make[2]: Entering directory `/home/kjetil/R/R-2.12.0/tests'
| running tests of Internet and socket functions
|   expect some differences
| make[3]: Entering directory `/home/kjetil/R/R-2.12.0/tests'
| running code in 'internet.R' ...
| 
| at which point it has been for about an hour!

Do you access the internet via a proxy server?  In my experience, the
HTTP_PROXY variable isn't read by the R process that does those tests.
It doesn't matter much since it times out after about 5 minutes.  I
think the internet tests are the last to run, so you've probably
passed all the others.

HTH



| 
| ¿Any ideas?
| I am running in a shell from within emacs23, on ubuntu 10.10  maverick.
| 
| Kjetil
| 
| __
| R-help@r-project.org mailing list
| https://stat.ethz.ch/mailman/listinfo/r-help
| PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
| and provide commented, minimal, self-contained, reproducible code.

-- 
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.   
   ___Patrick Connolly   
 {~._.~}   Great minds discuss ideas
 _( Y )_ Average minds discuss events 
(:_~*~_:)  Small minds discuss people  
 (_)-(_)  . Eleanor Roosevelt
  
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] trouble with installing R-2.12.0 from source on Windows

2010-10-15 Thread Erin Hodgess
Dear R People:

I'm trying to install R-2.12.0 from source on a Netbook with Windows XP.

I have installed the Rtools.exe (version 2.12)

However, when I enter tar xvfz R-2.12.0.tar.gz

I keep getting the message "cannot change ownership to uid 501, gid 20:
invalid argument".

Has anyone else run across this, please?

Thanks,
Sincerely,
Erin


-- 
Erin Hodgess
Associate Professor
Department of Computer and Mathematical Sciences
University of Houston - Downtown
mailto: erinm.hodg...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem with merging two zoo objects

2010-10-15 Thread Gabor Grothendieck
On Fri, Oct 15, 2010 at 9:56 PM, Megh Dal megh700...@yahoo.com wrote:
 However, I have noticed a strange thing: the placement of tz = "" matters here:

 > head(read.zoo("f:/dat1.txt", sep = ",", header = TRUE,
 +   format = "%m/%d/%Y %H:%M:%S"), tz = "")

Your tz argument has been passed as an argument of head.  You want it
as an argument of read.zoo .

           data.open data.high data.low data.close
 2010-10-15      73.7      73.7     73.7       73.7
 2010-10-15      73.8      73.8     73.8       73.8
 2010-10-15      73.8      73.8     73.8       73.8
 2010-10-15      73.8      73.8     73.8       73.8
 2010-10-15      73.8      73.8     73.8       73.8
 2010-10-15      73.8      73.8     73.8       73.8
 Warning messages:
 1: In zoo(rval3, ix) :
  some methods for “zoo” objects do not work if the index entries in 
 ‘order.by’ are not unique
 2: In zoo(rval, x.index[i]) :
  some methods for “zoo” objects do not work if the index entries in 
 ‘order.by’ are not unique

You are missing the aggregate= argument to read.zoo, which is needed to
tell it how to resolve duplicate times in your input.

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Time vs Concentration Graphs by ID

2010-10-15 Thread Dennis Murphy
Hi:

On Fri, Oct 15, 2010 at 4:29 AM, Anh Nguyen eataban...@gmail.com wrote:

 Thank you for the very helpful tips. Just one last question:
 - In the lattice method, how can I plot TIME vs OBSconcentration and TIME
 vs PREDconcentration in one graph (per individual)? You said in lattice
 you would replace 'smooth' by 'l' in the type = argument of xyplot(); that
 just means the OBSconc is now drawn as a line instead of points and
 PREDconc is not plotted, right?


My mistake. That requires a little trickery. Try using both observed and
predicted as responses in the xyplot, as in

xyplot(conc + pred ~ Time | ID, ..., type = c('p', 'l'), distribute.type =
TRUE, ...)

where the ... refers to the other arguments in the xyplot() call.

If that works (and that *is* an if since it's untested), then conc should be
plotted as points and pred as lines on the same (x, y) scale. See section
5.3 of the Lattice book by Deepayan Sarkar; the code for the book example is
on its web site:

http://lmdvr.r-forge.r-project.org/figures/figures.html

The figure in question is Figure 5.12, but it uses data from several
previous figures, so you might just want to copy and paste everything from
just before Figure 5.10 up to it - the data frame in use is SeatacWeather.
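
A self-contained toy sketch of that idea (my own, with hypothetical data, so
only the mechanics are shown):

library(lattice)
set.seed(42)
d <- data.frame(Time = rep(1:10, 4),
                ID   = rep(c("a", "b"), each = 20),
                conc = rnorm(40, 10),
                pred = rep(9:10, each = 20))
xyplot(conc + pred ~ Time | ID, data = d,
       type = c("p", "l"), distribute.type = TRUE)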

-- I tried to superimpose 2 lattices but that won't work.


No, it won't. You might be able to get away with that in base graphics (add
= TRUE or lines/points), but not in Lattice. Actually, ggplot2 is better
positioned to handle that case 'easily'.


 I can keep the x-scale the same but not the y-scale. Ex: max pred conc = 60
 and max obs conc = 80 and thus for this case, there would be 2 different
 y-scales superimposed on one another.


You may need to plot concentration first if it has the higher y limit. No
guarantee that will work, but it's worth a shot. If it fails, you can set
the y-limits per panel manually using scales, something like

   scales = list(y = list(limits = list(c(0, 60), c(0, 80), c(0, 60),
                                        c(0, 60))))

in panel order, but again, that's untested and I'm not certain it would
work. Better solutions/corrections welcome.

- the ggplot2 method works, just the weird 5,6,7,8,9,10 mg legends (there
 are only 5,7,10 mg doses) but it's nothing that photoshop can't take care
 of.


You must have dose as numeric in your input data frame. If you convert it to
factor (e.g, with a new variable) and use the factor version of dose in
ggplot2, it should work.
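
For example (a one-line sketch, assuming the plotting data frame is called D
as earlier in the thread):

D$Dose <- factor(D$Dose)   # legend then shows only the doses present: 5, 7, 10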


 This is very cool, I think I understand it a lot better now. It was a lot
 easier than what I was doing before, that's for sure. Thanks!


For complex requests like these, it *really* helps to have a template data
frame to work with. It took me about a half hour to rearrange the Theoph
data frame from nlme into shape so that I could get a plot that remotely
resembled what you were looking for.  Alas, I don't have the time today to
do the same for this request.

HTH,
Dennis


 On Fri, Oct 15, 2010 at 2:32 AM, Dennis Murphy djmu...@gmail.com wrote:

 Hi:

 To get the plots precisely as you have given them in your png file, you're
 most likely going to have to use base graphics, especially if you want a
 separate legend in each panel. Packages ggplot2 and lattice have more
 structured ways of constructing such graphs, so you give up a bit of freedom
 in the details of plot construction to get nicer default configurations.

 Perhaps you've written a panel function in S-PLUS (?) to produce each
 graph in your png file - if so, you could simply add a couple of lines to
 determine the max y-value and from that the limits of the y scale. It
 shouldn't be at all difficult to make such modifications. Since R base
 graphics is very similar to base graphics in S-PLUS, you could possibly get
 away with popping your S-PLUS code directly into R and see how far that
 takes you.

 Lattice is based on Trellis graphics, but the syntax in lattice has
 changed a fair bit vis-à-vis Trellis as the package has developed. ggplot2
 is a more recent graphics system in R predicated on the grammar of graphics
 exposited by Leland Wilkinson.

 For my example, I've modified the Theophylline data set in package nlme,
 described in Pinheiro & Bates (2000), Mixed Effects Models in S and S-PLUS,
 Springer. The original data set has eleven unique doses - I combined them
 into three intervals and converted them to factor with cut(). I also created
 four groups of Subjects and put them into a variable ID. The output data
 frame is called theo. I didn't fit the nlme models to the subjects -
 instead, I was lazy and used loess smoothing instead. The code to generate
 the data frame is given below; this is what we mean by a 'reproducible
 example', something that can be copied and pasted into an R session.

 # Use modified version of Theophylline data in package nlme

 library(nlme)
 theo <- Theoph
 theo <- subset(theo, Dose > 3.5)
 theo$dose <- cut(theo$Dose, breaks = c(3, 4.5, 5, 6),
                  labels = c('4.25', '4.8', '5.5'))
 theo <- 

Re: [R] Recovering x/y coordinates from a scatterplot image

2010-10-15 Thread Dennis Murphy
Hi Rob:

Are you thinking of the digitize package?

HTH,
Dennis

On Fri, Oct 15, 2010 at 1:46 PM, Rob James aetiolo...@gmail.com wrote:

 Do I recall correctly that there is an R package that can take an image,
 and help one estimate the x/y coordinates?  I can't find the package,
 thought it was an R-tool, but would appreciate any leads.

 Thanks,

 Rob

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem with merging two zoo objects

2010-10-15 Thread Megh Dal
However, I have noticed a strange thing: the placement of tz = "" matters here:

> head(read.zoo("f:/dat1.txt", sep = ",", header = TRUE,
+   format = "%m/%d/%Y %H:%M:%S"), tz = "")
           data.open data.high data.low data.close
2010-10-15      73.7      73.7     73.7       73.7
2010-10-15      73.8      73.8     73.8       73.8
2010-10-15      73.8      73.8     73.8       73.8
2010-10-15      73.8      73.8     73.8       73.8
2010-10-15      73.8      73.8     73.8       73.8
2010-10-15      73.8      73.8     73.8       73.8
Warning messages:
1: In zoo(rval3, ix) :
  some methods for “zoo” objects do not work if the index entries in
‘order.by’ are not unique
2: In zoo(rval, x.index[i]) :
  some methods for “zoo” objects do not work if the index entries in
‘order.by’ are not unique
> head(read.zoo("f:/dat1.txt", sep = ",", header = TRUE, tz = "",
+   format = "%m/%d/%Y %H:%M:%S"))
                    data.open data.high data.low data.close
2010-10-15 09:00:24     74.35     74.35    74.35      74.35
2010-10-15 09:01:15     74.30     74.30    74.30      74.30
2010-10-15 09:01:21     74.35     74.35    74.35      74.35
2010-10-15 09:01:27     74.20     74.20    74.20      74.20
2010-10-15 09:01:30     74.25     74.25    74.25      74.25
2010-10-15 09:01:36     74.25     74.25    74.25      74.25
Warning message:
In zoo(rval3, ix) :
  some methods for “zoo” objects do not work if the index entries in
‘order.by’ are not unique

Is it a bug, or is it a rule that for any function the placement of its arguments matters?

Thanks,

--- On Sat, 10/16/10, Megh Dal megh700...@yahoo.com wrote:

 From: Megh Dal megh700...@yahoo.com
 Subject: Re: [R] Problem with merging two zoo objects
 To: Gabor Grothendieck ggrothendi...@gmail.com
 Cc: r-help@r-project.org
 Date: Saturday, October 16, 2010, 7:20 AM
 I don't know whether I am missing something or not:

 > head(read.zoo(file="f:/dat1.txt", header=TRUE, sep=",",
 +   format = "%m/%d/%Y %H:%M:%S"), tz="GMT")
            data.open data.high data.low data.close
 2010-10-15      73.7      73.7     73.7       73.7
 2010-10-15      73.8      73.8     73.8       73.8
 2010-10-15      73.8      73.8     73.8       73.8
 2010-10-15      73.8      73.8     73.8       73.8
 2010-10-15      73.8      73.8     73.8       73.8
 2010-10-15      73.8      73.8     73.8       73.8
 Warning messages:
 1: In zoo(rval3, ix) :
   some methods for “zoo” objects do not work if the index entries in
 ‘order.by’ are not unique
 2: In zoo(rval, x.index[i]) :
   some methods for “zoo” objects do not work if the index entries in
 ‘order.by’ are not unique
 > head(read.zoo(file="f:/dat1.txt", header=TRUE, sep=",",
 +   format = "%m/%d/%Y %H:%M:%S"))
            data.open data.high data.low data.close
 2010-10-15      73.7      73.7     73.7       73.7
 2010-10-15      73.8      73.8     73.8       73.8
 2010-10-15      73.8      73.8     73.8       73.8
 2010-10-15      73.8      73.8     73.8       73.8
 2010-10-15      73.8      73.8     73.8       73.8
 2010-10-15      73.8      73.8     73.8       73.8
 Warning messages:
 1: In zoo(rval3, ix) :
   some methods for “zoo” objects do not work if the index entries in
 ‘order.by’ are not unique
 2: In zoo(rval, x.index[i]) :
   some methods for “zoo” objects do not work if the index entries in
 ‘order.by’ are not unique

 In either case, I am missing the time component. Where am I going wrong?

 Thanks,

 --- On Sat, 10/16/10, Gabor Grothendieck ggrothendi...@gmail.com wrote:

  From: Gabor Grothendieck ggrothendi...@gmail.com
  Subject: Re: [R] Problem with merging two zoo objects
  To: Megh megh700...@yahoo.com
  Cc: r-help@r-project.org
  Date: Saturday, October 16, 2010, 2:33 AM
  On Fri, Oct 15, 2010 at 4:27 PM, Megh megh700...@yahoo.com wrote:
  
   Thanks Gabor for pointing to my old version. However, I have one more
   question: why is the argument tz= sitting there? As you are not
   passing any explicit
  
  It would otherwise assume Date class.
  
   > str(read.zoo(file="dal1.csv", header=TRUE, sep=",",
   +   format = "%m/%d/%Y %H:%M:%S", aggregate = mean))
  ‘zoo’ series from 2010-10-15 to 2010-10-15
    Data: num [1, 1:4] 73.7 73.7 73.7 73.7
   - attr(*, "dimnames")=List of 2
    ..$ : chr "2010-10-15"
    ..$ : chr [1:4] "data.open" "data.high" "data.low" "data.close"
    Index: Class 'Date'  num 14897
  
  -- 
  Statistics & Software Consulting
  GKX Group, GKX Associates Inc.
  tel: 1-877-GKX-GROUP
  email: ggrothendieck at gmail.com




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem with merging two zoo objects

2010-10-15 Thread Megh Dal
I don't know whether I am missing something or not:

> head(read.zoo(file="f:/dat1.txt", header=TRUE, sep=",",
+   format = "%m/%d/%Y %H:%M:%S"), tz="GMT")
           data.open data.high data.low data.close
2010-10-15      73.7      73.7     73.7       73.7
2010-10-15      73.8      73.8     73.8       73.8
2010-10-15      73.8      73.8     73.8       73.8
2010-10-15      73.8      73.8     73.8       73.8
2010-10-15      73.8      73.8     73.8       73.8
2010-10-15      73.8      73.8     73.8       73.8
Warning messages:
1: In zoo(rval3, ix) :
  some methods for “zoo” objects do not work if the index entries in ‘order.by’
are not unique
2: In zoo(rval, x.index[i]) :
  some methods for “zoo” objects do not work if the index entries in ‘order.by’
are not unique
> head(read.zoo(file="f:/dat1.txt", header=TRUE, sep=",",
+   format = "%m/%d/%Y %H:%M:%S"))
           data.open data.high data.low data.close
2010-10-15      73.7      73.7     73.7       73.7
2010-10-15      73.8      73.8     73.8       73.8
2010-10-15      73.8      73.8     73.8       73.8
2010-10-15      73.8      73.8     73.8       73.8
2010-10-15      73.8      73.8     73.8       73.8
2010-10-15      73.8      73.8     73.8       73.8
Warning messages:
1: In zoo(rval3, ix) :
  some methods for “zoo” objects do not work if the index entries in ‘order.by’
are not unique
2: In zoo(rval, x.index[i]) :
  some methods for “zoo” objects do not work if the index entries in ‘order.by’
are not unique

In either case, I am missing the time component. Where am I going wrong?

Thanks,


--- On Sat, 10/16/10, Gabor Grothendieck ggrothendi...@gmail.com wrote:

 From: Gabor Grothendieck ggrothendi...@gmail.com
 Subject: Re: [R] Problem with merging two zoo objects
 To: Megh megh700...@yahoo.com
 Cc: r-help@r-project.org
 Date: Saturday, October 16, 2010, 2:33 AM
 On Fri, Oct 15, 2010 at 4:27 PM, Megh megh700...@yahoo.com wrote:
 
  Thanks Gabor for pointing to my old version. However, I have one more
  question: why is the argument tz= sitting there? As you are not
  passing any explicit
 
 It would otherwise assume Date class.
 
  > str(read.zoo(file="dal1.csv", header=TRUE, sep=",",
  +   format = "%m/%d/%Y %H:%M:%S", aggregate = mean))
 ‘zoo’ series from 2010-10-15 to 2010-10-15
   Data: num [1, 1:4] 73.7 73.7 73.7 73.7
  - attr(*, "dimnames")=List of 2
   ..$ : chr "2010-10-15"
   ..$ : chr [1:4] "data.open" "data.high" "data.low" "data.close"
   Index: Class 'Date'  num 14897
 
 -- 
 Statistics & Software Consulting
 GKX Group, GKX Associates Inc.
 tel: 1-877-GKX-GROUP
 email: ggrothendieck at gmail.com
 




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] R MySQL (Databases)

2010-10-15 Thread Santosh Srinivas
Dear R-helpers,

Considering that a substantial part of analysis is related to data
manipulation, I'm just wondering if I should do the basic data part in a
database server (currently I have the data in a .txt file).
For this purpose, I am planning to use MySQL. Is MySQL a good way to go
about? Are there any anticipated problems that I need to be aware of?

Considering that many users here work with large datasets: do you typically
store the data in databases and query relevant portions for your analysis?
Does it speed up the entire process? Is it neater to do things in a
database? (For example, errors could be corrected at the data import stage
itself, by constraints defined on the data in the database, as opposed to
discovering them when you do the analysis in R and realizing something is
wrong in the output.)

This is vis-à-vis using the built-in SQLite (via RSQLite), indexing, etc.
capabilities in R itself? Does performance work better with a database
backend (especially for simple but large datasets)?

The financial applications that I am thinking of are not exactly realtime
but quick response and fast performance would definitely help.

Aside info, I want to take things to a cloud environment at some point of
time just because it will be easier and cheaper to deliver.

Kind of an open question, but any inputs will help.
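
For reference, the usual DBI access pattern from R looks like the sketch
below (the database name, table and credentials are hypothetical):

library(RMySQL)
con <- dbConnect(MySQL(), dbname = "marketdata",
                 user = "analyst", password = "secret")
trades <- dbGetQuery(con, "SELECT * FROM trades WHERE symbol = 'XYZ'")
dbDisconnect(con)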

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] convert factor data to numeric

2010-10-15 Thread andreas

cheers mate that worked fine

thanks a bunch
-- 
View this message in context: 
http://r.789695.n4.nabble.com/convert-factor-data-to-numeric-tp1012855p2996495.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] AIC in bestglm, glm, and lm - why do they differ?

2010-10-15 Thread dmgillis
I recently found the 'bestglm' package while trolling CRAN for a
function to save some keystrokes with simple (k < 10) nested models.  I
am interested in using the best of several nested linear models which
I have explored using lm(), glm() and now bestglm().  When I compare
the AIC values for the 'best' candidate model I get the same AIC
values with lm() and glm() but a different AIC (and likelihood) value
from bestglm().  Is this the result of some difference in likelihood
calculation that I am missing in reviewing the documentation and help
files?  I can provide code if there is interest in looking into this;
otherwise I will continue to assemble my tables the long way with
glm() and lm(), though the options and potential of the bestglm()
package have me very interested.


Cheers, Darren Gillis

Biological Sciences
University of Manitoba
Winnipeg, MB

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Time vs Concentration Graphs by ID

2010-10-15 Thread Anh Nguyen
I found a couple of problems with this method, plus some questions:

- There is only one line for predicted dose at 5 mg.
- The different doses are 5, 7, and 10 mg, but somehow there is a legend for
5, 6, 7, 8, 9, 10.
- Is there a way to make the line smooth?
- The plots are also getting a little crowded; is there a way to split them
across 2 or more pages?

Thanks for your help.

On Thu, Oct 14, 2010 at 8:09 PM, Ista Zahn iz...@psych.rochester.eduwrote:

 Hi,
 Assuming the data is in a data.frame named D, something like

 library(ggplot2) # may need install.packages("ggplot2") first
 ggplot(D, aes(x = Time, y = Concentration, color = Dose)) +
   geom_point() +
   geom_line(aes(y = PredictedConcentration, group = 1)) +
   facet_wrap(~ ID, scales = "free", ncol = 3)

 should do it.

 -Ista
 On Thu, Oct 14, 2010 at 10:25 PM, thaliagoo eataban...@gmail.com wrote:
 
  Hello-- I have a data for small population who took 1 drug at 3 different
  doses. I have the actual drug concentrations as well as predicted
  concentrations by my model. This is what I'm looking for:
 
  - Time vs Concentration by ID (individual plots), with each subject
  occupying 1 plot -- there is to be 9 plots per page (3x3)
  - Observed drug concentration is made up of points, and predicted drug
  concentration is a curve without points. Points and curve will be the
 same
  color for each dose. Different doses will have different colors.
  - A legend to specify which color correlates to which dose.
  - Axes should be different for each individual (as some individual will
 have
  much higher drug concentration than others) and I want to see in detail
 how
  well predicted data fits observed data.
 
  Any help would be greatly appreciated.
  --
  View this message in context:
 http://r.789695.n4.nabble.com/Time-vs-Concentration-Graphs-by-ID-tp2996431p2996431.html
  Sent from the R help mailing list archive at Nabble.com.
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 



 --
 Ista Zahn
 Graduate student
 University of Rochester
 Department of Clinical and Social Psychology
 http://yourpsyche.org


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] RJava help

2010-10-15 Thread Romain Francois

On 14/10/10 20:28, lord12 wrote:

I need help with RJava. So I run this R code:

library(rJava) # load the rJava library
.jinit(classpath = "c:/Documents and Settings/GV/workspace/Test/src",
       parameters = "-Xmx512m")
# the above is to load the Java virtual machine


x <- runif(1000)
y <- runif(1000)
# the above are two vectors to convolve

# now you can call the Java method as
z <- .jcall("Test", "[D", "convolve", x, y)

I get the error: Error in .jcall("Test", "[D", "convolve", x, y) :
   RcallMethod: cannot determine object class.

Why is this?


It would be useful if you showed your Test Java class and reposted on the
appropriate mailing list for rJava:
http://mailman.rz.uni-augsburg.de/mailman/listinfo/stats-rosuda-devel


Romain

--
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30
http://romainfrancois.blog.free.fr
|- http://bit.ly/b8wOqW : LondonR Rcpp slides
|- http://bit.ly/cCmbgg : Rcpp 0.8.6
`- http://bit.ly/bzoWrs : Rcpp svn revision 2000

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] fast rowCumsums wanted for calculating the cdf

2010-10-15 Thread Gregor
Dear all,

Maybe the easiest solution: Is there anything that speaks against generalizing
cumsum from base to cope with matrices (as is done in matlab)? E.g.:

cumsum(Matrix, 1)
equivalent to
apply(Matrix, 1, cumsum)

The main advantage could be optimized code if the Matrix is extremely
non-square (e.g. 100,000 x 10), but the summation is done over the short
side (in this case 10).
apply would practically yield a loop over 100,000 elements, and vectorization 
w.r.t.
the long side (loop over 10 elements) provides considerable efficiency gains.
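
In the meantime, one vectorized workaround (my own sketch, assuming a numeric
matrix with no NAs): row-wise cumulative sums are a single matrix product
with an upper-triangular matrix of ones, and for a 100,000 x 10 matrix the
10 x 10 multiplier is negligible:

rowCumsums <- function(m) {
  U <- 1 * upper.tri(diag(ncol(m)), diag = TRUE)  # ones on and above diagonal
  m %*% U
}
probs <- t(matrix(rep(1:100), nrow = 10))
all.equal(rowCumsums(probs), t(apply(probs, 1, cumsum)),
          check.attributes = FALSE)  # TRUE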

Many regards,
Gregor




On Tue, 12 Oct 2010 10:24:53 +0200
Gregor mailingl...@gmx.at wrote:

 Dear all,
 
 I am struggling with a (currently) cost-intensive problem: calculating the
 (non-normalized) cumulative distribution function, given the (non-normalized)
 probabilities. something like:
 
 probs <- t(matrix(rep(1:100), nrow=10)) # matrix with row-wise probabilities
 F <- t(apply(probs, 1, cumsum)) # SLOOOW!
 
 One (already faster, but for sure not ideal) solution - thanks to Henrik
 Bengtsson:
 
 F <- matrix(0, nrow=nrow(probs), ncol=ncol(probs));
 F[,1] <- probs[,1,drop=TRUE];
 for (cc in 2:ncol(F)) {
   F[,cc] <- F[,cc-1,drop=TRUE] + probs[,cc,drop=TRUE];
 }
 
 In my case, probs is a (30,000 x 10) matrix, and I need to iterate this step
 around
 200,000 times, so speed is crucial. I currently can make sure to have no NAs, 
 but
 in order to extend matrixStats, this could be a nontrivial issue.
 
 Any ideas for speeding up this - probably routine - task?
 
 Thanks in advance,
 Gregor
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] compare histograms

2010-10-15 Thread Rainer M Krug
On Fri, Oct 15, 2010 at 2:47 AM, Michael Bedward
michael.bedw...@gmail.comwrote:

 Hi Rainer,

 Great - many thanks for that.  Yes, I'm using OSX

 I initially tried to use install.packages to get get a pre-built
 binary of earthmovdist from Rforge, but it failed with...

 In getDependencies(pkgs, dependencies, available, lib) :
  package ‘earthmovdist’ is not available


Yes - we had some problems getting the package built for OS X, but we
(more specifically Dirk) are working on that.



 When I tried installing with type="source" this was also failing.

 However, after reading your post I looked at the error messages
 properly and it turned out to be a trivial problem. The .First
 function defined in my .Rprofile was printing some text to the console
 with cat() which was being incorrectly picked up by the package build
 as if it was a makefile argument. When I commented out the call to cat
 the package installed successfully. I haven't had this problem
 installing other packages from source so I think there must be a
 little problem with your setup (?)


Thanks for letting us know - could you send us the offending entry in your
.Rprofile (or the whole .Rprofile?), so that we can see if it is an OSX or
general problem?



 Now that it's installed I look forward to trying it out shortly.


Great - please give us some feedback on what you think about it.

Cheers,

Rainer



 Thanks again.

 Michael




 On 15 October 2010 03:17, Rainer M Krug r.m.k...@gmail.com wrote:
 
 
  On Thu, Oct 14, 2010 at 3:15 AM, Michael Bedward 
 michael.bedw...@gmail.com
  wrote:
 
  Hi Juan,
 
  Yes, you can use EMD to quantify the difference between any pair of
  histograms regardless of their shape. The only constraint, at least
  the way that I've done it previously, is to have compatible bins. The
  original application of EMD was to compare images based on colour
  histograms which could have all sorts of shapes.
 
  I looked at the package that Dennis alerted me to on RForge but
  unfortunately it seems to be inactive
 
  No - well, it depends how you define inactive: the functionality we
 wanted
  to include is included, therefore no further development was necessary.
 
 
  and the nightly builds are broken. I've downloaded the source code and
  will have a look at it
  sometime in the next few days.
 
  Thanks for alerting us - we will look into that. But just don't use the
  nightly builds, as they are not different to the last release. Just
 download
  the package for your system (I assume Windows or mac, as I just installed
  from source without problems under Linux).
  Let me know if it doesn't work,
  Cheers,
  Rainer
 
 
  Meanwhile, let me know if you want a copy of my own code. It uses the
  lpSolve package.
 
  Michael
 
  On 14 October 2010 08:46, Juan Pablo Fededa jpfed...@gmail.com wrote:
   Hi Michael,
  
  
   I have the same challenge; can you use this earth mover's distance to
   compare bimodal distributions?
   Thanks & cheers,
  
  
   Juan
  
  
   On Wed, Oct 13, 2010 at 4:39 AM, Michael Bedward
   michael.bedw...@gmail.com
   wrote:
  
   Just to add to Greg's comments: I've previously used 'Earth Movers
   Distance' to compare histograms. Note, this is a distance metric
   rather than a parametric statistic (ie. not a test) but it at least
   provides a consistent way of quantifying similarity.
  
   It's relatively easy to implement the metric in R (formulating it as
 a
   linear programming problem). Happy to dig out the code if needed.
  
   Michael
  
   On 13 October 2010 02:44, Greg Snow greg.s...@imail.org wrote:
That depends a lot on what you mean by the histograms being
equivalent.
   
You could just plot them and compare visually.  It may be easier to
compare them if you plot density estimates rather than histograms.
 Even
better would be to do a qqplot comparing the 2 sets of data rather
than the
histograms.
   
If you want a formal test then the ks.test function can compare 2
datasets.  Note that the null hypothesis is that they come from the
same
distribution, a significant result means that they are likely
different (but
the difference may not be of practical importance), but a
non-significant
test could mean they are the same, or that you just do not have
enough power
to find the difference (or the difference is hard for the ks test
 to
see).
 You could also use a chi-squared test to compare this way.
   
Another approach would be to use the vis.test function from the
TeachingDemos package.  Write a small function that will either
 plot
your 2
histograms (density plots), or permute the data between the 2
 groups
and
plot the equivalent histograms.  The vis.test function then
 presents
you
with an array of plots, one of which is the original data and the
rest based
on permutations.  If there is a clear meaningful difference in the
groups
you will be 

[R] Downloading file with lapply

2010-10-15 Thread Santosh Srinivas
I'm still getting familiar with lapply

I have this date sequence
x <- seq(as.Date("01-Jan-2010", format="%d-%b-%Y"), Sys.Date(), by=1) # to
generate a series of dates

I want to apply the function for all values of x, so I use lapply (still a
newbie!). I wrote this test function:
I wrote this test function
pFun <- function(x) {
  print(paste("This is: ", x, sep=""))
}


When I run it:
> lapply(x, pFun(x))
it gives the result correctly, but ends with an error:

 Error in match.fun(FUN) :
  'pFun(x)' is not a function, character or symbol

Question 1: What is the issue here??


Now, to the real problem. I wrote a function cmDownFun(sDate), which works
correctly. If I do cmDownFun(x[6]), it works correctly (i.e. my function was
OK this time around!).

However, if I do lapply(x, cmDownFun(x)), it fails at 01-Jan-2010; this is
because of: HTTP status was '404 Not Found'.

This is OK, because the file does not exist, but I want to continue for the
remaining values of x.
lapply(x, try(cmDownFun(x), silent = TRUE)) does not work.
I also put the try inside the function itself but that does not work either.
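
For what it's worth, a sketch of the usual pattern (cmDownFun is the poster's
own function; the anonymous wrapper is the key):

# Question 1: pass the function itself, i.e. lapply(x, pFun), not pFun(x).
results <- lapply(x, function(d) try(cmDownFun(d), silent = TRUE))
failed  <- sapply(results, inherits, "try-error")  # TRUE where a download 404'd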




--
Thanks R-Helpers. Yes, this is a silly question and it will not be repeated!
:-)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] fast rowCumsums wanted for calculating the cdf

2010-10-15 Thread Joshua Wiley
Hi,

You might look at Reduce().  It seems faster.  I converted the matrix
to a list in an incredibly sloppy way (which you should not emulate)
because I cannot think of the simple way.

> probs <- t(matrix(rep(1:1000), nrow=10)) # matrix with row-wise probabilities
> F <- matrix(0, nrow=nrow(probs), ncol=ncol(probs))
> F[,1] <- probs[,1,drop=TRUE]
> add <- function(x) {Reduce(`+`, x, accumulate = TRUE)}

> system.time(F.slow <- t(apply(probs, 1, cumsum)))
   user  system elapsed
 36.758   0.416  42.464

> system.time(for (cc in 2:ncol(F)) {
+   F[,cc] <- F[,cc-1,drop=TRUE] + probs[,cc,drop=TRUE]
+ })
   user  system elapsed
  0.980   0.196   1.328

> system.time(add(list(probs[,1], probs[,2], probs[,3], probs[,4], probs[,5],
+   probs[,6], probs[,7], probs[,8], probs[,9], probs[,10])))
   user  system elapsed
  0.420   0.072   0.539


.539 seconds for 10 vectors each with 1 million elements does not seem
bad.  For 30,000 each, on my system it took .014 seconds, which for
200,000 iterations, I estimated as:

> (200000*.014)/60/60
[1] 0.778

(and this is on a stone age system with a single core processor and
only 756MB of RAM)

It should not be difficult to get the list back to a matrix.

Cheers,

Josh



On Thu, Oct 14, 2010 at 11:51 PM, Gregor mailingl...@gmx.at wrote:
 Dear all,

 Maybe the easiest solution: Is there anything that speaks against 
 generalizing
 cumsum from base to cope with matrices (as is done in matlab)? E.g.:

 cumsum(Matrix, 1)
 equivalent to
 apply(Matrix, 1, cumsum)

 The main advantage could be optimized code if the Matrix is extremely
 non-square (e.g. 100,000 x 10), but the summation is done over the short
 side (in this case 10).
 apply would practically yield a loop over 100,000 elements, and vectorization 
 w.r.t.
 the long side (loop over 10 elements) provides considerable efficiency gains.

 Many regards,
 Gregor




 On Tue, 12 Oct 2010 10:24:53 +0200
 Gregor mailingl...@gmx.at wrote:

 Dear all,

 I am struggling with a (currently) cost-intensive problem: calculating the
 (non-normalized) cumulative distribution function, given the (non-normalized)
 probabilities. something like:

 probs <- t(matrix(rep(1:100), nrow=10)) # matrix with row-wise probabilities
 F <- t(apply(probs, 1, cumsum)) # SLOOOW!

 One (already faster, but for sure not ideal) solution - thanks to Henrik
 Bengtsson:

 F <- matrix(0, nrow=nrow(probs), ncol=ncol(probs));
 F[,1] <- probs[,1,drop=TRUE];
 for (cc in 2:ncol(F)) {
   F[,cc] <- F[,cc-1,drop=TRUE] + probs[,cc,drop=TRUE];
 }

 In my case, probs is a (30,000 x 10) matrix, and I need to iterate this step
 around
 200,000 times, so speed is crucial. I currently can make sure to have no 
 NAs, but
 in order to extend matrixStats, this could be a nontrivial issue.

 Any ideas for speeding up this - probably routine - task?

 Thanks in advance,
 Gregor

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


  1   2   >