[R] multiple weights in MANOVA

2010-10-29 Thread Michael Hopkins


Hi all

I have 7 responses that I want to fit with MANOVA.  I have accumulated the 
responses into a matrix Y and each response has a weight vector associated with 
it, so I have accumulated the weights into a matrix WT of the same dimensions 
as Y.  When I try

fit - manova( Y ~ X1 + X2 + X3, weights = WT )

I get the following message:

Error in lm.wfit(x, y, w, offset = offset, singular.ok = singular.ok,  : 
  incompatible dimensions

Does this mean I have used manova() wrongly or does it not handle a matrix of 
weights as required here?

Thanks in advance and please CC replies to me


Michael Hopkins
Algorithm and Statistical Modelling Expert
 
Upstream
23 Old Bond Street
London
W1S 4PZ


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] creating 'all' sum contrasts

2010-10-15 Thread Michael Hopkins


OK, my last question didn't get any replies so I am going to try and ask a 
different way.

When I generate contrasts with contr.sum() for a 3 level categorical variable I 
get the 2 orthogonal contrasts:

 contr.sum( c(1,2,3) )
  [,1] [,2]
110
201
3   -1   -1

This provides the contrasts 1-3 and 2-3 as expected.  But I also want it to 
create 1-2 (i.e. 1-3 - 2-3).  So in general I want all possible 
orthogonal contrasts - think of it as the contrasts for all pairwise 
comparisons between the levels.

Are there are any options for contrast() or other functions/libraries that will 
allow me to do this automatically?  I could go through and create new columns 
but I am using this for complex multi-factor experiments with varying levels 
per factor and fitting the models within other functions (e.g. regsubsets()) so 
an automatic solution using what is already available would be far preferable.


Michael Hopkins
Algorithm and Statistical Modelling Expert
 
Upstream
23 Old Bond Street
London
W1S 4PZ


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] creating 'all' sum contrasts

2010-10-15 Thread Michael Hopkins

On 15 Oct 2010, at 13:55, Berwin A Turlach wrote:

 G'day Michael,
 

Hi Berwin

Thanks for the reply

 On Fri, 15 Oct 2010 12:09:07 +0100
 Michael Hopkins hopk...@upstreamsystems.com wrote:
 
 OK, my last question didn't get any replies so I am going to try and
 ask a different way.
 
 When I generate contrasts with contr.sum() for a 3 level categorical
 variable I get the 2 orthogonal contrasts:
 
 contr.sum( c(1,2,3) )
  [,1] [,2]
 110
 201
 3   -1   -1
 
 These two contrasts are *not* orthogonal.
 
I'm surprised.  Can you please tell me how you calculated that.

 This provides the contrasts 1-3 and 2-3 as expected.  But I also
 want it to create 1-2 (i.e. 1-3 - 2-3).  So in general I want
 all possible orthogonal contrasts - think of it as the contrasts for
 all pairwise comparisons between the levels.
 
 You have to decide what you want.  The contrasts for all pairwise
 comparaisons between the levels or all possible orthogonal contrasts?
 

Well the pairwise contrasts are the most important as I am looking for evidence 
of whether they are zero (i.e. no difference between levels) or not.  But see 
my above comment about orthogonality.

 The latter is actually not well defined.  For a factor with p levels,
 there would be p orthogonal contrasts, which are only identifiable up to
 rotation, hence infinitely many such sets. But there are p(p-1)
 pairwise comparisons. So unless p=2, yo have to decide what you want
 
Well of course the pairwise comparisons are bi-directional so in fact only 
p(p-1)/2 are of interest to me.

 Are there are any options for contrast() or other functions/libraries
 that will allow me to do this automatically?
 
 Look at package multcomp, in particular functions glht and mcp, these
 might help.
 
Thanks I will have a look.  

But I want to be able to do this transparently within lm() using regsubsets() 
etc as I am collecting large quantities of summary stats from all possible 
models to use with a model choice criterion based upon true Bayesian model 
probabilities.

 Cheers,
 
   Berwin
 
 == Full address 
 Berwin A Turlach  Tel.: +61 (8) 6488 3338 (secr)
 School of Maths and Stats (M019)+61 (8) 6488 3383 (self)
 The University of Western Australia   FAX : +61 (8) 6488 1028
 35 Stirling Highway   
 Crawley WA 6009e-mail: ber...@maths.uwa.edu.au
 Australiahttp://www.maths.uwa.edu.au/~berwin




Michael Hopkins
Algorithm and Statistical Modelling Expert
 
Upstream
23 Old Bond Street
London
W1S 4PZ

Mob +44 0782 578 7220
DL   +44 0207 290 1326
Fax  +44 0207 290 1321

hopk...@upstreamsystems.com
www.upstreamsystems.com
 



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] general construction of 'all pairwise comparison' contrast in ANOVA

2010-10-12 Thread Michael Hopkins


Hi R people

I am using regsubsets() to fit large numbers of models and collect summary 
statistics in order to perform a Bayesian analysis of multi-way ANOVA with 
specific prior information. In general the variables have differing numbers of 
levels =2.  This works well but with variable of more than 2 levels there are 
naturally some arbitrary decisions about which treatment contrasts to include 
in the model matrix.  It's obviously easy to choose the 'standard' contrasts 
using 

options( contrasts = c( contr.sum, contr.poly ))

with contr.helmert, contr.sum or contr.treatment as required.

However, what I would really like is a set of all pairwise comparison 
contrasts, and preferably in a form that can be fed straight to regsubsets() so 
that all labelling, fitting, output etc is automatically dealt with.

Are there any functions or libraries in R that will allow me to do this without 
too much grief?

Thanks in advance


Michael Hopkins
Algorithm and Statistical Modelling Expert
 
Upstream
23 Old Bond Street
London
W1S 4PZ

Mob +44 0782 578 7220
DL   +44 0207 290 1326
Fax  +44 0207 290 1321

hopk...@upstreamsystems.com
www.upstreamsystems.com
 


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] 'all subsets' fitting algorithm for Bayesian approach

2010-10-04 Thread Michael Hopkins

Bert

Thanks for your reply but I already have the Box  Steinberg work in my 
extensive library of 'homework'  :o)

I have some problem specific priors that I need to use for calculating model 
probabilities so that I can produce predictive distributions using Bayesian 
model averaging - hence I need to be able to extract the key summary stats (as 
an analogue of the likelihood) for each model from an exhaustive selection of 
possible model terms.  

If this is available in R already then I would love to hear about it.  I am 
aware of leaps() and regsubsets() but I am not sure that provides exactly what 
I need here, though it may be possible to adapt it somehow.

Michael


On 1 Oct 2010, at 18:32, Bert Gunter wrote:

 u... You are reinventing the wheel. In fact, several wheels: the
 statistical literature already has several different approaches worked
 out for this. For example, George Box and David Steinberg did one
 about 20 years ago, and it has been incorporated as one of the options
 in the JMP DOE model choice procedure.
 
 So do your homework and save yourself some effort. Maybe even all your effort.
 
 -- Bert
 
 On Fri, Oct 1, 2010 at 7:02 AM, Michael Hopkins
 hopk...@upstreamsystems.com wrote:
 
 Hi R experts
 
 I am just wondering if something is already available (or easily adaptable) 
 to do the following.
 
 I am planning to build linear models for all possible combinations of terms, 
 so for example if the terms are sent into a function as this string
 
  X1 + X2 + X3 + X4 + X1:X2
 
 I would want to build models for all possible combinations of these 5 terms, 
 e.g.
 
m1 - lm( y ~ X1 + X3 )
 
 and capture at least the residual sum of squares and total number of model 
 parameters from each model produced.  This will become part of a Bayesian 
 approach to infer actual model probabilities when specialist prior knowledge 
 is also introduced into the problem.
 
 At a high level this particular problem requires something like:
 
 1) the term 'string' to be broken down into it's elements which are 
 separated by + and, I suppose, stored in a list for easier manipulation
 
 2) a matrix with 2^5 rows and 5 columns to be formed with a 0 present if the 
 term is not included and 1 if it is.  Then a model will be fitted to 
 represent every row of this matrix and the key statistics stored in vectors 
 of length 2^5
 
 For N terms of course the number of models will be 2^N.
 
 Is there anything available already?  This is a very similar problem to all 
 subsets regression.
 
 My skill at manipulating strings in R is very limited; can anyone recommend 
 some links or available functions which would make the separations and 
 constructions required easy to achieve?
 
 Thanks in advance to all
 
 
 Michael Hopkins
 Algorithm and Statistical Modelling Expert
 
 
 -- 
 Bert Gunter
 Genentech Nonclinical Biostatistics
 467-7374
 http://devo.gene.com/groups/devo/depts/ncb/home.shtml




Michael Hopkins
Algorithm and Statistical Modelling Expert
 
Upstream
23 Old Bond Street
London
W1S 4PZ

Mob +44 0782 578 7220
DL   +44 0207 290 1326
Fax  +44 0207 290 1321

hopk...@upstreamsystems.com
www.upstreamsystems.com
 
IMPORTANT NOTICE
The information in this e-mail and any attached files is...{{dropped:22}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] 'all subsets' fitting algorithm for Bayesian approach

2010-10-01 Thread Michael Hopkins

Hi R experts

I am just wondering if something is already available (or easily adaptable) to 
do the following.

I am planning to build linear models for all possible combinations of terms, so 
for example if the terms are sent into a function as this string

 X1 + X2 + X3 + X4 + X1:X2

I would want to build models for all possible combinations of these 5 terms, 
e.g. 

m1 - lm( y ~ X1 + X3 )

and capture at least the residual sum of squares and total number of model 
parameters from each model produced.  This will become part of a Bayesian 
approach to infer actual model probabilities when specialist prior knowledge is 
also introduced into the problem.

At a high level this particular problem requires something like:

1) the term 'string' to be broken down into it's elements which are separated 
by + and, I suppose, stored in a list for easier manipulation

2) a matrix with 2^5 rows and 5 columns to be formed with a 0 present if the 
term is not included and 1 if it is.  Then a model will be fitted to represent 
every row of this matrix and the key statistics stored in vectors of length 2^5

For N terms of course the number of models will be 2^N.

Is there anything available already?  This is a very similar problem to all 
subsets regression.  

My skill at manipulating strings in R is very limited; can anyone recommend 
some links or available functions which would make the separations and 
constructions required easy to achieve?

Thanks in advance to all


Michael Hopkins
Algorithm and Statistical Modelling Expert
 
Upstream
23 Old Bond Street
London
W1S 4PZ

Mob +44 0782 578 7220
DL   +44 0207 290 1326
Fax  +44 0207 290 1321

hopk...@upstreamsystems.com
www.upstreamsystems.com
 
IMPORTANT NOTICE
The information in this e-mail and any attached files is...{{dropped:22}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Some questions about string processing

2010-09-27 Thread Michael Hopkins


Thanks Phil, very helpful and works as advertised.  

Any thoughts on the second question?

(P.S. for anyone digging this up in the future, there's a comma missing after 
the formula in lm())


On 24 Sep 2010, at 18:16, Phil Spector wrote:

 Michael -
   You're doing too much work half the time, and not enough
 the other half :-).  Try this (untested):
 
 auto_io = function(data_name,factors){
   resp_data =  read.table(data_name, header = TRUE )
   temp_model = lm(formula(paste('y',factors,sep='~')) data = resp_data )
   . . .
 
   - Phil Spector
Statistical Computing Facility
Department of Statistics
UC Berkeley
spec...@stat.berkeley.edu
 
 
 On Fri, 24 Sep 2010, Michael Hopkins wrote:
 
 
 
 Hi all
 
 A couple of questions about string processing from someone who has only 
 scratched the surface so far.
 
 1) I am wanting to send some strings into a function to allow flexibility 
 inside.  My first idea has been e.g.
 
 auto_io - function( var_string, factors ) {
 #  e.g. var_string sent as test_file.txt  factors sent as x1 + x2 + x3
 
 # input
  data_name   - get( var_string )
 
 #term_list   - get( factors )
 
  resp_data   - read.table( data_name, 
 header = TRUE )
 
 # fit
  temp_model  -  lm( y ~ factors, data = resp_data )
 
  etc...
 
 Neither the read.table() nor the lm() are working because in each case the 
 string is not converted or understood properly.  I'm sure this is possible 
 (and in fact probably easy) but haven't yet seen the light.  Could someone 
 illuminate me here?
 
 2) I will be wanting to process strings as lists of strings.  For instance, 
 I might instead want to send factors above as x1  x2  x3 and add the + 
 when I need them, or perhaps * instead, and also look at sub-models by 
 removing parts of the string etc.  What functions should I be looking at 
 here and are there any examples available?
 
 Thanks in advance.  Feel free to CC me on your reply.
 
 
 Michael Hopkins
 Algorithm and Statistical Modelling Expert
 
 Upstream
 23 Old Bond Street
 London
 W1S 4PZ
 
 Mob +44 0782 578 7220
 DL   +44 0207 290 1326
 Fax  +44 0207 290 1321
 
 hopk...@upstreamsystems.com
 www.upstreamsystems.com
 
 IMPORTANT NOTICE
 The information in this e-mail and any attached files is...{{dropped:22}}
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 




Michael Hopkins
Algorithm and Statistical Modelling Expert
 
Upstream
23 Old Bond Street
London
W1S 4PZ

Mob +44 0782 578 7220
DL   +44 0207 290 1326
Fax  +44 0207 290 1321

hopk...@upstreamsystems.com
www.upstreamsystems.com
 
IMPORTANT NOTICE
The information in this e-mail and any attached files is CONFIDENTIAL and may 
be legally privileged or prohibited from disclosure and unauthorised use. The 
views of the author may not necessarily reflect those of Upstream S.A. It is 
intended solely for the addressee, or the employee or agent responsible for 
delivering such materials to the addressee. If you have received this message 
in error please return it to the sender then delete the email and destroy any 
copies of it. If you are not the intended recipient, any form of reproduction, 
dissemination, copying, disclosure, modification, distribution and/or 
publication, or any action taken or omitted to be taken in reliance upon this 
message or its attachments is prohibited and may be unlawful. At present the 
integrity of e-mail across the Internet cannot be guaranteed and messages sent 
via this medium are potentially at risk. All liability is excluded to the 
extent permitted by law for any claims arising as a result of the u!
 se of this medium to transmit information by or to Upstream S.A.




[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Some questions about string processing

2010-09-24 Thread Michael Hopkins


Hi all

A couple of questions about string processing from someone who has only 
scratched the surface so far.

1) I am wanting to send some strings into a function to allow flexibility 
inside.  My first idea has been e.g.

auto_io - function( var_string, factors ) {
#  e.g. var_string sent as test_file.txt  factors sent as x1 + x2 + x3

# input
data_name   - get( var_string )

#   term_list   - get( factors )

resp_data   - read.table( data_name, 
header = TRUE )

# fit
temp_model  -  lm( y ~ factors, data = resp_data )

etc...

Neither the read.table() nor the lm() are working because in each case the 
string is not converted or understood properly.  I'm sure this is possible (and 
in fact probably easy) but haven't yet seen the light.  Could someone 
illuminate me here?

2) I will be wanting to process strings as lists of strings.  For instance, I 
might instead want to send factors above as x1  x2  x3 and add the + when I 
need them, or perhaps * instead, and also look at sub-models by removing 
parts of the string etc.  What functions should I be looking at here and are 
there any examples available?

Thanks in advance.  Feel free to CC me on your reply.


Michael Hopkins
Algorithm and Statistical Modelling Expert
 
Upstream
23 Old Bond Street
London
W1S 4PZ

Mob +44 0782 578 7220
DL   +44 0207 290 1326
Fax  +44 0207 290 1321

hopk...@upstreamsystems.com
www.upstreamsystems.com
 
IMPORTANT NOTICE
The information in this e-mail and any attached files is...{{dropped:22}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] subset inside a lattice plot using panel.lines

2008-08-04 Thread Michael Hopkins


Hi R people

Pulling my hair out here trying to get something very simple to work.   
Have a data frame of 774 rows and want to plot first and second half  
on same axes with different colours.  A variable is present call 'row'  
created like this and checked to be OK:

row - seq( len=length( variable1 ) )

...so I just want to plot the two subsets where row = 387 and where  
row  387.  None of these work properly:

plots both over each other in correct colours
opt_plot2 - function( var_string, c, units ) {
print( xyplot( get(var_string) ~ variable1 | variable2, panel =
function( x, y, ... ) {
panel.xyplot( x, y, type =n, ... )
panel.grid( h=-1, v=-1, lty=3, lwd=1, col=grey )
panel.lines( x, y, col = red, subset = row = 387 )
panel.lines( x, y, col = dark green, subset = row  387 )
},
}

plots both just in red
... panel.lines( x[row = 387], y[row = 387], col = red )
panel.lines( x[row  387], y[row  387], col = dark green )

first - (row = 387)
second - (row  387)

plots both over each other in correct colours
... panel.lines( x, y, col = red, subset = first )
panel.lines( x, y, col = dark green, subset = second )

plots both just in red
... panel.lines( x[first], y[first], col = red )
panel.lines( x[second], y[second], col = dark green )


I'm feeling frustrated and a bit stupid but should this be so  
difficult?  Any help or tips on what I am doing wrong greatly  
appreciated.

TIA

Michael

__

 Hopkins Research  Touch the Future
__


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] subset inside a lattice plot using panel.lines

2008-08-04 Thread Michael Hopkins



Hi R people [duplicate - sorry, just posted HTML by mistake]

Pulling my hair out here trying to get something very simple to work.   
Have a data frame of 774 rows and want to plot first and second half  
on same axes with different colours.  A variable is present call 'row'  
created like this and checked to be OK:


row - seq( len=length( variable1 ) )

...so I just want to plot the two subsets where row = 387 and where  
row  387.  None of these work properly:


plots both over each other in correct colours
opt_plot2 - function( var_string, c, units ) {
print( xyplot( get(var_string) ~ variable1 | variable2, panel =
function( x, y, ... ) {
panel.xyplot( x, y, type =n, ... )
panel.grid( h=-1, v=-1, lty=3, lwd=1, col=grey )
panel.lines( x, y, col = red, subset = row = 387 )
panel.lines( x, y, col = dark green, subset = row  387 )
},
}

plots both just in red
... panel.lines( x[row = 387], y[row = 387], col = red )
panel.lines( x[row  387], y[row  387], col = dark 
green )

first - (row = 387)
second - (row  387)

plots both over each other in correct colours
... panel.lines( x, y, col = red, subset = first )
panel.lines( x, y, col = dark green, subset = second )
plots both just in red
... panel.lines( x[first], y[first], col = red )
panel.lines( x[second], y[second], col = dark green )


I'm feeling frustrated and a bit stupid but should this be so  
difficult?  Any help or tips on what I am doing wrong greatly  
appreciated.


TIA

Michael

__

Hopkins Research  Touch the Future

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] dev.off() inside a function other glitches

2008-07-02 Thread Michael Hopkins


Thanks to all who replied on list and to me directly.

You are right, FAQ 7.22 is the answer.  For any others who get caught  
by this it's important to print(..) a plot with a function, not just  
create it, i.e.


print( xyplot(...   ) )

not just

xyplot(... )

..otherwise it doesn't necessarily get sent to the opened device.

Michael


On 2 Jul 2008, at 01:06, Deepayan Sarkar wrote:

On 7/1/08, Michael Hopkins [EMAIL PROTECTED]  
wrote:



Hi R people

I am using a function to create a pdf device, then send a lot of  
plots

to it in a loop then a last lattice xyplot (itself within a function)
outside the loop and finally call dev.off() to write to the file.
This works well apart from the fact that the last plot does not get
sent to the file unless I comment out dev.off() and then apply it in
the console afterwards instead:

   plot_stuff( ...); dev.off()

The device is opened like this:

   pdf( paste( var_string, .pdf, sep= ), onefile=TRUE,  
paper=a4r,

width=9, height=6.5 )

Also, if I try to send two different xyplots after the loop only the
last one ever gets written to the file, whether or not I apply the
dev.off() trick above.

Any thoughts on why this stuff happens and best ways to avoid it are
appreciated.


This sounds like FAQ 7.22:

http://cran.us.r-project.org/doc/FAQ/R-FAQ.html#Why-do-lattice_002ftrellis-graphics-not-work_003f

The dev.off() issue is most likely a red herring; the important bit is
that the xyplot() call was not the last expression in your function.

-Deepayan





Web:http://www.hopkins-research.com/
 Office:+44 (O)1732 228844
 Mobile:+44 (O)7813 467381

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] dev.off() inside a function other glitches

2008-07-01 Thread Michael Hopkins


Hi R people

I am using a function to create a pdf device, then send a lot of plots  
to it in a loop then a last lattice xyplot (itself within a function)  
outside the loop and finally call dev.off() to write to the file.   
This works well apart from the fact that the last plot does not get  
sent to the file unless I comment out dev.off() and then apply it in  
the console afterwards instead:

plot_stuff( ...); dev.off()

The device is opened like this:

pdf( paste( var_string, .pdf, sep= ), onefile=TRUE, paper=a4r,  
width=9, height=6.5 )

Also, if I try to send two different xyplots after the loop only the  
last one ever gets written to the file, whether or not I apply the  
dev.off() trick above.

Any thoughts on why this stuff happens and best ways to avoid it are  
appreciated.

Other info - R 2.7.1 on Intel Mac 10.5.3.

TIA

Michael

__

 Hopkins Research  Touch the Future
__


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.