[R] multiple weights in MANOVA
Hi all I have 7 responses that I want to fit with MANOVA. I have accumulated the responses into a matrix Y and each response has a weight vector associated with it, so I have accumulated the weights into a matrix WT of the same dimensions as Y. When I try fit - manova( Y ~ X1 + X2 + X3, weights = WT ) I get the following message: Error in lm.wfit(x, y, w, offset = offset, singular.ok = singular.ok, : incompatible dimensions Does this mean I have used manova() wrongly or does it not handle a matrix of weights as required here? Thanks in advance and please CC replies to me Michael Hopkins Algorithm and Statistical Modelling Expert Upstream 23 Old Bond Street London W1S 4PZ [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] creating 'all' sum contrasts
OK, my last question didn't get any replies so I am going to try and ask a different way. When I generate contrasts with contr.sum() for a 3 level categorical variable I get the 2 orthogonal contrasts: contr.sum( c(1,2,3) ) [,1] [,2] 110 201 3 -1 -1 This provides the contrasts 1-3 and 2-3 as expected. But I also want it to create 1-2 (i.e. 1-3 - 2-3). So in general I want all possible orthogonal contrasts - think of it as the contrasts for all pairwise comparisons between the levels. Are there are any options for contrast() or other functions/libraries that will allow me to do this automatically? I could go through and create new columns but I am using this for complex multi-factor experiments with varying levels per factor and fitting the models within other functions (e.g. regsubsets()) so an automatic solution using what is already available would be far preferable. Michael Hopkins Algorithm and Statistical Modelling Expert Upstream 23 Old Bond Street London W1S 4PZ [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] creating 'all' sum contrasts
On 15 Oct 2010, at 13:55, Berwin A Turlach wrote: G'day Michael, Hi Berwin Thanks for the reply On Fri, 15 Oct 2010 12:09:07 +0100 Michael Hopkins hopk...@upstreamsystems.com wrote: OK, my last question didn't get any replies so I am going to try and ask a different way. When I generate contrasts with contr.sum() for a 3 level categorical variable I get the 2 orthogonal contrasts: contr.sum( c(1,2,3) ) [,1] [,2] 110 201 3 -1 -1 These two contrasts are *not* orthogonal. I'm surprised. Can you please tell me how you calculated that. This provides the contrasts 1-3 and 2-3 as expected. But I also want it to create 1-2 (i.e. 1-3 - 2-3). So in general I want all possible orthogonal contrasts - think of it as the contrasts for all pairwise comparisons between the levels. You have to decide what you want. The contrasts for all pairwise comparaisons between the levels or all possible orthogonal contrasts? Well the pairwise contrasts are the most important as I am looking for evidence of whether they are zero (i.e. no difference between levels) or not. But see my above comment about orthogonality. The latter is actually not well defined. For a factor with p levels, there would be p orthogonal contrasts, which are only identifiable up to rotation, hence infinitely many such sets. But there are p(p-1) pairwise comparisons. So unless p=2, yo have to decide what you want Well of course the pairwise comparisons are bi-directional so in fact only p(p-1)/2 are of interest to me. Are there are any options for contrast() or other functions/libraries that will allow me to do this automatically? Look at package multcomp, in particular functions glht and mcp, these might help. Thanks I will have a look. But I want to be able to do this transparently within lm() using regsubsets() etc as I am collecting large quantities of summary stats from all possible models to use with a model choice criterion based upon true Bayesian model probabilities. Cheers, Berwin == Full address Berwin A Turlach Tel.: +61 (8) 6488 3338 (secr) School of Maths and Stats (M019)+61 (8) 6488 3383 (self) The University of Western Australia FAX : +61 (8) 6488 1028 35 Stirling Highway Crawley WA 6009e-mail: ber...@maths.uwa.edu.au Australiahttp://www.maths.uwa.edu.au/~berwin Michael Hopkins Algorithm and Statistical Modelling Expert Upstream 23 Old Bond Street London W1S 4PZ Mob +44 0782 578 7220 DL +44 0207 290 1326 Fax +44 0207 290 1321 hopk...@upstreamsystems.com www.upstreamsystems.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] general construction of 'all pairwise comparison' contrast in ANOVA
Hi R people I am using regsubsets() to fit large numbers of models and collect summary statistics in order to perform a Bayesian analysis of multi-way ANOVA with specific prior information. In general the variables have differing numbers of levels =2. This works well but with variable of more than 2 levels there are naturally some arbitrary decisions about which treatment contrasts to include in the model matrix. It's obviously easy to choose the 'standard' contrasts using options( contrasts = c( contr.sum, contr.poly )) with contr.helmert, contr.sum or contr.treatment as required. However, what I would really like is a set of all pairwise comparison contrasts, and preferably in a form that can be fed straight to regsubsets() so that all labelling, fitting, output etc is automatically dealt with. Are there any functions or libraries in R that will allow me to do this without too much grief? Thanks in advance Michael Hopkins Algorithm and Statistical Modelling Expert Upstream 23 Old Bond Street London W1S 4PZ Mob +44 0782 578 7220 DL +44 0207 290 1326 Fax +44 0207 290 1321 hopk...@upstreamsystems.com www.upstreamsystems.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] 'all subsets' fitting algorithm for Bayesian approach
Bert Thanks for your reply but I already have the Box Steinberg work in my extensive library of 'homework' :o) I have some problem specific priors that I need to use for calculating model probabilities so that I can produce predictive distributions using Bayesian model averaging - hence I need to be able to extract the key summary stats (as an analogue of the likelihood) for each model from an exhaustive selection of possible model terms. If this is available in R already then I would love to hear about it. I am aware of leaps() and regsubsets() but I am not sure that provides exactly what I need here, though it may be possible to adapt it somehow. Michael On 1 Oct 2010, at 18:32, Bert Gunter wrote: u... You are reinventing the wheel. In fact, several wheels: the statistical literature already has several different approaches worked out for this. For example, George Box and David Steinberg did one about 20 years ago, and it has been incorporated as one of the options in the JMP DOE model choice procedure. So do your homework and save yourself some effort. Maybe even all your effort. -- Bert On Fri, Oct 1, 2010 at 7:02 AM, Michael Hopkins hopk...@upstreamsystems.com wrote: Hi R experts I am just wondering if something is already available (or easily adaptable) to do the following. I am planning to build linear models for all possible combinations of terms, so for example if the terms are sent into a function as this string X1 + X2 + X3 + X4 + X1:X2 I would want to build models for all possible combinations of these 5 terms, e.g. m1 - lm( y ~ X1 + X3 ) and capture at least the residual sum of squares and total number of model parameters from each model produced. This will become part of a Bayesian approach to infer actual model probabilities when specialist prior knowledge is also introduced into the problem. At a high level this particular problem requires something like: 1) the term 'string' to be broken down into it's elements which are separated by + and, I suppose, stored in a list for easier manipulation 2) a matrix with 2^5 rows and 5 columns to be formed with a 0 present if the term is not included and 1 if it is. Then a model will be fitted to represent every row of this matrix and the key statistics stored in vectors of length 2^5 For N terms of course the number of models will be 2^N. Is there anything available already? This is a very similar problem to all subsets regression. My skill at manipulating strings in R is very limited; can anyone recommend some links or available functions which would make the separations and constructions required easy to achieve? Thanks in advance to all Michael Hopkins Algorithm and Statistical Modelling Expert -- Bert Gunter Genentech Nonclinical Biostatistics 467-7374 http://devo.gene.com/groups/devo/depts/ncb/home.shtml Michael Hopkins Algorithm and Statistical Modelling Expert Upstream 23 Old Bond Street London W1S 4PZ Mob +44 0782 578 7220 DL +44 0207 290 1326 Fax +44 0207 290 1321 hopk...@upstreamsystems.com www.upstreamsystems.com IMPORTANT NOTICE The information in this e-mail and any attached files is...{{dropped:22}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] 'all subsets' fitting algorithm for Bayesian approach
Hi R experts I am just wondering if something is already available (or easily adaptable) to do the following. I am planning to build linear models for all possible combinations of terms, so for example if the terms are sent into a function as this string X1 + X2 + X3 + X4 + X1:X2 I would want to build models for all possible combinations of these 5 terms, e.g. m1 - lm( y ~ X1 + X3 ) and capture at least the residual sum of squares and total number of model parameters from each model produced. This will become part of a Bayesian approach to infer actual model probabilities when specialist prior knowledge is also introduced into the problem. At a high level this particular problem requires something like: 1) the term 'string' to be broken down into it's elements which are separated by + and, I suppose, stored in a list for easier manipulation 2) a matrix with 2^5 rows and 5 columns to be formed with a 0 present if the term is not included and 1 if it is. Then a model will be fitted to represent every row of this matrix and the key statistics stored in vectors of length 2^5 For N terms of course the number of models will be 2^N. Is there anything available already? This is a very similar problem to all subsets regression. My skill at manipulating strings in R is very limited; can anyone recommend some links or available functions which would make the separations and constructions required easy to achieve? Thanks in advance to all Michael Hopkins Algorithm and Statistical Modelling Expert Upstream 23 Old Bond Street London W1S 4PZ Mob +44 0782 578 7220 DL +44 0207 290 1326 Fax +44 0207 290 1321 hopk...@upstreamsystems.com www.upstreamsystems.com IMPORTANT NOTICE The information in this e-mail and any attached files is...{{dropped:22}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Some questions about string processing
Thanks Phil, very helpful and works as advertised. Any thoughts on the second question? (P.S. for anyone digging this up in the future, there's a comma missing after the formula in lm()) On 24 Sep 2010, at 18:16, Phil Spector wrote: Michael - You're doing too much work half the time, and not enough the other half :-). Try this (untested): auto_io = function(data_name,factors){ resp_data = read.table(data_name, header = TRUE ) temp_model = lm(formula(paste('y',factors,sep='~')) data = resp_data ) . . . - Phil Spector Statistical Computing Facility Department of Statistics UC Berkeley spec...@stat.berkeley.edu On Fri, 24 Sep 2010, Michael Hopkins wrote: Hi all A couple of questions about string processing from someone who has only scratched the surface so far. 1) I am wanting to send some strings into a function to allow flexibility inside. My first idea has been e.g. auto_io - function( var_string, factors ) { # e.g. var_string sent as test_file.txt factors sent as x1 + x2 + x3 # input data_name - get( var_string ) #term_list - get( factors ) resp_data - read.table( data_name, header = TRUE ) # fit temp_model - lm( y ~ factors, data = resp_data ) etc... Neither the read.table() nor the lm() are working because in each case the string is not converted or understood properly. I'm sure this is possible (and in fact probably easy) but haven't yet seen the light. Could someone illuminate me here? 2) I will be wanting to process strings as lists of strings. For instance, I might instead want to send factors above as x1 x2 x3 and add the + when I need them, or perhaps * instead, and also look at sub-models by removing parts of the string etc. What functions should I be looking at here and are there any examples available? Thanks in advance. Feel free to CC me on your reply. Michael Hopkins Algorithm and Statistical Modelling Expert Upstream 23 Old Bond Street London W1S 4PZ Mob +44 0782 578 7220 DL +44 0207 290 1326 Fax +44 0207 290 1321 hopk...@upstreamsystems.com www.upstreamsystems.com IMPORTANT NOTICE The information in this e-mail and any attached files is...{{dropped:22}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Michael Hopkins Algorithm and Statistical Modelling Expert Upstream 23 Old Bond Street London W1S 4PZ Mob +44 0782 578 7220 DL +44 0207 290 1326 Fax +44 0207 290 1321 hopk...@upstreamsystems.com www.upstreamsystems.com IMPORTANT NOTICE The information in this e-mail and any attached files is CONFIDENTIAL and may be legally privileged or prohibited from disclosure and unauthorised use. The views of the author may not necessarily reflect those of Upstream S.A. It is intended solely for the addressee, or the employee or agent responsible for delivering such materials to the addressee. If you have received this message in error please return it to the sender then delete the email and destroy any copies of it. If you are not the intended recipient, any form of reproduction, dissemination, copying, disclosure, modification, distribution and/or publication, or any action taken or omitted to be taken in reliance upon this message or its attachments is prohibited and may be unlawful. At present the integrity of e-mail across the Internet cannot be guaranteed and messages sent via this medium are potentially at risk. All liability is excluded to the extent permitted by law for any claims arising as a result of the u! se of this medium to transmit information by or to Upstream S.A. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Some questions about string processing
Hi all A couple of questions about string processing from someone who has only scratched the surface so far. 1) I am wanting to send some strings into a function to allow flexibility inside. My first idea has been e.g. auto_io - function( var_string, factors ) { # e.g. var_string sent as test_file.txt factors sent as x1 + x2 + x3 # input data_name - get( var_string ) # term_list - get( factors ) resp_data - read.table( data_name, header = TRUE ) # fit temp_model - lm( y ~ factors, data = resp_data ) etc... Neither the read.table() nor the lm() are working because in each case the string is not converted or understood properly. I'm sure this is possible (and in fact probably easy) but haven't yet seen the light. Could someone illuminate me here? 2) I will be wanting to process strings as lists of strings. For instance, I might instead want to send factors above as x1 x2 x3 and add the + when I need them, or perhaps * instead, and also look at sub-models by removing parts of the string etc. What functions should I be looking at here and are there any examples available? Thanks in advance. Feel free to CC me on your reply. Michael Hopkins Algorithm and Statistical Modelling Expert Upstream 23 Old Bond Street London W1S 4PZ Mob +44 0782 578 7220 DL +44 0207 290 1326 Fax +44 0207 290 1321 hopk...@upstreamsystems.com www.upstreamsystems.com IMPORTANT NOTICE The information in this e-mail and any attached files is...{{dropped:22}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] subset inside a lattice plot using panel.lines
Hi R people Pulling my hair out here trying to get something very simple to work. Have a data frame of 774 rows and want to plot first and second half on same axes with different colours. A variable is present call 'row' created like this and checked to be OK: row - seq( len=length( variable1 ) ) ...so I just want to plot the two subsets where row = 387 and where row 387. None of these work properly: plots both over each other in correct colours opt_plot2 - function( var_string, c, units ) { print( xyplot( get(var_string) ~ variable1 | variable2, panel = function( x, y, ... ) { panel.xyplot( x, y, type =n, ... ) panel.grid( h=-1, v=-1, lty=3, lwd=1, col=grey ) panel.lines( x, y, col = red, subset = row = 387 ) panel.lines( x, y, col = dark green, subset = row 387 ) }, } plots both just in red ... panel.lines( x[row = 387], y[row = 387], col = red ) panel.lines( x[row 387], y[row 387], col = dark green ) first - (row = 387) second - (row 387) plots both over each other in correct colours ... panel.lines( x, y, col = red, subset = first ) panel.lines( x, y, col = dark green, subset = second ) plots both just in red ... panel.lines( x[first], y[first], col = red ) panel.lines( x[second], y[second], col = dark green ) I'm feeling frustrated and a bit stupid but should this be so difficult? Any help or tips on what I am doing wrong greatly appreciated. TIA Michael __ Hopkins Research Touch the Future __ [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] subset inside a lattice plot using panel.lines
Hi R people [duplicate - sorry, just posted HTML by mistake] Pulling my hair out here trying to get something very simple to work. Have a data frame of 774 rows and want to plot first and second half on same axes with different colours. A variable is present call 'row' created like this and checked to be OK: row - seq( len=length( variable1 ) ) ...so I just want to plot the two subsets where row = 387 and where row 387. None of these work properly: plots both over each other in correct colours opt_plot2 - function( var_string, c, units ) { print( xyplot( get(var_string) ~ variable1 | variable2, panel = function( x, y, ... ) { panel.xyplot( x, y, type =n, ... ) panel.grid( h=-1, v=-1, lty=3, lwd=1, col=grey ) panel.lines( x, y, col = red, subset = row = 387 ) panel.lines( x, y, col = dark green, subset = row 387 ) }, } plots both just in red ... panel.lines( x[row = 387], y[row = 387], col = red ) panel.lines( x[row 387], y[row 387], col = dark green ) first - (row = 387) second - (row 387) plots both over each other in correct colours ... panel.lines( x, y, col = red, subset = first ) panel.lines( x, y, col = dark green, subset = second ) plots both just in red ... panel.lines( x[first], y[first], col = red ) panel.lines( x[second], y[second], col = dark green ) I'm feeling frustrated and a bit stupid but should this be so difficult? Any help or tips on what I am doing wrong greatly appreciated. TIA Michael __ Hopkins Research Touch the Future __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] dev.off() inside a function other glitches
Thanks to all who replied on list and to me directly. You are right, FAQ 7.22 is the answer. For any others who get caught by this it's important to print(..) a plot with a function, not just create it, i.e. print( xyplot(... ) ) not just xyplot(... ) ..otherwise it doesn't necessarily get sent to the opened device. Michael On 2 Jul 2008, at 01:06, Deepayan Sarkar wrote: On 7/1/08, Michael Hopkins [EMAIL PROTECTED] wrote: Hi R people I am using a function to create a pdf device, then send a lot of plots to it in a loop then a last lattice xyplot (itself within a function) outside the loop and finally call dev.off() to write to the file. This works well apart from the fact that the last plot does not get sent to the file unless I comment out dev.off() and then apply it in the console afterwards instead: plot_stuff( ...); dev.off() The device is opened like this: pdf( paste( var_string, .pdf, sep= ), onefile=TRUE, paper=a4r, width=9, height=6.5 ) Also, if I try to send two different xyplots after the loop only the last one ever gets written to the file, whether or not I apply the dev.off() trick above. Any thoughts on why this stuff happens and best ways to avoid it are appreciated. This sounds like FAQ 7.22: http://cran.us.r-project.org/doc/FAQ/R-FAQ.html#Why-do-lattice_002ftrellis-graphics-not-work_003f The dev.off() issue is most likely a red herring; the important bit is that the xyplot() call was not the last expression in your function. -Deepayan Web:http://www.hopkins-research.com/ Office:+44 (O)1732 228844 Mobile:+44 (O)7813 467381 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] dev.off() inside a function other glitches
Hi R people I am using a function to create a pdf device, then send a lot of plots to it in a loop then a last lattice xyplot (itself within a function) outside the loop and finally call dev.off() to write to the file. This works well apart from the fact that the last plot does not get sent to the file unless I comment out dev.off() and then apply it in the console afterwards instead: plot_stuff( ...); dev.off() The device is opened like this: pdf( paste( var_string, .pdf, sep= ), onefile=TRUE, paper=a4r, width=9, height=6.5 ) Also, if I try to send two different xyplots after the loop only the last one ever gets written to the file, whether or not I apply the dev.off() trick above. Any thoughts on why this stuff happens and best ways to avoid it are appreciated. Other info - R 2.7.1 on Intel Mac 10.5.3. TIA Michael __ Hopkins Research Touch the Future __ [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.