Re: [R] length of a string
sLengths <- with(dataFrame, nchar(as.character(SEQUENCE)))

Bill Venables
CSIRO Laboratories
PO Box 120, Cleveland, 4163, AUSTRALIA
Office Phone (email preferred): +61 7 3826 7251
Fax (if absolutely necessary): +61 7 3826 7304
Mobile: +61 4 8819 4402
Home Phone: +61 7 3286 7700
mailto:[EMAIL PROTECTED]
http://www.cmis.csiro.au/bill.venables/

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of João Fadista
Sent: Wednesday, 5 September 2007 11:51 PM
To: r-help@stat.math.ethz.ch
Subject: [R] length of a string

Dear all,

I would like to know how I can compute the length of a string in a data frame. Example:

  SEQUENCE               ID
  TGCTCCCATCTCCACGG      HR04FS00645
  ACTGAACTCCCATCTCCAAT   HR0595847847

I would like to know how to compute the length of each SEQUENCE.

Best regards,
João Fadista

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
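For completeness, a minimal run of the suggested one-liner; the data frame here is reconstructed from the example in the question:

```r
# Reconstructed from the question: a data frame with a SEQUENCE column
dataFrame <- data.frame(SEQUENCE = c("TGCTCCCATCTCCACGG", "ACTGAACTCCCATCTCCAAT"),
                        ID       = c("HR04FS00645", "HR0595847847"))

# nchar() counts characters; as.character() guards against SEQUENCE
# having been read in as a factor (the default in R at the time)
sLengths <- with(dataFrame, nchar(as.character(SEQUENCE)))
sLengths  # 17 20
```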
Re: [R] t-distribution
for the upper tail:

> 1 - pt(1.11, 9)
[1] 0.1478873

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Nair, Murlidharan T
Sent: Thursday, 2 August 2007 4:43 AM
To: r-help@stat.math.ethz.ch
Subject: [R] t-distribution

If I have a calculated t, can I get the probability associated with it using an R function by giving it the df and t? I know I can do the whole calculation using t.test() or get t quantiles using qt(). If t = 1.11 and df = 9, can I get the probability?

Thanks../Murli
Re: [R] t-distribution
Well, is t = 1.11 all that accurate in the first place? :-)

In fact, reading between the lines of the original enquiry, what the person probably wanted was something like

ta <- pt(-1.11, 9) + pt(1.11, 9, lower.tail = FALSE)

which is the two-sided t-test tail area. The teller of the parable will usually leave some things unexplained...

Bill.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Ben Bolker
Sent: Thursday, 2 August 2007 4:57 AM
To: r-help@stat.math.ethz.ch
Subject: Re: [R] t-distribution

Bill.Venables at csiro.au writes:

> for the upper tail:
> > 1 - pt(1.11, 9)
> [1] 0.1478873

wouldn't pt(1.11, 9, lower.tail=FALSE) be more accurate?
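A small sketch tying the thread together: by the symmetry of the t distribution, Bill's two-sided tail area equals twice the one-sided area, and for a t statistic this far from the extreme tails the lower.tail = FALSE form agrees with 1 - pt():

```r
t.stat <- 1.11
df <- 9

upper     <- pt(t.stat, df, lower.tail = FALSE)  # one-sided upper tail
two.sided <- pt(-t.stat, df) + upper             # Bill's two-sided tail area

all.equal(upper, 1 - pt(t.stat, df))             # agreement for a t this small
all.equal(two.sided, 2 * pt(-t.stat, df))        # symmetry of the t density
```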
Re: [R] automatically generating and accessing data frames of varying dimensions
I'm not sure why you want to do it this way (it would probably help if we had a more complete picture of what you are really trying to do), but here are a few possibilities for the questions you ask.

1. Generating the data frames:

rw <- c(4, 5, 2)
cl <- c(3, 3, 3)
for(i in 1:length(rw))
  assign(paste("auto.data", i, sep = "."),
         as.data.frame(array(0, dim = c(rw[i], cl[i]),
                             dimnames = list(NULL, paste("X", 1:cl[i], sep = "")))))

Check:

> auto.data.1
  X1 X2 X3
1  0  0  0
2  0  0  0
3  0  0  0
4  0  0  0
> auto.data.2
  X1 X2 X3
1  0  0  0
2  0  0  0
3  0  0  0
4  0  0  0
5  0  0  0
> auto.data.3
  X1 X2 X3
1  0  0  0
2  0  0  0

2. Filling them up (... are you sure you want to do it this way?). The simplest way is probably through an intermediary:

for(nam in paste("auto.data", 1:3, sep = ".")) {  # loop over the names
  tmp <- get(nam)
  for(i in 1:nrow(tmp))
    for(j in 1:ncol(tmp))
      tmp[i, j] <- i + j - i*j   # 'some value'
  assign(nam, tmp)
  rm(tmp)
}

Check:

> auto.data.1
  X1 X2 X3
1  1  1  1
2  1  0 -1
3  1 -1 -3
4  1 -2 -5
> auto.data.2
  X1 X2 X3
1  1  1  1
2  1  0 -1
3  1 -1 -3
4  1 -2 -5
5  1 -3 -7
> auto.data.3
  X1 X2 X3
1  1  1  1
2  1  0 -1

It may work, but I have to say I'm almost sure this is a mistake. There has to be a better way using the facilities that R provides for avoiding heavy loops like this. Just a hunch...

Bill Venables
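Bill's hunch about "a better way" can be sketched with a list of data frames, which avoids get()/assign() altogether; the names below are illustrative, not from the original post:

```r
rw <- c(4, 5, 2)   # row counts, as in the question
cl <- 3            # common column count

# one list, one data frame per element -- accessed as auto.data[[i]]
auto.data <- lapply(rw, function(r)
  as.data.frame(matrix(0, nrow = r, ncol = cl,
                       dimnames = list(NULL, paste("X", 1:cl, sep = "")))))

# fill each one in a vectorised way instead of a double loop
auto.data <- lapply(auto.data, function(d) {
  d[] <- outer(seq_len(nrow(d)), seq_len(ncol(d)),
               function(i, j) i + j - i*j)   # 'some value'
  d
})

auto.data[[2]]   # the second data frame, 5 rows by 3 columns
```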
The number of rows has been calculated beforehand and is different for each data frame (e.g. c(4,5,2)). The number of columns is known a priori and is the same for all data frames (e.g. c(3,3,3)). The resulting data frames could look something like this:

auto.data.1
  X1 X2 X3
1  0  0  0
2  0  0  0
3  0  0  0
4  0  0  0
auto.data.2
  X1 X2 X3
1  0  0  0
2  0  0  0
3  0  0  0
4  0  0  0
5  0  0  0
auto.data.3
  X1 X2 X3
1  0  0  0
2  0  0  0

Later, I want to fill the elements of the data frames with values read from somewhere else, automatically looping through the previously generated data frames. I know that I can automatically generate variables with the right number of elements with something like this:

> auto.length <- c(12, 15, 6)
> for(i in 1:3) {
+   nam <- paste("auto.data", i, sep = ".")
+   assign(nam, 1:auto.length[i])
+ }
> auto.data.1
 [1]  1  2  3  4  5  6  7  8  9 10 11 12
> auto.data.2
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
> auto.data.3
[1] 1 2 3 4 5 6

But how do I turn these variables into data frames or give them any dimensions? Commands such as 'as.matrix', 'data.frame', or 'dim' do not seem to work. I also seem unable to access the variables with something like auto.data.i, since:

> auto.data.i
Error: object "auto.data.i" not found

Thus, how would I be able to automatically write to the elements of the data frames later in a loop such as

> for(i in 1:3) {
+   for(j in 1:nrow(auto.data.i)) {  ### this obviously does not work, since
+                                    ### 'Error in nrow(auto.data.i) : object "auto.data.i" not found'
+     for(k in 1:ncol(auto.data.i)) {
+       auto.data.i[j, k] <- 'some value'
+ }}}

Thanks a bunch for all your help. Best, Michael

Michael Drescher
Ontario Forest Research Institute
Ontario Ministry of Natural Resources
1235 Queen St East
Sault Ste Marie, ON, P6A 2E3
Tel: (705) 946-7406
Fax: (705) 946-2030
Re: [R] type III ANOVA for a nested linear model
The message from this cute little data set is very clear. Consider

fm <- aov(resp ~ A*B + A/C, mydata)
drop1(fm, test = "F")

Single term deletions

Model:
resp ~ A * B + A/C
       Df Sum of Sq     RSS    AIC F value  Pr(F)
<none>               65.540 47.261
A:B     2    16.132  81.672 47.222  0.7384 0.5168
A:C     6   199.501 265.041 60.411  3.0440 0.1007

So neither of the non-marginal terms is significant. To address questions about the main effects, the natural next step is to remove the interactions. By orthogonality you can safely cut a few corners and do both at once:

drop1(update(fm, . ~ A + B), test = "F")

Single term deletions

Model:
resp ~ A + B
       Df Sum of Sq     RSS   AIC F value     Pr(F)
<none>               281.17 57.47
A       2     33.12  314.30 55.48  0.8246    0.4586
B       1    915.21 1196.38 81.54 45.5695 9.311e-06

There is a very obvious, even trivial, B main effect, but nothing else. All this becomes even more glaring if you take the unusual step of plotting the data.

What sort of editor would overlook this clear and demonstrable message leaping out from the data in favour of some arcane argument about types of sums of squares? Several answers come to mind: a power freak, a SAS aficionado, an idiot. If you get nowhere with this editor, my suggestion, hard as it may seem, is that you do not submit to that kind of mindless ideology and make fatuous compromises for the sake of immediate publication. If necessary, part company with that editor and find somewhere else to publish, where the editor has some inkling of what statistical inference is all about.

Bill Venables.
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Carsten Jaeger
Sent: Tuesday, 10 July 2007 4:15 AM
To: R help list
Subject: [R] type III ANOVA for a nested linear model

Hello,

is it possible to obtain type III sums of squares for a nested model such as the following?

lmod <- lm(resp ~ A * B + (C %in% A), mydata)

I have tried

library(car)
Anova(lmod, type = "III")

but this gives me an error (and I also understand from the documentation of Anova, as well as from a previous request (http://finzi.psych.upenn.edu/R/Rhelp02a/archive/64477.html), that it is not possible to specify nested models with car's Anova). anova(lmod) works, of course.

My data (given below) are balanced, so I expect the results to be similar for both type I and type III sums of squares. But are they *exactly* the same? The editor of the journal to which I'm sending my manuscript requests what he calls "conventional type III tests", and I'm not sure I can convince him to accept my type I analysis.

> mydata
   A B  C  resp
1  1 1  1 34.12
2  1 1  2 32.45
3  1 1  3 44.55
4  1 2  1 20.88
5  1 2  2 22.32
6  1 2  3 27.71
7  2 1  6 38.20
8  2 1  7 31.62
9  2 1  8 38.71
10 2 2  6 18.93
11 2 2  7 20.57
12 2 2  8 31.55
13 3 1  9 40.81
14 3 1 10 42.23
15 3 1 11 41.26
16 3 2  9 28.41
17 3 2 10 24.07
18 3 2 11 21.16

Thanks a lot,
Carsten
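The thread can be reproduced end to end; the sketch below rebuilds Carsten's data and runs Bill's drop1() analysis, treating A, B and C as factors (which the models assume):

```r
# Carsten's balanced nested data, rebuilt from the post
mydata <- data.frame(
  A = factor(rep(1:3, each = 6)),
  B = factor(rep(rep(1:2, each = 3), times = 3)),
  C = factor(c(1,2,3, 1,2,3, 6,7,8, 6,7,8, 9,10,11, 9,10,11)),
  resp = c(34.12, 32.45, 44.55, 20.88, 22.32, 27.71,
           38.20, 31.62, 38.71, 18.93, 20.57, 31.55,
           40.81, 42.23, 41.26, 28.41, 24.07, 21.16)
)

fm <- aov(resp ~ A*B + A/C, mydata)        # Bill's full model
drop1(fm, test = "F")                      # tests of the non-marginal terms
drop1(update(fm, . ~ A + B), test = "F")   # main effects after dropping them
```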
Re: [R] a little problem on selecting a subset from dataset A according to dataset B?
AB <- with(B, subset(A, coords.x1 %in% X1))

> AB
   coords.x1 coords.x2
0   542250.9   3392404
7   541512.5   3394722
8   541479.3   3394878
9   538903.4   3395943
18  543274.0   3389919
19  543840.8   3392012

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of zhijie zhang
Sent: Monday, 9 July 2007 2:43 AM
To: R-help@stat.math.ethz.ch
Subject: [R] a little problem on selecting a subset from dataset A according to dataset B?

Dear Friends,

I want to extract the records from A according to B, but the results are not correct, because R warns that the longer object length is not a multiple of the shorter object length. Has anybody met the same problem? How do I do it correctly?

length(A) = 47, length(B) = 6

A[A$coords.x1 == B$X1, ]   # my program for the above task

I should get 6 records, but I only get the first 4 for the above reason. Thanks. The following shows datasets A and B.

> A
   coords.x1 coords.x2
0  542250.89 3392404.1
1  538813.87 3388339.0
2  536049.19 3385821.6
3  533659.62 3383194.2
4  530642.30 3376834.9
5  529573.15 3378177.8
6  530853.82 3394838.8
7  541512.51 3394721.6
8  541479.33 3394877.8
9  538903.39 3395942.5
10 536019.95 3396286.1
11 538675.23 3384213.2
12 535127.95 3381255.4
13 533852.24 3378660.4
14 531360.91 3379273.8
15 539289.14 3375759.8
16 543410.51 3384353.1
17 543089.27 3388170.1
18 543274.03 3389919.2
19 543840.77 3392012.4
20 553383.55 3402401.8
21 554621.51 3397938.9
22 564096.42 3397524.4
23 567529.64 3398702.9
24 561798.76 3404864.0
25 562868.34 3405502.2
26 563145.22 3403192.1
27 562419.87 3404090.4
28 558321.85 3403879.9
29 567050.74 3404973.1
30 570609.70 3408742.4
31 556777.57 3397858.0
32 531353.38 3368596.6
33 533513.50 3372749.3
34 537543.19 3364284.8
35 538779.41 3368224.8
36 525930.09 3374067.7
37 522990.85 3369213.1
38 528826.37 3359019.0
39 533865.85 3362595.4
40 531200.25 3365053.0
41 551054.10 3377181.3
42 546974.19 3369284.8
43 572315.59 3359541.1
44 562703.63 3355173.4
45 558959.31 3357804.4
46 558531.39 3361741.1

> B
         X1        X2
1 542250.89 3392404.1
2 541512.51 3394721.6
3 541479.33 3394877.8
4 538903.39 3395942.5
5 543274.03 3389919.2
6 543840.77 3392012.4

--
With Kind Regards,

Zhi Jie, Zhang, PHD
Tel: 86-21-54237149
Dept. of Epidemiology, School of Public Health, Fudan University
Address: No. 138 Yi Xue Yuan Road, Shanghai, China
Postcode: 200032
Email: [EMAIL PROTECTED]
Website: www.statABC.com
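A tiny self-contained sketch of why `==` misbehaves here while `%in%` does not (toy values, not the original coordinates):

```r
a <- data.frame(x = c(10, 20, 30, 40, 50, 60, 70))
b <- c(20, 50)

# a$x == b recycles b (length 2) over x (length 7); the lengths are not
# multiples, so R warns and the logical index compares the wrong pairs:
# a[a$x == b, ]

# %in% tests set membership, so no recycling ever happens
subset(a, x %in% b)   # the rows with x equal to 20 or 50
```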
Re: [R] Within matrix
Unless you want to do this millions of times, efficiency is probably not a big issue here, but simplicity always pays off. I'm presuming you are dealing with a single-classification setup.

Let f (n x 1) be a *factor* defining the classes.
Let X (n x p) be the data matrix.

Then the steps I would use to find the between and within SSP matrices 'by hand' are as follows:

Tot <- scale(X, scale = FALSE)  # sweep out the grand means
Res <- resid(aov(X ~ f))        # sweep out the class means
WSS <- crossprod(Res)           # within SSP matrix
BSS <- crossprod(Tot - Res)     # between SSP matrix

Bill Venables

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED]
Sent: Tuesday, 10 July 2007 9:25 AM
To: r-help@stat.math.ethz.ch
Subject: [R] Within matrix

Hi all,

I am working on clustering, and I am trying to evaluate within and between matrices. Is there any facility for that? I wrote my own function, but I am not a programmer, so I am afraid I am not really able to program an efficient and fast function...

Thanks,
Christophe

[This message was sent via IMP, courtesy of Université Paris 10 Nanterre.]
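Bill's four lines can be checked on a built-in data set: the within and between SSP matrices should add up exactly to the total SSP. Here iris is just a convenient stand-in for X and f:

```r
X <- as.matrix(iris[, 1:4])   # n x p data matrix
f <- iris$Species             # factor defining the classes

Tot <- scale(X, scale = FALSE)  # sweep out the grand means
Res <- resid(aov(X ~ f))        # sweep out the class means
WSS <- crossprod(Res)           # within SSP matrix
BSS <- crossprod(Tot - Res)     # between SSP matrix

# the decomposition is exact: total SSP = within SSP + between SSP
all.equal(crossprod(Tot), WSS + BSS, check.attributes = FALSE)
```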
Re: [R] How to calculate the index "the number of species combinations"?
Here is a step-by-step explanation. The way you present the data is as a species (rows) by sites (columns) data frame:

> dim(species_x_sites)
[1] 17 20

There are in fact only 19 sites, as one of the columns of the data frame is the species name:

> names(species_x_sites)
 [1] "Species"      "Cuba"         "Hispaniola"   "Jamaica"      "Puerto_Rico"
 [6] "Guadeloupe"   "Martinique"   "Dominica"     "St._Lucia"    "Barbados"
[11] "St._Vincent"  "Grenada"      "Antigua"      "St._Croix"    "Grand_Cayman"
[16] "St._Kitts"    "Barbuda"      "Montserrat"   "St._Martin"   "St._Thomas"

To use the standard tools you need to turn it around and make a site_x_species matrix:

site_x_species <- t(as.matrix(species_x_sites[, -1]))

(The [, -1] simply omits the species column, and the t() transposes it.)

> dim(site_x_species)
[1] 19 17

Now, how many unique combinations are there?

unique_combos <- unique(site_x_species)  # unique *rows*

To find out how many, we could use

> dim(unique_combos)
[1] 10 17

or

> nrow(unique_combos)
[1] 10

which I believe corresponds to your index.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Zhang Jian
Sent: Saturday, 7 July 2007 4:20 PM
To: Sarah Goslee; r-help
Subject: Re: [R] How to calculate the index "the number of species combinations"?

Sorry, I cannot understand your reply very clearly. How do I compute the number of unique sites? Can you give me a simple example, or do a simple analysis using one data set? Thanks very much.

Jian Zhang

On 7/7/07, Sarah Goslee <[EMAIL PROTECTED]> wrote:

It should be the number of unique sites. In this case, the number of unique columns in the data frame. See ?unique. (Interestingly, the convention is usually that species are columns and sites are rows.) For your sample data you only see 10 of the 2^17 possible combinations of 17 species (not 2^n).

Sarah

On 7/7/07, Zhang Jian <[EMAIL PROTECTED]> wrote:

I want to analyze the co-occurrence of some species. In some papers, the authors said that the index "the number of species combinations" (COMBO) is a good index. I am trying to calculate the index in R.
But I cannot get the right value. I think that I do not understand the concept of the index, because my English is not good.

The concept: *The number of species combinations.* This index scans the columns of the presence-absence matrix and keeps track of the number of unique species combinations that are represented at different sites. For an assemblage of n species, there are 2^n possible species combinations, including the combination of no species being present (Pielou and Pielou 1968). In most real matrices, the number of sites (= columns) is usually substantially less than 2^n, which places an upper bound on the number of species combinations that can be found in both the observed and the simulated matrices.

Presence-absence data (each row represents a different species and each column a different site; a 1 indicates that a species is present at a particular site, and a 0 that it is absent). The sites, in column order, are: Cuba, Hispaniola, Jamaica, Puerto_Rico, Guadeloupe, Martinique, Dominica, St._Lucia, Barbados, St._Vincent, Grenada, Antigua, St._Croix, Grand_Cayman, St._Kitts, Barbuda, Montserrat, St._Martin, St._Thomas.

Carduelis_dominicensis   0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Loxia_leucoptera         0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Volatinia_jacarina       0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
Sporophila_nigricollis   0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
Melopyrrha_nigra         1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
Loxigilla_portoricensis  0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Loxigilla_violacea       0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Loxigilla_noxis          0 0 0 0 1 1 1 1 1 1 1 1 0 0 1 1 1 1 0
Melanospiza_richardsoni  0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
Tiara_olivacea           1 1 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
Tiara_bicolor            0 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1
Tiara_canora             1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Loxipasser_anoxanthus    0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Saltator_albicollis      0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0
Torreornis_inexpectata   1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Ammodramus_savannarum    0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Zonotrichia_capensis     0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

--
Sarah Goslee
http://www.functionaldiversity.org
[R] termplot with uniform y-limits
Does anyone have, or has anyone ever considered making, a version of 'termplot' that allows the user to specify that all plots should have the same y-limits?

This seems a natural thing to ask for, as the plots share a y-scale. If you don't have the same y-axes, you can easily misread the comparative contributions of the different components.

Notes: the current version of termplot does not allow the user to specify ylim; I checked. The plot tools that come with mgcv do this by default. Thanks, Simon.

Bill Venables
Re: [R] Question about PCA with prcomp
...but with 500 variables and only 20 'entities' (observations) you will have 481 PCs with dead zero eigenvalues. How small is 'smaller', and how many is 'a few'?

Everyone who has responded to this seems to accept the idea that PCA is the way to go here, but that is not clear to me at all. There is a 2-sample structure in the 20 observations that you have. If you simply ignore that in doing your PCA, you are making strong assumptions about the sampling that seem to me unlikely to be met. If you allow for the structure and project orthogonally to it, then you are probably throwing the baby out with the bathwater - you want to choose variables which maximise the separation between the 2 samples (and now you are up to 482 zero principal variances, if that matters...).

I think this problem probably needs a bit of a re-think. Some variant on singular LDA, for example, may be a more useful way to think about it.

Bill Venables.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Ravi Varadhan
Sent: Monday, 2 July 2007 1:29 PM
To: 'Patrick Connolly'
Cc: r-help@stat.math.ethz.ch; 'Mark Difford'
Subject: Re: [R] Question about PCA with prcomp

The PCs that are associated with the smaller eigenvalues.

Ravi Varadhan, Ph.D.
Assistant Professor, The Center on Aging and Health
Division of Geriatric Medicine and Gerontology
Johns Hopkins University
Ph: (410) 502-2619
Fax: (410) 614-9625
Email: [EMAIL PROTECTED]
Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html

-----Original Message-----
From: Patrick Connolly [mailto:[EMAIL PROTECTED]
Sent: Monday, July 02, 2007 4:23 PM
To: Ravi Varadhan
Cc: 'Mark Difford'; r-help@stat.math.ethz.ch
Subject: Re: [R] Question about PCA with prcomp

On Mon, 02-Jul-2007 at 03:16PM -0400, Ravi Varadhan wrote:

| Mark,
|
| What you are referring to deals with the selection of covariates, since PC
| doesn't do dimensionality reduction in the sense of covariate selection.
| But what Mark is asking for is to identify how much each data point
| contributes to individual PCs. I don't think that Mark's query makes much
| sense, unless he meant to ask: which individuals have high/low scores
| on PC1/PC2. Here are some comments that may be tangentially related
| to Mark's question:
|
| 1. If one is worried about a few data points contributing heavily to
|    the estimation of PCs, then one can use robust PCA, for example,
|    using robust covariance matrices. MASS has some tools for this.
| 2. The biplot for the first 2 PCs can give some insights.
| 3. PCs, especially the last few PCs, can be used to identify outliers.

What is meant by "last few PCs"?

--
Patrick Connolly

   Great minds discuss ideas
   Middle minds discuss events
   Small minds discuss people
   ... Anon
Re: [R] Subscripting specified variables in a function
I think what you are trying to do is quite tricky. Here is one way you might like to think about it.

tdat <- data.frame(a = 1:5, b = c(1:3, 101, 101))

> tdat
  a   b
1 1   1
2 2   2
3 3   3
4 4 101
5 5 101

test.fx <- function(dta, expvar, expval) {
  lang <- substitute(subset(dat, v1 > v2),
                     list(dat = substitute(dta),
                          v1  = substitute(expvar),
                          v2  = substitute(expval)))
  newdta <- eval.parent(lang)
  summary(newdta[deparse(substitute(expvar))])
}

> test.fx(tdat, b, 100)
       b
 Min.   :101
 1st Qu.:101
 Median :101
 Mean   :101
 3rd Qu.:101
 Max.   :101

> test.fx(tdat, b, 2)
       b
 Min.   :  3.00
 1st Qu.: 52.00
 Median :101.00
 Mean   : 68.33
 3rd Qu.:101.00
 Max.   :101.00

Bill Venables

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Zodet, Marc W. (AHRQ)
Sent: Wednesday, 27 June 2007 12:43 AM
To: r-help@stat.math.ethz.ch
Subject: [R] Subscripting specified variables in a function

I'm trying to create a function which will allow me to subset a data set based on the values of various specified variables. I also want to then apply some other function(s) (e.g., summary). This is what I've tried so far:

> test.fx <- function(dta, expvar, expval) {
+   newdta <- subset(dta, eval(expvar) > expval)
+   summary(newdta$eval(expvar))
+ }
> test.fx(fyc04s, quote(totexp04), 100)
Error in summary(newdta$eval(expvar)) : attempt to apply non-function

The subset works fine, but my attempt to access the specified variable bombs. Is there a syntactical change I can make? Is it better to attach newdta?

Thanks in advance for any guidance.

Marc

Marc W.
Zodet, MS
Senior Health Statistician
Agency for Healthcare Research and Quality
Center for Financing, Access, and Cost Trends
301-427-1563 (Telephone)
301-427-1276 (Fax)
[EMAIL PROTECTED]
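If non-standard evaluation is more machinery than needed here, a simpler variant passes the column name as a character string, which sidesteps substitute() entirely. A sketch, using the same toy data as Bill's reply:

```r
tdat <- data.frame(a = 1:5, b = c(1:3, 101, 101))

# Column name as a string: [[ ]] does the lookup, no substitute() needed
test.fx2 <- function(dta, expvar, expval) {
  newdta <- dta[dta[[expvar]] > expval, , drop = FALSE]
  summary(newdta[[expvar]])
}

test.fx2(tdat, "b", 100)  # summary of the two rows with b > 100
```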
Re: [R] correlation comparison one more time
Hypothesis tests are normally set up to test a null hypothesis within a broader class of alternatives, which includes the null as a special case. Roughly speaking, the logic is: we assume that the outer class includes the truth; we have a simple special case of it, which we call the null hypothesis, that in some sense represents 'no effect'; does the data provide cogent evidence that the special case is not adequate?

A standard way to address this question is, for example, to maximise the likelihood under null and alternative and to use the difference in log-likelihood as the basis of a test statistic known as the likelihood ratio.

The way you have set up your hypotheses does not match this paradigm. Your hypothesis is, in essence, that the squared correlation between A and D is larger than any other squared correlation involving two different variables which include A or D. It is clear enough what you are asking, but since it doesn't match the standard paradigm, it is unlikely that any standard procedure will be available to address it. It is unclear, for example, how you might go about setting up a likelihood ratio test.

I think the answer to your question is no, not off the shelf at least, and you probably need to think about the problem in the null and alternative hypothesis framework to make progress.

Bill Venables

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of rafael
Sent: Sunday, 17 June 2007 8:20 PM
To: r-help@stat.math.ethz.ch
Subject: [R] correlation comparison one more time

I would like to ask again, because I can't find the answer. I have the following problem: my data contain 4 variables (A, B, C, D) and are collected from 4 samples.
Each correlation matrix looks like this:

    A   B   C   D
A   1   ab  ac  ad
B   ab  1   bc  bd
C   ac  bc  1   cd
D   ad  bd  cd  1

My hypotheses are that, across samples,

- ad is the strongest correlation for A and for D (sign doesn't matter)
- bc is the strongest correlation for B and for C (sign doesn't matter)

Is it possible to test these hypotheses? Any help would be appreciated.

Rafał Bartczuk
[EMAIL PROTECTED]
Re: [R] Retain names in conversion of matrix to vector
There is a slightly surprising way to do this in one step. Here's an example:

tmp <- matrix(1:16, 4, 4)
dimnames(tmp) <- list(letters[1:4], letters[1:4])

> tmp
  a b  c  d
a 1 5  9 13
b 2 6 10 14
c 3 7 11 15
d 4 8 12 16

> as.data.frame(as.table(tmp))
   Var1 Var2 Freq
1     a    a    1
2     b    a    2
3     c    a    3
4     d    a    4
5     a    b    5
6     b    b    6
7     c    b    7
8     d    b    8
9     a    c    9
10    b    c   10
11    c    c   11
12    d    c   12
13    a    d   13
14    b    d   14
15    c    d   15
16    d    d   16

Bill Venables

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of philozine
Sent: Friday, 15 June 2007 9:04 AM
To: r-help@stat.math.ethz.ch
Subject: [R] Retain names in conversion of matrix to vector

Hi R-listers,

I'm using R only for a few basic functions but am having difficulty doing something that *should* be simple. I have an nxn matrix, Q, where Q[i,j] is a directed value (in this case, oil exports from i to j). Note that Q[i,j] != Q[j,i]. I imported the column names along with the matrix, then copied them to the rows using rownames(Q) <- colnames(Q). Simple so far.

What I'd like to do now is convert Q for export into a vector of values with the original row and column names intact. Having one vector each for row, column, and cell would be ideal, e.g., [1,1] = i's name, [1,2] = j's name, and [1,3] = Q[i,j]. But just being able to export my matrix data in vector form with the correct row/col names for each observation would be sufficient.

Thus far I've tried c(), vector(), and a few others, but can't get the correct results. They do generate the correct vector of matrix values, but they do not appear to retain both row and column names. (Or, rather, I have not discovered how to make them do so.)
To illustrate, my data currently look something like this:

     A    B    C    D
A |  0 | .1 | .4 | .6 |
B | .2 |  0 | .2 | .1 |
C | .5 | .9 |  0 | .9 |
D | .7 |  0 | .3 |  0 |

I would like them to look like this (at least when exported as a .txt file, if not necessarily when displayed within R):

    i   j    Q
|   A | A |  0 |
|   A | B | .1 |
|   A | C | .4 |
|   A | D | .6 |
|   B | A | .2 |
|   B | B |  0 |
|   B | C | .2 |
[...] and so on

If anybody knows how to do this, I will be extremely appreciative! Best regards, - [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] AIC consistency with S-PLUS
I think we need some clarification here. extractAIC() is available on both systems, in the stats package in R and in the MASS library in S-PLUS. If you use extractAIC() on both systems, do you get the same ordering of models? AIC() is also available on both systems, in the stats package again in R, and in the nlme3 library in S-PLUS. On R there are not too many classes where methods are available for both, but on S-PLUS there are a few. So what are you comparing with what? You need to say what classes of objects you are dealing with, that is very important, and what generic functions you are using for the comparison on each system. The capacity for confusion in this area is immense.

Bill Venables CSIRO Laboratories PO Box 120, Cleveland, 4163 AUSTRALIA Office Phone (email preferred): +61 7 3826 7251 Fax (if absolutely necessary): +61 7 3826 7304 Mobile: (I don't have one!) Home Phone: +61 7 3286 7700 mailto:[EMAIL PROTECTED] http://www.cmis.csiro.au/bill.venables/

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Alice Shelly Sent: Friday, 1 June 2007 10:07 AM To: r-help@stat.math.ethz.ch Subject: [R] AIC consistency with S-PLUS

Hello- I understand that log-likelihoods are bound to differ by constants, but if I estimate AIC for a set of simple nested linear models using the following 4 methods, shouldn't at least two of them produce the same ordering of models?

in R:
  extractAIC
  AIC
in S-PLUS:
  AIC
  n*log(deviance(mymodel)/n) + 2*p

I find it troubling that these methods all give me different answers as to the best model or even short set of models. Thanks for your comments. Alice Shelly [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
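[Editor's sketch, invented data.] Within R at least, the point about constants can be seen directly: for lm fits, AIC() and extractAIC() sit on different scales (the full log-likelihood versus n*log(RSS/n), with sigma counted or not), but since the difference is constant across models fitted to the same data, they rank a nested set of models identically.

```r
## Hypothetical data; compare the two AIC scales on a nested set of lm fits.
set.seed(1)
dat <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
dat$y <- 1 + 2 * dat$x1 + rnorm(50)

fits <- list(m0 = lm(y ~ 1, dat),
             m1 = lm(y ~ x1, dat),
             m2 = lm(y ~ x1 + x2, dat))

a1 <- sapply(fits, function(f) extractAIC(f)[2])  # n*log(RSS/n) + 2*edf scale
a2 <- sapply(fits, AIC)                           # full log-likelihood scale

## different numbers, same ordering
all(rank(a1) == rank(a2))
```

If the orderings disagree across systems, the likely culprit is that different classes of fitted object are being handed to different generics, as the reply above suggests.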
Re: [R] scan a directory and then make a string vector consisting of file names
Would you believe it's list.files()? See ?list.files for variants.

Bill Venables CSIRO Laboratories PO Box 120, Cleveland, 4163 AUSTRALIA Office Phone (email preferred): +61 7 3826 7251 Fax (if absolutely necessary): +61 7 3826 7304 Mobile: (I don't have one!) Home Phone: +61 7 3286 7700 mailto:[EMAIL PROTECTED] http://www.cmis.csiro.au/bill.venables/

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Taka Matzmoto Sent: Friday, 1 June 2007 12:19 PM To: r-help@stat.math.ethz.ch Subject: [R] scan a directory and then make a string vector consisting of file names

Dear R-users, I am looking for a way to scan a directory and then make a string vector consisting of the file names in the directory. For example, under c:\temp\ there are 4 files: a.txt, b.txt, c.txt, and d.txt. I would like to have a string vector c("a.txt", "b.txt", "c.txt", "d.txt"). How do I do that? Thanks Taka, _ Make every IM count. Download Messenger and join the i'm Initiative now. It's free. http://im.live.com/messenger/im/home/?source=TAGHM_June07 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
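[Editor's sketch.] A minimal self-contained run of the suggestion above, using a temporary directory so it works anywhere; substitute your own path (e.g. "c:/temp") in real use:

```r
## create a throwaway directory with four .txt files in it
tmp <- file.path(tempdir(), "listfiles-demo")
dir.create(tmp)
file.create(file.path(tmp, c("a.txt", "b.txt", "c.txt", "d.txt")))

## bare file names, sorted alphabetically
fnames <- list.files(tmp, pattern = "\\.txt$")
fnames
# [1] "a.txt" "b.txt" "c.txt" "d.txt"

## full.names = TRUE gives complete paths instead of bare names
paths <- list.files(tmp, pattern = "\\.txt$", full.names = TRUE)
```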
Re: [R] Factor function: odd behavior when labels argument contains duplicates?
There is a difference between levels and labels. I think this is what you want.

x <- factor(rep(0:5, 2))
x
 [1] 0 1 2 3 4 5 0 1 2 3 4 5
Levels: 0 1 2 3 4 5
levels(x) <- c(1, 1:5)
x
 [1] 1 1 2 3 4 5 1 1 2 3 4 5
Levels: 1 2 3 4 5
table(x)
x
1 2 3 4 5
4 2 2 2 2

Bill Venables CSIRO Laboratories PO Box 120, Cleveland, 4163 AUSTRALIA Office Phone (email preferred): +61 7 3826 7251 Fax (if absolutely necessary): +61 7 3826 7304 Mobile: (I don't have one!) Home Phone: +61 7 3286 7700 mailto:[EMAIL PROTECTED] http://www.cmis.csiro.au/bill.venables/

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Steen Ladelund Sent: Wednesday, 30 May 2007 6:27 PM To: r-help@stat.math.ethz.ch Subject: [R] Factor function: odd behavior when labels argument contains duplicates?

Hi all. I think it would be nice to be able to combine levels of a factor on creation, a la

x <- rep(0:5, 5)
y <- factor(x, levels = 0:5, labels = c('1', '1', 2:5))  ## (1)
y
 [1] 1 1 2 3 4 5 1 1 2 3 4 5 1 1 2 3 4 5 1 1 2 3 4 5 1 1 2 3 4 5
Levels: 1 1 2 3 4 5

I thought this would (should?) create a factor with 5 levels, i.e. combining 0 and 1 in x into one level in y. I find it hard to predict the behavior of the following lines:

table(factor(y))  ## Odd?
 1  1  2  3  4  5
10  0  5  5  5  5
table(factor(factor(y)))  ## This is what I want
 1  2  3  4  5
10  5  5  5  5

It also seems strange that this should be the way to go:

levels(y) <- levels(y)  ## Does what I want following (1) even though it appears to be a void statement?
y
 [1] 1 1 2 3 4 5 1 1 2 3 4 5 1 1 2 3 4 5 1 1 2 3 4 5 1 1 2 3 4 5
Levels: 1 2 3 4 5

Am I missing an important point about factors or the factor function?
steen Steen Ladelund, statistician +4543233275 [EMAIL PROTECTED] Research Center for Prevention and Health Glostrup University Hospital, Denmark www.fcfs.kbhamt.dk __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] linear model by month
dat <- "Month ExcessReturn Return STO
8 0.047595875 0.05274292 0.854352503
8 0.016134874 0.049226941 4.399372005
8 -0.000443869 0.004357305 -1.04980297
9 0.002206554 -0.089068828 0.544809429
9 0.021296551 0.003795071 0.226875834
9 0.006741578 0.014104606 0.721986383"
dat <- read.table(textConnection(dat), header = TRUE)
dat <- transform(dat, Month = factor(paste("M", Month, sep = "")))
FM <- lm(ExcessReturn ~ Month/(Return + STO) - 1, dat)
coef(FM)
       MonthM8        MonthM9
  -0.014043930    0.028057112
MonthM8:Return MonthM9:Return
   1.291688338    0.097940939
   MonthM8:STO    MonthM9:STO
  -0.007593598   -0.031436815

# the coefficients are two intercepts, two slopes on Return, two slopes on STO.

Why must it be lm? It might be simpler to use lmList from the nlme package.

Bill Venables CSIRO Laboratories PO Box 120, Cleveland, 4163 AUSTRALIA Office Phone (email preferred): +61 7 3826 7251 Fax (if absolutely necessary): +61 7 3826 7304 Mobile: (I don't have one!) Home Phone: +61 7 3286 7700 mailto:[EMAIL PROTECTED] http://www.cmis.csiro.au/bill.venables/

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Benoit Chemineau Sent: Monday, 28 May 2007 11:48 PM To: r-help@stat.math.ethz.ch Subject: [R] linear model by month

Hi R-programmers! I would like to perform a linear model regression using the 'lm' function and I don't know how to do it. The data is organised as below:

Month ExcessReturn Return STO
8 0.047595875 0.05274292 0.854352503
8 0.016134874 0.049226941 4.399372005
8 -0.000443869 0.004357305 -1.04980297
9 0.002206554 -0.089068828 0.544809429
9 0.021296551 0.003795071 0.226875834
9 0.006741578 0.014104606 0.721986383

the model is: ExcessReturn = a + b1*Return + b2*STO + u, u is the error term, a is the intercept. I would like to have monthly estimates of b1 and b2 using the least squares estimation (b1month8 and b2month8, b1month9 and b2month9). Thank you for your help!
[[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
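[Editor's sketch.] The lmList() route mentioned in the reply above, run on the six rows from the thread (nlme ships with R). Note that with only three observations per month the three coefficients fit each group exactly, so there are no residual degrees of freedom for inference; the sketch only shows how the per-month estimates come out.

```r
library(nlme)

dat <- data.frame(
  Month        = factor(c(8, 8, 8, 9, 9, 9)),
  ExcessReturn = c(0.047595875, 0.016134874, -0.000443869,
                   0.002206554, 0.021296551, 0.006741578),
  Return       = c(0.05274292, 0.049226941, 0.004357305,
                   -0.089068828, 0.003795071, 0.014104606),
  STO          = c(0.854352503, 4.399372005, -1.04980297,
                   0.544809429, 0.226875834, 0.721986383))

## one lm fit per Month; coef() gives a row of (intercept, b1, b2) per month
fm <- lmList(ExcessReturn ~ Return + STO | Month, data = dat)
coef(fm)
```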
Re: [R] looking for the na.omit equivalent for a matrix of characters
Surely all you need to do is change them to missing and use na.omit, e.g.

x <- matrix(letters, 13, 2)
x[4] <- "NA"  # put one in the middle
is.na(x[x == "NA"]) <- TRUE
(x <- na.omit(x))

Bill Venables CSIRO Laboratories PO Box 120, Cleveland, 4163 AUSTRALIA Office Phone (email preferred): +61 7 3826 7251 Fax (if absolutely necessary): +61 7 3826 7304 Mobile: (I don't have one!) Home Phone: +61 7 3286 7700 mailto:[EMAIL PROTECTED] http://www.cmis.csiro.au/bill.venables/

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Andrew Yee Sent: Tuesday, 29 May 2007 12:50 PM To: r-help@stat.math.ethz.ch Subject: [R] looking for the na.omit equivalent for a matrix of characters

I have a matrix of characters (actually numbers that have been read in as characters), and I'd like to remove the NA. I'm familiar with na.omit, but is there an equivalent of na.omit when the NA are the actual characters "NA"? Thanks, Andrew [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Speeding up resampling of rows from a large matrix
Here is a possibility. The only catch is that if a pair of rows is selected twice you will get the results in a block, not scattered at random throughout the columns of G. I can't see that as a problem.

### --- start code excerpt ---
nSNPs <- 1000
H <- matrix(sample(0:1, 120*nSNPs, replace = TRUE), nrow = 120)

# G <- matrix(0, nrow = 3, ncol = nSNPs)
# Keep in mind that the real H is 120 x 65000

ij <- as.matrix(subset(expand.grid(i = 1:120, j = 1:120), i < j))

nResamples <- 3000
sel <- sample(1:nrow(ij), nResamples, rep = TRUE)
repf <- table(sel)                   # replication factors
ij <- ij[as.numeric(names(repf)), ]  # distinct choices made

G <- matrix(0, nrow = 3, ncol = nrow(ij))  # for now

for(j in 1:ncol(G))
  G[, j] <- rowSums(outer(0:2, colSums(H[ij[j, ], ]), "=="))

G <- G[, rep(1:ncol(G), repf)]  # bulk up the result

## the original code, for comparison:
## pair <- replicate(nResamples, sample(1:120, 2))
## gen <- function(x) {g <- sum(x); c(g==0, g==1, g==2)}
## for (i in 1:nResamples) {
##   G <- G + apply(H[pair[, i], ], 2, gen)
## }
### --- end of code excerpt ---

I did a timing on my machine, which is a middle-of-the-range windows monstrosity...
system.time({
  nSNPs <- 1000
  H <- matrix(sample(0:1, 120*nSNPs, replace = TRUE), nrow = 120)

  # G <- matrix(0, nrow = 3, ncol = nSNPs)
  # Keep in mind that the real H is 120 x 65000

  ij <- as.matrix(subset(expand.grid(i = 1:120, j = 1:120), i < j))

  nResamples <- 3000
  sel <- sample(1:nrow(ij), nResamples, rep = TRUE)
  repf <- table(sel)                   # replication factors
  ij <- ij[as.numeric(names(repf)), ]  # distinct choices made

  G <- matrix(0, nrow = 3, ncol = nrow(ij))  # for now

  for(j in 1:ncol(G))
    G[, j] <- rowSums(outer(0:2, colSums(H[ij[j, ], ]), "=="))

  G <- G[, rep(1:ncol(G), repf)]  # bulk up the result

  # (the original loop was left here, commented out, as in the excerpt above)
})
   user  system elapsed
   0.97    0.00    0.99

Less than a second. Somewhat of an improvement on the 80 minutes, I reckon. This will increase, of course, when you step the size of the H matrix up from 1000 to 65000 columns.

Bill Venables CSIRO Laboratories PO Box 120, Cleveland, 4163 AUSTRALIA Office Phone (email preferred): +61 7 3826 7251 Fax (if absolutely necessary): +61 7 3826 7304 Mobile: (I don't have one!) Home Phone: +61 7 3286 7700 mailto:[EMAIL PROTECTED] http://www.cmis.csiro.au/bill.venables/

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Juan Pablo Lewinger Sent: Friday, 25 May 2007 4:04 PM To: r-help@stat.math.ethz.ch Subject: [R] Speeding up resampling of rows from a large matrix

I'm trying to:

Resample with replacement pairs of distinct rows from a 120 x 65,000 matrix H of 0's and 1's.

For each resampled pair sum the resulting 2 x 65,000 matrix by column:

    0 1 0 1 ...
  + 0 0 1 1 ...
    -----------
  = 0 1 1 2 ...

For each column accumulate the number of 0's, 1's and 2's over the resamples to obtain a 3 x 65,000 matrix G.
For those interested in the background, H is a matrix of haplotypes, each pair of haplotypes forms a genotype, and each column corresponds to a SNP. I'm using resampling to compute the null distribution of the maximum over correlated SNPs of a simple statistic. The code:

#---------------------------------------------------------------
nSNPs <- 1000
H <- matrix(sample(0:1, 120*nSNPs, replace = TRUE), nrow = 120)
G <- matrix(0, nrow = 3, ncol = nSNPs)
# Keep in mind that the real H is 120 x 65000
nResamples <- 3000
pair <- replicate(nResamples, sample(1:120, 2))
gen <- function(x) {g <- sum(x); c(g == 0, g == 1, g == 2)}
for (i in 1:nResamples) {
  G <- G + apply(H[pair[, i], ], 2, gen)
}
#---------------------------------------------------------------

The problem is that the loop takes about 80 mins to complete and I need to repeat the whole thing 10,000 times, which would then take over a year and a half! Is there a way to speed this up so that the full 10,000 iterations take a reasonable amount of time (say a week)? My machine has an Intel Xeon 3.40GHz CPU with 1GB of RAM.

sessionInfo()
R version 2.5.0 (2007-04-23)
i386-pc-mingw32

I would greatly appreciate any help. Juan Pablo Lewinger Department of Preventive Medicine Keck School of Medicine University of Southern California __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman
Re: [R] name of object in quotes
This is a long-standing idiom:

foo <- function(xvar) deparse(substitute(xvar))
foo(x1)
[1] "x1"

Bill Venables CSIRO Laboratories PO Box 120, Cleveland, 4163 AUSTRALIA Office Phone (email preferred): +61 7 3826 7251 Fax (if absolutely necessary): +61 7 3826 7304 Mobile: (I don't have one!) Home Phone: +61 7 3286 7700 mailto:[EMAIL PROTECTED] http://www.cmis.csiro.au/bill.venables/

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Gonzalo Rivero Sent: Wednesday, 23 May 2007 8:14 PM To: R-help@stat.math.ethz.ch Subject: [R] name of object in quotes

I am writing a function in which, at some point, I need to recuperate the name of a previous object in quotes. I am currently using the function Cs() from the Hmisc library but the result is:

foo <- function(xvar) {
  variable <- Cs(xvar)
  return(variable)
}
foo(x1)
[1] "xvar"

when I would have expected to obtain "x1". Any suggestion? Thanks -- *Gonzalo Rivero* Ph.D. candidate Centro de Estudios Avanzados en Ciencias Sociales Juan March Institute Castelló 77, 2nd floor 28006, Madrid __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] converting a row of a data.frame to a vector
If the data frame has factors and numeric vectors, there is a question of what form you want the row vector to be in. Only a data frame (list) can have a mixture of the two. Consider:

dat <- data.frame(x = 1:3, y = 4:6, z = letters[1:3])
(r1 <- dat[1, ])
  x y z
1 1 4 a
class(r1)
[1] "data.frame"
(r1 <- data.matrix(dat)[1, ])
x y z
1 4 1
class(r1)
[1] "integer"
(r1 <- as.matrix(dat)[1, ])
  x   y   z
"1" "4" "a"
class(r1)
[1] "character"

Bill Venables CSIRO Laboratories PO Box 120, Cleveland, 4163 AUSTRALIA Office Phone (email preferred): +61 7 3826 7251 Fax (if absolutely necessary): +61 7 3826 7304 Mobile: (I don't have one!) Home Phone: +61 7 3286 7700 mailto:[EMAIL PROTECTED] http://www.cmis.csiro.au/bill.venables/

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Andrew Yee Sent: Wednesday, 16 May 2007 9:37 AM To: r-help@stat.math.ethz.ch Subject: [R] converting a row of a data.frame to a vector

I've searched for the answer to this in the help list archive, but wasn't able to get the answer to work. I'm interested in converting a row of a data.frame into a vector. However, when I use as.vector(x[1,]) I get another data.frame, instead of a vector. (On the other hand, when I use as.vector(x[,1]), I get a vector.) Thanks, Andrew [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] number of days
The standard answer is to use

as.POSIXlt("2007-05-14") - as.POSIXlt("1950-01-01")
Time difference of 20952 days

If you want to ensure the time difference is in daily units, then

difftime(as.POSIXlt("2007-05-14"), as.POSIXlt("1950-01-01"), units = "days")
Time difference of 20952 days

If you want to use the answer in numerical computations, then one more coercion is usually needed

as.numeric(difftime(as.POSIXlt("2007-05-14"), as.POSIXlt("1950-01-01"), units = "days"))
[1] 20952

Bill Venables CSIRO Laboratories PO Box 120, Cleveland, 4163 AUSTRALIA Office Phone (email preferred): +61 7 3826 7251 Fax (if absolutely necessary): +61 7 3826 7304 Mobile: (I don't have one!) Home Phone: +61 7 3286 7700 mailto:[EMAIL PROTECTED] http://www.cmis.csiro.au/bill.venables/

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of threshold Sent: Monday, 14 May 2007 7:16 PM To: r-help@stat.math.ethz.ch Subject: [R] number of days

Hi, some of you knows how to calculate in R the number of days between given dates? issue relevant in option pricing thanks, robert -- View this message in context: http://www.nabble.com/number-of-days-tf3751163.html#a10600371 Sent from the R help mailing list archive at Nabble.com. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
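[Editor's sketch.] An alternative using the simpler Date class, which sidesteps time zones entirely for whole-day arithmetic:

```r
## day count between two calendar dates, no time-of-day or time zone involved
d <- as.Date("2007-05-14") - as.Date("1950-01-01")
d
# Time difference of 20952 days

as.numeric(d)  # plain number for use in computations
# [1] 20952
```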
Re: [R] Simple question about function with glm
There is a statistical problem and a code problem. The statistical problem is that if your 'x' has a mean that depends non-trivially on predictors, then you would not expect its distribution ignoring predictors to be normal. You would expect the residuals after modelling to be normal. Basically you cannot sensibly test normality of the response before you fit the model. It's a very common mistake, even if rather an obvious one. The code problem is: do you really know whether your models have been fitted or not? The 'summary(xmodel)' part of your function below will not print anything out, so if you were expecting something from that you would be disappointed. You might try replacing

  summary(xmodel)
  confint(xmodel)

by

  print(summary(xmodel))
  print(confint(xmodel))

But this is not really a very good paradigm in general. Finally, I'm a bit puzzled why you use glm() when the simpler lm() would have done the job. You are fitting a linear model and do not need the extra paraphernalia that generalized linear models require.

Bill Venables.

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Chung-hong Chan Sent: Sunday, 6 May 2007 6:47 PM To: r-help@stat.math.ethz.ch Subject: [R] Simple question about function with glm

Dear all, I coded a function called u.glm:

u.glm <- function(x, ahi, age, bmiz, gender) {
  library(nortest)
  lil.rslt <- lillie.test(x)
  if (lil.rslt$p.value > 0.05) {
    cat("Logtrans=0, lillie=", lil.rslt$p.value, "\n")
    xmodel <- glm(x ~ ahi + age + bmiz + as.factor(gender))
    summary(xmodel)
    confint(xmodel)
  } else {
    cat("Logtrans=1, lillie=", lil.rslt$p.value, "\n")
    xmodel <- glm(x ~ ahi + age + bmiz + as.factor(gender))
    summary(xmodel)
    confint(xmodel)
  }
}

Basically I just want to test the response variable for normality before modeling. When I try to use this function, it can do the lillie's test but failed to do the glm. What's wrong with my code? Regards, CH -- The scientists of today think deeply instead of clearly.
One must be sane to think clearly, but one can think deeply and be quite insane. Nikola Tesla http://www.macgrass.com __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
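[Editor's sketch.] A hypothetical rewrite of the function following the advice in the reply above: lm() in place of glm(), explicit print() calls so output appears from inside a function body, and the normality test dropped in favour of checking the residuals afterwards. The function name and argument list mirror the poster's but the whole thing is illustrative only.

```r
u.lm <- function(x, ahi, age, bmiz, gender) {
  xmodel <- lm(x ~ ahi + age + bmiz + as.factor(gender))
  print(summary(xmodel))  # print() is needed for output inside a function
  print(confint(xmodel))
  invisible(xmodel)       # return the fit so residuals can be inspected
}

## usage sketch: test normality of the *residuals*, after fitting
set.seed(1)
fit <- u.lm(rnorm(40), rnorm(40), rnorm(40), rnorm(40), rep(0:1, 20))
shapiro.test(residuals(fit))
```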
Re: [R] Hotelling T-Squared vs Two-Factor Anova
I take it all subjects are measured at the same time points, or Hotelling's T^2 becomes rather messy. The essential difference lies in the way the variance matrix is modelled. The usual repeated measures model would model the variance matrix as equal variances and equal covariances, i.e. with two parameters, (though you can vary this using, e.g. lme). Hotelling's T^2 would model the variance matrix as a general symmetric matrix, i.e. for the 4x4 case using 4+3+2+1 = 10 parameters. If it is appropriate, the repeated measures model is much more parsimonious. Bill Venables. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Sean Scanlan Sent: Saturday, 14 April 2007 5:38 PM To: [EMAIL PROTECTED] Subject: [R] Hotelling T-Squared vs Two-Factor Anova Hi, I am a graduate student at Stanford University and I have a general statistics question. What exactly is the difference between doing a two-factor repeated measures ANOVA and a Hotelling T-squared test for a paired comparison of mean vectors? Given: Anova: repeated measures on both factors, 1st factor = two different treatments, 2nd factor = 4 time points, where you are measuring the blood pressure at each of the time points. Hotelling T^2: You look at the difference in the 4x1 vector of blood pressure measurements for the two different treatments, where the four rows in the vector are the four time points. I am mainly interested in the main effects of the two treatments. Can someone please explain if there would be a difference in the two methods or any advantage in using one over the other? Thanks, Sean __ [EMAIL PROTECTED] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
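[Editor's sketch, simulated data only.] For the mechanics of the second approach described above, the paired Hotelling T^2 can be computed by hand from an n x 4 matrix of (treatment 1 minus treatment 2) differences at the four time points, using the unrestricted covariance matrix:

```r
set.seed(1)
n <- 20; p <- 4
D <- matrix(rnorm(n * p), n, p)  # rows = subjects, cols = time points

dbar <- colMeans(D)
S <- var(D)  # general symmetric 4 x 4 covariance: the 10-parameter model
T2 <- n * drop(t(dbar) %*% solve(S) %*% dbar)

## T^2 converts to an F statistic on (p, n - p) degrees of freedom
Fstat <- (n - p) / (p * (n - 1)) * T2
pval <- pf(Fstat, p, n - p, lower.tail = FALSE)
```

The repeated-measures ANOVA route would instead constrain S to equal variances and equal covariances, which is the parsimony trade-off the reply describes.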
Re: [R] resolving expand.grid NA errors
Hi Bob, Instead of fitting the model as

mod.multacute <- multinom(kc$group ~ kc$in.acute.danger * kc$violent.convictions, na.rm = T)

you might have more success fitting it as

mod.multacute <- multinom(group ~ in.acute.danger * violent.convictions, data = kc, na.action = na.omit)

This separates the variables from the data frame in which they occur. When you predict with a new data frame, the process will look for factors in.acute.danger and not kc$in.acute.danger. Note also that na.rm is not an argument for multinom. You probably mean na.action. I suspect this is the problem, but without the data I can't be sure.

Bill Venables CSIRO Laboratories PO Box 120, Cleveland, 4163 AUSTRALIA Office Phone (email preferred): +61 7 3826 7251 Fax (if absolutely necessary): +61 7 3826 7304 Mobile: (I don't have one!) Home Phone: +61 7 3286 7700 mailto:[EMAIL PROTECTED] http://www.cmis.csiro.au/bill.venables/

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Bob Green Sent: Monday, 26 March 2007 6:30 AM To: r-help@stat.math.ethz.ch Subject: [R] resolving expand.grid NA errors

I am hoping for some advice regarding resolving error messages I have received when trying to use the expand.grid command.

library(nnet)
library(MASS)
library(car)
mod.multacute <- multinom(kc$group ~ kc$in.acute.danger * kc$violent.convictions, na.rm = T)
summary(mod.multacute, cor = F, Wald = T)
Anova(mod.multacute)
confint(mod.multacute)
predictors <- expand.grid(group = 1:3, in.acute.danger = c("y", "n"), violent.convictions = c("y", "n"))
p.fit <- predict(mod.multacute, predictors, type = 'probs')
Error in predict.multinom(mod.multacute, predictors, type = "probs") :
  NAs are not allowed in subscripted assignments
In addition: Warning message:
'newdata' had 12 rows but variable(s) found have 160 rows

There are two errors - the NA error which I thought would have been removed in line 3 above. I also tried na.omit.
I know there will be a difference between the raw data and the new data but do not know what I need to change to be able to successfully run the command. What I want to do is obtain the fitted probabilities and plot them. Any suggestions are appreciated, Bob Green __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Problem dropping rows based on values in a column
I think you want

delete <- c(14772, 14744)
jdata <- subset(jdata, !(PID %in% delete))

Bill Venables CSIRO Laboratories PO Box 120, Cleveland, 4163 AUSTRALIA Office Phone (email preferred): +61 7 3826 7251 Fax (if absolutely necessary): +61 7 3826 7304 Mobile: (I don't have one!) Home Phone: +61 7 3286 7700 mailto:[EMAIL PROTECTED] http://www.cmis.csiro.au/bill.venables/

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of John Sorkin Sent: Monday, 26 March 2007 12:19 PM To: r-help@stat.math.ethz.ch Subject: [R] Problem dropping rows based on values in a column

I am trying to drop rows of a dataframe based on values of the column PID, but my strategy is not working. I hope someone can tell me what I am doing incorrectly.

# Values of PID column
jdata[, "PID"]
 [1] 16608 16613 16355 16378 16371 16280 16211 16169 16025 11595 15883 15682 15617 15615 15212 14862 16539
[18] 12063 16755 16720 16400 16257 16209 16200 16144 11598 13594 15419 15589 15982 15825 15834 15491 15822
[35] 15803 15795 10202 15680 15587 15552 15588 15375 15492 15568 15196 10217 15396 15477 15446 15374 14092
[52] 14033 15141 14953 15473 10424 13445 14854 10481 14793 14744 14772

# Prepare to drop last two rows, rows that have 14744 and 14772 in the PID column
delete <- c(14772, 14744)

# Try to delete last two rows, but as you will see, I am not able to drop the last two rows.
jdata[jdata$PID != delete, "PID"]
 [1] 16608 16613 16355 16378 16371 16280 16211 16169 16025 11595 15883 15682 15617 15615 15212 14862 16539
[18] 12063 16755 16720 16400 16257 16209 16200 16144 11598 13594 15419 15589 15982 15825 15834 15491 15822
[35] 15803 15795 10202 15680 15587 15552 15588 15375 15492 15568 15196 10217 15396 15477 15446 15374 14092
[52] 14033 15141 14953 15473 10424 13445 14854 10481 14793 14744 14772

Thanks, John John Sorkin M.D., Ph.D. Chief, Biostatistics and Informatics Baltimore VA Medical Center GRECC, University of Maryland School of Medicine Claude D.
Pepper OAIC, University of Maryland Clinical Nutrition Research Unit, and Baltimore VA Center Stroke of Excellence University of Maryland School of Medicine Division of Gerontology Baltimore VA Medical Center 10 North Greene Street GRECC (BT/18/GR) Baltimore, MD 21201-1524 (Phone) 410-605-7119 (Fax) 410-605-7913 (Please call phone number above prior to faxing) [EMAIL PROTECTED] Confidentiality Statement: This email message, including any attachments, is for the\ s...{{dropped}} __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
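[Editor's sketch.] The reason the `!=` attempt above fails is recycling: the length-2 vector is repeated along PID and compared element by element, rather than tested for membership. A small illustration with made-up IDs:

```r
pid <- c(10, 20, 30, 40)
del <- c(30, 10)

pid != del         # del recycles to c(30, 10, 30, 10): element-wise comparison
# [1]  TRUE  TRUE FALSE  TRUE

!(pid %in% del)    # membership test: what was actually wanted
# [1] FALSE  TRUE FALSE  TRUE
```

With `!=`, the value 10 survives because it happens to line up against 30 after recycling; `%in%` checks each PID against the whole deletion set.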
Re: [R] Need to fit a regression line using orthogonal residuals
Jonathon Kopecky asks:

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Jonathon Kopecky Sent: Tuesday, 30 January 2007 5:52 AM To: r-help@stat.math.ethz.ch Subject: [R] Need to fit a regression line using orthogonal residuals

I'm trying to fit a simple linear regression of just Y ~ X, but both X and Y are noisy. Thus instead of fitting a standard linear model minimizing vertical residuals, I would like to minimize orthogonal/perpendicular residuals. I have tried searching the R packages, but have not found anything that seems suitable. I'm not sure what these types of residuals are typically called (they seem to have many different names), so that may be my trouble. I do not want to use Principal Components Analysis (as was answered to a previous questioner a few years ago), I just want to minimize the combined noise of my two variables. Is there a way for me to do this in R?

[WNV] There's always a way if you are prepared to program it. Your question is a bit like asking "Is there a way to do this in Fortran?" The most direct way to do it is to define a function that gives you the sum of the perpendicular distances and minimise it using, say, optim(). E.g.

ppdis <- function(b, x, y) sum((y - b[1] - b[2]*x)^2/(1 + b[2]^2))
b0 <- lsfit(x, y)$coef  # initial value
op <- optim(b0, ppdis, method = "BFGS", x = x, y = y)
op
# now to check the results
plot(x, y, asp = 1)  # why 'asp = 1'?? exercise
abline(b0, col = "red")
abline(op$par, col = "blue")

There are a couple of things about this you should be aware of, though. First, this is just a fiddly way of finding the first principal component, so your desire not to use Principal Component Analysis is somewhat thwarted, as it must be. Second, the result is sensitive to scale - if you change the scales of either x or y, e.g. from lbs to kilograms, the answer is different. This also means that unless your measurement units for x and y are comparable it's hard to see how the result can make much sense.
A related issue is that you have to take some care when plotting the result, or orthogonal distances will not appear to be orthogonal. Third, the resulting line is not optimal for either predicting y for a new x or x from a new y. It's hard to see why it is ever of much interest. Bill Venables. Jonathon Kopecky University of Michigan __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
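As a worked sketch of the reply's approach (the names ppdis, b0 and op follow the reply; the data here are simulated for illustration), the optim() fit can be cross-checked against the first principal component through the centroid, which the reply says is the same line:

```r
set.seed(42)
x <- rnorm(100)
y <- 2 + 3 * x + rnorm(100, sd = 0.5)

# sum of squared perpendicular distances to the line y = b[1] + b[2]*x
ppdis <- function(b, x, y) sum((y - b[1] - b[2] * x)^2 / (1 + b[2]^2))

b0 <- lsfit(x, y)$coef                       # ordinary least-squares start
op <- optim(b0, ppdis, method = "BFGS", x = x, y = y)

# cross-check: first principal component of the covariance matrix,
# through the centroid of the data
v <- eigen(cov(cbind(x, y)))$vectors[, 1]
slope_pca <- v[2] / v[1]
intercept_pca <- mean(y) - slope_pca * mean(x)
```

The two slopes should agree to numerical accuracy, which is the reply's point that the optim() route is "just a fiddly way of finding the first principal component".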
Re: [R] comparing two matrices
But this is using paste() the wrong way round. A better way would be

join <- function(x) do.call(paste, c(as.data.frame(x), sep = "\r"))
which(join(mat1) %in% join(mat2))
[1]  8 13 16 19 24

This is essentially the technique used by duplicated.data.frame. Bill Venables -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Adrian Dusa Sent: Sunday, 21 January 2007 8:17 PM To: Dimitris Rizopoulos Cc: [EMAIL PROTECTED]; r-help@stat.math.ethz.ch Subject: Re: [R] comparing two matrices On Sunday 21 January 2007 12:04, Dimitris Rizopoulos wrote: I think the following should work in your case:

mat1 <- data.matrix(expand.grid(0:2, 0:2, 0:2))
mat2 <- mat1[c(19, 16, 13, 24, 8), ]
ind1 <- apply(mat1, 1, paste, collapse = "/")
ind2 <- apply(mat2, 1, paste, collapse = "/")
match(ind2, ind1)

Oh yes, I thought about that too. It works fast enough for small matrices, but I deal with very large ones. Using paste() on such matrices decreases the speed dramatically. Thanks again, Adrian -- Adrian Dusa Romanian Social Data Archive 1, Schitu Magureanu Bd 050025 Bucharest sector 5 Romania Tel./Fax: +40 21 3126618 \ +40 21 3120210 / int.101 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
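A self-contained check of the join() idea from the reply: pasting each row into a single string, with a separator that cannot occur in the data, turns row comparison into fast vector matching.

```r
# join() as in the reply: one string per row, "\r" as a separator
# assumed not to occur in the data
join <- function(x) do.call(paste, c(as.data.frame(x), sep = "\r"))

mat1 <- data.matrix(expand.grid(0:2, 0:2, 0:2))
mat2 <- mat1[c(19, 16, 13, 24, 8), ]
hits <- which(join(mat1) %in% join(mat2))   # rows of mat1 present in mat2
```

Note that %in% reports the positions in mat1 order, so the result is sorted, unlike match(ind2, ind1) in the quoted message, which preserves mat2's order.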
Re: [R] Vectorize rearrangement within each column
As with most things like this, you can trade memory for speed. Here is an obfuscated solution that appears to eschew loops entirely.

ma <- matrix(10:15, nr = 3)
idx <- matrix(c(1,3,2, 2,3,1), nr = 3)
mb <- ma
mb[] <- as.vector(ma)[as.vector(idx + outer(rep(nrow(ma), nrow(ma)), 1:ncol(ma) - 1, '*'))]
mb
     [,1] [,2]
[1,]   10   14
[2,]   12   15
[3,]   11   13

Ordinarily, though, my preferred solution would be the for() loop. Bill Venables CMIS, CSIRO Laboratories, PO Box 120, Cleveland, Qld. 4163 AUSTRALIA Office Phone (email preferred): +61 7 3826 7251 Fax (if absolutely necessary): +61 7 3826 7304 Mobile (rarely used): +61 4 1963 4642 Home Phone: +61 7 3286 7700 mailto:[EMAIL PROTECTED] http://www.cmis.csiro.au/bill.venables/ -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Osorio Roberto Sent: Friday, 19 January 2007 4:15 PM To: r-help@stat.math.ethz.ch Subject: [R] Vectorize rearrangement within each column Consider a matrix like

ma <- matrix(10:15, nr = 3)
ma
     [,1] [,2]
[1,]   10   13
[2,]   11   14
[3,]   12   15

I want to rearrange each column according to row indexes (1 to 3) given in another matrix, as in

idx <- matrix(c(1,3,2, 2,3,1), nr = 3)
idx
     [,1] [,2]
[1,]    1    2
[2,]    3    3
[3,]    2    1

The new matrix mb will have for each column the corresponding column of ma indexed by the corresponding column of idx, as in

mb <- ma
for (j in 1:2) mb[, j] <- ma[idx[, j], j]
mb
     [,1] [,2]
[1,]   10   14
[2,]   12   15
[3,]   11   13

Can I avoid the for() loop? I'm specially interested to find out if a fast implementation using lapply() would be feasible for large input matrices (analogues of ma and idx) transformed into data frames. Roberto Osorio __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
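An alternative loop-free form, not from the thread, uses R's two-column matrix indexing, which is usually clearer than the outer() arithmetic and gives the same result:

```r
ma <- matrix(10:15, nrow = 3)
idx <- matrix(c(1, 3, 2, 2, 3, 1), nrow = 3)

# index ma by (row, column) pairs: row from idx, column repeated
mb <- ma
mb[] <- ma[cbind(as.vector(idx), rep(seq_len(ncol(ma)), each = nrow(ma)))]
```

The cbind() builds one (row, column) pair per element, so the whole rearrangement is a single subscripting operation.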
Re: [R] if ... problem with compound instructions
Step 1:

quadrant <- 1 + (X[, 1] < 0) + 2*(X[, 2] < 0)

This is not the usual labelling of the quadrants, as '3' and '4' are interchanged. If you want to be picky about it

quadrant <- ifelse(quadrant > 2, 7 - quadrant, quadrant)

Step 2:

angle <- atan2(X[, 2], X[, 1]) %% (2*pi)   # I think this is what you want

(why did you want to know the quadrant?) Oh, then you might do

X[, 3:4] <- cbind(quadrant, angle)

--- Bill Venables CMIS, CSIRO Laboratories, PO Box 120, Cleveland, Qld. 4163 AUSTRALIA Office Phone (email preferred): +61 7 3826 7251 Fax (if absolutely necessary): +61 7 3826 7304 Mobile (rarely used): +61 4 1963 4642 Home Phone: +61 7 3286 7700 mailto:[EMAIL PROTECTED] http://www.cmis.csiro.au/bill.venables/ -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Richard Rowe Sent: Monday, 1 January 2007 12:36 PM To: r-help@stat.math.ethz.ch Subject: [R] if ... problem with compound instructions I am having problems with the 'if' syntax. I have an n x 4 matrix, X say. The first two columns hold x, y values and I am attempting to fill the second two columns with the quadrant in which the datapoint (x, y) is and with the heading angle. So I have two problems: 1) how to do this elegantly (at which I've failed as I can't seem to vectorize the problem) and 2) how to accomplish the task in a for loop ...

for (i in 1:length(X[,1])) (
  if ((X[i,1] == 0) && (X[i,2] == 0)) (X[i,3] <- NA; X[i,4] <- NA) else (

removing the pathological case ... then a series of nested if statements assigning quadrant and calculating heading, e.g.

if ((X[i,1] > 0) && (X[i,2] <= 0)) (X[i,3] <- 4; X[i,4] <- atan(X[i,1]/X[i,2]) + 2*pi) else (

In the first instance the ';' seems to be the source of a syntax error. Removing the second elements of the compound statement solves the syntax problem and the code runs. As the R syntax is supposed to be 'Algol-like' I had thought if A then B else C should work for compound B ...
ie that the bracket (X[i,3] <- NA; X[i,4] <- NA) should be actioned. 1) any elegant solutions to what must be a common task? 2) any explanations for the ';' effect? thanks Richard Rowe Dr Richard Rowe Zoology Tropical Ecology School of Tropical Biology James Cook University Townsville 4811 AUSTRALIA ph +61 7 47 81 4851 fax +61 7 47 25 1570 JCU has CRICOS Provider Code 00117J __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
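The whole task can be done without any loop, following the two steps in the reply; here is a self-contained sketch on made-up data, with the origin mapped to NA in both result columns as the question requires:

```r
# X: n x 4 matrix, x in column 1, y in column 2 (layout as in the question);
# the sample points cover each quadrant plus the origin
X <- cbind(c(1, -1, -1, 1, 0), c(1, 1, -1, -1, 0), NA, NA)

quadrant <- 1 + (X[, 1] < 0) + 2 * (X[, 2] < 0)
quadrant <- ifelse(quadrant > 2, 7 - quadrant, quadrant)  # usual labelling
angle <- atan2(X[, 2], X[, 1]) %% (2 * pi)

origin <- X[, 1] == 0 & X[, 2] == 0      # the pathological case
quadrant[origin] <- NA
angle[origin] <- NA
X[, 3:4] <- cbind(quadrant, angle)
```

With this labelling the four test points fall in quadrants 1, 2, 3, 4 in order, and the origin gets NA for both quadrant and angle.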
Re: [R] strange logical results
Hi Erin, You would be safe on a machine that represented floating point numbers in base 10, and I haven't seen one of those for such a long time... All modern machines use base 2 for floating point numbers. The moral of the story is not to believe what you see printed. The number you see printed innocently as '-0.4' has been arrived at by two different processes and uses two different *approximations* to the real thing on a binary machine, and by chance they have arrived at slightly different results. Slight, but enough to make '==' ring the alarm. Here is a demo.

x <- seq(-1, 1, by = 0.1)
x[7] - (-0.4)
[1] 1.110223e-16

So the method used by seq() to arrive at an approximation to -0.4 is just slightly different from the method used by the parser when it reads the characters '-0.4' and translates them into a floating point number. It just so happens that for the others you checked the two approximations agreed, but you can't trust that to happen all the time. Moral of the story: don't use the '==' or '!=' operators with floating point numbers. It's an old tale but still current. OK, so what can you do to implement the idea of checking equality 'within a tolerance'? I'm glad you asked. You can write a couple of binary operators yourself. There is an object called .Machine that is a list of machine constants. The obvious one to compare the difference with is .Machine$double.eps

`%~=%` <- function(a, b) abs(a - b) < .Machine$double.eps
`%~!%` <- function(a, b) abs(a - b) >= .Machine$double.eps

x %~=% -0.4
[1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[15] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
x %~!% -0.4
[1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
[15]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE

The world is approximately sane once more. Bill Venables.
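Base R also ships a tolerant comparison, all.equal(), which saves writing your own operators for one-off checks; a minimal sketch of both routes:

```r
x <- seq(-1, 1, by = 0.1)

exact <- x[7] == -0.4                     # exact comparison fails
tolerant <- isTRUE(all.equal(x[7], -0.4)) # tolerant comparison succeeds

# a small helper with the conventional sqrt(double.eps) tolerance
near <- function(a, b, tol = .Machine$double.eps^0.5) abs(a - b) < tol
hit <- which(near(x, -0.4))               # position 7, as expected
```

all.equal() returns TRUE or a descriptive string rather than FALSE, hence the isTRUE() wrapper when it is used in a condition.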
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Erin Hodgess Sent: Friday, 29 December 2006 8:36 PM To: r-help@stat.math.ethz.ch Subject: [R] strange logical results Dear R People: I am getting some odd results when using logical operators:

x <- seq(from = -1, to = 1, by = 0.1)
x
[1] -1.0 -0.9 -0.8 -0.7 -0.6 -0.5 -0.4 -0.3 -0.2 -0.1 0.0 0.1 0.2 0.3 0.4 [16] 0.5 0.6 0.7 0.8 0.9 1.0
x == -1
[1] TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
x == -0.9
[1] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
x == -0.8
[1] FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
x == -0.7
[1] FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
x == -0.6
[1] FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
x == -0.5
[1] FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
x == -0.4
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

Should this show as TRUE also, please? I saw this in both the Windows and Linux versions of 2.4.0. Thanks in advance, Sincerely, Erin Hodgess Associate Professor Department of Computer and Mathematical Sciences University of Houston - Downtown mailto: [EMAIL PROTECTED] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Higher Dimensional Matrices
Lars from space [sic] asks: -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of downunder Sent: Monday, 25 December 2006 12:31 PM To: r-help@stat.math.ethz.ch Subject: [R] Higher Dimensional Matrices Hi all. I want to calculate partial correlations while controlling for one or more variables. That works already fine when I control for example just for x[,1] and x[,2]; that gives me one single correlation matrix, and I have to do it for x[,1]...x[,10]. That would give me 10 matrices; controlling for 2 variables, 100 matrices. How can I run a loop to get, for example, the 10 or 100 matrices at once? I appreciate every hint. Have nice holidays. I don't quite understand this. You have 10 variables and you want to find the partial correlations controlling for two of them at a time. If you take each possible set of two variables to condition upon at a time, this would give you choose(10, 2) = 45 matrices, wouldn't it? Where do you get '10 or 100' matrices from? greetings lars

x <- read.table("C:/.dat")
dim(x)   # 200 x 10
a <- matrix(0, 200, 10)
for (i in 1:10) a[, i] <- residuals(lm(x[, i] ~ 1 + x[, 1] + x[, 2]))
b <- matrix(0, 200, 10)
for (i in 1:10) b[, i] <- residuals(lm(x[, i] ~ 1 + x[, 1] + x[, 2]))
#a = round(a, 5)
#b = round(b, 5)
d <- cor(a, b)
d

But a and b are the same, aren't they? Why do you need to compute them twice? Why not just use cor(a, a) [which is the same as cor(a), of course]? There is a more serious problem here, though. Your residuals are after regression on x[, 1:2], so if you *select* x[, 1:2] as the y-variables in your regression then the residuals are going to be zero, but in practice round-off error, so the first two rows and columns of d will be correlations with round-off error, i.e. misleading junk. It doesn't make sense to include the conditioning variables in the correlation matrix *conditioning on them*. Only the 8 x 8 matrix of the others among themselves is defined, really. So how do we do it? Here are a few pointers.
To start, here is a function that uses a somewhat more compact way of finding the partial correlations than your method. Sorting out how it works should be a useful exercise.

partialCorr <- function(cond, mat)
  cor(qr.resid(qr(cbind(1, mat[, cond])), mat[, -cond]))

To find the matrix of partial correlations conditioning on x[, 1:2] you would use

d <- partialCorr(c(1, 2), x)

So how to do it for all possible conditioning pairs of variables? Well, you could do it in an obvious loop:

cmats <- array(0, c(8, 8, 45))
k <- 0
for(i in 1:9) for(j in (i+1):10) {
  k <- k + 1
  cmats[, , k] <- partialCorr(c(i, j), x)
}

Now the results are in a 3-way array, but without any kind of labels. Perhaps you should think about how to fix that one yourself... With more than two conditioning variables you should probably use a function to generate the subsets of the appropriate size rather than trying to write ever more deeply nested loops. There are plenty of functions around to do this. Bill Venables. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
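One such subset-generating function is combn() in base R; a sketch (partialCorr as defined in the reply, data simulated here) that also attaches labels to the third dimension:

```r
partialCorr <- function(cond, mat)
  cor(qr.resid(qr(cbind(1, mat[, cond])), mat[, -cond]))

set.seed(1)
x <- matrix(rnorm(200 * 10), 200, 10)     # stand-in for the real data

prs <- combn(10, 2)                       # 2 x 45 matrix of index pairs
cmats <- apply(prs, 2, function(ij) partialCorr(ij, x))
dim(cmats) <- c(8, 8, ncol(prs))          # one 8 x 8 slice per pair
dimnames(cmats) <- list(NULL, NULL, paste(prs[1, ], prs[2, ], sep = ","))
```

Each slice can then be pulled out by name, e.g. cmats[, , "1,2"] for the matrix conditioning on variables 1 and 2.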
Re: [R] How to sum one column in a data frame keyed on other columns
Here is an elementary way of doing it:

dat
          url time somethingirrelevant visits
1 www.foo.com 1:00                 xxx    100
2 www.foo.com 1:00                 yyy     50
3 www.foo.com 2:00                 xyz     25
4 www.bar.com 1:00                 xxx    200
5 www.bar.com 1:00                 zzz    200
6 www.foo.com 2:00                 xxx    500

dat <- transform(dat, key = paste(url, time))
total_visits <- with(dat, tapply(visits, key, sum))
m <- match(names(total_visits), dat$key)
tdat <- cbind(dat[m, c("url", "time")], total_visits)
tdat
          url time total_visits
4 www.bar.com 1:00          400
1 www.foo.com 1:00          150
3 www.foo.com 2:00          525

This should not be too difficult to morph into a fairly general function. Here's what I might do [warning: somewhat obscure code follows]

sumUp <- function(dat, key_list, sum_list) {
  key <- with(dat, do.call(paste, dat[, key_list, drop = FALSE]))
  totals <- as.matrix(sapply(dat[, sum_list, drop = FALSE], tapply, key, sum))
  dimnames(totals)[[2]] <- paste("total", sum_list, sep = "_")
  m <- match(dimnames(totals)[[1]], key)
  cbind(dat[m, key_list, drop = FALSE], totals)
}

check:

sumUp(dat, c("url", "time"), "visits")
          url time total_visits
4 www.bar.com 1:00          400
1 www.foo.com 1:00          150
3 www.foo.com 2:00          525
sumUp(dat, "url", "visits")
          url total_visits
4 www.bar.com          400
1 www.foo.com          675

Question for the reader: why do you need 'drop = FALSE' (in three places)? Bill Venables.
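For comparison, the same totals come straight out of aggregate() with its formula interface (available in recent R versions), at the cost of less control over the output layout; the data frame here is rebuilt from the question:

```r
dat <- data.frame(
  url  = c("www.foo.com", "www.foo.com", "www.foo.com",
           "www.bar.com", "www.bar.com", "www.foo.com"),
  time = c("1:00", "1:00", "2:00", "1:00", "1:00", "2:00"),
  visits = c(100, 50, 25, 200, 200, 500))

# one row per unique (url, time) tuple, visits summed within each
tot <- aggregate(visits ~ url + time, data = dat, FUN = sum)
```

This is usually the first thing to try before writing a bespoke function like sumUp.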
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of George Nachman Sent: Wednesday, 13 December 2006 9:35 AM To: r-help@stat.math.ethz.ch Subject: [R] How to sum one column in a data frame keyed on other columns I have a data frame that looks like this:

url         time somethingirrelevant visits
www.foo.com 1:00 xxx                 100
www.foo.com 1:00 yyy                 50
www.foo.com 2:00 xyz                 25
www.bar.com 1:00 xxx                 200
www.bar.com 1:00 zzz                 200
www.foo.com 2:00 xxx                 500

I'd like to write some code that takes this as input and outputs something like this:

url         time total_visits
www.foo.com 1:00 150
www.foo.com 2:00 525
www.bar.com 1:00 400

In other words, I need to calculate the sum of visits for each unique tuple of (url, time). I can do it with this code, but it's very slow, and doesn't seem like the right approach:

keys = list()
getkey = function(m, cols, index) { paste(m[index, cols], collapse = ",") }
for (i in 1:nrow(data)) { keys[[getkey(data, 1:2, i)]] = 0 }
for (i in 1:nrow(data)) { keys[[getkey(data, 1:2, i)]] = keys[[getkey(data, 1:2, i)]] + data[i, 4] }

I'm sure there's a more functional-programming approach to this problem! Any ideas? __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] strings as factors
Here is a possibility:

test <- expand.grid(id = 1:2, sex = c('male', 'female'))
sapply(test, class)
       id       sex
"integer"  "factor"
test <- transform(test, sex = as.character(sex))
sapply(test, class)
         id         sex
  "integer" "character"

But I am surprised at the reason you give for needing it as a character vector, because factors often act as character vectors under matching anyway.

sexf <- factor(test[[2]])
sexf
[1] male   male   female female
Levels: female male
which(sexf %in% "male")
[1] 1 2
which(sexf == "male")
[1] 1 2

Bill Venables -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Alexander Nervedi Sent: Tuesday, 12 December 2006 10:09 AM To: r-help@stat.math.ethz.ch Subject: [R] strings as factors Hi, To be able to match cases with a benchmark I need to have a data.frame with a character id variable. However, I am surprised that this seems to be so hard. In fact I was unable to succeed. Here is what I tried:

test1 <- expand.grid(ID = 1:2, sex = c("male", "female"))
is(test1[,2])
[1] "factor"   "oldClass"
test2 <- expand.grid(ID = 1:2, sex = c('male', 'female'))
is(test2[,2])
[1] "factor"   "oldClass"
test3 <- expand.grid(ID = 1:2, sex = I(c("male", "female")))
is(test3[,2])
[1] "factor"   "oldClass"
test4 <- expand.grid(ID = 1:2, sex = I(c('male', 'female')))
is(test4[,2])
[1] "factor"   "oldClass"
options(stringsAsFactors = FALSE)
options("stringsAsFactors")
$stringsAsFactors
[1] FALSE
test5 <- expand.grid(ID = 1:2, sex = I(c('male', 'female')))
is(test5[,2])
[1] "factor"   "oldClass"

Is there any way I can get sex to be a character? Arnab __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
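A compact form of the reply's workaround, converting after the fact, which works regardless of the stringsAsFactors option or R version:

```r
test <- expand.grid(ID = 1:2, sex = c("male", "female"))
test$sex <- as.character(test$sex)   # factor -> plain character

# matching now works on the character values directly
male_rows <- which(test$sex == "male")
```

expand.grid() varies its first argument fastest, so the "male" rows are the first two.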
Re: [R] Force square crosstabulation
Use factors with specified levels.

lev <- letters[1:4]
table(factor(letters[1:4], levels = lev), factor(letters[c(1:3, 3)], levels = lev))

    a b c d
  a 1 0 0 0
  b 0 1 0 0
  c 0 0 1 0
  d 0 0 1 0

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Manuel Morales Sent: Sunday, 3 December 2006 11:27 AM To: r-help@stat.math.ethz.ch Subject: [R] Force square crosstabulation Hello list members, I'm looking for a way to force the results of a crosstabulation to be square - that is, to include 0 values. For example:

table(letters[1:4], letters[c(1:3, 3)])

yields:

    a b c
  a 1 0 0
  b 0 1 0
  c 0 0 1
  d 0 0 1

I would like to return:

    a b c d
  a 1 0 0 0
  b 0 1 0 0
  c 0 0 1 0
  d 0 0 1 0

Any suggestions? Thanks! -- Manuel A. Morales http://mutualism.williams.edu __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
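The idea wraps naturally into a small helper that derives the common level set itself (squareTable is a name coined here, not from the thread):

```r
# Cross-tabulate two vectors on the union of their values,
# so the result is always square and includes zero rows/columns
squareTable <- function(a, b) {
  lev <- sort(unique(c(as.character(a), as.character(b))))
  table(factor(a, levels = lev), factor(b, levels = lev))
}

tab <- squareTable(letters[1:4], letters[c(1:3, 3)])
```

Here tab is 4 x 4 with a zero column for "d", matching the desired output in the question.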
Re: [R] Command Line Prompt Symbol
The simplest way to do this is to put

options(prompt = "R> ")

in your .Rprofile file. If you don't have an .Rprofile file, then persuading Windows to let you call a file by that name can be frustrating. I suggest you delegate the job to R itself and use something like this ---

wd <- getwd()
setwd(Sys.getenv("R_USER"))   ## change working directories
cat('\noptions(prompt = "R> ")\n', file = ".Rprofile", append = TRUE)
setwd(wd)

--- Putting the .Rprofile in the R_USER directory ensures that it will be used prior to all invocations of R you make. Bill Venables, CMIS, CSIRO Laboratories, PO Box 120, Cleveland, Qld. 4163 AUSTRALIA Office Phone (email preferred): +61 7 3826 7251 Fax (if absolutely necessary): +61 7 3826 7304 Mobile (rarely used): +61 4 1963 4642 Home Phone: +61 7 3286 7700 mailto:[EMAIL PROTECTED] http://www.cmis.csiro.au/bill.venables/ -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Jacob van Wyk Sent: Friday, 10 November 2006 3:33 PM To: r-help@stat.math.ethz.ch Subject: [R] Command Line Prompt Symbol Hi I run R in Windows. Is there a simple way of changing the prompt symbol to, say, "R> "? (Not just for a temporary session, but every time the R command window is opened.) The documentation on doing this is rather sparse. Much appreciated for your assistance. Jacob Jacob L van Wyk Department of Statistics University of Johannesburg, APK P O Box 524 Auckland Park 2006 South Africa Tel: +27 11 489 3080 Fax: +27 11 489 2832 [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Type II and III sum of square in Anova (R, car package)
I cannot resist a very brief entry into this old and seemingly immortal issue, but I will be very brief, I promise! Amasco Miralisus suggests: As I understood form R FAQ, there is disagreement among Statisticians which SS to use (http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-does-the-output-from-a nova_0028_0029-depend-on-the-order-of-factors-in-the-model_003f). To let this go is to concede way too much. The 'disagreement' is really over whether this is a sensible question to ask in the first place. One side of the debate suggests that the real question is what hypotheses does it make sense to test and within what outer hypotheses. Settle that question and no issue on types of sums of squares arises. This is often a hard question to get your head around, and the attraction of offering a variety of 'types of sums of squares' holds out the false hope that perhaps you don't need to do so. The bad news is that for good science and good decision making, you do. Bill Venables. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] singular matrix
Nongluck Klibbua reports: -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Nongluck Klibbua Sent: Tuesday, 29 August 2006 11:12 AM To: R-help@stat.math.ethz.ch Subject: [R] singular matrix Dear R-users, I try to use solve() to get the solution of this system, but the error below happens. How can I solve this problem?

a
          [,1]
[1,] 0.8109437
[2,] 5.7569740
b
          [,1]      [,2]
[1,] 0.3141293  2.230037
[2,] 2.2300367 15.831264
Error in solve.default(b, a) : system is computationally singular: reciprocal condition number = 1.37415e-018

The irony seems to be that if you quote them to only this number of digits then the system is no longer quite singular enough for the problem to appear, at least on Windows R 2.3.1:

solve(b, a)
              [,1]
[1,]  2.5831242104
[2,] -0.0002203103
b %*% solve(b, a) - a   ### check
              [,1]
[1,] -1.110223e-16
[2,] -8.881784e-16

In general, though, in dealing with singular systems you might want to look at the function ginv in the MASS library. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
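For a system that really is singular, MASS::ginv gives the Moore-Penrose generalized inverse, which yields the minimum-norm least-squares solution; a sketch on an exactly rank-deficient matrix (made up here, the original b is only nearly singular):

```r
library(MASS)   # ships with R as a recommended package

b <- matrix(c(1, 2, 2, 4), 2, 2)   # rank 1; solve(b) would fail
a <- c(1, 2)                       # lies in the column space of b

sol <- ginv(b) %*% a               # minimum-norm solution
```

Because a is consistent with b, the residual b %*% sol - a is zero to machine precision, even though b has no ordinary inverse.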
Re: [R] Check values in colums matrix
As a minor footnote to both of these, I would add that both assume that all the columns of the dataset are numeric. It doesn't cost much to generalize it to cover any matrix structure, of any mode:

constantColumns <- function(Xmat)
  which(apply(Xmat, 2, function(z) length(unique(z)) == 1))

-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Berton Gunter Sent: Friday, 25 August 2006 9:37 AM To: 'Gabor Grothendieck'; 'Muhammad Subianto' Cc: r-help@stat.math.ethz.ch Subject: Re: [R] Check values in colums matrix Absolutely. But do note that if the values in obj are the product of numerical computations then columns of equal values may turn out to be only **nearly** equal and so the sd may turn out to be **nearly** 0 and not exactly 0. This is a standard issue in numerical computation, of course, and has been commented on in this list at least dozens of times, but it's still a gotcha for the unwary (so now dozens + 1). -- Bert Gunter Genentech Non-Clinical Statistics South San Francisco, CA -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Gabor Grothendieck Sent: Thursday, August 24, 2006 4:28 PM To: Muhammad Subianto Cc: r-help@stat.math.ethz.ch Subject: Re: [R] Check values in colums matrix Try sd(obj.tr), which will give a vector of standard deviations, one per column. A column's entry will be zero if and only if all values in the column are the same. On 8/24/06, Muhammad Subianto [EMAIL PROTECTED] wrote: Dear all, I apologize if my question is quite simple. I have a dataset (20 columns x 1000 rows) in which some of the columns have the same value throughout and the others have different values.
Here is a piece of my dataset:

obj <- cbind(c(1,1,1,4,0,0,1,4,-1), c(0,1,1,4,1,0,1,4,-1),
             c(1,1,1,4,2,0,1,4,-1), c(1,1,1,4,3,0,1,4,-1),
             c(1,1,1,4,6,0,1,5,-1), c(1,1,1,4,6,0,1,6,-1),
             c(1,1,1,4,6,0,1,7,-1), c(1,1,1,4,6,0,1,8,-1))
obj.tr <- t(obj)
obj.tr
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,]    1    1    1    4    0    0    1    4   -1
[2,]    0    1    1    4    1    0    1    4   -1
[3,]    1    1    1    4    2    0    1    4   -1
[4,]    1    1    1    4    3    0    1    4   -1
[5,]    1    1    1    4    6    0    1    5   -1
[6,]    1    1    1    4    6    0    1    6   -1
[7,]    1    1    1    4    6    0    1    7   -1
[8,]    1    1    1    4    6    0    1    8   -1

How can I check that columns 2, 3, 4, 6, 7 and 9 each hold a single repeated value, while columns 1, 5 and 8 have different values? Best, Muhammad Subianto __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
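Applying the generalized helper from the footnote above to the question's own data confirms the expected answer:

```r
# works for any mode, not just numeric columns
constantColumns <- function(Xmat)
  which(apply(Xmat, 2, function(z) length(unique(z)) == 1))

obj <- cbind(c(1,1,1,4,0,0,1,4,-1), c(0,1,1,4,1,0,1,4,-1),
             c(1,1,1,4,2,0,1,4,-1), c(1,1,1,4,3,0,1,4,-1),
             c(1,1,1,4,6,0,1,5,-1), c(1,1,1,4,6,0,1,6,-1),
             c(1,1,1,4,6,0,1,7,-1), c(1,1,1,4,6,0,1,8,-1))
obj.tr <- t(obj)

cc <- constantColumns(obj.tr)   # columns 2, 3, 4, 6, 7, 9
```

Unlike the sd() approach, this one makes no near-zero tolerance assumption, since it tests exact equality of values rather than a numeric standard deviation.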
Re: [R] glm inside one self-defined function
Mike Wolfgang asks: From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Mike Wolfgang Sent: Wednesday, 23 August 2006 1:31 PM To: R-help list Subject: [R] glm inside one self-defined function Hi list, I've searched in R-help and found some related discussions but still could not understand this type of error. My own function is pretty complex, so I would not put it here, but the basic algorithm is like this:

myfun <- function(k) {
  mydata <- ...   # by some way I create a data frame
  mymodel <- glm(y ~ ., family = binomial(), data = mydata)
  ...             # some other stuff
}

I think you are leaving out something. Here is a test of what you claim gives a problem (R 2.3.1, Windows):

myfun <- function(n) {
  z <- rnorm(n)
  mydata <- data.frame(x = z,
                       y = rbinom(n, size = 1, prob = exp(z)/(1 + exp(z))))
  fm <- glm(y ~ x, binomial, mydata)
  fm
}
myfun(100)

Call:  glm(formula = y ~ x, family = binomial, data = mydata)

Coefficients:
(Intercept)            x
     0.1587       1.0223

Degrees of Freedom: 99 Total (i.e. Null);  98 Residual
Null Deviance: 137.6
Residual Deviance: 118.3    AIC: 122.3

Not even a murmur of complaint. (This also works in S-PLUS 7.0, but earlier versions of S-PLUS gave a problem rather like the one you note, curiously.) Look again at your code and see if the abstract version you give really matches what you did, may I suggest? As I execute this function, it gives an error like this:

Error in inherits(x, "data.frame") : object "mydata" not found

So I guess glm here tries to find mydata in the parent environment. Why doesn't it take mydata from inside the function? How do I let glm correctly locate it? Is this (scope/environment) mentioned in the R manual? Thanks, Mike __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
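A self-contained version of the check from the reply, runnable as-is: glm() finds a data frame created inside the calling function without any scoping trouble.

```r
myfun <- function(n) {
  z <- rnorm(n)
  # data frame built inside the function's own environment
  mydata <- data.frame(x = z,
                       y = rbinom(n, size = 1, prob = exp(z) / (1 + exp(z))))
  glm(y ~ x, family = binomial, data = mydata)
}

set.seed(1)
fm <- myfun(100)   # fits without complaint; mydata is found locally
```

The slope estimate should be near the true value of 1, and the point stands: a "mydata not found" error must come from something not shown in the abstract version of the questioner's code.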
Re: [R] Help with workaround for: Function '`[`' is not in the derivatives table
Earl F. Glynn asks: -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Earl F. Glynn Sent: Tuesday, 15 August 2006 8:44 AM To: r-help@stat.math.ethz.ch Subject: [R] Help with workaround for: Function '`[`' is not in the derivatives table

# This works fine:
a <- 1
b <- 2
c <- 3
E <- expression(a * exp(b*X) + c)
X <- c(0.5, 1.0, 2.0)
eval(E)
[1]  5.718282 10.389056 57.598150
D(E, "b")
a * (exp(b * X) * X)
eval(D(E, "b"))
[1]   1.359141   7.389056 109.196300

# But if (a, b, c) are replaced with (A[1], A[2], A[3]), how can I get a derivative using D?

It's well to note what D can differentiate with respect to. The second argument is, to quote the help page, "a character string giving the name of a variable". 'A[2]' is a character string, but it is not the name of a variable. When parsed it becomes a call to the function, literally, `[`.

A <- c(1, 2, 3)
E <- expression(A[1] * exp(A[2]*X) + A[3])
X <- c(0.5, 1.0, 2.0)
eval(E)
[1]  5.718282 10.389056 57.598150

No problem, because the evaluator does know how to evaluate `[`, along with everything else in this expression.

# Why doesn't this work? Any workarounds?
D(E, "A[2]")
Error in D(E, "A[2]") : Function '`[`' is not in the derivatives table

It fails because 'A[2]' is not a (character string giving the) name of a variable. I think the error message could be a bit more informative, but ... all you need to do is read the help page, really. If I want to have a long vector of coefficients, A (perhaps dozens), how can I use D to compute partial derivatives? Essentially you need to turn calls to `[` into names of variables, do the derivative business and then turn them back again. This is not easy to do in complete generality, but if you are only talking about singly subscripted arrays, you can get somewhere. Here is an outline of what I mean.
Ident <- "([A-z][A-z0-9_.]*)"
Subsc <- "([A-z][A-z0-9_.]*|[0-9]+)"
patn <- paste(Ident, "\\[", Subsc, "\\]", sep = "")
repl <- "\\1__\\2"
E <- expression(A[1] * exp(A[2]*X) + A[3])
Es <- deparse(E[[1]])
Es
[1] "A[1] * exp(A[2] * X) + A[3]"
Ess <- gsub(patn, repl, Es)
Ess
[1] "A__1 * exp(A__2 * X) + A__3"
Ex <- parse(text = Ess)[[1]]
Ex
A__1 * exp(A__2 * X) + A__3

OK, the calls to `[` have been replaced by variables with two underscores in the middle. We hope this works - there is a strong assumption here on just how complicated your indices are, for example. We are assuming they are either identifiers (not using two successive underscores in their names) or numbers. If they are not, you need to get even craftier.

Ex1 <- D(Ex, "A__2")
Ex1
A__1 * (exp(A__2 * X) * X)
Ex1s <- deparse(Ex1)
Ex1s
[1] "A__1 * (exp(A__2 * X) * X)"
pat1 <- paste(Ident, "__", Subsc, sep = "")
rep1 <- "\\1\\[\\2\\]"
Ex1ss <- gsub(pat1, rep1, Ex1s)
Ex1ss
[1] "A[1] * (exp(A[2] * X) * X)"
Ex2 <- parse(text = Ex1ss)[[1]]
Ex2
A[1] * (exp(A[2] * X) * X)

Which is the required result. This is messy and gets messier if you are looking for some kind of generality, but you need to remember, R is not, and never will be, a replacement for Maple. Thanks for any help with this. Best of luck in automating this, but the tools are there. Bill Venables. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
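The round trip above packages naturally into one helper (Dsub is a name coined here, not from the thread), under the same assumptions about simple, singly subscripted indices:

```r
# Differentiate an expression containing A[i]-style subscripts with
# respect to one such subscripted term, by the rename / D() / rename-back
# trick from the reply
Dsub <- function(expr, name) {
  Ident <- "([A-z][A-z0-9_.]*)"
  Subsc <- "([A-z][A-z0-9_.]*|[0-9]+)"
  patn  <- paste(Ident, "\\[", Subsc, "\\]", sep = "")

  flat  <- gsub(patn, "\\1__\\2", paste(deparse(expr), collapse = " "))
  dflat <- D(parse(text = flat)[[1]], gsub(patn, "\\1__\\2", name))
  back  <- gsub(paste(Ident, "__", Subsc, sep = ""), "\\1[\\2]",
                paste(deparse(dflat), collapse = " "))
  parse(text = back)[[1]]
}

res <- Dsub(quote(A[1] * exp(A[2] * X) + A[3]), "A[2]")
```

Evaluating res at some A and X should match the hand-computed derivative A[1] * exp(A[2]*X) * X.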
Re: [R] Constrain coefs. in linear model to sum to 0
Gorjanc Gregor asks:

Hello!  I would like to use a constraint that sums coefficients of a
factor to 0 instead of the classical corner constraint, i.e. I would like
to fit a model like

    lm(y ~ 1 + effectA + effectB)

and get parameters

    intercept
    effectA_1 effectA_2
    effectB_1 effectB_2 effectB_3

where effectA_1 represents the deviation of level A_1 from the intercept
and sum(effectA_1, effectA_2) = 0, and the same for factor B.  Is this
possible to do?

Yes, but not quite as simply as you would like.  If you set

    options(contrasts = c("contr.sum", "contr.poly"))

for example, then factor models are parametrised as you wish above, BUT
you don't get all the effects directly.  In your case above, for example,
if fm is the fitted model object, then

    coef(fm)

will give you intercept, effectA_2, effectB_2, effectB_3.  The remaining
effects*_1 you will need to calculate as the negative of the sum of all
the others.

This gets a bit more complicated when you have crossed terms, a*b, but
the same principle applies.

Bill Venables.

Lep pozdrav / With regards,
Gregor Gorjanc

--
University of Ljubljana     PhD student
Biotechnical Faculty        URI: http://www.bfro.uni-lj.si/MR/ggorjan
Zootechnical Department     mail: gregor.gorjanc at bfro.uni-lj.si
Groblje 3                   tel: +386 (0)1 72 17 861
SI-1230 Domzale             fax: +386 (0)1 72 17 888
Slovenia, Europe

--
"One must learn by doing the thing; for though you think you know it,
you have no certainty until you try."  Sophocles ~ 450 B.C.

Well, now's your chance!
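The "negative sum" rule can be checked numerically.  The toy data below
are made up for the check; note that in current R contr.sum reports
effects for the first k-1 levels of each factor and drops the *last*
level, so it is the last level that is reconstructed here - the principle
is exactly as Bill describes.

```r
## Made-up balanced data; dummy.coef() returns the full per-level
## effects, which lets us verify the negative-sum reconstruction.
op <- options(contrasts = c("contr.sum", "contr.poly"))
set.seed(1)
d  <- data.frame(A = gl(2, 6), B = gl(3, 2, 12), y = rnorm(12))
fm <- lm(y ~ A + B, data = d)
cf <- coef(fm)                      # (Intercept), A1, B1, B2
A2 <- -cf[["A1"]]                   # omitted level = -(sum of the others)
B3 <- -(cf[["B1"]] + cf[["B2"]])
options(op)
```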
Re: [R] Fitting models in a loop
Markus Gesmann writes:

Murray,

How about creating an empty list and filling it during your loop:

    mod <- list()
    for (i in 1:6) {
      mod[[i]] <- lm(y ~ poly(x,i))
      print(summary(mod[[i]]))
    }

All your models are then stored in one object and you can use lapply to
do something on them, like:

    lapply(mod, summary) or lapply(mod, coef)

I think it is important to see why this deceptively simple solution does
not achieve the result that Murray wanted.

Take any fitted model object, say mod[[4]].  For this object the formula
component of the call will be, literally, y ~ poly(x, i), and not
y ~ poly(x, 4), as would be required to use the object, e.g. for
prediction.  In fact all objects have the same formula.  You could, of
course, re-create i and some things would be OK, but getting pretty
messy.

You would still have a problem if you wanted to plot the fit with
termplot(), for example, as it would try to do a two-dimensional plot of
the component if both arguments to poly were variables.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED]
Sent: 01 August 2006 06:16
To: [EMAIL PROTECTED]; r-help@stat.math.ethz.ch
Subject: Re: [R] Fitting models in a loop

Murray,

Here is a general paradigm I tend to use for such problems.  It extends
to fairly general model sequences, including different responses, &c.

First a couple of tiny, tricky but useful functions:

    subst <- function(Command, ...)
      do.call(substitute, list(Command, list(...)))

    abut <- function(...)  ## jam things tightly together
      do.call(paste, c(lapply(list(...), as.character), sep = ""))

    Name <- function(...) as.name(do.call(abut, list(...)))

Now the gist.
    fitCommand <- quote({
      MODELi <- lm(y ~ poly(x, degree = i), theData)
      print(summary(MODELi))
    })
    for(i in 1:6) {
      thisCommand <- subst(fitCommand, MODELi = Name("model_", i), i = i)
      print(thisCommand)  ## only as a check
      eval(thisCommand)
    }

At this point you should have the results, and objects(pat = "^model_")
should list the fitted model objects, all of which can be updated,
summarised, plotted, &c, because the information on their construction is
all embedded in the call.

Bill.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Murray Jorgensen
Sent: Tuesday, 1 August 2006 2:09 PM
To: r-help@stat.math.ethz.ch
Subject: [R] Fitting models in a loop

If I want to display a few polynomial regression fits I can do something
like

    for (i in 1:6) {
      mod <- lm(y ~ poly(x,i))
      print(summary(mod))
    }

Suppose that I don't want to over-write the fitted model objects, though.
How do I create a list of blank fitted model objects for later use in a
loop?

Murray Jorgensen
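A quick way to convince yourself the paradigm works is to check that the
loop index really is frozen into each stored call - which is exactly what
the mod[[i]] version cannot do.  The data below are invented purely for
the demonstration; only the formula check matters.

```r
## Self-contained check that subst() bakes the literal degree into each
## fitted model's call (data invented for the demonstration).
subst <- function(Command, ...)
  do.call(substitute, list(Command, list(...)))
abut <- function(...)  ## jam things tightly together
  do.call(paste, c(lapply(list(...), as.character), sep = ""))
Name <- function(...) as.name(do.call(abut, list(...)))

theData <- data.frame(x = 1:20, y = (1:20)^2)
fitCommand <- quote(MODELi <- lm(y ~ poly(x, degree = i), theData))
for (i in 1:3)
  eval(subst(fitCommand, MODELi = Name("model_", i), i = i))
```

After the loop, formula(model_2) is y ~ poly(x, degree = 2): the literal
2, not the symbol i, so predict() and update() work later.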
Re: [R] Fitting models in a loop
This (below) also runs into trouble if you try to predict with new data,
since you have no rule for re-constructing the formula.  Also, you can't
plot the term as a single contributor to the linear predictor with
termplot().  I'm sure given enough ingenuity you can get round these two,
but why avoid the language manipulation solution, when it does the lot?

Bill.

-----Original Message-----
From: Gabor Grothendieck [mailto:[EMAIL PROTECTED]
Sent: Wednesday, 2 August 2006 12:01 PM
To: Venables, Bill (CMIS, Cleveland)
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; r-help@stat.math.ethz.ch
Subject: Re: [R] Fitting models in a loop

A simple way around this is to pass it as a data frame.  In the code
below the only change we made was to change the formula from
y ~ poly(x, i) to y ~ . and pass poly(x,i) in a data frame as argument 2
of lm:

    # test data
    set.seed(1)
    x <- 1:10
    y <- x^3 + rnorm(10)

    # run same code except change the lm call
    mod <- list()
    for (i in 1:3) {
      mod[[i]] <- lm(y ~ ., data.frame(poly(x, i)))
      print(summary(mod[[i]]))
    }

After running the above we can test that it works:

    for(i in 1:3) print(formula(mod[[i]]))
    y ~ X1
    y ~ X1 + X2
    y ~ X1 + X2 + X3

On 8/1/06, [EMAIL PROTECTED] wrote:

I think it is important to see why this deceptively simple solution does
not achieve the result that Murray wanted.  Take any fitted model object,
say mod[[4]].  For this object the formula component of the call will be,
literally, y ~ poly(x, i), and not y ~ poly(x, 4), as would be required
to use the object, e.g. for prediction.  In fact all objects have the
same formula.  You could, of course, re-create i and some things would be
OK, but getting pretty messy.
Re: [R] Creating 3D Gaussian Plot
Getting a picture like this is pretty easy, e.g.

    x <- y <- seq(-5, 5, len = 200)
    X <- expand.grid(x = x, y = y)
    X <- transform(X, z = dnorm(x, -2.5)*dnorm(y) - dnorm(x, 2.5)*dnorm(y))
    z <- matrix(X$z, nrow = 200)
    persp(x, y, z, col = "lightgoldenrod", border = NA, theta = 30,
          phi = 15, ticktype = "detailed", ltheta = -120, shade = 0.25)

You can vary things as you wish.

I don't follow the remark about picking grid points at random for
analysis, though.  On simple, entirely deterministic things like this
wouldn't you just be analysing the randomness that you inject into it by
the choice process, effectively?

Bill Venables.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Laura Quinn
Sent: Sunday, 29 January 2006 12:28 AM
To: Duncan Murdoch
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] Creating 3D Gaussian Plot

My apologies.  With further apologies for the poor graphics, this link
demonstrates the sort of 3d mesh which I am hoping to replicate - I would
like to be able to replicate a number of these of varying intensity,
demonstrating different levels of potential via the steepness of the
slopes.

http://maxwell.ucdavis.edu/~electro/potential/images/steep.jpg

I then wish to pick a number of grid points at random from the output to
perform a further analysis upon.  I hope this makes things a little
clearer!  Again, any help gratefully received, thank you.

Laura Quinn
Institute of Atmospheric Science
School of Earth and Environment
University of Leeds
Leeds LS2 9JT

tel: +44 113 343 1596
fax: +44 113 343 6716
mail: [EMAIL PROTECTED]

On Sat, 28 Jan 2006, Duncan Murdoch wrote:

On 1/28/2006 8:55 AM, Laura Quinn wrote:

Hello, I requested help a couple of weeks ago creating a dipole field in
R but received no responses.  Eventually I opted to create a 3d
sinusoidal plot and concatenate this with its inverse as a means for a
next best situation.
It seems that this isn't sufficient for my needs and I'm really after
creating a continuous 3d gaussian mesh with a positive and negative
dipole.

The names you're using don't mean anything to me; perhaps there just
aren't enough atmospheric scientists on the list and that's why you
didn't get any response.  If you don't get a response this time, you
should describe what you want in basic terms, and/or point to examples of
it on the web.

Duncan Murdoch

Can anyone offer any pointers at all?

Laura Quinn
Institute of Atmospheric Science
School of Earth and Environment
University of Leeds
Leeds LS2 9JT

tel: +44 113 343 1596
fax: +44 113 343 6716
mail: [EMAIL PROTECTED]
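Laura's follow-up step - drawing grid points at random from the surface
for further analysis - can be done directly on the long-format grid that
Bill's construction builds.  The sampling step below is a sketch of mine,
not from the thread.

```r
## Rebuild Bill's dipole grid, then draw grid points at random from it
## (the sampling step is an added sketch, not part of the original post).
x <- y <- seq(-5, 5, len = 200)
X <- expand.grid(x = x, y = y)
X <- transform(X, z = dnorm(x, -2.5) * dnorm(y) - dnorm(x, 2.5) * dnorm(y))
set.seed(1)
pick <- X[sample(nrow(X), 25), ]  # 25 randomly chosen grid points
```

Each row of `pick` carries the (x, y) location and the surface value z,
ready for whatever further analysis is wanted.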
Re: [R] How do you convert this from S Plus to R - any help appreciated . thanks
This particular call to exportData is probably equivalent in effect to
the R call:

    write.table(MU.Cost,
                file = paste("C:/RAUDSL/S",
                             as.character(MU.Cost$Run.Id[1]), ".",
                             as.character(MU.Cost$MU.Id[1]),
                             ".MU.PRICE.OUTPUT.txt", sep = ""),
                append = FALSE)

as Barry sort of guesses, but in general exportData handles a wide
variety of data export facilities, including writing to odbc connexions,
so other uses of exportData would need the RODBC library in R and
functions like sqlSave or sqlUpdate.  write.table is an S-PLUS function
as well, but exportData is generally a whole lot faster.

Bill Venables.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Barry Rowlingson
Sent: Friday, 27 January 2006 7:57 PM
To: r-help@stat.math.ethz.ch
Subject: Re: [R] How do you convert this from S Plus to R - any help appreciated . thanks

Briggs, Meredith M wrote:

    exportData(MU.Cost,
               paste("C:/RAUDSL/S", as.character(MU.Cost$Run.Id[1]), ".",
                     as.character(MU.Cost$MU.Id[1]),
                     ".MU.PRICE.OUTPUT.txt", sep = ""),
               append = FALSE, type = "ASCII", quote = FALSE)

Looks like perfectly good R to me.  Except there's no exportData
function.  I assume this is an S-PLUS function that R doesn't have, in
which case telling us what it does might help.  What does the S-PLUS
manual have to say about it?  I'm guessing R's write.table might be of
use.

Assuming it's exportData that has you stuck - the other bits should all
work in R no problem, all it does is construct a path from parts of the
MU.Cost object.

Barry
Re: [R] D(dnorm...)?
Hi Spencer,

I think if you have a problem that needs a lot of symbolic manipulation
you are probably better off driving it directly from something like Maple
or Mathematica (I prefer Maple, actually) than trying to drive it from R.
It just gets too clumsy.  On the other hand it is very handy having a
simple differentiator available in R, like D(...), for small jobs that
are not worth taking to a big system like Maple.  The point I was trying
to make in the previous message was that with a little thought it could
be made a lot more convenient.

This arose in connexion with a real problem.  We needed to differentiate
a pdf that had the normal density function in it, but was otherwise quite
simple, and we had to hack the code in another system (not unlike R, as
it happens) to handle it.  The hack was quite small and it became clear
that with a slight change of design users would not need to do hacks like
this for simple extensions such as the one we needed.  As it was a hack,
we only put it in for the standard density and I suspect that is the
reason why even now the derivative tables in both R and the other system
(not unlike R) only handle normal density and distribution functions in
one variable.

I'm sort of avoiding your question, because I don't know how hard it
would be to link R with Yacas, either, but if you really wanted to go
that way I see that Yacas can be driven over the net via a Java applet.
Something like this might provide the simplest link, if not the most
efficient.  But note that Yacas uses the eccentric Mathematica notation,
where the functions are Capitalised, for example, as nouns are in German.
That's a small bother you could do without, too.

Regards,
Bill Venables.

-----Original Message-----
From: Spencer Graves [mailto:[EMAIL PROTECTED]
Sent: Thursday, 26 January 2006 2:14 PM
To: Venables, Bill (CMIS, Cleveland)
Cc: [EMAIL PROTECTED]; r-help@stat.math.ethz.ch; [EMAIL PROTECTED]
Subject: Re: [R] D(dnorm...)?
Hello, Bill:

I'm not qualified to make this suggestion since I'm incapable of turning
it into reality, but what about creating a link between R and one of the
Mathematica clones like Yacas?  I can imagine that it could be
substantially more difficult than linking R to other software like Excel,
but ... .

Spencer Graves

[EMAIL PROTECTED] wrote:

Yes Bert, this time you are missing something (unusually) ...

As Brian Ripley pointed out, 'dnorm' is in the derivative table, *but*
only as a function of one variable.  So if you want to find the
derivative of dnorm(x, mean, sigma) you have to write it as
1/sigma * dnorm((x - mu)/sigma).  Here is a little example:

    D(Quote(pnorm((x-mu)/sigma)), "x")
    dnorm((x - mu)/sigma) * (1/sigma)
    D(D(Quote(pnorm((x-mu)/sigma)), "x"), "mu")
    (x - mu)/sigma * (dnorm((x - mu)/sigma) * (1/sigma)) * (1/sigma)

---

Like Brian, I recall the suggestion that we make D(...) extensible.  I
still think it is a good idea and worth considering.  Under one scheme
you would specify an object such as

    Fnorm <- structure(quote(pnorm(x, mu, sigma)),
                       deriv = list(
                         x = Quote(dnorm(x, mu, sigma)/sigma),
                         mu = Quote(-dnorm(x, mu, sigma)/sigma),
                         sigma = Quote(-(x - mu)*dnorm(x, mu, sigma)/sigma^2)),
                       class = "dfunction")

and write a generic differentiate function with a dfunction method and D
as the default.  I don't think it's quite that easy, but the plan is
clear enough.

Bill.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Berton Gunter
Sent: Thursday, 26 January 2006 8:58 AM
To: 'Spencer Graves'; r-help@stat.math.ethz.ch
Subject: Re: [R] D(dnorm...)?

dnorm() is an internal function, so I don't see how D (or deriv) can do
anything with it symbolically.  Am I missing something?

-- Bert

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Spencer Graves
Sent: Wednesday, January 25, 2006 2:43 PM
To: r-help@stat.math.ethz.ch
Subject: [R] D(dnorm...)?
Can someone help me understand the following:

    D(expression(dnorm(x, mean)), "mean")
    [1] 0

    sessionInfo()
    R version 2.2.1, 2005-12-20, i386-pc-mingw32

    attached base packages:
    [1] methods   stats     graphics  grDevices utils     datasets
    [7] base

By my computations, this should be something like
((mean-x)/sd^2)*dnorm(...).

Thanks for your help.

Spencer Graves
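Bill's extensible-D sketch can be made to run in only a few lines.  The
dispatch details below are my guesses at what he intends, not a
worked-out design, and I use the (x - mu)/sigma parametrisation
throughout so the rule can be checked against D() itself.

```r
## A minimal, hypothetical rendering of the "differentiate" generic with
## a dfunction method and D() as the default; implementation is mine.
differentiate <- function(e, name) UseMethod("differentiate")
differentiate.default <- function(e, name) D(e, name)
differentiate.dfunction <- function(e, name) {
  d <- attr(e, "deriv")[[name]]   # look the rule up in the object's table
  if (is.null(d)) stop("no derivative rule for '", name, "'")
  d
}

Fnorm <- structure(quote(pnorm((x - mu)/sigma)),
                   deriv = list(
                     x  = quote(dnorm((x - mu)/sigma)/sigma),
                     mu = quote(-(dnorm((x - mu)/sigma)/sigma))),
                   class = "dfunction")

dmu <- differentiate(Fnorm, "mu")   # uses the table, not D()
```

For anything without a "dfunction" class the generic falls straight
through to D(), so ordinary expressions keep working unchanged.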
Re: [R] D(dnorm...)?
While symbolic computation is handy, I actually think a more pressing
addition to R is some kind of automatic differentiation facility,
particularly 'reverse mode' AD, which can be spectacular.  There are free
tools available for it as well, though I don't know how well developed
they are.  See:

http://www-unix.mcs.anl.gov/autodiff/AD_Tools/

I admit this is not quite the same thing, but for statistical
computations this is, in my experience, the key thing you need.  (Well,
for frequentist estimation at any rate...)

There are commercial systems that use this idea already, of course.  Two
that I know of are 'ADMB' (and its associated 'ADMB-RE' for random
effects estimation) and of course the 'S-NUOPT' module for another system
not unlike R.  ADMB is, frankly, difficult to use but it performs so well
and so quickly once you get it going nothing else seems to come close to
it.  It has become almost a de-facto standard at the higher end of the
fishery stock assessment game, for example, where they are always fitting
huge, highly complex and very non-linear models.

Bill V.

-----Original Message-----
From: Berwin A Turlach [mailto:[EMAIL PROTECTED] On Behalf Of Berwin A Turlach
Sent: Thursday, 26 January 2006 4:50 PM
To: Spencer Graves
Cc: Venables, Bill (CMIS, Cleveland); r-help@stat.math.ethz.ch; [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: [R] D(dnorm...)?

G'day Spencer,

SG == Spencer Graves [EMAIL PROTECTED] writes:
SG   I'm not qualified to make this suggestion since I'm incapable
SG   of turning it into reality, [...]

This statement applies to me too; nevertheless I would like to point out
the following GPL library:

http://www.gnu.org/software/libmatheval/

I am wondering since some time how hard it would be to incorporate that
library into R and let it provide symbolic differentiation capabilities
for R.

Cheers,
Berwin
Re: [R] Making a markov transition matrix
That solution for the case 'with gaps' merely omits transitions where the
transition information is not for a single time step.  (Mine can be
modified for this as well - see below.)

But if you know that a firm went from state i in year y to state j in
year y+3, say, without knowing the intermediate states, that must tell
you something about the 1-step transition matrix as well.  How do you use
this information?  That's a much more difficult problem, but you can do
it using maximum likelihood, e.g.: you think about how to calculate the
likelihood function - and then optimise it.

This is getting a bit away from the original 'programming trick'
question, but it is an interesting problem that occurs more often than I
had realised.  I'd be interested in knowing if anyone had done anything
slick in this area.

Bill Venables.

-----Original Message-----
From: Ajay Narottam Shah [mailto:[EMAIL PROTECTED]
Sent: Sunday, 22 January 2006 5:15 PM
To: R-help
Cc: [EMAIL PROTECTED]; Venables, Bill (CMIS, Cleveland)
Subject: Re: [R] Making a markov transition matrix

On Sun, Jan 22, 2006 at 01:47:00PM +1100, [EMAIL PROTECTED] wrote:

If this is a real problem, here is a slightly tidier version of the
function I gave on R-help:

    transitionM <- function(name, year, state) {
      raw <- data.frame(name = name, state = state)[order(name, year), ]
      raw01 <- subset(data.frame(raw[-nrow(raw), ], raw[-1, ]),
                      name == name.1)
      with(raw01, table(state, state.1))
    }

To modify this solution for the 'with gaps' case, omitting multiple step
transitions, you need to include the year in the 'raw' data frame and
then just change the subset condition to

    name == name.1 & year == year.1 - 1

Notice that this does assume there are 'no gaps' in the time series
within firms, but it does not require that each firm have responses for
the same set of years.  Estimating the transition probability matrix when
there are gaps within firms is a more interesting problem, both
statistically and, when you figure that out, computationally.
With help from Gabor, here's my best effort.  It should work even if
there are gaps in the time series within firms, and it allows different
firms to have responses in different years.  It is wrapped up as a
function which eats a data frame.  Somebody should put this function into
Hmisc or gtools or something of the sort.

    # Problem statement:
    #
    # You are holding a dataset where firms are observed for a fixed
    # (and small) set of years.  The data is in "long" format - one
    # record for one firm for one point in time.  A state variable is
    # observed (a factor).
    # You wish to make a markov transition matrix about the time-series
    # evolution of that state variable.

    set.seed(1001)

    # Raw data in long format --
    raw <- data.frame(name = c("f1","f1","f1","f1","f2","f2","f2","f2"),
                      year = c(83, 84, 85, 86, 83, 84, 85, 86),
                      state = sample(1:3, 8, replace = TRUE))

    transition.probabilities <- function(D, timevar = "year",
                                         idvar = "name",
                                         statevar = "state") {
      merged <- merge(D, cbind(nextt = D[, timevar] + 1, D),
                      by.x = c(timevar, idvar),
                      by.y = c("nextt", idvar))
      t(table(merged[, grep(statevar, names(merged), value = TRUE)]))
    }

    transition.probabilities(raw, timevar = "year", idvar = "name",
                             statevar = "state")

--
Ajay Shah  http://www.mayin.org/ajayshah
[EMAIL PROTECTED]  http://ajayshahblog.blogspot.com
*(:-? - wizard who doesn't know the answer.
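The solutions in this thread all tabulate transition *counts*; dividing
each row by its total with prop.table() turns them into estimated
transition probabilities.  The toy data below are deterministic (rather
than the set.seed(1001) draw used in the thread) so the answer does not
depend on the RNG.

```r
## Transition counts via the pairing trick, then row-normalised with
## prop.table() to give estimated transition probabilities.
raw <- data.frame(name  = rep(c("f1", "f2"), each = 4),
                  year  = rep(83:86, 2),
                  state = c(1, 2, 2, 3, 1, 1, 2, 2))
raw01 <- subset(data.frame(raw[-nrow(raw), ], raw[-1, ]),
                name == name.1 & year == year.1 - 1)
counts <- with(raw01, table(state, state.1))
P <- prop.table(counts, 1)  # each row sums to 1
```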
Re: [R] Making a markov transition matrix
If you can be sure that there are no missing years within firms, I think
I would do it this way:

    raw <- raw[do.call(order, raw), ]  # not needed here
    raw01 <- subset(data.frame(raw[-nrow(raw), ], raw[-1, ]),
                    name == name.1)
    with(raw01, table(state, state.1))
         state.1
    state 1 2 3
        1 1 0 0
        2 0 2 1
        3 1 1 0

So what would the general function look like?  I suppose

    transitionM <- function(name, year, state) {
      raw <- data.frame(name = name, year = year, state = state)
      raw <- raw[do.call(order, raw), ]  # needed in general
      raw01 <- subset(data.frame(raw[-nrow(raw), ], raw[-1, ]),
                      name == name.1)
      with(raw01, table(state, state.1))
    }

Give it a burl:

    with(raw, transitionM(name, year, state))
         state.1
    state 1 2 3
        1 1 0 0
        2 0 2 1
        3 1 1 0

(NB no 'for' loops.)  Ezy peezy.

W.

Bill Venables, CMIS, CSIRO Laboratories,
PO Box 120, Cleveland, Qld. 4163 AUSTRALIA
Office Phone (email preferred): +61 7 3826 7251
Fax (if absolutely necessary): +61 7 3826 7304
Mobile (rarely used): +61 4 1963 4642
Home Phone: +61 7 3286 7700
mailto:[EMAIL PROTECTED]
http://www.cmis.csiro.au/bill.venables/

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of jim holtman
Sent: Sunday, 22 January 2006 11:20 AM
To: Ajay Narottam Shah
Cc: R-help
Subject: Re: [R] Making a markov transition matrix

Ignore last reply.  I sent the wrong script.
    > set.seed(1001)
    > # Raw data in long format --
    > raw <- data.frame(name = c("f1","f1","f1","f1","f2","f2","f2","f2"),
    +                   year = c(83, 84, 85, 86, 83, 84, 85, 86),
    +                   state = sample(1:3, 8, replace = TRUE)
    + )
    > # Shift to wide format --
    > fixedup <- reshape(raw, timevar = "year", idvar = "name",
    +                    v.names = "state", direction = "wide")
    > trans <- as.matrix(fixedup)
    > result <- NULL
    > # loop through all the columns and build up the 'result'
    > for (i in 2:(ncol(trans) - 1)) {
    +   result <- rbind(result,
    +                   cbind(name = trans[, 1], PREV = trans[, i],
    +                         NEXT = trans[, i + 1]))
    + }
    > result
      name PREV NEXT
    1 f1   3    2
    5 f2   2    3
    1 f1   2    2
    5 f2   3    1
    1 f1   2    2
    5 f2   1    1
    > (markov <- table(result[, "PREV"], result[, "NEXT"]))

        1 2 3
      1 1 0 0
      2 0 2 1
      3 1 1 0

On 1/21/06, jim holtman [EMAIL PROTECTED] wrote:

Is this what you want:

    set.seed(1001)
    # Raw data in long format --
    raw <- data.frame(name = c("f1","f1","f1","f1","f2","f2","f2","f2"),
                      year = c(83, 84, 85, 86, 83, 84, 85, 86),
                      state = sample(1:3, 8, replace = TRUE))
    # Shift to wide format --
    fixedup <- reshape(raw, timevar = "year", idvar = "name",
                       v.names = "state", direction = "wide")
    trans <- as.matrix(fixedup)
    result <- NULL
    for (i in 2:(ncol(trans) - 1)) {
      result <- rbind(result,
                      cbind(name = trans[, 1], prev = trans[, i],
                            next = trans[, i + 1]))
    }
    result
    markov <- table(try$prev.state, try$new.state)

On 1/21/06, Ajay Narottam Shah [EMAIL PROTECTED] wrote:

Folks,

I am holding a dataset where firms are observed for a fixed (and small)
set of years.  The data is in "long" format - one record for one firm
for one point in time.  A state variable is observed (a factor).  I wish
to make a markov transition matrix about the time-series evolution of
that state variable.  The code below does this.  But it's hardcoded to
the specific years that I observe.  How might one generalise this and
make a general function which does this? :-)

-ans.
    set.seed(1001)

    # Raw data in long format --
    raw <- data.frame(name = c("f1","f1","f1","f1","f2","f2","f2","f2"),
                      year = c(83, 84, 85, 86, 83, 84, 85, 86),
                      state = sample(1:3, 8, replace = TRUE))

    # Shift to wide format --
    fixedup <- reshape(raw, timevar = "year", idvar = "name",
                       v.names = "state", direction = "wide")

    # Now tediously build up records for an intermediate data structure
    try <- rbind(
      data.frame(prev = fixedup$state.83, new = fixedup$state.84),
      data.frame(prev = fixedup$state.84, new = fixedup$state.85),
      data.frame(prev = fixedup$state.85, new = fixedup$state.86))

    # This is a bad method because it is hardcoded to the specific
    # values of year.
    markov <- table(try$prev, try$new)

--
Ajay Shah  http://www.mayin.org/ajayshah
[EMAIL PROTECTED]  http://ajayshahblog.blogspot.com
*(:-? - wizard who doesn't know the answer.

--
Jim Holtman
Cincinnati, OH
+1 513 247 0281

What is the problem you are trying to solve?

[[alternative HTML version deleted]]
Re: [R] problems with glm
Most of your problems seem to come from 'link = log', whereas you probably mean 'link = logit' (which is the default). Hence:

##
success <- c(13,12,11,14,14,11,13,11,12)
failure <- c(0,0,0,0,0,0,0,2,2)
predictor <- c(0, 80*5^(0:7))
glm(cbind(success, failure) ~ predictor,
    family = binomial,  # (link=log) removed
    control = glm.control(epsilon = 1e-8, trace = TRUE, maxit = 50))
Deviance = 7.621991 Iterations - 1
Deviance = 6.970934 Iterations - 2
Deviance = 6.941054 Iterations - 3
Deviance = 6.940945 Iterations - 4
Deviance = 6.940945 Iterations - 5

Call:  glm(formula = cbind(success, failure) ~ predictor, family = binomial,
    control = glm.control(epsilon = 1e-08, trace = TRUE, maxit = 50))

Coefficients:
(Intercept)    predictor
  4.180e+00   -4.106e-07

Degrees of Freedom: 8 Total (i.e. Null);  7 Residual
Null Deviance:     12.08
Residual Deviance: 6.941        AIC: 15.85
###

Bill Venables, CMIS, CSIRO Laboratories,
PO Box 120, Cleveland, Qld. 4163 AUSTRALIA
Office Phone (email preferred): +61 7 3826 7251
Fax (if absolutely necessary):  +61 7 3826 7304
Mobile (rarely used):           +61 4 1963 4642
Home Phone:                     +61 7 3286 7700
mailto:[EMAIL PROTECTED]
http://www.cmis.csiro.au/bill.venables/

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Roland R Regoes
Sent: Sunday, 15 January 2006 11:04 PM
To: r-help@stat.math.ethz.ch
Subject: [R] problems with glm

Dear R users,

I am having some problems with glm. The first is an error message "subscript out of bounds". The second is the fact that reasonable starting values are not accepted by the function.
To be more specific, here is an example:

success <- c(13,12,11,14,14,11,13,11,12)
failure <- c(0,0,0,0,0,0,0,2,2)
predictor <- c(0, 80*5^(0:7))
glm(cbind(success, failure) ~ predictor + 0,
    family = binomial(link = log),
    control = glm.control(epsilon = 1e-8, trace = TRUE, maxit = 50))
Deviance = 3.348039 Iterations - 1
Error: subscript out of bounds
In addition: Warning message:
step size truncated: out of bounds

The model with intercept yields:

glm(cbind(success, failure) ~ predictor,
    family = binomial(link = log),
    control = glm.control(epsilon = 1e-8, trace = FALSE, maxit = 50))

Call:  glm(formula = cbind(success, failure) ~ predictor, family = binomial(link = log),
    control = glm.control(epsilon = 1e-08, trace = FALSE, maxit = 50))

Coefficients:
(Intercept)    predictor
 -5.830e-17   -4.000e-08

Degrees of Freedom: 8 Total (i.e. Null);  7 Residual
Null Deviance:     12.08
Residual Deviance: 2.889        AIC: 11.8

There were 33 warnings (use warnings() to see them)
warnings()
1: step size truncated: out of bounds
...
31: step size truncated: out of bounds
32: algorithm stopped at boundary value in: glm.fit(x = X, y = Y, weights = weights, start = start, etastart = etastart, ...
33: fitted probabilities numerically 0 or 1 occurred in: glm.fit(x = X, y = Y, weights = weights, start = start, etastart = etastart, ...

Since the intercept in the above fit is fairly small I thought I could use -4e-8 as a reasonable starting value in a model without intercept. But to no avail:

glm(cbind(success, failure) ~ predictor + 0, start = -4e-8,
    family = binomial(link = log),
    control = glm.control(epsilon = 1e-8, trace = FALSE, maxit = 50))
Error in glm.fit(x = X, y = Y, weights = weights, start = start, etastart = etastart, :
    cannot find valid starting values: please specify some

I am stuck here. Am I doing something wrong when specifying the starting value? I would appreciate any help. (I could not find anything relevant in the documentation of glm and the mailing list archives, but I did not read the source code of glm yet.)
Roland

PS: I am using R Version 2.2.0 (R Cocoa GUI 1.13 (1915)) on MacOSX 10.4.4

--
---
Roland Regoes
Theoretical Biology
Universitaetsstr. 16
ETH Zentrum, CHN H76.1
CH-8092 Zurich, Switzerland
tel: +41-44-632-6935
fax: +41-44-632-1271

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
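A side note on the "cannot find valid starting values" error: with a log link on binomial data, glm only accepts starting fitted probabilities strictly inside (0, 1), and supplying `start` on the coefficient scale can still land outside that region. A workaround (my sketch, not from the thread) is to pass `mustart`, starting values on the probability scale; the shrunken empirical proportions used below are an illustrative choice.

```r
success   <- c(13,12,11,14,14,11,13,11,12)
failure   <- c(0,0,0,0,0,0,0,2,2)
predictor <- c(0, 80*5^(0:7))

# Shrunken empirical proportions keep every starting mu strictly in (0, 1),
# which is what the log link's starting-value check requires.
p0 <- (success + 0.5) / (success + failure + 1)

fit <- glm(cbind(success, failure) ~ predictor + 0,
           family = binomial(link = "log"),
           mustart = p0,
           control = glm.control(epsilon = 1e-8, maxit = 50))
coef(fit)
```

Step-size-truncation warnings may still appear, since the fit sits near the boundary of the parameter space, but the iteration now has a legal place to start from.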
Re: [R] reduce levels
sub$mm <- factor(sub$mm)

is the simplest way to change a single factor in this way. To do a whole data frame you just need a loop:

drop_unused_levels <- function(data)
    as.data.frame(lapply(data, function(x) if (is.factor(x)) factor(x) else x))

Here's your example again, but with a slightly different idiom...

test <- data.frame(mm = letters[1:3])
sub <- subset(test, mm == "b")
fixedSub <- drop_unused_levels(sub)

Bill Venables.

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Claus Atzenbeck
Sent: Tuesday, 8 November 2005 9:52 AM
To: R-help
Subject: [R] reduce levels

Hi all:

I have an example that shows my problem:

test <- data.frame(c("a", "b", "c"))
colnames(test) <- "mm"
sub <- subset(test, mm == "b")
sub$mm
[1] b
Levels: a b c
levels(sub$mm)
[1] "a" "b" "c"

How can I reduce the levels to exclusively those which are part of the data frame? That is, in the above named example, exclusively "b". Reason: I want to iterate exclusively through those levels that exist within a subset, but leave away all others.

Thanks for any hint.
Claus

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
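For completeness: base R later gained `droplevels()` (around R 2.12.0, well after this 2005 thread), which does exactly what the hand-rolled function above does, for a single factor or for every factor column of a data frame at once.

```r
test <- data.frame(mm = letters[1:3])
sub  <- subset(test, mm == "b")
levels(sub$mm)            # still "a" "b" "c" after subsetting

sub2 <- droplevels(sub)   # drops unused levels in every factor column
levels(sub2$mm)           # just "b"
```

`droplevels()` also works directly on a factor: `droplevels(sub$mm)`.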
RE: [R] Roots of quadratic system.
Are you looking for a unique solution or families of solutions? Can't you turn a root-finding problem for a system of equations with a unique solution into an optimisation problem, anyway? E.g. you want to solve

f1(x) = g1
f2(x) = g2
...

Why not optimise

L(x) = (f1(x) - g1)^2 + (f2(x) - g2)^2 + ...

with respect to x? If the minimum value is zero, then you are done; if it is greater than zero your original system does not have a solution. If you are in the complex domain the changes needed are obvious.

V.

: -Original Message-
: From: [EMAIL PROTECTED]
: [mailto:[EMAIL PROTECTED] On Behalf Of John Janmaat
: Sent: Monday, 2 May 2005 12:48 AM
: To: r-help@stat.math.ethz.ch
: Subject: [R] Roots of quadratic system.
:
: Hello,
:
: I have a system of quadratic equations (results of a Hamiltonian
: optimization) which I need to find the roots for. Is there a package
: and/or function which will find the roots for a quadratic system? Note
: that I am not optimizing, but rather solving the first order conditions
: which come from a Hamiltonian. I am basically looking for something in
: R that will do the same thing as fsolve in Matlab.
:
: Thanks,
:
: John.
:
: ==
: Dr. John Janmaat
: Department of Economics
: Acadia University
: Tel: 902-585-1461
:
: __
: R-help@stat.math.ethz.ch mailing list
: https://stat.ethz.ch/mailman/listinfo/r-help
: PLEASE do read the posting guide!
: http://www.R-project.org/posting-guide.html
:
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
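The least-squares reformulation above takes only a few lines with optim(); the two-equation system here is an invented example (not from the thread), chosen so that a real solution exists.

```r
# Solve x^2 + y^2 = 4 and x*y = 1 by minimising the sum of squared
# residuals L(x, y); L == 0 at a genuine root of the system.
obj <- function(p) {
  x <- p[1]; y <- p[2]
  (x^2 + y^2 - 4)^2 + (x * y - 1)^2
}
sol <- optim(c(2, 0.5), obj, method = "BFGS")
sol$par    # an (x, y) root of the system
sol$value  # essentially zero when a genuine solution is found
```

As with any local optimiser, the starting point matters: a symmetric start such as c(1, 1) can get stuck on the ridge x = y here and stop at a positive local minimum. (For a dedicated Newton-type fsolve analogue, packages such as nleqslv or rootSolve now exist on CRAN.)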
RE: [R] How to extract function arguments literally
Instead of

as.character(substitute(arg))

use

deparse(substitute(arg))

: -Original Message-
: From: [EMAIL PROTECTED]
: [mailto:[EMAIL PROTECTED] On Behalf Of Shigeru Mase
: Sent: Saturday, 30 April 2005 1:24 PM
: To: r-help@stat.math.ethz.ch
: Subject: [R] How to extract function arguments literally
:
: Dear all,
:
: One of my friends asked me if it is possible to extract actual R
: function arguments literally (precisely, as strings). The reason is
: simple. He feels it sometimes awkward to attach quotation marks :-).
: What he actually wants is to pass R command arguments to XLisp
: subroutines (he has been an enthusiastic XLisp user for a long time
: and still tends to use R as a wrapper to XLisp). Is it possible, or
: should I advise him not to be so lazy? The following is his simple
: example to explain the situation.
:
: R function lc which passes its argument to an XLisp function:
:
: lc <- function(cmd) {
:     cmd <- as.character(substitute(cmd))
:     .XLisp("lcommand", cmd)
: }
:
: Corresponding XLisp function lcommand:
:
: (defun lcommand (str) (eval (with-input-from-string (s str) (read s))))
:
: If the argument cmd is a string (i.e. with quotation marks) or has no
: space, it works fine.
:
: as.character(substitute(abc))
: [1] "abc"
:
: But, if it has no quotation marks or contains spaces, it yields an
: error.
:
: as.character(substitute((def x1 x2)))
: Error: syntax error
:
: He wants to use the syntax like lc((def x1 x2)), not like
: lc("(def x1 x2)").
:
: Thanks in advance.
:
: ==
: Shigeru MASE [EMAIL PROTECTED]
: Tokyo Institute of Technology
: Dept. of Math. and Comp. Sciences
: O-Okayama 2-12-1-W8-28, Tokyo, 152-8550, Japan
:
: __
: R-help@stat.math.ethz.ch mailing list
: https://stat.ethz.ch/mailman/listinfo/r-help
: PLEASE do read the posting guide!
: http://www.R-project.org/posting-guide.html
:
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
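The difference is easy to demonstrate; `show_arg` below is a toy name of my own for illustration.

```r
# deparse(substitute(...)) captures the unevaluated argument and turns the
# whole expression into a string, even when it contains spaces/operators;
# as.character() on a call splits it into pieces instead.
show_arg <- function(arg) deparse(substitute(arg))

show_arg(abc)         # "abc"
show_arg(x + y * 2)   # "x + y * 2"
```

Note this still cannot rescue lc((def x1 x2)): a Lisp form like (def x1 x2) is not valid R syntax, so the call fails at parse time before substitute() ever sees it. The quotation marks are unavoidable in that case.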
RE: [R] The eigen function
To me this is not at all surprising. If you read the help info for eigen it says clearly that calculating the eigenvectors is the slow part. So it is entirely likely that completely different algorithms will be used if you are asking for only the eigenvalues, or if you are asking for both eigenvalues and eigenvectors. At least that's what I would do... (You should check what happens with EISPACK = TRUE as well, though.)

Bill Venables.

: -Original Message-
: From: [EMAIL PROTECTED]
: [mailto:[EMAIL PROTECTED] On Behalf Of Louisell, Paul T.
: Sent: Tuesday, 26 April 2005 6:26 AM
: To: 'r-help@stat.math.ethz.ch'
: Subject: [R] The eigen function
:
: I'm using R version 2.0.1 on a Windows 2000 operating system. Here is
: some actual code I executed:
:
: test
:      [,1] [,2]
: [1,] 1000  500
: [2,]  500  250
: eigen(test, symmetric=T)$values
: [1]  1.250000e+03 -3.153033e-15
: eigen(test, symmetric=T)$values[2] == 0
: [1] FALSE
: eigen(test, symmetric=T, only.values=T)$values
: [1] 1250    0
: eigen(test, symmetric=T, only.values=T)$values[2] == 0
: [1] TRUE
:
: I'm wondering why the 'eigen' function is returning different values
: depending on whether the parameter only.values=T. This is probably
: some numerical quirk of the code; it must do things differently when
: it has to compute eigenvectors than it does when only computing
: eigenvalues. It's easily checked that the exact eigenvalues are 1250
: and 0. Can one of the developers tell me whether this should be
: regarded as a bug or not?
:
: Thanks,
:
: Paul Louisell
: Pratt & Whitney
: Statistician
: TechNet: 435-5417
: e-mail: [EMAIL PROTECTED]
:
: __
: R-help@stat.math.ethz.ch mailing list
: https://stat.ethz.ch/mailman/listinfo/r-help
: PLEASE do read the posting guide!
: http://www.R-project.org/posting-guide.html
:
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
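The practical moral, sketched below: different routines may legitimately differ by rounding error near zero, so eigenvalues should be compared with a tolerance (all.equal) rather than with ==.

```r
test <- matrix(c(1000, 500, 500, 250), 2, 2)

ev_full <- eigen(test, symmetric = TRUE)$values                      # values + vectors path
ev_only <- eigen(test, symmetric = TRUE, only.values = TRUE)$values  # values-only path

# The two paths may use different LAPACK routines, so tiny discrepancies
# near zero are expected; compare with a tolerance, never with ==.
all.equal(ev_full, ev_only)        # TRUE (within numerical tolerance)
isTRUE(all.equal(ev_full[2], 0))   # the safe way to ask "is this eigenvalue 0?"
```

This is the general floating-point rule, not anything specific to eigen: exact equality tests on computed real numbers are almost always a mistake.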
RE: [R] question about about the drop1
Ronggui asks:

: the data is
:
: table.8.3 <- data.frame(expand.grid(
:     marijuana = factor(c("Yes","No"), levels = c("No","Yes")),
:     cigarette = factor(c("Yes","No"), levels = c("No","Yes")),
:     alcohol   = factor(c("Yes","No"), levels = c("No","Yes"))),
:     count = c(911,538,44,456,3,43,2,279))
:
: fit3 <- glm(count ~ .^3, poisson, table.8.3)
: summary(fit3)
: ...
: Residual deviance: -1.5543e-15  on 0  degrees of freedom
: AIC: 65.043
:
: drop1(fit3, . ~ ., test = "Chisq")

Putting in the dummy formula, . ~ ., will force inappropriate terms (marginal terms) to be dropped from the model.

: Single term deletions
:
: Model:
: count ~ (marijuana + cigarette + alcohol)^3
:                             Df   Deviance    AIC    LRT   Pr(Chi)
: <none>                         -1.554e-15  65.04
: marijuana                    1     365.78 428.83 365.78 < 2.2e-16 ***
:                                    ~~~~~~
: marijuana:cigarette:alcohol  1       0.37  63.42   0.37   0.54084
:                                      ~~~~
:
: update(fit3, . ~ . - marijuana)
: ...
: Residual Deviance: -1.188e-13    AIC: 65.04
:                    ~~~~~~~~~~

Yes. This is the same as the full model, indicating that the update has not changed the model at all. You still have higher order interactions in the model, so the main effect term has to be left in (or its equivalent).

: update(fit3, . ~ . - marijuana:cigarette:alcohol)
: ...
: Residual Deviance: 0.374    AIC: 63.42
:                    ~~~~~

This update has removed the only non-marginal term from the model and so has made a change. This is why you are seeing the same deviance and AIC here as you saw with drop1. (This line from drop1 was the only appropriate one to be shown.)

: I find 0.374 and 63.42 equal the result from drop1, but
: Residual Deviance: -1.188e-13, AIC: 65.04 does not.
:
: So I wonder why? How does drop1 work exactly?

If you had tried drop1(fit3, test = "Chisq") at the initial stage you would have only seen the three-way interaction.
The other terms you are seeing with your inappropriate invocation of drop1 are not what you think they are. : : thank you ! : : __ : R-help@stat.math.ethz.ch mailing list : https://stat.ethz.ch/mailman/listinfo/r-help : PLEASE do read the posting guide! : http://www.R-project.org/posting-guide.html : __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
RE: [R] A question on the library lme4
Luis Fernando Chaves asks:

: -Original Message-
: From: [EMAIL PROTECTED]
: [mailto:[EMAIL PROTECTED] On Behalf Of Luis Fernando Chaves
: Sent: Sunday, 24 April 2005 11:29 AM
: To: [EMAIL PROTECTED]
: Subject: [R] A question on the library lme4
:
: Hi,
:
: I ran the following model using nlme:
:
: model2 <- lme(log(malrat1) ~ I(year-1982),
:               random = ~1|Continent/Country, data = wbmal10)
:
: I'm trying to run a Poisson GLMM to avoid the above transformation,
: but I don't know how to specify the model using lmer in the lme4
: library:
:
: model3 <- lmer(malrat1 ~ I(year-1982) + ??,
:                data = wbmal10, family = poisson)

(I asked the same question of Doug Bates, as it happens.) Try the following:

wbmal10 <- transform(wbmal10, Cont_Country = Continent:Country)
require(lme4)
model3 <- lmer(malrat1 ~ I(year - 1982) + (1|Continent) + (1|Cont_Country),
               family = poisson, data = wbmal10)

: How can I introduce a random factor of the form ~1|Continent/Country?
:
: Thanks;
:
: --
: Luis Fernando Chaves
:
: __
: R-help@stat.math.ethz.ch mailing list
: https://stat.ethz.ch/mailman/listinfo/r-help
: PLEASE do read the posting guide!
: http://www.R-project.org/posting-guide.html
:
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
RE: [R] Anova - interpretation of the interaction term
: -Original Message- : From: [EMAIL PROTECTED] : [mailto:[EMAIL PROTECTED] On Behalf Of : michael watson (IAH-C) : Sent: Friday, 22 April 2005 7:47 PM : To: r-help@stat.math.ethz.ch : Subject: [R] Anova - interpretation of the interaction term : : : Hi : : So carrying on my use of analysis of variance to check for the effects : of two factors. It's made simpler by the fact that both my : factors have : only two levels each, creating four unique groups. : : I have a highly significant interaction term. In the context of the : experiment, this makes sense. I can visualise the data : graphically, and : sure enough I can see that both factors have different effects on the : data DEPENDING on what the value of the other factor is. : : I explain this all to my colleague - and she asks but which ones are : different? This is best illustrated with an example. We have either : infected | uninfected, and vaccinated | unvaccinated (the two : factors). : We're measuring expression of a gene. Graphically, in the infected : group, vaccination makes expression go up. In the uninfected group, : vaccination makes expression go down. In both the vaccinated and : unvaccinated groups, infection makes expression go down, but it goes : down further in unvaccinated than it does in vaccinated. : : So from a statistical point of view, I can see exactly why the : interaction term is significant, but what my colleage wants to know is : that WITHIN the vaccinated group, does infection decrease expression : significantly? And within the unvaccinated group, does infection : decrease expression significantly? Etc etc etc Can I get this : information from the output of the ANOVA, or do I carry out a separate : test on e.g. just the vaccinated group? (seems a cop out to me) No, you can't get this kind of specific information out of the anova table and yes, anova tables *are* a bit of a cop out. (I sometimes think they should only be allowed between consenting adults in private.) 
What you are asking for is a non-standard, but perfectly reasonable partition of the degrees of freedom between the classes of a single factor with four levels, got by pairing up the levels of vaccination and infection. Of course you can get this information, but you have to do a bit of work for it.

Before I give the example, which I don't expect too many people to read entirely, let me issue a little challenge, namely to write tools to automate a generalized version of the procedure below.

Here is the example (drawing from the explanation given in a certain book, to wit chapter 6):

dat <- expand.grid(vac = c("N", "Y"), inf = c("-", "+"))
dat <- rbind(dat, dat)  # to get a bit of replication

Now we make a 4-level factor from vaccination and infection and generate a bit of data with an infection effect built into it:

dat <- transform(dat, vac_inf = vac:inf, y = as.numeric(inf) + rnorm(8))
dat
   vac inf vac_inf          y
1    N   -     N:-  0.2285096
2    Y   -     Y:-  1.3504610
3    N   +     N:+  2.5581254
4    Y   +     Y:+  2.9208313
11   N   -     N:- -0.8403039
21   Y   -     Y:- -0.2440574
31   N   +     N:+  2.4844055
41   Y   +     Y:+  2.0772671

Now give the joint factor contrasts reflecting the partition we want to effect:

levels(dat$vac_inf)
[1] "N:-" "N:+" "Y:-" "Y:+"
m <- matrix(scan(), ncol = 4, byrow = TRUE)
1: -1 1 0 0
5: 0 0 -1 1
9: 1 1 -1 -1
13:
Read 12 items
fractions(ginv(m))  ## just to see what it looks like; fractions() and ginv() are in MASS
     [,1] [,2] [,3]
[1,] -1/2    0  1/4
[2,]  1/2    0  1/4
[3,]    0 -1/2 -1/4
[4,]    0  1/2 -1/4

Note that we could have simply used t(m), but this is not always possible. Associate these contrasts, fit and analyse:

contrasts(dat$vac_inf) <- ginv(m)
gm <- aov(y ~ vac_inf, dat)
summary(gm)
            Df  Sum Sq Mean Sq F value  Pr(>F)
vac_inf      3 12.1294  4.0431   7.348 0.04190
Residuals    4  2.2009  0.5502

This doesn't tell us too much other than that there are differences, probably.
Now to specify the partition:

summary(gm, split = list(vac_inf = list("- vs +|N" = 1, "- vs +|Y" = 2)))
                    Df  Sum Sq Mean Sq F value  Pr(>F)
vac_inf              3 12.1294  4.0431  7.3480 0.04190
  vac_inf: - vs +|N  1  7.9928  7.9928 14.5262 0.01892
  vac_inf: - vs +|Y  1  3.7863  3.7863  6.8813 0.05860
Residuals            4  2.2009  0.5502

As expected, infection changes the mean for both vaccinated and unvaccinated, as we arranged when we generated the data.

: Many thanks, and sorry, but it's Friday.
:
: Mick
:
: __
: R-help@stat.math.ethz.ch mailing list
: https://stat.ethz.ch/mailman/listinfo/r-help
: PLEASE do read the posting guide!
: http://www.R-project.org/posting-guide.html
:
__
R-help@stat.math.ethz.ch mailing list
RE: [R] Using R to illustrate the Central Limit Theorem
Here's a bit of a refinement on Ted's first suggestion.

N <- 10000
graphics.off()
par(mfrow = c(1, 2), pty = "s")
for (k in 1:20) {
  m <- (rowMeans(matrix(runif(N*k), N, k)) - 0.5)*sqrt(12*k)
  hist(m, breaks = "FD", xlim = c(-4, 4), main = k, prob = TRUE,
       ylim = c(0, 0.5), col = "lemonchiffon")
  pu <- par("usr")[1:2]
  x <- seq(pu[1], pu[2], len = 500)
  lines(x, dnorm(x), col = "red")
  qqnorm(m, ylim = c(-4, 4), xlim = c(-4, 4), pch = ".", col = "blue")
  abline(0, 1, col = "red")
  Sys.sleep(1)
}

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED]
Sent: Friday, 22 April 2005 4:48 AM
To: Paul Smith
Cc: r-help@stat.math.ethz.ch
Subject: RE: [R] Using R to illustrate the Central Limit Theorem

On 21-Apr-05 Paul Smith wrote:

Dear All,

I am totally new to R and I would like to know whether R is able and appropriate to illustrate to my students the Central Limit Theorem, using for instance 100 independent variables with uniform distribution and showing that their sum is a variable with an approximated normal distribution.

Thanks in advance,
Paul

Similar to Francisco's suggestion:

m <- numeric(10000)
for (k in (1:20)) {
  for (i in (1:10000)) { m[i] <- (mean(runif(k)) - 0.5)*sqrt(12*k) }
  hist(m, breaks = 0.3*(-15:15), xlim = c(-4, 4), main = sprintf("%d", k))
}

(On my slowish laptop, this ticks over at a satisfactory rate, about 1 plot per second. If your machine is much faster, then simply increase 10000 to a larger number.)

The real problem with demos like this, starting with the uniform distribution, is that the result is, to the eye, already approximately normal when k=3, and it's only out in the tails that the improvement shows for larger values of k. This was in fact the way we used to simulate a normal distribution in the old days: look up 3 numbers in Kendall & Babington-Smith's Tables of Random Sampling Numbers, which are in effect pages full of integers uniform on 00-99, and take their mean.
It's the one book I ever encountered which contained absolutely no information -- at least, none that I ever spotted.

A more dramatic illustration of the CLT effect might be obtained if, instead of runif(k), you used rbinom(k,1,p) for p != 0.5, say:

m <- numeric(10000); p <- 0.75
for (j in (1:50)) {
  k <- j*j
  for (i in (1:10000)) { m[i] <- (mean(rbinom(k, 1, p)) - p)/sqrt(p*(1-p)/k) }
  hist(m, breaks = 41, xlim = c(-4, 4), main = sprintf("%d", k))
}

Cheers,
Ted.

E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 21-Apr-05  Time: 19:48:05
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
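A non-graphical companion to the demos above: the same standardised uniform means, but summarised numerically so the convergence is visible in the moments rather than in a plot. The sample sizes and k values are illustrative choices of mine.

```r
# Standardised means of k uniforms should approach mean 0, sd 1, and
# vanishing skewness as k grows (for uniforms, symmetry makes the
# skewness small even at k = 1; the tails are where k matters).
set.seed(42)
n <- 10000
for (k in c(1, 3, 10, 30)) {
  m <- (rowMeans(matrix(runif(n * k), n, k)) - 0.5) * sqrt(12 * k)
  cat(sprintf("k = %2d  mean = %6.3f  sd = %5.3f  third moment = %6.3f\n",
              k, mean(m), sd(m), mean(m^3)))
}
```

Printing summary statistics alongside the histograms helps students see that the first two moments are already right at every k (by construction), and that the CLT's real work is in reshaping the higher moments and tails.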
RE: [R] generalized linear mixed models - how to compare?
Andrew Criswell asks:

Hello All:

Should I conclude from this discussion that there is no practical means by which nested generalized mixed models can be compared from output produced through glmmPQL or GLMM?

[WNV] The picture is, in my view, not as bleak as this, but there are certainly many open questions in this area and much research left to do.

What is one then to do???

[WNV] Research in statistics, perhaps? Every little bit helps. It is a mistake to assume that everything is known about even the common approximations used in statistical practice, and this area is still opening up.

Andrew

On Sun, 17 Apr 2005, Deepayan Sarkar wrote:

On Sunday 17 April 2005 08:39, Nestor Fernandez wrote:

I want to evaluate several generalized linear mixed models, including the null model, and select the best approximating one. I have tried glmmPQL (MASS library) and GLMM (lme4) to fit the models. Both result in similar parameter estimates but fairly different likelihood estimates. My questions:

1- Is it correct to calculate AIC for comparing my models, given that they use quasi-likelihood estimates? If not, how can I compare them?
2- Why the large differences in likelihood estimates between the two procedures?

The likelihood reported by glmmPQL is wrong, as it's the likelihood of an incorrect model (namely, an lme model that approximates the correct glmm model).

Actually glmmPQL does not report a likelihood. It returns an object of class lme, but you need to refer to the reference for how to interpret that. It *is* support software for a book.

GLMM uses (mostly) the same procedure to get parameter estimates, but as a final step calculates the likelihood for the correct model for those estimates (so the likelihood reported by it should be fairly reliable).

Well, perhaps, but I need more convincing.
The likelihood involves many high-dimensional non-analytic integrations, so I do not see how GLMM can do those integrals -- it might approximate them, but that would not be 'calculates the likelihood for the correct model'. It would be helpful to have a clarification of this claim. (Our experiments show that finding an accurate value of the log-likelihood is difficult, and many available pieces of software differ in their values by large amounts.)

Further, since neither procedure does ML fitting, this is not a maximized likelihood as required to calculate an AIC value. And even if it were, you need to be careful, as often one GLMM is a boundary value for another, in which case the theory behind AIC needs adjustment.

--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford,     Tel: +44 1865 272861 (self)
1 South Parks Road,            +44 1865 272866 (PA)
Oxford OX1 3TG, UK        Fax: +44 1865 272595

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

--
Andrew R. Criswell, Ph.D.
Graduate School, Bangkok University

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
RE: [R] array indexing and which
You need to think about it just a bit harder. [Hint: what happens if you leave out the first 'which' and just make

ids <- (d[, 1] > 0)

does it work then...?]

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Werner Wernersen
Sent: Monday, 18 April 2005 3:13 AM
To: r-help@stat.math.ethz.ch
Subject: [R] array indexing and which

Hi R friends!

I am stuck with a stupid question: I can circumvent it but I would like to understand why it is wrong. It would be nice if you could give me a hint...

I have a 2D array d and do the following:

ids <- which(d[,1] > 0)

then I have a vector gk with the same column size as d and do:

ids2 <- which(gk[ids] == 1)

but I can't interpret the result I get in ids2. I get the expected result when I use:

which(gk == 1 & d[,1] > 0)

Why is the first version wrong? The reason why I try to use the ids vectors is that I want to avoid recomputation.

Thanks for your help!
Werner

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
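Spelling out Bill's hint with a small made-up example: which(gk[ids] == 1) returns positions within the *subset* gk[ids], not positions in the original vector, so the integer indices have to be re-based before they can be reused. Logical indexing composes correctly; integer indexing does not.

```r
d  <- cbind(c(-1, 2, 3, -4, 5))   # a one-column 2D array
gk <- c(1, 1, 0, 1, 1)

ids  <- which(d[, 1] > 0)          # 2 3 5 -- positions in d
ids2 <- which(gk[ids] == 1)        # 1 3 -- positions in gk[ids], NOT in gk!

right <- ids[which(gk[ids] == 1)]  # 2 5 -- re-based into the original indexing
identical(right, which(gk == 1 & d[, 1] > 0))  # TRUE
```

Keeping `ids` logical (`ids <- d[, 1] > 0`) avoids the re-basing problem entirely, since a logical mask always refers to the full-length vector.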
RE: [R] nls segmented model with unknown joint points
This is how I'd write the formula for use with nls/nlme:

y ~ b41*(x - 1) + b42*(x^2 - 1) +
    ifelse((a41 - x) >= 0, b43*(a41 - x)^2, 0) +
    ifelse((a42 - x) >= 0, b44*(a42 - x)^2, 0)

This is a direct translation from your funny foreign-looking code below that probably makes it clear what's going on. A more swish R form might be

y ~ b41*(x - 1) + b42*(x^2 - 1) + b43*pmax(a41 - x, 0)^2 + b44*pmax(a42 - x, 0)^2

You mention nlm, too. Here you would use a function rather than a formula, but the idea is the same.

V.

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of andy
Sent: Sunday, 17 April 2005 1:09 PM
To: r-help@stat.math.ethz.ch
Subject: [R] nls segmented model with unknown joint points

Hello,

I am interested in fitting a segmented model with unknown joint points in nls and perhaps eventually in nlme. I can fit this model in SAS (see below, joint points to be estimated are a41 and a42), but am unsure how to specify this in the nlm function. I would really appreciate any suggestions or example code. Thanks a lot.

-andy

proc nlin data=Stems.Trees;
  params b41=-3 b42=1.5 b43=-1.5 b44=50 a41=0.75 a42=0.1;
  term1 = (b41*(x - 1) + b42*(x**2 - 1));
  if (a41 - x) >= 0 then term2 = (b43*(a41 - x)**2);
  else term2 = 0;
  if (a42 - x) >= 0 then term3 = (b44*(a42 - x)**2);
  else term3 = 0;
  model y = term1 + term2 + term3;
run;

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
RE: [R] Getting subsets of a data frame
You should look at ?"[" and look very carefully at the drop argument.

For your example, sw[, 1] is the first component of the data frame, but sw[, 1, drop = FALSE] is a data frame consisting of just the first component, as mathematically fastidious people would expect.

This is a convention, and like most arbitrary conventions it can be very useful most of the time, but some of the time it can be a very nasty trap. Caveat emptor.

Bill Venables.

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Fernando Saldanha
Sent: Saturday, 16 April 2005 1:07 PM
To: Submissions to R help
Subject: [R] Getting subsets of a data frame

I was reading in the Reference Manual about Extract.data.frame. There is a list of examples of expressions using [ and [[, with the outcomes. I was puzzled by the fact that, if sw is a data frame, then sw[, 1:3] is also a data frame, but sw[, 1] is just a vector. Since R has no scalars, it must be the case that 1 and 1:1 are the same:

1 == 1:1
[1] TRUE

Then why isn't sw[, 1] = sw[, 1:1] a data frame?

FS

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
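A quick demonstration of the trap and its cure, using the built-in swiss data set as a stand-in for the poster's data frame:

```r
sw <- swiss  # any data frame will do

class(sw[, 1:3])              # "data.frame" -- two or more columns stay a data frame
class(sw[, 1])                # "numeric"    -- a single column drops to a bare vector
class(sw[, 1, drop = FALSE])  # "data.frame" -- drop = FALSE keeps the frame

# In programmatic code, where the number of selected columns might be 1,
# always pass drop = FALSE so the result type does not change under you.
cols <- 1  # imagine this came from user input
stopifnot(is.data.frame(sw[, cols, drop = FALSE]))
```

So the answer to "why isn't sw[, 1] a data frame?" is not the arithmetic of 1 vs 1:1 but the default drop = TRUE, which simplifies any single-column result.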
RE: [R] String in data frame
str <- as.character(d$name)

should do the trick.

(damn... my shift key just broke as well...)

bill venables.

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Cuichang Zhao
Sent: Saturday, 16 April 2005 2:00 PM
To: r-help@stat.math.ethz.ch
Subject: [R] String in data frame

Hello,

How can I take the string in the data frame? Right now I have a table created as a data frame and stored in a file called data.xls, and I want to read that data frame as a table in another R program, so I used the following command (the first column of the data frame is just one number, called num, and the second is a list of strings, called name):

d <- read.table("data.xls", header = T)

What I get for d is a table of 2 columns: num and name. The one for d$num is just a normal list without any levels with it, and this is what I want. But the second, d$name, is a list with levels with it. The list I have for d$name is:

d$name = (bal bal bal bal bal bal), Levels: bal

However, when I run the following code, something different happens:

namelist <- NA
a <- d$name[1]               # this outputs a = bal, Levels: bal
namelist <- c(namelist, a)   # this does not output (NA, bal); it outputs (NA, 1)

If I keep adding a to the namelist, it keeps adding 1 to the namelist instead of bal. However, I want to add bal to the namelist, not 1. How can I do this?

Thank you very much.

Cuichang Zhao
April 15, 2005

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
RE: [R] chronological ordering of factor in lm() and plot()
First put day.names <- c("sun", "mon", "tue", "wed", "thu", "fri", "sat") then days <- factor(as.character(days), levels = day.names) will ensure the ordering you want. Bill Venables.
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Eric C. Jennings Sent: Saturday, 16 April 2005 2:29 PM To: R-help Subject: [R] chronological ordering of factor in lm() and plot() I am trying to do some basic regression and ANOVA on cycle times (numeric vectors) across weekdays (character vector), where I have simply labelled my days as: days <- c("mon", "tue", "wed", ... etc). (NOTE: There are actually multiple instances of each day, and the data is read in from a .dat file.) I have no trouble at all with the actual number crunching; it is the proper ordering of the factor that I am asking about. R first alphabetizes it ("fri", "mon", "thu", ...) before doing the work of lm(), aov() and especially plot(). I have tried as.ordered(factor( )), but that doesn't do anything. If I re-assign levels() in the way that I want, that just renames the levels of the factor but does not reorder it internally. I've looked at chron(), but that seems to entail using a numeric vector instead of a character vector. How can I get it to properly (chronologically) order the factor? (In some ways I'm thinking that all I can do is: days <- c("a.mon", "b.tues", "c.wed", ... etc).) Thanks for all that you can do Eric Jennings [EMAIL PROTECTED] [EMAIL PROTECTED]
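A self-contained sketch of the suggested fix, with a few hypothetical observations standing in for the poster's .dat file:

```r
## Supplying levels= fixes the ordering of the factor itself, so every
## downstream use (lm, aov, plot, table) inherits the chronological order.
day.names <- c("sun", "mon", "tue", "wed", "thu", "fri", "sat")
days <- c("mon", "wed", "fri", "mon", "sun")   # hypothetical data
days <- factor(days, levels = day.names)
levels(days)   # "sun" "mon" "tue" "wed" "thu" "fri" "sat"
table(days)    # counts now appear chronologically, not alphabetically
```

Unused levels are retained, which is usually what you want for a fixed set of weekdays; droplevels() removes them if not.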
RE: [R] factors in multinom function (nnet)
Alexandre, I have a couple of remarks to make, not all of which you might find immediately helpful, I regret to say. * The choice between using predictors linearly or in factor versions is a modelling choice that is in no way specific to multinom. It is a general aspect of modelling that has to be faced in a whole variety of situations. Indeed the full spectrum of choices is much wider than this: linear, polynomials, splines, different sorts of splines, harmonic terms, factors, ... In fact the idea behind GAMs was really to allow some of this extensive field of choices to be model driven, but I digress. Point 1: you need to learn about modelling first and then apply it to multinom. * It is curious to me that someone could be interested in multinomial models per se. Usually people have a context where multinomial models might be one approach to describing the situation in a statistically useful way. Another could be something like classification trees. The context is really what decides what modelling choices of this kind might be sensible. * There is an obvious suggestion for one reference, a certain notorious blue and yellow book for which multinom is part of the support software. I believe they discuss some of the alternatives as well, like classification trees, and some of the principles of modelling, but it's been a while since I read it... * Frank Harrell recently issued an excellent article on this list on brain surgery in a hurry, to which you may usefully refer. I believe it was on April 1. Bill Venables.
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Alexandre Brito Sent: Wednesday, 13 April 2005 8:20 AM To: r-help@stat.math.ethz.ch Subject: [R] factors in multinom function (nnet) Dear All: I am interested in multinomial logit models (function multinom, library nnet) but I'm having trouble choosing whether to define the predictors as factors or not. 
I had posted earlier this example (thanks for the reply ronggui): worms <- data.frame(year = rep(2000:2004, c(3,3,3,3,3)), age = rep(1:3, 5), mud = c(2,5,0,8,7,7,5,9,14,12,8,7,5,13,11), sand = c(4,7,13,4,14,13,20,17,15,23,20,9,35,27,18), rocks = c(2,6,7,9,3,2,2,10,5,19,13,17,11,20,29)) k <- as.matrix(worms[, 3:5]) (mud, sand and rocks are factors; age and year are predictors) Now there are several possibilities: m1 <- multinom(k ~ year + age, data = worms) m2 <- multinom(k ~ factor(year) + age, data = worms) m3 <- multinom(k ~ year + factor(age), data = worms) m4 <- multinom(k ~ factor(year) + factor(age), data = worms) m5 <- multinom(k ~ year:age, data = worms) m6 <- multinom(k ~ year*age, data = worms) m7 <- multinom(k ~ factor(year):age, data = worms) m8 <- multinom(k ~ year:factor(age), data = worms) and so on. I am far from an expert on this, and I would like to learn more about the utilization of the multinom function in R and the kind of doubts I described above. So I hope that someone can recommend me some references in this matter (internet, books...) if any are available. Thanks in advance, best wishes Alexandre
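The numeric-versus-factor choice changes the design matrix, not anything specific to multinom(); a small sketch with model.matrix(), using the year variable from the worms example above:

```r
## A numeric predictor contributes one linear trend column; a factor
## contributes one contrast column per non-baseline level.
year <- rep(2000:2004, each = 3)
ncol(model.matrix(~ year))           # 2: intercept + linear trend in year
ncol(model.matrix(~ factor(year)))   # 5: intercept + 4 level contrasts
```

The factor version spends more degrees of freedom but makes no assumption that the effect of year is linear, which is exactly the modelling trade-off the reply describes.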
RE: [R] glm family=binomial logistic sigmoid curve problem
Couple of points: * If you provide relative frequencies for the binomial response, you need also to give weights so that the actual counts can be reconstructed. This is what the warning message is telling you: if you reconstruct the counts using the default (unity) weights, the counts are not integers... In this case the simplest work-around is to use a quasibinomial family, which at least shuts up the warning message. * predict with glm objects, by default, predicts *linear predictors*. You need to predict responses. Here's how I would (minimally) correct your script: year <- c(2003 + (6/12), 2004 + (2/12), 2004 + (10/12), 2005 + (4/12)) percent <- c(0.31, 0.43, 0.47, 0.50) plot(year, percent, xlim = c(2003, 2007), ylim = c(0, 1)) Lm <- lm(percent ~ year) abline(Lm) bm <- glm(percent ~ year, family = quasibinomial) points(year, fitted(bm), pch = 3) curve(predict(bm, data.frame(year = x), type = "response"), add = TRUE) Supplementary points: * It is a good idea to work with data frames, despite the fact that you need not. * Using lm as the name for a fitted linear model object can cause problems, not to say confusion.
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of James Salsman Sent: Monday, 11 April 2005 4:51 PM To: r-help@stat.math.ethz.ch Subject: [R] glm family=binomial logistic sigmoid curve problem I'm trying to plot an extrapolated logistic sigmoid curve using glm(..., family = binomial) as follows, but neither the fitted() points nor the predict()ed curve are plotting correctly: year <- c(2003 + (6/12), 2004 + (2/12), 2004 + (10/12), 2005 + (4/12)) percent <- c(0.31, 0.43, 0.47, 0.50) plot(year, percent, xlim = c(2003, 2007), ylim = c(0, 1)) lm <- lm(percent ~ year) abline(lm) bm <- glm(percent ~ year, family = binomial) Warning message: non-integer #successes in a binomial glm! in: eval(expr, envir, enclos) points(year, fitted(bm), pch = "+") NULL curve(predict(bm, data.frame(year = x)), add = TRUE) All four of the binomial-fitted points fall exactly on the simple linear regression line, and the predict() curve is nowhere near any of the data points. What am I doing wrong? What does the warning mean? Do I need more points? I am using R on Windows, Version 2.0.1 (2004-11-15) Thank you for your kind help. Sincerely, James Salsman
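A sketch of the other route the reply mentions: supplying the numbers of trials via weights so the binomial family is legitimate. The counts n are hypothetical, since the poster gave only percentages:

```r
## With weights, percent * n gives integer success counts, so the
## "non-integer #successes" warning goes away and binomial (not just
## quasibinomial) is appropriate.
year    <- c(2003 + 6/12, 2004 + 2/12, 2004 + 10/12, 2005 + 4/12)
percent <- c(0.31, 0.43, 0.47, 0.50)
n       <- c(100, 100, 100, 100)    # assumed numbers of trials per period
bm <- glm(percent ~ year, family = binomial, weights = n)
predict(bm, data.frame(year = 2006), type = "response")  # a probability in (0, 1)
```

Note type = "response" is still needed for extrapolation; the default predict() scale is the linear predictor (log-odds), which is why the original curve landed nowhere near the data.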
RE: [R] Re: beta distribution in terms of its mean and standard deviation
For three of the beta distribution functions in R the parameters defining the distribution are alpha, beta and 'ncp'. Pretty trivially, there is no bijection between these three and the mean and variance, but for the special case of ncp = 0, I think there is. Rather than just write it down, it's probably a good idea to see how to get it. Note that mu = alpha/(alpha + beta) s2 = alpha*beta/((alpha + beta)^2 * (alpha + beta + 1)) = mu*(1 - mu)/(alpha + beta + 1) So as an intermediate result put alpha + beta = mu*(1 - mu)/s2 - 1 (= ab, say, which must be positive or you are in trouble) giving alpha = mu*ab beta = (1 - mu)*ab Go forth and write your versions of the (central) beta distribution support functions... Bill Venables.
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Tolga Uzuner Sent: Monday, 11 April 2005 9:01 AM To: [EMAIL PROTECTED] Cc: r-help@stat.math.ethz.ch Subject: [R] Re: beta distribution in terms of its mean and standard deviation Tolga Uzuner wrote: Hi, Is the beta distribution implemented in terms of its mean and standard deviation, as opposed to alpha and beta, in any R package? Thanks Tolga Hmm... answering my own question... guess there is no bijection between {alpha, beta} and {mean, variance}, which is why... occurred to me after I sent the question, unless someone disagrees.
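A sketch of the reparameterised (central) beta support functions the reply invites you to write; the function names are made up for illustration:

```r
## Convert (mean, variance) to (alpha, beta) using the derivation above:
## ab = mu*(1-mu)/s2 - 1, alpha = mu*ab, beta = (1-mu)*ab.
beta.params <- function(mu, s2) {
  ab <- mu * (1 - mu) / s2 - 1
  stopifnot(ab > 0)  # otherwise no beta distribution has this mean/variance
  c(alpha = mu * ab, beta = (1 - mu) * ab)
}
dbeta.mv <- function(x, mu, s2) {
  p <- beta.params(mu, s2)
  dbeta(x, p["alpha"], p["beta"])
}
## Sanity check: the implied mean comes straight back out of the algebra.
p <- beta.params(0.3, 0.01)
p["alpha"] / sum(p)   # 0.3
```

The same beta.params() helper serves pbeta, qbeta and rbeta wrappers in exactly the same way.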
RE: [R] plotting Principal components vs individual variables.
At the cost of breaking the thread I'm going to change your subject and replace 'Principle' by 'Principal'. I just can't stand it any longer... OK, here is how I would solve your other problems. First put wh <- c("USA", "New Zealand", "Dominican Republic", "Western Samoa", "Cook Islands") ind <- match(wh, row.names(running)) Now 'ind' has the indices of the special set as numbers. You need this because although your data frame is indexed by the country names, your 'principal' [NB] component scores are not. Plot the scores against the first variable: plot(running$X100m, running.pca$scores[, 1]) # you got this far Finally, pick out the special set: points(running$X100m[ind], running.pca$scores[ind, 1], pch = 4, col = "red", cex = 2) text(running$X100m[ind], running.pca$scores[ind, 1], pos = 3, cex = 0.7) # optional and you can forget all about the subset data frame running2. Bill Venables.
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Brett Stansfield Sent: Monday, 11 April 2005 10:04 AM To: R help (E-mail) Subject: [R] plotting Principle components vs individual variables. Dear R, I'm trying to plot the first principle component of an analysis vs the first variable but am having trouble. I have no trouble doing the initial plot but have difficulty thereafter. 
First I want to highlight some points of the following data set list(running) [[1]]
                   X100m X200m X400m X800m X1500m   X5K  X10K Marathon
Argentina          10.39 20.81 46.84  1.81   3.70 14.04 29.36   137.72
Australia          10.31 20.06 44.84  1.74   3.57 13.28 27.66   128.30
Austria            10.44 20.81 46.82  1.79   3.60 13.26 27.72   135.90
Belgium            10.34 20.68 45.04  1.73   3.60 13.22 27.45   129.95
Bermuda            10.28 20.58 45.91  1.80   3.75 14.68 30.55   146.62
Brazil             10.22 20.43 45.21  1.73   3.66 13.62 28.62   133.13
Burma              10.64 21.52 48.30  1.80   3.85 14.45 30.28   139.95
Canada             10.17 20.22 45.68  1.76   3.63 13.55 28.09   130.15
Chile              10.34 20.80 46.20  1.79   3.71 13.61 29.30   134.03
China              10.51 21.04 47.30  1.81   3.73 13.90 29.13   133.53
Columbia           10.43 21.05 46.10  1.82   3.74 13.49 27.88   131.35
Cook Islands       12.18 23.20 52.94  2.02   4.24 16.70 35.38   164.70
Costa Rica         10.94 21.90 48.66  1.87   3.84 14.03 28.81   136.58
Czechoslovakia     10.35 20.65 45.64  1.76   3.58 13.42 28.19   134.32
Denmark            10.56 20.52 45.89  1.78   3.61 13.50 28.11   130.78
Dominican Republic 10.14 20.65 46.80  1.82   3.82 14.91 31.45   154.12
Finland            10.43 20.69 45.49  1.74   3.61 13.27 27.52   130.87
France             10.11 20.38 45.28  1.73   3.57 13.34 27.97   132.30
East Germany       10.12 20.33 44.87  1.73   3.56 13.17 27.42   129.92
West Germany       10.16 20.37 44.50  1.73   3.53 13.21 27.61   132.23
United Kingdom     10.11 20.21 44.93  1.70   3.51 13.01 27.51   129.13
Greece             10.22 20.71 46.56  1.78   3.64 14.59 28.45   134.60
Guatemala          10.98 21.82 48.40  1.89   3.80 14.16 30.11   139.33
Hungary            10.26 20.62 46.02  1.77   3.62 13.49 28.44   132.58
India              10.60 21.42 45.73  1.76   3.73 13.77 28.81   131.98
Indonesia          10.59 21.49 47.80  1.84   3.92 14.73 30.79   148.83
Ireland            10.61 20.96 46.30  1.79   3.56 13.32 27.81   132.35
Israel             10.71 21.00 47.80  1.77   3.72 13.66 28.93   137.55
Italy              10.01 19.72 45.26  1.73   3.60 13.23 27.52   131.08
Japan              10.34 20.81 45.86  1.79   3.64 13.41 27.72   128.63
Kenya              10.46 20.66 44.92  1.73   3.55 13.10 27.38   129.75
South Korea        10.34 20.89 46.90  1.79   3.77 13.96 29.23   136.25
North Korea        10.91 21.94 47.30  1.85   3.77 14.13 29.67   130.87
Luxembourg         10.35 20.77 47.40  1.82   3.67 13.64 29.08   141.27
Malaysia           10.40 20.92 46.30  1.82   3.80 14.64 31.01   154.10
Mauritius          11.19 22.45 47.70  1.88   3.83 15.06 31.77   152.23
Mexico             10.42 21.30 46.10  1.80   3.65 13.46 27.95   129.20
Netherlands        10.52 20.95 45.10  1.74   3.62 13.36 27.61   129.02
New Zealand        10.51 20.88 46.10  1.74   3.54 13.21 27.70   128.98
Norway             10.55 21.16 46.71  1.76   3.62 13.34 27.69   131.48
Papua New Guinea   10.96 21.78 47.90  1.90   4.01 14.72 31.36   148.22
Philippines        10.78 21.64 46.24  1.81   3.83 14.74 30.64   145.27
Poland             10.16 20.24 45.36  1.76   3.60 13.29 27.89   131.58
Portugal           10.53 21.17 46.70  1.79   3.62 13.13 27.38   128.65
Rumania            10.41 20.98 45.87  1.76   3.64 13.25 27.67   132.50
Singapore          10.38 21.28 47.40  1.88   3.89 15.11 31.32   157.77
Spain              10.42 20.77 45.98  1.76   3.55 13.31 27.73   131.57
Sweden             10.25 20.61 45.63  1.77   3.61 13.29 27.94   130.63
Switzerland        10.37 20.46 45.78  1.78   3.55 13.22 27.91   131.20
Taiwan             10.59
RE: [R] axis colors in pairs plot
Hi Anne, Here's one suggestion: use a simple panel function: cols <- c("red", "green3", "blue") with(iris, pairs(iris[, -5], main = "Anderson's Iris Data - 3 species", panel = function(x, y, ...) points(x, y, pch = (2:4)[Species], col = cols[Species], ...))) Bill Venables
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Anne York Sent: Friday, 8 April 2005 8:51 AM To: Help R Subject: [R] axis colors in pairs plot The following command produces a red axis line in a pairs plot: pairs(iris[1:4], main = "Anderson's Iris Data -- 3 species", pch = "+", col = c("red", "green3", "blue")[unclass(iris$Species)]) Trying to fool pairs in the following way produces the same plot as above: pairs(iris[1:4], main = "Anderson's Iris Data -- 3 species", pch = "+", col = c("black", "red", "green3", "blue")[1 + unclass(iris$Species)]) One very kludgy work-around is to define a new level 1, say "foo", in the first row of iris: iris2 = iris; iris2$Species = as.character(iris2$Species); iris2$Species[1] = "foo"; iris2$Species = factor(iris2$Species); pairs(iris2[1:4], main = "Anderson's Iris Data -- 3 species", pch = "+", col = c("black", "red", "green3", "blue")[unclass(iris2$Species)]) However, if any other row is redefined, the red axis persists. For example: iris2 = iris; iris2$Species = as.character(iris2$Species); iris2$Species[3] = "foo"; iris2$Species = factor(iris2$Species); pairs(iris2[1:4], main = "Anderson's Iris Data -- 3 species", pch = "+", col = c("black", "red", "green3", "blue")[unclass(iris2$Species)]) I'd appreciate suggestions for a simpler work-around. Thanks, Anne
RE: [R] two methods for regression, two different results
This is possible if x and z are orthogonal, but in general it doesn't work, as you have noted. (If it did work it would almost amount to a way of inverting general square matrices by working one row at a time, no going back...) It is possible to fit a bivariate regression using simple linear regression techniques iteratively like this, but it is a bit more involved than your two-step process. 1. regress y on x and take the residuals: ryx <- resid(lm(y ~ x)) 2. regress z on x and take the residuals: rzx <- resid(lm(z ~ x)) 3. regress ryx on rzx: fitz <- lm(ryx ~ rzx) 4. this gives you the estimate of the coefficient on z (what you call below b2): b2 <- coef(fitz)[2] 5. regress y - b2*z on x: fitx <- lm(I(y - b2*z) ~ x) This last step gets you the estimates of b0 and b1. None of this works with significances, though, because in going about it this way you have essentially disguised the degrees of freedom involved. So you can get the right estimates, but the standard errors, t-statistics and residual variances are all somewhat inaccurate (though usually not by much). If x and z are orthogonal the (curious looking) step 2 is not needed. This kind of idea lies behind some algorithms (e.g. Stevens' algorithm) for fitting very large regressions essentially by iterative processes to avoid constructing a huge model matrix. Bill Venables
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of John Sorkin Sent: Wednesday, 6 April 2005 12:55 PM To: r-help@stat.math.ethz.ch Subject: [R] two methods for regression, two different results Please forgive a straight stats question, and the informal notation. Let us say we wish to perform a linear regression: y = b0 + b1*x + b2*z. There are two ways this can be done: the usual way, as a single regression, fit1 <- lm(y ~ x + z), or by doing two regressions. In the first regression we could have y as the dependent variable and x as the independent variable: fit2 <- lm(y ~ x). 
The second regression would be a regression in which the residuals from the first regression would be the dependent variable, and the independent variable would be z: fit2 <- lm(fit2$residuals ~ z). I would think the two methods would give the same p value and the same beta coefficient for z. They don't. Can someone help me understand why the two methods do not give the same results? Additionally, could someone tell me when one method might be better than the other, i.e. what question does the first method answer, and what question does the second method answer? I have searched a number of textbooks and have not found this question addressed. Thanks, John John Sorkin M.D., Ph.D. Chief, Biostatistics and Informatics Baltimore VA Medical Center GRECC and University of Maryland School of Medicine Claude Pepper OAIC University of Maryland School of Medicine Division of Gerontology Baltimore VA Medical Center 10 North Greene Street GRECC (BT/18/GR) Baltimore, MD 21201-1524 410-605-7119 -- NOTE NEW EMAIL ADDRESS: [EMAIL PROTECTED]
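A runnable sketch of the five-step procedure from the reply, checked against the one-shot lm() fit. The data are simulated, with x and z deliberately correlated so the naive two-pass method would fail:

```r
## Steps 1-5 recover the coefficients of lm(y ~ x + z) exactly; the key
## extra step is residualising z on x as well as y on x.
set.seed(1)
x <- rnorm(50); z <- x + rnorm(50); y <- 1 + 2 * x + 3 * z + rnorm(50)
ryx <- resid(lm(y ~ x))           # step 1
rzx <- resid(lm(z ~ x))           # step 2: the step the naive method skips
b2  <- coef(lm(ryx ~ rzx))[2]     # steps 3-4: coefficient on z
b01 <- coef(lm(I(y - b2 * z) ~ x))  # step 5: b0 and b1
all.equal(unname(c(b01, b2)), unname(coef(lm(y ~ x + z))))  # TRUE
```

The point estimates agree exactly (this is the Frisch-Waugh-Lovell result), but as the reply notes the piecewise fits misstate the degrees of freedom, so their standard errors and p values are not trustworthy.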
RE: [R] how to draw a 45 degree line on qqnorm() plot?
Remember this is just a diagnostic procedure and all you are really looking for is whether the normal scores plot is pretty straight. The reason for anchoring the guiding line at the quartiles is really the same reason that boxplots use quartiles for the central chunk. You don't want the guiding line to be influenced by the tails too much. It's a robustness thing. In order for the 45 degree line to be useful, you have to *standardize* your residuals so that they look like *standard* normal residuals (just as the normal scores are standardized). If you do that then the 45 degree line is likely to be useful, but no more so in practice than the usual qqline, which applies irrespective of standardization. Bill Venables.
-Original Message- From: Terry Mu [mailto:[EMAIL PROTECTED] Sent: Sunday, 3 April 2005 6:10 PM To: Venables, Bill (CMIS, Cleveland) Cc: r-help@stat.math.ethz.ch Subject: Re: [R] how to draw a 45 degree line on qqnorm() plot? Thank you. I didn't know scale(). qqline passes through the 1st and 3rd quartiles; it doesn't seem very useful to me. I thought a diagonal line would demonstrate the deviation from a standard normal. Correct me if I was wrong. Thanks. On Apr 3, 2005 2:40 AM, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: You didn't try very hard. Try this, look at it and think about it: jj <- scale(sample(1:100, 10)) qqnorm(jj) abline(0, 1) Rather than abline, however, most people would use qqline(jj), in which case you don't need the scaling. V. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Terry Mu Sent: Sunday, 3 April 2005 4:00 PM To: R-Help Subject: [R] how to draw a 45 degree line on qqnorm() plot? # I can not draw a 45 degree line on a qqnorm() plot, jj <- sample(c(1:100), 10) qqnorm(jj) abline() doesn't work. Thank you.
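A sketch contrasting the two reference lines on a normal Q-Q plot, with deliberately non-normal data so the difference is visible:

```r
## The 45-degree line is only meaningful after standardising the data;
## qqline() works on the raw scale because it is anchored at the quartiles.
set.seed(42)
jj <- rexp(100)                  # skewed, so neither plot is straight
qqnorm(scale(jj)); abline(0, 1)  # 45-degree line against standardised data
qqnorm(jj); qqline(jj)           # quartile-anchored line on the raw scale
## Anchoring at the quartiles keeps heavy tails from dragging the
## reference line around -- the robustness point made above.
```

Either way, what you judge is straightness of the point cloud; the line is just a visual guide.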
RE: [R] how to draw a 45 degree line on qqnorm() plot?
You didn't try very hard. Try this, look at it and think about it: jj <- scale(sample(1:100, 10)) qqnorm(jj) abline(0, 1) Rather than abline, however, most people would use qqline(jj), in which case you don't need the scaling. V.
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Terry Mu Sent: Sunday, 3 April 2005 4:00 PM To: R-Help Subject: [R] how to draw a 45 degree line on qqnorm() plot? # I can not draw a 45 degree line on a qqnorm() plot, jj <- sample(c(1:100), 10) qqnorm(jj) abline() doesn't work. Thank you.
RE: [R] NA's?
Your message doesn't help us very much. You haven't said what kind of calculation it is you want to do, and that certainly matters. For example, for some kinds of computations the solution you started below would work fine: M <- matrix(1:16, 4, 4) is.na(diag(M)) <- TRUE M [,1] [,2] [,3] [,4] [1,] NA 5 9 13 [2,] 2 NA 10 14 [3,] 3 7 NA 15 [4,] 4 8 12 NA rowSums(M, na.rm = TRUE) [1] 27 26 25 24 colSums(M, na.rm = TRUE) [1] 9 20 31 42 You can also use apply( ) with functions that will accept missing values (and ignore them) for computations on either the rows or the columns. Hoping for a general mechanism that would somehow signal the diagonal values as values to be ignored in a general way is not a possibility. Just as a curiosity, what were you hoping that na.omit(M) would do? V.
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Asha Jayanthi Sent: Thursday, 31 March 2005 11:22 AM To: r-help@stat.math.ethz.ch Subject: [R] NA's? I have a large matrix of data. The size of the matrix ranges from 100 x 100 to 1000 x 1000. Now I have to do computations on that, and should not consider the diagonal elements. I tried setting diag(M) = NA and M = na.omit(M), but this omits all the rows. I only want to omit the diagonal elements but consider the whole row. diag(M) = 0 seems like a good option but this will affect my result. How to proceed with this? How to just ignore some specific values? What if I want to consider only the upper/lower triangular matrix? Asha
RE: [R] NA's?
is.na(diag(M)) <- TRUE cmeans <- colMeans(M, na.rm = TRUE) csd <- apply(M, 2, sd, na.rm = TRUE) cvar <- csd^2 (or cvar <- apply(M, 2, var, na.rm = TRUE)) Using 'kmeans' on a matrix but 'ignoring the diagonal entries' just doesn't make sense as it stands, so I can't help you there. V.
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Asha Jayanthi Sent: Thursday, 31 March 2005 3:48 PM To: r-help@stat.math.ethz.ch Subject: RE: [R] NA's? I am sorry about that. I would like to do column mean, sd and var as well as kmeans on the matrix. Does na.rm = TRUE work for such functions, with only the diagonal ignored? From: [EMAIL PROTECTED] To: [EMAIL PROTECTED], r-help@stat.math.ethz.ch Subject: RE: [R] NA's? Date: Thu, 31 Mar 2005 11:42:05 +1000 Your message doesn't help us very much. You haven't said what kind of calculation it is you want to do, and that certainly matters. For example, for some kinds of computations the solution you started below would work fine: M <- matrix(1:16, 4, 4) is.na(diag(M)) <- TRUE M [,1] [,2] [,3] [,4] [1,] NA 5 9 13 [2,] 2 NA 10 14 [3,] 3 7 NA 15 [4,] 4 8 12 NA rowSums(M, na.rm = TRUE) [1] 27 26 25 24 colSums(M, na.rm = TRUE) [1] 9 20 31 42 You can also use apply( ) with functions that will accept missing values (and ignore them) for computations on either the rows or the columns. Hoping for a general mechanism that would somehow signal the diagonal values as values to be ignored in a general way is not a possibility. Just as a curiosity, what were you hoping that na.omit(M) would do? V. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Asha Jayanthi Sent: Thursday, 31 March 2005 11:22 AM To: r-help@stat.math.ethz.ch Subject: [R] NA's? I have a large matrix of data. The size of the matrix ranges from 100 x 100 to 1000 x 1000. Now I have to do computations on that, and should not consider the diagonal elements. I tried setting diag(M) = NA and M = na.omit(M), but this omits all the rows. I only want to omit the diagonal elements but consider the whole row. diag(M) = 0 seems like a good option but this will affect my result. How to proceed with this? How to just ignore some specific values? What if I want to consider only the upper/lower triangular matrix? Asha
RE: [R] Generating list of vector coordinates
rev(expand.grid(k = 1:5, j = 1:4, i = 1:3)) i j k 1 1 1 1 2 1 1 2 3 1 1 3 4 1 1 4 5 1 1 5 6 1 2 1 7 1 2 2 8 1 2 3 ... 55 3 3 5 56 3 4 1 57 3 4 2 58 3 4 3 59 3 4 4 60 3 4 5
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Ronnen Levinson Sent: Tuesday, 29 March 2005 9:21 AM To: r-help@stat.math.ethz.ch Subject: [R] Generating list of vector coordinates Hi. Can anyone suggest a simple way to obtain in R a list of vector coordinates of the following form? The code below is Mathematica. In[5]:= Flatten[Table[{i,j,k},{i,3},{j,4},{k,5}], 2] Out[5]= {{1,1,1},{1,1,2},{1,1,3},{1,1,4},{1,1,5},{1,2,1},{1,2,2},{1,2,3},{1,2,4},{1,2,5},{1,3,1},{1,3,2},{1,3,3},{1,3,4},{1,3,5},{1,4,1},{1,4,2},{1,4,3},{1,4,4},{1,4,5},{2,1,1},{2,1,2},{2,1,3},{2,1,4},{2,1,5},{2,2,1},{2,2,2},{2,2,3},{2,2,4},{2,2,5},{2,3,1},{2,3,2},{2,3,3},{2,3,4},{2,3,5},{2,4,1},{2,4,2},{2,4,3},{2,4,4},{2,4,5},{3,1,1},{3,1,2},{3,1,3},{3,1,4},{3,1,5},{3,2,1},{3,2,2},{3,2,3},{3,2,4},{3,2,5},{3,3,1},{3,3,2},{3,3,3},{3,3,4},{3,3,5},{3,4,1},{3,4,2},{3,4,3},{3,4,4},{3,4,5}} I've been futzing with apply(), outer(), and so on but haven't found an elegant solution. Thanks, Ronnen. P.S. E-mailed CCs of posted replies appreciated.
RE: [R] Robust multivariate regression with rlm
lm works for multivariate responses; rlm does not - check what the help file says about the response. That's about it, really. Bill Venables.
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Markku Mielityinen Sent: Thursday, 24 March 2005 5:20 PM To: r-help@stat.math.ethz.ch Subject: [R] Robust multivariate regression with rlm Dear Group, I am having trouble with using rlm on multivariate data sets. When I call rlm I get Error in lm.wfit(x, y, w, method = "qr") : incompatible dimensions lm on the same data sets seems to work well (see code example). Am I doing something wrong? I have already browsed through the forums and google but could not find any related discussions. I use Windows XP and R Version 2.0.1 (2004-11-15) (if that makes a difference). Example code: Mx [,1] [,2] [1,] 49.10899 45.75513 [2,] 505.92018 48.81037 [3,] 973.30659 50.28478 [4,] 55.99533 508.94504 [5,] 964.96028 513.69579 [6,] 48.25670 975.94972 [7,] 510.21291 967.62767 [8,] 977.12363 978.29216 My [,1] [,2] [1,] 50 50 [2,] 512 50 [3,] 974 50 [4,] 50 512 [5,] 974 512 [6,] 50 974 [7,] 512 974 [8,] 974 974 model <- lm(My ~ Mx) model Call: lm(formula = My ~ Mx) Coefficients: [,1] [,2] (Intercept) 0.934727 3.918421 Mx1 1.003517 -0.004202 Mx2 -0.002624 0.998155 model <- rlm(My ~ Mx) Error in lm.wfit(x, y, w, method = "qr") : incompatible dimensions model <- rlm(My ~ Mx, psi = psi.bisquare) Error in lm.wfit(x, y, w, method = "qr") : incompatible dimensions Another example (this one seems to work): Mx <- matrix(c(0,0,1,0,0,1), ncol = 2, byrow = TRUE) + 1 My <- matrix(c(0,0,1,1,-1,1), ncol = 2, byrow = TRUE) + 1 model <- rlm(My ~ Mx) model Call: rlm(formula = My ~ Mx) Converged in 0 iterations Coefficients: [,1] [,2] (Intercept) 1 -1 Mx1 1 1 Mx2 -1 1 Degrees of freedom: 6 total; 0 residual Scale estimate: 0 Best regards, Markku Mielityinen
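Since rlm() accepts only a univariate response, one obvious work-around is to fit each response column separately. A sketch with simulated data (the matrices here are made up; the poster's exactly-determined 3-point example would leave no residuals to robustify):

```r
## Robust fit column by column: one rlm() per response column.
library(MASS)
set.seed(3)
Mx <- matrix(rnorm(40), 20, 2)
My <- cbind(1 + Mx %*% c(1, -1),
            -1 + Mx %*% c(1, 1)) + matrix(rnorm(40, sd = 0.1), 20, 2)
fits <- lapply(seq_len(ncol(My)), function(j) rlm(My[, j] ~ Mx))
sapply(fits, coef)   # one column of coefficients per response, as lm() prints
```

This gives the same per-column point of view as the multivariate lm() fit, just with robust M-estimation in place of least squares for each column.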
RE: [R] error with polr()
This is always tricky. Here is a work-around. Try asking for the Hessian with the original fit: fm - polr(factor(y) ~ lx, data = ord.dat, Hess=T) summary(fm) Call: polr(formula = factor(y) ~ lx, data = ord.dat, Hess = T) Coefficients: Value Std. Error t value lx 2.420614 0.8146359 2.971406 Intercepts: Value Std. Error t value 0|1 0.5865 0.8118 0.7224 1|2 4.8966 1.7422 2.8106 Residual Deviance: 20.43286 AIC: 26.43286 --- [I have no idea if this is the same as SAS but if not, please report the problem to SAS Inc.] Bill Venables. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Chaehyung Ahn Sent: Tuesday, 22 March 2005 11:44 AM To: r-help@stat.math.ethz.ch Subject: [R] error with polr() Dear Sir, I get an error message when I use polr() in MASS package. My data is ord.dat. I made y a factor. y y1 y2 x lx 1 0 0 0 3.2e-02 -1.49485 2 0 0 0 3.2e-02 -1.49485 3 0 0 0 1.0e-01 -1.0 4 0 0 0 1.0e-01 -1.0 5 0 0 0 3.2e-01 -0.49485 6 0 0 0 3.2e-01 -0.49485 7 1 1 0 1.0e+00 0.0 8 0 0 0 1.0e+00 0.0 9 1 1 0 3.2e+00 0.50515 10 1 1 0 3.2e+00 0.50515 11 0 0 0 1.0e+01 1.0 12 1 1 0 1.0e+01 1.0 13 1 1 0 3.2e+01 1.50515 14 2 1 1 3.2e+01 1.50515 15 2 1 1 1.0e+02 2.0 16 1 1 0 1.0e+02 2.0 17 2 1 1 3.2e+02 2.50515 18 1 1 0 3.2e+02 2.50515 19 2 1 1 1.0e+03 3.0 20 2 1 1 1.0e+03 3.0 When I try, polr(y~lx,data=ord.dat) I gives me a output, which is the same as that from SAS. But when I try, summary(polr(y~lx,data=ord.dat)) Re-fitting to get Hessian Error in optim(start, fmin, gmin, method = BFGS, hessian = Hess, ...) : initial value in vmmin is not finite And the weird thing is that it's fine if I use x instead of lx, where lx=log10(x). thanks Sincerely, cahn __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! 
RE: [R] A rude question
When Haydn was asked about his 100+ symphonies he is reputed to have replied "sunt mala bona mixta", which is a kind of dog Latin for "there are good ones and bad ones all mixed together". It's certainly the same with R packages, so to continue the Latin motif: caveat emptor.

The R engine, on the other hand, is pretty well uniformly excellent code, but you have to take my word for that. Actually, you don't. The whole engine is open source so, if you wish, you can check every line of it. If people were out to push dodgy software, this is not the way they'd go about it.

Bill Venables.

-----Original Message-----
From: [EMAIL PROTECTED]
Sent: Thursday, 27 January 2005 3:10 PM
To: r-help@stat.math.ethz.ch
Subject: [R] A rude question

Dear all,

I am a beginner using R, and I have a question about it. Since R is written by so many authors, how do you know that the results are trustworthy? (I don't want to offend anyone, and I do trust people.) But I think this should be a question.

Thanks,
Ming
RE: [R] AIC, glm, lognormal distribution
Benjamin,

A couple of points. First:

  fit <- glm(y ~ x, gaussian(link = log))

does NOT fit a model with a lognormal response distribution. It fits a non-linear regression model with an ordinary Gaussian response distribution. This model has constant variance, whereas the lognormal model (which you would fit by transforming the response) has constant coefficient of variation. You would transform the response for two reasons: it should linearize the relationship between the (transformed) response and the predictors, AND it should change a constant CV into homoscedasticity, that is, constant variance. This latter property is usually as important as the first. You should not think of a glm with log link as a kind of handy alternative to a log-transformed regression, as they are in reality very different models.

Second point: you claim that the calls

  fitA <- glm(y ~ x, gaussian(link = log))
  fitB <- glm(y ~ x, gaussian)
  fitC <- lm(y ~ x)

give identical results. This is NOT true for me on R 2.0.1 (Windows), so you may care to check that (although fitB and fitC are fully equivalent, of course).

When you sort out the model you really need to use, you may find stepAIC in the MASS library useful as one tool for model selection, or at least for steps towards that generally rather complex goal.

Bill Venables.

-----Original Message-----
From: [EMAIL PROTECTED] On Behalf Of Benjamin M. Osborne
Sent: Monday, 13 December 2004 12:40 PM
Subject: [R] AIC, glm, lognormal distribution

I'm attempting to do model selection with AIC, using a glm and a lognormal distribution, but:

  fit1 <- glm(BA ~ Year, data = pdat.sp1.65.04, family = gaussian(link = log))

gives the same result as either of the following:

  fit1 <- glm(BA ~ Year, data = pdat.sp1.65.04, family = gaussian)
  fit1 <- lm(BA ~ Year, data = pdat.sp1.65.04)

  fit1
  # Coefficients:
  # (Intercept)    Year2004
  #     -1.6341     -0.2741
  # Degrees of Freedom: 84 Total (i.e. Null); 83 Residual
  # Null Deviance: 1.521
  # Residual Deviance: 1.476    AIC: -97.31

  fit1 <- glm(BA ~ Year, data = pdat.sp1.65.04, family = quasi(link = log))
  # also gives the same result but returns AIC: NA

Is it possible to model a lognormal distribution without having to transform the data themselves? (i.e. fit1 <- lm(log(BA) ~ Year, data = pdat.sp1.65.04))

Thanks in advance,
Ben Osborne
--
Botany Department
University of Vermont
109 Carrigan Drive
Burlington, VT 05405
[EMAIL PROTECTED]
phone: 802-656-0297
fax: 802-656-0440
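Bill's first point can be seen directly with simulated data (a sketch; the data and variable names here are made up, not Benjamin's):

```r
## A lognormal response: log(y) is linear in x with constant variance,
## while y itself has constant coefficient of variation.
set.seed(42)
x <- runif(100, 1, 10)
y <- exp(0.3 * x + rnorm(100, sd = 0.4))

fitA <- glm(y ~ x, family = gaussian(link = "log"))  # constant-variance model
fitB <- lm(log(y) ~ x)                               # the lognormal model

## The two fits answer different questions, and the estimates generally
## differ; they are not interchangeable ways of writing the same model.
coef(fitA)
coef(fitB)
```

Here `fitB` is the model that actually matches how the data were generated; `fitA` assumes the same mean curve but homoscedastic errors on the raw scale.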
RE: [R] Can't seem to finish a randomForest.... Just goes and goes!
Alternatively, if you can arrive at a sensible ordering of the levels, you can declare them ordered factors and make the computation feasible once again.

Bill Venables.

-----Original Message-----
From: [EMAIL PROTECTED] On Behalf Of Torsten Hothorn
Sent: Monday, 5 April 2004 4:27 PM
To: David L. Van Brunt, Ph.D.
Cc: R-Help
Subject: Re: [R] Can't seem to finish a randomForest... Just goes and goes!

On Sun, 4 Apr 2004, David L. Van Brunt, Ph.D. wrote:

> Playing with randomForest, samples run fine. But on real data, no go.
> Here's the setup: OS X, same behavior whether I'm using R-Aqua 1.8.1 or
> the Fink compile-of-my-own with X11, R version 1.8.1. This is on OS X
> 10.3 (aka Panther), G4 800MHz with 512M physical RAM. I have not altered
> the startup options of R.
>
> The data set is read in from a text file with read.table, and has 46
> variables and 1,855 cases. Trying the following: the DV is categorical,
> 0 or 1. Most of the IVs are either continuous, or correctly read in as
> factors. The largest factor has 30 levels. Only the

This means there are 2^(30-1) = 536,870,912 possible splits to be evaluated every time this variable is picked up (minus something due to empty levels). At least the last time I looked at the code, randomForest used an exhaustive search over all possible splits. Try reducing the number of levels to something reasonable (or, for a first shot, remove this variable from the learning sample).

Best,

Torsten

> DV seems to need identifying as a factor to force class trees over
> regression:
>
>   Mydata$V46 <- as.factor(Mydata$V46)
>   Myforest.rf <- randomForest(V46 ~ ., data = Mydata, ntrees = 100,
>                               mtry = 7, proximities = FALSE,
>                               importance = FALSE)
>
> Five hours later, R.bin was still taking up 75% of my processor. When
> I've tried this with larger data, I get errors referring to the buffer
> (sorry, not in front of me right now). Any ideas on this? The data don't
> seem horrifically large. Seems like there are a few options for setting
> memory size, but I'm not sure which of them to try tweaking, or if
> that's even the issue.
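A sketch of the split-count arithmetic behind both suggestions (the 30-level factor here is made up; in practice you would supply a meaningful `levels =` ordering rather than the alphabetical one used below):

```r
## For an unordered factor with k levels, an exhaustive search considers
## on the order of 2^(k-1) subsets of levels at each split; for an
## ordered factor, only the k - 1 cutpoints along the level sequence.
k <- 30
2^(k - 1)   # 536870912 candidate splits for the unordered factor
k - 1       # 29 candidate splits once an ordering is declared

## Made-up high-cardinality factor, then Bill's suggestion applied:
bigfac <- factor(sample(paste0("L", 1:k), 1000, replace = TRUE))
bigfac.ord <- factor(bigfac, ordered = TRUE)   # declares an ordering
is.ordered(bigfac.ord)
```

The same object carries the same information either way; declaring the order just tells the tree code it may treat the levels as a sequence instead of searching over subsets.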
RE: [R] binding vectors or matrix using their names
Goodness Patrick, this must surely qualify for the obfuscated R competition finals. I love it!

There are two solutions I can think of with do.call, and here they are:

  > x <- 1
  > x2 <- runif(10)
  > x12 <- c("x", "x2")

  > do.call("cbind", lapply(x12, as.name))
        x         x2
   [1,] 1 0.99327265
   [2,] 1 0.63260097
   [3,] 1 0.17411170
   [4,] 1 0.54635634
   [5,] 1 0.75603670
   [6,] 1 0.27739270
   [7,] 1 0.32125068
   [8,] 1 0.01326344
   [9,] 1 0.37519602
  [10,] 1 0.11133052

  > do.call("cbind", lapply(x12, get))
        [,1]       [,2]
   [1,]    1 0.99327265
   [2,]    1 0.63260097
   [3,]    1 0.17411170
   [4,]    1 0.54635634
   [5,]    1 0.75603670
   [6,]    1 0.27739270
   [7,]    1 0.32125068
   [8,]    1 0.01326344
   [9,]    1 0.37519602
  [10,]    1 0.11133052

I suspect the first offers a few advantages over the second (which some other people have implicitly suggested). The first preserves the names in the result. Also, if the vectors are large, the second constructs a large language object and then evaluates it to get a large result. The first uses the names rather than the objects themselves, so at least the language object is small, even if the result is not.

A more serious, philosophical word on Patrick's solution: it is rarely necessary (in my limited experience, sure) to have to use parse() like this. Where it provides a quick (kludgy?) solution, I often find it a useful exercise to consider alternatives. They often come out simpler, and you nearly always pick up something useful in the process.

Bill V.

-----Original Message-----
From: [EMAIL PROTECTED] On Behalf Of Patrick Burns
Sent: Thursday, 25 March 2004 7:02 AM
To: Stephane DRAY
Cc: [EMAIL PROTECTED]; Tom Blackwell
Subject: Re: [R] binding vectors or matrix using their names

I think you are looking for the eval-parse-text idiom:

  eval(parse(text = paste("cbind(", paste(my.names, collapse = ","), ")")))

Patrick Burns
Burns Statistics
[EMAIL PROTECTED]
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of "S Poetry" and "A Guide for the Unwilling S User")

Stephane DRAY wrote:

> Hi Tom,
> Your approach did not work:
>
>   > do.call("cbind", as.list(my.names))
>        [,1] [,2]
>   [1,] "x"  "x2"
>
> but it helped me a lot to find the good one:
>
>   do.call("cbind", as.list(parse(text = my.names)))
>
> Thanks.
>
> At 14:56 24/03/2004, Tom Blackwell wrote:
>
>> I believe the syntax is
>>
>>   result <- do.call("cbind", as.list(my.names))
>>
>> Haven't checked this on your example, though.
>>
>> - tom blackwell - u michigan medical school - ann arbor -
>>
>> On Wed, 24 Mar 2004, Stephane DRAY wrote:
>>
>>> Hello list,
>>>
>>> I have two vectors x and x2:
>>>
>>>   x = runif(10)
>>>   x2 = runif(10)
>>>
>>> and one vector with their names:
>>>
>>>   my.names = c("x", "x2")
>>>
>>> I would like to cbind these two vectors using their names contained in
>>> the vector my.names. I can create a string with commas,
>>>
>>>   ncomma = paste(my.names, collapse = ",")
>>>
>>> and now I just need a function to transform this string into an
>>> adequate argument for cbind:
>>>
>>>   cbind(afunction(ncomma))
>>>
>>> Is there in R a function that can do the job? If not, how can I do it?
>>>
>>> Thanks in advance. Sincerely,
>>>
>>> Stéphane DRAY
>>> Département des Sciences Biologiques
>>> Université de Montréal, C.P. 6128, succursale centre-ville
>>> Montréal, Québec H3C 3J7, Canada
>>> Tel : 514 343 6111 poste 1233
>>> E-mail : [EMAIL PROTECTED]
>>> Web : http://www.steph280.freesurf.fr/
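As an aside (my addition, not from the thread): base R's mget() offers a third route that, like the as.name version, keeps the names in the result without building any language objects at all:

```r
## mget() fetches the named objects as a *named* list, so do.call(cbind, ...)
## passes them as named arguments and cbind labels the columns accordingly.
x <- 1
x2 <- runif(10)
my.names <- c("x", "x2")

res <- do.call(cbind, mget(my.names))
colnames(res)   # "x" "x2"
```

This avoids parse() entirely and also sidesteps the unnamed-columns drawback of the get() version.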
RE: [R] profile error on an nls object
Adrian Dragulescu asks:

> Hello all,
>
> This is the error message that I get:
>
>   > hyp.res <- nls(log(y) ~ log(pdf.hyperb(theta, X)), data = dataModel,
>   +               start = list(theta = thetaE0),
>   +               trace = TRUE)
>   45.54325 : 0.100        1.3862944    -4.5577142    0.0005503
>   3.728302 : 0.0583857346 0.4757772859 -4.9156128701 0.0005563154
>   1.584317 : 0.0194149477 0.3444648833 -4.9365149150 0.0004105426
>   1.569333 : 0.0139310639 0.3824648048 -4.9024001228 0.0004089738
>   1.569311 : 0.0137155342 0.3888648619 -4.8979817546 0.0004137501
>   1.569311 : 0.0136895846 0.3893564152 -4.8976182201 0.0004141057
>   1.569311 : 0.0136876315 0.3894059947 -4.8975821760 0.0004141343
>
>   > hyp.res.S <- summary(hyp.res)
>   > hyp.res.S
>
>   Formula: log(y) ~ log(pdf.hyperb(theta, X))
>
>   Parameters:
>            Estimate Std. Error t value Pr(>|t|)
>   theta1  0.0136876  0.0359964   0.380    0.705
>   theta2  0.3894060  0.3079860   1.264    0.211
>   theta3 -4.8975822  0.2219928 -22.062   <2e-16 ***
>   theta4  0.0004141  0.0005457   0.759    0.451
>   ---
>   Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
>
>   Residual standard error: 0.1542 on 66 degrees of freedom
>
>   Correlation of Parameter Estimates:
>           theta1   theta2    theta3
>   theta2 -0.02168
>   theta3 -0.02029  0.997736
>   theta4 -0.97182 -0.008054 -0.008952
>
>   > pr1 <- profile(hyp.res)
>   1.825584 : 0.3894059947 -4.8975821760 0.0004141343
>   ...
>   1.875413 : 0.000300086 1.830592703 0.000608001
>   Error in prof$getProfile() : step factor 0.000488281 reduced below
>   `minFactor' of 0.000976563
>
> Why is there an error on profile, which should only evaluate the
> objective function for different parameter values?
>
> Thanks a lot,
> Adrian

profile() doesn't merely evaluate the objective function at different parameter values. It holds one of the parameters constant and optimises the objective function with respect to all the others. This is repeated for a sequence of values passing through the MLE, and the same for each parameter. If you fix a parameter at a very unrealistic value (as can happen) it is not surprising that the optimisation with respect to all the others runs into trouble.

This is a computationally difficult area, and I am more surprised by the cases that work without a hitch than by the ones that don't. You can expect to have to tweak things a bit in many cases.

Bill Venables.
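A small case where profile() goes through cleanly may make the description concrete (simulated exponential-decay data; the model and names are illustrative only, not Adrian's pdf.hyperb fit):

```r
## profile() fixes each parameter in turn at a grid of values around the
## estimate and re-optimises over the remaining parameter(s) - exactly the
## nested optimisation described above.
set.seed(7)
x <- seq(0, 5, length.out = 50)
y <- 2 * exp(-0.8 * x) + rnorm(50, sd = 0.05)

fit <- nls(y ~ A * exp(-k * x), start = list(A = 1, k = 0.5))
pr  <- profile(fit)   # one constrained re-fit per grid point, per parameter
confint(pr)           # likelihood-based (profile) confidence intervals
```

When the profiled parameter wanders into unrealistic territory, each of those constrained re-fits is itself an nls optimisation that can fail, which is what the `minFactor` error signals.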
RE: [R] Recursive partitioning with multicollinear variables
No, for regression trees collinearity is a non-issue, because a tree is not a linear procedure. Having variables that are linearly dependent (even exactly so) merely widens the scope of choice that the algorithm has when making cuts.

I'm not sure what you mean by "multicollinear variables should appear as alternate splits". Do you mean that every second split should be in one variable of a particular set? Perhaps you mean "alternative" instead of "alternate"? In either case I think you are worrying over nothing. Just go ahead and do the tree-based model analysis and don't worry about it.

Here is a little picture that might clarify things. Suppose Latitude and Longitude are two variables on which the algorithm may choose to split. This means that splits in these geographical variables can only occur in a North-South or an East-West direction. Let's suppose you add in two extra variables that are completely dependent on the first two, namely

  LatPlusLong  <- Latitude + Longitude
  LatMinusLong <- Latitude - Longitude

and now offer all four variables as potential split variables. Now the algorithm may split North-South, East-West, NorthEast-SouthWest or NorthWest-SouthEast. All you have done is increase the scope of choice for the algorithm to make splits. Not only does the linear dependence not matter, but I'd argue it could be a pretty good thing.

One serious message to take from this as well, though, is to use regression trees for prediction. Don't read too much into the variables that the algorithm has chosen to use at any stage.

Bill Venables.

-----Original Message-----
From: [EMAIL PROTECTED] On Behalf Of Jean-Noel
Sent: Monday, 9 February 2004 8:25 PM
Subject: [R] Recursive partitioning with multicollinear variables

Dear all,

I would like to perform a regression tree analysis on a dataset with multicollinear variables (as climate variables often are). The questions that I am asking are:

1- Is there any particular statistical problem in using multicollinear variables in a regression tree?

2- Multicollinear variables should appear as alternate splits. Would it be more accurate to present these alternate splits in the results of the analysis, or to apply a variable selection or reduction procedure before the regression tree?

Thank you in advance,

Jean-Noel Candau
INRA - Unité de Recherches Forestières Méditerranéennes
Avenue A. Vivaldi
84000 AVIGNON
Tel: (33) 4 90 13 59 22
Fax: (33) 4 90 13 59 59
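Bill's Latitude/Longitude picture can be run as code (simulated coordinates; rpart here stands in for whichever recursive-partitioning routine is actually used):

```r
## Two exactly collinear derived variables are offered alongside the
## originals; the tree simply gains a "diagonal" split direction.
library(rpart)

set.seed(1)
d <- data.frame(Latitude = runif(200), Longitude = runif(200))
## response depends on the NE-SW direction, i.e. on Latitude + Longitude
d$z <- as.numeric(d$Latitude + d$Longitude > 1) + rnorm(200, sd = 0.1)

d$LatPlusLong  <- d$Latitude + d$Longitude   # exact linear combinations
d$LatMinusLong <- d$Latitude - d$Longitude

fit <- rpart(z ~ Latitude + Longitude + LatPlusLong + LatMinusLong, data = d)
fit$frame$var[1]   # variable chosen for the root split
```

The fit goes through without complaint despite the exact dependence, and the derived variable is free to be chosen wherever it gives the better cut, which is Bill's point.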