[R] modifications to text.tree function
Hi. I need to make some minor modifications to the text.tree function: I don't like the way it prints the split labels (they are too long in my case and overlap). I tried to make a simple modification to text.tree so that it limits the number of significant digits in the tree labels, but could not. The original function uses an undocumented "treeco" function, which I cannot find. Any ideas?

Thanks.

-- Alexander Sirotkin

__ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
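One way to shorten the labels without editing text.tree itself can be sketched as follows. This is only a sketch under assumptions: the helper name shorten_labels is hypothetical, and it assumes the split labels are available as a character vector such as "<2.4567891" (where the fitted tree object keeps them is not confirmed in this thread). To look at an unexported helper such as treeco, getAnywhere("treeco") is the general R lookup once the package that defines it is loaded.

```r
# Hypothetical helper: round every number inside a label to `digits`
# significant digits, leaving the rest of the label untouched.
shorten_labels <- function(labels, digits = 3) {
  # Locate numeric tokens (optional sign, decimals, optional exponent)
  m <- gregexpr("-?[0-9]+\\.?[0-9]*(e[-+]?[0-9]+)?", labels)
  regmatches(labels, m) <- lapply(regmatches(labels, m), function(tok) {
    # Format each token on its own so no padding is shared between tokens
    vapply(tok,
           function(t) format(signif(as.numeric(t), digits)),
           character(1), USE.NAMES = FALSE)
  })
  labels
}

shortened <- shorten_labels(c("<2.4567891", ">1234.56789", "a,b"))
print(shortened)
```

Labels without numbers (for example factor splits such as "a,b") pass through unchanged, so the helper can be applied to all labels at once before plotting.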
[R] question about plot.acf
Hi. I'm looking for a way to plot autocorrelation, but in a slightly different way than plot.acf does. Instead of plotting NxN graphs (assuming N is the number of variables) like plot.acf does, I'd like to have one graph of the sum of all autocorrelations vs. lag. Is there any function that already does this, or should I try to write it myself?

Thanks a lot.
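A minimal sketch of one way to do this with acf() alone, using its plot = FALSE option and summing the returned array over the variable dimensions. The function name sum_acf and the diag_only switch are hypothetical; "sum of all autocorrelations" is read here as the sum over all N x N entries (cross-correlations included), with diag_only = TRUE restricting the sum to the N autocorrelations proper.

```r
# Hypothetical helper: one summed correlation value per lag.
sum_acf <- function(x, lag.max = 10, diag_only = FALSE) {
  a <- acf(x, lag.max = lag.max, plot = FALSE)
  n_series <- dim(a$acf)[2]
  total <- vapply(seq_len(dim(a$acf)[1]), function(k) {
    # N x N matrix of auto- and cross-correlations at the k-th lag
    m <- matrix(a$acf[k, , ], n_series, n_series)
    if (diag_only) sum(diag(m)) else sum(m)
  }, numeric(1))
  data.frame(lag = a$lag[, 1, 1], total = total)
}

set.seed(1)
x <- matrix(rnorm(300), ncol = 3)   # three simulated series, illustration only
res <- sum_acf(x)
res_diag <- sum_acf(x, diag_only = TRUE)

# Single plot of the summed correlations against lag
if (interactive()) {
  plot(res$lag, res$total, type = "h", xlab = "Lag", ylab = "Summed ACF")
  abline(h = 0)
}
```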
RE: [R] statistical significance test for cluster agreement
Christian,

I think I understand your point, but I do not completely agree with you. I also did not describe my problem clearly enough.

> If you see two clusterings on the same data, they are identical, if they are 100% identical, and if not, then not.

What you are actually saying is that all values of the Rand index for cluster agreement other than 1 indicate that the clusters do not agree. I believe that many people would disagree with this statement.

Let me explain my problem in a little more detail. I have a classified data set. These classes were obtained using non-statistical methods. What I'm trying to do is run some clustering algorithm and compare its results to this known classification. I think that this is not very different from calculating a mean and comparing it to some known value. I think that it should be theoretically possible to use the Rand index as a test statistic. Or maybe I'm missing something...
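One way the idea of using the Rand index as a test statistic could be made concrete is a permutation test. This is a sketch of a swapped-in technique, not something proposed in the thread: the function names rand_index and perm_test_rand are hypothetical, the data are simulated, and the null hypothesis tested is that the clustering is unrelated to the known classification (labels exchangeable), which is the opposite direction from the null stated later in this thread.

```r
# Plain (unadjusted) Rand index: share of point pairs on which the two
# labelings agree about "same group" versus "different group".
rand_index <- function(a, b) {
  same_a <- outer(a, a, "==")
  same_b <- outer(b, b, "==")
  up <- upper.tri(same_a)          # each unordered pair counted once
  mean(same_a[up] == same_b[up])
}

# Permutation p-value: how often does a random relabeling agree with the
# known classes at least as well as the observed clustering does?
perm_test_rand <- function(a, b, n_perm = 999) {
  observed <- rand_index(a, b)
  permuted <- replicate(n_perm, rand_index(a, sample(b)))
  p_value <- (1 + sum(permuted >= observed)) / (n_perm + 1)
  list(rand = observed, p_value = p_value)
}

set.seed(42)
known <- rep(1:3, each = 20)                      # simulated known classes
clustered <- known
idx <- sample(60, 6)                              # perturb a few labels
clustered[idx] <- sample(1:3, 6, replace = TRUE)
res <- perm_test_rand(known, clustered)
print(res)
```

A small p-value here says only that the agreement is better than chance; it does not say the two partitions are the same.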
RE: [R] statistical significance test for cluster agreement
Like you said, such a test will not give me anything that the Rand index does not, except for a p-value. The null hypothesis, in my case, is that the clustering results do not match a different clustering that someone else did on the same data. And I do believe that this hypothesis is valid. Basically, it's not that different from a chi-squared goodness-of-fit test, which checks whether or not my data come from a particular distribution. With the exception that I don't know how to do a chi-squared test in this case :)

--- "Liaw, Andy" <[EMAIL PROTECTED]> wrote:
> But what would such a test do that the rand index does not? Would you interpret the p-value from such a test, if exists, to have the meaning that a real test of hypothesis has? AFAIK you basically need to have the hypotheses pinned down even before you see any data, for the inference to be valid. Is that possible with clustering?
>
> Just my $0.02...
> Andy
[R] statistical significance test for cluster agreement
I was wondering whether there is a way to have a statistical significance test for cluster agreement. I know that I can use the classAgreement() function to get the Rand index, which will give me some indication of whether the clusters agree or not, but it would be interesting to have a formal test.

Thanks.
RE: [R] importing S-Plus data files
The S-Plus version is 6.1 (on both Linux and Windows), R is 1.8.1. It's Win2K, although I don't think it matters.

Thanks.

--- "Liaw, Andy" <[EMAIL PROTECTED]> wrote:
> You can help yourself to help us by at least telling us what versions of S-PLUS on Linux the data were created from, the version of S-PLUS you are using under Windows (which version of Windows?) and the version of R you are using.
>
> I believe starting in S-PLUS 6.1, the data created by S-PLUS is binary compatible between Linux and Windows versions.
>
> Andy
>
> > From: Demiurg [at Yahoo]
> >
> > I have some data in the Linux version of S-Plus, which I can not use anymore. The program is just broken and won't run. I'm trying to find a way to import that data to either Windows version of S-Plus (which I have running on my other machine) or R (Linux or Windows, it doesn't matter).
> >
> > Unfortunately, nothing seems to work. Windows S-Plus seems to ignore files from the Linux .Data directory and none of the import routines available in R can handle my data.
> >
> > Any suggestions would be appreciated.
[R] problem with bagging
I'm having a very strange problem with the bagging function. For some unknown reason it does not improve the classification (compared to rpart), but instead gives much worse results! Running rpart on my data gives an error rate of about 0.3, and bagging, instead of improving this result, gives an error rate of 0.9! I'm running both rpart and bagging with exactly the same parameters. I even tried to run bagging() with nbagg=1, which should be identical to rpart, but bagging still gives this terrible error rate.

Any help would be appreciated.
Re: [R] Difference between summary.lm() and summary.aov()
Thanks a lot to everybody. Two more questions, if you don't mind:

How does anova() treat non-categorical variables, such as severity in my case? I was under the impression that ANOVA is defined for categorical variables only.

I read about drop1() and I understand that it performs an F-test for nested models; correct me if I'm wrong. It is unclear to me, however, how it manages to do this F-test for interactions.

Thanks a lot.

--- Peter Dalgaard <[EMAIL PROTECTED]> wrote:
> "Alexander Sirotkin [at Yahoo]" <[EMAIL PROTECTED]> writes:
>
> > John,
> >
> > What you are saying is that any conclusion I can make from summary.aov (for instance, to answer a question if physician is a significant variable) will not be correct ?
>
> Summary.aov is for summarizing aov objects, so you're lucky to get something that is sensible at all. You should use anova() to get analysis of variance tables. These are sequential so that you can use them (give or take some quibbles about the residual variance) for reducing the model from the "bottom up". I.e. if you place "physician" last, you get the F test for whether that variable is significant. However, a more convenient way of getting that result is to use drop1(). Even then there's no simple relation to the two t-tests, except that the F test tests the hypothesis that *both* coefficients are zero, where the t-tests do so individually.
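The drop1() behaviour asked about above can be sketched on simulated data. Everything in this example is an assumption made for illustration: the response name cost, the coefficients and the data are invented, and only the variable names physician and severity come from the thread. The sketch shows that the drop1() F statistic for a term equals the F from comparing the model with and without that term, and that with an interaction in the model drop1() offers only the interaction for removal.

```r
set.seed(123)
n <- 49
d <- data.frame(
  physician = factor(rep(c("A", "B", "C"), length.out = n)),
  severity  = rep(1:5, length.out = n)
)
# Hypothetical response, simulated for illustration only
d$cost <- 1000 + 3000 * (d$physician == "C") + 7000 * d$severity +
  rnorm(n, sd = 6000)

fit <- lm(cost ~ physician + severity, data = d)

# Marginal F test for each term: the term is dropped from the full model
d1 <- drop1(fit, test = "F")
print(d1)

# The same F for physician, as an explicit nested-model comparison
fit_no_phys <- lm(cost ~ severity, data = d)
cmp <- anova(fit_no_phys, fit)

# With an interaction present, main effects are not dropped on their own
fit_int <- lm(cost ~ physician * severity, data = d)
d1_int <- drop1(fit_int, test = "F")
print(d1_int)
```

The second table lists only physician:severity because dropping a main effect while keeping its interaction would not give a properly nested model.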
Re: [R] Difference between summary.lm() and summary.aov()
John,

What you are saying is that any conclusion I can make from summary.aov (for instance, to answer the question of whether physician is a significant variable) will not be correct?

--- John Fox <[EMAIL PROTECTED]> wrote:
> Dear Spencer and Alexander,
>
> In this case, physician is apparently a factor with three levels, so summary.aov() gives you a sequential ANOVA, equivalent to what you'd get from anova(). There is no simple relationship between the F-statistic for physician, which has 2 df in the numerator, and the two t's. (By the way, I doubt whether a sequential ANOVA is what's wanted here.)
>
> Regards,
> John
>
> At 09:17 AM 12/6/2003 -0800, Spencer Graves wrote:
> > The square of a Student's t with "df" degrees of freedom is an F distribution with 1 and "df" degrees of freedom.
> > hope this helps. spencer graves
>
> -
> John Fox
> Department of Sociology
> McMaster University
> Hamilton, Ontario, Canada L8S 4M4
> email: [EMAIL PROTECTED]
> phone: 905-525-9140x23604
> web: www.socsci.mcmaster.ca/jfox
> -
[R] Difference between summary.lm() and summary.aov()
I have a simple linear model (fitted with lm()) with 2 independent variables: one categorical and one integer.

When I run summary.lm() on this model, I get a standard linear regression summary (in which the one categorical variable has to be converted into several indicator variables), which looks like:

                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  -3595.3     2767.1  -1.299   0.2005
    physicianB     802.0     2289.5   0.350   0.7277
    physicianC    4906.8     2419.8   2.028   0.0485 *
    severity      7554.4      906.3   8.336 1.12e-10 ***

and when I run summary.aov() I get a similar ANOVA table:

                Df     Sum Sq    Mean Sq F value    Pr(>F)
    physician    2  294559803  147279901  3.3557   0.04381 *
    severity     1 3049694210 3049694210 69.4864 1.124e-10 ***
    Residuals   45 1975007569   43889057

What is absolutely unclear to me is how the F value and Pr(>F) for the categorical "physician" variable in summary.aov() are calculated from the t values of the summary.lm() table.

I looked at the summary.aov() source code but still could not figure it out.

Thanks a lot.
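The relation between the two tables can be checked numerically. In the output above, the severity t value squared (8.336^2, approximately 69.49) matches the severity F value because severity is a 1-df term entered last. The 2-df physician F is not built from the two t values; it compares residual sums of squares of nested models. The sketch below reproduces both facts on simulated data: the response name cost and all numbers are assumptions, chosen only so that the residual df is 45 as in the post.

```r
set.seed(123)
n <- 49
d <- data.frame(
  physician = factor(rep(c("A", "B", "C"), length.out = n)),
  severity  = rep(1:5, length.out = n)
)
# Hypothetical response, simulated for illustration only
d$cost <- 1000 + 3000 * (d$physician == "C") + 7000 * d$severity +
  rnorm(n, sd = 6000)

fit <- lm(cost ~ physician + severity, data = d)
tab <- anova(fit)                       # sequential ANOVA table

# 1-df term entered last: F is the square of its t value
t_sev <- coef(summary(fit))["severity", "t value"]
f_sev <- tab["severity", "F value"]

# 2-df term entered first: drop in residual SS from adding physician to
# the intercept-only model, scaled by the full model's residual mean square
rss <- function(m) sum(resid(m)^2)
fit0 <- lm(cost ~ 1, data = d)
fit1 <- lm(cost ~ physician, data = d)
f_phys <- ((rss(fit0) - rss(fit1)) / 2) / (rss(fit) / df.residual(fit))
p_phys <- pf(f_phys, 2, df.residual(fit), lower.tail = FALSE)

print(c(t_squared = t_sev^2, F_severity = f_sev))
print(c(F_by_hand = f_phys, F_table = tab["physician", "F value"]))
```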
Re: [R] R memory and CPU requirements
Thanks for all the responses. After re-examining my data I came to realize that second-order interactions would be enough in my particular case. With second-order interactions I managed to fit a model with less than 512MB of RAM. Thanks to everybody.

--- John Fox <[EMAIL PROTECTED]> wrote:
> Dear Alexander,
>
> OK -- I didn't realize that you have 5000 observations. Perhaps I didn't read some of the earlier messages carefully enough.
>
> At the risk of getting you to repeat information that you've already provided, how many degrees of freedom are there in the model that you're trying to fit? I can create a 5000 by 5000 model matrix on my relatively anemic Windows machine, and surely (unless there's some specification error) your model should have many fewer df than that if it includes just the main effects and two-way interactions (or by all interactions, do you mean higher-order interactions as well?).
>
> Perhaps providing the following information would help: What is the model formula? Which variables are factors? How many levels does each factor have?
>
> Regards,
> John
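A back-of-the-envelope count of model-matrix columns shows why the all-interactions model ran out of memory while the second-order one fit. The sketch uses the figures given in this thread (10 variables, 6 of them factors, 2 factors with about 40 levels); the level count of the other four factors is not stated anywhere, so 3 levels each is purely an assumption, as is storing the matrix as dense 8-byte doubles.

```r
# Degrees of freedom contributed by each variable's main effect:
# 4 numeric variables (1 df each), 2 factors with 40 levels (39 df each),
# 4 factors with an ASSUMED 3 levels (2 df each).
df_per_var <- c(rep(1, 4), 39, 39, rep(2, 4))

# All interactions of every order: each variable is either absent from a
# term or contributes its df, so columns = prod(df + 1), intercept included.
cols_all <- prod(df_per_var + 1)

# Main effects plus two-way interactions only
pair_sum <- (sum(df_per_var)^2 - sum(df_per_var^2)) / 2
cols_two_way <- 1 + sum(df_per_var) + pair_sum

# Size in bytes of a dense double-precision model matrix
matrix_bytes <- function(rows, cols) rows * cols * 8

gb_all_200      <- matrix_bytes(200, cols_all) / 1e9       # 200-row sample
mb_two_way_5000 <- matrix_bytes(5000, cols_two_way) / 1e6  # full data

print(c(cols_all = cols_all, cols_two_way = cols_two_way))
print(c(gb_all_200 = gb_all_200, mb_two_way_5000 = mb_two_way_5000))
```

Under these assumptions the full-interaction matrix alone needs a few gigabytes even for 200 rows, before any fitting workspace, while the second-order matrix for all 5000 rows is on the order of a hundred megabytes.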
Re: [R] R memory and CPU requirements
--- Deepayan Sarkar <[EMAIL PROTECTED]> wrote:
> On Thursday 16 October 2003 19:03, Alexander Sirotkin [at Yahoo] wrote:
> > I understand your point about categorical variables, but still - this does not seem like too much data to me.
>
> That's one way to look at it. You don't have enough data for the model you are trying to fit. The usual approach under these circumstances is to try 'simpler' models.
>
> Please try to understand what you are trying to do (in this case by reading an introductory linear model text) before blindly applying a methodology.
>
> Deepayan

I did study ANOVA and I do have enough observations. The 200 were only a random sample from more than 5000 observations, which I think should be enough. However, I'm afraid to even think about the amount of RAM I will need with R to fit a model for this data.
Re: [R] R memory and CPU requirements
I agree completely.

In fact, I have about 5000 observations, which should be enough. I was using 200 samples because of RAM limitations, and I'm afraid to think about what amount of RAM I'll need to fit an aov() for such data.

--- John Fox <[EMAIL PROTECTED]> wrote:
> Dear Alexander,
>
> If I understand you correctly, you have a sample of 200 observations. Even if you had only two factors with 40 levels each, the main effects and interactions of these factors would require about 1600 degrees of freedom -- that is, more than the number of observations. This doesn't make a whole lot of sense.
>
> I hope that this helps,
> John
Re: [R] R memory and CPU requirements
--- Deepayan Sarkar <[EMAIL PROTECTED]> wrote:
> On Thursday 16 October 2003 17:59, Alexander Sirotkin [at Yahoo] wrote:
> > One more (hopefully last one) : I've been very surprised when I tried to fit a model (using aov()) for a sample of size 200 and 10 variables and their interactions.
>
> That doesn't really say much. How many of these variables are factors ? How many levels do they have ? And what is the order of the interaction ? (Note that for 10 numeric variables, if you allow all interactions, then there will be a 100 terms in your model. This increases for factors.)
>
> In other words, how big is your model matrix ? (See ?model.matrix)
>
> Deepayan

I see...

Unfortunately, model.matrix() ran out of memory :) I have 10 variables, 6 of which are factors, 2 of which have quite a lot of levels (about 40). And I would like to allow all interactions.

I understand your point about categorical variables, but still, this does not seem like too much data to me.

I remember fitting all kinds of models (mostly decision trees) for much, much larger data sets.
[R] R memory and CPU requirements
Thanks for all the help on my previous questions.

One more (hopefully the last one): I was very surprised when I tried to fit a model (using aov()) for a sample of size 200 and 10 variables and their interactions. It turns out that even 2GB of RAM is not enough for aov() with this sample size, which does not seem so big to me. Am I doing something wrong, or is this considered a normal memory requirement? Frankly, I just don't have access to a machine with more than 2GB of RAM, so I'm not sure how I should attack this problem.

Thanks.

P.S. When I reduced the sample size to 50, 2GB of RAM was enough, but aov() kept working all night and has not finished yet.
Re: [R] aov and non-categorical variables
Thanks. One more question, if you don't mind.

If instead of aov() I call lm() directly, it fits a linear regression model, and if it encounters a categorical variable it does what needs to be done in this case: it defines indicator variables for the levels of the categorical variable. However, if I call aov() with the same data (categorical and numeric) I don't see all these indicator variables in the ANOVA table. It is unclear to me how the table with lots of indicator variables produced by lm() is turned into the ANOVA table of aov().

Also, after you mentioned the Error() term in aov() I tried to find some explanation of it in the R manuals, and did not find any. Do you know where the meaning of Error() in aov() is documented?

Thanks.

--- [EMAIL PROTECTED] wrote:
> On 15 Oct 2003 at 9:32, Alexander Sirotkin [at Yahoo] wrote:
>
> > It is unclear to me how aov() handles non-categorical variables.
>
> aov is an interface to lm, so it can estimate every model lm can, the difference is that it produces the results (summary) in the classical way for anova.
>
> > I mean it works and produces results that I would expect, but I was under impression that ANOVA is only defined for categorical variables.
> >
> > In addition, help(aov) says that it "call to 'lm' for each stratum", which I presume means that it calls to lm() for every group of the categorical variable,
>
> No. With anova you can also define "error strata" using Error() as part of the formula, lm() cannot do that. If you don't use Error() in the formula, lm() is called only once.
>
> Kjetil Halvorsen
>
> > however I don't quite understand what this means for non-categorical variable.
> >
> > Thanks
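How the indicator columns end up as a single row of the ANOVA table can be seen from the model matrix itself: its "assign" attribute records which term each column belongs to, and the ANOVA table pools the columns term by term. The sketch below uses simulated data; the response name cost and all values are assumptions, with only the names physician and severity borrowed from a related thread in this archive.

```r
set.seed(7)
n <- 49
d <- data.frame(
  physician = factor(rep(c("A", "B", "C"), length.out = n)),
  severity  = rep(1:5, length.out = n)
)
# Hypothetical response, simulated for illustration only
d$cost <- 500 + 2500 * (d$physician == "B") + 6000 * d$severity +
  rnorm(n, sd = 5000)

fit_lm  <- lm(cost ~ physician + severity, data = d)
fit_aov <- aov(cost ~ physician + severity, data = d)

# Columns of the design matrix: a 3-level factor gives 2 indicator columns
mm <- model.matrix(fit_lm)
print(colnames(mm))

# Term index of each column: 0 = intercept, 1 = physician, 2 = severity
term_of_column <- attr(mm, "assign")
print(term_of_column)

# Same fit, two presentations: per-coefficient rows from lm(),
# per-term rows from aov(); anova() on the lm fit gives the aov table
tab_lm  <- anova(fit_lm)
tab_aov <- summary(fit_aov)[[1]]
print(tab_lm)
```

Both physician indicator columns carry term index 1, which is why the ANOVA table shows one physician row with 2 degrees of freedom.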
[R] aov and non-categorical variables
It is unclear to me how aov() handles non-categorical variables.

I mean, it works and produces results that I would expect, but I was under the impression that ANOVA is only defined for categorical variables.

In addition, help(aov) says that it makes a "call to 'lm' for each stratum", which I presume means that it calls lm() for every group of the categorical variable; however, I don't quite understand what this means for a non-categorical variable.

Thanks
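What aov() does with a numeric variable can be illustrated on simulated data (the names cost and severity and all values here are assumptions). A numeric predictor enters as a single slope with 1 degree of freedom, so its ANOVA row is the usual regression F test; the same variable wrapped in factor() is treated as categorical and gets one df per level beyond the first.

```r
set.seed(11)
n <- 49
d <- data.frame(severity = rep(1:5, length.out = n))
# Hypothetical response, simulated for illustration only
d$cost <- 1000 + 7000 * d$severity + rnorm(n, sd = 6000)

# Numeric predictor: one slope, 1 df
aov_num <- aov(cost ~ severity, data = d)
tab_num <- summary(aov_num)[[1]]

# Same variable as a factor: 5 levels, 4 df
aov_fac <- aov(cost ~ factor(severity), data = d)
tab_fac <- summary(aov_fac)[[1]]

# With one numeric predictor, the ANOVA F is the overall regression F
f_reg <- unname(summary(lm(cost ~ severity, data = d))$fstatistic["value"])

print(tab_num)
print(tab_fac)
```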