[R] modifications to text.tree function

2005-05-12 Thread Alexander Sirotkin
Hi.

I need to make some minor modifications to the text.tree function - I
don't like the way it prints the split labels (in my case they are too
long and overlap). I tried a simple modification to text.tree that
would limit the number of significant digits in the labels, but could
not finish it - the original function calls an undocumented "treeco"
function, which I cannot find.

Any ideas? Thanks.
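[A sketch, assuming the CRAN 'tree' package. text.tree() already accepts a
'digits' argument, so shortening split labels may need no patching at all;
and the unexported helper it calls can be reached with ':::'.]

```r
## Sketch assuming the CRAN 'tree' package: text.tree() takes a 'digits'
## argument that controls the significant digits shown in split labels.
library(tree)
fit <- tree(Species ~ ., data = iris)   # iris is a built-in data set
plot(fit)
text(fit, digits = 2)                   # shorter split labels

## If deeper changes are needed, copy the function rather than its source;
## the copy keeps the namespace environment, so unexported helpers such as
## tree:::treeco still resolve:
my.text.tree <- tree:::text.tree
```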
-- 
Alexander Sirotkin

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] question about plot.acf

2005-03-31 Thread Alexander Sirotkin
Hi.

I'm looking for a way to plot autocorrelation, but slightly differently
from what plot.acf does. Instead of plotting NxN graphs (where N is the
number of variables) like plot.acf does, I'd like one graph of the sum
of all autocorrelations vs. lag. Is there a function that already does
this, or should I try to write it myself?

Thanks a lot.
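[A base-R sketch, since acf() returns the raw numbers plot.acf draws: its
$acf component for a multivariate series is a (lags x N x N) array, so the
sum over all N^2 (cross-)correlations at each lag is one apply() away. The
variable names below are illustrative.]

```r
## Base-R sketch: sum all N x N auto- and cross-correlations at each lag
## and plot the total against lag.
set.seed(1)
x <- matrix(rnorm(600), ncol = 3)           # toy data: N = 3 series
a <- acf(x, lag.max = 20, plot = FALSE)     # a$acf has dim (lags, N, N)
total <- apply(a$acf, 1, sum)               # one number per lag
lags <- seq_len(dim(a$acf)[1]) - 1          # 0, 1, ..., lag.max
plot(lags, total, type = "h",
     xlab = "lag", ylab = "sum of autocorrelations")
```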

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] statistical significance test for cluster agreement

2004-03-24 Thread Alexander Sirotkin \[at Yahoo\]
Christian,

I think I understand your point, but I do not completely agree with
you. I also did not describe my problem clearly enough.

> If you see two clusterings on the same data, they are identical if
> they are 100% identical, and if not, then not.

What you are actually saying is that any value of the Rand index for
cluster agreement other than 1 indicates that the clusters do not
agree. I believe that many people would disagree with this statement.

Let me explain my problem in a little bit more detail.

I have a classified data set. These classes were obtained using
non-statistical methods. What I'm trying to do is run some clustering
algorithm and compare its results to this known classification.

I think that this is not very different from calculating a mean and
comparing it to some known value.

I think it should be theoretically possible to use the Rand index as a
test statistic.

Or maybe I'm missing something...
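[One standard way to get a p-value here is a permutation test: hold the
known classification fixed, permute the cluster labels, and see where the
observed Rand index falls in the permutation distribution. A base-R sketch,
with rand.index() written out rather than taken from e1071, and a shuffled
label vector standing in for a real clustering result:]

```r
## Base-R sketch of a permutation test on the Rand index.
rand.index <- function(a, b) {
  n <- length(a)
  same.a <- outer(a, a, "==")     # pairs placed together by partition a
  same.b <- outer(b, b, "==")     # pairs placed together by partition b
  # proportion of the n*(n-1) off-diagonal pairs on which the two agree
  (sum(same.a == same.b) - n) / (n * (n - 1))
}
set.seed(42)
truth <- rep(1:3, each = 10)                  # known classification
clust <- sample(truth)                        # stand-in clustering result
obs  <- rand.index(truth, clust)
null <- replicate(999, rand.index(truth, sample(clust)))
p.value <- (1 + sum(null >= obs)) / (1 + length(null))
```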



RE: [R] statistical significance test for cluster agreement

2004-03-24 Thread Alexander Sirotkin \[at Yahoo\]
Like you said, such a test will not give me anything that the Rand
index does not, except for a p-value.

The null hypothesis, in my case, is that my clustering results do not
match a different clustering that someone else did on the same data.

And I do believe that this hypothesis is valid. Basically, it's not
that different from a chi-squared goodness-of-fit test, which checks
whether or not my data comes from a particular distribution - except
that I don't know how to do a chi-squared test in this case :)



--- "Liaw, Andy" <[EMAIL PROTECTED]> wrote:
> But what would such a test do that the Rand index does not?  Would you
> interpret the p-value from such a test, if it exists, to have the
> meaning that a real test of hypothesis has?  AFAIK you basically need
> to have the hypotheses pinned down even before you see any data, for
> the inference to be valid.  Is that possible with clustering?
> 
> Just my $0.02...
> Andy
> 
> > From: Alexander Sirotkin [at Yahoo]
> > 
> > I was wondering whether there is a way to have a statistical
> > significance test for cluster agreement.
> > 
> > I know that I can use the classAgreement() function to get the Rand
> > index, which will give me some indication whether the clusters agree
> > or not, but it would be interesting to have a formal test.
> > 
> > Thanks.
> --
> Notice: This e-mail message, together with any attachments, contains
> information of Merck & Co., Inc. (One Merck Drive, Whitehouse Station,
> New Jersey, USA 08889), and/or its affiliates (which may be known
> outside the United States as Merck Frosst, Merck Sharp & Dohme or MSD
> and in Japan as Banyu) that may be confidential, proprietary,
> copyrighted and/or legally privileged. It is intended solely for the
> use of the individual or entity named on this message. If you are not
> the intended recipient, and have received this message in error,
> please notify us immediately by reply e-mail and then delete it from
> your system.



[R] statistical significance test for cluster agreement

2004-03-23 Thread Alexander Sirotkin \[at Yahoo\]
I was wondering whether there is a way to have a statistical
significance test for cluster agreement.

I know that I can use the classAgreement() function to get the Rand
index, which will give me some indication whether the clusters agree
or not, but it would be interesting to have a formal test.

Thanks.



RE: [R] importing S-Plus data files

2004-02-27 Thread Alexander Sirotkin \[at Yahoo\]
S-Plus version is 6.1 (on both Linux and Windows), R
is 1.8.1. 

It's Win2K, although I don't think it matters.

Thanks.
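[A hedged sketch: R ships the 'foreign' package, whose read.S() and
data.restore() read objects from older S-PLUS versions (binary objects and
data.dump() archives respectively). Whether they cope with S-PLUS 6.1
.Data files is version-dependent, and the file names below are
placeholders, not real paths:]

```r
## Hedged sketch using the 'foreign' package that ships with R.
library(foreign)
## obj <- read.S("~/.Data/myobject")   # a single S-PLUS binary object
## df  <- data.restore("mydata.dmp")   # an archive written by data.dump()
```

If the Linux S-PLUS binary itself still runs far enough to execute
data.dump(), that dump format is usually the most portable route.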

--- "Liaw, Andy" <[EMAIL PROTECTED]> wrote:
> You can help yourself to help us by at least telling us what versions
> of S-PLUS on Linux the data were created from, the version of S-PLUS
> you are using under Windows (which version of Windows?), and the
> version of R you are using.
> 
> I believe starting in S-PLUS 6.1, the data created by S-PLUS is binary
> compatible between Linux and Windows versions.
> 
> Andy
> 
> > From: Demiurg [at Yahoo]
> > 
> > I have some data in the Linux version of S-Plus, which I can not use
> > anymore. The program is just broken and won't run. I'm trying to
> > find a way to import that data into either the Windows version of
> > S-Plus (which I have running on my other machine) or R (Linux or
> > Windows, it doesn't matter).
> > 
> > Unfortunately, nothing seems to work. Windows S-Plus seems to ignore
> > files from the Linux .Data directory, and none of the import
> > routines available in R can handle my data.
> > 
> > Any suggestions would be appreciated.



[R] problem with bagging

2004-02-06 Thread Alexander Sirotkin \[at Yahoo\]
I'm having the weirdest problem with the bagging() function.
For some unknown reason it does not improve the classification
(compared to rpart), but instead gives much worse results!

Running rpart on my data gives an error rate of about 0.3, and
bagging, instead of improving this result, gives an error rate of 0.9!

I'm running both rpart and bagging with exactly the same parameters.
I even tried running bagging() with nbagg=1, which should be identical
to rpart, but bagging still gives this terrible error rate.

Any help would be appreciated.



Re: [R] Difference between summary.lm() and summary.aov()

2003-12-16 Thread Alexander Sirotkin \[at Yahoo\]
Thanks a lot to everybody. Two more questions, if you don't mind:

How does anova() treat non-categorical variables, such as severity in
my case? I was under the impression that ANOVA is defined for
categorical variables only.

I read about drop1(), and I understand that it performs an F-test for
nested models (correct me if I'm wrong). It is unclear to me, however,
how it manages to do this F-test for interactions.

Thanks a lot.
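[A sketch with a built-in data set (npk): drop1() refits the model with one
term removed and compares the two fits with an F-test; by default it
respects marginality, so when an interaction is present only the
interaction term is tested.]

```r
## drop1() F-tests with an interaction, using the built-in npk data.
fit <- lm(yield ~ N * P, data = npk)
drop1(fit, test = "F")   # only the N:P interaction is droppable, so only
                         # it gets an F-test; main effects stay marginal
```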

--- Peter Dalgaard <[EMAIL PROTECTED]> wrote:
> "Alexander Sirotkin [at Yahoo]" <[EMAIL PROTECTED]> writes:
> 
> > John,
> > 
> > What you are saying is that any conclusion I can make from
> > summary.aov (for instance, to answer a question if physician is a
> > significant variable) will not be correct ?
> 
> Summary.aov is for summarizing aov objects, so you're lucky to get
> something that is sensible at all. You should use anova() to get
> analysis of variance tables. These are sequential, so you can use them
> (give or take some quibbles about the residual variance) for reducing
> the model from the "bottom up". I.e. if you place "physician" last,
> you get the F test for whether that variable is significant. However,
> a more convenient way of getting that result is to use drop1(). Even
> then there's no simple relation to the two t-tests, except that the F
> test tests the hypothesis that *both* coefficients are zero, whereas
> the t-tests do so individually.

Re: [R] Difference between summary.lm() and summary.aov()

2003-12-07 Thread Alexander Sirotkin \[at Yahoo\]
John,

What you are saying is that any conclusion I can make from summary.aov
(for instance, to answer the question of whether physician is a
significant variable) will not be correct?


--- John Fox <[EMAIL PROTECTED]> wrote:
> Dear Spencer and Alexander,
> 
> In this case, physician is apparently a factor with three levels, so
> summary.aov() gives you a sequential ANOVA, equivalent to what you'd
> get from anova(). There is no simple relationship between the
> F-statistic for physician, which has 2 df in the numerator, and the
> two t's. (By the way, I doubt whether a sequential ANOVA is what's
> wanted here.)
> 
> Regards,
>   John
> 
> At 09:17 AM 12/6/2003 -0800, Spencer Graves wrote:
> > The square of a Student's t with "df" degrees of freedom is an F
> > distribution with 1 and "df" degrees of freedom.
> > Hope this helps.  Spencer Graves
> -
> John Fox
> Department of Sociology
> McMaster University
> Hamilton, Ontario, Canada L8S 4M4
> email: [EMAIL PROTECTED]
> phone: 905-525-9140x23604
> web: www.socsci.mcmaster.ca/jfox
> -



[R] Difference between summary.lm() and summary.aov()

2003-12-06 Thread Alexander Sirotkin \[at Yahoo\]
I have a simple linear model (fitted with lm()) with 2 independent
variables: one categorical and one integer.

When I run summary.lm() on this model, I get a standard linear
regression summary (in which the categorical variable has to be
converted into several indicator variables), which looks like:

            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -3595.3     2767.1  -1.299   0.2005
physicianB     802.0     2289.5   0.350   0.7277
physicianC    4906.8     2419.8   2.028   0.0485 *
severity      7554.4      906.3   8.336 1.12e-10 ***

and when I run summary.aov() I get a similar ANOVA table:

            Df     Sum Sq    Mean Sq F value    Pr(>F)
physician    2  294559803  147279901  3.3557   0.04381 *
severity     1 3049694210 3049694210 69.4864 1.124e-10 ***
Residuals   45 1975007569   43889057

What is absolutely unclear to me is how the F value and Pr(>F) for the
categorical "physician" variable in the summary.aov() table are
calculated from the t values in the summary.lm() table.

I looked at the summary.aov() source code but still could not figure
it out.

Thanks a lot.
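[The one-df case is easy to check with a built-in data set: for a single
degree-of-freedom term, the anova() F statistic is exactly the square of
the summary.lm() t statistic. A k-level factor like physician instead gets
one (k-1)-df F test, with no such one-to-one relation to the individual
t's:]

```r
## For a 1-df term, F = t^2; verified on the built-in cars data.
fit <- lm(dist ~ speed, data = cars)
t.speed <- summary(fit)$coefficients["speed", "t value"]
F.speed <- anova(fit)["speed", "F value"]
all.equal(t.speed^2, F.speed)   # TRUE
```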



Re: [R] R memory and CPU requirements

2003-10-17 Thread Alexander Sirotkin \[at Yahoo\]
Thanks for all the responses.

After re-examining my data, I came to realize that second-order
interactions would be enough in my particular case. With second-order
interactions I managed to fit a model with less than 512MB of RAM.

Thanks to everybody.


--- John Fox <[EMAIL PROTECTED]> wrote:
> Dear Alexander,
> 
> At 01:29 AM 10/17/2003 -0700, Alexander Sirotkin \[at Yahoo\] wrote:
> >I agree completely.
> >
> >In fact, I have about 5000 observations, which should be enough.
> >I was using 200 samples because of RAM limitations and I'm afraid to
> >think about what amount of RAM I'll need to fit an aov() for such
> >data.
> 
> OK -- I didn't realize that you have 5000 observations. Perhaps I
> didn't read some of the earlier messages carefully enough.
> 
> At the risk of getting you to repeat information that you've already
> provided, how many degrees of freedom are there in the model that
> you're trying to fit? I can create a 5000 by 5000 model matrix on my
> relatively anemic Windows machine, and surely (unless there's some
> specification error) your model should have many fewer df than that
> if it includes just the main effects and two-way interactions (or by
> all interactions, do you mean higher-order interactions as well?).
> 
> Perhaps providing the following information would help: What is the
> model formula? Which variables are factors? How many levels does each
> factor have?
> 
> Regards,
>   John



Re: [R] R memory and CPU requirements

2003-10-17 Thread Alexander Sirotkin \[at Yahoo\]

--- Deepayan Sarkar <[EMAIL PROTECTED]> wrote:
> On Thursday 16 October 2003 19:03, Alexander
> Sirotkin \[at Yahoo\] wrote:
> 
> > I see...
> >
> > Unfortunately, model.matrix() ran out of memory :)
> > I have 10 variables, 6 of which are factor, 2 of
> which
> >
> > have quite a lot of levels (about 40). And I would
> > like to allow all interactions.
> >
> > I understand your point about categorical
> variables,
> > but still - this does not seem like too much data
> to me.
> 
> That's one way to look at it. You don't have enough data for the
> model you are trying to fit. The usual approach under these
> circumstances is to try 'simpler' models.
> 
> Please try to understand what you are trying to do (in this case by
> reading an introductory linear model text) before blindly applying a
> methodology.
> 
> Deepayan


I did study ANOVA, and I do have enough observations: the 200 were
only a random sample of more than 5000, which I think should be
enough. However, I'm afraid to even think about the amount of RAM I
will need for R to fit a model to the full data.



Re: [R] R memory and CPU requirements

2003-10-17 Thread Alexander Sirotkin \[at Yahoo\]
I agree completely. 

In fact, I have about 5000 observations, which should be enough.
I was using 200 samples because of RAM limitations, and I'm afraid to
think about the amount of RAM I'll need to fit an aov() for such data.


--- John Fox <[EMAIL PROTECTED]> wrote:
> Dear Alexander,
> 
> If I understand you correctly, you have a sample of 200 observations.
> Even if you had only two factors with 40 levels each, the main
> effects and interactions of these factors would require about 1600
> degrees of freedom -- that is, more than the number of observations.
> This doesn't make a whole lot of sense.
> 
> I hope that this helps,
>   John


Re: [R] R memory and CPU requirements

2003-10-16 Thread Alexander Sirotkin \[at Yahoo\]

--- Deepayan Sarkar <[EMAIL PROTECTED]> wrote:
> On Thursday 16 October 2003 17:59, Alexander Sirotkin \[at Yahoo\]
> wrote:
> > Thanks for all the help on my previous questions.
> >
> > One more (hopefully last one) : I've been very surprised when I
> > tried to fit a model (using aov()) for a sample of size 200 and 10
> > variables and their interactions.
> 
> That doesn't really say much. How many of these variables are
> factors? How many levels do they have? And what is the order of the
> interaction? (Note that for 10 numeric variables, if you allow all
> interactions, then there will be 100 terms in your model. This
> increases for factors.)
> 
> In other words, how big is your model matrix? (See ?model.matrix)
> 
> Deepayan


I see...

Unfortunately, model.matrix() ran out of memory :)
I have 10 variables, 6 of which are factors, 2 of which have quite a
lot of levels (about 40). And I would like to allow all interactions.

I understand your point about categorical variables, but still - this
does not seem like too much data to me.

I remember fitting all kinds of models (mostly decision trees) for
much, much larger data sets.



[R] R memory and CPU requirements

2003-10-16 Thread Alexander Sirotkin \[at Yahoo\]
Thanks for all the help on my previous questions.

One more (hopefully the last one): I was very surprised when I tried
to fit a model (using aov()) for a sample of size 200 with 10
variables and their interactions.

It turns out that even 2GB of RAM is not enough for aov() with this
sample size, which does not seem so big to me. Am I doing something
wrong, or is this considered a normal memory requirement?

Frankly, I just don't have access to a machine with more than 2GB of
RAM, so I'm not sure how I should attack this problem.

Thanks.

P.S. When I reduced the sample size to 50, 2GB of RAM was enough, but
aov() kept working all night and had not finished.
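[The memory cost can be anticipated without fitting anything, because the
model matrix can be sized first. A base-R sketch: two crossed 40-level
factors alone generate 1 + 39 + 39 + 39*39 = 1600 columns, which with
n = 200 rows already exceeds the number of observations.]

```r
## Base-R sketch: size the model matrix before calling aov()/lm().
f1 <- factor(rep(letters[1:40], length.out = 200))  # 40-level factor
f2 <- factor(rep(LETTERS[1:40], each = 5))          # another, crossed
mm <- model.matrix(~ f1 * f2)
dim(mm)   # 200 rows, 1600 columns: already more columns than rows
```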



Re: [R] aov and non-categorical variables

2003-10-15 Thread Alexander Sirotkin \[at Yahoo\]
Thanks. One more question, if you don't mind.

If, instead of aov(), I call lm() directly, it fits a linear
regression model, and when it encounters a categorical variable it
does what needs to be done in this case: it defines a new indicator
variable for each level of the categorical variable.

However, if I call aov() with the same data (categorical and numeric),
I don't see all these indicator variables in the ANOVA table. It is
unclear to me how the lm() summary, with its many indicator variables,
turns into the ANOVA table of aov().

Also, after you mentioned the Error() term in aov(), I tried to find
some explanation of it in the R manuals and did not find any. Do you
know where the meaning of Error() in aov() is documented?

Thanks.
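[A sketch with the built-in npk data: aov() and lm() produce the same
underlying fit (aov objects inherit from class "lm"); only the summary
differs, pooling a factor's indicator columns into one line with the
factor's degrees of freedom. Error() terms, which split the residual into
strata, are specific to aov(); ?aov itself uses npk with Error(block) as
its split-plot example.]

```r
## aov() vs lm() on the built-in npk data: same fit, different summary.
fit.lm  <- lm(yield ~ N + P, data = npk)
fit.aov <- aov(yield ~ N + P, data = npk)
all.equal(coef(fit.lm), coef(fit.aov))   # TRUE: identical coefficients
summary(fit.aov)                         # one row per factor, not per level
## Error() defines error strata; this is the classic split-plot form:
fit.strata <- aov(yield ~ N + P + Error(block), data = npk)
```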

--- [EMAIL PROTECTED] wrote:
> On 15 Oct 2003 at 9:32, Alexander Sirotkin [at Yahoo] wrote:
> 
> > It is unclear to me how aov() handles non-categorical variables.
> 
> aov is an interface to lm, so it can estimate every model lm can; the
> difference is that it produces the results (summary) in the classical
> way for anova.
> 
> > I mean it works and produces results that I would expect, but I was
> > under the impression that ANOVA is only defined for categorical
> > variables.
> > 
> > In addition, help(aov) says that it makes a "call to 'lm' for each
> > stratum", which I presume means that it calls lm() for every group
> > of the categorical variable,
> 
> No. With anova you can also define "error strata" using Error() as
> part of the formula; lm() cannot do that. If you don't use Error() in
> the formula, lm() is called only once.
> 
> Kjetil Halvorsen
> 
> > however I don't quite understand what this means for a
> > non-categorical variable.
> > 
> > Thanks
> > 
> > __
> > [EMAIL PROTECTED] mailing list
> >
>
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> 
>



[R] aov and non-categorical variables

2003-10-15 Thread Alexander Sirotkin \[at Yahoo\]
It is unclear to me how aov() handles non-categorical variables.

I mean, it works and produces results that I would expect, but I was
under the impression that ANOVA is only defined for categorical
variables.

In addition, help(aov) says that it makes a "call to 'lm' for each
stratum", which I presume means that it calls lm() for every group of
the categorical variable; however, I don't quite understand what this
means for a non-categorical variable.

Thanks
