Re: [R] Text Mining

2007-07-06 Thread wssecn
See the tm package.

Washington S. Silva


 Hi everybody,
 I am a new R user. Is there any package devoted to text mining analysis in 
 R ?
 Thanks
 Gilles
 [EMAIL PROTECTED]
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




Re: [R] inter-rater agreement index kappa

2007-06-26 Thread wssecn
See packages concord and psy,

hope this helps,

Washington S. Silva

 On Tuesday 26 June 2007 10:14, Nair, Murlidharan T wrote:
  Is there a function that calculates the inter-rater agreement index
  (kappa) in R?
 
  Thanks ../Murli
 
 I have found a couple useful approaches:
 
 # PCC, kappa, rand index
 library(e1071)
 classAgreement(tab)  # tab: a 2x2 contingency table of the two raters' ratings
 
 # kendall's tau
 cor(x,y, method='kendall')
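
 For readers without the add-on packages, Cohen's kappa for two raters can also be computed directly from the agreement table. The following is a minimal base-R sketch, not the implementation used by any of the packages named above; the function name `cohen_kappa` and the example counts are made up for illustration.

```r
# Cohen's kappa for two raters, computed from a square agreement table.
# cohen_kappa is a hypothetical helper, not part of e1071, concord or psy.
cohen_kappa <- function(tab) {
  p  <- tab / sum(tab)                 # cell proportions
  po <- sum(diag(p))                   # observed agreement
  pe <- sum(rowSums(p) * colSums(p))   # agreement expected by chance
  (po - pe) / (1 - pe)
}

# Made-up ratings: rows are rater A, columns are rater B.
tab <- matrix(c(20, 5, 10, 15), nrow = 2,
              dimnames = list(A = c("yes", "no"), B = c("yes", "no")))
kappa_hat <- cohen_kappa(tab)
print(kappa_hat)
```

 For this table the observed agreement is 0.7 and the chance agreement is 0.5, so kappa is 0.4.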
 
 cheers,
 
 -- 
 Dylan Beaudette
 Soils and Biogeochemistry Graduate Group
 University of California at Davis
 530.754.7341
 




Re: [R] normality tests [Broadcast]

2007-05-25 Thread wssecn
 The normality of the residuals is important in the inference procedures for 
the classical linear regression model, and normality is very important in 
correlation analysis (second moment)...

Washington S. Silva

 Thank you all for your replies; they have been most useful. In my case
 I have chosen to do some parametric tests (more precisely, correlation
 and linear regressions among some variables), so it would be nice to
 have an extra bit of support for my decisions. If I understood your
 replies correctly, I shouldn't pay so much attention to the normality
 tests, so it wouldn't matter which one(s) I report; I should rather
 focus on issues such as the power of the test.
 
 Thanks again.
 
 On 25/05/07, Lucke, Joseph F [EMAIL PROTECTED] wrote:
  Most standard tests, such as t-tests and ANOVA, are fairly resistant to
  non-normality for significance testing. It's the sample means that have
  to be normal, not the data.  The CLT kicks in fairly quickly.  Testing
  for normality prior to choosing a test statistic is generally not a good
  idea.
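
  The point about sample means can be checked with a small simulation. This is only an illustrative sketch: the exponential distribution, the sample size of 20, and the helper name `skewness` are assumptions chosen for the example, not anything taken from the poster's data.

```r
set.seed(1)
n    <- 20     # sample size, on the scale discussed in this thread
reps <- 5000   # number of simulated samples

# Simple moment-based skewness (hypothetical helper for this sketch).
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

raw   <- rexp(n * reps)                   # strongly right-skewed raw data
means <- replicate(reps, mean(rexp(n)))   # sampling distribution of the mean

skew_raw   <- skewness(raw)
skew_means <- skewness(means)
cat("skewness of raw data:    ", round(skew_raw, 2), "\n")
cat("skewness of sample means:", round(skew_means, 2), "\n")
```

  The means are much less skewed than the raw observations, although with n = 20 some skewness remains, so "fairly quickly" depends on how non-normal the data are.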
 
  -Original Message-
  From: [EMAIL PROTECTED]
  [mailto:[EMAIL PROTECTED] On Behalf Of Liaw, Andy
  Sent: Friday, May 25, 2007 12:04 PM
  To: [EMAIL PROTECTED]; Frank E Harrell Jr
  Cc: r-help
  Subject: Re: [R] normality tests [Broadcast]
 
  From: [EMAIL PROTECTED]
  
   On 25/05/07, Frank E Harrell Jr [EMAIL PROTECTED] wrote:
[EMAIL PROTECTED] wrote:
 Hi all,

 apologies for seeking advice on a general stats question. I've run
 normality tests using 8 different methods:
 - Lilliefors
 - Shapiro-Wilk
 - Robust Jarque Bera
 - Jarque Bera
 - Anderson-Darling
 - Pearson chi-square
 - Cramer-von Mises
 - Shapiro-Francia

 All show that the null hypothesis that the data come from a normal
 distribution cannot be rejected. Great. However, I don't think it looks
 nice to report the values of 8 different tests in a report. One note is
 that my sample size is really tiny (fewer than 20 independent cases).
 Without wanting to start a flame war, is there any advice on which
 one(s) would be more appropriate and should be reported (along with
 a Q-Q plot)? Thank you.

 Regards,

   
Wow - I have so many concerns with that approach that it's hard to know
where to begin.  But first of all, why care about normality?  Why not
use distribution-free methods?

You should examine the power of the tests for n=20.  You'll probably
find it's not good enough to reach a reliable conclusion.
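
One way to examine that power is by simulation. The sketch below estimates how often the Shapiro-Wilk test rejects at n = 20 under a normal distribution and under two non-normal alternatives. The choice of alternatives (t with 4 degrees of freedom, exponential), the number of replications, and the helper name `reject_rate` are assumptions made for illustration.

```r
set.seed(42)
n     <- 20     # sample size under discussion
reps  <- 2000   # simulated samples per scenario
alpha <- 0.05

# Proportion of simulated samples for which Shapiro-Wilk rejects normality.
# rgen is any function that takes a sample size and returns random draws.
reject_rate <- function(rgen) {
  mean(replicate(reps, shapiro.test(rgen(n))$p.value < alpha))
}

size_normal <- reject_rate(rnorm)                       # false-rejection rate
power_t4    <- reject_rate(function(k) rt(k, df = 4))   # heavy-tailed alternative
power_exp   <- reject_rate(rexp)                        # skewed alternative

print(c(normal = size_normal, t4 = power_t4, exponential = power_exp))
```

A low rejection rate under an alternative means that a non-significant result at this sample size says little about whether the data are actually normal.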
  
   And wouldn't it be even worse if I used non-parametric tests?
 
  I believe what Frank meant was that it's probably better to use a
  distribution-free procedure to do the real test of interest (if there is
  one) instead of testing for normality, and then use a test that assumes
  normality.
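
  As a sketch of what a distribution-free procedure could look like for the correlation analysis mentioned in this thread, the rank-based options in `cor.test` can be run alongside the Pearson version. The data here are simulated for illustration only; the variable names and the exponential generator are assumptions.

```r
set.seed(7)
n <- 18
x <- rexp(n)       # skewed predictor (illustrative data only)
y <- x + rexp(n)   # positively associated, non-normal response

pearson  <- cor.test(x, y, method = "pearson")    # inference assumes normality
spearman <- cor.test(x, y, method = "spearman")   # rank-based
kendall  <- cor.test(x, y, method = "kendall")    # rank-based

estimates <- c(pearson  = unname(pearson$estimate),
               spearman = unname(spearman$estimate),
               kendall  = unname(kendall$estimate))
p_values  <- c(pearson  = pearson$p.value,
               spearman = spearman$p.value,
               kendall  = kendall$p.value)
print(rbind(estimate = estimates, p.value = p_values))
```

  The rank-based tests answer the question of interest directly, without a preliminary normality test deciding which analysis to run.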
 
  I guess the question is, what exactly do you want to do with the outcome
  of the normality tests?  If those are going to be used as basis for
  deciding which test(s) to do next, then I concur with Frank's
  reservation.
 
  Generally speaking, I do not find goodness-of-fit tests for distributions
  very useful, mostly because failure to reject the null is no evidence in
  favor of the null.  It's difficult for me to imagine why a finding of
  insufficient evidence that the data did not come from a normal
  distribution would be interesting.
 
  Andy
 
 
   
Frank
   
   
--
Frank E Harrell Jr   Professor and Chair           School of Medicine
                     Department of Biostatistics   Vanderbilt University
   
  
  
   --
   yianni
  
  
  
  
 
 
  
 
 
 
 -- 
 yianni
 

