Re: [R] Datamining-package-?

2007-02-27 Thread Roberto Perdisci
Hi,
  out of curiosity, what is the name of the package you found?

Roberto

On 2/27/07, j.joshua thomas [EMAIL PROTECTED] wrote:
 Dear Group,

 I have found the package.

 Thanks very much


 JJ
 ---


 On 2/28/07, j.joshua thomas [EMAIL PROTECTED] wrote:
 
 
  I couldn't locate the package rattle. I need someone's help.
 
 
  JJ
  ---
 
 
 
  On 2/28/07, Daniel Nordlund [EMAIL PROTECTED] wrote:
  
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
On Behalf Of j.joshua thomas
Sent: Tuesday, February 27, 2007 5:52 PM
To: r-help@stat.math.ethz.ch
Subject: Re: [R] Datamining-package-?
   
Hi again,
The idea of preprocessing is mainly based on the need to prepare the data
before they are actually used in pattern extraction, or before feeding them
into evolutionary algorithms (EAs) such as genetic algorithms. There is no
standard practice yet; however, the most frequently used steps are:

1. the extraction of derived attributes, that is, quantities that accompany
but are not directly related to the data patterns, and that may prove
meaningful or increase the understanding of the patterns;

2. the removal of existing attributes that are of no concern to the mining
process because they are insignificant.

So I am looking for a package that can do the two points mentioned above.

Initially I would like to visualize the data as patterns and understand
those patterns.
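Both preprocessing steps described above can be done in base R without a dedicated package. A minimal sketch, using the built-in iris data as a stand-in for the real data; the helper names add.derived and drop.low.variance, the two derived attributes and the variance threshold are all hypothetical choices for illustration:

```r
# Sketch only: 'add.derived' and 'drop.low.variance' are hypothetical helpers,
# and iris stands in for the real data set.

# Step 1: extract derived attributes computed from existing columns.
add.derived <- function(df) {
  df$Petal.Ratio <- df$Petal.Length / df$Petal.Width
  df$Petal.Area  <- df$Petal.Length * df$Petal.Width
  df
}

# Step 2: remove attributes judged uninformative; here, numeric columns
# whose variance falls below 'threshold' (a crude, scale-dependent rule).
drop.low.variance <- function(df, threshold) {
  num <- sapply(df, is.numeric)
  v <- sapply(df[num], var)
  drop <- names(v)[v < threshold]
  df[, setdiff(names(df), drop), drop = FALSE]
}

prep <- drop.low.variance(add.derived(iris), threshold = 0.2)
str(prep)
```

Because variance depends on the units of each column, scaling the numeric columns first (e.g. with scale()) makes a single threshold more meaningful. For a first visual look, pairs(prep[sapply(prep, is.numeric)], col = as.integer(prep$Species)) draws a scatterplot matrix coloured by class.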
   
   
   snip
  
   Joshua,
  
   You might take a look at the package rattle on CRAN for initially
   looking at your data and doing some basic data mining.
  
   Hope this is helpful,
  
   Dan
  
   Daniel Nordlund
   Bothell, WA, USA
  
   __
   R-help@stat.math.ethz.ch mailing list
   https://stat.ethz.ch/mailman/listinfo/r-help
   PLEASE do read the posting guide
   http://www.R-project.org/posting-guide.html
   and provide commented, minimal, self-contained, reproducible code.
  
 
 
 
  --
  Lecturer J. Joshua Thomas
  KDU College Penang Campus
  Research Student,
  University Sains Malaysia
 






Re: [R] how to install a package in R on a linux machine?

2007-02-22 Thread Roberto Perdisci
Hi,
  try this:

$ sudo R CMD INSTALL downloaded.package.tar.gz

If you don't use 'sudo' (or do not have the privileges to do so), you need
to either become root (with su) or ask the administrator of the
machine you are using to install the package for you.

regards,
Roberto


On 2/22/07, Gabor Csardi [EMAIL PROTECTED] wrote:
 The easiest is perhaps to do

 install.packages("packagename")

 This downloads the package and installs it into the default R package
 library on your machine. If you want to install it into a different
 directory, use the 'lib' argument of 'install.packages'.

 If you don't want to download the package again but want to use the
 file you have already downloaded, use the following command:

 install.packages(pkgs = "the.file.you.downloaded.tar.gz", repos = NULL)

 You can also install R packages from the command line, like this:

 R CMD INSTALL -l lib.directory downloaded.package.file
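For users without root privileges, the local-file and 'lib' options above can be combined in a small helper that installs an already downloaded package into a personal library and makes that library visible to the current session. This is a sketch: install.local.package and the default ~/R/library location are hypothetical choices, while install.packages() (with repos, type and lib) and .libPaths() are standard R:

```r
# Sketch: install an already downloaded package file (or source directory)
# into a personal library, without root privileges.
# 'install.local.package' and the default ~/R/library path are hypothetical.
install.local.package <- function(path,
                                  lib = file.path(Sys.getenv("HOME"), "R", "library")) {
  if (!file.exists(path)) stop("no such file or directory: ", path)
  dir.create(lib, recursive = TRUE, showWarnings = FALSE)
  # repos = NULL: 'path' is a local file, so nothing is downloaded;
  # type = "source": build and install it from source.
  install.packages(path, repos = NULL, type = "source", lib = lib)
  # Put the personal library first on the library search path for this
  # session, so that library() finds the newly installed package.
  .libPaths(c(lib, .libPaths()))
  invisible(lib)
}
```

Usage would be install.local.package("downloaded.package.tar.gz") followed by library(packagename), which is roughly what R CMD INSTALL -l ~/R/library downloaded.package.tar.gz does from the shell. To see the personal library in later sessions as well, point the R_LIBS_USER environment variable at it.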

 Gabor

 On Thu, Feb 22, 2007 at 04:44:25PM +0800, gallon li wrote:
  I downloaded the tar.gz file from the r-project website (and saved it in a local
  directory) and wish to use the package in R.
 
  But I am not sure how to use the install.packages command. I tried a few
  times and still couldn't figure out the correct way to install this package.
 

 --
 Csardi Gabor [EMAIL PROTECTED]    MTA RMKI, ELTE TTK



Re: [R] Putting splom in a function

2007-02-15 Thread Roberto Perdisci
On 2/15/07, Deepayan Sarkar [EMAIL PROTECTED] wrote:
 On 2/14/07, Roberto Perdisci [EMAIL PROTECTED] wrote:

  if I use
groups = groups
  instead of
groups = as.symbol(groups)
 
  something is plotted, but not the correct scatterplot.

 Try groups = eval(as.name(groups))

groups = eval(as.name(groups)) does exactly what I was looking for :)
I then noticed that groups = eval(as.symbol(groups)) works as well.
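Putting the fix together, a version of the original function that accepts the grouping column as a character string. This sketch assumes lattice is installed, and that the data frame has no columns named 'data' or 'groups', since splom evaluates its arguments inside the data frame first and such columns would shadow the function arguments:

```r
library(lattice)

# 'groups' is the *name* of the grouping column, as a character string.
# splom() evaluates its 'groups' argument inside 'data', so we pass an
# expression that turns the string into a symbol and evaluates it there.
multi.scatterplot <- function(data, groups, cols, colors) {
  splom(~data[, cols], data = data,
        groups = eval(as.name(groups)),
        panel = panel.superpose, col = colors)
}

# Returns a trellis object; it is drawn when printed (automatically at the
# top level of an interactive session, or explicitly with print(p)).
p <- multi.scatterplot(iris, "Species", 1:4, c("green", "blue", "red"))
```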

thank you,
Roberto

 Deepayan

  I think the problem is that I don't cast the 'groups' variable to the
  correct type. Besides as.symbol() I tried also as.expression(),
  because ?xyplot says groups: a variable or expression to be evaluated
  in the data frame specified by 'data'.
  What is the correct type? What as.* should I use?
 
  thank you,
  regards,
  Roberto
 


Re: [R] Does rpart package have some requirements on the original data set?

2007-02-15 Thread Roberto Perdisci
Hi,
  try to set minsplit=2 and cp=0. After training you can prune with
different values of cp, and plot how the accuracy changes.

try this code (which I'm sure can be improved)

require(rpart)

# Prune 'unpruned.tree' at each value in 'cp' (assumed increasing, since each
# step prunes the previously pruned tree) and record the test-set accuracy
# and the number of nodes of every pruned tree.
rpart.prune.stats <- function(unpruned.tree, testset, class.index.name, cp) {
  acc <- numeric(length(cp))
  nnodes <- integer(length(cp))

  rpart.pruned <- unpruned.tree
  for (i in seq_along(cp)) {
    print(paste("cp =", cp[i]))

    rpart.pruned <- prune(rpart.pruned, cp = cp[i])
    pred <- predict(rpart.pruned, testset, type = "class")
    acc[i] <- mean(pred == testset[, class.index.name])
    nnodes[i] <- nrow(rpart.pruned$frame)
  }

  return(list(acc = acc, nnodes = nnodes))
}


# Grow a full tree (minsplit = 2, cp = 0), prune it at each cp and plot the
# test-set accuracy against cp; every 5th point is labelled with the number
# of nodes of the corresponding pruned tree.
plot.rpart.prune.results <- function(formula, trainingset, testset, class.index.name,
                                     dataset.name, cp, add = FALSE, ylim = NULL) {
  rpart.unpruned <- rpart(formula, data = trainingset,
                          control = rpart.control(minsplit = 2, cp = 0))
  res <- rpart.prune.stats(rpart.unpruned, testset, class.index.name, cp)

  x <- res$acc
  y <- res$nnodes

  print(x)
  print(y)

  if (add)
    par(new = TRUE)
  plot(cp, x, type = "l", col = "blue", ylim = ylim, ann = FALSE)

  lab <- seq(1, length(cp), by = 5)
  text(cp[lab], x[lab], paste("(", y[lab], ")", sep = ""), pos = 3, cex = 0.5)
  title(main = dataset.name, xlab = "cp", ylab = "Accuracy", font = 3, cex = 0.5)
}


and call it with something like

plot.rpart.prune.results(Class ~ ., DatasetX.train, DatasetX.test, "Class", "DatasetX", cp = seq(0, 0.005, by = 0.0001))


You can also oversample the minority class by sampling with
replacement, or undersample the majority class. These are two very
simple techniques used in machine learning when dealing with
unbalanced datasets (there are more sophisticated techniques which
produce better results, though).
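A minimal sketch of both resampling techniques for a two-class problem. balance.classes is a hypothetical helper (not part of rpart) that simply equalises the two class counts; it should be applied to the training set only, so that the test set keeps the real class distribution:

```r
# Sketch: random over- or under-sampling to balance a two-class data set.
# 'balance.classes' is a hypothetical helper, not part of rpart.
balance.classes <- function(df, class.col, method = c("over", "under")) {
  method <- match.arg(method)
  y <- df[[class.col]]
  counts <- table(y)
  if (length(counts) != 2) stop("expected exactly two classes")
  if (counts[1] == counts[2]) return(df)  # already balanced
  min.idx <- which(y == names(counts)[which.min(counts)])
  maj.idx <- which(y == names(counts)[which.max(counts)])
  if (method == "over") {
    # Oversampling: draw extra minority rows with replacement until the
    # minority class is as large as the majority class.
    n.extra <- length(maj.idx) - length(min.idx)
    extra <- min.idx[sample.int(length(min.idx), n.extra, replace = TRUE)]
    idx <- c(maj.idx, min.idx, extra)
  } else {
    # Undersampling: keep a random subset (without replacement) of the
    # majority rows, as many as there are minority rows.
    idx <- c(maj.idx[sample.int(length(maj.idx), length(min.idx))], min.idx)
  }
  df[idx, , drop = FALSE]
}

# Hypothetical data resembling the question: 90% of y are 1, 10% are 0.
set.seed(1)
d <- data.frame(y = factor(c(rep(1, 90), rep(0, 10))), x1 = rnorm(100))
table(balance.classes(d, "y", "over")$y)   # 90 of each class
table(balance.classes(d, "y", "under")$y)  # 10 of each class
```

Indices are drawn with sample.int() rather than sample() so the helper also behaves correctly when a class has a single row.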

hope this helps,
cheers,
Roberto

On 2/15/07, Liu, Ningwei [EMAIL PROTECTED] wrote:
 Hi,

 I am currently studying decision trees using the rpart package in R. I
 artificially created a data set which includes the dependent variable
 (y) and a few independent variables (x1, x2, ...). The dependent variable
 y only takes the values 0 and 1: 90% of y are 1 and 10% of y are 0. When I
 apply rpart to it, there is no splitting at all.

 I am wondering whether this is because of the special distribution of
 y. Since the majority of y is 1 (the information in the data set is small),
 perhaps rpart automatically regards it as already a single class and
 therefore won't proceed any further. If this understanding is correct, what
 should I do if I still want rpart to do something with this data set?

 Thanks a lot!

 Ningwei




[R] Putting splom in a function

2007-02-14 Thread Roberto Perdisci
Hello R list,
  I have a little problem with splom. I'd like to wrap it in a
function, for example:

multi.scatterplot <- function(data, groups, cols, colors) {
  splom(~data[, cols], groups = as.symbol(groups), data = data,
        panel = panel.superpose, col = colors)
}

and then call it like in

multi.scatterplot(iris, "Species", 1:4, c("green", "blue", "red"))

but the problem is:
Error in form$groups[form$subscr] : object is not subsettable

if I use
  groups = groups
instead of
  groups = as.symbol(groups)

something is plotted, but not the correct scatterplot.

I think the problem is that I don't cast the 'groups' variable to the
correct type. Besides as.symbol() I tried also as.expression(),
because ?xyplot says groups: a variable or expression to be evaluated
in the data frame specified by 'data'.
What is the correct type? What as.* should I use?

thank you,
regards,
Roberto



[R] Probabilities calibration error ROCR

2007-01-24 Thread Roberto Perdisci
Hello,
  I need to compute the calibration error of the posterior class
probabilities p(y|x) estimated by rpart used as a classification tree.
Namely, I train rpart on a dataset D and then use predict(...,
type="prob") to estimate p(y|x).

  I've found that the ROCR package can compute this, but I
cannot find a reference to a paper or book which explains the details of the
implemented algorithm. Do you know of any reference where I can find
the details of the algorithm that computes the calibration error
implemented in ROCR (apart from ROCR's source code)? Is there any
other function/package I can use to compute the calibration error?
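As a cross-check while looking for the reference, a simple binned estimate can be computed directly. The sketch below uses equal-width probability bins and averages |mean predicted probability - observed fraction of positives| weighted by bin size; this is one common definition, assumed here, and not necessarily the algorithm ROCR implements. The function name and the default number of bins are hypothetical:

```r
# Binned calibration error for a two-class problem (sketch, not ROCR's method).
# prob:  predicted probabilities of the positive class
# label: observed outcomes coded 0/1
calibration.error <- function(prob, label, n.bins = 10) {
  stopifnot(length(prob) == length(label), all(prob >= 0 & prob <= 1))
  bins <- cut(prob, breaks = seq(0, 1, length.out = n.bins + 1),
              include.lowest = TRUE)
  conf <- tapply(prob, bins, mean)    # mean predicted probability per bin
  freq <- tapply(label, bins, mean)   # observed fraction of positives per bin
  n    <- tapply(prob, bins, length)  # bin sizes (NA for empty bins)
  ok <- !is.na(n)
  sum(n[ok] * abs(conf[ok] - freq[ok])) / length(prob)
}
```

With an rpart fit, prob would be the positive-class column of predict(fit, newdata, type = "prob"), and label the corresponding observed classes recoded to 0/1. The result depends on the number of bins, so it is best compared across models with the same n.bins.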

thank you,
regards,
Roberto



[R] Laplace correction for rpart class probability estimate

2007-01-18 Thread Roberto Perdisci
Hello everybody,
  I'm using rpart to fit a classification tree. I'm interested in the
way rpart estimates the class membership probabilities. Does it
implement the Laplace correction rule? Is there any parameter I can
use to ask rpart to do that?
I was not able to find this option in the manual or on the internet.
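For reference, the Laplace correction replaces the raw frequency n_k/n of class k in a leaf by (n_k + 1)/(n + K), where n is the number of training cases in the leaf and K the number of classes. If no built-in option exists, one way is to apply it after fitting, using the leaf each training case falls into (fit$where). A sketch; laplace.correct is a hypothetical helper, and mapping new cases to their leaves is not shown:

```r
library(rpart)

# Laplace-corrected class probabilities from a matrix of class counts
# (one row per leaf, one column per class): (n_k + 1) / (n + K).
laplace.correct <- function(counts) {
  K <- ncol(counts)
  (counts + 1) / (rowSums(counts) + K)
}

fit <- rpart(Species ~ ., data = iris)

# fit$where gives, for each training case, the row of fit$frame of the leaf
# it falls into; tabulating it against the response yields the class counts
# per leaf.
leaf.counts <- unclass(table(fit$where, iris$Species))
leaf.probs <- laplace.correct(leaf.counts)
leaf.probs
```

For a leaf with counts (3, 0) in a two-class problem, the raw estimate (1, 0) becomes (0.8, 0.2), so no class is ever assigned probability exactly 0 or 1.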

thank you,
regards,
Roberto
