Re: [R] Datamining-package-?
Hi,

out of curiosity, what is the name of the package you found?

Roberto

On 2/27/07, j.joshua thomas [EMAIL PROTECTED] wrote:
> Dear Group,
>
> I have found the package. Thanks very much.
>
> JJ
>
> On 2/28/07, j.joshua thomas [EMAIL PROTECTED] wrote:
> > I couldn't locate the package rattle. I need someone's help.
> >
> > JJ
> >
> > On 2/28/07, Daniel Nordlund [EMAIL PROTECTED] wrote:
> > > -----Original Message-----
> > > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of j.joshua thomas
> > > Sent: Tuesday, February 27, 2007 5:52 PM
> > > To: r-help@stat.math.ethz.ch
> > > Subject: Re: [R] Datamining-package-?
> > >
> > > > Hi again,
> > > >
> > > > The idea of preprocessing is mainly based on the need to prepare the data before they are actually used in pattern extraction or fed into EAs (Genetic Algorithm). There is no standard practice yet; however, the frequently used steps are:
> > > >
> > > > 1. the extraction of derived attributes, that is, quantities that accompany but are not directly related to the data patterns and may prove meaningful or increase the understanding of the patterns
> > > > 2. the removal of some existing attributes that should be of no concern to the mining process because of their insignificance
> > > >
> > > > So I am looking for a package that can do the two points mentioned above. Initially I would like to visualize the data and understand the patterns.
> > >
> > > [snip]
> > >
> > > Joshua,
> > >
> > > You might take a look at the package rattle on CRAN for initially looking at your data and doing some basic data mining.
> > >
> > > Hope this is helpful,
> > >
> > > Dan
> > >
> > > Daniel Nordlund
> > > Bothell, WA, USA
>
> --
> Lecturer J. Joshua Thomas
> KDU College Penang Campus
> Research Student, University Sains Malaysia
>
> [[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
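
The two preprocessing steps described in this thread (deriving attributes and removing attributes) can be sketched in base R roughly as follows. This is only an illustration: the built-in iris data frame stands in for the poster's data, and the derived column Petal.Area and the dropped column Sepal.Width are arbitrary choices, not anything the thread specifies.

```r
# Hypothetical stand-in for the poster's data set.
data(iris)
prepared <- iris

# 1. Derive a new attribute from existing ones
#    (petal area is an illustrative choice, not from the thread).
prepared$Petal.Area <- prepared$Petal.Length * prepared$Petal.Width

# 2. Remove attributes that should not take part in the mining step
#    (the column chosen here is arbitrary).
drop_cols <- c("Sepal.Width")
prepared <- prepared[, !(names(prepared) %in% drop_cols), drop = FALSE]

# Inspect the resulting structure.
str(prepared)
```

A call such as pairs(prepared[, sapply(prepared, is.numeric)]) would then give a first visual look at the prepared numeric attributes.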
Re: [R] how to install a package in R on a linux machine?
Hi,

try this:

$ sudo R CMD INSTALL downloaded.package.tar.gz

If you don't use 'sudo' (or do not have the privileges to do so), you need to either become root (with su) or ask the administrator of the machine you are using to install the package for you.

regards,
Roberto

On 2/22/07, Gabor Csardi [EMAIL PROTECTED] wrote:
> The easiest is perhaps to do
>
> install.packages("packagename")
>
> This downloads the package and installs it into the default R package library on your machine. If you want to install it to a different directory, use the 'lib' argument of 'install.packages'. If you don't want to download the package again but want to use the downloaded one, use the following command:
>
> install.packages(repos=NULL, pkgs="the.file.you've.downloaded")
>
> You can also install R packages from the command line, like this:
>
> R CMD INSTALL -l lib.directory downloaded.package.file
>
> Gabor
>
> On Thu, Feb 22, 2007 at 04:44:25PM +0800, gallon li wrote:
> > I downloaded the tar.gz file from the r-project website (and saved it in a local directory) and wish to use the package in R. But I am not sure how to use the install.packages command. I tried a few times and still couldn't figure out the correct way to install this package.
>
> --
> Csardi Gabor [EMAIL PROTECTED]
> MTA RMKI, ELTE TTK
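
The 'lib' and 'repos=NULL' options mentioned in this thread can be combined so that a downloaded tarball is installed into a directory the user can write to, which may avoid the need for root. The sketch below is an assumption-laden illustration: install_local_tarball is a hypothetical helper (not part of R), and the default library path ~/R/library is an arbitrary choice.

```r
# Hypothetical helper (not part of R): install a source tarball that is
# already on disk into a user-writable library directory.
install_local_tarball <- function(tarball, lib = path.expand("~/R/library")) {
  if (!file.exists(tarball)) {
    stop("tarball not found: ", tarball)
  }
  # Create the personal library if it does not exist yet.
  dir.create(lib, recursive = TRUE, showWarnings = FALSE)
  # repos = NULL tells install.packages to treat 'tarball' as a local file.
  install.packages(tarball, repos = NULL, type = "source", lib = lib)
}

# Usage (the file and package names are placeholders):
# install_local_tarball("downloaded.package.tar.gz")
# library(packagename, lib.loc = path.expand("~/R/library"))
```

Because the package ends up outside the default library, it has to be loaded with the lib.loc argument, or the directory has to be added with .libPaths().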
Re: [R] Putting splom in a function
On 2/15/07, Deepayan Sarkar [EMAIL PROTECTED] wrote:
> On 2/14/07, Roberto Perdisci [EMAIL PROTECTED] wrote:
> > if I use groups = groups instead of groups = as.symbol(groups), something is plotted, but not the correct scatterplot.
>
> Try groups = eval(as.name(groups))

groups = eval(as.name(groups)) does exactly what I was looking for :) and then I noticed that groups = eval(as.symbol(groups)) works as well.

thank you,
Roberto

> Deepayan
>
> > I think the problem is that I don't cast the 'groups' variable to the correct type. Besides as.symbol() I also tried as.expression(), because ?xyplot says
> >
> > groups: a variable or expression to be evaluated in the data frame specified by 'data'.
> >
> > What is the correct type? What as.* should I use?
> >
> > thank you,
> > regards,
> > Roberto
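
Putting the suggestion from this thread together with the function from the original question gives roughly the following. It is a sketch of the fix as reported by the posters, using the built-in iris data; it relies on 'data' having no column that is itself called "groups".

```r
library(lattice)

# Wrapper around splom: 'groups' is the NAME of the grouping column, as a string.
multi.scatterplot <- function(data, groups, cols, colors) {
  splom(~ data[, cols],
        # string -> symbol -> evaluated inside 'data', giving the column values
        groups = eval(as.name(groups)),
        data = data,
        panel = panel.superpose,
        col = colors)
}

# Builds the trellis object; nothing is drawn until it is printed.
p <- multi.scatterplot(iris, "Species", 1:4, c("green", "blue", "red"))
```

Inside a function or a script the returned object has to be printed explicitly, for example with print(p), before anything appears on the graphics device.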
Re: [R] Does rpart package have some requirements on the original data set?
Hi,

try to set minsplit=2 and cp=0. After training you can prune with different values of cp, and plot how the accuracy changes. Try this code (which I'm sure can be improved):

require(rpart)

# Prune the tree at each value of cp; record test accuracy and tree size.
rpart.prune.stats <- function(unpruned.tree, testset, class.index.name, cp) {
  acc.rpart.pruned <- list()
  nnodes <- NULL
  rpart.pruned <- unpruned.tree
  for (i in 1:length(cp)) {
    print(paste("cp =", cp[i]))
    rpart.pruned <- prune(rpart.pruned, cp[i])
    pred.rpart.pruned <- predict(rpart.pruned, testset, type = "class")
    acc <- sum(pred.rpart.pruned == testset[, class.index.name]) / nrow(testset)
    acc.rpart.pruned <- c(acc.rpart.pruned, list(acc))
    nnodes <- c(nnodes, nrow(rpart.pruned$frame))
  }
  return(list(acc = acc.rpart.pruned, nnodes = nnodes))
}

# Grow a full tree, then plot accuracy against cp, labelled with node counts.
plot.rpart.prune.results <- function(formula, trainingset, testset, class.index.name, dataset.name, cp, add = FALSE, ylim = NULL) {
  rpart.unpruned <- rpart(formula, data = trainingset, control = rpart.control(minsplit = 2, cp = 0))
  res <- rpart.prune.stats(rpart.unpruned, testset, class.index.name, cp)
  x <- unlist(res$acc)
  y <- unlist(res$nnodes)
  print(x)
  print(y)
  if (add) par(new = TRUE)
  plot(cp, x, type = "l", col = "blue", ylim = ylim, ann = FALSE)
  idx <- seq(1, length(cp), by = 5)
  text(cp[idx], x[idx], paste("(", y[idx], ")", sep = ""), pos = 3, cex = 0.5)
  title(main = dataset.name, xlab = "cp", ylab = "Accuracy", font = 3, cex = 0.5)
}

and call it using something similar to

plot.rpart.prune.results(Class~., DatasetX.train, DatasetX.test, "Class", "DatasetX", cp=seq(0,0.005,by=0.0001))

You can also oversample the minority class using sampling with replacement, or undersample the majority class. These are two very simple techniques used in machine learning when dealing with unbalanced datasets (there are more complicated techniques which produce better results, though).

hope this helps,
cheers,
Roberto

On 2/15/07, Liu, Ningwei [EMAIL PROTECTED] wrote:
> Hi,
>
> I am currently studying decision trees using the rpart package in R. I artificially created a data set which includes the dependent variable (y) and a few independent variables (x1, x2...). The dependent variable y only comprises 0 and 1. 90% of y are 1 and 10% of y are 0. When I apply rpart to it, there is no splitting at all.
>
> I am wondering whether this is because of the special distribution of y. Since the majority of y is 1 (information in the data set is small), rpart automatically regards it as already a single class and therefore won't proceed any further. If this understanding is correct, what should I do if I still want rpart to do something on this data set?
>
> Thanks a lot!
>
> Ningwei
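
The over- and undersampling mentioned in the reply can be sketched in base R as below. The data frame d is a hypothetical stand-in with the 90%/10% class split from the question, and balancing the classes exactly 1:1 is an arbitrary choice made only for illustration.

```r
set.seed(1)  # make the resampling reproducible

# Hypothetical unbalanced data: 10 rows with y = 0 and 90 rows with y = 1.
d <- data.frame(x1 = 1:100, y = factor(rep(c(0, 1), times = c(10, 90))))
minority <- d[d$y == "0", ]
majority <- d[d$y == "1", ]

# Oversampling: draw minority rows WITH replacement until the classes match.
idx <- sample(nrow(minority), size = nrow(majority), replace = TRUE)
oversampled <- rbind(majority, minority[idx, ])

# Undersampling: keep only as many majority rows as there are minority rows.
keep <- sample(nrow(majority), size = nrow(minority))
undersampled <- rbind(minority, majority[keep, ])

table(oversampled$y)
table(undersampled$y)
```

Either resampled data frame could then be passed as the training set to rpart; the test set would normally be left untouched so that the accuracy estimate still reflects the original class distribution.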
[R] Putting splom in a function
Hello R list,

I have a little problem with splom. I'd like to wrap it in a function, for example:

multi.scatterplot <- function(data,groups,cols,colors) {
  splom(~data[,cols], groups = as.symbol(groups), data = data, panel = panel.superpose, col=colors)
}

and then call it like this:

multi.scatterplot(iris,"Species",1:4,c("green","blue","red"))

but the problem is:

Error in form$groups[form$subscr] : object is not subsettable

If I use groups = groups instead of groups = as.symbol(groups), something is plotted, but not the correct scatterplot. I think the problem is that I don't cast the 'groups' variable to the correct type. Besides as.symbol() I also tried as.expression(), because ?xyplot says

groups: a variable or expression to be evaluated in the data frame specified by 'data'.

What is the correct type? What as.* should I use?

thank you,
regards,
Roberto
[R] Probabilities calibration error ROCR
Hello,

I need to compute the calibration error of the posterior class probabilities p(y|x) estimated by rpart used as a classification tree. Namely, I train rpart on a dataset D and then use predict(..., type="prob") to estimate p(y|x). I've found that the ROCR package can do this, but I cannot find a link to a paper/book which explains the details of the implemented algorithm.

Do you know of any reference where I can find the details of the algorithm that computes the calibration error implemented in ROCR (apart from ROCR's source code)? Is there any other function/package I can use to compute the calibration error?

thank you,
regards,
Roberto
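
For comparison, one simple way to estimate a calibration error by hand is to bin the predicted probabilities and compare, within each bin, the mean predicted probability with the observed fraction of positives. The sketch below is not ROCR's algorithm and is not claimed to match its output; binned_calibration_error is a hypothetical helper, and the equal-width bins and the weighting by bin size are assumptions.

```r
# Hypothetical helper: binned calibration error for a binary problem.
# prob: predicted probability of the positive class, in [0, 1]
# y:    observed labels coded as 0/1
binned_calibration_error <- function(prob, y, n_bins = 10) {
  stopifnot(length(prob) == length(y),
            all(prob >= 0 & prob <= 1),
            all(y %in% c(0, 1)))
  # Assign each prediction to one of n_bins equal-width bins on [0, 1].
  bin <- cut(prob, breaks = seq(0, 1, length.out = n_bins + 1),
             include.lowest = TRUE)
  mean_prob <- tapply(prob, bin, mean)   # average predicted probability per bin
  frac_pos  <- tapply(y, bin, mean)      # observed fraction of positives per bin
  weight    <- as.numeric(table(bin)) / length(prob)
  keep <- !is.na(mean_prob)              # skip empty bins
  sum(weight[keep] * abs(mean_prob[keep] - frac_pos[keep]))
}

# Toy usage: predictions of 0.1 for negatives and 0.9 for positives.
binned_calibration_error(c(0.1, 0.1, 0.9, 0.9), c(0, 0, 1, 1))
```

With an rpart classification tree, prob could be taken from the positive-class column of predict(fit, newdata, type = "prob"), evaluated on data not used for training.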
[R] Laplace correction for rpart class probability estimate
Hello everybody,

I'm using rpart to fit a classification tree. I'm interested in the way rpart estimates the class membership probabilities. Does it implement the Laplace correction rule? Is there any parameter I can use to ask rpart to do that? I was not able to find this option in the manual or on the internet.

thank you,
regards,
Roberto
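
If the correction has to be applied by hand, one possible sketch is to count the training cases of each class in every leaf and apply the Laplace rule (n_k + 1) / (N + K) to those counts. This assumes a tree fitted with default priors and no case weights or loss matrix; laplace_correct is a hypothetical helper, and the kyphosis data shipped with rpart is used only as an example.

```r
library(rpart)

# Hypothetical helper: Laplace-corrected class probabilities.
# counts: matrix with one row per leaf and one column per class.
laplace_correct <- function(counts) {
  counts <- as.matrix(counts)
  # (n_k + 1) / (N + K), with N the leaf size and K the number of classes
  (counts + 1) / (rowSums(counts) + ncol(counts))
}

fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")

# fit$where gives, for each training row, the row of fit$frame of its leaf.
leaf_counts <- table(leaf = fit$where, class = kyphosis$Kyphosis)
leaf_probs <- laplace_correct(unclass(leaf_counts))

# Corrected probabilities for each training observation, looked up by leaf.
train_probs <- leaf_probs[as.character(fit$where), , drop = FALSE]
```

The sketch only covers the training observations; for new data the leaf of each observation would have to be determined first.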