[R] randomForest

2005-07-07 Thread Weiwei Shi
guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Weiwei Shi, Ph.D Did

Re: [R] randomForest

2005-07-07 Thread Weiwei Shi
it works. thanks, but: (just curious) why i tried previously and i got is.vector(sample.size) [1] TRUE i also tried as.vector(sample.size) and assigned it to sampsz,it still does not work. On 7/7/05, Duncan Murdoch [EMAIL PROTECTED] wrote: On 7/7/2005 3:38 PM, Weiwei Shi wrote: Hi

Re: [R] randomForest

2005-07-07 Thread Weiwei Shi
0 0 0 3 0 5 and class number returned from sample.size is like: 28, 8, 82, 28, 18, 22 Should I use gbm to try it since it might focus more on misplaced cases? thanks, weiwei On 7/7/05, Liaw, Andy [EMAIL PROTECTED] wrote: From: Weiwei Shi it works. thanks, but: (just curious

Re: [R] comparing strength of association instead of strength of evidence?

2005-07-08 Thread Weiwei Shi
PROTECTED] wrote: Weiwei Shi wrote: Hi, I asked this question before, which was hidden in a bunch of questions. I rephrase it here and hope I can get some help this time: I have 2 contingency tables which have the same group variable Y. I want to compare the strength of association

Re: [R] comparing strength of association instead of strength of evidence?

2005-07-10 Thread Weiwei Shi
and text categorization is the focus and interests of this project. Decision tree, Bayesian network or SVM/LSI might be candidates. Thanks for further suggestion, Weiwei On 7/10/05, Kjetil Brinchmann Halvorsen [EMAIL PROTECTED] wrote: Weiwei Shi wrote: Dear all: I still need some further

Re: [R] randomForest

2005-07-11 Thread Weiwei Shi
Thanks. Many people pointed that out. (It was because I only knew lapply at that time :). On 7/11/05, Martin Maechler [EMAIL PROTECTED] wrote: Duncan == Duncan Murdoch [EMAIL PROTECTED] on Thu, 07 Jul 2005 15:44:38 -0400 writes: Duncan On 7/7/2005 3:38 PM, Weiwei Shi wrote

[R] read.table

2005-07-13 Thread Weiwei Shi
. And this time it finished quickly. So, there might be something wrong in my data format causing that problem. then, my question is, is there a way in R to track at which line, something wrong occurs? Thanks, Weiwei -- Weiwei Shi, Ph.D Did you always know? No, I did not. But I believed... ---Matrix III
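
One way to locate a malformed line before loading the whole file is to count the fields on each line. The following is a minimal sketch, not taken from the thread: the small temporary file and the expected field count of 3 are stand-ins for the real data (which has 195 columns), and it assumes the problem is a wrong number of fields rather than something else.

```r
# Hypothetical stand-in for the real data file: pipe-separated, one short line.
tmp <- tempfile(fileext = ".dat")
writeLines(c("1|2|3", "4|5|6", "7|8", "9|10|11"), tmp)

expected_fields <- 3  # would be 195 for the dataset described in the post

# count.fields() returns the number of fields found on each line.
n_fields <- count.fields(tmp, sep = "|")

# Line numbers whose field count differs from what is expected.
bad_lines <- which(n_fields != expected_fields)
bad_lines
```

If bad_lines is empty, the cause is probably elsewhere (for example quoting or comment characters), and those would need a separate check.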

Re: [R] read.table

2005-07-13 Thread Weiwei Shi
to see at which line it hesitates to move on? regards, weiwei On 7/13/05, Weiwei Shi [EMAIL PROTECTED] wrote: Hi, I have a question on read.table. I have a dataset with 273,000 lines and 195 columns. I used the read.table to load the data into R: trn <- read.table('train1.dat', header=F, sep

Re: [R] read.table

2005-07-13 Thread Weiwei Shi
, Gabor Grothendieck [EMAIL PROTECTED] wrote: You could use the nlines= argument to scan to read in a portion at a time. On 7/13/05, Weiwei Shi [EMAIL PROTECTED] wrote: add: I used trn <- matrix(scan('train1.dat', sep='|', na.string='.'), nrow=273529, ncol=195) it is done
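
The nlines= suggestion above can be sketched roughly as follows. This is only an illustration under assumptions: the temporary file, the column count of 3 and the chunk size of 2 are made up for the example (the thread has 195 columns), and whether chunking actually avoids the memory limit depends on the machine.

```r
# Hypothetical small file standing in for 'train1.dat' (pipe-separated, '.' = NA).
tmp <- tempfile(fileext = ".dat")
writeLines(c("1|2|3", "4|.|6", "7|8|9", "10|11|12", "13|14|."), tmp)

n_cols <- 3       # 195 in the thread
chunk_lines <- 2  # lines per chunk; tune to the available memory

con <- file(tmp, open = "r")  # an open connection keeps its read position
chunks <- list()
repeat {
  vals <- scan(con, sep = "|", na.strings = ".", nlines = chunk_lines, quiet = TRUE)
  if (length(vals) == 0) break
  # A length that is not a multiple of n_cols points at a malformed line in this chunk.
  stopifnot(length(vals) %% n_cols == 0)
  chunks[[length(chunks) + 1]] <- matrix(vals, ncol = n_cols, byrow = TRUE)
}
close(con)

trn <- do.call(rbind, chunks)
dim(trn)
```

Each chunk could also be processed and discarded instead of being kept in the list, which is what saves memory when the full matrix does not fit.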

Re: [R] read.table

2005-07-13 Thread Weiwei Shi
there is another problem since last time i forgot byrow :( trn <- matrix(scan('train1.dat', sep='|', na.string='.'), nrow=273529, ncol=195, byrow=T) Read 53338155 items Error: cannot allocate vector of size 416704 Kb please help with this 'simple' reading task. weiwei On 7/13/05, Weiwei Shi

Re: [R] read.table

2005-07-13 Thread Weiwei Shi
that it's of the required size, just in case, and then turn it into a matrix and transpose it. On 7/13/05, Weiwei Shi [EMAIL PROTECTED] wrote: there is another problem since last time i forgot byrow :( trn <- matrix(scan('train1.dat', sep='|', na.string='.'), nrow=273529, ncol=195, byrow=T

Re: [R] read.table

2005-07-13 Thread Weiwei Shi
On 7/13/05, Weiwei Shi [EMAIL PROTECTED] wrote: i think what you meant is trn <- matrix(scan('train1.dat', sep='|', na.string='.'), nrow=195, ncol=273529) and then transpose it. However: Error: cannot allocate vector of size 512000 Kb the answer is no :( I think i am going to write my

[R] read large amount of data

2005-07-18 Thread Weiwei Shi
total, 1023040k used, 2088696k free, 150160k buffers Swap: 4008208k total,19040k used, 3989168k free, 668892k cached Thanks, weiwei -- Weiwei Shi, Ph.D Did you always know? No, I did not. But I believed... ---Matrix III __ R-help

Re: [R] Chemoinformatic people

2005-07-21 Thread Weiwei Shi

Re: [R] RandomForest question

2005-07-21 Thread Weiwei Shi

[R] imbalanced data set

2005-07-23 Thread Weiwei Shi
in order to increase classification accuracy. Is there any work which has been implemented in R or some GNU softwares? Thanks, weiwei -- Weiwei Shi, Ph.D Did you always know? No, I did not. But I believed... ---Matrix III __ R-help@stat.math.ethz.ch

[R] cluster

2005-07-25 Thread Weiwei Shi
. Please share your experience in using clustering (Any available implementation outside R is also welcome) weiwei -- Weiwei Shi, Ph.D Did you always know? No, I did not. But I believed... ---Matrix III __ R-help@stat.math.ethz.ch mailing list https

Re: [R] cluster

2005-07-26 Thread Weiwei Shi
) by cross-validation or bootstrap applied to the resulting decision tree in the classification problem. Best, Christian On Mon, 25 Jul 2005, Weiwei Shi wrote: Dear listers: Here I have a question on clustering methods available in R. I am trying to down-sample the majority class

Re: [R] CART analysis

2005-07-27 Thread Weiwei Shi
] Some people live and die by actuarial tables Groundhog Day __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Weiwei Shi

Re: [R] thks all

2005-07-27 Thread Weiwei Shi

[R] outlier detection

2005-08-03 Thread Weiwei Shi
Hi, there: I am wondering what packages are available in R which can do outlier detection in large-scale dataset. Thanks for sharing info, weiwei -- Weiwei Shi, Ph.D Did you always know? No, I did not. But I believed... ---Matrix III __ R-help

Re: [R] outlier detection

2005-08-03 Thread Weiwei Shi
, Wensui Liu [EMAIL PROTECTED] wrote: Random forest can do the job. HTH. On 8/3/05, Weiwei Shi [EMAIL PROTECTED] wrote: Hi, there: I am wondering what packages are available in R which can do outlier detection in large-scale dataset. Thanks for sharing info, weiwei -- Weiwei

[R] some thoughts on outlier detection, need help!

2005-08-04 Thread Weiwei Shi
. Is there similar algorithm in R or published? Thanks for any suggestions, weiwei -- Weiwei Shi, Ph.D Did you always know? No, I did not. But I believed... ---Matrix III __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do

[R] computationally singular

2005-08-08 Thread Weiwei Shi
if it is due to some variables and not sure if dropping variables is a good idea either. Thanks for help, weiwei -- Weiwei Shi, Ph.D Did you always know? No, I did not. But I believed... ---Matrix III __ R-help@stat.math.ethz.ch mailing list https

Re: [R] computationally singular

2005-08-08 Thread Weiwei Shi
, Christian On Mon, 8 Aug 2005, Weiwei Shi wrote: Hi, I have a dataset which has around 138 variables and 30,000 cases. I am trying to calculate a mahalanobis distance matrix for them and my procedure is like this: Suppose my data is stored in mymatrix S <- cov(mymatrix) # this is fine
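
A "computationally singular" covariance matrix typically means some variables are (nearly) linear combinations of others. The sketch below shows one way to detect that with a rank check; the simulated data, with a third column built as an exact sum of the first two, is an assumption made for illustration and is not the poster's data. Whether to drop variables or switch to something like PCA is a modelling decision the sketch does not settle.

```r
set.seed(1)
a <- rnorm(50)
b <- rnorm(50)
# Simulated data: the third column is an exact linear combination of the others.
mymatrix <- cbind(a = a, b = b, c = a + b)

S <- cov(mymatrix)

# The numerical rank of S is below its dimension, so solve(S) would fail.
q <- qr(S)
q$rank

# Keep only the columns that qr() treats as linearly independent.
keep <- q$pivot[seq_len(q$rank)]
X <- mymatrix[, keep, drop = FALSE]

# Squared Mahalanobis distance of each row from the column means.
d2 <- mahalanobis(X, colMeans(X), cov(X))
head(d2)
```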

Re: [R] computationally singular

2005-08-10 Thread Weiwei Shi
PCA is definitely worth trying, which was my second thought. Thanks for the help and also for the suggestion. On 8/10/05, Kjetil Brinchmann Halvorsen [EMAIL PROTECTED] wrote: Weiwei Shi wrote: I think the problem might be caused by two variables being very correlated. Should I check the cov

[R] clustering or homegenity approaches?

2005-08-11 Thread Weiwei Shi
. Thanks, Weiwei -- Weiwei Shi, Ph.D Did you always know? No, I did not. But I believed... ---Matrix III __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org

[R] need help

2005-08-12 Thread Weiwei Shi
basically there is one (or more) big 'gap' in the case i seek. thanks, weiwei -- Weiwei Shi, Ph.D Did you always know? No, I did not. But I believed... ---Matrix III __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman

[R] multiple responses classification/regression problem

2005-08-12 Thread Weiwei Shi
Hi, there: I am wondering if anyone knows about this? the response variable is a vector. I knew mvpart might be able to do this. anyone would like to share some examples? of course, nnet can do that too, but what else? Thanks, you guys have a good weekend! Weiwei -- Weiwei Shi, Ph.D Did you

Re: [R] need help

2005-08-12 Thread Weiwei Shi
Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Weiwei Shi Sent: Friday, August 12, 2005 2:05 PM To: r-help Subject: [R] need help Hi, there: I think i need to re-phrase my question since last time I did not get any reply but i think the question

Re: [R] staying with R, jobs in R

2005-08-29 Thread Weiwei Shi
process. - George E. P. Box __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html -- Weiwei Shi, Ph.D Did you always know? No, I did

[R] a question on proximity measurement in randomForest

2005-09-24 Thread Weiwei Shi
, should rf grow trees in a classification way so that it can give a better measure on prox? thanks for your comments, Weiwei -- Weiwei Shi, Ph.D Did you always know? No, I did not. But I believed... ---Matrix III [[alternative HTML version deleted

[R] missing handling

2005-09-27 Thread Weiwei Shi
[each,], index.missing, m.trn1[index.missing]);}) [1] 1.53 0.00 1.53 0.00 0.00 Another general question is are there some packages in R doing missing handling? Thanks, -- Weiwei Shi, Ph.D Did you always know? No, I did not. But I believed... ---Matrix III [[alternative HTML version deleted

Re: [R] missing handling

2005-10-04 Thread Weiwei Shi
5 [5,] 3 5 9 5 6 5 3 8 6 7 [6,] 9 6 10 5 10 4 2 10 4 5 [7,] 5 2 5 10 3 7 6 4 6 8 [8,] 2 6 1 8 9 2 7 8 3 8 [9,] 9 1 4 9 8 10 2 8 1 7 [10,] 2 4 8 7 8 4 3 8 5 5 On 9/27/05, Weiwei Shi [EMAIL PROTECTED] wrote: Hi, I have the following codes to replace missing using median, assuming
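
The poster's own code is cut off in the archive, so the following is an independent minimal sketch of column-wise median imputation, not a reconstruction of it. The small matrix and the helper name impute_median are made up for the example.

```r
# Toy matrix with one missing value in each of the first two columns.
m <- matrix(c(1, NA, 3,
              4, 5, NA,
              7, 8, 9), nrow = 3)

# Hypothetical helper: replace NAs in a vector by the median of its observed values.
impute_median <- function(col) {
  col[is.na(col)] <- median(col, na.rm = TRUE)
  col
}

# apply() over columns returns a matrix of the same shape here.
m_filled <- apply(m, 2, impute_median)
m_filled
```

Median imputation ignores relationships between variables, so it is only a simple baseline.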

[R] generalized linear model and missing handling

2005-10-04 Thread Weiwei Shi
case), but I am wondering if there are other ways, when glm or something like it is concerned in R? Thanks, Weiwei -- Weiwei Shi, Ph.D Did you always know? No, I did not. But I believed... ---Matrix III [[alternative HTML version deleted]] __ R

Re: [R] generalized linear model and missing handling

2005-10-04 Thread Weiwei Shi
I mean complete.cases(df) gives me all falses On 10/4/05, Weiwei Shi [EMAIL PROTECTED] wrote: Hi, I have a dataset and want to build a generalized linear model on it. Unfortunately, complete.cases(df) returns null, which means I have to find a way to fill those missings. One way is following

[R] pca in dimension reduction

2005-10-05 Thread Weiwei Shi
Hi, there: I am wondering if anyone here can provide an example using pca doing dimension reduction for a dataset. The dataset can be n*q (n=q or n=q). As to dimension reduction, are there other implementations for like ICA, Isomap, Locally Linear Embedding... Thanks, weiwei -- Weiwei Shi

Re: [R] pca in dimension reduction

2005-10-05 Thread Weiwei Shi
of the statistician is to catalyze the scientific learning process. - George E. P. Box -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Weiwei Shi Sent: Wednesday, October 05, 2005 12:27 PM To: r-help Subject: [R] pca in dimension reduction Hi, there: I

[R] a problem in random forest

2005-10-11 Thread Weiwei Shi
) [1] 7427 it works. But i need to know votes so I have to use the first way. Please help. Weiwei -- Weiwei Shi, Ph.D Did you always know? No, I did not. But I believed... ---Matrix III [[alternative HTML version deleted]] __ R-help

Re: [R] a problem in random forest

2005-10-11 Thread Weiwei Shi
I am sorry for bother. I think I figured that out. the result of votes for test data is not rf$votes, but rf$test$votes On 10/11/05, Weiwei Shi [EMAIL PROTECTED] wrote: Hi, there: I spent some time on this but I think I really cannot figure it out, maybe I missed something here: my data

[R] an error in my using of nnet

2005-10-11 Thread Weiwei Shi
final value 7361.038121 converged Error in y - tmp : non-numeric argument to binary operator Please help! Thanks, Weiwei -- Weiwei Shi, Ph.D Did you always know? No, I did not. But I believed... ---Matrix III [[alternative HTML version deleted

Re: [R] an error in my using of nnet

2005-10-11 Thread Weiwei Shi
: Weiwei Shi wrote: Hi, there: I am trying nnet as followed: mg.nnet <- nnet(x=trn3[,r.v[1:100]], y=trn3[,209], size=5, decay = 5e-4, maxit = 200) # weights: 511 initial value 13822.108453 iter 10 value 7408.169201 iter 20 value 7362.201934 iter 30 value 7361.669408 iter 40 value

Re: [R] memory problems when combining randomForests

2006-07-31 Thread Weiwei Shi
/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Weiwei Shi, Ph.D Did you always know? No, I did not. But I believed... ---Matrix III [[alternative HTML version

Re: [R] memory problems when combining randomForests

2006-07-31 Thread Weiwei Shi
Found it from another paper: importance sample learning ensemble (ISLE) which originates from Friedman and Popescu (2003). On 7/31/06, Weiwei Shi [EMAIL PROTECTED] wrote: Hi, Andy: What's the Jerry Friedman's ISLE? I googled it and did not find the paper on it. Could you give me a link

[R] filter high-throughput microarray data with noise

2006-09-11 Thread Weiwei Shi
is the variance x2.var[2,] Group.1 V3 V5 V7 V9 V11 V13 -2147022884 17.30989 14.15427 6.495755 5.791014 767.9342 510.5714 2. Is there any good reference on this kind of things? like online materials or book. thanks, -- Weiwei Shi, Ph.D Research Scientist

[R] two questions associated with heatmap

2006-09-22 Thread Weiwei Shi
#FF to #FF evenly. so basically i need a vector like this: c(#FF, ?, ?, ?, #FF) the number of groups can be 10 or whatever. thanks. -- Weiwei Shi, Ph.D Research Scientist GeneGO, Inc. Did you always know? No, I did not. But I believed... ---Matrix III
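
An evenly spaced vector of colors between two endpoints can be built with colorRampPalette() from grDevices. The hex codes in the post are truncated in the archive, so the red and blue endpoints below are placeholders chosen for the example, not the poster's values.

```r
n_groups <- 10  # "the number of groups can be 10 or whatever"

# Placeholder endpoints; substitute the actual start and end colors.
start_col <- "#FF0000"
end_col <- "#0000FF"

# colorRampPalette() returns a function that interpolates n colors.
cols <- colorRampPalette(c(start_col, end_col))(n_groups)
cols
```

The resulting vector can be passed as the col argument of heatmap-style functions.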

Re: [R] [BioC] two questions associated with heatmap

2006-09-23 Thread Weiwei Shi
oh, i mean how it defines the group? i just want to confirm if it is based on the whole matrix? On 9/23/06, Sean Davis [EMAIL PROTECTED] wrote: Sean Davis wrote: Weiwei Shi wrote: hi, there: i have 2 questions associated with heatmap in heatmap.2{gplot}, there is a bar called raw z

[R] 5 binary_class models vs one 5-class model

2006-09-26 Thread Weiwei Shi
approach will you take? thanks, -- Weiwei Shi, Ph.D Research Scientist GeneGO, Inc. Did you always know? No, I did not. But I believed... ---Matrix III __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read

[R] mx2 contingency tables or (2^(m-1)-1)'s 2x2 contingency tables in the context of feature selection for random forest

2006-09-28 Thread Weiwei Shi
used in rf, I am thinking if I should simply use mx2 contingency table or (2^(m-1)-1)'s 2x2 contingency tables in which I pick the best p-value to evaluate A's power. For the latter, I am sure it is very alike the way used in rf. But is the former good enough? Thanks. -- Weiwei Shi, Ph.D Research

Re: [R] how to convert all columns of a data frame into factors

2006-10-04 Thread Weiwei Shi
, so lapply() is a natural choice. Andy From: Gabor Grothendieck Try this: replace(BOD, TRUE, lapply(BOD, factor)) On 10/4/06, Weiwei Shi [EMAIL PROTECTED] wrote: Hi, I use apply apply(x, 2, factor) but it does not work. please help. thanks. -- Weiwei Shi
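
To illustrate the point about lapply(): apply() first coerces a data frame to a matrix, so the factors are lost, while lapply() works column by column. A minimal sketch with a made-up data frame follows; the x[] <- idiom is an alternative to the replace() call quoted above, not the code from the thread.

```r
x <- data.frame(a = c(1, 2, 2), b = c("u", "v", "u"), stringsAsFactors = FALSE)

# apply() goes through as.matrix(), so the result is a matrix, not a data frame of factors.
m <- apply(x, 2, factor)
class(m)

# Assigning into x[] keeps the data frame and replaces every column by a factor.
x[] <- lapply(x, factor)
sapply(x, is.factor)
```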

[R] a question on using arules package

2006-10-05 Thread Weiwei Shi
it. Maybe someone can explain to me about the structure of rules from that package. -- Weiwei Shi, Ph.D Research Scientist GeneGO, Inc. Did you always know? No, I did not. But I believed... ---Matrix III __ R-help@stat.math.ethz.ch mailing list https

[R] how to check object size in workspace

2006-10-06 Thread Weiwei Shi
to check objects' size (the size when I try to save it in disk) in workspace, anyway. thanks, -- Weiwei Shi, Ph.D Research Scientist GeneGO, Inc. Did you always know? No, I did not. But I believed... ---Matrix III __ R-help@stat.math.ethz.ch mailing list
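
A rough sketch of listing object sizes with object.size(). Note the assumption: object.size() estimates the memory used in the session, which is not the same as the size of a saved file on disk. The two objects are made up for the example; in a real session the names would usually come from ls().

```r
# Two example objects of clearly different size.
x <- numeric(1e5)
y <- 1:10

obj_names <- c("x", "y")  # in an interactive session: obj_names <- ls()

# Estimated memory use in bytes for each named object.
sizes <- sapply(obj_names, function(nm) object.size(get(nm)))
sort(sizes, decreasing = TRUE)
```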

[R] correlation b/w a continuous variable and a categorical variable

2006-10-13 Thread Weiwei Shi

Re: [R] correlation b/w a continuous variable and a categorical variable

2006-10-13 Thread Weiwei Shi
. On 10/13/06, Achim Zeileis [EMAIL PROTECTED] wrote: On Fri, 13 Oct 2006 17:15:45 -0400 Weiwei Shi wrote: Dear Listers: I happen to have this question in mind, is there a way to evaluate the correlation between a continuous variable and a categorical variable (without discretizing

[R] multiple trees

2005-01-05 Thread Weiwei Shi
dataset has 142 variables, the last one is a categorical response variable. also, i am not sure how to save the trees into a list or something so that I can handle, like pointer array or something in C. Thanks. Weiwei Shi, Ph.D cv <- function(all.data, n.folds=10, mcp=0.003) { n <- nrow(all.data) idx

[R] gbm

2005-01-12 Thread Weiwei Shi
Hi, there: I am wondering if I can find some detailed explanation on gbm or explanation on examples of gbm. thanks, Ed __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide!

[R] gbm

2005-01-12 Thread Weiwei Shi
Hi, there: Thanks a lot for everyone's prompt replies. In detail, I am facing a huge amount of data: over 10,000 and 400 vars. This project is very challenging and interesting to me. I tried rpart which gives me some promising results but not good enough. So I am trying randomForest and gbm now.

[R] load object

2005-01-13 Thread Weiwei Shi
Hi, I happen to re-write my codes to save memory and my approach is write my obj into file first and later I load it. However, it seems like: load(filename) can load the object but the function returns the name of the object instead of the reference to it. For example, I have an object called
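
As described, load() restores objects under their saved names and returns those names as a character vector. A minimal sketch of two common ways to get at the object itself follows; the object obj and the temporary files are made up for the example, and the second approach assumes a version of R that provides saveRDS() and readRDS().

```r
obj <- list(a = 1:3)
f <- tempfile(fileext = ".RData")
save(obj, file = f)
rm(obj)

# Option 1: load into a separate environment and fetch the object by the returned name.
e <- new.env()
nm <- load(f, envir = e)        # nm is the character vector "obj"
restored <- get(nm, envir = e)

# Option 2: saveRDS()/readRDS() store a single object and return it directly.
g <- tempfile(fileext = ".rds")
saveRDS(restored, g)
again <- readRDS(g)
```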

Re: [R] load object

2005-01-13 Thread Weiwei Shi
[EMAIL PROTECTED] wrote: From: Douglas Bates [EMAIL PROTECTED] Fri Jan 14 08:35:33 2005 Weiwei Shi wrote: Hi, I happen to re-write my codes to save memory and my approach is write my obj into file first and later I load it. However, it seems like: load(filename) can

[R] rpart

2005-01-17 Thread Weiwei Shi
Hi, there: I am working on a classification problem by using rpart. when my response variable y is binary, the trees grow very fast, but if I add one more case to y, that is making y has 3 cases, the tree growing cannot be finished. the command looks like: x <- rpart(r0$V142~.,data=r0[,1:141],

[R] suggestion on data mining book using R

2005-01-19 Thread Weiwei Shi
Hi, there: I think I need a book on data mining using R. I know Modern Applied Statistics with S-plus (2nd Ed) or Modern Applied Statistics with S (4th Ed) might be a good choice. But I am not sure if there is a better suggestion, or which of the two is better. thanks, Ed

[R] multi-class classification using rpart

2005-01-25 Thread WeiWei Shi
Hi, I am trying to make a multi-class classification tree by using rpart. I used the MASS package's data: fgl to test and it works well. However, when I used my small-sampled data as below, the program seems to take forever. I am not sure if it is due to slowness or there is something wrong with my

Re: [R] multi-class classification using rpart

2005-01-25 Thread WeiWei Shi
:04 -0500, Liaw, Andy [EMAIL PROTECTED] wrote: From: WeiWei Shi Hi, I am trying to make a multi-class classification tree by using rpart. I used the MASS package's data: fgl to test and it works well. However, when I used my small-sampled data as below, the program seems to take forever

Collapsing solution to the question discussed above: Re: [R] multi-class classification using rpart

2005-01-25 Thread WeiWei Shi
current thought to collapse them since it is a classification problem. I am searching for some papers which discussed on this topic. Anyone has more ideas or info like paper? Thanks. Ed On Tue, 25 Jan 2005 21:49:26 +0100, Uwe Ligges [EMAIL PROTECTED] wrote: WeiWei Shi wrote: Hi, Andy: Thanks

[R] how to evaluate the significance of attributes in tree growing

2005-01-26 Thread WeiWei Shi
Hi, there: I am wondering if there is a package in R (doing decision trees) which can provide some methods to evaluate the significance of attributes. I remembered randomForest gives some output like that. Unfortunately my current computing env. cannot handle my datasets if I use randomForest. So,

[R] clustering

2005-01-27 Thread WeiWei Shi
Hi, I just get a question (sorry if it is a dumb one) and I phrase my question in the following R codes: group1 <- rnorm(n=50, mean=0, sd=1) group2 <- rnorm(n=20, mean=1, sd=1.5) group3 <- c(group1, group2) Now, if I am given a dataset from group3, what method (discriminant analysis, clustering, maybe) is
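
As a rough illustration of the setup in the question, the sketch below simulates the two groups and runs kmeans() with two centers. This is only one of the candidate methods, and the seed and nstart value are arbitrary choices for the example. With groups that overlap this much, the recovered clusters may match the true groups poorly, which the cross-table makes visible.

```r
set.seed(42)  # arbitrary seed, for reproducibility of the example only
group1 <- rnorm(n = 50, mean = 0, sd = 1)
group2 <- rnorm(n = 20, mean = 1, sd = 1.5)
group3 <- c(group1, group2)

# k-means with two clusters on the pooled one-dimensional data.
km <- kmeans(group3, centers = 2, nstart = 10)

# Compare the true group membership with the assigned cluster labels.
true_group <- rep(1:2, times = c(50, 20))
table(true = true_group, cluster = km$cluster)
```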

Re: [R] clustering

2005-01-27 Thread WeiWei Shi
:16 -0600, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: The cluster analysis should be able to handle that. I think if you know how many clusters you have, kmeans is ok, or the EM algorithm can also do that. On Thu, Jan 27, 2005 at 03:44:42PM -0500, WeiWei Shi wrote: Hi, I just get a question

Re: [R] clustering

2005-01-27 Thread WeiWei Shi
of gaussians, likelihood based approaches might be better. MASS (the book) has an example of fitting univariate mixture of gaussians using various optimizers. The code is even in $R_HOME/library/MASS/scripts/ch16.R. Andy From: WeiWei Shi Hi, thanks for reply. In fact, I tried both of them

Re: [R] clustering

2005-01-28 Thread WeiWei Shi
, the resulting distributions conditional on the intervals are not normals, but truncated normals! This is important if you try to check within group normality, unless you have strongly separated clusters (which does not seem to be the case). Christian On Fri, 28 Jan 2005, WeiWei Shi wrote

Re: [R] Begginer with R

2005-01-28 Thread WeiWei Shi
http://www.liacc.up.pt/~ltorgo/DataMiningWithR/ might be a good starting point. also the book: MASS (4th Ed) Check some previous lists and you will find some similar topics too. I have used R for just about 2 weeks. Not sure of your level of knowledge in data mining. but hope this helps. Ed On Fri,

[R] feature (attribute) selection

2005-02-02 Thread WeiWei Shi
Hi, there: Recently, I read some papers about feature or attribute selection and most of them are discussed in the context of supervised learning. I knew Weka has implementation on some of them but I am wondering if there is any package available in R which can do this kind of job. Thanks for

Re: [R] feature (attribute) selection

2005-02-02 Thread WeiWei Shi
-0500, Das, Rajdeep [EMAIL PROTECTED] wrote: Hi, Look for dprep package for wrapper based feature selection that use lda, knn etc. Also you can use package rfe that implements recursive feature elimination using SVM. -Original Message- From: WeiWei Shi To: R-help

[R] genetic algorithm

2005-02-04 Thread WeiWei Shi
Hi, I am doing some research on feature selection for classification problem using genetic algorithm in a wrapper approach. I am wondering if there is some package which is already built for this purpose. I was advised before about dprep package but I don't think it used GA there (if I am wrong,

[R] R or weka

2005-02-07 Thread WeiWei Shi
Hi, guys: These days I keep using R and Weka to do data mining. I think my next step is to open the source code so that I can customize it and make it better serve my purpose. But now I kinda hesitate to do so b/c I am really not sure which is better to start with. You know, both require some

[R] a syntax question: $V100

2005-02-14 Thread WeiWei Shi
Hi, there: I have a syntax question. I have a dataset x with 100 variables. I did not set the column name so I used x$V1...x$V100. For my case, I need to put the number (e.g. 20) into another variable, like index so that I can refer to x$V20 by using something like x$V(index) but I don't know how
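
The $ operator needs a literal name, so a column chosen at run time is usually accessed with [[ ]] instead. A minimal sketch with a three-column stand-in for the 100-variable dataset:

```r
# Small stand-in for the dataset x with default column names V1, V2, ...
x <- data.frame(V1 = 1:3, V2 = 4:6, V3 = 7:9)
index <- 2

# Build the column name from the number and index by name.
col_name <- paste0("V", index)
by_name <- x[[col_name]]

# Or index by position when the columns are in order.
by_position <- x[[index]]
```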

[R] gbm

2005-02-18 Thread WeiWei Shi
Hi, there: I am always experiencing the scalability of some R packages. This time, I am trying gbm to do adaboosting on my project. Initially I tried to grow trees by using rpart on a dataset with 200 variables and 30,000 observations. Now, I am thinking if I can apply adaboosting on it. I am

Re: [R] passing command line arguments to 'R CMD BATCH myScript.R'

2005-02-25 Thread WeiWei Shi
Hi, I recently solved the problem: I ran a program from Linux. The basic idea is using environment variables and ?Sys.getenv This is a general approach to calling R from a script file. Here is part of my code and explanation: # assign some values to the arguments dvar=5 categorical=3 #
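
The rest of the poster's script is cut off, so the R side of this approach is sketched here under assumptions: the variable names dvar and categorical come from the post, while the Sys.setenv() call only simulates what the calling shell would export before starting R.

```r
# Simulate the environment that the shell script would set up, e.g.
#   export dvar=5 categorical=3
# before running: R CMD BATCH myScript.R
Sys.setenv(dvar = "5", categorical = "3")

# Inside the R script: environment variables arrive as strings.
dvar <- as.numeric(Sys.getenv("dvar"))
categorical <- as.numeric(Sys.getenv("categorical"))

dvar + categorical
```

Sys.getenv() returns an empty string for an unset variable, so a real script would probably check for that before converting.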

[R] cda

2005-02-25 Thread WeiWei Shi
Hi, there: I am wondering if I can get some general help or source about canonical discriminant analysis in R. My idea is trying to linearly combine 300 variables in a supervised way (according to the class labels of the observations). I think it is kinda like PCA to do some dimensionality reduction work,

[R] repost my question of cda in case

2005-03-01 Thread WeiWei Shi
Dear R-helpers: I sent this question 3 days ago but I didn't get any reply. In case this question was somehow not seen by people who happened to know the answer, I repost it here. Sorry for the bother but I am kind of needing some help. BTW, if the question itself was not well expressed, please let

[R] gbm

2005-03-04 Thread WeiWei Shi
Hi, there: Is there anyone who read the codes for gbm package before? Before i sent this email, I also sent an email to ask for help from the author, Greg. But still I am wondering if someone here can share some understanding like the roadmap or document on the implementation too. Thanks, Ed.

[R] regression modeling

2006-04-24 Thread Weiwei Shi

Re: [R] regression modeling

2006-04-25 Thread Weiwei Shi
vary as the dataset gets larger and larger? -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Weiwei Shi Sent: Monday, April 24, 2006 12:45 PM To: r-help Subject: [R] regression modeling Hi, there: I am looking for a regression modeling

[R] process monitoring, a simple question

2006-05-01 Thread Weiwei Shi
this onto the screen: step1 is running... step1 is done. and so on. Thanks. -- Weiwei Shi, Ph.D Did you always know? No, I did not. But I believed... ---Matrix III [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list
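
Progress messages of this kind can be printed with cat(), flushing the console so they appear immediately. A minimal sketch follows; the function name run_steps is hypothetical, and the comment marks where the real work of each step would go.

```r
# Hypothetical wrapper that reports progress for each step.
run_steps <- function(n_steps) {
  for (i in seq_len(n_steps)) {
    cat(sprintf("step%d is running...\n", i))
    flush.console()  # force the message out on consoles that buffer output
    # ... the actual computation for step i would go here ...
    cat(sprintf("step%d is done.\n", i))
  }
  invisible(NULL)
}

run_steps(2)
```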

Re: [R] boosting - second posting

2006-05-30 Thread Weiwei Shi

[R] when dimensionality is larger than the number of observations?

2006-05-30 Thread Weiwei Shi
Hi, there: Can anyone here kindly point some good reference or links on this topic? Esp. some solutions from BioConductor or R, when dealing with microarray-like, fat data? thanks, -- Weiwei Shi, Ph.D Did you always know? No, I did not. But I believed... ---Matrix III [[alternative

[R] time series clustering

2006-06-02 Thread Weiwei Shi
? Thanks, Weiwei -- Weiwei Shi, Ph.D Did you always know? No, I did not. But I believed... ---Matrix III [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read

[R] a statistics question

2006-04-07 Thread Weiwei Shi
am also wondering if R has already some function or package addressing this kind of problem. Thanks -- Weiwei Shi, Ph.D Did you always know? No, I did not. But I believed... ---Matrix III [[alternative HTML version deleted]] __ R-help

Re: [R] a statistics question

2006-04-07 Thread Weiwei Shi
of Computational and Graphical Statistics, 12, 475-511. and the LogicReg package. Peter Ehlers Weiwei Shi wrote: Hi there, I have a statistics question on a classification problem: Suppose I have 1000 binary variables and one binary dependent variable. I want to find a way similar

Re: [R] Normalization and missing values

2005-04-13 Thread WeiWei Shi
the way of scaling, IMHO, really depends on the distribution of each column in your original files. if each column in your data follows a normal distribution, then a standard normalization will fit your requirement. My previous research in microarray data shows me a simple linear standardization

[R] an interesting qqnorm question

2005-04-22 Thread WeiWei Shi
Hi, r-gurus: I happened to have a question in my work: I have a dataset, which has only one dimension, like 0.99037297527605 0.991179836732708 0.995635340631367 0.997186769599305 0.991632565640424 0.984047197106486 0.99225943762649 1.00555642128421 0.993725402926564 the data is saved in a
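
For reference, a short sketch of what qqnorm() returns, using only the nine values quoted above. Nine points are far too few to judge normality, so this only shows the mechanics; the Shapiro-Wilk test is an addition for the example and is not mentioned in the post.

```r
# The nine sample values quoted in the post.
kk <- c(0.99037297527605, 0.991179836732708, 0.995635340631367,
        0.997186769599305, 0.991632565640424, 0.984047197106486,
        0.99225943762649, 1.00555642128421, 0.993725402926564)

# qqnorm() returns a list: x = theoretical normal quantiles, y = the sample itself.
l <- qqnorm(kk, plot.it = FALSE)
str(l)

# A formal normality test; its result on nine points should be read with caution.
shapiro.test(kk)$p.value
```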

[R] Re: an interesting qqnorm question

2005-04-22 Thread WeiWei Shi
hope it is not b/c some central limit theory, otherwise my initial plan will fail :) On 4/22/05, WeiWei Shi [EMAIL PROTECTED] wrote: Hi, r-gurus: I happened to have a question in my work: I have a dataset, which has only one dimension, like 0.99037297527605 0.991179836732708

[R] have to point it out again: a distribution question

2005-04-28 Thread WeiWei Shi
) are not gaussian at all. -- Vincent On 4/22/05, WeiWei Shi [EMAIL PROTECTED] wrote: hope it is not b/c some central limit theory, otherwise my initial plan will fail :) On 4/22/05, WeiWei Shi [EMAIL PROTECTED] wrote: Hi, r-gurus: I happened to have a question in my work

Re: [R] have to point it out again: a distribution question

2005-04-28 Thread WeiWei Shi
Of WeiWei Shi Sent: Thursday, April 28, 2005 1:38 PM To: Vincent ZOONEKYND Cc: R-help@stat.math.ethz.ch Subject: [R] have to point it out again: a distribution question Dear R-helpers: I pointed out my question last time but it is only partially solved. So I would like to point it out again

Re: [R] have to point it out again: a distribution question

2005-04-29 Thread WeiWei Shi
there? -Original Message- From: WeiWei Shi [mailto:[EMAIL PROTECTED] Sent: Thursday, April 28, 2005 4:18 PM To: Huntsinger, Reid Cc: R-help@stat.math.ethz.ch Subject: Re: [R] have to point it out again: a distribution question Here is summary of l <- qqnorm(kk) # kk is my sample l$y (which is my

Re: [R] have to point it out again: a distribution question

2005-04-29 Thread WeiWei Shi
PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of WeiWei Shi Sent: Friday, April 29, 2005 3:22 PM To: bogdan romocea Cc: R-help@stat.math.ethz.ch Subject: Re: [R] have to point it out again: a distribution question discretization from continuous domain to categorical one so that some data

[R] predict problem

2005-05-16 Thread Weiwei Shi
for the time being. Thanks, -- Weiwei Shi, Ph.D Did you always know? No, I did not. But I believed... ---Matrix III __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R

[R] predict() question

2005-05-17 Thread Weiwei Shi
in validation, it should complain). Maybe for robustness, predict() has to check first if there is new level or not. I am not sure if my understanding is right or not, please be advised! Thanks, -- Weiwei Shi, Ph.D Did you always know? No, I did not. But I believed... ---Matrix III
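
Whatever a given predict() method does internally, unseen factor levels can be checked for explicitly before predicting. A minimal sketch with made-up training and validation frames:

```r
# Toy data: level "c" appears only in the validation set.
train <- data.frame(f = factor(c("a", "b", "a", "b")))
valid <- data.frame(f = factor(c("a", "c")))

# Levels present in validation but never seen in training.
new_levels <- setdiff(levels(valid$f), levels(train$f))
new_levels

# Rows of the validation set that carry such a level.
rows_with_new <- which(valid$f %in% new_levels)
```

How to handle those rows (drop them, map them to an existing level, or refit) depends on the model and is left open here.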

[R] Re: text mining: ttda

2005-05-18 Thread Weiwei Shi
Can anyone suggest some good text mining reference or books? thanks, weiwei On 5/18/05, Weiwei Shi [EMAIL PROTECTED] wrote: Hi, I am working on a text mining project and i am interested in ttda package. however, I really cannot find the document for this package in English. Can anyone give

Re: [R] Re: text mining: ttda

2005-05-19 Thread Weiwei Shi
(levels(forms), ...) : ispell output : wrong size ... and I got the error message as above. Please be advised! Thanks, weiwei On 5/19/05, Jean-Pierre Muller [EMAIL PROTECTED] wrote: Dear Weiwei, Le 19 mai 05, à 00:17, Weiwei Shi a écrit : Can anyone suggest some good text mining

[R] help: reference book

2005-06-01 Thread Weiwei Shi
Hi, listers: I am really in need for some good books on financial market analysis, better with R. Can anyone help? Thanks. weiwei __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide!
