[R] variable importance in random forest
Hello,

In Breiman's papers on random forests, four variable importance measures are described. As far as I can tell, only two are available in the randomForest R package: the reduction in accuracy when the variable is permuted, and the mean decrease in the Gini index due to the variable (no permutation). Is this Gini measure computed on the training set or on the OOB cases?

In any event, Breiman actually seems to prefer a different measure, based on the average lowering of the margin across all cases when the variable is permuted. Is there any way to get this 'margin-based' variable importance measure from the result returned by the randomForest function? Or do I have to use the original Breiman code to get access to this measure? I am using randomForest package release 4.3.

many thanks
Murad Nayal

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
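For readers unfamiliar with the margin-based measure asked about above, here is a minimal base-R sketch of the arithmetic only. It assumes you already have, from some forest implementation, the out-of-bag vote fractions per class before and after permuting one variable; the function name `margin_of` and the vote numbers are made up for illustration and are not part of the randomForest package.

```r
# Margin of each case: vote fraction for the true class minus the
# largest vote fraction among the other classes.
# Assumes the columns of `votes` are in the same order as levels(truth).
margin_of <- function(votes, truth) {
  idx <- cbind(seq_len(nrow(votes)), as.integer(truth))
  true_frac <- votes[idx]
  votes[idx] <- -Inf            # mask the true class before taking the row maximum
  true_frac - apply(votes, 1, max)
}

truth <- factor(c("a", "b"), levels = c("a", "b"))
# Hypothetical OOB vote fractions before and after permuting one variable
votes_orig <- rbind(c(0.8, 0.2), c(0.3, 0.7))
votes_perm <- rbind(c(0.6, 0.4), c(0.5, 0.5))

# Importance of the permuted variable: average lowering of the margin
importance <- mean(margin_of(votes_orig, truth) - margin_of(votes_perm, truth))
```

A larger average drop in margin would indicate a more important variable; how to obtain the permuted vote fractions from a particular package is left open here.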
Re: [R] 64-bit R on Intel Xeon EM64T running fine
Roger D. Peng wrote:
> This is good news. As far as I know R has built for quite some time now on a number of 64 bit platforms (Linux on AMD Opteron/Athlon64, Solaris/Sparc)

I'll add the SGI/IRIX 64 bit platform to the list. I've been running a 64 bit-compiled R on an SGI Octane 2 for over a year now without a problem. R sessions often allocate around 8 GB of memory.

> but I can't recall seeing a build on Intel with the 64 bit extensions. By the way, did you happen to run `make check' just for kicks?
>
> -roger
>
> Michael Seewald wrote:
>> Dear mailing-list members,
>>
>> In the days of cheap RAM and microarray applications feasting on memory, 64-bit computers become more and more useful - to actually make use of memory beyond the magic 4GB border. I would like to report the success of running 64-bit R on an Intel Xeon EM64T machine under Linux. Just like on an AMD Opteron, R v2.0.0 compiles fine (and out of the box) and is happily allocating memory until RAM and swap reach their limit.
>>
>> Hardware:
>> - HP xw6200 workstation
>> - dual Intel Xeon 3.4GHz with hyper-threading enabled
>> - 4GB RAM, 4GB swap
>>
>> System: either
>> - Fedora Core 2 x86_64 bit Linux or
>> - Red Hat Enterprise Linux Workstation 3.0 x86_64 bit
>>
>> R:
>> - v2.0.0
>>
>> Really, no problems at all during setup, a big thank you to the R developers making this possible!
>>
>> Best wishes, Michael

--
Murad Nayal M.D. Ph.D.
Department of Biochemistry and Molecular Biophysics
College of Physicians and Surgeons of Columbia University
630 West 168th Street. New York, NY 10032
Tel: 212-305-6884 Fax: 212-305-6926
[R] debugging non-visible functions
Hi,

I would like to step through a non-visible function, but apparently I don't know enough about namespaces to get that to work:

> methods(predict)
... deleted lines ...
[27] predict.rpart*          predict.smooth.spline*
[31] predict.survreg.penal*

    Non-visible functions are asterisked

> debug(predict.rpart)
Error: Object predict.rpart not found

> getAnywhere(predict.rpart)
A single object matching 'predict.rpart' was found
It was found in the following places
  registered S3 method for predict from namespace rpart
  namespace:rpart
with value

function (object, newdata = list(), type = c("vector", "prob", "class", "matrix"), ...)
{
    ... deleted code ...
}
environment: namespace:rpart

> debug(predict.rpart, pos="package:rpart")
Error: Object predict.rpart not found

How can I 'debug' non-visible functions, like predict.rpart?

many thanks
Murad Nayal
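One way this is commonly handled, sketched here under the assumption of an R version with namespaces and the `:::` operator: refer to the function through its namespace when calling debug(). The example uses predict.smooth.spline from the stats namespace (also asterisked in the listing above) so that it runs without rpart installed; for the case in the question the analogous call would presumably be debug(rpart:::predict.rpart).

```r
# predict.smooth.spline is a registered S3 method in the stats namespace,
# so a bare debug(predict.smooth.spline) cannot find it on the search path.
# Reaching into the namespace with ::: gives debug() the function object.
debug(stats:::predict.smooth.spline)
was_flagged <- isdebugged(stats:::predict.smooth.spline)

# Equivalent lookup without :::, using getFromNamespace()
f <- getFromNamespace("predict.smooth.spline", "stats")
same_object <- identical(f, stats:::predict.smooth.spline)

# Remove the debug flag again so later predict() calls are not interrupted
undebug(stats:::predict.smooth.spline)
```

With the flag set, the next predict() call on a smooth.spline fit would drop into the browser at the first line of the method.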
Re: [R] silhoutte.default bugs
Martin Maechler wrote:
> Murad == Murad Nayal [EMAIL PROTECTED] on Wed, 21 Jan 2004 15:19:28 -0500 writes:
>
> Murad> This might have been fixed in later versions (I am using R1.7.0),
>
> yes, the bug has been fixed long ago, from my ChangeLog (!), it was 2003-07-18.

Sorry about that. I have been reluctant to upgrade recently for fear of disrupting my environment while in the middle of a project. As I mentioned, I searched the archive and found posts citing this problem but no replies stating that it has been fixed (the Nj=1 case).

> I'm still willing to consider your *feature request* (as opposed to bug fix) of allowing inputs where the grouping vector does contain other than 1:g .

That would be great. It is straightforward to do and will broaden the utility of silhouette. I'll send you the suggested patch privately.

best regards,
Murad
[R] silhoutte.default bugs
Hello all,

This might have been fixed in later versions (I am using R1.7.0); the r-help archive contains messages reporting similar problems but no reports of code fixes. I have encountered a couple of problems using the silhouette function. One occurs when the clustering contains clusters composed of 1 element (Martin Maechler posted code a few months ago that fixes a similar problem that occurs when clusters have only 2 elements, but not the case with 1 element). The other problem is due to silhouette's assumption that the clusters are numbered sequentially starting at 1. One of the clustering programs I use (snob) assigns more or less arbitrary integer ids to clusters, starting from 3! (Clusters 1 and 2 have special meaning in snob.) The modified code fixing both problems is included below; changes are commented.

best
Murad

silhouette.default <- function (x, dist, dmatrix, ...)
{
    cll <- match.call()
    if (!is.null(cl <- x$clustering))
        x <- cl
    n <- length(x)
    if (!all(x == round(x)))
        stop("`x' must only have integer codes")
    k <- length(clid <- sort(unique(x)))
    if (k <= 1 || k >= n)
        return(NA)
    if (missing(dist)) {
        if (missing(dmatrix))
            stop("Need either a dissimilarity `dist' or diss.matrix `dmatrix'")
        if (is.null(dm <- dim(dmatrix)) || length(dm) != 2 || !all(n == dm))
            stop("`dmatrix' is not a dissimilarity matrix compatible to `x'")
    }
    else {
        dist <- as.dist(dist)
        if (n != attr(dist, "Size"))
            stop("clustering `x' and dissimilarity `dist' are incompatible")
        dmatrix <- as.matrix(dist)
    }
    wds <- matrix(NA, n, 3, dimnames = list(names(x),
        c("cluster", "neighbor", "sil_width")))
    for (j in 1:k) {
        Nj <- sum(iC <- x == clid[j])
        #
        # the following line changed from wds[iC, "cluster"] <- j
        #
        wds[iC, "cluster"] <- clid[j]
        a.i <- if (Nj > 1) colSums(dmatrix[iC, iC])/(Nj - 1) else 0
        #
        # the following line changed from
        #   diC <- rbind(apply(dmatrix[!iC, iC], 2, function(r) tapply(r,
        #       x[!iC], mean)))
        #
        diC <- rbind(apply(cbind(dmatrix[!iC, iC]), 2,
            function(r) tapply(r, x[!iC], mean)))
        minC <- max.col(-t(diC))
        wds[iC, "neighbor"] <- clid[-j][minC]
        #
        # the following line changed from
        #   b.i <- diC[cbind(minC, seq(minC))]
        #
        b.i <- diC[cbind(minC, seq(along=minC))]
        s.i <- (b.i - a.i)/pmax(b.i, a.i)
        wds[iC, "sil_width"] <- s.i
    }
    attr(wds, "Ordered") <- FALSE
    attr(wds, "call") <- cll
    class(wds) <- "silhouette"
    wds
}

--
Murad Nayal M.D. Ph.D.
[R] model-based clustering
Hello,

I was wondering whether a Poisson mixture modeler/cluster analysis package is available for R. I scanned CRAN packages and couldn't find anything, but I thought I'd ask. If not, could anyone recommend a non-R open source package? I have found 'snob', but this program seems a bit hard to use in an automated, non-interactive fashion.

regards,
Murad

--
Murad Nayal M.D. Ph.D.
Re: [R] model-based clustering
Hello Murray,

thanks for the response. I would actually love to hear alternative suggestions about the problem I am trying to solve. I just thought a short question would be less of a burden on people's time and have a higher chance of being answered.

Basically, the data sets I need to analyze contain 2000-1 objects, each characterized by, depending on the data set, 9-20 attributes. All are integers greater than zero; typically the range is [0,1000], with numbers 5 particularly common. There is no a priori reason why these objects should cluster into discrete groups, and in fact when the data is explored graphically (xgobi) it doesn't show an obvious clustering pattern. However, with 9-20 dimensions involved, it is probably easy to miss subtle patterns.

I have tried clustering the data using a number of standard approaches including hclust, kmeans, fanny etc., but these methods didn't seem to be able to generate convincingly distinct, homogeneous clusters. Of course, given the type of the data involved, Poisson mixtures seem like the natural choice. I have experimented a bit with snob using contrived data sets (where you know which class objects really belong to) and it has been fairly promising, except maybe for snob's tendency to break the known classes into multiple subclasses.

I actually would like to try to code this in R. It would be very helpful to me, in fact, if you can contribute any code/code fragments/examples from your earlier work on this, either to the list or privately.

many thanks
Murad

[EMAIL PROTECTED] wrote:
> The list could probably be more useful if you gave more details about your data and the problem. I have written a bit of R code myself for fitting a finite mixture of univariate Poissons by EM and found it very simple to program in R. I suspect that your problem is multivariate, but that should not present any difficulties.
>
> The Snob program employs a fairly sophisticated model search strategy based on the Minimum Message Length criterion. If you do not know much about the solution that you are seeking it might be a good way to go. I appreciate that Snob can be rather complex to set up and get going but I think that you should be able to get quite a bit of help from the Monash University people behind the program. They are usually quite keen to encourage new users of Snob.
>
> Murray Jorgensen

--
Murad Nayal M.D. Ph.D.
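As an illustration of the kind of EM fit mentioned in the reply above, here is a minimal sketch for a finite mixture of univariate Poissons. It is not Murray Jorgensen's code; the function name `poisson_mix_em`, the quantile-based starting values, and the simulated data are all assumptions made for the example.

```r
# EM for a k-component univariate Poisson mixture (illustrative sketch).
poisson_mix_em <- function(x, k = 2, iters = 200, tol = 1e-8) {
  n <- length(x)
  # Crude start: spread the rates across quantiles of the data
  lambda <- as.numeric(quantile(x, probs = (seq_len(k) - 0.5) / k)) + 0.5
  pi_k <- rep(1 / k, k)
  loglik_old <- -Inf
  for (it in seq_len(iters)) {
    # E-step: n x k matrix of weighted component densities, then responsibilities
    dens <- matrix(sapply(seq_len(k), function(j) pi_k[j] * dpois(x, lambda[j])),
                   nrow = n)
    tot <- rowSums(dens)
    resp <- dens / tot
    # M-step: update mixing proportions and rates
    nk <- colSums(resp)
    pi_k <- nk / n
    lambda <- colSums(resp * x) / nk
    # Log-likelihood at the parameters used in this E-step
    loglik <- sum(log(tot))
    if (abs(loglik - loglik_old) < tol) break
    loglik_old <- loglik
  }
  list(pi = pi_k, lambda = lambda, loglik = loglik,
       cluster = max.col(resp, ties.method = "first"))
}

set.seed(1)
x <- c(rpois(300, 2), rpois(300, 20))   # simulated two-class counts
fit <- poisson_mix_em(x, k = 2)
```

For multivariate counts, one simple extension (assuming attributes are independent within a class) would replace dpois(x, lambda[j]) with a product of dpois terms over the attributes. Choosing the number of components is a separate problem that this sketch does not address.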
[R] SVM question
Hello all,

I am trying to use svm (from the e1071 package) to solve a binary classification problem. The two classes in my particular data set are unequally populated: class 'I' (for important) has about 3000 instances while class 'B' (for background) has about 20,000. Experimenting with different classifiers, I realized that in cases where such an asymmetry exists there is a danger of trivially inflating accuracy levels by biasing the classifier towards the more prevalent class. For example, using the numbers cited above, if the testing set maintains the same distribution of classes as the original data set, then you can get an accuracy of about 85% by simply classifying everything as a B: an unsatisfactory classifier given the 'importance' of detecting the I class. Which brings me to my question. I am trying to adjust for these issues by:

- using the class.weights parameter of svm: I couldn't quite get a sense of how to use this parameter from the svm help page (or the introductory papers on the libsvm web site). Is this supposed to be a vector of the priors for the two classes, i.e. c(I=.15,B=.85) (which gave me horrible coverage of the 'I' class)? Are there any 'correct' or conventional values to use for this parameter in cases of unequal sample sizes (for example, the 'complement' of the priors, c(I=0.85,B=0.15), on the grounds that these values will give the two classes in the dataset equal weights)? Or is it simply another tunable parameter?

- choosing training sets that contain randomly selected but equal numbers of cases of each class (and testing on the remaining cases; this is repeated to assess stability of the accuracy and coverage values). Here I get mediocre accuracy but respectable coverage of I.

This is not strictly an R question, but I thought someone on the list might have had recent experience with these types of problems and can offer some comments about such an approach.

many thanks

--
Murad Nayal M.D. Ph.D.
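A small sketch of the weighting arithmetic discussed above, using only the class counts quoted in the message. As far as the e1071 documentation describes it, class.weights is a named vector of per-class weights (default 1) rather than a vector of priors, so passing the priors would down-weight the rare class further. The inverse-frequency weights below are one common convention, not a recommendation from the package authors, and the svm() call is left commented out because no predictor matrix `x` is defined here.

```r
# Class labels with the sizes quoted in the question
y <- factor(rep(c("I", "B"), times = c(3000, 20000)), levels = c("I", "B"))
counts <- table(y)

# Accuracy obtained by always predicting the majority class
baseline_acc <- as.numeric(max(counts) / sum(counts))

# Inverse-frequency weights: each class contributes the same total weight
w <- setNames(as.numeric(sum(counts) / (length(counts) * counts)), names(counts))

# Hypothetical use with e1071 (x would be the matrix of predictors):
# if (requireNamespace("e1071", quietly = TRUE))
#   fit <- e1071::svm(x, y, class.weights = w)
```

Here w comes out at roughly 3.83 for I and 0.575 for B. In practice the weights are often treated as one more quantity to tune by cross-validation, judged by a metric such as per-class recall rather than raw accuracy.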
[R] graphics reset
Hello,

Is there a specific command to clear the graphics window? On occasion I need to construct plots using -only- commands that don't clear the graphics window (like text, lines and points etc.), and hence need to clear the graphics completely beforehand.

Also, is there a way to restore the graphics parameters to default values, say in those cases where you forgot to save the original values and want to restore the graphics to some sane state after a long R session?

many thanks

--
Murad Nayal M.D. Ph.D.
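A minimal sketch of the usual idioms for both questions, run on a throwaway pdf(NULL) device so it works non-interactively: plot.new() starts a fresh empty frame, and saving par(no.readonly = TRUE) up front allows the settings to be restored later. If the original values were never saved, closing the device with dev.off() and opening a new one is generally the simplest way back to the defaults.

```r
pdf(NULL)                          # throwaway device; nothing is written to disk
op <- par(no.readonly = TRUE)      # snapshot of all settable graphics parameters

par(mfrow = c(2, 2), mar = c(1, 1, 1, 1))
plot.new()                         # clears/advances to an empty frame with 0..1 coordinates
text(0.5, 0.5, "drawn without plot()")

par(op)                            # restore the snapshot
restored <- par("mfrow")
invisible(dev.off())               # a newly opened device would start from defaults
```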
Re: [R] kmeans error (bug?)
Prof Brian Ripley wrote:
> This is not a bug. It just means that the algorithm sometimes finds an empty cluster, and as you asked for 34 clusters and it had 33 or less it stops. What to do in this situation is currently under discussion, but the advice given is good: try another set of initial centres.

I am running kmeans in a loop for a range of possible cluster numbers. The error terminates the loop. Is there a mechanism by which I can 'trap' the error so that I can rerun kmeans with another set of initial centers and hence allow the loop to run to completion? Something like the try {} catch() mechanism of C++, for example. A flag for kmeans that would have it return, say, a NULL value rather than an error would also help in this type of application.

In fact, I wonder if anyone can point me to research, or better still R functions/package/recipe, that help in choosing the best number of clusters for the data. What I have tried so far is to do a manova using the clustering result from kmeans, plot the approximate F statistic and/or the p-value, and look for cluster numbers where a sharp increase in F or -log(pvalue) occurs. What I would like to do, but don't know how, is to formally compare successive clustering models. I know you can compare models using the R function anova, but anova does not seem to work with mlm models?

> Please do read the description of a bug in the R FAQ, and do not misuse the term to mean `something I do not understand'.

This wasn't really a declaration that this behavior is a bug; rather it was a question of whether it is (hence the question mark). I guess what I found somewhat confusing is that if kmeans was selecting data points at random as the initial cluster centers then, at least initially, none of these clusters would start out empty. It wasn't immediately clear how further refinement could result in clusters becoming empty.

thanks for the feedback

> On Mon, 10 Nov 2003, Murad Nayal wrote:
>> I have been getting the following intermittent error from kmeans:
>>
>> > str(cavint.p.r)
>>  num [1:1967, 1:13] 0.691 0.123 0.388 0.268 0.485 ...
>>  - attr(*, "dimnames")=List of 2
>>   ..$ : chr [1:1967] "6" "49" "87" "102" ...
>>   ..$ : chr [1:13] "HYD" "NEG" "POS" "OXY" ...
>> > set.seed(34)
>> > kmeans(cavint.p.r,centers=34)
>> Error: empty cluster: try a better set of initial centers
>>
>> the seed being equal to the number of centers in this case is just a coincidence. I've encountered the same error with or without setting the seed at different numbers of clusters. there is nothing particularly unusual about cavint.p.r (no NAs, NULLs), except maybe for the fact that the rows sum to 1.
>>
>> > sum(is.na(cavint.p.r))
>> [1] 0
>> > sum(is.nan(cavint.p.r))
>> [1] 0
>>
>> I thought kmeans should select initial centers from the data if not given explicitly! any idea what might be going wrong?
>
> And what makes you think it did not?
>
> --
> Brian D. Ripley,                  [EMAIL PROTECTED]
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595

--
Murad Nayal M.D. Ph.D.
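On trapping the error so the loop can continue: R's condition handling (try() and tryCatch()) covers this. The sketch below is one possible wrapper, with the helper name `kmeans_retry` and the simulated data made up for illustration; it retries with fresh random centers and returns NULL if every attempt fails.

```r
# Retry kmeans a few times, returning NULL instead of stopping on error.
kmeans_retry <- function(x, k, max_tries = 10) {
  for (i in seq_len(max_tries)) {
    # Each call draws a new random set of initial centers from the rows of x
    fit <- tryCatch(kmeans(x, centers = k), error = function(e) NULL)
    if (!is.null(fit)) return(fit)
  }
  NULL
}

set.seed(1)
x <- matrix(rnorm(200), ncol = 2)            # made-up data: 100 points in 2-D
fits <- lapply(2:5, function(k) kmeans_retry(x, k))
```

The loop over cluster numbers then runs to completion, and any NULL entries in `fits` mark values of k for which no attempt succeeded.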
[R] kmeans error (bug?)
Hello,

I have been getting the following intermittent error from kmeans:

> str(cavint.p.r)
 num [1:1967, 1:13] 0.691 0.123 0.388 0.268 0.485 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:1967] "6" "49" "87" "102" ...
  ..$ : chr [1:13] "HYD" "NEG" "POS" "OXY" ...
> set.seed(34)
> kmeans(cavint.p.r,centers=34)
Error: empty cluster: try a better set of initial centers

The seed being equal to the number of centers in this case is just a coincidence. I've encountered the same error with or without setting the seed, at different numbers of clusters. There is nothing particularly unusual about cavint.p.r (no NAs, NULLs), except maybe for the fact that the rows sum to 1.

> sum(is.na(cavint.p.r))
[1] 0
> sum(is.nan(cavint.p.r))
[1] 0

I thought kmeans should select initial centers from the data if not given explicitly! Any idea what might be going wrong? I am running R 1.7.0.

many thanks
Murad
[R] clustering distributions
Hi,

I have a dataset where each case is characterized by a histogram. I would like to cluster these cases using a sensible distance measure, possibly relative entropy. Is there a way I can use R facilities (hclust etc.) to do this? I couldn't find an alternative to dist that would compute something like relative entropies.

thanks
Murad
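One possible approach, sketched with made-up histograms: hclust accepts any dissimilarity wrapped with as.dist(), so a pairwise matrix of symmetrised relative entropies can be computed by hand. The symmetrisation (KL in both directions, summed) and the small constant `eps` added to avoid log(0) are choices made for this example, not the only options.

```r
# Symmetrised Kullback-Leibler divergence between two histograms
sym_kl <- function(p, q, eps = 1e-10) {
  p <- (p + eps) / sum(p + eps)     # normalise counts to probabilities
  q <- (q + eps) / sum(q + eps)
  sum(p * log(p / q)) + sum(q * log(q / p))
}

# Made-up data: one histogram (bin counts) per row
h <- rbind(a = c(10, 5, 1, 0), b = c(9, 6, 1, 0),
           c = c(0, 1, 5, 10), d = c(1, 0, 6, 9))

n <- nrow(h)
d <- matrix(0, n, n, dimnames = list(rownames(h), rownames(h)))
for (i in seq_len(n)) for (j in seq_len(n)) d[i, j] <- sym_kl(h[i, ], h[j, ])

hc <- hclust(as.dist(d), method = "average")
groups <- cutree(hc, k = 2)
```

Because this divergence is not a metric, linkage methods that assume Euclidean geometry (such as Ward's) are probably best avoided with it.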
[R] anova model refinement/clustering question
Hi,

I am trying to refine models of a continuous response variable and a number of categorical predictor variables. I know of some model refinement tools available in R that help in the selection of model terms, like dropterm and addterm from MASS etc. However, I would also like to try to refine the model by 'coalescing' some levels of some of the predictor factors. Is there a standard procedure / R functions that will allow me to do this?

This might be naive, but I thought that one way to do this is to perform a pairwise comparison between all levels, say using TukeyHSD, and coalesce levels that do not have a statistically significant difference in the average of the response variable between them. So in a way this becomes a clustering problem. Is there a relatively easy way to do this in R, say short of trying to figure out how to make the relevant TukeyHSD output look like a dist object and trick hclust into using it?

I am somewhat of an amateur in the field (and R) and I am probably making that obvious. Any guidance to the 'right' path to approach this (privately or on the list) is really appreciated.

many thanks
Murad

--
Murad Nayal M.D. Ph.D.
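The idea described above (turning TukeyHSD output into a dist object for hclust) can be sketched as follows on made-up data. This only illustrates the mechanics: treating a non-significant difference as evidence that two levels are equal is statistically debatable, and the 0.05 cut-off is an assumption of the example. Level names must not contain "-" for the row-name parsing to work.

```r
# Made-up data: levels A,B share (nearly) one mean, C,D another
g <- factor(rep(c("A", "B", "C", "D"), each = 20))
noise <- rep(seq(-1, 1, length.out = 20), times = 4)     # fixed, artificial noise
y <- unname(c(A = 0, B = 0.1, C = 3, D = 3.1)[as.character(g)]) + noise

tk <- TukeyHSD(aov(y ~ g))$g          # rows are named like "B-A"; column "p adj"

# Fill a symmetric matrix of adjusted p-values, indexed by level names
lv <- levels(g)
p <- matrix(1, length(lv), length(lv), dimnames = list(lv, lv))
pr <- do.call(rbind, strsplit(rownames(tk), "-", fixed = TRUE))
p[pr] <- tk[, "p adj"]
p[pr[, 2:1]] <- tk[, "p adj"]

# Dissimilarity 1 - p; complete linkage cut at 0.95 keeps together only
# levels whose pairwise adjusted p-values all exceed 0.05
hc <- hclust(as.dist(1 - p), method = "complete")
merged <- cutree(hc, h = 0.95)
```

The resulting `merged` vector could then be used to recode the factor, for example with levels(g) <- merged[levels(g)], before refitting the model.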
[R] multiple character matching within a string
Hello all,

I need to count the number of times certain characters occur in a string. The only way I have found so far to accomplish this is by using strsplit, i.e.

my.string <- "DDDRRHIH"
my.char <- "D"
num.char <- -1 + length(unlist(strsplit(my.string, my.char)))

Now you probably won't be surprised if I say that this has proven to be extremely slow (I am not sure exactly why, though; is it because strsplit creates a new list for every call?). Is there an alternative way to do this short of going to compiled code?

many thanks,

--
Murad Nayal M.D. Ph.D.
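Two vectorised alternatives, sketched for the single-character case (the helper name `count_char` is made up). Note that the strsplit-based count above also miscounts when the character occurs at the end of the string, since strsplit drops a trailing empty piece; the versions below do not have that problem.

```r
# Count occurrences of a single character by measuring how much shorter
# the string gets when that character is deleted. Vectorised over s.
count_char <- function(s, ch) nchar(s) - nchar(gsub(ch, "", s, fixed = TRUE))

n_d   <- count_char("DDDRRHIH", "D")
n_vec <- count_char(c("DDDRRHIH", "RRD", "HHH"), "D")

# All character frequencies at once: split into single characters and tabulate
tab <- table(strsplit("DDDRRHIH", "")[[1]])
```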
[R] EMACS/ESS problems
Hello all,

since we're on the topic of R editors: I am using emacs/ess on a unix workstation to interact with R and have been having a little problem. I usually write the R commands I need to run in a separate buffer, then copy and paste them into the *R* buffer for evaluation. The problem is, if any command is spread over multiple lines, emacs/R hangs when I paste it in the R buffer for evaluation. If I use a debugger to see what's going on in both programs, they're usually waiting on a select statement (input/output). Has anybody had to deal with a similar situation? Any advice for a workaround? Both emacs and ess are relatively recent versions (installed a few months ago). I tried using ess-eval-buffer/region instead of cutting and pasting and the same thing happens for me.

many thanks

--
Murad Nayal M.D. Ph.D.
Re: [R] EMACS/ESS problems
Hi,

A.J. Rossini wrote:
> 1. I've never seen this behavior, ever. Do you get the same with C-c C-r (highlight region, then C-c C-r sends to the R process in Emacs). Or, if you use C-c C-n to step through the lines?

Maybe my environment is not set up correctly. C-c C-r doesn't do anything:

highlight: (either in a text buffer or *ESS*)
v = c(1, 2, 3)
C-c C-r
switch to *R*
> v
Error: Object v not found

OR

ess-eval-region prints in the one-line buffer at the bottom: Starting evaluation... and hangs (I have to stop it with C-g).

Now if v was defined on one line
v=c(1,2,3)
ess-eval-region returns with Finished evaluation. but the v vector is still not defined in R.

I am sure I am doing something silly, just can't figure out what.

> 2. [EMAIL PROTECTED] might be a better place to send this.

Thanks for the advice. I'll try that too.

> --
> [EMAIL PROTECTED]   http://www.analytics.washington.edu/
> Biomedical and Health Informatics, University of Washington
> Biostatistics, SCHARP/HVTN, Fred Hutchinson Cancer Research Center
> UW (Tu/Th/F): 206-616-7630 FAX=206-543-3461 | Voicemail is unreliable
> FHCRC (M/W): 206-667-7025 FAX=206-667-4812 | use Email
Re: [R] EMACS/ESS problems
That was exactly what I was missing. Everything now works as advertised. Thank you all so much for the help. You just turned my already very satisfying experience using R into an even more enjoyable one.

all the best

Rich Heiberger wrote:
> It looks like you are not using ESS correctly. ESS is designed to work from a buffer containing a file whose name has the .r extension. Thus, open a file, for example, C-x C-f myfile.r and then start using R.
>
> My diagnosis is based on your line
>     highlight: (either in a text buffer or *ESS*)
> ESS won't work from a text buffer because R code has different requirements from ordinary written paragraphs in English or another natural language. The *ESS* buffer is part of the background mechanism that makes ESS work. It is not intended that a user ever look at the *ESS* buffer.
>
> One other issue that your original email suggests. We do not recommend the statement
>     v=c(1,2,3)
> for assignment. It is much better to use the assignment arrow
>     v <- c(1,2,3)
> (with spaces on both sides of the arrow for legibility). It is true that the = will usually do what you expect, but there are some subtle differences (mostly in argument lists to functions). While I can expand on the reasons, for the moment I just want to suggest that you get into the habit of using the assignment arrow.
>
> Rich

--
Murad Nayal M.D. Ph.D.
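The "subtle differences in argument lists" mentioned in the quoted advice can be seen in a small sketch (the wrapper name `demo_assign` is made up): inside a call, `=` only names an argument, whereas `<-` performs an assignment in the caller's environment and then passes the value.

```r
demo_assign <- function() {
  m1 <- mean(x = 1:10)                      # `=` here just matches the argument named x
  before <- exists("x", inherits = FALSE)   # no local variable x was created
  m2 <- mean(x <- 1:10)                     # `<-` creates a local x, then passes it on
  after <- exists("x", inherits = FALSE)    # now a local x exists
  c(before = before, after = after, same_mean = (m1 == m2))
}
res <- demo_assign()
```

Both calls return the same mean; only the second leaves a variable behind.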
[R] lattice question
Hello,

I am using lattice to plot histograms of one variable conditioned on another continuous variable. For this I am using equal.count on the conditioning variable to get the appropriate shingle. I would like to have in my plot a representation of the shingle's intervals, including the min/max values and maybe tick marks: some sort of axis for the conditioning variable. While the 'strips' of the lattice plot do represent the shingle intervals as a darkly shaded region, I can't find a way to also include in the plot the actual min/max numbers corresponding to the shingle's intervals. Is that possible?

regards

--
Murad Nayal M.D. Ph.D.
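As a partial answer sketch: the interval endpoints themselves can be computed in base R with co.intervals() from the graphics package, which the lattice documentation describes equal.count as wrapping. The data below are made up, and how best to place the resulting labels inside a lattice plot (for instance through a custom strip function) is left open here.

```r
set.seed(1)
z <- runif(200, 0, 100)                       # made-up conditioning variable

# One row per interval: lower and upper endpoint of each equal-count slice
iv <- co.intervals(z, number = 4, overlap = 0.5)
colnames(iv) <- c("min", "max")

# Text labels that could be used to annotate each panel
labels <- sprintf("[%.1f, %.1f]", iv[, "min"], iv[, "max"])
```

With lattice loaded, levels() applied to the shingle returned by equal.count(z, number = 4, overlap = 0.5) should, as far as I know, give the same intervals.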
[R] help on barplot
Hello,

I am trying to compare two histograms using barplot. The idea is to plot the histograms as pairs of columns side by side for each x value. I was able to do it using barplot before, but I can't remember now for the life of me how I did it in the past:

> d
              [,1]         [,2]
-37.5 0.00         2.789396e-05
-32.5 0.0001394700 5.578801e-05
-27.5 0.0019804742 1.732218e-02
-22.5 0.0217294282 1.380474e-01
-17.5 0.0938912134 4.005579e-02
-12.5 0.0630683403 4.351464e-03
-7.5  0.0163179916 8.368201e-05
-2.5  0.0025941423 5.578801e-05
2.5   0.0002789400 0.00e+00
7.5   0.00         0.00e+00
> barplot(d,beside=TRUE)

barplot here plots two separate 'sets' of columns: on the left side a bar plot of d[,1] is plotted, while on the right side a separate bar plot of d[,2] is plotted. How can I combine the two?

Actually, while on the subject of histograms: is it possible to plot a 3D histogram in R (a true 3D bar plot, without using image)?

many thanks
Murad

--
Murad Nayal M.D. Ph.D.
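With beside=TRUE, barplot groups the bars by column of its matrix argument, so the usual fix for the first question is to transpose: each x value then becomes a group holding one bar per histogram. The sketch below rebuilds `d` from the values printed above (the truncated zero entries are entered as 0) and draws on a throwaway pdf(NULL) device; the legend labels are placeholders.

```r
d <- cbind(
  c(0, 0.0001394700, 0.0019804742, 0.0217294282, 0.0938912134,
    0.0630683403, 0.0163179916, 0.0025941423, 0.0002789400, 0),
  c(2.789396e-05, 5.578801e-05, 1.732218e-02, 1.380474e-01, 4.005579e-02,
    4.351464e-03, 8.368201e-05, 5.578801e-05, 0, 0))
rownames(d) <- seq(-37.5, 7.5, by = 5)

pdf(NULL)                                    # throwaway device for a non-interactive run
# t(d) has one column per x value, so each group is a side-by-side pair
mids <- barplot(t(d), beside = TRUE, legend.text = c("first", "second"))
invisible(dev.off())
```

The returned `mids` matrix holds the bar midpoints (2 rows, 10 columns), which is handy for adding text or lines on top of the bars.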
Re: [R] Formal definitions of R-language.
Some comments. I am still learning S/R, so please let me know if I am missing something.

M.Kondrin wrote:
> Hello! Some CS-guys (the type who knows what Church formalism is) keep asking me questions about formal definitions of R-language that I can not answer (or even understand). Is there some freely available papers which I can throw at them where it would be explained is R functional/OOP/procedural language,

R is object oriented in the sense that the basic entities of the language (data elements, language elements etc.) are complex entities with type and defined behavior (objects). It has inheritance, polymorphism, operator overloading etc. You can define constructors for objects, but as far as I know no destructors (you can define clean-up routines on libraries though).

It has elements of being a functional language, like the fact that programs are composed of expressions that are turned into function objects that get evaluated. It also supports lambda expressions. It is not a pure functional language because functions can and do have side effects, it has persistent state and assignments, and it has flow-of-control statements. Also, recursion, as far as I know, is inefficient in S/R, which tends to discourage purely functional programming.

> does it use weak/strong,

Weak typing. Variables are not typed; objects keep their type information once constructed. Conversion between types is often automatic or can be programmed to be so, hence operations on disparate types can often be carried out.

> dynamic/static typization,

It is dynamically typed. Objects carry and supply type information at run time. Types (as well as behavior) can (only) be defined at run time. The S evaluator has to start first; it then constructs class and function definitions in the run-time environment. Object type can be changed, modified and augmented at run time, at least with old-style classes (can you add or remove slots in the new-style classes?).

> does it use lazy or ...(do not know what) evaluation,

It uses lazy evaluation of function arguments: the argument expressions are set up by the evaluator, but not evaluated until needed.

> what sort of garbage collector it uses?

R does have a garbage collector (it can be invoked explicitly with gc()); objects that are no longer referenced are discarded automatically.

> Thanks.

--
Murad Nayal M.D. Ph.D.
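The lazy-evaluation point can be checked directly with a small sketch (function names made up): an argument that is never used is never evaluated, so even an expression that would raise an error is harmless. The last lines show the dynamic typing mentioned above, with the same variable bound to objects of different types.

```r
# `b` is only evaluated if the else branch is taken
pick <- function(a, b) if (a) "b was never evaluated" else b

lazy_ok <- pick(TRUE, stop("this error is never raised"))

# Forcing the argument does trigger the error, which we trap here
forced <- tryCatch(pick(FALSE, stop("raised when b is needed")),
                   error = function(e) "error on evaluation")

# Variables carry no type of their own; the objects they point to do
v <- 1L
type1 <- class(v)
v <- "now a string"
type2 <- class(v)
```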
Re: [R] help on R programming.
Thank you all for the replies; they have been very helpful.

regards

Prof Brian Ripley wrote:

> On Mon, 23 Jun 2003, Murad Nayal wrote:
>
> > - what is the correct way to -remove- a component from a list. this seems to do the trick: list[[1]] = NULL, however, you'd think this should simply attach a NULL object at the first component position?
>
> This is in the FAQ, section 3.3.3, and is an S/R difference that catches people quite often. It's related to the difference between [] and [[]].
>
> Generally you will find that it is better to program by generating whole lists with lapply() or to copy lists retaining what you want (which does not copy the components, in general, and so is cheap).
>
> As for your comments on books: `S Programming' does discuss the design of classes (both informal and formal), the main data structures in R. As others have said, the Green Book (Chambers, 1998) is by no means out of date, except in the sense that the precise language it describes has never been available: it is not a description of any version of S-PLUS nor R.
>
> Generally, though, you need to make sure you have at your fingertips the resources which come with R: the various manuals (including R-lang) and the on-line help. For example, I have just spent several days documenting in the help pages exactly how subscripting of data frames works (and correcting dozens of anomalies and bugs).
>
> --
> Brian D. Ripley, [EMAIL PROTECTED]
> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
> University of Oxford, Tel: +44 1865 272861 (self)
> 1 South Parks Road, +44 1865 272866 (PA)
> Oxford OX1 3TG, UK Fax: +44 1865 272595

--
Murad Nayal
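A short sketch of the [] versus [[]] difference mentioned in the reply, together with the "copy what you want" and lapply() styles it recommends. The list contents and variable names are made up for illustration.

```r
x <- list(a = 1, b = 2, c = 3)

# Assigning NULL with [[ ]] removes the component.
removed <- x
removed[["a"]] <- NULL
n_removed <- length(removed)

# Assigning list(NULL) with [ ] keeps the slot and stores NULL in it.
kept <- x
kept["a"] <- list(NULL)
n_kept <- length(kept)

# Copy the list, retaining only the components you want.
wanted <- x[c("b", "c")]

# Generate a whole list in one go with lapply().
squares <- lapply(1:3, function(i) i^2)
```

Here `removed` has two components left, while `kept` still has three, the first of which is NULL.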
[R] help on R programming.
Hello all,

I am looking for books to help me gain a firmer grasp of the S/R programming language, programming, data structures, etc. It seems that for this purpose two books are typically recommended: Programming with Data: A Guide to the S Language, by John M. Chambers, and S Programming, by Venables and Ripley.

- The Chambers book was published in 1998. Is it a bit dated at this point?
- Is Venables and Ripley's book a good source on the design and manipulation of data structures in R (it seems mostly focused on R extensions)?
- Are there any other books, possibly published more recently, that you could recommend?

I also have a couple of particular programming questions:

- Coming from a C++/Java programming background, I find that I often end up in R with lists of objects (each constructed, in turn, as a list, say list(x=x,y=y,z=z)). Often these individual objects have recursive 'attributes', so a matrix representation of this set of objects is not an option, although a data.frame might be. I typically need to access certain attributes of these objects for plotting, analysis, etc. However, I have not been able to come up with a clean way to do this. E.g.

    object.list = list(o1=list(x=1,y=2,z=3), o2=list(x=11,y=22,z=33))

  What I would like to do is, say, get a vector of x values for the objects in object.list, but something like object.list[[1:length(object.list)]]$x, for example, returns NULL. Is there a better way to set up such an object list data structure that will allow me to do this?
- What is the correct way to -remove- a component from a list? This seems to do the trick: list[[1]] = NULL. However, you'd think this should simply attach a NULL object at the first component position?

Many thanks for any help.

--
Murad Nayal
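One possible way to pull a single attribute out of every object in such a list, sketched with sapply(). It reuses the object.list from the question; the data-frame variant assumes every attribute is a single value, which would not hold for the recursive attributes the question mentions.

```r
object.list <- list(o1 = list(x = 1, y = 2, z = 3),
                    o2 = list(x = 11, y = 22, z = 33))

# Apply an accessor to each object and simplify the result to a vector.
xs <- sapply(object.list, function(o) o$x)

# Same thing, passing the extraction operator itself as the function.
xs2 <- sapply(object.list, "[[", "x")

# If every attribute is a single value, the objects can also be stacked
# into a data frame with one row per object and one column per attribute.
df <- do.call(rbind, lapply(object.list, as.data.frame))
x_column <- df$x
```

Note that [[ ]] with a vector index descends into nested lists rather than selecting several components, which is why the attempt in the question does not return the x values.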