Re: [R] Arrange histogram
[EMAIL PROTECTED] wrote:

The data set has a number of variables each of which is classified into two groups. For each variable of each group, I need to create a histogram. All the histograms are to be lined up into a file that looks like

             group1      group2
Variable 1   histogram   histogram
Variable 2   histogram   histogram
...

Can you give me a hint as to what package I'd look into for help?

lattice is your friend.

Uwe Ligges

Thank you
Jue Wang, Biostatistician
Contracted Position for Preclinical Research Biostatistics
PrO Unlimited
(908) 231-3022

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
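As a minimal sketch of the lattice suggestion (the data frame `d` and its column names are made up for illustration and are not the poster's data), conditioning on both the group and the variable gives one panel per combination, with groups as columns and variables as rows:

```r
library(lattice)

set.seed(1)
# Hypothetical long-format data: one row per observation
d <- data.frame(
  value    = rnorm(400),
  variable = rep(c("Variable 1", "Variable 2"), each = 200),
  group    = rep(c("group1", "group2"), times = 200)
)

# The first conditioning variable (group) runs across columns,
# the second (variable) down rows; as.table = TRUE puts Variable 1 on top
p <- histogram(~ value | group * variable, data = d,
               layout = c(2, 2), as.table = TRUE)
print(p)
```

To send the result to a file, the print(p) call can be wrapped between a device call such as pdf() and dev.off().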
[R] Need help with barplots
I've read all the manuals and still couldn't find out what the difference is between stacked and side-by-side barplots. Could you explain it to me?
Re: [R] Need help with barplots
laba diena [EMAIL PROTECTED] writes:

I've read all the manuals and still couldn't find out what the difference is between stacked and side-by-side barplots. Could you explain it to me?

Did you try

par(ask=TRUE)
example(barplot)

--
Peter Dalgaard, Dept. of Biostatistics, University of Copenhagen
Øster Farimagsgade 5, Entr.B, PO Box 2099, 1014 Cph. K, Denmark
Ph: (+45) 35327918, FAX: (+45) 35327907
([EMAIL PROTECTED])
Re: [R] Need help with barplots
First, produce two barplots for comparison:

par(mfrow=c(2,1))
barplot(VADeaths, beside=TRUE)
barplot(VADeaths)

The same information is in both plots; in the top one, it is displayed as 5 separate bars for each group, and in the stacked plot it is shown as 5 separate regions in each of the four bars. The height of each of these regions is the same as the height of the corresponding bar in the side-by-side plot. The stacked plot enables you to see overall differences more easily (it is easier to see that the death rate is highest for Urban Males), but it is harder to compare the sizes of the categories.

On 13/10/06, laba diena [EMAIL PROTECTED] wrote:

I've read all the manuals and still couldn't find out what the difference is between stacked and side-by-side barplots. Could you explain it to me?

--
David Barron
Said Business School
University of Oxford
Park End Street
Oxford OX1 1HP
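The relationship between the two layouts can also be checked numerically. The following is only a small sketch using the built-in VADeaths matrix; the variable names are illustrative:

```r
# Total height of each stacked bar is the column sum of the matrix
totals <- colSums(VADeaths)
print(totals)

# barplot() returns the bar midpoints: a vector for the stacked plot
# (one bar per column), a matrix for the side-by-side plot (one bar per cell)
mids_stacked <- barplot(VADeaths)
mids_beside  <- barplot(VADeaths, beside = TRUE)

# The group with the tallest stacked bar
tallest <- names(which.max(totals))
print(tallest)
```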
Re: [R] C code for KalmnaLike
On Thu, 2006-10-12 at 10:57 -0400, Leeds, Mark (IED) wrote:

You shouldn't need it. Kalmanlike() (spelling), I think, is in the base package, and there is at least one contributed package, and probably many others, that do Kalman filtering, but I can't recall their names. Check out the list of packages at www.r-project.org.

Mark,

That pre-supposes that Malini just wants to perform Kalman filtering, and not look at the inner workings of the implementation in R.

KalmanLike is in package stats, distributed with base R, but it is defined as:

KalmanLike
function (y, mod, nit = 0, fast = TRUE)
{
    x <- .Call("KalmanLike", y, mod$Z, mod$a, mod$P, mod$T, mod$V,
        mod$h, mod$Pn, as.integer(nit), FALSE, fast = fast, PACKAGE = "stats")
    names(x) <- c("ssq", "sumlog")
    s2 <- x[1]/length(y)
    list(Lik = 0.5 * (log(x[1]/length(y)) + x[2]/length(y)), s2 = s2)
}
<environment: namespace:stats>

So there is not much use in reading the R code, as this just calls compiled code.

If Malini really does want to look at the C code for KalmanLike, then Uwe Ligges recently posted a preview of an article he is writing for R News, which explains how to access various parts of R's source code. The preview is still available from:

http://www.statistik.uni-dortmund.de/~ligges/R_Help_Desk_preview.pdf

The information contained in the article should allow Malini to find the C code for KalmanLike.

HTH,
G

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Malini Subramanian
Sent: Thursday, October 12, 2006 9:56 AM
To: R-help@stat.math.ethz.ch
Subject: Re: [R] C code for KalmnaLike

Hi, I am looking for the C code of Kalman filtering. Please can you help me? Thank you, bye.

--
Gavin Simpson                  [t] +44 (0)20 7679 0522
ECRC ENSIS, UCL Geography,     [f] +44 (0)20 7679 0565
Pearson Building,              [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London           [w] http://www.ucl.ac.uk/~ucfagls/cv/
London, UK. WC1E 6BT.          [w] http://www.ucl.ac.uk/~ucfagls/
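As a rough sketch of the inspection step described above, one can print the function and check which namespace it lives in before going to the C sources. The argument list printed by a current R version will likely differ from the 2006 listing quoted in this thread:

```r
# Fetch the function from the stats namespace and print its R-level definition
f <- stats::KalmanLike
print(f)

# The enclosing environment tells us which package to search in the R sources
pkg <- environmentName(environment(f))
print(pkg)

# A crude check for whether the body hands off to compiled code
uses_compiled <- any(grepl(".Call", deparse(body(f)), fixed = TRUE))
print(uses_compiled)
```

If the body is mostly a call into compiled code, the C implementation has to be located in the source tree of the package named by `pkg`.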
Re: [R] rarefy a matrix of counts
I thought at first that you could use a weighted sample (the sample function), but you can't, since it doesn't take proper account of replacement if you try that.

You can use the list approach, but through the power of R you don't need a lot of loops to do it. I can't speak for the efficiency of this approach in terms of CPU cycles.

In short:

apply(z2, 2, function(x) sample(rep(names(x), x), 100))

In long:

# let's load the data:
z = scan("", "", sep="\n")
sample.1 sample.2 sample.3
red.candy 400 300 2500
green.candy 100 0 200
black.candy 300 1000 500

# and turn it into a table
z2 = read.table(textConnection(z), header=TRUE, row.names=1)

# let's create a function to expand a sample column into individuals:
expand <- function(x) rep(names(x), x)

# test it on a smaller set:
ex <- expand(c(red = 2, blue = 3))
ex
[1] "red"  "red"  "blue" "blue" "blue"

# and sample 2 things from that:
sample(ex, 2)

# combine the two
samplex <- function(x, size) sample(expand(x), size)
samplex(c(red = 2, blue = 3), size = 2)

# ok, now we use the apply function to apply this to each column
apply(z2, 2, samplex, size = 2)

# you wanted 100?
apply(z2, 2, samplex, size = 100)

# all done.

# You should note that if there are fewer than 100 (the sample size) candies in any given sample, this function will fail.
# e.g.:
apply(z2, 2, samplex, size = 2000)
Error in sample(length(x), size, replace, prob) : cannot take a sample larger than the population when 'replace = FALSE'

-Alex

On 11 Oct 2006, at 15:10, Brian Frappier wrote:

Hi Petr,

Thanks for your response. I have data that looks like the following:

              sample 1   sample 2   sample 3
red candy     400        300        2500
green candy   100        0          200
black candy   300        1000       500

I don't want to randomly select either the samples (columns) or the candy types (rows), which sample, as you state, would allow me to do. Instead, I want to randomly sample 100 candies from each sample and retain info on their associated type.
I could make a list of all the candies in each sample:

sample 1
red
red
red
red
green
green
black
red
black
...

and then randomly sample those rows. Repeat for each sample. But I am not sure how to do that without a lot of loops, and am wondering if there is an easier way in R. Thanks! I should have laid this out in the first email... sorry.

On 10/11/06, Petr Pikal [EMAIL PROTECTED] wrote:

Hi

I am not experienced in Matlab, and from your explanation I do not understand exactly what you want. It seems that you want to randomly choose a sample of 100 rows from your matrix, which can be achieved with sample.

DF <- data.frame(rnorm(100), 1:100, 101:200, 201:300)
DF[sample(1:100, 10), ]

If you want to do this several times, you need to save your result, and then it depends on what you want to do next. One suitable form is a list of matrices, the other is an array, and you can use a for loop to fill it.

HTH
Petr

On 10 Oct 2006 at 17:40, Brian Frappier wrote:

Date sent: Tue, 10 Oct 2006 17:40:47 -0400
From: Brian Frappier [EMAIL PROTECTED]
To: r-help@stat.math.ethz.ch
Subject: [R] rarefy a matrix of counts

Hi all,

I have a matrix of counts for objects (rows) by samples (columns). I aimed for about 500 counts in each sample (I have about 80 samples) and would now like to rarefy these down to 100 counts in each sample using simple random sampling without replacement. I plan on rarefying several times for each sample.

I could do the tedious looping task of making a list of all objects (with their associated identifiers) in each sample and then use the wonderful sampling package to select a sub-sample of 100 for each sample and thereby get a logical vector of inclusions. I would then regroup the resulting logical vector into a vector of counts by object, rinse and repeat several times for each sample.

Alternately, using the same list, I could create a random index of integers between 1 and the number of objects for a sample (without repeats) and then select those objects from the list.
Again, rinse and repeat several times for each sample.

Is there a way to directly rarefy a matrix of counts without having to create a list of objects first? I am trying to switch to R from Matlab and am trying to pick up good programming habits from the start.

Much appreciation!
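The approach in this thread returns the sampled labels; the original question also asks for the result regrouped into counts per object. A minimal sketch of that extra step follows, where the function name `rarefy_counts` and the example counts are hypothetical and chosen only for illustration:

```r
# Rarefy each column of a count matrix to `size` individuals,
# sampling without replacement, and return a matrix of counts.
rarefy_counts <- function(counts, size) {
  out <- apply(counts, 2, function(x) {
    # one entry per individual, labelled by its row index
    individuals <- rep(seq_along(x), x)
    # index explicitly so a length-one vector is not misread by sample()
    drawn <- individuals[sample.int(length(individuals), size)]
    # regroup the draws into counts per row
    tabulate(drawn, nbins = length(x))
  })
  rownames(out) <- rownames(counts)
  out
}

set.seed(1)
# Hypothetical counts: objects in rows, samples in columns
counts <- matrix(c(400, 100, 300,
                   300,   0, 1000,
                   2500, 200, 500),
                 nrow = 3,
                 dimnames = list(c("red", "green", "black"),
                                 c("sample.1", "sample.2", "sample.3")))

r <- rarefy_counts(counts, 100)
print(r)
```

As with the label-based version, this fails with an error if any column holds fewer individuals than `size`. Calling the function repeatedly gives independent rarefactions.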
[R] Need help with tables
I have a data file with 3 columns and I need to take only 2 of them. How do I do that?
Re: [R] Need help with tables
I'm not quite sure what you mean, but if you want to select columns of a data frame, have a look at help("[")

David

On 13/10/06, laba diena [EMAIL PROTECTED] wrote:

I have a data file with 3 columns and I need to take only 2 of them. How do I do that?

--
David Barron
Said Business School
University of Oxford
Park End Street
Oxford OX1 1HP
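As an illustration of what help("[") describes, here is a small sketch with a made-up three-column data frame (the names `dat`, `a`, `b` and `c` are hypothetical). A file read with read.table() yields a data frame that can be indexed the same way:

```r
# Hypothetical three-column data frame
dat <- data.frame(a = 1:3, b = c(2.5, 3.5, 4.5), c = letters[1:3])

by_position <- dat[, c(1, 3)]      # keep columns 1 and 3 by position
by_name     <- dat[, c("a", "c")]  # keep the same columns by name
dropped     <- dat[, -2]           # or drop the unwanted column

print(by_name)
```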
[R] bug: Editing function formals deletes the environment
First, here's the specific bug I have. Later I'll say why I care.

ls(zappo)
Error in try(name) : object "zappo" not found
# good.

f = function(zappo) { function(y) zappo + y }
g = f(1)
g(1)
[1] 2
formals(g)
$y

formals(g)$y

formals(g)$y = 2
g
function (y = 2) zappo + y
g(1)
Error in g(1) : object "zappo" not found

# looks like formals strips the environment off stuff. Anything I can do about this?

-Alex

Original question:

I'm trying to change the behaviour of a package, to simplify the interface. I'd rather not change the package, although I could. There's a hidden function whose defaults I wish to change. I'm using R 2.3.1 for Mac OS X. Upgrading is not an option.

This is what I do:

library(R2HTML)
# get the function to modify
x = getFromNamespace("HTML.data.frame", "R2HTML")
# change the default for an argument
formals(x)["Border"] = list(NULL)
# put the function back
assignInNamespace("HTML.data.frame", x, "R2HTML")
# test the function:
HTML(data.frame(1:2), file=stdout())
Error: could not find function "HTMLReplaceNA"

# what seems to be happening is that the formals function is stripping the namespace off the variable x. I can't tell why.
Re: [R] bug: Editing function formals deletes the environment
Ah, it's fixed in 2.4.0. I'll work around it.

-Alex

On 13 Oct 2006, at 11:19, Alex Brown wrote:

First, here's the specific bug I have. Later I'll say why I care.

f = function(zappo) { function(y) zappo + y }
g = f(1)
formals(g)$y = 2
g(1)
Error in g(1) : object "zappo" not found

# looks like formals strips the environment off stuff. Anything I can do about this?
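The thread does not show the workaround itself. One plausible sketch, offered as an assumption rather than as what the poster actually did, is to save the function's environment before editing the formals and to restore it afterwards. On versions where the bug is fixed the restore step is harmless:

```r
f <- function(zappo) function(y) zappo + y
g <- f(1)

# Remember the closure's environment before touching the formals
env <- environment(g)

formals(g)$y <- 2

# Restore the environment (a no-op where formals<- already preserves it)
environment(g) <- env

print(g())   # uses the new default y = 2 and finds zappo in env
print(g(1))
```

The same pattern would apply to a function fetched with getFromNamespace() before it is put back with assignInNamespace().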
[R] NA-handling in glm.fit?
Dear Sir or Madam,

I'm wondering if there is any routine or argument in the function 'glm.fit' that makes it handle NAs. The function 'glm' can handle NAs, but I can't make it work (or find anything written on this in the help files) with 'glm.fit'. Is it even possible in 'glm.fit'? How?

Thanks beforehand,

Fredrik Thuring, Business Researcher
Codan Forsikring, Gammel Kongevej 60, DK-1790 Copenhagen V
tele: +45 33 55 26 63, fax: +45 33 55 21 22
e-mail: [EMAIL PROTECTED]
http://www.codan.dk
[R] Comparing of two non-normal distributions
Dear R Users,

Suppose we are interested in comparing two non-normal distributions. Like the distributions of financial time series, they are negatively skewed with fat tails. Which test is best suited to this, and in which package? For goodness of fit, for example, the Kolmogorov-Smirnov test has its own shortcomings. Are there, for example, Anderson-Darling or Kuiper statistics in R?

I'm grateful for any reply.

Amir
[R] Fw: nested linear model; with common intercept
Dear R-help,

I posted this on 4 Oct but got no response (I wasn't even told to go away and do some more background reading ;) ). I am reposting it in the perhaps vain hope that someone with knowledge of the subject will reply, if only to point me in a different direction from the one I am now facing.

Earlier Posting:

I am sorry if this is more of a stats question than an R question, but I have found it difficult to get a clear answer by other means.

Q. Would it be wrong to specify a nested model and retain a common intercept, e.g.

lm(NH4 ~ Site/TideCode + 1)

I am aware (?) that my Site coefficients are now calculated relative to my reference Site (treatment contrasts), *but* that my TideCode levels now relate to their reference level within Site. Is that correct?

Thank you in advance for help.

Regards,
Mark Difford.

Mark Difford
Ph.D. candidate, Botany Department,
Nelson Mandela Metropolitan University,
Port Elizabeth, SA.
[R] Barplot legend position
Dear useRs,

I'm trying to create a barplot like so:

x=matrix(1:10,2,5)
barplot(x, leg=c("left","right"), besid=T)

The legend is placed in the default position, topright; however, the data are plotted there too. I tried controlling the legend position by adding x="topleft", but this results in an error that x matches multiple formal arguments. Leaving out the legend and making a separate call to legend leaves out the colors of the bars ...

Please advise,
Ingmar
[R] multiply two matrixes with the different dimension column by column
Dear all,

I would like to multiply two matrices with different dimensions, column by column. As an example, suppose I have two matrices X and Y as follows:

X <- matrix(1:12, nrow=4, ncol=3, dimnames=list(c("A","B","C","D"), c("stage1","stage2","stage3")))
Y <- matrix(1:28, nrow=4, ncol=7, dimnames=list(c("A","B","C","D"), c("site1","site2","site3","site4","site5","site6","site7")))

I would like to multiply the first column of the Y matrix (site1) by each of the columns in the X matrix. The product will then be three new columns (for example site1stage1, site1stage2 and site1stage3, or something like this) which I want to add to the Y matrix, and likewise for the remaining site columns. As my site (Y) dataset has too many columns, it's not easy to do this in Excel, and I'm looking for a command in R to prepare a new data frame for further analysis. I would greatly appreciate it if anybody could help me with this.

Thanks in advance
Majid

Majid Iravani
PhD Student
Swiss Federal Research Institute WSL
Research Group of Vegetation Ecology
Zürcherstrasse 111
CH-8903 Birmensdorf
Switzerland
Phone: +41-1-739-2693
Fax: +41-1-739-2215
Email: [EMAIL PROTECTED]
http://www.wsl.ch/staff/majid.iravani/
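No reply to this question appears in this digest. One possible sketch of the operation described, assuming that "multiply" means elementwise multiplication of each site column by each stage column (the names `prods` and `result` are illustrative), is:

```r
X <- matrix(1:12, nrow = 4, ncol = 3,
            dimnames = list(c("A", "B", "C", "D"),
                            c("stage1", "stage2", "stage3")))
Y <- matrix(1:28, nrow = 4, ncol = 7,
            dimnames = list(c("A", "B", "C", "D"),
                            paste0("site", 1:7)))

# For each site column, multiply every column of X by it elementwise.
# X * v recycles the length-4 vector v down each column of X.
prods <- do.call(cbind, lapply(colnames(Y), function(s) {
  p <- X * Y[, s]
  colnames(p) <- paste0(s, colnames(X))
  p
}))

# Append the 21 product columns to the original site columns
result <- as.data.frame(cbind(Y, prods))
print(dim(result))
```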
Re: [R] Bug in lowess
Frank Harrell wrote:

[...]

Thank you Brian. It seems that no matter what is the right answer, the answer currently returned on my system is clearly wrong. lowess()$y should be constrained to be within range(y).

Really? That assertion is offered without proof and I am pretty sure is incorrect. Consider

x <- c(1:10, 20)
y <- c(1:10, 5) + 0.01*rnorm(11)
lowess(x,y)
$x
 [1]  1  2  3  4  5  6  7  8  9 10 20

$y
 [1]  0.9983192  1.9969599  2.9960805  3.9948224  4.9944158  5.9959855
 [7]  6.9964400  7.9981434  8.9990607 10.0002567 19.9946117

Remember that lowess is a local *linear* fitting method, and may give zero weight to some data points, so (as here) it can extrapolate.

After reading what src/appl/lowess.doc says should happen with zero weights, I think the answer given on Frank's system probably is the correct one. Rounding error is determining which of the last two points is given zero robustness weight: on an i686 system both of the last two are, and on mine only the last is. As far as I can tell, in infinite-precision arithmetic both would be zero, and hence the value at x=120 would be found by extrapolation from those (far) to the left of it.

I am inclined to think that the best course of action is to quit with a warning when the MAD of the residuals is effectively zero. However, we need to be careful not to call things 'bugs' that we do not understand well enough. This might be a design error in lowess, but it is not, AFAICS, a bug in the implementation.

--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK, Fax: +44 1865 272595
[R] Mantel test!
Dear Sir/Madam,

I would like to ask you about the Mantel test in R. In the ade4 package, if I want to use mantel.rtest, I get the error message "Object of class 'dist' expected". As I already have two dissimilarity matrices, do I need to compute a distance measure again using this function? If not, could you please let me know which function/command I can use?

Thank you very much in advance and have a nice day!

Best regards,
Hossein

Hossein Moradi
PhD Student
Institute of Environmental Sciences
University of Zurich
Winterthurerstrasse 190
Switzerland
Tel: +41 1 635 61 18
Fax: +41 1 635 57 11
E-mail: [EMAIL PROTECTED]
Re: [R] NA-handling in glm.fit?
On Fri, 2006-10-13 at 12:51 +0200, Fredrik Thuring wrote:

Dear Sir or Madam,

I'm wondering if there is any routine or argument in the function 'glm.fit' that makes it handle NAs. The function 'glm' can handle NAs, but I can't make it work (or find anything written on this in the help files) with 'glm.fit'. Is it even possible in 'glm.fit'? How?

Thanks beforehand,
Fredrik Thuring, Business Researcher

glm() deals with NAs etc. via the na.action argument, which is missing from glm.fit. glm is a wrapper around glm.fit, so look at the code of glm and see how it handles NAs when generating the x and y arguments that go into glm.fit.

You should also look at ?glm, which points to ?na.omit, and look at options("na.action") to see what your setting currently is.

You'll probably want to run na.omit or na.exclude on a combination of the response and predictor matrix (the arguments you are passing as x and y to glm.fit) to remove incomplete cases.

HTH
G

--
Gavin Simpson                  [t] +44 (0)20 7679 0522
ECRC ENSIS, UCL Geography,     [f] +44 (0)20 7679 0565
Pearson Building,              [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London           [w] http://www.ucl.ac.uk/~ucfagls/cv/
London, UK. WC1E 6BT.          [w] http://www.ucl.ac.uk/~ucfagls/
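A minimal sketch of the last suggestion, using simulated data (the names `x`, `y`, `ok` and `fit` are illustrative). It uses complete.cases() to build the row filter, which is one simple way of removing incomplete cases, in place of calling na.omit on a combined object:

```r
set.seed(42)
# Simulated design matrix (with intercept column) and binary response
x <- cbind(Intercept = 1, x1 = rnorm(20))
y <- rbinom(20, 1, 0.5)

# Introduce some missing values
x[3, "x1"] <- NA
y[7] <- NA

# Keep only rows where both the predictors and the response are observed
ok <- complete.cases(x, y)

fit <- glm.fit(x[ok, , drop = FALSE], y[ok], family = binomial())
print(fit$coefficients)
```

The coefficients should agree with those from glm() on the same data with the default na.action of na.omit.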
Re: [R] Barplot legend position
For example:

x=matrix(1:10,2,5)
barplot(x, besid=T)
legend("topleft", c("left","right"), density=c(0,1000))

2006/10/13, Ingmar Visser [EMAIL PROTECTED]:

Dear useRs,

I'm trying to create a barplot like so:

x=matrix(1:10,2,5)
barplot(x, leg=c("left","right"), besid=T)

The legend is placed in the default position, topright; however, the data are plotted there too. I tried controlling the legend position by adding x="topleft", but this results in an error that x matches multiple formal arguments. Leaving out the legend and making a separate call to legend leaves out the colors of the bars ...

Please advise,
Ingmar

--
David
Re: [R] Barplot legend position
Thanks, this could work! However, the legend does not reproduce the color/shading used in the original barplot. Are those available somehow?

Best,
Ingmar

From: David Hajage [EMAIL PROTECTED]
Date: Fri, 13 Oct 2006 14:11:21 +0200
To: Ingmar Visser [EMAIL PROTECTED]
Cc: R-help@stat.math.ethz.ch
Subject: Re: [R] Barplot legend position

For example:

x=matrix(1:10,2,5)
barplot(x, besid=T)
legend("topleft", c("left","right"), density=c(0,1000))
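One way to make the legend match the bars, sketched here as a suggestion rather than as an answer given in the thread, is to choose the colors explicitly and pass the same vector to both calls (the color names are arbitrary):

```r
x <- matrix(1:10, 2, 5)

# Pick the fill colors up front so barplot() and legend() agree
cols <- c("grey30", "grey80")

mids <- barplot(x, beside = TRUE, col = cols)
legend("topleft", legend = c("left", "right"), fill = cols)
```

Passing `fill` makes legend() draw filled boxes in the given colors next to each label.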
[R] Copula in R 2.4.0
Dear R helper,

Does anyone have an idea why R 2.4.0 draws the surface for the two command lines below, and the next time gives the error message below for exactly the same command lines?

norm.cop <- normalCopula(0.5)
persp(norm.cop, dcopula)
Error in ceiling(length.out) : Non-numeric argument to mathematical function.

I will appreciate any help from anyone.

Thanks,
Dominique K.
Re: [R] Mantel test!
On Fri, 2006-10-13 at 14:07 +0200, Hossein Moradi wrote:

Dear Sir/Madam,

I would like to ask you about the Mantel test in R. In the ade4 package, if I want to use mantel.rtest, I get the error message "Object of class 'dist' expected". As I already have two dissimilarity matrices, do I need to compute a distance measure again using this function? If not, could you please let me know which function/command I can use?

Thank you very much in advance and have a nice day!

Best regards,
Hossein

If the dissimilarity matrices are full, square, symmetric matrices, then you can use:

mantel.rtest(as.dist(your_mat1), as.dist(your_mat2))

See ?as.dist

HTH
G

--
Gavin Simpson                  [t] +44 (0)20 7679 0522
ECRC ENSIS, UCL Geography,     [f] +44 (0)20 7679 0565
Pearson Building,              [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London           [w] http://www.ucl.ac.uk/~ucfagls/cv/
London, UK. WC1E 6BT.          [w] http://www.ucl.ac.uk/~ucfagls/
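A small sketch of the conversion step with simulated matrices (the names `m1`, `m2`, `d1` and `d2` are illustrative). The Mantel test itself is only run if the ade4 package happens to be installed:

```r
set.seed(1)
# Two full, square, symmetric dissimilarity matrices built from random points
m1 <- as.matrix(dist(matrix(rnorm(20), nrow = 5)))
m2 <- as.matrix(dist(matrix(rnorm(20), nrow = 5)))

# as.dist() converts a full matrix to the "dist" class without recomputing
d1 <- as.dist(m1)
d2 <- as.dist(m2)
print(class(d1))

# Run the test only when ade4 is available
if (requireNamespace("ade4", quietly = TRUE)) {
  print(ade4::mantel.rtest(d1, d2, nrepet = 99))
}
```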
[R] Log-scale in histogramm
Hello

My data looks ugly in a normal histogram. How can I create a histogram with a Y-axis in log scale?

Thanks for your help!
David Graf
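No reply appears in this digest. One possible approach, sketched here with simulated data and not taken from the thread, is to compute the histogram without plotting it and then draw the counts on a logarithmic y-axis, skipping empty bins because zero has no logarithm:

```r
set.seed(1)
x <- rexp(1000)  # simulated skewed data for illustration

# Compute bin counts without drawing anything
h <- hist(x, plot = FALSE)

# Bins with zero counts cannot be shown on a log axis
keep <- h$counts > 0

plot(h$mids[keep], h$counts[keep], log = "y", type = "h",
     lwd = 8, lend = 2, xlab = "x", ylab = "count (log scale)")
```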
[R] Multiple barplots on the same axis
Hi

R newbie here :) I need to plot 3 barplots on the same axis, something like

[ASCII sketch: three groups of bars side by side on a shared axis, labelled v1, v2 and v3]

Is there any documentation describing how to achieve that, and what data file layout would make the job easier?

Thanks in advance,
Andre
[R] Fw: nested linear model; with common intercept
If I understand you correctly, I don't think that your model is doing what you think it is. Look at the model.matrix. Consider a toy example:

x <- 1:10
y <- factor(letters[1:2])
dd <- expand.grid(x, y)
dd$resp <- rnorm(20)
model.matrix(~Var2/Var1+1, dd)
   (Intercept) Var2b Var2a:Var1 Var2b:Var1
1            1     0          1          0
2            1     0          2          0
3            1     0          3          0
4            1     0          4          0
5            1     0          5          0
6            1     0          6          0
7            1     0          7          0
8            1     0          8          0
9            1     0          9          0
10           1     0         10          0
11           1     1          0          1
12           1     1          0          2
13           1     1          0          3
14           1     1          0          4
15           1     1          0          5
16           1     1          0          6
17           1     1          0          7
18           1     1          0          8
19           1     1          0          9
20           1     1          0         10
attr(,"assign")
[1] 0 1 2 2
attr(,"contrasts")
attr(,"contrasts")$Var2
[1] "contr.treatment"

Do you want something more like the following?

model.matrix(~Var2:Var1, dd)
   (Intercept) Var2a:Var1 Var2b:Var1
1            1          1          0
2            1          2          0
3            1          3          0
4            1          4          0
5            1          5          0
6            1          6          0
7            1          7          0
8            1          8          0
9            1          9          0
10           1         10          0
11           1          0          1
12           1          0          2
13           1          0          3
14           1          0          4
15           1          0          5
16           1          0          6
17           1          0          7
18           1          0          8
19           1          0          9
20           1          0         10
attr(,"assign")
[1] 0 1 1
attr(,"contrasts")
attr(,"contrasts")$Var2
[1] "contr.treatment"

although an expert will surely correct me if I'm in error here.

Q. Would it be wrong to specify a nested model and retain a common intercept, e.g.

lm(NH4 ~ Site/TideCode + 1)

I am aware (?) that my Site coefficients are now calculated relative to my reference Site (treatment contrasts), *but* that my TideCode levels now relate to their reference level within Site. Is that correct?

Thank you in advance for help.

Regards,
Mark Difford.

--
Ken Knoblauch
Inserm U371
Institut Cellule Souche et Cerveau
Département Neurosciences Intégratives
18 avenue du Doyen Lépine
69500 Bron
France
tel: +33 (0)4 72 91 34 77
fax: +33 (0)4 72 91 34 61
portable: +33 (0)6 84 10 64 10
http://www.lyon.inserm.fr/371/
Re: [R] bug: Editing function formals deletes the environment
If you are just modifying an S3 method in a package you may not need to reinsert the method into the package since UseMethod first looks into the caller environment for methods anyways and only second does it look for methods in the package. Thus: HTML.data.frame - R2HTML:::HTML.data.frame HTML.data.frame$Border - 2 HTML(BOD, file = file(clipboard, w), append = FALSE) would be sufficient if you intend to call HTML. There is one significant caveat. If you intend to call a function in R2HTML which in turn calls HTML then the above would not be enough. You would also have to modify the environment of the caller too. Thus after running the above: HTML2clip(BOD) would still get the old Border since we are calling HTML2clip which in turn calls HTML (as opposed to calling HTML directly). In this case, we would need to create a new HTML2clip with a reset environment too: HTML2clip - R2HTML:::HTML2clip environment(HTML2clip) - environment() HTML2clip(BOD) would get the Border=2 value. On 10/13/06, Alex Brown [EMAIL PROTECTED] wrote: First, here's the specific bug I have. Later I'll say why I care. ls(zappo) Error in try(name) : object zappo not found # good. f = function(zappo) { function(y) zappo + y } g = f(1) g(1) [1] 2 formals(g) $y formals(g)$y formals(g)$y = 2 g function (y = 2) zappo + y g(1) Error in g(1) : object zappo not found # looks like formals strips the environment off stuff. anything I can do about this? -Alex Original question: I'm trying to change the behaviour of a package, to simplify the interface. I'd rather not change the package, although I could. There's a hidden function whose defaults I wish to change. I'm using R 2.3.1 for macosX. Upgrading is not an option. 
This is what I do: library(R2HTML) # get the function to modify x = getFromNamespace(HTML.data.frame, R2HTML) # change the default for an argument formals(x)[Border]=list(NULL) # put the function back assignInNamespace(HTML.data.frame, x, R2HTML) #test the function: HTML(data.frame(1:2), file=stdout()) Error: could not find function HTMLReplaceNA # what seems to be happening is that the formals function is stripping the namespace off the variable x. I can't tell why. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
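A hedged sketch of one workaround for the closure example in this thread: remember the function's environment before touching its formals and put it back afterwards. In R versions where formals<- already keeps the environment, the restore step is simply a no-op. The variable name env is an assumption introduced here:

```r
# Closure factory from the post: g remembers zappo through its environment
f <- function(zappo) {
  function(y) zappo + y
}
g <- f(1)

env <- environment(g)     # save the enclosing environment (where zappo lives)
formals(g)$y <- 2         # change the default value of y
environment(g) <- env     # restore it in case the assignment above dropped it

print(g())                # uses the new default y = 2
print(g(1))               # an explicit argument still works
```

The same save-and-restore pattern should apply to a function taken from a namespace, since the "could not find function" error in the post is what a lost environment would look like.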
Re: [R] Bug in lowess
Prof Brian Ripley wrote:
Frank Harrell wrote:
[...]
Thank you Brian. It seems that no matter what is the right answer, the answer currently returned on my system is clearly wrong. lowess()$y should be constrained to be within range(y).

Really? That assertion is offered without proof and I am pretty sure is incorrect. Consider

x <- c(1:10, 20)
y <- c(1:10, 5) + 0.01*rnorm(11)
lowess(x,y)
$x
 [1]  1  2  3  4  5  6  7  8  9 10 20
$y
 [1]  0.9983192  1.9969599  2.9960805  3.9948224  4.9944158  5.9959855
 [7]  6.9964400  7.9981434  8.9990607 10.0002567 19.9946117

Remember that lowess is a local *linear* fitting method, and may give zero weight to some data points, so (as here) it can extrapolate.

Brian - thanks - that's a good example though not typical of the kind I see from patients.

After reading what src/appl/lowess.doc says should happen with zero weights, I think the answer given on Frank's system probably is the correct one. Rounding error is determining which of the last two points is given zero robustness weight: on an i686 system both of the last two are, and on mine only the last is. As far as I can tell in infinite-precision arithmetic both would be zero, and hence the value at x=120 would be found by extrapolation from those (far) to the left of it.

I am inclined to think that the best course of action is to quit with a warning when the MAD of the residuals is effectively zero. However, we need to be careful not to call things 'bugs' that we do not understand well enough. This might be a design error in lowess, but it is not AFAICS a bug in the implementation.

Yes it appears to be a weakness in the underlying algorithm.

Thanks
Frank
Re: [R] multiply two matrixes with the different dimension column by column
Here are two ways:

1. Using inner from:
http://tolstoy.newcastle.edu.au/R/help/05/04/3709.html
try:

array(inner(t(X), Y, "*"), c(4, 21))

2. Using model.matrix, get all terms and interactions and eliminate the non-interactions:

model.matrix(~ X * Y - X - Y - 1)

On 10/13/06, Majid Iravani [EMAIL PROTECTED] wrote:

Dear all,

I would like to multiply two matrices with different dimensions column by column. Let me give an example. If I have two matrices X and Y as follows:

X <- matrix(1:12, nrow=4, ncol=3, dimnames=list(c("A","B","C","D"), c("stage1","stage2","stage3")))
Y <- matrix(1:28, nrow=4, ncol=7, dimnames=list(c("A","B","C","D"), c("site1","site2","site3","site4","site5","site6","site7")))

I would like to multiply the first column of the Y matrix (site1) by all of the columns in the X matrix. The product will then be three new columns (for example: site1stage1, site1stage2 and site1stage3, or something like this) which I want to add to the Y matrix. As my site (Y) dataset has too many columns, it's not easy to do it in Excel, and I'm looking for a command in R to prepare a new data frame for more analysis. So I would greatly appreciate it if anybody can help me in this case.

Thanks in advance
Majid

Majid Iravani
PhD Student
Swiss Federal Research Institute WSL
Research Group of Vegetation Ecology
Zürcherstrasse 111
CH-8903 Birmensdorf
Switzerland
Phone: +41-1-739-2693
Fax: +41-1-739-2215
Email: [EMAIL PROTECTED]
http://www.wsl.ch/staff/majid.iravani/
Re: [R] multiply two matrixes with the different dimension column by column
Majid Iravani wrote:

I would like to multiply two matrices with different dimensions column by column. Let me give an example. If I have two matrices X and Y as follows:

X <- matrix(1:12, nrow=4, ncol=3, dimnames=list(c("A","B","C","D"), c("stage1","stage2","stage3")))
Y <- matrix(1:28, nrow=4, ncol=7, dimnames=list(c("A","B","C","D"), c("site1","site2","site3","site4","site5","site6","site7")))

t(X) %*% Y

will give something; I don't know if this is what you want.

Alberto Monteiro
Re: [R] multiply two matrixes with the different dimension column by column
apply(Y, 2, function(y) list(y*X))

On 13 Oct 2006, at 12:33, Majid Iravani wrote:

Dear all,

I would like to multiply two matrices with different dimensions column by column. Let me give an example. If I have two matrices X and Y as follows:

X <- matrix(1:12, nrow=4, ncol=3, dimnames=list(c("A","B","C","D"), c("stage1","stage2","stage3")))
Y <- matrix(1:28, nrow=4, ncol=7, dimnames=list(c("A","B","C","D"), c("site1","site2","site3","site4","site5","site6","site7")))

I would like to multiply the first column of the Y matrix (site1) by all of the columns in the X matrix. The product will then be three new columns (for example: site1stage1, site1stage2 and site1stage3, or something like this) which I want to add to the Y matrix. As my site (Y) dataset has too many columns, it's not easy to do it in Excel, and I'm looking for a command in R to prepare a new data frame for more analysis. So I would greatly appreciate it if anybody can help me in this case.

Thanks in advance
Majid

--
Majid Iravani
PhD Student
Swiss Federal Research Institute WSL
Research Group of Vegetation Ecology
Zürcherstrasse 111
CH-8903 Birmensdorf
Switzerland
Phone: +41-1-739-2693
Fax: +41-1-739-2215
Email: [EMAIL PROTECTED]
http://www.wsl.ch/staff/majid.iravani/
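A base-R sketch of the column-by-column products asked for in this thread, with the new columns named siteNstageM and bound onto Y. The names prods and result are introduced here for illustration, and this is only one of several ways to do it:

```r
X <- matrix(1:12, nrow = 4, ncol = 3,
            dimnames = list(c("A", "B", "C", "D"),
                            c("stage1", "stage2", "stage3")))
Y <- matrix(1:28, nrow = 4, ncol = 7,
            dimnames = list(c("A", "B", "C", "D"),
                            paste0("site", 1:7)))

# For every site column, multiply it into each column of X.
# X * Y[, s] recycles the length-4 vector down every column of X,
# so row i of X is scaled by Y[i, s].
prods <- do.call(cbind, lapply(colnames(Y), function(s) {
  p <- X * Y[, s]
  colnames(p) <- paste0(s, colnames(X))   # e.g. "site1stage1"
  p
}))

result <- cbind(Y, prods)   # original 7 site columns plus 21 product columns
print(dim(prods))
print(prods[, 1:3])
```

Wrapping result in as.data.frame() afterwards would give the data frame mentioned in the question.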
Re: [R] Cross two dataframe
Actually, I'm a beginner too :-) It's just this list is so helpful, and I was helped so many times, that I am trying to bring my small contribution :-) How about: #_ Y1=X*Y[,1] Y=cbind(Y,Y1) Y site1 site2 site3 site4 site5 site6 site7 stage1 stage2 stage3 A 1 5 913172125 1 5 9 B 2 61014182226 4 12 20 C 3 71115192327 9 21 33 D 4 81216202428 16 32 48 #___ And then change the names as you like... I bet there is a more elegant way, but it seems to work :-) hth, Mihai Nica 170 East Griffith St. G5 Jackson, MS 39201 601-914-0361 - Original Message From: Majid Iravani [EMAIL PROTECTED] To: Mihai Nica [EMAIL PROTECTED] Sent: Friday, October 13, 2006 1:58:29 AM Subject: Re: [R] Cross two dataframe Dear Mihai Nica Thanks. Actually I dont want to merge two data frames. For example if I have two matrixes X and Yas follow: X- matrix(1:12, nrow=4, ncol=3, dimnames=list(c(A,B,C,D), c(stage1,stage2,stage3))) Y- matrix(1:28, nrow=4, ncol=7, dimnames=list(c(A,B,C,D), c(site1,site2,site3,site4,site5, site6,site7))) I would like to multiply first column of the Ymatrix (site1) to the all of the columns in Xmatrix. Then, the product will be three new columns (site1stage1, site1stage2 and site1stage3) which I want to add to Ymatrix. As my site (Y) dataset has about 400 columns its not easy to do it in Excel and Im looking for a command in R to prepare a new data frame for more analysis. So I would greatly appreciate if you help me in this case. 
Thanks a lot again Majid At 10:06 AM 10/12/2006 -0700, you wrote: Mihai Nica Majid Iravani PhD Student Swiss Federal Research Institute WSL Research Group of Vegetation Ecology Zürcherstrasse 111 CH-8903 Birmensdorf Switzerland Phone: +41-1-739-2693 Fax: +41-1-739-2215 Email: [EMAIL PROTECTED] http://www.wsl.ch/staff/majid.iravani/ [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] C code for KalmnaLike
Gavin, you are right. I thought he just wanted a routine. My bad. -Original Message- From: Gavin Simpson [mailto:[EMAIL PROTECTED] Sent: Friday, October 13, 2006 5:00 AM To: Leeds, Mark (IED) Cc: Malini Subramanian; R-help@stat.math.ethz.ch Subject: Re: [R] C code for KalmnaLike On Thu, 2006-10-12 at 10:57 -0400, Leeds, Mark (IED) wrote: you shouldn't need it. Kalmanlike() ( spelling ) I think is in the base package and there is atleast One constributed package and probably many others that do kalman filtering but I can't recall the names of them. Check out the list of packages at www.r-project.org. Mark, That pre-supposes that Malini just wants to perform kalman filtering, and not look at the inner workings of the implementation in R. KalmanLike is in package stats distributed with base R, but it is defined as: KalmanLike function (y, mod, nit = 0, fast = TRUE) { x - .Call(KalmanLike, y, mod$Z, mod$a, mod$P, mod$T, mod$V, mod$h, mod$Pn, as.integer(nit), FALSE, fast = fast, PACKAGE = stats) names(x) - c(ssq, sumlog) s2 - x[1]/length(y) list(Lik = 0.5 * (log(x[1]/length(y)) + x[2]/length(y)), s2 = s2) } environment: namespace:stats So, not much use in reading the R code as this just calls compiled code. If Malini really does want to look at the C code for KalmanLike then Uwe Ligges recently posted a preview of an article he is writing for R News, which explains how to access various parts of R's source code. The preview is still available from: http://www.statistik.uni-dortmund.de/~ligges/R_Help_Desk_preview.pdf The information contained in the article should allow Malini to find the C for KalmanLike. HTH, G -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Malini Subramanian Sent: Thursday, October 12, 2006 9:56 AM To: R-help@stat.math.ethz.ch Subject: Re: [R] C code for KalmnaLike hi, i am looking for c code of kalman filtering please can you help me...thankyou bye... 
- [[alternative HTML version deleted]] __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. This is not an offer (or solicitation of an offer) to buy/se...{{dropped}} __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% Gavin Simpson [t] +44 (0)20 7679 0522 ECRC ENSIS, UCL Geography, [f] +44 (0)20 7679 0565 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/cv/ London, UK. WC1E 6BT. [w] http://www.ucl.ac.uk/~ucfagls/ %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% This is not an offer (or solicitation of an offer) to buy/se...{{dropped}} __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Copula in R 2.4.0
As there is no function normalCopula in R 2.4.0, your example is incomplete. Please see the footer of this message (and perhaps contact the maintainer of the package involved).

Please don't send messages repeatedly: if you don't get an answer the first time, study the posting guide and work out what the problem might be.

On Fri, 13 Oct 2006, Dominique Katshunga wrote:

Dear R helper,

does anyone have an idea why R 2.4.0 draws the surface for the two command lines below, and the next time it gives the error message below for exactly the same command lines:

norm.cop <- normalCopula(0.5)
persp(norm.cop, dcopula)
Error in ceiling(length.out) : Non-numeric argument to mathematical function.

I will appreciate any help from anyone.
thanks,
Dominique K.

--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
[R] RODBC sqlQuery insert slow
Hello,

I am trying to insert a lot of data into a table using Windows R (2.3.1) and a MySQL database via RODBC. First I read a file with read.csv, then form SQL insert statements for each row and execute the insert query one row at a time. See the loop below. This turns out to be very slow. Can anyone please suggest a way to speed it up?

Thanks, Bill

# R code
ntry=dim(ti)[1]
date()
nbefore=sqlQuery(channel,"SELECT COUNT(*) FROM logger")
for (i in 1:ntry) {
sql="INSERT INTO logger (time,v1,v2,v3,v4,v5,v6,v7,v8,v9,v10) VALUES("
d1=strptime(ti[i,2],"%d/%m/%y %H:%M:%S %p")
sql=paste(sql,"'",d1,"'" )
sql=paste(sql,",",ti[i,3] )
sql=paste(sql,",",ti[i,4] )
sql=paste(sql,",",ti[i,5] )
sql=paste(sql,",",ti[i,6] )
sql=paste(sql,",",ti[i,7] )
sql=paste(sql,",",ti[i,8] )
sql=paste(sql,",",ti[i,9] )
sql=paste(sql,",",ti[i,10])
sql=paste(sql,",",ti[i,11])
sql=paste(sql,",",ti[i,12])
sql=paste(sql,")" )
#print(sql)
sqlQuery(channel, sql)
}
nafter=sqlQuery(channel,"SELECT COUNT(*) FROM logger")
nadded=nafter-nbefore;nadded
date()
Re: [R] Multiple barplots on the same axis
On Thu, 2006-10-12 at 23:08 -0300, Andre Nathan wrote:

Hi R newbie here :) I need to plot 3 barplots on the same axis, something like

[ASCII sketch: three groups of adjacent bars along one x-axis, labelled v1, v2, v3]

Is there any documentation describing how to achieve that, and what data file layout would make the job easier? Thanks in advance, Andre

There are examples in ?barplot, where the VADeaths data is used. The key is the use of 'beside = TRUE', to enable grouped bars:

barplot(VADeaths, beside = TRUE,
        col = c("lightblue", "mistyrose", "lightcyan", "lavender", "cornsilk"),
        legend = rownames(VADeaths), ylim = c(0, 100))

To see what the VADeaths data set looks like:

VADeaths

See ?barplot for more information.

There is also an R Graphics Gallery with code at:

http://addictedtor.free.fr/graphiques/index.php

and From Data to Graphics at:

http://zoonek2.free.fr/UNIX/48_R/03.html

Both of which are helpful.

HTH,
Marc Schwartz
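As for the data layout part of the question, a small sketch, assuming made-up numbers: barplot() with beside = TRUE draws one group of bars per matrix column, so a matrix with one column per variable (v1, v2, v3) and one row per series is a convenient shape. The matrix counts and its values are hypothetical:

```r
# Hypothetical data: rows are the series, columns are the groups on the x-axis
counts <- matrix(c(3, 5, 2,
                   4, 6, 1,
                   2, 3, 7),
                 nrow = 3,
                 dimnames = list(c("a", "b", "c"), c("v1", "v2", "v3")))

# beside = TRUE puts the bars of each column next to each other
# instead of stacking them; the return value holds the bar midpoints
mids <- barplot(counts, beside = TRUE,
                legend.text = rownames(counts))
print(mids)
```

A file read with read.table() can be brought into this shape with as.matrix(), transposing with t() if the variables ended up in rows.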
[R] Problemss compiling RODBC
When updating to the very last version of RODBC under freebsd 6.1 the errors below pop up but RODBC compiles till the end and, it seems, to work properly. What are those errors about? Vittorio .. checking for suffix of executables... checking for suffix of object files... o checking whether we are using the GNU C compiler... grep: error while loading shared libraries: /usr/local/lib/libpcre.so.0: ELF file OS ABI invalid yes checking whether cc accepts -g... grep: error while loading shared libraries: /usr/local/lib/libpcre.so.0: ELF file OS ABI invalid yes checking for cc option to accept ANSI C... grep: error while loading shared libraries: /usr/local/lib/libpcre.so.0: ELF file OS ABI invalid none needed .. . grep: error while loading shared libraries: /usr/local/lib/libpcre.so.0: ELF file OS ABI invalid config. status: creating src/Makevars config.status: creating src/config.h grep: error while loading shared libraries: /usr/local/lib/libpcre.so. 0: ELF file OS ABI invalid ** libs cc -I/usr/local/lib/R/include - I/usr/local/lib/R/include -I. -I/usr/local/include - I/usr/local/include -D__NO_MATH_INLINES -fpic -O2 -fno-strict- aliasing -pipe -march=pentium4 -c RODBC.c -o RODBC.o cc -shared - L/usr/local/lib -o RODBC.so RODBC.o -lodbc -L/usr/local/lib - L/usr/local/lib -L/usr/local/lib/R/lib -lR ** R ** inst ** preparing package for lazy loading ** help etc. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] combinatorics
Hi

How do I generate all ways of ordering sets of indistinguishable items? Suppose I have two A's, two B's and a C. Then I want

AABBC
AABCB
AACBC
ABABC
. . .snip...
BBAAC
. . .snip...
CBBAA

[there are 5!/(2!*2!) = 30 arrangements. Note AABBC != BBAAC]

How do I do this?

--
Robin Hankin
Uncertainty Analyst
National Oceanography Centre, Southampton
European Way, Southampton SO14 3ZH, UK
tel 023-8059-7743
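One possible base-R sketch for this: a small recursive function that, at each position, tries every letter that still has copies left. The function name multiset_perms is introduced here and is not from any package; for large inputs a dedicated combinatorics package would likely be faster:

```r
# counts: named integer vector giving how many copies of each item exist,
# e.g. c(A = 2, B = 2, C = 1)
multiset_perms <- function(counts) {
  if (sum(counts) == 0) return("")          # nothing left: one empty arrangement
  out <- character(0)
  for (nm in names(counts)[counts > 0]) {   # each item that is still available
    rest <- counts
    rest[nm] <- rest[nm] - 1                # use up one copy of it
    out <- c(out, paste0(nm, multiset_perms(rest)))
  }
  out
}

p <- multiset_perms(c(A = 2, B = 2, C = 1))
print(length(p))
print(head(p))
```

Because each distinct letter is chosen at most once per position, no arrangement is generated twice, so there is no need to deduplicate afterwards.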
[R] loop, pipe connection, output objects
Hi all,

I have the following newbie problem. Inside R, I am trying to process a file and create many files from it. The file is organized in columns, the second of which contains a code. I want to create output objects that contain only the entries in a certain code range, and whose names contain the code itself. Here is my attempt:

indice <- (201:399)
for(i in indice){
data.i <- read.table(pipe(paste("gawk '{if ($2 >=", i, "&& $2 <", i+1, ") print $2,$3}' base_6_mod", sep='')))
print(paste("code ...", i, "... done"))
}

The problems are:

1- My syntax data.i does not work, and the loop only produces one big data.i object. My goal, obviously, was to have something like data.201, data.202, etc.

2- (second-order problem) I wonder if the syntax for the index ($2 >=, i, && $2 <, i+1,) works.

Thanks for your help (hoping I managed to be clear enough),
marco

--
Marco Grazzi
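A sketch of one way around the data.i naming problem, assuming the file can be read into R once instead of being filtered through gawk for every code. The data frame dat and its values are made up; split() returns a named list, which is usually easier to work with than hundreds of separate objects, and assign() is shown only in case separate objects are really wanted:

```r
# Hypothetical stand-in for the file: a code column and a value column
dat <- data.frame(code  = c(201.2, 201.9, 202.5, 203.1, 203.4, 203.8),
                  value = c(1.5, 2.5, 3, 4, 5, 6))

# Group rows by the integer part of the code, i.e. code in [i, i + 1)
by_code <- split(dat, floor(dat$code))
names(by_code) <- paste0("data.", names(by_code))

print(names(by_code))
print(by_code[["data.203"]])

# Only if separate objects data.201, data.202, ... are really needed:
for (nm in names(by_code)) assign(nm, by_code[[nm]])
```

Inside a loop, data.i is just an ordinary variable name; the i is not substituted, which is why the original loop kept overwriting a single object.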
Re: [R] Barplot legend position
e.g.:

barplot(x,col=c("lightgrey","darkgrey"),beside=T)
legend("topleft",c("left","right"),fill=c("lightgrey","darkgrey"))

try: ?legend and example(legend) for documentation!!!

Ingmar Visser wrote:

Thanks, this could work! However, the legend does not reproduce the color/shading used in the original barplot, are those available somehow? Best, Ingmar

From: David Hajage [EMAIL PROTECTED]
Date: Fri, 13 Oct 2006 14:11:21 +0200
To: Ingmar Visser [EMAIL PROTECTED]
Cc: R-help@stat.math.ethz.ch
Subject: Re: [R] Barplot legend position

For example:

x=matrix(1:10,2,5)
barplot(x,beside=T)
legend("topleft", c("left","right"), density=c(0,1000))
Re: [R] Need help with tables
I suggest you look at one of the guides:

http://cran.r-project.org/other-docs.html

before posting questions like this to the mailing list... please read the posting guide!

laba diena wrote:

I have a data file with 3 columns and I need to take only 2. How do I do that?
Re: [R] glm cannot find valid starting values
Hi,

I have a similar problem. Some time ago this problem did not exist. Look at this simple example:

Y <- c(0,0,0,1,4,8,16)
X <- c(1,2,3,4,5,6,7)
m <- glm(Y~X,family=gaussian(link="log"))
Error in eval(expr, envir, enclos) : cannot find valid starting values: please specify some

What is the problem? I think that the problem is with the log link algorithm and zeros. Look:

Y <- c(0,0.1,0.5,1,4,8,16)
m <- glm(Y~X,family=gaussian(link="log"))
Error in eval(expr, envir, enclos) : cannot find valid starting values: please specify some

Y <- c(0.01,0.1,0.5,1,4,8,16)
m <- glm(Y~X,family=gaussian(link="log"))

Without zeros it works. Is it a real bug?

Thanks
Ronaldo

--
Preserve wildlife -- pickle a squirrel today!
--
Prof. Ronaldo Reis Júnior
UNIMONTES/Depto. Biologia Geral/Lab. Ecologia Evolutiva
Campus Universitário Prof. Darcy Ribeiro, Vila Mauricéia
CP: 126, CEP: 39401-089, Montes Claros - MG - Brasil
Fone: (38) 3229-8190 | [EMAIL PROTECTED]
ICQ#: 5692561 | LinuxUser#: 205366
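The error message itself asks for starting values, so a hedged sketch of that route: pass rough coefficients through the start argument of glm(), which gives the fit a valid linear predictor even though log(0) is not finite. The particular values c(-2, 0.7) are a guess made here from the shape of the data, not something from the original post:

```r
Y <- c(0, 0, 0, 1, 4, 8, 16)
X <- c(1, 2, 3, 4, 5, 6, 7)

# start = c(intercept, slope) on the log scale; exp(-2 + 0.7 * X) is
# positive everywhere, so the iterations begin from valid fitted values
m <- glm(Y ~ X, family = gaussian(link = "log"), start = c(-2, 0.7))

print(coef(m))
print(round(fitted(m), 3))
```

Supplying mustart (for example the response with the zeros nudged slightly above zero) is another option that may work, since the default initialisation uses the response itself as the starting mean.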
[R] Producing a random bistochastic matrix
Hi everyone,

I need some help to produce a random bistochastic matrix, that is, a square matrix of positive real numbers e_ij in which every row sums to 1 and every column sums to 1.

Thanks
Florent Bresson
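One possible sketch, assuming that "random" does not have to mean any particular distribution: start from a matrix of uniform random positive entries and alternately rescale rows and columns until both sets of sums are 1 (often called Sinkhorn-Knopp scaling). The function name random_bistochastic and the tolerance are choices made here, and the result is not uniformly distributed over all bistochastic matrices:

```r
random_bistochastic <- function(n, tol = 1e-10, max_iter = 1000) {
  m <- matrix(runif(n * n), n, n)          # strictly positive starting matrix
  for (i in seq_len(max_iter)) {
    m <- m / rowSums(m)                    # make every row sum to 1
    m <- sweep(m, 2, colSums(m), "/")      # make every column sum to 1
    # columns are now exact; stop once the rows are also close enough
    if (max(abs(rowSums(m) - 1)) < tol) break
  }
  m
}

set.seed(42)
B <- random_bistochastic(4)
print(round(B, 3))
print(rowSums(B))
print(colSums(B))
```

An alternative that is exact by construction is a random convex combination of permutation matrices, at the cost of possibly producing zero entries.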
Re: [R] rarefy a matrix of counts
Thank you, Alex! That's exactly what I was looking to do. I'm going to remove the loops and use your apply function approach. Best regards and much thanks, brian On 10/13/06, Alex Brown [EMAIL PROTECTED] wrote: I thought at first that you could use a weighted sample (the sample function) but, you can't since it doesn't take proper account of replacement if you try that. You can use the list approach, but through the power of R, you don't need a lot of loops to do it... I can't speak for the efficiency of this approach in terms of cpu cycle. In short: apply(z2,2,function(x)sample(rep(names(x),x),100)) In long: #let's load the data: z = scan(,,sep=\n) sample.1 sample.2 sample.3 red.candy 400 300 2500 green.candy1000 200 black.candy 3001000500 #and turn into a table z2 = read.table(textConnection(z), header=TRUE, row.names=1) # let's create a functon to expand a sample column into individuals: expand - function(x) rep(names(x), x) # test it on a smaller set: ex - expand( c( red = 2, blue = 3) ) ex [1] red red blue blue blue # and sample 2 things from that: sample( ex, 2 ) # combine the two samplex - function( x, size ) sample(expand(x), size ) samplex( c( red = 2, blue = 3), size = 2 ) # ok, now we use the apply function to apply this to each column apply(z2, 2, samplex, size = 2 ) # you wanted 100? apply(z2, 2, samplex, size = 100 ) # all done. #You should note that if there are less than 100 (samplenumber) candies in any given sample, this function will fail. # eg: apply(z2, 2, samplex, size = 2000 ) Error in sample(length(x), size, replace, prob) : cannot take a sample larger than the population when 'replace = FALSE' -Alex On 11 Oct 2006, at 15:10, Brian Frappier wrote: Hi Petr, Thanks for your response. I have data that looks like the following: sample 1 sample 2 sample 3 red candy400 300 2500 green candy1000 200 black candy 3001000500 I don't want to randomly select either the samples (columns) or the candy types (rows), which sample as you state would allow me. 
Instead, I want to randomly sample 100 candies from each sample and retain info on their associated type. I could make a list of all the candies in each sample: sample 1 red red red red green green black red black ... and then randomly sample those rows. Repeat for each sample. But, I am not sure how to do that without alot of loops, and am wondering if there is an easier way in R. Thanks! I should have laid this out in the first email...sorry. On 10/11/06, Petr Pikal [EMAIL PROTECTED] wrote: Hi I am not experienced in Matlab and from your explanation I do not understand what exactly do you want. It seems that you want randomly choose a sample of 100 rows from your martix, what can be achived by sample. DF-data.frame(rnorm(100), 1:100, 101:200, 201:300) DF[sample(1:100, 10),] If you want to do this several times, you need to save your result and than it depends on what you want to do next. One suitable form is list of matrices the other is array and you can use for loop for completing it. HTH Petr On 10 Oct 2006 at 17:40, Brian Frappier wrote: Date sent: Tue, 10 Oct 2006 17:40:47 -0400 From: Brian Frappier [EMAIL PROTECTED] To: r-help@stat.math.ethz.ch Subject:[R] rarefy a matrix of counts Hi all, I have a matrix of counts for objects (rows) by samples (columns). I aimed for about 500 counts in each sample (I have about 80 samples) and would now like to rarefy these down to 100 counts in each sample using simple random sampling without replacement. I plan on rarefying several times for each sample. I could do the tedious looping task of making a list of all objects (with its associated identifier) in each sample and then use the wonderful sampling package to select a sub-sample of 100 for each sample and thereby get a logical vector of inclusions. I would then regroup the resulting logical vector into a vector of counts by object, rinse and repeat several times for each sample. 
Alternately, using the same list, I could create a random index of integers between 1 and the number of objects for a sample (without repeats) and then select those objects from the list. Again, rinse and repeat several time for each sample. Is there a way to directly rarefy a matrix of counts without having to create a list of objects first? I am trying to switch to R from Matlab and am trying to pick up good programming habits
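A sketch that follows the expand-then-sample idea from this thread but returns a matrix of counts again, which seems to be what the original question wanted. The function name rarefy_column and the example counts are hypothetical:

```r
# Draw `size` individuals without replacement from one column of counts
# and return the rarefied counts per object (row)
rarefy_column <- function(counts, size) {
  ids <- rep(seq_along(counts), counts)          # one entry per individual
  picked <- ids[sample.int(length(ids), size)]   # simple random sample
  tabulate(picked, nbins = length(counts))       # back to counts per object
}

# Hypothetical count matrix: objects in rows, samples in columns
z <- matrix(c(400, 100, 300,
              300, 200, 1000,
              2500, 50, 500),
            nrow = 3,
            dimnames = list(c("red", "green", "black"),
                            c("s1", "s2", "s3")))

set.seed(1)
r <- apply(z, 2, rarefy_column, size = 100)
rownames(r) <- rownames(z)
print(r)
```

Repeating the rarefaction several times is then a matter of wrapping the apply() call in replicate(), with simplify = FALSE to keep one matrix per repetition.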
Re: [R] RODBC sqlQuery insert slow
On Fri, 2006-10-13 at 09:09 -0400, Bill Szkotnicki wrote: Hello, I am trying to insert a lot of data into a table using windows R (2.3.1) and a mysql database via RODBC. First I read a file with read.csv and then form sql insert statements for each row and execute the insert query one row at a time. See the loop below. This turns out to be very slow. Can anyone please suggest a way to speed it up? Thanks, Bill # R code ntry=dim(ti)[1] date() nbefore=sqlQuery(channel,SELECT COUNT(*) FROM logger) for (i in 1:ntry) { sql=INSERT INTO logger (time,v1,v2,v3,v4,v5,v6,v7,v8,v9,v10) VALUES( d1=strptime(ti[i,2],%d/%m/%y %H:%M:%S %p) sql=paste(sql,',d1,' ) sql=paste(sql,,,ti[i,3] ) sql=paste(sql,,,ti[i,4] ) sql=paste(sql,,,ti[i,5] ) sql=paste(sql,,,ti[i,6] ) sql=paste(sql,,,ti[i,7] ) sql=paste(sql,,,ti[i,8] ) sql=paste(sql,,,ti[i,9] ) sql=paste(sql,,,ti[i,10]) sql=paste(sql,,,ti[i,11]) sql=paste(sql,,,ti[i,12]) sql=paste(sql,) ) #print(sql) sqlQuery(channel, sql) } nafter=sqlQuery(channel,SELECT COUNT(*) FROM logger) nadded=nafter-nbefore;nadded date() I sure will try to help you out here. I've been working with RODBC. I think what slows you down here is your loop with multiple paste commands. Have you considered the sqlSave() function with the append=T argument? I think you could replace your loop with: dat - cbind(strptime(ti[,2],%d/%m/%y %H:%M:%S %p),d1,ti[,3:12]) sqlSave(channel,dat,logger,append=T) Of course, I haven't tested this so you may need some minor adjustments, but I think this will greatly speed up your insert job. Regards, Jerome -- Jerome Asselin, M.Sc., Agent de recherche, RHCE CHUM -- Centre de recherche 3875 rue St-Urbain, 3e etage // Montreal QC H2W 1V1 Tel.: 514-890-8000 Poste 15914; Fax: 514-412-7106 __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] RODBC sqlQuery insert slow
Large for loops are slow. Try to avoid them using apply, sapply, etc. I've made the paste statements a lot shorter by using collapse. See ?paste for more info.

Append.SQL <- function(x, channel){
  sql="INSERT INTO logger (time, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10) VALUES("
  d1=strptime(x[2],"%d/%m/%y %H:%M:%S %p")
  sql=paste(sql, "'", d1, "' ,", paste(x[3:12], collapse = ", "), ")")
  sqlQuery(channel, sql)
}
ntry=dim(ti)[1]
date()
nbefore=sqlQuery(channel,"SELECT COUNT(*) FROM logger")
# MARGIN = 1 applies Append.SQL to each row of ti
apply(ti, 1, Append.SQL, channel = channel)
nafter=sqlQuery(channel,"SELECT COUNT(*) FROM logger")
nadded=nafter-nbefore;nadded
date()

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
Cel biometrie, methodologie en kwaliteitszorg / Section biometrics, methodology and quality assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium
tel. + 32 54/436 185
[EMAIL PROTECTED]
www.inbo.be

Do not put your faith in what statistics say until you have carefully considered what they do not say. ~William W. Watt
A statistical analysis, properly conducted, is a delicate dissection of uncertainties, a surgery of suppositions. ~M.J.Moroney

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Bill Szkotnicki
Sent: Friday, 13 October 2006 15:09
To: [EMAIL PROTECTED]
Subject: [R] RODBC sqlQuery insert slow

Hello,
I am trying to insert a lot of data into a table using windows R (2.3.1) and a mysql database via RODBC. First I read a file with read.csv and then form sql insert statements for each row and execute the insert query one row at a time. See the loop below. This turns out to be very slow. Can anyone please suggest a way to speed it up?

Thanks, Bill

# R code
ntry=dim(ti)[1]
date()
nbefore=sqlQuery(channel,"SELECT COUNT(*) FROM logger")
for (i in 1:ntry) {
sql="INSERT INTO logger (time,v1,v2,v3,v4,v5,v6,v7,v8,v9,v10) VALUES("
d1=strptime(ti[i,2],"%d/%m/%y %H:%M:%S %p")
sql=paste(sql,"'",d1,"'" )
sql=paste(sql,",",ti[i,3] )
sql=paste(sql,",",ti[i,4] )
sql=paste(sql,",",ti[i,5] )
sql=paste(sql,",",ti[i,6] )
sql=paste(sql,",",ti[i,7] )
sql=paste(sql,",",ti[i,8] )
sql=paste(sql,",",ti[i,9] )
sql=paste(sql,",",ti[i,10])
sql=paste(sql,",",ti[i,11])
sql=paste(sql,",",ti[i,12])
sql=paste(sql,")" )
#print(sql)
sqlQuery(channel, sql)
}
nafter=sqlQuery(channel,"SELECT COUNT(*) FROM logger")
nadded=nafter-nbefore;nadded
date()
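As a rough illustration of the advice in this thread (do the string work once, not once per row), the statements can be built with vectorised calls. This is only a sketch: the data frame ti_demo and its three columns are made up for the example, and sending the result through sqlQuery() is left out. Values are pasted into the SQL unescaped, which is only reasonable for trusted numeric data.

```r
# Hypothetical stand-in for the logger data (names and values are assumptions).
ti_demo <- data.frame(
  time = c("2006-10-13 09:09:00", "2006-10-13 09:10:00"),
  v1 = c(1.5, 2.5),
  v2 = c(10, 20),
  stringsAsFactors = FALSE
)

# Paste the numeric columns together row-wise: one string per row.
num_cols <- unname(as.list(ti_demo[c("v1", "v2")]))
vals <- do.call(paste, c(num_cols, sep = ", "))

# sprintf() is vectorised, so this yields one INSERT statement per row.
stmts <- sprintf("INSERT INTO logger (time, v1, v2) VALUES ('%s', %s)",
                 ti_demo$time, vals)

# A single multi-row INSERT (which MySQL accepts) reduces the round trips further.
rows <- sprintf("('%s', %s)", ti_demo$time, vals)
bulk <- paste0("INSERT INTO logger (time, v1, v2) VALUES ",
               paste(rows, collapse = ", "))
```

With a real table the column list would be v1 to v10, and very large inputs would probably need to be sent in chunks of a few hundred rows per statement.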
Re: [R] Barplot legend position
--- Ingmar Visser [EMAIL PROTECTED] wrote:

> Thanks, this could work!
> However, the legend does not reproduce the color/shading used in the
> original barplot, are those available somehow?
> Best, Ingmar

?legend

Try

x=matrix(1:10,2,5)
barplot(x,beside=TRUE, col=c("red","blue"))
legend("topleft", c("left","right"), fill=c("red","blue"))

> > From: David Hajage [EMAIL PROTECTED]
> > Date: Fri, 13 Oct 2006 14:11:21 +0200
> > To: Ingmar Visser [EMAIL PROTECTED]
> > Cc: R-help@stat.math.ethz.ch
> > Subject: Re: [R] Barplot legend position
> >
> > For example :
> >
> > x=matrix(1:10,2,5)
> > barplot(x,beside=TRUE)
> > legend("topleft", c("left","right"), density= c(0,1000))
> >
> > 2006/10/13, Ingmar Visser [EMAIL PROTECTED]:
> > >
> > > Dear useRs,
> > >
> > > I'm trying to create a barplot like so:
> > >
> > > x=matrix(1:10,2,5)
> > > barplot(x,leg=c("left","right"),beside=TRUE)
> > >
> > > The legend is placed in default position topright, however the data
> > > are plotted there too.
> > > I tried controlling the legend position by adding x="topleft" but this
> > > results in an error that x matches multiple formal arguments.
> > >
> > > Leaving out the legend and making a separate call to legend leaves out
> > > the colors of bars ...
> > >
> > > Please advise,
> > > Ingmar
> >
> > --
> > David
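For readers on a more recent R, barplot() also has an args.legend argument, a list that is handed on to legend(), so the legend can be moved while barplot() keeps the colours in sync. A minimal sketch, assuming a version of R whose barplot() has this argument (it may not have existed when this thread was written); the null PDF device is only there so the code runs without a display:

```r
pdf(NULL)  # null device: nothing is written to disk or screen

x <- matrix(1:10, 2, 5)
mids <- barplot(x, beside = TRUE, col = c("red", "blue"),
                legend.text = c("left", "right"),   # legend drawn by barplot()
                args.legend = list(x = "topleft"))  # extra arguments for legend()

invisible(dev.off())
```

barplot() returns the bar midpoints, here a 2 x 5 matrix, which is handy for adding text or error bars afterwards.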
Re: [R] Log-scale in histogramm
On Fri, 2006-10-13 at 13:33 +0200, David Graf wrote:
> Hello
>
> My data looks ugly in a normal histogram. How can I create a histogram
> with a Y-axis in log-scale?
>
> Thanks for your help!
>
> David Graf

I'm not sure that you want to use a log scale here; you may be better served by log transforming your data.

For example:

# Generate 100 random values from a log normal dist:
x <- rlnorm(100)

# Now do a histogram on x
hist(x, freq = FALSE)

# Now use log(x)
hist(log(x), freq = FALSE)

Does that help?

Marc Schwartz
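If a log-scaled count axis really is what is wanted, one possible route is to let hist() compute the bins without plotting and then draw the counts on a log axis. This is a sketch, not a recommendation over the transformation above; the sample size, the number of breaks and the plotting style are arbitrary choices, and empty bins are dropped because a count of zero has no logarithm.

```r
set.seed(1)
x <- rlnorm(1000)

h <- hist(x, breaks = 30, plot = FALSE)  # compute the bins only
keep <- h$counts > 0                     # zero counts cannot be shown on a log axis

pdf(NULL)  # null device so the sketch runs without a display
plot(h$mids[keep], h$counts[keep], type = "h", log = "y", lwd = 4,
     xlab = "x", ylab = "count (log scale)")
invisible(dev.off())
```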
[R] nontabular logistic regression
Hi. I'm attempting to fit a logistic/binomial model so I can determine the influence of landscape on the probability that a box gets used by a bird. I've looked at a few sources (MASS text, Dalgaard, Fox and google) and the examples are almost always based on tabular predictor variables. My data, however are not. I'm not sure if that is the source of the problems or not because the one example that includes a continuous predictor looks to be coded exactly the same way. Looking at the output, I get estimates for each case when I should get a single estimate for purbank. Any suggestions?

Many thanks,

Jeff

THE DATA: (200 boxes total, used [0 if unoccupied, 1 occupied], the rest are landscape variables).

box use purbank     purban2    purban1    pgrassk    pgrass2    pgrass1   grassdist grasspatchk
1   1   0.003813435 0.02684564 0.06896552 0.3282487  0.6845638  0.7586207 0         3.73
2   1   0.04429451  0.1610738  0.1724138  0.1534174  0.3825503  0.6551724 0         1.023261
3   1   0.04458785  0.06040268 0          0.1628043  0.557047   0.7586207 0         0.9605769
4   1   0.06072162  0.2080537  0.06896552 0.01936052 0          0         323.1099  0.2284615
5   0   0.6080962   0.6979866  0.6896552  0.03168084 0.1275168  0.2413793 30        0.2627027
6   1   0.6060428   0.6107383  0.3448276  0.04077442 0.2885906  0.4482759 30        0.2978571
7   1   0.3807568   0.4362416  0.6896552  0.06864183 0.03355705 0         94.86833  0.468
8   0   0.3649164   0.3154362  0.4137931  0.06277501 0.1275168  0         120       0.4585714

THE CODE:

box.use <- read.csv("c:\\eabl\\2004\\use_logistic2.csv", header=TRUE)
attach(box.use)
box.use <- na.omit(box.use)
use <- factor(use, levels=0:1)
levels(use) <- c("unused", "used")
glm1 <- glm(use ~ purbank, binomial)

THE OUTPUT:

Coefficients:
                     Estimate Std. Error   z value Pr(>|z|)
(Intercept)        -4.544e-16  1.414e+00 -3.21e-16    1.000
purbank0            2.157e+01  2.923e+04     0.001    0.999
purbank0.001173365  2.157e+01  2.067e+04     0.001    0.999
purbank0.001466706  2.157e+01  2.923e+04     0.001    0.999
purbank0.001760047  6.429e-16  2.000e+00  3.21e-16    1.000
purbank0.002346729  2.157e+01  2.923e+04     0.001    0.999
purbank0.003813435  2.157e+01  2.923e+04     0.001    0.999
purbank0.004106776  2.157e+01  2.067e+04     0.001    0.999
purbank0.004693458  2.157e+01  2.067e+04     0.001    0.999

Jeffrey A. Stratford, Ph.D.
Postdoctoral Associate
331 Funchess Hall
Department of Biological Sciences
Auburn University
Auburn, AL 36849
334-329-9198
FAX 334-844-9234
http://www.auburn.edu/~stratja
Re: [R] Problemss compiling RODBC
On Fri, 2006-10-13 at 14:59 +0100, Vittorio wrote:
> When updating to the very last version of RODBC under freebsd 6.1 the
> errors below pop up but RODBC compiles till the end and, it seems, to
> work properly.
>
> What are those errors about?

I don't know what adverse effect this may have in RODBC.

To me, it looks like you're either missing the /usr/local/lib/libpcre.so.0 file or that the file is damaged. Can you confirm whether libpcre is installed on your system and for the correct ARCH? Try to reinstall/recompile the pcre package. More info at http://www.pcre.org (appears temporarily down at the moment I write).

Then I'd try to recompile RODBC.

HTH,
Jerome

--
Jerome Asselin, M.Sc., Agent de recherche, RHCE
CHUM -- Centre de recherche
3875 rue St-Urbain, 3e etage // Montreal QC H2W 1V1
Tel.: 514-890-8000 Poste 15914; Fax: 514-412-7106
Re: [R] RODBC sqlQuery insert slow
Is there a reason why the data have to be inserted 1 row at a time? Is it possible to insert the entire table at once? sqlSave perhaps.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Bill Szkotnicki
Sent: Friday, October 13, 2006 9:09 AM
To: [EMAIL PROTECTED]
Subject: [R] RODBC sqlQuery insert slow

Hello,
I am trying to insert a lot of data into a table using windows R (2.3.1) and a mysql database via RODBC. First I read a file with read.csv and then form sql insert statements for each row and execute the insert query one row at a time. See the loop below. This turns out to be very slow. Can anyone please suggest a way to speed it up?

Thanks, Bill

# R code
ntry=dim(ti)[1]
date()
nbefore=sqlQuery(channel,"SELECT COUNT(*) FROM logger")
for (i in 1:ntry) {
sql="INSERT INTO logger (time,v1,v2,v3,v4,v5,v6,v7,v8,v9,v10) VALUES("
d1=strptime(ti[i,2],"%d/%m/%y %H:%M:%S %p")
sql=paste(sql,"'",d1,"'" )
sql=paste(sql,",",ti[i,3] )
sql=paste(sql,",",ti[i,4] )
sql=paste(sql,",",ti[i,5] )
sql=paste(sql,",",ti[i,6] )
sql=paste(sql,",",ti[i,7] )
sql=paste(sql,",",ti[i,8] )
sql=paste(sql,",",ti[i,9] )
sql=paste(sql,",",ti[i,10])
sql=paste(sql,",",ti[i,11])
sql=paste(sql,",",ti[i,12])
sql=paste(sql,")" )
#print(sql)
sqlQuery(channel, sql)
}
nafter=sqlQuery(channel,"SELECT COUNT(*) FROM logger")
nadded=nafter-nbefore;nadded
date()
Re: [R] Need help with tables
--- laba diena [EMAIL PROTECTED] wrote:

> I have a data file with 3 columns and I need to take only 2, how to do that

Have a look at the manual?

An Introduction to R
2.7 Index vectors; selecting and modifying subsets of a data set
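The indexing described in that section of the manual also applies to data frame columns. A short sketch with a made-up three-column data frame (the names id, height and weight are only for illustration):

```r
# Hypothetical three-column data set.
dat <- data.frame(id = 1:3,
                  height = c(1.6, 1.7, 1.8),
                  weight = c(60, 70, 80))

by_position <- dat[, c(1, 3)]            # keep columns 1 and 3
by_name     <- dat[, c("id", "weight")]  # same columns, chosen by name
dropped     <- dat[, -2]                 # or drop column 2 instead
```

All three results are the same two-column data frame; selecting by name is usually the most robust choice when the file layout might change.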
[R] R in Perl-Scripts
Dear all,

I would like to use R commands within Perl scripts. How can I do this?

Yours
Torsten
[R] Rmpi performance
Dear R users,

we are trying to do some parallel computing using library(snow). In particular we have a cluster with 3 nodes

cl <- makeCluster(3, type = "MPI")
3 slaves are spawned successfully. 0 failed.

and we want to compute the function op_mat (see below) first with the master and then with the cluster, using system.time for checking the computational performance.

op_mat = function(mat) {
  inv = solve(mat)
  det_inv = det(inv)
  tr_inv = sum(diag(inv))
  return(list(c(det=det_inv,tr=tr_inv)))
}
nn = 3000
XX = matrix(rnorm(nn*nn),nn,nn)

# with the master
system.time(op_mat(XX))
[1] 42.283  1.883 44.168  0.000  0.000

# with the cluster
system.time(clusterCall(cl,op_mat,XX))
[1] 11.523 12.612 71.562  0.000  0.000

You can see that using the master it takes 44.168 seconds to compute the function on matrix XX, while it takes 71.562 seconds (more time!!!) with the cluster. Can you give us some advice in order to understand why the cluster is slower than the master?

Thank you very much in advance, bye

Michela and Marco

PS: we have a gigabit ethernet between the master and the nodes
Re: [R] Helmert contrasts for repeated measures and split-plot expts
comments in line

Roy Sanderson wrote:
> Dear R-help
>
> I have two separate experiments, one a repeated-measures design, the other
> a split-plot. In a standard ANOVA I have usually undertaken a
> multiple-comparison test on a significant factor with e.g TukeyHSD, but as
> I understand it such a test is inappropriate for repeated measures or
> split-plot designs.
>
> Is it therefore sensible to use Helmert contrasts for either of these
> designs? Whilst not providing all the pairwise comparisons of TukeyHSD,
> presumably the P-statistic for each Helmert contrast will indicate clearly
> whether that contrast is significant and should be retained in the model.
> (This seems to come with the disadvantage that the parameter values are
> harder to interpret than with Treatment contrasts.) In the
> repeated-measures design the factor in question has three levels, whilst in
> the split-plot design it has four.

You don't need to restrict yourself to Helmert vs. treatment contrasts: You can use any set of contrasts that will provide estimates of (k-1) parameters for a factor with k levels and interpret the p values as you suggest. I see two issues with doing this: correlation among parameter estimates and individual vs. group p values.

CORRELATED PARAMETER ESTIMATES: Helmert contrasts are orthogonal for a balanced design but will produce correlated parameter estimates with an unbalanced design. This will generally increase the p values due to variance inflation created by the correlation. If one or more correlations are too large, you may wish to try custom contrasts that produce parameter estimates that are essentially uncorrelated; this should give you the smallest p value you could expect for that comparison. If I was interested in, e.g., 2*k comparisons, I might run the same analysis several times with different contrasts, taking the p value for each comparison from an analysis in which the coefficient for that comparison had a low correlation with the remaining (k-2) coefficients for that k-level factor.

INDIVIDUAL VS. GROUP p VALUES: In many but not all cases, under the null hypothesis of no effect, a p value will follow a uniform distribution. Thus, if we compute 1,000 p values using a typical procedure when nothing is going on, we can expect roughly 50 of them to be less than 0.05 by chance alone. The Bonferroni inequality suggests that if we do m comparisons, we should multiply the smallest p value by m to convert it to a family- or group-wise p value. This is known to be conservative, and with more than (k-1) comparisons among k levels of a factor, it is extremely conservative. In that case, I would be inclined to multiply the smallest p value by (k-1), even if I considered many more than (k-1) comparisons among the k levels. I don't know a reference for doing this, and if I were going to do it for a publication, I might do some simulations to check it. Perhaps someone else might enlighten us both on how sensible this might be.

Hope this helps.
Spencer Graves

> Many thanks in advance
> Roy
>
> ---
> Roy Sanderson
> Institute for Research on Environment and Sustainability
> Devonshire Building
> University of Newcastle
> Newcastle upon Tyne
> NE1 7RU
> United Kingdom
>
> Tel: +44 191 246 4835
> Fax: +44 191 246 4999
>
> http://www.ncl.ac.uk/environment/
> [EMAIL PROTECTED]
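The two points about p values in the reply above can be tried out in a few lines. This is only an illustrative sketch: the three raw p values are invented, and the simulation draws uniform p values directly instead of fitting any model.

```r
# Hypothetical raw p values from m = 3 contrasts.
p <- c(0.01, 0.04, 0.20)

# Bonferroni: multiply each p value by m (capped at 1).
p_bonf <- p.adjust(p, method = "bonferroni")

# Under the null hypothesis p values are uniform on (0, 1), so roughly 5%
# of 1000 independent tests fall below 0.05 by chance alone.
set.seed(42)
p_null <- runif(1000)
n_false_pos <- sum(p_null < 0.05)
```

p.adjust() also offers less conservative methods such as "holm", which may be worth comparing before settling on an ad hoc multiplier like (k-1).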
Re: [R] RODBC sqlQuery insert slow
> I am trying to insert a lot of data into a table using windows R (2.3.1)
> and a mysql database via RODBC.
> First I read a file with read.csv and then form sql insert statements for
> each row and execute the insert query one row at a time. See the loop below.
> This turns out to be very slow.
> Can anyone please suggest a way to speed it up?

A few weeks ago I had to solve a similar task. Inserting each row turned out to be horribly slow due to paste() and the data.frame indexing. The estimated runtime would have been over 3 weeks, so I used MySQL's LOAD DATA INFILE syntax to speed things up.

You must have FILE_PRIV = 'Y' set in the mysql.user table to use this small hack, and I'm not sure that this runs remotely. It is also assumed that your df has valid column names.

tmp_filename <- tempfile()
write.table(df, tmp_filename, na = "\\N", row.names = FALSE, col.names = FALSE, quote = FALSE, sep = "\t")
query <- paste("LOAD DATA LOCAL INFILE '", tmp_filename, "' INTO TABLE ", your_table, " (", toString(names(df)), ");", sep = "")
sqlQuery(channel, query)
unlink(tmp_filename)

The total runtime for the LOAD DATA INFILE queries was something below 5 minutes, inserting 3e+6 rows with 200 columns.

Michel Lang
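To see what the write.table() call above actually hands to MySQL, the file can be read back before loading it. A small sketch with an invented two-column data frame; demo_table is a placeholder name, no database is contacted, and the forward-slash path is a precaution on the assumption that backslashes in a Windows path could be read as escape characters inside the SQL string.

```r
# Hypothetical data with one missing value.
df_demo <- data.frame(id = 1:3, value = c(2.5, NA, 7))

tmp <- tempfile(fileext = ".txt")
write.table(df_demo, tmp, na = "\\N", row.names = FALSE,
            col.names = FALSE, quote = FALSE, sep = "\t")
written <- readLines(tmp)   # tab-separated rows; NA is written as \N

# Build the statement with a forward-slash path (query is not executed here).
path <- normalizePath(tmp, winslash = "/")
query <- paste0("LOAD DATA LOCAL INFILE '", path,
                "' INTO TABLE demo_table (", toString(names(df_demo)), ");")
unlink(tmp)
```

MySQL reads \N in the file as NULL, which is why the na argument is set that way.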
Re: [R] Multiple barplots on the same axis
--- Andre Nathan [EMAIL PROTECTED] wrote:

> Hi
>
> R newbie here :)
>
> I need to plot 3 barplots in the same axis, something like
>
> [ASCII sketch: three groups of bars side by side on one axis, labelled v1, v2, v3]
>
> Is there any documentation describing how to achieve that, and what data
> file layout would make the job easier?
>
> Thanks in advance,
> Andre

You might have a look at "Using R for Data Analysis and Graphics - Introduction, Examples and Commentary" by John Maindonald and at "An Introduction to S and the Hmisc and Design Libraries" by Carlos Alzola and Frank E. Harrell.

Both are available on the R site: follow the trail Other > Contributed documentation.

Are you aware of R Site Search, http://finzi.psych.upenn.edu/search.html?

To do what you want using traditional R graphics you should read up on par(). Type ?par

Here is some really quick and dirty code that might help as a starting point.

aa <- 1:4
bb <- c(2,3,5,6)
cc <- c(4,5,6,7)

par(mfcol=c(1,3))
barplot(aa)
barplot(bb)
barplot(cc)
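The par(mfcol) code above draws three separate panels. If the three sets of bars should instead share one axis, as in the question, one option is to bind the vectors into a matrix and use beside = TRUE. A sketch using the same toy vectors; the column names v1 to v3 follow the labels in the question, and the null PDF device is only there so the code runs without a display.

```r
aa <- 1:4
bb <- c(2, 3, 5, 6)
cc <- c(4, 5, 6, 7)

# One column per variable; each column becomes one group of bars.
m <- cbind(v1 = aa, v2 = bb, v3 = cc)

pdf(NULL)
mids <- barplot(m, beside = TRUE)   # groups v1, v2, v3 on a single axis
invisible(dev.off())
```

For a data file, this suggests a layout with one column per variable, which read.table() plus as.matrix() turns into the matrix that barplot() expects.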
Re: [R] combinatorics
Hi Robin,

This approach first generates all combinations and then eliminates the non-feasible ones. It should work fine for smallish vectors but might not scale well for larger vectors. Hopefully it gives you what you need for this problem.

xx <- c("A","A","B","B","C")
yy <- 1:length(xx)
zz <- expand.grid(yy,yy,yy,yy,yy)
ss <- zz[ apply(zz, 1, FUN=function(x) length(unique(x))) == length(xx), ]
ss <- as.matrix(ss)
pp <- apply(ss, 1, FUN=function(x,v) paste(v[as.vector(x)], collapse=""), v=xx)
res <- unique(pp)

res
 [1] "CBBAA" "BCBAA" "BBCAA" "CBABA" "BCABA" "CABBA" "ACBBA" "BACBA" "ABCBA" "BBACA" "BABCA"
[12] "ABBCA" "CBAAB" "BCAAB" "CABAB" "ACBAB" "BACAB" "ABCAB" "CAABB" "ACABB" "AACBB" "BAACB"
[23] "ABACB" "AABCB" "BBAAC" "BABAC" "ABBAC" "BAABC" "ABABC" "AABBC"

length(res)
[1] 30

-Christos

Christos Hatzis, Ph.D.
Nuvera Biosciences, Inc.
400 West Cummings Park Suite 5350
Woburn, MA 01801
Tel: 781-938-3830
www.nuverabio.com

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Robin Hankin
Sent: Friday, October 13, 2006 10:19 AM
To: [EMAIL PROTECTED]
Subject: [R] combinatorics

Hi

How do I generate all ways of ordering sets of indistinguishable items?

Suppose I have two A's, two B's and a C. Then I want

AABBC
AABCB
AACBB
ABABC
. . .snip...
BBAAC
. . .snip...
CBBAA

[there are 5!/(2!*2!) = 30 arrangements. Note AABBC != BBAAC]

How do I do this?

--
Robin Hankin
Uncertainty Analyst
National Oceanography Centre, Southampton
European Way, Southampton SO14 3ZH, UK
tel 023-8059-7743
Re: [R] nontabular logistic regression
On Fri, 2006-10-13 at 09:28 -0500, Jeffrey Stratford wrote:
> Hi. I'm attempting to fit a logistic/binomial model so I can determine
> the influence of landscape on the probability that a box gets used by a
> bird. I've looked at a few sources (MASS text, Dalgaard, Fox and google)
> and the examples are almost always based on tabular predictor variables.
> My data, however are not. I'm not sure if that is the source of the
> problems or not because the one example that includes a continuous
> predictor looks to be coded exactly the same way. Looking at the output,
> I get estimates for each case when I should get a single estimate for
> purbank. Any suggestions?
>
> Many thanks,
>
> Jeff
>
> THE DATA: (200 boxes total, used [0 if unoccupied, 1 occupied], the rest
> are landscape variables).
>
> box use purbank     purban2    purban1    pgrassk    pgrass2    pgrass1   grassdist grasspatchk
> 1   1   0.003813435 0.02684564 0.06896552 0.3282487  0.6845638  0.7586207 0         3.73
> 2   1   0.04429451  0.1610738  0.1724138  0.1534174  0.3825503  0.6551724 0         1.023261
> 3   1   0.04458785  0.06040268 0          0.1628043  0.557047   0.7586207 0         0.9605769
> 4   1   0.06072162  0.2080537  0.06896552 0.01936052 0          0         323.1099  0.2284615
> 5   0   0.6080962   0.6979866  0.6896552  0.03168084 0.1275168  0.2413793 30        0.2627027
> 6   1   0.6060428   0.6107383  0.3448276  0.04077442 0.2885906  0.4482759 30        0.2978571
> 7   1   0.3807568   0.4362416  0.6896552  0.06864183 0.03355705 0         94.86833  0.468
> 8   0   0.3649164   0.3154362  0.4137931  0.06277501 0.1275168  0         120       0.4585714
>
> THE CODE:
>
> box.use <- read.csv("c:\\eabl\\2004\\use_logistic2.csv", header=TRUE)
> attach(box.use)
> box.use <- na.omit(box.use)
> use <- factor(use, levels=0:1)
> levels(use) <- c("unused", "used")
> glm1 <- glm(use ~ purbank, binomial)
>
> THE OUTPUT:
>
> Coefficients:
>                      Estimate Std. Error   z value Pr(>|z|)
> (Intercept)        -4.544e-16  1.414e+00 -3.21e-16    1.000
> purbank0            2.157e+01  2.923e+04     0.001    0.999
> purbank0.001173365  2.157e+01  2.067e+04     0.001    0.999
> purbank0.001466706  2.157e+01  2.923e+04     0.001    0.999
> purbank0.001760047  6.429e-16  2.000e+00  3.21e-16    1.000
> purbank0.002346729  2.157e+01  2.923e+04     0.001    0.999
> purbank0.003813435  2.157e+01  2.923e+04     0.001    0.999
> purbank0.004106776  2.157e+01  2.067e+04     0.001    0.999
> purbank0.004693458  2.157e+01  2.067e+04     0.001    0.999

It appears that the 'purbank' variable is being imported as a factor, hence the multiple levels indicated in the left hand column.

Check:

str(box.use)

right after the read.csv() step and see what it shows. From the sample data above, it _should_ be along the lines of:

str(box.use)
'data.frame':   8 obs. of  10 variables:
 $ box        : int  1 2 3 4 5 6 7 8
 $ use        : int  1 1 1 1 0 1 1 0
 $ purbank    : num  0.00381 0.04429 0.04459 0.06072 0.60810 ...
 $ purban2    : num  0.0268 0.1611 0.0604 0.2081 0.6980 ...
 $ purban1    : num  0.069 0.172 0.000 0.069 0.690 ...
 $ pgrassk    : num  0.3282 0.1534 0.1628 0.0194 0.0317 ...
 $ pgrass2    : num  0.685 0.383 0.557 0.000 0.128 ...
 $ pgrass1    : num  0.759 0.655 0.759 0.000 0.241 ...
 $ grassdist  : num  0 0 0 323 30 ...
 $ grasspatchk: num  3.730 1.023 0.961 0.228 0.263 ...

Hence, you should be able to use:

model <- glm(use ~ purbank, data = box.use, family = binomial)
summary(model)

Call:
glm(formula = use ~ purbank, family = binomial, data = box.use)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-1.61450  -0.03098   0.31935   0.45888   1.39194

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)    3.223      2.225   1.448    0.147
purbank       -6.129      4.773  -1.284    0.199

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 8.9974  on 7  degrees of freedom
Residual deviance: 6.5741  on 6  degrees of freedom
AIC: 10.574

Number of Fisher Scoring iterations: 5

Note that na.omit() is the default operation for most R models, so the explicit call is redundant.

Also, I would not attach the data frame, as you can use the 'data' argument in model related functions. This avoids the confusion of having multiple copies of the source data set, where changes made to one copy are not reflected in the other.

HTH,

Marc Schwartz
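The thread does not establish why purbank came in as a factor, but a single non-numeric entry in the file is a common cause. The sketch below reproduces that situation with made-up data; the "n/a" marker is purely an assumption for illustration, and stringsAsFactors = TRUE is spelled out so that the first call behaves like the R of that time.

```r
csv <- "box,use,purbank
1,1,0.0038
2,1,0.0443
3,0,n/a
4,0,0.6081"

# A single non-numeric entry makes the whole column non-numeric.
bad <- read.csv(text = csv, stringsAsFactors = TRUE)
class(bad$purbank)    # "factor"

# Declaring the marker as missing lets read.csv() parse the column as numeric.
good <- read.csv(text = csv, na.strings = c("NA", "n/a"))
class(good$purbank)   # "numeric"
```

Running str() on the imported data frame, as suggested above, shows the difference at a glance; levels(bad$purbank) then reveals the offending entry.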
Re: [R] nontabular logistic regression
Jeffrey Stratford said the following on 10/13/2006 9:28 AM:
> Hi. I'm attempting to fit a logistic/binomial model so I can determine
> the influence of landscape on the probability that a box gets used by a
> bird. I've looked at a few sources (MASS text, Dalgaard, Fox and google)
> and the examples are almost always based on tabular predictor variables.
> My data, however are not. I'm not sure if that is the source of the
> problems or not because the one example that includes a continuous
> predictor looks to be coded exactly the same way. Looking at the output,
> I get estimates for each case when I should get a single estimate for
> purbank. Any suggestions?
>
> Many thanks,
>
> Jeff
>
> THE DATA: (200 boxes total, used [0 if unoccupied, 1 occupied], the rest
> are landscape variables).
>
> box use purbank     purban2    purban1    pgrassk    pgrass2    pgrass1   grassdist grasspatchk
> 1   1   0.003813435 0.02684564 0.06896552 0.3282487  0.6845638  0.7586207 0         3.73
> 2   1   0.04429451  0.1610738  0.1724138  0.1534174  0.3825503  0.6551724 0         1.023261
> 3   1   0.04458785  0.06040268 0          0.1628043  0.557047   0.7586207 0         0.9605769
> 4   1   0.06072162  0.2080537  0.06896552 0.01936052 0          0         323.1099  0.2284615
> 5   0   0.6080962   0.6979866  0.6896552  0.03168084 0.1275168  0.2413793 30        0.2627027
> 6   1   0.6060428   0.6107383  0.3448276  0.04077442 0.2885906  0.4482759 30        0.2978571
> 7   1   0.3807568   0.4362416  0.6896552  0.06864183 0.03355705 0         94.86833  0.468
> 8   0   0.3649164   0.3154362  0.4137931  0.06277501 0.1275168  0         120       0.4585714
>
> THE CODE:
>
> box.use <- read.csv("c:\\eabl\\2004\\use_logistic2.csv", header=TRUE)
> attach(box.use)
> box.use <- na.omit(box.use)
> use <- factor(use, levels=0:1)
> levels(use) <- c("unused", "used")
> glm1 <- glm(use ~ purbank, binomial)
>
> THE OUTPUT:
>
> Coefficients:
>                      Estimate Std. Error   z value Pr(>|z|)
> (Intercept)        -4.544e-16  1.414e+00 -3.21e-16    1.000
> purbank0            2.157e+01  2.923e+04     0.001    0.999
> purbank0.001173365  2.157e+01  2.067e+04     0.001    0.999
> purbank0.001466706  2.157e+01  2.923e+04     0.001    0.999
> purbank0.001760047  6.429e-16  2.000e+00  3.21e-16    1.000
> purbank0.002346729  2.157e+01  2.923e+04     0.001    0.999
> purbank0.003813435  2.157e+01  2.923e+04     0.001    0.999
> purbank0.004106776  2.157e+01  2.067e+04     0.001    0.999
> purbank0.004693458  2.157e+01  2.067e+04     0.001    0.999
>
> Jeffrey A. Stratford, Ph.D.
> Postdoctoral Associate
> 331 Funchess Hall
> Department of Biological Sciences
> Auburn University
> Auburn, AL 36849
> 334-329-9198
> FAX 334-844-9234
> http://www.auburn.edu/~stratja

That's not what I get:

lines <- "box,use,purbank,purban2,purban1,pgrassk,pgrass2,pgrass1,grassdist,grasspatchk
1,1,0.003813435,0.02684564,0.06896552,0.3282487,0.6845638,0.7586207,0,3.73
2,1,0.04429451,0.1610738,0.1724138,0.1534174,0.3825503,0.6551724,0,1.023261
3,1,0.04458785,0.06040268,0,0.1628043,0.557047,0.7586207,0,0.9605769
4,1,0.06072162,0.2080537,0.06896552,0.01936052,0,0,323.1099,0.2284615
5,0,0.6080962,0.6979866,0.6896552,0.03168084,0.1275168,0.2413793,30,0.2627027
6,1,0.6060428,0.6107383,0.3448276,0.04077442,0.2885906,0.4482759,30,0.2978571
7,1,0.3807568,0.4362416,0.6896552,0.06864183,0.03355705,0,94.86833,0.468
8,0,0.3649164,0.3154362,0.4137931,0.06277501,0.1275168,0,120,0.4585714"

box.use <- read.csv(textConnection(lines))
box.use <- na.omit(box.use)
box.use$use <- factor(box.use$use, levels=0:1)
levels(box.use$use) <- c("unused", "used")
(glm1 <- glm(use ~ purbank, binomial, box.use))

You need to check why purbank is being interpreted as a factor in your code. Also, I removed your use of attach because I find it dangerous (especially with no detach). Better to be explicit.

HTH,

--sundar
[R] R in Perl-Scripts
Torsten,

your message is a bit terse. Do you have Omegahat's RSPerl package installed? If not, visit www.omegahat.org, and have a look at the documentation. (Btw, you will find a lot of other useful stuff there.)

Sorry that I can't offer more at the moment -- but to know where to start may be better than nothing ;-)

Cheers,
Jörg
Re: [R] combinatorics
Hi Christos

thanks for this. Unfortunately, this approach wouldn't work for me because the real problem is too big for it:

I have letters A-F and two of each, giving 12!/(2^6) ~= 7e6 combinations (borderline feasible).

But in the approach you coded up below, matrix zz would have 6^12 ~= 2e9 rows before eliminating the non-feasible ones. This is too big for me.

Looks like it's going to be another weekend lost to C [but at least I now have some confidence that I've not overlooked anything obvious!]

With very best wishes, I really appreciate your efforts

Robin

On 13 Oct 2006, at 16:21, Christos Hatzis wrote:

> Hi Robin,
>
> This approach first generates all combinations and then eliminates the
> non-feasible ones. It should work fine for smallish vectors but might not
> scale well for larger vectors. Hopefully it gives you what you need for
> this problem.
>
> xx <- c("A","A","B","B","C")
> yy <- 1:length(xx)
> zz <- expand.grid(yy,yy,yy,yy,yy)
> ss <- zz[ apply(zz, 1, FUN=function(x) length(unique(x))) == length(xx), ]
> ss <- as.matrix(ss)
> pp <- apply(ss, 1, FUN=function(x,v) paste(v[as.vector(x)], collapse=""), v=xx)
> res <- unique(pp)
>
> res
>  [1] "CBBAA" "BCBAA" "BBCAA" "CBABA" "BCABA" "CABBA" "ACBBA" "BACBA" "ABCBA" "BBACA" "BABCA"
> [12] "ABBCA" "CBAAB" "BCAAB" "CABAB" "ACBAB" "BACAB" "ABCAB" "CAABB" "ACABB" "AACBB" "BAACB"
> [23] "ABACB" "AABCB" "BBAAC" "BABAC" "ABBAC" "BAABC" "ABABC" "AABBC"
>
> length(res)
> [1] 30
>
> -Christos
>
> Christos Hatzis, Ph.D.
> Nuvera Biosciences, Inc.
> 400 West Cummings Park Suite 5350
> Woburn, MA 01801
> Tel: 781-938-3830
> www.nuverabio.com
>
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Robin Hankin
> Sent: Friday, October 13, 2006 10:19 AM
> To: [EMAIL PROTECTED]
> Subject: [R] combinatorics
>
> Hi
>
> How do I generate all ways of ordering sets of indistinguishable items?
>
> Suppose I have two A's, two B's and a C. Then I want
>
> AABBC
> AABCB
> AACBB
> ABABC
> . . .snip...
> BBAAC
> . . .snip...
> CBBAA
>
> [there are 5!/(2!*2!) = 30 arrangements. Note AABBC != BBAAC]
>
> How do I do this?
>
> --
> Robin Hankin
> Uncertainty Analyst
> National Oceanography Centre, Southampton
> European Way, Southampton SO14 3ZH, UK
> tel 023-8059-7743

--
Robin Hankin
Uncertainty Analyst
National Oceanography Centre, Southampton
European Way, Southampton SO14 3ZH, UK
tel 023-8059-7743
Re: [R] combinatorics
Use 'permutations' in 'gtools'

x <- permutations(5,5)
y <- c('a','a','b','b','c')[x]
dim(y) <- dim(x)
unique(y)

On 10/13/06, Robin Hankin [EMAIL PROTECTED] wrote:
> Hi
>
> How do I generate all ways of ordering sets of indistinguishable items?
>
> Suppose I have two A's, two B's and a C. Then I want
>
> AABBC
> AABCB
> AACBB
> ABABC
> . . .snip...
> BBAAC
> . . .snip...
> CBBAA
>
> [there are 5!/(2!*2!) = 30 arrangements. Note AABBC != BBAAC]
>
> How do I do this?
>
> --
> Robin Hankin
> Uncertainty Analyst
> National Oceanography Centre, Southampton
> European Way, Southampton SO14 3ZH, UK
> tel 023-8059-7743

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
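Both suggestions in this thread enumerate every ordering of the positions and then discard duplicates, which is what makes them too large for the A-F case. A different idea is to build the arrangements directly from the letter counts, so that only distinct arrangements are ever produced. The function below is a plain recursive sketch of that idea in base R; multiset_perms and its counts argument are names invented here, and for millions of results the repeated c() calls would make it slow, so it is meant to show the logic, not to replace a C implementation.

```r
# counts: named integer vector, e.g. c(A = 2, B = 2, C = 1)
multiset_perms <- function(counts) {
  if (sum(counts) == 0) return("")           # nothing left: one empty string
  res <- character(0)
  for (nm in names(counts)[counts > 0]) {    # each letter still available
    rest <- counts
    rest[nm] <- rest[nm] - 1                 # use one copy of that letter
    res <- c(res, paste0(nm, multiset_perms(rest)))
  }
  res
}

res <- multiset_perms(c(A = 2, B = 2, C = 1))
length(res)   # 30
```

Because each distinct letter is tried only once at each position, no duplicates arise and no unique() step is needed; the results come out in lexicographic order when the names are sorted.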
Re: [R] nontabular logistic regression
On Fri, 2006-10-13 at 09:28 -0500, Jeffrey Stratford wrote:

Hi. I'm attempting to fit a logistic/binomial model so I can determine the influence of landscape on the probability that a box gets used by a bird. I've looked at a few sources (MASS text, Dalgaard, Fox and google) and the examples are almost always based on tabular predictor variables. My data, however, are not. I'm not sure if that is the source of the problems or not because the one example that includes a continuous predictor looks to be coded exactly the same way. Looking at the output, I get estimates for each case when I should get a single estimate for purbank. Any suggestions? Many thanks, Jeff

Hi Jeff,

Using the snippet of data you provided (copy/paste into a text file and read in with read.table) worked fine:

    box.use <- read.table("~/tmp/tmp.txt", header = TRUE)
    box.use
    str(box.use)
    'data.frame':   8 obs. of  10 variables:
     $ box        : int  1 2 3 4 5 6 7 8
     $ use        : int  1 1 1 1 0 1 1 0
     $ purbank    : num  0.00381 0.04429 0.04459 0.06072 0.60810 ...
     $ purban2    : num  0.0268 0.1611 0.0604 0.2081 0.6980 ...
     $ purban1    : num  0.069 0.172 0.000 0.069 0.690 ...
     $ pgrassk    : num  0.3282 0.1534 0.1628 0.0194 0.0317 ...
     $ pgrass2    : num  0.685 0.383 0.557 0.000 0.128 ...
     $ pgrass1    : num  0.759 0.655 0.759 0.000 0.241 ...
     $ grassdist  : num  0 0 0 323 30 ...
     $ grasspatchk: num  3.730 1.023 0.961 0.228 0.263 ...

Now I don't like attach, and you just don't need it, so I deviate a little here. Replace box.use$use directly and make use of the data argument in glm. Also, the snippet you sent didn't have any missing values, so I'm not sure whether it is the response or the predictor that has them, or whether your na.omit is needed at all. I don't use it below.

    box.use$use <- factor(box.use$use, levels=0:1)
    levels(box.use$use) <- c("unused", "used")
    box.use
    str(box.use)
    glm1 <- glm(use ~ purbank, data = box.use, family = binomial())
    summary(glm1)

    Call:
    glm(formula = use ~ purbank, family = binomial(), data = box.use)

    Deviance Residuals:
         Min        1Q    Median        3Q       Max
    -1.61450  -0.03098   0.31935   0.45888   1.39194

    Coefficients:
                Estimate Std. Error z value Pr(>|z|)
    (Intercept)    3.223      2.225   1.448    0.147
    purbank       -6.129      4.773  -1.284    0.199

    (Dispersion parameter for binomial family taken to be 1)

        Null deviance: 8.9974  on 7  degrees of freedom
    Residual deviance: 6.5741  on 6  degrees of freedom
    AIC: 10.574

    Number of Fisher Scoring iterations: 5

I suspect something got messed up in your reading of the data and R thought purbank was a factor or character. Always check your data after reading in, and str() is your friend here, as printed representations are not always what they seem.

HTH

G

THE DATA: (200 boxes total, used [0 if unoccupied, 1 occupied], the rest are landscape variables).

    box use purbank     purban2    purban1    pgrassk    pgrass2    pgrass1   grassdist grasspatchk
    1   1   0.003813435 0.02684564 0.06896552 0.3282487  0.6845638  0.7586207 0         3.73
    2   1   0.04429451  0.1610738  0.1724138  0.1534174  0.3825503  0.6551724 0         1.023261
    3   1   0.04458785  0.06040268 0          0.1628043  0.557047   0.7586207 0         0.9605769
    4   1   0.06072162  0.2080537  0.06896552 0.01936052 0          0         323.1099  0.2284615
    5   0   0.6080962   0.6979866  0.6896552  0.03168084 0.1275168  0.2413793 30        0.2627027
    6   1   0.6060428   0.6107383  0.3448276  0.04077442 0.2885906  0.4482759 30        0.2978571
    7   1   0.3807568   0.4362416  0.6896552  0.06864183 0.03355705 0         94.86833  0.468
    8   0   0.3649164   0.3154362  0.4137931  0.06277501 0.1275168  0         120       0.4585714

THE CODE:

    box.use <- read.csv("c:\\eabl\\2004\\use_logistic2.csv", header=TRUE)
    attach(box.use)
    box.use <- na.omit(box.use)
    use <- factor(use, levels=0:1)
    levels(use) <- c("unused", "used")
    glm1 <- glm(use ~ purbank, binomial)

THE OUTPUT:

    Coefficients:
                         Estimate Std. Error   z value Pr(>|z|)
    (Intercept)        -4.544e-16  1.414e+00 -3.21e-16    1.000
    purbank0            2.157e+01  2.923e+04     0.001    0.999
    purbank0.001173365  2.157e+01  2.067e+04     0.001    0.999
    purbank0.001466706  2.157e+01  2.923e+04     0.001    0.999
    purbank0.001760047  6.429e-16  2.000e+00  3.21e-16    1.000
    purbank0.002346729  2.157e+01  2.923e+04     0.001    0.999
    purbank0.003813435  2.157e+01  2.923e+04     0.001    0.999
    purbank0.004106776  2.157e+01  2.067e+04     0.001    0.999
    purbank0.004693458  2.157e+01  2.067e+04     0.001    0.999

Jeffrey A. Stratford, Ph.D. Postdoctoral Associate 331 Funchess Hall Department of
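The suspicion that purbank was read as a factor or character can be reproduced in miniature. The following is a sketch only: the six-row data set is invented, and it assumes (hypothetically) that one missing value was typed as a non-numeric token such as "." in the input file.

```r
# Hypothetical two-column file in which one missing value was typed as "."
txt <- "use purbank
1 0.0038
1 0.0443
0 .
0 0.6081
1 0.3808
0 0.3649"

# Read naively: the "." stops R from treating the column as numeric
bad <- read.table(text = txt, header = TRUE)
str(bad$purbank)

# Declare "." as a missing-value marker: the column is numeric again
good <- read.table(text = txt, header = TRUE, na.strings = c("NA", "."))
str(good$purbank)

# With a numeric predictor there is one slope, not one term per distinct value
fit <- glm(use ~ purbank, data = good, family = binomial())
print(coef(fit))
```

A non-numeric predictor makes glm fit a separate coefficient for each distinct value, which matches the pattern in the output quoted above.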
Re: [R] RODBC sqlQuery insert slow
Thanks for the help ... the sqlSave() function was the solution. The lesson, which has been stated many times before, is to avoid loops wherever possible!

Bill

    # fast RODBC inserting
    dat <- cbind(as.character(strptime(ti[,2],"%d/%m/%y %H:%M:%S %p")),ti[,3:12])
    # you need the as.character to make sure the time is stored correctly in mysql
    names(dat)=c("time","v1","v2","v3","v4","v5","v6","v7","v8","v9","v10")
    sqlSave(channel,dat,"logger",rownames=F,append=T)
    # very fast.
    #

Jerome Asselin wrote:

On Fri, 2006-10-13 at 09:09 -0400, Bill Szkotnicki wrote:

Hello, I am trying to insert a lot of data into a table using windows R (2.3.1) and a mysql database via RODBC. First I read a file with read.csv and then form sql insert statements for each row and execute the insert query one row at a time. See the loop below. This turns out to be very slow. Can anyone please suggest a way to speed it up? Thanks, Bill

    # R code
    ntry=dim(ti)[1]
    date()
    nbefore=sqlQuery(channel,"SELECT COUNT(*) FROM logger")
    for (i in 1:ntry) {
      sql="INSERT INTO logger (time,v1,v2,v3,v4,v5,v6,v7,v8,v9,v10) VALUES("
      d1=strptime(ti[i,2],"%d/%m/%y %H:%M:%S %p")
      sql=paste(sql,"'",d1,"'" )
      sql=paste(sql,",",ti[i,3] )
      sql=paste(sql,",",ti[i,4] )
      sql=paste(sql,",",ti[i,5] )
      sql=paste(sql,",",ti[i,6] )
      sql=paste(sql,",",ti[i,7] )
      sql=paste(sql,",",ti[i,8] )
      sql=paste(sql,",",ti[i,9] )
      sql=paste(sql,",",ti[i,10])
      sql=paste(sql,",",ti[i,11])
      sql=paste(sql,",",ti[i,12])
      sql=paste(sql,")" )
      #print(sql)
      sqlQuery(channel, sql)
    }
    nafter=sqlQuery(channel,"SELECT COUNT(*) FROM logger")
    nadded=nafter-nbefore;nadded
    date()

I sure will try to help you out here. I've been working with RODBC. I think what slows you down here is your loop with multiple paste commands. Have you considered the sqlSave() function with the append=T argument? I think you could replace your loop with:

    dat <- cbind(strptime(ti[,2],"%d/%m/%y %H:%M:%S %p"),d1,ti[,3:12])
    sqlSave(channel,dat,"logger",append=T)

Of course, I haven't tested this so you may need some minor adjustments, but I think this will greatly speed up your insert job.

Regards, Jerome
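The "avoid loops" lesson applies to the string building too, because the string functions in R are vectorised. The sketch below is an illustration only: the two-row data frame and the three-column table layout are invented stand-ins for the logger data, and no database is contacted.

```r
# Hypothetical stand-in for the logger data: a timestamp plus two readings
ti <- data.frame(time = c("13/10/06 09:00:00", "13/10/06 09:01:00"),
                 v1 = c(1.5, 2.5),
                 v2 = c(10, 20),
                 stringsAsFactors = FALSE)

# Convert every timestamp at once to the "YYYY-MM-DD HH:MM:SS" text form
stamp <- format(strptime(ti$time, "%d/%m/%y %H:%M:%S"), "%Y-%m-%d %H:%M:%S")

# sprintf() is vectorised: one call builds one INSERT statement per row
sql <- sprintf("INSERT INTO logger (time,v1,v2) VALUES('%s',%s,%s)",
               stamp, ti$v1, ti$v2)
print(sql)
```

Each element of sql would still have to be sent to the database separately, so handing the whole data frame to sqlSave(), as described in the thread, remains the simpler route.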
Re: [R] Problemss compiling RODBC
There is.

    # ls /usr/local/lib/libpcre*
    @libpcre.so libpcre.so.0

Vittorio

At 14:39 on Friday 13 October 2006, Jerome Asselin wrote:

On Fri, 2006-10-13 at 14:59 +0100, Vittorio wrote:

When updating to the very last version of RODBC under freebsd 6.1 the errors below pop up, but RODBC compiles to the end and seems to work properly. What are those errors about?

I don't know what adverse effect this may have in RODBC. To me, it looks like you're either missing the /usr/local/lib/libpcre.so.0 file or the file is damaged. Can you confirm whether libpcre is installed on your system and for the correct ARCH? Try to reinstall/recompile the pcre package. More info at http://www.pcre.org (appears temporarily down at the moment I write). Then I'd try to recompile RODBC.

HTH, Jerome
Re: [R] Rmpi performance
clusterCall invokes the same function on all three nodes. You have basically discovered the communication costs of performing the calculation in parallel.

You'll get the easiest gains from snow (and other parallel packages in R) with 'embarrassingly parallel' problems, where the same algorithm is applied to different data sets / slices of data. For performance gains from a single call to op_mat, you'd have to do some serious parallel algorithm development to distribute the data and computations effectively.

Hope that helps,

Martin

Michela Cameletti [EMAIL PROTECTED] writes:

Dear R users, we are trying to do some parallel computing using library(snow). In particular we have a cluster with 3 nodes

    cl <- makeCluster(3, type = "MPI")
    3 slaves are spawned successfully. 0 failed.

and we want to compute the function op_mat (see below) first with the master and then with the cluster using system.time for checking the computational performance.

    op_mat = function(mat) {
      inv = solve(mat)
      det_inv = det(inversa)
      tr_inv = sum(diag(inversa))
      return(list(c(det=det_inv,tr=tr_inv)))
    }
    nn = 3000
    XX = matrix(rnorm(nn*nn),nn,nn)
    # with the master
    system.time(op_matrici(XX))
    [1] 42.283  1.883 44.168  0.000  0.000
    # with the cluster
    system.time(clusterCall(cl,op_matrici,XX))
    [1] 11.523 12.612 71.562  0.000  0.000

You can see that using the master it takes 44.168 seconds for computing the function on matrix XX while it takes 71.562 seconds (more time!!!) with the cluster. Can you give us some advice in order to understand why the cluster is slower than the master? Thank you very much in advance, bye Michela and Marco

Ps: we have a gigabit ethernet between the master and the nodes

-- Martin T. Morgan Bioconductor / Computational Biology http://bioconductor.org
Re: [R] Log-scale in histogramm
On Fri, 13 Oct 2006, Marc Schwartz wrote:

On Fri, 2006-10-13 at 13:33 +0200, David Graf wrote:

Hello

My data looks ugly in a normal histogram. How can I create a histogram with a Y-axis in log-scale? Thanks for your help!

David Graf

There is a log-histogram (called log.hist) in my package HyperbolicDist, and an updated one on my web page: http://www.stat.auckland.ac.nz/~dscott/Rpackage/NewFunctions/logHist.R

David Scott

_ David Scott Visiting (Until January 07) Department of Probability and Statistics The University of Sheffield The Hicks Building Hounsfield Road Sheffield S3 7RH United Kingdom Phone: +44 114 222 3908 Email: [EMAIL PROTECTED]
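For readers without the package at hand, the general idea can be sketched in base R. This is not the log.hist/logHist function mentioned above, only a minimal illustration on simulated data: compute the bins without plotting, then draw the non-empty counts against a logarithmic y-axis.

```r
set.seed(1)
x <- rexp(1000)                 # simulated, strongly skewed data

h <- hist(x, plot = FALSE)      # compute the bins without drawing anything
keep <- h$counts > 0            # empty bins cannot be shown on a log axis

# Vertical lines at the bin midpoints, with the counts on a log scale
plot(h$mids[keep], h$counts[keep], log = "y", type = "h", lwd = 4,
     xlab = "x", ylab = "count (log scale)")
```

Dropping the empty bins is a design choice made here to avoid taking log(0); other treatments are possible.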
[R] Adding per-panel text to panel strips in lattice xyplot
I would like to add auxiliary information to the bottom of two strips on each panel that comes from a table look-up using the values of two variables that define the panel. For example I might panel on sex and race, showing 3 randomly chosen time series in each panel and want to add (n=100) in the bottom strip to indicate the 3 curves were sampled from 100. Is there a not-too-hard way to do that? I would like to do this both with and without groups= and superposition, but especially with.

Thanks

Frank

-- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University
Re: [R] Rmpi performance
On Fri, 13 Oct 2006, Michela Cameletti wrote:

Dear R users, we are trying to do some parallel computing using library(snow). In particular we have a cluster with 3 nodes

    cl <- makeCluster(3, type = "MPI")
    3 slaves are spawned successfully. 0 failed.

and we want to compute the function op_mat (see below) first with the master and then with the cluster using system.time for checking the computational performance.

    op_mat = function(mat) {
      inv = solve(mat)
      det_inv = det(inversa)
      tr_inv = sum(diag(inversa))
      return(list(c(det=det_inv,tr=tr_inv)))
    }

What is inversa?

    nn = 3000
    XX = matrix(rnorm(nn*nn),nn,nn)
    # with the master
    system.time(op_matrici(XX))
    [1] 42.283  1.883 44.168  0.000  0.000
    # with the cluster
    system.time(clusterCall(cl,op_matrici,XX))
    [1] 11.523 12.612 71.562  0.000  0.000

You can see that using the master it takes 44.168 seconds for computing the function on matrix XX while it takes 71.562 seconds (more time!!!)

Of course it takes more time to do the same computation plus communication! The amount of additional time seems high if your nodes are comparable in speed to your master and you really are getting gigabit performance. I would look for a visualization tool to get an idea of what is happening, perhaps xmpi if your MPI is LAM.

Best,

luke

with the cluster. Can you give us some advice in order to understand why the cluster is slower than the master? Thank you very much in advance, bye Michela and Marco

Ps: we have a gigabit ethernet between the master and the nodes

-- Luke Tierney Chair, Statistics and Actuarial Science Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics and Fax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: [EMAIL PROTECTED] Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu
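The question about inversa points at names in the posted code that are never defined. A consistent version might look like the sketch below; it assumes that inv was the object meant by inversa and that op_mat and op_matrici are the same function, neither of which is confirmed in the thread.

```r
# Determinant and trace of the inverse of a square matrix.
# Assumption: 'inversa' in the posted code was meant to be 'inv'.
op_mat <- function(mat) {
  inv <- solve(mat)                       # matrix inverse
  c(det = det(inv), tr = sum(diag(inv)))  # named numeric vector
}

# Small check on a matrix whose inverse is known: 2*I has inverse 0.5*I
res <- op_mat(diag(2, 3))
print(res)
```

For diag(2, 3) the inverse has 0.5 on the diagonal, so the determinant is 0.5^3 = 0.125 and the trace is 1.5.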
Re: [R] loop, pipe connection, output objects
Why don't you read the whole file in and then use subsetting to get your ranges, instead of reading the file multiple times using 'gawk'? You can then use 'assign' to create your objects; it would be better to use a list.

    indice <- (201:399)
    result <- list()
    x <- read.table('base_6_mod')
    for (i in indice){
      assign(paste('data.', i, sep=''), x[x[,2] >= i & x[,2] < i+1, 2:3])
      # or better
      result[[i]] <- x[x[,2] >= i & x[,2] < i+1, 2:3]
    }
    # or without loops
    result <- lapply(indice, function(z) x[x[,2] >= z & x[,2] < z+1,])

On 10/13/06, Marco Grazzi [EMAIL PROTECTED] wrote:

Hi all, I have the following (newbie) problem. Inside R, I am trying to process a file and create many files from it. The file is organized in different columns, the second containing a code. I want to create output objects which contain only the entries in a certain code range, and whose names contain the code itself. Here is my attempt

    indice <- (201:399)
    for(i in indice){
      data.i <- read.table(pipe(paste("gawk '{if ($2 >=", i, "&& $2 <", i+1, ") print $2,$3}' base_6_mod", sep='')))
      print(paste("code ...", i, "... done"))
    }

The problems are:

1- My syntax data.i does not work, and the loop only produces one big data.i object. My goal, obviously, was to have something like data.201, data.202, etc.

(second order problem) 2- I wonder if the syntax for the index ($2 >=, i, && $2 <, i+1,) works

Thanks for your help (hoping I managed to be clear enough), marco

-- Marco Grazzi

-- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem you are trying to solve?
[R] cygwin script for Sweave
below is a very simple bash script to run Sweave from a cygwin terminal, run pdflatex on the generated .tex file, and then view the resulting .pdf output. i usually use cygwin when i am (forced to be on) Windoze, but i found a few issues with paths that this script works around.

pdfview, used in the script, is simply:

    $ cat /usr/local/bin/pdfview
    #!/bin/bash.exe
    if [ $# -eq 1 ]
    then
      /c/Program\ Files/Adobe/Acrobat\ 6.0/Reader/AcroRd32.exe `cygpath -w -a -s $1`
    else
      /c/Program\ Files/Adobe/Acrobat\ 6.0/Reader/AcroRd32.exe
    fi

mutatis mutandis for your own Adobe Reader.

here is the script:

    #!/bin/bash.exe
    # rnw.sh [.Rnw file]
    #
    # $1 must be a .Rnw file
    #
    RNWFILE=$1
    PWD=`pwd`
    FILEBASE=`basename $1 .Rnw`
    TEXFILE=$FILEBASE.tex
    PDFFILE=$FILEBASE.pdf
    echo "\
    library(\"utils\"); \
    setwd(\"`cygpath -m $PWD`\"); \
    Sweave(\"$RNWFILE\")\
    " \
    | /c/R/R-2.3.1/bin/Rterm.exe --no-save --no-restore
    # the resulting .tex file contains an annoying c: ...
    # replace it with the pdflatex-friendly /c :
    sed -e 's/c:/\/c/g' --in-place $TEXFILE
    # now run text processing
    pdflatex $TEXFILE
    pdfview $PDFFILE
[R] HP UX
I have a user who is currently running R on a desktop system, where a run takes 3 days. We have an Itanium 2 Cluster running HP UX. My system manager has tried to install R and has sent the following message:

INSTALL file says do

    ./configure
    make

./configure fails with (tail end of output)

    checking for main in -ltermcap... no
    checking for main in -ltermlib... no
    checking for rl_callback_read_char in -lreadline... no
    checking for history_truncate_file... no
    configure: error: --with-readline=yes (default) and headers/libs are not available

Has anyone installed R on an HP UX system?

Ricky

__ Principal Analyst Information Services Queen's University Belfast tel: 02890 974824 email: [EMAIL PROTECTED]
Re: [R] loop, pipe connection, output objects
Read your data frame in all at once, then cut it on x[2] and split the result, e.g.

    split(iris, cut(iris$Sepal.Length, 4:8))

Please provide reproducible code. Without input it's not reproducible. See the last line of every message to r-help.

On 10/13/06, Marco Grazzi [EMAIL PROTECTED] wrote:

Hi all, I have the following (newbie) problem. Inside R, I am trying to process a file and create many files from it. The file is organized in different columns, the second containing a code. I want to create output objects which contain only the entries in a certain code range, and whose names contain the code itself. Here is my attempt

    indice <- (201:399)
    for(i in indice){
      data.i <- read.table(pipe(paste("gawk '{if ($2 >=", i, "&& $2 <", i+1, ") print $2,$3}' base_6_mod", sep='')))
      print(paste("code ...", i, "... done"))
    }

The problems are:

1- My syntax data.i does not work, and the loop only produces one big data.i object. My goal, obviously, was to have something like data.201, data.202, etc.

(second order problem) 2- I wonder if the syntax for the index ($2 >=, i, && $2 <, i+1,) works

Thanks for your help (hoping I managed to be clear enough), marco

-- Marco Grazzi
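Applied to the code ranges in the question, the cut-and-split idea might look like the sketch below. The six-row data frame is invented for illustration (the real file base_6_mod was not posted), and the column names are assumptions.

```r
# Invented stand-in for the file: an id, a code in [201, 400) and a value
x <- data.frame(id    = 1:6,
                code  = c(201.2, 201.9, 202.5, 203.1, 203.7, 398.4),
                value = c(10, 11, 12, 13, 14, 15))

# One bin per integer code: [201,202), [202,203), ..., [399,400)
bins <- cut(x$code, breaks = 201:400, right = FALSE, labels = 201:399)

# A named list of data frames, one per code that actually occurs
pieces <- split(x[, c("code", "value")], bins, drop = TRUE)

print(names(pieces))
print(pieces[["201"]])
```

Keeping the pieces in one named list (pieces[["201"]], pieces[["202"]], ...) avoids creating hundreds of separate data.201, data.202, ... objects.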
Re: [R] Problemss compiling RODBC
On Fri, 2006-10-13 at 18:07 +, vittorio wrote:

There is.

    # ls /usr/local/lib/libpcre*
    @libpcre.so libpcre.so.0

I don't use freebsd. So I'm not sure how to help. As I hinted before, I'd first try to reinstall or update the pcre package and make sure that all its dependencies are satisfied. Confirm that the pcre package version is reasonably up-to-date.

I see your problem seems to have something to do with the grep command. Can you actually use grep as a system command? E.g.:

    ps axu | grep $USER

What do you get if you run this?

    ldd /bin/grep

Jerome

-- Jerome Asselin, M.Sc., Agent de recherche, RHCE CHUM -- Centre de recherche 3875 rue St-Urbain, 3e etage // Montreal QC H2W 1V1 Tel.: 514-890-8000 Poste 15914; Fax: 514-412-7106
Re: [R] Rmpi performance
On Fri, 13 Oct 2006, Michela Cameletti wrote:

Dear R users, we are trying to do some parallel computing using library(snow). In particular we have a cluster with 3 nodes

    cl <- makeCluster(3, type = "MPI")
    3 slaves are spawned successfully. 0 failed.

and we want to compute the function op_mat (see below) first with the master and then with the cluster using system.time for checking the computational performance.

    op_mat = function(mat) {
      inv = solve(mat)
      det_inv = det(inversa)
      tr_inv = sum(diag(inversa))
      return(list(c(det=det_inv,tr=tr_inv)))
    }
    nn = 3000
    XX = matrix(rnorm(nn*nn),nn,nn)
    # with the master
    system.time(op_matrici(XX))
    [1] 42.283  1.883 44.168  0.000  0.000
    # with the cluster
    system.time(clusterCall(cl,op_matrici,XX))
    [1] 11.523 12.612 71.562  0.000  0.000

You can see that using the master it takes 44.168 seconds for computing the function on matrix XX while it takes 71.562 seconds (more time!!!) with the cluster. Can you give us some advice in order to understand why the cluster is slower than the master?

clusterCall() evaluates the same call on each computer in the cluster, so it will always be slower than just evaluating on the master. It is useful for setup that has to be performed on each machine, or for parallel evaluation of random functions (eg bootstrapping, simulation).

To split up a single computation you have to do it explicitly, eg with parLapply, parSapply, and parApply, or parMM for parallel matrix multiplication.

It's unlikely that you could speed up inverting a dense matrix even with gigabit ethernet for communication: the success of ATLAS and Dr Goto's tuned BLAS libraries shows that the time taken for dense linear algebra can be dominated by communications overhead even between a CPU and its own memory.

-thomas
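The difference between repeating a job and dividing it can be seen on a toy scale. The sketch below makes two assumptions not taken from the thread: it uses the parallel package shipped with current versions of R (which offers the same clusterCall and parLapply interface as snow), and it starts two local worker processes rather than an MPI cluster.

```r
library(parallel)

cl <- makeCluster(2)   # assumption: two local workers, not an MPI cluster

# clusterCall: every worker evaluates the SAME call, so the work is repeated
same <- clusterCall(cl, function() sum(1:10))

# parLapply: the inputs are divided among the workers, so the work is shared
shared <- parLapply(cl, 1:4, function(i) i^2)

stopCluster(cl)

print(unlist(same))
print(unlist(shared))
```

clusterCall returns one copy of the same answer per worker, whereas parLapply returns one answer per input, which is the pattern needed for splitting up a computation.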
Re: [R] combinatorics
On 13-Oct-06 Robin Hankin wrote:

Hi

How do I generate all ways of ordering sets of indistinguishable items? Suppose I have two A's, two B's and a C. Then I want

    AABBC AABCB AACBC

I think you mean AACBB here!

    ABABC . . .snip... BBAAC . . .snip... CBBAA

[there are 5!/(2!*2!) = 30 arrangements. Note AABBC != BBAAC]

How do I do this?

I've tried to think of an efficient and economical (and therefore clever) way of doing this for larger problems; but that will have to wait for another day! Meanwhile, a problem of the order of the one you describe above can be solved quite slickly:

    X<-c("A","A","B","B","C")
    library(combinat)
    ##[result below stripped of quotes for clarity]
    unique(array(permn(X)))
    [[1]]  [1] A A B B C
    [[2]]  [1] A A B C B
    [[3]]  [1] A A C B B
    [[4]]  [1] A C A B B
    [[5]]  [1] C A A B B
    [[6]]  [1] A B A B C
    [[7]]  [1] A B A C B
    [[8]]  [1] A B C A B
    [[9]]  [1] A C B A B
    [[10]] [1] C A B A B
    [[11]] [1] C B A A B
    [[12]] [1] B C A A B
    [[13]] [1] B A C A B
    [[14]] [1] B A A C B
    [[15]] [1] B A A B C
    [[16]] [1] B A B A C
    [[17]] [1] B A B C A
    [[18]] [1] B A C B A
    [[19]] [1] B C A B A
    [[20]] [1] C B A B A
    [[21]] [1] C A B B A
    [[22]] [1] A C B B A
    [[23]] [1] A B C B A
    [[24]] [1] A B B C A
    [[25]] [1] A B B A C
    [[26]] [1] B B A A C
    [[27]] [1] B B A C A
    [[28]] [1] B B C A A
    [[29]] [1] B C B A A
    [[30]] [1] C B B A A

However, the above simple function will quickly get short of breath if the total number of items gets much above, say, 10.

Hoping this helps!

Ted.

E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 13-Oct-06 Time: 17:40:20 -- XFMail --
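One way to avoid generating all n! orderings and discarding duplicates is to build each arrangement directly, branching only on the distinct symbols that still have copies left. The sketch below is an illustration of that idea, not code from the thread; multiset_perms is a hypothetical name, and for millions of arrangements the accumulation with c() would still be slow in plain R.

```r
# Distinct arrangements of a multiset, built one symbol at a time.
# counts: named integer vector, e.g. c(A = 2, B = 2, C = 1)
# multiset_perms is a hypothetical helper, not part of any package.
multiset_perms <- function(counts) {
  if (sum(counts) == 0) return("")       # nothing left: one empty arrangement
  out <- character(0)
  for (s in names(counts)[counts > 0]) { # each symbol that is still available
    rest <- counts
    rest[s] <- rest[s] - 1               # use up one copy of that symbol
    out <- c(out, paste0(s, multiset_perms(rest)))
  }
  out
}

res <- multiset_perms(c(A = 2, B = 2, C = 1))
print(length(res))
print(head(res, 3))
```

Because each branch starts with a different symbol, no arrangement is produced twice and no call to unique() is needed.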
Re: [R] combinatorics
On Fri, 13 Oct 2006, Robin Hankin wrote:

Hi

How do I generate all ways of ordering sets of indistinguishable items? Suppose I have two A's, two B's and a C. Then I want

    AABBC AABCB AACBC ABABC . . .snip... BBAAC . . .snip... CBBAA

[there are 5!/(2!*2!) = 30 arrangements. Note AABBC != BBAAC]

How do I do this?

I'd recursively use combn() to choose locations for A's, then B's, then ...

    where.A <- combn(5,2)[, rep( 1:choose(5,2), each = choose(3,2)*choose(1,1))]
    where.not.A <- apply(where.A,2,function(x) (1:5)[-x])
    where.B <- matrix(apply(unique( where.not.A, MARGIN=2), 2, combn, 2 ),nr=2)
    where.not.AB <- apply(rbind(where.A,where.B),2,function(x) (1:5)[-x] )
    result <- matrix("C",nr=5,nc=30)
    result[ cbind( c( where.A ), c( col( where.A ) ) ) ] <- "A"
    result[ cbind( c( where.B ), c( col( where.B ) ) ) ] <- "B"
    cbind( apply(result,2,paste,collapse="") )
          [,1]
     [1,] "AABBC"
     [2,] "AABCB"
     [3,] "AACBB"
     . . .

-- Robin Hankin Uncertainty Analyst National Oceanography Centre, Southampton European Way, Southampton SO14 3ZH, UK tel 023-8059-7743

Charles C. Berry (858) 534-2098 Dept of Family/Preventive Medicine E mailto:[EMAIL PROTECTED] UC San Diego http://biostat.ucsd.edu/~cberry/ La Jolla, San Diego 92093-0717
Re: [R] Bug in lowess
Folks:

This interesting discussion brings up an issue of what I have referred to for some time as safe statistics, by which I mean: (usually, but not necessarily, automated) statistical procedures that are guaranteed either (a) to give a reasonable answer; or (b) not to give an answer and, when possible, to emit useful error messages. All standard least squares procedures taught in basic statistics courses are examples (from many different perspectives) of unsafe statistics. Robustness/resistance clearly takes us some way along this path, but as is clear from the discussion, not the whole way.

The reasons I think that this is important are:

a) Based on my own profound ignorance/limitations, I think it's impossible to expect those who need to use many different kinds of sophisticated statistical analyses to understand enough of the technical details to be able to actively and effectively guide their appropriate use when such guidance is required (e.g., least squares with extensive diagnostics; overfitting in nonlinear regression);

b) The explosion of large complex data in all disciplines that **require** some sort of automated analyses to be used (e.g. microarray data?).

Having said this, it is unclear to me even **if** safe statistics is a meaningful concept: can it ever be -- at all? But I believe one thing is clear: a lot of people devote a lot of labor to optimal procedures that are far too sensitive to the manifold peculiarities of real data to give reliable, trustworthy results in practice without considerable expert coaxing. We at least need a greater variety of less optimal but safer data analysis procedures. R -- or rather its many contributors -- seems to me to be the exception in recognizing and doing something about this.

And as a humble example of what I mean: I like simple running medians of generally small span for smoothing sequential data (please don't waste time giving me counterexamples of why this is bad or how it can go wrong -- I know there are many).

I would appreciate anyone else's thoughts on this, pro or con, perhaps privately rather than on the list if you view this as too far off-topic.

(NOTE: To be clear: my personal views, not those of my company or colleagues)

My best regards to all,

Bert

Bert Gunter Genentech Nonclinical Statistics South San Francisco, CA 94404

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Frank E Harrell Jr
Sent: Friday, October 13, 2006 5:51 AM
To: Prof Brian Ripley
Cc: [EMAIL PROTECTED]
Subject: Re: [R] Bug in lowess

Prof Brian Ripley wrote:

Frank Harrell wrote: [...]

Thank you Brian. It seems that no matter what is the right answer, the answer currently returned on my system is clearly wrong. lowess()$y should be constrained to be within range(y).

Really? That assertion is offered without proof and I am pretty sure is incorrect. Consider

    x <- c(1:10, 20)
    y <- c(1:10, 5) + 0.01*rnorm(11)
    lowess(x,y)
    $x
     [1]  1  2  3  4  5  6  7  8  9 10 20
    $y
     [1]  0.9983192  1.9969599  2.9960805  3.9948224  4.9944158  5.9959855
     [7]  6.9964400  7.9981434  8.9990607 10.0002567 19.9946117

Remember that lowess is a local *linear* fitting method, and may give zero weight to some data points, so (as here) it can extrapolate.

Brian - thanks - that's a good example though not typical of the kind I see from patients.

After reading what src/appl/lowess.doc says should happen with zero weights, I think the answer given on Frank's system probably is the correct one. Rounding error is determining which of the last two points is given zero robustness weight: on a i686 system both of the last two are, and on mine only the last is. As far as I can tell in infinite-precision arithmetic both would be zero, and hence the value at x=120 would be found by extrapolation from those (far) to the left of it.

I am inclined to think that the best course of action is to quit with a warning when the MAD of the residuals is effectively zero. However, we need to be careful not to call things 'bugs' that we do not understand well enough. This might be a design error in lowess, but it is not AFAICS a bug in the implementation.

Yes it appears to be a weakness in the underlying algorithm.

Thanks

Frank
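The running medians mentioned in this message are available in base R as runmed(). The sketch below uses an invented seven-point series with one wild value and a span of 3; it also illustrates the range property raised in the quoted lowess exchange, since a running median can never leave the range of the data.

```r
# Invented series with a single wild value at position 4
y <- c(1, 2, 3, 50, 5, 6, 7)

# Running median of span 3; endrule = "keep" leaves the two end points untouched
sm <- runmed(y, k = 3, endrule = "keep")
print(as.numeric(sm))

# Each smoothed value is a median of observed values, so it stays within range(y)
print(range(sm))
print(range(y))
```

The outlier 50 is replaced by the median of its neighbourhood (3, 50, 5), while the rest of the series is barely changed.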
[R] a correlation matrix subset where the subset avg. is a maximum
Hello R group,

Given a correlation matrix, I would like to obtain the best subset of pairs in the matrix of some size n such that the mean of r for that subset is a maximum compared to any other possible subset of size n. I've been looking at the deal and subselect packages but they don't seem to do what I need. Does anyone have any suggestions?

Thanks in advance,

Ryan
Re: [R] nontabular logistic regression
Gavin, That worked! I went through and I found a few missing cases where I had . instead of NA - I'm still in SAS mode. Many thanks! Jeffrey A. Stratford, Ph.D. Postdoctoral Associate 331 Funchess Hall Department of Biological Sciences Auburn University Auburn, AL 36849 334-329-9198 FAX 334-844-9234 http://www.auburn.edu/~stratja Gavin Simpson [EMAIL PROTECTED] 10/13/06 11:23 AM On Fri, 2006-10-13 at 09:28 -0500, Jeffrey Stratford wrote: Hi. I'm attempting to fit a logistic/binomial model so I can determine the influence of landscape on the probability that a box gets used by a bird. I've looked at a few sources (MASS text, Dalgaard, Fox and google) and the examples are almost always based on tabular predictor variables. My data, however are not. I'm not sure if that is the source of the problems or not because the one example that includes a continuous predictor looks to be coded exactly the same way. Looking at the output, I get estimates for each case when I should get a single estimate for purbank. Any suggestions? Many thanks, Jeff Hi Jeff, using the snippet of data you provided (copy/paste into a text file and read in with read.table) worked fine: box.use - read.table(~/tmp/tmp.txt, header = TRUE) box.use str(box.use) 'data.frame': 8 obs. of 10 variables: $ box: int 1 2 3 4 5 6 7 8 $ use: int 1 1 1 1 0 1 1 0 $ purbank: num 0.00381 0.04429 0.04459 0.06072 0.60810 ... $ purban2: num 0.0268 0.1611 0.0604 0.2081 0.6980 ... $ purban1: num 0.069 0.172 0.000 0.069 0.690 ... $ pgrassk: num 0.3282 0.1534 0.1628 0.0194 0.0317 ... $ pgrass2: num 0.685 0.383 0.557 0.000 0.128 ... $ pgrass1: num 0.759 0.655 0.759 0.000 0.241 ... $ grassdist : num0 0 0 323 30 ... $ grasspatchk: num 3.730 1.023 0.961 0.228 0.263 ... Now I don't like attach, and you just don't need it so I deviate a little now. Replace box.use$use directly and make use of the data argument in glm. 
Also, your data didn't have any missing data so I'm not sure whether the response or predictor is missing and whether your na.omit is needed or not - I don't use it below.

    box.use$use <- factor(box.use$use, levels = 0:1)
    levels(box.use$use) <- c("unused", "used")
    box.use
    str(box.use)
    glm1 <- glm(use ~ purbank, data = box.use, family = binomial())
    summary(glm1)

    Call:
    glm(formula = use ~ purbank, family = binomial(), data = box.use)

    Deviance Residuals:
         Min        1Q    Median        3Q       Max
    -1.61450  -0.03098   0.31935   0.45888   1.39194

    Coefficients:
                Estimate Std. Error z value Pr(>|z|)
    (Intercept)    3.223      2.225   1.448    0.147
    purbank       -6.129      4.773  -1.284    0.199

    (Dispersion parameter for binomial family taken to be 1)

        Null deviance: 8.9974  on 7  degrees of freedom
    Residual deviance: 6.5741  on 6  degrees of freedom
    AIC: 10.574

    Number of Fisher Scoring iterations: 5

I suspect something got messed up in your reading of the data and R thought purbank was a factor or character. Always check your data after reading in, and str() is your friend here, as printed representations are not always what they seem.

HTH

G

THE DATA: (200 boxes total, used [0 if unoccupied, 1 occupied], the rest are landscape variables).
    box use purbank     purban2    purban1    pgrassk    pgrass2    pgrass1   grassdist grasspatchk
    1   1   0.003813435 0.02684564 0.06896552 0.3282487  0.6845638  0.7586207 0         3.73
    2   1   0.04429451  0.1610738  0.1724138  0.1534174  0.3825503  0.6551724 0         1.023261
    3   1   0.04458785  0.06040268 0          0.1628043  0.557047   0.7586207 0         0.9605769
    4   1   0.06072162  0.2080537  0.06896552 0.01936052 0          0         323.1099  0.2284615
    5   0   0.6080962   0.6979866  0.6896552  0.03168084 0.1275168  0.2413793 30        0.2627027
    6   1   0.6060428   0.6107383  0.3448276  0.04077442 0.2885906  0.4482759 30        0.2978571
    7   1   0.3807568   0.4362416  0.6896552  0.06864183 0.03355705 0         94.86833  0.468
    8   0   0.3649164   0.3154362  0.4137931  0.06277501 0.1275168  0         120       0.4585714

THE CODE:

    box.use <- read.csv("c:\\eabl\\2004\\use_logistic2.csv", header=TRUE)
    attach(box.use)
    box.use <- na.omit(box.use)
    use <- factor(use, levels=0:1)
    levels(use) <- c("unused", "used")
    glm1 <- glm(use ~ purbank, binomial)

THE OUTPUT:

    Coefficients:
                         Estimate Std. Error   z value Pr(>|z|)
    (Intercept)        -4.544e-16  1.414e+00 -3.21e-16    1.000
    purbank0            2.157e+01  2.923e+04     0.001    0.999
    purbank0.001173365  2.157e+01  2.067e+04     0.001    0.999
    purbank0.001466706
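The cause reported at the top of this thread (a SAS-style "." used for missing values) can be reproduced in a few lines. This is only a sketch on made-up data: the object names txt, bad and good are hypothetical, and the values are a shortened, altered copy of the purbank column with one "." inserted. It shows that a single "." keeps R from reading the column as numeric, which is why glm then estimated one coefficient per distinct value, and that na.strings = "." avoids it.

```r
# Made-up snippet: one missing value coded SAS-style as "."
txt <- "use purbank
1 0.0038
1 0.0443
0 .
0 0.6081
1 0.6060
0 0.3649"

# Default read: the "." prevents conversion, so purbank is not numeric
# (character in current R, factor in older versions).
bad <- read.table(text = txt, header = TRUE)
str(bad$purbank)

# Declaring "." as a missing-value code gives a numeric column with one NA.
good <- read.table(text = txt, header = TRUE, na.strings = ".")
str(good$purbank)
```

Checking str() right after reading, as suggested above, catches this before any model is fitted.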
Re: [R] combinatorics
I've tried to think of an efficient and economical (and therefore clever) way of doing this for larger problems; but that will have to wait for another day! The ruby permutations library (http://permutation.rubyforge.org/doc/index.html) references The Algorithm Design Manual, Steven S. Skiena, Telos/Springer, 1997, for permutation sequences. Hadley __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Adding per-panel text to panel strips in lattice xyplot
Take a look at the xysplom function in package HH. You can use it as a model for what you want.

    tmp <- data.frame(x=rnorm(24), y=rnorm(24), a=factor(rep(letters[1:2],12)), b=factor(rep(LETTERS[1:3], c(8,8,8))))
    xysplom(y ~ x | a*b, data=tmp, corr=TRUE, layout=c(2,3))

The work is done with the cooperation of two functions. xysplom.default looks for the corr argument and then creates an additional conditioning factor and gives it a constant value. strip.xysplom sees where it is and changes the strip label as needed. strip.xysplom uses the R-2.3.1 technology for finding where it is. Deepayan added some more functions in R-2.4.1, so that part of the code can now be somewhat simplified.

lattice in R-2.4.0 also has a new strip.left argument (see ?xyplot and search for strip.left) that will allow you to put the additional information in an additional left strip for each panel, rather than by making changes to one of the standard top strips.

Rich
[R] a correlation matrix subset where the subset avg is a maximum
Hello R group, Given a correlation matrix, I would like to obtain the best subset of pairs in the matrix of some size n such that the mean of r for that subset is a maximum compared to any other possible subset of size n. I've been looking at the deal and subselect packages but they don't seem to do what I need. Does anyone have any suggestions? Thanks in advance, Ryan __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] cygwin script for Sweave
On Windows you could just put this into sweave.bat, say, and then place that anywhere in your path (or in the current directory):

    set infile=%~sdpn1
    set infile=%infile:\=/%
    cmd Rcmd Sweave %infile%.Rnw
    pdflatex %infile%.tex
    start %infile%.pdf

On 10/13/06, Thomas Harte [EMAIL PROTECTED] wrote:

below is a very simple bash script to run Sweave from a cygwin terminal, run pdflatex on the generated .tex file, and then view the resulting .pdf output. i usually use cygwin when i am (forced to be on) Windoze, but i found a few issues with paths that this script works around.

pdfview, used in the script, is simply:

    $ cat /usr/local/bin/pdfview
    #!/bin/bash.exe
    if [ $# -eq 1 ]
    then
        /c/Program\ Files/Adobe/Acrobat\ 6.0/Reader/AcroRd32.exe `cygpath -w -a -s $1`
    else
        /c/Program\ Files/Adobe/Acrobat\ 6.0/Reader/AcroRd32.exe
    fi

mutatis mutandis for your own Adobe Reader.

here is the script:

    #!/bin/bash.exe
    # rnw.sh [.Rnw file]
    #
    # $1 must be a .Rnw file
    #
    RNWFILE=$1
    PWD=`pwd`
    FILEBASE=`basename $1 .Rnw`
    TEXFILE=$FILEBASE.tex
    PDFFILE=$FILEBASE.pdf
    echo " \
    library(\"utils\"); \
    setwd(\"`cygpath -m $PWD`\"); \
    Sweave(\"$RNWFILE\") \
    " \
    | /c/R/R-2.3.1/bin/Rterm.exe --no-save --no-restore
    # the resulting .tex file contains an annoying c: ...
    # replace it with the pdflatex-friendly /c :
    sed -e 's/c:/\/c/g' --in-place $TEXFILE
    # now run text processing
    pdflatex $TEXFILE
    pdfview $PDFFILE
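The same Sweave-then-LaTeX sequence can also be driven from inside R, which sidesteps the shell path issues. The following is a minimal sketch, not something from either post: sweave_to_pdf is a hypothetical helper name, it assumes a LaTeX installation that tools::texi2dvi() can find, and it assumes Sweave writes the .tex file into the current working directory (its default behaviour).

```r
# Hypothetical helper: weave an .Rnw file and compile the result to PDF.
sweave_to_pdf <- function(rnw) {
  stopifnot(file.exists(rnw))
  # Sweave writes <basename>.tex into the current working directory
  tex <- sub("\\.Rnw$", ".tex", basename(rnw), ignore.case = TRUE)
  utils::Sweave(rnw)
  # pdf = TRUE asks texi2dvi to produce a PDF rather than a DVI file
  tools::texi2dvi(tex, pdf = TRUE)
  # name of the PDF produced in the working directory
  sub("\\.tex$", ".pdf", tex)
}

# Usage (commented out because it needs an existing file and LaTeX):
# pdf_file <- sweave_to_pdf("report.Rnw")
```

Opening the resulting PDF is left to the caller, since the viewer differs between systems.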
Re: [R] adding error bars to lattice plots
Dear Deepayan and Sundar,

Thank you so much for your help with this. However, I may have phrased my problem too specifically, assuming that *in general* I could apply your response to all Lattice graphics. What I need is a barchart or vertical dotchart, with error bars, across three treatments, with a form that should look something like: barchart(median~fac1|by1, groups=group1).

Your solution works great with the problem I had posed

    xyplot(voice.part ~ median|by1, groups=group1, [snip]

yet when I switch the axes for a vertical dotchart

    xyplot(median~voice.part|by1, groups=group1, [snip]

the error bars remain horizontal and do not follow the switched axes. I tried to alter all 'lx' and 'ux' to 'ly' and 'uy' in panel.ci and prepanel.ci, but to no avail. I'm afraid I have not (YET) managed to understand just how Lattice works.

Re. moving panel.abline() from panel.groups to panel, does that mean I need to rewrite panel.superpose?

Again, any advice would be greatly appreciated.

Thanks, Dan

Deepayan Sarkar wrote:

On 10/12/06, Daniel E. Bunker [EMAIL PROTECTED] wrote:

Dear R users,

About a year ago Deepayan offered a suggestion to incorporate error bars into a dotplot using the singer data as an example http://finzi.psych.upenn.edu/R/Rhelp02a/archive/63875.html. When I try to utilize this code with a grouping variable, I get an error stating that the subscripts argument is missing. I have tried to insert them in various ways, but cannot figure out where they should go. Deepayan's original code follows, with additions from me for factor, grouping and by variables. (Note that I could use xYplot (Dotplot), but I need my response variable on the vertical axis.) Any suggestions would be greatly appreciated.

Thanks, Dan

    prepanel.ci <- function(x, y, lx, ux, subscripts, ...) {
        x <- as.numeric(x)
        lx <- as.numeric(lx[subscripts])
        ux <- as.numeric(ux[subscripts])
        list(xlim = range(x, ux, lx, finite = TRUE))
    }

    panel.ci <- function(x, y, lx, ux, subscripts, pch = 16, ...) {
        x <- as.numeric(x)
        y <- as.numeric(y)
        lx <- as.numeric(lx[subscripts])
        ux <- as.numeric(ux[subscripts])
        panel.abline(h = unique(y), col = "grey")
        panel.arrows(lx, y, ux, y, col = 'black', length = 0.25, unit = "native", angle = 90, code = 3)
        panel.xyplot(x, y, pch = pch, ...)
    }

    singer.split <- with(singer, split(height, voice.part))
    singer.ucl <- sapply(singer.split, function(x) {
        st <- boxplot.stats(x)
        c(st$stats[3], st$conf)
    })
    singer.ucl <- as.data.frame(t(singer.ucl))
    names(singer.ucl) <- c("median", "lower", "upper")
    singer.ucl$voice.part <- factor(rownames(singer.ucl), levels = rownames(singer.ucl))

    # add factor, grouping and by variables
    singer.ucl$fac1=c("level1","level1", "level2", "level2")
    singer.ucl$by1=c("two","one")
    singer.ucl$group1=c(rep(letters[1],4),rep(letters[2],4))

    ## show the data frame
    singer.ucl

    # Deepayan's original example
    with(singer.ucl, xyplot(voice.part ~ median, lx = lower, ux = upper, prepanel = prepanel.ci, panel = panel.ci), horizontal=FALSE)

    # with by variable, works fine
    with(singer.ucl, xyplot(voice.part ~ median|by1, lx = lower, ux = upper, prepanel = prepanel.ci, panel = panel.ci))

    # with groups, fails for lack of subscripts.
    with(singer.ucl, xyplot(voice.part ~ median, groups=group1, lx = lower, ux = upper, prepanel = prepanel.ci, panel = panel.ci))

Although that does seem to be the eventual error message, this fails not due to the lack of subscripts, but because 'panel.ci' does not know how to deal with groups. One solution to this is Sundar's approach, which is to change the panel function to handle groups. Another generic solution is to use 'panel.superpose', which _does_ know how to handle groups, and also accepts a custom panel function to be called for each group. Often (but not always), you can use a panel function designed for a non-groups aware display for this.

In this case, the following gives results similar to Sundar's code:

    with(singer.ucl, xyplot(voice.part ~ median, groups=group1, lx = lower, ux = upper, prepanel = prepanel.ci, panel = panel.superpose, panel.groups = panel.ci, pch = 16))

Note the need for an explicit pch=16, since the default in panel.ci is overridden by panel.groups.

    # what I need, ultimately, is something like this, with error bars:
    with(singer.ucl, dotplot(median~fac1|by1, groups=group1))

If you have more than one interval (from different groups) for any level of your categorical variable - which seems to be the case in this example - you
[R] side by side plot of Histogram and densityplot
Using par, it seems easy to put a hist and a density plot side by side in the same output window. I would like to use some features of histogram from lattice, but how can I put a histogram and a densityplot side by side on the same graph? Thank you

    par(mfrow=c(2,1))
    hist(y)
    plot(density(y))

Jue Wang, Biostatistician
Contracted Position for Preclinical Research Biostatistics
PrO Unlimited (908) 231-3022
Re: [R] side by side plot of Histogram and densityplot
On 10/13/06, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: Using par seems easily put a hist and a density side by side on the same output window. I would like to use some features in histogram from Lattice, but how can I put histogram and densityplot side by side on the same graph? ?print.trellis example(print.trellis) -Deepayan __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
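For readers who do not want to step through example(print.trellis), here is a minimal sketch of the idea on simulated data (the vector y and the object names h and d are made up for illustration). A lattice plot is an object, and its print method accepts a split argument of the form c(column, row, number of columns, number of rows), plus more = TRUE to say that further plots will follow on the same page.

```r
library(lattice)

set.seed(1)
y <- rnorm(200)                               # simulated data, for illustration only

h <- histogram(~ y)                           # lattice histogram object
d <- densityplot(~ y, plot.points = FALSE)    # lattice density plot object

# Left half of the page: histogram; more = TRUE keeps the page open
print(h, split = c(1, 1, 2, 1), more = TRUE)
# Right half of the page: density plot
print(d, split = c(2, 1, 2, 1))
```

The same print method also has a position argument for placements that do not fall on a regular grid.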
[R] correlation b/w a continuous variable and a categorical variable
Dear Listers: I happen to have this question in mind, is there a way to evaluate the correlation between a continuous variable and a categorical variable (without discretizing the former)? My intuitive is using lda by considering the latter as response variable but not sure. thanks, weiwei -- Weiwei Shi, Ph.D Research Scientist GeneGO, Inc. Did you always know? No, I did not. But I believed... ---Matrix III __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] a correlation matrix subset where the subset avg is a maximum
Thanks for the thought in any case, Mark. You're right about the brute force. I'll expand a bit with an example though for the sake of clarity.

Given a correlation matrix of 4 covariates ABCD with distances of:

    AB=0.2; AC=0.6; AD=0.3; BC=0.9; BD=0.8; CD=0.7

Find the optimal subset (size n, n being the number of covariates) where the mean of r for the subset is a maximum. Of course all NxN distances need to be considered between any chosen subset covariates.

Thus for n>1, the solution would be simply BC = 0.9. And for n>2, the solution would be BCD, as (BC + CD + BD)/3 = 0.8 is the maximum mean r value that could be obtained from any of the subsets with n>2.

I'd expected that this would be a common problem but 2 days of googling has given me little. I'm expecting a greedy graph traversal or the like will be my answer, but I'd hoped to whip up a solution in R. Any help would be greatly appreciated.

Ryan

Leeds, Mark (IED) wrote:

hi ryan : I reread and you already have the correlation matrix so brute force should definitely work. So, suppose the correlation matrix was size 20 by 20 and your n was 9. Then you have to have subsets of size 10 or greater, so the number of possibilities would be

    ( 20 choose 10 ) + ( 20 choose 11 ) + ( 20 choose 12 ) + ( 20 choose 13 ) + ... + ( 20 choose 20 )

Oh boy, it is too large a problem to do by brute force. There are too many possibilities even for this size of problem. Hopefully someone else will have a better idea. Forget my brute force idea. It's useless and I apologize. I made a mistake.

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Ryan Austin
Sent: Friday, October 13, 2006 2:43 PM
To: r-help@stat.math.ethz.ch
Subject: [R] a correlation matrix subset where the subset avg is a maximum

Hello R group, Given a correlation matrix, I would like to obtain the best subset of pairs in the matrix of some size n such that the mean of r for that subset is a maximum compared to any other possible subset of size n. I've been looking at the deal and subselect packages but they don't seem to do what I need. Does anyone have any suggestions? Thanks in advance, Ryan
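For matrices as small as the four-covariate example in this thread, exhaustive search is straightforward. The sketch below is not from any of the posts: best_subset is a hypothetical helper that enumerates every subset of a fixed size k with combn() and averages the off-diagonal correlations within each. It is only practical for small problems, since the number of subsets grows combinatorially, as noted in the reply quoted above.

```r
# Correlation matrix for the ABCD example from the thread
vars <- c("A", "B", "C", "D")
r <- diag(4)
dimnames(r) <- list(vars, vars)
r["A", "B"] <- 0.2; r["A", "C"] <- 0.6; r["A", "D"] <- 0.3
r["B", "C"] <- 0.9; r["B", "D"] <- 0.8; r["C", "D"] <- 0.7
r[lower.tri(r)] <- t(r)[lower.tri(r)]        # mirror the upper triangle

# Hypothetical helper: exhaustive search over all subsets of exactly k variables
best_subset <- function(r, k) {
  combos <- combn(nrow(r), k)                # one candidate subset per column
  means <- apply(combos, 2, function(idx) {
    sub <- r[idx, idx]
    mean(sub[upper.tri(sub)])                # mean of the pairwise correlations
  })
  best <- which.max(means)
  list(vars = rownames(r)[combos[, best]], mean_r = means[best])
}

best_subset(r, 2)   # the pair B, C with mean r 0.9
best_subset(r, 3)   # the triple B, C, D with mean r 0.8
```

Because a mean can never exceed its largest term, the best pair always has a mean at least as high as any larger subset, so the search is only interesting once a minimum subset size is imposed.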
[R] Version of R in saved workspaces
useRs and developeRs-

Apologies for my naivety, but I just couldn't figure out how to open an old workspace (created using R 2.3.0) using R 2.3.0 and not R 2.4.0, which is what currently happens. For all I know this has always been the case, but I'm having a problem with a function that doesn't work in 2.4.0 but does work in 2.3.1 (the function is in the process of being fixed). So I would ideally like to open my old workspaces with the version of R I used in creating the workspace and, at the moment, not in R 2.4.0 - though in a few weeks when the function is fixed I'm sure I'll love 2.4.0 as I have all previous versions :)

Thanks for any help,
-Mat

Steps to what I'm doing.

1. In R 2.3.1 I import data, manipulate it to my heart's content, and run some analyses.
2. I save the workspace to my local drive - name it CoolStats.RData.
3. I see R 2.4.0 is released and immediately download it and start playing with it, only to find the above mentioned error.
4. Since the error is going to take some time to fix, I will work with R 2.3.1 in the interim.
5. Open CoolStats.RData and when I type version I get the below.

    version
                   _
    platform       i386-pc-mingw32
    arch           i386
    os             mingw32
    system         i386, mingw32
    status
    major          2
    minor          4.0
    year           2006
    month          10
    day            03
    svn rev        39566
    language       R
    version.string R version 2.4.0 (2006-10-03)

***
Mat Soukup, Ph.D.
Food and Drug Administration
10903 New Hampshire Ave. BLDG 22 RM 5329
Silver Spring, MD 20993-0002
Phone: 301.796.1005
***
Re: [R] adding error bars to lattice plots
On 10/13/06, Daniel E. Bunker [EMAIL PROTECTED] wrote:

Dear Deepayan and Sundar,

Thank you so much for your help with this. However, I may have phrased my problem too specifically, assuming that *in general* I could apply your response to all Lattice graphics. What I need is a barchart or vertical dotchart, with error bars, across three treatments, with a form that should look something like: barchart(median~fac1|by1, groups=group1).

Your solution works great with the problem I had posed

    xyplot(voice.part ~ median|by1, groups=group1, [snip]

yet when I switch the axes for a vertical dotchart

    xyplot(median~voice.part|by1, groups=group1, [snip]

the error bars remain horizontal and do not follow the switched axes. I tried to alter all 'lx' and 'ux' to 'ly' and 'uy' in panel.ci and prepanel.ci, but to no avail. I'm afraid I have not (YET) managed to understand just how Lattice works.

Re. moving panel.abline() from panel.groups to panel, does that mean I need to rewrite panel.superpose?

Only if you consider writing a function B that uses function A rewriting function A.

Again, any advice would be greatly appreciated.

Continuing with the singer example, the following works for me:

    prepanel.ci <- function(x, y, ly, uy, subscripts, ...) {
        y <- as.numeric(y)
        ly <- as.numeric(ly[subscripts])
        uy <- as.numeric(uy[subscripts])
        list(ylim = range(y, uy, ly, finite = TRUE))
    }

    panel.ci <- function(x, y, ly, uy, subscripts, pch = 16, ...) {
        x <- as.numeric(x)
        y <- as.numeric(y)
        ly <- as.numeric(ly[subscripts])
        uy <- as.numeric(uy[subscripts])
        panel.arrows(x, ly, x, uy, col = 'black', length = 0.25, unit = "native", angle = 90, code = 3)
        panel.xyplot(x, y, pch = pch, ...)
    }

    with(singer.ucl,
         xyplot(median ~ voice.part, groups=group1, ly = lower, uy = upper,
                prepanel = prepanel.ci,
                panel = function(x, y, ...) {
                    panel.abline(v = unique(as.numeric(x)), col = "grey")
                    panel.superpose(x, y, ...)
                },
                panel.groups = panel.ci, pch = 16))

Let us know if anything is not obvious.

-Deepayan
Re: [R] correlation b/w a continuous variable and a categorical variable
On Fri, 13 Oct 2006 17:15:45 -0400 Weiwei Shi wrote: Dear Listers: I happen to have this question in mind, is there a way to evaluate the correlation between a continuous variable and a categorical variable (without discretizing the former)? My intuitive is using lda by considering the latter as response variable but not sure. It depends what exactly you mean by evaluate correlation. If you want to test independence of two variables X and Y against some form of association, you can generally use statistics based on sum h(Y) * g(X) where h() and g() are suitable transformations of X and Y. Special cases of this framework are tests for correlation of continuous variables and Chi-squared type statistics for categorical variables. This approach is implemented in the package coin, see independence_test() and the package vignette. hth, Z thanks, weiwei -- Weiwei Shi, Ph.D Research Scientist GeneGO, Inc. Did you always know? No, I did not. But I believed... ---Matrix III __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
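To make the "sum h(Y) * g(X)" idea concrete without depending on an add-on package, here is a rough base-R sketch on simulated data. It is an illustration only, not the coin implementation: h() is taken to be the identity on the continuous variable, g() is the 0/1 indicator coding of the factor levels, and the reference distribution is approximated by 2000 random permutations. The names y, grp, lin_stat and max_abs are made up for this example.

```r
set.seed(42)
grp <- factor(rep(c("a", "b", "c"), each = 20))          # categorical variable
y   <- rnorm(60, mean = rep(c(0, 0.5, 1.5), each = 20))  # continuous variable

# Linear statistic: one entry per level, sum over i of 1[grp_i == level] * y_i
lin_stat <- function(y, grp) as.vector(crossprod(model.matrix(~ grp - 1), y))

obs  <- lin_stat(y, grp)
perm <- replicate(2000, lin_stat(sample(y), grp))   # 3 x 2000 permutation draws

# Standardise each entry by its permutation mean and sd, then take the maximum
mu  <- rowMeans(perm)
sdv <- apply(perm, 1, sd)
max_abs <- function(t) max(abs((t - mu) / sdv))

p_value <- mean(apply(perm, 2, max_abs) >= max_abs(obs))
p_value

# A classical base-R test of the same null hypothesis, for comparison
kruskal.test(y ~ grp)
```

With a two-level factor the statistic reduces to a comparison of group sums, which is the permutation analogue of the two-sample t-test.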
Re: [R] correlation b/w a continuous variable and a categorical variable
I see. i think the question is, I did not have a clear idea of the correlation between them (if I insist no transformation). Otherwise, for a binary variable case, maybe a simple one-way t-test serves the purpose if I defined such correlation or dependency as the group mean difference. thanks. On 10/13/06, Achim Zeileis [EMAIL PROTECTED] wrote: On Fri, 13 Oct 2006 17:15:45 -0400 Weiwei Shi wrote: Dear Listers: I happen to have this question in mind, is there a way to evaluate the correlation between a continuous variable and a categorical variable (without discretizing the former)? My intuitive is using lda by considering the latter as response variable but not sure. It depends what exactly you mean by evaluate correlation. If you want to test independence of two variables X and Y against some form of association, you can generally use statistics based on sum h(Y) * g(X) where h() and g() are suitable transformations of X and Y. Special cases of this framework are tests for correlation of continuous variables and Chi-squared type statistics for categorical variables. This approach is implemented in the package coin, see independence_test() and the package vignette. hth, Z thanks, weiwei -- Weiwei Shi, Ph.D Research Scientist GeneGO, Inc. Did you always know? No, I did not. But I believed... ---Matrix III __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Weiwei Shi, Ph.D Research Scientist GeneGO, Inc. Did you always know? No, I did not. But I believed... ---Matrix III __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] a correlation matrix subset where the subset avg is a maximum
Thanks for the thought in any case, Mark. You're right about the brute force. I'll expand a bit with an example, though, for the sake of clarity.

Given a correlation matrix of 4 covariates A, B, C, D with distances of:

AB = 0.2; AC = 0.6; AD = 0.3; BC = 0.9; BD = 0.8; CD = 0.7

find the optimal subset (size n, n being the number of covariates) where the mean of r for the subset is a maximum. Of course, all NxN distances between the covariates in any chosen subset need to be considered.

Thus for n > 1, the solution would be simply BC = 0.9. And for n > 2, the solution would be BCD, as (BC + CD + BD)/3 = 0.8 is the maximum mean r value that could be obtained from any of the subsets with n > 2.

I'd expected that this would be a common problem, but 2 days of googling has given me little. I'm expecting a greedy graph traversal or the like will be my answer, but I'd hoped to whip a solution off in R. Any help would be greatly appreciated.

Ryan

Leeds, Mark (IED) wrote:
> hi ryan : I reread and you already have the correlation matrix, so brute force should definitely work. So, if the correlation matrix was size 20 by 20 and your n was 9, then you have to have subsets of size 10 or greater, so the number of possibilities would be
>
> ( 20 choose 10 ) + ( 20 choose 11 ) + ( 20 choose 12 ) + ( 20 choose 13 ) + ... + ( 20 choose 20 )
>
> Oh boy, it is too large a problem to do by brute force. There are too many possibilities even for this size of problem. Hopefully someone else will have a better idea. Forget my brute force idea. It's useless and I apologize. I made a mistake.
>
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Ryan Austin
> Sent: Friday, October 13, 2006 2:43 PM
> To: r-help@stat.math.ethz.ch
> Subject: [R] a correlation matrix subset where the subset avg is a maximum
>
> Hello R group,
>
> Given a correlation matrix, I would like to obtain the best subset of pairs in the matrix of some size n such that the mean of r for that subset is a maximum compared to any other possible subset of size n. I've been looking at the deal and subselect packages but they don't seem to do what I need. Does anyone have any suggestions?
>
> Thanks in advance,
> Ryan
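The four-covariate example in this thread is small enough to enumerate directly, so the following is a minimal sketch of the exhaustive search, not a scalable answer to Ryan's question. The names `mean_r` and `best_subset` are hypothetical helpers introduced here for illustration, and the matrix is filled in from the six pairwise values quoted in the message.

```r
# Pairwise values from the example in the thread (covariates A, B, C, D)
vars <- c("A", "B", "C", "D")
r <- matrix(0, 4, 4, dimnames = list(vars, vars))
r["A", "B"] <- 0.2; r["A", "C"] <- 0.6; r["A", "D"] <- 0.3
r["B", "C"] <- 0.9; r["B", "D"] <- 0.8; r["C", "D"] <- 0.7
r <- r + t(r)      # make the matrix symmetric
diag(r) <- 1

# Hypothetical helper: mean of the pairwise values among the chosen covariates
mean_r <- function(idx, r) {
  sub <- r[idx, idx, drop = FALSE]
  mean(sub[upper.tri(sub)])
}

# Hypothetical helper: exhaustive search over all subsets of exactly k covariates
best_subset <- function(r, k) {
  candidates <- combn(nrow(r), k)                  # one subset per column
  scores <- apply(candidates, 2, mean_r, r = r)
  best <- which.max(scores)
  list(vars = rownames(r)[candidates[, best]], mean_r = scores[best])
}

pair   <- best_subset(r, 2)   # B, C with mean r 0.9
triple <- best_subset(r, 3)   # B, C, D with mean r 0.8
print(pair)
print(triple)
```

For the 20-covariate case Mark writes out, `sum(choose(20, 10:20))` evaluates to 616666 subsets, so enumeration may still be within reach at that size, although the count grows quickly with the number of covariates.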
Re: [R] Version of R in saved workspaces
Start R-2.3.1 from the icon which is probably still on your desktop. setwd() to the directory where your CoolStats.RData sits. load("CoolStats.RData")
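The steps above can be sketched as follows. This is only an illustration under stated assumptions: the directory is a temporary stand-in for wherever CoolStats.RData really sits, and `cool_result` is a made-up object saved first so that the example runs on its own.

```r
# Assumption: a temporary directory stands in for the real workspace directory
workdir <- tempfile("workspace_dir")
dir.create(workdir)

# Assumption: a made-up object, saved so there is a CoolStats.RData to load
cool_result <- c(1.5, 2.5)
save(cool_result, file = file.path(workdir, "CoolStats.RData"))
rm(cool_result)

old_dir <- setwd(workdir)            # setwd() returns the previous directory
loaded <- load("CoolStats.RData")    # load() returns the names of the restored objects
setwd(old_dir)

print(R.version.string)              # which version of R is doing the loading
print(loaded)
```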
Re: [R] correlation b/w a continuous variable and a categorical variable
On Fri, 13 Oct 2006 18:17:10 -0400 Weiwei Shi wrote:

> I see. I think the question is that I did not have a clear idea of the correlation between them (if I insist on no transformation). Otherwise, for a binary variable case, maybe a simple one-way t-test serves the purpose if I defined such correlation or dependency as the group mean difference.

...another special case of the general framework I outlined below. But the man page and package vignette I already pointed you to give you a much better explanation of this.

Z

> thanks.
>
> On 10/13/06, Achim Zeileis [EMAIL PROTECTED] wrote:
>> On Fri, 13 Oct 2006 17:15:45 -0400 Weiwei Shi wrote:
>>
>>> Dear Listers:
>>>
>>> I happen to have this question in mind: is there a way to evaluate the correlation between a continuous variable and a categorical variable (without discretizing the former)? My intuition is to use lda, considering the latter as the response variable, but I am not sure.
>>
>> It depends what exactly you mean by "evaluate correlation". If you want to test independence of two variables X and Y against some form of association, you can generally use statistics based on
>>
>>   sum h(Y) * g(X)
>>
>> where h() and g() are suitable transformations of Y and X. Special cases of this framework are tests for correlation of continuous variables and Chi-squared type statistics for categorical variables. This approach is implemented in the package coin, see independence_test() and the package vignette.
>>
>> hth,
>> Z
>>
>>> thanks,
>>> weiwei
>>>
>>> --
>>> Weiwei Shi, Ph.D
>>> Research Scientist
>>> GeneGO, Inc.
>>>
>>> Did you always know? No, I did not. But I believed...
>>> ---Matrix III
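To make the linear statistic sum h(Y) * g(X) from the thread concrete, here is a minimal base-R sketch of a permutation test for a continuous x and a two-level factor. It is not the coin implementation that the thread recommends: the simulated data, the choice of g() as the identity and h() as the group indicator, and the number of permutations are all assumptions made for illustration.

```r
set.seed(1)

# Assumption: simulated data, 20 observations per group with shifted means
x <- c(rnorm(20, mean = 0), rnorm(20, mean = 1.5))
g <- factor(rep(c("a", "b"), each = 20))

# Linear statistic: with h() the group indicator and g() the identity,
# sum h(Y) * g(X) is simply the sum of x within each group
observed <- tapply(x, g, sum)

# Under independence the labels are exchangeable, so permute x against g.
# With two groups, the sum in group "a" determines the sum in group "b".
perm <- replicate(2000, tapply(sample(x), g, sum)[["a"]])

# Expected sum in group "a" under the permutation distribution
expected <- sum(g == "a") * mean(x)

# Two-sided permutation p-value
p_value <- mean(abs(perm - expected) >= abs(observed[["a"]] - expected))
print(observed)
print(p_value)
```

For two groups this is closely related to the group mean difference Weiwei mentions; with more than two levels the per-group sums would have to be combined into a single statistic, which is the part the coin vignette covers properly.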
[R] How to see if row names of a dataframe are stored compactly
Reading the list of changes for R version 2.4.0, I was happy to see that the row names of dataframes can be stored compactly (as the integer n when row.names(df) is 1:n). help(row.names) contains this paragraph:

    Row names of the form '1:n' for 'n > 2' are stored internally in a compact form, which might be seen by calling 'attributes' but never via 'row.names' or 'attr(x, "row.names")'.

I am unable to get attributes(x)$row.names to return just nrow(x). Am I misreading the documentation? Does "might be seen" mean "possibly in some future version of R" in this case?

> (x <- as.data.frame(matrix(1:9, nrow=3)))
  V1 V2 V3
1  1  4  7
2  2  5  8
3  3  6  9
> attributes(x)$row.names
[1] 1 2 3
> row.names(x) <- seq(len=nrow(x))
> attributes(x)$row.names
[1] 1 2 3

Best,
Hsiu-Khuern.
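For readers on a current version of R, the compact form can be inspected with the base helper `.row_names_info()`. This is a sketch under the assumption that the helper is available; it may not exist in R 2.4.0, the version discussed in the message, so it does not answer the question for that release.

```r
x <- as.data.frame(matrix(1:9, nrow = 3))

# The public accessors expand the row names to the full vector
expanded <- attributes(x)$row.names

# Assumption: .row_names_info() exists in the R version being used
internal <- .row_names_info(x, type = 0L)   # the stored "row.names" attribute
n_rows   <- .row_names_info(x, type = 2L)   # number of rows implied by it

print(expanded)
print(internal)
print(n_rows)
```

Printing `internal` next to `expanded` shows the difference between what is stored and what the accessors return.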
Re: [R] combinatorics
Hi Robin.

When I saw this, I thought expand.grid would do. But since it is too big, and since I sympathize that C isn't the ideal use of one's time, perhaps the Combinations package on www.omegahat.org might be helpful. This provides a one-at-a-time approach.

D.

Robin Hankin wrote:
> Hi Christos
>
> thanks for this. Unfortunately, this approach wouldn't work for me because the real problem is too big for it: I have letters A-F and two of each, giving 12!/(2^6) ~= 7e6 combinations (borderline feasible).
>
> But in the approach you coded up below, matrix zz would have 6^12 ~= 2e9 rows before eliminating the non-feasible ones. This is too big for me.
>
> Looks like it's going to be another weekend lost to C [but at least I now have some confidence that I've not overlooked anything obvious!]
>
> With very best wishes, I really appreciate your efforts
>
> Robin
>
> On 13 Oct 2006, at 16:21, Christos Hatzis wrote:
>> Hi Robin,
>>
>> This approach first generates all combinations and then eliminates the non-feasible ones. It should work fine for smallish vectors but might not scale well for larger vectors. Hopefully it gives you what you need for this problem.
>>
>> xx <- c("A","A","B","B","C")
>> yy <- 1:length(xx)
>> zz <- expand.grid(yy,yy,yy,yy,yy)
>> ss <- zz[ apply(zz, 1, FUN=function(x) length(unique(x))) == length(xx), ]
>> ss <- as.matrix(ss)
>> pp <- apply(ss, 1, FUN=function(x,v) paste(v[as.vector(x)], collapse=""), v=xx)
>> res <- unique(pp)
>> res
>>  [1] "CBBAA" "BCBAA" "BBCAA" "CBABA" "BCABA" "CABBA" "ACBBA" "BACBA" "ABCBA" "BBACA" "BABCA"
>> [12] "ABBCA" "CBAAB" "BCAAB" "CABAB" "ACBAB" "BACAB" "ABCAB" "CAABB" "ACABB" "AACBB" "BAACB"
>> [23] "ABACB" "AABCB" "BBAAC" "BABAC" "ABBAC" "BAABC" "ABABC" "AABBC"
>> length(res)
>> [1] 30
>>
>> -Christos
>>
>> Christos Hatzis, Ph.D.
>> Nuvera Biosciences, Inc.
>> 400 West Cummings Park Suite 5350
>> Woburn, MA 01801
>> Tel: 781-938-3830
>> www.nuverabio.com
>>
>> -----Original Message-----
>> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Robin Hankin
>> Sent: Friday, October 13, 2006 10:19 AM
>> To: [EMAIL PROTECTED]
>> Subject: [R] combinatorics
>>
>>> Hi
>>>
>>> How do I generate all ways of ordering sets of indistinguishable items? Suppose I have two A's, two B's and a C. Then I want
>>>
>>> AABBC
>>> AABCB
>>> AACBB
>>> ABABC
>>> . . .snip...
>>> BBAAC
>>> . . .snip...
>>> CBBAA
>>>
>>> [there are 5!/(2!*2!) = 30 arrangements. Note AABBC != BBAAC]
>>>
>>> How do I do this?
>>>
>>> --
>>> Robin Hankin
>>> Uncertainty Analyst
>>> National Oceanography Centre, Southampton
>>> European Way, Southampton SO14 3ZH, UK
>>> tel 023-8059-7743

--
Duncan Temple Lang [EMAIL PROTECTED]
Department of Statistics work: (530) 752-4782
4210 Mathematical Sciences Building fax: (530) 752-7099
One Shields Ave.
University of California at Davis
Davis, CA 95616, USA
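One way to avoid generating and then discarding infeasible rows is to recurse over the remaining count of each letter, so that only distinct arrangements are ever built. The following is a minimal sketch of that idea in plain R, not the Combinations package mentioned above; `multiset_perms` is a hypothetical name, and the function assumes a named integer vector of counts.

```r
# Hypothetical helper: all distinct arrangements of a multiset.
# counts is a named integer vector, e.g. c(A = 2L, B = 2L, C = 1L).
multiset_perms <- function(counts) {
  if (sum(counts) == 0L) return("")        # nothing left to place
  out <- character(0)
  for (s in names(counts)[counts > 0L]) {  # each symbol still available
    rest <- counts
    rest[s] <- rest[s] - 1L                # use one copy of s
    out <- c(out, paste0(s, multiset_perms(rest)))
  }
  out
}

res <- multiset_perms(c(A = 2L, B = 2L, C = 1L))
length(res)   # 30, matching 5!/(2!*2!)
head(res, 4)  # "AABBC" "AABCB" "AACBB" "ABABC"
```

This builds every arrangement in memory at once, so for two each of A-F it would hold roughly 7.5e6 strings; a one-at-a-time generator, as Duncan suggests, would be the more economical design at that size.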