[R] Strange behavior with Poisson and glm
Hi,

I'm just learning about the Poisson family for the glm function. One of the data sets I'm playing with has several of the variables as factors (e.g. month, group, etc.). When I call the glm function with a formula that has a factor variable, R automatically converts the variable to a series of variables with unique names and binary values.

For example, with this pseudo data:

y    v1   month
2    1    january
3    1.4  february
1.5  6.3  february
1.2  4.5  january
5.5  4.0  march

I use this call:

m <- glm(y ~ v1 + month, family=poisson)

R gives me back a model with variables of

Intercept
v1
monthJanuary
monthFebruary
monthMarch

I'm concerned that this might be doing some strange things to my model. Can anyone offer some enlightenment?

Thanks!

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] two questions for R beginners
On Tue, 2 Mar 2010 08:58:25 +1300 Peter Alspach peter.alsp...@plantandfood.co.nz wrote:

> This brings up another confusion for new users. Simply typing the
> object name at the command line gives just one view of the object
> (that provided by print()).

Good point. Any good introduction to R should include a brief discussion of 'str'. But sometimes even 'str' can keep you from discovering the real underlying structure of an object, e.g. for data frames. The solution is to use 'unclass' first.

--
Karl Ove Hufthammer
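The difference between 'str' on a data frame and 'str' after 'unclass' can be sketched as follows. The data frame `df` and its columns are made up for illustration; only base R functions are used.

```r
# Toy data frame (hypothetical example data)
df <- data.frame(id = 1:3, score = c(2.5, 3.1, 4.8))

str(df)            # compact summary of the object as a data frame

u <- unclass(df)   # strip the class attribute to expose the underlying list
str(u)             # a plain list of two vectors plus names/row.names attributes
```

After 'unclass', the object is an ordinary list, which is what a data frame is underneath.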
Re: [R] file reading /problems with encoding
Quoting Uwe Ligges lig...@statistik.tu-dortmund.de:

> R is not able to re-encode the file to the native encoding. But if you
> keep it in UTF-8, what is the problem to grep for the specific
> characters (as grep and friends support the argument useBytes these
> days)?

The problem with UTF-8 is that I'm not able to cat a valid xml-file. Using the encoding="UTF-8" option in either the file() or the readLines() command causes an error. If I leave out both, I cannot run a gsub command on the string because of the special characters, even with the useBytes option turned on:

grep("über 40%", xml, useBytes=TRUE)

returns integer(0). And the problem is obvious: by reading in the file, the ü was taken to üb. However, I suspect that I did not use the useBytes option in the right way, did I?

Thanks a lot for your help!

Best regards,
Tom
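A minimal sketch of byte-wise matching on UTF-8 text, as suggested in the quoted advice. The sample string and the temporary file are invented for illustration, and this may well not reproduce the problem described above, which depends on the original file and locale.

```r
# Hypothetical sample text; \u00fc is the letter u-umlaut
txt <- enc2utf8("Anteil \u00fcber 40% gestiegen")
tmp <- tempfile(fileext = ".xml")
writeLines(txt, tmp, useBytes = TRUE)    # write the UTF-8 bytes unchanged

x <- readLines(tmp, encoding = "UTF-8")  # mark the input as UTF-8, no re-encoding

# Make sure the pattern is UTF-8 too, then compare byte by byte
pat <- enc2utf8("\u00fcber 40%")
hit <- grep(pat, x, fixed = TRUE, useBytes = TRUE)
hit
```

The point of the sketch is that both the text and the pattern must be in the same encoding before a byte-wise comparison can succeed.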
[R] embedded nuls in 2.10 versus 2.11
I have been reading binary files, and parsing the output, for some time now. I have tried to develop a technique that is as robust as possible to all the strange things that appear in text fields, not to mention different global/regional encodings. I have no control over the data generated by users, so I would like to be as flexible and accommodating as possible.

The following code is straightforward, but will fail with embedded nuls in R = 2.10:

fid = file(filename, "rb")
readChar(fid, n=10)
close(fid)

Previous suggestions from the R-help list led me to consider

fid = file(filename, "rb")
rawToChar(readBin(fid, "raw", 10))
close(fid)

or even

fid = file(filename, "rb")
iconv(rawToChar(readBin(fid, "raw", 10)), to="UTF-8")
close(fid)

to ensure that my output is well behaved.

With the new error handling in rawToChar() in R >= 2.11, embedded nuls are no longer allowed except at the end of the string. I run across these all the time in my user data. How can I recover as much of the text as possible when reading in from a binary file with embedded nuls in R >= 2.11 and keep the code backwards compatible with R < 2.11?

thanks...
Brandon
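One possible approach, sketched under the assumption that simply dropping the nul bytes is acceptable for the data at hand: read the field as raw bytes, remove the zero bytes, and only then convert to character. The file contents below are invented test data.

```r
# Write a small binary file with embedded nuls (hypothetical test data: "ab\0cd\0\0")
tmp <- tempfile()
writeBin(as.raw(c(0x61, 0x62, 0x00, 0x63, 0x64, 0x00, 0x00)), tmp)

fid <- file(tmp, "rb")
bytes <- readBin(fid, "raw", n = 10)   # returns only the 7 bytes available
close(fid)

clean <- bytes[bytes != as.raw(0)]     # drop every nul byte before conversion
txt <- rawToChar(clean)
txt
```

If the position of the nuls carries meaning (for example as field separators), replacing them with another byte instead of dropping them would be the analogous variant.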
Re: [R] Strange behavior with Poisson and glm
On 02-Mar-10 08:02:27, Noah Silverman wrote:
> Hi,
>
> I'm just learning about the Poisson family for the glm function. One
> of the data sets I'm playing with has several of the variables as
> factors (e.g. month, group, etc.). When I call the glm function with
> a formula that has a factor variable, R automatically converts the
> variable to a series of variables with unique names and binary values.
>
> For example, with this pseudo data:
>
> y    v1   month
> 2    1    january
> 3    1.4  february
> 1.5  6.3  february
> 1.2  4.5  january
> 5.5  4.0  march
>
> I use this call:
>
> m <- glm(y ~ v1 + month, family=poisson)
>
> R gives me back a model with variables of
> Intercept v1 monthJanuary monthFebruary monthMarch
>
> I'm concerned that this might be doing some strange things to my
> model. Can anyone offer some enlightenment?
>
> Thanks!

The creation of auxiliary variables is the way to incorporate a factor variable into a model. These are usually called dummy variables, and are essentially indicator variables. Your data above would correspond to variables I (for Intercept), J (for January), F (for February) and M (for March) in addition to the other variables y and v1 as below:

y    v1   I  J  F  M   # month
2    1    1  1  0  0   # january
3    1.4  1  0  1  0   # february
1.5  6.3  1  0  1  0   # february
1.2  4.5  1  1  0  0   # january
5.5  4.0  1  0  0  1   # march

The linear predictor L in the model for y would then be

L = a*I + b*v1 + c1*J + c2*F + c3*M

evaluated arithmetically; e.g. for row 2 of the data it is

a + b*1.4 + c2

However, as given, J + F + M = I, so there is redundancy in the variables, since there are only three independent values there (not so if you exclude the Intercept using a model formula y ~ v1 + month - 1), so R will provide estimates which are computed in terms of some pattern of differences between these four variables called contrasts. Different patterns of difference present different representations of the three independent aspects.

There are many different kinds of contrasts available. One of these will be chosen as default by R (depending in particular on whether the factor variable is being used as an ordered factor or an unordered factor). See ?contrasts for an outline of what is there, ?contrast for more detail, and look at the help for particular contrasts such as ?contr.helmert, ?contr.poly, ?contr.sum, ?contr.treatment.

After all that: No, R is not doing strange things to your model!

ted.

E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 02-Mar-10 Time: 08:47:11
-- XFMail --
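The dummy-variable coding described above can be inspected directly with model.matrix(). This is a minimal sketch using the pseudo data from the thread (typed in by hand, with lower-case month names) and assuming R's default treatment contrasts for unordered factors.

```r
# Pseudo data from the thread (month entered as an unordered factor)
d <- data.frame(
  y     = c(2, 3, 1.5, 1.2, 5.5),
  v1    = c(1, 1.4, 6.3, 4.5, 4.0),
  month = factor(c("january", "february", "february", "january", "march"))
)

# With an intercept, one level (the first, alphabetically) is absorbed into it
X <- model.matrix(y ~ v1 + month, data = d)
colnames(X)

# Without an intercept, every level gets its own indicator column
X0 <- model.matrix(y ~ v1 + month - 1, data = d)
colnames(X0)
```

With the default contrasts, the intercept then represents the baseline level and the remaining month coefficients are differences from it.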
Re: [R] Simple Linear Autoregressive Model with R Language
Emil Davtyan wrote:
> Hello -
>
> I need to do simple linear autoregressive model with R software for my
> thesis. I looked into all your documentation and I am not able to find
> anything too helpful. Can someone help me with the codes?
>
> Thanks
> Emil

Hi,

Google "R ar model"; the first hit gives:

http://stat.ethz.ch/R-manual/R-patched/library/stats/html/ar.html

cheers,
Paul

--
Drs. Paul Hiemstra
Department of Physical Geography
Faculty of Geosciences
University of Utrecht
Heidelberglaan 2
P.O. Box 80.115
3508 TC Utrecht
Phone: +3130 274 3113 Mon-Tue
Phone: +3130 253 5773 Wed-Fri
http://intamap.geo.uu.nl/~paul
Re: [R] Simple Linear Autoregressive Model with R Language
On Mon, 1 Mar 2010, Emil Davtyan wrote:
> Hello -
>
> I need to do simple linear autoregressive model with R software for my
> thesis. I looked into all your documentation and I am not able to find
> anything too helpful. Can someone help me with the codes?

By "all documentation", do you mean that you have also looked at the time series and econometrics task views, which contain information on that topic? See

http://CRAN.R-project.org/view=TimeSeries
http://CRAN.R-project.org/view=Econometrics

In particular ar() (or maybe arima()) in the basic stats package seems to be what you are looking for. Packages FitAR or dynlm might also be useful.

Best,
Z

> Thanks
> Emil
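As a rough illustration of the ar() suggestion, here is a minimal sketch that fits an AR(1) model to simulated data. The series, the seed and the coefficient 0.6 are invented for the example; a real analysis would use the thesis data and probably let ar() choose the order.

```r
set.seed(42)

# Simulated AR(1) series with coefficient 0.6 (hypothetical data)
y <- arima.sim(model = list(ar = 0.6), n = 200)

# Force an AR(1) fit; by default ar() would select the order by AIC
fit <- ar(y, order.max = 1, aic = FALSE)
fit$ar                       # estimated autoregressive coefficient

# Forecast five steps ahead; predict() dispatches to the ar method
pred <- predict(fit, n.ahead = 5)
pred$pred
```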
Re: [R] the predict.lda function
On Mon, 2010-03-01 at 16:55 -0500, Diana Connett wrote:
> Hello.
>
> I just downloaded R onto a new computer, and after entering
> library(MASS), I still get the message
>
> Error: could not find function "predict.lda"
>
> when I try to use the predict.lda function (even just predict.lda())
>
> Can anyone help me out?

Stop calling it directly, use the generic predict() instead. The reason predict.lda can't be found is that it is hidden in a package NAMESPACE:

> require(MASS)
Loading required package: MASS
> predict.lda()
Error: could not find function "predict.lda"
> methods(predict)
 [1] predict.ar*                predict.Arima*
 [3] predict.arima0*            predict.glm
 [5] predict.glmmPQL*           predict.HoltWinters*
 [7] predict.lda*               predict.lm
 [9] predict.loess*             predict.lqs*
[11] predict.mca*               predict.mlm
[13] predict.nls*               predict.polr*
[15] predict.poly               predict.ppr*
[17] predict.prcomp*            predict.princomp*
[19] predict.qda*               predict.rlm*
[21] predict.smooth.spline*     predict.smooth.spline.fit*
[23] predict.StructTS*

   Non-visible functions are asterisked

By calling things directly you aren't really using R the way the developers want you to. You should not need to know that there are all those predict methods and what their names are etc. You should just need to check that there is a method for the object/code you are using and then call the generic function whilst R takes care of everything else.

If you *must* call it directly:

MASS:::predict.lda()

See ?`:::`

HTH

G

> Thank you!
> Diana Connett

--
Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
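The advice to call the generic can be sketched like this. The example assumes the recommended package MASS is installed and uses the built-in iris data purely for illustration; neither appears in the original question.

```r
library(MASS)  # assumption: the recommended package MASS is installed

# Fit a linear discriminant model on the built-in iris data (illustrative only)
fit <- lda(Species ~ ., data = iris)

# Call the generic; R dispatches to the hidden predict.lda method
p <- predict(fit, newdata = iris[1:5, ])
p$class

# The method can also be retrieved without ':::' if it is really needed
m <- getS3method("predict", "lda")
is.function(m)
```

Because `fit` has class "lda", predict() finds the right method on its own, so the method name never has to be typed.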
Re: [R] Changepoints estimation in a data series
On Mon, 1 Mar 2010, FMH wrote:
> Dear All,
>
> I'm trying to find changepoints in a data series which only consist of
> 11 measurements of altitude(x) and temperature(y), respectively, in
> which the data are as followed:
>
> y = 16.3, 16.2, 16.1, 15.6, 14.2, 10, 8.2, 8.0, 7.5, 7.3, 7.2
> x = 1, 2, 5, 10, 15, 20, 25, 30, 40, 50, 60
>
> From the above series, i reckon there is more than one changepoint and
> presuming there is a package in R which might enable the estimation on
> such changepoints.

It depends what exactly you mean by changepoint, especially because the curve looks more sigmoidal than with a clear-cut change. Maybe these have been averaged already. In any case, some useful methods might include:

  o maxstat_test() in package coin for changepoint estimation via maximally selected statistics
  o breakpoints() in package strucchange for OLS estimation of two separate constant means
  o segmented() in package segmented for OLS estimation of a broken line trend

If you are looking for two separate constant means, I would probably employ the maximally selected statistics in this situation:

## data
x <- c(1, 2, 5, 10, 15, 20, 25, 30, 40, 50, 60)
y <- c(16.3, 16.2, 16.1, 15.6, 14.2, 10, 8.2, 8.0, 7.5, 7.3, 7.2)
plot(y ~ x, type = "b")

## test
library(coin)
maxstat_test(y ~ x)

## add estimated changepoint
abline(v = 15, lty = 2)

hth,
Z

> Could someone please advice me on this matter by using R?
>
> Cheers,
> FMH
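For readers without the coin package, the idea of "two separate constant means" can be illustrated in base R. Note that this is a plain least-squares split search, not the maximally selected statistics test recommended above, and it gives no p-value; the helper `rss` is a name introduced only for this sketch.

```r
x <- c(1, 2, 5, 10, 15, 20, 25, 30, 40, 50, 60)
y <- c(16.3, 16.2, 16.1, 15.6, 14.2, 10, 8.2, 8.0, 7.5, 7.3, 7.2)

# Residual sum of squares around a constant mean (hypothetical helper)
rss <- function(v) sum((v - mean(v))^2)

# Try every split "first k points vs. the rest" and keep the best one
ks <- 1:(length(y) - 1)
total <- sapply(ks, function(k) rss(y[1:k]) + rss(y[(k + 1):length(y)]))
best <- ks[which.min(total)]

x[best]   # last x value of the first segment
```

On these data the best split falls after x = 15, which agrees with the dashed line drawn by abline(v = 15, lty = 2) in the message above.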
Re: [R] capturing errors in Sweave
Thanks, Berwin. That works just great!

--sundar

On Tue, Mar 2, 2010 at 12:57 AM, Berwin A Turlach ber...@maths.uwa.edu.au wrote:
> [...]
Re: [R] repeated measures anova, car package
Hello John,

As you said, I could also take a means model and test linear hypotheses for the desired effects. Would this also be the case for the repeated measures analysis I did in the first place? I copied the model from the car model where you first call:

modx <- lm(cbind(div_h, div_l) ~ site, divrep)

(Could I test linear hypotheses here, instead of continuing as I did beneath?)

> idat
  cover
1  high
2   low

> (av.ok1 <- Anova(modx, idata=idat, idesign=~cover))

Type II Repeated Measures MANOVA Tests: Pillai test statistic
           Df test stat approx F num Df den Df   Pr(>F)
site        1   0.49908   9.9631      1     10 0.010220 *
cover       1   0.28145   3.9169      1     10 0.075984 .
site:cover  1   0.53963  11.7216      1     10 0.006507 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

> divrep
   repl.      site div_h div_l
1      1     Scrub  4.18  5.23
2      2     Scrub  5.47  7.18
3      3     Scrub  3.74  4.97
4      4     Scrub  2.62  5.17
5      5     Scrub  3.33  6.43
6      6     Scrub  1.62  8.96
7      1 Tall_Forb  4.70  3.88
8      2 Tall_Forb  3.65  1.97
9      3 Tall_Forb  2.50  1.19
10     4 Tall_Forb  1.87  2.37
11     5 Tall_Forb  5.33  3.56
12     6 Tall_Forb  3.06  3.60

Your answers helped a lot. Thank you very much for the quick reply!

Best wishes,
Kay
Re: [R] Thought I understood factors but??
On Mon, 1 Mar 2010 14:23:04 -0500 Liaw, Andy andy_l...@merck.com wrote:

> Indeed this is one of the (few, I believe) traps of R,

Oh, no; there are many more:
http://www.burns-stat.com/pages/Tutor/R_inferno.pdf

:-)

--
Karl Ove Hufthammer
Re: [R] why a text editor?
On Mon, 01 Mar 2010 16:26:37 - (GMT) ted.hard...@manchester.ac.uk ted.hard...@manchester.ac.uk wrote:

> In vim (to which I'm wedded for life) it will pick up matching (), {}
> and [].

You can also easily move between matching delimiters by typing '%'. A similar feature should be available in all *good* text editors.

--
Karl Ove Hufthammer
Re: [R] two questions for R beginners
On Mon, 01 Mar 2010 10:00:07 -0500 Duncan Murdoch murd...@stats.uwo.ca wrote:

> Suppose X is a dataframe or a matrix. What would you expect to get
> from X[1]? What about as.vector(X), or as.numeric(X)?

All this of course depends on the type of object one is speaking of. There are plenty of surprises available, and it's best to use the most logical way of extracting. E.g., to extract the top-left element of a 2D structure (data frame or matrix), use 'X[1,1]'.

Luckily, R provides some shortcuts. For example, you can write 'X[2,3]' on a data frame, just as if it was a matrix, even though the underlying structure is completely different. (This doesn't work on a normal list; there you have to type the whole 'X[[2]][3]'.)

The behaviour of the 'as.' functions may sometimes be surprising, at least for me. For example, 'as.data.frame' on a named vector gives a single-column data frame, instead of a single-row data frame. (I'm not sure what's the recommended way of converting a named vector to a row data frame, but 'as.data.frame(t(X))' works, even though both 'X' and 't(X)' look like a row of numbers.)

> The point is that a dataframe is a list, and a matrix isn't. If users
> don't understand that, then they'll be confused somewhere. Making
> matrices more list-like in one respect will just move the confusion
> elsewhere. The solution is to understand the difference.

My main problem is not understanding the difference, which is easy, but knowing which type of object I have when I get the output of a function in a package. If I know the object is a named vector or a matrix with column names, it's easy enough to type 'X[,"colname"]', and if it's a data frame one may use the shortcut 'X$colname'.

Usually, it *is* documented what the return value of a function is, but just looking at the output is much faster, and *usually* gives the correct answer. For example, 'mean' applied on a data frame gives a named vector, not a data frame, which is somewhat surprising (given that the columns of a data frame may be of different types, while the elements of a vector may not). (And yes, I know that it's *documented* that it returns a named vector.)

On the other hand, perhaps it is surprising that 'mean' works on data frames at all. :-)

--
Karl Ove Hufthammer
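The indexing and conversion behaviour discussed above can be tried out with a few lines. The objects `m`, `df` and `v` are made-up toy data; everything else is base R.

```r
m  <- matrix(1:6, nrow = 2)   # toy matrix (hypothetical data)
df <- as.data.frame(m)        # the same numbers as a data frame

m[1]      # a matrix is a vector with dimensions: this is its first element
df[1]     # a data frame is a list: this is a one-column data frame
df[1, 1]  # matrix-style indexing works on both
m[1, 1]

v <- c(a = 1, b = 2, c = 3)       # named vector
col_df <- as.data.frame(v)        # three rows, one column
row_df <- as.data.frame(t(v))     # one row, three columns named a, b, c
dim(col_df)
dim(row_df)
```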
[R] lm.influence on glm objects
Dear R users

Today I discovered that function lm.influence() stops when applied to glm objects with the following error message:

Error in if (NROW(e) != n) stop("non-NA residual length does not match cases used in fitting") :
  argument is of length zero

After inspecting lm.influence.R (both in R-2.10.1.tar.gz and R-patched.tar.gz) I found (line 53) that n is computed as

n <- as.integer(nrow(model$qr$qr))

However, glm objects (differently from lm objects) do not have a $qr component.

Is this intentional, i.e. does it mean that we have to use lm.influence only with lm objects? It could be, but I remark that the lm.influence{stats} help says:

"The influence.measures() and other functions listed in See Also provide a more user oriented way of computing a variety of regression diagnostics. These all build on lm.influence. Note that for GLMs (other than the Gaussian family with identity link) these are based on one-step approximations which may be inadequate if a case has high influence."

Moreover, if we have to use such a function only with lm objects, I would suggest implementing a more explicit check.

Thanks in advance for any help

Fabrizio Cipollini
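The help text quoted above points to influence.measures() and related functions as the user-level interface. A minimal sketch of that route on a small simulated Poisson GLM follows; the data are invented, and the sketch assumes a reasonably recent R version, so it says nothing about why the call failed in the setup described above.

```r
set.seed(1)

# Simulated Poisson data (hypothetical example)
d <- data.frame(x = runif(30))
d$y <- rpois(30, lambda = exp(0.3 + d$x))

m <- glm(y ~ x, family = poisson, data = d)

im <- influence.measures(m)   # table of diagnostics, one row per case
h  <- hatvalues(m)            # leverages
cd <- cooks.distance(m)       # one-step approximation for GLMs, as the help notes

head(round(cbind(hat = h, cook = cd), 3))
```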
Re: [R] Strange behavior with Poisson and glm
Ted,

Brilliant explanation (as usual). I'm back in school, just starting on a post-graduate degree in stats, so the help is really appreciated.

Now, I have a slightly trickier question about the same model. I've seen more than one way to get values out of the glm model. For example, if we're looking at the 10th item in the dataset (note: m is the model):

fitted(m)[10]
predict(m, dataset[10,])

give me different results.

From my data, I get the following real results:

> predict(m, data[100,])
     100
7.727999
> fitted(m)[100]
     179
3956.637

From my understanding, the exp of the prediction should be equal to the fitted value. Here it is not. I don't understand why.

Any insight?

-N

On 3/2/10 12:47 AM, (Ted Harding) wrote:
> [...]
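A small sketch of how the two scales relate, on simulated data (the data frame `d` and its values are invented). By default predict() on a glm returns the linear predictor, while fitted() returns values on the response scale. The different labels in the output above (100 versus 179) also suggest that the 100th fitted value may not belong to the row named 100, for example if rows were dropped or reordered, so the sketch matches by row name; that reading is a guess and cannot be checked without the original data.

```r
set.seed(1)

# Simulated data (hypothetical): one numeric and one factor predictor
d <- data.frame(v1 = runif(20),
                month = factor(rep(c("jan", "feb", "mar", "apr"), 5)))
d$y <- rpois(20, lambda = exp(0.5 + d$v1))

m <- glm(y ~ v1 + month, family = poisson, data = d)

link10 <- predict(m, newdata = d[10, ])                     # log scale (default type = "link")
resp10 <- predict(m, newdata = d[10, ], type = "response")  # response scale
fit10  <- fitted(m)["10"]                                   # look up by row name, not position

c(exp_link = unname(exp(link10)),
  response = unname(resp10),
  fitted   = unname(fit10))
```

All three numbers agree when the same row is compared on the same scale.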
Re: [R] Reading sas7bdat files directly
The dsread output is little-endian, as that's the native format for floats on the Wintel platform. The byte order should stay the same if converting directly to a float, using a data structure like (C/C++):

union {
    char bytes[8];
    double value;
}

If reading the values with a SAS HEX informat, the bytes will need to be reversed. It's obviously trivial for me to add an endian-ness option; I'll do that later.

Chris.

On 02/03/2010 02:06, Roger DeAngelis(xlr82sas) wrote:
> Hi,
>
> It looks like we may need to swap bytes (little endian to big endian).
> I will look into it tonight.
>
> As a side note, SAS reserves 28 floats for missing values. It should
> be easy to convert these to NaN on input to R.
>
> You can test this in SAS by converting the 16 char floats to ieee8. in
> SAS and doing a put. The result will be A, B...Z, . and _. SAS code
> that produced the listing is below.
>
> Here are the floats that map to the 28 missing values in SAS:
>
> A FD00   B FC00   C FB00   D FA00   E F900   F F800   G F700
> H F600   I F500   J F400   K F300   L F200   M F100   N F000
> O EF00   P EE00   Q ED00   R EC00   S EB00   T EA00   U E900
> V E800   W E700   X E600   Y E500   Z E400   _ FF00   . FE00
>
> data mis;
>   retain A .A B .B C .C D .D E .E F .F G .G H .H I .I J .J K .K L .L
>          M .M N .N O .O P .P Q .Q R .R S .S T .T U .U V .V W .W X .X
>          Y .Y Z .Z _ ._ DOT .;
>   array mis[28] A B C D E F G H I J K L M N O P Q R S T U V W X Y Z _ DOT;
>   do idx=1 to 28;
>     hex=put(mis[idx],ieee8.);
>     xeh=put(hex,hex16.);
>     put @1 mis[idx] @6 xeh;
>   end;
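On the R side, the byte-order question can be explored with readBin() and writeBin(), which both take an endian argument. This sketch uses the value 1.5 as an arbitrary example and says nothing about the actual sas7bdat layout.

```r
# The 8 bytes of the double 1.5 in both byte orders
le <- writeBin(1.5, raw(), endian = "little")
be <- writeBin(1.5, raw(), endian = "big")

le
be
identical(rev(le), be)   # reversing the bytes converts between the two orders

# Reading the little-endian bytes back with the matching setting recovers the value
val <- readBin(le, "double", n = 1, endian = "little")
val
```

So bytes taken directly from a little-endian file can be read as they are with endian = "little", and only a hex dump read in the opposite order needs reversing.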
Re: [R] capturing errors in Sweave
G'day Sundar,

On Mon, 1 Mar 2010 23:46:55 -0800 Sundar Dorai-Raj sdorai...@gmail.com wrote:

> Thanks for the input, but I don't want try in the Sweave output. I
> want the output to look just like it does in the console, as if an
> uncaptured error really did occur.

I don't think that you will get around using try; and you will have to work moderately hard to make the output appear as it does on the console. Probably somewhere along the lines:

+++++ Sweave code start +++++
<<Function-4a>>=
MySqrt <- function(x) {
  if (missing(x)) {
    stop("'x' is missing with no default")
  }
  if (!is.numeric(x)) {
    stop("'x' should only be numeric")
  }
  if (x < 0) {
    stop("'x' should be non-negative")
  }
  return(sqrt(x))
}
@

<<echo=FALSE>>=
tmp <- try(MySqrt())
@
<<eval=FALSE>>=
MySqrt()
@
<<echo=FALSE>>=
cat(tmp[1])
@

<<echo=FALSE>>=
tmp <- try(MySqrt("a"))
@
<<eval=FALSE>>=
MySqrt("a")
@
<<echo=FALSE>>=
cat(tmp[1])
@

<<echo=FALSE>>=
tmp <- try(MySqrt(-2))
@
<<eval=FALSE>>=
MySqrt(-2)
@
<<echo=FALSE>>=
cat(tmp[1])
@

<<>>=
MySqrt(4)
@
+++++ Sweave code end +++++

Now what I would like to know is how to include easily warning messages in my Sweave output without having to try whether Jean Lobry's [1] hack still works. :)

HTH.

Cheers,

Berwin

[1] https://www.stat.math.ethz.ch/pipermail/r-help/2006-December/121975.html

== Full address ==
Berwin A Turlach                      Tel.: +61 (8) 6488 3338 (secr)
School of Maths and Stats (M019)            +61 (8) 6488 3383 (self)
The University of Western Australia   FAX : +61 (8) 6488 1028
35 Stirling Highway
Crawley WA 6009                       e-mail: ber...@maths.uwa.edu.au
Australia                             http://www.maths.uwa.edu.au/~berwin
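Outside Sweave, the core of this trick can be checked in a plain R session. The sketch below reuses the MySqrt function from the message in a slightly condensed form; silent = TRUE is added here so the message is printed only once by cat(), and the use of conditionMessage() is an addition of this sketch, not part of the original suggestion.

```r
MySqrt <- function(x) {
  if (missing(x)) stop("'x' is missing with no default")
  if (!is.numeric(x)) stop("'x' should only be numeric")
  if (x < 0) stop("'x' should be non-negative")
  sqrt(x)
}

# Capture the error instead of aborting; the result has class "try-error"
tmp <- try(MySqrt(-2), silent = TRUE)

# The first element holds the message as it would appear on the console
cat(tmp[1])

# The bare message text is available from the attached condition object
msg <- conditionMessage(attr(tmp, "condition"))
msg
```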
Re: [R] Install R 2.10.1 on Windows XP Errors
From CRAN:

> 2.8 What's the best way to upgrade?
>
> That's a matter of taste. For most people the best thing to do is to
> uninstall R (see the previous Q), install the new version, copy any
> installed packages to the library folder in the new installation, run
> update.packages(checkBuilt=TRUE, ask=FALSE) in the new R and then
> delete anything left of the old installation.

Is there now a new procedure for updating the old packages?

Michael Borregaard, PhD student
Department of Biology
University of Copenhagen
[R] Problem with package fpc
I am trying to load package fpc in order to use the 'plotcluster' function; however, every time I attempt to do so I get the following warning message:

> library(fpc)
Loading required package: MASS
Error: package 'MASS' could not be loaded
In addition: Warning messages:
1: package 'fpc' was built under R version 2.9.2
2: In library(pkg, character.only = TRUE, logical.return = TRUE, lib.loc = lib.loc) :
  there is no package called 'MASS'

I thought that MASS was one of the basic packages supplied with R, but when I look in the library I cannot find it. However, in Windows, under the R folder in my program documents, there is a folder called MASS.

Could somebody tell me where I am going wrong?

Thanks in advance,
Sarah Paul, Cardiff University
Re: [R] two questions for R beginners
On Mon, Mar 1, 2010 at 11:49 PM, Liviu Andronic landronim...@gmail.com wrote:
> On 3/1/10, Keo Ormsby keo.orms...@gmail.com wrote:
>> Perhaps my biggest problem was that I couldn't (and still haven't)
>> seen *absolute beginners* documents.
>
> there was once a link posted on r-sig-teaching that would probably fit
> your needs, but I cannot find it now.

OK, I found it. Below is an excerpt of that r-sig-teaching e-mail.

Liviu

On Thu, Jul 2, 2009 at 2:19 PM, Robert W. Hayden hay...@mv.mv.com wrote:
> I think such a website would be a real asset. It would be most useful
> if it either were restricted to intro. stats. OR organized so that
> materials for real beginners were easy to extract from all the
> materials for programmers and Ph.D. statisticians. As a relative
> beginner myself, I find the usual resources useless. In self defense,
> I created materials for my own beginning students:
>
> http://courses.statistics.com/software/R/Rhome.htm
Re: [R] bwplot with pch = |
On Mon, Mar 1, 2010 at 10:49 AM, Duncan Mackay mac...@northnet.com.au wrote:
Dear All
Below is a toy example of a modified standard bwplot.

require(lattice)
DF <- data.frame(site = rep(1:5, each = 20), height = rnorm(100))
bwplot(site ~ height, DF, pch = "|",
       par.settings = list(strip.background = list(col = "transparent"),
                           box.rectangle = list(col = "grey70", lty = 1),
                           box.umbrella = list(col = "grey70", lty = 1),
                           plot.symbol = list(alpha = 1, col = "grey70", cex = 1, pch = 20),
                           superpose.symbol = list(cex = rep(0.7, 7), col = "black", pch = rep(20, 7))))

The help guide shows that pch = "|" is a special case. This gives me a line across the box, which is what I want, but how do I make it thicker and red?

The part of panel.bwplot() responsible for this is

if (all(pch == "|")) {
    mult <- if (notch) 1 - notch.frac else 1
    panel.segments(blist.stats[, 3], levels.fos - mult * blist.height / 2,
                   blist.stats[, 3], levels.fos + mult * blist.height / 2,
                   lwd = box.rectangle$lwd, lty = box.rectangle$lty,
                   col = box.rectangle$col, alpha = alpha)
}

which shows that you are stuck with the same color as the rest of the box. However, you can add your own thick red lines in a custom panel function:

bwplot(site ~ height, DF, pch = "|",
       panel = function(x, y, ...) {
           panel.bwplot(x, y, ...)
           meds <- tapply(x, y, median)
           ylocs <- seq_along(meds)
           panel.segments(meds, ylocs - 1/4, meds, ylocs + 1/4, lwd = 2, col = "red")
       })

-Deepayan
[R] question to define a matrix with some vectors with different lengths
Hi, I have some vectors v1, v2, ..., vk with different lengths. I want to treat these vectors as a matrix with k rows. Could you please tell me how I can do this?
Regards
khazaei
Re: [R] two questions for R beginner
What were your biggest misconceptions or stumbling blocks to getting up and running with R?

Easy. In terms of materials I have been unable to find good books that introduce users to R from the perspective of someone familiar only with packages like SPSS or STATA, or not familiar with statistics packages at all. Even introduction texts use jargon without introducing it. I think that R-help files should be more thorough than they are, and contain more examples. I thought that STATA help files were sparse! The notion that 'R is a user community and thus they do this in their spare time' is no excuse for those creating new tools for R not developing complete help files. It doesn't take that much time relative to actually creating the new function.

In terms of actual R use - creating, using, and manipulating data are the biggest frustration for those of the 'spreadsheet generation'. I get the impression that one needs to not merely understand, but be fully fluent in the jargon of matrix mathematics to even know what is going on half the time. I find myself - even now - using 'rules of thumb' that 'seemed to work' rather than fully understanding what I am doing. It is particularly discouraging when many of those 'intro books' suggest using something besides R for data manipulation - how clumsy is that!?

I find the actual programming syntax itself is the easiest part to master. It is certainly more flexible - but without a particularly sufficient increase in complexity - than trying to write scripts in SPSS and STATA.

Brandon Zicha
Re: [R] two questions for R beginners
Please take what follows not as an ad hominem statement, but rather as an attempt to improve what is already an excellent program, that has been built as a result of many, many hours of dedicated work by many, many unpaid, unsung volunteers. It troubles me a bit that when a confusing aspect of R is pointed out the response is not to try to improve the language so as to avoid the confusion, but rather to state that the confusion is inherent in the language. I understand that to make changes that would avoid the confusing aspect of the language that has been discussed in this thread would take time and effort by an R wizard (which I am not), time and effort that would not be compensated in the traditional sense. This does not mean that we should not acknowledge the confusion. If we want R to be the de facto lingua franca of statistical analysis, doesn't it make sense to strive for syntax that is as straightforward and consistent as possible? Again, please understand that my comment is made with deepest respect for the many people who have unselfishly contributed to the R project. Many thanks to each and every one of you.
John

Karl Ove Hufthammer k...@huftis.org 3/2/2010 4:00 AM
On Mon, 01 Mar 2010 10:00:07 -0500 Duncan Murdoch murd...@stats.uwo.ca wrote:
Suppose X is a dataframe or a matrix. What would you expect to get from X[1]? What about as.vector(X), or as.numeric(X)?

All this of course depends on the type of object one is speaking of. There are plenty of surprises available, and it's best to use the most logical way of extracting. E.g., to extract the top-left element of a 2D structure (data frame or matrix), use 'X[1,1]'. Luckily, R provides some shortcuts. For example, you can write 'X[2,3]' on a data frame, just as if it was a matrix, even though the underlying structure is completely different. (This doesn't work on a normal list; there you have to type the whole 'X[[2]][3]'.) The behaviour of the 'as.' functions may sometimes be surprising, at least for me.
For example, 'as.data.frame' on a named vector gives a single-column data frame, instead of a single-row data frame. (I'm not sure what's the recommended way of converting a named vector to a one-row data frame, but 'as.data.frame(t(X))' works, even though both 'X' and 't(X)' look like a row of numbers.)

The point is that a dataframe is a list, and a matrix isn't. If users don't understand that, then they'll be confused somewhere. Making matrices more list-like in one respect will just move the confusion elsewhere. The solution is to understand the difference.

My main problem is not understanding the difference, which is easy, but knowing which type of object I have when I get the output of a function in a package. If I know the object is a named vector or a matrix with column names, it's easy enough to type 'X[,colname]', and if it's a data frame one may use the shortcut 'X$colname'. Usually, it *is* documented what the return value of a function is, but just looking at the output is much faster, and *usually* gives the correct answer. For example, 'mean' applied on a data frame gives a named vector, not a data frame, which is somewhat surprising (given that the columns of a data frame may be of different types, while the elements of a vector may not). (And yes, I know that it's *documented* that it returns a named vector.) On the other hand, perhaps it is surprising that 'mean' works on data frames at all. :-)
-- Karl Ove Hufthammer
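The points made in this message about 'as.data.frame' and about data frames being lists can be tried out directly. What follows is a small sketch using base R only; the vector `x` and the other object names are invented for illustration.

```r
x <- c(a = 1, b = 2, c = 3)      # a named numeric vector

col_df <- as.data.frame(x)       # one column, three rows
row_df <- as.data.frame(t(x))    # t() makes a 1 x 3 matrix first, giving one row

dim(col_df)
dim(row_df)

# A data frame is a list of columns; a matrix is an atomic vector with dimensions
m <- as.matrix(row_df)
is.list(row_df)
is.list(m)

# str() shows the structure rather than the printed view,
# which helps when it is unclear what a function returned
str(row_df)
str(m)
```

Printing `row_df` and `m` gives nearly identical output, which is exactly why checking with `str()` or `class()` can save some confusion.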
[R] code for empirical copula
Hi, I hope somebody can give me an idea of where I can find code for the empirical copula. I have bivariate data. Thank you so much for your help.
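No reply to this question appears in this part of the archive. As a rough illustration only, the empirical copula of a bivariate sample can be sketched in base R as the proportion of observations whose scaled ranks fall below (u, v). The function name `empirical_copula` is invented here, and scaling the ranks by n is an assumption (rank/(n + 1) is another common convention).

```r
# Empirical copula C_n(u, v) for a bivariate sample (x, y).
# Assumption: pseudo-observations are ranks divided by n.
empirical_copula <- function(x, y) {
  stopifnot(length(x) == length(y))
  n <- length(x)
  u_obs <- rank(x, ties.method = "max") / n
  v_obs <- rank(y, ties.method = "max") / n
  # Return a function that evaluates C_n at a single point (u, v) in [0, 1]^2
  function(u, v) mean(u_obs <= u & v_obs <= v)
}

set.seed(1)
x <- rnorm(200)
y <- x + rnorm(200)        # positively dependent toy data
C <- empirical_copula(x, y)

C(0.5, 0.5)                # under independence this would be close to 0.25
C(1, 1)                    # equals 1 by construction
```

Dedicated packages offer more complete implementations; this sketch only shows the definition at work.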
Re: [R] two questions for R beginner
Brandon Zicha wrote:
What were your biggest misconceptions or stumbling blocks to getting up and running with R? Easy. In terms of materials I have been unable to find good books that introduce users to R from the perspective of someone familiar only with packages like SPSS or STATA, or not familiar with statistics packages at all. Even introduction texts use jargon without introducing it. I think that R-help files should be more thorough than they are, and contain more examples. I thought that STATA help files were sparse! The notion that 'R is a user community and thus they do this in their spare time' is no excuse for those creating new tools for R not developing complete help files. It doesn't take that much time relative to actually creating the new function.

Hi Brandon,
I would disagree with your point that documentation doesn't take much time. Writing documentation that is suitable for both the advanced user (being a reference, and thus preferably short) and the beginning user (being sort of a tutorial, and thus preferably longer) is quite a challenge, comparable to writing a good paper. Apart from the fact that it takes quite a while, it is also not much fun. Often people develop packages for their own research and put the software online so others can benefit; they don't need the documentation themselves and don't get paid to write it. So saying 'it's no excuse' really goes too far in my view. R is free; you did not pay several thousand euros for the right to good support. Even the support is free through the mailing list. You can get a paid version of R at Revolution Computing. Then you can call them if there are problems. I don't mean to offend anybody, but I didn't agree with "is no excuse for those creating new tools for R not developing complete help files". Part of the strength of R is that it is open source, but sometimes, as with documentation, this can bite you.
But I think the R docs aren't that bad; I've seen proprietary software that does a worse job than R.
My 2 euro on the subject :),
Cheers, Paul

In terms of actual R use - creating, using, and manipulating data are the biggest frustration for those of the 'spreadsheet generation'. I get the impression that one needs to not merely understand, but be fully fluent in the jargon of matrix mathematics to even know what is going on half the time. I find myself - even now - using 'rules of thumb' that 'seemed to work' rather than fully understanding what I am doing. It is particularly discouraging when many of those 'intro books' suggest using something besides R for data manipulation - how clumsy is that!? I find the actual programming syntax itself is the easiest part to master. It is certainly more flexible - but without a particularly sufficient increase in complexity - than trying to write script in SPSS and STATA. Brandon Zicha

-- Drs. Paul Hiemstra Department of Physical Geography Faculty of Geosciences University of Utrecht Heidelberglaan 2 P.O. Box 80.115 3508 TC Utrecht Phone: +3130 274 3113 Mon-Tue Phone: +3130 253 5773 Wed-Fri http://intamap.geo.uu.nl/~paul
[R] Random real numbers
Hi, How could I generate random real numbers between 0 and 2*pi? Thanks, Frederik
[R] Problem With Pasting (Mac OS X)
Greetings, fellow travelers. When I paste a series of commands into R, they execute serially, and when I go back through commands by hitting the up key, they show up as a block, rather than as individual lines. Is there any way to change this behavior? I'm running the 32-bit build of R 2.10.1 on Mac OS X 10.6.2.
Re: [R] two questions for R beginner
On Tue, 2 Mar 2010 12:31:45 +0100 Brandon Zicha brandon.zi...@ua.ac.be wrote:
Easy. I terms of materials I have been unable to find good books that introduce users to R from the perspective of someone familiar only with packages like SPSS or STATA,

Have you read these books:
R for SAS and SPSS Users http://www.springer.com/statistics/computanional+statistics/book/978-0-387-09417-5
R for Stata Users http://www.springer.com/statistics/computanional+statistics/book/978-1-4419-1317-3
(I have not, so I don't know how good they are.)
-- Karl Ove Hufthammer
Re: [R] question to define a matrix with some vectors with different lengths
On Tue, 2 Mar 2010 12:29:28 +0100 (CET) khaz...@ceremade.dauphine.fr khaz...@ceremade.dauphine.fr wrote:
I have some vector v1,v2,...,vk, with different lengths. I want to consider these vectors as a matrix with k rows. Can you please guide me how I can do it?

What do you want to do with the missing elements?
-- Karl Ove Hufthammer
Re: [R] two questions for R beginners
John Sorkin wrote:
Please take what follows not as an ad hominem statement, but rather as an attempt to improve what is already an excellent program, that has been built as a result of many, many hours of dedicated work by many, many unpaid, unsung volunteers. It troubles me a bit that when a confusing aspect of R is pointed out the response is not to try to improve the language so as to avoid the confusion, but rather to state that the confusion is inherent in the language. I understand that to make changes that would avoid the confusing aspect of the language that has been discussed in this thread would take time and effort by an R wizard (which I am not), time and effort that would not be compensated in the traditional sense. This does not mean that we should not acknowledge the confusion. If we want R to be the de facto lingua franca of statistical analysis, doesn't it make sense to strive for syntax that is as straightforward and consistent as possible?

I think you've misunderstood the argument. It would not be hard to make the suggested change. I don't object to it because it would be too much work, I object to it because I think it is not an improvement. Dataframes and matrices are different, and there is no way to avoid that fact. The arguments in favour of the change seem to be these:

- Dataframes and matrices are similar in some respects, so they should be similar in more. In fact, I believe that the source of confusion is the fact that they are similar, so this would not improve things. People would still be confused by the differences, which are unavoidable.

- Using $ to extract a column of a matrix would be convenient. I agree, it saves 4 keystrokes to type X$column instead of X[,"column"]. But I think it increases confusion, so the savings are not worthwhile. For example, the col2rgb function returns a matrix with rows named "red", "green" and "blue".
But under your proposal, I'd still need to use X["red",] to extract the red component, because columns are components, but rows are not. You are complaining that the lack of $ for matrices is an unnecessary asymmetry, and unnecessary asymmetries are confusing. But your proposal introduces a new one!

- Some functions return matrices when I expect a dataframe, or vice versa. That will continue to be true regardless of whether the proposed change is made. You need to read the documentation. If it is unclear, it should be improved; the language shouldn't be changed so that sloppy documentation is accurate.

- You suggested this so anyone who disagrees must be lazy. Which really is an ad hominem argument, despite your disclaimer. I think you should respect the fact that there are people who disagree with the value of your suggestion. (Which is also an ad hominem attack, but isn't central to my argument.)

Duncan Murdoch

Again, please understand that my comment is made with deepest respect for the many people who have unselfishly contributed to the R project. Many thanks to each and every one of you.
John

Karl Ove Hufthammer k...@huftis.org 3/2/2010 4:00 AM
On Mon, 01 Mar 2010 10:00:07 -0500 Duncan Murdoch murd...@stats.uwo.ca wrote:
Suppose X is a dataframe or a matrix. What would you expect to get from X[1]? What about as.vector(X), or as.numeric(X)?

All this of course depends on type of object one is speaking of. There are plenty of surprises available, and it's best to use the most logical way of extracting. E.g., to extract the top-left element of a 2D structure (data frame or matrix), use 'X[1,1]'. Luckily, R provides some shortcuts. For example, you can write 'X[2,3]' on a data frame, just as if it was a matrix, even though the underlying structure is completely different. (This doesn't work on a normal list; there you have to type the whole 'X[[2]][3]'.) The behaviour of the 'as.' functions may sometimes be surprising, at least for me.
For example, 'as.data.frame' on a named vector gives a single-column data frame, instead of a single-row data frame. (I'm not sure what's the recommended way of converting a named vector to a one-row data frame, but 'as.data.frame(t(X))' works, even though both 'X' and 't(X)' look like a row of numbers.)

The point is that a dataframe is a list, and a matrix isn't. If users don't understand that, then they'll be confused somewhere. Making matrices more list-like in one respect will just move the confusion elsewhere. The solution is to understand the difference.

My main problem is not understanding the difference, which is easy, but knowing which type of object I have when I get the output of a function in a package. If I know the object is a named vector or a matrix with column names, it's easy enough to type 'X[,colname]', and if it's a data frame one may use the shortcut 'X$colname'. Usually, it *is* documented what the return value of a function is, but just looking at the output is much faster,
Re: [R] two questions for R beginner
Hi Brandon,
I just read this book, which I am sure you will be interested in: http://www.amazon.com/SAS-SPSS-Users-Statistics-Computing/dp/0387094172
Cheers!! Albert-Jan
~~ In the face of ambiguity, refuse the temptation to guess. ~~

--- On Tue, 3/2/10, Brandon Zicha brandon.zi...@ua.ac.be wrote:
From: Brandon Zicha brandon.zi...@ua.ac.be
Subject: Re: [R] two questions for R beginner
To: r-help@r-project.org
Date: Tuesday, March 2, 2010, 12:31 PM

What were your biggest misconceptions or stumbling blocks to getting up and running with R? Easy. In terms of materials I have been unable to find good books that introduce users to R from the perspective of someone familiar only with packages like SPSS or STATA, or not familiar with statistics packages at all. Even introduction texts use jargon without introducing it. I think that R-help files should be more thorough than they are, and contain more examples. I thought that STATA help files were sparse! The notion that 'R is a user community and thus they do this in their spare time' is no excuse for those creating new tools for R not developing complete help files. It doesn't take that much time relative to actually creating the new function. In terms of actual R use - creating, using, and manipulating data are the biggest frustration for those of the 'spreadsheet generation'. I get the impression that one needs to not merely understand, but be fully fluent in the jargon of matrix mathematics to even know what is going on half the time. I find myself - even now - using 'rules of thumb' that 'seemed to work' rather than fully understanding what I am doing. It is particularly discouraging when many of those 'intro books' suggest using something besides R for data manipulation - how clumsy is that!? I find the actual programming syntax itself is the easiest part to master. It is certainly more flexible - but without a particularly sufficient increase in complexity - than trying to write script in SPSS and STATA.
Brandon Zicha
[R] simple data transformation question
Hi all, I have a (hopefully) simple newbie-level question.

# I have data like this:
dtf <- data.frame(read.table(textConnection("var value
company 9887.1
company 91117.0
blaah 91.1
etc 11
etc 97111"), header=TRUE))

# I would like to have output like this (the index number may vary):
var      value.1  value.2
company  9887.1   91117.0
blah     91.1     NA
etc      11       97111

# I tried the following.
library(reshape)
cast(dtf, var~value, mean) # 'mean' because some function needs to be specified.

... this does not do what I want, nor does t(dtf). Can somebody help me with the correct transformation, or at least with which function to use best? Thank you in advance!
Cheers!! Albert-Jan
~~ In the face of ambiguity, refuse the temptation to guess. ~~
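No answer to this question appears in this part of the archive. One possible approach, sketched here with base R only (so not the reshape package the poster tried), is to number the observations within each level of 'var' and then reshape from long to wide. The column name `idx` and the object name `wide` are made up for this illustration.

```r
dtf <- read.table(text = "var value
company 9887.1
company 91117.0
blaah 91.1
etc 11
etc 97111", header = TRUE, stringsAsFactors = FALSE)

# Number the observations within each level of 'var' (1, 2, ...)
dtf$idx <- ave(seq_along(dtf$var), dtf$var, FUN = seq_along)

# Long to wide: one row per 'var', one 'value.<idx>' column per observation;
# groups with fewer observations are filled with NA
wide <- reshape(dtf, idvar = "var", timevar = "idx", direction = "wide")
print(wide)
```

The running index is what was missing in the cast() attempt: without it there is nothing to spread the values across columns.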
Re: [R] two questions for R beginners
On Tue, Mar 2, 2010 at 7:27 AM, Duncan Murdoch murd...@stats.uwo.ca wrote:
John Sorkin wrote:
Please take what follows not as an ad hominem statement, but rather as an attempt to improve what is already an excellent program, that has been built as a result of many, many hours of dedicated work by many, many unpaid, unsung volunteers. It troubles me a bit that when a confusing aspect of R is pointed out the response is not to try to improve the language so as to avoid the confusion, but rather to state that the confusion is inherent in the language. I understand that to make changes that would avoid the confusing aspect of the language that has been discussed in this thread would take time and effort by an R wizard (which I am not), time and effort that would not be compensated in the traditional sense. This does not mean that we should not acknowledge the confusion. If we want R to be the de facto lingua franca of statistical analysis, doesn't it make sense to strive for syntax that is as straightforward and consistent as possible?

I think you've misunderstood the argument. It would not be hard to make the suggested change. I don't object to it because it would be too much work, I object to it because I think it is not an improvement. Dataframes and matrices are different, and there is no way to avoid that fact. The arguments in favour of the change seem to be these:

Users of zoo have some experience with this, since zoo uses matrices to represent 2d time series and originally did not support $ as a column extractor but now does. I was originally opposed to adding it for the reasons you state, but it was eventually added, and having used it for some time now I must say that it is very convenient, and I now regard it as a definite improvement in user experience. Certainly I use the feature all the time.
Re: [R] Random real numbers
On Tue, 2 Mar 2010 11:51:39 +0100 frederik vanhaelst frederik.vanhae...@gmail.com wrote:
How could i generate random real numbers between 0 en 2*pi?

Ten such numbers from the uniform distribution:
2*pi*runif(10)
-- Karl Ove Hufthammer
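The one-liner above scales draws from the uniform distribution on (0, 1). A short sketch of that and of the equivalent call with explicit bounds follows; the variable names and the seed are arbitrary and only there to make the example repeatable.

```r
set.seed(42)                            # only so the sketch is reproducible
n <- 10

a <- 2 * pi * runif(n)                  # scale U(0, 1) draws to (0, 2*pi)
b <- runif(n, min = 0, max = 2 * pi)    # or pass the bounds directly

range(a)
range(b)
```

Both forms draw from the same distribution; the second is simply more explicit about the interval.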
Re: [R] Random real numbers
runif(20,0,2*pi)
 [1] 1.29417642 1.10933879 4.31669186 2.41339484 4.83705630 3.12713657 4.50893007 6.23232980 2.38783146 4.88483239 5.87292617
[12] 1.33293077 4.09458703 0.7593 1.67899698 2.42602639 0.08413394 2.40261439 5.46442874 2.13847582

On Tue, Mar 2, 2010 at 5:51 AM, frederik vanhaelst frederik.vanhae...@gmail.com wrote:
Hi, How could i generate random real numbers between 0 en 2*pi? Thanks, Frederik

-- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve?
[R] Row-wisely converting a data frame into a list
Hello, is there an elegant way to convert each row of a data frame into a distinct element of a list? In essence, what I'm looking for is something like

rows.to.lists <- function( df ) {
    ll <- NULL
    for( i in 1:nrow(df) )
        ll <- append( ll, list(df[i,]) )
    return (ll)
}

but done more efficiently (the data frame may contain tens of thousands of rows). I thought about using apply() but this function always returns a matrix.
Thanks in advance!
Bye, Sebastian
Re: [R] question to define a matrix with some vectors with different lengths
This is what you would use a 'list' for.

On Tue, Mar 2, 2010 at 6:29 AM, khaz...@ceremade.dauphine.fr wrote:
Hi, I have some vector v1,v2,...,vk, with different lengths. I want to consider these vectors as a matrix with k rows. Can you please guide me how I can do it? Regards khazaei

-- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve?
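To make the suggestion concrete, here is a small sketch with three made-up vectors. It first stores them in a list, as advised above, and then shows one possible answer to the earlier question about missing elements, namely padding with NA; that padding choice is an assumption, not something the original poster specified.

```r
v1 <- c(1, 2, 3)
v2 <- c(4, 5)
v3 <- 6

# A list keeps each vector at its own length
vecs <- list(v1, v2, v3)
lengths_by_vec <- sapply(vecs, length)
print(lengths_by_vec)

# If a matrix with k rows is really needed, pad the shorter vectors with NA
max_len <- max(lengths_by_vec)
m <- t(sapply(vecs, function(v) c(v, rep(NA, max_len - length(v)))))
print(m)
```

Each row of `m` is one of the original vectors, with NA marking positions that did not exist.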
Re: [R] Random real numbers
From what distribution? If the uniform,
runif(100,0,2*pi)
If another, install package Runuran, and do this
?Runuran
vignette("Runuran")
HTH
Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of frederik vanhaelst
Sent: Tuesday, March 2, 2010 11:52
To: r-h...@stat.math.ethz.ch
Subject: [R] Random real numbers

Hi, How could i generate random real numbers between 0 en 2*pi? Thanks, Frederik
[R] Double Colors in Main
Dear All, Consider the following trivial code snippet

rm(list=ls())
name_vec <- c("color1", "color2")
pdf("test_color.pdf")
plot(seq(5), seq(5), main=paste(name_vec[1], " and ", name_vec[2], sep=""))
dev.off()

What I would like to achieve is rather simple to explain, but it is giving me a headache: how can I have two colors in main? Let us say that I would like 'color1' to be blue and 'color2' to be black.
Many thanks
Lorenzo
Re: [R] Row-wisely converting a data frame into a list
as.data.frame(t(df))

For example
x <- as.data.frame(t(mtcars))
typeof(x)
[1] "list"

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Sebastian Bauer
Sent: Tuesday, March 02, 2010 8:12 AM
To: r-help@r-project.org
Subject: [R] Row-wisely converting a data frame into a list

Hello, is there an elegant way, how I can convert each row of a data frame into distinct elements of a list? In essence, what I'm looking for is something like rows.to.lists <- function( df ) { ll <- NULL for( i in 1:nrow(df) ) ll <- append( ll, list(df[i,]) ) return (ll) } but more done more efficiently (the data frame may contain ten-thousands of rows). I thought about using apply() but this function always returns a matrix. Thanks in advance! Bye, Sebastian
Re: [R] Double Colors in Main
See this thread: http://finzi.psych.upenn.edu/Rhelp10/2009-January/185693.html

On 03/02/2010 02:18 PM, Lorenzo Isella wrote:
Dear All, Consider the following trivial code snippet

rm(list=ls())
name_vec <- c("color1", "color2")
pdf("test_color.pdf")
plot(seq(5), seq(5), main=paste(name_vec[1], " and ", name_vec[2], sep=""))
dev.off()

What I would like to achieve is rather simple to explain, but it is giving me a headache: how can I have two colors in main? Let us say that I would like 'color1' to be blue and 'color2' to be black. Many thanks Lorenzo

-- Romain Francois Professional R Enthusiast +33(0) 6 28 91 30 30 http://romainfrancois.blog.free.fr |- http://tr.im/OIXN : raster images and RImageJ |- http://tr.im/OcQe : Rcpp 0.7.7 `- http://tr.im/O1wO : highlight 0.1-5
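The contents of the linked thread are not reproduced here. One commonly used trick for this kind of title, which may or may not be what that thread describes, is to draw the title twice with plotmath's phantom(), so that each call inks only its own part in its own colour. The sketch below writes to a temporary file, and the choice to draw " and color2" in black is an assumption about the desired result.

```r
name_vec <- c("color1", "color2")
out_file <- file.path(tempdir(), "test_color.pdf")

pdf(out_file)
plot(seq(5), seq(5), main = "")

# Draw the full title twice. phantom() reserves the space of its argument
# without drawing it, so the two partial titles line up exactly.
rest <- paste(" and", name_vec[2])
title(main = bquote(.(name_vec[1]) * phantom(.(rest))), col.main = "blue")
title(main = bquote(phantom(.(name_vec[1])) * .(rest)), col.main = "black")

dev.off()
```

Because both calls lay out the same complete expression, the coloured pieces stay aligned regardless of device or font size.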
Re: [R] Row-wisely converting a data frame into a list
Hi!

On 03/02/2010 02:22 PM, Nutter, Benjamin wrote:

as.data.frame(t(df))

For example:

    x <- as.data.frame(t(mtcars))
    typeof(x)
    [1] "list"

Thanks for the quick reply! I would never have guessed that as.data.frame() works that way!

BTW, this one also seems to do the trick:

    rows.to.list <- function( df ) {
        ll <- apply(df, 1, list)
        ll <- lapply(ll, unlist)
    }

It's even a bit faster here.

Bye, Sebastian
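One caveat about the apply() route, shown as a small sketch on made-up data (the data frame and variable names here are illustrative only): apply() converts the data frame to a matrix first, so with mixed column types every value comes back as character. Indexing row by row keeps the original types.

```r
# Made-up data frame with mixed column types
df <- data.frame(id = 1:3,
                 name = c("a", "b", "c"),
                 score = c(1.5, 2.5, 3.5),
                 stringsAsFactors = FALSE)

# apply() goes through as.matrix(), so each row becomes a character vector
via_apply <- lapply(apply(df, 1, list), unlist)

# Row-by-row indexing keeps each row as a one-row data frame with its types
via_index <- lapply(seq_len(nrow(df)), function(i) df[i, ])

# as.list() on a single row gives a plain named list, types intact
via_list <- lapply(seq_len(nrow(df)), function(i) as.list(df[i, ]))

class(via_apply[[1]])        # character vector
class(via_index[[1]]$score)  # still numeric
```

For an all-numeric data frame such as mtcars the coercion is harmless, which may be why it went unnoticed in the thread.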
Re: [R] ANOVA Types and Regression models: the same?
If memory serves, Bill Venables said in the paper cited several times here that there's only one type of sums of squares. So there's only one type of ANOVA (if I understand what you mean by ANOVA). Just forget about the different types of tests, and simply ask yourself this (hopefully simple and straightforward) question: Which pair of models, when compared, will answer the question you have at hand? It's not sufficient to just ask: Is factor X significant? It depends on what else is in the model you're entertaining.

I think it's high time to retire the archaic concept of the different types of sums of squares. IMHO they are the biggest red herrings in Statistics.

Best, Andy

From: Ravi Kulkarni

Hello,

I think I am beginning to understand what is involved in the so-called Type-I, II, ... ANOVAS (thanks to all the replies I got for yesterday's post). I have a question that will help me (and others?) understand it better (or remove a misunderstanding):

I know that ANOVA is really a special case of regression where the predictor variable is categorical. I know that there can be various types of regression models commonly called stepwise, add, remove..., where one controls which predictors are added to the regression model and in what order. Is this what the various Types of ANOVA correspond to? I mean that I think of my ANOVA as a regression model (a General Linear Model) and the various ways of entering predictors as the various ANOVA Types.

Hope that makes sense...

Ravi Kulkarni

--
View this message in context: http://n4.nabble.com/ANOVA-Types-and-Regression-models-the-same-tp1574654p1574654.html
Sent from the R help mailing list archive at Nabble.com.
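The "which pair of models" idea can be made concrete with a small sketch on simulated data. Everything below (variable names, effect sizes, sample size) is made up for illustration; the point is only that the question "does b matter, given a?" is answered by comparing the model with b to the model without it.

```r
set.seed(42)
n <- 60
a <- factor(sample(c("a1", "a2", "a3"), n, replace = TRUE))
b <- factor(sample(c("b1", "b2"), n, replace = TRUE))
y <- 2 * (b == "b2") + rnorm(n)     # simulated response: b has an effect, a does not

smaller <- lm(y ~ a)                # model without b
larger  <- lm(y ~ a + b)            # same model with b added

cmp <- anova(smaller, larger)       # F test for b, adjusted for a
print(cmp)
```

Swapping in a different pair of models (for example, with and without an a:b interaction) asks a different question, which is the distinction the various "Types" try to encode.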
Re: [R] Stack type
Here is an example using proto, based on converting Duncan's example:

    library(proto)
    Stack <- proto(new = function(.) proto(Stack,
        stack = NULL,
        push = function(., el) .$stack <- c(list(el), .$stack),
        pop = function(.) {
            stopifnot(length(.$stack) > 0)
            out <- .$stack[[1]]
            .$stack[[1]] <- NULL
            out
        }))

    mystack <- Stack$new()
    mystack$push( 1 )
    mystack$push( letters )
    mystack$pop()
    mystack$pop()
    mystack$pop() # gives an error

On Mon, Mar 1, 2010 at 8:14 PM, Duncan Murdoch murd...@stats.uwo.ca wrote:

On 01/03/2010 7:56 PM, Worik R wrote:

How can I implement a stack in R? I want to push and pop. Every thing I push and pop will be the same type, but not necessarily an atomic type.

Use lexical scoping:

    stack <- function() {
        store <- list()
        push <- function(item) {
            store <<- c(list(item), store)
            invisible(length(store))
        }
        pop <- function() {
            if (!length(store)) stop("Nothing to pop!")
            result <- store[[1]]
            store[[1]] <<- NULL
            result
        }
        list(push=push, pop=pop)
    }

    mystack <- stack()
    mystack$push( 1 )
    mystack$push( letters )
    mystack$pop()
    mystack$pop()
    mystack$pop() # gives an error

Duncan Murdoch
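The lexical-scoping approach depends on the superassignment operator `<<-`, which updates a variable in the enclosing function rather than creating a local one. A tiny counter, with names made up for this sketch, shows the difference between the two assignment forms inside a closure:

```r
make_counter <- function() {
  count <- 0
  # `<-` creates a new local `count` inside bump_local; the outer one is untouched
  bump_local  <- function() { count <- count + 1; count }
  # `<<-` finds `count` in the enclosing environment and updates it there
  bump_shared <- function() { count <<- count + 1; count }
  list(bump_local = bump_local,
       bump_shared = bump_shared,
       value = function() count)
}

counter <- make_counter()
counter$bump_local()
counter$bump_local()
after_local <- counter$value()    # the shared count has not moved

counter$bump_shared()
counter$bump_shared()
after_shared <- counter$value()   # the shared count has been updated twice
```

A stack whose push() used plain `<-` would behave like bump_local: each push would appear to work and then be forgotten.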
Re: [R] simple data transformation question
Try this:

    reshape(cbind(id = as.numeric(dtf$var), dtf,
                  time = with(dtf, ave(value, var, FUN = seq))),
            timevar = "time", direction = "wide")

Or:

    xtabs(value ~ var + ave(value, var, FUN = seq), data = dtf)

On Tue, Mar 2, 2010 at 9:40 AM, Albert-Jan Roskam fo...@yahoo.com wrote:

Hi all,

I have a (hopefully) simple newbie-level question.

    # I have data like this:
    dtf <- data.frame(read.table(textConnection("var value
    company 9887.1
    company 91117.0
    blaah 91.1
    etc 11
    etc 97111"), header=TRUE))

    # I would like to have output like this (the index number may vary):
    var      value.1  value.2
    company  9887.1   91117.0
    blah     91.1     NA
    etc      11       97111

    # I tried the following.
    library(reshape)
    cast(dtf, var~value, mean) # 'mean' because some function needs to be specified.

... this does not do what I want, nor does t(dtf). Can somebody help me with the correct transformation, or at least with which function to use best?

Thank you in advance!

Cheers!!
Albert-Jan

~~
In the face of ambiguity, refuse the temptation to guess.
~~

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
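A note for anyone running this on a current R: as.numeric(dtf$var) relies on var being a factor, which was the read.table default at the time. Since R 4.0 strings are no longer converted to factors by default, so that call would give NA. The sketch below is a variant, not Henrique's original, that avoids the issue by numbering the observations within each group and passing idvar explicitly.

```r
dtf <- read.table(text = "var value
company 9887.1
company 91117.0
blaah 91.1
etc 11
etc 97111", header = TRUE)

# Number the observations within each level of var: 1, 2, 1, 1, 2
dtf$time <- ave(dtf$value, dtf$var, FUN = seq_along)

# One row per var, one value.<time> column per within-group index
wide <- reshape(dtf, idvar = "var", timevar = "time", direction = "wide")
print(wide)
```

Groups with fewer observations than the largest group are padded with NA, as in the requested output.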
Re: [R] Row-wisely converting a data frame into a list
On Mar 2, 2010, at 8:11 AM, Sebastian Bauer wrote:

Hello,

is there an elegant way to convert each row of a data frame into a distinct element of a list?

    split(dfrm, rownames(dfrm))

In essence, what I'm looking for is something like

    rows.to.lists <- function( df ) {
        ll <- NULL
        for( i in 1:nrow(df) )
            ll <- append( ll, list(df[i,]) )
        return (ll)
    }

but done more efficiently (the data frame may contain tens of thousands of rows). I thought about using apply(), but this function always returns a matrix.

Thanks in advance!

Bye, Sebastian
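A small point about the split() one-liner, illustrated on a made-up data frame: split() turns the grouping vector into a factor, so splitting on row names returns the pieces sorted as text ("1", "10", "11", ...), not in row order. Splitting on the row index keeps the original order.

```r
df <- data.frame(x = 1:12, y = letters[1:12])

by_name  <- split(df, rownames(df))        # pieces ordered by row name as text
by_index <- split(df, seq_len(nrow(df)))   # pieces in original row order

names(by_name)    # row names in text order
names(by_index)   # row indices in numeric order
```

Whether the order matters depends on what is done with the list afterwards; with meaningful row names (as in mtcars) the first form may be exactly what is wanted.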
Re: [R] ANOVA Types and Regression models: the same?
Ravi Kulkarni wrote:

Hello,

I think I am beginning to understand what is involved in the so-called Type-I, II, ... ANOVAS (thanks to all the replies I got for yesterday's post). I have a question that will help me (and others?) understand it better (or remove a misunderstanding):

I know that ANOVA is really a special case of regression where the predictor variable is categorical. I know that there can be various types of regression models commonly called stepwise, add, remove..., where one controls which predictors are added to the regression model and in what order. Is this what the various Types of ANOVA correspond to? I mean that I think of my ANOVA as a regression model (a General Linear Model) and the various ways of entering predictors as the various ANOVA Types.

Hope that makes sense...

Ravi Kulkarni

Ravi,

John Fox's posting provided a lot of information. Briefly, the Types refer to whether effects are adjusted for all other effects in the model (Types II, III, IV) or not (Type I or sequential tests only adjust for EARLIER terms in the model). See John's posting for definitions of Types II and III and for reasons to almost always use II or III.

Stepwise procedures are a whole different kettle of fish, and they yield invalid statistical tests in almost all cases involving more than two candidate variables and more than one fitted model.

The world might be a better place if these types were not invented and we stated statistical tests more intuitively. The contrast function in the rms package is one of many ways to do that. Here are the examples from the help file. See especially the examples using type='joint'.
    set.seed(1)
    age <- rnorm(200,40,12)
    sex <- factor(sample(c('female','male'),200,TRUE))
    logit <- (sex=='male') + (age-40)/5
    y <- ifelse(runif(200) <= plogis(logit), 1, 0)
    f <- lrm(y ~ pol(age,2)*sex)
    # Compare a 30 year old female to a 40 year old male
    # (with or without age x sex interaction in the model)
    contrast(f, list(sex='female', age=30), list(sex='male', age=40))

    # For a model containing two treatments, centers, and treatment
    # x center interaction, get 0.95 confidence intervals separately
    # by center
    center <- factor(sample(letters[1:8],500,TRUE))
    treat <- factor(sample(c('a','b'), 500,TRUE))
    y <- 8*(treat=='b') + rnorm(500,100,20)
    f <- ols(y ~ treat*center)

    lc <- levels(center)
    contrast(f, list(treat='b', center=lc),
                list(treat='a', center=lc))

    # Get 'Type III' contrast: average b - a treatment effect over
    # centers, weighting centers equally (which is almost always
    # an unreasonable thing to do)
    contrast(f, list(treat='b', center=lc),
                list(treat='a', center=lc),
             type='average')

    # Get 'Type II' contrast, weighting centers by the number of
    # subjects per center. Print the design contrast matrix used.
    k <- contrast(f, list(treat='b', center=lc),
                     list(treat='a', center=lc),
                  type='average', weights=table(center))
    print(k, X=TRUE)
    # Note: If other variables had interacted with either treat
    # or center, we may want to list settings for these variables
    # inside the list()'s, so as to not use default settings

    # For a 4-treatment study, get all comparisons with treatment 'a'
    treat <- factor(sample(c('a','b','c','d'), 500,TRUE))
    y <- 8*(treat=='b') + rnorm(500,100,20)
    dd <- datadist(treat,center); options(datadist='dd')
    f <- ols(y ~ treat*center)
    lt <- levels(treat)
    contrast(f, list(treat=lt[-1]),
                list(treat=lt[ 1]),
             cnames=paste(lt[-1],lt[1],sep=':'), conf.int=1-.05/3)

    # Compare each treatment with average of all others
    for(i in 1:length(lt)) {
      cat('Comparing with',lt[i],'\n\n')
      print(contrast(f, list(treat=lt[-i]),
                        list(treat=lt[ i]), type='average'))
    }
    options(datadist=NULL)

    # Six ways to get the same thing, for a variable that
    # appears linearly in a model and does not interact with
    # any other variables. We estimate the change in y per
    # unit change in a predictor x1. Methods 4, 5 also
    # provide confidence limits. Method 6 computes nonparametric
    # bootstrap confidence limits. Methods 2-6 can work
    # for models that are nonlinear or non-additive in x1.
    # For that case more care is needed in choice of settings
    # for x1 and the variables that interact with x1.
    coef(fit)['x1']                              # method 1
    diff(predict(fit, gendata(x1=c(0,1))))       # method 2
    g <- Function(fit)                           # method 3
    g(x1=1) - g(x1=0)
    summary(fit, x1=c(0,1))                      # method 4
    k <- contrast(fit, list(x1=1), list(x1=0))   # method 5
    print(k, X=TRUE)
    fit <- update(fit, x=TRUE, y=TRUE)           # method 6
    b <- bootcov(fit, B=500, coef.reps=TRUE)
    bootplot(b, X=k$X)   # bootstrap distribution and CL

    # In a model containing age, race, and sex,
    # compute an estimate of the mean response for a
    # 50 year old male, averaged over the races using
    # observed frequencies for the races as weights
    f <- ols(y ~ age + race + sex)
Re: [R] two questions for R beginners
Brandon Zicha wrote:

Hey Paul,

Hey Brandon, (adding R-help in the cc)

I agree with you that the documentation of R could be better, especially with more examples in code showing not only the common cases, but also more esoteric cases. It would be great if everyone invested a lot of time to write awesome documentation, but this is not the case. I just objected to the tone (I thought :)) I spotted. Some more comments are inline:

Accepting the main point of my post - that the often VERY incomplete help files appended to packages can be a major stumbling block for getting up and running in R - I take your point. I probably went a bit too far with my language there. I would point out though that a great many parts of research (like writing a bibliography - or searching for citations of any kind usually) aren't much fun, but are an important part of research-related work. Likewise, complete documentation (by which I hardly mean a paper - looking at STATA help files as a minimum would be a good start) is part of programming. I agree that one needs to employ some level of judgement, otherwise you will get a help file that says "First turn on the computer... then click the 'R' icon". But I have myself created one or two STATA functions that I have put up for public use - so I know how not fun, but necessary, complete documentation is.

Further, I didn't say that writing documentation doesn't take time. Everything takes time. My point was that relative to actually creating the application, writing more complete documentation takes very little time. If one invests the time to do the 'fun' stuff of writing a new package for R, it seems reasonable that taking the (proportionately) little time to write a nicer help file would be the most 'professional' thing to do. But this could be my illusion that all researchers see themselves as professionals - rather than an anarchic egoistic enclave of independent self-interested paper producers.
This is what scientists get judged upon, not on how much software they publish and how good their documentation is. Furthermore, it is quite hard for a hardcore R programmer to judge what people find hard about their software.

I am notorious for assuming greater standards as an acceptable 'norm' than my community at large :-) Furthermore, you are absolutely right that my standards are apparently even too high for many commercial applications! R help is sometimes downright good! So, if I accept that I am a demanding S.O.B. and tone down my thoughts of proper documentation and professionalism and adopt the (probably more) reasonable perspective you do at the end of "well, this is the world we live in... and come on, it's free", I totally agree that I probably went too far!

But, better yet, I think that this observation you make suggests a solution: Perhaps R could use a more integrated and organized open source help system. I can think of a few possibilities - the easiest being a wiki version of R help. This way users could add useful information to help files - such as more examples, tricks, tips, and known problems. This would take advantage of the open source, free, user-community centered aspects of R, and permit those with an interest in helping beginners to post notes for beginners - on the help files. I know that if such a wiki existed I would have posted the example of constrained optimization I did recently. It wouldn't be too difficult to add a function wikihelp(X) that would open the wiki help page rather than the standard help documentation.

Currently, help on any given command is scattered all over help fora all about the web. A central, indexed, and easily referenced help system might be a solution. Heck, such a system could go a step further and link R-help listserv archives by command, thus centralizing and integrating the open-source user-built information resource of the listserv into help().
How many e-mails to this listserv begin with 'I just spent a few hours cruising the help forums related to 'X' and couldn't find an answer.'

Sounds like a good addition, allowing people to add to the documentation as they see fit. There is of course the R wiki, but this is not widely used and not firmly embedded into R itself. But how would we keep such a system as you propose manageable, preventing it from becoming an enormous mess? Maybe some kind of moderation?

I note that STATA has all their help files for the latest version of Stata available on the web (http://www.stata.com/help.cgi?contents). How difficult would a similar system - only with R, editable and with links to supplementary information - be to set up? I can't imagine it would be horribly expensive in terms of set-up costs.

A problem is that there is no company that markets R that could set this up; the community is much looser, much more open source. Probably the R core team would be the closest thing we have. What do you
Re: [R] ANOVA Types and Regression models: the same?
Sorry, there were 2 typos in my note:

John Fox's posting provided a lot of information. Briefly, the Types refer to whether effects are adjusted for all other effects in the model (Types II, III, IV) or no (Type I or sequential tests only adjust for

    no -> not

EARLIER terms in the model). See John's posting for definitions of Types II and III and for reasons to almost always use II or III.

    II or III -> II over III

Sorry about that

Frank
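The difference between sequential tests and tests adjusted for all other terms can be seen in a short sketch on simulated, unbalanced data. All names and numbers below are made up. In this additive model (no interaction), drop1() gives each term adjusted for the other, which is what the adjusted Types amount to here, while anova() on a single fit is sequential and therefore depends on the order of the terms.

```r
set.seed(1)
n <- 80
a <- factor(sample(c("a1", "a2"), n, replace = TRUE))
# Make b depend on a so the two factors are not orthogonal
b <- factor(ifelse(runif(n) < ifelse(a == "a1", 0.8, 0.3), "b1", "b2"))
y <- 1 * (a == "a2") + 1 * (b == "b2") + rnorm(n)

seq_ab <- anova(lm(y ~ a + b))   # a entered first: a is NOT adjusted for b
seq_ba <- anova(lm(y ~ b + a))   # a entered last: a IS adjusted for b

ss_a_first <- seq_ab["a", "Sum Sq"]
ss_a_last  <- seq_ba["a", "Sum Sq"]

# drop1() tests each term against the model containing all the others
adj <- drop1(lm(y ~ a + b), test = "F")
ss_a_adjusted <- adj["a", "Sum of Sq"]
```

The sum of squares for a changes with its position in the sequential table, and the "entered last" value matches the drop1() value.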
Re: [R] simple data transformation question
Hi Henrique,

*Thank you!* The reshape code does precisely what I want.

Cheers!!
Albert-Jan

~~
In the face of ambiguity, refuse the temptation to guess.
~~

--- On Tue, 3/2/10, Henrique Dallazuanna www...@gmail.com wrote:

From: Henrique Dallazuanna www...@gmail.com
Subject: Re: [R] simple data transformation question
To: Albert-Jan Roskam fo...@yahoo.com
Cc: r-help@r-project.org
Date: Tuesday, March 2, 2010, 2:45 PM

Try this:

    reshape(cbind(id = as.numeric(dtf$var), dtf,
                  time = with(dtf, ave(value, var, FUN = seq))),
            timevar = "time", direction = "wide")

Or:

    xtabs(value ~ var + ave(value, var, FUN = seq), data = dtf)

On Tue, Mar 2, 2010 at 9:40 AM, Albert-Jan Roskam fo...@yahoo.com wrote:

Hi all,

I have a (hopefully) simple newbie-level question.

    # I have data like this:
    dtf <- data.frame(read.table(textConnection("var value
    company 9887.1
    company 91117.0
    blaah 91.1
    etc 11
    etc 97111"), header=TRUE))

    # I would like to have output like this (the index number may vary):
    var      value.1  value.2
    company  9887.1   91117.0
    blah     91.1     NA
    etc      11       97111

    # I tried the following.
    library(reshape)
    cast(dtf, var~value, mean) # 'mean' because some function needs to be specified.

... this does not do what I want, nor does t(dtf). Can somebody help me with the correct transformation, or at least with which function to use best?

Thank you in advance!

Cheers!!
Albert-Jan
[R] nu-SVM crashes in e1071
Hello!

I'm using SVMs for multi-class classification problems, via the svm() function in the package e1071. If I use svm(..., type="C-classification") everything works fine. But if I want to use nu-SVM with svm(..., type="nu-classification", nu=0.5), R crashes immediately. No error message - just a crash. Has anybody had the same problem, and maybe a solution? I'm using R 2.10.0 and the latest version of e1071.

Thanks
TIM

BTW: Using the LibSVM wrapper in Weka, the same happens. Maybe there is a problem in the LibSVM code...

---
Tim Häring
Bavarian State Institute of Forest Research
Department of Forest Ecology
Hans-Carl-von-Carlowitz-Platz 1
D-85354 Freising
E-Mail: tim.haer...@lwf.bayern.de
http://www.lwf.bayern.de
Re: [R] Strange behavior with poisosn and glm
On Tue, 2010-03-02 at 00:58 -0800, Noah Silverman wrote:

Ted,

Brilliant explanation (as usual). I'm back in school, just starting on a post-graduate degree in stats, so the help is really appreciated.

Now, I have a slightly trickier question about the same model. I've seen more than one way to get values out of the glm model. I.e., if we're looking at the 10th item in the dataset (note: m is the model):

    fitted(m)[10]
    predict(m,dataset[10,])

give me different results. From my data, I get the following real results:

    predict(m,data[100,])
         100
    7.727999
    fitted(m)[100]
         179
    3956.637

I find that unlikely - why is one labelled 100 and the other 179? Perhaps something is wrong here. However, that said, those two calls *will* give you different results because with predict we can have several types of predictions. See ?predict.glm and note that the default is type = "link", i.e. to produce predictions on the scale of the linear predictor/link function, which then need the inverse of the link function applying to them.

What do

    predict(m, data, type = "response")[100]

and

    fitted(m)[100]

yield? Do you have missing values etc. in your data?

G

From my understanding, the exp of the prediction should be equal to the fitted value. Here it is not. I don't understand why. Any insight?

-N

On 3/2/10 12:47 AM, (Ted Harding) wrote:

On 02-Mar-10 08:02:27, Noah Silverman wrote:

Hi,

I'm just learning about Poisson links for the glm function. One of the data sets I'm playing with has several of the variables as factors (i.e. month, group, etc.). When I call the glm function with a formula that has a factor variable, R automatically converts the variable to a series of variables with unique names and binary values.
For example, with this pseudo data:

    y    v1   month
    2    1    january
    3    1.4  februrary
    1.5  6.3  february
    1.2  4.5  january
    5.5  4.0  march

I use this call:

    m <- glm(y ~ v1 + month, family=poisson)

R gives me back a model with variables of Intercept, v1, monthJanuary, monthFebruary, monthMarch.

I'm concerned that this might be doing some strange things to my model. Can anyone offer some enlightenment?

Thanks!

The creation of auxiliary variables is the way to incorporate a factor variable into a model. These are usually called dummy variables, and are essentially indicator variables. Your data above would correspond to variables I (for Intercept), J (for January), F (for February) and M (for March) in addition to the other variables y and v1, as below:

    y    v1   I  J  F  M   # month
    2    1    1  1  0  0   # january
    3    1.4  1  0  1  0   # februrary
    1.5  6.3  1  0  1  0   # february
    1.2  4.5  1  1  0  0   # january
    5.5  4.0  1  0  0  1   # march

The linear predictor L in the model for y would then be

    L = a*I + b*v1 + c1*J + c2*F + c3*M

evaluated arithmetically; e.g. for row 2 of the data it is

    a + b*1.4 + c2

However, as given, J + F + M = I, so there is redundancy in the variables, since there are only three independent values there (not so if you exclude the Intercept using a model formula y ~ v1 + month - 1), so R will provide estimates which are computed in terms of some pattern of differences between these four variables, called contrasts. Different patterns of difference present different representations of the three independent aspects.

There are many different kinds of contrasts available. One of these will be chosen as default by R (depending in particular on whether the factor variable is being used as an ordered factor or an unordered factor). See ?contrasts for an outline of what is there, ?contrast for more detail, and look at the help for particular contrasts such as ?contr.helmert, ?contr.poly, ?contr.sum, ?contr.treatment.

After all that: No, R is not doing strange things to your model!

ted.
E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 02-Mar-10  Time: 08:47:11
-- XFMail --

--
Dr. Gavin Simpson        [t] +44 (0)20 7679 0522
ECRC, UCL Geography,     [f] +44 (0)20 7679 0565
Pearson Building,        [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London     [w]
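Both points in this thread (the indicator columns R builds for a factor, and the link versus response scale of predict()) can be checked on a small simulated data set. The data below are made up; only glm(), model.matrix(), predict() and fitted() are the functions under discussion.

```r
set.seed(123)
d <- data.frame(
  v1    = runif(30, 1, 6),
  month = factor(rep(c("january", "february", "march"), each = 10))
)
d$y <- rpois(30, lambda = exp(0.2 + 0.1 * d$v1))   # simulated counts

m <- glm(y ~ v1 + month, family = poisson, data = d)

# The design matrix actually used: intercept, v1, and one indicator
# column for each month except the reference level
colnames(model.matrix(m))

link_scale <- predict(m, newdata = d[10, ])                     # default type = "link"
resp_scale <- predict(m, newdata = d[10, ], type = "response")  # scale of the counts

# With the log link, exp() of the link-scale prediction is the fitted value
c(exp(link_scale), resp_scale, fitted(m)[10])
```

If exp(predict(...)) and fitted(...) disagree for what should be the same observation, the usual suspects are the ones Gavin raises: rows dropped because of missing values, so that position 100 in the fitted values is not row 100 of the data.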
Re: [R] nu-SVM crashes in e1071
On 02.03.2010 15:41, Häring, Tim (LWF) wrote:

Hello!

I'm using SVMs for multi-class classification problems, via the svm() function in the package e1071. If I use svm(..., type="C-classification") everything works fine. But if I want to use nu-SVM with svm(..., type="nu-classification", nu=0.5), R crashes immediately. No error message - just a crash. Has anybody had the same problem, and maybe a solution? I'm using R 2.10.0 and the latest version of e1071.

Maybe for your unstated OS with unstated version of e1071 on an outdated version of R without a reproducible example given. For my WinXP, R-2.10.1, e1071 1.5-22 I get:

    library(e1071)
    data(iris)
    model <- svm(Species ~ ., data = iris, type="nu-classification")
    model
    #Call:
    #svm(formula = Species ~ ., data = iris, type = "nu-classification")
    #
    #
    #Parameters:
    #   SVM-Type:  nu-classification
    # SVM-Kernel:  radial
    #      gamma:  0.25
    #         nu:  0.5
    #
    #Number of Support Vectors:  103

Uwe Ligges

Thanks
TIM

BTW: Using the LibSVM wrapper in Weka, the same happens. Maybe there is a problem in the LibSVM code...

---
Tim Häring
Bavarian State Institute of Forest Research
Department of Forest Ecology
Hans-Carl-von-Carlowitz-Platz 1
D-85354 Freising
E-Mail: tim.haer...@lwf.bayern.de
http://www.lwf.bayern.de
[R] Fwd: Building R packages in Windows 7
-- Forwarded message --
From: Duncan Murdoch murd...@stats.uwo.ca
Date: 25 February 2010 22:35
Subject: Re: [R] Building R packages in Windows 7
To: RHelp r-help@r-project.org, Eric Ferreira ericbferre...@gmail.com

On 25/02/2010 7:56 PM, Eric Ferreira wrote:

Thank you, Sir, but how can I ask it to create HTML files?

The tools::Rd2HTML function will do the translation, but it takes a bit of work to prepare input for it. The idea is that you don't need to save the files; R just produces them when the browser asks for them.

There's an option --enable-prebuilt-html that can be used when installing a package, but I don't recommend using it. You'll get the help page as it exists at install time, not as it is intended to be displayed at run time. Links to other packages likely won't work properly.

Duncan Murdoch

P.S. Please copy your responses to the group, so that others can see the questions and answers.

On 25 February 2010 14:41, Duncan Murdoch murd...@stats.uwo.ca wrote:

On 25/02/2010 11:49 AM, Eric Ferreira wrote:

Ok, I'm working under:

Windows 7 Professional 32bits, 4 GB RAM, 320 GB HD, Intel Core 2 Duo processor
R 2.10.1

I've installed:

Rtools211
MikteX 2.8
HTML Help Workshop

Setting my PATH to:

    c:\Rtools\bin;c:\Rtools\perl\bin;c:\Rtools\MinGW\bin;c:\Arquivos de Programas\R\R-2.10.1pat\bin;c:\Arquivos de Programas\MikTeX 2.8\miktex\bin;c:\Program Files\HTML Help Workshop

...creating the package called ExpDes and asking (at the prompt):

    Rcmd build --binary ExpDes

Among others, a warning message is printed:

    WARNING: some HTML links may not be found,

and no html files are produced.

Right, HTML help files are produced on demand; they aren't stored in the binary package zip file. HTML Help Workshop is not being used at all.

Duncan Murdoch

Thank you again.

On 25 February 2010 13:02, Duncan Murdoch murd...@stats.uwo.ca wrote:

On 25/02/2010 10:56 AM, Eric Ferreira wrote:

This is my first package.
I'm just getting started doing that, following the steps described on your website... I really don't know how I am asking for CHMs to be produced, sorry.

All I can suggest is that you need to be less stingy with information. Tell us what you did. Tell us what symptoms you saw. Do both of those by cut and paste from your console; don't paraphrase, or refer to vague instructions like "your website".

Duncan Murdoch

On 25 February 2010 12:52, Duncan Murdoch murd...@stats.uwo.ca wrote:

On 25/02/2010 10:40 AM, Eric Ferreira wrote:

Dear Duncan,

Thank you so much for your reply. Actually, I'm using the latest version of R and the problem persists. What do you use instead of HTML Help Workshop for newer R versions?

We just produce text and HTML help pages on demand, and LaTeX ones for the pdf manuals. How are you asking for CHMs to be produced?

Duncan Murdoch

Best regards
Eric.

On 25 February 2010 11:43, Duncan Murdoch murd...@stats.uwo.ca wrote:

On 25/02/2010 9:06 AM, Eric Ferreira wrote:

Dear useRs,

I'm having trouble building R packages in Windows 7 regarding HTML Help Workshop. Pointing PATH to c:\Program Files\HTML help Workshop does work in Windows (e.g. Vista) and does not in Windows 7. Some tips??

We haven't used the HTML Help Workshop since R 2.10.0, so you could upgrade to the current R, and the problem will go away. Otherwise, I think you'll have to ask Microsoft for help. But they aren't likely to be helpful: Win XP is the most recent OS listed as supported.

Duncan Murdoch

--
Dr Eric B Ferreira
Exact Sciences Department
Federal University of Alfenas
Brazil
Re: [R] nu-SVM crashes in e1071
I'm using SVMs for multi-class classification problems, via the svm() function in the package e1071. If I use svm(..., type="C-classification") everything works fine. But if I want to use nu-SVM with svm(..., type="nu-classification", nu=0.5), R crashes immediately. No error message - just a crash. Has anybody had the same problem, and maybe a solution? I'm using R 2.10.0 and the latest version of e1071.

Maybe for your unstated OS with unstated version of e1071 on an outdated version of R without a reproducible example given. For my WinXP, R-2.10.1, e1071 1.5-22 I get:

    library(e1071)
    data(iris)
    model <- svm(Species ~ ., data = iris, type="nu-classification")
    model

O.k. - sorry for my sparse information. I just updated to R-2.10.1 and e1071 version 1.5-22 on WinXP. I can reproduce the example with the iris dataset. However, R crashes when I call svm() with my dataset:

    model <- svm(soil_unit ~ ., data = traindat, type="nu-classification")

My dataset consists of 9259 obs. of 14 variables. My target variable is a factor variable with 22 levels (multi-class classification). Predictors are 12 numeric variables and 1 factor variable. Hoping this information is enough.

TIM
Re: [R] Random real numbers
frederik vanhaelst wrote:

Hi,

How could I generate random real numbers between 0 and 2*pi?

Thanks,
Frederik

Googling for "R generate random number" gave this as the second hit (on my machine):

http://blog.revolution-computing.com/2009/02/how-to-choose-a-random-number-in-r.html

cheers,
Paul

--
Drs. Paul Hiemstra
Department of Physical Geography
Faculty of Geosciences
University of Utrecht
Heidelberglaan 2
P.O. Box 80.115
3508 TC Utrecht
Phone: +3130 274 3113 Mon-Tue
Phone: +3130 253 5773 Wed-Fri
http://intamap.geo.uu.nl/~paul
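For completeness, a direct answer in code, assuming "random" here means uniformly distributed: runif() draws from a uniform distribution on a given interval. The seed and sample size are arbitrary choices for this sketch.

```r
set.seed(2010)                          # only so the draw is reproducible
angles <- runif(5, min = 0, max = 2 * pi)
angles
```

If a different distribution on the circle is wanted, runif() is not the right tool, but nothing in the question suggests that.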
[R] Reading data file with both fixed and tab-delimited fields
Hello R wizards,

What is the best way to read a data file containing both fixed-width and tab-delimited fields? (More detail follows.)

_*Details:*_

The U.S. Bureau of Labor Statistics provides local area unemployment statistics at ftp://ftp.bls.gov/pub/time.series/la/, and the data are documented in the file la.txt ftp://ftp.bls.gov/pub/time.series/la/la.txt.

Each data file has five tab-delimited fields:

 * series_id
 * year
 * period (codes for things like quarter or month of year)
 * value
 * footnote_codes

The series_id consists of five fixed-width subfields (length in parentheses):

 * survey abbreviation (2)
 * seasonal code (1)
 * area type code (2)
 * area code (6)
 * measure code (2)

So an example record might be:

  LASPS36040003   1990   M01   8.8   L

I want to read in the data in one pass and convert them to a data frame with the following columns (actual name, class in parentheses):

 * Survey abbreviation (survey, character)
 * Seasonal (seasonal, logical seasonal=T)
 * Area type (area_type_code, factor)
 * Area (area_code, factor)
 * Measure (measure_code, factor)
 * Year (year, Date)
 * Period (period, factor)
 * Value (value, numeric)
 * Footnote (footnote_codes, character but see note)

(Regarding the Footnote, I have to look at the data more. If there's just one code per record, this will be a factor; if there are multiple, it will either be character or a list. For now I'm making it only character.)

Currently I can read the data just fine using read.table, but this makes series_id the first variable. I want to break out the subfields as separate columns. Any suggestions?

Thanks. Marsh Feldman
Re: [R] plotting a subset of a time series
The indexing in xts is very nice; it may do what you want.

  library(xts)
  x.xts <- as.xts(x)
  plot(x.xts)
  plot(x.xts['2005::2006-10'])

HTH, David Reiner

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Erin Hodgess
Sent: Tuesday, March 02, 2010 1:20 AM
To: R help
Subject: [R] plotting a subset of a time series

Dear R People: I have the following time series and plot:

  x <- ts(rnorm(50), start=2005, freq=12)
  plot(x)

which works fine. I would like to plot a subset of that time series, which I did with:

  plot(window(x, 2005, 2006.83))

Is there a better way to do this, please? Thanks, Erin

-- Erin Hodgess Associate Professor Department of Computer and Mathematical Sciences University of Houston - Downtown mailto: erinm.hodg...@gmail.com
Re: [R] nu-SVM crashes in e1071
On 02.03.2010 17:05, Häring, Tim (LWF) wrote:

O.k. - sorry for my sparse information. I just updated to R-2.10.1 and e1071 version 1.5-22 on WinXP. I can reproduce the example with the iris dataset. However, R crashes when I call svm() with my dataset:

  model <- svm(soil_unit ~ ., data = traindat, type="nu-classification")

My dataset consists of 9259 obs. of 14 variables. My target variable is a factor variable with 22 levels (multi-class classification). Predictors are 12 numeric variables and 1 factor variable. Hoping this information is enough. TIM

Well, you might want to send the *reproducible* example (i.e. including data that reproduces a crash) to the e1071 maintainer (CCing David). He may be unable to help if the problem lies in the underlying libsvm code, in which case it might be better to contact the libsvm maintainers.

Uwe Ligges
Re: [R] plotting a subset of a time series
Try this:

  plot(window(x, end = c(2006, 10)))

On Tue, Mar 2, 2010 at 2:19 AM, Erin Hodgess erinm.hodg...@gmail.com wrote:

Dear R People: I have the following time series and plot:

  x <- ts(rnorm(50), start=2005, freq=12)
  plot(x)

which works fine. I would like to plot a subset of that time series, which I did with:

  plot(window(x, 2005, 2006.83))

Is there a better way to do this, please? Thanks, Erin

-- Erin Hodgess Associate Professor Department of Computer and Mathematical Sciences University of Houston - Downtown mailto: erinm.hodg...@gmail.com
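A self-contained sketch of the window() suggestion, for anyone who wants to check which observations are kept. The seed and the explicit start argument are additions for illustration; giving start and end as c(year, month) pairs avoids having to compute decimal years such as 2006.83 by hand.

```r
# Monthly series of 50 values starting January 2005, as in the question.
set.seed(1)  # arbitrary seed for reproducibility
x <- ts(rnorm(50), start = 2005, frequency = 12)

# Keep January 2005 through October 2006, using (year, period) pairs.
sub <- window(x, start = c(2005, 1), end = c(2006, 10))

start(sub)   # first (year, month) kept
end(sub)     # last (year, month) kept
length(sub)  # number of monthly observations kept
plot(sub)
```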
Re: [R] Random real numbers
On Tue, Mar 2, 2010 at 4:05 PM, Paul Hiemstra p.hiems...@geo.uu.nl wrote:

frederik vanhaelst wrote: Hi, how could I generate random real numbers between 0 and 2*pi? Thanks,

Googling for "R generate random number" gave this as a second hit (on my machine): http://blog.revolution-computing.com/2009/02/how-to-choose-a-random-number-in-r.html

If the original poster wanted real random numbers instead of random real numbers: http://finzi.psych.upenn.edu/R/library/random/html/random.html but I'm not sure how best to convert those real random integers into real random reals (between 0 and 2pi).

Barry
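On the open question of turning random integers into reals: one simple option is to rescale them. The sketch below assumes the integers are uniform on 0..n_max; here sample.int() is only a stand-in for integers obtained from an external source, and n_max is a hypothetical choice. The result can take only n_max + 1 distinct values, so the resolution is 2*pi / (n_max + 1).

```r
# Stand-in for externally supplied integers, uniform on 0..n_max.
n_max <- 2^16 - 1   # hypothetical upper bound of the integer source
set.seed(7)         # arbitrary seed for reproducibility
k <- sample.int(n_max + 1, size = 5, replace = TRUE) - 1L  # integers in 0..n_max

# Rescale to the half-open interval [0, 2*pi).
theta <- 2 * pi * k / (n_max + 1)
theta
```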
Re: [R] kriging with geoR package
Hi all. If someone has the same problems, here are the answers:

- vertical legend:

  legend.krige(x.leg=c(X,X), y.leg=c(X,X), kr$pred, vert=TRUE, col=gray(seq(.7,0,l=10)))

- sample positions on the map: ###coords.dat=table$coords### as in

  image(kr, col=gray(seq(.7,0,l=10)), xlim=c(-1,55), ylim=c(0,53), coords.dat=fau1$coords)

- for the grey scale, the help provided by the software is nice

-- View this message in context: http://n4.nabble.com/kriging-with-geoR-package-tp1008696p1575186.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] capturing errors in Sweave
G'day Sundar, On Tue, 2 Mar 2010 01:03:54 -0800 Sundar Dorai-Raj sdorai...@gmail.com wrote: Thanks, Berwin. That works just great! You are welcome. I noticed by now that cat(tmp) is sufficient; the tmp[1] in cat(tmp[1]) was a left over from earlier attempts to get the output to look correct. Cheers, Berwin
Re: [R] nu-SVM crashes in e1071
Hi,

On Tue, Mar 2, 2010 at 11:05 AM, Häring, Tim (LWF) tim.haer...@lwf.bayern.de wrote:

O.k. - sorry for my sparse information. I just updated to R-2.10.1 and e1071 version 1.5-22 on WinXP. I can reproduce the example with the iris dataset. However, R crashes when I call svm() with my dataset:

  model <- svm(soil_unit ~ ., data = traindat, type="nu-classification")

My dataset consists of 9259 obs. of 14 variables. My target variable is a factor variable with 22 levels (multi-class classification). Predictors are 12 numeric variables and 1 factor variable. Hoping this information is enough.

While you're sending your bug report to David, perhaps you can try the SVM from kernlab. It relies on code from libsvm, too, but ... you never know. It can't hurt to try.

-steve

-- Steve Lianoglou Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact
[R] problem with choose.files
I have recently upgraded to R 2.10.1 on Windows XP and am using scripts that I've used successfully in previous versions. I'm having a problem with choose.files. The lines read:

  fura_scan_file <- choose.files(caption="Select log file (*.log) for fura-2 scans")
  PI_scan_file <- choose.files(caption="Select log file (*.log) for PI scans")

The problem is that the directory chosen in the first choose.files call is not remembered. This is an issue because my files are nested inside several directories and it takes a lot of clicking to get to where I need to be. Is there a problem with these lines? Is it likely elsewhere in the script? I apologize for my ignorance and wasting time, but the documentation for choose.files suggests this should happen automatically.

Caleb Rounds
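Whatever the cause of the changed behaviour, a possible workaround is to pass the folder of the first selection explicitly through the default argument of choose.files(), which accepts a fully qualified file mask. This is only a sketch: default_from() is a hypothetical helper introduced here, and the dialog calls are guarded because choose.files() exists only on Windows.

```r
# Hypothetical helper: build a file mask pointing at the folder of a
# previously chosen file, so the next dialog can open in that folder.
default_from <- function(prev_file, pattern = "*.log") {
  file.path(dirname(prev_file), pattern)
}

# choose.files() is Windows-only and needs an interactive session.
if (.Platform$OS.type == "windows" && interactive()) {
  fura_scan_file <- choose.files(caption = "Select log file (*.log) for fura-2 scans")
  if (length(fura_scan_file) > 0) {  # nothing to reuse if the dialog was cancelled
    PI_scan_file <- choose.files(default = default_from(fura_scan_file[1]),
                                 caption = "Select log file (*.log) for PI scans")
  }
}
```

The helper only manipulates the path string, so it can be checked on any platform, for example with default_from("C:/data/run1/a.log").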
Re: [R] Reading data file with both fixed and tab-delimited fields
I tried to shoehorn the read.* functions into matching both the fixed-width and the variable-width fields in the data, but no obvious way presented itself. (read.fwf reads fixed-width data properly, but the rest of the fields must be processed separately -- maybe insert NULL stubs in the remaining fields and fill them in later?)

One way is to sidestep the entire issue and convert the structured data you have into a csv file using sed (usually available on most *nix systems) with something like so:

  cat data | sed -r 's/^(..)(.)(..)(.{6})(..)[ \t]*([^ \t]*)[ \t]*([^ \t]*)[ \t]*([^ \t]*)[ \t]*([^ \t]*)[ \t]*([^ \t]*)/\1,\2,\3,\4,\5,\6,\7,\8,\9/' | less

Check that the output looks right, then use the resulting .csv file directly in R using read.csv.

If that does not satisfy you, maybe the R Wizards on the list might be able to point you to a native R way of doing this, possibly using scan? I'm not sure though.

Hope this helps, Chillu

On Tue, Mar 2, 2010 at 9:42 PM, Marshall Feldman ma...@uri.edu wrote:

Hello R wizards, What is the best way to read a data file containing both fixed-width and tab-delimited fields?
[R] add a header to a forest plot (metafor)
Dear R-community,

I'm currently trying to assemble a forest plot using the forest function from the package metafor. It works well. Even the regular main argument works for adding a title to the graph. However, I would like to add one top row that explains the nature of the columns, very much like the usual header in spreadsheet programs. For example: Study | Sample | Sample Size | Estimated Effect Size | CI 95%.

I tried to add axis(3), but apparently the forest plot isn't that kind of graphic. Does anyone have any idea?

Cheerio, Sebastian
Re: [R] Reading data file with both fixed and tab-delimited fields
Ah, I should have mentioned this. Personally I work on Macs (Leopard) and PCs (XP Pro and XP Pro x64). Even though the PCs do have Cygwin, I'm trying to make this code portable. So I want to avoid such things as sed, perl, etc. I want to do this in R, even if processing is a bit slower. Eventually, I'll hide the code in a class, so the code can be a bit complex.

Marsh Feldman

On 3/2/2010 12:29 PM, Chidambaram Annamalai wrote:

One way is to sidestep the entire issue and convert the structured data you have into a csv file using sed (usually available on most *nix systems) [...] If that does not satisfy you, maybe the R Wizards on the list might be able to point you to a native R way of doing this, possibly using scan? I'm not sure though.

-- Dr. Marshall Feldman, PhD Director of Research and Academic Affairs Center for Urban Studies and Research The University of Rhode Island email: marsh @ uri .edu (remove spaces) Contact Information: Kingston: 202 Hart House Charles T. Schmidt Labor Research Center The University of Rhode Island 36 Upper College Road Kingston, RI 02881-0815 tel. (401) 874-5953 fax: (401) 874-5511 Providence: 206E Shepard Building URI Feinstein Providence Campus 80 Washington Street Providence, RI 02903-1819 tel. (401) 277-5218 fax: (401) 277-5464
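A possible R-only approach, sketched under the layout described in this thread (five tab-delimited fields, with series_id made of subfields of widths 2, 1, 2, 6 and 2): read the file as tab-delimited, then cut series_id apart with substr(). The single record in txt is the example from the question; reading a real file would use file = ... and probably header = TRUE. Treating seasonal code "S" as "seasonally adjusted" is an assumption, and year is left as an integer rather than a Date.

```r
# One record in the layout described above (tab-separated fields).
txt <- "LASPS36040003\t1990\tM01\t8.8\tL"
raw <- read.delim(text = txt, header = FALSE, sep = "\t", strip.white = TRUE,
                  col.names = c("series_id", "year", "period",
                                "value", "footnote_codes"),
                  colClasses = c("character", "integer", "character",
                                 "numeric", "character"))

# Fixed-width subfields of series_id: names and widths from the question.
widths <- c(survey = 2, seasonal = 1, area_type_code = 2,
            area_code = 6, measure_code = 2)
ends   <- cumsum(widths)
starts <- ends - widths + 1

# Cut every series_id into its subfields (vectorised over all rows).
parts <- Map(function(s, e) substr(raw$series_id, s, e), starts, ends)
names(parts) <- names(widths)

out <- data.frame(parts,
                  raw[c("year", "period", "value", "footnote_codes")],
                  stringsAsFactors = FALSE)

# Assumption: seasonal code "S" means seasonally adjusted.
out$seasonal <- out$seasonal == "S"

# Convert the code columns to factors, as requested in the question.
code_cols <- c("area_type_code", "area_code", "measure_code", "period")
out[code_cols] <- lapply(out[code_cols], factor)

str(out)
```

Because substr() is vectorised, the same code handles a whole file in one pass without any external tools.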
[R] adding row ID numbers by group
Hello R community,

I am hoping for some help with the following problem. I have a data frame containing various groups. These groups are identified by a grouping variable. I would like to add a sequential ID number within each group, so that I can later sort the individuals within each group by this ID number. Here is what the final result should look like:

  ID group var2
   1     1    1
   2     1    2
   3     1    3
   4     1    4
   1     2    5
   2     2    6
   3     2    7
   4     2    8
   5     2    9
   1     3   10
   2     3   11
   3     3   12
   4     3   13
   5     3   14

I have created the following code to loop through the rows and compare the grouping variable of a given row with that of the following row. If a given row differs from the following row, the ID number is reset and I start counting up again. The problem I am encountering is that at the bottom of the data frame the if statement has no following row against which to compare the last row. Here is what I did:

  group <- c(1,1,1,1,2,2,2,2,2,3,3,3,3,3,3)
  var2 <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15)
  data <- data.frame(group, var2)
  data

  # IDN is the desired ID number by group
  IDN <- numeric(length(data$var2))
  IDN

  for (i in 1:(length(data$group))) {
    if (data[i,1] < (length(data$group))) {
      if (data[i,1] == data[i+1,1]) {
        IDN[i] <- sum(IDN[i-1], 1)
      } else {
        IDN[i] <- -55   # for now an arbitrary value
      }
    }
    if (data[i,1] == (length(data$group))) {
      IDN[i] <- 99   # for now an arbitrary value
    }
  }
  IDN

Is there maybe an easier way to do this? Any thoughts would be very much appreciated since I am running out of ideas.

Thanks, Alexander
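One way around the end-of-data problem is to compare each row with the previous row instead of the following one, so only the first row needs special treatment. The sketch below is one possible rewrite of the loop, not the original poster's code; it assumes the rows of each group are contiguous, and it also shows a loop-free equivalent built from rle() and sequence().

```r
group <- c(1,1,1,1,2,2,2,2,2,3,3,3,3,3,3)
var2  <- 1:15
data  <- data.frame(group, var2)

# Loop version: restart at 1 on the first row and whenever the group changes.
IDN <- integer(nrow(data))
for (i in seq_len(nrow(data))) {
  if (i == 1 || data$group[i] != data$group[i - 1]) {
    IDN[i] <- 1L                 # first row of a new group
  } else {
    IDN[i] <- IDN[i - 1] + 1L    # same group as the row above: keep counting
  }
}
data$ID <- IDN

# Loop-free version: run lengths of consecutive equal values, then 1..n per run.
data$ID2 <- sequence(rle(data$group)$lengths)

data
```

Both columns restart at 1 at every group boundary; if the groups were not contiguous, the data would need sorting by group first.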
Re: [R] two questions for R beginners
On 02/03/2010 11:53 AM, William Dunlap wrote:

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of John Sorkin
Sent: Tuesday, March 02, 2010 3:46 AM
To: Karl Ove Hufthammer; r-h...@stat.math.ethz.ch
Subject: Re: [R] two questions for R beginners

Please take what follows not as an ad hominem statement, but rather as an attempt to improve what is already an excellent program, that has been built as a result of many, many hours of dedicated work by many, many unpaid, unsung volunteers.

It troubles me a bit that when a confusing aspect of R is pointed out the response is not to try to improve the language so as to avoid the confusion, but rather to state that the confusion is inherent in the language. I understand that to make changes that would avoid the confusing aspect of the language that has been discussed in this thread would take time and effort by an R wizard (which I am not), time and effort that would not be compensated in the traditional sense. This does not mean that we should not acknowledge the confusion. If we want R to be the de facto lingua franca of statistical analysis, doesn't it make sense to strive for syntax that is as straightforward and consistent as possible?

Whenever one changes the language that way old code will break.

I think in this case not much code would break. Mostly when people have a matrix M and ask for M$column they'll get an error; the proposal is that they'll get the requested column. (It is possible to have a list with names that is also a matrix with dimnames, but I think that is a pretty unusual construction.) But I haven't been convinced that the proposal is a net improvement to the language.

Duncan Murdoch

The developers can, with a lot of effort, fix their own code, and perhaps even user-written code on CRAN, but code that thousands of users have written will break.
There is a lot of code out there that was written by trial and error and by folks who no longer work at an institution: the code works but no one knows exactly why it works. Telling folks they need to change that code because we have a cleaner but different syntax now is not good. Why would one spend time writing a package that might stop working when R is upgraded? I think the solution is not to change current semantics but to write functions that behave better and encourage users to use them, gradually abandoning the old constructs.

Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com

Again, please understand that my comment is made with deepest respect for the many people who have unselfishly contributed to the R project. Many thanks to each and every one of you.

John

Karl Ove Hufthammer k...@huftis.org 3/2/2010 4:00 AM

On Mon, 01 Mar 2010 10:00:07 -0500 Duncan Murdoch murd...@stats.uwo.ca wrote:

Suppose X is a dataframe or a matrix. What would you expect to get from X[1]? What about as.vector(X), or as.numeric(X)?

All this of course depends on the type of object one is speaking of. There are plenty of surprises available, and it's best to use the most logical way of extracting. E.g., to extract the top-left element of a 2D structure (data frame or matrix), use 'X[1,1]'. Luckily, R provides some shortcuts. For example, you can write 'X[2,3]' on a data frame, just as if it was a matrix, even though the underlying structure is completely different. (This doesn't work on a normal list; there you have to type the whole 'X[[2]][3]'.)

The behaviour of the 'as.' functions may sometimes be surprising, at least for me. For example, 'as.data.frame' on a named vector gives a single-column data frame, instead of a single-row data frame. (I'm not sure what's the recommended way of converting a named vector to a row data frame, but 'as.data.frame(t(X))' works, even though both 'X' and 't(X)' look like a row of numbers.)
The point is that a dataframe is a list, and a matrix isn't. If users don't understand that, then they'll be confused somewhere. Making matrices more list-like in one respect will just move the confusion elsewhere. The solution is to understand the difference.

My main problem is not understanding the difference, which is easy, but knowing which type of object I have when I get the output of a function in a package. If I know the object is a named vector or a matrix with column names, it's easy enough to type 'X[,colname]', and if it's a data frame one may use the shortcut 'X$colname'. Usually, it *is* documented what the return value of a function is, but just looking at the output is much faster, and *usually* gives the correct answer. For example, 'mean' applied on a data frame gives a named vector, not a data frame, which is somewhat surprising (given that the columns of a data frame may be of different types, while the elements of a vector may not).
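The differences discussed in this thread are easy to see on a small example. The sketch below uses made-up objects M, D and x (not from the thread) and sticks to base R behaviour that holds across versions: single-bracket indexing, is.list(), column extraction, and the two as.data.frame() conversions mentioned above.

```r
# A small matrix with column names, and the same data as a data frame.
M <- matrix(1:6, nrow = 2, dimnames = list(NULL, c("a", "b", "c")))
D <- as.data.frame(M)

M[1]           # a matrix is a vector with dimensions: this is one element
D[1]           # a data frame is a list of columns: this is a one-column data frame

is.list(M)     # FALSE
is.list(D)     # TRUE

M[, "a"]       # column extraction that works for both kinds of object
D[, "a"]
D$a            # list-style shortcut; M$a would be an error for a matrix

# Named vector to data frame: one column by default, one row after t().
x <- c(a = 1, b = 2)
dim(as.data.frame(x))
dim(as.data.frame(t(x)))

# Checking what an unknown return value really is.
class(D)
str(unclass(D))   # shows the underlying list structure
```

When the type of a function's return value is unclear, class() and str() (if necessary after unclass()) answer the question more reliably than the printed output.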
Re: [R] adding row ID numbers by group
Try this:

  data$ID <- with(data, ave(group, group, FUN = seq))

On Tue, Mar 2, 2010 at 2:53 PM, Alexander Schwall alexander.schw...@gmail.com wrote:

I have a data frame containing various groups. These groups are identified by a grouping variable. I would like to add a sequential ID number to each group to later sort these individuals within each group by this ID number.

-- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40" S 49° 16' 22" O
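A runnable version of the ave() idea on the data from the question. The one change in this sketch is the use of seq_along instead of seq: seq(x) applied to a vector of length one means 1:x rather than a count of elements, so seq_along is the safer choice when a group might contain a single row.

```r
group <- c(1,1,1,1,2,2,2,2,2,3,3,3,3,3,3)
data  <- data.frame(group, var2 = 1:15)

# ave() applies FUN to the values of each group and returns the results
# in the original row order, so the counter restarts for every group.
data$ID <- ave(data$group, data$group, FUN = seq_along)

data
```

Unlike a comparison with the neighbouring row, this does not require the rows of a group to be contiguous.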
Re: [R] two questions for R beginners
William, I agree that changing syntax can lead to problems. I don't, however think extending the language will break existing code. Providing a common syntax for accessing matrices and dataframes will not change the way things have been done to date, but rather how things will be done in the future. John John Sorkin jsor...@grecc.umaryland.edu -Original Message- From: William Dunlap wdun...@tibco.com To: John Sorkin jsor...@grecc.umaryland.edu To: Karl Ove Hufthammer k...@huftis.org To: r-h...@stat.math.ethz.ch Sent: 3/2/2010 11:53:45 AM Subject: RE: [R] two questions for R beginners -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of John Sorkin Sent: Tuesday, March 02, 2010 3:46 AM To: Karl Ove Hufthammer; r-h...@stat.math.ethz.ch Subject: Re: [R] two questions for R beginners Please take what follows not as an ad hominem statement, but rather as an attempt to improve what is already an excellent program, that has been built as a result of many, many hours of dedicated work by many, many unpaid, unsung volunteers. It troubles me a bit that when a confusing aspect of R is pointed out the response is not to try to improve the language so as to avoid the confusion, but rather to state that the confusion is inherent in the language. I understand that to make changes that would avoid the confusing aspect of the language that has been discussed in this thread would take time and effort by an R wizard (which I am not), time and effort that would not be compensated in the traditional sense. This does not mean that we should not acknowledge the confusion. If we what R to be the de facto lingua franca of statistical analysis doesn't it make sense to strive for syntax that is as straight forward and consistent as possible? Whenever one changes the language that way old code will break. 
The developers can, with a lot of effort, fix their own code, and perhaps even user-written code on CRAN, but code that thousands of users have written will break. There is a lot of code out there that was written by trial and error and by folks who no longer work at an institution: the code works but no one knows exactly why it works. Telling folks they need to change that code because we have a cleaner but different syntax now is not good. Why would one spend time writing a package that might stop working when R is upgraded? I think the solution is not to change current semantics but to write functions that behave better and encourage users to use them, gradually abandoning the old constructs. Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com Again, please understand that my comment is made with deepest respect for the many people who have unselfishly contributed to the R project. Many thanks to each and every one of you. John Karl Ove Hufthammer k...@huftis.org 3/2/2010 4:00 AM On Mon, 01 Mar 2010 10:00:07 -0500 Duncan Murdoch murd...@stats.uwo.ca wrote: Suppose X is a dataframe or a matrix. What would you expect to get from X[1]? What about as.vector(X), or as.numeric(X)? All this of course depends on type of object one is speaking of. There are plenty of surprises available, and it's best to use the most logical way of extracting. E.g., to extract the top-left element of a 2D structure (data frame or matrix), use 'X[1,1]'. Luckily, R provides some shortcuts. For example, you can write 'X[2,3]' on a data frame, just as if it was a matrix, even though the underlying structure is completely different. (This doesn't work on a normal list; there you have to type the whole 'X[[2]][3]'.) The behaviour of the 'as.' functions may sometimes be surprising, at least for me. For example, 'as.data.frame' on a named vector gives a single-column data frame, instead of a single-row data frame. 
(I'm not sure what the recommended way of converting a named vector to a row data frame is, but 'as.data.frame(t(X))' works, even though both 'X' and 't(X)' look like a row of numbers.) The point is that a dataframe is a list, and a matrix isn't. If users don't understand that, then they'll be confused somewhere. Making matrices more list-like in one respect will just move the confusion elsewhere. The solution is to understand the difference. My main problem is not understanding the difference, which is easy, but knowing which type of object I have when I get the output of a function in a package. If I know the object is a named vector or a matrix with column names, it's easy enough to type 'X[,colname]', and if it's a data frame one may use the shortcut 'X$colname'. Usually, it *is* documented what the return value of a function is, but just looking at the output is much faster, and *usually* gives the correct answer. For example, 'mean' applied on a data frame gives a named vector, not a data frame, which is
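For readers following the thread, here is a small base-R sketch of the differences under discussion. The objects m, df and v are made up for illustration and do not come from any of the posters; the comments describe what current R versions return.

```r
# A matrix and a data frame holding the same numbers (illustrative objects).
m  <- matrix(1:6, nrow = 2, dimnames = list(NULL, c("a", "b", "c")))
df <- as.data.frame(m)

# Single-index subsetting: a matrix is a vector with dimensions,
# a data frame is a list of columns.
m1  <- m[1]    # first element of the underlying vector
df1 <- df[1]   # a one-column data frame holding column "a"

# Two-index subsetting behaves the same way on both structures.
same_cell <- identical(m[2, 3], df[2, 3])

# A named vector becomes a single column; transposing first gives a single row.
v      <- c(x = 1, y = 2, z = 3)
as_col <- as.data.frame(v)
as_row <- as.data.frame(t(v))

# str() after unclass() shows the plain list underneath the data frame.
str(unclass(df))
```

Checking is.data.frame(X) or is.matrix(X) on a function's return value is a quick way to find out which of the two one is holding before choosing between 'X$colname' and 'X[, "colname"]'.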
Re: [R] adding row ID numbers by group
Like this?

library(plyr)
group <- c(1,1,1,1,2,2,2,2,2,3,3,3,3,3,3)
var2 <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15)
data <- data.frame(group, var2)
data
ddply(data, "group", transform, ID=1:length(group))

Felipe D. Carrillo
Supervisory Fishery Biologist
Department of the Interior
US Fish & Wildlife Service
California, USA

----- Original Message ----
From: Alexander Schwall alexander.schw...@gmail.com
To: r-help@r-project.org
Sent: Tue, March 2, 2010 9:53:19 AM
Subject: [R] adding row ID numbers by group

Hello R community,

I am hoping for some help with the following problem. I have a data frame containing various groups. These groups are identified by a grouping variable. I would like to add a sequential ID number to each group to later sort these individuals within each group by this ID number. Here is what the final result should look like:

ID group var2
1  1     1
2  1     2
3  1     3
4  1     4
1  2     5
2  2     6
3  2     7
4  2     8
5  2     9
1  3     10
2  3     11
3  3     12
4  3     13
5  3     14

I have created the following code to loop through this and compare a given row with the following row for the grouping variable. If a given row is different from the following row, the ID number is reset and I start counting up again. The problem that I am encountering is that at the bottom of the data frame the if statement runs out of a condition against which to compare the last row. Here is what I did:

group <- c(1,1,1,1,2,2,2,2,2,3,3,3,3,3,3)
var2 <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15)
data <- data.frame(group, var2)
data
#IDN is the desired ID number by group
IDN <- numeric(length(test$var2))
IDN
for (i in 1:(length(data$group))) {
  if(data[i,1] < (length(data$group))){
    if(data[i,1] == data[i+1,1]){
      IDN[i] <- sum(IDN[i-1],1)
    } else {
      IDN[i] <- -55 #for now an arbitrary value
    }
  }
  if(data[i,1] == (length(data$group))) {
    IDN[i] <- 99 #for now an arbitrary value
  }
}
IDN

Is there maybe an easier way to do this? Any thoughts would be very appreciated since I am running out of ideas.
Thanks Alexander __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
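For comparison with the loop in the question, a loop-free sketch in base R. It uses ave(), which applies a function within each level of a grouping variable; this is offered as one possible alternative and is not one of the replies quoted in the thread.

```r
# Same example data as in the question.
group <- c(1,1,1,1,2,2,2,2,2,3,3,3,3,3,3)
var2  <- 1:15
data  <- data.frame(group, var2)

# ave() splits the first argument by the grouping variable, applies FUN to
# each piece, and puts the results back in the original row order.
# seq_along() numbers the rows of each group 1, 2, 3, ...
data$ID <- ave(seq_along(data$group), data$group, FUN = seq_along)
data
```

Because ave() restores the original row order, the data frame does not have to be sorted by group beforehand, and the last row needs no special handling.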
[R] how to import map data (maptools?) from a html set of 'coords'
Dear R users,

I would like to draw a map and import it into the maptools/spatstat packages. The 'raw data' I have come from a web page (<map>...</map>) and are basically a list of the coordinates of a polygon. I would like to know how to import them into R; I checked the maptools package, but all the examples use existing .dbf files. I just have a series of text files looking like this. For example, for the French region Burgundy:

<area href="region.asp?reg=26" shape="poly" title="Bourgogne" alt="Bourgogne" coords="208,121,211,115,221,113,224,115,225,120,229,122,232,128,251,125,255,130,256,136,266,138,268,148,267,154,263,160,267,168,267,180,262,175,256,178,254,184,248,184,243,187,237,187,232,185,234,181,227,171,216,171,212,166,211,155,208,149,208,135,211,132,213,125,208,121">

Any idea welcome,
sylvain

(If anyone is interested in that type of data, they are available at the INSEE website along with loads of information on the population and economy of each region.)
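A minimal base-R sketch of one way to turn such a coords attribute into a polygon. The variable names (coords, xy) are illustrative; the sign flip assumes the usual HTML image-map convention that y is counted downwards from the top of the image.

```r
# coords attribute copied from the <area> tag for Bourgogne in the message.
coords <- paste0(
  "208,121,211,115,221,113,224,115,225,120,229,122,232,128,251,125,255,130,",
  "256,136,266,138,268,148,267,154,263,160,267,168,267,180,262,175,256,178,",
  "254,184,248,184,243,187,237,187,232,185,234,181,227,171,216,171,212,166,",
  "211,155,208,149,208,135,211,132,213,125,208,121")

# Split on commas and fill a two-column matrix row by row: x1, y1, x2, y2, ...
xy <- matrix(as.numeric(strsplit(coords, ",")[[1]]),
             ncol = 2, byrow = TRUE,
             dimnames = list(NULL, c("x", "y")))

# Flip the y axis so the region is not drawn upside down.
xy[, "y"] <- -xy[, "y"]

# Draw the outline with equal scaling on both axes.
plot(xy, type = "n", asp = 1)
polygon(xy)
```

A two-column matrix of closed-ring coordinates like xy is also the kind of input that polygon constructors in the spatial packages generally expect, so it should be a reasonable starting point for building maptools or spatstat objects.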
[R] Fwd: Building R packages in Windows 7
-- Forwarded message -- From: Duncan Murdoch murd...@stats.uwo.ca Date: 25 February 2010 22:30 Subject: Re: [R] Building R packages in Windows 7 To: Eric Ferreira ericbferre...@gmail.com On 25/02/2010 7:56 PM, Eric Ferreira wrote: Thank you, Sir, but how can I demand it to create HTML files? The tools::Rd2HTML function will do the translation, but it takes a bit of work to prepare input for it. The idea is that you don't need to save the files, R just produces them when the browser asks for them. There's an option --enable-prebuilt-html that can be used when installing a package, but I don't recommend using it. You'll get the help page as it exists at install time, not as it is intended to be displayed at run time. Links to other packages likely won't work properly. Duncan Murdoch On 25 February 2010 14:41, Duncan Murdoch murd...@stats.uwo.ca wrote: On 25/02/2010 11:49 AM, Eric Ferreira wrote: Ok, I'm working under: Windows 7 Professional 32bits, 4 GB RAM, 320 GB HD, Intel Core 2 Duo processor R 2.10.1 I've installed: Rtools211 MikteX 2.8 HTML Help Workshop Setting my PATH to: c:\Rtools\bin;c:\Rtools\perl\bin;c:\Rtools\MinGW\bin;c:\Arquivos de Programas\R\R-2.10.1pat\bin;c:\Arquivos de Programas\MikTeX 2.8\miktex\bin;c:\Program Files\HTML Help Workshop ...creating the package called ExpDes and asking (at the prompt) : Rcmd build --binary ExpDes Among others, a warning message is printed: WARNING: some HTML links may not be found, and no html files are produced. Right, HTML help files are produced on demand, they aren`t stored in the binary package zip file. HTML Help Workshop is not being used at all. Duncan Murdoch Thank you again. On 25 February 2010 13:02, Duncan Murdoch murd...@stats.uwo.ca wrote: On 25/02/2010 10:56 AM, Eric Ferreira wrote: This is my first package. I'm just getting started doing that, following the steps described on you website... I really don't know how I asking for CHMs to be produced, sorry. 
All I can suggest is that you need to be less stingy with information. Tell us what you did. Tell us what symptoms you saw. Do both of those by cut and paste from your console, don't paraphrase, or refer to vague instructions like your website. Duncan Murdoch On 25 February 2010 12:52, Duncan Murdoch murd...@stats.uwo.ca wrote: On 25/02/2010 10:40 AM, Eric Ferreira wrote: Dear Duncan Thank so much for your reply. Actually, I'm using the latest version of R and the problem persists. What do you use instead of HTML Help Workshop for newer R versions? We just produce text and HTML help pages on demand, and LaTeX ones for the pdf manuals. How are you asking for CHMs to be produced? Duncan Murdoch Best regards Eric. On 25 February 2010 11:43, Duncan Murdoch murd...@stats.uwo.ca wrote: On 25/02/2010 9:06 AM, Eric Ferreira wrote: Dear useRs, I'm having trouble building R packages in Windows 7 regarding HTML help Workshop. Pointing PATH to c:\Program Files\HTML help Workshop does work in Windows (e.g. Vista) and does not in Windows 7. Some tips?? We don't use the HTML Help Workshop any more since R 2.10.0, so you could upgrade to the current R, and the problem will go away. Otherwise, I think you'll have to ask Microsoft for help. But they aren't likely to be helpful: Win XP is the most recent OS listed as supported. Duncan Murdoch -- Dr Eric B Ferreira Exact Sciences Department Federal University of Alfenas Brazil [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
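To illustrate the tools::Rd2HTML route mentioned in the thread, a small sketch that converts a single help page by hand. The Rd content below is invented for the demonstration and has nothing to do with the ExpDes package; the file names are temporary files.

```r
library(tools)

# Write a minimal, made-up Rd file to a temporary location.
rd_file   <- tempfile(fileext = ".Rd")
html_file <- tempfile(fileext = ".html")
writeLines(c(
  "\\name{demo}",
  "\\alias{demo}",
  "\\title{A demonstration help page}",
  "\\description{A made-up page used only to show the conversion.}"
), rd_file)

# Parse the Rd source and render it as HTML into html_file.
Rd2HTML(parse_Rd(rd_file), out = html_file)

# Read the generated page back to inspect it.
html <- readLines(html_file)
head(html)
```

As Duncan notes, R normally renders these pages on demand, so a manual conversion like this is only needed when static HTML files are wanted for some other purpose.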
[R] Creating matrix from long table in database (pivoting)
Hi all,

I have a table in a database that is very long; when simplified it has only two columns (id, text). id is the row and text is the column. Technically, text is a term and id is the document. Simplifying further and assuming there is only one occurrence of a term per document, I should be able to convert this into a binary matrix. The table looks like this:

ID  Text
1   this
1   is
1   the
1   first
1   row
2   this
2   is
2   the
2   second
2   row
...

In R I would like to have it as:

id  this  is  the  first  second  row
1   1     1   1    1      0       1
2   1     1   1    0      1       1

It would be simpler for me to do this transformation in R, as I guess the language is handier than SQL. The table in R would have a few tens of thousands of columns and rows as well. I know how to read the data from the database, but I am unsure whether there is a suitable transformation available.

Thank you
Jan
Re: [R] two questions for R beginners
Liviu Andronic escribió: On Mon, Mar 1, 2010 at 11:49 PM, Liviu Andronic landronim...@gmail.com wrote: On 3/1/10, Keo Ormsby keo.orms...@gmail.com wrote: Perhaps my biggest problem was that I couldn't (and still haven't) seen *absolute beginners* documents. there was once a link posted on r-sig-teaching that would probably fit your needs, but I cannot find it now. OK, I found it. Below is an excerpt of that r-sig-teaching e-mail. Liviu On Thu, Jul 2, 2009 at 2:19 PM, Robert W. Hayden hay...@mv.mv.com wrote: I think such a website would be a real asset. It would be most useful if it either were restricted to intro. stats. OR organized so that materials for real beginners were easy to extract from all the materials for programmers and Ph.D. statisticians. As a relative beginner myself, I find the usual resources useless. In self defense, I created materials for my own beginning students: http://courses.statistics.com/software/R/Rhome.htm Hi Liviu, This is indeed the best site for introduction I have seen. Although it still assumes some things that at first might seem unintuitive to the absolute beginner I talk about. For instance, in the first page, it shows that you can do sqrt(x), where x can be a vector, and return a vector of the square roots of each number. Although this is high school matrix algebra, most users expect that the input to square root function to be a single number, not a matrix, as in Excel or a calculator. Other concepts that are not explicitly introduced are R workspace, the use of arguments in functions (with or without the =), etc. Others are things like diff(range(rainfall)) , where you have the output of one function used as the input to another, all in the same command line. All these things seem very basic, but can be difficult if you are trying to learn on your own with no prior experience in programming. 
I hope I am not sounding too difficult and contrarian, I am just trying to share my experience with starting with R, and in trying to convey this learning to my colleagues and students. In the end, I did find everything I needed to learn, and now I feel at ease with R, and I believe that almost anybody that can use Excel or something like it, could learn R. Thank you for the information, Best wishes, Keo. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] adding row ID numbers by group
Thank you gentlemen, all three solutions are working and very insightful. Your help and time is very much appreciated. Alexander On Tue, Mar 2, 2010 at 1:08 PM, Felipe Carrillo mazatlanmex...@yahoo.comwrote: Like this? group- c(1,1,1,1,2,2,2,2,2,3,3,3,3,3,3) var2- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15) data-data.frame(group, var2) data ddply(data,group,transform,ID=1:length(group)) Felipe D. Carrillo Supervisory Fishery Biologist Department of the Interior US Fish Wildlife Service California, USA - Original Message From: Alexander Schwall alexander.schw...@gmail.com To: r-help@r-project.org Sent: Tue, March 2, 2010 9:53:19 AM Subject: [R] adding row ID numbers by group Hello R community, I am hoping for some help with the following problem. I have a data frame containing various groups. These groups are identified by a grouping variable. I would like to add a sequential ID number to each group to later sort these individuals within each group by this ID number. Here is what the final result should look like: ID group var2 1 11 2 12 3 13 4 14 1 25 2 26 3 2 7 4 28 5 2 9 1 3 10 2 3 11 3 3 12 4 3 13 5 3 14 I have created the following code to loop through this and compare a given row with the following row for the grouping variable. If a given row would be different from the then following row, the ID number would be reset and I would start counting up again. The problem that I am encountering that at the bottom of the data frame the if statement runs out of a condition against which to compare the last row. 
Here is what I did: group- c(1,1,1,1,2,2,2,2,2,3,3,3,3,3,3) var2- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15) data-data.frame(group, var2) data #IDN is the desired ID number by group IDN -numeric(length(test$var2)) IDN for (i in 1:(length(data$group))) { if(data[i,1] (length(data$group))){ if(data[i,1] == data[i+1,1]){ IDN[i]- sum(IDN[i-1],1)} else{ IDN[i]- -55} #for now an arbitrary value } if(data[i,1] == (length(data$group))) { IDN[i] - 99 #for now an arbitrary value } } IDN Is there maybe an easier way to do this? Any thoughts would be very appreciated since I am running out of ideas. Thanks Alexander [[alternative HTML version deleted]] __ ymailto=mailto:R-help@r-project.org; href=mailto:R-help@r-project.org;R-help@r-project.org mailing list href=https://stat.ethz.ch/mailman/listinfo/r-help; target=_blank https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] plotting fitted lme values as a smooth line
I am trying to plot fitted lme values as a smooth line on a graph showing the exponential relationship between temperature and soil respiration. In the plot, the x-axis has temperature, and the y-axis has soil respiration. When I try to add a line showing temperature versus the fitted values, it is jagged and not smooth. Here is the code I used:

lme.1 <- lme(fixed=LnFlux~Temp, random=~1|Plt, data=resp)
fit.1 <- exp(fitted(lme.1))
plot(Flux$Temp, Flux$Flux, xlab="Temperature", ylab=expression(CO[2]*Flux), xlim=c(-10, 25), ylim=c(0,250), pch=16)
ord1 <- order(CFlux$Temp)
lines(CFlux$TempC[ord1], fit.1[ord1], lty=1, lwd=2)

This does not produce a smooth line, but a jagged one that moves up and down between points. If I use fitted values from a simple linear model (lm), I don't have this problem, and the line is smooth:

lm.1 <- lm(LnFlux~Temp, data=resp)
fit.2 <- exp(fitted(lm.1))
plot(Flux$Temp, Flux$Flux, xlab="Temperature", ylab=expression(CO[2]*Flux), xlim=c(-10, 25), ylim=c(0,250), pch=16)
ord2 <- order(Flux$Temp)
lines(Flux$Temp[ord2], fit.2[ord2], lty=1, lwd=2)

The only difference I can find between the two is the structure of the fitted objects. The fit.1 object from lme is atomic, and lacks individual data labels. Instead, the labels are:

attr(*, "label")= chr "Fitted values"

In contrast, the fit.2 object from the lm is a "Named num", with:

attr(*, "names")= chr [1:460] "1" "2" "3" "4" ...

Is this difference causing my problem with adding a smooth line to the graph? If so, is there any way I can change the structure of the lme fitted object to make it more amenable to adding a smooth line to a plot? Or is something else at work?
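One possible explanation, offered as an assumption since no reply appears at this point in the archive: fitted() on an lme object returns by default the within-group fitted values, which include each group's random intercept, so neighbouring x values from different groups jump up and down. Asking for level 0 gives the population-level (fixed-effects) curve. The sketch below uses the Orthodont data shipped with nlme rather than the poster's data; fm and grid are illustrative names.

```r
library(nlme)

# Random-intercept model on a data set that ships with nlme.
fm <- lme(distance ~ age, random = ~ 1 | Subject, data = Orthodont)

# Default: innermost grouping level, i.e. one intercept per Subject.
within_group <- fitted(fm)

# level = 0: fixed effects only, the same value for equal ages.
population <- fitted(fm, level = 0)

# Predicting on a regular grid of x values gives a smooth line to draw.
grid <- data.frame(age = seq(8, 14, length.out = 50))
grid$fit <- predict(fm, newdata = grid, level = 0)

plot(distance ~ age, data = as.data.frame(Orthodont), pch = 16)
lines(grid$age, grid$fit, lwd = 2)
```

For a model fitted on the log scale, as in the question, the same idea would apply with exp() around the level-0 predictions.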
[R] Compare parameter estimates of a nlsList object
Hello all,

Is there a tool to test the statistical differences between parameter estimates of an nlsList fit with more than two groups? I am able to fit the nlme function for two groups after getting starting parameters from nlsList, as seen below.

fit.nlme <- nlme(rate ~ SSmicmen(conc, Vm, K), fixed=Vm+K~state, groups=~state, start=c(212, -52, 0.06, -0.01), data=Puromycin)
summary(fit.nlme)

However, I am unable to test the differences when there are more than 2 groups. My data set has 5 different groups and therefore 5 different sets of parameter estimates, and I am not sure how to fill in start=c() for more than 2 groups.

Thanks
Jens
Re: [R] Creating matrix from long table in database (pivoting)
Try this:

DF <- read.table(textConnection("1 this
1 is
1 the
1 first
1 row
2 this
2 is
2 the
2 second
2 row"))
reshape(DF, v.names = 'V2', idvar = 'V1', timevar = 'V2', direction = 'wide')

On Tue, Mar 2, 2010 at 3:35 PM, Jan Hornych jh.horn...@gmail.com wrote:
> Hi all,
>
> I have a table in database that is very long and when simplified it has only two columns in it (id, text). id is the row, and text is the column. Technically the text is a term and and id is the document. If simplifying this and assuming there is only one occurrence of the term per the document. I shall be able to convert this into a binary matrix. Table looks like this...
>
> ID  Text
> 1   this
> 1   is
> 1   the
> 1   first
> 1   row
> 2   this
> 2   is
> 2   the
> 2   second
> 2   row
> ...
>
> in R I would like to have it as
>
> id  this  is  the  first  second  row
> 1   1     1   1    1      0       1
> 2   1     1   1    0      1       1
>
> it would be simpler for me to do this transformation in R as I guess the language is more handy as the SQL. The table in R have few dozen thousand of columns and rows as well. I know how to read the data from database, but just unsure if there is some suitable transformation available.
>
> Thank you
> Jan

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
Re: [R] Creating matrix from long table in database (pivoting)
Jan -

Here's one way:

tbl = data.frame(id=c(1,1,1,1,1,2,2,2,2,2), text=c('this','is','the','first','row','this','is','the','second','row'))
xtabs(~id+text,tbl)

   text
id  first is row second the this
  1     1  1   1      0   1    1
  2     0  1   1      1   1    1

It's a bit tricky to automatically get the column headings to be in the order you want. This comes close:

tbl$text = factor(tbl$text,levels=tbl$text[!duplicated(tbl$text)])
xtabs(~id+text,tbl)

   text
id  this is the first row second
  1    1  1   1     1   1      0
  2    1  1   1     0   1      1

Hope this helps.

- Phil Spector
Statistical Computing Facility
Department of Statistics
UC Berkeley
spec...@stat.berkeley.edu

On Tue, 2 Mar 2010, Jan Hornych wrote:
> Hi all,
>
> I have a table in database that is very long and when simplified it has only two columns in it (id, text). id is the row, and text is the column. Technically the text is a term and and id is the document. If simplifying this and assuming there is only one occurrence of the term per the document. I shall be able to convert this into a binary matrix. Table looks like this...
>
> ID  Text
> 1   this
> 1   is
> 1   the
> 1   first
> 1   row
> 2   this
> 2   is
> 2   the
> 2   second
> 2   row
> ...
>
> in R I would like to have it as
>
> id  this  is  the  first  second  row
> 1   1     1   1    1      0       1
> 2   1     1   1    0      1       1
>
> it would be simpler for me to do this transformation in R as I guess the language is more handy as the SQL. The table in R have few dozen thousand of columns and rows as well. I know how to read the data from database, but just unsure if there is some suitable transformation available.
>
> Thank you
> Jan
Re: [R] Creating matrix from long table in database (pivoting)
Or better: reshape(cbind(DF, value = 1), v.names = 'value', idvar = 'V1', timevar = 'V2', direction = 'wide') On Tue, Mar 2, 2010 at 3:49 PM, Henrique Dallazuanna www...@gmail.com wrote: Try this: DF - read.table(textConnection(1 this 1 is 1 the 1 first 1 row 2 this 2 is 2 the 2 send 2 row)) reshape(DF, v.names = 'V2', idvar = 'V1', timevar = 'V2', direction = 'wide') On Tue, Mar 2, 2010 at 3:35 PM, Jan Hornych jh.horn...@gmail.com wrote: Hi all, I have a table in database that is very long and when simplified it has only two columns in it (id, text). id is the row, and text is the column. Technically the text is a term and and id is the document. If simplifying this and assuming there is only one occurrence of the term per the document. I shall be able to convert this into a binary matrix. Table looks like this... *ID** **Text* 1 this 1 is 1 the 1 first 1 row 2 this 2 is 2 the 2 send 2 row ... in R I would like to have it as *id this is the first second row* 1 1 1 1 1 0 1 2 1 1 1 0 1 1 it would be simpler for me to do this transformation in R as I guess the language is more handy as the SQL. The table in R have few dozen thousand of columns and rows as well. I know how to read the data from database, but just unsure if there is some suitable transformation available. Thank you Jan [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40 S 49° 16' 22 O -- Henrique Dallazuanna Curitiba-Paraná-Brasil 25° 25' 40 S 49° 16' 22 O __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
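Since the original question mentions tens of thousands of rows and columns, a dense result may not fit comfortably in memory. The following is a sketch of a sparse alternative, assuming the recommended Matrix package is installed; it is not one of the solutions posted in the thread, and the names rows, cols and m are illustrative.

```r
library(Matrix)

# Same toy data as in Phil Spector's reply.
tbl <- data.frame(
  id   = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  text = c("this", "is", "the", "first", "row",
           "this", "is", "the", "second", "row"),
  stringsAsFactors = FALSE
)

# Turn documents and terms into integer codes; terms keep first-seen order.
rows <- factor(tbl$id)
cols <- factor(tbl$text, levels = unique(tbl$text))

# One non-zero entry per (document, term) pair; everything else stays empty.
m <- sparseMatrix(i = as.integer(rows), j = as.integer(cols), x = 1,
                  dims = c(nlevels(rows), nlevels(cols)),
                  dimnames = list(levels(rows), levels(cols)))
m
```

If a term could occur more than once in a document, sparseMatrix() would add the duplicates up, so the entries would be counts rather than 0/1 indicators.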
Re: [R] capturing errors in Sweave
What I ended up using was:

cat(unclass(tmp))

--sundar

On Tue, Mar 2, 2010 at 8:58 AM, Berwin A Turlach ber...@maths.uwa.edu.au wrote:
> G'day Sundar,
>
> On Tue, 2 Mar 2010 01:03:54 -0800 Sundar Dorai-Raj sdorai...@gmail.com wrote:
> > Thanks, Berwin. That works just great!
>
> You are welcome. I noticed by now that cat(tmp) is sufficient; the tmp[1] in cat(tmp[1]) was a left over from earlier attempts to get the output to look correct.
>
> Cheers,
> Berwin
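The object tmp is not defined in this part of the thread. Assuming it holds the result of a try() call, which is a guess based on the subject line, a sketch of what the calls above do might look like this; the error message is made up.

```r
# Capture an error instead of stopping; tmp is a character string of class
# "try-error" with the condition object attached as an attribute.
tmp <- try(stop("something went wrong"), silent = TRUE)

# Check whether the call failed.
failed <- inherits(tmp, "try-error")

# cat() writes the stored error text without quotes or an index prefix,
# which is what one usually wants in an Sweave output chunk.
cat(unclass(tmp))

# The bare message, without the "Error in ..." prefix, is also available.
msg <- conditionMessage(attr(tmp, "condition"))
```

unclass() only removes the "try-error" class before printing; the text that cat() writes is the same string that try() would have shown on the console.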
Re: [R] Gradient Boosting Trees with correlated predictors in gbm
On Mon, 01-Mar-2010 at 12:01PM -0500, Max Kuhn wrote: | In theory, the choice between two perfectly correlated predictors is | random. Therefore, the importance should be diluted by half. | However, this is implementation dependent. | | For example, run this: | | set.seed(1) | n - 100 | p - 10 | | data - as.data.frame(matrix(rnorm(n*(p-1)), nrow = n)) | data$dup - data[, p-1] | | data$y - 2 + 4 * data$dup - 2 * data$dup^2 + rnorm(n) | | data - data[, sample(1:ncol(data))] | | str(data) | | library(gbm) | fit - gbm(y~., data = data, | distribution = gaussian, | interaction.depth = 10, | n.trees = 100, | verbose = FALSE) | summary(fit) What happens if there's a third? data$DUP -data$dup fit - gbm(y~., data = data, + distribution = gaussian, + interaction.depth = 10, + n.trees = 100, + verbose = FALSE) summary(fit) var rel.inf 1 DUP 55.98653321 2 dup 42.99934344 3 V2 0.30763599 4 V1 0.17108839 5 V4 0.14272470 6 V3 0.13069450 7 V6 0.07839121 8 V7 0.07109805 9 V5 0.06080096 10 V8 0.05168955 11 V9 0. So V9 which was identical to dup has now gone off the radar altogether. At first I thought that might be because 100 trees wasn't nearly enough, so I increased it to 6000 and added in some cross-validation. Doing a summary at the optimal number of trees still gives a similar result. I have to admit to being somewhat puzzled. -- ~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~. ___Patrick Connolly {~._.~} Great minds discuss ideas _( Y )_ Average minds discuss events (:_~*~_:) Small minds discuss people (_)-(_) . Eleanor Roosevelt ~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Gradient Boosting Trees with correlated predictors in gbm
In most implementations of boosting, and for that matter, single tree, the first variable wins when there are ties. In randomForest the variables are sampled, and thus not tested in the same order from one node to the next, thus the variables are more likely to share the glory. Best, Andy From: Patrick Connolly On Mon, 01-Mar-2010 at 12:01PM -0500, Max Kuhn wrote: | In theory, the choice between two perfectly correlated predictors is | random. Therefore, the importance should be diluted by half. | However, this is implementation dependent. | | For example, run this: | | set.seed(1) | n - 100 | p - 10 | | data - as.data.frame(matrix(rnorm(n*(p-1)), nrow = n)) | data$dup - data[, p-1] | | data$y - 2 + 4 * data$dup - 2 * data$dup^2 + rnorm(n) | | data - data[, sample(1:ncol(data))] | | str(data) | | library(gbm) | fit - gbm(y~., data = data, | distribution = gaussian, | interaction.depth = 10, | n.trees = 100, | verbose = FALSE) | summary(fit) What happens if there's a third? data$DUP -data$dup fit - gbm(y~., data = data, + distribution = gaussian, + interaction.depth = 10, + n.trees = 100, + verbose = FALSE) summary(fit) var rel.inf 1 DUP 55.98653321 2 dup 42.99934344 3 V2 0.30763599 4 V1 0.17108839 5 V4 0.14272470 6 V3 0.13069450 7 V6 0.07839121 8 V7 0.07109805 9 V5 0.06080096 10 V8 0.05168955 11 V9 0. So V9 which was identical to dup has now gone off the radar altogether. At first I thought that might be because 100 trees wasn't nearly enough, so I increased it to 6000 and added in some cross-validation. Doing a summary at the optimal number of trees still gives a similar result. I have to admit to being somewhat puzzled. -- ~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~. ~.~.~.~.~. ___Patrick Connolly {~._.~} Great minds discuss ideas _( Y )_ Average minds discuss events (:_~*~_:) Small minds discuss people (_)-(_). Eleanor Roosevelt ~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~. ~.~.~.~.~. 
Re: [R] Strange behavior with poisosn and glm
On 2/03/2010, at 9:02 PM, Noah Silverman wrote:

> Hi,
>
> I'm just learning about poison links for the glm function.
>
> One of the data sets I'm playing with has several of the variables as factors (i.e. month, group, etc.)
>
> When I call the glm function with a formula that has a factor variable, R automatically converts the variable to a series of variables with unique names and binary values.
>
> For example, with this pseudo data:
>
> y    v1   month
> 2    1    january
> 3    1.4  februrary
> 1.5  6.3  february
> 1.2  4.5  january
> 5.5  4.0  march
>
> I use this call:
>
> m <- glm(y ~ v1 + month, family=poisson)
>
> R gives me back a model with variables of
> Intercept v1 monthJanuary monthFebruary monthMarch

No it didn't!!! You are kidding the troops/being economical with the truth. If you had used the data that you show, it would've ``given you a model with variables'':

Intercept v1 monthfebrurary monthjanuary monthmarch

No caps in the month names, and note the misspelling of ``february''. You actually have ***four*** levels for the month factor:

january februrary february march

If you had spelt ``february'' correctly you would have got variables

Intercept v1 monthjanuary monthmarch

The first level, february, would have been omitted under the default contrasts (contr.treatment). You need k-1 dummy variables to specify a factor with k levels.

> I'm concerned that this might be doing some strange things to my model.

No, you are doing strange things. Notice also that the Poisson distribution is a distribution of ***counts***. Non-negative integers. Whole numbers. Values like 1.5 and 1.2 make no immediate sense in terms of the Poisson distribution. The Poisson likelihood can be evaluated with non-integer responses, but the glm() function will quite rightly worry about non-integer values and give you a warning. (Which you didn't mention.) If you really have non-integer valued responses you shouldn't be using the Poisson family; the quasi family *might* be appropriate --- if you know what you're doing.

> Can anyone offer some enlightenment?

I hope you feel enlightened.

cheers,
Rolf Turner
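The point about k-1 dummy variables can be checked directly with model.matrix(). The sketch below uses the pseudo data from the question, with the values as best they can be read from the garbled table in the archive; the data frame d and the column month2 are illustrative names.

```r
# Pseudo data from the question, including the misspelt "februrary".
d <- data.frame(
  y     = c(2, 3, 1.5, 1.2, 5.5),
  v1    = c(1, 1.4, 6.3, 4.5, 4.0),
  month = factor(c("january", "februrary", "february", "january", "march"))
)

# Four levels, because the misspelling counts as a separate month.
lev <- levels(d$month)

# Default treatment contrasts drop the first level and keep k - 1 = 3 dummies.
cols_bad <- colnames(model.matrix(~ v1 + month, data = d))

# After correcting the spelling there are three levels and two dummies.
d$month2 <- factor(sub("februrary", "february", as.character(d$month)))
cols_ok  <- colnames(model.matrix(~ v1 + month2, data = d))

cols_bad
cols_ok
```

The omitted first level (february here) is absorbed into the intercept, so each remaining coefficient is a difference from that baseline month.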
[R] distribution for random effects?
Hi R users,

I am using the following model to analyze data from a factorial experiment (randomized complete block design with no replication within blocks):

model <- glm(survival ~ density * vegetation + (1|block), data=sal2005, family=binomial)

Does R use a binomial distribution in this formulation to model random effects or a normal distribution (in which case the analysis is not binomial at the scale of the experiment)? If the latter, is there a way to specify the distribution for random effects?

Thanks,
Maureen
[R] Three most useful R package
Hi R-fans,

I would like to put out a question to all R users on this list and hope it will create some feedback and discussion.

1) What are your 3 most useful R packages? and

2) What R package do you still miss, and why do you think it would make a useful addition?

Pulling answers together for these questions will serve as a guide for new users and help people who just want to get a hint where to look first.

Happy replying!

Best,

Ralf
Re: [R] Gradient Boosting Trees with correlated predictors in gbm
On Tue, Mar 2, 2010 at 2:43 PM, Liaw, Andy andy_l...@merck.com wrote:

In most implementations of boosting, and for that matter, single tree, the first variable wins when there are ties.

They must be in a union :-) What happens if there's a third?

If there were P perfectly correlated predictors, the importance would be 100% for the first one encountered by gbm. In reality, where the correlation is strong but not perfect, the other variables would show up with small importances. In the case of RF, the dilution factor is 1/P for perfect correlations and gets fuzzier as the correlation decreases (for reasons that Andy articulated).

--

Max
Re: [R] add a header to a forest plot (metafor)
Hi Sebastian,

Here is an example showing a forest plot with some column headings:

    library(metafor)
    data(dat.bcg)
    dat <- dat.bcg
    res <- rma(ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat, measure="RR")
    windows(width=6.5, height=4.0, pointsize=10)
    par(mar=c(4,0,4,0))
    forest(res, slab=paste(dat$author, ", ", dat$year, sep=""), xlim=c(-16,6),
           at=log(c(.05,.25,1,4)), atransf=exp,
           ilab=cbind(dat$tpos, dat$tneg, dat$cpos, dat$cneg),
           ilab.xpos=c(-9.5,-8,-6,-4.5), cex=.8, ylim=c(-1.5,16), efac=1.8)
    text(c(-9.5,-8,-6,-4.5), 14.7, c("TB+", "TB-", "TB+", "TB-"), font=2, cex=.8)
    text(c(-8.75,-5.25), 15.7, c("Vaccinated", "Control"), font=2, cex=.8)
    text(-16, 14.7, "Author(s) and Year", pos=4, font=2, cex=.8)
    text(6, 14.7, "Relative Risk [95% CI]", pos=2, font=2, cex=.8)
    title("Figure 1: Forest Plot of the BCG Vaccine Data")

So, just use the text() function to add those column headings. With the ilab and ilab.xpos arguments, you can add the information for those columns to the plot.

I hope the example helps!

Best,

--
Wolfgang Viechtbauer
http://www.wvbauer.com/
Department of Methodology and Statistics
School for Public Health and Primary Care
Maastricht University, P.O. Box 616
6200 MD Maastricht, The Netherlands
Tel: +31 (0)43 388-2277
Office Location: Room B2.01 (second floor), Debyeplein 1 (Randwyck)

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Sebastian Stegmann
Sent: Tuesday, March 02, 2010 18:36
To: r-help@r-project.org
Subject: [R] add a header to a forest plot (metafor)

Dear R-community,

I'm currently trying to assemble a forest plot using the forest function from package metafor. Works well. Even the regular main-argument works for adding a title to the graph. However, I would like to add one top row which explains the nature of the columns. Very much like the usual header in spreadsheet programs. For example: Study Sample Sample Size Estimated Effect Size CI 95%.

I tried to add axis(3), but apparently the forest plot isn't that kind of graphic. Does anyone have any idea?

Cheerio

Sebastian
Re: [R] problem with choose.files
Try this; specify where you want the second one to start:

    files.a <- choose.files()
    # now change to the directory of the first file name to continue search
    files.b <- choose.files(paste(dirname(files.a[1L]), "*", sep='/'))

On Tue, Mar 2, 2010 at 12:17 PM, Caleb Rounds caleb.rou...@gmail.com wrote:

> I have recently upgraded to R 2.10.1 on Windows XP and am using scripts that I've used in previous versions successfully. I'm having a problem with choose.files. The lines read:
>
>     fura_scan_file <- choose.files(caption="Select log file (*.log) for fura-2 scans")
>     PI_scan_file <- choose.files(caption="Select log file (*.log) for PI scans")
>
> The problem is that the directory chosen after the first choose.files is not remembered. This is an issue b/c my files are nested inside of several directories and it takes a lot of clicking to get to where I need to be. Is there a problem with these lines? Is it likely elsewhere in the script? I apologize for my ignorance and wasting time, but in the documentation for choose.files it suggests this should happen automatically.
>
> Caleb Rounds

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
Re: [R] variable substitution in for loops
Friends

Seems I've run into another snag. More of the nitty-gritty R details I don't understand. So, as I mentioned below, dataset[[var_sub]] seems to be understood well by the functions I previously used, and I was able to run my loop successfully with [[var_sub]] as a variable-substitution method. However, now I want to do the same with TukeyHSD, and this function does not play nice with this kind of syntax. So if I do

    fac <- as.factor(dataset$factor)
    res <- aov(dataset$var ~ dataset$factor)
    tuk <- TukeyHSD(res, fac)

things work fine. But if I try (similar to the script below, which worked for ROCR functions):

    fac <- as.factor(dataset$factor)
    var_sub <- noquote(var)
    res <- aov(dataset[[var_sub]] ~ dataset$factor)
    tuk <- TukeyHSD(res, fac)

TukeyHSD craps out with an error, even though res is identical in both cases, apart from the formula syntax. So, TukeyHSD seems to be picky about syntax. Is there any other way I can do variable substitution (so I can read variable names from my list) and get this loop to work for TukeyHSD?

Thanks

Jon

>> Friends
>>
>> First, thanks to all for great feedback. Open-source rocks! I have a workable solution to my question, attached below in case it might be of any use to anyone. I'm sure there are more elegant ways of doing this, so any further feedback is welcome!
>>
>> Things I've learned (for other noobs like me to learn from):
>>
>> 1) dataset[[j]] seems equivalent to dataset$var if j <- "var", though quotes can mess you up, hence j <- noquote(varlist[i]) in the script (it also makes a difference that variables in varlist be stored as a space-separated string. Tab- or line-break-separated lists don't seem to work, though a different method might handle that)
>
> dataset[["var"]] is equivalent to dataset$var given var does not contain any special characters. Otherwise j == "var" has to be TRUE.
>
>> 2) Loops will abort if they encounter an error (like ROCR encountering a prediction that is singular). Error handling can be built in, but is a little tricky. (I reduplicated the method with a function to test and advance the loop on failure. You can suppress error messages if you like)
>
> Not tricky, just use try().
>
>> 3) Some stats methods don't have NA handling built into them (e.g. prediction in ROCR chokes if there are empty cells in the variables), hence it seems a good idea to strip these out before starting. The subsetting with na.omit does this
>
> ... given you know what you are doing (and omitting).
>
>> 4) You reference pieces (slots) of results (S3/S4 objects) by using object@slot.
>
> The @ operator is defined for slots of *S4* classes.
>
> Best,
> Uwe Ligges
>
>> Hence, you pull out the auc value in ROCR-performance by perf@y.values in the script. You can see what slots are in an object by simply listing the object contents at the command line: > object.
>>
>> Thanks again for all the help!
>>
>> Jon
>>
>> Soli Deo Gloria
>>
>> Jon Erik Ween, MD, MS
>> Scientist, Kunin-Lunenfeld Applied Research Unit
>> Director, Stroke Clinic, Brain Health Clinic, Baycrest Centre
>> Assistant Professor, Dept. of Medicine, Div. of Neurology
>> University of Toronto Faculty of Medicine
>>
>> ...code
>>
>> ## R script for automating stats crunching in large datasets ##
>> ## Needs space separated list of variable names matching dataset column names ##
>> ## You have to tinker with the code to customize for your application ##
>> ## ##
>> ## Jon Erik Ween MD, MSc, 26 Feb 2010 ##
>> library(ROCR) # Load stats package to use if not standard
>> varslist <- scan("/Users/jween/Desktop/INCASvars.txt", "list") # Read variable list
>> results <- as.data.frame(array(, c(3, length(varslist)))) # Initialize results array, one type of stat at a time for now
>> for (i in 1:length(varslist)){ # Loop through the variables you want to process. Determined by varslist
>> j <- noquote(varslist[i])
>> vars <- c(varslist[i], "Issue_class") # Variables to be analyzed
>> temp <- na.omit(incas[vars]) # Have to subset to get rid of NA values causing ROCR to choke
>> n <- nrow(temp) # Record how many cases the analysis is based on. Need to figure out how to calc cases/controls
>> #.table <- table(temp$SubjClass) # Maybe for later figure out cases/controls
>> results[1,i] <- j # Name particular results column
>> results[2,i] <- n # Number of subjects in analysis
>> test <- try(aucval(i,j), silent=TRUE) # Error handling in case procedure craps out so loop can continue. Suppress annoying error messages
>> if(class(test)=="try-error") next else # Run procedure only if OK, otherwise skip
>> pred <- prediction(incas[[j]], incas$Issue_class); # Procedure
>> perf <- performance(pred, "auc");
>> results[3,i] <- as.numeric(perf@y.values) # Enter result into
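For the TukeyHSD question at the top of this message, one possible approach is to build the model formula from the variable name as a string and pass the data frame through the data argument, so that the factor term has a plain name that TukeyHSD can be pointed at. The sketch below is only an illustration under assumptions: the data frame, the grouping column `grp`, and the response names `var1` and `var2` are invented stand-ins, not the poster's data, and it has not been run against the original script.

```r
# Hypothetical data standing in for the poster's `dataset`.
set.seed(1)
dataset <- data.frame(
  grp  = factor(rep(c("a", "b", "c"), each = 10)),
  var1 = rnorm(30),
  var2 = rnorm(30, mean = rep(c(0, 1, 2), each = 10))
)

varlist <- c("var1", "var2")   # variable names, e.g. as read from a file
tukey   <- list()

for (v in varlist) {
  f   <- reformulate("grp", response = v)   # builds the formula  v ~ grp
  res <- aov(f, data = dataset)             # terms are named "grp", not "dataset$grp"
  tukey[[v]] <- TukeyHSD(res, which = "grp")
}

tukey$var2   # pairwise differences between the levels of grp for var2
```

Here the `which` argument is the name of the factor term as a character string, and the result for each variable is a list holding one matrix of pairwise comparisons per factor.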
[R] R / R+ Webminar *** R-PLUS Rocks: Interactive, Comprehensible and Highly Visual. March 12th @ 12PM ET (USA Time)
Welcome to R/ R-PLUS Webminar Series.

R-PLUS 3.3 Rocks: Interactive, Comprehensible and Highly Visual. http://www.xlsolutions-corp.com/webminar.asp. March 12th @ 12PM ET (USA Time)

Increase your productivity with R-PLUS 3.3 by attending the webminar and learning how to:

1. Interactively click your way through your favorite statistical models without the need for programming.
2. Use state-of-the-art R-PLUS tools to produce Publication Quality Graphics and Reports at a click
3. Edit your Graphics
4. Take advantage of the new R-PLUS 64-bit on Windows for larger data sets
5. For SAS users, our new app R+SAS2R lets you see at a click exactly which R function (syntax included) is equivalent to a given SAS Proc!

Come learn about the cool new features of R-PLUS 3.3 and suggest improvements. Space is limited. Reserve your webminar seat now at: http://www.xlsolutions-corp.com/webminar.asp. You can also email Ms Jennifer McDonald ( jen at xlsolutions-corp.com) to register or request the free webminar video.

Our March-April R training courses are available at: www.xlsolutions-corp.com/rcourses

Regards -

Sue Turner
Senior Account Manager
XLSolutions Corporation
North American Division
1700 7th Ave Suite 2100
Seattle, WA 98101
Phone: 206-686-1578
Email: sue at xlsolutions-corp.com
web: www.xlsolutions-corp.com/rcourses
[R] Creating a timeSeries Data Frame
Hello

I have 2000 univariate timeSeries of about 20 observations each, like the one below. I would like to store all of them in one object, sort of a data frame, and to be able to recall each by its column name, which by the way is the same as the first date. Do you know how I can do this?

Thank you

Felipe Parra

    GMT          2009-10-12
    2009-10-12   0.002346171
    2009-10-14   0.002346171
    2009-10-21   0.002346171
    2009-10-28   0.002650307
    2009-11-16   0.002391950
    2009-11-16   0.003848032
    2010-03-16   0.003848032
    2010-06-17   0.008644137
    2010-09-16   0.010690464
    2010-12-15   0.016356718
    2011-03-15   0.018496109
    2011-06-16   0.023354671
    2011-09-15   0.025211351
    2011-12-21   0.029029900
    2012-03-21   0.031173566
    2012-06-21   0.033641977
    2012-10-15   0.023078052
    2013-04-15  -0.118415755
    2013-10-15  -0.010497527
    2014-04-14   0.010497527
    2014-10-14  -0.010497527
[R] selection simulation
Hi,

I am having trouble developing a looping function in a selection simulation, and I was hoping someone could help. Basically, I have a matrix of randomly generated numbers representing scores on the variables. The rows represent applicants and the columns represent the variables.

    # Number of variables
    p <- 5
    # Number of applicants
    n_ap <- 100
    # Create random scores for the applicants across five variables
    z <- rnorm(n_ap*p, 0, 1)
    # Put z into a matrix
    dim(z) <- c(n_ap, p)
    # Rank applicant scores on variable X1
    d1 <- rev(order(z[,1]))
    # Rank applicant scores on variable X2
    d2 <- rev(order(z[,2]))
    # Rank applicant scores on variable X3
    d3 <- rev(order(z[,3]))
    # Rank applicant scores on variable X4
    d4 <- rev(order(z[,4]))
    # Rank applicant scores on variable X5
    d5 <- rev(order(z[,5]))
    pool <- cbind(d1, d2, d3, d4, d5)

Is there a way to specify a vector of selection ratios (e.g. sr <- c(.10, .20, .30)) and use this in a loop so that the function will produce a matrix of applicants who will be selected when the selection ratio is .10, .20, and .30?

Thank you, and please leave a post if more info is needed
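One way to read the question is that, for each selection ratio, the top share of each ranking column should be kept. The following is a minimal sketch under that assumption; `selected` and the element names `sr_0.1` etc. are invented here, and it returns one matrix per ratio in a list rather than a single matrix, since the three results have different numbers of rows.

```r
set.seed(42)                      # only so the sketch is reproducible
p    <- 5                         # number of variables
n_ap <- 100                       # number of applicants
z    <- matrix(rnorm(n_ap * p), nrow = n_ap, ncol = p)

# Column k of `pool` holds applicant row numbers, best to worst on variable k.
# This replaces the five separate rev(order(...)) calls.
pool <- apply(z, 2, order, decreasing = TRUE)

sr <- c(0.10, 0.20, 0.30)         # selection ratios

# For each ratio keep the first round(ratio * n_ap) rows of every column.
selected <- lapply(sr, function(r) pool[seq_len(round(r * n_ap)), , drop = FALSE])
names(selected) <- paste0("sr_", sr)

dim(selected[["sr_0.2"]])         # 20 selected applicants per variable, 5 variables
```

Each element of `selected` then lists, column by column, the applicants who would be picked on that variable at that ratio.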
[R] Binding a matrix to a matrix
Hello

I have a 2x10x200 matrix and I would like to bind to it another 2x10 matrix in order to end up with a 2x10x201 matrix. Which command should I use to do this?

Thank you

Felipe Parra
Re: [R] Binding a matrix to a matrix
If it is 3 dimensional then it is an array, not a matrix. The abind function in the abind package is probably what you want.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Luis Felipe Parra
Sent: Tuesday, March 02, 2010 3:23 PM
To: r-help@r-project.org
Subject: [R] Binding a matrix to a matrix

Hello

I have a 2x10x200 matrix and I would like to bind to it another 2x10 matrix in order to end up with a 2x10x201 matrix. Which command should I use to do this?

Thank you

Felipe Parra
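As a rough illustration of the suggestion above, the sketch below appends a 2x10 matrix as the 201st slice of a 2x10x200 array. The objects `a`, `m` and `b` are made-up example data. The base R version relies on R storing arrays in column-major order; the abind call is guarded so the sketch still runs when that package is not installed.

```r
a <- array(seq_len(2 * 10 * 200), dim = c(2, 10, 200))   # example 2 x 10 x 200 array
m <- matrix(-1, nrow = 2, ncol = 10)                      # example 2 x 10 matrix

# Base R: the values of m follow the values of a in memory order,
# so resetting dim makes m the last (201st) slice.
b <- array(c(a, m), dim = c(2, 10, 201))
dim(b)

# With the abind package, binding along the third dimension does the same.
if (requireNamespace("abind", quietly = TRUE)) {
  b2 <- abind::abind(a, m, along = 3)
  stopifnot(all(dim(b2) == dim(b)), all(b2 == b))
}
```

The abind route is easier to read and also handles binding along other dimensions, while the base R line avoids an extra dependency for this one case.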