Re: [R] R Crashes when using large matrices (Ubuntu 11.04)
On Fri, 3 Jun 2011, Matias Salibian-Barrera wrote:

> Hello,
>
> This simple SVD calculation (commands are copied immediately below) crashes on my Ubuntu machine (R 2.13.0). However it works fine on my Windows 7 machine, so I suspect there's a problem with (my?) Ubuntu and / or R. Can anybody else reproduce it (with Ubuntu 11.04)?
>
> Thanks in advance.

From the traceback, the error appears to be in LAPACK or BLAS. There is no evidence here that 'R crashes' rather than one of those crashed R.

You don't tell us whether you compiled R yourself or used someone else's pre-compiled distribution -- if the latter, ask on r-sig-debian as this is most likely a problem with the distribution, since Debian/Ubuntu builds normally replace R's LAPACK/BLAS with that from the OS. It works correctly on a vanilla R build on i686 Fedora 14.

> p <- 500
> n <- 300
> set.seed(1234)
> x <- matrix(rnorm(n*p), n, p)
> sih <- var(x)
> b <- svd(sih)
>
> produces:
>
> *** caught illegal operation ***
> address 0x42b8c9, cause 'illegal operand'
>
> Traceback:
> 1: .Call("La_svd", jobu, jobv, x, double(min(n, p)), u, v, "dgesdd", PACKAGE = "base")
> 2: La.svd(x, nu, nv)
> 3: svd(sih)
>
> I'm using Ubuntu 11.04 and
>
> version
>                _
> platform       i686-pc-linux-gnu
> arch           i686
> os             linux-gnu
> system         i686, linux-gnu
> status
> major          2
> minor          13.0
> year           2011
> month          04
> day            13
> svn rev        55427
> language       R
> version.string R version 2.13.0 (2011-04-13)
>
> Thanks,
> Matias

--
Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK
Fax: +44 1865 272595

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
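For anyone trying to reproduce the report, here is a minimal sketch that runs the same calculation and then checks the decomposition numerically, so that a silently wrong result from the linked LAPACK/BLAS would be noticed as well as a crash. The reconstruction check and the tolerance of 1e-8 are assumptions added for illustration; they are not part of the thread.

```r
# Same calculation as in the original report
p <- 500
n <- 300
set.seed(1234)
x <- matrix(rnorm(n * p), n, p)
sih <- var(x)                 # 500 x 500 covariance matrix
b <- svd(sih)

# Rebuild the matrix from its factors: sih = U diag(d) V'
# (b$d * t(b$v) scales row i of V' by d[i], i.e. diag(d) %*% t(V))
recon <- b$u %*% (b$d * t(b$v))
err <- max(abs(recon - sih))
err < 1e-8                    # tolerance is an arbitrary choice
```

If this runs on one build and dies with an illegal-operation signal on another, that points at the numerical libraries the second build is linked against rather than at the R code.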
[R] outlining data points
Hi,

I have tried numerous methods and packages, but thus far cannot seem to find a solution. I am looking to draw a filled colored shape around subsets of my data points on a scatter plot, where none of the shapes overlap but instead bend around each other if necessary. I finally came up with the following steps, which approximate what I am looking for, but I am completely lost as to how to implement several of them. The steps are listed below, along with a pictorial representation of what I am hoping they will achieve. Any suggestions would be greatly appreciated.

1. Draw a circle of a given radius around each data point. The radius can be the same for each point, or can be determined by a vector of values that is either the same length as the number of points or is repeated until all points are assigned.
2. Begin with the area of greatest overlap between two circles, draw a line segment between the two intersection points and assign either side of that line to its respective shape.
3. Repeat with the largest remaining area of overlap. If a previous division has left an intersection point within the new area of overlap, such that there are now two possible points to attach the line segment to, use the one from which a division has already been drawn (so that three shapes now come together in a point).
4. Repeat with successively smaller areas of overlap until none remain.
5. Fill each resulting shape with a color determined by an outside vector associated with the points.
6. (If possible) calculate the area of each resultant shape.

http://r.789695.n4.nabble.com/file/n3572306/diagram_circles_coloring_3.png
http://r.789695.n4.nabble.com/file/n3572306/circle_interactions_3.png

--
View this message in context: http://r.789695.n4.nabble.com/outlining-data-points-tp3572306p3572306.html
Sent from the R help mailing list archive at Nabble.com.
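The six steps above can be approximated in base R without any geometry code. This is a rough sketch that swaps the sequential chord-drawing for a different but related technique: every cell of a fine grid is given to the circle with the smallest "power distance" (squared distance minus squared radius), provided the cell lies inside that circle. The boundary between two such shapes lies on the line through the two intersection points of their circles, which is what step 2 asks for, but the order-dependent tie rule in step 3 is not reproduced. The points, radii, colours and grid step are all made up for illustration.

```r
# Hypothetical example data: two clumps and one isolated point
pts <- cbind(x = c(1, 1.5, 2, 4, 4.5, 8), y = c(1, 1.6, 1, 3, 3.4, 8))
radius <- rep(1, nrow(pts))          # step 1: one radius per point
cols <- rainbow(nrow(pts))           # step 5: one colour per point

# Fine grid covering every circle
step <- 0.02
gx <- seq(min(pts[, 1] - radius), max(pts[, 1] + radius), by = step)
gy <- seq(min(pts[, 2] - radius), max(pts[, 2] + radius), by = step)
grid <- expand.grid(x = gx, y = gy)

# Power distance of each grid cell (rows) to each circle (columns);
# it is <= 0 exactly when the cell lies inside that circle
pw <- sapply(seq_len(nrow(pts)), function(i)
  (grid$x - pts[i, 1])^2 + (grid$y - pts[i, 2])^2 - radius[i]^2)

# Steps 2-4: the circle with the smallest power distance owns the cell;
# cells outside every circle belong to no shape
owner <- apply(pw, 1, which.min)
owner[pw[cbind(seq_len(nrow(pw)), owner)] > 0] <- NA

# Step 6: approximate area = number of cells owned * area of one cell
area <- tabulate(owner[!is.na(owner)], nbins = nrow(pts)) * step^2

# Step 5: draw the filled shapes and the points
image(gx, gy, matrix(owner, length(gx), length(gy)),
      col = cols, breaks = seq(0.5, nrow(pts) + 0.5), asp = 1,
      xlab = "x", ylab = "y")
points(pts, pch = 19)
area
```

The areas are only as accurate as the grid is fine; an isolated circle of radius 1 should come out close to pi.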
[R] movie3d in rgl object 'movie' not found
Hello,

I am trying to save a .gif movie using movie3d from the package {rgl}. I am using the following code combined with the globe example on the ?movie3d page. I've installed ImageMagick and the directory seems to be working properly, i.e. when I do Sys.getenv("PATH"), C:\\Program Files (x86)\\ImageMagick-6.7.0-Q16 shows up.

library(rgl)
open3d()
lat <- matrix(seq(90,-90, len=50)*pi/180, 50, 50, byrow=TRUE)
long <- matrix(seq(-180, 180, len=50)*pi/180, 50, 50)
r <- 6378.1 # radius of Earth in km
x <- r*cos(lat)*cos(long)
y <- r*cos(lat)*sin(long)
z <- r*sin(lat)
persp3d(x, y, z, col="white", texture=system.file("textures/world.png",package="rgl"), specular="black", axes=FALSE, box=FALSE, xlab="", ylab="", zlab="", normal_x=x, normal_y=y, normal_z=z)

# I run the above, note the device ID and then enter the following with rgl.cur(1) if my device ID is 1.

movie3d(par3dinterp(par3dsave(params = c("userMatrix", "scale", "zoom", "FOV"), times = FALSE, dev = rgl.cur(1))), duration = 5, fps = 10, movie = "movie", frames = movie, dir=tempdir(), type = "gif")

# The par3d window pops up, I move the globe around a bit and press record a few times. Then when I press quit, I get the following error:

Error in sprintf("%s%03d.png", frames, i) : object 'movie' not found

Sorry if I've made a silly mistake; I'm kind of a newb. I haven't found any record of this same issue on the web.

Many Thanks!
Michelle
[R] logistic growth model
Hi,

I want to fit a logistic growth model for this data set, where

y = k*exp(b0 + b1*age)/(1 + exp(b0 + b1*age))

with start = list(b0 = 3, b1 = 3.5) and trace = TRUE. I need to find the initial values for b0 and b1; k = 3. When I run it using b0 = 3 and b1 = 3.5, or any other numbers, I get the following error:

397448.4 : 3.0 3.5
Error in numericDeriv(form[[3L]], names(ind), env) : Missing value or an infinity produced when evaluating the model.

Can anyone help me with how to get the initial values? What does this error message mean?
[R] logistic growth model
I want to fit a logistic growth model for

y = k*exp(b0 + b1*age)/(1 + exp(b0 + b1*age))

Can someone help with how to get the initial coefficients b0 and b1? I need to estimate them in order to do the regression analysis. When I run it using b0 = 0.5 and b1 = 3.4818, I get the following error:

397443.8 : 0.5 3.4818
Error in nls(Height ~ k * exp(b1 + b2 * Age)/(1 + exp(b1 + b2 * Age)), : singular gradient

Please tell me what is wrong with my initial values, and how to get good ones.
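One common way to get starting values when the asymptote k is fixed is to linearise the model: if y = k*exp(eta)/(1 + exp(eta)) with eta = b0 + b1*age, then log(y/(k - y)) = b0 + b1*age, so an ordinary lm() fit on the transformed response gives usable values for b0 and b1. The sketch below uses simulated data because the poster's Height/Age data were not included; the true values -4 and 0.5 and the noise level are made up. Starting values far too large for the age scale can make exp() overflow, which may be what produced the "Missing value or an infinity" error, though that cannot be confirmed without the data.

```r
# Simulated stand-in for the poster's data (hypothetical values)
set.seed(42)
k <- 3
Age <- seq(1, 14, by = 0.5)
eta <- -4 + 0.5 * Age
Height <- k * exp(eta) / (1 + exp(eta)) + rnorm(length(Age), sd = 0.02)

# Linearise: log(y / (k - y)) = b0 + b1 * Age.
# Observations outside (0, k) have no logit, so drop them for this step only.
ok <- Height > 0 & Height < k
start_fit <- lm(log(Height[ok] / (k - Height[ok])) ~ Age[ok])
start <- list(b0 = unname(coef(start_fit)[1]),
              b1 = unname(coef(start_fit)[2]))

# Note the parentheses around the whole denominator
fit <- nls(Height ~ k * exp(b0 + b1 * Age) / (1 + exp(b0 + b1 * Age)),
           start = start)
coef(fit)
```

The stats package also provides the self-starting model SSlogis(), which avoids hand-picked starting values but estimates the asymptote as well rather than fixing it at k.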
Re: [R] movie3d in rgl object 'movie' not found
On Fri, 3 Jun 2011, someone ashamed of his/her real name wrote:

> Hello,
>
> I am trying to save a .gif movie using movie3d from the package {rgl}. I am using the following code combined with the globe example on the ?movie3d page. I've installed ImageMagick and the directory seems to be working properly, i.e. when I do Sys.getenv("PATH"), C:\\Program Files (x86)\\ImageMagick-6.7.0-Q16 shows up.
>
> library(rgl)
> open3d()
> lat <- matrix(seq(90,-90, len=50)*pi/180, 50, 50, byrow=TRUE)
> long <- matrix(seq(-180, 180, len=50)*pi/180, 50, 50)
> r <- 6378.1 # radius of Earth in km
> x <- r*cos(lat)*cos(long)
> y <- r*cos(lat)*sin(long)
> z <- r*sin(lat)
> persp3d(x, y, z, col="white", texture=system.file("textures/world.png",package="rgl"), specular="black", axes=FALSE, box=FALSE, xlab="", ylab="", zlab="", normal_x=x, normal_y=y, normal_z=z)
>
> # I run the above, note the device ID and then enter the following with rgl.cur(1) if my device ID is 1.

But rgl.cur() is the current device, and it does not take an argument in the version of rgl I have.

> movie3d(par3dinterp(par3dsave(params = c("userMatrix", "scale", "zoom", "FOV"), times = FALSE, dev = rgl.cur(1))), duration = 5, fps = 10, movie = "movie", frames = movie, dir=tempdir(), type = "gif")

I think you meant to set dev= in movie3d, not par3dsave (which appears to be part of package tkrgl which you failed to even mention).

> # The par3d window pops up, I move the globe around a bit and press record a few times. Then when I press quit, I get the following error:
>
> Error in sprintf("%s%03d.png", frames, i) : object 'movie' not found
>
> Sorry if I've made a silly mistake; I'm kind of a newb. I haven't found any record of this same issue on the web.

Don't give the values of arguments that you want to take default values. Specifying 'frames = movie' is not the same thing as using the default value (the scoping rules differ). None of

fps = 10, movie = "movie", frames = movie, dir=tempdir(), type = "gif"

is needed (nor would dev = rgl.cur() be).

> Many Thanks!
> Michelle

--
Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK
Fax: +44 1865 272595
Re: [R] generating random covariance matrices (with a uniform distribution of correlations)
On Fri, Jun 03, 2011 at 01:54:33PM -0700, Ned Dochtermann wrote:

> Petr,
>
> This is the code I used for your suggestion:
>
> k <- 6; kk <- (k*(k-1))/2
> x <- matrix(0, 5000, kk)
> for(i in 1:5000){
>   A.1 <- matrix(0, k, k)
>   rs <- runif(kk, min=-1, max=1)
>   A.1[lower.tri(A.1)] <- rs
>   A.1[upper.tri(A.1)] <- t(A.1)[upper.tri(A.1)]
>   cors.i <- diag(k)
>   t <- .001 - min(Re(eigen(A.1)$values))
>   new.cor <- cov2cor(A.1 + (t*cors.i))
>   x[i,] <- new.cor[lower.tri(new.cor)]}
> hist(c(x)); max(c(x)); median(c(x))
>
> This, unfortunately, does not maintain the desired distribution of correlations.

Hello.

Contrary to what I thought originally, there are solutions for the case of the correlation matrix as well.

The first solution creates a singular correlation matrix (of rank 3), but the off-diagonal entries have exactly the uniform distribution on [-1, 1], since the scalar product of two independent uniformly distributed unit vectors in R^3 has the uniform distribution on [-1, 1].

x <- matrix(rnorm(18), nrow=6, ncol=3)
x <- x/sqrt(rowSums(x^2))
a <- x %*% t(x)

The next solution produces a correlation matrix of full rank, whose off-diagonal entries have a distribution very close to the uniform on [-1, 1]. The KS test finds a difference only with a sample size of more than 50'000.

w <- c(0.01459422, 0.01830718, 0.04066405, 0.50148488, 0.60330865, 0.61832829)
x <- matrix(rnorm(36), nrow=6, ncol=6) %*% diag(w)
x <- x/sqrt(rowSums(x^2))
a <- x %*% t(x)

Hope this helps.

Petr Savicky.
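The uniformity claim for the rank-3 construction can be checked empirically. This is a small sketch, not part of the original reply: the helper name draw_cor, the seed and the number of replicates are arbitrary choices. It pools the below-diagonal entries of many simulated matrices and compares their mean and variance with those of the uniform distribution on [-1, 1], which are 0 and 1/3.

```r
set.seed(2011)
k <- 6
n_rep <- 2000

# One draw of the rank-3 construction; returns the 15 below-diagonal entries
draw_cor <- function() {
  x <- matrix(rnorm(3 * k), nrow = k, ncol = 3)
  x <- x / sqrt(rowSums(x^2))   # rows become unit vectors in R^3
  a <- x %*% t(x)               # Gram matrix: unit diagonal, rank 3
  a[lower.tri(a)]
}

r <- as.vector(replicate(n_rep, draw_cor()))

# Uniform on [-1, 1] has mean 0 and variance 1/3
c(mean = mean(r), var = var(r))

hist(r, breaks = 40, freq = FALSE)
abline(h = 0.5, lty = 2)        # density of the uniform on [-1, 1]
```

The entries within one matrix are not independent of each other, so a formal test on the pooled values should be read with some care; the histogram is the more honest check.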
Re: [R] Value of 'pi'
I asked a radiographer friend of mine to examine this suggestion, but he said it wouldn't scan.

Ted.

On 04-Jun-11 04:04:43, John wrote:

> Last line, try
> but one can't, for Pi is transcendental.
>
> On Friday, June 03, 2011 04:12:07 AM Jim Lemon wrote:
>
> > On 06/01/2011 10:14 AM, baptiste auguie wrote:
> >
> > > I propose a Pi Haiku (PIQ),
> > >
> > > Pi is of certain value,
> > > In statistics, invaluable, yet
> > > Transcending numerics.
> >
> > How about a pi limerick?
> >
> > Pi, the great circumferential,
> > nearly sent the geometers mental.
> > For they tried to extract
> > a solution exact
> > but one can't, for the thing's transcendental.
> >
> > Jim

E-Mail: (Ted Harding) ted.hard...@wlandres.net
Fax-to-email: +44 (0)870 094 0861
Date: 04-Jun-11 Time: 11:38:09
-- XFMail --
[R] Bootstrap or Wilks ?
Dear R-users,

The following question is more a statistics question than an R issue. I would like to know the difference between the following two nonparametric techniques: the bootstrap and the Wilks formula. I understand the theory behind both techniques well, but cannot find anything about their advantages and drawbacks, or about how to choose one rather than the other. The Wilks formula seems to be the simpler of the two, but I don't know more than that. Could anybody help?

Thanks in advance.
Re: [R] Superscripts in strip labels of lattice plot
Peter et al.:

As a minor note ...

On Fri, Jun 3, 2011 at 8:00 PM, Peter Ehlers ehl...@ucalgary.ca wrote:

> David has given you the answer. I'll just add that you might want to widen the strips a bit if you use superscripted factor levels:
>
> xyplot(Sepal.Length ~ Sepal.Width | Species, data = iris,
>        strip = strip.custom(factor.levels = expression('A'^2, 'A'^3, 'A'^4)),
>        par.settings = list(layout.heights = list(strip = 1.5)))

The quotes surrounding A^2, A^3, A^4 can be omitted.

-- Bert

> Peter Ehlers
>
> > Here is an example that comes up on a search with terms expression strip.default (which I thought was the correct argument to the strip parameter, but it turns out I was not remembering my documentation correctly):
> > http://finzi.psych.upenn.edu/R/Rhelp02/archive/57933.html
> >
> > --
> > David Winsemius, MD
> > West Hartford, CT

--
Men by nature long to get on to the ultimate truths, and will often be impatient with elementary studies or fight shy of them. If it were possible to reach the ultimate truths without the elementary studies usually prefixed to them, these would not be preparatory studies but superfluous diversions.
-- Maimonides (1135-1204)

Bert Gunter
Genentech Nonclinical Biostatistics
Re: [R] R Crashes when using large matrices (Ubuntu 11.04)
On Ubuntu 10.04 it ran fine, albeit on a machine with lots of memory. Here's the output:

rm(list=ls())
gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  131881  7.1         35 18.7       35 18.7
Vcells  128838  1.0     786432  6.0   559631  4.3
p <- 500
n <- 300
set.seed(1234)
x <- matrix(rnorm(n*p), n, p)
sih <- var(x)
b <- svd(sih)
gc()
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  133536  7.2         35 18.7       35 18.7
Vcells 1030006  7.9    2644909 20.2  2536523 19.4
sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C
 [3] LC_TIME=en_US.utf8        LC_COLLATE=en_US.utf8
 [5] LC_MONETARY=C             LC_MESSAGES=en_US.utf8
 [7] LC_PAPER=en_US.utf8       LC_NAME=C
 [9] LC_ADDRESS=C              LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

Maybe another 11.04 glitch.

JN

On 06/04/2011 06:00 AM, r-help-requ...@r-project.org wrote:

> Message: 93
> Date: Fri, 3 Jun 2011 18:55:06 -0700 (PDT)
> From: Matias Salibian-Barrera msalib...@yahoo.ca
> To: R-help@r-project.org R-help@r-project.org
> Subject: [R] R Crashes when using large matrices (Ubuntu 11.04)
> Message-ID: 75655.88533...@web161614.mail.bf1.yahoo.com
> Content-Type: text/plain; charset=iso-8859-1
>
> This simple SVD calculation (commands are copied immediately below) crashes on my Ubuntu machine (R 2.13.0). However it works fine on my Windows 7 machine, so I suspect there's a problem with (my?) Ubuntu and / or R. Can anybody else reproduce it (with Ubuntu 11.04)?
>
> Thanks in advance.
>
> p <- 500
> n <- 300
> set.seed(1234)
> x <- matrix(rnorm(n*p), n, p)
> sih <- var(x)
> b <- svd(sih)
[R] library(SenoMineR)- Triangle Test Query
Dear R Group,

I was trying to use the triangle.test function in SensoMineR and, strangely, I encounter an error in the preference matrix output by the analysis. To illustrate, please see the following data frame of a design, with the response and preference collected as shown below:

design <- structure(list(
`Product X` = c(3, 1, 4, 2, 4, 2, 1, 3, 4, 2, 4, 2, 1, 3, 4, 2, 4, 2, 3, 1),
`Product Y` = c(1, 1, 4, 4, 4, 3, 1, 1, 4, 4, 4, 3, 1, 1, 4, 4, 4, 3, 1, 1),
`Product Z` = c(3, 2, 1, 2, 3, 3, 2, 3, 1, 2, 3, 3, 2, 3, 1, 2, 3, 3, 3, 2),
Response = structure(c(1L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L), .Label = c("X", "Z"), class = "factor"),
Preference = structure(c(1L, 3L, 1L, 1L, 1L, 2L, 3L, 1L, 1L, 1L, 1L, 2L, 3L, 1L, 1L, 1L, 1L, 2L, 1L, 2L), .Label = c("X", "Y", "Z"), class = "factor")),
.Names = c("Product X", "Product Y", "Product Z", "Response", "Preference"),
class = "data.frame",
row.names = c("Panelist1.Test1", "Panelist1.Test2", "Panelist2.Test1", "Panelist2.Test2", "Panelist3.Test1", "Panelist3.Test2", "Panelist4.Test1", "Panelist4.Test2", "Panelist5.Test1", "Panelist5.Test2", "Panelist6.Test1", "Panelist6.Test2", "Panelist7.Test1", "Panelist7.Test2", "Panelist8.Test1", "Panelist8.Test2", "Panelist9.Test1", "Panelist9.Test2", "Panelist10.Test1", "Panelist10.Test2"))

If you investigate the above data frame, you will find that for the comparison of Product 2 vs Product 3, the preference indicates that product 3 is preferred over product 2 all the time.

## Read the output from the following script to see what I mean:

subset(design, `Product X`==2 & `Product Y`==3 & `Product Z`==3)

## Output of the above would be:

                Product X Product Y Product Z Response Preference
Panelist3.Test2         2         3         3        X          Y
Panelist6.Test2         2         3         3        X          Y
Panelist9.Test2         2         3         3        X          Y

However, when I analyse the design with the answers and preferences using the following script, I get $pref output which shows that product 2 is preferred over product 3 all the time. Can somebody explain what is wrong in my script?
answer <- as.vector(design$Response)
preference <- as.vector(design$Preference)
triangle.test(design[,1:3], answer, preference)

## The $pref output from the triangle.test function is as follows:

$pref
  1 2 3 4
1 0 0 0 0
2 4 0 3 0
3 0 0 0 0
4 0 0 0 0

Any help in identifying what is going wrong here would be highly appreciated.

Regards
Vijayan Padmanabhan
Re: [R] Checking and building package
On 04.06.2011 09:37, Petar Milin wrote:

> Dear Uwe,
>
> Please, can you help me with this? I simplified the situation:
>
> I am using Debian testing i386 (also, I have VirtualBox with Win XP on it).
> myPackage is the name of the package and foo.c is in src/.
> In NAMESPACE I put:
>
> useDynLib(foo)
> export(f1, f2)
>
> Everything else is in the appropriate place: data/, man/, R/, DESCRIPTION.
>
> And as you asked, R CMD INSTALL (or build --binary) generates myPackage.so, but not foo.so. How to proceed? Where did I go wrong?

Everything is alright and behaves as it should. The dll/so should be named after the package. Well, foo is just an example and should be replaced by your package's name. So my guess was right: you missed the point that foo has to be replaced by the package name in useDynLib(foo).

Best,
Uwe Ligges

> Best,
> Petar
>
> On 03/06/11 22:32, Uwe Ligges wrote:
>
> > On 03.06.2011 21:46, Petar Milin wrote:
> >
> > > Hello!
> > >
> > > I am trying to compile an R package containing C code. I put foo.c in the src/ folder and useDynLib(foo)
> >
> > Where foo is the name of your package, I hope.
> >
> > Does R CMD INSTALL yourpackage generate a packagename.so (or .dll)? If so, it is just the useDynLib() entry that fails. Or does R CMD INSTALL give any error message?
> >
> > Uwe Ligges
> >
> > > in the NAMESPACE file. When trying R CMD check, I got an error message that shared object 'foo' is not found. Then I did R CMD SHLIB foo.c first. However, after that, I got warnings from R CMD check that there is an object file in the src/ folder. It is even worse if I run R CMD SHLIB for Windows and for Linux and put both foo.so and foo.dll in src/.
> > >
> > > What am I doing wrong? I thought that only *.c is needed in src/; then I read in someone's advice that both the source and the shared library must be in src/. What should be done if one wants to prepare for CRAN?
> > >
> > > Thanks!
> > >
> > > Best,
> > > Petar
Re: [R] [R-SIG-Finance] Measure quality of fit for MA(q), ARMA(p, q) and GARCH(p, q)
Thank you so much all for your invaluable inputs.

On Sat, Jun 4, 2011 at 3:36 AM, Patrick Burns patr...@burns-stat.com wrote:

> A common thing to do is the Ljung-Box test on the residuals. For garch it would be the residuals squared.
>
> Actually for garch it should be the rank of the squared residuals -- see
> http://www.burns-stat.com/pages/Working/ljungbox.pdf
>
> However, this is an in-sample test. Much better is to do out-of-sample tests.
>
> On 04/06/2011 04:46, Robert A'gata wrote:
>
> > Hi,
> >
> > I would like to ask for a guideline on how to assess quality of fit for MA, ARMA and GARCH process. For AR, it still looks like a regression for me. So I still can rely on R-square as long as the time series itself is stationary. However, for MA, ARMA or GARCH, I do not know what measure I should use to assess fit quality. Any suggestions would be appreciated.
> >
> > Thank you.
> >
> > Robert
> >
> > ___
> > r-sig-fina...@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-sig-finance
> > -- Subscriber-posting only. If you want to post, subscribe first.
> > -- Also note that this is not the r-help list where general R questions should go.
>
> --
> Patrick Burns
> patr...@burns-stat.com
> http://www.burns-stat.com
> http://www.portfolioprobe.com/blog
> twitter: @portfolioprobe
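The in-sample check described in the reply can be run with Box.test() from the stats package. The following is only a sketch on simulated data: the ARMA(1,1) series, the seed and the choice of lag = 10 are assumptions for illustration. The last call merely shows the form of the rank-based variant; for an actual GARCH model it would be applied to the standardised residuals of that fit, not to ARMA residuals as here.

```r
set.seed(123)
# Simulated ARMA(1,1) series standing in for real data
y <- arima.sim(model = list(ar = 0.6, ma = 0.3), n = 500)

fit <- arima(y, order = c(1, 0, 1))
res <- residuals(fit)

# Ljung-Box test on the residuals; fitdf = p + q adjusts the degrees of freedom
lb <- Box.test(res, lag = 10, type = "Ljung-Box", fitdf = 2)
lb

# Form of the rank-based test on squared residuals suggested for GARCH fits
lb_sq <- Box.test(rank(res^2), lag = 10, type = "Ljung-Box")
lb_sq
```

A small p-value suggests autocorrelation left in the residuals. As the reply points out, this is still an in-sample diagnostic, and holding back part of the series for out-of-sample comparison is the stronger check.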
Re: [R] outlining data points
On Jun 3, 2011, at 7:20 PM, lana wrote:

> Hi,
>
> I have tried numerous methods and packages, but thus far cannot seem to find a solution. I am looking to essentially draw a filled colored shape around subsets on my data points on a scatter plot where none of the shapes overlap but instead bend around each other if necessary. I finally came up with the following steps which approximate what I am looking for, but I am completely lost as to how to implement several of the steps. The steps are listed below, along with a pictorial representation of what I am hoping they will achieve. Any suggestions would be greatly appreciated.

Are you trying to reinvent the wheel? A typical test for clustering algorithms that attempt to find non-convex clusters (ones that bend around) is the rFace function. Searching with the engine used to support the RSiteSearch function finds 17 R functions in various packages that use it in their examples:

http://search.r-project.org/cgi-bin/namazu.cgi?query=rFace&max=100&result=normal&sort=score&idxname=functions

--
David.

> 1. Draw a circle of a given radius around each data point. The radius can be the same for each point, or can be determined by a vector of values either the same length as the number of points or repeated until all points are assigned.
> 2. Begin with the area of greatest overlap between two circles, draw a line segment between the two intersection points and assign either side of that line to its respective shape.
> 3. Repeat with the largest remaining area of overlap. If a previous division has left an intersection point within the new area of overlap, such that there are now two possible points to attach the line segment to, use the one from which a division has already been drawn (so that three shapes now come together in a point).
> 4. Repeat with successively smaller areas of overlap until none remain.
> 5. Fill each resulting shape with a color determined by an outside vector associated with the points.
> 6. (If possible) calculate the area of each resultant shape.
>
> http://r.789695.n4.nabble.com/file/n3572306/diagram_circles_coloring_3.png
> http://r.789695.n4.nabble.com/file/n3572306/circle_interactions_3.png

David Winsemius, MD
West Hartford, CT
[R] nonparametric logistic regression based on locally weighted scatterplot smoothing (lowess)
Dear UseRs:

Recently, I read an article on the association between age and lymph node metastases.
http://jco.ascopubs.org/content/27/18/2931.long

In the statistical analysis section, the authors stated: "Because a nonlinear relationship between age and lymph node involvement was expected based on existing literature, lymph node involvement was also regressed on age using nonparametric logistic regression based on locally weighted scatterplot smoothing (lowess)."
http://jco.ascopubs.org/content/27/18/2931.long#ref-11

Could someone explain "nonparametric logistic regression based on locally weighted scatterplot smoothing (lowess)"? Or is it simply nonparametric regression based on locally weighted scatterplot smoothing (lowess)?

Thanks

Yao Zhu
Department of Urology
Fudan University Shanghai Cancer Center
Shanghai, China
Re: [R] nonparametric logistic regression based on locally weighted scatterplot smoothing (lowess)
On Jun 4, 2011, at 11:41 AM, zhu yao wrote:

> Dear UseRs:
>
> Recently, I have read an article regarding the association between age and lymph node metastases.
> http://jco.ascopubs.org/content/27/18/2931.long
>
> In statistical analysis, the authors stated "Because a nonlinear relationship between age and lymph node involvement was expected based on existing literature, lymph node involvement was also regressed on age using nonparametric logistic regression based on locally weighted scatterplot smoothing (lowess)."
> http://jco.ascopubs.org/content/27/18/2931.long#ref-11
>
> Could someone explain nonparametric logistic regression based on locally weighted scatterplot smoothing (lowess)? Or it is nonparametric regression based on locally weighted scatterplot smoothing (lowess)

One can use a logistic link and a local likelihood. Loader describes the advantages of such a strategy and shows a worked example on pages 60-65 of her text "Local Regression and Likelihood".

But there is no apparent R content in this question (and the authors of the above paper said they used SAS), so this is very much off-topic for this list. You really should start such requests for explication by addressing the authors of the paper. Two other web-based statistical sites for general or medical statistics questions can be found at the GoogleGroups MedStats group and http://stats.stackexchange.com/ .

--
David Winsemius, MD
West Hartford, CT
[R] LM/two way analysis/classic parametrisation
I would like to know how to get the classic parametrisation in a two-way analysis of variance.

Classic parametrisation:

Observed = intercept + row-effect + col-effect + error,

where the sum of the row-effects = 0 and the sum of the col-effects = 0.
Re: [R] LM/two way analysis/classic parametrisation
On Jun 4, 2011, at 10:33 AM, kfl wrote:

> I will be pleased to know, how to get the classic parametrisation in a two way analysis of varians:
>
> Classic parametrisation: Observed = intercept + row-effect + col-effect + error, where sum af row-effect=0 and sum of col_effect=0

?contrasts # which has a link to contr.sum

--
David Winsemius, MD
West Hartford, CT
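To make the pointer to contr.sum concrete, here is a small sketch on made-up data (a balanced 3 x 4 layout with one observation per cell; the effect sizes and the names dat, row_eff and col_eff are hypothetical). With sum-to-zero contrasts lm() reports the effect of every level except the last, and the last one is minus the sum of the others.

```r
# Hypothetical balanced 3 x 4 layout with one observation per cell
set.seed(7)
dat <- expand.grid(row = factor(1:3), col = factor(1:4))
dat$y <- 10 + c(-1, 0, 1)[dat$row] + c(-2, -1, 1, 2)[dat$col] +
  rnorm(12, sd = 0.1)

# Additive two-way model with sum-to-zero contrasts for both factors
fit <- lm(y ~ row + col, data = dat,
          contrasts = list(row = "contr.sum", col = "contr.sum"))
cf <- coef(fit)

# Recover the effect of the last level of each factor
row_eff <- c(cf[c("row1", "row2")], row3 = -sum(cf[c("row1", "row2")]))
col_eff <- c(cf[c("col1", "col2", "col3")],
             col4 = -sum(cf[c("col1", "col2", "col3")]))

cf["(Intercept)"]   # equals the grand mean in a balanced design
row_eff
col_eff
```

Setting options(contrasts = c("contr.sum", "contr.poly")) has the same effect for every subsequent model fit in the session, if a global change is preferred over the per-call contrasts argument.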
Re: [R] R Crashes when using large matrices (Ubuntu 11.04)
On Fri, Jun 3, 2011 at 7:03 PM, Matias Salibian-Barrera msalib...@yahoo.ca wrote:

> Hello,
>
> This simple SVD calculation (commands are copied immediately below) crashes on my Ubuntu machine (R 2.13.0). However it works fine on my Windows 7 machine, so I suspect there's a problem with (my?) Ubuntu and / or R. Can anybody else reproduce it (with Ubuntu 11.04)?
>
> Thanks in advance.

Works fine for me with Ubuntu 11.04 (amd_64) and the pre-compiled R-2.13.0

$ wajig list r-base-core
ii  r-base-core  2.13.0-2natty0  GNU R core of statistical computation and graphics system

p <- 500
n <- 300
set.seed(1234)
x <- matrix(rnorm(n*p), n, p)
sih <- var(x)
b <- svd(sih)
str(b)
List of 3
 $ d: num [1:500] 5.04 4.94 4.92 4.83 4.82 ...
 $ u: num [1:500, 1:500] -0.03663 0.05414 0.00182 -0.02847 -0.00117 ...
 $ v: num [1:500, 1:500] -0.03663 0.05414 0.00182 -0.02847 -0.00117 ...
sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base
[R] OAuthFactory error
I'm trying to create a new OAuthFactory variable for Twitter, but when I run the handshake I get the following error:

    Error in FUN(c(key, secret, : unused argument(s) (post.amp = TRUE)

Has anyone seen this before, or does anyone have any suggestions? Thanks in advance.
Re: [R] modify a data frame by values in the columns
On 2011-06-03 13:34, Jason024 wrote:

> I have a data frame like this:
>
>     col1 col2
> r1     2    1
> r2     4    3
> r3     6    5
> r4     8    7
> r5    10    9
> r6    12   11
> r7    14   13
> r8    16   15
> r9    18   17
> r10   20   19
>
> I want to modify this data frame: for example, set col1 to -1 wherever its value is less than 12, and set col2 to -1 wherever its value is greater than 10. The result should look like this:
>
>     col1 col2
> r1    -1    1
> r2    -1    3
> r3    -1    5
> r4    -1    7
> r5    -1    9
> r6    12   -1
> r7    14   -1
> r8    16   -1
> r9    18   -1
> r10   20   -1
>
> I have been struggling to make it work. Any help is appreciated!
>
> Jason

This seems made for within(); calling your data.frame 'd':

    d.new <- within(d, {
      col1 <- ifelse(col1 < 12, -1, col1)
      col2 <- ifelse(col2 > 10, -1, col2)
    })

Peter Ehlers
Re: [R] modify a data frame by values in the columns
On 2011-06-04 11:11, Peter Ehlers wrote:

> On 2011-06-03 13:34, Jason024 wrote:
>> I have a data frame like this:
>>
>>     col1 col2
>> r1     2    1
>> r2     4    3
>> r3     6    5
>> r4     8    7
>> r5    10    9
>> r6    12   11
>> r7    14   13
>> r8    16   15
>> r9    18   17
>> r10   20   19
>>
>> I want to modify this data frame: for example, set col1 to -1 wherever its value is less than 12, and set col2 to -1 wherever its value is greater than 10. The result should look like this:
>>
>>     col1 col2
>> r1    -1    1
>> r2    -1    3
>> r3    -1    5
>> r4    -1    7
>> r5    -1    9
>> r6    12   -1
>> r7    14   -1
>> r8    16   -1
>> r9    18   -1
>> r10   20   -1
>>
>> I have been struggling to make it work. Any help is appreciated!
>>
>> Jason
>
> This seems made for within(); calling your data.frame 'd':
>
>     d.new <- within(d, {
>       col1 <- ifelse(col1 < 12, -1, col1)
>       col2 <- ifelse(col2 > 10, -1, col2)
>     })

And, of course, the ifelse() isn't necessary:

    d.new <- within(d, {
      col1[ col1 < 12 ] <- -1
      col2[ col2 > 10 ] <- -1
    })

Peter Ehlers
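A self-contained sketch of the two within() variants from this thread, run on a data frame rebuilt from the values in the question. The third variant (`d3`, plain indexed assignment without within()) is an addition for comparison and is not from the thread.

```r
# Data frame rebuilt from the values shown in the question
d <- data.frame(col1 = seq(2, 20, by = 2),
                col2 = seq(1, 19, by = 2),
                row.names = paste0("r", 1:10))

# Variant 1: ifelse() inside within()
d1 <- within(d, {
  col1 <- ifelse(col1 < 12, -1, col1)
  col2 <- ifelse(col2 > 10, -1, col2)
})

# Variant 2: indexed assignment inside within()
d2 <- within(d, {
  col1[col1 < 12] <- -1
  col2[col2 > 10] <- -1
})

# Variant 3 (not from the thread): indexed assignment on a copy
d3 <- d
d3$col1[d$col1 < 12] <- -1
d3$col2[d$col2 > 10] <- -1

d1
```

All three give the same result here; the conditions are evaluated column by column, so rows r1 to r5 change only in col1 and rows r6 to r10 change only in col2.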
Re: [R] OAuthFactory error
What have you actually tried? Please read the posting guide. It also tells you to specify version numbers and OS. New versions are on their way to CRAN, so we can neither reproduce the problem (without your code) nor know which version you are using.

Uwe Ligges

On 04.06.2011 19:46, hawkhandler wrote:

> I'm trying to create a new OAuthFactory variable for Twitter, but when I run the handshake I get the following error:
>
>     Error in FUN(c(key, secret, : unused argument(s) (post.amp = TRUE)
>
> Has anyone seen this before, or does anyone have any suggestions? Thanks in advance.
[R] Nonlinear model fitting of numerical integral function
Dear R List-Members,

I wish to compute a nonlinear least-squares fit of a function defined by an integral that must be evaluated numerically. Is it possible to implement this in R? If so, which problems do I need to consider first?

Many thanks,
Dejan
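One possible approach, sketched under stated assumptions: wrap integrate() in a function that is vectorised over the predictor and pass that function to nls(). The model below, f(x; a, b) = integral from 0 to x of a * exp(-b * t^2) dt, and the simulated data are purely hypothetical stand-ins, since the question does not give the actual integrand.

```r
# Hypothetical model: f(x; a, b) = integral_0^x a * exp(-b * t^2) dt
# integrate() accepts scalar limits only, so loop over the elements of x
model <- function(x, a, b) {
  vapply(x, function(xi) {
    integrate(function(t) a * exp(-b * t^2),
              lower = 0, upper = xi,
              rel.tol = 1e-10)$value   # tight tolerance, see note below
  }, numeric(1))
}

# Simulated data from known parameters (a = 2, b = 1.5) plus noise
set.seed(42)
x <- seq(0.1, 3, length.out = 40)
y <- model(x, a = 2, b = 1.5) + rnorm(length(x), sd = 0.02)

# nls() has no analytic gradient here and falls back on finite differences
fit <- nls(y ~ model(x, a, b), start = list(a = 1.5, b = 1))
coef(fit)
```

Points to consider first: nls() differentiates the model numerically, so quadrature error that changes erratically with the parameters can upset convergence (hence the tight `rel.tol`); integrate() can fail or mislead on improper or sharply peaked integrands; and each evaluation costs one quadrature per data point, so large data sets get slow. Reasonable starting values matter as much as in any other nls() fit.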
Re: [R] nonparametric logistic regression based on locally weighted scatterplot smoothing (lowess)
Take a look at packages mgcv or gam (and probably others). Different smoothers are used, but it's nonlinear, nonparametric logistic regression, which is usually the important part. It also penalizes, which can be even more important than which smoother is used.

-- Bert

On Sat, Jun 4, 2011 at 9:02 AM, David Winsemius dwinsem...@comcast.net wrote:

> On Jun 4, 2011, at 11:41 AM, zhu yao wrote:
>
>> Dear UseRs:
>>
>> Recently, I have read an article regarding the association between age and lymph node metastases. http://jco.ascopubs.org/content/27/18/2931.long
>>
>> In the statistical analysis, the authors stated: "Because a nonlinear relationship between age and lymph node involvement was expected based on existing literature, lymph node involvement was also regressed on age using nonparametric logistic regression based on locally weighted scatterplot smoothing (lowess)." http://jco.ascopubs.org/content/27/18/2931.long#ref-11
>>
>> Could someone explain "nonparametric logistic regression based on locally weighted scatterplot smoothing (lowess)"? Or is it nonparametric regression based on locally weighted scatterplot smoothing (lowess)?
>
> One can use a logistic link and a local likelihood. Loader describes the advantages of such a strategy and shows a worked example on pages 60-65 of her text Local Regression and Likelihood.
>
> But there is no apparent R content in this question (and the authors of the above paper said they used SAS), so this is very much off-topic for this list. You really should start such requests for explication by addressing the authors of the paper. Two other web-based statistical sites for general or medical statistics questions can be found at the GoogleGroups MedStats group and http://stats.stackexchange.com/ .
>
> --
> David Winsemius, MD
> West Hartford, CT

--
Men by nature long to get on to the ultimate truths, and will often be impatient with elementary studies or fight shy of them. If it were possible to reach the ultimate truths without the elementary studies usually prefixed to them, these would not be preparatory studies but superfluous diversions.
-- Maimonides (1135-1204)

Bert Gunter
Genentech Nonclinical Biostatistics
Re: [R] Problem with Snowball RWeka
I too have this problem. Everything worked fine last year, but after updating R and packages I can no longer do word stemming. Unfortunately, I didn't save the old binaries, otherwise I would just revert. Hoping someone finds a solution for R on Windows. Thanks!

There is a potential solution for R on Mac OS from Kurt Hornik copied below, but I cannot get this to work on Windows.

Here's the code I'm running:

    #1) Using package Snowball
    library(Snowball)
    source <- readLines(system.file("words", "porter", "voc.txt", package = "Snowball"))
    result <- SnowballStemmer(source)

    #2) Using package tm
    library(tm)
    data(crude)
    stemDocument(crude[[1]])

In both instances I got a Java error: "Could not initialize the GenericPropertiesCreator. This exception was produced: java.lang.NullPointerException". After receiving this error once in the session, no further error messages are generated. However, SnowballStemmer() and stemDocument() return the original unstemmed text.

Possible solution: for those on Mac OS, Kurt Hornik wrote...

> These issues seem to be specific to Mac OS X. Recent versions of Weka have added a package management system not unlike R's, to the effect that now when external packages (or the Snowball jar) are loaded their KnowledgeFlow GUI is started, which in turn requires AWT---and from what I understand, this does not work on Mac OS X.
>
> Short term, you should be able to Sys.setenv("NOAWT", "true").
>
> More long term, the Weka maintainers have patched their upstream code so that it is possible to turn off the dynamic class discovery altogether, but I have not found the time to test this ...

I realize this solution was for Mac OS, but not knowing anything about rJava I tried it on Windows anyway, resulting in:

    Error in Sys.setenv("NOAWT", "true") : all arguments must be named

Here's my session info:

    R version 2.13.0 Patched (2011-04-21 r55576)
    Platform: i386-pc-mingw32/i386 (32-bit) (Windows Vista)

    locale:
    [1] LC_COLLATE=English_United States.1252
    [2] LC_CTYPE=English_United States.1252
    [3] LC_MONETARY=English_United States.1252
    [4] LC_NUMERIC=C
    [5] LC_TIME=English_United States.1252

    attached base packages:
    [1] stats     graphics  grDevices datasets  utils     methods   base

    other attached packages:
    [1] Snowball_0.0-7 tm_0.5-6       rcom_2.2-3.1   rscproxy_1.3-1

    loaded via a namespace (and not attached):
    [1] grid_2.13.0       rJava_0.9-0 (same error with multiple older versions)
    [3] RWeka_0.4-7       RWekajars_3.7.3-1
    [5] slam_0.1-22       tools_2.13.0
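The "all arguments must be named" message comes from how Sys.setenv() is called, not from Java: it expects name = value pairs. The sketch below only shows the call form that avoids that error. Whether setting NOAWT then cures the stemming failure on Windows is not established anywhere in this thread, and the assumption that it should be set before the Java-backed packages are loaded is exactly that, an assumption.

```r
# Sys.setenv() takes name = value pairs, not two positional strings
Sys.setenv(NOAWT = "true")

# Confirm the variable is visible to the R process
Sys.getenv("NOAWT")

# Assumption: set the variable before loading packages that start the JVM,
# e.g. before library(Snowball) or library(RWeka)
```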
[R] Predicted values based on fixed effects do not correspond with actual data in cross-classified generalized linear mixed model (lmer)
Dear R-Users,

I have fitted a cross-classified generalized linear mixed model using lmer() from the lme4 package with the following code:

    mod <- lmer(y ~ x + (1|a) + (1|b) + (1|c), family = binomial)

In this case, only a covariate (x) is included as a fixed effect. The fitted values, using fitted(mod), correspond to the raw data nicely, and the mean of the fitted values is equal to the mean of the raw data. In addition, the parameter estimate for the fixed effect (x) corresponds to the data as well (the slope 'seems' right). So far so good.

The problem arises when I calculate the predicted values based on the intercept and the parameter estimate of the fixed effect, using the formula exp(X)/(1 + exp(X)), where X = intercept + par. est. * x. When I calculate the mean of these predicted values, this mean is much lower than the mean of the actual data. The shape of the predicted curve fits nicely to the data, but the predicted line is always 'below' the actual data. Apparently, the intercept of the curve is not predicted correctly.

Does anyone know why this is? I guess it has something to do with the fact that the intercept for the fixed effects is estimated for a certain value of the random effects? According to the R documentation on fitted values: 'the fitted values at level i are obtained by adding together the contributions from the estimated fixed effects and the estimated random effects…'. But is there an 'average contribution' of the random effects? Is there a way to evaluate the fixed effects at the 'average level' of the random effects? Do I need to adjust the formula for the predictions to take into account the random effects?

Many thanks,
Gert Stulp
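One likely explanation, offered as a sketch and not as a diagnosis of this particular model: on the probability scale, plugging the fixed effects into the inverse logit gives the prediction for a unit whose random effects are exactly zero, which is not the same as the average over units. The simulation below uses base R only. The values `beta0`, `beta1` and `sigma_u` are invented, and a single normal random intercept stands in for the three crossed random effects.

```r
set.seed(123)

beta0   <- -2    # hypothetical fixed intercept (logit scale)
beta1   <- 0.5   # hypothetical fixed slope (logit scale)
sigma_u <- 2     # hypothetical random-intercept standard deviation
x       <- 1     # covariate value at which to predict

# Random intercepts for many units, with mean zero on the logit scale
u <- rnorm(1e5, mean = 0, sd = sigma_u)

# "Fixed effects only" prediction: a unit with random effect exactly 0
p_conditional <- plogis(beta0 + beta1 * x)

# Population-average prediction: average the probabilities over units
p_marginal <- mean(plogis(beta0 + beta1 * x + u))

c(conditional = p_conditional, marginal = p_marginal)
```

Because the inverse logit is nonlinear, averaging over the random effects pulls the mean probability toward 0.5. When the data mean is below 0.5, the fixed-effects-only curve therefore sits below the observed proportions, which matches the pattern described. Averaging predictions over simulated random effects, as above, is one way to get a population-level curve.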
[R] using Tps function
Hi,

I'm using the Tps function of the fields package to plot 2D surfaces. My problem is that some arrays have a lot of zeroes, and this function fits the data in such a way that contour lines for zero are all over the place. Sometimes there are lines where there shouldn't be any, extending from the part of the array where there are values greater than zero. How can I get rid of them (besides using Illustrator or Photoshop)? I have tried changing lambda, but it doesn't help. Is there any other function that would work better to smooth the data?

Thank you in advance,
Tania
[R] packages for power law distribution
Dear All,

I would appreciate some suggestions of R packages for estimating the exponent of power-law frequency distributions. I have searched the R-help list with several keywords on this subject and did not find a very specific package, except the useful normalp package. I believe there are others, but I was not able to identify them.

I am interested in the exponent of the power-law distribution of some events (frequency only), not in bivariate relationships between two variables. Specifically, I am looking for packages that have functions for the Pareto, truncated Pareto, discrete Pareto and power-law distributions with maximum likelihood estimation. What would you suggest?

Thanks a lot for your attention.

Sincerely,
Fernando
Re: [R] packages for power law distribution
There's code at http://tuvalu.santafe.edu/~aaronc/powerlaws/

-thomas

On Sun, Jun 5, 2011 at 9:01 AM, fernando del bon fernandodel...@yahoo.com.br wrote:

> Dear All,
>
> I would appreciate some suggestions of R packages for estimating the exponent of power-law frequency distributions. I have searched the R-help list with several keywords on this subject and did not find a very specific package, except the useful normalp package. I believe there are others, but I was not able to identify them.
>
> I am interested in the exponent of the power-law distribution of some events (frequency only), not in bivariate relationships between two variables. Specifically, I am looking for packages that have functions for the Pareto, truncated Pareto, discrete Pareto and power-law distributions with maximum likelihood estimation. What would you suggest?
>
> Thanks a lot for your attention.
>
> Sincerely,
> Fernando

--
Thomas Lumley
Professor of Biostatistics
University of Auckland
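For the simplest case, the maximum likelihood estimate has a closed form that needs no package. The sketch below assumes a continuous power law with density proportional to x^(-alpha) for x >= xmin and a known lower cutoff `xmin`. It is not the code from the site linked above, and it does not cover the discrete or truncated cases or the choice of xmin. The data are simulated.

```r
set.seed(1)

alpha <- 2.5   # true exponent used for the simulation
xmin  <- 1     # lower cutoff, assumed known here
n     <- 1e5

# Inverse-transform sampling from the continuous power law
x <- xmin * (1 - runif(n))^(-1 / (alpha - 1))

# Closed-form MLE of the exponent for the continuous case
alpha_hat <- 1 + n / sum(log(x / xmin))

# Approximate standard error of the estimate
se <- (alpha_hat - 1) / sqrt(n)

c(alpha_hat = alpha_hat, se = se)
```

With real data the cutoff is rarely known in advance, and estimates can be sensitive to it, so this is best treated as a starting point.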
Re: [R] nonparametric logistic regression based on locally weighted scatterplot smoothing (lowess)
The Stanford gam() [gam package] has choices of spline or local-polynomial (defaulting to local-linear) smoothers. That's probably the best match for the description. It shouldn't be necessary to guess -- the paper should have cited the package -- but we know that is often missed.

-thomas

On Sun, Jun 5, 2011 at 7:43 AM, Bert Gunter gunter.ber...@gene.com wrote:

> Take a look at packages mgcv or gam (and probably others). Different smoothers are used, but it's nonlinear, nonparametric logistic regression, which is usually the important part. It also penalizes, which can be even more important than which smoother is used.
>
> -- Bert
>
> On Sat, Jun 4, 2011 at 9:02 AM, David Winsemius dwinsem...@comcast.net wrote:
>
>> On Jun 4, 2011, at 11:41 AM, zhu yao wrote:
>>
>>> Dear UseRs:
>>>
>>> Recently, I have read an article regarding the association between age and lymph node metastases. http://jco.ascopubs.org/content/27/18/2931.long
>>>
>>> In the statistical analysis, the authors stated: "Because a nonlinear relationship between age and lymph node involvement was expected based on existing literature, lymph node involvement was also regressed on age using nonparametric logistic regression based on locally weighted scatterplot smoothing (lowess)." http://jco.ascopubs.org/content/27/18/2931.long#ref-11
>>>
>>> Could someone explain "nonparametric logistic regression based on locally weighted scatterplot smoothing (lowess)"? Or is it nonparametric regression based on locally weighted scatterplot smoothing (lowess)?
>>
>> One can use a logistic link and a local likelihood. Loader describes the advantages of such a strategy and shows a worked example on pages 60-65 of her text Local Regression and Likelihood.
>>
>> But there is no apparent R content in this question (and the authors of the above paper said they used SAS), so this is very much off-topic for this list. You really should start such requests for explication by addressing the authors of the paper. Two other web-based statistical sites for general or medical statistics questions can be found at the GoogleGroups MedStats group and http://stats.stackexchange.com/ .
>>
>> --
>> David Winsemius, MD
>> West Hartford, CT
>
> --
> Men by nature long to get on to the ultimate truths, and will often be impatient with elementary studies or fight shy of them. If it were possible to reach the ultimate truths without the elementary studies usually prefixed to them, these would not be preparatory studies but superfluous diversions.
> -- Maimonides (1135-1204)
>
> Bert Gunter
> Genentech Nonclinical Biostatistics

--
Thomas Lumley
Professor of Biostatistics
University of Auckland
Re: [R] nonparametric logistic regression based on locally weighted scatterplot smoothing (lowess)
Actually they say they used SAS, and Googling for "SAS local linear logistic" suggests they used PROC GAM with the LOESS() smoother. Probably quite similar to gam::gam().

-thomas

On Sun, Jun 5, 2011 at 9:12 AM, Thomas Lumley tlum...@uw.edu wrote:

> The Stanford gam() [gam package] has choices of spline or local-polynomial (defaulting to local-linear) smoothers. That's probably the best match for the description. It shouldn't be necessary to guess -- the paper should have cited the package -- but we know that is often missed.
>
> -thomas
>
> On Sun, Jun 5, 2011 at 7:43 AM, Bert Gunter gunter.ber...@gene.com wrote:
>
>> Take a look at packages mgcv or gam (and probably others). Different smoothers are used, but it's nonlinear, nonparametric logistic regression, which is usually the important part. It also penalizes, which can be even more important than which smoother is used.
>>
>> -- Bert
>>
>> On Sat, Jun 4, 2011 at 9:02 AM, David Winsemius dwinsem...@comcast.net wrote:
>>
>>> On Jun 4, 2011, at 11:41 AM, zhu yao wrote:
>>>
>>>> Dear UseRs:
>>>>
>>>> Recently, I have read an article regarding the association between age and lymph node metastases. http://jco.ascopubs.org/content/27/18/2931.long
>>>>
>>>> In the statistical analysis, the authors stated: "Because a nonlinear relationship between age and lymph node involvement was expected based on existing literature, lymph node involvement was also regressed on age using nonparametric logistic regression based on locally weighted scatterplot smoothing (lowess)." http://jco.ascopubs.org/content/27/18/2931.long#ref-11
>>>>
>>>> Could someone explain "nonparametric logistic regression based on locally weighted scatterplot smoothing (lowess)"? Or is it nonparametric regression based on locally weighted scatterplot smoothing (lowess)?
>>>
>>> One can use a logistic link and a local likelihood. Loader describes the advantages of such a strategy and shows a worked example on pages 60-65 of her text Local Regression and Likelihood.
>>>
>>> But there is no apparent R content in this question (and the authors of the above paper said they used SAS), so this is very much off-topic for this list. You really should start such requests for explication by addressing the authors of the paper. Two other web-based statistical sites for general or medical statistics questions can be found at the GoogleGroups MedStats group and http://stats.stackexchange.com/ .
>>>
>>> --
>>> David Winsemius, MD
>>> West Hartford, CT
>>
>> --
>> Men by nature long to get on to the ultimate truths, and will often be impatient with elementary studies or fight shy of them. If it were possible to reach the ultimate truths without the elementary studies usually prefixed to them, these would not be preparatory studies but superfluous diversions.
>> -- Maimonides (1135-1204)
>>
>> Bert Gunter
>> Genentech Nonclinical Biostatistics
>
> --
> Thomas Lumley
> Professor of Biostatistics
> University of Auckland

--
Thomas Lumley
Professor of Biostatistics
University of Auckland
Re: [R] cbind 3 or more matrices
How can I cbind three or more matrices, like A, B and C? This does not work:

    cbind(A,B,C)

--
Thanks,
Jim.
Re: [R] cbind 3 or more matrices
    do.call(cbind, list(A, B, C))

On Sat, Jun 4, 2011 at 7:14 PM, Jim Silverton jim.silver...@gmail.com wrote:

> How can I cbind three or more matrices like A,B and C. This does not work:
>
>     cbind(A,B,C)
>
> --
> Thanks,
> Jim.

--
Sarah Goslee
http://www.functionaldiversity.org
Re: [R] cbind 3 or more matrices
A, B, C should have the same number of rows.

    mlist = replicate(3, matrix(rnorm(6), 2), simplify=FALSE)
    names(mlist) = LETTERS[seq_along(mlist)]
    with(mlist, cbind(A,B,C))

or,

    do.call(cbind, mlist)

HTH,

baptiste

On 5 June 2011 11:14, Jim Silverton jim.silver...@gmail.com wrote:

> How can I cbind three or more matrices like A,B and C. This does not work:
>
>     cbind(A,B,C)
>
> --
> Thanks,
> Jim.
Re: [R] cbind 3 or more matrices
Jim -

In what sense does cbind(A,B,C) not work?

    A = matrix(rnorm(10),5,2)
    B = matrix(rnorm(15),5,3)
    C = matrix(rnorm(20),5,4)
    cbind(A,B,C)
                [,1]       [,2]      [,3]         [,4]       [,5]        [,6]
    [1,] -0.54194873 -1.1105170 -0.479010  0.619911163  0.1610162  0.49028633
    [2,] -0.39289246  0.0752089  1.427386 -0.921868090 -0.7637016 -0.34905125
    [3,] -0.07082828 -0.1060497 -1.007713 -0.003673573 -0.8384406 -0.88816295
    [4,]  0.22733701 -1.6134894 -1.993654  2.277865363 -2.3599239 -0.21704046
    [5,] -0.13809337  0.3443488 -1.384425  0.132130433  0.1345938 -0.04170648
               [,7]       [,8]        [,9]
    [1,] -1.7481451  0.4467964 -0.41358420
    [2,] -0.2882922  1.0243662 -0.48263684
    [3,]  0.9402479  0.5467952 -0.01922035
    [4,]  0.6795783  1.4560765 -0.23013826
    [5,]  0.9800312 -1.3462175 -0.77064872

- Phil Spector
  Statistical Computing Facility
  Department of Statistics
  UC Berkeley
  spec...@stat.berkeley.edu

On Sat, 4 Jun 2011, Jim Silverton wrote:

> How can I cbind three or more matrices like A,B and C. This does not work:
>
>     cbind(A,B,C)
>
> --
> Thanks,
> Jim.
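Since the original question never says how cbind(A,B,C) failed, the sketch below shows the working case next to the most common failure mentioned in the replies: matrices whose row counts differ. The matrix `D` and the `res` variable are introduced here for illustration.

```r
set.seed(1)
A <- matrix(rnorm(10), 5, 2)
B <- matrix(rnorm(15), 5, 3)
C <- matrix(rnorm(20), 5, 4)

# Works for any number of matrices with matching row counts
ABC <- cbind(A, B, C)
dim(ABC)

# Same result when the matrices are held in a list
ABC_list <- do.call(cbind, list(A, B, C))

# A matrix with a different number of rows triggers an error;
# tryCatch() captures the message instead of stopping the script
D <- matrix(rnorm(8), 4, 2)
res <- tryCatch(cbind(A, D), error = function(e) conditionMessage(e))
res
```

If the objects are data frames or vectors instead of matrices, cbind() behaves differently (for example, vectors are recycled), so checking `dim()` or `str()` of each input is a sensible first step.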
Re: [R] Interpreting Quantile Regression
This is not really an R question, and it indicates that you have a good deal of studying to do about quantile regression before you rely on it.

Frank

-----
Frank Harrell
Department of Biostatistics, Vanderbilt University
[R] Interpreting Quantile Regression
Hey there!

In ordinary regression, if p >= alpha, there is no significance. If I get this in quantile regression (for every tau), can I conclude that there is no relationship between x and y?
[R] Partial Matching
Let's say that I have a string and I want to know if a single word is present in the string. I've written the following function to see if the word "Geico" is mentioned in the string "Cheap Geico car insurance". However, it doesn't work, and I assume it has something to do with the any() function. Do I need to use regular expressions? (I hope not)

    main <- function(keyword){
      for( i in keyword ){
        n = strsplit(as.character(keyword), " ")
        print( n )
        if( any( n=="Geico" )){
          print( "Yes" )
        }
      }
    }

    main("Cheap Geico car insurance")

I'm running R 2.13 on Ubuntu 10.10.

Thanks,
Abraham
[R] Binary response GLM Question
Hi all,

I have a problem with binary response data in GLM fitting. The problem is that y takes only the values 1 or 0, and if I use the logit link, the model is for the log of the odds, where the odds are p/(1-p). In my situation, I think y is p, so sometimes the odds are 0 and sometimes they are 1/0, which is (should be) undefined? I wonder how R fits the glm.

The FULL detail of this exercise is as follows:

--

The data here are concerned with whether people default on a loan taken from a particular bank, for identical interest rates and for a fixed period. The information on each individual is their sex (male or female), their income (in pounds), whether the person is a home owner or not, their age (in years), and the amount of the loan (in pounds). The information recorded is whether the individual defaulted on the loan or not. Study the data and try to understand the relation between a person's characteristics and defaulting. Specifically, what is your estimated probability that a female aged 42, who is not a home owner, has an income of 23,500, and took a loan of 12,000, defaults on the loan?

The table holding the data has headings as follows:

    m/f:  male=1, female=0
    age:  age in years
    home: home=1 is a home owner, home=0 is not a home owner
    inc:  income
    loan: amount of loan
    def:  default=1, non-default=0

--

My R code:

    Q3=read.table("tabl3.dat")
    colnames(Q3)=c("Sex","Age","Home","Inc","Loan","Def")
    Q3$Sex=as.factor(Q3$Sex)
    Q3$Home=as.factor(Q3$Home)
    Q3$Def=as.factor(Q3$Def)
    Q3.mod=glm(Def~Sex+Age+Home+Inc+Loan,data=Q3,family=binomial(logit))

I don't really understand HOW R actually fits the model, if there is a 1/0 that it has to calculate. This does give me some results, but I don't feel quite right about it.

Now, if I use the empirical logit link, which has a 0.5 correction, log((y + 0.5)/(1 + 0.5 - y)), as the response, and then regress it on the explanatory variables, I get some estimated probabilities of 0.49* (when you transform the log odds back to p), whereas the previous model gives 0.

Am I wrong in the first place to think that the response is y = default? How should I approach this?

Thanks!

DATA is attached: http://r.789695.n4.nabble.com/file/n3574478/tabl3.dat tabl3.dat
Re: [R] Partial Matching
On Sat, Jun 4, 2011 at 6:44 PM, Abraham Mathew abmathe...@gmail.com wrote:

> Let's say that I have a string and I want to know if a single word is present in the string. I've written the following function to see if the word "Geico" is mentioned in the string "Cheap Geico car insurance". However, it doesn't work, and I assume it has something to do with the any() function. Do I need to use regular expressions? (I hope not)
>
>     main <- function(keyword){
>       for( i in keyword ){
>         n = strsplit(as.character(keyword), " ")
>         print( n )
>         if( any( n=="Geico" )){
>           print( "Yes" )
>         }
>       }
>     }
>
>     main("Cheap Geico car insurance")

strsplit returns a one-component list containing the vector of words, so you want to replace the relevant statement with:

    n = strsplit(as.character(keyword), " ")[[1]]

However, using a regular expression is shorter:

    x <- c("Cheap Geico car insurance", "Cheap Gorilla car insurance", "A Geicor car")
    regexpr("\\bGeico\\b", x) > 0
    [1]  TRUE FALSE FALSE

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
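The two suggestions in this reply can be put side by side as a short sketch. `grepl()` is used here as a convenience in place of `regexpr(...) > 0`, which is a substitution on my part; the variable names `has_word` and `has_word_split` are illustrative.

```r
x <- c("Cheap Geico car insurance",
       "Cheap Gorilla car insurance",
       "A Geicor car")

# Regular-expression route: \\b marks a word boundary, so "Geicor"
# does not count as a match for "Geico"
has_word <- grepl("\\bGeico\\b", x)

# No-regex route: split each string on spaces and test membership.
# strsplit() returns a list with one character vector per input string
words <- strsplit(x, " ", fixed = TRUE)
has_word_split <- vapply(words, function(w) "Geico" %in% w, logical(1))

has_word
has_word_split
```

The split-based version only recognises words separated by single spaces; punctuation attached to a word (for example "Geico,") would need extra handling, which is where the regular expression is more forgiving.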
Re: [R] Binary response GLM Question
Hi,

Y is not the same as P. P is the conditional probability given the data matrix. So theoretically P can take on any value in [0, 1], which means the odds can be anywhere from 0 to +infinity, not just 0 or undefined. In logistic regression the logit link is pretty standard, so I do not think you need to use the empirical logit.

I am not sure how much detail you want when you ask how R fits the glm. It uses an iterative algorithm. If you are willing to spend the time to work through the code, you can learn a lot. Just type:

  binomial

at the console (no quotes, no () after it). The source for the binomial family will print to the console, and you can look through the logit link code. That gets passed off to glm() and is used to fit the model. For a broader explanation of the process, I would get a book or look online for information on logistic regression or maximum likelihood estimation.

Cheers,

Josh

On Sat, Jun 4, 2011 at 6:09 PM, casperyc caspe...@hotmail.co.uk wrote:
[snip]

--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/
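To make the distinction between Y and P concrete, here is a minimal sketch of fitting the model and predicting a probability for the individual described in the exercise. The file tabl3.dat is not reproduced in this thread, so the data below are simulated, and the coefficients used to generate them are arbitrary assumptions chosen only so the example runs. The resulting estimate says nothing about the real data.

```r
set.seed(1)
n <- 200

# Simulated stand-in for tabl3.dat (assumed ranges, not the real data)
sim <- data.frame(
  Sex  = factor(rbinom(n, 1, 0.5)),
  Age  = round(runif(n, 20, 65)),
  Home = factor(rbinom(n, 1, 0.5)),
  Inc  = round(runif(n, 10000, 50000)),
  Loan = round(runif(n, 1000, 20000))
)

# Arbitrary "true" linear predictor; the 0/1 response is drawn with
# probability p = plogis(eta), so y is an outcome and p is its mean
eta <- -1 + 0.0001 * sim$Loan - 0.00005 * sim$Inc
sim$Def <- rbinom(n, 1, plogis(eta))

# The logit link is applied to the fitted mean, never to the raw 0/1 values
fit <- glm(Def ~ Sex + Age + Home + Inc + Loan,
           data = sim, family = binomial(link = "logit"))

# Female (0), aged 42, not a home owner (0), income 23,500, loan 12,000
new_person <- data.frame(
  Sex  = factor(0, levels = c(0, 1)),
  Age  = 42,
  Home = factor(0, levels = c(0, 1)),
  Inc  = 23500,
  Loan = 12000
)
p_hat <- predict(fit, newdata = new_person, type = "response")
print(p_hat)
```

With type = "response", predict() returns the estimated probability directly, so there is no need to back-transform the log odds by hand.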
[R] How to convert a factor column into a numeric one?
I have a data frame:

  > head(df)
    Time Temp Conc Repl    Log10
  1    0  -20    H    1 6.406547
  2    2  -20    H    1 5.738683
  3    7  -20    H    1 5.796394
  4   14  -20    H    1 4.413691
  5    0    4    H    1 6.406547
  7    7    4    H    1 5.705433
  > str(df)
  'data.frame': 177 obs. of 5 variables:
   $ Time : Factor w/ 4 levels "0","2","7","14": 1 2 3 4 1 3 4 1 3 4 ...
   $ Temp : Factor w/ 4 levels "-20","4","25",..: 1 1 1 1 2 2 2 3 3 3 ...
   $ Conc : Factor w/ 3 levels "H","L","M": 1 1 1 1 1 1 1 1 1 1 ...
   $ Repl : Factor w/ 5 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
   $ Log10: num 6.41 5.74 5.8 4.41 6.41 ...
  > levels(df$Temp)
  [1] "-20" "4"   "25"  "45"
  > levels(df$Time)
  [1] "0"  "2"  "7"  "14"

As you can see, Time and Temp are currently factors, not numeric. I would like to change these columns into numeric ones.

  df$Time <- as.numeric(df$Time)

doesn't work, as it changes the column to the factor level indices (1,2,3,4) instead of the values (0,2,7,14).

There must be a direct way of doing this in R. I tried recode() in 'car':

  > df$Temp <- recode(df$Temp, '1=-20;2=25;3=4;4=45', as.factor.result=FALSE)
  > head(df)
    Time Temp Conc Repl     Freq
  1    0  -20    H    1 6.406547
  2    2  -20    H    1 5.738683
  3    7  -20    H    1 5.796394
  4   14  -20    H    1 4.413691
  5    0   45    H    1 6.406547
  7    7   45    H    1 5.705433

but note that the values for 'Temp' in rows 5 and 7 are 45 and not the expected 4, although the result is numeric. The same happens if I use the order given by levels(df$Temp) instead of the sort order in the recode() 2nd argument.

Any hints?

Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: r...@lcfltd.com
Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
824 Timberlake Drive                     Tel: 757-467-0954
Virginia Beach, VA 23464-3239            Fax: 757-467-2947
Vere scire est per causas scire
Re: [R] How to convert a factor column into a numeric one?
Hi:

Try this:

  > dd <- data.frame(a = factor(rep(1:5, each = 4)),
  +                  b = factor(rep(rep(1:2, each = 2), 5)),
  +                  y = rnorm(20))
  > str(dd)
  'data.frame': 20 obs. of 3 variables:
   $ a: Factor w/ 5 levels "1","2","3","4",..: 1 1 1 1 2 2 2 2 3 3 ...
   $ b: Factor w/ 2 levels "1","2": 1 1 2 2 1 1 2 2 1 1 ...
   $ y: num 0.6396 1.467 1.8403 -0.0915 0.2711 ...
  > de <- within(dd, {
  +     a <- as.numeric(as.character(a))
  +     b <- as.numeric(as.character(b))
  + } )
  > str(de)
  'data.frame': 20 obs. of 3 variables:
   $ a: num 1 1 1 1 2 2 2 2 3 3 ...
   $ b: num 1 1 2 2 1 1 2 2 1 1 ...
   $ y: num 0.6396 1.467 1.8403 -0.0915 0.2711 ...

HTH,
Dennis

On Sat, Jun 4, 2011 at 9:31 PM, Robert A. LaBudde r...@lcfltd.com wrote:
[snip]
Re: [R] How to convert a factor column into a numeric one?
Dr. LaBudde,

Perhaps as.numeric(as.character(x)) is what you are looking for.

HTH,
Jorge

On Sun, Jun 5, 2011 at 12:31 AM, Robert A. LaBudde wrote:
[snip]
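The difference between the failing call and the suggested one can be seen in a short sketch. The data frame df_demo is a made-up stand-in for the first few rows of the data in the question, not the original data.

```r
# Hypothetical data frame mimicking the Time/Temp factors from the question
df_demo <- data.frame(
  Time = factor(c(0, 2, 7, 14, 0, 7)),
  Temp = factor(c(-20, -20, -20, -20, 4, 4), levels = c(-20, 4, 25, 45))
)

# Pitfall: as.numeric() on a factor returns the internal integer codes
codes <- as.numeric(df_demo$Time)
print(codes)

# Fix: convert to the character labels first, then to numeric
df_demo$Time <- as.numeric(as.character(df_demo$Time))
df_demo$Temp <- as.numeric(as.character(df_demo$Temp))
str(df_demo)
```

The first print shows level positions rather than the values 0, 2, 7 and 14, which is the behaviour described in the question. After the fix, both columns hold the original numbers.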
Re: [R] How to convert a factor column into a numeric one?
Hi Robert,

Try this:

  ## Example data converting mtcars to factors
  testdf <- as.data.frame(lapply(mtcars, factor))
  str(testdf)

  ## taking advantage of assignment methods to avoid an explicit call to as.data.frame
  ## convert factor to numeric using the technique recommended in ?factor
  testdf[] <- lapply(testdf, function(x) as.numeric(levels(x))[x])
  str(testdf)

If you do not want to convert all columns, just use a subset. Here is one way:

  testdf[, c("mpg", "cyl", "disp")] <- lapply(testdf[, c("mpg", "cyl", "disp")],
    function(x) as.numeric(levels(x))[x])

I would also look into *why* those numeric columns are being stored as factors in the first place. If you are reading the data in with read.table() or one of its wrapper functions (like read.csv), then it would be better to preempt the storage as a factor altogether rather than converting back to numeric. For example, perhaps something is being used to indicate missing data that R does not recognize (e.g., SAS uses "."). Specifying na.strings = "." would fix this. See ?read.table for some of the options available.

Hope this helps,

Josh

On Sat, Jun 4, 2011 at 9:31 PM, Robert A. LaBudde r...@lcfltd.com wrote:
[snip]
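The advice about read.table() can be sketched as follows. The inline text, the use of "." as a missing-value marker, and the helper name factor_to_numeric are assumptions made for illustration. The text = argument needs a reasonably recent version of R.

```r
# Hypothetical helper wrapping the idiom recommended in ?factor
factor_to_numeric <- function(f) as.numeric(levels(f))[f]

# Raw text standing in for a data file; "." marks a missing value (assumed)
raw <- "Time Temp Log10
0 -20 6.41
2 . 5.74
7 4 5.80"

# Declaring "." as NA lets read.table() keep Temp numeric from the start
clean <- read.table(text = raw, header = TRUE, na.strings = ".")
str(clean)

# If some columns are already factors, convert only those columns
clean$Time <- factor(clean$Time)
num_cols <- c("Time")
clean[num_cols] <- lapply(clean[num_cols], factor_to_numeric)
str(clean)
```

Indexing the numeric level vector by the factor itself converts each distinct level only once, which is why ?factor describes it as slightly more efficient than as.numeric(as.character(f)).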
Re: [R] How to convert a factor column into a numeric one?
Exactly! Thanks.

At 12:49 AM 6/5/2011, Jorge Ivan Velez wrote:
> Dr. LaBudde,
>
> Perhaps as.numeric(as.character(x)) is what you are looking for.
>
> HTH,
> Jorge
[snip]
Re: [R] How to convert a factor column into a numeric one?
Thanks for your help.

As far as your question below is concerned, the data frame arose as a result of some data cleaning on an original data frame, which was changed into a table, modified, and changed back to a data frame:

  ttcrmean <- as.table(by(ngbe[,'Log10'],
    list(Time=ngbe$Time, Temp=ngbe$Temp, Conc=ngbe$Conc, Repl=ngbe$Replicate), mean))
  for (k in 1:3) {  #fix-up time zeroes
    for (l in 1:5) {  #replicates
      t0val <- ttcrmean[1,3,k,l]
      for (j in 1:4) {  #temps
        ttcrmean[1,j,k,l] <- t0val
      } #j
    } #l
  } #k
  df <- na.omit(as.data.frame(ttcrmean))
  colnames(df)[5] <- 'Log10'

At 12:51 AM 6/5/2011, Joshua Wiley wrote:
> Hi Robert,
> [snip]
> I would also look into *why* those numeric columns are being stored as factors in the first place. If you are reading the data in with read.table() or one of its wrapper functions (like read.csv), then it would be better to preempt the storage as a factor altogether rather than converting back to numeric. For example, perhaps something is being used to indicate missing data that R does not recognize (e.g., SAS uses "."). Specifying na.strings = "." would fix this. See ?read.table for some of the options available.
> [snip]
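The round trip through a table explains the factors: as.data.frame() on a table turns the dimnames into factor columns by default. The sketch below uses a small made-up data set (ngbe_demo) and tapply() rather than the by() call from the message, so it illustrates the mechanism and is not a reproduction of the original workflow.

```r
# Hypothetical stand-in for the 'ngbe' data in the message
ngbe_demo <- expand.grid(Time = c(0, 2, 7, 14), Temp = c(-20, 4), Rep = 1:2)
ngbe_demo$Log10 <- seq_len(nrow(ngbe_demo)) / 10

# Cell means as a two-way table indexed by Time and Temp
means <- as.table(tapply(ngbe_demo$Log10,
                         list(Time = ngbe_demo$Time, Temp = ngbe_demo$Temp),
                         mean))

# Default: the dimnames come back as factors, and the value column is "Freq"
df_factor <- as.data.frame(means)
str(df_factor)

# responseName names the value column directly; stringsAsFactors = FALSE
# keeps the dimnames as character, so one as.numeric() recovers the values
df_num <- as.data.frame(means, responseName = "Log10",
                        stringsAsFactors = FALSE)
df_num$Time <- as.numeric(df_num$Time)
df_num$Temp <- as.numeric(df_num$Temp)
str(df_num)
```

Using responseName also avoids the separate colnames(df)[5] assignment, and would account for the "Freq" heading seen in the recode() output earlier in the thread.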
Re: [R] How to convert a factor column into a numeric one?
Thanks! Exactly what I wanted, and the same as Jorge also suggested.

At 12:49 AM 6/5/2011, Dennis Murphy wrote:
> Hi:
>
> Try this:
[snip]