[R] summary() after changing contrasts
Hi,

After running a regression on a factor variable, summary() reports the coefficients nicely, i.e., labelled with a string that is a concatenation of the variable name and the factor label. However, changing the base case via

contrasts(variable) <- contr.treatment(N, base = x)

results in the coefficients being reported as a less helpful concatenation of the variable name plus a digit. Of course, it's possible to map the digit to the appropriate factor label, but I'm wondering if there's an easy fix...

-Dan

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
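For anyone searching the archive: a sketch of the usual workaround (toy data made up here). relevel() changes the baseline level of the factor itself, so the default treatment contrasts keep their level labels and summary() still prints variable-plus-label coefficient names:

```r
set.seed(1)
f <- factor(sample(c("a", "b", "c"), 100, replace = TRUE))
y <- rnorm(100)

fit1 <- lm(y ~ f)                       # baseline "a"; coefficients fb, fc
fit2 <- lm(y ~ relevel(f, ref = "b"))   # baseline "b"; labels are preserved
coef(summary(fit2))
```

Another option along the same lines: contr.treatment() accepts a vector of level names instead of a count (e.g. contr.treatment(levels(f), base = 2)), in which case the contrast matrix keeps its dimnames and the labels survive.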
[R] RODBC with MySQL sees tables, but queries return zero rows
Hi All,

I'm using RODBC to tap into MySQL on a remote server. It appears that the connection is successful: I can see all tables and columns in my database. However, queries return zero rows, including queries I've verified as functional and non-empty by entering them directly in MySQL.

I granted myself all privileges to the database, as per http://www.actualtech.com/mysql_remote.php, by entering this into mysql:

GRANT ALL ON your_database_name.* TO your_user_id@'%' IDENTIFIED BY 'your_password';

I've opened ports on my server, as per http://forums.systeminetwork.com/isnetforums/showthread.php?t=42086

My setup:
mysql Ver 14.12 Distrib 5.0.77, for redhat-linux-gnu (x86_64) using readline 5.1
R 2.10.0 GUI 1.30 Leopard build 32-bit (5511)
Mac OS X 10.6.7
MySQL/PostgreSQL/SQLite driver purchased from http://www.actualtechnologies.com/products.php, as per the RODBC documentation

Any tips?

Thanks in advance,
Dan
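One thing worth trying (an assumption about your setup, but a commonly reported fix when an ODBC driver misreports row counts): tell RODBC not to trust the driver's count via believeNRows = FALSE and rows_at_time = 1. A minimal sketch, with a hypothetical DSN and credentials:

```r
library(RODBC)

# "my_mysql_dsn", the uid, and pwd are placeholders for your own settings.
ch <- odbcConnect("my_mysql_dsn", uid = "your_user_id", pwd = "your_password",
                  believeNRows = FALSE, rows_at_time = 1)

# A query known to be non-empty when run directly in mysql.
res <- sqlQuery(ch, "SELECT * FROM some_table LIMIT 5")
close(ch)
```

If the query then returns rows, the driver's row-count reporting was the culprit rather than privileges or ports.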
[R] symmetric ( square) contingency table from dataset of unordered pairs
Hi Everybody,

I have a data set in which each observation has a pair of students, with each kid id'd by a 4-digit number:

head(PAIRS)
  student1 student2
2     2213     2200
4     2198     2195
5     2199     2191
6     2229     2221
7     2247     2249
8     2250     2263

There is no significance to student1 vs. student2: they are just a pair, and the variable names could be flipped without loss of meaning.

I want a symmetric, square contingency table with entry(i,j) = number of times students i and j are paired together. BUT because some students appear only in student1 and others only in student2, table() produces rectangular, asymmetric tables; row.names != col.names. I can't figure out how to get R to treat the observations as unordered pairs.

Currently the student ids are numeric; is the solution to treat them as factors, and ensure that the set of levels for each factor is identical?

I have a kludgey hack - stack the PAIRS dataset on top of a reversed (student2=student1 and vice versa) version of itself, then use table(). But I'm wondering if there's a more elegant way.

Thanks,
Dan
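Your factor idea works: forcing both columns onto the common set of levels makes table() square, and adding the transpose then symmetrizes it, which is equivalent to the stacking hack without copying the data. A sketch with a few made-up pairs:

```r
PAIRS <- data.frame(student1 = c(2213, 2198, 2199),
                    student2 = c(2200, 2195, 2198))

# Common level set drawn from both columns.
ids <- sort(unique(c(PAIRS$student1, PAIRS$student2)))

tab <- table(factor(PAIRS$student1, levels = ids),
             factor(PAIRS$student2, levels = ids))

# entry (i, j) = number of times i and j were paired, in either order.
sym <- tab + t(tab)
```

The resulting sym has identical row and column names and sym[i, j] == sym[j, i] by construction.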
[R] Optimal Y=q cutoff after logistic regression
Hi,

I understand that dichotomization of the predicted probabilities after logistic regression is philosophically questionable, throwing out information, etc. But I want to do it anyway. I'd like to include the % of observations correctly classified as a measure of fit, because it's measured in units that non-statisticians can understand more easily than area under the ROC curve, Dxy, etc.

Am I right that there is an optimal Y=q probability cutoff, at which the True Positive Rate is high and the False Positive Rate is low? Visually, it would be the elbow in the ROC curve, right? My reasoning is that even if you had a near-perfect model, you could set a stupidly low (high) cutoff and have a higher false positive (negative) rate than would be optimal.

I know the standard default or starting point is Y=.5, but if my above reasoning is correct, there ought to be an optimal cutoff for a given model. Is there an easy way to determine that cutoff in R without writing my own script to iterate through possible breakpoints and calculating classification accuracy at each one?

Thanks in advance.
-Dan
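One ready-made route (a sketch on simulated data, assuming the pROC package is acceptable): coords() with x = "best" returns the cutoff maximizing Youden's J (sensitivity + specificity - 1), which is the usual formalization of the ROC "elbow":

```r
library(pROC)

# Simulated example so the code is self-contained.
set.seed(1)
x <- rnorm(200)
y <- rbinom(200, 1, plogis(1.5 * x))

fit <- glm(y ~ x, family = binomial)
p   <- predict(fit, type = "response")

r <- roc(y, p)
coords(r, x = "best", best.method = "youden")  # threshold + sens/spec at it
```

Note that "optimal" depends on the criterion: Youden's J weights sensitivity and specificity equally, which is not the same as maximizing raw % correctly classified when the classes are unbalanced.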
[R] cv.lm only bivariate; other options for prediction intervals
Hi Folks,

I have a pretty simple problem: after building a multivariate linear model, I need to report my 95% confidence interval for predictions based on future observations.

I tried doing K-fold cross validation using cv.lm() from the DAAG package, but it currently only uses the first independent variable in the provided linear model. No warnings or anything. There was a thread about this here about 3 years ago, but the person got no response.

Does anyone know of a working alternative?

Thanks,
Dan
[R] cv.lm() broken; cross validation vs. predict(interval=prediction)
repost because previous attempt was not plain text, sorry!

Hi Folks,

I have a pretty simple problem: after building a multivariate linear model, I need to report my 95% confidence interval for predictions based on future observations. I know that one option is to use predict(interval = "prediction"), but I'm curious about less parametric ways to get an estimate.

I tried doing K-fold cross validation using cv.lm() from the DAAG package, but it currently only uses the first independent variable in the provided linear model. No warnings or anything. There was a thread about this here about 3 years ago, but the person got no response.

Does anyone know of a working alternative?

Thanks,
Dan
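For what it's worth, one working alternative is cv.glm() from the boot package, which does handle multiple predictors; fitting the linear model with glm() (gaussian family by default, so identical to lm()) gives a K-fold cross-validated estimate of prediction error. A sketch on made-up data:

```r
library(boot)

set.seed(1)
d <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
d$y <- 1 + 2 * d$x1 - d$x2 + rnorm(100)

fit <- glm(y ~ x1 + x2, data = d)  # gaussian by default, same fit as lm()
cv  <- cv.glm(d, fit, K = 10)

cv$delta          # raw and adjusted CV estimates of mean squared error
sqrt(cv$delta[1]) # a rough out-of-sample prediction standard deviation
```

That gives a less parametric error estimate, though turning it into a full 95% prediction interval still requires a distributional assumption (or quantiles of the out-of-fold residuals).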
[R] is get() really what I want here?
# Let's say I have 5 objects, object_1, object_2, etc.
for (i in 1:5) {
  assign(paste("object_", i, sep = ""), i + 500)
}

# Now, for whatever reason, I don't know the names of the objects I've
# created, but I want to operate on them.
list <- ls(pattern = "^obj")

# Is get() best?
for (l in list) {
  cat("\n", l, "is", get(l), sep = " ")
}

Is get() the correct command to use in this situation? What if, rather than just an integer, object_1 etc. are large arrays - does that change the answer, for speed reasons?

Thanks in advance,
Dan
Re: [R] is get() really what I want here?
Hi Josh,

What I'm really trying to do is to refer to objects whose names I have stored in a vector. This example was arbitrary. I do a lot of looping through files in the working directory, or through objects in the namespace, and I'm confused about how best to call upon them from within a loop.

Thanks,
Dan

On Wed, Oct 20, 2010 at 4:46 PM, Joshua Wiley <jwiley.ps...@gmail.com> wrote:

Hi Daniel,

get() will work for any object, but cat() may not. cat() should work for arrays, but it will be messy even for relatively small ones. For example, run:

cat("Hello", array(1:100, dim = c(10, 10)), sep = " ")

What are you really trying to do? If you are just trying to figure out what random variables in your workspace you've assigned but do not know/forgot what they are, consider:

ls.str(pattern = "^obj")

as a better way to get their names and some useful summaries (including class and number of observations).

HTH,
Josh

On Tue, Oct 19, 2010 at 10:29 PM, Daniel Weitzenfeld <dweitzenf...@gmail.com> wrote:

# Let's say I have 5 objects, object_1, object_2, etc.
for (i in 1:5) {
  assign(paste("object_", i, sep = ""), i + 500)
}

# Now, for whatever reason, I don't know the names of the objects I've
# created, but I want to operate on them.
list <- ls(pattern = "^obj")

# Is get() best?
for (l in list) {
  cat("\n", l, "is", get(l), sep = " ")
}

Is get() the correct command to use in this situation? What if, rather than just an integer, object_1 etc. are large arrays - does that change the answer, for speed reasons?

Thanks in advance,
Dan

--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/
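For names stored in a vector, a tidier alternative to looping over get() is mget(), which fetches several objects by name at once and returns a named list; list operations then work the same whether the elements are scalars or large arrays. A minimal sketch:

```r
# Recreate the example objects.
for (i in 1:5) assign(paste("object_", i, sep = ""), i + 500)

# Fetch all of them in one call.
vals <- mget(ls(pattern = "^object_"))

vals$object_3         # 503
sapply(vals, length)  # apply any function across the collected objects
```

mget() does not copy any faster than repeated get() calls, but it keeps the names attached to the values, which usually makes the downstream loop unnecessary.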