[R] summary() after changing contrasts

2012-08-09 Thread Daniel Weitzenfeld
Hi,
After running a regression on a factor variable, summary() reports the
coefficients 'nicely', i.e., labelled with a string that is a concatenation
of the variable name and the factor label.
However, changing the base case à la

contrasts(variable) <- contr.treatment(N, base = x)

results in the coefficients being reported as a less-helpful concatenation
of variable name plus a digit.

Of course, it's possible to map the digit to the appropriate factor label,
but I'm wondering if there's an easy fix...
-Dan
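A common workaround sketch (toy data; variable names are made up for illustration): instead of assigning contr.treatment() directly, change the base level with relevel(), which keeps the factor's level labels, so summary() keeps the readable coefficient names.

```r
# relevel() changes the reference level but preserves level labels, so
# summary() still reports coefficients as variable name + factor label.
set.seed(1)
grp <- factor(rep(c("a", "b", "c"), each = 10))
y <- as.integer(grp) + rnorm(30)
grp2 <- relevel(grp, ref = "b")       # base case is now "b"
fit <- lm(y ~ grp2)
rownames(coef(summary(fit)))          # "(Intercept)" "grp2a" "grp2c"
```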


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] RODBC with MySQL sees tables, but queries return zero rows

2011-04-14 Thread Daniel Weitzenfeld
Hi All,
I'm using RODBC to tap into MySQL on a remote server.  It appears that
the connection is successful:  I can see all tables and columns in my
database.  However, queries return zero rows, including queries I've
verified as functional and non-empty by running them directly in
MySQL.

I granted myself all privileges to the database, as per
http://www.actualtech.com/mysql_remote.php, by entering this into
mysql:
GRANT ALL ON your_database_name.* TO your_user_id@'%' IDENTIFIED BY
'your_password';

I've opened ports on my server, as per
http://forums.systeminetwork.com/isnetforums/showthread.php?t=42086

My setup:
mysql  Ver 14.12 Distrib 5.0.77, for redhat-linux-gnu (x86_64) using
readline 5.1
R 2.10.0 GUI 1.30 Leopard build 32-bit (5511)
Mac OS X 10.6.7
MySQL/PostgreSQL/SQLite driver purchased from
http://www.actualtechnologies.com/products.php, as per the RODBC
documentation

Any tips?
Thanks in advance,
Dan
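A hypothetical sketch of how one might isolate the problem (the DSN, user, and table names here are placeholders, and this obviously needs a live MySQL DSN to run): if sqlTables() returns metadata but sqlQuery() returns zero rows, comparing against a trivial query that needs no table privileges can tell a privilege problem apart from a driver problem.

```r
# Placeholder DSN/credentials -- adjust to your ODBC setup.
library(RODBC)
ch <- odbcConnect("my_mysql_dsn", uid = "your_user_id", pwd = "your_password")
sqlTables(ch)                                    # metadata: reportedly works
sqlQuery(ch, "SELECT 1")                         # needs no table privileges
sqlQuery(ch, "SELECT COUNT(*) FROM some_table")  # the failing case
odbcClose(ch)
```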



[R] symmetric ( square) contingency table from dataset of unordered pairs

2011-03-24 Thread Daniel Weitzenfeld
Hi Everybody,
I have a data set in which each observation has a pair of students,
with each kid ID'd by a 4-digit number:

> head(PAIRS)
  student1 student2
2 2213 2200
4 2198 2195
5 2199 2191
6 2229 2221
7 2247 2249
8 2250 2263

There is no significance to student1 vs. student2:  they are just a
pair, and the variable names could be flipped without loss of
meaning.

I want a symmetric, square contingency table with entry(i,j) = number
of times students i and j are paired together.
BUT because some students appear only in student1 and others only in
student2, table() produces rectangular, asymmetric tables; row.names
!= col.names.  I can't figure out how to get R to treat the
observations as unordered pairs.

Currently the student ids are numeric; is the solution to treat them
as factors, and ensure that the set of levels for each factor is
identical?

I have a kludgey hack - stack the PAIRS dataset on top of a reversed
(student2=student1 and vice versa) version of itself, then use
table().  But I'm wondering if there's a more elegant way.

Thanks,
Dan
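For comparison, a sketch of one table()-based approach (toy rows standing in for PAIRS): force both columns onto the same factor levels so the table comes out square, then add its transpose. This assumes the two students in a pair are always distinct, so the diagonal is never double-counted.

```r
# Toy stand-in for the PAIRS data set
PAIRS <- data.frame(student1 = c(2213, 2198, 2200),
                    student2 = c(2200, 2195, 2213))
ids <- sort(unique(c(PAIRS$student1, PAIRS$student2)))    # shared level set
tab <- table(factor(PAIRS$student1, levels = ids),
             factor(PAIRS$student2, levels = ids))        # square, not yet symmetric
sym <- tab + t(tab)   # entry (i, j) = times i and j were paired, in either order
sym["2213", "2200"]   # 2: that pair appears once in each order above
```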



[R] Optimal Y=q cutoff after logistic regression

2011-02-13 Thread Daniel Weitzenfeld
Hi,

I understand that dichotomization of the predicted probabilities after
logistic regression is philosophically questionable, throwing out
information, etc.

But I want to do it anyway.  I'd like to include, as a measure of fit, the
% of observations correctly classified, because it's measured in units
that non-statisticians can understand more easily than area under the
ROC curve, Dxy, etc.

Am I right that there is an optimal Y=q probability cutoff, at which
the True Positive Rate is high and the False Positive Rate is low?
Visually, it would be the elbow in the ROC curve, right?
My reasoning is that even if you had a near-perfect model, you could
set a stupidly low (high) cutoff and have a higher false positive
(negative) rate than would be optimal.

I know the standard default or starting point is Y=.5, but if my
above reasoning is correct, there ought to be an optimal cutoff for a
given model.  Is there an easy way to determine that cutoff in R
without writing my own script to iterate through possible breakpoints
and calculating classification accuracy at each one?

Thanks in advance.
-Dan
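A minimal base-R sketch of the kind of search described above (Youden's J: choose the cutoff maximizing TPR minus FPR; the function name and toy data are illustrative, and the pROC package's coords(roc_obj, "best") automates essentially this):

```r
# Pick the cutoff q maximizing Youden's J = TPR - FPR over observed scores.
best_cutoff <- function(y, p) {
  cuts <- sort(unique(p))
  j <- sapply(cuts, function(q) {
    mean(p[y == 1] >= q) - mean(p[y == 0] >= q)   # TPR - FPR at cutoff q
  })
  cuts[which.max(j)]
}
y <- c(0, 0, 1, 1)
p <- c(0.1, 0.4, 0.6, 0.9)
best_cutoff(y, p)   # 0.6
```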




[R] cv.lm() broken; cross validation vs. predict(interval=prediction)

2010-10-22 Thread Daniel Weitzenfeld
[repost because previous attempt was not plain text, sorry!]

Hi Folks,
I have a pretty simple problem: after building a multivariate linear model,
I need to report my 95% confidence interval for predictions based on future
observations.

I know that one option is to use predict(interval = "prediction"), but
I'm curious about less parametric ways to get an estimate.

I tried doing K-fold cross validation using cv.lm() from the DAAG package,
but it currently only uses the first independent variable in the provided
linear model.  No warnings or anything.  There was a thread about this here
about 3 years ago but the person got no response.  Does anyone know of a
working alternative?

Thanks,
Dan
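In the meantime, a hand-rolled K-fold sketch on simulated data (data and formula are made up): the empirical quantiles of the out-of-fold residuals give a rough, less parametric 95% interval around future predictions.

```r
# Hand-rolled 5-fold cross-validation for a multivariate linear model.
set.seed(1)
df <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
df$y <- 1 + 2 * df$x1 - df$x2 + rnorm(100)
k <- 5
fold <- sample(rep(1:k, length.out = nrow(df)))   # random fold assignment
resid_cv <- unlist(lapply(1:k, function(i) {
  fit <- lm(y ~ x1 + x2, data = df[fold != i, ])  # fit on the other folds
  df$y[fold == i] - predict(fit, newdata = df[fold == i, ])
}))
quantile(resid_cv, c(0.025, 0.975))  # empirical 95% interval for prediction error
```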



[R] is get() really what I want here?

2010-10-19 Thread Daniel Weitzenfeld
# Let's say I have 5 objects, object_1, object_2, etc.
for (i in 1:5) {
  assign(paste("object_", i, sep = ""), i + 500)
}

# Now, for whatever reason, I don't know the names of the objects I've
# created, but I want to operate on them.
list <- ls(pattern = "^obj")

# Is get() best?
for (l in list) {
  cat("\n", l, "is", get(l), sep = " ")
}

Is get() the correct command to use in this situation?  What if rather than
just an integer, object_1 etc are large arrays - does that change the
answer, for speed reasons?

Thanks in advance,
Dan




Re: [R] is get() really what I want here?

2010-10-19 Thread Daniel Weitzenfeld
Hi Josh,
What I'm really trying to do is to refer to objects whose names I have
stored in a vector.  This example was arbitrary.
I do a lot of looping through files in the working directory, or through
objects in the namespace, and I'm confused about how best to call upon them
from within a loop.
Thanks,
Dan
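For the "names stored in a vector" case, a sketch of the usual alternative: mget() fetches several objects by name at once and returns them as a named list, which is easier to loop over than repeated get() calls (object names follow the earlier example).

```r
# Recreate the earlier objects, then fetch them all by name with mget()
for (i in 1:5) assign(paste("object_", i, sep = ""), i + 500)
objs <- mget(ls(pattern = "^object_"))
objs$object_3   # 503
names(objs)     # "object_1" "object_2" "object_3" "object_4" "object_5"
```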

On Wed, Oct 20, 2010 at 4:46 PM, Joshua Wiley jwiley.ps...@gmail.com wrote:

 Hi Daniel,

 get() will work for any object, but cat() may not.  cat() should work
 for arrays, but it will be messy even for relatively small ones.  For
 example, run:
 cat("Hello", array(1:100, dim = c(10, 10)), sep = " ")

 What are you really trying to do?  If you are just trying to figure
 out what random variables in your workspace you've assigned but do not
 know/forgot what they are, consider:

 ls.str(pattern = "^obj")

 as a better way to get their names and some useful summaries
 (including class and number of observations).

 HTH,

 Josh

 On Tue, Oct 19, 2010 at 10:29 PM, Daniel Weitzenfeld
 dweitzenf...@gmail.com wrote:
  # Let's say I have 5 objects, object_1, object_2, etc.
  for (i in 1:5) {
    assign(paste("object_", i, sep = ""), i + 500)
  }
 
  # Now, for whatever reason, I don't know the names of the objects I've
  # created, but I want to operate on them.
  list <- ls(pattern = "^obj")
 
  # Is get() best?
  for (l in list) {
    cat("\n", l, "is", get(l), sep = " ")
  }
 
  Is get() the correct command to use in this situation?  What if rather
 than
  just an integer, object_1 etc are large arrays - does that change the
  answer, for speed reasons?
 
  Thanks in advance,
  Dan
 
 



 --
 Joshua Wiley
 Ph.D. Student, Health Psychology
 University of California, Los Angeles
 http://www.joshuawiley.com/

