Re: [R] The variables combined in a table from other table and combination questions

2007-09-06 Thread Stephen Weigand
On 9/5/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> Dear All:
> I need to have some data frame objects.
> First aa object:
>pH  Formulation  time  Subject
> [1]1.2  F   0   1
> [2]7.4  S   1   2
> [3]MF   2   3
> [4] 3   4
> [5] ni
> Then, I need to produce 2*3(pH*formulation) different
> tables.  This table includes column of (pH,
> Formulation, time  S1  S2  S3 …..Si) and S1= subject
> 1, S2=subject 2 and so on.  For example: bb1 table
>pH  Formulation  time  S1  S2  S3….Si
> [1]1.2  F   0
> [2] 1
> [3] 2
> [4] 3
> [5] n
>
> For example: bb2 table
>pH  Formulation  time  S1  S2  S3….Si
> [1]1.2  S   0
> [2] 1
> [3] 2
> [4] 3
> [5] n
>
>
> Moreover, the values of pH and Formulation column are
> the combination questions.  The values of pH and
> Formulation column should be the combinations such as
> (1.2, F), (1.2, S), (1.2, MF), (7.4, F), (7.4, S),
> (7.4, MF)
> I am a beginner level in R and I have no idea how to
> do this. Could any one please help me.  Thanks a
> lot!!!
>
> Best regrards
> Hsin Ya Lee
>

I don't understand exactly what you want but perhaps start with this:

expand.grid(pH = c(1.2, 7.4), Formulation = c("F", "S", "MF"))

Hope this helps,

Stephen

-- 
Rochester, Minn. USA

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] "subscript out of bounds" Error in predict.naivebayes

2007-08-28 Thread Stephen Weigand
On 8/22/07, Polly He <[EMAIL PROTECTED]> wrote:
> I'm trying to fit a naive Bayes model and predict on a new data set using
> the functions naivebayes and predict (package = e1071).
>
> R version 2.5.1 on a Linux machine
>
> My data set looks like this. "class" is the response and k1 - k3 are the
> independent variables. All of them are factors. The response has 52 levels
> and k1 - k3 have 2-6 levels. I have about 9,300 independent variables but
> omit the long list here for simple demonstration. There are no missing
> values in the observations.
>
>class k1 k2 k3
>   1  0  0  1
>   8  0  0  0
>
> # model fitting, I also tried setting laplace=0 but didn't help
>  nbmodel <- naiveBayes(class~., data=train, laplace=1)
>
> # predict
>  nb.fit <- predict(nbmodel, x.test[,-1])
>
> First I had no trouble fitting the model. R also returned the predictions
> for some of my large data sets. But for some data sets, R can fit the model
> (no error message, nb.model$tables look ok). When I invoked the predict
> function, it kept giving me the following message:
>
> # my data set has 1 response variable and 9318 independent variables
> Error in FUN(1:9319[[4L]], ...) : subscript out of bounds
[...]

In my experience, some predict methods have trouble when
newdata does not have all levels of a factor. This seems
to be the case with predict.naiveBayes:

example(naiveBayes)
predict(model, subset(HouseVotes84, V1 == "n"))

gives

Error in object$tables[[v]] : subscript out of bounds

One workaround is to predict for a "bigger" data set
and retain a subset of the predictions.

Hope this helps,

Stephen


-- 
Rochester, Minn. USA

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Different fonts on different axes

2007-05-31 Thread Stephen Weigand
There's also this approach

plot(runif(10), ylab=list("Red, Bold?", col = "red", font = 2),
xlab="Black, standard?")


On 5/31/07, Greg Snow <[EMAIL PROTECTED]> wrote:
> Try this:
>
> > plot(runif(10), ylab="", xlab="Black, standard?")
> > mtext('Red, Bold', side=2, line=3, col='red', font=2)
>
> Hope this helps,
>
> --
> Gregory (Greg) L. Snow Ph.D.
> Statistical Data Center
> Intermountain Healthcare
> [EMAIL PROTECTED]
> (801) 408-8111
>
>
>
> > -Original Message-
> > From: [EMAIL PROTECTED]
> > [mailto:[EMAIL PROTECTED] On Behalf Of Martin
> > Henry H. Stevens
> > Sent: Thursday, May 31, 2007 11:00 AM
> > To: R-Help
> > Subject: [R] Different fonts on different axes
> >
> > Hi Folks,
> > How do I get red bold font on my y axis and black standard
> > font on my x axis?
> >
> > plot(runif(10), ylab="Red, Bold?", xlab="Black, standard?")
> >
> > Any pointers or examples would be great.
> > Thanks!
> > Hank
> >
> >
> > Dr. Hank Stevens, Assistant Professor
> > 338 Pearson Hall
> > Botany Department
> > Miami University
> > Oxford, OH 45056
> >
> > Office: (513) 529-4206
> > Lab: (513) 529-4262
> > FAX: (513) 529-4243
> > http://www.cas.muohio.edu/~stevenmh/
> > http://www.muohio.edu/ecology/
> > http://www.muohio.edu/botany/
> >
> > "E Pluribus Unum"
> >
> > __
> > R-help@stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


-- 
Rochester, Minn. USA

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] lme with corAR1 errors - can't find AR coefficient in output

2007-05-25 Thread Stephen Weigand
Millo,

On 5/24/07, Millo Giovanni <[EMAIL PROTECTED]> wrote:

> Dear List,
>
> I am using the output of a ML estimation on a random effects model with
> first-order autocorrelation to make a further conditional test. My model
> is much like this (which reproduces the method on the famous Grunfeld
> data, for the econometricians out there it is Table 5.2 in Baltagi):
>
> library(Ecdat)
> library(nlme)
> data(Grunfeld)
> mymod<-lme(inv~value+capital,data=Grunfeld,random=~1|firm,correlation=co
> rAR1(0,~year|firm))
>
> Embarrassing as it may be, I can find the autoregressive parameter
> ('Phi', if I get it right) in the printout of summary(mymod) but I am
> utterly unable to locate the corresponding element in the lme or
> summary.lme objects.
>
> Any help appreciated. This must be something stupid I'm overlooking,
> either in str(mymod) or in the help files, but it's a huge problem for
> me.
>
>

Try

coef(mymod$model$corStruct,
   unconstrained = FALSE)

Stephen
-- 
Rochester, Minn. USA

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Sum of specific column

2007-04-25 Thread Stephen Weigand
On 4/25/07, Spilak,Jacqueline [Edm] <[EMAIL PROTECTED]> wrote:
> I have a data set that I have imported (not sure if that makes a
> difference) and I would like to calculate the sum of only specific
> columns.  I have tried
> >colSums(dataset, by=list(dataset$col5), dims=1) and I get an error of
> unused arguments

The error message is helpful: there is no 'by' argument to colSums.
You'll just get column sums over all rows.

> I have also tried
> >aggregate(dataset, by=list(dataset$col5), sum) and I get the error that
> sum is not meaningful for factors.

Instead of giving aggregate the whole dataset, you can specify certain
columns via dataset[, c(1,5)] or dataset[, c("height", "weight")].

>
> I want to only calculate the sum for specific columns because some of
> the columns have words in them and I have not been able to find anything
> else that would help or why these errors are occuring.
> Jacquie
>

Good luck,

Stephen

-- 
Rochester, Minn. USA

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problems in programming a simple likelihood

2007-04-18 Thread Stephen Weigand
Deepankar,

Some general advice from a non-expert:

Write your likelihoods without a for loop. This is important because
the likelihood is evaluated multiple times in the maximization process
and you don't want to be looping looping looping ...

Always try multiple starting values

Sometimes it helps to try different optimization functions (e.g., optim)

Make sure your likelihood is correct. Check it against existing
software if possible

If necessary, simplify your model, to a single parameter even, and
build it up from there

Generate data under the model and see if your estimates are getting
close to the truth

Good luck,
Stephen
Rochester, Minn. USA

On 4/18/07, Deepankar Basu <[EMAIL PROTECTED]> wrote:
> As part of carrying out a complicated maximum likelihood estimation, I
> am trying to learn to program likelihoods in R. I started with a simple
> probit model but am unable to get the code to work. Any help or
> suggestions are most welcome. I give my code below:
>
> 
> mlogl <- function(mu, y, X) {
> n <- nrow(X)
> zeta <- X%*%mu
> llik <- 0
> for (i in 1:n) {
>   if (y[i]==1)
>llik <- llik + log(pnorm(zeta[i,], mean=0, sd=1))
>   else
>llik <- llik + log(1-pnorm(zeta[i,], mean=0, sd=1))
> }
> return(-llik)
> }
>
> women <- read.table("~/R/Examples/Women13.txt", header=TRUE)  # DATA
>
> # THE DATA SET CAN BE ACCESSED HERE
> # women <-
> read.table("http://wps.aw.com/wps/media/objects/2228/2281678/Data_Sets/ASCII/Women13.txt";,
>  header=TRUE)
> # I HAVE CHANGED THE NAMES OF THE VARIABLES
> # J is changed to "work"
> # M is changed to "mar"
> # S is changed to "school"
>
> attach(women)
>
> # THE VARIABLES OF USE ARE
> #   work: binary dependent variable
> #   mar: whether married or not
> #   school: years of schooling
>
> mu.start <- c(3, -1.5, 10)
> data <- cbind(1, mar, school)
> out <- nlm(mlogl, mu.start, y=work, X=data)
> cat("Results", "\n")
> out$estimate
>
> detach(women)
>
> *
>
> When I try to run the code, this is what I get:
>
> > source("probit.R")
> Results
> Warning messages:
> 1: NA/Inf replaced by maximum positive value
> 2: NA/Inf replaced by maximum positive value
> 3: NA/Inf replaced by maximum positive value
> 4: NA/Inf replaced by maximum positive value
>
> Thanks in advance.
> Deepankar
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] mac ghostscript help

2007-04-15 Thread Stephen Weigand
Erin,

Here's one way to try to find where Ghostscript is on your system:

R> system("which gs")

On my Mac it's in

/usr/local/bin/gs

So I would use

R> Sys.putenv(R_GSCMD="/usr/local/bin/gs")

Hope this helps,

Stephen
Rochester, Minn., USA

On 4/15/07, Erin Berryman <[EMAIL PROTECTED]> wrote:
> Hello R community,
>
> I am hoping to use a new package that I just installed called
> "grImport" to import ps images into R for further manipulation using
> either base graphics or grid.  I downloaded the most recent version
> of Ghostscript from http://www.cs.wisc.edu/~ghost/ that I could find
> (v.8.56) for Mac OSX. However, I am apparently quite ignorant about
> how the required Ghostscript software works.
> When I run PostScriptTrace after loading the grImport package, I get
> this response:
>
>  > PostScriptTrace("/Users/erinberryman/Documents/data.ps")
>
> Error in PostScriptTrace("/Users/erinberryman/Documents/data.ps") :
> status 256 in running command 'gs -q -dBATCH -dNOPAUSE -
> sDEVICE=pswrite -sOutputFile=/dev/null -sstdout=data.ps.xml
> capturedata.ps'
> ESP Ghostscript 7.07.1: Unrecoverable error, exit code 1
>
> I am confused by the mention of a Ghostscript 7.07.1, because that is
> not the version that I have installed on my computer. After running a
> search of the R archives, I began to think that maybe I needed to
> tell R where my Ghostscript is, so I began to look for it. Now here
> is where I feel I am missing some key concept, because I find a
> folder called "ghostscript-8.56" that contains many files and folders
> with more files, and I have no idea which file is THE ghostscript
> that grImport wants to use. I used the following code to try to
> direct R to the correct folder for Ghostscript:
>
>  > Sys.putenv(R_GSCMD="/Users/erinberryman/ghostscript-8.56")
>  > PostScriptTrace("/Users/erinberryman/Documents/data.ps")
>
> Error in PostScriptTrace("/Users/erinberryman/Documentsdata.ps") :
> status 32256 in running command '/Users/erinberryman/
> ghostscript-8.56 -q -dBATCH -dNOPAUSE -sDEVICE=pswrite -sOutputFile=/
> dev/null -sstdout=data.ps.xml capturedata.ps'
> /bin/sh: line 1: /Users/erinberryman/ghostscript-8.56: is a directory
>
> Right, it is a directory indeed, but I do not know which specific
> file to give, since there are so many of them in that Ghostscript
> install.
>
> Hopefully there is a solution that a non-computer whiz like me can
> handle?
>
> Thank you,
>
> Erin
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Putting 2 breaks on Y axis

2007-04-12 Thread Stephen Weigand
On 4/12/07, Marc Schwartz <[EMAIL PROTECTED]> wrote:
> On Thu, 2007-04-12 at 13:41 -0500, Inman, Brant A. M.D. wrote:
> > R plotting experts:
> >
> > I have a bivariate dataset composed of 300 (x,y) continuous datapoints.
> > 297 of these points are located within the y range of [0,10], while 2
> > are located at 20 and one at 55.  No coding errors, real outliers.
> >
> > When plotting these data with a scatterplot, I obviously have a problem.
> > If I plot the full dataset with ylim = c(0,55), then I cannot see the
> > structure in the data in the [0, 10] range.  If I truncate the y axis
> > with ylim = c(0,10), then I cannot see the 3 outliers.  If I break the y
> > axis from 10 to 20 (using plotrix functions), I still do not see the
> > data optimally because of the white space from y=20 to y=55.
> >
> > What I would like to do is break the y axis at 2 points, roughly 10-20
> > and 20-55. Is there a function that can break an axis in 2 places?
> >
> > Thanks in advance for any suggestions.
> >
> > Brant
>
>
> Brant,
>
> I am not a particular fan of broken axes (though others will disagree),
> much less two breaks.
>
> Presuming that your data might look something like this:
>
> http://www.itl.nist.gov/div898/handbook/eda/section3/scattera.htm
>
> A couple of thoughts:
>
> 1. Not being sure if your data range above actually includes 0, you may
> want to consider a log scaled axis, if not.
>
> 2. I might be tempted to use two plots:
>
>   A. A first a plot of the entire data set, showing the 3 outliers
>
>   B. A second plot of the 297 pairs with axes constrained to the
>  appropriate ranges to enable better visualization of the data
>  structure.
>
> If number 2 is more appropriate, you could also use par("mfcol") to set
> up side by side plots. See ?par.
>
> HTH,
>
> Marc Schwartz
>

I was thinking plot the data without the outliers and include a
smaller inscribed plot in a corner showing all the data (the "global
view"). But I couldn't figure out how to do this.

(I think legend() works very hard to do this type of thing.)

Stephen

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Random number from density()

2007-04-02 Thread Stephen Weigand
On 4/2/07, Arianne ALBERT <[EMAIL PROTECTED]> wrote:
> Hello,
>
> I'm writing some genetic simulations in R where I would like to place
> genes along a chromosome proportional to the density of markers in a
> given region. For example, a chromosome can be presented as a list of
> marker locations:
>
> Chr1<-c(0, 6.5, 17.5, 26.2, 30.5, 36.4, 44.8, 45.7, 47.8, 48.7, 49.2,
> 50.9, 52.9, 54.5, 56.5, 58.9, 61.2, 64.1)
>
> Where the numbers refer to the locations of markers along the
> chromosome. As you can see, there are a lot of markers around 50, but
> they are less dense elsewhere. I would like to take this information to
> randomly select a location on the chromosome to put a gene, where we are
> more likely to pick a location with high marker density (instead of
> using a uniform distribution between 0 and 64.1).
>
> So far I've used density(Chr1) to generate the probability density, but
> can this also be used to generate a random number? All the help I can
> find suggests getting the quantile function (the reciprocal of the
> integral of the pdf), however, it doesn't seem as though density()
> returns a pdf that can be manipulated in this way. Any suggestions?
>
> Thanks in advance,
>
> Arianne
>
This thread may be helpful:

http://tolstoy.newcastle.edu.au/R/e2/help/07/01/9139.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] what is the difference between survival analysis and logistic regression with a timing variable?

2007-03-27 Thread Stephen Weigand
On 3/27/07, zhongmiao wang <[EMAIL PROTECTED]> wrote:
> Hello:
>
> If the question is how likely an event will occur at a give time point, can
> we use logistic regression with time t as a predictor variable? For example,
> if the data is
> ID   Gender  Tenure Churn
> 1   M17 0
> 2   M3   1
> 3   M6   0
> 4   F 10 1
> 5   F 9   0
> 6   F 20 1
>
> We want to predict the likelihood that an insurance policy holder will churn
> at a given tenure, can we build the model as:
>
> logit (churn=1)=b0+b1*Gender+b2*tenure?
>
> or we have to use survival analysis for discrete time? Thank you.
>
> Best Regards
> Zhongmiao Wang
> Senior Analyst
> RMG Connect
>
>

I'd guess you've got censored data so survival analysis
is more appropriate.

If you've followed a customer for only four months
and she hasn't churned (switched companies?) yet,
you only know her time to churning is > 4 months,
but not exactly how long it is. So you need survival
(time to event) methods that account for this partial
information.

Hope this helps,

Stephen
Rochester, MN

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.