Re: [R] What solve() does?

2010-09-04 Thread Paul Johnson
On Wed, Sep 1, 2010 at 5:36 AM, Petar Milin pmi...@ff.uns.ac.rs wrote:
 Hello!
 Can anyone explain to me what the solve() function does: Gaussian elimination or
 an iterative, numeric solve? In addition, I would need both the Gaussian
 elimination and the iterative solution for the course. Are the two built into R?

 Thanks!

 PM

Hello, Petar:

I think you are assuming that solve() uses an elementary paper-and-pencil
linear algebra procedure, but I don't think it does.  In a digital
computer, those procedures are not numerically precise, and I believe the
folks here would even say you shouldn't use solve() to compute an inverse,
but I can't remember all of the details.

To see how solve works ...

Let me show you a trick I just learned. Read

?solve

notice it is a generic function, meaning it does not actually do the
calculations itself. Rather, there are specific implementations (methods)
for different classes of input. To find the implementations, run

methods(solve)

I get:

 methods(solve)
[1] solve.default solve.qr

Then if you want to read HOW solve does what it does (which I think
was your question), run this:

 solve.default

or

 solve.qr

In that code, you will see the chosen procedure depends on the linear
algebra libraries you make available.  I'm no expert on the details,
but it appears QR decomposition is the preferred method.  You can read
about that online or in numerical algebra books.
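For what it's worth, here is a small sketch (my own addition, not from the original thread; the 3x3 system is invented purely for illustration) checking that the default method and an explicit QR route agree numerically:

```r
## Compare solve()'s default answer with an explicit QR-based solution.
set.seed(42)
A <- matrix(rnorm(9), nrow = 3)
b <- rnorm(3)

x1 <- solve(A, b)       # default method (LAPACK routines)
x2 <- qr.solve(A, b)    # explicitly via the QR decomposition

all.equal(x1, x2)       # the two agree to numerical precision
```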



-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Please explain do.call in this context, or critique to stack this list faster

2010-09-04 Thread Paul Johnson
I've been doing some consulting with students who seem to come to R
from SAS.  They are usually preoccupied with DO loops, and it is tough
to persuade them to trust R lists rather than keeping hundreds of named
matrices floating around.

Often it happens that there is a list with lots of matrices or data
frames in it, and we need to stack those together.  I thought it
would be a simple thing, but it turns out there are several ways to
get it done; in this case, the most elegant way, using do.call, is
not the fastest, but it does appear to be the least prone to
programmer error.

I have been staring at ?do.call for quite a while and I have to admit
that I just need some more explanations in order to interpret it.  I
can't really get why this does work

do.call( rbind, mylist)

but it does not work to do

sapply ( mylist, rbind).
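Here is a toy illustration (my own, not from the original post) of why the two behave differently:

```r
## Two tiny data frames to stack.
mylist <- list(data.frame(x = 1:2), data.frame(x = 3:4))

## do.call builds ONE call: rbind(mylist[[1]], mylist[[2]]).
stacked <- do.call(rbind, mylist)
nrow(stacked)        # 4 rows: a single stacked data frame

## sapply/lapply instead call rbind once PER element, i.e.
## rbind(mylist[[1]]) then rbind(mylist[[2]]); each is a no-op,
## so the result is just another list of the same data frames.
unstacked <- lapply(mylist, rbind)
length(unstacked)    # still 2 separate data frames
```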

Anyway, here's the self contained working example that compares the
speed of various approaches.  If you send yet more ways to do this, I
will add them on and then post the result to my Working Example
collection.

## stackMerge.R
## Paul Johnson pauljohn at ku.edu
## 2010-09-02


## rbind is neat,but how to do it to a lot of
## data frames?

## Here is a test case

df1 <- data.frame(x=rnorm(100), y=rnorm(100))
df2 <- data.frame(x=rnorm(100), y=rnorm(100))
df3 <- data.frame(x=rnorm(100), y=rnorm(100))
df4 <- data.frame(x=rnorm(100), y=rnorm(100))

mylist <- list(df1, df2, df3, df4)

## Usually we have done a stupid
## loop  to get this done

resultDF <- mylist[[1]]
for (i in 2:4) resultDF <- rbind(resultDF, mylist[[i]])

## My intuition was that this should work:
## lapply( mylist, rbind )
## but no! It just makes a new list

## This obliterates the columns
## unlist( mylist )

## I got this idea from code in the
## complete function in the mice package
## It uses brute force to allocate a big matrix of 0's and
## then it places the individual data frames into that matrix.

m <- 4
nr <- nrow(df1)
nc <- ncol(df1)
dataComplete <- as.data.frame(matrix(0, nrow = nr*m, ncol = nc))
for (j in 1:m) dataComplete[(((j-1)*nr) + 1):(j*nr), ] <- mylist[[j]]



## I searched a long time for an answer that looked better.
## This website is helpful:
## http://stackoverflow.com/questions/tagged/r
## I started to type in the question and 3 plausible answers
## popped up before I could finish.

## The terse answer is:
shortAnswer <- do.call(rbind, mylist)

## That's the right answer, see:

shortAnswer == dataComplete
## But I don't understand why it works.

## More importantly, I don't know if it is fastest, or best.
## It is certainly less error prone than dataComplete

## First, make a bigger test case and use system.time to evaluate

phony <- function(i){
  data.frame(w=rnorm(1000), x=rnorm(1000), y=rnorm(1000), z=rnorm(1000))
}
mylist <- lapply(1:1000, phony)


### First, try the terse way
system.time( shortAnswer <- do.call(rbind, mylist) )


### Second, try the complete way:
m <- 1000
nr <- nrow(mylist[[1]])  ## note: nrow(df1) would be wrong here; the new list elements have 1000 rows
nc <- ncol(mylist[[1]])

system.time(
   dataComplete <- as.data.frame(matrix(0, nrow = nr*m, ncol = nc))
 )

system.time(
   for (j in  1:m) dataComplete[(((j-1)*nr) + 1):(j*nr), ] <- mylist[[j]]
)


## On my Thinkpad T62 dual core, the shortAnswer approach takes about
## three times as long:


##  system.time( bestAnswer <- do.call(rbind, mylist) )
##user  system elapsed
##  14.270   1.170  15.433

##  system.time(
## +dataComplete <- as.data.frame(matrix(0, nrow = nr*m, ncol = nc))
## +  )
##user  system elapsed
##   0.000   0.000   0.006

##  system.time(
## + for (j in  1:m) dataComplete[(((j-1)*nr) + 1):(j*nr), ] <- mylist[[j]]
## + )
##user  system elapsed
##   4.940   0.050   4.989


## That makes the do.call way look slow, and I said hey,
## our stupid for loop at the beginning may not be so bad.
## Wrong. It is a disaster.  Check this out:


##  resultDF <- phony(1)
##  system.time(
## + for (i in 2:1000) resultDF <- rbind(resultDF, mylist[[i]])
## +)
##user  system elapsed
## 159.740   4.150 163.996


-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



Re: [R] Save data as .pdf or .JPG

2010-09-04 Thread Paul Johnson
On Wed, Sep 1, 2010 at 7:56 AM, khush  bioinfo.kh...@gmail.com wrote:
 Hi all ,

 I have following script to plot some data.

 plot( c(1,1100), c(0,15), type='n', xlab='', ylab='', ylim=c(0.1,25) ,
 las=2)
 axis (1, at = seq(0,1100,50), las =2)
 axis (2, at = seq(0,25,1), las =2)

 When I source("script.R"), I get the image on screen, but I do not want to
 use a screenshot to save the image. How can I save the output to .pdf or
 .jpg format?

 Thank you
 Khushwant

Hi!  This is one of the things that is difficult for newcomers. I've
written down a pretty thorough answer:

http://pj.freefaculty.org/R/Rtips.html#5.2


pj
-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



Re: [R] coxph and ordinal variables?

2010-09-08 Thread Paul Johnson
Run it with factor() instead of ordered().  You don't want the
orthogonal polynomial contrasts that result from ordered() if you need
to compare against Stata.

I attach an R program that I wrote to explore ordered factors a while
ago.  I believe this will clear everything up if you study the
examples.

pj

On Wed, Sep 8, 2010 at 5:43 PM, Min-Han Tan minhan.scie...@gmail.com wrote:
 Dear R-help members,

 Apologies - I am posting on behalf of a colleague, who is a little puzzled
 as STATA and R seem to be yielding different survival estimates for the same
 dataset when treating a variable as ordinal.  (ordered() is used to represent
 an ordinal variable.)  I understand that R's coxph (by default) uses the Efron
 approximation, whereas STATA uses (by default) the Breslow, but we did
 compare using the same approximations. I am wondering if this is a result of
 how coxph manages an ordered factor?

 Essentially, this is a survival dataset using tumor grade (1, 2, 3 and 4) as
 the risk factor. This is more of an 'ordinal' variable, rather than a
 continuous variable. For the same data set of 399 patients, when treating
 the vector of tumor grade as a continuous variable (range of 1 to 4),
 testing the Efron and the Breslow approximations yield the same result in
 both R and STATA.

 However, when Hist_Grade_4grp is converted into an ordered factor using
 ordered(), and the same scripts are applied, rather different results are
 obtained, relative to the STATA output. This is tested across the different
 approximations, with consistent results. The comparison using the Efron
 approximation and ordinal data is below.

 Your advice is very much appreciated!

 Min-Han

 Apologies below for the slightly malaligned output.

 STATA output

 . xi:stcox i.Hist_Grade_4grp, efr
 i.Hist_Grade_~p   _IHist_Grad_1-4     (naturally coded; _IHist_Grad_1
 omitted)

        failure _d:  FFR_censor
  analysis time _t:  FFR_month

 Iteration 0:   log likelihood =  -1133.369
 Iteration 1:   log likelihood = -1129.4686
 Iteration 2:   log likelihood = -1129.3196
 Iteration 3:   log likelihood = -1129.3191
 Refining estimates:
 Iteration 0:   log likelihood = -1129.3191

 Cox regression -- Efron method for ties

 No. of subjects =          399                     Number of obs   =
 399
 No. of failures =          218
 Time at risk    =  9004.484606
                                                  LR chi2(3)      =
  8.10
 Log likelihood  =   -1129.3191                     Prob > chi2     =
  0.0440

 --
         _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf.
 Interval]
 -+
 _IHist_Gra~2 |   1.408166   .3166876     1.52   0.128     .9062001
  2.188183
 _IHist_Gra~3 |    1.69506   .3886792     2.30   0.021     1.081443
  2.656847
 _IHist_Gra~4 |   2.540278   .9997843     2.37   0.018      1.17455
 5.49403



 R Output using
 summary( coxph( Surv(FFR_month, FFR_censor) ~ Hist_Grade_4grp,
 method=c("breslow")))
 summary( coxph( Surv(FFR_month, FFR_censor) ~ Hist_Grade_4grp,
 method=c("exact")))
 summary( coxph( Surv(FFR_month, FFR_censor) ~ Hist_Grade_4grp,
 method=c("efron")))



  n=399 (21 observations deleted due to missingness)

                     coef exp(coef) se(coef)     z Pr(>|z|)
 Hist_Grade_4grp.L 0.66685   1.94809  0.26644 2.503   0.0123 *
 Hist_Grade_4grp.Q 0.03113   1.03162  0.20842 0.149   0.8813
 Hist_Grade_4grp.C 0.08407   1.08771  0.13233 0.635   0.5252
 ---
 Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

                 exp(coef) exp(-coef) lower .95 upper .95
 Hist_Grade_4grp.L     1.948     0.5133    1.1556     3.284
 Hist_Grade_4grp.Q     1.032     0.9693    0.6857     1.552
 Hist_Grade_4grp.C     1.088     0.9194    0.8392     1.410

 Rsquare= 0.02   (max possible= 0.997 )
 Likelihood ratio test= 8.1  on 3 df,   p=0.044
 Wald test            = 8.02  on 3 df,   p=0.0455
 Score (logrank) test = 8.2  on 3 df,   p=0.04202








-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas


[R] Fast / dependable way to stack together data frames from a list

2010-09-08 Thread Paul Johnson
Hi, everybody:

I asked about this in r-help last week and promised
a summary of answers. Special thanks to the folks
that helped me understand do.call and pointed me
toward plyr.

We face this problem all the time. A procedure
generates a list of data frames. How to stack them
together?

The short answer is that the plyr package's rbind.fill
method is probably the fastest method that is not
prone to trouble and does not require much user caution.

result <- rbind.fill(mylist)

A slower alternative that also works is

result <- do.call(rbind, mylist)

That is always available in R and it works well enough, even
though it is not quite as fast. Both of these are much faster than
a loop that repeatedly applies rbind.

Truly blazing speed can be found if we convert this into
matrices, but that is not possible if the list actually
contains data frames.

I've run this quite a few times, and the relative speed of the
different approaches has never differed much.

If you run this, I hope you will feel smarter, as I do!
:)


## stackListItems.R
## Paul Johnson pauljohn at ku.edu
## 2010-09-07

## Here is a test case

df1 <- data.frame(x=rnorm(100), y=rnorm(100))
df2 <- data.frame(x=rnorm(100), y=rnorm(100))
df3 <- data.frame(x=rnorm(100), y=rnorm(100))
df4 <- data.frame(x=rnorm(100), y=rnorm(100))

mylist <- list(df1, df2, df3, df4)

## Here's the way we have done it. We understand this,
## we believe the result, it is easy to remember. It is
## also horribly slow for a long list.

resultDF <- mylist[[1]]
for (i in 2:4) resultDF <- rbind(resultDF, mylist[[i]])


## It works better to just call rbind once, as in:

resultDF2 <- rbind(mylist[[1]], mylist[[2]], mylist[[3]], mylist[[4]])


## That is faster because it calls rbind only once.

## But who wants to do all of that typing? How tiresome.
## Thanks to Erik Iverson in r-help, I understand that

resultDF3 <- do.call(rbind, mylist)

## is doing the EXACT same thing.
## Erik explained that do.call( rbind, mylist)
## is *constructing* a function call from the list of arguments.
## It is shorthand for rbind(mylist[[1]], mylist[[2]], mylist[[3]])
## assuming mylist has 3 elements.

## Check the result:
all.equal( resultDF2, resultDF3)

## You often see people claim it is fast to allocate all
## of the required space in one shot and then fill it in.
## I got this algorithm from code in the
## complete function in the mice package.
## It allocates a big matrix of 0's and
## then it places the individual data frames into that matrix.

m <- 4
nr <- nrow(df1)
nc <- ncol(df1)
resultDF4 <- as.data.frame(matrix(0, nrow = nr*m, ncol = nc))
for (j in 1:m) resultDF4[(((j-1)*nr) + 1):(j*nr), ] <- mylist[[j]]

## This is a bit error prone for my taste. If the data frames have
## different numbers of rows, some major code surgery will be needed.

##
## Dennis Murphy pointed out the plyr package, by Hadley Wickham.
## Dennis said "ldply() in the plyr package. The following is the same
## idea as do.call(rbind, l), only faster."

library(plyr)
resultDF5 <- ldply(mylist, rbind)
all.equal(resultDF, resultDF5)



## Plyr author Hadley Wickham followed up with "I think all you want
## here is rbind.fill":

resultDF6 <- rbind.fill(mylist)
all.equal(resultDF, resultDF6)


## Gabor Grothendieck noted that if the elements in mylist were
## matrices, this would all work faster.

mylist2 <- lapply(mylist, as.matrix)

matrixDoCall <- do.call(rbind, mylist2)

all.equal(as.data.frame(matrixDoCall), resultDF)


## Gabor also showed a better way than 'system.time' to find out how
## long this takes on average using the rbenchmark package. Awesome!

# library(rbenchmark)
# benchmark(
#+ df = do.call(rbind, mylist),
#+ mat = do.call(rbind, mylist2),
#+ order = "relative", replications = 250
#+ )



## To see the potentially HUGE impact of these changes, we need to
## make a bigger test case. I just used system.time to evaluate, but
## if this involved a close call, I'd use rbenchmark.

phony <- function(i){
  data.frame(w=rnorm(1000), x=rnorm(1000), y=rnorm(1000), z=rnorm(1000))
}
mylist <- lapply(1:1000, phony)




### First, try my usual way
resultDF <- mylist[[1]]
system.time(
for (i in 2:1000) resultDF <- rbind(resultDF, mylist[[i]])
   )
## wow, that's slow:
## user  system elapsed
## 168.040   4.770 173.028


### Now do.call method:
system.time( resultDF3 <- do.call(rbind, mylist) )
all.equal(resultDF, resultDF3)

## Faster! Takes one-twelfth as long
##   user  system elapsed
##  14.640.85   15.49


### Third, my adaptation of the complete function in the mice
### package:
m <- length(mylist)
nr <- nrow(mylist[[1]])
nc <- ncol(mylist[[1]])

system.time(
   resultDF4 <- as.data.frame(matrix(0, nrow = nr*m, ncol = nc))
 )

colnames(resultDF4) <- colnames(mylist[[1]])
system.time(
   for (j in  1:m) resultDF4[(((j-1)*nr) + 1):(j*nr), ] <- mylist[[j]]
)

all.equal(resultDF, resultDF4)
##Disappointingly slow on the big case:
#   user  system elapsed
# 80.400   3.970  84.573


### That took much longer than I expected, Gabor's
### hint

Re: [R] coxph and ordinal variables?

2010-09-10 Thread Paul Johnson
Hi, everybody

On Wed, Sep 8, 2010 at 5:43 PM, Min-Han Tan minhan.scie...@gmail.com wrote:


David said my R code text attachment got rejected by the mailing list.

Pooh.   I don't think that's nice.  I don't see anything in the
posting guide about a limit on text attachments.

Well, if you are still trying to understand what 'orthogonal
polynomial' means, I suggest you run the following through R.  I found
it an enlightening exercise.

# Paul Johnson paulj...@ku.edu Nov. 16, 2005

# Ordinal predictors with a small number of possible values

# Here is R code and commentary about ordinal predictors.
# Please let me know if you have insights to explain this more clearly.

set.seed(19)

sampleSize <- 100
subgroupSize <- sampleSize/5
# One of those 5 point opinion questions: Strongly Disagree ...
# Strongly Agree, with values assigned 1,3,5,7,9
surveyQuestion <-
c(rep(1,subgroupSize), rep(3,subgroupSize), rep(5,subgroupSize), rep(7,subgroupSize), rep(9,subgroupSize))

### Make this an unordered factor
myufac <- factor(surveyQuestion)
### Study the contrasts:
contrasts(myufac)

### Make an ordered factor
myofac <- ordered(surveyQuestion)
contrasts(myofac)

# another independent variable
x <- rnorm(sampleSize)
# a random error
e <- rnorm(sampleSize)


# First, suppose the output data is really just a
# linear reflection of the surveyQuestion. It is created
# by multiplying against the evenly spaced values
# observed in the survey

y1 <- -5 + 4*surveyQuestion + 6*x + 10*e


mod0 <- lm(y1 ~ x + surveyQuestion)
summary(mod0)

# Question: are you making a mistake by just throwing
# surveyQuestion into the analysis? You should not be
# making a mistake, because you (luckily) guessed the right model

# You might check by running the model with the unordered factor,
# which amounts to running dummy variables

mod1 <- lm(y1 ~ x + myufac)
summary(mod1)

# or with the ordered factor, which estimates the orthogonal
# polynomials

mod2 <- lm(y1 ~ x + myofac)
summary(mod2)

# Does either of those model appear to be better than
# the original mod0?

# If I did not know for sure the factor was ordered, I'd
# be stuck with the treatment contrasts in mod1.  And what
# I would really like to know is this: are the coefficients
# in mod1 stepping up in a clear, ordinal, evenly spaced pattern?

# Since we know the data is created with a coefficient of 4
# we would expect that the coefficients would step up like this
# myufac3   8
# myufac5   16
# myufac7   24
# myufac9   32

# See why we expect this pattern? The intercept gobbles up myufac1,
# so each impact of the surveyQuestion has to be reduced by 4 units.
# The impact of myufac3, which you expect might be 3*4=12, is reduced
# to 3*4 - 4 = 8, and so forth.

# But that does not happen with a smallish sample. You can run this
# code a few times. It appears to me that a sample of more than
# 10,000 is necessary to get any faithful reproduction of the steps
# between values of surveyQuestion.

# Question: would we mistakenly think that the uneveness observed in
# summary(mod1) is evidence of the need to treat surveyQuestion as a
# nominal factor, even though we know (since we created the data) that
# we might as well just throw surveyQuestion in as a single variable?
# How to decide?

# We need a hypothesis test of the conjecture that
# the coefficient estimates in mod1 fall along a line,
# i.e., myufac-i = (b2 * i) - b2

# I believe that the correct test results from this command:

anova(mod0, mod1, test="Chisq")

# If that is significant, it means you are losing predictive
# power by throwing in surveyQuestion as if it were a numerical
# variable.



# Now, what if we are sure that the data gathered in surveyQuestion is
# really ordinal, and so we estimate mod2.

# What do those parameters mean? Here I'm just reasoning/guessing
# because I cannot find any complete idiot's guide to orthogonal
# polynomials and their use in regression and/or R.

# Take a look at the contrasts themselves
#  ord1 <- contrasts(myofac)
#  ord1
#              .L         .Q            .C         ^4
# 1 -6.324555e-01  0.5345225 -3.162278e-01  0.1195229
# 3 -3.162278e-01 -0.2672612  6.324555e-01 -0.4780914
# 5 -3.287978e-17 -0.5345225  1.595204e-16  0.7171372
# 7  3.162278e-01 -0.2672612 -6.324555e-01 -0.4780914
# 9  6.324555e-01  0.5345225  3.162278e-01  0.1195229

# What does this mean?  I believe: we act as though there are 4
# independent variables in the regression, L, Q, C, and ^4, and for
# each respondent in the dataset, we choose a row of these numbers
# to act as independent variables.  A person who
# answered 1 on the survey would have the input values (-.63, -.53,
# -.31, 0.11) for those four variables.
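# These rows can be reproduced directly with contr.poly(); a short
# sketch (my own addition, not in the original post):

```r
## Orthogonal polynomial contrasts for a 5-level factor.
cp <- contr.poly(5)
cp                        # columns .L, .Q, .C, ^4, matching the table above

## The columns are orthonormal: crossprod() gives the identity matrix.
round(crossprod(cp), 10)
```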

# This begs the question, what are L, Q, C, and ^4?

### This is tough to explain.

# Background: Recall from calculus that any function can be
# approximated by a polynomial.  Since surveyQuestion has only 5
# possible values, a polynomial of degree 4 is needed to replace it
# in a regression model. It can

Re: [R] Problems with pdf device using plot glht function on multcomp library.

2010-09-18 Thread Paul Johnson
Hi, Kenneth

It is not clear if you mean that your pdf output usually works but
does not in this special case, or that this is a first effort with
pdf().  The answer might depend on which is the case.

If you are just getting started, can I refer you to some lecture notes
I made about saving graphs in R?

http://pj.freefaculty.org/R/SummerCamp2010/PJ-Lectures/plotting2.pdf

If you cut off the file name at the end,  you will see the list of the
other lectures I prepared for our university's Summer Stats Camp in
2010.

If pdf does usually work, Could I suggest you take this code and
rearrange? Where you have this:

pdf("plot1.pdf")
m1 <- aov(Nitratos ~ Descripcion - 1, data=Sx)
vect1 <- table(Sx$Descripcion)
K <- contrMat(vect1, base=4)
dnk <- glht(m1, linfct=K)
summary(dnk)

old.par <- par(no.readonly = TRUE)
par(mai=c(1,2,1.25,1), mgp=c(3,1,0))
print(plot(dnk, las=1, xlab=""))
print(abline(v=0, lty=2))
par(old.par)

dev.off()



Instead, separate the stats part from the plot part


m1 <- aov(Nitratos ~ Descripcion - 1, data=Sx)
vect1 <- table(Sx$Descripcion)
K <- contrMat(vect1, base=4)
dnk <- glht(m1, linfct=K)
summary(dnk)

pdf("plot1.pdf", onefile=FALSE, paper="special", height=6, width=6)
### old.par <- par(no.readonly = TRUE)
par(mai = c(1, 2, 1.25, 1), mgp = c(3,1,0))
plot(dnk, las = 1, xlab = "")
abline(v = 0, lty = 2)
### par(old.par)
dev.off()

I've made changes to cut your print statements and moved your pdf
command down to sandwich the plot commands.  I've added some
embellishments which I almost guarantee will make your pdf output more
useful.  (Spaces are inserted around <- and after commas for
readability.  Check the R examples; they generally recommend that.)

To the R experts, it may be that the par stuff seems obvious, but to
the rest of us, well, it is not completely so.  You do not need to
save the par values because the par setting you change is instantly
forgotten when you do dev.off().   So you don't need to worry about
saving the default settings and then returning them.  Run par() to see
for yourself after dev.off().  par's defaults will be back where they
were at the beginning.
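A quick way to convince yourself (my own throwaway sketch):

```r
pdf("throwaway.pdf")        # open a pdf device
par(mai = c(2, 2, 2, 2))    # change a graphics setting inside it
dev.off()                   # close the device

par("mai")                  # back to the session default:
                            # the device-specific setting is gone
```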

You would only need the par-saving business if you were trying
to put several graphs into a single output device, which you aren't.
And I've made that clear by putting onefile=FALSE in the pdf statement.

I'd try it without the par at all if there is still trouble in the
output file.

I added the paper="special" option in the pdf command in order to make
the margins in the output more reasonable. If you don't do that, the
pdf output assumes you want a whole sheet of paper, so there
will be some massive margins in your output.

Good luck.

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



Re: [R] How to find the interception point of two linear fitted model in R?

2009-10-21 Thread Paul Johnson
On Wed, Oct 21, 2009 at 12:09 PM, FMH kagba2...@yahoo.com wrote:
 Dear All,

 Let have 10 pair of observations, as shown below.

 ##
 x <- 1:10
 y <- c(1,3,2,4,5,10,13,15,19,22)
 plot(x,y)
 ##


 Two fitted  models, with ranges of [1,5] and [5,10], can be easily fitted 
 separately by lm function as shown below:

 ###
 mod1 <- lm(y[1:5] ~ x[1:5])
 mod2 <- lm(y[5:10] ~ x[5:10])
 ###

 I'm looking for the intersection point of these two straight lines from
 the fitted models, mod1 and mod2.

 Could someone advise me on the way to do it?

 Thank you
 Fir


extract the coefficients, rearrange in a linear system, use solve.


 m1 <- as.matrix(rbind(coef(mod1), coef(mod2)))
 a <- cbind(c(1,1), -m1[, 2])
 a
     [,1]      [,2]
[1,]    1 -0.900000
[2,]    1 -3.257143
 b <- m1[,1]
 solve(a=a, b=b)
[1] 4.396364 4.551515

This seems to visually verify the result:

 plot(1:10, 1:10, type="n")
 abline(mod1)
 abline(mod2)
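Alternatively (my own addition, not part of the original reply): equating a1 + b1*x = a2 + b2*x gives x = (a2 - a1)/(b1 - b2) directly, with no matrix setup.

```r
## Self-contained version of the example, solved algebraically.
x <- 1:10
y <- c(1,3,2,4,5,10,13,15,19,22)
mod1 <- lm(y[1:5] ~ x[1:5])
mod2 <- lm(y[5:10] ~ x[5:10])

a1 <- coef(mod1)[1]; b1 <- coef(mod1)[2]
a2 <- coef(mod2)[1]; b2 <- coef(mod2)[2]

xint <- (a2 - a1)/(b1 - b2)   # x at which the two lines cross
yint <- a1 + b1*xint          # corresponding y
c(xint, yint)                 # xint ~ 4.5515, yint ~ 4.3964, matching solve()
```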

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



[R] spss imports--trouble with to.data.frame

2009-11-13 Thread Paul Johnson
 will not be allowed in factors anymore
13: In `levels<-`(`*tmp*`, value = c("P", "L", "Kesk-Eesti",  ... :
  duplicated levels will not be allowed in factors anymore
14: In `levels<-`(`*tmp*`, value = c("Galicia", "Principado de Asturias",  ... :
  duplicated levels will not be allowed in factors anymore
15: In `levels<-`(`*tmp*`, value = c("Stockholm", "", "Sydsverige",  ... :
  duplicated levels will not be allowed in factors anymore

  str(d2$HAPPY)
 Factor w/ 13 levels "Extremely unhappy",..: NA NA NA NA NA NA NA NA NA NA ...

Oh, heck, all the values are missing!! Somehow, putting
to.data.frame=TRUE inside the read.spss call causes a different outcome
than using as.data.frame after reading in the data.

The symptoms of this in R-2.9 are a little different, but the
conclusion the same.  Help?

In case you are a student who wants to work with this data, I can
share to you the large script that I have been accumulating so that
you might play along.  It turns out to be surprisingly difficult to
recode these factor variables that have levels like none, 1,
2,...9, total.



## Paul Johnson
## November 13, 2009

## A question arose in the lab. A student asked: "I want
## to compare the answers from two different editions
## of the European Social Survey."

## I will add this to Stuff Worth Knowing later, but
## I can share this tutorial to you right now.

## From this website:

## http://ess.nsd.uib.no/ess

## Download those European Social Survey Datasets into a directory.

## In a terminal, use the unzip command:
## unzip ESS3e03_2.spss.zip

## unzip ESS2e03_1.spss.zip

## Then run the following in R.


library(foreign)

d2 <- read.spss("ESS3e03_2.por", to.data.frame=TRUE)


d2 <- read.spss("ESS3e03_2.por")
warnings()

### You can try to go into a data frame in one
### step, that's an option in read.spss. But
### we saw warnings, and wanted to be careful.

d2 <- as.data.frame(d2)
d2$whichSurvey <- 2

d3 <- read.spss("ESS2e03_1.por")

d3 <- as.data.frame(d3)
d3$whichSurvey <- 3

namesd2 <- names(d2)
namesd3 <- names(d3)

commonNames <- intersect(namesd3, namesd2)

combod23 <- rbind(d2[ , commonNames], d3[ , commonNames])

save(combod23, file="combod23.Rda")


## Error
##Warning messages:
##1: In `[<-.factor`(`*tmp*`, ri, value = c(NA, NA, NA, NA, NA, NA, NA,  :
##  invalid factor level, NAs generated
##2: In `[<-.factor`(`*tmp*`, ri, value = c(NA, NA, NA, NA, NA, NA, NA,  :
##  invalid factor level, NAs generated
##3: In `[<-.factor`(`*tmp*`, ri, value = c(1, 1, 1, 1, 1, 1, 1, 1, 1,  :
##  invalid factor level, NAs generated

## That worries me a little bit. The warnings did too.

## Inspect a few lines in the result.

combod23[1:4, ]

## fix doesn't work for me, did not bother to investigate.

## fix(combod23)
##Error in edit.data.frame(get(subx, envir = parent), title = subx, ...) :
##  can only handle vector and factor elements
## That means some data from hell came into this thing.

## I suspect that combod23 is OK.

## The memory use on this exercise is huge! Try to help it

rm (d2)
rm (d3)


## But I worry. I have 2 ways that I use to try to figure this
## out. One is to open the dataset in a clone of SPSS called
## PSPP. Actually, the executable is psppire.
##
## The other thing I do is open the same data again in
## a numeric format, and compare the 2 combined data frames

## This is also a useful exercise because it helps you
## understand what a factor is in R.

dn2 <- read.spss("ESS3e03_2.por", use.value.labels = FALSE)


dn2 <- as.data.frame(dn2)
dn2$whichSurvey <- 2

dn3 <- read.spss("ESS2e03_1.por", use.value.labels = FALSE)

dn3 <- as.data.frame(dn3)
dn3$whichSurvey <- 3

## Might be smart to compare
# dn2$HAPPY[1:50]
# d2$HAPPY[1:50]

namesdn2 <- names(dn2)
namesdn3 <- names(dn3)

commonNNames <- intersect(namesdn3, namesdn2)

combodn23 <- rbind(dn2[ , commonNNames], dn3[ , commonNNames])

save(combodn23, file="combodn23.Rda")

table( combod23$HAPPY, combodn23$HAPPY)

## In summary, whenever I want to use a variable from
## the combined data frame, I would probably compare
## against combodn23 just to feel safe.




## Note, after when you come back to work on this project again, you
## might as well just reload the saved copies of combod23 and
## combodn23.

## load("combod23.Rda")

## load("combodn23.Rda")

## That will put you at the current spot, no need to redo the merge


## Now, about recoding. If you just want numerical
## data, you might consider using combodn23.

## But if you want some factors and some numerical
## variables, then you might need to recode to reclaim
## values.

## HAPPY turns out to be an interesting example of a
## PAIN IN THE ASS because in SPSS, it is scored from
## 0 to 10, but they give value labels only for scores
## 1=  Extremely unhappy
## and
## 10= Extremely happy
##
## And the SPSS column has no labels for values 1-9.
## If SPSS gave NO labels at all, then this would come
## into R as a numeric variable. BUT, because there are
## 2 levels named, then R makes a factor out of it.

## When R turns it into a factor, you
## end up with a nutty

Re: [R] gregmisc library (Mandriva)

2009-11-15 Thread Paul Johnson
On Sat, Oct 3, 2009 at 9:18 AM, chi ball c...@hotmail.it wrote:



 Hi, I'm not able to find a rpm of gregmisc library (2.0.0)  for Linux 
 Mandriva 2008 Spring.
 Any suggestion?
 Thanks

If you don't find an up to date RPM, either you have to learn how to
build an RPM or just install the package yourself.

You can install the package yourself in a number of ways, I think R
FAQ outlines it.

To update and download a whole bunch of packages, I use a script.

It should be easy for you to see how this works. I scan the system to
update what packages there are, and then install a lot of others if
they are not installed yet.
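The core of such a script might look like this (the package list is
illustrative, not the actual contents of R_installFaves-2.R):

```r
## Refresh everything already installed, then add favourites not yet present.
update.packages(ask = FALSE, checkBuilt = TRUE)
faves <- c("gdata", "gmodels", "gplots", "gtools")
missing <- setdiff(faves, rownames(installed.packages()))
if (length(missing) > 0) install.packages(missing)
```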

as root run R CMD BATCH R_installFaves-2.R

or inside R as root you could type

source("R_installFaves-2.R")

On Ubuntu, if you do this as root it installs the packages into
/usr/local/lib/R, but on Fedora it installs them under /usr/lib/R.  I
do not know where they will go with Mandriva.

I used to run the script to get ALL packages, but when the CRAN list
accumulated more than 600 packages, my systems just spent all day
building packages. So I had to narrow my sights.

I'm looking at administering a cluster computer on which I'll need to
make RPMs for many packages, and so I'm in the same boat as you are if
you are wanting RPMs.  You could check back with me in about a month
to find out if I have packages for you.

pj

I think gregmisc is a bundle, and bundles are deprecated.  Instead, you
install gdata, gmodels, and so forth.


-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] plotmath vector problem; full program enclosed

2010-07-06 Thread Paul Johnson
Here's another example of my plotmath whipping boy, the Normal distribution.

A colleague asks for a Normal plotted above a series of axes that
represent various other distributions (T, etc).

I want to use vectors of equations in plotmath to do this, but have
run into trouble.  Now I've isolated the problem down to a relatively
small piece of working example code (below).  If you would please run
this, I think you will see the problem.  When plotmath meets one
vector of expressions, it converts all but one to math, so in the
figure output i get, in LaTeX speak

b1 $\mu-1.0 \sigma$$\mu$

All values except the first come out correctly.

This happens only when I try to use bquote or substitute to get
variables to fill in where the 1.96, 1.0, and so forth should be.  In
the figure output, you should see a second axis where all of the
symbols are resolved correctly.

As usual, thanks in advance for your help, sorry if I've made an
obvious mistake or overlooked a manual.

### Filename: plotMathProblem.R
### Paul Johnson July 5, 2010
### email me paulj...@ku.edu

  sigma <- 10.0
  mu <- 4.0

  myx <- seq( mu - 3.5*sigma,  mu+ 3.5*sigma, length.out=500)

  myDensity <- dnorm(myx,mean=mu,sd=sigma)

  ### xpd needed to allow writing outside strict box of graph
  ### Need big bottom margin to add several x axes
  par(xpd=TRUE, ps=10, mar=c(18,2,2,2))

  myTitle1 <- "The Normal Density"  ## title string; not defined in the post
  plot(myx, myDensity, type="l", xlab="", ylab="Probability Density ",
main=myTitle1, axes=FALSE)
  axis(2, pos= mu - 3.6*sigma)
  axis(1, pos=0)

  lines(c(myx[1],myx[length(myx)]),c(0,0)) ### closes off axes


  addInteriorLine <- function(x, m, sd){
for (i in 1:(length(x))){
  lines( c(x[i],x[i]), c(0, dnorm(x[i],m=m,sd=sd)), lty="14", lwd=.2)
}
  }


  dividers - c(qnorm(0.025), -1, 0, 1, qnorm(0.975))
  addInteriorLine(mu+sigma*dividers, mu,sigma)

  # bquote creates an expression that text plotters can use
  t1 <-  bquote( mu == .(mu))
  mtext(bquote( mu == .(mu)), 1, at=mu, line=-1)


  addInteriorLabel <- function(pos1, pos2, m, s){
area <- abs(100*( pnorm(m+pos1*s,m,s)-pnorm(m+pos2*s, m,s)))
mid <- m+0.5*(pos1+pos2)*s
text(mid, 0.5*dnorm(mid,m,s), label=paste(round(area,2), "%"))
  }


  addInteriorLabel(dividers[1],dividers[2],  mu, sigma)
  addInteriorLabel(dividers[2],dividers[3],  mu, sigma)
  addInteriorLabel(dividers[3],dividers[4],  mu, sigma)
  addInteriorLabel(dividers[4],dividers[5],  mu, sigma)




### Following is problem point: axis will
### end up with correct labels, except for first point,
### where we end up with b1 instead of mu - 1.96*sigma.
   b1 <- substitute( mu - d*sigma, list(d=round(dividers[1],2)) )
   b2 <- substitute( mu - sigma )
   b3 <- substitute( mu )
   b4 <- substitute( mu + sigma )
   b5 <- substitute( mu + d*sigma, list(d=round(dividers[5],2)) )
##   plot(-20:50, -20:50, type="n", axes=FALSE)
   axis(1, line=4, at=mu+dividers*sigma,
labels=c(expression(b1),b2,b3,b4,b5), padj=-1)




### This gets right result but have to hard code the dividers
  b1 <- expression( mu - 1.96*sigma )
  b2 <- expression( mu - sigma )
  b3 <- expression( mu )
  b4 <- expression( mu + sigma )
  b5 <- expression( mu + 1.96*sigma )
  axis(1, line=8, at=mu+dividers*sigma, labels=c(b1,b2,b3,b4,b5), padj=-1)




-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



Re: [R] how to save summary(lm) and anova (lm) in format?

2010-07-06 Thread Paul Johnson
There are R packages that can make nice R regression tables in LaTeX
documents. I've used memisc and it's good; there is also apsrtable
and the old standby xtable.  Also I use my own function outreg, but
that's just a 'not invented here' attitude.

Your problem is that you need this to go into Word, in which case I
think a reasonable strategy is to create html output in R; then in
Word you can use "Paste Special" as HTML and it will bring in the html as
a Word table.

I recently made a presentation about this, you might scan down to the
end where I have the html example for the poor-pitiful Word users of
the world:

http://pj.freefaculty.org/SummerCamp2010/regression2.pdf

Look down around slide 75
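For instance, with the xtable package (a sketch; the output file names
are placeholders, and the built-in cars data stands in for your model):

```r
library(xtable)
fit <- lm(dist ~ speed, data = cars)
## xtable has methods for summary.lm and anova objects;
## type="html" writes a table Word can absorb via Paste Special.
print(xtable(summary(fit)), type = "html", file = "fit-summary.html")
print(xtable(anova(fit)),   type = "html", file = "fit-anova.html")
```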

pj


On Fri, Jul 2, 2010 at 12:34 PM, Yi liuyi.fe...@gmail.com wrote:
 Hi, folks,

 I would like to copy the output of summary(lm) and anova (lm) in R to my
 word file. But the output will be a mess if I just copy after I call summary
 and anova.

 #
 x=rnorm(10)
 y=rnorm(10,mean=3)
 lm=lm(y~x)
 summary(lm)

 Call:
 lm(formula = y ~ x)
 Residuals:
      Min        1Q    Median        3Q       Max
 -1.278567 -0.312017  0.001938  0.297578  1.310113
 Coefficients:
             Estimate Std. Error t value Pr(>|t|)
 (Intercept)   2.5221     0.2272  11.103 3.87e-06 ***
 x            -0.5988     0.2731  -2.192   0.0597 .
 ---
 Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 Residual standard error: 0.7181 on 8 degrees of freedom
 Multiple R-squared: 0.3753,     Adjusted R-squared: 0.2972
 F-statistic: 4.807 on 1 and 8 DF,  p-value: 0.0597
 

 How can I get the exact output as shown in R but not as the above?


 Thanks.

 Yi

        [[alternative HTML version deleted]]







-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



Re: [R] plotmath vector problem; full program enclosed

2010-07-07 Thread Paul Johnson
On Tue, Jul 6, 2010 at 12:41 PM, Duncan Murdoch
murdoch.dun...@gmail.com wrote:
 On 06/07/2010 10:54 AM, Paul Johnson wrote:

 Here's another example of my plotmath whipping boy, the Normal
 distribution.

 You want as.expression(b1), not expression(b1).  The latter means the
 expression consisting of the symbol b1.  The former means take the object
 stored in b1, and convert it to an expression.

 It's not perfect, because you'll end up with mu - -1.96sigma (i.e. two
 minus signs), but it's closer than what you had.

 Duncan Murdoch



Hi, Duncan and David

Thanks for looking.  I suspect from the comment you did not run the
code.  The expression examples I give do work fine already.  But I
have to explicitly put in values like 1.96 to make them work.  I'm
trying to avoid that with substitute, which does work for b2, b3, b4,
b5, all but b1.  Why just one?

I'm uploading a picture of it so you can see for yourself:

http://pj.freefaculty.org/R/plotmathwrong.pdf

please look in the middle axis.

Why does only b1 not work, but the rest do?

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



Re: [R] plotmath vector problem; full program enclosed

2010-07-07 Thread Paul Johnson
On Wed, Jul 7, 2010 at 5:41 AM, Duncan Murdoch murdoch.dun...@gmail.com wrote:



 You want as.expression(b1), not expression(b1).  The latter means
 the
 expression consisting of the symbol b1.  The former means take the
 object
  stored in b1, and convert it to an expression.


Thanks to Duncan and Allen, who pointed out that I was not even
reading my own code carefully.  I apologize for trying your patience.

Before I stop swinging at this one, I still want to bother everybody
about one thing, which really was the original question, but I did not
know the words to ask it.

The full code below is a working example, but I don't
understand why it works. Focus on these two commands that produce 2 axes.  Both
produce the desired output, but, as far as I can see, they should not!

1:
   axis(1, line=6, at=mu+dividers*sigma,
labels=as.expression(c(b1,b2,b3,b4,b5), padj=-1))


2:
   axis(1, line=9, at=mu+dividers*sigma,
labels=c(as.expression(b1),b2,b3,b4,b5), padj=-1)

This second one shouldn't work, I think.
It has as.expression on only the first element, and yet they all come
out right. Is there a spill over effect?

My original question should not have asked why b1 does not print
correctly when I do this:

   axis(1, line=9, at=mu+dividers*sigma,
labels=c(expression(b1),b2,b3,b4,b5), padj=-1)

but the correct question should have been: why do b2, b3, b4, and b5
get processed properly into plotmath even though they are not
expressions?

???
pj
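A plausible explanation (my reading of ?c and ?as.expression, not
something confirmed in the thread): c() combines its arguments into the
richest common type, so once one element is an expression, the remaining
calls are absorbed into a single expression vector, and axis() then
treats every label as plotmath:

```r
b1 <- substitute(mu - d * sigma, list(d = -1.96))
b2 <- substitute(mu + sigma)

class(c(b1, b2))                # two bare calls combine into a plain list
class(c(as.expression(b1), b2)) # with an expression first, one expression vector
```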

### Filename: plotMathProblem.R
### Paul Johnson July 7, 2010
### email me paulj...@ku.edu

  sigma <- 10.0
  mu <- 4.0

  myx <- seq( mu - 3.5*sigma,  mu+ 3.5*sigma, length.out=500)

  myDensity <- dnorm(myx,mean=mu,sd=sigma)

  ### xpd needed to allow writing outside strict box of graph
  ### Need big bottom margin to add several x axes
  par(xpd=TRUE, ps=10, mar=c(18,2,2,2))

  myTitle1 <- "The Normal Density"  ## title string; not defined in the post
  plot(myx, myDensity, type="l", xlab="", ylab="Probability Density ",
main=myTitle1, axes=FALSE)
  axis(2, pos= mu - 3.6*sigma)
  axis(1, pos=0)

  lines(c(myx[1],myx[length(myx)]),c(0,0)) ### closes off axes


  addInteriorLine <- function(x, m, sd){
for (i in 1:(length(x))){
  lines( c(x[i],x[i]), c(0, dnorm(x[i],m=m,sd=sd)), lty="14", lwd=.2)
}
  }


  dividers - c(qnorm(0.025), -1, 0, 1, qnorm(0.975))
  addInteriorLine(mu+sigma*dividers, mu,sigma)

  # bquote creates an expression that text plotters can use
  t1 <-  bquote( mu == .(mu))
  mtext(bquote( mu == .(mu)), 1, at=mu, line=-1)


  addInteriorLabel <- function(pos1, pos2, m, s){
area <- abs(100*( pnorm(m+pos1*s,m,s)-pnorm(m+pos2*s, m,s)))
mid <- m+0.5*(pos1+pos2)*s
text(mid, 0.5*dnorm(mid,m,s), label=paste(round(area,2), "%"))
  }


  addInteriorLabel(dividers[1],dividers[2],  mu, sigma)
  addInteriorLabel(dividers[2],dividers[3],  mu, sigma)
  addInteriorLabel(dividers[3],dividers[4],  mu, sigma)
  addInteriorLabel(dividers[4],dividers[5],  mu, sigma)

   b1 <- substitute( mu - d*sigma, list(d=round(dividers[1],2)) )
   b2 <- substitute( mu - sigma )
   b3 <- substitute( mu )
   b4 <- substitute( mu + sigma )
   b5 <- substitute( mu + d*sigma, list(d=round(dividers[5],2)) )
   axis(1, line=4, at=mu+dividers*sigma, labels=c(b1,b2,b3,b4,b5), padj=-1)


   axis(1, line=6, at=mu+dividers*sigma,
labels=as.expression(c(b1,b2,b3,b4,b5), padj=-1))



   axis(1, line=9, at=mu+dividers*sigma,
labels=c(as.expression(b1),b2,b3,b4,b5), padj=-1)


-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



Re: [R] permutations of a binary matrix with fixed margins

2007-10-02 Thread Paul Johnson
Jérôme,

As a first attempt, how about the function below. It works (or not) by
randomly sorting the rows and columns, then searching the table for
squares with the corners = matrix(c(1,0,0,1),ncol=2) and subtracting them
from 1 to give matrix(c(0,1,1,0),ncol=2) (and vice versa). Randomized
matrices can be produced as a chain where each permutation is seeded with
the previous one.

Potential problems:
1. Does it really explore all possible permutations? Does it do it in an
unbiased way?
2. Related to above: there is potential autocorrelation (although I haven't
found it with my data set), so might need some dememorisation steps.
3. It's slow and dememorisation makes it slower.
4. It isn't clear to me whether it needs the added stochastic element, i.e.
squares are only flipped if runif(1) < 0.5. In practice it seems to work
without it (as far as I can tell, i.e. it isn't autocorrelated using my data
set).

I suspect there's a much better way of doing this, which might take the
margins as an input, and therefore wouldn't be directly derived from any
particular matrix structure.

Paul

###

# function to permute binary matrix keeping margins fixed.
# the input x is the binary matrix to be permuted

  permute.struct <-
function(x)
{
  x <- x[rownames(x)[sample(1:nrow(x))], colnames(x)[sample(1:ncol(x))]]
  pattern <- c(1,0,0,1)
  for(i in 1:(nrow(x)-1))
for(j in 1:(ncol(x)-1))
  for(ii in (i+1):nrow(x))
for(jj in (j+1):ncol(x))
{
  query <- c(x[c(i,ii),c(j,jj)])
  if((all(query-pattern==0) || all(query==1-pattern)) &&
runif(1) < 0.5)
x[c(i,ii),c(j,jj)] <- 1 - x[c(i,ii),c(j,jj)]
}
  x
}

###
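A quick check of the function on toy data (illustrative; the margins
should survive the shuffle):

```r
set.seed(1)
m <- matrix(rbinom(30, 1, 0.5), nrow = 5,
            dimnames = list(letters[1:5], LETTERS[1:6]))
m2 <- permute.struct(m)
## Row and column totals are unchanged once rows/columns are re-matched by name.
all.equal(rowSums(m2)[rownames(m)], rowSums(m))
all.equal(colSums(m2)[colnames(m)], colSums(m))
```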


From: Mathieu Jérôme jerome.mathieu_at_snv.jussieu.fr 
Date: Thu 05 Apr 2007 - 13:34:01 GMT


Dear R Users, 
How can I randomize a matrix (which contains only 0 and 1) with the
constraint that margin totals (column and row sums) are kept constant?
Some have called this swap permutation or something like that.

The principle is to find bloc of 
10 
01 
and change it for 
01 
10 
there can be several rows or columns between these numbers. It probably
exists in R, but I can't find the function. I am aware of permcont, but it
works only for 2x2 matrices 

thanks in advance 
Jerome 

-- 
Jérôme Mathieu
Laboratory of soil tropical ecology
UPMC
[EMAIL PROTECTED]



[R] Interaction Terms versus Interaction Effects in logistic regression

2008-03-19 Thread Paul Johnson
I would like to know more about the output from the terms option in
predict(), especially for a glm.  And especially when there is an
interaction effect being considered.

Here's why I ask. These articles were recently brought to my
attention.  They claim that just about everybody who has reported an
interaction coefficient in a logit or probit glm has interpreted it
incorrectly.

Ai, C. and E.C. Norton. 2003. Interaction Terms in Logit and Probit
Models. Economics
Letters 80(1):123−129.

Norton, E.C., H. Wang, and C. Ai. 2004. Computing interaction effects
and standard errors
in logit and probit models. The Stata Journal 4(2):154−167.

These articles are available here:

http://www.unc.edu/~enorton/

Along with the Stata ado file that makes the calculations.

It seems to me the basic point here is that an interaction changes the
slope of a line, as in

z
   z
 z
z
  z
xxx
  z
z
  z
z


The predicted value changes, of course.  It may go up or down,
depending on whether the case considered is on the left or right. I
don't see that as a unique problem for logit models.  It seems to be
an artifact of Euclidean geometry :)

The logistic regression model does complicate the application of this
model to making predictions because the positioning of a case depends
on the values of all input variables, not just the one considered in
the interaction.

This is why I'm wishing I had a better understanding of the terms
option in predict.
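For what it's worth, a small sketch of what type="terms" returns for a
logit with an interaction (simulated data, purely illustrative):

```r
set.seed(7)
d <- data.frame(y = rbinom(200, 1, 0.5), x = rnorm(200), z = rnorm(200))
fit <- glm(y ~ x * z, data = d, family = binomial)
## One column per model term, on the link (logit) scale, each centered;
## the interaction x:z gets its own column separate from x and z.
head(predict(fit, type = "terms"))
```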

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



[R] as.numeric(".") returns 0

2008-06-12 Thread Paul Johnson
In R version 2.7.0 (2008-04-22) as.numeric(".") returns zero.

> as.numeric(".")
[1] 0

This must be a bug. Splus and previous versions of R (<= 2.6.0) return NA,
as you might expect.

I'm running R version 2.7.0 (2008-04-22) on Windows XP.
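For reference, later versions of R restored the expected behaviour; in a
current session:

```r
as.numeric(".")    # NA, with a "NAs introduced by coercion" warning
as.numeric("1.5")  # 1.5
```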

Paul

_
Paul Johnson
Robertson Centre for Biostatistics
University of Glasgow
Glasgow G12 8QQ, UK
[EMAIL PROTECTED]
http://www.stats.gla.ac.uk/~paulj/index.html
http://www.rcb.gla.ac.uk/



Re: [R] restricted coefficient and factor in linear regression.

2008-06-15 Thread Paul Johnson
On Sat, Jun 14, 2008 at 7:49 AM, Oh Dong-hyun [EMAIL PROTECTED] wrote:
 Hi,

 my data set is data.frame(id, yr, y, l, e, k).

 I would like to estimate Lee and Schmidt's (1993, OUP) model in R.

 My colleague wrote SAS code as follows:
 ** procedures for creating dummy variables are omitted **
 ** di# and dt# are dummy variables for industry and time **
 data a2; merge a1 a2 a; by id yr;
proc sysnlin maxit=100 outest=beta2;
endogenous y;
exogenous  l e k di1-di12 dt2-dt10;
parms a0 0.94 al -0.14 ae 1.8 ak -0.9
b1 0 b2 0 b3 0 b4 0 b5 0 b6 0 b7 0 b8 0 b9 0 b10 0 b11 0
b12 0 c2 0 c3 0 c4 0 c5 0 c6 0 c7 0 c8 0 c9 0 c10 0;
y=a0+al*l+ae*e+ak*k
+(b1*di1+b2*di2+b3*di3+b4*di4+b5*di5+b6*di6
+b7*di7+b8*di8+b9*di9+b10*di10+b11*di11+b12*di12)*
(1*dt1+c2*dt2+c3*dt3+c4*dt4+c5*dt5+c6*dt6+c7*dt7
+c8*dt8+c9*dt9+c10*dt10);
title '* lee/schmidt parameter estimates *';

 My R code is as follows:
 ##
 library(plm)
 dt <- read.table("dt.dta", sep = "\t", header = TRUE)
 dt$id <- factor(dt$id)
 dt$yr <- factor(dt$yr)
 fit.model <- I(log(y)) ~ I(log(l)) + I(log(e)) + yr * id
 re.fit.gls <- pggls(fit.model, data = dt)
 #

 I've got the following error message:
 # Error message ###
 Error in dimnames(x) - dn :
  length of 'dimnames' [2] not equal to array extent
  End of Error message

 I would like to figure out three things.
 1. How can I restrict coefficient in model? As you can see in SAS code,
 coefficient of dt1 is restricted to 1.
 2. If it is possible to restrict coefficients, it is possible to restrict
 coefficients of factors? If so, how?


Hello, I've not used the package plm very much.  I've been reading its
docs now and I don't think it is exactly what you want, since you've
not described a panel data problem.  Possibly you need to take this up
with the plm author.

If I were you, I'd  go in another direction.   First, fit with
ordinary 'lm', just to check sanity of data.  Second, get the package
systemfit in which there is a nonlinear system fitting routine
comparable to the SAS sysnlin that your colleague is using.

Let us know  how it works out.
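The "fit with ordinary lm first" step might look like this (simulated
stand-in for the poster's data frame; column names follow the post):

```r
## Simulated stand-in for data.frame(id, yr, y, l, e, k).
set.seed(1)
dt <- data.frame(id = rep(1:4, each = 10), yr = rep(1:10, times = 4),
                 y = exp(rnorm(40)), l = exp(rnorm(40)),
                 e = exp(rnorm(40)), k = exp(rnorm(40)))
## Sanity check with plain lm before attempting restricted estimation.
fit0 <- lm(log(y) ~ log(l) + log(e) + log(k) + factor(yr) * factor(id),
           data = dt)
summary(fit0)
```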


-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



Re: [R] R vs. Bugs

2008-06-22 Thread Paul Johnson
Hey, good topic for a thread.  I've wrestled with this over the years.
I think there's some user confusion about what WinBUGS does.  People
who did not see BUGS before WinBUGS tend not to understand this in the
same way...

The unique / important contributions from WinBUGS  are the collection
of working examples and the Doodle code generator thing.  All of the
rest of it can be easily replaced by open source tools that exist or
could easily exist.


On Sun, Jun 22, 2008 at 11:34 AM, Peter Muhlberg
[EMAIL PROTECTED] wrote:
 I've done some looking around in R and elsewhere to answer my question
 on the value of R vs. Bugs for MCMC.  So, for anyone who is curious,
 here's what I think I've found:  Bugs compiles its code, which should
 make it much faster than a pure R program.  Packages such as AMCMC run
 MCMC in R, potentially with a user-defined C function for the
 density--which should make it comparable in speed to Bugs.  The
 packages MCMCpack  (MCMCmetrop1R function) and mcmc seem designed to
 run w/ a density function written in R.   MCMCpack does have functions
 that use precompiled C code from the Scythe library (which looks
 nice), but I see no simple way to add a C density function.  AMCMC and
 Bugs seem to use adaptive MCMC, but the other R packages don't appear
 to do so, which may mean another performance reduction.

Think of performance as a combination of development time and run time.

Andrew Martin's MCMCpack reduces development time by giving people
some pre-packaged Gibbs sampling fitters for standard models, such as
logistic regression.  It still uses the same iterative sampling
routines under the hood as most other MCMC approaches.  The only
difference there is that it has routines formatted in a way that will
be familiar to R users.  I do not believe a simulation model conducted
in MCMCpack will take a different amount of run time than a well
coded, custom version of the same that is prepared in WinBUGS or any
other GIBBS sampler.  The big difference is that MCMCpack offers
routines for familiar models, and if one wants to throw in some
random parameter here or there, then MCMCpack won't be able to handle
it.

The U. Chicago professors provide the package bayesm which is roughly
the same kind of thing.  For pre-existing model types, an MCMC model
can be conducted.

Students ask "Why learn BUGS when I can use MCMCpack (or bayesm)?"
Answer: no reason, unless you want to propose a model that is not
already coded up by the package.


 I see no way to insert my own proposal density in the R functions.
 JAG, a Java-based version of BUGS, apparently allows users to create

If you mean to refer to Martyn Plummer's project JAGS:

http://www-ice.iarc.fr/~martyn/software/jags/

JAGS is Just Another Gibbs Sampler, and it is decidedly not written in Java.
It is, rather, written in C++.

JAGS is proposed as a more-or-less complete drop-in replacement for
BUGS, so the user can write up a BUGS style model and then hand it
over to JAGS for processing, the same way we used BUGS before the
WinBugs came along and tried to make it a pointy-clicky experience.
JAGS has versions of the classic BUGS examples, and I think it is very
well done.   If you want to run a Gibbs Sampling exercise in Linux, I
suggest you seriously consider using JAGS.  You might use WinBUGS
Doodle thingie to sketch out the code for your model, but then
translate it to JAGS. (I put a bit of thought into packaging an RPM
for Fedora users a few months ago, but ran into a few little packaging
errors that put me off of it.  Now I'm running Ubuntu and packaging
for that is harder, at least for me...)

I notice there are some R packages that are providing pre-packaged
estimators for common models through JAGS.  Witness:

bayescount  Bayesian analysis of count distributions with JAGS
bayesmixBayesian Mixture Models with JAGS

It is disappointing/frustrating to me that the source code for
WinBUGS/OpenBugs is kept in secret, because here's what I'd really
like.

1. Make a version of Doodle for sketching out models.  As far as I can
see, Doodle is the only truly uniquely valuable component in Win/Open
BUGS.  It helps people get started by providing a code template.

2. Create an avenue for that Doodle code to travel to JAGS.  rJAGS
exists as an interface between R and jags.
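A minimal sketch of that route with the rjags package (the model file
name, data, and monitored nodes are all placeholders):

```r
library(rjags)
## "model.bug" holds a BUGS-language model definition (placeholder name);
## y stands in for the observed data vector.
y <- rnorm(20, mean = 5)
m <- jags.model("model.bug",
                data = list(y = y, N = length(y)),
                n.chains = 2)
update(m, 1000)                                   # burn-in
out <- coda.samples(m, variable.names = c("mu", "tau"), n.iter = 5000)
summary(out)
```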


-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



[R] Trouble combining plotmath, bquote, expressions

2008-04-02 Thread Paul Johnson
I'm using R-2.6.2 on Fedora Linux 9.

I've been experimenting with plotmath.

I wish it were easier to combine expressions in plotmath with values
from the R program itself.  There are two parameters in the following
example, the mean mymean and standard deviation mystd.  I am able
to use bquote to write elements into the graph title like

mu = mymean

and R will plot the symbol mu and the value of mymean from the
program. But I want to combine that result in a string along with
other results.

Can I combine to result like this

Normal( mu = mymean , sigma = mystd)

Where symbols mu and sigma are replaced with Greek and mymean and
mystd are drawn from program?


### Filename: Normal1_2008.R
### Paul Johnson March 31, 2008
### This code should be available somewhere in
http://pj.freefaculty.org.  If it is not
### email me [EMAIL PROTECTED]

mymean <- 0
mystd <- 1.5
myx <- seq( mymean - 3*mystd,  mymean+ 3*mystd, length.out=500)


myDensity <- dnorm(myx,mean=mymean,sd=mystd)

## This works
plot(myx, myDensity, type="n", xlab="X", ylab="Probability Density",
main=expression(mu == 0))
## This works
plot(myx, myDensity, type="n", xlab="X", ylab="Probability Density",
main=expression(sigma == 1.5))

## This works
t1 <-  bquote( mu == .(mymean))

plot(myx, myDensity, type="n", xlab="X", ylab="Probability Density", main= t1 )

## This works

t2 <-  bquote( sigma == .(mystd))

plot(myx, myDensity, type="n", xlab="X", ylab="Probability Density", main=t2)

## Can't figure how to combine t1 and t2 into plot title

### This fails!
### plot(myx, myDensity, type="n", xlab="X", ylab="Probability Density"
### , main=paste( t1,t2) )


plot(myx, myDensity, type="n", xlab="X", ylab="Probability Density", main= t1 )

## Can drop sigma in as margin text, though, on side 3.
mtext(t2, 3)
lines(myx,myDensity,lty=4, col=4) ### change line type & color if you want



Supposing there is a way to do this, could I submit a working example
to be added to the help page for plotmath ?  How could I go about
that?
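There is a way; the trick (which appears in a later message in this
archive) is to build the whole title as a single bquote(paste(...)) call
rather than pasting two bquote results together:

```r
mymean <- 0
mystd <- 1.5
## One plotmath object holding both parameters in one title:
t1t2 <- bquote(paste("Normal(", mu == .(mymean), ", ", sigma == .(mystd), ")"))
plot(myx, myDensity, type = "n", xlab = "X",
     ylab = "Probability Density", main = t1t2)
```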


-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



[R] plotmath overstrikes in output on a Linux system

2008-04-08 Thread Paul Johnson
I've been testing plotmath.  But I'm getting some funny output on one
computer. The problem is that characters are 'jumbled' and overstrike
when symbols are introduced.

Sample code:

mu <- 440.0
sigma <- 12.5
myx <- seq( mu - 4*sigma,  mu+ 4*sigma, length.out=500)
myDensity <- dnorm(myx,mean=mu,sd=sigma)

# Here's one way to retrieve the values of mu and sigma and insert
them into an expression
t1t2 <- bquote (paste("Normal(", mu == .(mu), ',', sigma == .(sigma), ")") )

plot(myx, myDensity, type="l", xlab="X", ylab="Probability Density", main=t1t2)


I have tested this code and it works on two desktop Fedora Linux 8
systems to make a nice figure, but on a Dell Laptop with Fedora Linux
8 and R 2.6.2, something funny happens: the characters overstrike
each other.  The Greek letter mu is printed several spaces to the
left of the ( that it is supposed to follow.  I made an itty  bitty
picture of the figure title to show you:

http://pj.freefaculty.org/R/plotmath_problem.png

I can force in spaces to re-arrange the symbols so they do not
overstrike. The following looks fine in the output.


t1t2 <- bquote (paste("Normal (   ", mu == .(mu), '  ,  ', sigma ==
.(sigma), " )") )
### Note: the spaces manually inserted above are needed; otherwise plotmath
### overstrikes the adjacent characters.

plot(myx, myDensity, type="l", xlab="X", ylab="Probability Density", main=t1t2)


What do you suppose is wrong? The X configuration?


-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



Re: [R] plotmath overstrikes in output on a Linux system

2008-04-08 Thread Paul Johnson
On Tue, Apr 8, 2008 at 4:29 AM, Prof Brian Ripley [EMAIL PROTECTED] wrote:
 Looks like the laptop is using different fonts with incorrect font
 metrics.  This could happen because it has a different screen resolution, or
 one of the systems is set to use scalable fonts, or it is giving metrics for
 one font and using another, or ...

  BTW, this is probably nothing to do with plotmath per se -- the issue is
 most likely the device (presumably X11(), but you did not say) and its
 handling of font metric requests.  It looks to me as if the widths of normal
 text strings (and not symbols) are the problem.  I have seen something
 similar where there was a long-standing bug in plotmath (fixed in r44534),
 but this only manifested itself when text was kerned, which the X11() device
 in 2.6.2 does not ask for.

  My best guess is that the font selected is (to use the technical terms)
 returning glyph widths without left/right bearings.

  Please try 2.7.0 beta, which has a different X11() device (and the above
 bug fixed).  That ought to solve the problem for you, but otherwise you'll
 need to debug what the X11 metricInfo callback is giving you.



I installed R-latest from the beta directory.  Following your leads,
I've got some information that may help you see if this is purely an
isolated X11 problem on my computer or something you might fix in R.

On screen, I see this bad looking plot:

http://pj.freefaculty.org/R/png-screenshot-notOK.png

Using a png device to write the graph to file, either with

dev.copy(png ... )

or

png (file =  )
plot ( )
dev.off()

The saved file is GOOD:

http://pj.freefaculty.org/R/png-output-OK.png

http://pj.freefaculty.org/R/png_dev_copy_OK.png

More details about this laptop.  The display has a native resolution
of 1680x1050, and IF I put the X11 display at that resolution, then
the on-screen X11 display of that particular graph is fine.

However, if I change the display to another value, the one I prefer is
1400x1050, and re-start, then I see the bad on-screen display.   In
both cases, however, I have confirmed that the DPI settings in
xdpyinfo are correctly representing pixels/inches.

That leads me back to Prof. Ripley's suggestion that I've got a
mismatch in fonts between what X11 really wants and what it gets.  The
font configuration on these systems is becoming so complicated I'm
completely bewildered, as Fedora has both traditional linux font
server and xft fonts and pango/cairo and freetype and apparently
anything else they can think of.  I'm going to keep studying the R
admin guide to see how to debug it.


off topic

Finally,  something came up that I don't want to forget. It is an RPM
building question for R fans.  This concerns my experience with
R-beta-20080407.r45159.  R built  installed without any apparent
trouble, and I decided to try to make an RPM package of it so other
Fedora people can test.  I took the spec file from the existing Fedora
8 updates R package, and made a few changes to suit the new file name
for R, and I built RPMS.  The build proceeds without noticeable
trouble, but when I try to install the RPM, I get this error

 $ sudo rpm -ivh R-2.7.20080407.r45159-1fc8.1.i386.rpm
Password:
error: Failed dependencies:
perl(R::Dcf) is needed by R-2.7.20080407.r45159-1fc8.1.i386
perl(R::Logfile) is needed by R-2.7.20080407.r45159-1fc8.1.i386
perl(R::Rd) is needed by R-2.7.20080407.r45159-1fc8.1.i386
perl(R::Rdconv) is needed by R-2.7.20080407.r45159-1fc8.1.i386
perl(R::Rdtools) is needed by R-2.7.20080407.r45159-1fc8.1.i386
perl(R::Utils) is needed by R-2.7.20080407.r45159-1fc8.1.i386
perl(R::Vars) is needed by R-2.7.20080407.r45159-1fc8.1.i386

As you can see, the package does have the perl modules, but something
has gone wrong in the scripting process that takes note of them:

$ rpm -qilp R-2.7.20080407.r45159-1fc8.1.i386.rpm | grep perl
/usr/share/R/perl
/usr/share/R/perl/File
/usr/share/R/perl/File/Copy
/usr/share/R/perl/File/Copy/Recursive.pm
/usr/share/R/perl/R
/usr/share/R/perl/R/Dcf.pm
/usr/share/R/perl/R/Logfile.pm
/usr/share/R/perl/R/Rd.pm
/usr/share/R/perl/R/Rdconv.pm
/usr/share/R/perl/R/Rdlists.pm
/usr/share/R/perl/R/Rdtools.pm
/usr/share/R/perl/R/Utils.pm
/usr/share/R/perl/R/Vars.pm
/usr/share/R/perl/Text
/usr/share/R/perl/Text/DelimMatch.pm
/usr/share/R/perl/build-help-windows.pl
/usr/share/R/perl/build-help.pl
/usr/share/R/perl/massage-Examples.pl

I'm contacting the RPM maintainer from Red Hat to see if he knows. But
if you know, please let me know.
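If it helps anyone hitting the same wall: rpm's automatic dependency
generator reads those R:: module names out of the perl scripts and emits
Requires that no installed package Provides. A hedged sketch of one
workaround for the spec file — these Provides lines are my guess at a fix,
not taken from the official Fedora spec:

```spec
# Hypothetical workaround, not from the official Fedora spec: declare the
# bundled perl modules as Provides so rpm's auto-generated Requires are
# satisfied by the R package itself.
Provides: perl(R::Dcf) perl(R::Logfile) perl(R::Rd) perl(R::Rdconv)
Provides: perl(R::Rdtools) perl(R::Utils) perl(R::Vars)
```

The cleaner long-term fix is to filter rpm's automatic perl Requires, but
the explicit Provides is the simplest thing to try first.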

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] plot function / par() settings

2008-04-08 Thread Paul Johnson
On Tue, Apr 8, 2008 at 12:47 PM, Pedro Mardones [EMAIL PROTECTED] wrote:
 Dear all;

  I'm trying to create a 2 x 3 plot (something I know like lattice can
  do better) using the plot function. However; I'm not sure how to make
  the width of the plots to be the same on each column. I guess the
  answer maybe obvious but I haven't been able to figure it out. I'll
  appreciate any suggestion. Here is the (highly inefficient) code for
  the first row:

Dear Pedro:

I played around with the layout function and I think this does what
you want.  Because I set the widths exactly, there is no squishing
of the outer plots anymore.

Try this:

nf <- layout(matrix(c(1,2,3,4,5,6), 2, 3, byrow=TRUE), widths=c(lcm(5),
lcm(5), lcm(5)), heights=c(1,1), TRUE)

layout.show(nf)

par(mar=c(1,0,0,0))

  plot(1:5, 1:5, xlab = "", ylab = "", xaxt = "n", col = "blue")
  axis(3, at = seq(1:5), labels = rep("", 5))


  plot(1:5, 1:5, xlab = "", ylab = "", xaxt = "n", yaxt = "n", col = "red")
  axis(3, at = seq(1:5), labels = seq(1:5))

  plot(1:5, 1:5, xlab = "", ylab = "", xaxt = "n", yaxt = "n", col = "red")
  axis(3, at = seq(1:5), labels = rep("", 5))
  axis(4, at = seq(1:5), labels = rep("", 5))



  plot(1:5, 1:5, xlab = "", ylab = "", xaxt = "n", col = "blue")
  axis(3, at = seq(1:5), labels = rep("", 5))


  plot(1:5, 1:5, xlab = "", ylab = "", xaxt = "n", yaxt = "n", col = "red")
  axis(3, at = seq(1:5), labels = seq(1:5))

  plot(1:5, 1:5, xlab = "", ylab = "", xaxt = "n", yaxt = "n", col = "red")
  axis(3, at = seq(1:5), labels = rep("", 5))
  axis(4, at = seq(1:5), labels = rep("", 5))


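For posterity, the six nearly identical calls can also be generated in a
loop. This is only a sketch under the same fixed-width layout idea as
above, approximating (not byte-for-byte reproducing) the plots:

```r
# same 2x3 grid of 5cm-wide panels, one plot call driven by column index
nf <- layout(matrix(1:6, 2, 3, byrow = TRUE),
             widths = rep(lcm(5), 3), heights = c(1, 1), respect = TRUE)
par(mar = c(1, 0, 0, 0))
for (i in 1:6) {
  col3 <- (i - 1) %% 3 + 1                 # column index, 1..3
  plot(1:5, 1:5, xlab = "", ylab = "", xaxt = "n",
       yaxt = if (col3 == 1) "s" else "n", # y axis only on first column
       col  = if (col3 == 1) "blue" else "red")
  axis(3, at = 1:5, labels = if (col3 == 2) 1:5 else rep("", 5))
  if (col3 == 3) axis(4, at = 1:5, labels = rep("", 5))
}
```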




  par(mfrow = c(2, 3))
  par(omi = c(0.60, 0.60, 0.60, 0.10))

  par(mai = c(0.00, 0.50, 0.50, 0.00))
  plot(1:5, 1:5, xlab = "", ylab = "", xaxt = "n", col = "blue")
  axis(3, at = seq(1:5), labels = rep("", 5))

  par(mai = c(0.00, 0.00, 0.50, 0.00))
  plot(1:5, 1:5, xlab = "", ylab = "", xaxt = "n", yaxt = "n", col = "red")
  axis(3, at = seq(1:5), labels = seq(1:5))

  par(mai = c(0.00, 0.00, 0.50, 0.50))
  plot(1:5, 1:5, xlab = "", ylab = "", xaxt = "n", yaxt = "n", col = "red")
  axis(3, at = seq(1:5), labels = rep("", 5))
  axis(4, at = seq(1:5), labels = rep("", 5))

  Thanks
  PM

  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.




-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] simple graphing question

2008-04-08 Thread Paul Johnson
On Tue, Apr 8, 2008 at 2:18 PM, stephen sefick [EMAIL PROTECTED] wrote:
 #copy and paste this into R
  f <- structure(list(TKN = c(0.103011025, 0.018633208, 0.104235702,
  0.074537363, 0.138286096), RM = c(215, 198, 148, 119, 61)),
  .Names = c("TKN", "RM"), class = "data.frame", row.names = 25:29)
  plot(f$TKN ~ f$RM, type = "b")

  I would like to reverse the X-Axis.  How do I do this?



Hello, Stephen:

It appears you might be new to R, so let me point out a couple of
things.  First, this works:


f <- data.frame(TKN = c(0.103011025, 0.018633208, 0.104235702,
0.074537363, 0.138286096), RM = c(215, 198, 148, 119, 61),
row.names = 25:29)

plot(TKN ~ RM, data = f, type = "b", xlim = rev(range(f$RM)))


Note that I've created your data frame in a more usual way, and I've
reversed the x axis in the plot by reversing the range of the X
variable. I've also used the data= argument to plot().

Second, I had never reversed an axis before, but I quickly learned how
by typing the following command:

RSiteSearch("reverse axis")

That opened up the web browser and it listed many items, the second of
which was this:

http://finzi.psych.upenn.edu/R/Rhelp02a/archive/66958.html

thread, the title of which is "How to reverse the sequence of axis Y?"

Generally, if you try RSiteSearch() and don't find what you need after
exploring a page or two of threads, then you can post here and ask
questions without people saying "go read the posting guide" before
posting questions.


Good luck
PJ



-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] distance matrix as text file - how to import?

2008-04-08 Thread Paul Johnson
On Tue, Apr 8, 2008 at 1:50 PM, Hans-Jörg Bibiko [EMAIL PROTECTED] wrote:
 Dear all,

  I have -hopefully- a tiny problem.

  I was sent a text file containing a distance matrix à la:

  1
  2 3
  4 5 6



Try this! I put your test data in text.txt and voila:


mat <- matrix(0, 3, 3)

mat[row(mat) <= col(mat)] <- scan("test.txt")


I found this idea after RSiteSearch("scan triangular") led to this
item as the very first link:

http://finzi.psych.upenn.edu/R/Rhelp02a/archive/22369.html
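One subtlety worth noting for posterity: the assignment fills the chosen
triangle column by column, while the text file lists the lower triangle row
by row. Filling the *upper* triangle column-major happens to line up with
row-wise lower-triangle input (the placement is transposed, which is
harmless for a symmetric distance matrix). A base-R sketch with the scan()
values hard-coded so it is self-contained:

```r
vals <- c(1, 2, 3, 4, 5, 6)            # what scan("test.txt") would return
mat <- matrix(0, 3, 3)
mat[row(mat) <= col(mat)] <- vals      # column-major fill of upper triangle
# mat[1,2] = 2, mat[2,2] = 3, mat[1,3] = 4, ... : transposed but consistent
sym <- mat + t(mat) - diag(diag(mat))  # mirror into a full symmetric matrix
d <- as.dist(sym)                      # lower triangle, diagonal dropped
```

The "dist" object is what hclust() and friends expect, so this is usually
the form you want the imported matrix in.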


PJ


-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem with NA data when computing standard error

2008-04-08 Thread Paul Johnson
On Tue, Apr 8, 2008 at 12:44 PM, LeCzar [EMAIL PROTECTED] wrote:

  Hey,

  I want to compute means and standard errors as two tables like this:

 se <- function(x) sqrt(var(x)/length(x))



The missings are not your main problem.

The command var computes the variance-covariance matrix.  Some
covariance values can be negative.  Trying to take square roots is a
mistake.

For example, run

> example(var)

to get some matrices to work with.

> C1[3,4] <- NA
> C1[3,5] <- NA

Observe you can calculate

> var(C1, na.rm=T)

but you cannot take sqrt of that because it would try to apply sqrt to
negative values.

To get the standard errors, it is necessary to reconsider the problem,
do something like

> diag(var(C1, na.rm=T))

That will give the diagonal elements, which are positive, so

> sqrt(diag(var(C1, na.rm=T)))

works as well.

But you have the separate problem of dividing each one by the square
root of the length, and since there are missings that is not the same
for every column.  Maybe somebody knows a smarter way, but this
appears to give the correct answer:

validX <- colSums(!is.na(C1))

This gives the roots:

sqrt(validX)

Put that together, it seems to me you could try

se <- function(x) {
  myDiag <- sqrt(diag(var(x, na.rm=T)))
  validX <- colSums(!is.na(x))
  myDiag/sqrt(validX)
}

That works for me:

> se(C1)
       Fertility      Agriculture      Examination        Education
       50.740226       110.808614        39.390611        39.303898
        Catholic Infant.Mortality
      328.272207         4.513863

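An alternative sketch, base R only: compute each column's standard error
directly with sapply. Note one difference from the diag(var(...)) route:
var(x, na.rm=T) drops incomplete rows listwise, while the per-column sd
below uses every non-missing value in that column, so the two can differ
slightly when NAs are present.

```r
# per-column standard error, NA-aware
se_col <- function(df) {
  sapply(df, function(v) sd(v, na.rm = TRUE) / sqrt(sum(!is.na(v))))
}

dat <- data.frame(a = c(1, 2, 3, NA), b = c(2, 4, 6, 8))
se_col(dat)   # a uses n = 3, b uses n = 4
```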

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] If statements for vectors

2008-04-09 Thread Paul Johnson
On Wed, Apr 9, 2008 at 4:07 PM, Laura Bonnett [EMAIL PROTECTED] wrote:
 Dear Sirs,

  I am using both the Bioconductor adds on (Affy, AffyPLM,...) and the
  'standard' R-package.

  I am trying to select a list of genes which all have expression values below
  a certain threshold.
  I have done this by creating a vector which has 0s where the expression is
  greater than the threshold and 1s where it is less than or equal to it.
  Multiplying this vector by the expression values produces a list of 0s and
  expression values below the threshold value.

  However, now I need to remove the 0s.  I thought that this would be
  relatively trivial but it appears it isn't!!!


Without a working example from you, I have no way to test this
proposal.  But if I were you, I would get the index values of the
right cases in one step, and then use that to choose the ones I want.

theGoodOnes <- which(exp2 > 0)
exp3 <- exp2[theGoodOnes]


This can be crammed into one line, but I'd do it in two just to make
sure it is correct, at least the first time.

My example:

> x <- rnorm(100)
> which(x >= 0)
 [1]  9 11 12 13 14 16 19 20 21 22 24 25 28 30 32 34 35 36 38 40 41 42 43 44 46
[26] 47 49 50 53 54 57 59 61 63 64 66 68 69 70 72 75 78 80 81 82 83 88 90 97 98
> theGoodOnes <- which(x >= 0)
> newX <- x[theGoodOnes]
> newX
 [1] 0.89285908 0.51753998 1.18485887 2.10003705 0.54535841 1.28313738
 [7] 1.34092172 0.76064356 0.02201121 0.80808363 0.04578730 0.23045983
[13] 1.04306626 0.12694184 0.89706863 0.86302992 1.53471660 0.51192410
[19] 1.00366834 1.76923470 0.49508470 1.27888454 0.76706729 1.46340483
[25] 1.69315538 0.50537603 0.18422329 0.72968629 0.45490526 2.18208183
[31] 0.71926926 0.68915832 1.49076770 0.48763971 0.39273110 0.80709549
[37] 0.22099019 0.38103757 0.14626929 0.63933750 1.26643194 3.33091910
[43] 2.50341609 2.05286611 0.31986095 0.64548972 0.34712937 0.04075135
[49] 0.07206342 0.20325505
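Logical indexing does the same job without the intermediate which() step —
a small sketch (the cutoff choices here are just illustrative):

```r
x <- c(-1.2, 0, 0.4, 2.1, 0)
nonzero <- x[x != 0]      # drop exact zeros
nonneg  <- x[x >= 0]      # keep zeros and positives
```

which() is mainly useful when you also want the positions themselves, e.g.
to index a second vector of gene names.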


-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Format regression result summary

2008-04-15 Thread Paul Johnson
On Fri, Apr 11, 2008 at 11:05 AM, Thiemo Fetzer [EMAIL PROTECTED] wrote:
 Hello to the whole group.

  I am a newbie to R, but I got my way through and think it is a lot easier to
  handle than other software packages (far less clicks necessary).

[snip]
  However, my wish is the output to have a format like:

                 Estimate
   (Intercept)   3.750367***
                (0.172345)
   Var1        -0.002334
                (0.009342)
   Var2         0.012551*
                (0.005927)

  Thanks a lot in advance

  Thiemo


I attach an R function "outreg" that can make nice looking regression
tables in various formats.  I was thinking about writing something for
R News about this, but held off because I could not figure out a way
to get lmer objects to print out in some pleasant way.  I have several
students and colleagues who use this with no problems lately.

This pdf demonstrates some of the kinds of output it can create.

http://pj.freefaculty.org/R/outreg-worked2.pdf

The R code here is not pretty, I'm obviously trying to do things that
are not anticipated by the original developers.  I found it a bit
tedious to do output with cat and so forth, but never found a more
elegant way.

If anybody wants to help with beautifying the code, please give me
some pointers.

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] plotmath overstrikes in output on a Linux system

2008-04-15 Thread Paul Johnson
On Tue, Apr 8, 2008 at 1:23 PM, Paul Johnson [EMAIL PROTECTED] wrote:
 On Tue, Apr 8, 2008 at 4:29 AM, Prof Brian Ripley [EMAIL PROTECTED] wrote:
   Looks likes the the laptop is using different fonts with incorrect font
   metrics.  This could happen because it has a different screen resolution, 
 or
   one of the systems is set to use scalable fonts or it is giving metrics for
   one fonts and using another or 
  

I found a fix for the funny font overlaps, and it was quite by
accident. In case there are other Fedora users for whom this might
appear, I would like to close the thread with this. In the livna
repository, there is a package called

freetype-freeworld-2.3.5-3.lvn8

After installing that, the fonts on screen are fine.  I do not
understand exactly why this works, or why freetype-freeworld is not in
the freetype distribution all along.  The package is described thusly,
and you will notice it has the same magic words that Professor Ripley
used--hints and glyphs:

$ rpm -qi freetype-freeworld
Name: freetype-freeworld   Relocations: (not relocatable)
Version : 2.3.5 Vendor: rpm.livna.org
Release : 3.lvn8Build Date: Tue 18 Sep
2007 10:28:30 AM CDT
Install Date: Sun 13 Apr 2008 11:16:45 PM CDT  Build Host:
plague-builder.livna.org
Group   : System Environment/Libraries   Source RPM:
freetype-freeworld-2.3.5-3.lvn8.src.rpm
Size: 685309   License: FTL or GPLv2+
Signature   : DSA/SHA1, Sat 22 Sep 2007 10:23:19 AM CDT, Key ID 71295441a109b1ec
Packager: rpm.livna.org http://bugzilla.livna.org
URL : http://www.freetype.org
Summary : A free and portable font rendering engine
Description :
The FreeType engine is a free and portable font rendering
engine, developed to provide advanced font support for a variety of
platforms and environments. FreeType is a library which can open and
manages font files as well as efficiently load, hint and render
individual glyphs. FreeType is not a font server or a complete
text-rendering library.

This version is compiled with the patented bytecode interpreter and subpixel
rendering enabled. It transparently overrides the system library using
ld.so.conf.d.



-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] rggobi is crashing R-2.7.0

2008-05-07 Thread Paul Johnson
On Tue, May 6, 2008 at 12:32 PM, Mark Kimpel [EMAIL PROTECTED] wrote:
 I am running 64-bit Ubuntu 8.04 and when I invoke rggobi the interactive
  graph displays but R crashes. See my sessionInfo() and a short example
  below. Ggobi and rggobi installed without complaints. Mark

   sessionInfo()
  R version 2.7.0 Patched (2008-05-04 r45620)
  x86_64-unknown-linux-gnu


In the R 2.7 release notes, there is a comment about a change in the
GUI libraries and it says that one must recompile everything that
relies on R.  If your R 2.7 was an upgrade, not a fresh install, it
could explain why this is happening.  If there's some old library or R
package sitting around, it could account for this.

The part that concerned me about the R release note is that it doesn't
give a very clear guide on how far back in the toolchain we are
supposed to go.  Certainly, ggobi has to be rebuilt from scratch.  But
do any of the things on which ggobi depends need recompilation as
well?

pj


-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] poisson regression with robust error variance ('eyestudy

2008-05-08 Thread Paul Johnson
On Thu, May 8, 2008 at 8:38 AM, Ted Harding
[EMAIL PROTECTED] wrote:
 The below is an old thread:

  On 02-Jun-04 10:52:29, Lutz Ph. Breitling wrote:
   Dear all,
  
   i am trying to redo the 'eyestudy' analysis presented on the site
   http://www.ats.ucla.edu/stat/stata/faq/relative_risk.htm
   with R (1.9.0), with special interest in the section on relative
   risk estimation by poisson regression with robust error variance.
  
   so i guess rlm is the function to use. but what is its equivalent
   to the glm's argument family to indicate 'poisson'? or am i
   somehow totally wrong and this is not applicable here?

There have been several questions about getting robust standard errors
in glm lately.  I went and read that UCLA website on the RR eye study
and the Zou article that uses a glm with robust standard errors.

I don't think rlm is the right way to go because that gives
different parameter estimates.  You need to estimate with glm and then
get standard errors that are adjusted for heteroskedasticity. Well,
you may wish to use rlm for other reasons, but to replicate that
eyestudy project, you need to take the ordinary usual estimates of the
b's and adjust the standard errors.

The Zou article does not give the mathematical formulae used to
estimate those robust errors; rather, it gives a code snippet on using
SAS proc GENMOD to get those estimates.  Presumably, if we had access
to the SAS formula, we could easily get the calculations we need with
R. It is a little irksome to me that people think saying "use SAS proc
GENMOD" is informative.  Rather, we really need to know which TYPE of
robust standard error is being considered, since there are about 10
competing formulations.

I started checking various R packages for calculating sandwich
variance estimates.  There is a package "sandwich" for that purpose,
and in the "car" package there is a function hccm that can do so.
The hccm in car refuses to take glm objects, but the function vcovHC
in sandwich will do so. The discussion for sandwich's vcovHC
function refers to linear models, and lm objects are used in the
examples, but the vignette "sandwich" distributed with the package
states "The HAC estimators are already available for generalized
linear models (fitted by glm) and robust regression (fitted by rlm in
package MASS)."

As a result, one can get a var/covar matrix from the routines in the
sandwich package, but I'm not entirely sure they are match the ones
SAS gives.  The vcovHC help page has a nice explanation of the
differences across sandwich estimators.  It says they are all variants
on

(X'X)^{-1} X' Omega X (X'X)^{-1}

With different stipulations about Omega.  If we knew the stipulation
about OMEGA used in the SAS routine, we could try it.
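To make the formula concrete, here is a base-R sketch of the simplest
variant, White's HC0, where Omega is taken to be diag(residuals^2). This is
only the basic correction; packaged estimators such as vcovHC differ mainly
in how they rescale those squared residuals.

```r
set.seed(1)
x <- rnorm(200)
y <- 1 + 2 * x + rnorm(200) * (1 + abs(x))   # heteroskedastic errors
fit <- lm(y ~ x)

X <- model.matrix(fit)
u <- residuals(fit)
bread <- solve(crossprod(X))     # (X'X)^{-1}
meat  <- crossprod(X * u)        # X' diag(u^2) X
vHC0  <- bread %*% meat %*% bread
sqrt(diag(vHC0))                 # HC0 robust standard errors
```

For a glm, the same sandwich idea applies with the score contributions in
the meat, which is exactly what the sandwich package automates.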

I tried to get the eyestudy data, but the link on the UCLA website is
no longer valid.

So I generated some phony 0-1 data:

> y <- as.numeric(rnorm(1000) > 0)
> x <- rnorm(1000)
> glm1 <- glm(y ~ x, family = binomial)
> hccm(glm1)
Error in hccm.lm(glm1) : requires unweighted lm
> vcovH
vcovHAC  vcovHC
> glm1 <- glm(y ~ x, family = poisson(link = "log"))
> vcovHC(glm1)
  (Intercept) x
(Intercept)  1.017588e-03 -3.722456e-05
x   -3.722456e-05  9.492517e-04


The default type of estimation is the method called "HC3", which is
recommended by the authors of some Monte Carlo research studies.  vcovHC
calculates White's basic correction if you run:


> vcovHC(glm1, type = "HC0")
  (Intercept) x
(Intercept)  1.013508e-03 -3.691381e-05
x   -3.691381e-05  9.417839e-04


Compare against the non-robust glm var/covar matrix.

> vcov(glm1)
  (Intercept) x
(Intercept)  0.0020152998 -0.778422
x   -0.778422  0.0018721903


In conclusion, use glm followed by vcovHC and I believe you will find
estimates like the ones provided by SAS or Stata.  But, without access
to their data, I can't be entirely sure.



-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] poisson regression with robust error variance ('eyestudy

2008-05-08 Thread Paul Johnson
Ted Harding said:
 I can get the estimated RRs from

 RRs <- exp(summary(GLM)$coef[,1])

 but do not see how to implement confidence intervals based
 on robust error variances using the output in GLM.


Thanks for the link to the data.  Here's my best guess.   If you use
the following approach, with the HC0 type of robust standard errors in
the sandwich package (thanks to Achim Zeileis), you get almost the
same numbers as that Stata output gives. The estimated b's from the
glm match exactly, but the robust standard errors are a bit off.



### Paul Johnson 2008-05-08
### sandwichGLM.R
system("wget http://www.ats.ucla.edu/stat/stata/faq/eyestudy.dta")

library(foreign)

dat <- read.dta("eyestudy.dta")

### Ach, Stata codes factor contrasts backwards

dat$carrot0 <- ifelse(dat$carrot == 0, 1, 0)
dat$gender1 <- ifelse(dat$gender == 1, 1, 0)

glm1 <- glm(lenses ~ carrot0, data = dat, family = poisson(link = "log"))

summary(glm1)

library(sandwich)

vcovHC(glm1)


sqrt(diag(vcovHC(glm1)))


sqrt(diag(vcovHC(glm1, type = "HC0")))

### Result:
#  summary(glm1)
# Call:
# glm(formula = lenses ~ carrot0, family = poisson(link = "log"),
#     data = dat)

# Deviance Residuals:
# Min   1Q   Median   3Q  Max
# -1.1429  -0.9075   0.3979   0.3979   0.7734

# Coefficients:
#             Estimate Std. Error z value Pr(>|z|)
# (Intercept)  -0.8873     0.2182  -4.066 4.78e-05 ***
# carrot0       0.4612     0.2808   1.642    0.101
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# (Dispersion parameter for poisson family taken to be 1)

# Null deviance: 67.297  on 99  degrees of freedom
# Residual deviance: 64.536  on 98  degrees of freedom
# AIC: 174.54

# Number of Fisher Scoring iterations: 5

#  sqrt(diag(vcovHC(glm1, type = "HC0")))
# (Intercept) carrot0
#  0.1673655   0.1971117
# 

### Compare against
## http://www.ats.ucla.edu/stat/stata/faq/relative_risk.htm
## robust standard errors are:
## .1682086  .1981048


glm2 <- glm(lenses ~ carrot0 + gender1 + latitude, data = dat,
    family = poisson(link = "log"))


sqrt(diag(vcovHC(glm2, type = "HC0")))

### Result:


#  summary(glm2)

# Call:
# glm(formula = lenses ~ carrot0 + gender1 + latitude,
#     family = poisson(link = "log"), data = dat)

# Deviance Residuals:
# Min   1Q   Median   3Q  Max
# -1.2332  -0.9316   0.2437   0.5470   0.9466

# Coefficients:
#             Estimate Std. Error z value Pr(>|z|)
# (Intercept) -0.65212    0.69820  -0.934   0.3503
# carrot0      0.48322    0.28310   1.707   0.0878 .
# gender1      0.20520    0.27807   0.738   0.4605
# latitude    -0.01001    0.01898  -0.527   0.5980
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# (Dispersion parameter for poisson family taken to be 1)

#Null deviance: 67.297  on 99  degrees of freedom
# Residual deviance: 63.762  on 96  degrees of freedom
# AIC: 177.76

# Number of Fisher Scoring iterations: 5

#  sqrt(diag(vcovHC(glm2, type = "HC0")))
# (Intercept) carrot0 gender1latitude
# 0.49044963  0.19537743  0.18481067  0.01275001

### UCLA site has:

## .4929193   .1963616  .1857414  .0128142


So, either there is some rounding error or Stata is not using
exactly the same algorithm as vcovHC in sandwich.
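For completeness: once a coefficient variance matrix is chosen (robust or
not), relative-risk confidence intervals follow mechanically. This sketch
uses simulated data and the ordinary vcov so it is self-contained; in the
eyestudy analysis the robust matrix, e.g. vcovHC(fit, type = "HC0") from
the sandwich package, would be substituted for V.

```r
set.seed(2)
d <- data.frame(y = rpois(100, 1), x = rbinom(100, 1, 0.5))
fit <- glm(y ~ x, data = d, family = poisson(link = "log"))

V  <- vcov(fit)            # swap in the robust estimate here
se <- sqrt(diag(V))
ci <- exp(cbind(RR  = coef(fit),
                lwr = coef(fit) - 1.96 * se,
                upr = coef(fit) + 1.96 * se))
ci                         # relative risks with Wald 95% intervals
```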

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] How to add space between main title to leave space for legend?

2008-05-31 Thread Paul Johnson
Hello, everybody:

I recently encountered an example in which the graph was placed in a
way that did not leave room for a legend.  Maybe you would describe it
as "legend too big," I'm not sure.  I found myself wishing I could
force in some space after the title.

Here's working example where I'd like to make room for a legend.



x <- rnorm(100)

hist(x, freq = FALSE, main = "Different Meanings of Normality")

lines(density(x))

xseq1 <- seq(min(x), max(x), length.out = 100)

m1 <- mean(x)

sd1 <- sd(x)

obsNormal <- dnorm(xseq1, mean = m1, sd = sd1)

lines(xseq1, obsNormal, lty = 2, col = "red")

truNormal <- dnorm(xseq1)

lines(xseq1, truNormal, lty = 3, col = "green")

legend(0, 0.4, legend = c("observed density", "normal with observed mean
& sd", "normal with 'true' mean and sd"), lty = c(1, 2, 3),
col = c("black", "red", "green"))


I tried fiddling around with par to change the margins, but it has the
bad effect of resizing the image and creating a blank space into which
I'm not allowed to write the legend. Know what I mean? I try

par( mar=c(1,1,3,1)) or par(mai=c(1,1,4,1))

before the hist and it makes space, but useless space.

I've been considering something desperate, such as layout().  Isn't
there a simpler way?






-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to add space between main title to leave space for legend?

2008-06-03 Thread Paul Johnson
On Sat, May 31, 2008 at 2:45 PM, Peter Dalgaard
[EMAIL PROTECTED] wrote:
 Paul Johnson wrote:

 Hello, everybody:
 (1) par(xpd=TRUE) allows you to write outside the plotting region

  O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B

As usual, PD is right on the money.  Here's a working demo program for
posterity.  For most random samples, this gives a good looking
graph.  For the other few, well, they happen less than 0.05 of the
time :)


### histogramWithDensityLinesAndLegend.R
### Paul Johnson 2008-06-02
### Thanks to r-help members for tips.


x <- rnorm(100)

par(mai = c(1, 1, 2.5, 1))

### allows writing outside the plotting region
par(xpd = TRUE)

myhist <- hist(x, freq = FALSE, main = "Different Meanings of Normality")

lines(density(x))

xseq1 <- seq(min(x), max(x), length.out = 100)

m1 <- mean(x)

sd1 <- sd(x)

obsNormal <- dnorm(xseq1, mean = m1, sd = sd1)

lines(xseq1, obsNormal, lty = 2, col = "red")

truNormal <- dnorm(xseq1)

lines(xseq1, truNormal, lty = 3, col = "green")

legend(min(xseq1), 1.3 * max(myhist$density),
       legend = c("observed density", "normal with observed mean & sd",
                  "normal with 'true' mean and sd"),
       lty = c(1, 2, 3), col = c("black", "red", "green"))





-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] why doesn't t.test value include standard error

2009-05-07 Thread Paul Johnson
Hello, everybody!

I'm back to ask the obvious on behalf of the silent majority :)

Today a student asked me what standard error was used in this t.test
output.  I looked into it and was a little surprised that a t.test
output object does not have a slot for the standard error.  Of course,
we can reconstruct it as se = (estimate - mu0)/t, but I was surprised.

Do you think it would be nicer if t.test did include the denominator
in the output? If we had that output, we could more easily compare the
different methods of calculating the standard error that are discussed
in ?t.test.

ex:

> x <- rnorm(100, m=10)
> myt <- t.test(x, m=8)
> attributes(myt)
$names
[1] "statistic"   "parameter"   "p.value"     "conf.int"    "estimate"
[6] "null.value"  "alternative" "method"      "data.name"

$class
[1] "htest"

> myse <- myt$estimate/myt$statistic
> myse
mean of x
0.4928852
> myt$statistic
       t
20.33975
> myt$estimate/myse
mean of x
 20.33975

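One caution for posterity: since t = (xbar - mu0)/se, the reconstruction
needs the null value subtracted off; estimate/t alone only recovers the
standard error when the null hypothesis is mu = 0. A sketch:

```r
set.seed(42)
x <- rnorm(100, mean = 10)
myt <- t.test(x, mu = 8)

# correct reconstruction: subtract the null value before dividing by t
se_from_t <- unname((myt$estimate - 8) / myt$statistic)
se_direct <- sd(x) / sqrt(length(x))
all.equal(se_from_t, se_direct)
```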

Happy summer!  Today was our last day of class in Kansas.

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] times family unavailable in postscript device (Ubuntu Linux)

2009-05-11 Thread Paul Johnson
I'm running Ubuntu 9.04.  I could use some advice about fonts in
postscript devices.

 sessionInfo()
R version 2.9.0 (2009-04-17)
i486-pc-linux-gnu

locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base


I can use family="Times" with pdf output, but postscript refuses. It says:


> plot(rnorm(10), rnorm(10), family="Times")
Error in axis(side = side, at = at, labels = labels, ...) :
  family 'Times' not included in PostScript device

This happens even though "Times" *appears* to be listed as a valid family:

> names(postscriptFonts())
 [1] "serif"                "sans"                 "mono"
 [4] "AvantGarde"           "Bookman"              "Courier"
 [7] "Helvetica"            "Helvetica-Narrow"     "NewCenturySchoolbook"
[10] "Palatino"             "Times"                "URWGothic"
[13] "URWBookman"           "NimbusMon"            "NimbusSan"
[16] "URWHelvetica"         "NimbusSanCond"        "CenturySch"
[19] "URWPalladio"          "NimbusRom"            "URWTimes"
[22] "ComputerModern"       "ComputerModernItalic" "Japan1"
[25] "Japan1HeiMin"         "Japan1GothicBBB"      "Japan1Ryumin"
[28] "Korea1"               "Korea1deb"            "CNS1"
[31] "GB1"
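One workaround that may apply here (an assumption on my part, not a
confirmed diagnosis of the X11 issue): declare the family when the
postscript device is opened, rather than in the plot() call, since the
device only prepares metrics for the families it was told about at
start-up.

```r
# declare the font family at device creation time
psfile <- file.path(tempdir(), "times-test.ps")
postscript(psfile, family = "Times")
plot(rnorm(10), rnorm(10))
dev.off()
```

If extra families are needed per-plot, the fonts= argument to postscript()
can pre-register them so later par(family=...) calls succeed.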

> example(postscriptFonts)

pstscF> postscriptFonts()
$serif
$family
[1] Times

$metrics
[1] Times-Roman.afm  Times-Bold.afm   Times-Italic.afm
[4] Times-BoldItalic.afm Symbol.afm

$encoding
[1] default

attr(,class)
[1] Type1Font

$sans
$family
[1] Helvetica

$metrics
[1] Helvetica.afm Helvetica-Bold.afm
[3] Helvetica-Oblique.afm Helvetica-BoldOblique.afm
[5] Symbol.afm

$encoding
[1] default

attr(,class)
[1] Type1Font

$mono
$family
[1] Courier

$metrics
[1] Courier.afm Courier-Bold.afm
[3] Courier-Oblique.afm Courier-BoldOblique.afm
[5] Symbol.afm

$encoding
[1] default

attr(,class)
[1] Type1Font

$AvantGarde
$family
[1] AvantGarde

$metrics
[1] agw_.afm agd_.afm agwo.afm agdo.afm Symbol.afm

$encoding
[1] default

attr(,class)
[1] Type1Font

$Bookman
$family
[1] Bookman

$metrics
[1] bkl_.afm bkd_.afm bkli.afm bkdi.afm Symbol.afm

$encoding
[1] default

attr(,class)
[1] Type1Font

$Courier
$family
[1] Courier

$metrics
[1] Courier.afm Courier-Bold.afm
[3] Courier-Oblique.afm Courier-BoldOblique.afm
[5] Symbol.afm

$encoding
[1] default

attr(,class)
[1] Type1Font

$Helvetica
$family
[1] Helvetica

$metrics
[1] Helvetica.afm Helvetica-Bold.afm
[3] Helvetica-Oblique.afm Helvetica-BoldOblique.afm
[5] Symbol.afm

$encoding
[1] default

attr(,class)
[1] Type1Font

$`Helvetica-Narrow`
$family
[1] Helvetica-Narrow

$metrics
[1] hvn_.afm hvnb.afm hvno.afm hvnbo___.afm Symbol.afm

$encoding
[1] default

attr(,class)
[1] Type1Font

$NewCenturySchoolbook
$family
[1] NewCenturySchoolbook

$metrics
[1] ncr_.afm ncb_.afm nci_.afm ncbi.afm Symbol.afm

$encoding
[1] default

attr(,class)
[1] Type1Font

$Palatino
$family
[1] Palatino

$metrics
[1] por_.afm pob_.afm poi_.afm pobi.afm Symbol.afm

$encoding
[1] default

attr(,class)
[1] Type1Font

$Times
$family
[1] Times

$metrics
[1] Times-Roman.afm  Times-Bold.afm   Times-Italic.afm
[4] Times-BoldItalic.afm Symbol.afm

$encoding
[1] default

attr(,class)
[1] Type1Font

$URWGothic
$family
[1] URWGothic

$metrics
[1] a010013l.afm a010015l.afm a010033l.afm a010035l.afm s05l.afm

$encoding
[1] default

attr(,class)
[1] Type1Font

$URWBookman
$family
[1] URWBookman

$metrics
[1] b018012l.afm b018015l.afm b018032l.afm b018035l.afm s05l.afm

$encoding
[1] default

attr(,class)
[1] Type1Font

$NimbusMon
$family
[1] NimbusMon

$metrics
[1] n022003l.afm n022004l.afm n022023l.afm n022024l.afm s05l.afm

$encoding
[1] default

attr(,class)
[1] Type1Font

$NimbusSan
$family
[1] NimbusSan

$metrics
[1] n019003l.afm n019004l.afm n019023l.afm n019024l.afm s05l.afm

$encoding
[1] default

attr(,class)
[1] Type1Font

$URWHelvetica
$family
[1] URWHelvetica

$metrics
[1] n019003l.afm n019004l.afm n019023l.afm n019024l.afm s05l.afm

$encoding
[1] default

attr(,class)
[1] Type1Font

$NimbusSanCond
$family
[1] NimbusSanCond

$metrics
[1] n019043l.afm n019044l.afm n019063l.afm n019064l.afm s05l.afm

$encoding
[1] default

attr(,class)
[1] Type1Font

$CenturySch
$family
[1] CenturySch

$metrics
[1] c059013l.afm c059016l.afm c059033l.afm c059036l.afm s05l.afm

$encoding
[1] default

attr(,class)
[1] Type1Font

$URWPalladio
$family
[1] URWPalladio

$metrics
[1] p052003l.afm p052004l.afm p052023l.afm p052024l.afm s05l.afm

$encoding
[1] default

attr(,class)
[1] Type1Font

$NimbusRom
$family
[1] NimbusRom

$metrics
[1] n021003l.afm n021004l.afm n021023l.afm n021024l.afm s05l.afm

$encoding
[1] default

attr(,class)
[1] Type1Font

$URWTimes
$family
[1] 

Re: [R] fitdistr for t distribution

2009-05-16 Thread Paul Johnson
On Fri, May 15, 2009 at 6:22 AM, lagreene lagreene...@gmail.com wrote:

 Thanks Jorge,

 but I still don't understand where they come from.  when I use:
 fitdistr(mydata, "t", df = 9) and get values for m and s, and the variance
 of my data should be the df/s?

 I just want to be able to confirm how m and s are calculated

I've wondered the same kind of thing and I've learned the answer is
easy!  It is not so easy for all R functions, but did you try this
with fitdistr?

> library(MASS)
> fitdistr

the output that follows is the ACTUAL FORMULA that is used to make the
calculations!

I've not yet mastered the art of getting code for some functions.

> predict
function (object, ...)
UseMethod("predict")
<environment: namespace:stats>

But I know there is a way to get that code if you know the correct way
to run getS3method().  But I usually just go read the R source code
rather than puzzle over that.
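For the record, a small sketch of the getS3method() incantation mentioned above (nothing here beyond base R):

```r
## Ask for a method by generic name and class; this prints the body of
## predict.lm even though typing predict alone only shows UseMethod("predict").
getS3method("predict", "lm")

## methods() lists which classes have their own predict implementation.
methods(predict)
```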




 mydt <- function(x, m, s, df) dt((x - m)/s, df)/s
 fitdistr(x2, mydt, list(m = 0, s = 1), df = 9, lower = c(-Inf, 0))

 Thanks anyway for the help!

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] save the output of summary(lmList(x)) into a dataframe

2009-06-16 Thread Paul Johnson
On Tue, Jun 16, 2009 at 3:29 AM, Cecilia Carmocecilia.ca...@ua.pt wrote:
 Hi r-helpers!

 I need to save the output of the summary() function that I’ve run like this:
 z <- lmList(y ~ x1 + x2 | x3, na.action=na.omit, data1, subset=year==1999)
 w <- summary(z)
 The output (w) is something like this:
 Call:
  Model: y ~ x1 + x2 | x3
   Data: data1


Does this come close?  I'm just using the lmList example from lme4

### example(lmList) spawns an object fm1 that I demonstrate with:


library(lme4)

example(lmList)


varnames <- names(fm1[[1]]$coefficients)

## apply summary() to each lm fit inside the lmList object
mylist <- lapply(fm1, summary)

## pull one coefficient row out of a summary object
myCoefficients <- function(x, name) { coef(x)[name, ] }

theIntercepts <- t(sapply(mylist, myCoefficients, varnames[1]))

theSecondVar <- t(sapply(mylist, myCoefficients, varnames[2]))


I know other r-help readers will know a better way to cycle through
the variables rather than running once for each element of varnames...
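One hedged way to cycle over every element of varnames at once, reusing mylist and varnames from the code above:

```r
## Build one coefficient table per term with a single lapply();
## coef() on a summary object returns the full coefficient matrix.
allRows <- lapply(varnames, function(v)
    t(sapply(mylist, function(s) coef(s)[v, ])))
names(allRows) <- varnames   # e.g. allRows[["(Intercept)"]]
```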


-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] R for Mac 10.3.9

2008-01-28 Thread Paul Johnson
I know nothing of Macintosh, so please be patient.

My student has a Macintosh with OSX 10.3.9.  The R for Mac says it is
for 10.4.4 or higher.

Aside from saying "get a new Mac", what can be said to my student?
Can you point me at the newest version of R that did work on 10.3 ?

pj

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Working up examples of barplots with customized marginal items

2009-03-11 Thread Paul Johnson
Hello, everybody:

I'm back to share a long piece of example code.  Because I had trouble
understanding the role of par() and margins when customizing barplots,
I have struggled with this example.  In case you are a student wanting
to place legends or textual markers in the outer regions of a barplot,
you might run these examples.

There are a few little things I don't quite understand.

If you look below where I've placed ???, you'll see I've used
text() to write inside the bars of a plot. To vertically align the
strings, I use the text() function's pos option. However, the text is
not centered horizontally inside the bars. I don't think offset offers
a perfectly general solution because it does not take the width of the
bars into account.
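For what it's worth, a self-contained sketch (not from the original post): when pos is omitted, text() centers each string on its x coordinate, so the bar midpoints that barplot() returns can center labels inside the bars.

```r
## Toy bar heights; barplot() returns the x midpoints of the bars.
h <- c(4, 7, 5)
mid <- barplot(h)
text(mid, h/2, labels = c("a", "b", "c"))   # horizontally centered in each bar
```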

Second, can changes to par be forced on all devices? I learned the
hard way that a change like par(mar=c(10,3,4,5)) will apply to the
current working device, but when a new output device is created, the
margins are reset to the defaults.  The margins in the saved graph
won't match the screen unless par is run again:

postscript( ---whatever--- )
###par must be run again to manipulate the postscript device:
par ( --whatever ---)
plot (--whatever---)
dev.off()

On the other hand,

dev.copy(postscript, ---whatever--)

will inherit the par values set on the screen device.

Is there a single command that can adjust the margins of all devices
that are created within a given session?
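Not a single command, as far as I know, but a hedged workaround: wrap the device call in a small helper so every new device gets the same par() settings (myPS and the margin values here are made-up names, not anything from the original post):

```r
## Open a postscript device and immediately apply the house margins.
myPS <- function(file, ...) {
    postscript(file, horizontal = FALSE, paper = "special", ...)
    par(mar = c(10, 3, 4, 5))
}
## usage: myPS("fig.eps"); barplot(1:3); dev.off()
```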


Anyway, here's my big barplot script and I hope some students benefit
from experimenting with it

pj

### Paul Johnson
### Twisting the margins of a barplot

### I never thought too much about customizing barplots, but
### now I have learned some tricks to share.  Step through these
### examples to see the possibilities.

set.seed(424242)

x - rpois(10, lambda=10)

mynames <- c(rep("Really Long Name", 9), "Really Long Name Long Long")


#RSiteSearch(barplot names margins)


barplot(x)


### Note long names off edge:
barplot(x, names=mynames, las=2)

### Abbreviate offers one solution
barplot(x, names=abbreviate(mynames), las=2)

### Other option is to reset margins
###default mar is c(5.1, 4.1, 4.1, 2.1)

### Let's make the bottom margin larger
par(mar=c(15,4.1,4.1, 2.1))
barplot(x, names=mynames, las=2)



### Now the bottom is finished. But I'd like a legend.

legend("topleft", legend=c("A really long label", "B", "Cee What I can do?", "D"), col=1:2)

### My bar hits the legend. How to fix?


### It is not sufficient to simply move the legend up
### because then it runs off the edge of the graph
barplot(x, names=mynames, las=2)
legend(2, 22, legend=c("A really long label", "B", "Cee What I can do?", "D"), col=1:2)

### By itself, changing the top margin does not help.
### It is also vital to set the xpd parameter to T so that
### R will draw outside the main plot region
par(mar=c(10,4.1,8.1, 2.1))
par(xpd=T)
barplot(x, names=mynames, las=2)
legend(2, 22, legend=c("A really long label", "B", "Cee What I can do?", "D"), col=1:2)


### Now let's gain some control over the colors and bars.
### The default colors are so dark that you can't write on top
### of them.  You can grab shades of gray like "gray30" or such.

### I'll just specify 4 colors I know will work. I prefer
### the light gray colors because they can be written over.
mycols <- c("lightgray", "gray70", "gray80", "gray90", "gray75")

barplot(x, names=mynames, las=2, col=mycols)

legend(2, 20, legend=c("A", "B", "C", "D"), fill=mycols)

### Still, I don't like the solid shades so much.
### fill up the bars with angled lines.
### density determines number of lines inside the bar.

myden <- c(60, 20, 30, 40, 25)
### angle determines the direction of the lines inside the bars
myangles <- c(20, -30, 60, -40, 10)

barplot(x, names=mynames, las=2, col=mycols,density=myden,angle=c(20,60,-20))

legend(1, 20, legend=c("A", "B", "C", "D"), density=myden, angle=myangles)

### For my coup de grâce, let's do some writing in the bars.

### Recall from Verzani you can get the x coordinates from the barplot
barPositions <- barplot(x, names=mynames, las=2,
col=mycols, density=myden, angle=c(20, 60, -20))

barPositions

### The text option srt=90 turns the text sideways

### I'm just guessing that bars 1 and 8 should be labeled
### at coordinates 5 and 5

text(barPositions[1], 5, "my first bar is great", srt=90)

text(barPositions[8], 5, "but 8 is greater", srt=90)

### Recently I had the problem of drawing a clustered bar chart.
### Let's suppose our x variable really represents responses from
### 2 groups of respondents, Men and Women.

## Create the matrix
xmatr - matrix(x, nrow=5)

### Use beside=T to cause barplot to treat each column
### of values as a group
barplot(xmatr, beside=T, names=c("Men", "Women"))

valueNames <- c("Very Strong", "Somewhat Strong", "Not Strong", "Somewhat Weak", "Very Weak")

### Use mtext to write in the margin so that labels on plot are nice.

par(mar=c(10,4.1,5.1, 2.1))

bp - barplot(xmatr, beside=T, names=NULL)

### bp contains the positions of the bars.
mtext(valueNames, side=1, las=3, at=bp)

mtext(c("Men", "Women"), side=1, at=c(bp[3], bp[8]), line

[R] Ever see Process R exited abnormally with code 4?

2009-03-13 Thread Paul Johnson
I'm on a Windows XP student's computer.  When we get busy and start
running R stuff, it hangs at random with the hourglass showing, but
the system's CPU is not running at 100%.  We sit and watch for a
while, and then try Ctrl-Alt-Delete to kill the not-responding program.
 In this case, I'm able to reproduce the problem almost every time by
starting Rcmdr and running some graphs.

I'm suspecting that there may be some hardware problem, possibly
overheating, but have no evidence for that guess.  I can't find the
error codes in the R docs (sorry).

I think the following supplies the information that the posting guide asks for:

> sessionInfo()
R version 2.8.1 (2008-12-22)
i386-pc-mingw32

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
States.1252;LC_MONETARY=English_United
States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base
> library(Rcmdr)
Loading required package: tcltk
Loading Tcl/Tk interface ... done
Loading required package: car

Rcmdr Version 1.4-7


Attaching package: 'Rcmdr'


The following object(s) are masked from package:tcltk :

 tclvalue

> x <- rnorm(100)
 Loading required package: rgl
Loading required package: mgcv
This is mgcv  1.5-0 . For overview type `help(mgcv-package)'.

Process R exited abnormally with code 4 at Fri Mar 13 17:19:47 2009



-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Help with Function!

2009-03-13 Thread Paul Johnson
On Fri, Mar 13, 2009 at 6:28 PM, Lars Bishop lars...@gmail.com wrote:
 Dear All,

 I need to write 'n' functions on 'm' variables. The functions should be
 constructed according to the values of an (nxm) matrix of '1/0' values as
 follows. For example,

 if row 1 is equal to, say, [1 0 ... 0 0] then f1 <- (1+x1)
 if row 2 is equal to, say, [1 1 1 0 ... 0 1] then f2
 <- (1+x1)*(1+x2)*(1+x3)*...*(1+xm)
 if row n is equal to [0 1 0 1 1 0 ... 0] then fn <- (1+x2)*(1+x4)*(1+x5)

 Is there an efficient way to write these functions? Note that each f(x) is a
 function of m variables, independently on whether the function explicitly
 includes a given variable or not.

 Many thanks for your help!

 Regards,




indic <- matrix(c(1,1,0,0,0,1,0,0,1,0), ncol=5)

X <- data.frame(x1=3, x2=3, x3=4, x4=5, x5=6)

### Give this function 2 vectors, an indicator
### vector ind and a value vector Xin
myHomework <- function(ind, Xin){
  prod(1 + Xin[ , which(ind==1)])  # per the spec, each indicated variable contributes (1 + x)
}

myHomework(indic[1,], X[1,])

myHomework(indic[2,], X[1,])

Seems to give the right answer.  Let me know if we get an A.
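The same idea extends to all n rows in one call; a sketch reusing indic, X, and myHomework from above:

```r
## apply() hands each row of the indicator matrix to myHomework in turn,
## returning one value per function f1 ... fn at the point X[1, ].
allVals <- apply(indic, 1, myHomework, Xin = X[1, ])
```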

PJ


-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to combine xtable and minipage with Sweave ?

2009-03-13 Thread Paul Johnson
On Fri, Mar 13, 2009 at 12:17 PM, Duncan Murdoch murd...@stats.uwo.ca wrote:
 On 3/13/2009 12:07 PM, Ptit_Bleu wrote:

 Thanks Dieter for the link.


 You can use \includegraphics explicitly yourself, and avoid the automatic
 code generated by Sweave.  For example,

 <<testfn, fig=true, include=false>>=
 curve(f, from = 1, to = 5)
 @

 That generates the figure, but doesn't include it; then

 \begin{figure}
 \begin{center}
 \includegraphics{-testfn}
 \caption{The function $f(x) = |x-3.5| + (x-2)^2$.\label{fig:testfn}}
 \end{center}
 \end{figure}

 will do the include.  You need to name the chunk that creates the figure for
 this to be workable, and if you have set a prefix in your SweaveOpts, you
 need to use it in the \includegraphics call.  (Actually I forget if the
 hyphen is needed; I stripped off a prefix from the real example to show you
 this.)

 Duncan Murdoch


Dear Duncan:

Thanks for the tip. This is very useful to me.

Just to clear up the filename question, I worked up an example for
testing.  For me it is necessary to include the name of the Rnw/tex
file as a prefix in the includegraphics command.  Maybe that's just
me, or Ubuntu, or TexLive2007.

Here is a working file SweaveExample.Rnw.  In order to make this work,
you see the figure names in includegraphics.

\documentclass[english]{article}
\usepackage[T1]{fontenc}
\usepackage[latin9]{inputenc}

\usepackage{babel}

\begin{document}
<<testfn, fig=true, include=false>>=
curve(sin, from = 1, to = 5)
@

<<testfn2, fig=true, include=false>>=
curve(sin, from = 1, to = 5)
@

%
\begin{figure}
\caption{My Figure}
\includegraphics{SweaveExample-testfn}
\end{figure}

%%works without float too:
\includegraphics{SweaveExample-testfn2}
\end{document}


-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem with figure size when embedding fonts

2009-03-15 Thread Paul Johnson
On Sat, Mar 14, 2009 at 1:51 PM, Frank E Harrell Jr
f.harr...@vanderbilt.edu wrote:
 Dear Colleagues:

 I need to make a graphic that uses the Nimbus rather than Helvetica font
 family so that the font can be embedded in the encapsulated postscript file.
  This is to satisfy a requirement from a journal for electronic submission
 of figures.  I do the following:

 postscript('lowess.eps', onefile=FALSE, pointsize=18, horizontal=FALSE,
           family='NimbusSan')
 spar(mfrow=c(3,2))
 . . .
 dev.off()

 At this point lowess.eps looks fine.  When I run:
 embedFonts('lowess.eps') or
 embedFonts('lowess.eps', options='-sPAPERSIZE=letter')

 the figures are wider and the right panel of the 3x2 matrix of plots expands
 past the paper edge.  Advice welcomed.


Hello Frank:

This is an interesting post. I've not played with this until you
brought it to my attention.

I made a little progress.  It appears to me either that there is
something wrong with embedFonts as applied to postscript files or you
and I need to learn a lot of ghostscript options.

I created a small test and I see the same problem you do--the margins
bump if you embed the fonts. I think the problem may be worse than
that, it looks like the embedFonts wrecks the bounding box of the eps
file.

Do this:

> x <- rnorm(100)
> y <- rnorm(100)
> ml <- loess(y ~ x)
> plot(x, y)
> lines(ml)
> postscript('lowess.eps', onefile=FALSE, pointsize=18,
horizontal=FALSE, family="NimbusSan")
> plot(x, y)
> lines(ml)
> text(-1, 1, "Really big writing")
> dev.off()

Copy lowess.eps to a safe place, then run

embedFonts("lowess.eps")

when you compare the two eps files, the margins are quite different.

I see the difference more sharply when I add the features I usually
have in postscript:


postscript('lowess.eps', onefile=FALSE, pointsize=18,
horizontal=FALSE, family="NimbusSan",
height=6, width=6, paper="special")
 plot(x, y)
 lines(ml)
 text(-1, 1, "Really big writing")
 dev.off()

If you run that, save a copy of lowess.eps, then run embedFonts, you
see the new lowess.eps no longer has the proper bounding box.

Here's the top of lowess-orig.eps

%!PS-Adobe-3.0 EPSF-3.0
%%DocumentNeededResources: font NimbusSanL-Regu
%%+ font NimbusSanL-Bold
%%+ font NimbusSanL-ReguItal
%%+ font NimbusSanL-BoldItal
%%+ font StandardSymL
%%Title: R Graphics Output
%%Creator: R Software
%%Pages: (atend)
%%BoundingBox: 0 0 432 432
%%EndComments
%%BeginProlog

On the other hand, the one that results from the embedFonts is not EPS
anymore, and the bounding box is, well, confusing.


%!PS-Adobe-3.0
%%Pages: (atend)
%%BoundingBox: 0 0 0 0
%%HiResBoundingBox: 0.00 0.00 0.00 0.00
%.
%%Creator: GPL Ghostscript 863 (pswrite)
%%CreationDate: 2009/03/15 20:00:36
%%DocumentData: Clean7Bit
%%LanguageLevel: 2
%%EndComments
%%BeginProlog

It appears to me that running the eps output file through ps2epsi will
re-create a tight bounding box, perhaps it is sufficient for your
need.
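If ps2epsi is on the PATH, it can even be driven from inside R; a sketch (the output file name here is made up):

```r
## Shell out to ghostscript's ps2epsi, which recomputes a tight
## bounding box and writes a proper EPSI file.
system("ps2epsi lowess.eps lowess-tight.epsi")
```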

Rather than wrestle with the postscript  ghostscript, I re-did this
on a pdf output device. It appears to me that the margin placement is
not affected by embedFonts.


pdf('lowess.pdf', onefile=FALSE, pointsize=18,
family="NimbusSan", height=6, width=6, paper="special")
plot(x,y)
lines(ml)
text(-1, 1, "Really big writing")
dev.off()

I believe it is important to specify paper="special" here.

I wondered about your experience with par(mfcol---).  Lately, I've had
more luck printing out the smaller separate components and putting
them on the same LaTeX figure. That way, I have a little more control,
and the fonts in the figure captions are identical to my text, which I
like.


PJ

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Creating dataframe names on the fly?

2009-03-21 Thread Paul Johnson
On Fri, Mar 20, 2009 at 7:18 PM, science! karthik@gmail.com wrote:

 I am aware that it is easily possible to create var names on the fly. e.g.
 assign(paste("m", i, sep=""), j)
 but is it possible to assign dataframes to variables created on the fly?

 e.g.
 If I have a dataframe called master and I wanted to subset parts of those
 data into separate dataframes, I could do:

 m1=subset(master,master$SAMPLE=='1')
 m2=subset(master,master$SAMPLE=='2')
 .

 but I would like to do this in a loop. Can someone give me suggestions on
 how to accomplish this?


 I tried assign(paste("m", i, sep=""), subset(master, master$SAMPLE==i)) with no
 success.

Are you sure you really need to name these dataframes?

Here's a workaround that I use for these cases.  Create your new data
frames and add them to a list, as in

myframes <- list(m1=subset(master, master$SAMPLE=='1'),
m2=subset(master, master$SAMPLE=='2'))

Then when you want to use these things,  you can get the first one as
myframes[[1]]
or myframes[[2]].

You can name the objects inside the list:

names(myframes) <- c("A", "B")

This is just as good as referring to them  by name, in my experience.
It also has the benefit that because your dataframes are in a list,
then you can use features like lapply to do things for each dataset.
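For the SAMPLE example specifically, split() builds that whole list in one call; a sketch assuming master and SAMPLE as in the post:

```r
## One data frame per level of SAMPLE, already collected in a named list.
myframes <- split(master, master$SAMPLE)
lapply(myframes, nrow)   # e.g. row counts, one per subset
```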

I'm reading ?assign now, and it appears  you can actually name these things.

> name <- paste("fred", 4, sep="")
> x <- data.frame(py=rnorm(10))
> assign(name, x)
> ls()
[1] "fred4" "mname" "name"  "x"
> name
[1] "fred4"
> fred4
py
1  -1.04243477
2  -0.66475049
3  -0.08576428
4   0.64369356
5  -0.06828696
6   1.15710627
7  -2.45041700
8   0.40139655
9   0.27320936
10 -0.98028020

pj



-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Goodness of fit for negative binomial model

2009-03-21 Thread Paul Johnson
On Fri, Mar 20, 2009 at 8:03 PM, t c mudiver1...@yahoo.com wrote:
 Dear r list,

 I am using glm.nb in the MASS package to fit negative binomial models to data 
 on manta ray abundance, and AICctab in the bbmle package to compare model 
 IC.  However, I need to test for the goodness of fit of the full model, and 
 have not been able to find a Pearson's Chi Squared statistic in any of the 
 output.  Am I missing it somewhere?  Is there a way to run the test using 
 either chisq.test() or goodfit()?  I would appreciate any suggestions on how 
 to test for goodness of fit in negative binomial models.

 Thanks,

 Tim Clark

I found myself wondering if the Chi Square you get from anova() is the
Pearson one you want?  (see ?anova.negbin).

Just an example:



 library(MASS)
 example(glm.nb)
 anova(quine.nb3)
Analysis of Deviance Table

Model: Negative Binomial(1.9284), link: log

Response: Days

Terms added sequentially (first to last)


                Df Deviance Resid. Df Resid. Dev Pr(>|Chi|)
NULL                              145    272.291
Sex              1    1.705       144    270.586      0.192
Sex:Age          6   30.202       138    240.384  3.599e-05
Sex:Eth          2   20.461       136    219.923  3.606e-05
Sex:Lrn          2    8.459       134    211.465      0.015
Sex:Eth:Lrn      2   18.287       132    193.178  1.069e-04
Sex:Age:Lrn      4    8.649       128    184.529      0.070
Sex:Age:Eth      6    9.503       122    175.025      0.147
Sex:Age:Eth:Lrn  4    7.572       118    167.453      0.109
Warning message:
In anova.negbin(quine.nb3) : tests made without re-estimating 'theta'


That warning about re-estimating theta concerns me a bit.

If that's not the correct Pearson statistic, I bet you can get what
you need if you take the Pearson residuals and calculate whatever.  I
am looking at

?residuals.glm

and I note you can get Pearson residuals.  If you have a formula from
a dusty old stats book with the formula, I bet you can get it done.

pj
-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Plot and Boxplot in the same graph

2009-03-21 Thread Paul Johnson
On Fri, Mar 20, 2009 at 10:02 PM, johnhj jhar...@web.de wrote:

 Hi,

 Is it possible to use the plot() function and the boxplot() function
 together?
 I want to plot a simple graph and, additionally, boxplots at certain
 places on it.  I imagined plotting the graph slightly transparent and
 showing boxplots at certain places in the same graph.

 Is it possible to do it this way?

 greetings,
 johnh
 --

Run the boxplot first, then use points() or other subsidiary plot
functions to add the points in the figure.

> y <- rnorm(100)
> x <- gl(2, 50)
> boxplot(y ~ x)
> points(x, y)


-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] I want to use Sweave, but only sometimes

2009-04-13 Thread Paul Johnson
Does anybody have a workable system to run an Rnw document through
R-Sweave when necessary, but to just run it through LaTeX if no new R
calculations are needed? I.e.,  the figures already exist, I do not
need R to do more work for me, so I send the document straight to
LaTeX.

I want to leave open the option that I might need to run the document
through Sweave in the future, so I don't want to just convert the
document to pure LaTeX format.
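One hedged approach that sidesteps LaTeX entirely: drive the choice from R, and only re-run Sweave when the .Rnw is newer than the generated .tex (the file names here are hypothetical):

```r
## Regenerate report.tex (and its figures) only if report.Rnw changed.
rnw <- "report.Rnw"; tex <- "report.tex"
if (!file.exists(tex) ||
    file.info(rnw)$mtime > file.info(tex)$mtime)
    Sweave(rnw)
system(paste("pdflatex", tex))   # then LaTeX the result either way
```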

Here's why I think this must be possible. In this list, I saw one of
you explain this approach to working with figures in Sweave docs:

Instead of using the automatic figure inclusion approach like this:

<<testfn2, fig=true>>=
curve(sin, from = 1, to = 55)
@

Do this instead:


\SweaveOpts{prefix.string=foo/bar}

<<testfn2, fig=true, include=false>>=
curve(sin, from = 1, to = 55)
@


\begin{figure}
\caption{My Figure}
\includegraphics{foo/bar-testfn}
\end{figure}


As long as the figures are saved in the directory foo, then LaTeX will
find them, I think (hope!).  Then if I could tell LaTeX to ignore the
<<...>>= ... @ stuff, then it *seems* to me the Rnw file would be processed
successfully without further hassle.

I am sorry if this has been asked  answered before, I may not know
the magic words for searching for past solutions.

pj

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Group by in R

2009-04-14 Thread Paul Johnson
On Mon, Apr 13, 2009 at 8:56 AM, Nick Angelou nikola...@yahoo.com wrote:

 data
   X1 X2 X3 X4
 1   1  2  2  1
 2   1  1  2  2
 3   1  1  2  2
 4   2  2  1  2
 5   1  1  2  2
 6   2  2  1  2
 7   1  1  2  1
 8   2  2  1  2
 9   1  2  1  1
 10  1  1  2  2

 sqldf("select X1, X2, X3, X4, count(*) CNT from data group by X1, X2, X3, X4
 ORDER BY X4, X1, X2, X3")

  X1 X2 X3 X4 CNT
 1  1  1  2  1   1
 2  1  2  1  1   1
 3  1  2  2  1   1
 4  1  1  2  2   4
 5  2  2  1  2   3

 The counts are fine, though it's not exactly what I need. I need a kind of
 contingency table:

                                 | levels of X4 |
                                 ---
 unique triplets of X1:X3 |  1   |   2   |

 -
            1 1 1             |  0       0
            1 1 2             |  1       4
            1 2 1             |  1       0
            1 2 2             |  1       0
            2 1 1             |  0       0
            2 1 2             |  0       0
            2 2 1             |  0       3
            2 2 2             |  0       0


 So the final result should be a table structure like:


 0 0
 1 4
 1 0
 1 0
 0 0
 0 0
 0 3
 0 0


I propose this way to get the numbers you want. I create a new
variable to represent the values of the three then make a table:



md <- matrix(c(1,2,2,1,1,1,2,2,1,1,2,2,2,2,1,2,1,1,2,2,2,2,1,2,1,1,2,1,2,2,1,2,1,2,1,1,1,1,2,2),
             ncol=4)

dat <- as.data.frame(md)
names(dat) <- c("x1", "x2", "x3", "x4")

newvar <- factor(paste(dat$x1, dat$x2, dat$x3, sep="-"))

table(newvar, dat$x4)


Behold:

> table(newvar, dat$x4)

newvar  1 2
  1-1-1 1 0
  1-2-1 1 0
  1-2-2 1 3
  2-1-1 1 0
  2-1-2 1 0
  2-2-1 1 0
  2-2-2 0 1
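One caveat: table() drops triplets that never occur. To also get the all-zero rows the original post asked for, give the factor every possible level; a sketch reusing dat from above:

```r
## Enumerate all 2^3 triplets and use them as explicit factor levels.
allTriplets <- do.call(paste, c(expand.grid(1:2, 1:2, 1:2), sep = "-"))
newvar2 <- factor(paste(dat$x1, dat$x2, dat$x3, sep = "-"),
                  levels = sort(allTriplets))
table(newvar2, dat$x4)   # 8 rows, zeros included
```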



-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] How do you specify font family in png output; png cross-platform issues

2009-01-26 Thread Paul Johnson
For teaching purposes, I prepared a little R program. I want to give
this to students who can run it and dump out many formats and then
compare their use in LaTeX documents.  I do not have too much trouble
with xfig or postscript format, but I've really run into a roadblock
where png files are concerned.

My original problem was that the png device does not accept a family option.
How can I have png output with the Times family to compare with the
postscript or pdf output?

While searching for information on this, I discovered there have been
a lot of R changes in png support.  If I give this script to people
with Mac or Windows, what are the chances that it will work?  If I'm
reading the png help page correctly, there are different types
available, Xlib and cairo, but I don't understand what all that means
for sending a program like this across systems. (fear the worst, but
ask hoping for best).

As far as I understand it, the paper=special option is needed so
that the eps or pdf output will fit into a document without creating
really huge margins around the graph. Correct?


x <- rnorm(333)

y <- rnorm(333)

plot(x, y, xlab="Input Variable", ylab="Output Variable")

xfig(file="testplot.fig", horizontal=F, height=6, width=6, family="Times")
plot(x, y, xlab="Input Variable", ylab="Output Variable")
dev.off()

postscript(file="testplot-1.eps", horizontal=F, height=6, width=6,
family="Times", onefile=F, paper="special")
plot(x, y, xlab="Input Variable", ylab="Output Variable")
dev.off()

postscript(file="testplot-2.eps", horizontal=F, height=4, width=4,
family="Times", onefile=F, paper="special")
plot(x, y, xlab="Input Variable", ylab="Output Variable")
dev.off()

pdf(file="testplot-1.pdf", height=6, width=6,
family="Times", onefile=F, paper="special")
plot(x, y, xlab="Input Variable", ylab="Output Variable")
dev.off()


png(file="testplot-1.png", height=350, width=550, type="Xlib")
plot(x, y, xlab="Input Variable", ylab="Output Variable")
dev.off()


png(file="testplot-2.png", height=350, width=550, type="cairo")
plot(x, y, xlab="Input Variable", ylab="Output Variable")
dev.off()




Can I bother you about one last png issue?

While searching r-help, I see posts about the difference in png output
between type Xlib and cairo.  For reasons I do not understand,
ordinary viewers like GQview or Firefox make cairo-produced png files
look blurry (in the words of posts on r-help).  The png output from
type="Xlib" is not blurry.  This raises another level of
confusion about this exercise I'm devising.  Does R for Windows, as
provided on the CRAN system, use Xlib for png?

pj



-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] PCALG Package

2009-01-27 Thread Paul Johnson
This means you need to install the Rgraphviz package.  Have you tried?

For me, Rgraphviz is not on CRAN, but it is required by the package you want.

Rgraphviz is hosted on Bioconductor, so you have to install it through
that route.

http://www.bioconductor.org/packages/release/bioc/html/Rgraphviz.html

After that, you re-install the other package you really wanted.
 library(pcalg)
Loading required package: MASS
Loading required package: graph
Loading required package: robustbase
Loading required package: Rgraphviz
Loading required package: grid
Loading required package: ggm

Attaching package: 'ggm'


The following object(s) are masked from package:graph :

 edgeMatrix

Loading required package: mnormt
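Spelled out in code, the route described above looks roughly like this
(a sketch, not from the original message; biocLite() was Bioconductor's
installer in this era, with the bootstrap URL Bioconductor documented
at the time):

```r
## Bootstrap the Bioconductor installer, then install Rgraphviz from
## Bioconductor and pcalg from CRAN. Requires a network connection.
source("http://bioconductor.org/biocLite.R")
biocLite("Rgraphviz")
install.packages("pcalg")
library(pcalg)
```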

On Tue, Jan 27, 2009 at 12:17 PM, Tibert, Brock btib...@bentley.edu wrote:
 I can not even get the package to run.  I installed the package, and it is 
 telling me I need rGraphViz.  I was told to install it was included in the 
 Bioconductor package, but that did not work either.  The error message I 
 routinely get is surrounding a missing RGraphViz package.  I have searched 
 the internet, saw a place to install it.  I attempted that as well, but to no 
 avail.

 I am stumped.  Does it work for you?  IF so, when did you install the package?

 Many thanks,

 Brock


library(pcalg)
 Loading required package: MASS
 Loading required package: graph
 Loading required package: robustbase
 Loading required package: Rgraphviz
 Error: package 'Rgraphviz' could not be loaded
 In addition: Warning message:
 In library(pkg, character.only = TRUE, logical.return = TRUE, lib.loc = 
 lib.loc) :
  there is no package called 'Rgraphviz'





-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



[R] I want axes that cross

2009-02-13 Thread Paul Johnson
Hello, everybody.

A student asked me a how-to question I can't answer.  We want the
length of the drawn axes to fill the full width and height of the
plot, like so:

   |
   | *
   |   *
   | *
---|-

However, when we use plot with axes=F and then use the axis commands
to add the axes, they do not cross over each other. We get


   | *
   |   *
 *
   --

The axes are not wide enough to cover the entire range of the data. We
do not want to add a box() command, which is a usual answer for this,
because we don't want lines drawn on the top or the right side.

Here's my test case:

x <- rnorm(100)
z <- gl(2,50)
y <- rnorm(100, mean= 1.8*as.numeric(z))

plot(x, y, type="n", axes=F)
points(x, y, pch="$", cex=0.7, col=z)
axis(1, col="green", col.axis="green")
axis(2, col="red", col.axis="red")

Can I cause the 2 axes to cross as desired?

The axis help says the axis is clipped at the plot region, but it
seems to get clipped even more narrowly than that.

I've been searching in r-help quite a bit and am a little surprised
how difficult this is.

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



Re: [R] I want axes that cross

2009-02-13 Thread Paul Johnson
On Fri, Feb 13, 2009 at 1:51 PM, Marc Schwartz
marc_schwa...@comcast.net wrote:
 on 02/13/2009 01:25 PM Paul Johnson wrote:
 Hello, everybody.

 A student asked me a howto question I can't answer.  We want the
 length of the drawn axes to fill the full width and height of the
 plot, like so:

 Paul,

 I am guessing that you want:

 x <- rnorm(100)
 z <- gl(2,50)
 y <- rnorm(100, mean= 1.8*as.numeric(z))

 plot(x, y, type="n", axes=F)
 points(x, y, pch="$", cex=0.7, col=z)
 axis(1, col="green", col.axis="green")
 axis(2, col="red", col.axis="red")

 # Draw the box like an "L" on the bottom and left only
 box(bty = "l")


 Note that you can specify which sides the 'box' is created upon by using
 the 'bty' argument. See ?box for more information.

Thanks, I did not find 'bty' under ?box, but found it under ?par after
you pointed it out.

That does not give the correct output, however, because the black box
covers over my 2 different colored axes.

Even if I weren't color-conscious, it gives me this:

|
|
|___

not crossed axes, which I want:

   |
   |
 _|__
   |

I'm putting in a seed so we will both see the same things in this example.

set.seed(1233240)
x <- rnorm(100)
z <- gl(2,50)
y <- rnorm(100, mean= 1.8*as.numeric(z))
plot(x, y, type="n", axes=F)
points(x, y, pch="$", cex=0.7, col=z)
axis(1, col="green", col.axis="green")
axis(2, col="red", col.axis="red")

# MS recommends:
 # Draw the box like an "L" on the bottom and left only
 box(bty = "l")


 Also, by default, the axes extend the range of 'x' and 'y'  by 4%. You
 can use 'xaxs = i' and 'yaxs = i' in the plot() call to restrict the
 axes to the true ranges of 'x' and 'y'.  This would be important, for
 example, when you want the lower left hand corner of the plot to be at
 exact coordinates such as 0,0.

I would be delighted if the axes really did reach 4% outside the data.
But they don't.  I've seen that same thing you are referring to in the
documentation, but there's something wrong about it, In my example
code, we should see the same thing now I've put in a seed. The axes
are smaller than the data range, not equal to 1.04 times the data
range. I see several observations in the graph that are off the
charts, they are above the highest value of the y axis, or below the
lowest axis value. Similarly, there are observations smaller than the
low end of the x axis and bigger than the largest x axis value.

The 4% may be the plot region's size, but it is surely not the length
of the axis that is drawn?


 See ?par for more information.

 HTH,

 Marc Schwartz





-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



Re: [R] I want axes that cross

2009-02-13 Thread Paul Johnson
On Fri, Feb 13, 2009 at 1:42 PM, Daniel Moreira daniel.more...@duke.edu wrote:

 Try defining the argument 'pos' in the axis command-line, like:

 x <- rnorm(100)
 z <- gl(2,50)
 y <- rnorm(100, mean= 1.8*as.numeric(z))

 plot(x, y, type="n", axes=F)
 points(x, y, pch="$", cex=0.7, col=z)
 axis(1, col="green", col.axis="green", pos=0)
 axis(2, col="red", col.axis="red", pos=-2)

 
 Daniel Moreira, MD


If you actually ran that code and still suggest it as the fix, then I
think you must be joking.  Pushing the axes into the middle of the
data cloud in order to make them cross is certainly not making a very
nice looking plot. Not only are there observations outside the area
framed by the axes, but there are axis labels that are overlapped by
observations and by the axes themselves.

PJ

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



Re: [R] I want axes that cross

2009-02-15 Thread Paul Johnson
On Fri, Feb 13, 2009 at 3:14 PM, Marc Schwartz
marc_schwa...@comcast.net wrote:
 on 02/13/2009 02:19 PM Paul Johnson wrote:
 On Fri, Feb 13, 2009 at 1:51 PM, Marc Schwartz

 OK, so given all of the above, something like the following should work:

 set.seed(1233240)
 x <- rnorm(100)
 z <- gl(2,50)
 y <- rnorm(100, mean = 1.8 * as.numeric(z))

 # Calculate a new range, subtracting a definable value
 # from the min of each vector for the new minimum
 # Adjust the 0.25 as may be needed
 X <- c(min(x) - 0.25, max(x))
 Y <- c(min(y) - 0.25, max(y))

 # Use 'X' and 'Y' here, not 'x' and 'y'
 # So that the plot region is extended appropriately
 plot(X, Y, type = "n", axes = F, xlab = "x", ylab = "y")

 points(x, y, pch = "$", cex = 0.7, col = z)

 # DO use 'pos'...
 axis(1, pos = Y[1], col = "green", col.axis = "green")
 axis(2, pos = X[1], col = "red", col.axis = "red")

 # get the plot region boundaries
 usr <- par("usr")

 segments(X[1], usr[3], X[1], usr[4], col = "red")
 segments(usr[1], Y[1], usr[2], Y[1], col = "green")


 HTH,

 Marc



Thanks, Marc and everybody.

This last suggestion produces the graph I was trying for.

The other approaches that people suggest, using pos or xaxp, produce
other undesired changes in the tick marks or the position of the axes
relative to the data. xaxp offers the promise of a more intuitive
solution, except that, when using it, the tick marks are pushed off in
a bad way.  Your use of segments to draw the extensions of the axes
was the first intuition I had, but I did not know about the trick you
used to retrieve the size of the usr box.

More secrets of par, revealed.

Have you ever seen a drawing of the regions of an R plot with the
terminology that is used for parts?  Until I saw your example code, I
had not understood that the plot axes are placed at the absolute edge
of the user plotting region, for example.

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



[R] SPSS data import: problems work arounds for GSS surveys

2009-03-02 Thread Paul Johnson
I'm using R 2.8.1 on Ubuntu 8.10.  I'm writing partly to ask what's
wrong, partly to tell other users who search that there is a work
around.

The General Social Survey is a long standing series of surveys
provided by NORC (National Opinion Research Center).  I have
downloaded some years of the survey data in SPSS format (here's the
site: http://www.norc.org/GSS+Website/Download/SPSS+Format/).  When I
try to import using foreign, I get an error like so:

 library(foreign)
 dat <- read.spss("gss2006.sav", to.data.frame=T, trim.factor.names=T)
Error in inherits(x, "factor") : object "cp" not found
In addition: Warning messages:
1: In read.spss("gss2006.sav", to.data.frame = T, trim.factor.names = T) :
  gss2006.sav: File contains duplicate label for value 99.9 for variable TVRELIG
2: In read.spss("gss2006.sav", to.data.frame = T, trim.factor.names = T) :
  gss2006.sav: File contains duplicate label for value 99.9 for variable SEI
3: In read.spss("gss2006.sav", to.data.frame = T, trim.factor.names = T) :
  gss2006.sav: File contains duplicate label for value 99.9 for
variable FIRSTSEI
4: In read.spss("gss2006.sav", to.data.frame = T, trim.factor.names = T) :
  gss2006.sav: File contains duplicate label for value 99.9 for variable PASEI
5: In read.spss("gss2006.sav", to.data.frame = T, trim.factor.names = T) :
  gss2006.sav: File contains duplicate label for value 99.9 for variable MASEI
6: In read.spss("gss2006.sav", to.data.frame = T, trim.factor.names = T) :
  gss2006.sav: File contains duplicate label for value 99.9 for variable SPSEI
7: In read.spss("gss2006.sav", to.data.frame = T, trim.factor.names = T) :
  gss2006.sav: File contains duplicate label for value 0.75 for
variable YEARSJOB
8: In read.spss("gss2006.sav", to.data.frame = T, trim.factor.names = T) :
  gss2006.sav: File-indicated character representation code (1252)
looks like a Windows codepage

No dat object is created from this.


I have found a work around.  I installed PSPP version 0.6.0 and used
it to open the sav file, and then re-save it in SPSS sav  format.
That creates an SPSS file that foreign's function can open.

I still see the warnings about redundant value labels, but as far as I
can see these are harmless.  A working object is obtained like so:

 dat <- read.spss("gss-pspp.sav")
Warning messages:
1: In read.spss("gss-pspp.sav") :
  gss-pspp.sav: File contains duplicate label for value 99.9 for
variable TVRELIG
2: In read.spss("gss-pspp.sav") :
  gss-pspp.sav: File contains duplicate label for value 0.75 for
variable YEARSJOB
3: In read.spss("gss-pspp.sav") :
  gss-pspp.sav: File contains duplicate label for value 99.9 for variable SEI
4: In read.spss("gss-pspp.sav") :
  gss-pspp.sav: File contains duplicate label for value 99.9 for
variable FIRSTSEI
5: In read.spss("gss-pspp.sav") :
  gss-pspp.sav: File contains duplicate label for value 99.9 for variable PASEI
6: In read.spss("gss-pspp.sav") :
  gss-pspp.sav: File contains duplicate label for value 99.9 for variable MASEI
7: In read.spss("gss-pspp.sav") :
  gss-pspp.sav: File contains duplicate label for value 99.9 for variable SPSEI


There is still some trouble with the importation of this SPSS file,
however.  It has the symptoms of being a non-rectangular data array, I
think.  What do you think about these warnings:

 dat <- read.spss("gss-pspp.sav", to.data.frame=T)
There were 22 warnings (use warnings() to see them)
 warnings()
Warning messages:
1: In read.spss("gss-pspp.sav", to.data.frame = T) :
  gss-pspp.sav: File contains duplicate label for value 99.9 for
variable TVRELIG
2: In read.spss("gss-pspp.sav", to.data.frame = T) :
  gss-pspp.sav: File contains duplicate label for value 0.75 for
variable YEARSJOB
3: In read.spss("gss-pspp.sav", to.data.frame = T) :
  gss-pspp.sav: File contains duplicate label for value 99.9 for variable SEI
4: In read.spss("gss-pspp.sav", to.data.frame = T) :
  gss-pspp.sav: File contains duplicate label for value 99.9 for
variable FIRSTSEI
5: In read.spss("gss-pspp.sav", to.data.frame = T) :
  gss-pspp.sav: File contains duplicate label for value 99.9 for variable PASEI
6: In read.spss("gss-pspp.sav", to.data.frame = T) :
  gss-pspp.sav: File contains duplicate label for value 99.9 for variable MASEI
7: In read.spss("gss-pspp.sav", to.data.frame = T) :
  gss-pspp.sav: File contains duplicate label for value 99.9 for variable SPSEI
8: In xi = z[1L] | xi = z[2L] | xi[xi == z[3L]] :
  longer object length is not a multiple of shorter object length
9: In xi = z[1L] | xi = z[2L] | xi[xi == z[3L]] :
  longer object length is not a multiple of shorter object length
10: In xi = z[1L] | xi = z[2L] | xi[xi == z[3L]] :
  longer object length is not a multiple of shorter object length
11: In xi = z[1L] | xi = z[2L] | xi[xi == z[3L]] :
  longer object length is not a multiple of shorter object length
12: In xi = z[1L] | xi = z[2L] | xi[xi == z[3L]] :
  longer object length is not a multiple of shorter object length
13: In xi = z[1L] | xi = z[2L] | xi[xi == z[3L]] :
  longer object length is not a multiple 

Re: [R] vertically aligned X axis labels disappear off R Graphics window

2009-03-02 Thread Paul Johnson
On Wed, Feb 25, 2009 at 12:51 PM, Uwe Ligges
lig...@statistik.tu-dortmund.de wrote:


 R User R User wrote:

 Hi guys,
 I'm evaluating R for basic data exploration. I produce a bar plot of the
 data, with the x axis labels aligned vertically. However, the start of
 labels longer than about 10 characters are cut off by the bottom of the
 graphics window.


Been there, did that :)

Run these steps. Run the par with various values to change margins.

The size of the figure is fixed from the outside edges, so when you
make the margin bigger, the drawing gets squeezed.

x <- rpois(14, lambda=10)

mynames <- c("Really Long Name", "Really Long Name", "Really Long Name",
             "Really Long Name", "Really Long Name", "Really Long Name",
             "Really Long Name", "Really Long Name", "Really Long Name",
             "Really Long Name", "Really Long Name", "Really Long Name",
             "Really Long Name", "Really Long Name Long Long")


barplot(x, names=mynames, las=2)


## RSiteSearch("barplot names margins")

barplot(x, names=abbreviate(mynames), las=2)

par(mar=c(9.5, 3, 0.5, 0.3))
barplot(x, names=mynames, las=2)


## I believe this should be the same as the preceding, but the following
## changes the margin only once, and further calls
## like it have no margin-changing effect. So
## you need to run par over again before the next barplot.

barplot(x, names=mynames, las=2, mar=c(9.5, 3, 0.5, 0.3))
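A related detail (my aside, not in the original message): par()
invisibly returns the old values of any parameters it changes, so the
margin change can be undone after the plot is drawn.

```r
## Save the old margins while changing them, plot, then restore.
old.par <- par(mar=c(9.5, 3, 0.5, 0.3))
barplot(x, names.arg=mynames, las=2)   # names.arg is barplot's full argument name
par(old.par)                           # previous margins are back
```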


-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



Re: [R] adding value labels on Interaction Plot

2009-03-04 Thread Paul Johnson
On Wed, Mar 4, 2009 at 10:52 AM, Dimitri Liakhovitski ld7...@gmail.com wrote:
 Hello - and sorry for what might look like a simple graphics question.

 I am building an interaction plot for d:

 d=data.frame(xx=c(3,3,2,2,1,1), yy=c(4,3,4,3,4,3), zz=c(5.1,4.4,3.5,3.3,-1.1,-1.3))
 d[[1]] <- as.factor(d[[1]])
 d[[2]] <- as.factor(d[[2]])
 print(d)

 interaction.plot(d$xx, d$yy, d$zz,
  type="b", col=c("red","blue"), legend=F,
  lty=c(1,2), lwd=2, pch=c(18,24),
  xlab="X Label",
  ylab="Y Label",
  main="Chart Label")

 I am trying and not succeeding in adding Y values (value labels in
 Excel speak) near the data points on 3 lines of the graph.
 I understand that I might have to use text. But how do I tell text
 to use the actual coordinates of the dots on the lines?


 Thank you very much!


I'm not understanding your trouble, exactly. I had not heard of
interaction.plot before and so I've run your code and it is an
interesting function. Thanks for providing the working example.

I can help you with the text.

It is easy to add text. A command like

text(1.3, 1, "whatever", pos=3)

will put the word "whatever" on top of the coordinates x and y. (If you leave
out pos=3, R centers the text on the coordinates.)

If you need to find out x , y before running that, you can.  the
locator function will return coordinates. Run

locator(1)

and then left click on a point in the graph. Coordinates will pop out
on the screen.

And you can make the text placement depend on locator

text(locator(1), "whatever", pos=3)

I don't think you want to do that work interactively, however.  It can
be automated.

You can add a column of names in your data frame and more or less
automate the plotting as well.  I did this to test.

mylabels <- c("A","B","C","D","E","F")
text(d$xx, d$zz, mylabels, pos=3)

This almost works perfectly, but it plops the labels in the wrong spots.

I'd like to change the command so that the position of the text for
the red line would be on the right, while the position of the text for
the blue line is on the left.

It appears to me your variable yy is the one that separates the 2
lines. Correct? I observe:

as.numeric(d$yy)
[1] 2 1 2 1 2 1

We want the blue ones on the left, for them we need pos=2. For the
others, we want pos=4

Ach. I tried this

text( d$xx, d$zz, mylabels, pos=2*as.numeric(d$yy))

but it comes out backward.  So how about this:

text( d$xx, d$zz, mylabels, pos=as.numeric(d$yy))

That positions the red ones below the line and the blue ones to the
left.  That doesn't look too bad to me.

Anyway, I think you get the idea.

If you wanted to simply stop plotting the circles, and put the letters
right on the spot, that would be easy as well.




-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



Re: [R] Table Transformation

2009-03-04 Thread Paul Johnson
On Wed, Mar 4, 2009 at 11:58 AM, Christian Pilger
christian.pil...@gmx.net wrote:

 Dear R-experts,

 recently, I started to discover the world of R. I came across a problem,
 that I was unable to solve by myself (including searches in R-help, etc.)

 I have a flat table similar to

 key1    key2    value1

 abcd_1  BP      10
 abcd_1  BSMP    1A
 abcd_1  PD      25
 abcd_2  BP      20
 abcd_3  BP      80
 abcd_4  IA      30
 abcd_4  PD      70
 abcd_4  PS      N

 I wish to transform this table to obtain the following result:

        key2
 key1    BP      BSMP    IA      PD      PS
 abcd_1  10    1A          25    
 abcd_2  20                      
 abcd_3  80                      
 abcd_4              30    70    N


I think we would say that a dataframe of the first type is in the
"long" format, while the other one you want is in the "wide" format.
I've done changes like that with the reshape function that is in the
stats package.

This example you propose is like making one column for each country
where key 1 is like the year in which the observation is made.
Right?

You don't have an easily cut-and-pasteable code example, so I've
generated a little working example. Here, x1 is key 1 and x2 is key 2.

 x1 <- gl(4, 5, labels=c("c1","c2","c3","c4"))
 x1
 [1] c1 c1 c1 c1 c1 c2 c2 c2 c2 c2 c3 c3 c3 c3 c3 c4 c4 c4 c4 c4
Levels: c1 c2 c3 c4
 x2 <- rep(1:5, 4)
 df <- data.frame(x1, x2, y=rnorm(20))
 df
   x1 x2           y
1  c1  1  0.02095747
2  c1  2  0.05926233
3  c1  3 -0.07561916
4  c1  4 -1.06272710
5  c1  5 -1.89202032
6  c2  1 -0.04549782
7  c2  2 -0.68333187
8  c2  3 -0.99151410
9  c2  4 -0.29070280
10 c2  5 -0.97655024
11 c3  1  0.33411223
12 c3  2 -0.24907340
13 c3  3 -0.25469819
14 c3  4  1.23956157
15 c3  5 -1.38162430
16 c4  1  0.50343661
17 c4  2 -0.58126964
18 c4  3  0.24256348
19 c4  4 -0.39398578
20 c4  5  0.01664450
 reshape(df, direction="wide", timevar="x2", idvar="x1")
   x1         y.1         y.2         y.3        y.4        y.5
1  c1  0.02095747  0.05926233 -0.07561916 -1.0627271 -1.8920203
6  c2 -0.04549782 -0.68333187 -0.99151410 -0.2907028 -0.9765502
11 c3  0.33411223 -0.24907340 -0.25469819  1.2395616 -1.3816243
16 c4  0.50343661 -0.58126964  0.24256348 -0.3939858  0.0166445


Your case will have many missings, but I think the idea is the same.
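For the record, the same idea applied to a hand-entered copy of the
poster's table looks like this:

```r
## The poster's flat table, typed in from the post above, then widened.
flat <- data.frame(
    key1   = c("abcd_1","abcd_1","abcd_1","abcd_2","abcd_3","abcd_4","abcd_4","abcd_4"),
    key2   = c("BP","BSMP","PD","BP","BP","IA","PD","PS"),
    value1 = c("10","1A","25","20","80","30","70","N"))
reshape(flat, direction="wide", timevar="key2", idvar="key1")
## Missing key1/key2 combinations come out as NA in the wide result.
```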

HTH


-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



Re: [R] using a noisy variable in regression (not an R question)

2009-03-07 Thread Paul Johnson
On Sat, Mar 7, 2009 at 11:49 AM, Juliet Hannah juliet.han...@gmail.com wrote:
 Hi, This is not an R question, but I've seen opinions given on non R
 topics, so I wanted
 to give it a try. :)

 How would one treat a variable that was measured once, but is known to
 fluctuate a lot?
 For example, I want to include a hormone in my regression as an
 explanatory variable. However, this
 hormone varies in its levels throughout a day. Nevertheless, its levels differ
 substantially between individuals so that there is information there to use.

 One simple thing to try would be to form categories, but I assume
 there are better ways to handle this. Has anyone worked with such data, or 
 could
 anyone suggest some keywords that may be helpful in searching for this
 topic. Thanks
 for your input.


From teaching econometrics, I remember that if the truth is
y=b0+b1x1+noise and you do not have a correct measure of x1, but
rather something else like ex1=x1+noise, then the regression estimate
of b1 is biased, generally attenuated.  As far as I understand it, the
technical solutions are not too encouraging. You can try to get better
data or possibly to build an instrumental variables model, where you
could have other predictors of the true value of x1 in a first stage
model.  I don't recall that I was able to persuade myself that
approach really solves anything, but many people recommend it. I
suppose a key question is whether you can persuade your audience that
ex1 = x1+noise and whether that noise is well behaved.
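The attenuation point is easy to see in a small simulation (my sketch,
not part of the original exchange; with the noise variance equal to
the variance of x1, the estimate should shrink to about half the true
value):

```r
## Measurement error in a regressor attenuates its estimated coefficient.
set.seed(1234)
n   <- 5000
x1  <- rnorm(n)
y   <- 1 + 2*x1 + rnorm(n)   # truth: b1 = 2
ex1 <- x1 + rnorm(n)         # x1 observed with noise of equal variance
coef(lm(y ~ x1))["x1"]       # close to 2
coef(lm(y ~ ex1))["ex1"]     # attenuated, close to 1
```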

As I was considering your problem, I was wondering if there might not
be a mixed model approach to this problem.  You hypothesize the
truth is y=b0+b1x1+noise, but you don't have x1.  So suppose you
reconsider the truth as a random parameter, as in y=b0+c1*ex1+noise.
ex1 is a fixed estimate of the hormone level for each observation.  c1
is a random, varying coefficient because the effect of the hormone
fluctuates in an unmeasurable way. Then you could try to estimate the
distribution of c1.

You have an interesting problem, I think.

pj
-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



Re: [R] Plotmath with values?

2008-12-31 Thread Paul Johnson
On Wed, Dec 31, 2008 at 10:47 AM, Mike Prager mike.pra...@noaa.gov wrote:
 I hope to use the plotmath facility to print titles that mix
 math and values of R variables.

 The help for plotmath has an example, which after repeated
 reading, I find baffling. Likewise, I have read the help file
 for substitute (which seems to be needed) without ever
 understanding what it does, other than being used in some magic
 incantations.

 I would like to do something like this:

 dev.new()
 aa - round(pi,2)
 plot(1:3, 1:3, main = ~ a == aa)

 and have the main title read "a = 3.14"

 but of course it reads "a = aa".


I agree this is a very difficult part of using R.  I asked this exact
same kind of question last year.  If you go to an R-help email archive
for April 2, 2008

Trouble combining plotmath, bquote, expressions

you will find some discussion in the list.

For a while, I got pretty excited and started working on better
example code to put into the R distribution, but lost initiative when
I saw the magnitude of the problem.  This example code will show
several usages of bquote and an alternative substitute approach.  The
result seems not to look so beautiful as I recall, but isn't that
always the way it goes :).


### Filename: Normal1_2008.R
### Paul Johnson March 31, 2008
### This code should be available somewhere in
http://pj.freefaculty.org.  If it is not
### email me paulj...@ku.edu



mu <- 10.034

sigma <- 12.5786786

myx <- seq( mu - 3.5*sigma,  mu + 3.5*sigma, length.out=500)

myDensity <- dnorm(myx, mean=mu, sd=sigma)


# Here's one way to retrieve the values of mu and sigma and insert
# them into an expression
t1t2 <- bquote (paste("Normal(", mu == .(round(mu,2)), ',', sigma ==
.(round(sigma,2)), ")") )

plot(myx, myDensity, type="l", xlab="x", ylab="Probability Density", main=t1t2)


t1t2 <- bquote (paste("Normal (   ", mu == .(round(mu,2)), '  ,  ',
sigma^2 == .(round(sigma^2,2)), ")") )
### Note spaces manually inserted above are needed, otherwise plotmath
### overlaps the "l" of "Normal" with the first parenthesis
plot(myx, myDensity, type="l", xlab="x", ylab="Probability Density", main=t1t2)




### Had difficulty finding syntax for substitute to combine symbols mu
### and sigma with values.
### Following is best I can figure, no simpler or more obvious than the previous method.
##t1 <- substitute ( mu == a ,  list (a = mu))
##t2 <- substitute ( sigma == a, list(a = sigma))
##t1t2 <- bquote(paste("Normal(", .(t1), ",", .(t2), ")") )



t1t2 <-  substitute( Normal ~~ group( "(", list(mu==mu1,
sigma^2==sigma2), ")") ,  list(mu1=round(mu,2),
sigma2=round(sigma^2,2)))


plot(myx, myDensity, type="l", xlab="x", ylab="Probability Density", main=t1t2)



plot(myx, myDensity, type="l", xlab="x", ylab="Probability Density",
main=t1t2, axes=F)
axis(2, pos= mu - 3.6*sigma)
axis(1, pos=0)


# bquote creates an expression that text plotters can use
t1 <-  bquote( mu == .(mu))

text( mu, 0.00, t1, pos=3)

ss = 0.2 * max(myDensity)

arrows( x0=mu, y0= ss, x1=mu+sigma, y1=ss, code=3, angle=90, length=0.1)

t2 <-  bquote( sigma == .(round(sigma,2)))

text( mu+0.5*sigma, 1.15*ss, t2)


normalFormula <- expression (f(x) == frac (1, sqrt(2*pi)*sigma) *
e^{~~ - ~~ frac(1,2)~~ bgroup("(", frac(x-mu,sigma), ")")^2})

text ( mu + 0.5*sigma, max(myDensity) - 0.10 * max(myDensity),
normalFormula, pos=4)

### Theory says we should have about 2.5% of the area to the left of:
### -1.96 * std.dev

criticalValue <- mu - 1.96 * sigma
specialX <-  myx[myx <= criticalValue]

### mark the critical value in the graph

text ( criticalValue, 0 , label= paste(round(criticalValue,2)), pos=1)

specialY <- myDensity[myx < criticalValue]

polygon(c(specialX[1], specialX, specialX[length(specialX)]), c(0,
specialY, 0), density=c(-110), col="lightgray" )

shadedArea <- round(pnorm(mu - 1.96 * sigma, mean=mu, sd=sigma), 4)

### Hard to position this text just right
al <- paste( "Prob(", "x <= ", round(criticalValue,2), ")\n=", "F(",
round(criticalValue,2), ")\n=", round(shadedArea,3), sep="")
text(  criticalValue - sigma, myDensity[length(specialX)], labels=al, pos=3)

ss <- 0.1 * max(myDensity)
arrows( x0=mu, y0= ss, x1=mu-1.96*sigma, y1=ss, code=3, angle=90, length=0.1)

text( mu - 2.0*sigma, 1.15*ss,
bquote(paste(.(round(criticalValue,2)), phantom(1) == mu - 1.96, "",
sigma, sep=" ")), pos=4 )





 From a user's point of view -- one who has never written a
 parser nor taken a course in compilers -- what is needed is the
 nonexistent function value usable in plotmath expressions to
 produce the value of its argument, as

 plot(1:3, 1:3, main = ~ a == value(aa))

 How can this be done?

 THANKS!

 --
 Mike Prager, NOAA, Beaufort, NC
 * Opinions expressed are personal and not represented otherwise.
 * Any use of tradenames does not constitute a NOAA endorsement.





-- 
Paul E. Johnson

[R] Sweave documents have corrupted double quotes

2009-01-16 Thread Paul Johnson
I'm attaching a file foo.Rnw and I'm hoping some of you might run it
through your R + LaTeX systems to find out if the double-quotes in
typewriter font turn out as black boxes (as they do for me).  If you
don't use Sweave, but you have a system with a working version of R
and LaTeX, the file gives the instructions you need to use to process
the file.

The file itself explains the problem. You can see the flawed output on
my web site

http://pj.freefaculty.org/latex/foo.pdf

I'm running Ubuntu Linux 8.10 with R 2.8.1 and TexLive 2007 (which is
provided with the distribution).

This is not a new problem. I noticed it two years ago while using
TeTeX on Fedora Linux, and so I doubt that this is specific to
TeXLive.  Back then, I took the path of least resistance and stopped
using the typewriter font.  That is becoming inconvenient, however.

I would sincerely appreciate any pointers you have.

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas


Re: [R] Sweave documents have corrupted double quotes

2009-01-16 Thread Paul Johnson
On Fri, Jan 16, 2009 at 10:43 AM, David Winsemius
dwinsem...@comcast.net wrote:
 Dear Dr Johnson;


 I'm not sure if you get copies of your posts. If you do can you check to see
 if the list-server kept the attachment? My copy did not have one.

 --
 Best
 David winsemius


Hm. Well, I do get the attachment, and don't understand why you don't.

But if you are willing, you can get the file here, same directory as
the dvi and pdf output:

http://pj.freefaculty.org/latex/foo.Rnw






-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



Re: [R] Sweave documents have corrupted double quotes

2009-01-16 Thread Paul Johnson
On Fri, Jan 16, 2009 at 11:06 AM, Vincent Goulet
vincent.gou...@act.ulaval.ca wrote:
 Paul,

 The file did not make it to the list.

 Did you try loading Sweave with the 'noae' option, that is:

\usepackage[noae]{Sweave}

 This *may* solve your issue.

 HTH Vincent


Wow. That does fix it. I bow to you.  I need to learn how this option
can be put into the Rnw file itself.

Are you the same Vincent Goulet who offers the customized Emacs for
windows? (http://vgoulet.act.ulaval.ca/ressources/emacs).  If so,
thanks again for that. Keep up the good work.  It has saved me and my
students tons of time setting up Windows systems.
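For later readers: a minimal Rnw sketch showing where the [noae] option
goes.  (This is an illustration, not the original foo.Rnw; loading the
Sweave package yourself in the preamble keeps Sweave from inserting its
own \usepackage{Sweave} without the option.)

```latex
\documentclass{article}
% Passing [noae] here stops Sweave from loading the ae fonts,
% which is what restores the typewriter-font quotation marks.
\usepackage[noae]{Sweave}
\begin{document}
Quotes in \texttt{typewriter "text"} now survive.
<<>>=
x <- rnorm(5)
@
\end{document}
```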


-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



Re: [R] Sweave documents have corrupted double quotes

2009-01-17 Thread Paul Johnson
Hey, everybody.

I am concluding that this "Sweave wrecks quotation marks in typewriter
font" behavior is a bug in Sweave.

I've uploaded the foo.tex file so now you can see the Rnw, tex, and
pdf file side by side.

http://pj.freefaculty.org/latex

As previous poster noted, the Sweave instruction is added at the end
of the preamble in foo.tex. However, moving the \usepackage{Sweave} to
the beginning of the preamble has no effect. Same black boxes.

It is true that one can work around this by inserting the declaration

\usepackage[noae]{Sweave}

into the Rnw file (or in the tex file).  That solves the problem,
however, the user is left with a suspicious feeling.  What will go
wrong in my output after [noae] is inserted? If nothing, why was it
there in the first place?

pj


-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



Re: [R] Sweave documents have corrupted double quotes

2009-01-17 Thread Paul Johnson
On Sat, Jan 17, 2009 at 4:00 PM, Duncan Murdoch murd...@stats.uwo.ca wrote:
 On 17/01/2009 4:29 PM, Ben Bolker wrote:

 Duncan Murdoch murdoch at stats.uwo.ca writes:


 R is open source, so this is no mystery:  if you use [noae] then Sweave
 won't use the ae package in LaTeX.  The problem you were having with quotes
 is that the ae package doesn't define those.  If you choose not to use it,
 then you won't get the symbols that it does define.

 Duncan Murdoch


  So if this is a LaTeX bug, should we report it upstream
 to the maintainer (Lars Engebretsen, http://sites.engebretsen.ch/lars/ ) ?

 It is documented in the README for the ae package, so I don't think a new
 report is needed.

 Duncan Murdoch

 Defending Paul a little bit: from his point of view, Sweave
 is what broke things, and although the bug isn't in Sweave it
 would take a certain amount of digging to discover that ...

 From the ae-unaware user's point of view, adding Sweave is
 necessary and sufficient to produce the unwanted behavior.

  cheers
  Ben Bolker


Maybe "bug" is not the right word.  What do you call an undocumented
effect in software that does not produce a pleasant result? :)

The Sweave user manual makes no mention of the ae font. If you don't
believe me, grab
http://www.stat.uni-muenchen.de/~leisch/Sweave/Sweave-manual.pdf and
search for "ae".  It does not mention the [noae] option. I never
realized any ae fonts were used until this problem emerged. The ae
README, http://www.ctan.org/get/fonts/ae/README, doesn't have a warning
about double quotes. The characters it says are missing are mainly eth,
thorn, and the Sami letter eng. Sometimes the £ character is also
missing. For the typewriter fonts, the situation is worse.  If it had
said "typewriter font is missing quotation marks" it would have gotten
my attention, but I would have had no reason to be reading the ae font
README because there's nothing about ae in the Sweave manual. Claiming
that I brought this on myself by using the ae font is just wrong.

The claim that the ae font is widely known to be lacking in some
symbols is interesting, however. If so, Sweave should not place
\usepackage{ae} at the end of every preamble by default.

Here is where I think the question lies.  Given ae's weaknesses, what
should Sweave do?   Is there any reason why Sweave should prefer to
insert the ae fonts where I had Latin Modern?  I don't think so, if
these authors are correct:
http://www.dante.de/dante/events/eurotex/papers/TUT09.pdf

If there is a good reason to use ae for the serif and nonserif fonts,
we could protect the typewriter font by having Sweave insert this:

\usepackage{ae,aecompl}
\renewcommand{\ttdefault}{lmtt}
\usepackage[T1]{fontenc}

I think we could even do better by having a conditional check on
availability of lmtt before using the package, and if it is not
available use cm-super or just cm. I'd just like to set Sweave's
default as close to good as possible so more political science
professors don't have to go through this same hassle.
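One way to sketch that conditional check in LaTeX itself, using the
standard \IfFileExists test (this is a hypothetical sketch, not tested
against Sweave.sty; checking for lmodern.sty is used here as a proxy
for the Latin Modern fonts being installed):

```latex
\usepackage{ae,aecompl}
% Hypothetical fallback: only switch the typewriter family to
% Latin Modern if the lmodern package appears to be installed.
\IfFileExists{lmodern.sty}{%
  \renewcommand{\ttdefault}{lmtt}%
}{}
\usepackage[T1]{fontenc}
```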

For the time being, I'm protecting users by applying this patch to
Sweave.sty, which makes [noae] the default situation.

$ diff Sweave.sty Sweave.sty-new
8c8
< \setboolean{Sweave@ae}{true}
---
> \setboolean{Sweave@ae}{false}

I'll see what symbols now show up as big black boxes.  Actually, those
don't scare me so much as the ones that fail to appear at all.  It is
almost impossible to proofread for that kind of trouble. (Ever
accidentally put ~ in ordinary text in a LaTeX document? It just
disappears in the output.)


pj
-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



Re: [R] PCA

2010-03-10 Thread Paul Johnson
On Wed, Mar 10, 2010 at 4:42 PM, Xanthe Walker xanthe.wal...@gmail.com wrote:
 Hello,

 I am trying to complete a PCA on a set of standardized ring widths from 8
 different sites (T10, T9, T8, T7, T6, T5, T3, and T2).
 The following is a small portion of my data:

 T10     T9      T8      T7      T6      T5      T3      T2
 1.33738 0.92669 0.91146 0.98922 0.9308  0.88201 0.92287 0.91775
 0.82181 1.05319 0.92908 0.97971 0.95165 0.98029 1.14048 0.77803
 0.88294 0.96413 0.90893 0.87957 0.9961  0.74926 0.71394 0.70877
 1.07549 1.13311 1.23536 1.19382 1.2362  1.07741 1.20334 0.8727
 0.77737 0.99292 0.92703 1.02384 0.99831 1.1317  0.93672 1.07909
 0.88933 1.15587 1.20885 0.8983  1.06476 0.81845 1.09017 0.72909
 0.75347 0.95826 0.90922 0.73869 0.74846 0.70481 0.49826 0.91824
 1.39082 1.1281  1.05147 0.95839 1.20648 1.24587 0.65045 1.23251
 0.80977 0.89714 0.90042 0.9543  0.86217 1.20818 0.82725 0.7666
 1.11092 1.10328 1.16464 1.00707 1.09575 1.04647 0.79045 0.47331
 0.88753 1.04699 1.0854  0.91803 1.03622 0.80624 0.905   1.28271
 0.91963 0.90121 0.89136 0.97408 1.0449  1.00572 0.7703  1.48373
 1.31837 0.97733 1.04229 1.23096 1.14002 1.09911 0.77523 1.31543

 This is what I have done in R:

 rm(list=ls(all=TRUE))
 PCA <- read.csv("PCAsites.csv", header=TRUE)
 PCA
 attach(PCA)
 names(PCA)

 ### using correlation matrix ###

 p1 <- princomp(PCA, cor = TRUE)
 summary(p1)
 loadings(p1)
 plot(p1)
 biplot(p1)
 p1$scores
 screeplot(p1, npcs=4, type="lines")

 The problem is that the purpose of this analysis is to derive a new data
 set, using the first component, that I can work with. In other words, I am
 doing this to compress the data into one column of ring widths. How do I
 derive a new data set?


the output of p1$scores is what you want, isn't it?  Or just one column of it?

desiredScores <- p1$scores[, 1]

p
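For later readers, a self-contained sketch of the idea.  The ring-width
data above is not machine-readable here, so this uses made-up numbers of
the same shape (8 sites, 20 years); only princomp() and the $scores
component are the point.

```r
# Made-up stand-in for the ring-width data (8 sites, 20 years)
set.seed(42)
fakeWidths <- as.data.frame(matrix(rnorm(160, mean = 1, sd = 0.15),
                                   ncol = 8))
names(fakeWidths) <- c("T10", "T9", "T8", "T7", "T6", "T5", "T3", "T2")

# PCA on the correlation matrix, as in the original post
p1 <- princomp(fakeWidths, cor = TRUE)

# The first column of the score matrix is the compressed "one column"
# the poster asked for; attach it to a new data set.
newData <- data.frame(fakeWidths, PC1 = p1$scores[, 1])
head(newData$PC1)
```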



-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



Re: [R] Shade area under curve

2010-03-10 Thread Paul Johnson
I use this to make illustration for some calculus notes. There are
examples of shaded areas in there:

### Filename: Normal1_2009_plotmathExample.R
### Paul Johnson June 3, 2009
### This code should be available somewhere in
### http://pj.freefaculty.org/R.  If it is not,
### email me: pauljohn at ku.edu


### Set mu and sigma at your pleasure:
mu <- 10.03
sigma <- 12.5786786

myx <- seq(mu - 3.5*sigma, mu + 3.5*sigma, length.out=500)

myDensity <- dnorm(myx, mean=mu, sd=sigma)


# It is challenging to combine plotmath with values of mu and sigma
# in one expression.
# Either bquote or substitute can be used.  First use bquote:

myTitle1 <- bquote(paste("x ~ Normal(", mu == .(round(mu, 2)), ',',
                         sigma == .(round(sigma, 2)), ")"))

### Using substitute:
### myTitle1 <- substitute(x ~ Normal ~~ group("(", list(mu == mu1,
###     sigma^2 == sigma2), ")"),
###     list(mu1 = round(mu, 2), sigma2 = round(sigma^2, 2)))

### Or combine the two:
### t1 <- substitute(mu == a, list(a = mu))
### t2 <- substitute(sigma == a, list(a = round(sigma, 2)))
### myTitle1 <- bquote(paste("x ~ Normal(", .(t1), ",", .(t2), ")"))


## To save the plot in a file, change FALSE to TRUE
saveplot <- FALSE

if (saveplot == TRUE){
  pdf(file="Normal-2009.pdf", height=6.5, width=6.5, onefile=FALSE,
      paper="special", pointsize=10)
} else {
  dev.new(height=6.5, width=6.5, pointsize=9)
}
### xpd needed to allow writing outside strict box of graph
par(xpd=TRUE, ps=10)

plot(myx, myDensity, type="l", xlab="x", ylab="Probability Density",
     main=myTitle1, axes=FALSE)
axis(2, pos= mu - 3.6*sigma)
axis(1, pos=0)
lines(c(myx[1], myx[length(myx)]), c(0, 0)) ### closes off axes

# bquote creates an expression that text plotters can use
t1 <- bquote(mu == .(mu))

# Find a value of myx that is very close to mu
centerX <- max(which(myx <= mu))
# plot light vertical line under peak of density
lines(c(mu, mu), c(0, myDensity[centerX]), lty="14", lwd=.2)

# label the mean in the bottom margin
mtext(bquote( mu == .(mu)), 1, at=mu, line=-1)

### find position 20% up vertically, to use for arrow coordinate
ss <- 0.2 * max(myDensity)
# Insert interval to represent width of one sigma
arrows(x0=mu, y0=ss, x1=mu+sigma, y1=ss, code=3, angle=90, length=0.1)

### Write the value of sigma above that interval
t2 <- bquote(sigma == .(round(sigma, 2)))
text(mu + 0.5*sigma, 1.15*ss, t2)

### Create a formula for the Normal
normalFormula <- expression(f(x) == frac(1, sigma*sqrt(2*pi)) *
    e^{~~ -~~ frac(1, 2) ~~ bgroup("(", frac(x - mu, sigma), ")")^2})
# Draw the Normal formula
text(mu + 0.5*sigma, max(myDensity) - 0.10*max(myDensity),
     normalFormula, pos=4)

### Theory says we should have 2.5% of the area to the left of: -1.96 * sigma.
### Find the X coordinate of that critical value
criticalValue <- mu - 1.96*sigma
### Then grab all myx values that are to the left of that critical value.
specialX <- myx[myx <= criticalValue]

### mark the critical value in the graph
text(criticalValue, 0, label=paste(round(criticalValue, 2)), pos=1)
### Take sequence parallel to values of myx inside critical region
specialY <- myDensity[myx <= criticalValue]
#  Polygon makes a nice shaded illustration
polygon(c(specialX[1], specialX, specialX[length(specialX)]),
        c(0, specialY, 0), density=-110, col="lightgray")

shadedArea <- round(pnorm(mu - 1.96*sigma, mean=mu, sd=sigma), 4)


### I want to insert annotation about area on left side.

al1 <- bquote(Prob(x <= .(round(criticalValue, 2))))
al2 <- bquote(phantom(0) == F(.(round(criticalValue, 2))))
al3 <- bquote(phantom(0) == .(round(shadedArea, 3)))

### Hard to position this text just right
### Have tried many ideas, this may be least bad.
### Get center position in shaded area
medX <- median(specialX)
### density at that center point (indexMed is assigned inline):
denAtMedX <- myDensity[indexMed <- max(which(specialX < medX))]

text(medX, denAtMedX+0.0055, labels=al1)
text(medX, denAtMedX+0.004, labels=al2)
text(medX, denAtMedX+0.0025, labels=al3)

### point from text toward shaded area
arrows(x0=medX, y0=myDensity[indexMed] + 0.002, x1=mu - 2.5*sigma,
       y1=0.40*myDensity[length(specialX)], length=0.1)



ss <- 0.1 * max(myDensity)
### Mark interval from mu to mu-1.96*sigma
arrows(x0=mu, y0=ss, x1=mu - 1.96*sigma, y1=ss, code=3, angle=90, length=0.1)
### Put text above interval
text(mu - 2.0*sigma, 1.15*ss,
     bquote(paste(.(round(criticalValue, 2)), phantom(1) == mu - 1.96*sigma,
                  sep="")), pos=4)




criticalValue <- mu + 1.96*sigma
### Then grab all myx values that are to the right of that critical value.
specialX <- myx[myx >= criticalValue]

### mark the critical value in the graph
text(criticalValue, 0, label=paste(round(criticalValue, 2)), pos=1)
### Take sequence parallel to values of myx inside critical region
specialY <- myDensity[myx >= criticalValue]
#  Polygon makes a nice shaded illustration
polygon(c(specialX[1], specialX, specialX[length(specialX)]),
        c(0, specialY, 0), density=-110, col="lightgray")

shadedArea <- round(pnorm(mu + 1.96*sigma, mean=mu, sd=sigma,
                          lower.tail=FALSE), 4)


### Insert simpler comment on right side.

al2 - bquote(phantom

[R] Care to share an R presentation?

2010-03-17 Thread Paul Johnson
The R movement is picking up steam in the center of America.  People
that ignored my R-enthusiasm 10 years ago are now calling me up asking
for presentations.  I need to make a 2-hour presentation to a
collection of faculty and grad students who might like to use R. I
don't want to make it seem too complicated (as I often do), but I
don't want to mislead them into thinking it will be easy.

I expect other r-help readers have been in this same situation.  I
have a recollection (5 or 6 years ago) that one of the R leaders had a
slideshow for this exact purpose.  But I can't find it now. There is a
similar R-help request in which both John Fox and Deepayan Sarkar
offered links to their materials.  However, the links aren't valid
anymore.

If I don't find a pre-existing example to work from, I'll slap
together a Beamer/Sweave presentation and post it where future speech
givers can get it.

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



[R] Does S inherit the enhancements in R language?

2010-03-19 Thread Paul Johnson
I don't know anybody who has S-plus these days, but I expect some of
you might, and perhaps you won't mind telling me something.

I'm working on my presentation about R for the general audience.  As I
survey the suggestions from this list about that project, I find
myself wondering whether S-plus benefits from R.  Does S-plus syntax
change as R's does?  I can take code for S and run it through R, but
if I took some R code to an S-plus system, would it work?

Example 1. "_", "<-", "="

The old S documents I find emphasize assignment with "_".

When I started with R, that was deprecated, then forbidden. "_" was
not allowed in variable names; now it is.  It was absolutely necessary
to use "<-"; "=" caused errors. Around the time of R-2.1 or so, it
became possible to run R code that uses "=" for assignment. It's not
encouraged by the R core team, but "=" is allowed.

Does S+ now accept either

"<-"

or

"="

?

For that matter, in S+ is it now forbidden to use underscore for
assignment, as it is in R?
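A small R illustration of the parser behavior described above (this
shows only the R side; whether S-PLUS matches it is exactly the open
question):

```r
x <- 5      # the classic assignment operator
y = 6       # accepted at top level in modern R
stopifnot(x == 5, y == 6)

# "=" is not interchangeable with "<-" everywhere: inside a call,
# "=" names an argument, while "<-" performs an assignment.
n <- length(z <- 1:3)   # assigns z, then takes its length
stopifnot(n == 3, identical(z, 1:3))
```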

Example 2. Semicolons now discouraged in R code.

We used to require ; to end commands.

Now the R parser is smart enough to spot line breaks and interpret
them accordingly. R's been that way for as long as I can remember, but
I think the ; was required in earliest R releases.

I rather liked the definitive ; at the end of every command.  That
looks right to me, probably because of my SAS and C background.

Would S+ have a panic attack?

Example 3.  Namespace.  Does S-plus get better as R does?

Years ago, I was modeling US Democrats and Republicans and I created
an indicator variable called "rep".  Regression models would not work
after that because the rep function had been blocked. It was very
frustrating to me.
Professor Ripley spotted the error and posted a message called "Don't
use the names of R functions as variable names"
(http://www.mail-archive.com/r-h...@stat.math.ethz.ch/msg11585.html).

After that, I was terrified that any name I used might conflict with a
built in R function name.

Last month, I saw a student here with a political model and he used
"rep" as a variable in a regression model; it seemed to work just fine.
I surmise that the rise in usage of namespaces in R packages accounts
for that?
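Namespaces are part of the story, but there is also a simpler mechanism
that may be doing the protecting: when a name appears in call position,
R keeps searching the enclosing environments until it finds a *function*
binding, skipping non-function objects.  A quick sketch:

```r
rep <- c(0, 1)        # mask the name "rep" with a plain vector

# In call position R skips the non-function binding and still
# reaches base::rep:
out <- rep(rep, times = 2)
print(out)            # 0 1 0 1

rm(rep)               # the base function was never touched
```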

I'm sorry if this is too OT for r-help.

pj
-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



[R] Question re the plotrix package

2011-07-27 Thread Paul Johnson

Dear list,

I am using the  clock24.plot command in this excellent package to plot animal 
activity data. 

Does anyone know if both symbols and a line can be plotted on the same plot to 
show both raw data (symbols) and a line (describing a statistical model of the 
pattern) ? Or if more than one line etc can be plotted?

Thanks

Paul



Re: [R] fit.mult.impute() in Hmisc

2011-08-13 Thread Paul Johnson
On Thu, Mar 31, 2011 at 2:56 PM, Yuelin Li li...@mskcc.org wrote:
 I tried multiple imputation with aregImpute() and
 fit.mult.impute() in Hmisc 3.8-3 (June 2010) and R-2.12.1.

 The warning message below suggests that summary(f) of
 fit.mult.impute() would only use the last imputed data set.
 Thus, the whole imputation process is ignored.

  Not using a Design fitting function; summary(fit)
   will use standard errors, t, P from last imputation only.
   Use vcov(fit) to get the correct covariance matrix,
   sqrt(diag(vcov(fit))) to get s.e.


Hello.  I fiddled around with rms & multiple imputation when I was
preparing these notes from our R summer course. I ran into the same
thing you did, and my conclusion is slightly different from yours.

http://pj.freefaculty.org/guides/Rcourse/multipleImputation/multipleImputation-1-lecture.pdf

Look down to slide 80 or so, where I launch off into that question.
It appears to me that aregImpute will give the right answer for
fitters from rms, but if you want to feel confident about the results
for other fitters, you should use mitools or some other paramater
combining approach. My conclusion (slide 105) is

"Please note: the standard errors in the output based on lrm match
the std. errors estimated by MItools. Thus I conclude
sqrt(diag(cov(fit.mult.impute.object))) did not give correct results."
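For readers who want the parameter-combining step without any
particular package, Rubin's rules are short enough to write directly.
This is a generic sketch (not Hmisc's or mitools' internals; the
function name and toy numbers are mine): the combined variance is the
within-imputation mean plus (1 + 1/m) times the between-imputation
variance.

```r
# Rubin's rules for m imputations: est is an m x p matrix of
# coefficient estimates, se is an m x p matrix of standard errors.
combineRubin <- function(est, se) {
  m <- nrow(est)
  qbar <- colMeans(est)                  # pooled point estimate
  ubar <- colMeans(se^2)                 # within-imputation variance
  b <- apply(est, 2, var)                # between-imputation variance
  tvar <- ubar + (1 + 1/m) * b           # total variance
  data.frame(estimate = qbar, se = sqrt(tvar))
}

# Toy check with 3 "imputations" of 2 coefficients
est <- rbind(c(1.0, -0.2), c(1.1, -0.25), c(0.9, -0.15))
se  <- rbind(c(0.5, 0.05), c(0.5, 0.05), c(0.5, 0.05))
combineRubin(est, se)
```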




 But the standard errors in summary(f) agree with the values
 from sqrt(diag(vcov(f))) to the 4th decimal point.  It would
 seem that summary(f) actually adjusts for multiple
 imputation?

 Does summary(f) in Hmisc 3.8-3 actually adjust for MI?

 If it does not adjust for MI, then how do I get the
 MI-adjusted coefficients and standard errors?

 I can't seem to find answers in the documentation, including
 rereading section 8.10 of the Harrell (2001) book.  Googling
 located a thread in R-help back in 2003, which seemed dated.
 Many thanks in advance for the help,

 Yuelin.
 http://idecide.mskcc.org
 ---
 library(Hmisc)
 Loading required package: survival
 Loading required package: splines
 data(kyphosis, package = "rpart")
 kp <- lapply(kyphosis, function(x)
 +       { is.na(x) <- sample(1:length(x), size = 10); x })
 kp <- data.frame(kp)
 kp$kyp <- kp$Kyphosis == "present"
 set.seed(7)
 imp <- aregImpute( ~ kyp + Age + Start + Number, dat = kp, n.impute = 10,
 +                      type = "pmm", match = "closest")
 Iteration 13
 f <- fit.mult.impute(kyp ~ Age + Start + Number, fitter=glm, xtrans=imp,
 +                 family = binomial, data = kp)

 Variance Inflation Factors Due to Imputation:

 (Intercept)         Age       Start      Number
       1.06        1.28        1.17        1.12

 Rate of Missing Information:

 (Intercept)         Age       Start      Number
       0.06        0.22        0.14        0.10

 d.f. for t-distribution for Tests of Single Coefficients:

 (Intercept)         Age       Start      Number
    2533.47      193.45      435.79      830.08

 The following fit components were averaged over the 10 model fits:

  fitted.values linear.predictors

 Warning message:
 In fit.mult.impute(kyp ~ Age + Start + Number, fitter = glm, xtrans = imp,  :
  Not using a Design fitting function; summary(fit) will use
 standard errors, t, P from last imputation only.  Use vcov(fit) to get the
 correct covariance matrix, sqrt(diag(vcov(fit))) to get s.e.


 f

 Call:  fitter(formula = formula, family = binomial, data = completed.data)

 Coefficients:
 (Intercept)          Age        Start       Number
    -3.6971       0.0118      -0.1979       0.6937

 Degrees of Freedom: 80 Total (i.e. Null);  77 Residual
 Null Deviance:      80.5
 Residual Deviance: 58   AIC: 66
 sqrt(diag(vcov(f)))
 (Intercept)         Age       Start      Number
  1.5444782   0.0063984   0.0652068   0.2454408
 -0.1979/0.0652068
 [1] -3.0350
 summary(f)

 Call:
 fitter(formula = formula, family = binomial, data = completed.data)

 Deviance Residuals:
   Min      1Q  Median      3Q     Max
 -1.240  -0.618  -0.288  -0.109   2.409

 Coefficients:
             Estimate Std. Error z value Pr(>|z|)
 (Intercept)  -3.6971     1.5445   -2.39   0.0167
 Age           0.0118     0.0064    1.85   0.0649
 Start        -0.1979     0.0652   -3.03   0.0024
 Number        0.6937     0.2454    2.83   0.0047

 (Dispersion parameter for binomial family taken to be 1)

    Null deviance: 80.508  on 80  degrees of freedom
 Residual deviance: 57.965  on 77  degrees of freedom
 AIC: 65.97

 Number of Fisher Scoring iterations: 5



[R] seeking advice about rounding error and %%

2011-08-13 Thread Paul Johnson
A client came into our consulting center with some data that had been
damaged by somebody who opened it in MS Excel.  The columns were
supposed to be integer valued, 0 through 5, but some of the values
were mysteriously damaged. There were scores like 1.18329322 and such
in there.  Until he tracks down the original data and finds out what
went wrong, he wants to take all fractional valued scores and convert
to NA.

As a quick hack, I suggest an approach using %%

 x <- c(1, 2, 3, 1.1, 2.12131, 2.001)
 x %% 1
[1] 0.0 0.0 0.0 0.1 0.12131 0.00100
 which(x %% 1 > 0)
[1] 4 5 6
 xbad <- which(x %% 1 > 0)
  x[xbad] <- NA
  x
[1]  1  2  3 NA NA NA

I worry about whether x %% 1 may ever return a non zero result for an
integer because of rounding error.

Is there a recommended approach?

What about zapsmall on the left, but what goes on the right of ">"?

which(zapsmall(x %% 1) > 0)
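One tolerance-based alternative sketch, which sidesteps %% entirely:
compare each value with its rounded version and flag anything farther
away than a small epsilon.  (The 1e-7 threshold here is an arbitrary
illustration, not a recommendation.)

```r
x <- c(1, 2, 3, 1.1, 2.12131, 2.001, 3 - 1e-15)
tol <- 1e-7                       # arbitrary illustrative tolerance

# An "integer" contaminated by rounding error sits within tol of
# its rounded value; genuinely fractional scores do not.
xbad <- which(abs(x - round(x)) > tol)
x[xbad] <- NA
x                                 # 1 2 3 NA NA NA 3
```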


Thanks in advance

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



Re: [R] graphics: 3D regression plane

2011-04-27 Thread Paul Johnson
Hi. Comments below

On Wed, Apr 27, 2011 at 2:32 AM, agent dunham crossp...@hotmail.com wrote:
 Hi, thanks, I think I've changed the previous as you told me but I'm having
 this error, what does it mean?


 model <- lm(log(v1) ~ log(v2) + v3, data=dat)

 newax <- expand.grid(
    v2 = seq(min(log(dat$v2)), max(log(dat$v2)), length=100),
    v3 = seq(min(dat$v3), max(dat$v3), length=100))

 fit <- predict(model, newax)

 graph <- regr2.plot(x=seq(min(log(dat$v2)), max(log(dat$v2)), length=100),
                 y=seq(min(dat$v3), max(dat$v3), length=100),
                 z=matrix(fit, 100, 100), main="Least-squares",
             resid.plot=TRUE, plot.base.points=TRUE, expand=0.5,
 ticktype="detailed", theta=-45)


 Error in cbind(x, y, z, 1) %*% pmat : incompatible arguments

 Thanks again, u...@host.com


I've struggled with these 3d things before.  You should supply us the
full testable program to fix.  Here I've fiddled around and can get a
plot from persp or regr2.plot, but I understand it is not informative.
 But the plot does run, and maybe you will see how to fix.

Generally, I would suggest you do the separate work in separate steps,
not all jammed together inside regr2.plot.  If you do each bit
separately, then you can inspect the result and be sure it is good.

See:

## PJ 2011-04-27
## testing to address question in r-help on persp and regr2.plot

v1 <- rnorm(100, m=100, s=10)
v2 <- rnorm(100, m=1000, s=100)
v3 <- rnorm(100)
dat <- data.frame(v1, v2, v3)
rm(v1, v2, v3)
model <- lm(log(v1) ~ log(v2) + v3, data=dat)


rv2 <- range(dat$v2)
plv2 <- seq(rv2[1], rv2[2], length.out=100)

rv3 <- range(dat$v3)
plv3 <- seq(rv3[1], rv3[2], length.out=100)

newax <- expand.grid(
   v2 = plv2,
   v3 = plv3)

fit <- predict(model, newax)

zmat <- matrix(fit, 100, 100)

persp(x=plv2, y=plv3, z=zmat)



require(HH)
graph <- regr2.plot(x=plv2,
                    y=plv3,
                    z=zmat, main="Least-squares",
                    resid.plot=TRUE, plot.base.points=TRUE, expand=0.5,
                    ticktype="detailed", theta=-45)













-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



Re: [R] Testing for arguments in a function

2011-09-28 Thread Paul Johnson
On Mon, Sep 26, 2011 at 2:39 PM, Gene Leynes gley...@gmail.com wrote:
 I don't understand how this function can subset by i when i is missing

 ## My function:
 myfun = function(vec, i){
    ret = vec[i]
    ret
 }

 ## My data:
 i = 10
 vec = 1:100

 ## Expected input and behavior:
 myfun(vec, i)

 ## Missing an argument, but error is not caught!
 ## How is subsetting even possible here???
 myfun(vec)


Hello, Gene:

It seems to me the discussion of your question launches off into a
more abstract direction than we need here.  I've found it is wise to
name arguments to functions differently than variables in the
environment, so the function does not find "i" outside itself.  And
you can give each argument a nonsense default to force an error when
that argument is not supplied.  NAN is nothing here, just an undefined
symbol, so evaluating it produces an error, as you said you wanted.


myfun <- function(ivec, ii=NAN){
    ivec[ii]
}
myfun(1:40,10)

works

myfun(1:40)

Produces

Error in myfun(1:40) : object 'NAN' not found

If you are happy enough to just plop out an error, there's no need to worry.

Note the function can be written more succinctly than you originally
had it, and you are generally advised to use "<-" rather than "=" by
the R developers.




-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



Re: [R] Testing for arguments in a function

2011-09-30 Thread Paul Johnson
Just for the record, following Bill Dunlap's advice, I think this is
the best answer to the question as originally posed.


myfun <- function(vec, i=stop("'i' must be supplied")){
    vec[i]
}

 myfun(1:40,10)
[1] 10
 myfun(1:10)
Error in myfun(1:10) : 'i' must be supplied
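Another idiom worth noting for later readers: R provides missing()
precisely for this test, which keeps the stop() call out of the default
expression.  (The function name myfun2 is just an illustration.)

```r
myfun2 <- function(vec, i) {
  # missing() reports whether the caller supplied the argument
  if (missing(i)) stop("argument 'i' must be supplied")
  vec[i]
}

myfun2(1:40, 10)   # [1] 10
## myfun2(1:40)    # Error: argument 'i' must be supplied
```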


-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



Re: [R] Running a GMM Estimation on dynamic Panel Model using plm-Package

2011-10-05 Thread Paul Johnson
On Sun, Jun 12, 2011 at 2:43 PM, bstudent marc.ruet...@gmx.de wrote:
 Hello,

 although I searched for a solution related to my problem I didn't find one,
 yet. My skills in R aren't very large, however.
 For my Diploma thesis I need to run a GMM estimation on a dynamic panel
 model using the pgmm - function in the plm-Package.

 The model I want to estimate is: Y(t) = Y(t-1) + X1(t) + X2(t) + X3(t) .

 There are no normal instruments in this model. There just should be the
 gmm-instruments I need for the model.
 In order to estimate it, I tried the following code:


 library(plm)

 test <- pgmm(Y ~ lag(Y, 1) + X1 + X2 + X3 | lag(Y, 1), data=Model,
 effect="individual", model="onestep")



 I tried Model as Modelp <- pdata.frame(... and as Model <-
 read.table(..., but in both cases there's an error message:

 Error in solve.default(Reduce("+", A2)) :
  system is computationally singular: reciprocal condition number =
 4.08048e-22


Hello,

I have students working on similar problems.  Here is what I would say to them:

Without a dataset and code that is supposed to work, nobody can figure
out what's wrong and help you around it.

2 suggestions

1. directly contact Yves Croissant, the plm principal author, and
give him your R code and the data set. Show him the error output you
get.  Here's the contact information:   Yves Croissant
yves.croiss...@univ-reunion.fr

If he answers, please let us know.

If you don't want to (or can't) give real data, make some up that
causes the same crash.

2. post in here a link to your data and the full code and I will try
to debug it to at least find out where this is going wrong.  I've been
studying debugging with R functions and this is a good  opportunity
for me.

I stopped focusing on panel estimator details in 2000, so I'm rusty,
but will probably recognize most of what is going on.  If you don't
want to broadcast this to everybody, you can feel free to contact me
directly; pauljohn at ku.edu is my university address.

PJ

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



Re: [R] Tinn-R

2011-10-05 Thread Paul Johnson
On Tue, Oct 4, 2011 at 1:25 PM, Charles McClure cmccl...@atrcorp.com wrote:
 I am new to R and have recently tried Tinn-R with very mixed and unexpected
 results.  Can you point me to a Tinn-R tutorial on the web or a decent
 reference book?


In my experience, Tinn-R does not work so well, and most new users
are advised to try instead Notepad++ with the addon components
R2notepad++, or RStudio.  I have MS Windows setup tips here:

http://web.ku.edu/~quant/cgi-bin/mw1/index.php?title=Windows:AdminTips

Until I see evidence otherwise, I'm concluding that Tinn-R was the
best option in 2008, but it is harder to configure now and there's no
reason to prefer it over Notepad++.  Real Men[tm] still use Emacs, but
new users may not have enough muscles :)

pj

 Thank you for your help;

 Charles McClure

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



[R] multi-platform equivalent of x11() ?

2012-03-06 Thread Paul Johnson
I want to write an R help example that throws up 2 graphs in separate
windows, for comparison. In Linux I plot one, then run

x11()

to spawn a new on-screen device.

Is there some generic equivalent so I can write an example that will
work for Windows and Mac users as well?

If there is none, don't you think it would be fun if there were?

pj

-- 
Paul E. Johnson
Professor, Political Science    Assoc. Director
1541 Lilac Lane, Room 504     Center for Research Methods
University of Kansas               University of Kansas
http://pj.freefaculty.org            http://quant.ku.edu



Re: [R] simulate an gradually increase of the number of subjects based on two variables

2012-03-13 Thread Paul Johnson
Suggestion below:

On Tue, Mar 13, 2012 at 1:24 PM, guillaume chaumet
guillaumechau...@gmail.com wrote:
 I forgot to mention that I already tried to generate data based on the
 mean and sd of two variables.

 x=rnorm(20,1,5)+1:20

 y=rnorm(20,1,7)+41:60

 simu <- function(x,y,n) {
    simu=vector("list",length=n)

    for(i in 1:n) {
        x=c(x,rnorm(1,mean(x),sd(x)))
        y=c(y,rnorm(1,mean(y),sd(y)))
        simu[[i]]$x <- x
        simu[[i]]$y <- y


    }

    return(simu)
 }

 test=simu(x,y,60)
 lapply(test, function(x) cor.test(x$x,x$y))

 As you can see, the correlation disappears with increasing N.
 Perhaps a bootstrap with lm or cor.test could solve my problem.


In this case, you should consider creating the LARGEST sample first,
and then remove cases to create the smaller samples.

The problem now is that you are drawing a completely fresh sample
every time, so you are getting not only the effect of sample size, but
also that extra randomness when case 1 is replaced every time.

 I am fairly confident (80%)  that if you approach it my way, the
mystery you see will start to clarify itself.  That is, draw the big
sample with the desired characteristic, and once you understand the
sampling distribution of cor for that big sample,  you will also
understand what happens when each large sample is reduced  by a few
cases.

BTW, if you were doing this on a truly massive scale, my way would run
much faster.  Allocate memory once, then don't need to manually delete
lines, just trim down the index on the rows.  (Same data access
concept as bootstrap).
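The "draw the largest sample once, then trim the index" idea can be sketched as follows. All names here (n.max, big.x, big.y, ns) are my own illustration, not code from the thread:

```r
## A minimal sketch of the largest-sample-first approach.
set.seed(42)
n.max <- 80
big.x <- rnorm(n.max, 1, 5) + seq_len(n.max)      # trend plus noise
big.y <- rnorm(n.max, 1, 7) + seq_len(n.max) + 40

## Re-use the same draws at every sample size: only N varies, not the data
ns <- seq(20, n.max, by = 10)
cors <- sapply(ns, function(n) cor(big.x[1:n], big.y[1:n]))
round(cors, 3)
```

Because every subsample is nested in the same big sample, differences across the entries of cors reflect sample size alone, not fresh randomness.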

pj



-- 
Paul E. Johnson
Professor, Political Science    Assoc. Director
1541 Lilac Lane, Room 504     Center for Research Methods
University of Kansas               University of Kansas
http://pj.freefaculty.org            http://quant.ku.edu



Re: [R] beginner's loop issue

2012-03-13 Thread Paul Johnson
On Tue, Mar 13, 2012 at 11:27 AM, aledanda danda.ga...@gmail.com wrote:
 Dear All,

 I hope you don't mind helping me with this small issue. I haven't been using
 R in years and I'm trying to fill in a matrix
 with the output of a function (I'm probably using the Matlab logic here and
 it's not working).
 Here is my code:

 for (i in 1:length(input)){
  out[i,1:3] <- MyFunction(input[i,1],input[i,2], input[i,3])
  out[i,4:6] <- MyFunction(input[i,5],input[i,7], input[i,6])
  out[i,7:9] <- MyFunction(input[i,8],input[i,10], input[i,9])
 }

 'input' is a matrix
 dim(input)
 [1] 46 10

 and each raw corresponds to a different subject.
 The error I get here is

 Error in out[i, 1:3] <- get.vaTer(input[i, 2], input[i, 4], input[i, 3],  :
  object 'out' not found

out has to exist first, as previous commenter said.

Furthermore, suggestions:

Consider making MyFunction accept a vector of 3 arguments, rather than
separate arguments.

Consider giving out 9 columns and filling three at a time, as in

out <- matrix(0, nrow=N, ncol=9)
for(i in 1:N){
out[i,1:3] <- MyFunction(input[i,1:3])
out[i,4:6] <- MyFunction(input[i,4:6])
out[i,7:9] <- MyFunction(input[i,7:9])
}

If you could re-shape your input into a list whose elements are the
vectors that go into MyFunction, this could get easier still:

lapply(input, MyFunction)

Or, if input were an array with 3 columns, you could revise MyFunction
to accept a 3-vector:

apply(input, 1, MyFunction)

Hardly ever in R does one need to specify inputs as you have done in
your example.
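The apply() suggestion can be sketched like this. MyFunction here is a hypothetical stand-in (a row normalizer), since the poster's real function isn't shown:

```r
## Row-wise application of a vectorized function with apply(), no explicit loop.
MyFunction <- function(v) v / sum(v)   # stand-in: takes a 3-vector, returns one

input <- matrix(1:27, nrow = 9, ncol = 3)
out <- t(apply(input, 1, MyFunction))  # apply returns 3 x 9, so transpose
dim(out)                               # 9 x 3, one processed row per subject
```

The transpose is needed because apply() stacks each row's result as a column of its output.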
pj



-- 
Paul E. Johnson
Professor, Political Science    Assoc. Director
1541 Lilac Lane, Room 504     Center for Research Methods
University of Kansas               University of Kansas
http://pj.freefaculty.org            http://quant.ku.edu



Re: [R] Models with ordered and unordered factors

2011-11-15 Thread Paul Johnson
On Tue, Nov 15, 2011 at 9:00 AM, Catarina Miranda
catarina.mira...@gmail.com wrote:
 Hello;

 I am having problems with the interpretation of models using ordered or
 unordered predictors.
 I am running models in lmer but I will try to give a simplified example
 data set using lm.
 Both in the example and in my real data set I use a predictor variable
 referring to 3 consecutive days of an experiment. It is a factor, and I
 thought it would be more correct to consider it ordered.
 Below is my example code with my comments/ideas along it.
 Can someone help me to understand what is happening?

Dear Catarina:

I have had the same question, and I hope my answers help you
understand what's going on.

The short version:

http://pj.freefaculty.org/R/WorkingExamples/orderedFactor-01.R

The longer version, Working with Ordinal Predictors

http://pj.freefaculty.org/ResearchPapers/MidWest09/Midwest09.pdf

HTH
pj


 Thanks a lot in advance;

 Catarina Miranda


 y <- c(72,25,24,2,18,38,62,30,78,34,67,21,97,79,64,53,27,81)

 Day <- c(rep("Day 1",6), rep("Day 2",6), rep("Day 3",6))

 dataf <- data.frame(y,Day)

 str(dataf) #Day is not ordered
 #'data.frame':   18 obs. of  2 variables:
 # $ y  : num  72 25 24 2 18 38 62 30 78 34 ...
 # $ Day: Factor w/ 3 levels "Day 1","Day 2",..: 1 1 1 1 1 1 2 2 2 2 ...

 summary(lm(y~Day,data=dataf))  #Day 2 is not significantly different from
 Day 1, but Day 3 is.
 #
 #Call:
 #lm(formula = y ~ Day, data = dataf)
 #
 #Residuals:
 #    Min      1Q  Median      3Q     Max
 #-39.833 -14.458  -3.833  13.958  42.167
 #
 #Coefficients:
 #            Estimate Std. Error t value Pr(>|t|)
 #(Intercept)   29.833      9.755   3.058 0.00797 **
 #DayDay 2      18.833     13.796   1.365  0.19234
 #DayDay 3      37.000     13.796   2.682  0.01707 *
 #---
 #Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 #
 #Residual standard error: 23.9 on 15 degrees of freedom
 #Multiple R-squared: 0.3241,     Adjusted R-squared: 0.234
 #F-statistic: 3.597 on 2 and 15 DF,  p-value: 0.05297
 #

 dataf$Day <- ordered(dataf$Day)

 str(dataf) # Day 1 < Day 2 < Day 3
 #'data.frame':   18 obs. of  2 variables:
 # $ y  : num  72 25 24 2 18 38 62 30 78 34 ...
 # $ Day: Ord.factor w/ 3 levels "Day 1"<"Day 2"<..: 1 1 1 1 1 1 2 2 2 2 ...

 summary(lm(y~Day,data=dataf)) # Significances reversed (or are Day.L and
 # Day.Q not synonymous with Day 2 and Day 3?): Day 2 (.L) is
 # significantly different from Day 1, but Day 3 (.Q) isn't.

 #Call:
 #lm(formula = y ~ Day, data = dataf)
 #
 #Residuals:
 #    Min      1Q  Median      3Q     Max
 #-39.833 -14.458  -3.833  13.958  42.167
 #
 #Coefficients:
 #            Estimate Std. Error t value Pr(>|t|)
 #(Intercept)  48.     5.6322   8.601 3.49e-07 ***
 #Day.L        26.1630     9.7553   2.682   0.0171 *
 #Day.Q        -0.2722     9.7553  -0.028   0.9781
 #---
 #Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 #
 #Residual standard error: 23.9 on 15 degrees of freedom
 #Multiple R-squared: 0.3241,     Adjusted R-squared: 0.234
 #F-statistic: 3.597 on 2 and 15 DF,  p-value: 0.05297

        [[alternative HTML version deleted]]







-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



[R] how big (in RAM and/or disk storage) is each of these objects in a list?

2011-11-26 Thread Paul Johnson
Greetings, friends (and others :) )

We generated a bunch of results and saved them in an RData file. We
can open, use, all is well, except that the size of the saved file is
quite a bit larger than we expected.  I suspect there's something
floating about in there that one of the packages we are using puts in,
such as a spare copy of a data frame that is saved in some subtle way
that has escaped my attention.

Consider a list of objects. Are there ways to do these things:

1. ask R how much memory is used by the things inside the list?

2.   Does as.expression(anObject) print everything in there? Or, is
there a better way to convert each thing to text or some other format
that you can actually read line by line to see what is in there, to
see everything?

If there's no giant hidden data frame floating about, I figure I'll
have to convert symmetric matrices to lower triangles or such to save
space.  Unless R already is automatically saving a matrix in that way
but just showing me the full matrix, which I suppose is possible. If
you have other ideas about general ways to make saved objects smaller,
I'm open for suggestions.
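For question 1, object.size() can be applied element by element; for question 2, str() usually gives a more readable inventory than as.expression. A sketch with made-up contents (note that object.size reports in-memory size, which need not match the size of the saved .RData file, since save() compresses):

```r
## Per-element memory accounting for a list of results (illustrative data).
results <- list(m = matrix(rnorm(1e4), 100, 100),
                d = data.frame(x = rnorm(1000)),
                s = "tiny")

sizes <- sapply(results, object.size)       # bytes per element
sort(sizes, decreasing = TRUE)              # biggest offenders first
format(object.size(results), units = "Kb")  # total for the whole list

str(results, max.level = 1)                 # readable inventory of contents
```

A hidden copy of a data frame (for example, inside a fitted model's environment) will show up immediately in the sorted sizes.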

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



[R] fundamental guide to use of numerical optimizers?

2011-12-15 Thread Paul Johnson
I was in a presentation yesterday of models fitted with both Mplus
and SAS.  In a batch of 1000 bootstrap samples, between 300 and 400 of
the estimations did not converge.  The authors spoke as if this were
the ordinary cost of doing business, and pointed to some publications
in which the nonconvergence rate was as high or higher.

I just don't believe that's right, and if some problem is posed so
that the estimate is not obtained in such a large sample of
applications, it either means the problem is badly asked or badly
answered.  But I've got no traction unless I can actually do
better.

Perhaps I can use this opportunity to learn about R functions like
optim, or perhaps maxLik.

From reading r-help, it seems to me there are some basic tips for
optimization, such as:

1. It is wise to scale the data so that all columns have the same
range before running an optimizer.

2. With estimates of variance parameters, don't try to estimate sigma
directly, instead estimate log(sigma) because that puts the domain of
the solution onto the real number line.

3 With estimates of proportions, estimate instead the logit, for the
same reason.

Are these mistaken generalizations?  Are there other tips that
everybody ought to know?

I understand this is a vague question, perhaps the answers are just in
the folklore. But if somebody has written them out, I would be glad to
know.
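Tip 2 can be illustrated with optim: maximize a normal log-likelihood over (mu, log(sigma)), so the optimizer searches an unconstrained space and sigma = exp(par[2]) stays positive automatically. This is a generic sketch, not tied to the presentation discussed above:

```r
## Estimate (mu, log(sigma)) of a normal sample by minimizing the
## negative log-likelihood with optim's BFGS method.
set.seed(1)
y <- rnorm(200, mean = 5, sd = 2)

negll <- function(par) {
  mu <- par[1]
  sigma <- exp(par[2])          # optimize log(sigma); back-transform here
  -sum(dnorm(y, mean = mu, sd = sigma, log = TRUE))
}

fit <- optim(c(0, 0), negll, method = "BFGS")
c(mu = fit$par[1], sigma = exp(fit$par[2]))
```

Because the search is over the whole real line in both coordinates, the optimizer never steps into sigma <= 0, which is one common cause of nonconvergence.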

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



[R] pls help to print out first row of terms(model) output in example program

2011-12-19 Thread Paul Johnson
Greetings.

I've written a convenience function for multicollinearity diagnosis.
I'd like to report to the user the formula that is used in a
regression.  I get output like this:

 mcDiagnose(m1)
[1] "The following auxiliary models are being estimated and returned in a list:"
[1] "`x1` ~ ."
formula(fmla)()
[1] "`x2` ~ ."

I'd like to fill in the period with the variable names that are in there.

I know the information is in there, I just need to get it.  The terms
output for a fitted model has the output at the very top, in the first
line, above the attributes.  I just want to print out that first line,
here:

 terms(m2)
y ~ log(10 + x1) + poly(x2, 2)
attr(,"variables")
list(y, log(10 + x1), poly(x2, 2))
attr(,"factors")
             log(10 + x1) poly(x2, 2)
y                       0           0
log(10 + x1)            1           0
poly(x2, 2)             0           1
attr(,"term.labels")
[1] "log(10 + x1)" "poly(x2, 2)"
[snip]

In my working example code below , I need the help where I have ##fix
me fix me##


##Paul Johnson
## 2011-12-19
## mcDiagnose.R

lmAuxiliary <- function(model){
  dat <- as.data.frame(model.matrix(model))
  ## ivnames <- attr(delete.response(terms(model)), "term.labels")
  ## previous does not work with transforms like poly
  hasIntercept <- attr(terms(model), "intercept")

  if (hasIntercept) dat <- dat[ , -1] # removes intercept; assumes intercept in column 1
  ivnames <- colnames(dat)
  print("The following auxiliary models are being estimated and returned in a list:")

  results <- list()

  for (i in ivnames) {
    fmla <- paste("`", i, "`", " ~ . ", sep="")
    print(fmla)
    maux <- lm( formula(fmla), data=dat)
    results[[ i ]] <- maux
    print(maux$call[2])
    ###fix me fix me ##
  }
  results
}


mcDiagnose <- function(model){
  auxRegs <- lmAuxiliary(model)
  auxRsq <- numeric(length=length(auxRegs))
  j <- 0
  for ( i in auxRegs ){
    j <- j + 1
    sumry <- summary(i)
    auxRsq[j] <- sumry$r.squared
  }
  print("Drum roll please!")
  print("And your R_j Squareds are (auxiliary Rsq)")
  print(names(auxRegs))
  print(auxRsq)
}

x1 <- rnorm(100)
x2 <- rnorm(100)
y <- rnorm(100)

m1 <- lm(y~x1+x2)

mcDiagnose(m1)


m2 <- lm(y~log(10+x1) + poly(x2,2))

mcDiagnose(m2)
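For the ##fix me fix me## spot, one possibility (my own suggestion, not necessarily what the package ended up doing) is to deparse the model's formula rather than dig through the call object:

```r
## Sketch: recover the first line of terms() output via formula()/deparse().
dat <- data.frame(x1 = rnorm(100), x2 = rnorm(100), y = rnorm(100))
m2 <- lm(y ~ log(10 + x1) + poly(x2, 2), data = dat)

f <- formula(m2)                     # drops the attributes terms() prints
fstring <- paste(deparse(f), collapse = " ")
cat(fstring, "\n")                   # y ~ log(10 + x1) + poly(x2, 2)
```

deparse() can split long formulas across lines, which is why the paste(..., collapse = " ") step is there.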

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



[R] 3d plotting alternatives. I like persp, but regret the lack of plotmath.

2011-12-29 Thread Paul Johnson
I have been making simple functions to display regressions in a new
package called rockchalk.  For 3d illustrations, my functions use
persp, and I've grown to like working with it.  As an example of the
kind of things I like to do, you might consult my lecture on
multicollinearity, which is by far the most detailed illustration I've
prepared.

http://pj.freefaculty.org/guides/stat/Regression/Multicollinearity/Multicollinearity-1-lecture.pdf

I used persp mainly because I can understand it, and it can be made to
work like plot in R, with additional tools like lines and points and
such.  I don't want to interact with these plots, I just need to put
them into lectures & documents relatively easily.  And I've also
succeeded in turning them into flash video with R2SWF, which works
great!

Last summer, I put together some lecture notes illustrating persp,
scatterplot3d, and persp3d.
http://pj.freefaculty.org/guides/Rcourse/plot-3d/plots-3d.pdf.  As I
review that now, I see I did not make any progress on the lattice
based plotters, or I did not write it down, anyway.  scatterplot3d did
almost everything I needed to do, but not everything, and so I used
persp in my newer effort.

Then I put some plot math in an axis label and ran into the problem
that everybody else who uses persp finds, eventually. persp doesn't
allow expressions in axis labels.  From searching in r-help, I see
that many people have run up against the same trouble. Most people say
"too bad, I'll switch to some other tool."


Suggested alternatives.

1. Use wireframe or cloud in lattice. They can handle plotmath.

I've been studying that, and it can handle plot math, but I just can't
get the kinds of graphs I want from it.  In the persp framework, I can
draw the 3d plot, and then add details to it one by one.  I can
comprehend the little steps.  In wireframe and cloud, it *appears* to
me i have to pack all of that work into one single command, and it is,
well, either too difficult or impossible.  Or perhaps I'm just not
understanding the documentation.  If I could make the sorts of plots I
need with lattice tools, I would do it.  But I'm really blocked at the
front door by the "write one giant function call" nature of it.

I realize that's vague because I have not told you specifically what I
want to do.  If there is a lattice expert reading, can you give me
some HOWTO hints?  Take, for example, Slide 19 in this one:

http://pj.freefaculty.org/guides/stat/Regression/Multicollinearity/Multicollinearity-1-lecture.pdf

gray dots in the x1-x2 plane, blue points hovering above, red pointed
arrows from gray to blue.

And then later on, say slide 36-60, I have regression planes drawn in
with the arrows from the plane to the points.


2. Use rgl based tools. These ones are especially neat if you want to
interact with the graph--spin a 3d graph or get a display with
lively colors.

It works more like plot, in the sense that you can draw the figure and
then add components with points3d and so forth.  And there are nice
working examples of extensions in misc3d.  And also, the R CMDR
interface has a working pull down menu system that can make rgl 3d
graphs.  That's a big plus.

After playing with some examples, I see these troubles. The only
output device that runs well (finishes and does not generate massive
files) is png.  The output quality on the screen is quite beautiful,
but transition to black-and-white is not as nice looking as persp, in
my opinion.  These graphs draw much more slowly.  They are more
difficult to script out in an Sweave document, it seems to me.

If those criticisms are wrong, I would be glad to know.

So I'm back wondering why persp can't be updated.

Nobody has explained why it is not possible to revise persp to allow
expressions in axis labels.  Perhaps nobody has done it because people
think that persp has no fans :)   But I'm a fan.

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



[R] How to properly re-set a saved seed? I've got the answer, but no explanation

2012-01-06 Thread Paul Johnson
Hello, happy new year.

I've come back to a problem from last spring. I need to understand
what what is the technically correct method to save a seed and then
re-set it.  Although this example arises in the context of MT19937,
I'm needing to do this with other generators as well, including
L'Ecuyer streams.

The puzzle is this comment in ?Random: ‘set.seed’ is the recommended
way to specify seeds.

What I did not understand before, and can't make sense of now, is that
set.seed does not re-set a saved seed.  Here's my working example:


## Paul Johnson
## April 20, 2011

## If you've never dug into the R random number generator and its use of the
## default MT19937, this might be informative.  It will also help you
## avoid a time-consuming mistake that I've made recently.

## I've been developing a simulation and I want to save and restore random
## streams exactly. I got that idea from John Chambers Software for
## Data Analysis and I studied his functions to see how he did it.
## The problem, as we see below, is that set.seed() doesn't do what I expected,
## and I feel lucky to have noticed it now, rather than later.

## I wish set.seed did work the way I want, though, and that's why I'm
## writing here, to see if anybody else wishes the same.

## Here's a puzzle.  Try this:
set.seed(444)
s1 <- .Random.seed
runif(1)
rnorm(1)

set.seed(444)
runif(1)
rnorm(1)
## Those matched. Good

## Re-insert the saved seed s1
set.seed(s1)
runif(1)
rnorm(1)

## Why don't the draws match? I put back the seed, didn't I?
## Hm. Try again
set.seed(s1)
runif(1)
rnorm(1)

## Why did those match? But neither matches cases 1 and 2.

## I was baffled  discouraged.

## The help page for random numbers says:
##  ‘set.seed’ is the recommended way to specify seeds.

## But, it doesn't say something like

# set.seed(s1)

## is supposed to work. Ah. My mistake.

#### Here's the fix and the puzzle ###
## To re-set the generator to its position at s1, it is required
## to do what the help page seems to say we should not.

.Random.seed <- s1

runif(1)
#
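The two behaviors are reconciled by noting that set.seed() expects a single integer, while a saved .Random.seed is the whole generator state vector; restoring a state means writing .Random.seed back into the workspace. A sketch, using assign() so it also works from inside a function:

```r
## Restore a saved generator state by assigning .Random.seed
## in the global environment.
set.seed(444)
s1 <- .Random.seed          # save the full MT19937 state
r1 <- runif(1)

assign(".Random.seed", s1, envir = globalenv())
r2 <- runif(1)
identical(r1, r2)           # TRUE: the stream resumed from the saved state
```

At the top level, `.Random.seed <- s1` does the same thing; assign() is just the explicit form.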

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



[R] fix and edit don't work: unable to open X Input Method-segfault

2012-01-08 Thread Paul Johnson
I can't run fix() or edit() anymore. Did I break my system?

I'm running Debian Linux with R-2.14.1. As far as I can tell, the R
packages came from Debian's testing wheezy repository.  I would like
to know if users on other types of systems see the same problem. If
no, then, obviously, it is a Debian-only issue and I can approach it
from that point of view.  And if no other Debian users see the same,
it means it is a me-only problem, and that's discouraging :)

I get this same R crash whether I try fix when R is running in a
terminal or in Emacs with ESS. I've not seen this before, but Google
leads to some bug reports on Ubuntu in 2007, where it was claimed that
the problem was fixed.

The really bad part is that the second try causes a segmentation fault
in R itself.


 library(ggplot2)
Loading required package: reshape
Loading required package: plyr

Attaching package: ‘reshape’

The following object(s) are masked from ‘package:plyr’:

rename, round_any

Loading required package: grid
Loading required package: proto

 sessionInfo()
R version 2.14.1 (2011-12-22)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] grid  stats graphics  grDevices utils datasets  methods
[8] base

other attached packages:
[1] ggplot2_0.8.9 proto_0.3-9.2 reshape_0.8.4 plyr_1.6
 fix(mpg)
Error in dataentry(datalist, modes) : invalid device
In addition: Warning message:
In edit.data.frame(get(subx, envir = parent), title = subx, ...) :
  unable to open X Input Method
 fix(mpg)

 *** caught segfault ***
address (nil), cause 'unknown'

Traceback:
 1: edit.data.frame(get(subx, envir = parent), title = subx, ...)
 2: edit(get(subx, envir = parent), title = subx, ...)
 3: fix(mpg)

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection:


Same happens no matter what packages are loaded, so far as I can tell.
 Here it is without ggplot2, in case you were suspicious of those
particular datasets.


 library(datasets)
 datasets()
Error: could not find function datasets
 help(package=datasets)
 fix(CO2)
Error in dataentry(datalist, modes) : invalid device
In addition: Warning message:
In edit.data.frame(get(subx, envir = parent), title = subx, ...) :
  unable to open X Input Method



-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



Re: [R] question on model.matrix

2012-01-30 Thread Paul Johnson
Greetings

On Sat, Jan 28, 2012 at 2:43 PM, Daniel Negusse
daniel.negu...@my.mcphs.edu wrote:



 while reading some tutorials, i came across this and i am stuck. i want to 
 understand it and would appreciate if anyone can tell me.

 design - model.matrix(~ -1+factor(c(1,1,2,2,3,3)))

 can someone break down this code and explain to me what the ~, and the 
 -1+factor are doing?

A formula would be y ~ x, so when you don't include y, it means you
only want the right hand side variables.  The term design matrix
generally means the numeric coding that is fitted in a statistical
procedure.

The -1 in the formula means "do not insert an intercept for me."  It
affects the way the factor variable is converted to numeric contrasts
in the design matrix.   If there is an intercept, then the contrasts
have to be adjusted to prevent perfect multicollinearity.

If you run a few examples, you will see. This uses lm, but the formula
and design matrix ideas are same. Note, with an intercept, I get 3
dummy variables from x2, but with no intercept, I get 4 dummies:

 x1 <- rnorm(16)
 x2 <- gl(4, 4, labels=c("none","some","more","lots"))
 y <- rnorm(16)
 m1 <- lm(y ~ x1 + x2)
 model.matrix(m1)
   (Intercept)          x1 x2some x2more x2lots
1            1 -0.2567          0      0      0
2            1  0.94963659      0      0      0
3            1  0.06915561      0      0      0
4            1  0.89971204      0      0      0
5            1  0.73817482      1      0      0
6            1  2.92451195      1      0      0
7            1 -0.80682449      1      0      0
8            1  1.07472998      1      0      0
9            1  1.34949123      0      1      0
10           1 -0.42203984      0      1      0
11           1 -1.66316740      0      1      0
12           1 -2.83232063      0      1      0
13           1  1.26177313      0      0      1
14           1  0.10359857      0      0      1
15           1 -1.85671242      0      0      1
16           1 -0.25140729      0      0      1
attr(,"assign")
[1] 0 1 2 2 2
attr(,"contrasts")
attr(,"contrasts")$x2
[1] "contr.treatment"

 m2 <- lm(y ~ -1 + x1 + x2)
 model.matrix(m2)
x1 x2none x2some x2more x2lots
1  -0.2567  1  0  0  0
2   0.94963659  1  0  0  0
3   0.06915561  1  0  0  0
4   0.89971204  1  0  0  0
5   0.73817482  0  1  0  0
6   2.92451195  0  1  0  0
7  -0.80682449  0  1  0  0
8   1.07472998  0  1  0  0
9   1.34949123  0  0  1  0
10 -0.42203984  0  0  1  0
11 -1.66316740  0  0  1  0
12 -2.83232063  0  0  1  0
13  1.26177313  0  0  0  1
14  0.10359857  0  0  0  1
15 -1.85671242  0  0  0  1
16 -0.25140729  0  0  0  1
attr(,"assign")
[1] 1 2 2 2 2
attr(,"contrasts")
attr(,"contrasts")$x2
[1] "contr.treatment"

I think you'll need to mess about with R basics like plot and lm
before you go off using the formulas that you really care about.
Otherwise, well, you'll always be lost about stuff like ~ and -1.

I've started posting all my lecture notes (source code, R code, pdf
output) at http://pj.freefaculty.org/guides.  That might be a quick
start for you.

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



[R] replacing characters in matrix. substitute, delayedAssign, huh?

2012-01-30 Thread Paul Johnson
A user question today has me stumped.  Can you advise me, please?

User wants a matrix that has some numbers, some variables, possibly
even some function names.  So that has to be a character matrix.
Consider:

 BM <- matrix(0.1, 5, 5)

Use data.entry(BM) or similar to set some to more abstract values.

 BM[3,1] <- "a"
 BM[4,2] <- "b"
 BM[5,2] <- "b"
 BM[5,3] <- "d"
 BM
     var1  var2  var3  var4  var5
[1,] "0.1" "0.1" "0.1" "0.1" "0.1"
[2,] "0.1" "0.1" "0.1" "0.1" "0.1"
[3,] "a"   "0.1" "0.1" "0.1" "0.1"
[4,] "0.1" "b"   "0.1" "0.1" "0.1"
[5,] "0.1" "b"   "d"   "0.1" "0.1"

Later on, user code will set values, e.g.,

a <- rnorm(1)
b <- 17
d <- 4

Now, push those into BM, convert whole thing to numeric

newBM <- apply(BM, c(1,2), as.numeric)

and use newBM for some big calculation.

Then re-set new values for a, b, d, do the same over again.

I've been trying lots of variations on parse, substitute, and eval.

The most interesting function I learned about this morning was delayedAssign.
If I had only to work with one scalar, it does what I want

 delayedAssign("a", whatA)
 whatA <- 91
 a
[1] 91

I can't see how to make that work in the matrix context, though.

Got ideas?

pj

 sessionInfo()
R version 2.14.1 (2011-12-22)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_2.14.1

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



Re: [R] replacing characters in matrix. substitute, delayedAssign, huh?

2012-01-30 Thread Paul Johnson
Henrik's proposal works well, so far.  Thanks very much. I could not
have figured that out (without much more suffering).

Here's the working example
in case future googlers find their way to this thread.


## Paul Johnson paulj...@ku.edu
## 2012-01-30

## Special thanks to r-help email list contributors,
## especially Henrik Bengtsson


BM <- matrix(0.1, 5, 5)

BM[2,1] <- "a"
BM[3,2] <- "b"

BM

parseAndEval <- function(x, ...) eval(parse(text=x))

a <- 0.5
b <- 0.4

realBM <- apply(BM, MARGIN=c(1,2), FUN=parseAndEval)

BM[4,5] <- rnorm(1, m=7, sd=1)

BM

realBM <- apply(BM, MARGIN=c(1,2), FUN=parseAndEval)

realBM

## Now, what about gui interaction with that table?
## The best nice looking options are not practical at the moment.

## Try this instead

data.entry(BM)

## That will work on all platforms, so far as I know, without
## any special effort from us. Run that, make some changes, then
## make sure you insert new R variables to match in your environment.

## Suppose you inserted the letter z in there somewhere

## set z out here

z <- rpois(1, lambda=10)

realBM <- apply(BM, MARGIN=c(1,2), FUN=parseAndEval)


-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas



[R] trouble automating formula edits when log or * are present; update trouble

2012-05-29 Thread Paul Johnson
Greetings

I want to take a fitted regression and replace all uses of a variable
in a formula. For example, I'd like to take

m1 <- lm(y ~ x1, data=dat)

and replace x1 with something else, say x1c, so the formula would become

m1 <- lm(y ~ x1c, data=dat)

I have working code to finish that part of the problem, but it fails
when the formula is more complicated. If the formula has log(x1) or
x1:x2, the update code I'm testing doesn't get it right.

Here's the test code:

##PJ
## 2012-05-29
dat - data.frame(x1=rnorm(100,m=50), x2=rnorm(100,m=50),
x3=rnorm(100,m=50), y=rnorm(100))

m1 - lm(y ~ log(x1) + x1 + sin(x2) + x2 + exp(x3), data=dat)
m2 - lm(y ~ log(x1) + x2*x3, data=dat)

suffixX - function(fmla, x, s){
upform - as.formula(paste0(. ~ ., -, x, +, paste0(x, s)))
update.formula(fmla, upform)
}

newFmla - formula(m2)
newFmla
suffixX(newFmla, x2, c)
suffixX(newFmla, x1, c)

The last few lines of the output. See how the update misses x1 inside
log(x1) or in the interaction?


> newFmla <- formula(m2)
> newFmla
y ~ log(x1) + x2 * x3
> suffixX(newFmla, "x2", "c")
y ~ log(x1) + x3 + x2c + x2:x3
> suffixX(newFmla, "x1", "c")
y ~ log(x1) + x2 + x3 + x1c + x2:x3

It gets the target if the target is all by itself, but not otherwise.

After messing with this for quite a while, I conclude that update was
the wrong way to go because it is geared to replacement of individual
bits, not editing all instances of a thing.

So I started studying the structure of formula objects.  I noticed
this really interesting thing. the newFmla object can be probed
recursively to eventually reveal all of the individual pieces:


> newFmla
y ~ log(x1) + x2 * x3
> newFmla[[3]]
log(x1) + x2 * x3
> newFmla[[3]][[2]]
log(x1)
> newFmla[[3]][[2]][[2]]
x1

So, if you could tell me of a general way to walk through a formula
object, couldn't I use gsub or something like that to recognize each
instance of x1 and replace it with x1c?

I just can't figure out how to automate the checking of each possible
element in a formula, to get the right combination of [[]][[]][[]].
See what I mean? I need to avoid this:

> newFmla[[3]][[2]][[3]]
Error in newFmla[[3]][[2]][[3]] : subscript out of bounds
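
For what it's worth, that walk can be done recursively, without tracking
the [[]] indices by hand. This is only a sketch (the helper name
replaceSym is made up here), but it handles log(x1) and interactions:

```r
## Sketch: recursively replace every occurrence of a symbol in a formula.
## replaceSym is a made-up helper name, not an existing function.
replaceSym <- function(expr, old, new) {
    if (is.name(expr)) {
        ## leaf: swap the symbol if it matches
        if (identical(expr, as.name(old))) as.name(new) else expr
    } else if (is.call(expr)) {
        ## call (including `~`, `+`, `log`, `:`): recurse into each part
        as.call(lapply(as.list(expr), replaceSym, old = old, new = new))
    } else {
        expr   # constants pass through untouched
    }
}

f <- y ~ log(x1) + x2 * x3
f2 <- as.formula(replaceSym(f, "x1", "x1c"))
f2   # y ~ log(x1c) + x2 * x3
```

Because the recursion treats every call the same way, x1 is replaced
inside log() and inside interactions alike.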

pj

-- 
Paul E. Johnson
Professor, Political Science    Assoc. Director
1541 Lilac Lane, Room 504     Center for Research Methods
University of Kansas               University of Kansas
http://pj.freefaculty.org            http://quant.ku.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] trouble automating formula edits when log or * are present; update trouble

2012-05-29 Thread Paul Johnson


 Try substitute:

 do.call(substitute, list(newFmla, setNames(list(as.name("x1c")), "x1")))
 y ~ log(x1c) + x2 * x3

Damn. That's pretty. I'd say setNames is a magic bullet too.

Thanks very much.

The approach suggested by Michael and Bert has the little shortcoming
that grepping for x1 also picks up similarly named variables like
x1new and x1old.  I've not yet found a similar downside with
Gabor's method.

pj

 --
 Statistics & Software Consulting
 GKX Group, GKX Associates Inc.
 tel: 1-877-GKX-GROUP
 email: ggrothendieck at gmail.com



-- 
Paul E. Johnson
Professor, Political Science    Assoc. Director
1541 Lilac Lane, Room 504     Center for Research Methods
University of Kansas               University of Kansas
http://pj.freefaculty.org            http://quant.ku.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] memory usage benefit from anonymous variable constructions.

2012-06-03 Thread Paul Johnson
This is an "I was just wondering" question.

When the package "dataframe" was announced and its author claimed to
reduce the number of times a data frame is copied, I started to
wonder if I should care about this in my projects.  Has anybody
written a general guide on how to write R code that doesn't
needlessly exhaust RAM?

In Objective-C, we used to gain some considerable advantages by
avoiding declaring objects separately, using anonymous variable
instead. The storage was allocated on the stack, I think, and I think
there was talk that the numbers might stay 'closer' to the CPU
(register?) for immediate usage.

Does this help in R as well?  For example, instead of the way I
would usually do this:

mf <- model.frame(model)
y <- model.response(mf)

Here is the anonymous alternative; mf is never declared:

y <- model.response(model.frame(model))

On the face of it, I can imagine this might be better because no
permanent thing mf is created, the garbage collector wouldn't be
called into play if all the data is local and disappears immediately.
But, then again, R is doing lots of stuff under the hood that I've
never bothered to learn about.
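
One way to see where copies actually happen is tracemem(). A small
sketch (assuming an R build with memory profiling, which the standard
binaries have):

```r
## Sketch: tracemem() prints a message whenever R duplicates the traced
## object, which shows where copies actually occur. It requires an R
## build with memory profiling (true of the standard CRAN binaries).
x <- runif(1e5)
if (isTRUE(unname(capabilities("profmem")))) tracemem(x)
y <- x        # silent: assignment alone shares the data, no copy yet
y[1] <- 0     # the "tracemem[...]" message (if tracing) appears here
if (isTRUE(unname(capabilities("profmem")))) untracemem(x)
```

Because R copies lazily, the anonymous style mostly saves a name, not a
duplication; the copy happens only when one of the bindings is modified.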


pj
-- 
Paul E. Johnson
Professor, Political Science    Assoc. Director
1541 Lilac Lane, Room 504     Center for Research Methods
University of Kansas               University of Kansas
http://pj.freefaculty.org            http://quant.ku.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Partial R-square in multiple linear regression

2012-06-03 Thread Paul Johnson
On Fri, Jun 1, 2012 at 2:05 PM, Jin Choi oohps...@gmail.com wrote:
 Hello,

 I am trying to obtain the partial r-square values (r^2 or R2) for
 individual predictors of an outcome variable in multiple linear
 regression. I am using the 'lm' function to calculate the beta
 coefficients; however, I would like to know the individual %
 contributions of several independent variables. I tried searching for
 this function in many R packages, but it has proven elusive to me. I
 am an R beginner, and I am hoping that I will find a solution!

 Thank you very much.

 Sincerely,

This is the kind of practical user request I've been trying to answer
in the rockchalk package. I have a function called getDeltaRsquare.
It takes a fitted regression and calculates the change in Rsquare when
each variable is removed.

It does not achieve the "partition" idea for R-square that you might
want, because there is no logically meaningful way to partition R-square
among predictors if there is any multicollinearity.  I could point you
at some stat books that try to achieve that partition; mostly they
seem to be by psychologists (who like that kind of thing).  You will
go down a rabbit hole of semi-partial correlation coefficients and
so forth.

But if you just want "how much does R-square drop if I leave out
this variable," I got that for you :)  Whether that is meaningful to
you, well, that's a bigger methodological question.
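
To make the idea concrete, here is a sketch of that drop-in-R-square
calculation done by hand with lm and update; the data are made up for
illustration:

```r
## Sketch of the "change in R-square when one predictor is dropped" idea
## (what rockchalk::getDeltaRsquare computes). Data are simulated.
set.seed(42)
dat <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
dat$y <- 1 + 2 * dat$x1 + 0.5 * dat$x2 + rnorm(100)

full <- lm(y ~ x1 + x2, data = dat)
r2full <- summary(full)$r.squared

## for each predictor, refit without it and record the R-square drop
deltaR2 <- sapply(c("x1", "x2"), function(v) {
    reduced <- update(full, as.formula(paste(". ~ . -", v)))
    r2full - summary(reduced)$r.squared
})
deltaR2
```

Note the drops do not sum to the full R-square when the predictors are
correlated, which is exactly the partitioning problem mentioned above.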

pj


 Jin Choi
 MSc Epidemiology student
 McGill University, Montreal, Quebec, CANADA

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
Paul E. Johnson
Professor, Political Science    Assoc. Director
1541 Lilac Lane, Room 504     Center for Research Methods
University of Kansas               University of Kansas
http://pj.freefaculty.org            http://quant.ku.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] non-ASCII characters in plots. no alternative but plotmath?

2012-06-06 Thread Paul Johnson
A student entered some data with text characters like epsilon and
alpha.  On her Windows system, the Greek letters did not display
properly in a plot; ordinary ASCII characters appeared instead.

I asked her to send me the code so I could test. For me, the plot
looks ok on the screen.

Format1 <- c(320, 500, 700, 1000, 500, 320, 700, 500, 320)
Format2 <- c(800, 1000, 1150, 1400, 1500, 1650, 1800, 2300, 2500)
Vowel <- c("u", "o", "α", "a", "ø", "y", "ε", "e", "i")
V1 <- data.frame(Format1, Format2, Vowel)
plot(Format1 ~ Format2, data = V1, type = "n")
text(V1$Format2, V1$Format1, labels = V1$Vowel)

On my Debian linux system, the plot shows the Greek letters just fine
in the screen device.

However, I turned on a pdf device to run the same  code and see signs
of trouble.

> text(V1$Format2, V1$Format1, labels = V1$Vowel)
Warning messages:
1: In text.default(V1$Format2, V1$Format1, labels = V1$Vowel) :
  conversion failure on 'α' in 'mbcsToSbcs': dot substituted for ce
2: In text.default(V1$Format2, V1$Format1, labels = V1$Vowel) :
  conversion failure on 'α' in 'mbcsToSbcs': dot substituted for b1
3: In text.default(V1$Format2, V1$Format1, labels = V1$Vowel) :
  font metrics unknown for Unicode character U+03b1
4: In text.default(V1$Format2, V1$Format1, labels = V1$Vowel) :
  conversion failure on 'α' in 'mbcsToSbcs': dot substituted for ce
5: In text.default(V1$Format2, V1$Format1, labels = V1$Vowel) :
  conversion failure on 'α' in 'mbcsToSbcs': dot substituted for b1
6: In text.default(V1$Format2, V1$Format1, labels = V1$Vowel) :
  conversion failure on 'ε' in 'mbcsToSbcs': dot substituted for ce
7: In text.default(V1$Format2, V1$Format1, labels = V1$Vowel) :
  conversion failure on 'ε' in 'mbcsToSbcs': dot substituted for b5
8: In text.default(V1$Format2, V1$Format1, labels = V1$Vowel) :
  font metrics unknown for Unicode character U+03b5
9: In text.default(V1$Format2, V1$Format1, labels = V1$Vowel) :
  conversion failure on 'ε' in 'mbcsToSbcs': dot substituted for ce
10: In text.default(V1$Format2, V1$Format1, labels = V1$Vowel) :
  conversion failure on 'ε' in 'mbcsToSbcs': dot substituted for b5

The alpha and epsilon characters don't appear in the pdf.   I don't
know the proper terminology to describe the situation, thus I don't
know where to start reading. Until very recently, I didn't even know
it was possible to directly enter these characters in Emacs, but I've
learned that part.

I understand you might answer "use plotmath", and if that's the only
workable thing, I will teach her how. But that's a little bit of an
uphill climb (from where we are now standing). It will be a lot more
work for me to teach about expressions and whatnot, so if there is a
direct route from a column of non-ASCII characters to a plot that has
those characters in it, I'd be glad to know.
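
Postscript for future readers: one workaround that often helps is the
cairo-based PDF device, which embeds Unicode glyphs that the base pdf()
device cannot map. A sketch (assuming cairo support, which
capabilities("cairo") reports):

```r
## Sketch: cairo_pdf() (when R has cairo support) handles the Unicode
## glyphs that trigger the mbcsToSbcs warnings from the base pdf() device.
Format1 <- c(320, 500, 700, 1000, 500, 320, 700, 500, 320)
Format2 <- c(800, 1000, 1150, 1400, 1500, 1650, 1800, 2300, 2500)
Vowel <- c("u", "o", "\u03b1", "a", "\u00f8", "y", "\u03b5", "e", "i")
V1 <- data.frame(Format1, Format2, Vowel)

if (isTRUE(unname(capabilities("cairo")))) {
    cairo_pdf("vowels.pdf")   # output filename is arbitrary
    plot(Format1 ~ Format2, data = V1, type = "n")
    text(V1$Format2, V1$Format1, labels = V1$Vowel)
    dev.off()
}
```

The \u escapes above are just a transport-safe way to write the same
alpha, o-slash, and epsilon characters as in the original data.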

pj

-- 
Paul E. Johnson
Professor, Political Science    Assoc. Director
1541 Lilac Lane, Room 504     Center for Research Methods
University of Kansas               University of Kansas
http://pj.freefaculty.org            http://quant.ku.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] EM algorithm to find MLE of coeff in mixed effects model

2012-07-03 Thread Paul Johnson
On Tue, Jul 3, 2012 at 12:41 PM, jimmycloud jimmycl...@gmail.com wrote:
 I have a general question about coefficients estimation of the mixed model.


I have 2 ideas for you.

1. Fit with the lme4 package, using the lmer function. That's what it is for.

2. If you really want to write your own EM algorithm, I don't feel
sure that very many R and EM experts are going to want to read through
the code you have because you don't follow some of the minimal
readability guidelines.  I accept the fact that there is no officially
mandated R style guide, except for "indent with 4 spaces, not tabs."
But there are some almost universally accepted basics like

1. Use <-, not =, for assignment
2. Put a space before and after symbols like <-, =, +, *, and so forth.
3. I'd suggest you get used to putting squiggly braces in the K&R style.

I have found the formatR package's tool tidy.source can do this
nicely. From tidy.source, here's what I get with your code

http://pj.freefaculty.org/R/em2.R

Much more readable!
(I inserted library(MASS) for you at the top, otherwise this doesn't
even survive the syntax check.)

When I copy from email to Emacs, some line-breaking monkey-business
occurs, but I expect you get the idea.

Now, looking at your cleaned up code, I can see some things to tidy up
to improve the chances that some expert will look.

First, R functions don't require return at the end, many experts
consider it distracting. (Run lm or such and review how they write.
No return at the end of functions).

Second, about that big for loop at the top, the one that runs m over 1:300.

I don't know what that loop is doing, but there's some code in it that
I'm suspicious of. For example, this part:

W.hat <- psi.old - psi.old %*% t(Zi) %*% solve(Sig.hat) %*% Zi %*% psi.old

Sigb <- psi.old - psi.old %*% t(Zi) %*% solve(Sig.hat) %*% Zi %*% psi.old


First, you've caused the possibly slow calculation of
solve(Sig.hat) to run two times.  If you really need it, run it once and
save the result.


Second, a for loop is not necessarily slow, but it may be easier to
read if you re-write this:

for (j in 1:n) {
    tempaa[j] <- Eh4new(datai = data.list[[j]], weights.m = weights.mat, beta = beta.old)
    tempbb[j] <- Eh4newv2(datai = data.list[[j]], weights.m = weights.mat, beta = beta.old)
}

like this:

tempaa <- lapply(data.list, Eh4new, weights.m = weights.mat, beta = beta.old)
tempbb <- lapply(data.list, Eh4newv2, weights.m = weights.mat, beta = beta.old)

Third, here is a no-no

tempbb <- c()
for (j in 1:n) {
    tempbb <- cbind(tempbb, Eh2new(data.list[[j]], weights.m = weights.mat))
}

That will call cbind over and over, causing a relatively slow memory
re-allocation.  See
(http://pj.freefaculty.org/R/WorkingExamples/stackListItems.R)

Instead, do this to apply Eh2new to each item in data.list

tempbbtemp <- lapply(data.list, Eh2new, weights.m = weights.mat)

and then smash the results together in one go

tempbb <- do.call(cbind, tempbbtemp)
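
The two versions produce identical results; the difference is only the
repeated reallocation in the grow-by-cbind loop. A small sketch with
made-up matrices:

```r
## Sketch: growing a matrix with cbind inside a loop versus binding
## once at the end with do.call. Data are made up for illustration.
mats <- lapply(1:50, function(i) matrix(i, 2, 2))

grown <- c()                      # grow-by-cbind: reallocates each pass
for (m in mats) grown <- cbind(grown, m)

once <- do.call(cbind, mats)      # one allocation, one bind

identical(grown, once)            # TRUE: same result, cheaper to build
```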


Fourth, I'm not sure on the matrix algebra. Are you sure you need
solve to get the full inverse of Sig.hat?  Once you start checking
into how estimates are calculated in R, you find that the
paper-and-pencil matrix algebra style of formula is generally frowned
upon. OLS (in lm.fit) doesn't do solve(X'X); it uses a QR
decomposition of the matrices.  Or look at the MASS package's ridge
regression code, where the SVD is used to get the inverse.
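
For instance, a sketch with a small made-up system, comparing the
explicit inverse with a direct solve:

```r
## Sketch: solving a linear system directly, rather than forming the
## explicit inverse. The matrix here is made up (symmetric positive
## definite, so it is safely invertible).
set.seed(1)
A <- crossprod(matrix(rnorm(25), 5, 5)) + diag(5)
b <- rnorm(5)

x1 <- solve(A) %*% b      # explicit inverse, then multiply (discouraged)
x2 <- solve(A, b)         # direct solve of A x = b
x3 <- qr.solve(A, b)      # QR-based solve

all.equal(as.vector(x1), x2)   # TRUE: same answer, better numerics
```

The direct forms are both faster and numerically more stable than
computing the full inverse first.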

I wish I knew enough about the EM algorithm to gaze at your code and
say "aha, error in line 332," but I'm not able to.  But if you clean up the
presentation and tighten up the most obvious things, you improve your
chances that somebody who is an expert will exert him/herself to do
it.

pj


  b follows
 N(0,\psi)  #i.e. bivariate normal
 where b is the latent variable, Z and X are ni*2 design matrices, sigma is
 the error variance,
 Y are longitudinal data, i.e. there are ni measurements for object i.
 Parameters are \beta, \sigma, \psi; call them \theta.

 I wrote a regular EM, the M step is to maximize the log(f(Y,b;\theta))  by
 taking first order derivatives, setting to 0 and solving the equation.
 The E step involves the evaluation of E step, using Gauss Hermite
 approximation. (10 points)

 All are simulated data. X and Z are naive like cbind(rep(1,m),1:m)
 After 200 iterations, the estimated \beta converges to the true value while
 \sigma and \psi do not. Even after one iteration, the \sigma goes up from
 about 10^0 to about 10^1.
 I am confused since the \hat{\beta} requires \sigma and \psi from previous
 iteration. If something wrong then all estimations should be incorrect...

 Another question is that I calculated the logf(Y;\theta) to see if it
 increases after updating \theta.
 Seems decreasing.

 I thought that X and Z being linearly dependent would cause some issue, but I
 also changed the X and Z to some random numbers from a normal distribution.

 I also tried ECM, which converges fast but sigma and psi are not close to
 the desired values.
 Is this due 

[R] good documentation on use of Rscript. where to find?

2012-10-18 Thread Paul Johnson
What is the correct format for the shebang line and what options are
allowed or necessary along with this?

I find plenty of blogs and opinions, but few authoritative answers.

I want an R script to run and update packages periodically, with a
cron job that launches it. What is necessary to put in
line one of the R program?  Aside from the basic part

#!/usr/bin/Rscript

what else can there be, or should there be?

Various people suggest adding things like --vanilla but those seem
not to be accepted, at least on RedHat EL 6 with R-2.15.1.  I'd like
to run this with a command line like:

$ R-labSelectInstall-02.R > /tmp/Rupdate.txt 2>&1

so that all the standard output and err goes to a text file, but I've
not found a way to make it work comparably to the R command line
option --vanilla or such.

# ./R-labSelectInstall-02.R --vanilla
WARNING: unknown option '--vanilla'

See what I mean?
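
For future readers: ?Rscript documents that extra options such as
--vanilla belong on the shebang line itself, not after the script name.
A sketch (the filename and the update call are illustrative, and the
updater line is left commented out):

```r
#!/usr/bin/Rscript --vanilla
## Sketch: per ?Rscript, options like --vanilla go on the shebang line;
## anything given after the script name on the command line arrives in
## commandArgs() instead of being parsed as R options.
args <- commandArgs(trailingOnly = TRUE)
cat("script args:", paste(args, collapse = " "), "\n")

## a cron-driven updater would then run something like:
## update.packages(ask = FALSE, repos = "https://cloud.r-project.org")
```

That explains the warning above: `./script.R --vanilla` hands --vanilla
to the script as an argument, not to R as an option.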

-- 
Paul E. Johnson
Professor, Political Science  Assoc. Director
1541 Lilac Lane, Room 504  Center for Research Methods
University of Kansas University of Kansas
http://pj.freefaculty.org   http://quant.ku.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] inverse for formula transformations on LHS

2013-05-17 Thread Paul Johnson
This is an R formula handling question. It arose in class. We were working
on the Animals data in the MASS package. In order to see a relationship,
you need to log brain and body weight.  It's a fun one for teaching
regression, if you have not tried it yet.  There are outliers too!

Students wanted to make a predicted value plot in the non-logged values of
y, for comparison, and I wondered if I couldn't automate this somehow for
them.

It made me wonder how R manages formulae and if a transformation like
log(y) can be mechanically inverted.

So we have something concrete to talk about, suppose x and y are variables
in dat, a person fits

m1 <- lm(log(y) ~ log(x), data = dat)

termplot shows log(y) on the vertical.  What if I want y on the vertical?
Similarly, predict gives values on the log(y) scale; there's no argument
like type = "untransformed".

I want my solution to be a bit general, so that it would give back
predicted y for formulae like

sqrt(y)

or

exp(y)

or

log(y + d)

or whatever other math people might throw in there.

Here's what I can tell so far about R's insides.  The formula handler makes
a list out of the formula; I can get that from the terms object that the
model generates. The formula list has "~" as element [[1]], log(y) as
element [[2]], and the right-hand side, log(x), as element [[3]].

Where in the R source code can I see how R looks at the symbol log(y) and
discerns that there is a variable y that needs to be logged? If I could
understand that, and if R has a table of inverse functions, then maybe I
could see what to do.
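
Sketching the idea with a hand-made table of inverses (R keeps no such
table itself; the data and all names here are illustrative):

```r
## Sketch: look up the LHS call's function in a hand-made table of
## inverses and apply the inverse to the predictions. R has no built-in
## inverse-function table; "inverses" is invented for this sketch.
set.seed(7)
dat <- data.frame(x = exp(rnorm(50)))
dat$y <- exp(1 + 0.5 * log(dat$x) + rnorm(50, sd = 0.1))

m1 <- lm(log(y) ~ log(x), data = dat)

inverses <- list(log = exp, sqrt = function(z) z^2)
lhs <- formula(m1)[[2]]                  # the call log(y)
inv <- inverses[[as.character(lhs[[1]])]]
yhat <- inv(predict(m1))                 # predictions back on the y scale
```

Compound transformations like log(y + d) would need more than a lookup,
which is presumably why predict offers no general untransformed type.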

If you have ideas, I'm very grateful if you share them.

pj
-- 
Paul E. Johnson
Professor, Political Science  Assoc. Director
1541 Lilac Lane, Room 504  Center for Research Methods
University of Kansas University of Kansas
http://pj.freefaculty.org   http://quant.ku.edu

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] inverse for formula transformations on LHS

2013-05-21 Thread Paul Johnson
On Sat, May 18, 2013 at 7:05 AM, Stephen Milborrow mi...@sonic.net wrote:

 Paul Johnson pauljoh...@gmail.com wrote:

 m1 <- lm(log(y) ~ log(x), data = dat)

 termplot shows log(y) on the vertical.  What if I want y on the vertical?


 plotmo in the plotmo package has an inverse.func argument,
 so something like the following might work for you?

 Thanks, I will read about plotmo. I had not heard of that one before.

It looks like we have been working on the same problem. Take a look at my
package rockchalk and run the examples for plotSlopes and
plotCurves.  Exact same ideas you are thinking about.

I don't think it will help in this case because it still relies on the user
to know about exp as the inverse of log. When users call predict on a glm,
it just knows how to put predictions on the response or link scales, and
I am aiming for something automagical like that.




 library(MASS)
 library(plotmo)
 log.brain <- log(Animals$brain)
 log.body  <- log(Animals$body)
 m2 <- lm(log.brain ~ log.body)

 myplot <- function(...)
    plotmo(m2, do.par = FALSE, nrug = -1, col.resp = 2, pch = 20, se = 1,
      main = "", xlab = "log(body)", ...)

 par(mfrow = c(2, 2))
 myplot(ylab = "log(brain)")
 myplot(ylab = "brain", inverse.func = exp)
 termplot(m2, se = TRUE, rug = TRUE)  # for comparison


 --
Paul E. Johnson
Professor, Political Science  Assoc. Director
1541 Lilac Lane, Room 504  Center for Research Methods
University of Kansas University of Kansas
http://pj.freefaculty.org   http://quant.ku.edu


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

