Re: [R] big data?

2014-08-06 Thread Mike Harwood
The read.table.ffdf function in the ff package can read in delimited files 
and store them to disk as individual columns.  The ffbase package provides 
additional data management and analytic functionality.  I have used these 
packages on 15 Gb files of 18 million rows and 250 columns.


On Tuesday, August 5, 2014 1:39:03 PM UTC-5, David Winsemius wrote:


 On Aug 5, 2014, at 10:20 AM, Spencer Graves wrote: 

   What tools do you like for working with tab delimited text files up 
 to 1.5 GB (under Windows 7 with 8 GB RAM)? 

 ?data.table::fread 

   Standard tools for smaller data sometimes grab all the available 
 RAM, after which CPU usage drops to 3% ;-) 
  
  
   The bigmemory project won the 2010 John Chambers Award but is 
 not available (for R version 3.1.0). 
  
  
   findFn(big data, 999) downloaded 961 links in 437 packages. That 
 contains tools for data PostgreSQL and other formats, but I couldn't find 
 anything for large tab delimited text files. 
  
  
   Absent a better idea, I plan to write a function getField to 
 extract a specific field from the data, then use that to split the data 
 into 4 smaller files, which I think should be small enough that I can do 
 what I want. 

 There is the colbycol package with which I have no experience, but I 
 understand it is designed to partition data into column sized objects. 
 #--- from its help file- 
 cbc.get.col {colbycol}R Documentation 
 Reads a single column from the original file into memory 

 Description 

 Function cbc.read.table reads a file, stores it column by column in disk 
 file and creates a colbycol object. Functioncbc.get.col queries this object 
 and returns a single column. 

   Thanks, 
   Spencer 
  
  __ 
  r-h...@r-project.org javascript: mailing list 
  https://stat.ethz.ch/mailman/listinfo/r-help 
  PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html 
  and provide commented, minimal, self-contained, reproducible code. 

 David Winsemius 
 Alameda, CA, USA 

 __ 
 r-h...@r-project.org javascript: mailing list 
 https://stat.ethz.ch/mailman/listinfo/r-help 
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html 
 and provide commented, minimal, self-contained, reproducible code. 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Multiple rms summary plots in a single device

2012-05-25 Thread Mike Harwood
I would like to incorporate multiple summary plots from the rms
package into a single device and to control the titles, and also to
open a new device when I reach a specified number of plots.  Currently
I am only getting a single plot(summary( graph in the upper left-
hand corner of each successive device.  However, in the rms
documention I see instances of a loop being used with par(mfrow( for
multiple plots in a single device(e.g. residuals.lrm), and these
examples work on my system.  Please advise regarding options that must
be specified to plot(summary(, or in the construction of my loop.
Below are sample code and my sessionInfo().  Please note that I am
using data.table to facilitate my real analysis, but I can replicate
the issue with tData as a data.frame (using seg - subset(tData,
groups == segment) logic), but I included the data.table logic in case
it may be having some influence.  Thank you!

Mike


tData - data.frame(groups=as.factor(1:8), low=as.factor(1:4)
,high=as.factor(seq(100, 400, 100)),  rand=runif(400))
tData - data.table(tData)
setkeyv(tData, 'groups')


dd - datadist(tData)
options(datadist = 'dd')

doSumPlot - function(segment){
seg - tData[groups == segment,]
plot(summary(rand ~
+ low
+ high
,data = seg
), main=paste('Group:', segment))
}


for(i in 1:length(levels(tData$groups))){
cat('Group: ', i, '\n')
if(i == 1 ){
dev.new()
par(mfrow=c(2,2))
}
if(i/5 == round(i/5, 0)){
dev.new()
par(mfrow=c(2,2))
}
# dev.new()
doSumPlot(levels(tData$groups)[i])
}


 sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] splines   stats graphics  grDevices utils datasets
methods
[8] base

other attached packages:
[1] rms_3.5-0Hmisc_3.9-3  survival_2.36-14
data.table_1.8.0

loaded via a namespace (and not attached):
[1] cluster_1.14.2 grid_2.15.0lattice_0.20-6 tools_2.15.0

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] glmmPQL and predict

2012-01-09 Thread Mike Harwood
Is the labeling/naming of levels in the documentation for the
predict.glmmPQL function backwards?  The documentation states Level
values increase from outermost to innermost grouping, with level zero
corresponding to the population predictions.  Taking the sample in
the documentation:

fit - glmmPQL(y ~ trt + I(week  2), random = ~1 |  ID,
   family = binomial, data = bacteria)

 head(predict(fit, bacteria, level = 0, type=response))
[1] 0.9680779 0.9680779 0.8587270 0.8587270 0.9344832 0.9344832
 head(predict(fit, bacteria, level = 1, type=response))
  X01   X01   X01   X01   X02   X02
0.9828449 0.9828449 0.9198935 0.9198935 0.9050782 0.9050782
 head(predict(fit, bacteria, type=response)) ## population prediction
  X01   X01   X01   X01   X02   X02
0.9828449 0.9828449 0.9198935 0.9198935 0.9050782 0.9050782

The returned values for level=1 and level=unspecified match, which
is not what I expected based upon the documentation. Exponentiating
the intercept coefficients from the fitted regression, the level=0
values match when the random effect intercept is included

 1/(1+exp(-3.412014)) ## only the fixed effect
[1] 0.9680779
 1/(1+exp(-1*(3.412014+0.63614382))) ## fixed and random effect intercepts
[1] 0.9828449

Thanks!

Mike

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Dropping columns from data frame

2012-01-06 Thread Mike Harwood
How does R do it, and should I ever be worried?  I always remove
columns by index, and it works exactly as I would naively expect - but
HOW?  The second illustration, which deletes non contiguous columns,
represents what I do all the time and have some trepidation about
because I don't know the mechanics (e.g. why doesn't the column
formerly-known-as-4 become 3 after column 1 is dropped: doesn't vector
removal from a df/list invoke a loop in C?).  Can I delete a named
list of columns, which are examples 4 and 5 and which generate the
unary error' mesages, without resorting to orig.df$num1.10 - NULL?

Thanks!

orig.df - data.frame(cbind(
1:10
,11:20
,letters[1:10]
,letters[11:20]
,LETTERS[1:10]
,LETTERS[11:20]
))
names(orig.df) - c(
'num1.10'
,'num11.20'
,'lc1.10'
,'lc11.20'
,'uc1.10'
,'uc11.20'
)
# Illustration 1: contiguous columns at beginning of data frame
head(orig.df[,-c(1:3)])

# Illustration 2: non-contiguous columns
head(orig.df[,-c(1,3,5)])

# Illustration 3: contiguous columns at end of data frame
head(orig.df[,-c(4:6)]) ## as expected

# Illustrations 4-5: unary errors
head(orig.df[,-c(as.list('num1.10', 'lc1.10', 'uc1.10'))])
head(orig.df[,-c('num1.10', 'lc1.10', 'uc1.10')])


Mike

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Dropping columns from data frame

2012-01-06 Thread Mike Harwood
Thank you, David.  I was merely using head to limit the code/
output.  My question remains, because a created data frame has the
same columns as was output from head:

 head(orig.df,3)
  num1.10 num11.20 lc1.10 lc11.20 uc1.10 uc11.20
1   1   11  a   k  A   K
2   2   12  b   l  B   L
3   3   13  c   m  C   M
 # Illustration 1: contiguous columns at beginning of data frame
 head(orig.df[,-c(1:3)],2)
  lc11.20 uc1.10 uc11.20
1   k  A   K
2   l  B   L
 new.df - orig.df[,-c(1:3)]
 head(new.df,2)
  lc11.20 uc1.10 uc11.20
1   k  A   K
2   l  B   L

 # Illustration 2: non-contiguous columns
 head(orig.df[,-c(1,3,5)],2)
  num11.20 lc11.20 uc11.20
1   11   k   K
2   12   l   L
 new.df - orig.df[,-c(1,3,5)]
 head(new.df,2)
  num11.20 lc11.20 uc11.20
1   11   k   K
2   12   l   L




On Jan 6, 9:49 am, David Winsemius dwinsem...@comcast.net wrote:
 On Jan 6, 2012, at 10:00 AM, Mike Harwood wrote:

  How does R do it, and should I ever be worried?  I always remove
  columns by index, and it works exactly as I would naively expect - but
  HOW?  The second illustration, which deletes non contiguous columns,
  represents what I do all the time and have some trepidation about
  because I don't know the mechanics (e.g. why doesn't the column
  formerly-known-as-4 become 3 after column 1 is dropped: doesn't vector
  removal from a df/list invoke a loop in C?).

 You are NOT removing columns. You are returning (to `head` and then
 to `print`) an extract from the dataframe, but that does not change
 the original dataframe. To effect a change you would need to assign
 the value back to the same name as the original daatframe.

 --
 David









   Can I delete a named
  list of columns, which are examples 4 and 5 and which generate the
  unary error' mesages, without resorting to orig.df$num1.10 - NULL?

  Thanks!

  orig.df - data.frame(cbind(
     1:10
     ,11:20
     ,letters[1:10]
     ,letters[11:20]
     ,LETTERS[1:10]
     ,LETTERS[11:20]
     ))
  names(orig.df) - c(
     'num1.10'
     ,'num11.20'
     ,'lc1.10'
     ,'lc11.20'
     ,'uc1.10'
     ,'uc11.20'
     )
  # Illustration 1: contiguous columns at beginning of data frame
  head(orig.df[,-c(1:3)])

  # Illustration 2: non-contiguous columns
  head(orig.df[,-c(1,3,5)])

  # Illustration 3: contiguous columns at end of data frame
  head(orig.df[,-c(4:6)])    ## as expected

  # Illustrations 4-5: unary errors
  head(orig.df[,-c(as.list('num1.10', 'lc1.10', 'uc1.10'))])
  head(orig.df[,-c('num1.10', 'lc1.10', 'uc1.10')])

  Mike

  __
  r-h...@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guidehttp://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.

 David Winsemius, MD
 West Hartford, CT

 __
 r-h...@r-project.org mailing listhttps://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guidehttp://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Fwd: tcplot documentation in evd package

2011-12-13 Thread Mike Harwood
This issue occurs only when both the evd and ismev packages are loaded.
 Please retract this posting, if possible.  Thank you in advance!

Mike

-- Forwarded message --
From: Mike Harwood harwood...@gmail.com
Date: Mon, Dec 12, 2011 at 7:47 PM
Subject: tcplot documentation in evd package
To: r-help@r-project.org


Hello, and please advise regarding any errors/omissions on my part.
However, the documentation for R's tcplot function in the evd package
appears to contain an error.  I am using evd version 2.2-4 in R 2.14.0
with Windows XP.

 data(portpirie)
 mrlplot(portpirie)  ## No Error

 tlim - c(3.6, 4.2)

 tcplot(portpirie, tlim) ## Line from documentation
Error in fpot(data, u[1], model = model, cmax = cmax, r = r, ulow =
ulow,  :
 `x' must be a non-empty numeric vector


 tcplot(portpirie, tlim=c(3.6, 4.2)) ## explicitly specifying the
threshold limits
Error in fpot(data, u[1], model = model, cmax = cmax, r = r, ulow =
ulow,  :
 `x' must be a non-empty numeric vector

tcplot(portpirie$SeaLevel, tlim) ## Resolves Issue

gpd.fitrange(portpirie$SeaLevel, 3.6, 4.2) ## An alternative, still
naming the SeaLevel vector

Please advise.  Thanks!

Mike

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] tcplot documentation in evd package

2011-12-12 Thread Mike Harwood
Hello, and please advise regarding any errors/omissions on my part.
However, the documentation for R's tcplot function in the evd package
appears to contain an error.  I am using evd version 2.2-4 in R 2.14.0
with Windows XP.

 data(portpirie)
 mrlplot(portpirie)  ## No Error

 tlim - c(3.6, 4.2)

 tcplot(portpirie, tlim) ## Line from documentation
Error in fpot(data, u[1], model = model, cmax = cmax, r = r, ulow =
ulow,  :
  `x' must be a non-empty numeric vector


 tcplot(portpirie, tlim=c(3.6, 4.2)) ## explicitly specifying the threshold 
 limits
Error in fpot(data, u[1], model = model, cmax = cmax, r = r, ulow =
ulow,  :
  `x' must be a non-empty numeric vector

tcplot(portpirie$SeaLevel, tlim) ## Resolves Issue

gpd.fitrange(portpirie$SeaLevel, 3.6, 4.2) ## An alternative, still
naming the SeaLevel vector

Please advise.  Thanks!

Mike

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] survexp with large dataframes

2011-09-28 Thread Mike Harwood
Hello, and thank you in advance.

I would like to capture the expected survival from a coxph model for
subjects in an observational study with recurrent events, but the
survexp statement is failing due to memory.  I am using R version
2.13.1 (2011-07-08) on Windows XP.

My objective is to plot the fitted survival with the Kaplan-Meier
plot.  Below is the code with output and [unfortunately] errors. Is
there something wrong in my use of cluster in generating the
proportional hazards model, or is there some syntax to pass it into
survexp?

Mike

 dim(dev)
[1] 899876 25

 mod1 - coxph(Surv(begin.cp, end.cp, event)
+ ~ age.sex
+   + plan_type
+   + uw_load
+   + cluster(mbr_key)
+   ,data=dev
+   )

 summary(mod1)
Call:
coxph(formula = Surv(begin.cp, end.cp, event) ~ age.sex + plan_type +
uw_load + cluster(mbr_key), data = dev)

  n= 899876, number of events= 753324

 coef exp(coef)  se(coef) robust se   z
Pr(|z|)
age.sex19-34_MALE   -0.821944  0.439576  0.005529  0.023298 -35.280  
2e-16 ***
age.sex35-49_FEMALE  0.058776  1.060537  0.004201  0.018477   3.181
0.00147 **
age.sex35-49_MALE   -0.515590  0.597148  0.004634  0.019986 -25.798  
2e-16 ***
age.sex50-64_FEMALE  0.190940  1.210386  0.004350  0.020415   9.353  
2e-16 ***
age.sex50-64_MALE   -0.127514  0.880281  0.004487  0.021431  -5.950
2.68e-09 ***
age.sexCHILD_CHILD  -0.327522  0.720707  0.004238  0.017066 -19.192  
2e-16 ***
plan_typeLOW-0.165735  0.847270  0.002443  0.011080 -14.958  
2e-16 ***
uw_load1-50  0.215122  1.240014  0.006437  0.029189   7.370
1.71e-13 ***
uw_load101-250   0.551042  1.735060  0.003993  0.018779  29.344  
2e-16 ***
uw_load251+  0.981660  2.668884  0.003172  0.017490  56.126  
2e-16 ***
uw_load51-1000.413464  1.512046  0.006216  0.027877  14.832  
2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

exp(coef) exp(-coef) lower .95 upper .95
age.sex19-34_MALE  0.4396 2.27490.42000.4601
age.sex35-49_FEMALE1.0605 0.94291.02281.0996
age.sex35-49_MALE  0.5971 1.67460.57420.6210
age.sex50-64_FEMALE1.2104 0.82621.16291.2598
age.sex50-64_MALE  0.8803 1.13600.84410.9180
age.sexCHILD_CHILD 0.7207 1.38750.69700.7452
plan_typeLOW   0.8473 1.18030.82910.8659
uw_load1-501.2400 0.80641.17111.3130
uw_load101-250 1.7351 0.57631.67241.8001
uw_load251+2.6689 0.37472.57892.7620
uw_load51-100  1.5120 0.66141.43161.5970

Concordance= 0.643  (se = 0 )
Rsquare= 0.205   (max possible= 1 )
Likelihood ratio test= 206724  on 11 df,   p=0
Wald test= 9207  on 11 df,   p=0
Score (logrank) test = 246358  on 11 df,   p=0,   Robust = 4574  p=0

  (Note: the likelihood ratio and score tests assume independence of
 observations within a cluster, the Wald and robust score tests do
not).

 dev.fit - survexp( ~ 1, ratetable=mod1, data=dev)
Error in survexp.cfit(cbind(as.numeric(X), R), Y, conditional,
FALSE,  :
  cannot allocate memory block of size 15.2 Gb

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] predict with eha package

2011-06-02 Thread Mike Harwood
Hello list, and thank you in advance.

I'm unable to generate predicted values when specifying newdata using
phreg and aftreg models
in the eha package, but I do not have the same problem with a
proportional hazards model from coxph.  Without the newdata argument
the predicted values are returned, but with
newdata=model.dataframe coxph is fine but both aftreg and phreg
models return an  Error in predict.coxph(f.ph.eha,
newdata = mort, type = lp) :  Data is not the same size as it was in
the original fit message.  Since I ultimately want a parametric model
and the real
data is left truncated and right censored, I think the aftreg function
in the eha package is what I must use.  Following is my sample code,
without the output.

#~ All models generated successfully -
f.ph - coxph(Surv(enter, exit, event) ~ ses, data = mort)
f.ph.eha - phreg(Surv(enter, exit, event) ~ ses, data = mort)
f.aft - aftreg(Surv(enter, exit, event) ~ ses, data = mort)

#~ All fits generated successfully ---
f.ph.fit - predict(f.ph, type='lp')
f.ph.eha.fit - predict(f.ph.eha, type='lp')
f.aft.fit - predict(f.aft, type='lp')

#~ First fit generated successfully, others output
error
f.ph.fit - predict(f.ph, newdata=mort, type='lp')
f.ph.eha.fit - predict(f.ph.eha, newdata=mort, type='lp')
f.aft.fit - predict(f.aft, newdata=mort, type='lp')


Mike

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] predict 'expected' with eha package

2011-05-21 Thread Mike Harwood
I am unsure what is being returned, and what is supposed to be
returned, when using 'predict' with type='expected' for an aftreg
survival model.  The code below first generates a weibull  model, then
uses predict to create a vector of the linear predictors, then
attempts to create the 'expected' vector, which is empty.  The final
two steps in the code generate a lognormal model with the same data,
and the same empty 'expected' vector.

My expectation had been that 'expected' would generate the same
transformed dependent variable output as predict with a survreg model
using type='response'.  Since my 'real' data is left-truncated and
right-censored I cannot use survreg, and I wanted to investigate the
output from eha.

Thanks in advance!

Mike

 data(mort)
 aftreg(Surv(enter, exit, event) ~ ses, data = mort)
Call:
aftreg(formula = Surv(enter, exit, event) ~ ses, data = mort)

Covariate  W.mean  Coef Exp(Coef)  se(Coef)Wald p
ses
   lower0.416 0 1   (reference)
   upper0.584-0.348 0.706 0.089 0.000

log(scale)3.60336.698 0.065 0.000
log(shape)0.331 1.392 0.058 0.000

Events276
Total time at risk 17038
Max. log. likelihood  -1391.3
LR test statistic 16.1
Degrees of freedom1
Overall p-value   5.91578e-05
 m1 - aftreg(Surv(enter, exit, event) ~ ses, data = mort)
 head(predict(m1, type='lp')) ## produces output
1 2 3 4 5 6
-0.347853  0.00 -0.347853  0.00  0.00  0.00
 head(predict(m1, type='expected')) ## is this correct?
numeric(0)
 m2 - aftreg(Surv(enter, exit, event) ~ ses, dist='lognormal', data = mort)
 head(predict(m2, type='expected')) ## is this correct?
numeric(0)


from eha (the survival and rms packages are not an option for my
'real' question, since I have left-truncated right-censored data

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Syntax for iter.max in rms

2011-05-08 Thread Mike Harwood
Hello,

I would like to increase the number of iterations for running a
Buckley-James regression model in the rms package, but something is
apparently syntactically wrong.  The following code initially is
exactly as it appears in the help page (which runs successfully), then
a failure to converge message (resulting from specifying an
'identity' link argument, the error message by adding both the
iter.max control and 'identity' link arguments, and finally, the error
message when testing just an iter.max argument.

This was run in R 2.13.0 with rms version 3.3-0 on Ubuntu Linux 11.04.

Thank you in advance, and all insights and criticisms are appreciated.

Mike

library(rms)
 f - bj(Surv(ftime, stroke) ~ rcs(age,5) + hospital,x=TRUE, y=TRUE)
 f - bj(Surv(ftime, stroke) ~ rcs(age,5) + hospital, link='identity',x=TRUE, 
 y=TRUE)

No convergence in 50 steps
Failure in bj.fit
 f - bj(Surv(ftime, stroke) ~ rcs(age,5) + hospital, link='identity', 
 iter.max=200, x=TRUE, y=TRUE)
Error in bj(Surv(ftime, stroke) ~ rcs(age, 5) + hospital, link =
identity,  :
  unused argument(s) (iter.max = 200)
 f - bj(Surv(ftime, stroke) ~ rcs(age,5) + hospital, iter.max=200, x=TRUE, 
 y=TRUE)
Error in bj(Surv(ftime, stroke) ~ rcs(age, 5) + hospital, iter.max =
200,  :
  unused argument(s) (iter.max = 200)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] ID parameter in model

2011-05-04 Thread Mike Harwood
Thank you, Goran.  Please see the package details below:

 packageDescription('eha')
Encoding: UTF-8
Package: eha
Version: 1.3-2
Date: 2011-03-01
Title: Event History Analysis
Description: A package for survival and event history analysis
License: GPL (= 3)
Author: Göran Broström
Depends: R (= 2.2.0), survival, graphics
Maintainer: Göran Broström g...@stat.umu.se
Packaged: 2011-03-01 14:56:12 UTC; gb
Repository: CRAN
Date/Publication: 2011-03-01 15:50:52
Built: R 2.13.0; i386-pc-mingw32; 2011-04-15 08:22:36 UTC; windows

Mike


On Mon, May 2, 2011 at 10:38 AM, Mike Harwood harwood...@gmail.com wrote:

 Hello,

 I am apparently confused about the use of an id parameter for an event
 history/survival model, and why the EHA documentation for aftreg does
 not specify one.  All assistance and insights are appreciated.

 Attempting to specifiy an id variable with the documentation example
 generates an overlapping intervals error, so I sorted the original
 mort dataframe and set subsequent entry times an id to the previous
 exit time + 0.0001.  This allowed me to see the affect of the id
 parameter on the coefficients and significance tests, and prompted my
 question.  The code I used is shown below, with the results at the
 bottom.  Thanks in advance!

 Mike

 head(mort) ## data clearly contains multiple entries for some of the
 dataframe ids

 no.id.aft - aftreg(Surv(enter, exit, event) ~ ses, data = mort)  ##
 Inital model
 id.aft - aftreg(Surv(enter, exit, event) ~ ses, data = mort, id=id)
 ## overlapping intervals error

 mort.sort - ## ensure records ordered
mort[
order(mort$id, mort$enter),]

 ## remove overlap
 for (i in 2:nrow(mort.sort)){
 if (mort.sort[i,'id'] == mort.sort[i-1,'id'])
 mort.sort[i,'enter'] - mort.sort[i-1, 'exit'] + 0.0001
}

 no.id.aft.sort - aftreg(Surv(enter, exit, event) ~ ses, data =
 mort.sort) ## initial model on modified df
 id.aft.sort - aftreg(Surv(enter, exit, event) ~ ses, id=id, data =
 mort.sort) ## with id parameter


 #=== output ===#
  no.id.aft.sort
 Call:
 aftreg(formula = Surv(enter, exit, event) ~ ses, data = mort.sort)

 Covariate  W.mean  Coef Exp(Coef)  se(Coef)Wald p
 ses
   lower0.416 0 1   (reference)
   upper0.584-0.347 0.707 0.089 0.000

 log(scale)3.60336.704 0.065 0.000
 log(shape)0.331 1.393 0.058 0.000

 Events276
 Total time at risk 17045
 Max. log. likelihood  -1391.4
 LR test statistic 16.1
 Degrees of freedom1
 Overall p-value   6.04394e-05
  id.aft.sort
 Call:
 aftreg(formula = Surv(enter, exit, event) ~ ses, data = mort.sort,
id = id)

 Covariate  W.mean  Coef Exp(Coef)  se(Coef)Wald p
 ses
   lower0.416 0 1   (reference)
   upper0.584-0.364 0.695 0.090 0.000

 log(scale)3.58836.171 0.065 0.000
 log(shape)0.338 1.402 0.058 0.000

 Events276
 Total time at risk 17045
 Max. log. likelihood  -1390.8
 LR test statistic 17.2
 Degrees of freedom1
 Overall p-value   3.3091e-05
 



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] ID parameter in model

2011-05-02 Thread Mike Harwood
Hello,

I am apparently confused about the use of an id parameter for an event
history/survival model, and why the EHA documentation for aftreg does
not specify one.  All assistance and insights are appreciated.

Attempting to specifiy an id variable with the documentation example
generates an overlapping intervals error, so I sorted the original
mort dataframe and set subsequent entry times an id to the previous
exit time + 0.0001.  This allowed me to see the affect of the id
parameter on the coefficients and significance tests, and prompted my
question.  The code I used is shown below, with the results at the
bottom.  Thanks in advance!

Mike

head(mort) ## data clearly contains multiple entries for some of the
dataframe ids

no.id.aft - aftreg(Surv(enter, exit, event) ~ ses, data = mort)  ##
Inital model
id.aft - aftreg(Surv(enter, exit, event) ~ ses, data = mort, id=id)
## overlapping intervals error

mort.sort - ## ensure records ordered
mort[
order(mort$id, mort$enter),]

## remove overlap
for (i in 2:nrow(mort.sort)){
 if (mort.sort[i,'id'] == mort.sort[i-1,'id'])
 mort.sort[i,'enter'] - mort.sort[i-1, 'exit'] + 0.0001
}

no.id.aft.sort - aftreg(Surv(enter, exit, event) ~ ses, data =
mort.sort) ## initial model on modified df
id.aft.sort - aftreg(Surv(enter, exit, event) ~ ses, id=id, data =
mort.sort) ## with id parameter


#=== output ===#
 no.id.aft.sort
Call:
aftreg(formula = Surv(enter, exit, event) ~ ses, data = mort.sort)

Covariate  W.mean  Coef Exp(Coef)  se(Coef)Wald p
ses
   lower0.416 0 1   (reference)
   upper0.584-0.347 0.707 0.089 0.000

log(scale)3.60336.704 0.065 0.000
log(shape)0.331 1.393 0.058 0.000

Events276
Total time at risk 17045
Max. log. likelihood  -1391.4
LR test statistic 16.1
Degrees of freedom1
Overall p-value   6.04394e-05
 id.aft.sort
Call:
aftreg(formula = Surv(enter, exit, event) ~ ses, data = mort.sort,
id = id)

Covariate  W.mean  Coef Exp(Coef)  se(Coef)Wald p
ses
   lower0.416 0 1   (reference)
   upper0.584-0.364 0.695 0.090 0.000

log(scale)3.58836.171 0.065 0.000
log(shape)0.338 1.402 0.058 0.000

Events276
Total time at risk 17045
Max. log. likelihood  -1390.8
LR test statistic 17.2
Degrees of freedom1
Overall p-value   3.3091e-05


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] survexp with weights

2011-04-21 Thread Mike Harwood
The system details follow below.  Also, I have attempted specifying
the variables in a second ratetable statement, but the same missing
object error occurs in creating a survexp object/list.

 R.version
   _
platform   i386-pc-mingw32
arch   i386
os mingw32
system i386, mingw32
status
major  2
minor  12.2
year   2011
month  02
day25
svn rev54585
language   R
version.string R version 2.12.2 (2011-02-25)


package ‘survival’ version 2.36-5




On Apr 20, 11:03 am, Mike Harwood harwood...@gmail.com wrote:
 Hello,

 I probably have a syntax error in trying to generate an expected
 survival curve from a weighted cox model, but I can't see it.  I used
 the help sample code to generate a weighted model, with the addition
 of a weights=albumin argument (I only chose albumin because it had
 no missing values, not because of any real relevance).  Below are my
 code with the resulting error messages.  Thanks in advance!

  pfit - coxph(Surv(time,status0) ~ trt + log(bili) + log(protime) + age +

 +     + platelet,  data=pbc
 +     )

  pfit

 Call:
 coxph(formula = Surv(time, status  0) ~ trt + log(bili) +
 log(protime) +
     age + +platelet, data = pbc)

                   coef exp(coef) se(coef)        z      p
 trt          -0.000624     0.999  0.17304 -0.00360 1.
 log(bili)     0.985497     2.679  0.08949 11.01262 0.
 log(protime)  2.794001    16.346  0.95289  2.93215 0.0034
 age           0.020590     1.021  0.00805  2.55666 0.0110
 platelet     -0.001699     0.998  0.00085 -2.00130 0.0450

 Likelihood ratio test=164  on 5 df, p=0  n= 308, number of events=
 143
    (110 observations deleted due to missingness)

  plot(survfit(Surv(time, status0) ~ trt, data=pbc))
  lines(survexp( ~ trt, ratetable=pfit, data=pbc), col='purple')

  pfit.wtd - coxph(Surv(time,status0) ~ trt + log(bili) + log(protime) + 
  age +

 +     + platelet,  weights=albumin, data=pbc
 +     )

  pfit.wtd

 Call:
 coxph(formula = Surv(time, status  0) ~ trt + log(bili) +
 log(protime) +
     age + +platelet, data = pbc, weights = albumin)

                  coef exp(coef) se(coef)      z       p
 trt          -0.01354     0.987 0.094204 -0.144 8.9e-01
 log(bili)     0.99282     2.699 0.048690 20.391 0.0e+00
 log(protime)  2.54136    12.697 0.525797  4.833 1.3e-06
 age           0.01913     1.019 0.004398  4.350 1.4e-05
 platelet     -0.00165     0.998 0.000462 -3.578 3.5e-04

 Likelihood ratio test=535  on 5 df, p=0  n= 308, number of events=
 143
    (110 observations deleted due to missingness) plot(survfit(Surv(time, 
 status0) ~ trt, data=pbc))
  lines(survexp( ~ trt, ratetable=pfit.wtd, data=pbc), col='purple')

 Error in eval(expr, envir, enclos) : object 'albumin' not found 
 lines(survexp( ~ trt, ratetable=pfit.wtd, weights=albumin, data=pbc), 
 col='purple')

 Error in eval(expr, envir, enclos) : object 'albumin' not found
 In addition: Warning message:
 In survexp(~trt, ratetable = pfit.wtd, weights = albumin, data =
 pbc) :
   Weights ignored lines(survexp( ~ trt, ratetable=pfit.wtd, 
 weights=pbc$albumin, data=pbc), col='purple')

 Error in eval(expr, envir, enclos) : object 'albumin' not found
 In addition: Warning message:
 In survexp(~trt, ratetable = pfit.wtd, weights = pbc$albumin, data =
 pbc) :
   Weights ignored



 __
 r-h...@r-project.org mailing listhttps://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guidehttp://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] survexp with weights

2011-04-20 Thread Mike Harwood
Hello,

I probably have a syntax error in trying to generate an expected
survival curve from a weighted cox model, but I can't see it.  I used
the help sample code to generate a weighted model, with the addition
of a weights=albumin argument (I only chose albumin because it had
no missing values, not because of any real relevance).  Below are my
code with the resulting error messages.  Thanks in advance!

 pfit - coxph(Surv(time,status0) ~ trt + log(bili) + log(protime) + age +
+ + platelet,  data=pbc
+ )

 pfit
Call:
coxph(formula = Surv(time, status  0) ~ trt + log(bili) +
log(protime) +
age + +platelet, data = pbc)


  coef exp(coef) se(coef)z  p
trt  -0.000624 0.999  0.17304 -0.00360 1.
log(bili) 0.985497 2.679  0.08949 11.01262 0.
log(protime)  2.79400116.346  0.95289  2.93215 0.0034
age   0.020590 1.021  0.00805  2.55666 0.0110
platelet -0.001699 0.998  0.00085 -2.00130 0.0450

Likelihood ratio test=164  on 5 df, p=0  n= 308, number of events=
143
   (110 observations deleted due to missingness)

 plot(survfit(Surv(time, status0) ~ trt, data=pbc))
 lines(survexp( ~ trt, ratetable=pfit, data=pbc), col='purple')

 pfit.wtd - coxph(Surv(time,status0) ~ trt + log(bili) + log(protime) + age +
+ + platelet,  weights=albumin, data=pbc
+ )

 pfit.wtd
Call:
coxph(formula = Surv(time, status  0) ~ trt + log(bili) +
log(protime) +
age + +platelet, data = pbc, weights = albumin)


 coef exp(coef) se(coef)  z   p
trt  -0.01354 0.987 0.094204 -0.144 8.9e-01
log(bili) 0.99282 2.699 0.048690 20.391 0.0e+00
log(protime)  2.5413612.697 0.525797  4.833 1.3e-06
age   0.01913 1.019 0.004398  4.350 1.4e-05
platelet -0.00165 0.998 0.000462 -3.578 3.5e-04

Likelihood ratio test=535  on 5 df, p=0  n= 308, number of events=
143
   (110 observations deleted due to missingness)
 plot(survfit(Surv(time, status0) ~ trt, data=pbc))
 lines(survexp( ~ trt, ratetable=pfit.wtd, data=pbc), col='purple')
Error in eval(expr, envir, enclos) : object 'albumin' not found
 lines(survexp( ~ trt, ratetable=pfit.wtd, weights=albumin, data=pbc), 
 col='purple')
Error in eval(expr, envir, enclos) : object 'albumin' not found
In addition: Warning message:
In survexp(~trt, ratetable = pfit.wtd, weights = albumin, data =
pbc) :
  Weights ignored
 lines(survexp( ~ trt, ratetable=pfit.wtd, weights=pbc$albumin, data=pbc), 
 col='purple')
Error in eval(expr, envir, enclos) : object 'albumin' not found
In addition: Warning message:
In survexp(~trt, ratetable = pfit.wtd, weights = pbc$albumin, data =
pbc) :
  Weights ignored


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] debug biglm response error on bigglm model

2011-01-12 Thread Mike Harwood
Thank you, Greg.  The issue was in the simulation logic, where one of
the values was not changing correctly for some iterations...

On Jan 10, 3:20 pm, Greg Snow greg.s...@imail.org wrote:
 Not sure, but one possible candidate problem is that in your simulations one 
 iteration ended up with fewer levels of a factor than the overall dataset and 
 that caused the error.

 There is no recode function in the default packages, there are at least 6 
 recode functions in other packages, we cannot tell which you were using from 
 the code below.

 --
 Gregory (Greg) L. Snow Ph.D.
 Statistical Data Center
 Intermountain Healthcare
 greg.s...@imail.org
 801.408.8111





  -Original Message-
  From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
  project.org] On Behalf Of Mike Harwood
  Sent: Monday, January 10, 2011 6:29 AM
  To: r-h...@r-project.org
  Subject: [R] debug biglm response error on bigglm model

  G'morning

  What does the error message Error in x %*% coef(object) : non-
  conformable arguments indicate when calculating the response values
  for
  newdata with a model from bigglm (in package biglm), and how can I
  debug it?  I am attempting to do Monte Carlo simulations, which may
  explain the loop in the code that follows.  After the code I
  have included the output, which shows that the simulations are
  changing the response and input values, and that there are not any
  atypical values for the
  factors in the seventh iteration.  At the end of the output is the
  aforementioned error message.  Finally, I have included the model from
  biglm.

  Thanks in advance!

  Code:
  ===
  iter - nrow(nov.2010)
  predict.nov.2011 - vector(mode='numeric', length=iter)
  for (i in 1:iter) {
      iter.df - nov.2010
      ##-- Update values of dynamic variables --
      iter.df$age - iter.df$age + 12
      iter.df$pct_utilize -
          iter.df$pct_utilize + mc.util.delta[i]

      iter.df$updated_varname1 -
          ceiling(iter.df$updated_varname1 + mc.varname1.delta[i])

      if(iter.df$state==WI)
          iter.df$varname3 - iter.df$varname3 + mc.wi.varname3.delta[i]
      if(iter.df$state==MN)
          iter.df$varname3 - iter.df$varname3 + mc.mn.varname3.delta[i]
      if(iter.df$state==IL)
          iter.df$varname3 - iter.df$varname3 + mc.il.varname3.delta[i]
      if(iter.df$state==US)
          iter.df$varname3 - iter.df$varname3 + mc.us.varname3.delta[i]

      ##--- Bin Variables --
      iter.df$bin_varname1 - as.factor(recode(iter.df$updated_varname1,
          300:499 = '300 - 499';
           500:549 = '500 - 549';
           550:599 = '550 - 599';
           600:649 = '600 - 649';
           650:699 = '650 - 699';
           700:749 = '700 - 749';
           750:799 = '750 - 799'; 800:849 = 'GE 800'; else    =
  'missing';
           ))
      iter.df$bin_age - as.factor(recode(iter.df$age,
          0:23   = '  24mo.';
           24:72  = '24 - 72mo.';
           72:300 = '72 - 300mo'; else   = 'missing';
           ))
      iter.df$bin_util - as.factor(recode(iter.df$pct_utilize,
          0.0:0.2 = '  0 - 20%';
           0.2:0.4 = '  20 - 40%';
           0.4:0.6 = '  40 - 60%';
           0.6:0.8 = '  60 - 80%';
           0.8:1.0 = ' 80 - 100%';
           1.0:1.2 = '100 - 120%'; else    = 'missing';
           ))
      iter.df$bin_varname2 - as.factor(recode(iter.df$varname2_prop,
          0:70 = '     70%';
           70:85 = ' 70 - 85%';
           85:95 = ' 85 - 95%';
           95:110 = '95 - 110%'; else  =  'missing';
           ))
      iter.df$bin_varname1 - relevel(iter.df$bin_varname1, 'missing')
      iter.df$bin_age - relevel(iter.df$bin_age, 'missing')
      iter.df$bin_util - relevel(iter.df$bin_util, 'missing')
      iter.df$bin_varname2 - relevel(iter.df$bin_varname2, 'missing')

  #~     print(head(iter.df))
      if (i=6  i=8){
           print('-')
           browser()
           print(i)
           print(table(iter.df$bin_varname1))
           print(table(iter.df$bin_age))
           print(table(iter.df$bin_util))
           print(table(iter.df$bin_varname2))
  #~         debug(predict.nov.2011[i] -
  #~              sum(predict(logModel.1, newdata=iter.df,
  type='response')))
       }

      predict.nov.2011[i] -
           sum(predict(logModel.1, newdata=iter.df, type='response'))

      print(predict.nov.2011[i])

    }

  Output
  ==
  [1] 36.56073
  [1] 561.4516
  [1] 4.83483
  [1] 5.01398
  [1] 7.984146
  [1] -
  Called from: top level
  Browse[1]
  [1] 6

    missing 300 - 499 500 - 549 550 - 599 600 - 649 650 - 699 700 - 749
  750 - 799    GE 800
        842       283       690      1094      1695      3404
  6659     18374     21562

     missing     24mo. 24 - 72mo. 72 - 300mo
          16       2997      19709      31881

     missing    0 - 20%   20 - 40%   40 - 60%   60 - 80%  80 - 100% 100
  - 120%
       17906

[R] debug biglm response error on bigglm model

2011-01-10 Thread Mike Harwood
G'morning

What does the error message Error in x %*% coef(object) : non-
conformable arguments indicate when calculating the response values
for
newdata with a model from bigglm (in package biglm), and how can I
debug it?  I am attempting to do Monte Carlo simulations, which may
explain the loop in the code that follows.  After the code I
have included the output, which shows that the simulations are
changing the response and input values, and that there are not any
atypical values for the
factors in the seventh iteration.  At the end of the output is the
aforementioned error message.  Finally, I have included the model from
biglm.

Thanks in advance!

Code:
===
iter - nrow(nov.2010)
predict.nov.2011 - vector(mode='numeric', length=iter)
for (i in 1:iter) {
iter.df - nov.2010
##-- Update values of dynamic variables --
iter.df$age - iter.df$age + 12
iter.df$pct_utilize -
iter.df$pct_utilize + mc.util.delta[i]

iter.df$updated_varname1 -
ceiling(iter.df$updated_varname1 + mc.varname1.delta[i])

if(iter.df$state==WI)
iter.df$varname3 - iter.df$varname3 + mc.wi.varname3.delta[i]
if(iter.df$state==MN)
iter.df$varname3 - iter.df$varname3 + mc.mn.varname3.delta[i]
if(iter.df$state==IL)
iter.df$varname3 - iter.df$varname3 + mc.il.varname3.delta[i]
if(iter.df$state==US)
iter.df$varname3 - iter.df$varname3 + mc.us.varname3.delta[i]

##--- Bin Variables --
iter.df$bin_varname1 - as.factor(recode(iter.df$updated_varname1,
300:499 = '300 - 499';
 500:549 = '500 - 549';
 550:599 = '550 - 599';
 600:649 = '600 - 649';
 650:699 = '650 - 699';
 700:749 = '700 - 749';
 750:799 = '750 - 799'; 800:849 = 'GE 800'; else=
'missing';
 ))
iter.df$bin_age - as.factor(recode(iter.df$age,
0:23   = '  24mo.';
 24:72  = '24 - 72mo.';
 72:300 = '72 - 300mo'; else   = 'missing';
 ))
iter.df$bin_util - as.factor(recode(iter.df$pct_utilize,
0.0:0.2 = '  0 - 20%';
 0.2:0.4 = '  20 - 40%';
 0.4:0.6 = '  40 - 60%';
 0.6:0.8 = '  60 - 80%';
 0.8:1.0 = ' 80 - 100%';
 1.0:1.2 = '100 - 120%'; else= 'missing';
 ))
iter.df$bin_varname2 - as.factor(recode(iter.df$varname2_prop,
0:70 = ' 70%';
 70:85 = ' 70 - 85%';
 85:95 = ' 85 - 95%';
 95:110 = '95 - 110%'; else  =  'missing';
 ))
iter.df$bin_varname1 - relevel(iter.df$bin_varname1, 'missing')
iter.df$bin_age - relevel(iter.df$bin_age, 'missing')
iter.df$bin_util - relevel(iter.df$bin_util, 'missing')
iter.df$bin_varname2 - relevel(iter.df$bin_varname2, 'missing')

#~ print(head(iter.df))
if (i=6  i=8){
 print('-')
 browser()
 print(i)
 print(table(iter.df$bin_varname1))
 print(table(iter.df$bin_age))
 print(table(iter.df$bin_util))
 print(table(iter.df$bin_varname2))
#~ debug(predict.nov.2011[i] -
#~  sum(predict(logModel.1, newdata=iter.df,
type='response')))
 }

predict.nov.2011[i] -
 sum(predict(logModel.1, newdata=iter.df, type='response'))

print(predict.nov.2011[i])

  }

Output
==
[1] 36.56073
[1] 561.4516
[1] 4.83483
[1] 5.01398
[1] 7.984146
[1] -
Called from: top level
Browse[1]
[1] 6

  missing 300 - 499 500 - 549 550 - 599 600 - 649 650 - 699 700 - 749
750 - 799GE 800
  842   283   690  1094  1695  3404
6659 18374 21562

   missing 24mo. 24 - 72mo. 72 - 300mo
16   2997  19709  31881

   missing0 - 20%   20 - 40%   40 - 60%   60 - 80%  80 - 100% 100
- 120%
 17906   4832   4599   5154   7205
14865 42

  missing  70%  70 - 85%  85 - 95% 95 - 110%
10423 19429 10568  8350  5833
[1] 11.04090
[1] -
Called from: top level
Browse[1]
[1] 7

  missing 300 - 499 500 - 549 550 - 599 600 - 649 650 - 699 700 - 749
750 - 799
  847   909  1059  1586  3214  6304
16349 24335

   missing 24mo. 24 - 72mo. 72 - 300mo
16   2997  19709  31881

   missing0 - 20%   20 - 40%   40 - 60%   60 - 80%  80 - 100% 100
- 120%
 17145   4972   4617   5020   6634
16139 76

  missing  70%  70 - 85%  85 - 95% 95 - 110%
10423 19429 10568  8350  5833
Error in x %*% coef(object) : non-conformable arguments

Model
===
Large data regression model: bigglm(outcome ~ bin_varname1 +
bin_varname2 + bin_age + bin_util +
state + varname3 + varname3:state, family = binomial(link =
logit),
data = dev.data, maxit = 75, sandwich = FALSE)
Sample size =  1372250

__
R-help@r-project.org mailing list

[R] Error in x %*% coef(object) : non-conformable arguments

2011-01-07 Thread Mike Harwood
Hello, and thanks in advance!

What does the error message Error in x %*% coef(object) : non-
conformable arguments when indicate when predicting values for
newdata with a model from bigglm (in package biglm), and how can I
debug it?  I am attempting to do Monte Carlo simulations, which may
explain the somewhat interesting loop which follows.  After the code I
have included the output, which shows that a) the simulations are
changing the values, and b) there are not any atypical values for the
factors in the seventh iteration.  At the end of the output is the
aforementioned error message.

Code:
===
iter - nrow(nov.2010)
predict.nov.2011 - vector(mode='numeric', length=iter)
for (i in 1:iter) {
iter.df - nov.2010
##-- Update values of dynamic variables --
iter.df$age - iter.df$age + 12
iter.df$pct_utilize -
iter.df$pct_utilize + mc.util.delta[i]

iter.df$updated_varname1 -
ceiling(iter.df$updated_varname1 + mc.varname1.delta[i])

if(iter.df$state==WI)
iter.df$varname3 - iter.df$varname3 + mc.wi.varname3.delta[i]
if(iter.df$state==MN)
iter.df$varname3 - iter.df$varname3 + mc.mn.varname3.delta[i]
if(iter.df$state==IL)
iter.df$varname3 - iter.df$varname3 + mc.il.varname3.delta[i]
if(iter.df$state==US)
iter.df$varname3 - iter.df$varname3 + mc.us.varname3.delta[i]

##--- Bin Variables --
iter.df$bin_varname1 - as.factor(recode(iter.df$updated_varname1,
300:499 = '300 - 499';
 500:549 = '500 - 549';
 550:599 = '550 - 599';
 600:649 = '600 - 649';
 650:699 = '650 - 699';
 700:749 = '700 - 749';
 750:799 = '750 - 799'; 800:849 = 'GE 800'; else=
'missing';
 ))
iter.df$bin_age - as.factor(recode(iter.df$age,
0:23   = '  24mo.';
 24:72  = '24 - 72mo.';
 72:300 = '72 - 300mo'; else   = 'missing';
 ))
iter.df$bin_util - as.factor(recode(iter.df$pct_utilize,
0.0:0.2 = '  0 - 20%';
 0.2:0.4 = '  20 - 40%';
 0.4:0.6 = '  40 - 60%';
 0.6:0.8 = '  60 - 80%';
 0.8:1.0 = ' 80 - 100%';
 1.0:1.2 = '100 - 120%'; else= 'missing';
 ))
iter.df$bin_varname2 - as.factor(recode(iter.df$varname2_prop,
0:70 = ' 70%';
 70:85 = ' 70 - 85%';
 85:95 = ' 85 - 95%';
 95:110 = '95 - 110%'; else  =  'missing';
 ))
iter.df$bin_varname1 - relevel(iter.df$bin_varname1, 'missing')
iter.df$bin_age - relevel(iter.df$bin_age, 'missing')
iter.df$bin_util - relevel(iter.df$bin_util, 'missing')
iter.df$bin_varname2 - relevel(iter.df$bin_varname2, 'missing')

#~ print(head(iter.df))
if (i=6  i=8){
 print('-')
 browser()
 print(i)
 print(table(iter.df$bin_varname1))
 print(table(iter.df$bin_age))
 print(table(iter.df$bin_util))
 print(table(iter.df$bin_varname2))
#~ debug(predict.nov.2011[i] -
#~  sum(predict(logModel.1, newdata=iter.df,
type='response')))
 }

predict.nov.2011[i] -
 sum(predict(logModel.1, newdata=iter.df, type='response'))

print(predict.nov.2011[i])

  }


Output
==
[1] 36.56073
[1] 561.4516
[1] 4.83483
[1] 5.01398
[1] 7.984146
[1] -
Called from: top level
Browse[1]
[1] 6

  missing 300 - 499 500 - 549 550 - 599 600 - 649 650 - 699 700 - 749
750 - 799GE 800
  842   283   690  1094  1695  3404
6659 18374 21562

   missing 24mo. 24 - 72mo. 72 - 300mo
16   2997  19709  31881

   missing0 - 20%   20 - 40%   40 - 60%   60 - 80%  80 - 100% 100
- 120%
 17906   4832   4599   5154   7205
14865 42

  missing  70%  70 - 85%  85 - 95% 95 - 110%
10423 19429 10568  8350  5833
[1] 11.04090
[1] -
Called from: top level
Browse[1]
[1] 7

  missing 300 - 499 500 - 549 550 - 599 600 - 649 650 - 699 700 - 749
750 - 799
  847   909  1059  1586  3214  6304
16349 24335

   missing 24mo. 24 - 72mo. 72 - 300mo
16   2997  19709  31881

   missing0 - 20%   20 - 40%   40 - 60%   60 - 80%  80 - 100% 100
- 120%
 17145   4972   4617   5020   6634
16139 76

  missing  70%  70 - 85%  85 - 95% 95 - 110%
10423 19429 10568  8350  5833
Error in x %*% coef(object) : non-conformable arguments

Model
===
Large data regression model: bigglm(outcome ~ bin_varname1 +
bin_varname2 + bin_age + bin_util +
state + varname3 + varname3:state, family = binomial(link =
logit),
data = dev.data, maxit = 75, sandwich = FALSE)
Sample size =  1372250

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the 

[R] Set axis limits in mixtools plot

2011-01-06 Thread Mike Harwood
Hello,

Can the x and y axis limits be specified in a density plot with the
mixtools package for a finite mixture model?  Uncommenting the xlim2/
ylim2 lines in the plot command below generates 'not a graphical
parameter' warnings (and does not change the axis settings), and
uncommenting the xlim/ylim lines generates a 'formal argument ylim
matched by multiple actual arguments' error.  I want to use a
consistent set of axis setting for four different sets of data.
Thanks!

test.data -
c(0.0289,0.0342,0.0379,0.0405,0.0421,0.0429,0.0430,0.0426,0.0423
,
0.0425,0.0435,0.0451,0.0466,0.0477,0.0480,0.0477,0.0473,0.0473,0.0479
,
0.0492,0.0507,0.0519,0.0526,0.0523,0.0507,0.0482,0.0452,0.0428,0.0414
,
0.0409,0.0404,0.0388,0.0358,0.0319,0.0283,0.0263,0.0269,0.0298,0.0339
,
0.0378,0.0404,0.0417,0.0425,0.0436,0.0457,0.0489,0.0522,0.0544,0.0546
,
0.0529,0.0501,0.0474,0.0457,0.0454,0.0460,0.0467,0.0466,0.0450,0.0423
,
0.0395,0.0378,0.0377,0.0388,0.0401,0.0404,0.0394,0.0378,0.0366,0.0371
,
0.0395,0.0433,0.0474,0.0512,0.0546,0.0581,0.0622,0.0673,0.0729,0.0782
,
0.0824,0.0854,0.0877,0.0902,0.0934,0.0973,0.1009,0.1032,0.1038,0.1029
,
0.1017,0.1017,0.1034,0.1065,0.1099,0.1126,0.1137,0.1133,0.1123,0.1118
,
0.1124,0.1140,0.1159,0.1175,0.1182,0.1179,0.1170,0.1160,0.1153,0.1153
,
0.1159,0.1164,0.1161,0.1146,0.1118,0.1080,0.1038,0.1002,0.0976,0.0961
,
0.0954,0.0948,0.0940,0.0929,0.0920,0.0916,0.0921,0.0934,0.0951,0.0964
,
0.0964,0.0951,0.0925,0.0894,0.0864,0.0841,0.0827,0.0821,0.0819,0.0817
,
0.0814,0.0808,0.0797,0.0783,0.0768,0.0756,0.0749,0.0749,0.0752,0.0752
,
0.0743,0.0724,0.0694,0.0662,0.0635,0.0616,0.0607,0.0603,0.0600,0.0592
,
0.0580,0.0567,0.0553,0.0539,0.0522,0.0499,0.0470,0.0435,0.0394,0.0346
,
0.0290,0.0227,0.0160,0.0098,0.0045,0.0009,-0.0012,-0.0021,-0.0025,-0.0029
,-0.0038,-0.0052,-0.0074,-0.0103,-0.0141,-0.0189,-0.0246,-0.0308,-0.0370
,-0.0432,-0.0495,-0.0561,-0.0627,-0.0681,-0.0716,-0.0732,-0.0736,-0.0740
,-0.0750,-0.0760,-0.0758,-0.0732,-0.0677,-0.0603,-0.0531,-0.0481,-0.0459
,-0.0448,-0.0421,-0.0364,-0.0285,-0.0219,-0.0202,-0.0245,-0.0321,-0.0386
,-0.0399,-0.0352,-0.0272,-0.0206,-0.0187,-0.0220,-0.0280,-0.0330
)

library(mixtools)
fit - normalmixEM(test.data)

plot(fit
,whichplots=2
#,xlim2=c(-0.1, 0.15)
#,ylim2=c(0, 20)
#,xlim=c(-0.1, 0.15)
#,ylim=c(0, 20)
)
grid()

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Reconcile Random Samples

2010-12-23 Thread Mike Harwood
Is there a way to generate identical random samples using R's runif
function and SAS's ranuni function?  I have assigned the same seed
values in both software packages, but the following results show
different results.   Thanks!

R
===
 set.seed(6)
 random - runif(10)
 random
 [1] 0.6062683 0.9376420 0.2643521 0.3800939 0.8074834 0.9780757
0.9579337
 [8] 0.7627319 0.5096485 0.0644768

SAS (the log file)
===
15 data _null_;
16   do i=1 to 10;
17  random = ranuni(6);
18  put i= random=;
19  end;
20 run;

i=1 random=0.1097754189
i=2 random=0.8205322939
i=3 random=0.3989458365
i=4 random=0.5563918723
i=5 random=0.5296154672
i=6 random=0.8156640985
i=7 random=0.2578750389
i=8 random=0.1901503369
i=9 random=0.2987641572
i=10 random=0.3993993096

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] pairs and mfrow

2010-09-27 Thread Mike Harwood
Is there an alternative to par(mfrow=c(2,1)) to get stacked scatterplot
matrixes generated with pairs?

I am using version 2.11.1 on Windows XP.  The logic I am using follows, and
the second pairs plot replaces the first plot in the current graphics
device, which is not what I expected (or desired).

par(mfrow=c(2,1))
pairs(b2007, main=6/2000 - 12/2006)
pairs(a2007, main=1/2007 - 06/2009)

Thanks in advance!

Mike

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Assign Name of Data Frame

2010-02-12 Thread Mike Harwood
Marc,

Thank you for your help.  I had tried assign, but I did not enclose the
table name in quotes in my function call.  I needed to see what someone else
had written before I ever would have noticed it!

On Fri, Feb 12, 2010 at 10:00 AM, Marc Schwartz marc_schwa...@me.comwrote:

 On Feb 12, 2010, at 8:19 AM, mah wrote:

  Hello R Experts,
 
  How can I assign the name of a data frame with the argument of a
  function?  Specifically I am using RODBC to build local dataframes
  from SAS datasets on a
  remote server.  I would like the local dataframe have the same name as
  the source SAS dataset, and the function below is what I am
  developing.  However, the substitute(table) on the left side of the
  assignment
  generates the error Error in substitute(table) - sqlQuery(sears,
  sql.stmt) :
  could not find function substitute-.
 
  Thanks in advance
 
  MakeDF - function(table)
  #
  # Function makes dataframe from UNIX SAS datasets
  #
  {
  st.time - Sys.time()
  print(substitute(table))
  sql.stmt - paste(select * from swprod., substitute(table),
  sep=)
  print(sql.stmt)
  substitute(table) - sqlQuery(sears, sql.stmt)
  #  deparse(substitute(table)) - sqlQuery(sears, sql.stmt)
  end.time
  print(end.time - st.time)
  }
  MakeDF(sku_series)



 My recommendation would be something like this:

 MakeDF - function(table)
 {
  DF - sqlQuery(channel, paste(select * from swprod., table, sep = ))
  assign(table, DF, envir = parent.frame())
 }

 Then use would be:

  MakeDF(sku_series)


 The result would be a data frame called 'sku_series' in the calling
 environment.  You could substitute globalenv() for parent.frame() if you
 wanted to create the data frame in the global environment.

 See ?assign

 HTH,

 Marc Schwartz



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.