Re: [R] multivariate problem

2012-01-08 Thread Clifford Long
Would some type of multivariate SPC be useful?  Options might include
SPC based on PCA, or perhaps Hotelling's T2.

Maybe you would find something useful in a package such as rrcov?
http://cran.r-project.org/web/packages/rrcov/vignettes/rrcov.pdf
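In case a concrete starting point helps, here is a minimal, hypothetical
sketch of a Hotelling's T2-style screen using only base R (the rrcov and
qcc packages offer more complete implementations); the data, shift, and
cutoff below are invented purely for illustration:

```r
# Baseline (in-control) days define the mean vector and covariance;
# new days are flagged when their squared Mahalanobis distance (a
# T2-type statistic) exceeds an approximate chi-square limit.
set.seed(1)
baseline <- matrix(rnorm(200 * 10), ncol = 10)   # 200 in-control days
newdays  <- matrix(rnorm(30 * 10), ncol = 10)    # 30 days to screen
newdays[15, ] <- newdays[15, ] + 3               # day 15 shifts upward

t2 <- mahalanobis(newdays, center = colMeans(baseline), cov = cov(baseline))
cutoff <- qchisq(0.999, df = ncol(newdays))      # rough control limit
which(t2 > cutoff)                               # flagged day(s)
```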

R/S

Cliff



On Sun, Jan 8, 2012 at 11:16 AM, David Winsemius dwinsem...@comcast.net wrote:

 On Jan 8, 2012, at 3:01 AM, Iasonas Lamprianou wrote:

 Dear all,
 I am not sure if this is the right place to ask this question, but I will
 have a go. Please redirect me to a different place if this is not the right
 one!

 I have a (relatively) simple problem which causes me some frustration
 because I cannot find the solution. I measure ten variables (var1 to var10)
 every day, they are all continuous (linear) and most of them are correlated.
 Some days, for whatever reason, the relationship between these variables may
 change. They remain correlated, but the correlation may shift only slightly,
 yet in practice this matters. Or one of the variables may suddenly increase
 its value significantly, keep this high value for a few days, and then
 return to its normal level. I am using R. Is there a function I can use to
 help me identify these strange days when the relationship between these
 variables changes? For example, if DayX is such a strange day, factor
 analyzing the data before DayX and after DayX separately would give me
 different factors (principal components). But how can I identify such a day
 without trial and error?


 The zoo package has `rollapply`. You would of course be required to be much
 more specific in defining your problem than you have been so far.
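The rollapply suggestion could look something like this minimal sketch
(invented data; assumes the zoo package is installed): a rolling
correlation between two of the ten variables, where an abrupt change in
the rolling value points at candidate "strange" days:

```r
library(zoo)  # provides rollapply

set.seed(2)
n  <- 120
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n, sd = 0.4)   # correlated with x1 ...
x2[61:n] <- rnorm(60)                 # ... until the relationship breaks at day 61

z <- zoo(cbind(x1, x2))
rollcor <- rollapply(z, width = 20,
                     FUN = function(w) cor(w[, 1], w[, 2]),
                     by.column = FALSE, align = "right")
plot(rollcor)  # the rolling correlation drops sharply after day 61
```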

 --
 David Winsemius, MD
 Heritage Laboratories
 West Hartford, CT

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



Re: [R] using plot with time series object - axes = FALSE option does not appear to work

2011-01-03 Thread Clifford Long
Dear R-help,

I am told by Professor Ripley that my question (quote) wastes the time
of (and has insulted on-list) the R developers by falsely claiming
there are problems with their code.  I am writing, as he instructed,
to publicly apologize to the R developers for any slight that my
question might have generated.

When posing my question, I genuinely thought that by invoking plot I
was using a base R graphics function, and not the method offered by
Mr. Ryan.

As I am told that I owe an apology, I would like to do as instructed
and publicly apologize on the list to the R developers, not for any
malicious intent, but for my lack of expertise and any unintentional
insult that resulted.  (My apologies also to Mr. Ryan for my
ignorance.)

Signed: bruised, but maybe a little less ignorant about R.

Cliff Long
gnolff...@gmail.com




On Mon, Jan 3, 2011 at 12:03 AM, Clifford Long gnolff...@gmail.com wrote:
 Dear R-help,

 I am attempting to plot data using standard R plot utilities.  The
 data was retrieved from FRED (St. Louis Federal Reserve) using the
 package quantmod.  My question is NOT about quantmod.  While I
 retrieve data using quantmod, I am not using its charting utility.  I
 have been having success using the standard R plot utilities to this
 point with this type of data.

 Eventually I want to put two series on the same plot but with the
 y-axis for one series on the right side, and also inverted (min at the
 top, max at the bottom).  I believe that I see how to do this by using
 par(new = TRUE) with a second plot statement with axes = FALSE,
 followed by the command axis(side = 4, ylim = c(max(seriesname),
 min(seriesname))).

 Here is what I believe should be a smaller reproducible example of my issue:

 #-

 library(quantmod)

 getSymbols('PCECTPI', src='FRED')
 is.xts(PCECTPI)      # check the type of object - response is 'TRUE'

 plot(PCECTPI)  # This works fine.

 plot(PCECTPI, axes = FALSE)  # This works in that it gives me a plot,
 # but I get the axes regardless of the use of axes = FALSE.

 #-

 I did find that using par(yaxt = "n") seems to work to suppress the
 y-axis.  But it seems to me that the axes = FALSE option should
 also work, and I believe that it would be easier to use in the larger
 context of my goal.


 I have spent time with the R help pages and Nabble searches of the R
 help archives but I still seem to be missing something.

 Does the axes = FALSE option not work when using plot with this type
 of data object?
 Is there some other fundamental thing that I have overlooked?
 Or should this work?

 My apologies if the answer is obvious and I've just missed it.  Thank
 you in advance for any help that can be provided.

 Cliff Long
 gnolff...@gmail.com


 

 My system:
 HP Pavilion
 Windows 7
 The computer/OS is 64-bit.  I am running the precompiled 32-bit
 version of R (per sessionInfo).
 Thus far, everything seems to have been working as expected.


 sessionInfo()
 R version 2.12.1 (2010-12-16)
 Platform: i386-pc-mingw32/i386 (32-bit)

 locale:
 [1] LC_COLLATE=English_United States.1252
 [2] LC_CTYPE=English_United States.1252
 [3] LC_MONETARY=English_United States.1252
 [4] LC_NUMERIC=C
 [5] LC_TIME=English_United States.1252

 attached base packages:
 [1] stats     graphics  grDevices utils     datasets  methods   base

 other attached packages:
 [1] quantmod_0.3-15 TTR_0.20-2      xts_0.7-5       zoo_1.6-4
 [5] Defaults_1.1-1

 loaded via a namespace (and not attached):
 [1] grid_2.12.1     lattice_0.19-13 tools_2.12.1


 Sys.getlocale()
 [1] LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
 States.1252;LC_MONETARY=English_United
 States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252




[R] using plot with time series object - axes = FALSE option does not appear to work

2011-01-02 Thread Clifford Long
Dear R-help,

I am attempting to plot data using standard R plot utilities.  The
data was retrieved from FRED (St. Louis Federal Reserve) using the
package quantmod.  My question is NOT about quantmod.  While I
retrieve data using quantmod, I am not using its charting utility.  I
have been having success using the standard R plot utilities to this
point with this type of data.

Eventually I want to put two series on the same plot but with the
y-axis for one series on the right side, and also inverted (min at the
top, max at the bottom).  I believe that I see how to do this by using
par(new = TRUE) with a second plot statement with axes = FALSE,
followed by the command axis(side = 4, ylim = c(max(seriesname),
min(seriesname))).

Here is what I believe should be a smaller reproducible example of my issue:

#-

library(quantmod)

getSymbols('PCECTPI', src='FRED')
is.xts(PCECTPI)  # check the type of object - response is 'TRUE'

plot(PCECTPI)  # This works fine.

plot(PCECTPI, axes = FALSE)  # This works in that it gives me a plot,
# but I get the axes regardless of the use of axes = FALSE.

#-

I did find that using par(yaxt = "n") seems to work to suppress the
y-axis.  But it seems to me that the axes = FALSE option should
also work, and I believe that it would be easier to use in the larger
context of my goal.
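For what it's worth, the right-hand inverted axis described earlier can
be sketched with plain vectors, where base graphics does honor
axes = FALSE (data invented for illustration; note the reversed ylim
belongs in the plot() call rather than in axis()):

```r
set.seed(3)
t  <- 1:50
s1 <- cumsum(rnorm(50))        # first series, left axis
s2 <- 100 + cumsum(rnorm(50))  # second series, right axis, inverted

plot(t, s1, type = "l", xlab = "t", ylab = "series 1")
par(new = TRUE)
plot(t, s2, type = "l", lty = 2, axes = FALSE, xlab = "", ylab = "",
     ylim = c(max(s2), min(s2)))   # reversed limits invert the scale
axis(side = 4)                     # right-hand axis
mtext("series 2 (inverted)", side = 4, line = 3)
```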


I have spent time with the R help pages and Nabble searches of the R
help archives but I still seem to be missing something.

Does the axes = FALSE option not work when using plot with this type
of data object?
Is there some other fundamental thing that I have overlooked?
Or should this work?

My apologies if the answer is obvious and I've just missed it.  Thank
you in advance for any help that can be provided.

Cliff Long
gnolff...@gmail.com




My system:
HP Pavilion
Windows 7
The computer/OS is 64-bit.  I am running the precompiled 32-bit
version of R (per sessionInfo).
Thus far, everything seems to have been working as expected.


 sessionInfo()
R version 2.12.1 (2010-12-16)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] quantmod_0.3-15 TTR_0.20-2  xts_0.7-5   zoo_1.6-4
[5] Defaults_1.1-1

loaded via a namespace (and not attached):
[1] grid_2.12.1 lattice_0.19-13 tools_2.12.1


 Sys.getlocale()
[1] LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
States.1252;LC_MONETARY=English_United
States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252



Re: [R] Homogeneity of regression slopes

2010-09-14 Thread Clifford Long
Hi Thomas,

Thanks for the additional information.

Just wondering, and hoping to learn ... would any lack of homogeneity of
variance (which is what I believe you mean by different stddev estimates) be
found when performing standard regression diagnostics, such as residual
plots, Levene's test (or equivalent), etc.?  If so, then would a WLS routine
or some type of variance stabilizing transformation be useful?

Again, hoping to learn.  I'll check out the gls() routine in the nlme
package, as you mentioned.

Thanks.

Cliff


On Mon, Sep 13, 2010 at 10:02 PM, Thomas Stewart tgstew...@gmail.comwrote:

 Allow me to add to Michael's and Clifford's responses.

 If you fit the same regression model for each group, then you are also
 fitting a standard deviation parameter for each model.  The solution
 proposed by Michael and Clifford is a good one, but it assumes
 that the standard deviation parameter is the same for all three models.

 You may want to consider the degree by which the standard deviation
 estimates differ for the three separate models.  If they differ wildly, the
 method described by Michael and Clifford may not be the best.  Rather, you
 may want to consider gls() in the nlme package to explicitly allow the
 variance parameters to vary.

 -tgs

 On Mon, Sep 13, 2010 at 4:52 PM, Doug Adams f...@gmx.com wrote:

  Hello,
 
  We've got a dataset with several variables, one of which we're using
  to split the data into 3 smaller subsets.  (as the variable takes 1 of
  3 possible values).
 
  There are several more variables too, many of which we're using to fit
  regression models using lm.  So I have 3 models fitted (one for each
  subset of course), each having slope estimates for the predictor
  variables.
 
  What we want to find out, though, is whether or not the overall slopes
  for the 3 regression lines are significantly different from each
  other.  Is there a way, in R, to calculate the overall slope of each
  line, and test whether there's homogeneity of regression slopes?  (Am
  I using that phrase in the right context -- comparing the slopes of
  more than one regression line rather than the slopes of the predictors
  within the same fit.)
 
  I hope that makes sense.  We really wanted to see if the predicted
  values at the ends of the 3 regression lines are significantly
  different... But I'm not sure how to do the Johnson-Neyman procedure
  in R, so I think testing for slope differences will suffice!
 
  Thanks to any who may be able to help!
 
  Doug Adams
 
 







Re: [R] Homogeneity of regression slopes

2010-09-13 Thread Clifford Long
If you'll allow me to throw in two cents ...

Like Michael said, the dummy variable route is the way to go, but I believe
that the coefficients on the dummy variables test for equal intercepts.  For
equality of slopes, do we need the interaction between the dummy variable
and the explanatory variable whose slope (coefficient) is of interest?  I'll
add some detail below.


For only two groups, we could use a single 2-level dummy variable D
D = 0 is the reference level (group)
D = 1 is the other level (group)


Equality of intercepts model

y = b0 + b1*x + b2*D

If D = 0, then y = b0 + b1*x
If D = 1, then y = b0 + b1*x + b2  ...  grouping like terms: y = (b0 + b2) + b1*x

If coefficient b2 = 0, then we might fail to reject the null hypothesis
that the intercepts are equal.
If coefficient b2 != 0, then we would reject the null hypothesis that
the intercepts are equal.


Equality of slopes model

y = b0 + b1*x + b2*D + b3*x*D

(we added the interaction between x and D)

If D = 0, then y = b0 + b1*x
If D = 1, then y = b0 + b1*x + b2 + b3*x  ...  grouping like terms: y = (b0 + b2) + (b1 + b3)*x

If coefficient b3 = 0, then we might fail to reject the null hypothesis
that the slopes are equal.
If coefficient b3 != 0, then we would reject the null hypothesis that
the slopes are equal.

For a model with three groups, assuming that lm / glm / etc. would really do
this for you, the explicit dummy variable coding might look like:

            D1   D2
group 1      0    0    (reference level ... can usually choose)
group 2      1    0
group 3      0    1

I believe that this is called a sigma-restricted model (??), as opposed to
an overparameterized model where three groups would have three dummy
variables.
You can probably find this info in most books on basic regression.  This
might be overly simplistic, and I'll happily stand corrected if I've made
any mistakes.
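As a self-contained illustration of the slope test above (simulated
data, three groups, group 3 given a deliberately steeper slope), lm()
builds the dummy and interaction columns automatically from a factor:

```r
set.seed(4)
g <- factor(rep(1:3, each = 30))
x <- runif(90)
y <- 1 + ifelse(g == "3", 2, 1) * x + rnorm(90, sd = 0.2)

fit_full <- lm(y ~ x * g)   # separate intercepts and slopes
fit_null <- lm(y ~ x + g)   # separate intercepts, common slope
anova(fit_null, fit_full)   # overall homogeneity-of-slopes test

# x:g2 and x:g3 are the slope differences relative to group 1
summary(fit_full)$coefficients[c("x:g2", "x:g3"), ]
```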

Otherwise, I hope that this helps.

Cliff




On Mon, Sep 13, 2010 at 7:12 PM, Michael Bedward
michael.bedw...@gmail.comwrote:

 Hello Doug,

 Perhaps it would just be easier to keep your data together and have a
 single regression with a term for the grouping variable (a factor with
 3 levels). If the groups give identical results the coefficients for
 the two non-reference grouping variable levels will include 0 in their
 confidence interval.

 Michael


 On 14 September 2010 06:52, Doug Adams f...@gmx.com wrote:
  Hello,
 
  We've got a dataset with several variables, one of which we're using
  to split the data into 3 smaller subsets.  (as the variable takes 1 of
  3 possible values).
 
  There are several more variables too, many of which we're using to fit
  regression models using lm.  So I have 3 models fitted (one for each
  subset of course), each having slope estimates for the predictor
  variables.
 
  What we want to find out, though, is whether or not the overall slopes
  for the 3 regression lines are significantly different from each
  other.  Is there a way, in R, to calculate the overall slope of each
  line, and test whether there's homogeneity of regression slopes?  (Am
  I using that phrase in the right context -- comparing the slopes of
  more than one regression line rather than the slopes of the predictors
  within the same fit.)
 
  I hope that makes sense.  We really wanted to see if the predicted
  values at the ends of the 3 regression lines are significantly
  different... But I'm not sure how to do the Johnson-Neyman procedure
  in R, so I think testing for slope differences will suffice!
 
  Thanks to any who may be able to help!
 
  Doug Adams
 
 






Re: [R] Simulating points from GLM corresponding to new x-values

2009-08-12 Thread Clifford Long
Would the predict routine (using 'newdata') do what you need?

Cliff Long
Hollister Incorporated



On Wed, Aug 12, 2009 at 4:33 AM, Jacob Nabe-Nielsenjacobn...@me.com wrote:
 Dear List,

 Does anyone know how to simulate data from a GLM object corresponding
 to values of the independent (x) variable that do not occur in the
 original dataset?

 I have tried using simulate(), but it generates a new value of the
 dependent variable corresponding to each of the original x-values,
 which is not what I need. Ideally I would like to simulate new values
 for GLM objects both with family=gaussian and with family=binomial.

 Thanks in advance,
 Jacob

 Jacob Nabe-Nielsen, PhD, MSc
 Scientist
  --
 Section for Climate Effects and System Modelling
 Department of Arctic Environment
 National Environmental Research Institute
 Aarhus University
 Frederiksborgvej 399, Postbox 358
 4000 Roskilde, Denmark

 email: n...@dmu.dk
 fax: +45 4630 1914
 phone: +45 4630 1944







Re: [R] Simulating points from GLM corresponding to new x-values

2009-08-12 Thread Clifford Long
Hi Jacob,

At the risk of embarrassing myself, I gave it a shot.  I'll throw this
out on the list, if for no other reason than perhaps someone with
higher wattage than myself might tear it apart and give you something
both useful and perhaps elegant (from an R coding standpoint).

(see the following ... it just ran for me, so I hope it will for you, too)

If this isn't what you need, I'll shut up and watch and learn from the others.

Regards,

Cliff


I tried to put together something that might work as an example ...
based on a simple linear regression model, but using the GLM routine.

Once the model was created, I used 'predict' based on the model
outcome and the original x values.

I then used 'predict' based on the model outcome and new x values,
along with a function for simulation of the distribution at the new x
values.

At the end, I plotted the original data along with both sets of
simulated values.


#--
# Create sample data set to use with GLM
#   (assume first order linear model for now)
#--
b0 = 10
b1 = 0.3
x = sort(rep(seq(1,11, by=2), 10))

fn.y = function(x1){y1 = b0 + b1*x1 + rnorm(n=1, mean=0, sd=1)}

y = sapply(x, fn.y)


xydata = data.frame(cbind(x, y))


model1 = glm(y ~ x, family = gaussian, data = xydata)


#--
# Generate new x values
#   run new x values through 'predict'
#--

newx = data.frame(x = sort(rep(seq(2,10, by=2), 12)))  # column must be named 'x' to match the model term

y.pred = predict(model1, newx, se.fit = TRUE)


#--
# Generate simulated values based on new x values
#   and function based on outcome of 'predict' routine
#--

fn.pred = function(fit, sefit){rnorm(n=1, mean=fit, sd=sefit*sqrt(60))}

pred.sim = mapply(fn.pred, y.pred$fit, y.pred$se.fit)  # pair each fitted value with its own se.fit


#--
# Generate simulated values based on orig x values
#   using 'simulate' routine
#--

y.sim = simulate(model1, nsim = 1)


#--
# Plot original x, y values
#   then add simulated y values from 'simulate' based on orig x values
#   and then add simulated values from 'predict' and the function
#   based on new x values
#--
plot(x, y)
lines(x, y.sim$sim_1, col='red', type='p')
lines(newx[,1], pred.sim, col='darkblue', type='p')

#--
# END
#--



On Wed, Aug 12, 2009 at 2:51 PM, Jacob Nabe-Nielsenjacobn...@me.com wrote:
 Hi Cliff -- thanks for the suggestion.

 I tried extracting the predicted mean and standard error using predict().
 Afterwards I simulated the dependent variable using rnorm(), with mean and
 standard deviation taken from the predict() function (sd = sqrt(n)*se). The
 points obtained this way were scattered far too much (compared to points
 obtained with simulate()) -- I am not quite sure why.

 Unfortunately the documentation of the simulate() function does not provide
 much information about how it is implemented, which makes it difficult to
 judge which method is best (predict() or simulate()), and it is also unclear
 whether simulate() can be applied to glms (with family=gaussian or
 binomial).

 Any suggestions for how to proceed?

 Jacob


 On 12 Aug 2009, at 13:11, Clifford Long wrote:

 Would the predict routine (using 'newdata') do what you need?

 Cliff Long
 Hollister Incorporated



 On Wed, Aug 12, 2009 at 4:33 AM, Jacob Nabe-Nielsenjacobn...@me.com
 wrote:

 Dear List,

 Does anyone know how to simulate data from a GLM object corresponding
 to values of the independent (x) variable that do not occur in the
 original dataset?

 I have tried using simulate(), but it generates a new value of the
 dependent variable corresponding to each of the original x-values,
 which is not what I need. Ideally I would like to simulate new values
 for GLM objects both with family=gaussian and with family=binomial.

 Thanks in advance,
 Jacob

 Jacob Nabe-Nielsen, PhD, MSc
 Scientist
  --
 Section for Climate Effects and System Modelling
 Department of Arctic Environment
 National Environmental Research Institute
 Aarhus University
 Frederiksborgvej 399, Postbox 358
 4000 Roskilde, Denmark

 email: n...@dmu.dk
 fax: +45 4630 1914
 phone: +45 4630 1944








[R] ROC curve using epicalc (after logistic regression) (re-sent)

2009-07-27 Thread Clifford Long
Dear R-help,

I am resending as I believe I screwed up the e-mail address to R-help
earlier.  Sorry for my lack of attention to detail, and for any
inconvenience.

I have also sent the question to the package maintainer, as suggested
in the posting guide.

Regards,

Cliff



-- Forwarded message --
From: Clifford Long gnolff...@gmail.com
Date: Sun, Jul 26, 2009 at 8:46 PM
Subject: Fwd: ROC curve using epicalc (after logistic regression)
To: cvira...@medicine.psu.ac.th


Dear Virasakdi Chongsuvivatwong,

After sending the message below to the R-help mailing list, it
occurred to me that I probably should also have sent a copy to you,
per R posting guidance.

I would be interested in any thoughts or suggestions that you might
have regarding my difficulty using the ROCR routine in the epicalc
package.  (I've used this before, and find it to be a very helpful
package ... thanks.)

Is my issue related to the way the data is structured for the glm
routine - meaning not with individual cases, but instead by counts
(per DOE treatment) of pass, fail, and total?

Or perhaps I've made another error?

I'll understand if you don't have the time to look this over.  In case
you do, any direction/guidance will be appreciated.

Thank you for your time, and for this excellent package.

Regards,

Cliff Long




-- Forwarded message --
From: Clifford Long gnolff...@gmail.com
Date: Sun, Jul 26, 2009 at 3:52 PM
Subject: ROC curve using epicalc (after logistic regression)
To: R-help@r-project.org


Dear R-help list,

I'm attempting to use the ROC routine from the epicalc package after
performing a logistic regression analysis.  My code is included after
the sessionInfo() result.  The datafile (GasketMelt1.csv) is attached.
 I updated both R and the epicalc packages and tried again before
sending this request.

sessionInfo result:

R version 2.9.1 (2009-06-26)
i386-pc-mingw32

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
States.1252;LC_MONETARY=English_United
States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] splines   stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] caret_4.19      lattice_0.17-25 epicalc_2.9.1.2 survival_2.35-4
[5] foreign_0.8-36

loaded via a namespace (and not attached):
[1] grid_2.9.1  tools_2.9.1


Header information from package 'epicalc':
Package:            epicalc
Version:            2.9.1.2
Date:               2009-07-14


My code ...

#
#  Logistic Regression   (the model result is as expected)
#

dfile = 'GasketMelt1.csv'
gmelt.df = read.csv(dfile, header = TRUE, as.is = TRUE)
names(gmelt.df)

gmelt.df$p = gmelt.df$Pass / gmelt.df$Total

gmelt.glm = glm(p ~ Time + Temperature + Depth
                + Time*Temperature + Time*Depth + Temperature*Depth,
                family = binomial(link = "logit"), data = gmelt.df,
                weights = Total)
summary(gmelt.glm)

#
#  ROC
#
library(epicalc)

lroc(gmelt.glm, graph = TRUE, line.col = "red")


The error message:

 lroc(gmelt.glm, graph = TRUE, line.col = "red")
Error in dimnames(x) <- dn :
 length of 'dimnames' [2] not equal to array extent



Have I overlooked something?


Many thanks to anyone who might have a suggestion.

Cliff
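P.S. A hedged guess at a workaround, in case the aggregated pass/fail
structure is indeed the problem: expand the counts to one 0/1 row per
unit and refit, so the model carries a case-level response. The column
names and counts below are invented stand-ins, and the AUC is computed
by the rank formula to stay package-free:

```r
# Invented stand-in for the aggregated DOE data
gmelt <- data.frame(Time = c(1, 1, 2, 2), Temp = c(150, 170, 150, 170),
                    Pass = c(8, 9, 3, 7), Total = c(10, 10, 10, 10))

# One row per unit, with a 0/1 outcome
expand1 <- gmelt[rep(seq_len(nrow(gmelt)), gmelt$Total), c("Time", "Temp")]
expand1$y <- unlist(mapply(function(p, n) c(rep(1, p), rep(0, n - p)),
                           gmelt$Pass, gmelt$Total))

fit <- glm(y ~ Time + Temp, family = binomial, data = expand1)

# AUC = P(fitted value for a pass exceeds that of a fail), via ranks
p  <- fitted(fit)
n1 <- sum(expand1$y == 1)
n0 <- sum(expand1$y == 0)
auc <- (sum(rank(p)[expand1$y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
auc
```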


[R] ROC curve using epicalc (after logistic regression)

2009-07-26 Thread Clifford Long
Dear R-help list,

I'm attempting to use the ROC routine from the epicalc package after
performing a logistic regression analysis.  My code is included after
the sessionInfo() result.  The datafile (GasketMelt1.csv) is attached.
 I updated both R and the epicalc packages and tried again before
sending this request.

sessionInfo result:

R version 2.9.1 (2009-06-26)
i386-pc-mingw32

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
States.1252;LC_MONETARY=English_United
States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] splines   stats graphics  grDevices utils datasets  methods
[8] base

other attached packages:
[1] caret_4.19  lattice_0.17-25 epicalc_2.9.1.2 survival_2.35-4
[5] foreign_0.8-36

loaded via a namespace (and not attached):
[1] grid_2.9.1  tools_2.9.1


Header information from package 'epicalc':
Package:epicalc
Version:2.9.1.2
Date:   2009-07-14


My code ...

#
#  Logistic Regression   (the model result is as expected)
#

dfile = 'GasketMelt1.csv'
gmelt.df = read.csv(dfile, header = TRUE, as.is = TRUE)
names(gmelt.df)

gmelt.df$p = gmelt.df$Pass / gmelt.df$Total

gmelt.glm = glm(p ~ Time + Temperature + Depth
                + Time*Temperature + Time*Depth + Temperature*Depth,
                family = binomial(link = "logit"), data = gmelt.df,
                weights = Total)
summary(gmelt.glm)

#
#  ROC
#
library(epicalc)

lroc(gmelt.glm, graph = TRUE, line.col = "red")


The error message:

 lroc(gmelt.glm, graph = TRUE, line.col = "red")
Error in dimnames(x) <- dn :
  length of 'dimnames' [2] not equal to array extent



Have I overlooked something?


Many thanks to anyone who might have a suggestion.

Cliff


[R] Fwd: question about using _apply and/or aggregate functions

2009-06-22 Thread Clifford Long
Resending, as I am not sure about the original To: address.  Sorry for
any redundancy.
- Cliff


-- Forwarded message --
From: Clifford Long gnolff...@gmail.com
Date: Mon, Jun 22, 2009 at 11:04 AM
Subject: question about using _apply and/or aggregate functions
To: r-h...@lists.r-project.org


Hi R-list,

I'll apologize in advance for (1) the wordiness of my note (not sure
how to avoid it) and (2) any deficiencies on my part that lead to my
difficulties.

I have an application with several stages that is meant to simulate
and explore different scenarios with respect to product sales (in
units sold per month).  My session info is at the bottom of this note.

The steps include (1) an initial linear regression model, (2) building
an ARIMA model, (3) creating 12 months of simulated 'real' data - for
various values of projected changes in slope from the linear
regression - from the ARIMA model using arima.sim with subsequent
undifferencing and appending to historical data, (4) one-step-ahead
forecasting for 12 months using the 'fixed' ARIMA model and simulated
data, (5) taking the residuals from the forecasting (simulated minus
forecast for each of the 12 months) and applying QC charting, and (6)
counting the number (if any) of runs-rules violations detected.
The simulation runs 100 iterations for each of 50 values of slope.

All of this seems to work fine (although I'm sure that the coding
could be improved through functions, vectorization, etc. ... ).
However, the problem I'm having is at the end where I had hoped that
things would be easier.  I want to summarize and graph the probability
of detecting a runs-rule violation vs. the amount of the shift in
slope (of log unit sales).

The output data array passed from the qcc section at the end includes:
 - the adjustment made to the slope (a multiplier)
 - the actual value of the slope
 - the iteration number of the simulation loop (within each value of slope)
 - the count of QC charting limits violations
 - the count of QC charting runs rules violations


My code is in the attached file (generic_code.R) and my initial sales
data needed to prime the simulation is in the other attached file
(generic_data.csv).  The relevant section of my code is at the
bottom of the .R file after the end of the outer loop.  I've tried to
provide meaningful comments.

I've read the help files for _apply, aggregate, arrays and data types,
and have also consulted several texts (Maindonald and Braun;
Spector; Venables and Ripley for S-PLUS).  Somehow I still don't get
it.  My attempts usually result in a message like the following:

 agg.result = aggregate(simarray.part[,3], by=list4, FUN = mean)
Error in FUN(X[[1L]], ...) : arguments must have same length

But when I check the length of the arguments, they appear to match. (??)

 length(simarray.part[,3])
[1] 5000
 length(simarray.part[,4])
[1] 5000
 length(list4)
[1] 5000
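For reference, a minimal working pattern for aggregate() (invented data
mimicking the simulation output): 'by' must be a list of grouping
vectors, each exactly the same length as the vector being summarized; a
plain vector, or a grouping variable of a different length, triggers
errors like the one above:

```r
set.seed(7)
simarray <- data.frame(slope = rep(c(0.1, 0.2), each = 50),
                       viol  = rbinom(100, 1, 0.3))

agg <- aggregate(simarray$viol,
                 by = list(slope = simarray$slope),  # a list, same length
                 FUN = mean)
agg  # detection proportion per slope value
```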


I would have rather passed along a subset of the simulation/loop
output dataset, but was unsure how to save it to preserve whatever
properties that I may have missed that are causing my difficulties.
If anyone still wants to help at this point, I believe that you'll
need to load the .csv data and run the simulation (maybe reducing the
number of iterations).

Many thanks to anyone who can shed some light on my difficulties
(whether self-induced or otherwise).

Cliff



I'm using a pre-compiled binary version of R for Windows.

Session info:

 sessionInfo()
R version 2.9.0 (2009-04-17)
i386-pc-mingw32

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
States.1252;LC_MONETARY=English_United
States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] qcc_1.3         forecast_1.24   tseries_0.10-18 zoo_1.5-5
[5] quadprog_1.4-11

loaded via a namespace (and not attached):
[1] grid_2.9.0      lattice_0.17-22


 Sys.getlocale()
[1] LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
States.1252;LC_MONETARY=English_United
States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] question about using _apply and/or aggregate functions

2009-06-22 Thread Clifford Long
Hi R-list,

I'll apologize in advance for (1) the wordiness of my note (not sure
how to avoid it) and (2) any deficiencies on my part that lead to my
difficulties.

I have an application with several stages that is meant to simulate
and explore different scenarios with respect to product sales (in
units sold per month).  My session info is at the bottom of this note.

The steps include (1) an initial linear regression model, (2) building
an ARIMA model, (3) creating 12 months of simulated 'real' data - for
various values of projected changes in slope from the linear
regression - from the ARIMA model using arima.sim, with subsequent
undifferencing and appending to historical data, (4) one-step-ahead
forecasting for 12 months using the 'fixed' ARIMA model and simulated
data, (5) taking the residuals from the forecasting (simulated minus
forecast for each of the 12 months) and applying QC charting, and (6)
counting the number (if any) of runs-rules violations detected.

The simulation-aspect calculates 100 simulations for each of 50 values of slope.
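[Editor's note: the ARIMA fit, simulation, and one-step-ahead forecasting steps can be sketched end-to-end on synthetic data. Everything below is an illustrative assumption - the real model, data, and qcc runs-rules counting live in the attached files (generic_code.R, generic_data.csv), and an AR(1) stands in for the actual ARIMA order.]

```r
set.seed(7)

# Fit an AR(1) to synthetic 'historical' log-unit sales (120 months);
# the real series comes from generic_data.csv
hist.sales <- as.numeric(arima.sim(model = list(ar = 0.5), n = 120)) + 10
fit <- arima(hist.sales, order = c(1, 0, 0))
phi <- coef(fit)[["ar1"]]
mu  <- coef(fit)[["intercept"]]

# Simulate 12 new months of 'real' data from the fitted model
new.months <- as.numeric(arima.sim(model = list(ar = phi), n = 12,
                                   sd = sqrt(fit$sigma2))) + mu

# One-step-ahead forecasts with the coefficients held fixed
# (for an AR(1): forecast = mu + phi * (previous value - mu)),
# and the forecast residuals that would feed the QC chart
y    <- c(hist.sales, new.months)
prev <- y[120:131]
fc   <- mu + phi * (prev - mu)
resid12 <- y[121:132] - fc
```

The 12 residuals in resid12 are what the qcc step would then chart and scan for runs-rules violations.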

All of this seems to work fine (although I'm sure that the coding
could be improved through functions, vectorization, etc. ... ).
However, the problem I'm having is at the end where I had hoped that
things would be easier.  I want to summarize and graph the probability
of detecting a runs-rule violation vs. the amount of the shift in
slope (of logunit sales).

The output data array passed from the qcc section at the end includes:
  - the adjustment made to the slope (a multiplier)
  - the actual value of the slope
  - the iteration number of the simulation loop (within each value of slope)
  - the count of QC charting limits violations
  - the count of QC charting runs rules violations


My code is in the attached file (generic_code.R) and my initial sales
data needed to prime the simulation is in the other attached file
(generic_data.csv).  The relevant section of my code is at the
bottom of the .R file after the end of the outer loop.  I've tried to
provide meaningful comments.

I've read the help files for _apply, aggregate, arrays and data types,
and have also consulted several texts (Maindonald and Braun;
Spector; Venables and Ripley for S-PLUS).  Somehow I still don't get
it.  My attempts usually result in a message like the following:

> agg.result = aggregate(simarray.part[,3], by=list4, FUN = mean)
Error in FUN(X[[1L]], ...) : arguments must have same length

But when I check the length of the arguments, they appear to match. (??)

> length(simarray.part[,3])
[1] 5000
> length(simarray.part[,4])
[1] 5000
> length(list4)
[1] 5000
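
[Editor's note: for anyone hitting the same wall, aggregate()'s 'by' argument must be a list of grouping vectors, each as long as the data being summarized. A self-contained sketch with synthetic stand-ins for the simulation output - the column layout and values are assumptions, not the attached data:]

```r
set.seed(1)
# Synthetic stand-ins: 50 slope values x 100 iterations = 5000 rows
slope     <- rep(seq(0.003132, by = 0.003132, length.out = 50), each = 100)
violation <- rbinom(5000, size = 1, prob = 0.2)

# 'by' must be a list of grouping vectors, not a bare vector or a list
# of individual elements
agg.result <- aggregate(violation, by = list(slope = slope), FUN = mean)
head(agg.result)   # one row per slope value; mean violation rate in $x
```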


I would rather have passed along a subset of the simulation/loop
output dataset, but was unsure how to save it in a way that preserves
whatever properties I may have missed that are causing my difficulties.
If anyone still wants to help at this point, I believe that you'll
need to load the .csv data and run the simulation (maybe reducing the
number of iterations).

Many thanks to anyone who can shed some light on my difficulties
(whether self-induced or otherwise).

Cliff



I'm using a pre-compiled binary version of R for Windows.

Session info:

> sessionInfo()
R version 2.9.0 (2009-04-17)
i386-pc-mingw32

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] qcc_1.3         forecast_1.24   tseries_0.10-18 zoo_1.5-5
[5] quadprog_1.4-11

loaded via a namespace (and not attached):
[1] grid_2.9.0      lattice_0.17-22


> Sys.getlocale()
[1] LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] question about using _apply and/or aggregate functions

2009-06-22 Thread Clifford Long
Hi David,

I appreciate the advice.  I had coerced 'list4' with as.list(), but forgot
to wrap the 'by' argument in list() in the call to aggregate().  I made
that correction, and now get the following:

> slope.mult = simarray[,1]
> adj.slope.value = simarray[,2]
> adj.slope.level = simarray[,2]
> qc.run.violation = simarray[,5]
> simarray.part = cbind(slope.mult, adj.slope.value, qc.run.violation,
+                       adj.slope.level)
> list4 = as.list(simarray.part[,4])
> agg.result = aggregate(simarray.part[,3], by=list(list4), FUN = mean)
Error in sort.list(unique.default(x), na.last = TRUE) :
  'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?

... I'm not sure what this means or what I've done wrong.  I did check
'list4' using is.list() and get TRUE back as an answer, so I suspect
that my mistake is some other fundamental aspect of R that I'm failing
to grasp.
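
[Editor's note: the error message names the problem - each grouping variable inside 'by' must be an atomic vector, and as.list() turns the column into a 5000-element list. Passing the column itself, wrapped once in list(), is enough. A tiny illustrative sketch, not the original data:]

```r
x <- c(0.1, 0.2, 0.3, 0.4)
g <- c(1, 1, 2, 2)

# as.list(g) would make a 4-element list -- not a valid grouping variable.
# The grouping vector goes in atomic, wrapped once in list():
agg <- aggregate(x, by = list(group = g), FUN = mean)
agg   # two rows: group means of x
```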

To your note on 'tapply' ... I did try this as well (actually, tried
it first) with no initial success.  On your recommendation, I gave
tapply another go, and get something recognizable:

> vtt = tapply(simarray.part[,3], simarray.part[,2], mean)

> dim(vtt)
[1] 50
> length(vtt)
[1] 50
> vtt[1:5]
0.003132 0.006264 0.009396 0.012528  0.01566
    0.29     0.24     0.23     0.16     0.22
> vtt[1]
0.003132
    0.29
> vtt[[1]][1]
[1] 0.29


I see that the output stored in vtt has one dimension with
length = 50, but each place in vtt appears to hold two values.  I'll
admit that I'm not sure how to access/reference the entirety of all 50
values = 0.29  0.24  0.23  0.16  0.22 (and so on), and I don't appear
to be able to access/reference only what appears to be an embedded index
= 0.003132  0.006264  0.009396  etc.  What am I missing?  Is there
a reference I need to re-read?  I would like to be able to plot
one against the other.
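
[Editor's note: what tapply() returns here is a named vector - the "embedded index" is names(vtt), and the 50 means are the vector's values. A tiny sketch with synthetic numbers echoing the transcript above:]

```r
x <- c(0.29, 0.31, 0.24, 0.26)
g <- c(0.003132, 0.003132, 0.006264, 0.006264)
vtt <- tapply(x, g, mean)

names(vtt)              # the slope values, stored as character names
as.numeric(names(vtt))  # converted back to numeric, e.g. for the x-axis
as.numeric(vtt)         # the group means themselves (here just 2)
# plot(as.numeric(names(vtt)), vtt)   # one against the other
```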

Thanks again for taking the time outside of your day job for your
earlier reply!

Cliff


On Mon, Jun 22, 2009 at 11:28 AM, David Winsemius dwinsem...@comcast.net wrote:

 On Jun 22, 2009, at 12:04 PM, Clifford Long wrote:

 Hi R-list,

 I'll apologize in advance for (1) the wordiness of my note (not sure
 how to avoid it) and (2) any deficiencies on my part that lead to my
 difficulties.

 I have an application with several stages that is meant to simulate
 and explore different scenarios with respect to product sales (in
 units sold per month).  My session info is at the bottom of this note.

 The steps include (1) an initial linear regression model, (2) building
 an ARIMA model, (3) creating 12 months of simulated 'real' data - for
 various values of projected changes in slope from the linear
 regression - from the ARIMA model using arima.sim, with subsequent
 undifferencing and appending to historical data, (4) one-step-ahead
 forecasting for 12 months using the 'fixed' ARIMA model and simulated
 data, (5) taking the residuals from the forecasting (simulated minus
 forecast for each of the 12 months) and applying QC charting, and (6)
 counting the number (if any) of runs-rules violations detected.

 The simulation-aspect calculates 100 simulations for each of 50 values of
 slope.

 All of this seems to work fine (although I'm sure that the coding
 could be improved through functions, vectorization, etc. ... ).
 However, the problem I'm having is at the end where I had hoped that
 things would be easier.  I want to summarize and graph the probability
 of detecting a runs-rule violation vs. the amount of the shift in
 slope (of logunit sales).

 The output data array passed from the qcc section at the end includes:
  - the adjustment made to the slope (a multiplier)
  - the actual value of the slope
  - the iteration number of the simulation loop (within each value of
 slope)
  - the count of QC charting limits violations
  - the count of QC charting runs rules violations


 My code is in the attached file (generic_code.R) and my initial sales
 data needed to prime the simulation is in the other attached file
 (generic_data.csv).  The relevant section of my code is at the
 bottom of the .R file after the end of the outer loop.  I've tried to
 provide meaningful comments.

 I've read the help files for _apply, aggregate, arrays and data types,
 and have also consulted several texts (Maindonald and Braun;
 Spector; Venables and Ripley for S-PLUS).  Somehow I still don't get
 it.  My attempts usually result in a message like the following:

 agg.result = aggregate(simarray.part[,3], by=list4, FUN = mean)

 Error in FUN(X[[1L]], ...) : arguments must have same length

 I cannot comment on the overall strategy, but wondered if this minor mod to
 the code might succeed:

 agg.result = aggregate(simarray.part[,3], by=list(list4), FUN = mean)

 My personal experience with aggregate() is not positive. I generally end up
 turning to tapply()  (which is at the heart of aggregate() anyway) probably
 because I forget to wrap the second argument in a list. Slow learner, I
 guess.



 But when I check the length of the arguments, they appear to match. (??)

 length

Re: [R] question about using _apply and/or aggregate functions

2009-06-22 Thread Clifford Long
David,

Once again, many thanks for your very useful and timely feedback, and
for your patience with my learning curve.

Sincerely,

Cliff



On Mon, Jun 22, 2009 at 7:11 PM, David Winsemius dwinsem...@comcast.net wrote:

 On Jun 22, 2009, at 7:55 PM, David Winsemius wrote:


 On Jun 22, 2009, at 6:16 PM, Clifford Long wrote:

 Hi David,

 I appreciate the advice.  I had coerced 'list4' with as.list(), but forgot
 to wrap the 'by' argument in list() in the call to aggregate().  I made
 that correction, and now get the following:

 slope.mult = simarray[,1]
 adj.slope.value = simarray[,2]
 adj.slope.level = simarray[,2]
 qc.run.violation = simarray[,5]
 simarray.part = cbind(slope.mult, adj.slope.value, qc.run.violation,
 adj.slope.level)
 list4 = as.list(simarray.part[,4])
 agg.result = aggregate(simarray.part[,3], by=list(list4), FUN = mean)

 Error in sort.list(unique.default(x), na.last = TRUE) :
 'x' must be atomic for 'sort.list'
 Have you called 'sort' on a list?

 ... I'm not sure what this means that I've done wrong.  I did check
 'list4' using is.list, and get TRUE back as an answer, so feel
 that my mistake is some other fundamental aspect of R that I'm failing
 to grasp.

 To your note on 'tapply' ... I did try this as well (actually, tried
 it first) with no initial success.  On your recommendation, I gave
 tapply another go, and get something recognizable:

 vtt = tapply(simarray.part[,3], simarray.part[,2], mean)


 snipped other stuff...



 I would like to be able to plot
 one against the other.

 plot(names(vtt), vtt)

 Or perhaps:

 plot(as.numeric(names(vtt)), vtt)

 --

 David Winsemius, MD
 Heritage Laboratories
 West Hartford, CT





Re: [R] p-values for ARIMA coefficients

2009-06-22 Thread Clifford Long
Hi Myriam,

I'll take a stab at it, though I can't offer the elegant solution that
the more experienced R folks might deliver.
I believe that the arima() function provides both point estimates and
standard errors for the coefficients.  You can use these as you would
a mean and standard error in a one-sample t- or z-test, with the
null hypothesis being that the coefficient is zero.  If the
interval estimate of the coefficient doesn't span zero, then you
reject the null hypothesis at whatever level of Type 1 error you
chose.  R's set of distribution functions (pnorm and friends) might be
useful for this.

Having offered that as a possible interim solution, you'd be well
served to see what other advice more experienced R users might offer.
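
[Editor's note: a sketch of that z-test on an arima() fit - a normal-approximation Wald test. The AR(1) series below is simulated, since Myriam's model and data aren't shown; the standard errors come from the fit's var.coef component:]

```r
set.seed(42)
y   <- arima.sim(model = list(ar = 0.6), n = 200)
fit <- arima(y, order = c(1, 0, 0))

est <- coef(fit)                  # point estimates
se  <- sqrt(diag(fit$var.coef))   # standard errors of the coefficients
z   <- est / se                   # z statistics under H0: coefficient = 0
p   <- 2 * pnorm(-abs(z))         # two-sided p-values (normal approximation)

round(cbind(estimate = est, std.err = se, z = z, p.value = p), 4)
```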

Regards,

Cliff



On Mon, Jun 22, 2009 at 6:38 PM, m.gha...@yahoo.fr wrote:
 Hi,

 I'm a beginner using R and I'm modeling a time series with ARIMA.
 I'm looking for a way to determine the p-values of the coefficients of my 
 model.

 Does ARIMA function return these values? or is there a way to determine them 
 easily?

 Thanks for your answer
 Myriam


