Re: [R] if value is in vector, perform this function

2013-03-03 Thread Patrick Burns

I think the command you want is:

if(t %in% feed_days) C_A <- 1.5 else C_A <- 0

Do not confuse `%in%` (which essentially asks
"are the left-hand values in the right-hand
vector?")

with

the `in` of the `for` loop.
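
To make the distinction concrete, here is a minimal sketch using the feed_days vector from the original post; the values tested and the counter variable are only illustrative:

```r
feed_days <- seq(1, 60, by = 3.5)

# %in% is a test: it returns one logical value per left-hand element.
t <- 4.5
is_feed_time <- t %in% feed_days
several <- c(1, 2, 4.5) %in% feed_days

# `in` belongs to the for() syntax: it names the values to iterate over.
count <- 0
for (day in feed_days) {
  count <- count + 1
}

print(is_feed_time)
print(several)
print(count)
```

So `for(t %in% feed_days)` is a syntax error, while `if(t %in% feed_days)` is a valid test when t is a single value.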

By the way,

if(t == TRUE)

is redundant -- better is:

if(t)


Pat


On 02/03/2013 23:57, Louise Stevenson wrote:

Hi,

I'm trying to set up R to run a simulation of two populations in which every 3.5 days,
the initial value of one of the populations is reset to 1.5. I'm simulating an experiment
we did in which we fed Daphnia populations twice a week with algae, so I want the initial
value of the algal population to reset to 1.5 twice a week to simulate that feeding. I've
used for loops and if/else statements before but I can't figure out the syntax for "if t is
in this vector of possible t values, do this command, else, do this command", if that
makes sense. Here's what I have (and it doesn't work):

params = c(1, 0.15, 0.164, 1)
init = c(1.5, 0.05)
t=seq(1,60, by=0.5) #all time values, experiment ran for 60 days

#feeding sequence - every 3.5 days
feed_days = seq(1,60,by=3.5)

Daphnia <- function(t,x,params){
C_D = x[2];
C_A = 0;
for(t %in% feed_days){
if t == TRUE {
C_A = 1.5
}
else{
C_A = 0
 }}
gamma = params[1]; m_D = params[2]; K_q = params[3]; q_max = params[4];
M_D = m_D * C_D
I_A = (C_D * q_max * C_A) / (K_q + C_A)
r_D = gamma * I_A
return(
list(c(
 - I_A,
r_D - M_D
)))
}

library(deSolve)
results <- ode(init, t, Daphnia, params, method = lsoda)


Let me know if there's any other info that would be helpful and thanks very 
much for your help!



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Patrick Burns
pbu...@pburns.seanet.com
twitter: @burnsstat @portfolioprobe
http://www.portfolioprobe.com/blog
http://www.burns-stat.com
(home of:
 'Impatient R'
 'The R Inferno'
 'Tao Te Programming')



Re: [R] if value is in vector, perform this function

2013-03-03 Thread Berend Hasselman


You have been given a correction for the expression for(t %in% feed_days).

But even with that correction things will not work as you seem to want.

The argument t of function Daphnia is the integration time that the ode solver
passes in, and it is almost certainly NOT an element of the vector t defined at the
start of your script. That t is the time sequence for which output is
wanted (see the ode help); it is what is put into the output of ode.
There is no reason to assume that the Daphnia argument t is an element of
feed_days. You can easily check this by inserting a print(t) in Daphnia. So C_A
will be 0 most of the time.

It would certainly help if you named the elements of the init vector and the 
return list of Daphnia.
In Daphnia x[2] is C_D. But what is x[1] (C_A?)?

I think you will have to look at deSolve events but I'm not sure if that is 
possible or required/desired with your model.
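
As a rough illustration of the events route, the sketch below treats the algae as a state variable and resets it on each feeding day. It assumes that x[1] is C_A (the post does not say so), that the parameter names follow the order used in the original function, and that a "replace" event is what is wanted; none of this is confirmed by the original poster.

```r
library(deSolve)

# Assumed naming: the original post used unnamed vectors.
params <- c(gamma = 1, m_D = 0.15, K_q = 0.164, q_max = 1)
init   <- c(C_A = 1.5, C_D = 0.05)
times  <- seq(1, 60, by = 0.5)       # output times
feed_days <- seq(1, 60, by = 3.5)    # feeding times, all contained in `times`

Daphnia <- function(t, x, params) {
  with(as.list(c(x, params)), {
    I_A <- (C_D * q_max * C_A) / (K_q + C_A)   # ingestion of algae
    list(c(C_A = -I_A,                         # algae are only consumed
           C_D = gamma * I_A - m_D * C_D))     # growth minus mortality
  })
}

# One event per feeding day: set C_A back to 1.5.
feeding <- data.frame(var = "C_A", time = feed_days, value = 1.5,
                      method = "replace", stringsAsFactors = FALSE)

results <- ode(y = init, times = times, func = Daphnia, parms = params,
               method = "lsoda", events = list(data = feeding))
head(results)
```

With this layout the derivative function never needs to test t against feed_days; the solver applies the resets itself at the listed times.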

Berend



Re: [R] if value is in vector, perform this function

2013-03-03 Thread Patrick Burns

I forgot to say:

Also do not depend on equality in this situation.
You want to test equality with a tolerance.

See Circle 1 of 'The R Inferno':
http://www.burns-stat.com/documents/books/the-r-inferno/

I also see that 't' is a vector unlike what I was
thinking before, thus you want to use 'ifelse':

C_A <- ifelse(t %in% feed_days, 1.5, 0)

except that still leaves out the tolerance.  If
you are always only going to go by half-days, then
the following should work:

C_A <- ifelse( round(2*t) %in% round(2 * feed_days), 1.5, 0)
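
A small sketch of why exact equality is fragile, and of a more general tolerance test than the half-day rounding trick. The helper is_feed_day and its tol argument are hypothetical names introduced only for this illustration:

```r
# Exact comparison can fail because of floating-point representation.
x <- 0.1 + 0.2
exact_equal <- (x == 0.3)                   # FALSE on IEEE-754 doubles
close_equal <- isTRUE(all.equal(x, 0.3))    # TRUE: comparison with tolerance

# Hypothetical helper: is each element of t within tol of any feeding day?
is_feed_day <- function(t, feed_days, tol = 1e-8) {
  vapply(t, function(ti) any(abs(ti - feed_days) < tol), logical(1))
}

t <- seq(1, 60, by = 0.5)
feed_days <- seq(1, 60, by = 3.5)
C_A <- ifelse(is_feed_day(t, feed_days), 1.5, 0)
table(C_A)
```

Unlike the rounding approach, this does not assume that the times fall on half-days, at the cost of a loop over t.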

Pat



Re: [R] Empirical Bayes Estimator for Poisson-Gamma Parameters

2013-03-03 Thread Ali A. Bromideh
Dear Nicole,

First of all, my sincere thanks for your kind reply. As I told Mr.
Gunter, this is a part of my research and is not homework. However, let me
clarify the problem. Suppose we have received an observation
from a Poisson distribution, i.e. Y_1~Pois(Lam_1), where Lam_1~Gamma(alpha_1,
beta_1). Now, what is the empirical Bayes (EB) estimate of alpha_1 and
beta_1?
Let Y_2~Pois(Lam_2) and Lam_2~Gamma(alpha_2, beta_2). Again, how can we
calculate the EB estimates of alpha_2 and beta_2?

In fact, I read the relevant paper by Robbins at
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC350425/ but it gave 0 for Y_1.
And for Var(Y) < E(Y), it gives negative estimates of alpha/beta, which
should be positive!!
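
For illustration only, here is a method-of-moments sketch of an empirical Bayes fit. It assumes (unlike the setup above) that all Lambda_i share one common Gamma(alpha, beta) prior with shape alpha and rate beta, because the moment equations need several observations that share the same hyperparameters. Under that assumption the marginal mean is alpha/beta and the marginal variance is alpha/beta + alpha/beta^2. The data are the four counts from the first post in this thread.

```r
y <- c(12, 5, 17, 14)

m <- mean(y)   # estimates alpha / beta
v <- var(y)    # estimates alpha / beta + alpha / beta^2

if (v > m) {
  beta_hat  <- m / (v - m)
  alpha_hat <- m * beta_hat
} else {
  # Underdispersed data (Var(Y) <= E(Y)): the moment equations have no
  # positive solution, which is where negative "estimates" come from.
  beta_hat  <- NA_real_
  alpha_hat <- NA_real_
}

# Posterior mean of each Lambda_i under the fitted prior: Gamma(alpha + y, beta + 1).
posterior_mean <- (alpha_hat + y) / (beta_hat + 1)

print(c(alpha = alpha_hat, beta = beta_hat))
print(posterior_mean)
```

The explicit v > m check shows why a sample with Var(Y) < E(Y) cannot be handled by this estimator.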

Any idea? 

Kind regards,
 


-Original Message-
From: Nicole Ford [mailto:nicole.f...@me.com] 
Sent: Sunday, March 03, 2013 4:09 AM
To: Bert Gunter
Cc: Boroumideh-Ali Akbar; r-help@r-project.org
Subject: Re: [R] Empirical Bayes Estimator for Poisson-Gamma Parameters

Also, Kruschke at Indiana has some info on this, both online and on YouTube
(if homework). If not, more info would be helpful.

~n


On Feb 25, 2013, at 9:41 AM, Bert Gunter wrote:

 Homework? We don't do homework here.
 
 If not, search (e.g. via google -- "R hierarchical Bayes" -- or some such).
 
 -- Bert
 
 On Mon, Feb 25, 2013 at 1:39 AM, Ali A. Bromideh a.bromi...@ikco.com wrote:
 Dear Sir/Madam,

 I apologize for any cross-posting. I have a simple question, which I thought
 the R list might help me answer. Suppose we have Y_1, Y_2, ..., Y_n ~
 Poisson(Lambda_i) and Lambda_i ~ Gamma(alpha_i, beta_i).  Empirical Bayes
 estimators for the hyper-parameters of the gamma distribution, i.e.
 (alpha_t, beta_t), are needed.

 y=c(12,5,17,14)
 n=4

 What about hierarchical Bayes estimators?

 Any relevant work and code in R (or S+) is highly appreciated.

 Kind regards,
 Ali

 
 
 
 -- 
 
 Bert Gunter
 Genentech Nonclinical Biostatistics
 
 Internal Contact Info:
 Phone: 467-7374
 Website:

http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biost
atistics/pdb-ncb-home.htm
 


[R] Help searching a matrix for only certain records

2013-03-03 Thread Matt Borkowski
Let me start by saying I am rather new to R and generally consider myself to be 
a novice programmer...so don't assume I know what I'm doing :)

I have a large matrix, approximately 300,000 x 14. It's essentially a 20-year 
dataset of 15-minute data. However, I only need the rows where the column I've 
named REC.TYPE contains the string "SAO  " or "FL-15".

My horribly inefficient solution was to search the matrix row by row, test the 
REC.TYPE column and essentially delete the row if it did not match my criteria. 
Essentially...

 j <- 1
 for (i in 1:nrow(dataset)) {
    if(dataset$REC.TYPE[j] != "SAO  " && dataset$REC.TYPE[j] != "FL-15") {
      dataset <- dataset[-j,]  }
    else {
      j <- j+1  }
 }

After watching my code get through only about 10% of the matrix in an hour and 
slowing with every row...I figure there must be a more efficient way of pulling 
out only the records I need...especially when I need to repeat this for another 
8 datasets. 

Can anyone point me in the right direction?

Thanks!

Matt



[R] Kolmogorov-Smirnov: calculate p value given as input the test statistic

2013-03-03 Thread Rani Elkon
Dear all,

 

I calculate the test statistic for the KS test outside R, and wish to use R
only to calculate the corresponding p-value. 

Is there a way for doing this? (as far as I see,  ks.test() requires raw
data as input). Alternatively, is there a way to provide the ks.test() the
two CDFs (two samples test) rather than the (x, y) data vectors? 

 

Thanks in advance,

Rani 

 

 




Re: [R] Help searching a matrix for only certain records

2013-03-03 Thread jim holtman
Try this:

dataset <- subset(dataset, grepl("(SAO |FL-15)", REC.TYPE))
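
A toy illustration of the same idea on made-up data; the data frame below is invented for the example, and the regular expression is anchored so that "FL-1" is not picked up by accident:

```r
# Invented example data; only REC.TYPE mirrors the original column name.
dataset <- data.frame(
  REC.TYPE = c("SAO  ", "FL-15", "XYZ  ", "SAO  ", "FL-1 "),
  value    = 1:5,
  stringsAsFactors = FALSE
)

# One vectorised test over the whole column replaces the row-by-row loop.
keep <- grepl("^(SAO|FL-15)", dataset$REC.TYPE)
kept <- dataset[keep, ]
print(kept)
```

Because the test is computed once for all rows, the data frame is copied a single time rather than once per deleted row.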





-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.



Re: [R] Kolmogorov-Smirnov: calculate p value given as input the test statistic

2013-03-03 Thread Rui Barradas

Hello,

You can compute the p-value from the test statistic if you know the
sample sizes. R calls functions written in C for the several cases.
For the two-sample case, this is the code (edited):


n.x <- 100  # length of 1st sample
n.y <- 100  # length of 2nd sample
STATISTIC <- 1.23

# psmirnov2x is an internal C routine of the stats package
PVAL <- 1 - .C("psmirnov2x",
    p = as.double(STATISTIC),
    as.integer(n.x),
    as.integer(n.y))$p
PVAL <- min(1.0, max(0.0, PVAL))


For the other cases check the source, file stats/ks.test.R.
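
If only the large-sample (asymptotic) two-sided p-value is needed, it can be sketched without touching R's internals by summing the Kolmogorov limiting series directly. The function name ks_pvalue_asymptotic and its terms argument are made up for this sketch; exact small-sample p-values are not covered.

```r
# Asymptotic two-sided p-value for a two-sample KS statistic D:
#   p = 2 * sum_{k >= 1} (-1)^(k - 1) * exp(-2 * k^2 * z^2),
#   z = D * sqrt(n.x * n.y / (n.x + n.y))
ks_pvalue_asymptotic <- function(D, n.x, n.y, terms = 100) {
  z <- D * sqrt(n.x * n.y / (n.x + n.y))
  # For small z the series converges slowly, and the p-value already
  # differs from 1 by less than 1e-10, so return 1 directly.
  if (z < 0.2) return(1)
  k <- seq_len(terms)
  p <- 2 * sum((-1)^(k - 1) * exp(-2 * k^2 * z^2))
  min(1, max(0, p))
}

# Check against ks.test() on simulated data.
set.seed(1)
x <- rnorm(100)
y <- rnorm(100, mean = 0.3)
D <- unname(ks.test(x, y)$statistic)
p_from_D <- ks_pvalue_asymptotic(D, length(x), length(y))
p_from_R <- ks.test(x, y, exact = FALSE)$p.value
print(c(p_from_D, p_from_R))
```

The comparison at the end is only a sanity check that the hand-computed value agrees with ks.test(..., exact = FALSE).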

As for the second question, I believe the answer is no, you must provide
at least one sample and a CDF. Something like


x <- rnorm(100)
f <- ecdf(rnorm(100))

ks.test(x, f)


Hope this helps,

Rui Barradas



Re: [R] caret pls model statistics

2013-03-03 Thread Charles Determan Jr
Thank you for your response, Max.  Is there some literature on which you base
that statement?  I am confused, as I have seen many publications that
report R^2 and Q^2 following PLSDA analysis.  The analysis usually is meant to
discriminate groups (i.e. classification).  Are these papers incorrect in
using these statistics?
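
For reference, the two statistics under discussion can be written out on a plain regression. The sketch below uses simulated data and lm() rather than PLS, and a 10-fold split chosen only for illustration; it assumes the usual definitions R^2 = 1 - RSS/TSS and Q^2 = 1 - PRESS/TSS, where PRESS is the sum of squared cross-validated prediction errors.

```r
set.seed(1)
dat <- data.frame(x = rnorm(50))
dat$y <- 2 * dat$x + rnorm(50)

tss <- sum((dat$y - mean(dat$y))^2)

# R^2: fit and evaluate on the same data.
fit <- lm(y ~ x, data = dat)
r2 <- 1 - sum(residuals(fit)^2) / tss

# Q^2: each observation is predicted by a model that did not see it.
folds <- sample(rep(1:10, length.out = nrow(dat)))
press <- 0
for (k in 1:10) {
  held_out <- folds == k
  fit_k <- lm(y ~ x, data = dat[!held_out, ])
  pred_k <- predict(fit_k, newdata = dat[held_out, ])
  press <- press + sum((dat$y[held_out] - pred_k)^2)
}
q2 <- 1 - press / tss

print(c(R2 = r2, Q2 = q2))
```

Both quantities measure squared error on a numeric response, which is why their meaning becomes unclear once class labels are recoded as numbers.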

Regards,
Charles

On Sat, Mar 2, 2013 at 10:39 PM, Max Kuhn mxk...@gmail.com wrote:

 Charles,

 You should not be treating the classes as numeric (is virginica really
 three times setosa?). Q^2 and/or R^2 are not appropriate for classification.
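
 A quick base-R look at what the numeric conversion does to the iris
 classes (illustrative only; the variable names are made up):

```r
data(iris)

# as.numeric() on a factor returns the internal level codes 1, 2, 3,
# which imposes an ordering and equal spacing on the classes.
codes <- as.numeric(iris$Species)
print(table(iris$Species, codes))

# Keeping the outcome as a factor preserves it as a set of labels.
species <- iris$Species
print(levels(species))
```

 The codes are an artefact of the alphabetical level order, not a
 measurement.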

 Max


 On Sat, Mar 2, 2013 at 5:21 PM, Charles Determan Jr deter...@umn.edu wrote:

 I have discovered one of my errors.  The timematrix was unnecessary and an
 unfortunate habit I brought from another package.  The following provides
 the same R2 values as it should, however, I still don't know how to
 retrieve Q2 values.  Any insight would again be appreciated:

 library(caret)
 library(pls)

 data(iris)

 #needed to convert to numeric in order to do regression
 #I don't fully understand this but if I left as a factor I would get an
 #error following the summary function
 iris$Species=as.numeric(iris$Species)
 inTrain1=createDataPartition(y=iris$Species,
 p=.75,
 list=FALSE)

 training1=iris[inTrain1,]
 testing1=iris[-inTrain1,]

 ctrl1=trainControl(method="cv",
 number=10)

 plsFit2=train(Species~.,
 data=training1,
 method="pls",
 trControl=ctrl1,
 metric="Rsquared",
 preProc=c("scale"))

 data(iris)
 training1=iris[inTrain1,]
 datvars=training1[,1:4]
 dat.sc=scale(datvars)

 pls.dat=plsr(as.numeric(training1$Species)~dat.sc,
 ncomp=3, method="oscorespls", data=training1)

 x=crossval(pls.dat, segments=10)

 summary(x)
 summary(plsFit2)

 Regards,
 Charles

 On Sat, Mar 2, 2013 at 3:55 PM, Charles Determan Jr deter...@umn.edu
 wrote:

  Greetings,
 
  I have been exploring the use of the caret package to conduct some plsda
  modeling.  Previously, I have come across methods that result in an R2 and
  Q2 for the model.  Using the 'iris' data set, I wanted to see if I could
  accomplish this with the caret package.  I use the following code:

  library(caret)
  data(iris)

  #needed to convert to numeric in order to do regression
  #I don't fully understand this but if I left as a factor I would get an
  #error following the summary function
  iris$Species=as.numeric(iris$Species)
  inTrain1=createDataPartition(y=iris$Species,
  p=.75,
  list=FALSE)

  training1=iris[inTrain1,]
  testing1=iris[-inTrain1,]

  ctrl1=trainControl(method="cv",
  number=10)

  plsFit2=train(Species~.,
  data=training1,
  method="pls",
  trControl=ctrl1,
  metric="Rsquared",
  preProc=c("scale"))

  data(iris)
  training1=iris[inTrain1,]
  datvars=training1[,1:4]
  dat.sc=scale(datvars)

  n=nrow(dat.sc)
  dat.indices=seq(1,n)

  timematrix=with(training1,
  classvec2classmat(Species[dat.indices]))

  pls.dat=plsr(timematrix ~ dat.sc,
  ncomp=3, method="oscorespls", data=training1)

  x=crossval(pls.dat, segments=10)

  summary(x)
  summary(plsFit2)

  I see two different R2 values and I cannot figure out how to get the Q2
  value.  Any insight as to what my errors may be would be appreciated.
  Regards,
 
  --
  Charles
 



 --
 Charles Determan
 Integrated Biosciences PhD Student
 University of Minnesota





 --

 Max




-- 
Charles Determan
Integrated Biosciences PhD Student
University of Minnesota



Re: [R] Errors-In-Variables in R

2013-03-03 Thread John Fox
Dear Cedric,

If I understand correctly what you want to do, and if you're willing to
assume that the variables are normally distributed, then you should be able
to use any of the latent-variable structural-equation-modeling packages in
R, such as sem, OpenMx, or lavaan.

Here's an artificial example using the sem package:

--- snip ---

> set.seed(12345)
> zeta <- rnorm(1000)
> y <- 1 + 2*zeta + rnorm(1000, 0, 1)
> x <- zeta + rnorm(1000)
> plot(x, y)
> Data <- data.frame(x, y)
> summary(lm(y ~ x)) # biased

Call:
lm(formula = y ~ x)

Residuals:
    Min      1Q  Median      3Q     Max 
-6.6339 -1.1406  0.0299  1.1573  6.5652 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  1.04007    0.05514   18.86   <2e-16 ***
x            1.06089    0.04012   26.44   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.743 on 998 degrees of freedom
Multiple R-squared:  0.4119,    Adjusted R-squared:  0.4113 
F-statistic: 699.1 on 1 and 998 DF,  p-value: < 2.2e-16

> plot(x, y) # not shown

> library(sem)

> eqns <- specifyEquations()
1: y = alpha*Intercept + beta*zeta
2: x = 1*zeta
3: V(y) = sigma
4: V(x) = 1
5: V(zeta) = phi
6: 
Read 5 items

> model <- sem(eqns, data=Data, raw=TRUE, fixed.x="Intercept")
> summary(model)

Model fit to raw moment matrix.

 Model Chisquare =  0.2264654   Df =  1 Pr(>Chisq) = 0.6341572
 AIC =  8.226465
 BIC =  -6.68129

 Normalized Residuals
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.0000  0.1635  0.1711  0.2189  0.2564  0.4759 

 Parameter Estimates
      Estimate  Std Error  z value   Pr(>|z|)                     
alpha 1.0400668 0.05507397 18.884905 1.518098e-79 y <--- Intercept
beta  2.2553406 0.14197058 15.885971 7.926103e-57 y <--- zeta     
sigma 0.6404697 0.25612060  2.500657 1.239632e-02 y <--> y        
phi   0.8881856 0.08444223 10.518263 7.117323e-26 zeta <--> zeta  

 Iterations =  15 

> library(car)
> linearHypothesis(model, c("alpha = 1", "beta = 2", "sigma = 1", "phi = 1")) # true parameter values
Linear hypothesis test

Hypothesis:
alpha = 1
beta = 2
sigma = 1
phi = 1

Model 1: restricted model
Model 2: model

  Res.Df Df  Chisq Pr(>Chisq)
1      5                     
2      1  4 3.8285     0.4297

--- snip ---

For other distributional assumptions, you'd have to write your own objective
function but you may still be able to use sem or one of the other SEM
packages.
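
As a cross-check that needs no SEM package, the same simulated data can be put through the classical method-of-moments correction for attenuation. This sketch assumes, as in the model above, that the measurement-error variance of x is known to be 1; the variable names are made up for the illustration.

```r
set.seed(12345)
zeta <- rnorm(1000)                 # latent "true" predictor
y <- 1 + 2 * zeta + rnorm(1000)     # response
x <- zeta + rnorm(1000)             # predictor observed with error

sigma2_x <- 1   # assumed known error variance of x

# Naive least-squares slope, biased towards zero.
beta_naive <- cov(x, y) / var(x)

# Moment correction: remove the error variance from var(x).
beta_corrected  <- cov(x, y) / (var(x) - sigma2_x)
alpha_corrected <- mean(y) - beta_corrected * mean(x)

print(c(naive = beta_naive, corrected = beta_corrected,
        intercept = alpha_corrected))
```

The corrected slope is the sample covariance divided by the estimated variance of the latent variable, which is the same ratio that beta and phi encode in the SEM fit.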

I hope this helps,
 John

---
John Fox
Senator McMaster Professor of Social Statistics
Department of Sociology
McMaster University
Hamilton, Ontario, Canada



 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of Cedric Sodhi
 Sent: Saturday, March 02, 2013 4:56 PM
 To: Rui Barradas
 Cc: r-help@r-project.org
 Subject: Re: [R] Errors-In-Variables in R
 
 Perhaps it would have been clearer that this is not homework if I
 hadn't forgotten to say what [1] is. Sorry for that.
 
 [1] https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=15225
 
 (This is not homework but genuinely addresses the problem that R, to my
 knowledge, does not have models for errors in variables)
 
 
 On Sat, Mar 02, 2013 at 09:34:21PM +, Rui Barradas wrote:
  There's a no homework policy in R-help.
 
  Rui Barradas
 
  Em 02-03-2013 18:28, Cedric Sodhi escreveu:
   In reference to [1], how would you solve the following regression
   problem:
  
   Given observations (X_i,Y_i) with known respective error distributions
   (e_X_i,e_Y_i) (say, 0-mean Gaussian with known STD), find the parameters
   a and b which maximize the likelihood of
  
   Y = a*X + b
  
   Taking the example further, how many of the very simplified assumptions
   from the above example can be lifted or eased and R still has a method
   for finding an errors-in-variables fit?
  
  
 


Re: [R] Kolmogorov-Smirnov: calculate p value given as input the test statistic

2013-03-03 Thread Prof Brian Ripley

On 03/03/2013 09:58, Rani Elkon wrote:

Dear all,



I calculate the test statistic for the KS test outside R, and wish to use R
only to calculate the corresponding p-value.


There is no public way to do this in R.  But you can read the code of 
ks.test and see how it does it, and extract the code you need.


Note that ks.test covers several cases and hence has several branches of
code to compute p values.  Also (and this is one good reason why there
is no public interface), the internal code differs by version of R (so
another answer I have just seen is wrong for pre-3.0.0).



Is there a way for doing this? (as far as I see,  ks.test() requires raw
data as input). Alternatively, is there a way to provide the ks.test() the
two CDFs (two samples test) rather than the (x, y) data vectors?


Yes, because if you have the CDF you can recover the sorted data vector 
which is all you need.
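
A sketch of that recovery for an empirical CDF given as an R step function. It assumes the sample size n is known alongside the CDF; recover_sample is a made-up helper name.

```r
# Rebuild the sorted sample from an empirical CDF and the sample size.
recover_sample <- function(Fn, n) {
  k <- knots(Fn)                           # distinct observed values, sorted
  counts <- round(diff(c(0, Fn(k))) * n)   # jump height * n = multiplicity
  rep(k, times = counts)
}

x <- c(0.3, 1.2, 1.2, 2.5, 4.0)
y <- c(0.1, 0.9, 1.5, 3.3)

x_rec <- recover_sample(ecdf(x), length(x))
y_rec <- recover_sample(ecdf(y), length(y))
print(x_rec)
print(y_rec)
```

The recovered vectors can then be passed to ks.test(x_rec, y_rec) in the usual way.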






Thanks in advance,

Rani


--
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



Re: [R] Help searching a matrix for only certain records

2013-03-03 Thread arun
Hi,
Try this:
set.seed(51)
mat1 <- as.matrix(data.frame(REC.TYPE=
  sample(c("SAO","FAO","FL-1","FL-2","FL-15"),20,replace=TRUE),
  Col2=rnorm(20),Col3=runif(20),stringsAsFactors=FALSE))
dat1 <- as.data.frame(mat1,stringsAsFactors=FALSE)

dat1[grepl("SAO|FL-15",dat1$REC.TYPE),]
#   REC.TYPE    Col2   Col3
#4 FL-15 -1.31594143 0.41193183
#6 FL-15  0.43419586 0.96004780
#9 FL-15 -0.90690732 0.84000657
#10  SAO  0.21363265 0.20155142
#13  SAO -0.55566727 0.71606558
#15  SAO -0.71533068 0.90851364
#17  SAO  1.58611036 0.97475674
#20  SAO -0.42904914 0.33710578
A.K.





[R] Random Sample with constraints

2013-03-03 Thread Angelo Scozzarella Tiscali
Dear R friends,

I'd like to generate a random sample (variable size and range) without a
specified distribution but with a given mean and standard deviation.

Could you help me?

thanks in advance

Angelo



Re: [R] Random Sample with constraints

2013-03-03 Thread Ben Bolker
Angelo Scozzarella Tiscali angeloscozzarella at tiscali.it writes:

 
 Dear R friends,
 
 I'd like to generate random sample (variable size and range) without a
 specified distribution but with given mean and standard deviation.
 
 Could you help me?
 

   The problem is underspecified, so no, we can't.

   Any random sample will by definition be a sample from _some_
distribution.

  If you give more context someone might be able to help you with
a solution.

  Ben Bolker



Re: [R] Help searching a matrix for only certain records

2013-03-03 Thread Jim Holtman
There are far more efficient ways of doing many of these operations, but you
probably won't see any difference unless you have very large objects (several
hundred thousand entries), or have to do it many times.  My background is
in computer performance and for the most part I have found that the
easiest/most straightforward ways are fine most of the time.

a more efficient way might be:

testdata <- testdata[!is.na(match(testdata$REC.TYPE, c('SAO ', 'FL-15'))), ]

you can always use 'system.time' to determine how long actions take.

for multiple comparisons use %in%
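
A small sketch of the %in% form on made-up data. Note that match(x, table) returns, for each element of x, the position of its first match in table, so the values to keep belong in the second argument when the goal is to select every matching row.

```r
# Invented example data; only REC.TYPE mirrors the original column name.
testdata <- data.frame(
  REC.TYPE = c("SAO ", "FL-15", "XYZ", "SAO ", "FL-15"),
  value    = 1:5,
  stringsAsFactors = FALSE
)
wanted <- c("SAO ", "FL-15")

# All rows whose REC.TYPE is one of the wanted values.
by_in    <- testdata[testdata$REC.TYPE %in% wanted, ]
by_match <- testdata[!is.na(match(testdata$REC.TYPE, wanted)), ]

# With the arguments the other way round, only the first hit per value is found.
first_only <- match(wanted, testdata$REC.TYPE)

print(by_in)
print(first_only)
print(system.time(testdata[testdata$REC.TYPE %in% wanted, ]))
```

The two selections are equivalent; %in% is simply the more readable spelling.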

Sent from my iPad

On Mar 3, 2013, at 9:22, Matt Borkowski mathias1...@yahoo.com wrote:

 Thank you for your response Jim! I will give this one a try! But a couple 
 followup questions...
 
 In my search for a solution, I had seen something stating match() is much 
 more efficient than subset() and will cut down significantly on computing 
 time. Is there any truth to that?
 
 Also, I found the following solution which works for matching a single
 condition, but I couldn't quite figure out how to modify it to search for
 both my acceptable conditions...
 
 testdata <- testdata[testdata$REC.TYPE == "SAO",,drop=FALSE]
 
 -Matt
 
 
 
 


Re: [R] Random Sample with constraints

2013-03-03 Thread Angelo Scozzarella Tiscali
For example, I want to simulate different populations with the same mean and 
standard deviation but different distributions.


On 03 Mar 2013, at 17:14, Angelo Scozzarella Tiscali wrote:

 Dear R friends,
 
 I'd like to generate a random sample (of variable size and range) without a 
 specified distribution but with a given mean and standard deviation.
 
 Could you help me?
 
 thanks in advance
 
 Angelo
 


Re: [R] Help searching a matrix for only certain records

2013-03-03 Thread jim holtman
If you are using matrices, then here are several ways of doing it for
size 300,000.  You can determine if the difference of 0.1 seconds is
important in terms of the performance you are after.  It is taking you
more time to type in the statements than it is taking them to execute:

> n <- 300000
> testdata <- matrix(
+     sample(c("SAO ", "FL-15", "Other"), n, TRUE, prob = c(1, 2, 1000))
+     , nrow = n
+     , dimnames = list(NULL, "REC.TYPE")
+     )
> table(testdata[, "REC.TYPE"])

 FL-15  Other   SAO 
   562 299151   287
> system.time(x1 <- subset(testdata, grepl("(SAO |FL-15)", testdata[, "REC.TYPE"])))
   user  system elapsed
   0.17    0.00    0.17
> system.time(x2 <- subset(testdata, testdata[, "REC.TYPE"] %in% c("SAO ", "FL-15")))
   user  system elapsed
   0.05    0.00    0.05
> system.time(x3 <- testdata[match(testdata[, "REC.TYPE"]
+                             , c("SAO ", "FL-15")
+                             , nomatch = 0) != 0
+                             ,, drop = FALSE]
+             )
   user  system elapsed
   0.03    0.00    0.03
> identical(x1, x2)
[1] TRUE
> identical(x2, x3)
[1] TRUE
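
The %in% and match() versions agree because, in base R, x %in% table is defined as match(x, table, nomatch = 0L) > 0L. The grepl() version agrees here only because no other record code contains "SAO " or "FL-15" as a substring. The sketch below illustrates both points; the code "FL-150" is invented purely to show the substring pitfall.

```r
# "FL-150" is a made-up code, included only to show partial matching
x      <- c("SAO ", "FL-15", "Other", "FL-150", "SAO ")
wanted <- c("SAO ", "FL-15")

via_in    <- x %in% wanted
via_match <- match(x, wanted, nomatch = 0L) > 0L

# Unanchored pattern: matches anywhere in the string, so "FL-150" is kept
via_grepl <- grepl("(SAO |FL-15)", x)

# Anchoring the pattern restores exact matching
via_anchored <- grepl("^(SAO |FL-15)$", x)
```

With exact codes, %in% is therefore the safer default; a regular expression is only needed when partial matches are actually wanted.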



On Sun, Mar 3, 2013 at 11:22 AM, Jim Holtman jholt...@gmail.com wrote:
 there are way more efficient ways of doing many of the operations, but you 
 probably won't see any differences unless you have very large objects 
 (several hundred thousand entries), or have to do it a lot of times.  My 
 background is in computer performance and for the most part I have found that 
 the easiest/most straightforward ways are fine most of the time.

 a more efficient way might be:

 testdata <- testdata[match(testdata$REC.TYPE, c('SAO ', 'FL-15'), nomatch = 0) != 0, ]

 you can always use 'system.time' to determine how long actions take.

 for multiple comparisons use %in%

 Sent from my iPad


-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.



Re: [R] Random Sample with constraints

2013-03-03 Thread Ted Harding
On 03-Mar-2013 16:29:05 Angelo Scozzarella Tiscali wrote:
 For example, I want to simulate different populations with the same mean and
 standard deviation but different distributions.
 
 On 03 Mar 2013, at 17:14, Angelo Scozzarella Tiscali wrote:
 Dear R friends,
 
 I'd like to generate a random sample (of variable size and range) without a
 specified distribution but with a given mean and standard deviation.
 
 Could you help me?
 
 thanks in advance
 Angelo

As Ben Bolker said, any random sample must come from some distribution,
so you cannot generate one without specifying some distribution.

Insofar as your question can be interpreted, it will be satisfied
if, given the desired mean, M, and SD, S, you take any two available
distributions with, respectively, known means M1 and M2 and known
SDs S1 and S2. Let X1 denote a sample from the first, and X2 a
sample from the second.

Then M + S*(X1 - M1)/S1 is a sample from the first distribution
re-scaled to have mean M and SD S, as required.

Similarly, M + S*(X2 - M2)/S2 is a sample from the second distribution
re-scaled to have mean M and SD S, as required.

As for the first distribution that you sample from, and the second,
that is your own choice -- for example, the first could be
the Standard Normal (M1 = 0, S1 = 1); use rnorm().
The second could be the uniform on (0,1) (M2 = 0.5, S2 = 1/sqrt(12));
use runif().

Similar for other arbitrary choices of first and second distribution
(so long as each has at least a second moment, hence excluding, for
example, the Cauchy distribution).
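
A minimal sketch of this re-scaling, assuming the affine map M + S*(X - M1)/S1, which shifts a distribution with mean M1 and SD S1 to mean M and SD S. The target values, the sample size, the seed and the helper name rescale() are all illustrative choices, not part of the original post.

```r
set.seed(1)   # arbitrary seed, only for reproducibility
M <- 10       # desired mean (illustrative)
S <- 2        # desired SD (illustrative)
n <- 1e5      # sample size (illustrative)

# Hypothetical helper: map a sample with known mean/SD onto the target mean/SD
rescale <- function(x, m_from, s_from, m_to, s_to) {
  m_to + s_to * (x - m_from) / s_from
}

# Standard Normal: M1 = 0, S1 = 1
y1 <- rescale(rnorm(n), 0, 1, M, S)

# Uniform on (0,1): M2 = 0.5, S2 = 1/sqrt(12)
y2 <- rescale(runif(n), 0.5, 1 / sqrt(12), M, S)

# Sample moments are close to, but not exactly, M and S
c(mean(y1), sd(y1))
c(mean(y2), sd(y2))
```

The two samples share the same population mean and SD but keep their different shapes: y1 is bell-shaped and unbounded, while y2 is flat and confined to M plus or minus sqrt(3)*S.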

That's about as far as one can go with your question!

Hoping it helps, however.
Ted.

-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 03-Mar-2013  Time: 17:12:50
This message was sent by XFMail



Re: [R] Help searching a matrix for only certain records

2013-03-03 Thread arun
HI,
You could also use ?data.table() 

n <- 300000
set.seed(51)
mat1 <- as.matrix(data.frame(REC.TYPE = sample(c("SAO", "FAO", "FL-1", "FL-2", "FL-15"), n, replace = TRUE),
                             Col2 = rnorm(n), Col3 = runif(n), stringsAsFactors = FALSE))
dat1 <- as.data.frame(mat1, stringsAsFactors = FALSE)
table(mat1[, 1])
#
#  FAO  FL-1 FL-15  FL-2   SAO
#60046 60272 59669 59878 60135
system.time(x1 <- subset(mat1, grepl("(SAO|FL-15)", mat1[, "REC.TYPE"])))
#   user  system elapsed
#  0.076   0.004   0.082
system.time(x2 <- subset(mat1, mat1[, "REC.TYPE"] %in% c("SAO", "FL-15")))
#   user  system elapsed
#  0.028   0.000   0.030

system.time(x3 <- mat1[match(mat1[, "REC.TYPE"]
    , c("SAO", "FL-15")
    , nomatch = 0) != 0
    ,, drop = FALSE]
    )
#   user  system elapsed
#  0.028   0.000   0.028
table(x3[, 1])
#
#FL-15   SAO
#59669 60135


library(data.table)

dat2 <- data.table(dat1)
system.time(x4 <- dat2[match(REC.TYPE, c("SAO", "FL-15"), nomatch = 0) != 0, , drop = FALSE])
#   user  system elapsed
#  0.024   0.000   0.025
table(x4$REC.TYPE)

#FL-15   SAO
#59669 60135
A.K.









Re: [R] Help searching a matrix for only certain records

2013-03-03 Thread Matt Borkowski
Thank you for your response Jim! I will give this one a try! But a couple 
followup questions...

In my search for a solution, I had seen something stating match() is much more 
efficient than subset() and will cut down significantly on computing time. Is 
there any truth to that?

Also, I found the following solution which works for matching a single 
condition, but I couldn't quite figure out how to modify it to search for 
both my acceptable conditions...

 testdata <- testdata[testdata$REC.TYPE == "SAO", , drop = FALSE]

-Matt




--- On Sun, 3/3/13, jim holtman jholt...@gmail.com wrote:

From: jim holtman jholt...@gmail.com
Subject: Re: [R] Help searching a matrix for only certain records
To: Matt Borkowski mathias1...@yahoo.com
Cc: r-help@r-project.org
Date: Sunday, March 3, 2013, 8:00 AM

Try this:

dataset <- subset(dataset, grepl("(SAO |FL-15)", REC.TYPE))


On Sun, Mar 3, 2013 at 1:11 AM, Matt Borkowski mathias1...@yahoo.com wrote:
 Let me start by saying I am rather new to R and generally consider myself to 
 be a novice programmer...so don't assume I know what I'm doing :)

 I have a large matrix, approximately 300,000 x 14. It's essentially a 20-year 
 dataset of 15-minute data. However, I only need the rows where the column 
 I've named REC.TYPE contains the string "SAO " or "FL-15".

 My horribly inefficient solution was to search the matrix row by row, test 
 the REC.TYPE column and essentially delete the row if it did not match my 
 criteria. Essentially...

 j <- 1
 for (i in 1:nrow(dataset)) {
    if (dataset$REC.TYPE[j] != "SAO " & dataset$REC.TYPE[j] != "FL-15") {
      dataset <- dataset[-j, ]  }
    else {
      j <- j + 1  }
 }

 After watching my code get through only about 10% of the matrix in an hour 
 and slowing with every row...I figure there must be a more efficient way of 
 pulling out only the records I need...especially when I need to repeat this 
 for another 8 datasets.
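
 One way to repeat the same filter over several datasets is to wrap it in a small function and apply it to a list. This is only a sketch: the helper name keep_rec_types(), the list element names and the two tiny data frames are made up for illustration, and it assumes each dataset is a data frame with a REC.TYPE column.

```r
# Hypothetical helper: keep only rows whose REC.TYPE is one of the wanted codes
keep_rec_types <- function(dataset, wanted = c("SAO ", "FL-15")) {
  dataset[dataset$REC.TYPE %in% wanted, , drop = FALSE]
}

# Two made-up data frames standing in for the real datasets
datasets <- list(
  site_a = data.frame(REC.TYPE = c("SAO ", "Other", "FL-15"),
                      temp = c(1.5, 2.5, 3.5),
                      stringsAsFactors = FALSE),
  site_b = data.frame(REC.TYPE = c("Other", "Other", "SAO "),
                      temp = c(4.5, 5.5, 6.5),
                      stringsAsFactors = FALSE)
)

# Apply the same filter to every dataset in one call
filtered <- lapply(datasets, keep_rec_types)

# Number of rows kept per dataset
sapply(filtered, nrow)
```

 Because the membership test is vectorised, no row is ever deleted one at a time, which is what made the original loop slow down as it went.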

 Can anyone point me in the right direction?

 Thanks!

 Matt




-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.




[R] distribution functions and lists

2013-03-03 Thread Oleguer Plana Ripoll
Hello everyone,

I have a quick question but I am stuck with it and I do not know how to solve 
it.

Imagine I need the distribution function of a Weibull(1,1) at t=3, then I will 
write pweibull(3,1,1).

I want to keep the shape and scale parameters in a list (or a vector or 
whatever). Then I have
parameters <- list(shape=1, scale=1)
but when I write pweibull(3,parameters) I get the following error:
Error in pweibull(q, shape, scale, lower.tail, log.p) : 
  Non-numeric argument to mathematical function

I have to write pweibull(3,parameters[[1]],parameters[[2]]) but I am very 
interested in being able to write pweibull(3,parameters).

Does anyone know how to solve it?

Thank you very much,

Oleguer Plana


Re: [R] distribution functions and lists

2013-03-03 Thread Milan Bouchet-Valat
On Sunday 03 March 2013 at 19:49 +0100, Oleguer Plana Ripoll wrote:
 Hello everyone,
 
 I have a quick question but I am stuck with it and I do not know how
 to solve it.
 
 Imagine I need the distribution function of a Weibull(1,1) at t=3,
 then I will write pweibull(3,1,1).
 
 I want to keep the shape and scale parameters in a list (or a vector
 or whatever). Then I have
 parameters <- list(shape=1, scale=1)
 but when I write pweibull(3,parameters) I get the following error:
 Error in pweibull(q, shape, scale, lower.tail, log.p) : 
   Non-numeric argument to mathematical function
 
 I have to write pweibull(3,parameters[[1]],parameters[[2]]) but I am
 very interested in being able to write pweibull(3,parameters).
 
 Does anyone know how to solve it?
What you are looking for is do.call():

parameters <- list(q=3, shape=1, scale=1)
do.call(pweibull, parameters)

or

parameters <- list(shape=1, scale=1)
do.call(pweibull, c(q=3, parameters))
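
A short sketch of why the direct call fails and do.call() works. Passing the list positionally hands the whole list to the shape argument, whereas do.call() spreads the list elements out as separate named arguments. The variable names direct, via_do_call and explicit are only for illustration.

```r
parameters <- list(shape = 1, scale = 1)

# The list is matched to `shape` as a single non-numeric object, so this fails;
# tryCatch() captures the error instead of stopping the script
direct <- tryCatch(pweibull(3, parameters), error = function(e) e)
inherits(direct, "error")

# do.call() passes each list element as its own named argument
via_do_call <- do.call(pweibull, c(list(q = 3), parameters))

# Equivalent call with the arguments written out by hand
explicit <- pweibull(3, shape = parameters$shape, scale = parameters$scale)
all.equal(via_do_call, explicit)
```

Wrapping the quantile in list() before combining it with the parameter list keeps it as a single argument named q, which also matters when q is a vector.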


Regards



Re: [R] caret pls model statistics

2013-03-03 Thread Charles Determan Jr
I was under the impression that in PLS analysis, R2 was calculated by 1-
(Residual sum of squares) / (Sum of squares).  Is this still what you are
referring to?  I am aware of the linear R2 which is how well two variables
are correlated but the prior equation seems different to me.  Could you
explain if this is the same concept?

Charles

On Sun, Mar 3, 2013 at 12:46 PM, Max Kuhn mxk...@gmail.com wrote:

  Is there some literature that you make that statement?

 No, but there isn't literature on changing a lightbulb with a duck either.

  Are these papers incorrect in using these statistics?

 Definitely, if they convert 3+ categories to integers (but there are
 specialized R^2 metrics for binary classification models). Otherwise, they
 are just using an ill-suited score.

 How would you explain such an R^2 value to someone? R^2 is a function of
 correlation between the two random variables. For two classes, one of them
 is binary. What does it mean?

 Historically, models rooted in computer science (e.g. neural networks) used
 RMSE or SSE to fit models with binary outcomes and that *can* work
 well.

 However, I don't think that communicating R^2 is effective. Other metrics
 (e.g. accuracy, Kappa, area under the ROC curve, etc) are designed to
 measure the ability of a model to classify and work well. With 3+
 categories, I tend to use Kappa.
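
 As an illustration of the classification metrics mentioned above, here is a base-R sketch that computes accuracy and Cohen's Kappa from a confusion matrix. The helper name class_metrics() and the six observed/predicted labels are made up; caret reports Accuracy and Kappa itself when the outcome is a factor.

```r
# Hypothetical helper: accuracy and Cohen's Kappa from observed and predicted labels
class_metrics <- function(obs, pred) {
  lv  <- union(unique(obs), unique(pred))
  tab <- table(factor(obs, levels = lv), factor(pred, levels = lv))
  n   <- sum(tab)
  p_obs <- sum(diag(tab)) / n                        # observed agreement (accuracy)
  p_exp <- sum(rowSums(tab) * colSums(tab)) / n^2    # agreement expected by chance
  c(accuracy = p_obs, kappa = (p_obs - p_exp) / (1 - p_exp))
}

# Made-up labels: one versicolor is misclassified as virginica
obs  <- c("setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica")
pred <- c("setosa", "setosa", "versicolor", "virginica",  "virginica", "virginica")

m <- class_metrics(obs, pred)
m
```

 Kappa discounts the agreement that would be expected by chance from the class frequencies alone, which is why it is lower than raw accuracy here.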

 Max




 On Sun, Mar 3, 2013 at 10:53 AM, Charles Determan Jr deter...@umn.edu wrote:

 Thank you for your response Max.  Is there some literature that you make
 that statement?  I am confused as I have seen many publications that
 contain R^2 and Q^2 following PLSDA analysis.  The analysis usually is to
 discriminate groups (ie. classification).  Are these papers incorrect in
 using these statistics?

 Regards,
 Charles


 On Sat, Mar 2, 2013 at 10:39 PM, Max Kuhn mxk...@gmail.com wrote:

 Charles,

 You should not be treating the classes as numeric (is virginica really
 three times setosa?). Q^2 and/or R^2 are not appropriate for classification.

 Max


 On Sat, Mar 2, 2013 at 5:21 PM, Charles Determan Jr deter...@umn.edu wrote:

 I have discovered one of my errors.  The timematrix was unnecessary and an
 unfortunate habit I brought from another package.  The following provides
 the same R2 values as it should, however, I still don't know how to
 retrieve Q2 values.  Any insight would again be appreciated:

 library(caret)
 library(pls)

 data(iris)

 #needed to convert to numeric in order to do regression
 #I don't fully understand this but if I left as a factor I would get an
 #error following the summary function
 iris$Species=as.numeric(iris$Species)
 inTrain1=createDataPartition(y=iris$Species,
     p=.75,
     list=FALSE)

 training1=iris[inTrain1,]
 testing1=iris[-inTrain1,]

 ctrl1=trainControl(method="cv",
     number=10)

 plsFit2=train(Species~.,
     data=training1,
     method="pls",
     trControl=ctrl1,
     metric="Rsquared",
     preProc=c("scale"))

 data(iris)
 training1=iris[inTrain1,]
 datvars=training1[,1:4]
 dat.sc=scale(datvars)

 pls.dat=plsr(as.numeric(training1$Species)~dat.sc,
     ncomp=3, method="oscorespls", data=training1)

 x=crossval(pls.dat, segments=10)

 summary(x)
 summary(plsFit2)

 Regards,
 Charles

 On Sat, Mar 2, 2013 at 3:55 PM, Charles Determan Jr deter...@umn.edu
 wrote:

  Greetings,
 
  I have been exploring the use of the caret package to conduct some plsda
  modeling.  Previously, I have come across methods that result in an R2 and
  Q2 for the model.  Using the 'iris' data set, I wanted to see if I could
  accomplish this with the caret package.  I use the following code:
 
  library(caret)
  data(iris)
 
  #needed to convert to numeric in order to do regression
  #I don't fully understand this but if I left as a factor I would get an
  #error following the summary function
  iris$Species=as.numeric(iris$Species)
  inTrain1=createDataPartition(y=iris$Species,
      p=.75,
      list=FALSE)
 
  training1=iris[inTrain1,]
  testing1=iris[-inTrain1,]
 
  ctrl1=trainControl(method="cv",
      number=10)
 
  plsFit2=train(Species~.,
      data=training1,
      method="pls",
      trControl=ctrl1,
      metric="Rsquared",
      preProc=c("scale"))
 
  data(iris)
  training1=iris[inTrain1,]
  datvars=training1[,1:4]
  dat.sc=scale(datvars)
 
  n=nrow(dat.sc)
  dat.indices=seq(1,n)
 
  timematrix=with(training1,
      classvec2classmat(Species[dat.indices]))
 
  pls.dat=plsr(timematrix ~ dat.sc,
      ncomp=3, method="oscorespls", data=training1)
 
  x=crossval(pls.dat, segments=10)
 
  summary(x)
  summary(plsFit2)
 
  I see two different R2 values and I cannot figure out how to get the Q2
  value.  Any insight as to what my errors may be would be appreciated.
 
  Regards,
 
  --
  Charles
 



 --
 Charles Determan
 Integrated Biosciences PhD Student
 University of Minnesota

 

Re: [R] distribution functions and lists

2013-03-03 Thread Oleguer Plana Ripoll
Dear Milan and other users,

Thank you for your help, it worked. The problem is that the function do.call 
is not ready for vectors and I need it in order to integrate it afterwards.

With the pweibull, I can write:
pweibull(1,shape=1)
pweibull(2,shape=1)
pweibull(1:2,shape=1)

When I do the same with the do.call, I obtain an error:
do.call(pweibull,c(q=1,list(shape=1,scale=1)))
do.call(pweibull,c(q=2,list(shape=1,scale=1)))
do.call(pweibull,c(q=1:2,list(shape=1,scale=1)))
Error in pweibull(q1 = 1L, q2 = 2L, shape = 1, scale = 1) : 
  unused argument(s) (q1 = 1, q2 = 2)

Do you know how I can solve it?

Thank you, 
Oleguer




Re: [R] distribution functions and lists

2013-03-03 Thread Duncan Murdoch

On 13-03-03 3:39 PM, Oleguer Plana Ripoll wrote:

Dear Milan and other users,

Thank you for your help, it worked. The problem is that the function do.call 
is not ready for vectors and I need it in order to integrate it afterwards.


do.call() is fine, it's the argument list that needs fixing.  Construct 
a list containing elements q, shape, and scale.




With the pweibull, I can write:
pweibull(1,shape=1)
pweibull(2,shape=1)
pweibull(1:2,shape=1)

When I do the same with the do.call, I obtain an error:
do.call(pweibull,c(q=1,list(shape=1,scale=1)))
do.call(pweibull,c(q=2,list(shape=1,scale=1)))
do.call(pweibull,c(q=1:2,list(shape=1,scale=1)))
Error in pweibull(q1 = 1L, q2 = 2L, shape = 1, scale = 1) :
   unused argument(s) (q1 = 1, q2 = 2)

Do you know how can I solve it?


parameters <- list(shape=1, scale=1)
do.call(pweibull, c(list(q=1:2), parameters))

Duncan Murdoch
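
A sketch of the difference between the two argument lists, plus the integration use case mentioned in the question. c() applied to a vector and a list splices every vector element in as its own list element, which is where the q1 and q2 in the error message come from; wrapping the vector in list() keeps it as one argument. The names bad_args, good_args and cdf, and the integration limits, are illustrative.

```r
parameters <- list(shape = 1, scale = 1)

# The vector 1:2 is split into two separate elements named q1 and q2
bad_args <- c(q = 1:2, parameters)
names(bad_args)

# Wrapping it in list() keeps it as a single element named q
good_args <- c(list(q = 1:2), parameters)
names(good_args)

p <- do.call(pweibull, good_args)

# A vectorised wrapper like this can be handed to integrate()
cdf  <- function(q) do.call(pweibull, c(list(q = q), parameters))
area <- integrate(cdf, lower = 0, upper = 2)
area$value
```

integrate() calls its function with a vector of evaluation points, so the wrapper has to accept a vector q, which the list(q = q) form does.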





Re: [R] caret pls model statistics

2013-03-03 Thread Max Kuhn
That's the most common formula, but not the only one. See

  Kvålseth, T. (1985). Cautionary note about R^2. American Statistician,
39(4), 279-285.

Traditionally, the symbol 'R' is used for the Pearson correlation
coefficient and one way to calculate R^2 is... R^2.
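
To make the two definitions concrete, here is a small base-R sketch on simulated data (the data, the train/test split and the helper names r2_ss() and r2_cor() are all made up). For an ordinary least-squares fit with an intercept, the two formulas agree on the training data; on held-out predictions they generally differ, and the squared correlation is never the smaller of the two.

```r
set.seed(42)   # arbitrary seed, only for reproducibility
n   <- 200
dat <- data.frame(x = rnorm(n))
dat$y <- 2 + 3 * dat$x + rnorm(n)

train <- dat[1:150, ]
test  <- dat[151:200, ]
fit   <- lm(y ~ x, data = train)

# Definition 1: 1 - (residual sum of squares) / (total sum of squares)
r2_ss <- function(obs, pred) 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)

# Definition 2: squared Pearson correlation between observed and predicted
r2_cor <- function(obs, pred) cor(obs, pred)^2

# Training data: both definitions give the same number
in_sample <- c(ss = r2_ss(train$y, fitted(fit)), cor = r2_cor(train$y, fitted(fit)))

# Held-out data: the two numbers usually differ
pred       <- predict(fit, newdata = test)
out_sample <- c(ss = r2_ss(test$y, pred), cor = r2_cor(test$y, pred))
```

The squared correlation ignores any bias or scaling error in the predictions, while the sum-of-squares version penalises it, so it is worth stating which one a reported R^2 refers to.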

Max


On Sun, Mar 3, 2013 at 3:16 PM, Charles Determan Jr deter...@umn.eduwrote:

 I was under the impression that in PLS analysis, R2 was calculated by 1-
 (Residual sum of squares) / (Sum of squares).  Is this still what you are
 referring to?  I am aware of the linear R2 which is how well two variables
 are correlated but the prior equation seems different to me.  Could you
 explain if this is the same concept?

 Charles


 On Sun, Mar 3, 2013 at 12:46 PM, Max Kuhn mxk...@gmail.com wrote:

  Is there some literature that you make that statement?

 No, but there isn't literature on changing a lightbulb with a duck either.

  Are these papers incorrect in using these statistics?

 Definitely, if they convert 3+ categories to integers (but there are
 specialized R^2 metrics for binary classification models). Otherwise, they
 are just using an ill-suited score.

  How would you explain such an R^2 value to someone? R^2 is
 a function of correlation between the two random variables. For two
 classes, one of them is binary. What does it mean?

 Historically, models rooted in computer science (eg neural networks) used
 RMSE or SSE to fit models with binary outcomes and that *can* work work
 well.

 However, I don't think that communicating R^2 is effective. Other metrics
 (e.g. accuracy, Kappa, area under the ROC curve, etc) are designed to
 measure the ability of a model to classify and work well. With 3+
 categories, I tend to use Kappa.

 Max




 On Sun, Mar 3, 2013 at 10:53 AM, Charles Determan Jr deter...@umn.eduwrote:

 Thank you for your response Max.  Is there some literature that you make
 that statement?  I am confused as I have seen many publications that
 contain R^2 and Q^2 following PLSDA analysis.  The analysis usually is to
 discriminate groups (ie. classification).  Are these papers incorrect in
 using these statistics?

 Regards,
 Charles


 On Sat, Mar 2, 2013 at 10:39 PM, Max Kuhn mxk...@gmail.com wrote:

 Charles,

 You should not be treating the classes as numeric (is virginica really
 three times setosa?). Q^2 and/or R^2 are not appropriate for 
 classification.

 Max


 On Sat, Mar 2, 2013 at 5:21 PM, Charles Determan Jr 
 deter...@umn.eduwrote:

 I have discovered on of my errors.  The timematrix was unnecessary and
 an
 unfortunate habit I brought from another package.  The following
 provides
 the same R2 values as it should, however, I still don't know how to
 retrieve Q2 values.  Any insight would again be appreciated:

 library(caret)
 library(pls)

 data(iris)

 #needed to convert to numeric in order to do regression
 #I don't fully understand this but if I left as a factor I would get an
 #error following the summary function
 iris$Species=as.numeric(iris$Species)
 inTrain1=createDataPartition(y=iris$Species,
 p=.75,
 list=FALSE)

 training1=iris[inTrain1,]
 testing1=iris[-inTrain1,]

 ctrl1=trainControl(method="cv",
 number=10)

 plsFit2=train(Species~.,
 data=training1,
 method="pls",
 trControl=ctrl1,
 metric="Rsquared",
 preProc=c("scale"))

 data(iris)
 training1=iris[inTrain1,]
 datvars=training1[,1:4]
 dat.sc=scale(datvars)

 pls.dat=plsr(as.numeric(training1$Species)~dat.sc,
 ncomp=3, method="oscorespls", data=training1)

 x=crossval(pls.dat, segments=10)

 summary(x)
 summary(plsFit2)

 Regards,
 Charles

 On Sat, Mar 2, 2013 at 3:55 PM, Charles Determan Jr deter...@umn.edu
 wrote:

  Greetings,

  I have been exploring the use of the caret package to conduct some plsda
  modeling.  Previously, I have come across methods that result in an R2 and
  Q2 for the model.  Using the 'iris' data set, I wanted to see if I could
  accomplish this with the caret package.  I use the following code:
 
  library(caret)
  data(iris)
 
  #needed to convert to numeric in order to do regression
  #I don't fully understand this but if I left as a factor I would get an
  #error following the summary function
  iris$Species=as.numeric(iris$Species)
  inTrain1=createDataPartition(y=iris$Species,
  p=.75,
  list=FALSE)

  training1=iris[inTrain1,]
  testing1=iris[-inTrain1,]

  ctrl1=trainControl(method="cv",
  number=10)

  plsFit2=train(Species~.,
  data=training1,
  method="pls",
  trControl=ctrl1,
  metric="Rsquared",
  preProc=c("scale"))

  data(iris)
  training1=iris[inTrain1,]
  datvars=training1[,1:4]
  dat.sc=scale(datvars)

  n=nrow(dat.sc)
  dat.indices=seq(1,n)

  timematrix=with(training1,
  classvec2classmat(Species[dat.indices]))

  pls.dat=plsr(timematrix ~ dat.sc,
  ncomp=3, method="oscorespls", data=training1)
 
  x=crossval(pls.dat, segments=10)
 
  summary(x)
  summary(plsFit2)
 
  I see two different R2 values and I cannot figure out 

[R] Creating 3d partial dependence plots

2013-03-03 Thread Jerrod Parker
Help,

I've been having a difficult time trying to create 3d partial dependence
plots using rgl.  It looks like this question has been asked a couple
times, but I'm unable to find a clear answer googling.  I've tried creating
x, y, and z variables by extracting them from the partialPlot output to no
avail.  I've seen these plots used several times in articles, and I think
they would help me a great deal looking at interactions.  Could someone
provide a coding example using randomForest and rgl?  It would be greatly
appreciated.

Thank you,
Jerrod Parker



Re: [R] Creating 3d partial dependence plots

2013-03-03 Thread Duncan Murdoch

On 13-03-03 7:08 PM, Jerrod Parker wrote:

Help,

I've been having a difficult time trying to create 3d partial dependence
plots using rgl.  It looks like this question has been asked a couple
times, but I'm unable to find a clear answer googling.  I've tried creating
x, y, and z variables by extracting them from the partialPlot output to no
avail.  I've seen these plots used several times in articles, and I think
they would help me a great deal looking at interactions.  Could someone
provide a coding example using randomForest and rgl?  It would be greatly
appreciated.



I think you are making your question too hard to answer.  Show us an 
example of what you tried (a self-contained, minimal example, of 
course) and we'll suggest ways to fix it.


Duncan Murdoch



Re: [R] Empirical Bayes Estimator for Poisson-Gamma Parameters

2013-03-03 Thread Nicole Ford
Did you try using MLE to approximate the marginal?  
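
One way to read the MLE suggestion, as a rough sketch rather than a
definitive answer: assume (this is an assumption, not something stated in the
thread) that all Lambda_i share a single Gamma(shape = alpha, rate = beta)
prior. The marginal of each Y_i is then negative binomial, and alpha and beta
can be estimated by maximising that marginal likelihood. The data are the y
values from the original question; all function and variable names are
illustrative.

```r
# Data from the original question
y <- c(12, 5, 17, 14)

# Assumption: one common Gamma(shape = alpha, rate = beta) prior, so
# marginally Y_i ~ negative binomial(size = alpha, prob = beta / (1 + beta))
negloglik <- function(par, y) {
  alpha <- exp(par[1])   # optimise on the log scale to keep alpha, beta > 0
  beta  <- exp(par[2])
  -sum(dnbinom(y, size = alpha, prob = beta / (1 + beta), log = TRUE))
}

# Method-of-moments start: E(Y) = alpha/beta, Var(Y) = alpha/beta + alpha/beta^2
m <- mean(y)
v <- var(y)
beta_mom  <- m / (v - m)   # only positive when Var(Y) > E(Y)
alpha_mom <- m * beta_mom

fit <- optim(log(c(alpha_mom, beta_mom)), negloglik, y = y)
alpha_hat <- exp(fit$par[1])
beta_hat  <- exp(fit$par[2])

# Empirical Bayes posterior mean of each Lambda_i: (alpha + y_i) / (beta + 1)
eb_means <- (alpha_hat + y) / (beta_hat + 1)
print(c(alpha = alpha_hat, beta = beta_hat))
print(eb_means)
```

With separate (alpha_i, beta_i) for every single observation the
hyper-parameters are not identifiable, which is why the sketch pools them.
The moment estimator also shows where negative values come from: beta_mom is
negative whenever the sample variance is below the sample mean.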


On Mar 3, 2013, at 5:26 AM, Ali A. Bromideh wrote:

 Dear Nicole,
 
 First of all, thank you sincerely for your kind reply. As I told Mr.
 Gunter, this is part of my research and is not homework. However, let me
 clarify the problem. Suppose we have received an observation
 from a Poisson distribution, i.e. Y_1~Pois(Lam_1), where Lam_1~Gamma(alpha_1,
 beta_1). Now, what is the empirical Bayes (EB) estimate for alpha_1 and
 beta_1?
 Let Y_2~Pois(Lam_2) and Lam_2~Gamma(alpha_2, beta_2). Again, how can we
 calculate the EB estimates for alpha_2 and beta_2?

 In fact, I read the relevant paper by Robbins at
 http://www.ncbi.nlm.nih.gov/pmc/articles/PMC350425/ but it gave 0 for Y_1.
 And when Var(Y) < E(Y), it generates a negative value for the positive
 quantity alpha/beta!!
 
 Any idea? 
 
 Kind regards,
 
 
 
 -Original Message-
 From: Nicole Ford [mailto:nicole.f...@me.com] 
 Sent: Sunday, March 03, 2013 4:09 AM
 To: Bert Gunter
 Cc: Boroumideh-Ali Akbar; r-help@r-project.org
 Subject: Re: [R] Empirical Bayes Estimator for Poisson-Gamma Parameters
 
 Also, Kruschke at Indiana has some info on this, both online and on YouTube
 (if this is homework).  If not, more info would be helpful.
 
 ~n
 
 
 On Feb 25, 2013, at 9:41 AM, Bert Gunter wrote:
 
 Homework? We don't do homework here.
 
 If not, search (e.g. via google -- R hierarchical Bayes -- or some
 such).
 
 -- Bert
 
 On Mon, Feb 25, 2013 at 1:39 AM, Ali A. Bromideh a.bromi...@ikco.com
 wrote:
 Dear Sir/Madam,

 I apologize for any cross-posting. I have a simple question, which I thought
 the R list might help me answer. Suppose we have Y_1, Y_2, ..., Y_n ~
 Poisson(Lambda_i) and Lambda_i ~ Gamma(alpha_i, beta_i).  Empirical Bayes
 estimators for the hyper-parameters of the gamma distribution, i.e.
 (alpha_t, beta_t), are needed.

 y=c(12,5,17,14)

 n=4

 What about hierarchical Bayes estimators?

 Any relevant work and code in R (or S+) would be highly appreciated.

 Kind regards,

 Ali
 
 
   [[alternative HTML version deleted]]
 
 
 
 
 -- 
 
 Bert Gunter
 Genentech Nonclinical Biostatistics
 
 Internal Contact Info:
 Phone: 467-7374
 Website:
 
 http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
 
 
 



Re: [R] Help searching a matrix for only certain records

2013-03-03 Thread Matt Borkowski
I appreciate all the feedback on this. I ended up using this line to solve my 
problem, just because I stumbled upon it first...

 alldata <- alldata[alldata$REC.TYPE == "SAO  " | alldata$REC.TYPE == "FM-15", , drop=FALSE]

But I think Jim's solution would work equally well. I was a bit confused by
the relative complexity of the data frame solution, as it seems to take more
steps than necessary.
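
For comparison, a small sketch of the same filter written three ways. The
data frame below is a hypothetical stand-in (the real REC.TYPE codes may
carry trailing spaces), and the object names are illustrative only.

```r
# Hypothetical stand-in for the poster's data
alldata <- data.frame(
  REC.TYPE = c("SAO", "FM-15", "OTHER", "SAO", "FM-12"),
  value    = c(1.2, 3.4, 5.6, 7.8, 9.0),
  stringsAsFactors = FALSE
)

keep <- c("SAO", "FM-15")

# Chained comparisons joined with |
by_or <- alldata[alldata$REC.TYPE == "SAO" | alldata$REC.TYPE == "FM-15", ,
                 drop = FALSE]

# Same filter with %in%, which scales to any number of accepted values
by_in <- alldata[alldata$REC.TYPE %in% keep, , drop = FALSE]

# match() with nomatch = 0 gives the same logical index
by_match <- alldata[match(alldata$REC.TYPE, keep, nomatch = 0) != 0, ,
                    drop = FALSE]

print(by_in)
```

All three return the same rows; %in% is mostly a readability gain once there
are more than two accepted values.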

Thanks again for the input!

-Matt




Again, thanks for the feedback!

--- On Sun, 3/3/13, arun smartpink...@yahoo.com wrote:

 From: arun smartpink...@yahoo.com
 Subject: Re: [R] Help searching a matrix for only certain records
 To: Matt Borkowski mathias1...@yahoo.com
 Cc: R help r-help@r-project.org, jim holtman jholt...@gmail.com
 Date: Sunday, March 3, 2013, 1:29 PM
 HI,
 You could also use ?data.table() 
 
 n <- 300000
 set.seed(51)
 mat1 <- as.matrix(data.frame(REC.TYPE=
 sample(c("SAO","FAO","FL-1","FL-2","FL-15"),n,replace=TRUE),Col2=rnorm(n),Col3=runif(n),stringsAsFactors=FALSE))
 dat1 <- as.data.frame(mat1,stringsAsFactors=FALSE)
 table(mat1[,1])
 #
 #  FAO  FL-1 FL-15  FL-2   SAO
 #60046 60272 59669 59878 60135
 system.time(x1 <- subset(mat1, grepl("(SAO|FL-15)", mat1[, "REC.TYPE"])))
 #  user  system elapsed
 # 0.076   0.004   0.082
 system.time(x2 <- subset(mat1, mat1[, "REC.TYPE"] %in% c("SAO", "FL-15")))
 #  user  system elapsed
 # 0.028   0.000   0.030

 system.time(x3 <- mat1[match(mat1[, "REC.TYPE"], c("SAO", "FL-15"), nomatch = 0) != 0,
                        , drop = FALSE])
 #  user  system elapsed
 # 0.028   0.000   0.028
 table(x3[,1])
 #
 #FL-15   SAO
 #59669 60135

 library(data.table)

 dat2 <- data.table(dat1)
 system.time(x4 <- dat2[match(REC.TYPE, c("SAO", "FL-15"), nomatch=0) != 0, , drop=FALSE])
 #  user  system elapsed
 # 0.024   0.000   0.025
 table(x4$REC.TYPE)
 #
 #FL-15   SAO
 #59669 60135
 A.K.
 
 
 
 
 
 
 
 
 - Original Message -
 From: jim holtman jholt...@gmail.com
 To: Matt Borkowski mathias1...@yahoo.com
 Cc: r-help@r-project.org
 r-help@r-project.org
 Sent: Sunday, March 3, 2013 11:52 AM
 Subject: Re: [R] Help searching a matrix for only certain
 records
 
 If you are using matrices, then here are several ways of doing it for
 size 300,000.  You can determine if the difference of 0.1 seconds is
 important in terms of the performance you are after.  It is taking you
 more time to type in the statements than it is taking them to execute:

 > n <- 300000
 > testdata <- matrix(
 +     sample(c("SAO ", "FL-15", "Other"), n, TRUE, prob = c(1,2,1000))
 +     , nrow = n
 +     , dimnames = list(NULL, "REC.TYPE")
 +     )
 > table(testdata[, "REC.TYPE"])

  FL-15  Other   SAO
    562 299151    287
 > system.time(x1 <- subset(testdata, grepl("(SAO |FL-15)", testdata[, "REC.TYPE"])))
    user  system elapsed
    0.17    0.00    0.17
 > system.time(x2 <- subset(testdata, testdata[, "REC.TYPE"] %in% c("SAO ", "FL-15")))
    user  system elapsed
    0.05    0.00    0.05
 > system.time(x3 <- testdata[match(testdata[, "REC.TYPE"]
 +                             , c("SAO ", "FL-15")
 +                             , nomatch = 0) != 0
 +                             ,, drop = FALSE]
 +             )
    user  system elapsed
    0.03    0.00    0.03
 > identical(x1, x2)
 [1] TRUE
 > identical(x2, x3)
 [1] TRUE
 
 
 
 On Sun, Mar 3, 2013 at 11:22 AM, Jim Holtman jholt...@gmail.com
 wrote:
  There are far more efficient ways of doing many of the operations, but
  you probably won't see any difference unless you have very large objects
  (several hundred thousand entries) or have to do it many times.  My
  background is in computer performance, and for the most part I have
  found that the easiest, most straightforward ways are fine most of the
  time.

  A more efficient way might be:

  testdata <- testdata[match(testdata$REC.TYPE, c('SAO ', 'FL-15'),
                             nomatch = 0) != 0, ]

  You can always use 'system.time' to determine how long actions take.

  For multiple comparisons use %in%.
 
  Sent from my iPad
 
  On Mar 3, 2013, at 9:22, Matt Borkowski mathias1...@yahoo.com
 wrote:
 
  Thank you for your response Jim! I will give this one a try! But a
  couple of follow-up questions...

  In my search for a solution, I had seen something stating that match() is
  much more efficient than subset() and will cut down significantly on
  computing time. Is there any truth to that?

  Also, I found the following solution, which works for matching a single
  condition, but I couldn't quite figure out how to modify it to search
  for both my acceptable conditions...

  testdata <- testdata[testdata$REC.TYPE == "SAO",,drop=FALSE]
 
  -Matt
 
 
 
 
  --- On Sun, 3/3/13, jim holtman jholt...@gmail.com
 wrote:
 
  From: jim holtman jholt...@gmail.com
  Subject: Re: [R] Help searching a matrix for only
 certain records
  To: Matt Borkowski mathias1...@yahoo.com
  Cc: r-help@r-project.org
  Date: Sunday, March 3, 2013, 8:00 AM
 
  Try this:
 
  dataset - subset(dataset, grepl((SAO 

Re: [R] SAS and R complement each other

2013-03-03 Thread Frank Harrell
I'm not sure why you posted the original note.  I quit using SAS in 1991 and
haven't needed it yet.
Frank

RogerJDeAngelis wrote
 Sorry about the double post. But I keep getting 'post' rejections, so I
 resubmitted about an hour later.





-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: 
http://r.789695.n4.nabble.com/SAS-and-R-complement-each-other-tp4660157p4660190.html
Sent from the R help mailing list archive at Nabble.com.
