Re: [R] Merging two data frames, but keeping NAs

2013-12-06 Thread Rainer M Krug

On 12/05/13, 16:11 , Sarah Goslee wrote:
 Adding the argument all.x=TRUE to merge() will retain the NA
 values, but the only reliable way I've found to preserve order with
 NA values in a merge is to add an index column to x, merge the
 data, sort on the index column, then delete it.

Thanks Sarah - that works nicely, although it is a not-so-nice
workaround - there should really be an argument in merge() to keep the NAs...
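For the archives, the index-column workaround looks roughly like this (a
sketch, using the x and y from my original post quoted below):

x$idx <- seq_len(nrow(x))                        # remember the original order
res <- merge(x, y, by.x = "ref", by.y = "id", all.x = TRUE)
res <- res[order(res$idx), ]                     # restore the order of x
res$idx <- NULL                                  # drop the helper column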

Cheers,

Rainer

 
 Sarah
 
 On Thu, Dec 5, 2013 at 9:56 AM, Rainer M Krug rai...@krugs.de
 wrote:
 
 Hi
 
 My brain is giving up on this...
 
 I have the following two data.frames:
 
 x <- data.frame(ref = c(NA, NA, NA, 10:5, NA, 1:5))
 y <- data.frame(id = c(2, 3, 4, 6, 7, 9, 8), val = 101:107)
 
 Which look as follows:
 
 x
    ref
 1   NA
 2   NA
 3   NA
 4   10
 5    9
 6    8
 7    7
 8    6
 9    5
 10  NA
 11   1
 12   2
 13   3
 14   4
 15   5
 y
   id val
 1  2 101
 2  3 102
 3  4 103
 4  6 104
 5  7 105
 6  9 106
 7  8 107
 
 
 Now I want to merge y into x, but in such a way that
 
 a) the sort order of x stays the same (sort=FALSE in merge()),
 and b) the NAs stay.
 
 The result should look as follows (column id is only included here for
 clarity):
 
 result
    ref id val
 1   NA NA  NA
 2   NA NA  NA
 3   NA NA  NA
 4   10 NA  NA
 5    9  9 106
 6    8  8 107
 7    7  7 105
 8    6  6 104
 9    5 NA  NA
 10  NA NA  NA
 11   1 NA  NA
 12   2  2 101
 13   3  3 102
 14   4  4 103
 15   5 NA  NA
 
 merge(x, y, by.x="ref", by.y="id", sort=FALSE) leaves out the NA,
 but otherwise it works:
 
 merge(x, y, by.x=1, by.y="id", sort=FALSE)
   ref val
 1   9 106
 2   8 107
 3   7 105
 4   6 104
 5   2 101
 6   3 102
 7   4 103
 
 Is there any way that I can tell merge() to keep the NA, or how
 can I achieve what I want?
 
 Thanks,
 
 Rainer
 
 


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Merging two data frames, but keeping NAs

2013-12-06 Thread Rainer M Krug



On 12/05/13, 16:37 , arun wrote:
 Hi, Try ?join()
 
 library(plyr)

Well - what would we do without Hadley ...

He solved many problems we didn't know we would have soon...

Cheers,

Rainer

 y$ref <- y$id
 join(x, y, by="ref")
    ref id val
 1   NA NA  NA
 2   NA NA  NA
 3   NA NA  NA
 4   10 NA  NA
 5    9  9 106
 6    8  8 107
 7    7  7 105
 8    6  6 104
 9    5 NA  NA
 10  NA NA  NA
 11   1 NA  NA
 12   2  2 101
 13   3  3 102
 14   4  4 103
 15   5 NA  NA
 
 
 A.K.
 
 
 On Thursday, December 5, 2013 9:58 AM, Rainer M Krug
 rai...@krugs.de wrote: Hi
 
 My brain is giving up on this...
 
 I have the following two data.frames:
 
 x <- data.frame(ref = c(NA, NA, NA, 10:5, NA, 1:5))
 y <- data.frame(id = c(2, 3, 4, 6, 7, 9, 8), val = 101:107)
 
 Which look as follows:
 
 x
    ref
 1   NA
 2   NA
 3   NA
 4   10
 5    9
 6    8
 7    7
 8    6
 9    5
 10  NA
 11   1
 12   2
 13   3
 14   4
 15   5
 y
   id val
 1  2 101
 2  3 102
 3  4 103
 4  6 104
 5  7 105
 6  9 106
 7  8 107
 
 
 Now I want to merge y into x, but in such a way that
 
 a) the sort order of x stays the same (sort=FALSE in merge()),
 and b) the NAs stay.
 
 The result should look as follows (column id is only included here for
 clarity):
 
 result
    ref id val
 1   NA NA  NA
 2   NA NA  NA
 3   NA NA  NA
 4   10 NA  NA
 5    9  9 106
 6    8  8 107
 7    7  7 105
 8    6  6 104
 9    5 NA  NA
 10  NA NA  NA
 11   1 NA  NA
 12   2  2 101
 13   3  3 102
 14   4  4 103
 15   5 NA  NA
 
 merge(x, y, by.x="ref", by.y="id", sort=FALSE) leaves out the NA,
 but otherwise it works:
 
 merge(x, y, by.x=1, by.y="id", sort=FALSE)
   ref val
 1   9 106
 2   8 107
 3   7 105
 4   6 104
 5   2 101
 6   3 102
 7   4 103
 
 Is there any way that I can tell merge() to keep the NA, or how can
 I achieve what I want?
 
 Thanks,
 
 Rainer
 
 
 __ R-help@r-project.org
 mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do
 read the posting guide http://www.R-project.org/posting-guide.html 
 and provide commented, minimal, self-contained, reproducible code.
 


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] How to concatenate the results from parallelized nested foreach loops

2013-12-06 Thread Jairaj Sathyanarayana
Hi all,

I am working with data.table objects within nested foreach loops and I am
having trouble creating the results object the way I would prefer.

Code below with sample data:

library(iterators)
library(data.table)
library(foreach)

#generate dummy data
set.seed(1212)
sample1 <- data.frame(parentid=round((runif(5, min=1, max=5))),
childid=round(runif(10, min=1, max=10)))
length(unique(sample1$parentid))

#get unique parents
sample1uniq <- as.data.frame(unique(sample1$parentid))
names(sample1uniq) <- "parentid"

#convert original dataset to data.table
sample1 <- data.table(sample1)
setkey(sample1,parentid)

#convert unique ids to data.table
sample1uniq <- data.table(sample1uniq)
setkey(sample1uniq,parentid)

#a random sample of 5K users to scan against
sample2uniq_idx <- sample(1:nrow(sample1uniq), size=5000)
sample2uniq <- sample1uniq[sample2uniq_idx]
sample2uniq <- data.table(sample2uniq)
setkey(sample2uniq,parentid)

#construct iterators
sample1uniq_iter <- iter(sample1uniq)
sample2uniq_iter <- iter(sample2uniq)

outerresults <- foreach (x = sample1uniq_iter, .combine=rbind,
.packages=c('foreach','doParallel', 'data.table')) %dopar% {
  b <- sample1[J(x)]   #ith parent
  b2 <- as.data.frame(b)[,2]   #ith parent's children

  foreach (y = sample2uniq_iter, .combine=rbind) %dopar% {
    c <- sample1[J(y)]   #jth parent
    c2 <- as.data.frame(c)[,2]   #jth parent's children

    common <- length(intersect(b2, c2))

    if (common > 0) {
      uni <- length(union(b2, c2))
      results <- list(u1=x, u2=y, inter=common, union=uni)
    }
  }
}

Note that all tasks can be done in parallel with no dependency issues.

I was expecting the results to come out like this (made up):

u1 u2 inter union
 1  2    10    20
 1  3     4    10
 1  4     7    15
 1  5     6    10
 2  3    10    20
 2  4     4    10
 3  5     7    10
 4  5     6    10

But they don't. Do I need to implement a different combine function? Any
other ideas/help will be appreciated.
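An untested sketch rather than a definitive fix: with foreach's %:% nesting
operator the inner loop is re-created for every outer value (iterators built
once outside the loop are exhausted after the first outer pass), and returning
a data.frame or NULL lets .combine = rbind stack only the matching pairs:

library(foreach)
library(data.table)

outerresults <-
  foreach(x = sample1uniq$parentid, .combine = rbind) %:%
    foreach(y = sample2uniq$parentid, .combine = rbind,
            .packages = "data.table") %dopar% {
      b2 <- as.data.frame(sample1[J(x)])[, 2]   # ith parent's children
      c2 <- as.data.frame(sample1[J(y)])[, 2]   # jth parent's children
      common <- length(intersect(b2, c2))
      if (common > 0) {
        data.frame(u1 = x, u2 = y, inter = common,
                   union = length(union(b2, c2)))
      } else {
        NULL                                    # silently dropped by rbind
      }
    }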

thx

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to generate a smoothed surface for a three dimensional dataset?

2013-12-06 Thread A Xi Ma
The following question is inspired by Jun's problem, which resembles some
of my own problems, but goes off on a tangent about applying plot3D from
Karline Soetart.


On Thu, Dec 5, 2013 at 11:52 PM, Bert Gunter gunter.ber...@gene.com wrote:


 Your comment that:

  "I can see the critical point here is to find a right function to
 make the prediction."

 is what indicates to me that your critical point is that you have
 insufficient knowledge and need help. Feel free to disagree, of
 course.


I don't know if it's true for Jun, but it's definitely true for me - I have
insufficient knowledge! I'm out of my depth with surface estimation, but I
have to learn how to do it, one way or the other.

Currently I'm reading the docs for plot3d.

I loaded the package into RStudio and ran some of the examples.  The
image2D example seems to get its data from a data.frame called "volcano",
with a small v.

imag2D> nr <- nrow(volcano)

imag2D> nc <- ncol(volcano)

imag2D> image2D(volcano, x = 1:nr, y = 1:nc, lighting = TRUE,
imag2D+    main = "volcano", clab = "height", m)



 The objects() command shows a Volcano with a big V.  The small-v and
big-V volcanoes are not the same, because the str command shows:
 [69] "mtcars"    "myf"       "n"         "nam"
 [73] "nc"        "nms"       "nr"        "o"
 ...
[117] "V"         "V2"        "Volcano"   "volcx"
[121] "volcy"     "VV"        "Vy"        "w"
[125] "warm.pal"  "weight"    "width"     "wombat"
[129] "x"         "x.at"      "xx"        "xyz.fit"
[133] "y"         "y1"        "y2"        "y3"
[137] "y.at"      "year"      "yy"        "z"
[141] "z0"        "zi"        "z.predict" "zz"
[145] "zzz"
> str(Volcano)
 num [1:29, 1:21] 100 103 105 108 110 116 120 122 123 118 ...
> str(volcano)
 num [1:87, 1:61] 100 101 102 103 104 105 105 106 107 108 ...


I don't understand how the "volcano" object can work well enough to power
the image2D command, yet not show up in objects().  At first I thought
there was some kind of secret smuggling compartment in memory space, and
nr, nc and volcano were all hidden in that secret place.  But in
fact, nr and nc show up in objects().
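After some more digging, my current understanding (hedged -- I may still be
missing something) is that volcano is a dataset shipped with R's datasets
package, so it is found on the search path rather than in the user workspace,
which is why objects() (which lists only the workspace) does not show it:

find("volcano")                              # "package:datasets"
exists("volcano")                            # TRUE, although objects() doesn't list it
"volcano" %in% objects("package:datasets")   # TRUE: it lives on the search path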

So ... I am even less educated than the other newbies on the list, and I'm
following along, and I really don't see how R is doing what it's doing.
Should I be reading the plot3D .pdf textbooks, or should I give up and go
back to some much more basic textbook?

Thanks.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Generating restricted numbers

2013-12-06 Thread gncl dzgn
Hello everyone, 

I'm trying to generate a sequence that consists of random numbers and the 
following algorithm works well 

###

a <- 0.08
b <- 0.01
T <- 90
t <- 0:T
alpha <- 1
e <- rnorm(T, mean = 0, sd = 0.1)
d <- c( runif(1, 0, a*T), rep(0, T-1) )
for (i in 2:T)
{
d[i] <- alpha * d[i-1] + e[i]
}
plot(d, type="l")

##

But I have to add this restriction: each d on day t must belong to the
time-dependent interval [0, a*(T-t)]. For example, d on day 50 can be at
minimum 0 and at maximum a*40.

I hope, somebody helps me.

Best regards
  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] model selection with step()

2013-12-06 Thread Karen Keating
I am using the step() function to select a model using backward
elimination, with AIC as the selection criterion.  The full regression
model contains three predictors, plus all the second order terms and
two-way interactions.  The full model is fit via lm() using two different
model formulae.  One formula uses explicitly defined variables for the
second-order and interaction terms and the other formula uses the I(x^2)
and colon operators.  The fit generated by lm() is exactly the same for
both models, but when I pass these fitted models to the step() function, I
get two different results.  Apparently, step() does not recognize the three
main predictors unless the second order and interaction terms are
explicitly defined as separate variables.

I assigned this problem to my first-year graduate students, not realizing
that R would give two different answers.  Now I have to re-grade their
homework, but I would really like to give them a reasonable explanation for
the discrepancy.

The complete code is given below.

Could anyone shed some light on this mystery?

Thanks in advance,
Karen Keating
Kansas State University


# Exercise 9.13, Kutner, Nachtsheim, Neter & Li
temp <- scan()
49.0   45.0   36.0   45.0
55.0   30.0   28.0   40.0
85.0   11.0   16.0   42.0
32.0   30.0   46.0   40.0
26.0   39.0   76.0   43.0
28.0   42.0   78.0   27.0
95.0   17.0   24.0   36.0
26.0   63.0   80.0   42.0
74.0   25.0   12.0   52.0
37.0   32.0   27.0   35.0
31.0   37.0   37.0   55.0
49.0   29.0   34.0   47.0
38.0   26.0   32.0   28.0
41.0   38.0   45.0   30.0
12.0   38.0   99.0   26.0
44.0   25.0   38.0   47.0
29.0   27.0   51.0   44.0
40.0   37.0   32.0   54.0
31.0   34.0   40.0   36.0

dat <- matrix(temp, ncol=4, nrow=length(temp)/4, byrow=TRUE)
colnames(dat) <- c('Y','X1','X2','X3')
dat <- data.frame(dat)
attach(dat)

# second order terms and interactions
X12 <- X1*X2
X13 <- X1*X3
X23 <- X2*X3
X1sq <- X1^2
X2sq <- X2^2
X3sq <- X3^2

fit1 <- lm(Y ~ X1sq + X2sq + X3sq + X1 + X2 + X3 + X12 + X13 + X23)
fit2 <- lm(Y ~ I(X1^2) + I(X2^2) + I(X3^2) + X1 + X2 + X3 + X1:X2 + X1:X3 + X2:X3)
sum( abs(fit1$res - fit2$res) ) # 0, so fitted models are the same
dim(model.matrix(fit1))   # 19 x 10
dim(model.matrix(fit2))   # 19 x 10

dim(fit1$model)  # 19 x 10
dim(fit2$model)  # 19 x 7 -- could this cause the discrepancy?

back1 <- step(fit1, direction='backward')
back2 <- step(fit2, direction='backward')
# Note that 'back1' considers the three primary predictors X1, X2 and X3,
# while 'back2' does not.
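A hedged explanation to check: step() works on the terms of the model formula
and respects marginality, so for fit2 it never offers X1, X2 or X3 for removal
while X1:X2, X1:X3 or X2:X3 are still in the model; in fit1 the squared and
interaction terms are just unrelated columns, so nothing protects the main
effects. Comparing the candidate sets with drop1() makes the difference visible:

drop1(fit1)   # all nine predictors are candidates for removal
drop1(fit2)   # X1, X2, X3 are absent while the ':' interactions remain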

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Simple Error Bar

2013-12-06 Thread Rolf Turner


Uh, no.  You are forgetting to take the square root of 10, and to divide 
by the square root of 12.

The variance of Y is (exactly) (56^2 - 1)/12, so the variance of Y-bar
is this quantity over 10, so the standard deviation of Y-bar is
sqrt((56^2 - 1)/12)/sqrt(10), which is approximately (ignoring the -1)
56/sqrt(12) * 1/sqrt(10).
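A quick numerical check of that value (a sketch):

sqrt((56^2 - 1)/12)/sqrt(10)                                   # about 5.11
sd(replicate(10000, mean(sample(5:60, 10, replace = TRUE))))   # should come out close to 5.11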

cheers,

Rolf

On 12/06/13 20:26, Jim Lemon wrote:

On 12/06/2013 04:16 PM, mohan.radhakrish...@polarisft.com wrote:

Hi,
   Basic question with basic code.   I am simulating a 
set of

'y' values for a standard 'x' value measurement. So here the error bars
are very long because the
number of samples are very small. Is that correct ? I am plotting the 
mean

of 'y' on the 'y' axis.


Thanks,
Mohan

x <- data.frame(c(5,10,15,20,25,30,35,40,50,60))
  colnames(x) <- c("x")

  y <- sample(5:60,10,replace=T)
  y1 <- sample(5:60,10,replace=T)
  y2 <- sample(5:60,10,replace=T)
  y3 <- sample(5:60,10,replace=T)
  y4 <- sample(5:60,10,replace=T)

  z <- data.frame(cbind(x,y,y1,y2,y3,y4))
  z$mean <- apply(z[,c(2,3,4,5,6)],2,mean)
  z$sd <- apply(z[,c(2,3,4,5,6)],2,sd)
  z$se <- z$sd / sqrt(5)



Hi Mohan,
As your samples seem to follow a discrete uniform distribution, the 
standard deviation is approximately the number of integers in the 
range (56) divided by the number of observations (10).


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Simple Error Bar

2013-12-06 Thread mohan . radhakrishnan
The latest code I have put together is this. Could you point out what is 
missing here ? 

#Reference values plotted on x-axis. These are constant.
 #These values could be time of day. So every day at the same
 #time we could collect other measurements
 referenceset <- data.frame(c(5,10,15,20,25,30,35,40,50,60))
 colnames(referenceset) <- c("reference")

 #These are the sets of measurements. So every day at the same
 #time we could collect several samples. This is simulated now.
 sampleset <- data.frame( matrix(sample(1:2, c(2), replace = TRUE), 
ncol = 2000) )
 
 sampleset <- cbind( sampleset, referenceset )
 
 #Calculate mean
 sampleset$mean <- apply(sampleset[,1:10], 2, mean)
 
 #Calculate Standard Deviation
 sampleset$sd <- apply(sampleset[,c(1:10)], 2, sd)
 
 #Calculate Standard Error
 sampleset$se <- sampleset$sd / sqrt(2000)
 
 #print(sampleset)
 
 plot( sampleset$reference,
   sampleset$mean,
   las=1,
   ylab="Mean of 'y' values",
   xlab="x"
  );
 
arrows(sampleset$reference,
   sampleset$mean-sampleset$se,
   sampleset$reference,
   sampleset$mean+sampleset$se,
   code = 3,
   angle=90,
   length=0.2)






Thanks.



From:   Rolf Turner r.tur...@auckland.ac.nz
To: Jim Lemon j...@bitwrit.com.au
Cc: mohan.radhakrish...@polarisft.com, r-help@r-project.org
Date:   12/06/2013 02:53 PM
Subject:Re: [R] Simple Error Bar




Uh, no.  You are forgetting to take the square root of 10, and to divide 
by the square root of 12.

The variance of Y is (exactly) (56^2 - 1)/12, so the variance of Y-bar 
is this quantity over 10,
so the standard deviation of Y-bar is sqrt((56^2 - 1)/12)/sqrt(10). 
Which is approximately
(ignoring the -1) 56/sqrt(12) * 1/sqrt(10).

 cheers,

 Rolf

On 12/06/13 20:26, Jim Lemon wrote:
 On 12/06/2013 04:16 PM, mohan.radhakrish...@polarisft.com wrote:
 Hi,
Basic question with basic code.   I am simulating a 
 set of
 'y' values for a standard 'x' value measurement. So here the error bars
 are very long because the
 number of samples are very small. Is that correct ? I am plotting the 
 mean
 of 'y' on the 'y' axis.


 Thanks,
 Mohan

 x- data.frame(c(5,10,15,20,25,30,35,40,50,60))
   colnames(x)- c(x)

   y- sample(5:60,10,replace=T)
   y1- sample(5:60,10,replace=T)
   y2- sample(5:60,10,replace=T)
   y3- sample(5:60,10,replace=T)
   y4- sample(5:60,10,replace=T)

   z- data.frame(cbind(x,y,y1,y2,y3,y4))
   z$mean- apply(z[,c(2,3,4,5,6)],2,mean)
   z$sd- apply(z[,c(2,3,4,5,6)],2,sd)
   z$se- z$sd / sqrt(5)


 Hi Mohan,
 As your samples seem to follow a discrete uniform distribution, the 
 standard deviation is approximately the number of integers in the 
 range (56) divided by the number of observations (10).





[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How do I print predicted effect sizes in forest plot?

2013-12-06 Thread Viechtbauer Wolfgang (STAT)
The model you are fitting is a random-effects model and does not include any 
potential moderators/covariates. Therefore, the estimated intercept of that 
model is *the* estimated/predicted (average) effect and it applies to each 
study. That is why the predict function also just gives you that value. That 
value is also included in the forest plot (at the bottom). The predicted 
(average) effect will no longer be the same for each study only if you include 
covariates in the model.
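For example, with a (made-up) moderator 'Year' the per-study fitted values
could then be overlaid as grey polygons roughly as follows -- a sketch only;
the exact addpoly() arguments should be checked against the metafor help pages:

res  <- rma(yi=Mean, sei=Std_error, mods = ~ Year, slab=Study_Name,
            subset=(Country == "AUT"), data=cma_iat, method="HS")
pred <- predict(res)                   # fitted effects now differ per study
forest(res)
addpoly(pred$pred, sei=pred$se, rows=res$k:1, col="gray")  # rows k:1 should match
                                                           # the default study order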

Best,
Wolfgang

--   
Wolfgang Viechtbauer, Ph.D., Statistician   
Department of Psychiatry and Psychology   
School for Mental Health and Neuroscience   
Faculty of Health, Medicine, and Life Sciences   
Maastricht University, P.O. Box 616 (VIJV1)   
6200 MD Maastricht, The Netherlands   
+31 (43) 388-4170 | http://www.wvbauer.com   

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
 On Behalf Of Alma Wilflinger
 Sent: Thursday, December 05, 2013 22:06
 To: R help r-help@r-project.org
 Subject: [R] How do I print predicted effect sizes in forest plot?
 
 Hi,
 
 I am struggling a bit with creating a forest plot containing the predicted
 effect size. As seen in other studies these effect sizes are shown per
 study usually as a light grey diamond - which is what I want to achieve.
 
 The calls I use are:
 iat_result = rma(yi=Mean, vi=Variance_rounded, ni=N, sei=Std_error,
 slab=Study_Name, subset=(Country == AUT), data=cma_iat, method=HS)
 
 summary.rma(iat_result)
 
 
 #not sure how to use it or if needed
 #predict(iat_result)
 
 forest(iat_result)
 
 
 At the end I am getting the forest plot as is without the predicted
 values.
 
 I am not sure if I need the predict function and how to use it? - the
 predict function deliveres the same values as already computed in the rma
 object.
 
 
 I checked the manual for package metafor but was not able to find out how
 to print the predicted values per study.
 
 
 kind regards, Alma
   [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How do I print predicted effect sizes in forest plot?

2013-12-06 Thread Viechtbauer Wolfgang (STAT)
One more thing ... You used the command:

iat_result = rma(yi=Mean, vi=Variance_rounded, ni=N, sei=Std_error, 
slab=Study_Name, subset=(Country == "AUT"), data=cma_iat, method="HS")

This probably does not do what you want it to do. First of all, if you specify 
vi, there is no need to specify sei (or vice-versa). One is sufficient. But 
more crucially, I assume 'Mean' is what it says it is - a mean of a certain 
variable X. And I assume that 'Variance_rounded' is the variance of said 
variable X. But vi is used to specify the *sampling variance* of yi (or sei is 
used to specify the standard error), which, for a mean, is the variance divided 
by N (and the standard error is the SD divided by the square root of N):

http://en.wikipedia.org/wiki/Standard_error_of_the_mean#Standard_error_of_the_mean

So, my hunch is that you are not supplying the right information to the rma() 
function.
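In other words (hedged -- assuming 'Variance_rounded' really is the variance of
X and 'N' the per-study sample size), the quantities rma() expects would be
computed along these lines:

cma_iat$vi  <- cma_iat$Variance_rounded / cma_iat$N       # sampling variance of the mean
cma_iat$sei <- sqrt(cma_iat$Variance_rounded / cma_iat$N) # = SD / sqrt(N)
iat_result  <- rma(yi=Mean, vi=vi, slab=Study_Name,
                   subset=(Country == "AUT"), data=cma_iat, method="HS")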

Best,
Wolfgang 

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
 On Behalf Of Alma Wilflinger
 Sent: Thursday, December 05, 2013 22:06
 To: R help r-help@r-project.org
 Subject: [R] How do I print predicted effect sizes in forest plot?
 
 Hi,
 
 I am struggling a bit with creating a forest plot containing the predicted
 effect size. As seen in other studies these effect sizes are shown per
 study usually as a light grey diamond - which is what I want to achieve.
 
 The calls I use are:
 iat_result = rma(yi=Mean, vi=Variance_rounded, ni=N, sei=Std_error,
 slab=Study_Name, subset=(Country == AUT), data=cma_iat, method=HS)
 
 summary.rma(iat_result)
 
 
 #not sure how to use it or if needed
 #predict(iat_result)
 
 forest(iat_result)
 
 
 At the end I am getting the forest plot as is without the predicted
 values.
 
 I am not sure if I need the predict function and how to use it? - the
 predict function deliveres the same values as already computed in the rma
 object.
 
 
 I checked the manual for package metafor but was not able to find out how
 to print the predicted values per study.
 
 
 kind regards, Alma
   [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] tune an support vector machine

2013-12-06 Thread Uwe Bohne

   Hej all,

   actually i try to tune a SVM in R and use the package e1071 wich works
   pretty well.
   I do some gridsearch in the parameters and get the best possible parameters
   for classification.
   Here is my sample code

   type <- sample(c(-1,1), 20, replace = TRUE)
   weight <- sample(c(20:50), 20, replace=TRUE)
   height <- sample(c(100:200), 20, replace=TRUE)
   width <- sample(c(30:50), 20, replace=TRUE)
   volume <- sample(c(1000:5000), 20, replace=TRUE)

   data <- cbind(type, weight, height, width, volume)
   train <- as.data.frame(data)
   library(e1071)

   features <- c("weight", "height", "width", "volume")
   (formula <- as.formula(paste("type ~ ", paste(features, collapse = " + "))))

   svmtune = tune.svm(formula, data=train, kernel="radial", cost=2^(-2:5),
   gamma=2^(-2:1), cross=10)
   summary(svmtune)

   My question is whether there is a way to tune the features as well.

   In other words, what I want to do is to try all possible combinations of
   features: for example use only (volume), or (weight, height), or
   (height, volume, width), and so on for the SVM, and to get the best
   combination back.
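   One simple brute-force possibility (a sketch, not necessarily the best
   approach) is to loop over every non-empty subset of the features and keep
   the cross-validated error of each tuned model -- with four features that is
   only 15 fits:

   features <- c("weight", "height", "width", "volume")
   subsets  <- unlist(lapply(seq_along(features),
                             function(k) combn(features, k, simplify = FALSE)),
                      recursive = FALSE)
   err <- sapply(subsets, function(fs) {
     f <- as.formula(paste("type ~", paste(fs, collapse = " + ")))
     tune.svm(f, data = train, kernel = "radial", cost = 2^(-2:5),
              gamma = 2^(-2:1), cross = 10)$best.performance
   })
   subsets[[which.min(err)]]   # feature combination with the lowest CV error
   # (for classification rather than regression, 'type' should be a factor)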


   Best wishes

   Uwe
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Generating restricted numbers

2013-12-06 Thread Adams, Jean
What value do you want d to take on if it is outside that interval?

Here is an example where if d is outside the interval, it is assigned to be
one of the interval endpoints.
minx <- 0
for(i in 2:T) {
x <- alpha * d[i-1] + e[i]
maxx <- a*(T-i)
if(x < minx) {
 d[i] <- minx
} else {
if(x > maxx) {
 d[i] <- maxx
} else d[i] <- x
}
}

Jean
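
The same clamping can also be written a little more compactly (a sketch with
the same behaviour as the loop above, reusing a, T, alpha, d and e from the
original code):

for (i in 2:T) {
  d[i] <- min(max(alpha * d[i - 1] + e[i], 0), a * (T - i))
}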


On Thu, Dec 5, 2013 at 10:19 PM, gncl dzgn guncelduz...@hotmail.com wrote:

 Hello everyone,

 I'm trying to generate a sequence that consists of random numbers and the
 following algorithm works well

 ###

 a <- 0.08
 b <- 0.01
 T <- 90
 t <- 0:T
 alpha <- 1
 e <- rnorm(T, mean = 0, sd = 0.1)
 d <- c( runif(1, 0, a*T), rep(0, T-1) )
 for (i in 2:T)
 {
 d[i] <- alpha * d[i-1] + e[i]
 }
 plot(d, type="l")

 ##

 But I have to add this restriction  each d on
 day t must satisfy to belong to the time-dependent interval [0, a*(T-t)]
 . For
 example, d on day 50 can be minimal 0 and maximal a*40.

 I hope, somebody helps me.

 Best regards

 [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How do I print predicted effect sizes in forest plot?

2013-12-06 Thread Michael Dewey

At 21:05 05/12/2013, Alma Wilflinger wrote:

Hi,

I am struggling a bit with creating a forest plot containing the 
predicted effect size. As seen in other studies these effect sizes 
are shown per study usually as a light grey diamond - which is what 
I want to achieve.


The calls I use are:
iat_result = rma(yi=Mean, vi=Variance_rounded, ni=N, sei=Std_error, 
slab=Study_Name, subset=(Country == "AUT"), data=cma_iat, method="HS")


Alma
You do not need to specify both vi and sei as one is sufficient and 
you do not need ni as well.
I realise that is not the question you asked (which Wolfgang has 
already answered).




summary.rma(iat_result)


#not sure how to use it or if needed
#predict(iat_result)

forest(iat_result)


At the end I am getting the forest plot as is without the predicted values.

I am not sure if I need the predict function and how to use it? - 
the predict function deliveres the same values as already computed 
in the rma object.



I checked the manual for package metafor but was not able to find 
out how to print the predicted values per study.



kind regards, Alma
[[alternative HTML version deleted]]


Michael Dewey
i...@aghmed.fsnet.co.uk
http://www.aghmed.fsnet.co.uk/home.html

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Double Infinite Integration

2013-12-06 Thread Hans W Borchers
Aya Anas aanas at feps.edu.eg writes:

 Hello all,
 
 I need to perform the following integration where the integrand is the
 product of three functions:
 f(x)g(y)z(x,y)
 
 the limits of x are(0,inf) and the limits of y are(-inf,inf).
 
 Could this be done using R?

There is a saying: Don't ask "Can this be done in R?", ask "How is it done?"

Extracting function f(x) from the inner integral may not always be the best 
idea. And applying package 'cubature' will not work as adaptIntegrate() does 
not really handle non-finite interval limits.

As an example, let us assume the functions are

f <- function(x) x
g <- function(y) y^2
h <- function(x, y) exp(-(x^2+y^2))

Define a function that calculates the inner integral:

F1 <- function(x) {
    fun <- function(y) f(x) * g(y) * h(x, y)
    integrate(fun, -Inf, Inf)$value
}
F1 <- Vectorize(F1)  # requested when using integrate()

We have to check that integrate() is indeed capable of computing this integrand 
over an infinite interval.

F1(c(0:4))   # looks good
## [1] 0.00e+00 3.260247e-01 3.246362e-02 3.281077e-04 3.989274e-07

Now integrate this function over the second (infinite) interval.

integrate(F1, 0, Inf)
## 0.4431135 with absolute error < 2.4e-06

Correct, as the integral is equal to sqrt(pi)/4 ~ 0.44311346...

If we extract f(x) from the inner integral the value of the integral and the
computation times will be the same, but the overall handling will be slightly 
more complicated.

 I tried using the function integrate 2 times, but it didn't work:
 z <- function(x,y) {
 
 }
 f <- function(x){
 rr <- put here the function in x * integrate(function(y) z(x, y),
 -Inf, Inf)$value
 return(rr)
 }
 
 rr2 <- integrate(function(x) f(x), 0, Inf)$value
 print(rr2)
 
I didn't get any output at all!!!
 
 Thanks,
 Aya


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Merging different columns in one matrix

2013-12-06 Thread arun
Hi,
May be this helps:

dat1 <- read.table(text="
a a b b c c
x y x y x y
12   34   256  25    5   32
5    45    23   452   21   45
", sep="", header=TRUE, stringsAsFactors=FALSE, check.names=FALSE)


 mat1 <- matrix(0, 5, 5, dimnames=list(NULL, c("x", letters[1:4])))
 mat1[,1] <- sort(unique(as.numeric(unlist(dat1[-1, which(dat1=="x", arr.ind=TRUE)[,2]]))))

dat1New <- dat1[-1, which(dat1=="x", arr.ind=TRUE)[,2]]
dat2New <- dat1[-1, which(dat1=="y", arr.ind=TRUE)[,2]]
mat1[,2:4] <- sapply(seq_len(ncol(dat1New)), function(i) {
    x1 <- dat2New[match(mat1[,1], dat1New[,i]), i]
    x1[is.na(x1)] <- 0
    as.numeric(x1)
})
 mat1
#   x  a   b  c d
#[1,]   5 45   0 32 0
#[2,]  12 34   0  0 0
#[3,]  21  0   0 45 0
#[4,]  23  0 452  0 0
#[5,] 256  0  25  0 0


A.K.


Hello everyone, 

I have a dataframe made as follows: 

a     a     b     b     c     c 
x     y     x     y     x     y 
12   34   256  25    5   32 
5    45    23   452   21   45 
...   ...    ...   ...    ...    ... 

My intention is to create just one matrix made as follows 

x     a     b     c     d 
5     45   0     32    0 
12   34   0     0      0 
21   0     0     45    0 
23   0    452   0     0 
256  ...  ...     ...   ... 
... 

As you can see I want on the first column all the values 
collected from all the x columns and ordered. On the other columns I 
want the y-values related to every letter (a-b-c...). For example the 
first value on the x column is 5 (the smallest). It is present in the a 
x-values (first matrix) so in the second table I report its related 
y-value (45). However 5 is not present in the b x-values so I report a 0
 on the second table. And so on. 

I don't know if it's a difficult task but I had several problems
 with the double header handling and the data. I looked for some clues 
on the internet but documentation is very fragmented and lacking. 

(So, in addition, any recommendation for good R books?)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Need help figuring out sapply (and similar functions) with multiple parameter user defined function

2013-12-06 Thread Walter Anderson
I am having trouble understanding how to use sapply (or similar 
functions) with a user defined function with multiple parameters.

I have the following functions defined

q1.ans <- function(x)
{
   retVal = 0
   if (x == 1) {
     retVal = 1
   } else if (x == 2) {
     retVal = 2
   }
   return (retVal)
}
q2.ans <- function(x)
{
   retVal = 0
   if (x == 1) {
     retVal = 1
   } else if (x == 2) {
     retVal = 3
   }
   return (retVal)
}
q3.ans <- function(x)
{
   retVal = 0
   if (x == 1) {
     retVal = 2
   } else if (x == 2) {
     retVal = 3
   }
   return (retVal)
}

evaluate.questions <- function(q.1, q.2, q.3)
{
   a <- q1.ans(q.1)
   b <- q2.ans(q.2)
   c <- q3.ans(q.3)
   retVal = 0   # Set default value to be no preference
   # The following code only implements those values from the state
machine that show a preference (ID's 5,9,11,13-15,17-18,21,23-27)
   if (a == 0) {
 if (b == 1) {
   if (c == 1) {
 retVal = 1  # State machine ID 5
   }
 } else if (b == 2) {
   if (c == 2) {
 retVal = 2  # State machine ID 9
   }
 }
   } else if (a == 1) {
 if (b == 0) {
   if (c == 1) {
 retVal = 1  # State machine ID 11
   }
 } else if (b == 1) {
   retVal = 1# State machine ID's 13-15, value of C doesn't
matter
 } else if (b == 2) {
   if (c == 1) {
 retVal = 1  # State machine ID 17
   } else if (c == 2) {
 retVal = 2  # State machine ID 18
   }
 }
   } else if (a == 2) {
 if (b == 0) {
   if (c == 2) {
 retVal = 2  # State machine ID 21
   }
 } else if (b == 1) {
   if (c == 1) {
 retVal = 1  # State machine ID 23
   } else if (c == 2) {
 retVal = 2  # State machine ID 24
   }
 } else if (b == 2) {
   retVal = 2# State machine ID's 25-27, value of C doesn't
matter
 }
   }
   return (retVal)
}

And a data set that looks like this:

ID,Q1,Q2,Q3
1,2,2,2
2,2,1,1
3,1,1,1
4,1,2,2
5,2,2,1
6,1,2,1
...


I have been researching and it appears that I should be using the sapply 
function to apply the evaluate.question function above to each row in 
the data frame like this

preferences <- sapply(df, evaluate.questions, function(x,y,z) 
evaluate.questions(df['Q1'],df['Q2'],df['Q3']))

Unfortunately this doesn't work and the problem appears that the sapply 
function is not feeding the parameters to the evaluate.questions 
function as I expect.  Can someone provide some guidance on what I am 
doing wrong?

This is the error message I am getting:

Error in x --1 :
   Comparison (1) is possible only for atomic and list types
In addition: warning messages:
In if (x == 1) { :
   the condition has length > 1 and only the first element will be used

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Need help figuring out sapply (and similar functions) with multiple parameter user defined function

2013-12-06 Thread PIKAL Petr
Hi

The warning is due to the fact that if() takes only a single scalar value, not 
an entire vector.

Maybe you shall explain more clearly what result do you expect.

I bet that there is a vectorised solution to your problem, but I am lost in your 
ifs and cannot follow what the output shall be.

Please use 

dput(head(df))

when showing input data and clearly describe intended result.

Regards
Petr


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of Walter Anderson
 Sent: Friday, December 06, 2013 4:44 PM
 To: r-help@r-project.org
 Subject: [R] Need help figuring out sapply (and similar functions) with
 multiple parameter user defined function
 
 I am having trouble understanding how to use sapply (or similar
 functions) with a user defined function with multiple parameters.
 
 I have the following functions defined
 
 q1.ans - function(x)
 {
retVal = 0
if (x == 1) {
  retVal = 1
} else if (x ==2) {
  retVal = 2
}
return (retVal)
 }
 q2.ans - function(x)
 {
retVal = 0
if (x == 1) {
  retVal = 1
} else if (x ==2) {
  retVal = 3
}
return (retVal)
 }
 q3.ans - function(x)
 {
retVal = 0
if (x == 1) {
  retVal = 2
} else if (x ==2) {
  retVal = 3
}
return (retVal)
 }
 
 evaluate.questions - function(q.1,q.2,q.3)
 {
a - q1.ans(q.1)
b - q2.ans(q.2)
c - q3.ans(q.3)
retVal = 0   # Set default value to be no preference
# The following code only implements those values from the state
 machine that show a preference (ID's 5,9,11,13-15,17-18,21,23-27)
if (a == 0) {
  if (b == 1) {
if (c == 1) {
  retVal = 1  # State machine ID 5
}
  } else if (b == 2) {
if (c == 2) {
  retVal = 2  # State machine ID 9
}
  }
} else if (a == 1) {
  if (b == 0) {
if (c == 1) {
  retVal = 1  # State machine ID 11
}
  } else if (b == 1) {
retVal = 1# State machine ID's 13-15, value of C doesn't
 matter
  } else if (b == 2) {
if (c == 1) {
  retVal = 1  # State machine ID 17
} else if (c == 2) {
  retVal = 2  # State machine ID 18
}
  }
} else if (a == 2) {
  if (b == 0) {
if (c == 2) {
  retVal = 2  # State machine ID 21
}
  } else if (b == 1) {
if (c == 1) {
  retVal = 1  # State machine ID 23
} else if (c == 2) {
  retVal = 2  # State machine ID 24
}
  } else if (b == 2) {
retVal = 2# State machine ID's 25-27, value of C doesn't
 matter
  }
}
return (retVal)
 }
 
 And a data set that looks like this:
 
 ID,Q1,Q2,Q3
 1,2,2,2
 2,2,1,1
 3,1,1,1
 4,1,2,2
 5,2,2,1
 6,1,2,1
 ...
 
 
 I have been researching and it appears that I should be using the
 sapply function to apply the evaluate.question function above to each
 row in the data frame like this
 
 preferences - sapply(df, evaluate.questions, function(x,y,z)
 evaluate.questions(df['Q1'],df['Q2'],df['Q3']))
 
 Unfortunately this doesn't work and the problem appears that the sapply
 function is not feeding the parameters to the evaluate.questions
 function as I expect.  Can someone provide some guidance on what I am
 doing wrong?
 
 This is the error message I am getting:
 
 Error in x --1 :
Comparison (1) is possible only for atomic and list types In
 addition: warning messages:
 In if (x == 1) { :
   the condition has length > 1 and only the first element will be used
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Need help figuring out sapply (and similar functions) with multiple parameter user defined function

2013-12-06 Thread Walter Anderson

Thank you for your response!

I am attempting to determine a preference from the answers to three 
binomial questions:

q.1) 1 or 2    q.2) 1 or 3    q.3) 2 or 3

However, the questions are coded with either a 1 or 2 (though no answer 
is also possible), and the first three functions (q#.ans) convert those 
values to the 1, 2, or 3 shown above,

and generate one of the following results for each row of the table: 0 - 
no preference, or 1, 2, 3, which indicates the preference indicated by the 
question.


The if's implement the following state conditions:

  # ID A  B  C  Preference
  # 1  0  0  0  None
  # 2  0  0  1  None
  # 3  0  0  2  None
  # 4  0  1  0  None
  # 5  0  1  1  Option 1
  # 6  0  1  2  None
  # 7  0  2  0  None
  # 8  0  2  1  None
  # 9  0  2  2  Option 2
  # 10 1  0  0  None
  # 11 1  0  1  Option 1
  # 12 1  0  2  None
  # 13 1  1  0  Option 1
  # 14 1  1  1  Option 1
  # 15 1  1  2  Option 1
  # 16 1  2  0  None
  # 17 1  2  1  Option 1
  # 18 1  2  2  Option 2
  # 19 2  0  0  None
  # 20 2  0  1  None
  # 21 2  0  2  Option 2
  # 22 2  1  0  None
  # 23 2  1  1  Option 1
  # 24 2  1  2  Option 2
  # 25 2  2  0  Option 2
  # 26 2  2  1  Option 2
  # 27 2  2  2  Option 2

The if statement only implements those values from the state machine 
that show a preference (ID's 5,9,11,13-15,17-18,21,23-27)
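
In case a vectorised version turns out useful, one possibility (a sketch that
assumes A, B and C are already recoded to the 0/1/2 values used in the table
above, and that 'answers' is a hypothetical data frame holding those three
columns) is to turn the 27 states into a lookup table and match against it:

states <- expand.grid(C = 0:2, B = 0:2, A = 0:2)[, c("A", "B", "C")]
key    <- function(a, b, c) paste(a, b, c)
states$pref <- 0                                   # default: no preference
states$pref[key(states$A, states$B, states$C) %in%
            c("0 1 1", "1 0 1", "1 1 0", "1 1 1",
              "1 1 2", "1 2 1", "2 1 1")] <- 1     # Option 1 (IDs 5,11,13-15,17,23)
states$pref[key(states$A, states$B, states$C) %in%
            c("0 2 2", "1 2 2", "2 0 2", "2 1 2",
              "2 2 0", "2 2 1", "2 2 2")] <- 2     # Option 2 (IDs 9,18,21,24-27)
preferences <- states$pref[match(key(answers$A, answers$B, answers$C),
                                 key(states$A, states$B, states$C))]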


On 12/06/2013 09:59 AM, PIKAL Petr wrote:

Hi

The warning is due to fact that if takes only single scalar value not an 
entire vector.

Maybe you shall explain more clearly what result do you expect.

I bet that there is vectorised solution to your problem but I am lost in your 
ifs and cannot follow what shall be the output.

Please use

dput(head(df))

when showing input data and clearly describe intended result.

Regards
Petr



-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
project.org] On Behalf Of Walter Anderson
Sent: Friday, December 06, 2013 4:44 PM
To: r-help@r-project.org
Subject: [R] Need help figuring out sapply (and similar functions) with
multiple parameter user defined function

I am having trouble understanding how to use sapply (or similar
functions) with a user defined function with multiple parameters.

I have the following functions defined

 q1.ans - function(x)
 {
retVal = 0
if (x == 1) {
  retVal = 1
} else if (x ==2) {
  retVal = 2
}
return (retVal)
 }
 q2.ans - function(x)
 {
retVal = 0
if (x == 1) {
  retVal = 1
} else if (x ==2) {
  retVal = 3
}
return (retVal)
 }
 q3.ans - function(x)
 {
retVal = 0
if (x == 1) {
  retVal = 2
} else if (x ==2) {
  retVal = 3
}
return (retVal)
 }

 evaluate.questions - function(q.1,q.2,q.3)
 {
a - q1.ans(q.1)
b - q2.ans(q.2)
c - q3.ans(q.3)
retVal = 0   # Set default value to be no preference
# The following code only implements those values from the state
 machine that show a preference (ID's 5,9,11,13-15,17-18,21,23-27)
if (a == 0) {
  if (b == 1) {
if (c == 1) {
  retVal = 1  # State machine ID 5
}
  } else if (b == 2) {
if (c == 2) {
  retVal = 2  # State machine ID 9
}
  }
} else if (a == 1) {
  if (b == 0) {
if (c == 1) {
  retVal = 1  # State machine ID 11
}
  } else if (b == 1) {
retVal = 1# State machine ID's 13-15, value of C doesn't
 matter
  } else if (b == 2) {
if (c == 1) {
  retVal = 1  # State machine ID 17
} else if (c == 2) {
  retVal = 2  # State machine ID 18
}
  }
} else if (a == 2) {
  if (b == 0) {
if (c == 2) {
  retVal = 2  # State machine ID 21
}
  } else if (b == 1) {
if (c == 1) {
  retVal = 1  # State machine ID 23
} else if (c == 2) {
  retVal = 2  # State machine ID 24
}
  } else if (b == 2) {
retVal = 2# State machine ID's 25-27, value of C doesn't
 matter
  }
}
return (retVal)
 }

And a data set that looks like this:

 ID,Q1,Q2,Q3
 1,2,2,2
 2,2,1,1
 3,1,1,1
 4,1,2,2
 5,2,2,1
 6,1,2,1
 ...


I have been researching and it appears that I should be using the
sapply function to apply the evaluate.question function above to each
row in the data frame like this

preferences - sapply(df, evaluate.questions, function(x,y,z)
evaluate.questions(df['Q1'],df['Q2'],df['Q3']))

Unfortunately this doesn't work and the problem appears that the sapply
function is not feeding the parameters to the evaluate.questions
function as I expect.  Can someone provide some 

Re: [R] Need help figuring out sapply (and similar functions) with multiple parameter user defined function

2013-12-06 Thread PIKAL Petr
Hi

So first step is over. Anyway, is there any problem with using dput as I 
suggested?

Instead of using your data I need to generate my own.

A <- sample(0:2, 10, replace=T)
B <- sample(0:2, 10, replace=T)
C <- sample(0:2, 10, replace=T)
df <- data.frame(A,B,C)

df[df[,2]==2, 2] <- 3
df$C <- as.numeric(as.character(factor(df$C, labels=c(0,2,3))))

df
   A B C
1  0 3 3
2  0 1 2
3  0 3 2
4  1 0 3
5  1 0 3
6  2 3 2
7  1 3 2
8  2 3 3
9  1 1 0
10 0 0 3

 -Original Message-
 From: Walter Anderson [mailto:wandrso...@gmail.com]
 Sent: Friday, December 06, 2013 5:11 PM
 To: PIKAL Petr; r-help@r-project.org
 Subject: Re: [R] Need help figuring out sapply (and similar functions)
 with multiple parameter user defined function
 
 Thank you for your response!
 
 I am attempting to determine a preference from the answers to three
 binomial questions;
 
 q.1) 1 or 2q.2) 1 or 3q.3) 2 or 3
 
 However, the questions are coded with either a 1 or 2 (though no answer
 is also possible) and the first three functions (q#.ans) convert those
 values to the 1,2,or 3 shown above

Instead of those tricky ifs (uff uff) you can use either of these:

df[df[,2]==2, 2] <- 3
df$C <- as.numeric(as.character(factor(df$C, labels=c(0,2,3))))

df
   A B C
1  0 3 3
2  0 1 2
3  0 3 2
4  1 0 3
5  1 0 3
6  2 3 2
7  1 3 2
8  2 3 3
9  1 1 0
10 0 0 3

And here I am lost again.

Please, can you clearly state how you want to choose preferences 
based on the values in those three columns?

Regards
Petr

 
 and generate one of the following result for each row of the table; 0 -
 no preference, or 1,2,3 which indicates the preference indicated by the
 question
 
 The if's implement the following state conditions:
 
# ID A  B  C  Preference
# 1  0  0  0  None
# 2  0  0  1  None
# 3  0  0  2  None
# 4  0  1  0  None
# 5  0  1  1  Option 1
# 6  0  1  2  None
# 7  0  2  0  None
# 8  0  2  1  None
# 9  0  2  2  Option 2
# 10 1  0  0  None
# 11 1  0  1  Option 1
# 12 1  0  2  None
# 13 1  1  0  Option 1
# 14 1  1  1  Option 1
# 15 1  1  2  Option 1
# 16 1  2  0  None
# 17 1  2  1  Option 1
# 18 1  2  2  Option 2
# 19 2  0  0  None
# 20 2  0  1  None
# 21 2  0  2  Option 2
# 22 2  1  0  None
# 23 2  1  1  Option 1
# 24 2  1  2  Option 2
# 25 2  2  0  Option 2
# 26 2  2  1  Option 2
# 27 2  2  2  Option 2
 
 The if statement only implements those values from the state machine
 that show a preference (ID's 5,9,11,13-15,17-18,21,23-27)
 
 On 12/06/2013 09:59 AM, PIKAL Petr wrote:
  Hi
 
  The warning is due to fact that if takes only single scalar value
 not an entire vector.
 
  Maybe you shall explain more clearly what result do you expect.
 
  I bet that there is vectorised solution to your problem but I am lost
 in your ifs and cannot follow what shall be the output.
 
  Please use
 
  dput(head(df))
 
  when showing input data and clearly describe intended result.
 
  Regards
  Petr
 
 
  -Original Message-
  From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
  project.org] On Behalf Of Walter Anderson
  Sent: Friday, December 06, 2013 4:44 PM
  To: r-help@r-project.org
  Subject: [R] Need help figuring out sapply (and similar functions)
  with multiple parameter user defined function
 
  I am having trouble understanding how to use sapply (or similar
  functions) with a user defined function with multiple parameters.
 
  I have the following functions defined
 
   q1.ans - function(x)
   {
  retVal = 0
  if (x == 1) {
retVal = 1
  } else if (x ==2) {
retVal = 2
  }
  return (retVal)
   }
   q2.ans - function(x)
   {
  retVal = 0
  if (x == 1) {
retVal = 1
  } else if (x ==2) {
retVal = 3
  }
  return (retVal)
   }
   q3.ans - function(x)
   {
  retVal = 0
  if (x == 1) {
retVal = 2
  } else if (x ==2) {
retVal = 3
  }
  return (retVal)
   }
 
   evaluate.questions - function(q.1,q.2,q.3)
   {
  a - q1.ans(q.1)
  b - q2.ans(q.2)
  c - q3.ans(q.3)
  retVal = 0   # Set default value to be no preference
  # The following code only implements those values from the
 state
   machine that show a preference (ID's 5,9,11,13-15,17-18,21,23-
 27)
  if (a == 0) {
if (b == 1) {
  if (c == 1) {
retVal = 1  # State machine ID 5
  }
} else if (b == 2) {
  if (c == 2) {
retVal = 2  # State machine ID 9
  }
}
  } else if (a == 1) {
if (b == 0) {
  if (c == 1) {
retVal = 1  # State machine ID 11
  }
} else if (b == 1) {
  retVal = 1# State machine ID's 13-15, value of C

Re: [R] Need help figuring out sapply (and similar functions) with multiple parameter user defined function

2013-12-06 Thread William Dunlap
 I have been researching and it appears that I should be using the sapply
 function to apply the evaluate.question function above to each row in
 the data frame like this

Read the documentation more closely: sapply(dataFrame, func)
applies func() to each column, not row, of dataFrame.

 preferences - sapply(df, evaluate.questions, function(x,y,z)
 evaluate.questions(df['Q1'],df['Q2'],df['Q3']))

Furthermore,
sapply(X = dataFrame, FUN = func, extraArgument)
calls
func(dataFrame[, i], extraArgument)
for i in seq_len(ncol(dataFrame)).

One problem is that FUN=evaluate.questions takes 3 arguments and
you give it only 2.  Another problem is that the third argument you
pass to sapply is a function (of 3 arguments) and FUN is not expecting
any of its arguments to be functions.

It may be easier for you to not use sapply here, but to use for-loops and
come up with something that works.  (Write tests that will indicate whether
it works or not in a variety of situations.)  Then transform it to use things
like ifelse() and sapply() to make it more readable and run faster.
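
For the row-wise call itself, one base-R option (a sketch) is mapply(), which
walks the three columns in parallel and passes one scalar from each to the
function:

preferences <- mapply(evaluate.questions, df$Q1, df$Q2, df$Q3)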

 Unfortunately this doesn't work and the problem appears that the sapply
 function is not feeding the parameters to the evaluate.questions
 function as I expect.  Can someone provide some guidance on what I am
 doing wrong?

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
 Behalf
 Of Walter Anderson
 Sent: Friday, December 06, 2013 7:44 AM
 To: r-help@r-project.org
 Subject: [R] Need help figuring out sapply (and similar functions) with 
 multiple parameter
 user defined function
 
 I am having trouble understanding how to use sapply (or similar
 functions) with a user defined function with multiple parameters.
 
 I have the following functions defined
 
 q1.ans - function(x)
 {
retVal = 0
if (x == 1) {
  retVal = 1
} else if (x ==2) {
  retVal = 2
}
return (retVal)
 }
 q2.ans - function(x)
 {
retVal = 0
if (x == 1) {
  retVal = 1
} else if (x ==2) {
  retVal = 3
}
return (retVal)
 }
 q3.ans - function(x)
 {
retVal = 0
if (x == 1) {
  retVal = 2
} else if (x ==2) {
  retVal = 3
}
return (retVal)
 }
 
 evaluate.questions - function(q.1,q.2,q.3)
 {
a - q1.ans(q.1)
b - q2.ans(q.2)
c - q3.ans(q.3)
retVal = 0   # Set default value to be no preference
# The following code only implements those values from the state
# machine that show a preference (ID's 5,9,11,13-15,17-18,21,23-27)
if (a == 0) {
  if (b == 1) {
if (c == 1) {
  retVal = 1  # State machine ID 5
}
  } else if (b == 2) {
if (c == 2) {
  retVal = 2  # State machine ID 9
}
  }
} else if (a == 1) {
  if (b == 0) {
if (c == 1) {
  retVal = 1  # State machine ID 11
}
  } else if (b == 1) {
retVal = 1# State machine ID's 13-15, value of C doesn't matter
  } else if (b == 2) {
if (c == 1) {
  retVal = 1  # State machine ID 17
} else if (c == 2) {
  retVal = 2  # State machine ID 18
}
  }
} else if (a == 2) {
  if (b == 0) {
if (c == 2) {
  retVal = 2  # State machine ID 21
}
  } else if (b == 1) {
if (c == 1) {
  retVal = 1  # State machine ID 23
} else if (c == 2) {
  retVal = 2  # State machine ID 24
}
  } else if (b == 2) {
retVal = 2# State machine ID's 25-27, value of C doesn't matter
  }
}
return (retVal)
 }
 
 And a data set that looks like this:
 
 ID,Q1,Q2,Q3
 1,2,2,2
 2,2,1,1
 3,1,1,1
 4,1,2,2
 5,2,2,1
 6,1,2,1
 ...
 
 
 I have been researching and it appears that I should be using the sapply
 function to apply the evaluate.question function above to each row in
 the data frame like this
 
 preferences - sapply(df, evaluate.questions, function(x,y,z)
 evaluate.questions(df['Q1'],df['Q2'],df['Q3']))
 
 Unfortunately this doesn't work and the problem appears that the sapply
 function is not feeding the parameters to the evaluate.questions
 function as I expect.  Can someone provide some guidance on what I am
 doing wrong?
 
 This is the error message I am getting:
 
 Error in x --1 :
Comparison (1) is possible only for atomic and list types
 In addition: warning messages:
 In if (x == 1) { :
   the condition has length > 1 and only the first element will be used
 
   [[alternative HTML version deleted]]
 
 __
 

[R] Gene Ontology Profiling on Single Data Set with Different Species?

2013-12-06 Thread Sarah Pohl
Hey everyone,

I have a list of genes for which I would like to get Gene Ontology profiles 
(i.e. what are the most common GO terms). First I had a look at topGO, but 
since that compares two data sets, which I don’t have, it wasn’t right for this 
purpose. I then found goProfiles, which seems to do exactly what I wanted, but 
there is one problem: the genes I have don’t all come from the same organism, 
so there’s no organism annotation package.
Do you know of any other R package that would do the trick if I give it my list 
of genes and their GO terms? Or do I have to create my own annotation package 
and then use goProfiles?

Regards,
Sarah

-

Sarah Pohl
PhD student

Helmholtz Centre for Infection Research

eMail: sarah.p...@helmholtz-hzi.de




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] mixed model ANCOVA

2013-12-06 Thread laurie bayet
Hi,

I want to set up a mixed model ANCOVA but cannot find a way to do it.

There is:

* 1 subject factor (random, between subjects) called Subject
* 3 categorical within subjects factors called Emotion, Sex, Race
* 1 continuous covariate (**WITHIN subjects**) called Score
and
* a continuous dependent variable called logRT

I need a nice and clean table with p-values and effect sizes for each
factors and relevant interactions.

Which function should I use?

I am guessing lmer from lme4 but could not find any example on the forums
or on my manual from Gaël Millot.

Here is a wild guess :

 ModelRT <- lmer(logRT ~ Race + Sex + Emotion + Score + Race*Sex +
Race*Emotion + Sex*Emotion + Race*Sex*Emotion + (1 | Subject))

Would that be correct ?
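
Not an authoritative answer, but one note on the formula itself: in R,
Race*Sex*Emotion already expands to all main effects plus all two- and
three-way interactions, so an equivalent and more compact call would be
something like this sketch ('mydata' is a placeholder for the actual data
frame):

library(lme4)
ModelRT <- lmer(logRT ~ Race * Sex * Emotion + Score + (1 | Subject),
                data = mydata)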

Thank you,

laurie

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Help - Trace of matrices

2013-12-06 Thread Wagner Bonat
Dear,

I need to calculate the following equation

tr(Sigma^-1 %*% D.Sigma)

I know only Sigma (positive definite) and D.Sigma (derivative of Sigma), a
naive code is

sum(diag(solve(Sigma,D.Sigma)))

but these matrices are dense and of big dimension (1 x 1), and I need
to evaluate this equation many times.
What is the better way to evaluate this equation in R ?
Note that I need only the diagonal; I think it is possible to calculate only
the diagonal, but how ??

-- 
Wagner Hugo Bonat
LEG - Laboratório de Estatística e Geoinformação
UFPR - Universidade Federal do Paraná

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Easy Uplift Tree Classify Error

2013-12-06 Thread Brian
Does anyone know if the error being generated when trying to predict 
test set data in the Easy Uplift Tree package is something fixable by 
the user, or is this a bug in the program that makes the package essentially 
non-operable?
This is from the package documentation and fails on the last step of 
applying the model to the test set:



install.packages("EasyUpliftTree")
library(EasyUpliftTree)
library(survival)
data(colon)


#APPEARS TO WORK

sample.data <- na.omit(colon[colon$rx != "Lev" & colon$etype == 2, ])
treat <- ifelse(sample.data$rx == "Lev+5FU", 1, 0)
y <- ifelse(sample.data$status == 0, 1, 0)
x <- sample.data[, c(4:9, 11:14)]
x$v1 <- factor(x$sex)
x$v2 <- factor(x$obstruct)
x$v3 <- factor(x$perfor)
x$v4 <- factor(x$adhere)
x$v5 <- factor(x$differ)
x$v6 <- factor(x$extent)
x$v7 <- factor(x$surg)
x$v8 <- factor(x$node4)


index <- 1:nrow(x)
train.index <- index[(index%%2 == 0)]
test.index <- index[index%%2 != 0]
y.train <- y[train.index]
x.train <- x[train.index, ]
treat.train <- treat[train.index]
y.test <- y[test.index]
x.test <- x[test.index, ]
treat.test <- treat[test.index]
uplift.tree <- buildUpliftTree(y.train, treat.train, x.train)
print(uplift.tree)
print(uplift.tree)


#FAILS

apply(1:nrow(x.test), function(i) classify(uplift.tree, x.test[i, ]))

#Error in match.fun(FUN) : argument FUN is missing, with no default
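
A hedged observation rather than a confirmed fix: apply() takes MARGIN as its second argument, so with a vector of row indices as X the anonymous function is read as MARGIN and FUN really is missing, which is exactly the error above. A row-wise loop would normally be written with sapply() instead (assuming classify() accepts a single-row data frame):

preds <- sapply(1:nrow(x.test), function(i) classify(uplift.tree, x.test[i, ]))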

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Help - Trace of matrices

2013-12-06 Thread Doran, Harold
A fast computation I use is based on the following:

A <- matrix(rnorm(16), ncol = 4)
B <- matrix(rnorm(16), ncol = 4)
C <- A %*% B
sum(diag(C))

### This is less expensive to compute when the matrix multiplication is 
expensive
sum(A * t(B))

So it just uses the elementwise calculations and sums over all cells.
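
Applied to the original question, a minimal sketch (assuming Sigma is symmetric positive definite as stated, so its inverse can be obtained from the Cholesky factor) would be:

Sigma.inv <- chol2inv(chol(Sigma))   # inverse of a positive definite matrix
sum(Sigma.inv * t(D.Sigma))          # same value as sum(diag(Sigma.inv %*% D.Sigma))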

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Wagner Bonat
Sent: Friday, December 06, 2013 12:02 PM
To: r-help@r-project.org
Subject: [R] Help - Trace of matrices

Dear,

I need to calculate the following equation

tr(Sigma^-1 %*% D.Sigma)

I know only Sigma (positive definite) and D.Sigma (derivative of Sigma), a 
naive code is

sum(diag(solve(Sigma,D.Sigma)))

but these matrices are dense and big dimension (1 x 1), and I need to 
evaluate this equation many times.
What is the better way to evaluate this equation in R ?
Note that I need only the diagonal, I think is possible to calculate only the 
diagnonal, but how ??

--
Wagner Hugo Bonat
LEG - Laboratório de Estatística e Geoinformação UFPR - Universidade Federal do 
Paraná


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Need help figuring out sapply (and similar functions) with multiple parameter user defined function

2013-12-06 Thread Walter Anderson

On 12/06/2013 10:43 AM, William Dunlap wrote:

I have been researching and it appears that I should be using the sapply
function to apply the evaluate.question function above to each row in
the data frame like this

Read the documentation more closely: sapply(dataFrame, func)
applies func() to each column, not row, of dataFrame.

I misunderstood.  I thought it was apply the func to each row...  My mistake

preferences <- sapply(df, evaluate.questions, function(x,y,z)
evaluate.questions(df['Q1'],df['Q2'],df['Q3']))

Furthermore,
 sapply(X = dataFrame, FUN = func, extraArgument)
calls
 func(dataFrame[, i], extraArgument)
for i in seq_len(ncol(dataFrame)).

One problem is that FUN=evaluate.questions takes 3 arguments and
you give it only 2.  Another problem is that the third argument you
pass to sapply is a function (of 3 arguments) and FUN is not expecting
any of its arguments to be functions.
I will need to think about this, I am not sure I understand.  I really 
don't seem to understand how any of the apply functions seem to work.

It may be easier for you to not use sapply here, but to use for-loops and
come up with something that works.  (Write tests that will indicate whether
it works or not in a variety of situations.)  Then transform it to use things
like ifelse() and sapply() to make it more readable and run faster.
I already have tested my functions by using a for loop, and they work.  
Here is the for loop I use.


for (indx in 1:length(df$ID)) {
    df$Preference <- 
        evaluate.questions(df$Q1[indx],df$Q2[indx],df$Q3[indx])
}

I understand that such for loops aren't 'best practice' in R and am 
trying to learn its approach.  Thank you for the suggestions!
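
For a function of three scalar arguments applied row by row, mapply() is the usual vectorized-looking replacement for that loop; a hedged sketch, assuming evaluate.questions() works on one row's values at a time exactly as in the loop above:

df$Preference <- mapply(evaluate.questions, df$Q1, df$Q2, df$Q3)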

Unfortunately this doesn't work and the problem appears that the sapply
function is not feeding the parameters to the evaluate.questions
function as I expect.  Can someone provide some guidance on what I am
doing wrong?

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] mixed model ANCOVA

2013-12-06 Thread Ben Bolker
laurie bayet lauriebayet at gmail.com writes:

 
 Hi,
 
 I want to set up a mixed model ANCOVA but cannot find a way to do it.
 
 There is:
 
 * 1 subject factor (random, between subjects) called Subject
 * 3 categorical within subjects factors called Emotion, Sex, Race
 * 1 continuous covariate (**WITHIN subjects**) called Score
 and
 * a continuous dependent variable called logRT
 
 I need a nice and clean table with p-values and effect sizes for each
 factors and relevant interactions.
 
 Which function should I use?
 
 I am guessing lmer from lme4 but could not find any example on the forums
 or on my manual from Gaël Millot.
 
 Here is a wild guess :
 
  ModelRT - lmer(logRT ~ Race + Sex+ Emotion + Score + Race*Sex +
 Race*Emotion + Sex*Emotion + Race*Sex*Emotion + (1 | Subject))
 
 Would that be correct ?
 
 Thank you,
 
 laurie
 

* This might be better on r-sig-mixed-mod...@r-project.org
* In R '*' indicates main effects plus all interactions (':' is
for an interaction only), so you can abbreviate your formula to

ModelRT <- lmer(logRT ~ Race*Sex*Emotion + (1 | Subject))

or using lme from the nlme package:

ModelRT <- lme(logRT~Race*Sex*Emotion, random=~1|Subject)

* You should strongly consider passing an explicit 'data' argument
rather than picking up the variables from the workspace
* See ?pvalues in lme4 for some of your choices about getting
tables of p-values and effect sizes (e.g. with auxiliary functions
from the car, lmerTest, or pbkrtest packages). Beware that lme
will give you denominator degrees of freedom, but the degrees
of freedom may very likely be miscalculated for your within-subject
continuous covariate
* You should strongly consider whether you need to include
among-subject variance in the within-subject factors in your model
[see the two refs below]
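
For illustration only, a maximal random-effects specification along the lines of the first reference below might look like this hedged sketch (whether it is estimable depends on your design; 'mydata' is a hypothetical data frame holding the variables above):

ModelRT.max <- lmer(logRT ~ Race * Sex * Emotion + Score +
                      (1 + Race * Sex * Emotion | Subject),
                    data = mydata)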

@article{barr_random_2013,
title = {Random effects structure for confirmatory hypothesis testing: Keep
it maximal},
volume = {68},
issn = {0749-{596X}},
shorttitle = {Random effects structure for confirmatory hypothesis testing},
url = {http://www.sciencedirect.com/science/article/pii/S0749596X12001180},
doi = {10.1016/j.jml.2012.11.001},
abstract = {Linear mixed-effects models ({LMEMs)} have become increasingly
prominent in psycholinguistics and related areas. However, many researchers
do not seem to appreciate how random effects structures affect the
generalizability of an analysis. Here, we argue that researchers using
{LMEMs} for confirmatory hypothesis testing should minimally adhere to the
standards that have been in place for many decades. Through theoretical
arguments and Monte Carlo simulation, we show that {LMEMs} generalize best
when they include the maximal random effects structure justified by the
design. The generalization performance of {LMEMs} including data-driven
random effects structures strongly depends upon modeling criteria and sample
size, yielding reasonable results on moderately-sized samples when
conservative criteria are used, but with little or no power advantage over
maximal models. Finally, random-intercepts-only {LMEMs} used on
within-subjects and/or within-items data from populations where subjects
and/or items vary in their sensitivity to experimental manipulations always
generalize worse than separate F1 and F2 tests, and in many cases, even
worse than F1 alone. Maximal {LMEMs} should be the ‘gold standard’ for
confirmatory hypothesis testing in psycholinguistics and beyond.},
number = {3},
urldate = {2013-09-26},
journal = {Journal of Memory and Language},
author = {Barr, Dale J. and Levy, Roger and Scheepers, Christoph and Tily,
Harry J.},
month = apr,
year = {2013},
keywords = {Generalization, Linear mixed-effects models, Monte Carlo
simulation, statistics},
pages = {255--278}
}

@article{schielzeth_conclusions_2009,
title = {Conclusions beyond support: overconfident estimates in mixed models},
volume = {20},
issn = {1045-2249, 1465-7279},
shorttitle = {Conclusions beyond support},
url = {http://beheco.oxfordjournals.org/content/20/2/416},
doi = {10.1093/beheco/arn145},
abstract = {Mixed-effect models are frequently used to control for the
nonindependence of data points, for example, when repeated measures from the
same individuals are available. The aim of these models is often to estimate
fixed effects and to test their significance. This is usually done by
including random intercepts, that is, intercepts that are allowed to vary
between individuals. The widespread belief is that this controls for all
types of pseudoreplication within individuals. Here we show that this is not
the case, if the aim is to estimate effects that vary within individuals and
individuals differ in their response to these effects. In these cases,
random intercept models give overconfident estimates leading to conclusions
that are not supported by the data. By allowing individuals to differ in the
slopes of their responses, it is possible to account for the nonindependence
of data points that pseudoreplicate 

Re: [R] Open multiple files using a loop

2013-12-06 Thread arun
Hi Chris,

May be this helps.

#Suppose the working directory is `FirstLevel`
D <- dir(recursive=TRUE)
 D
#[1] "S1/S1data.txt" "S2/S2data.txt" "S3/S3data.txt"
sapply(D,function(x) nrow(read.table(x,sep="",header=TRUE)))
#S1/S1data.txt S2/S2data.txt S3/S3data.txt 
#   20    20    20 

res <- do.call(rbind,lapply(D,function(x) read.table(x,sep="",header=TRUE)))
 dim(res)
#[1] 60  2
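
For the specific layout described in the quoted message below, an equivalent hedged sketch that builds the 50 file paths explicitly (the directory structure is assumed from the question, not verified) would be:

files <- file.path("~/fmridata/FirstLevel", paste0("S", 1:50), paste0("S", 1:50, "data"))
alldata <- do.call(rbind, lapply(files, read.table, header = TRUE, quote = "\""))
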
A.K.


Dear R/Arun 

I would like to open 50 different text files (S1data; S2data; 
S3data etc.) and rbind() them into a single data.frame or matrix. Is 
there a way of doing this with a loop or in some other time-saving manner? 

`S1data` <- read.table("~/fmridata/FirstLevel/S1/S1data", header=T, quote="\"") 
`S2data` <- read.table("~/fmridata/FirstLevel/S2/S2data", header=T, quote="\"") 
`S3data` <- read.table("~/fmridata/FirstLevel/S3/S3data", header=T, quote="\"") 

etc… to S50 

alldata - rbind(S1data, S2data, S3data etc… to 50) 


This type of idea (assuming each file has 10 rows (x50=500) and 25 columns): 

subjects <- c("S1", "S2", "S3" etc… to "S50") 
alldata <- matrix(nrow = 500, ncol=25, byrow=TRUE) 

for(i in 1:50) { 
`subject[i]data` <- 
read.table("~/fmridata/FirstLevel/(subject[i])/subject[i]data", header=T, 
quote="\"") 

alldata[i,] <- subject[i]data 

} 

Thanks, 
Chris

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] help with the nested anova formulas

2013-12-06 Thread Ben Bolker
Robert Lynch robert.b.lynch at gmail.com writes:

  I am modeling grade as a function of membership in
  various cohorts.  There
 are four cohorts.  (NONE, ISE07,ISE08,ISE09) and two times of cohorts
 coded as ISE = TRUE (ISE0#) or FALSE (NONE).  There is clear co-linearity
 but that is to be expected.
 
  running the following code
 
  CutOff <- 0
  fit.base <- lme(fixed= zGrade ~ Rep + COHORT/ISE + P7APrior + Female +
 White + HSGPA + MATH + AP_TOTAL + Years + EOP + Course,
  random= ~1|SID,
  data = share[share$GRADE >= CutOff,])
 
  I get the following error
 
  Error in MEEM(object, conLin, control$niterEM) :
Singularity in backsolve at level 0, block 1
 
  but if I take out the /ISE I get no error; similarly if I take out the
 COHORT/.
 
  I want to test for the effects of the different cohorts within the ISE
 subset and across ISE & NONE
 
 I can send the data (the whole is too large) if you wish.

  Please send this to r-sig-mixed-mod...@r-project.org for more
discussion.

  The short answer is that lme can't fit models with rank-deficient
fixed effect model matrices -- in other words, there are redundant
parameters in your model because COHORT and ISE between them use
6 parameters to model 4 independent quantities.

http://stats.stackexchange.com/questions/35071/what-is-rank-deficiency-and-how-to-deal-with-it
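
A hedged illustration with hypothetical data of the redundancy described above: because ISE is completely determined by COHORT, the COHORT/ISE design matrix has more columns than its rank.

COHORT <- factor(rep(c("NONE", "ISE07", "ISE08", "ISE09"), each = 5))
ISE <- COHORT != "NONE"
X <- model.matrix(~ COHORT/ISE)
c(columns = ncol(X), rank = qr(X)$rank)   # rank stays at 4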

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] p value for mu: anova()

2013-12-06 Thread Ben Bolker
Rosario Garcia Gil M.Rosario.Garcia at slu.se writes:

 
 Hello
 
 I have run an anova analysis for
 the following model:  H_obs = mu + REGION + MANAGEMENT + e
 
 When I run it in ASReml I get the p-value for mu, and,
 of course also for the two dependent variables (REGION
 and MANAGEMENT)
 
 When I run it in R, I do not get the pvalue for mu.
 
 Can some one help me to understand why? 
 and if it is possible to estimate the pvalue for mu in anova() in R?

   You may be wondering why no-one has answered your question ...
(1) it's way too vague and (2) the attached file probably got
stripped by the mailing list software before anyone saw it.
(Even if #2 weren't true, people are unlikely to take the time
to answer a really vague question if it means digging into a
data file to figure out what's going on.)

  See for example http://tinyurl.com/reproducible-000
  What *exact* code are you running?  "an anova analysis" is too vague.
 
  Ben Bolker

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Need help figuring out sapply (and similar functions) with multiple parameter user defined function

2013-12-06 Thread William Dunlap
 I understand that such for loops aren't 'best practice' in R and am
 trying to learn its approach. 

sapply() is an encapsulated loop and loops have their place in R.
'Best practice' is a nebulous term, but explicit loops can make
code that is hard to understand (by a compiler or by a human)
and any loop at the R-code level will generally make code run
more slowly.  However, depending on your background, explicit
loops may be easier for you to write and understand, so you
may get an answer faster by using loops.

 Then transform it to use things
  like ifelse() and sapply() to make it more readable and run faster.

Changing your 'if' statements to calls to the vectorized 'ifelse' will
probably make looping unneeded.  E.g., your q1.ans() only works
on a scalar, forcing you to use sapply (or the superior vapply) to
work on vectors:

q1.ans - function(x)
{
   retVal = 0
   if (x == 1) {
 retVal = 1
   } else if (x ==2) {
 retVal = 2
   }
   return (retVal)
}
as in
 q1.ans(1:3)
   [1] 1 
   Warning message:
   In if (x == 1) { :
 the condition has length > 1 and only the first element will be used
sapply(1:3, q1.ans)
   [1] 1 2 0

You can change it to work on a vector by using ifelse:
   q1a.ans - function(x) {
  ifelse(x==1,
 1,  # return 1's where x had 1's
 ifelse(x==2,
   2, # return 2's where x had 2's
   0)) # return 0 where x had something else
}
used as
 q1a.ans(1:3)
   [1] 1 2 0
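
As an aside, the vapply variant mentioned earlier differs mainly in declaring the expected result type; a hedged one-liner:

vapply(1:3, q1.ans, FUN.VALUE = numeric(1))   # 1 2 0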

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


 -Original Message-
 From: Walter Anderson [mailto:wandrso...@gmail.com]
 Sent: Friday, December 06, 2013 9:58 AM
 To: William Dunlap; r-help@r-project.org
 Subject: Re: [R] Need help figuring out sapply (and similar functions) with 
 multiple
 parameter user defined function
 
 On 12/06/2013 10:43 AM, William Dunlap wrote:
  I have been researching and it appears that I should be using the sapply
  function to apply the evaluate.question function above to each row in
  the data frame like this
  Read the documentation more closely: sapply(dataFrame, func)
  applies func() to each column, not row, of dataFrame.
 I misunderstood.  I thought it was apply the func to each row...  My mistake
  preferences - sapply(df, evaluate.questions, function(x,y,z)
  evaluate.questions(df['Q1'],df['Q2'],df['Q3']))
  Furthermore,
   sapply(X = dataFrame, FUN = func, extraArgument)
  calls
   func(dataFrame[, i], extraArgument)
  for i in seq_len(ncol(dataFrame).
 
  One problem is that FUN=evaluate.questions takes 3 arguments and
  you give it only 2.  Another problem is that the third argument you
  pass to sapply is a function (of 3 arguments) and FUN is not expecting
  any of its arguments to be functions.
 I will need to think about this, I am not sure I understand.  I really
 don't seem to understand how any of the apply functions seem to work.
  It may be easier for you to not use sapply here, but to use for-loops and
  come up with something that works.  (Write tests that will indicate whether
  it works or not in a variety of situations.)  Then transform it to use 
  things
  like ifelse() and sapply() to make it more readable and run faster.
 I already have tested my functions by using a for loop, and they work.
 Here is the for loop I use.
 
 for (indx in 1:length(df$ID)) {
  df$Preference -
 evaluate.questions(df$Q1[indx],df$Q2[indx],df$Q3[indx])
 }
 
 I understand that such for loops aren't 'best practice' in R and am
 trying to learn its approach.  Thank you for the suggestions!
  Unfortunately this doesn't work and the problem appears that the sapply
  function is not feeding the parameters to the evaluate.questions
  function as I expect.  Can someone provide some guidance on what I am
  doing wrong?
  Bill Dunlap
  Spotfire, TIBCO Software
  wdunlap tibco.com
 
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] model selection with step()

2013-12-06 Thread Adams, Jean
Karen,

Look at the help for the drop1() function.
?drop1

There you will see, "The hierarchy is respected when considering terms to
be added or dropped: all main effects contained in a second-order
interaction must remain, and so on."

So, for fit2, the step() function will only consider dropping a main effect
(e.g., X3) if there are no interactions involving that effect in the model.
 That's why only after X1:X3 and X2:X3 are dropped, do you see X3 being
considered for dropping in your example.
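
A hedged way to see this directly, using the two fits defined in the quoted code below: drop1() lists exactly the terms step() is allowed to consider at the first step.

drop1(fit2)   # only the I(.^2) terms and the interactions are candidates
drop1(fit1)   # all nine explicitly created variables are candidates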

Jean



On Thu, Dec 5, 2013 at 11:27 PM, Karen Keating keati...@ksu.edu wrote:

 I am using the step() function to select a model using backward
 elimination, with AIC as the selection criterion.  The full regression
 model contains three predictors, plus all the second order terms and
 two-way interactions.  The full model is fit via lm() using two different
 model formulae.  One formula uses explicitly defined variables for the
 second-order and interaction terms and the other formula uses the I(x^2)
 and colon operators.  The fit generated by lm() is exactly the same for
 both models, but when I pass these fitted models to the step() function, I
 get two different results.  Apparently, step() does not recognize the three
 main predictors unless the second order and interaction terms are
 explicitly defined as separate variables.

 I assigned this problem to my first-year graduate students, not realizing
 that R would give two different answers.  Now I have to re-grade their
 homework, but I would really like to give them a reasonable explanation for
 the discrepancy.

 The complete code is given below.

 Could anyone shed some light on this mystery?

 Thanks in advance,
 Karen Keating
 Kansas State University


 # Exercise 9.13, Kutner, Nachtsheim, Neter  Li
 temp <- scan()
 49.0   45.0   36.0   45.0
 55.0   30.0   28.0   40.0
 85.0   11.0   16.0   42.0
 32.0   30.0   46.0   40.0
 26.0   39.0   76.0   43.0
 28.0   42.0   78.0   27.0
 95.0   17.0   24.0   36.0
 26.0   63.0   80.0   42.0
 74.0   25.0   12.0   52.0
 37.0   32.0   27.0   35.0
 31.0   37.0   37.0   55.0
 49.0   29.0   34.0   47.0
 38.0   26.0   32.0   28.0
 41.0   38.0   45.0   30.0
 12.0   38.0   99.0   26.0
 44.0   25.0   38.0   47.0
 29.0   27.0   51.0   44.0
 40.0   37.0   32.0   54.0
 31.0   34.0   40.0   36.0

 dat <- matrix(temp,ncol=4,nrow=length(temp)/4,byrow=T)
 colnames(dat) <- c('Y','X1','X2','X3')
 dat <- data.frame(dat)
 attach(dat)

 # second order terms and interactions
 X12 <- X1*X2
 X13 <- X1*X3
 X23 <- X2*X3
 X1sq <- X1^2
 X2sq <- X2^2
 X3sq <- X3^2

 fit1 <- lm(Y~ X1sq  + X2sq  + X3sq  +X1+X2+X3+ X12 + X13 + X23 )
 fit2 <- lm(Y~I(X1^2)+I(X2^2)+I(X3^2)+X1+X2+X3+X1:X2+X1:X3+X2:X3)
 sum( abs(fit1$res - fit2$res) ) # 0, so fitted models are the same
 dim(model.matrix(fit1))   # 19 x 10
 dim(model.matrix(fit2))   # 19 x 10

 dim(fit1$model)  # 19 x 10
 dim(fit2$model)  # 19 x 7 -- could this cause the discrepancy?

 back1 <- step(fit1,direction='backward')
 back2 <- step(fit2,direction='backward')
 # Note that 'back1' considers the three primary predictors X1, X2 and X3,
 # while 'back2' does not.


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] quantiles with approximately the same number of data points within each quantile?

2013-12-06 Thread Anika Masters
What is a good way to create quantiles with approximately the same
number of data points within each quantile?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Need help figuring out sapply (and similar functions) with multiple parameter user defined function

2013-12-06 Thread Walter Anderson

Thanks again!

Can the ifelse statement be nested like

ifelse(condition1,
ifelse(condition2,yes,no)
ifelse(condition3,yes,no)
)

?
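
A hedged sketch of how the nesting usually looks: each nested call sits in its own argument slot (the 'yes' and 'no' branches), separated by commas.

x <- c(1, 2, 3, 4)
ifelse(x > 2,
       ifelse(x > 3, "big", "medium"),   # used where x > 2 is TRUE
       ifelse(x > 1, "small", "tiny"))   # used where x > 2 is FALSE
# [1] "tiny"   "small"  "medium" "big"
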
On 12/06/2013 12:23 PM, William Dunlap wrote:

I understand that such for loops aren't 'best practice' in R and am
trying to learn its approach.

sapply() is an encapsulated loop and loops have their place in R.
'Best practice' is a nebulous term, but explicit loops can make
code that is hard to understand (by a compiler or by a human)
and any loop at the R-code level will generally make code run
more slowly.  However, depending on your background, explicit
loops may be easier for you to write and understand, so you
may get an answer faster by using loops.


Then transform it to use things
like ifelse() and sapply() to make it more readable and run faster.

Changing your 'if' statements to calls to the vectorized 'ifelse' will
probably make looping unneeded.  E.g., your q1.ans() only works
on a scalar, forcing you to use sapply (or the superior vapply) to
work on vectors:

 q1.ans - function(x)
 {
retVal = 0
if (x == 1) {
  retVal = 1
} else if (x ==2) {
  retVal = 2
}
return (retVal)
 }
as in
  q1.ans(1:3)
[1] 1
Warning message:
In if (x == 1) { :
  the condition has length  1 and only the first element will be used
 sapply(1:3, q1.ans)
[1] 1 2 0

You can change it to work on a vector by using ifelse:
q1a.ans - function(x) {
   ifelse(x==1,
  1,  # return 1's where x had 1's
  ifelse(x==2,
2, # return 2's where x had 2's
0)) # return 0 where x had something else
 }
used as
  q1a.ans(1:3)
[1] 1 2 0

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com



-Original Message-
From: Walter Anderson [mailto:wandrso...@gmail.com]
Sent: Friday, December 06, 2013 9:58 AM
To: William Dunlap; r-help@r-project.org
Subject: Re: [R] Need help figuring out sapply (and similar functions) with 
multiple
parameter user defined function

On 12/06/2013 10:43 AM, William Dunlap wrote:

I have been researching and it appears that I should be using the sapply
function to apply the evaluate.question function above to each row in
the data frame like this

Read the documentation more closely: sapply(dataFrame, func)
applies func() to each column, not row, of dataFrame.

I misunderstood.  I thought it was apply the func to each row...  My mistake

preferences - sapply(df, evaluate.questions, function(x,y,z)
evaluate.questions(df['Q1'],df['Q2'],df['Q3']))

Furthermore,
  sapply(X = dataFrame, FUN = func, extraArgument)
calls
  func(dataFrame[, i], extraArgument)
for i in seq_len(ncol(dataFrame).

One problem is that FUN=evaluate.questions takes 3 arguments and
you give it only 2.  Another problem is that the third argument you
pass to sapply is a function (of 3 arguments) and FUN is not expecting
any of its arguments to be functions.

I will need to think about this, I am not sure I understand.  I really
don't seem to understand how any of the apply functions seem to work.

It may be easier for you to not use sapply here, but to use for-loops and
come up with something that works.  (Write tests that will indicate whether
it works or not in a variety of situations.)  Then transform it to use things
like ifelse() and sapply() to make it more readable and run faster.

I already have tested my functions by using a for loop, and they work.
Here is the for loop I use.

for (indx in 1:length(df$ID)) {
  df$Preference -
evaluate.questions(df$Q1[indx],df$Q2[indx],df$Q3[indx])
}

I understand that such for loops aren't 'best practice' in R and am
trying to learn its approach.  Thank you for the suggestions!

Unfortunately this doesn't work and the problem appears that the sapply
function is not feeding the parameters to the evaluate.questions
function as I expect.  Can someone provide some guidance on what I am
doing wrong?

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com






__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] quantiles with approximately the same number of data points within each quantile?

2013-12-06 Thread Don McKenzie
By default I believe.  See

http://en.wikipedia.org/wiki/Quantile

Others more erudite may correct me.

On Dec 6, 2013, at 11:47 AM, Anika Masters anika.mast...@gmail.com wrote:

 What is a good way to create quantiles with approximately the same
 number of data points within each quantile?
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

Don McKenzie
Research Ecologist
Pacific Wildland Fire Science Lab
US Forest Service

Affiliate Professor
School of Environmental and Forest Sciences
University of Washington
d...@uw.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] quantiles with approximately the same number of data points within each quantile?

2013-12-06 Thread Rui Barradas

Hello,

Use function ?quantile.
See this example, each group has exactly, not approximately, 25 elements.

x <- rnorm(100)

qnt <- quantile(x)
tapply(x, findInterval(x, qnt, rightmost.closed = TRUE), length)
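
A hedged follow-up sketch: cut() with the same quantile breaks labels each observation with its group, which helps when the group memberships themselves are needed rather than just the counts.

grp <- cut(x, breaks = quantile(x), include.lowest = TRUE)
table(grp)   # four groups of (approximately) equal size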

Hope this helps,

Rui Barradas

Em 06-12-2013 19:47, Anika Masters escreveu:

What is a good way to create quantiles with approximately the same
number of data points within each quantile?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to generate a smoothed surface for a three dimensional dataset?

2013-12-06 Thread David Winsemius

On Dec 5, 2013, at 9:46 PM, A Xi Ma wrote:

 The following question is inspired by Jun's problem, which resembles some
 of my own problems, but goes off on a tangent about applying plot3D from
 Karline Soetaert.
 
 
 On Thu, Dec 5, 2013 at 11:52 PM, Bert Gunter gunter.ber...@gene.com wrote:
 
 
 Your comment that:
 
  I can see the critical point here is to find a right function to
 make the prediction. 
 
 is what indicates to me that your critical point is that you have
 insufficient knowledge and need help. Feel free to disagree, of
 course.
 
 
 I don't know if it's true for Jun, but it's definitely true for me - I have
 insufficient knowledge! I'm out of my depth with surface estimation, but I
 have to learn how to do it, one way or the other.

 
 Currently I'm reading the docs for plot3d.
 
 I loaded the package into rstudio and ran some of the examples.  The
 image2D example seems to get its data from a data.frame called volcano
 with a small v.

Right. The 'volcano' object is a standard data object for demonstrating R 
graphics. It resides in the datasets package and has a help file:

help(volcano)

 
 imag2D> nr <- nrow(volcano)
 
 imag2D> nc <- ncol(volcano)
 
 imag2D> image2D(volcano, x = 1:nr, y = 1:nc, lighting = TRUE,
 imag2D+    main = "volcano", clab = "height, m")
 

 
 The objects() command shows a Volcano with a big V.  The small-v and
 big-V volcanoes are not the same, because the str command shows:
 
snipped superfluous output from an objects()-command.

 str(Volcano)
 num [1:29, 1:21] 100 103 105 108 110 116 120 122 123 118 ...
 str(volcano)
 num [1:87, 1:61] 100 101 102 103 104 105 105 106 107 108 ...

They are both matrices. The Volcano matrix has only one-ninth the number of 
values.

The first small section of the volcano vignette reads:

1. Intro

To make this vignette smaller, the size of volcano is reduced:

 # Reduce the resolution
 Volcano <- volcano[seq(1, nrow(volcano), by = 3),
 seq(1, ncol(volcano), by = 3)]
-
So that code just selects every third of the values of the 'volcano' matrix.



 
 I don't understand how the volcano object works well enough to power the
 image2D command, but doesn't show up in objects().

It is accessible by functions although it is not visible in the workspace.

 str(volcano)
 num [1:87, 1:61] 100 101 102 103 104 105 105 106 107 108 ...
 'volcano' %in% ls()
[1] FALSE

If you want to get it into the workspace, you just use the data() function:

 data('volcano')
 'volcano' %in% ls()
[1] TRUE   # now visible



  At first I thought
 there was some kind of secret smuggling compartment in memory space, and
 nr and nc and volcano were all hidden in that secret place.  But in
 fact, nr and nc show up in objects().
 
 So ... I am even less educated than the other newbies on the list, and I'm
 following along, and I really don't see how R is doing what it's doing.
 Should I be reading the plot3D .pdf textbooks, or should I give up and go
 back to some much more basic textbook?

I'm thinking you are not yet ready for plot3D. It's unclear what level of 
effort you have put in to reading and mastering the Introduction to R or 
whatever text you are using to educate yourself. I certainly do not think that 
a beginning tutorial in R was the goal that the authors of the plot3D package 
had in mind. Even before posting to Rhelp you are expected to have studied the 
available documentation and learned enough R to be able to answer all the 
questions you posed. So I suggest studying your copy of Introduction to R that 
is shipped with every binary of R.

 
 Thanks.
 

And you should learn to post in plain text. Please do read the Posting Guide.

 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] [R-pkgs] rms 4.1-0

2013-12-06 Thread Frank Harrell

The rms package has had several updates in version 4.1-0:

   * Fixed orm.fit to not create penalty matrix if not needed 
(penalties are not yet implemented anyway)

   * Added yscale argument to plot.Predict
   * Added Wald test simulation to orm help file
   * Added example in help file for plot.anova.rms of adding a line 
combining the effects of two predictors in dot chart

   * Fixed grid interpretation error in survplot.survfit
   * Changed plot.anova.rms to use dotchart3 instead of dotchart2
   * Fixed bug in summary.rms - was taking reciprocal of effect ratio 
with orm even if not loglog family (thanks: Yong Hao Pua 
puayong...@gmail.com

   * Removed link to print.lm, summary.lm in ols.Rd
   * Added ntrans argument to plot.anova.rms
   * Fixed handling of intercepts in Rq, validate.Rq
   * Removed residuals.Glm, residuals.rms (also from Rd, NAMESPACE)
   * Removed other .rms methods and other remnants from fooling S+ 
dispatcher
   * Fixed bug in lm.pfit when penalty used (thanks: Yong Hao Pua 
puayong...@gmail.com)

   * Fixed bug in calibrate.default for ols (thanks: Andy Bush)
   * Change print.contrast.rms to insert NA for SE if fun is not the 
identity function
   * Added margin argument to plot.anova.rms to print selected stats in 
right margin of dot chart
   * Added anova argument to plot.Predict to allow overall association 
test statistics to be added to panels
   * Fixed bug in val.prob in which the logistic model was re-fitted 
instead of fixing coefficients at 0,1.  This resulted in model 
statistics (including c-index) always being favorable even when 
predictions were worse than chance.  Thanks: Kirsten Van Hoorde 
kirsten.vanhoo...@esat.kuleuven.be
   * Fixed bug in survdiffplot where conf.int was always overridden by 
value from survfit.  Thanks: Kamil Fijorek kamilfijo...@gmail.com
   * Fixed bug in grid= for survplot.* and survdiffplot.  Thanks: Kamil 
Fijorek
   * Fixed rms.s to account for possible offset in names(nmiss). 
Thanks: Larry Hunsicker
   * Fixed psm.s to not compute Dxy if simple right censoring is not in 
effect.  Thanks: I.M. Nolte
   * rcs: respect system option fractied, passed to rcspline.eval; can 
be used to get old behavior

   * Gls: as nlme 3.1-113 exports more functions, removed nlme:::

--
Frank E Harrell Jr Professor and Chairman  School of Medicine
   Department of Biostatistics Vanderbilt University

___
R-packages mailing list
r-packa...@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-packages

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R survAUC Package

2013-12-06 Thread David Winsemius

On Dec 4, 2013, at 7:49 AM, kevinod wrote:

 I have a concern about the survAUC package option AUC.cd.  
 

So shouldn't you be sending this to the package authors? They may or may not be 
regular readers of R-help. It's a package I have never heard of.

 I am exploring package functionality, specifically AUC statistics for Cox
 Regression, for a small academic project
 
 When utilizing this package on the ovarian data set within that package I
 obtain an AUC statistic of 0.3322928.  When AUC calculations use a
 dichotomous outcome such as this, see included R code, the result should lie
 between 0.5 and 1, not 0.33.
 
 Please explain this, I am not certain that the algorithm that is being
 utilized for this package is correct.
 
 Thank you
 
 Kevin O’Donnell, MS Work Environment, MS Env Eng.,  MS Const Project Mgmt
 Graduate Student
 Department of Biostatistics
 Boston University School of Public Health
 715 Albany Street Boston, MA
 
 617-480-1677
 
 x11(h=8,w=11)
 fit = survfit(Surv(futime,fustat) ~ rx)
 plot(fit, mark.time=FALSE, xscale=365.25, main="Plot of Survival Curves by
 Prescription Status",
xlab='Length of Survival', ylab='Proportion of Individuals who have
 Survived')
 lines(fit[1], lwd=3,lty=2:3, xscale=365.24,col=2)
 lines(fit[2], lwd=2,lty=2:2, xscale=365.24,col=3)
 legend(.2,.2, c("No treatment", "treatment"), lwd=3, lty = 2:3) 
 
 
 TR2 = ovarian[1:16,]
 TE2 = ovarian[17:26,]
 train.fit2  = coxph(Surv(futime, fustat) ~ rx,
x=TRUE, y=TRUE, method="efron", data=TR)
 lp2 = predict(train.fit) 
 lpnew2 = predict(train.fit2, newdata=TE2)
 Surv.rsp2 = Surv(TR2$futime, TR2$fustat)
 Surv.rsp.new2 = Surv(TE2$futime, TE2$fustat)
 times2 = seq(10, 1000, 10)  
 
 AUC_CD2 = AUC.cd(Surv.rsp2, Surv.rsp.new2, lp2, lpnew2, times2)
 
 
 AUC_hc2 = AUC.hc(Surv.rsp2, Surv.rsp.new2, lpnew2, times2)
 
 
 AUC_sh2 = AUC.sh(Surv.rsp2, Surv.rsp.new2, lp2, lpnew2, times2)
 
 
 AUC_Uno2 = AUC.uno(Surv.rsp2, Surv.rsp.new2, lpnew2, times2)
 
 
 
 
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/R-survAUC-Package-tp4681638.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Using assign with mapply

2013-12-06 Thread Julio Sergio Santana
I have a data frame whose first colum contains the names of the variables 
and whose second colum contains the values to assign to them:

   : kkk <- data.frame(vars=c("var1", "var2", "var3"), 
 vals=c(10, 20, 30), stringsAsFactors=F)

If I do 

   : assign(kkk$vars[1], kkk$vals[1])

it works

   : var1
   [1] 10

However, if I try with mapply this is what I get:

   : mapply(assign, kkk$vars, kkk$vals)
   var1 var2 var3 
 10   20   30 
   : var2
   Error: object 'var2' not found

Maybe I have not understood how mapply and assign work. Do you have
any comments?

Thanks,

  -Sergio.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Wrong date fromat?

2013-12-06 Thread arun
Hi,
Try
vec1 <- 10958:10963 
 as.Date(vec1, origin="1960-01-01")
#[1] "1990-01-01" "1990-01-02" "1990-01-03" "1990-01-04" "1990-01-05"
#[6] "1990-01-06"

A.K.



I have imported a Stata data set into R and wanted to convert the dates. 
 The conversion went OK, but the output doesn't represent my data. The head 
of the imported data is this one: 

 head(df$date) 
[1] 10958 10959 10960 10961 10962 10963 

I tried to convert the date using the zoo package: 

library(zoo) 
df$date <- as.Date(df$date) 
head(df$date) 

 head(df$date) 
[1] "2000-01-02" "2000-01-03" "2000-01-04" "2000-01-05" "2000-01-06" 
"2000-01-07" 

However my date starts with January 1, 1990 and the converted data starts from 
January 2, 2000. 

What have I done wrong?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Using assign with mapply

2013-12-06 Thread David Winsemius

On Dec 6, 2013, at 11:27 AM, Julio Sergio Santana wrote:

 I have a data frame whose first colum contains the names of the variables 
 and whose second colum contains the values to assign to them:
 
   : kkk - data.frame(vars=c(var1, var2, var3), 
 vals=c(10, 20, 30), stringsAsFactors=F)
 
 If I do 
 
   : assign(kkk$vars[1], kkk$vals[1])
 
 it works
 
   : var1
   [1] 10
 
 However, if I try with mapply this is what I get:
 
   : mapply(assign, kkk$vars, kkk$vals)
   var1 var2 var3 
 10   20   30 
   : var2
   Error: object 'var2' not found
 
 Maybe I have not undestand how mapply and assign work. Do you have
 any comments?

I think you will find that the value returned from the mapply call was a three 
element list with the desired names and values  ... except you then gave that 
enclosing list no name and it will be garbage-collected. If you want to have 
'assign' do its magic into the global environment, then you need to supply 
'mapply' a MoreArgs argument on the other side of the ellipsis:

Usage:
mapply(FUN, ..., MoreArgs = NULL, SIMPLIFY = TRUE,
   USE.NAMES = TRUE)

So what happens if you try this:

mapply(assign,  kkk$vars, kkk$vals, MoreArgs = list(envir = .GlobalEnv))
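
For completeness, a hedged alternative that skips the looping construct entirely: list2env() pushes a whole named list into an environment in one call.

list2env(setNames(as.list(kkk$vals), kkk$vars), envir = .GlobalEnv)
var2   # 20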

-- 

David Winsemius
Alameda, CA, USA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Wrong date fromat?

2013-12-06 Thread Daniel Nordlund
 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
 On Behalf Of arun
 Sent: Friday, December 06, 2013 3:11 PM
 To: R help
 Subject: Re: [R] Wrong date fromat?
 
 Hi,
 Try
 vec1 - 10958:10963
  as.Date(vec1,origin=1960-01-01)
 #[1] 1990-01-01 1990-01-02 1990-01-03 1990-01-04 1990-01-05
 #[6] 1990-01-06
 
 A.K.
 
 
 
 I have imported a stata data into R and wanted to convert the date.
  The format went OK, but the output doesn't represent my data. The head
 of the imported data is this one
 
  head(df$date)
 [1] 10958 10959 10960 10961 10962 10963
 
 I tried to convert the date using the zoo package:
 
 library(zoo)
 df$date-as.Date(df$date)
 head(df$date)
 
  head(df$date)
 [1] 2000-01-02 2000-01-03 2000-01-04 2000-01-05 2000-01-06
 2000-01-07
 
 However my date starts with January 1, 1990 and the converted data starts
 from January 2, 2000.
 
 What have I done wrong?
 

You need to specify an appropriate value for the origin parameter.  It looks 
like as.Date in the zoo package (which masks the as.Date in base) defaults to 
the Unix epoch value, origin='1970-01-01'.  Your Stata values are based on 
origin='1960-01-01' as your first example specified.
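
A hedged one-liner applying that to the imported column (assuming df$date still holds the raw Stata day counts):

df$date <- as.Date(df$date, origin = "1960-01-01")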

Hope this is helpful,

Dan

Daniel Nordlund
Bothell, WA USA
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] kmeans clustering on large but sparse matrix

2013-12-06 Thread Wuming Gong
Hi Lishu,

I ran into similar large-scale problems recently.  I used a parallel
SGD k-means, described in this paper, for my problem:

http://www.eecs.tufts.edu/~dsculley/papers/fastkmeans.pdf

Let n be the samples, k be the number of clusters, and m be the number of
nodes,
1.  First, each node reads n / m sample data, and randomly generates enough
'mini-batches' (the mini-batch size and the number of SGD iterations must be
determined beforehand)
2.  Sample k / m centers from the samples on each node
3.  Update the centers, by using the mini-batches generated at the first
step.  Note that at this stage it is not necessary to hold the sample data
on each node.
4.  Once the centers are optimized by SGD, compute the distance matrix
between samples and centers.  I used spherical k-means, so this step can be
divided into a series of block matrix multiplications to save memory.

Note that each node only needs to hold partial sample data and partial
centers, so this method can work on 'regular' MPI environment and do not
need the shared memory architecture.

I used pbdMPI to parallelize the algorithm.
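
For the center update itself (steps 2-3), a hedged single-node sketch in the spirit of the Sculley paper, leaving out the MPI partitioning and the spherical-distance variant, might look like this:

minibatch.kmeans <- function(X, k, iters = 100, batch = 100) {
  centers <- X[sample(nrow(X), k), , drop = FALSE]   # step 2: sample initial centers
  counts  <- rep(0, k)                               # per-center update counts
  for (it in seq_len(iters)) {                       # step 3: SGD over mini-batches
    M <- X[sample(nrow(X), batch), , drop = FALSE]
    ## squared Euclidean distance from each mini-batch point to each center
    d2 <- outer(rowSums(M^2), rowSums(centers^2), "+") - 2 * M %*% t(centers)
    nearest <- max.col(-d2)
    for (i in seq_len(nrow(M))) {
      ci <- nearest[i]
      counts[ci] <- counts[ci] + 1
      eta <- 1 / counts[ci]                          # per-center learning rate
      centers[ci, ] <- (1 - eta) * centers[ci, ] + eta * M[i, ]
    }
  }
  centers
}

## toy usage on dense data
set.seed(1)
X <- rbind(matrix(rnorm(200), ncol = 2), matrix(rnorm(200, mean = 4), ncol = 2))
ctrs <- minibatch.kmeans(X, k = 2)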

hope this helps.

Wuming







On Wed, Jan 18, 2012 at 3:37 PM, Lishu Liu lishu...@gmail.com wrote:

 Hi,

 I have a 60k*600k matrix, which exceeds the vector length limit of 2^32-1.
 But it's rather sparse, only 0.02% has values. So I saved it as a MatrixMarket
 (mm) file, which is about 300M in size. I use readMM in the Matrix package to read
 it in. If I do so, the data type becomes dgTMatrix from the 'Matrix' package
 instead of the common matrix type.

 The problem is, if I run k-means only on part of the data, to make sure the
 vector length does not exceed 2^32-1, there's no problem at all. Meaning that
 the kmeans in R can recognize this type of matrix.
 If I run the entire matrix, R says "too many elements specified".

 I have considered the 'bigmemory' and 'biganalytics' packages. But saving
 the sparse matrix as a common CSV file would take approx 70G, with 99% of it being 0.
 I just don't think it's necessary or efficient to treat it as a dense
 matrix.

 Is there any way to deal with the vector length limit? Can I split the whole
 matrix into small ones and then do k-means?



 Thanks,
 Lishu


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] tune an support vector machine

2013-12-06 Thread Wuming Gong
Hi Uwe,

It looks like SVM in e1071 and kernlab does not support feature selection, but
you can take a look at the penalizedSVM package (
http://cran.r-project.org/web/packages/penalizedSVM/penalizedSVM.pdf).

Or you can implement SVM-RFE (
http://axon.cs.byu.edu/Dan/778/papers/Feature%20Selection/guyon*.pdf) using
the alpha values returned by svm() in e1071 or ksvm() in kernlab.
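
If the goal is literally to try every feature combination, as described in the quoted message below, a brute-force hedged sketch (reusing the 'features', 'train' and tuning grid from that code) could be:

subsets <- unlist(lapply(seq_along(features),
                         function(k) combn(features, k, simplify = FALSE)),
                  recursive = FALSE)
perf <- sapply(subsets, function(f) {
  fml <- as.formula(paste("type ~", paste(f, collapse = " + ")))
  tune.svm(fml, data = train, kernel = "radial",
           cost = 2^(-2:5), gamma = 2^(-2:1), cross = 10)$best.performance
})
subsets[[which.min(perf)]]   # feature set with the lowest cross-validated error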

Wuming


On Fri, Dec 6, 2013 at 7:06 AM, Uwe Bohne balu...@gmx.de wrote:


Hej all,

Actually I am trying to tune an SVM in R and use the package e1071, which works
pretty well.
I do some grid search over the parameters and get the best possible parameters
for classification.
Here is my sample code:

type <- sample(c(-1,1), 20, replace = TRUE)
weight <- sample(c(20:50), 20, replace=TRUE)
height <- sample(c(100:200), 20, replace=TRUE)
width <- sample(c(30:50), 20, replace=TRUE)
volume <- sample(c(1000:5000), 20, replace=TRUE)

data <- cbind(type, weight, height, width, volume)
train <- as.data.frame(data)
library(e1071)

features <- c("weight", "height", "width", "volume")
(formula <- as.formula(paste("type ~", paste(features, collapse = " + "))))

svmtune=tune.svm(formula,  data=train, kernel="radial", cost=2^(-2:5),
gamma=2^(-2:1), cross=10)
summary(svmtune)

My question is whether there is a way to tune the features as well.

So in other words - what I want to do is to try all possible combinations of
features: for example use only (volume), or use (weight, height), or use
(height, volume, width), and so on, for the SVM, and to get the best combination
back.


Best wishes

Uwe
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.