date:20100909

Did you try taking out P7, which is text? Moreover, if you get a message
saying ' the standard deviation is zero', it means that the entire column is
constant. By definition, the covariance of a constant with a random variable
is 0, but your data consists of values, so cor() understandably throws a
warning that one or more of your columns are constant. Applying the
following to your data (which I named expd instead),  we get

sapply(expd[, -12], var)
  P1   P2   P3   P4   P5
P6
5.43e-01 1.08e+00 5.77e-01 1.08e+00 6.43e-01
5.57e-01
  P8   P9  P10  P11  P12
SITE
5.73e-01 3.19e+00 5.07e-01 2.50e-01 5.50e+00
2.49e+00
  Errors warnings   ManualTotalH_tot
HP1.1
9.072840e+03 2.081334e+04 7.43e-01 3.823500e+04 3.880250e+03
2.676667e+00
   HP1.2HP1.3HP1.4   HP_totHO1.1
HO1.2
0.00e+00 2.008440e+03 3.057067e+02 3.827250e+03 8.40e-01
0.00e+00
   HO1.3HO1.4   HO_totHU1.1HU1.2
HU1.3
0.00e+00 0.00e+00 8.40e-01 0.00e+00 2.10e-01
2.27e-01
  HU_tot   HRL_totLP1.1LP1.2
LP1.3
6.23e-01 7.43e-01 3.754610e+03 3.209333e+01 0.00e+00
2.065010e+03
   LP1.4   LP_totLO1.1LO1.2LO1.3
LO1.4
2.246233e+02 3.590040e+03 3.684000e+01 0.00e+00 0.00e+00
2.84e+00
  LO_totLU1.1LU1.2LU1.3   LU_tot
LR_tot
6.00e+01 0.00e+00 1.44e+00 3.626667e+00 8.37e+00
4.94e+00
  SP_totSP1.1SP1.2SP1.3SP1.4
SP_tot.1
6.911067e+02 4.225000e+01 0.00e+00 1.009600e+02 4.161600e+02
3.071600e+02
   SO1.1SO1.2SO1.3SO1.4   SO_tot
SU1.1
4.54e+00 2.50e-01 0.00e+00 2.10e-01 5.25e+00
0.00e+00
   SU1.2SU1.3   SU_tot   SR
1.556667e+00 4.225000e+01 3.504000e+01 4.225000e+01

Which columns are constant?
which(sapply(expd[, -12], var)  .Machine$double.eps)
HP1.2 HO1.2 HO1.3 HO1.4 HU1.1 LP1.2 LO1.2 LO1.3 LU1.1 SP1.2 SO1.3 SU1.1
   192425262835404144515760

I suspect that in your real data set, there aren't so many constant columns,
but this is one way to check.

HTH,
Dennis

On Wed, Sep 8, 2010 at 12:35 PM, Stephane Vaucher vauch...@iro.umontreal.ca
 wrote:

 Hi everyone,

 I'm observing what I believe is weird behaviour when attempting to do
 something very simple. I want a correlation matrix, but my matrix seems to
 contain correlation values that are not found when executed on pairs:

  test2$P2

  [1] 2 2 4 4 1 3 2 4 3 3 2 3 4 1 2 2 4 3 4 1 2 3 2 1 3

 test2$HP_tot

  [1]  10  10  10  10  10  10  10  10 136 136 136 136 136 136 136 136 136
 136  15
 [20]  15  15  15  15  15  15 c=cor(test2$P3,test2$HP_tot,method='spearman')

 c

 [1] -0.2182876

 c=cor(test2,method='spearman')

 Warning message:
 In cor(test2, method = spearman) : the standard deviation is zero

 write(c,file='out.csv')


 from my spreadsheet
 -0.25028783918741

 Most cells are correct, but not that one.

 If this is expected behaviour, I apologise for bothering you, I read the
 documentation, but I do not know if the calculation of matrices and pairs is
 done using the same function (eg, with respect to equal value observations).

 If this is not a desired behaviour, I noticed that it only occurs with a
 relatively large matrix (I couldn't reproduce on a simple 2 column data
 set). There might be a naming error.

  names(test2)

  [1] ID   NOMBRE   MAIL
  [4] Age  SEXO Studies
  [7] Hours_Internet   Vision.Disabilities  Other.disabilities
 [10] Technology_Knowledge Start_Time   End_Time
 [13] Duration P1   P1Book
 [16] P1DVDP2   P3
 [19] P4   P5   P6
 [22] P8   P9   P10
 [25] P11  P12  P7
 [28] SITE Errors   warnings
 [31] Manual   TotalH_tot
 [34] HP1.1HP1.2HP1.3
 [37] HP1.4HP_tot   HO1.1
 [40] HO1.2HO1.3HO1.4
 [43] HO_tot   HU1.1HU1.2
 [46] HU1.3HU_tot   HR
 [49] L_totLP1.1LP1.2
 [52] LP1.3LP1.4LP_tot
 [55] LO1.1LO1.2LO1.3
 [58] LO1.4LO_tot   LU1.1
 [61] LU1.2LU1.3LU_tot
 [64] LR_tot   SP_tot   SP1.1
 [67] SP1.2SP1.3SP1.4
 [70] SP_tot.1 SO1.1SO1.2
 [73] SO1.3SO1.4

[R] advise on operations speed with Rcpp,Boost::ipc Shared Memory

2010-09-09 Thread raje...@cse.iitm.ac.in


Hi,

I have an implementation where I transfer data records via shared memory to an 
R program. If anyone has prior experience, I'd like to find out which would be 
faster

1) storing data records in shared memory as they are(in a matrix) and then use 
the Rcpp::wrap to convert them to R datatypes
2) merge the records into a string and store records as strings. Then use R 
functions like strsplit,lapply etc to convert them to their original matrix 
form.

Any help is appreciated
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Help with HB analysis in R for a conjoint study Data

2010-09-09 Thread Vijayan Padmanabhan


Dear Group
I was referring to a conjoint analysis scenario using R  from the paper 
referred below:
Agricultural Information Research 17(2),2008,86-94
available online at  www.jstage.jst.go.jp/
This paper describes the data modelling of a conjoint study design based 
on conditional logit procedure.
I understand that Heirarchical Bayes is asymptotically equivalent to 
Conditionallogit. However it would be of interest if somebody is willing 
to share the script to fit this data using HB in R (I understand that 
bayesm package supports HB , but I am not able to figure out exactly how 
to model this example data and interpret it).
Thanks in Advance.
Regards
Vijayan Padmanabhan


What is expressed without proof can be denied without proof - Euclide. 


Can you avoid printing this?
Think of the environment before printing the email.
---
Please visit us at www.itcportal.com
**
This Communication is for the exclusive use of the intended recipient (s) and 
shall
not attach any liability on the originator or ITC Ltd./its Subsidiaries/its 
Group 
Companies. If you are the addressee, the contents of this email are intended 
for your 
use only and it shall not be forwarded to any third party, without first 
obtaining 
written authorisation from the originator or ITC Ltd./its Subsidiaries/its 
Group 
Companies. It may contain information which is confidential and legally 
privileged
and the same shall not be used or dealt with by any third party in any manner 
whatsoever without the specific consent of ITC Ltd./its Subsidiaries/its Group 
Companies.
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Feature selection via glmnet package (LASSO)

Hi:

When you need to search for a function in R, rely on our good friend, the
package sos:

library(sos)
findFn('elastic net')
found 23 matches;  retrieving 2 pages

HTH,
Dennis

On Wed, Sep 8, 2010 at 6:58 PM, jjenkner jjenk...@web.de wrote:


 Hello Lai!

 You can try the elastic net which is a mixture of lasso and ridge
 regression. Setting the parameter alpha to less than one will provide you
 with more coefficients different from zero. I am not sure about the R
 implementation. You have to search for it on your own.

 Johannes
 --
 View this message in context:
 http://r.789695.n4.nabble.com/Feature-selection-via-glmnet-package-LASSO-tp2308635p2532271.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Calculating with tolerances

2010-09-09 Thread Bernardo Rangel Tura

On Thu, 2010-09-09 at 09:16 +0430, Jan private wrote:
 Dear list,
 
 I am from an engineering background, accustomed to work with tolerances.
 
 For example, I have measured
 
 Q = 0.15 +- 0.01 m^3/s
 H = 10 +- 0.1 m
 
 and now I want to calculate
 
 P = 5 * Q * H 
 
 and get a value with a tolerance +-
 
 What is the elegant way of doing this in R?
 
 Thank you,
   Jan

Hi Jan,

If I understood  your problem this script solve your problem:

q-0.15 + c(-.1,0,.1)
h-10 + c(-.1,0,.1)
5*q*h
[1]  2.475  7.500 12.625

-- 
Bernardo Rangel Tura, M.D,MPH,Ph.D
National Institute of Cardiology
Brazil

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] 'par mfrow' and not filling horizontally

2010-09-09 Thread Ivan Calandra

  Hi!

I think you've got already all useful solutions, but I usually just 
change mfrow to c(2,2).
There is then free space left, but I usually edit my graphs in 
Illustrator anyway.

Ivan

Le 9/8/2010 21:01, (Ted Harding) a écrit :
 Greetings, Folks.
 I'd appreciate being shown the way out of this one!
 I've been round the documentation in ever-drecreasing
 circles, and along other paths, without stumbling on
 the answer.

 The background to the question can be exemplified by
 the example (no graphics window open to start with):

set.seed(54321)
X0- rnorm(50) ; Y0- rnorm(50)

par(mfrow=c(2,1),mfg=c(1,1),cex=0.5)
plot(X0,Y0,pch=+,col=blue,xlim=c(-3,3),ylim=c(-3,3),
xlab=X,ylab=Y,main=My Plot,asp=1)

par(mfg=c(2,1))
plot(X0,Y0,pch=+,col=blue,xlim=c(-3,3),ylim=c(-3,3),
xlab=X,ylab=Y,main=My Plot,asp=1)

 As you will see, both plots have been extended laterally
 to fill the plotting area horizontally, hence extend from
 approx X = -8 to approx X = +8 (on my X11 display), despite
 the xlim=c(-3,3); however, the ylim=c(-3,3) has been
 respected, as has asp=1.

 What I would like to see, independently of the shape of
 the graphics window, is a pair of square plots, each with
 X and Y ranging from -3 to 3, even if this leaves empty
 space in the graphics window on either side.

 Hints?

 With thanks,
 Ted.

 
 E-Mail: (Ted Harding)ted.hard...@manchester.ac.uk
 Fax-to-email: +44 (0)870 094 0861
 Date: 08-Sep-10   Time: 20:01:19
 -- XFMail --

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


-- 
Ivan CALANDRA
PhD Student
University of Hamburg
Biozentrum Grindel und Zoologisches Museum
Abt. Säugetiere
Martin-Luther-King-Platz 3
D-20146 Hamburg, GERMANY
+49(0)40 42838 6231
ivan.calan...@uni-hamburg.de

**
http://www.for771.uni-bonn.de
http://webapp5.rrz.uni-hamburg.de/mammals/eng/mitarbeiter.php


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] See what is inside a matrix

Hi:

One possibility is a heatmap, although there are other approaches.

x - matrix(sample(1:100, 1, replace = TRUE), nrow = 100)
image(x)
xx - apply(x, 1, sort)   # sorts the rows of x
image(xx)

# ggplot2 version:
library(ggplot2)
ggplot(melt(x), aes(x=X1, y=X2, fill=value)) + geom_tile() +
  scale_fill_gradientn(colour = terrain.colors(10))

See the online help page http://had.co.nz/ggplot2/scale_gradientn.html
for several examples of choosing color ranges in scale_fill_gradientn(). To
get similar control over image, change the col = argument according to the
description on the help page of image - ?image .

Another alternative is an enhanced heatmap function in package gplots. I'll
leave that to you to investigate...

HTH,
Dennis

On Thu, Sep 9, 2010 at 12:22 AM, Alaios ala...@yahoo.com wrote:

 Hello everyone.. Is there any graphical tool to help me see what is inside
 a
 matrix? I have 100x100 dimensions matrix and as you already know as it does
 not
 fit on my screen R splits it into pieces.

 I would like to thank you in advance for your help
 Best Regards
 Alex




[[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] problem with outer

2010-09-09 Thread tuggi


thank you for your answers, but my problem is that i want plot the function
guete for the variables p_11 and p_12 between zero and one. that means that
i also want to plot p_11=0.7 and p_12=0.3. but with a=0.4 and b=0.6 and
p_11=seq(0,a,0.05*a) and p_12=seq(0,b,0.0*b) i cannot do that.
i hope you have an other idea.

tuggi
-- 
View this message in context: 
http://r.789695.n4.nabble.com/problem-with-outer-tp2532074p2532550.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Strange output daply with empty strata

2010-09-09 Thread Jan van der Laan


Dear list,

I get some strange results with daply from the plyr package. In the  
example below, the average age per municipality for employed en  
unemployed is calculated. If I do this using tapply (see code below) I  
get the following result:


no  yes
A   NA 36.94931
B 51.22505 34.24887
C 48.05759 51.00198

If I do this using daply:

municipality   no  yes
   A 36.94931 48.05759
   B 51.22505 51.00198
   C 34.24887   NA

daply generates the same numbers. However, these are not in the  
correct cells. For example, in municipality A everybody is employed.  
Therefore, the NA should be in the cell for unemployed in municipality  
A.


Am I using daply incorrectly or is there indeed something wrong with  
the output of daply?


Regards,

Jan


I am using version 1.1 of the plyr-package.


# Generate some test data
data.test - data.frame(
  municipality=rep(LETTERS[1:3], each=10),
  employed=sample(c(yes, no), 30, replace=TRUE),
  age=runif(30,20,70))
# Make sure everybody is employed in municipality A
data.test$employed[data.test$municipality == A] - yes

# Compare the output of tapply:
tapply(data.test$age, list(data.test$municipality, data.test$employed),
mean)
# to that of daply:
daply(data.test, .(municipality, employed), function(d){mean(d$age)} )
# results of ddply are the samen as tapply
ddply(data.test, .(municipality, employed), function(d){mean(d$age)} )

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] markov model

2010-09-09 Thread Isaac SAGAON TEYSSIER


Dear all,
 
I would like some help to writing the likelihood function for the 
continuous-time markov model, even if it can be calculated with the msm 
package, I need to know how it is calculated
 
Thank you
 
Luis  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Which language is faster for numerical computation?

2010-09-09 Thread Christofer Bogaso

Dear all, R offers integration mechanism with different programming
languages like C, C++, Fortran, .NET etc. Therefore I am curious on,
for heavy numerical computation which language is the fastest? Is
there any study? I specially want to know because, if there is some
study saying that C is the fastest language for numerical computation
then I would change some of my R code into C.

Thanks for your time.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Making R lazy

2010-09-09 Thread Lorenzo Isella


Dear All,
I hope this is not too off-topic.
I am wondering if there is any possibility to make an R code lazy i.e. 
to prevent it from calculating quantities which are not used in the code.
As an example: you are in a rush to modify your code and at the end it 
ends up with dead branches, let's say a sequence which is calculated 
but not used in any following calculations, not printed on screen, not 
stored in a file etc...
It would be nice to teach R to automagically skip its calculation when I 
run the script (at least in a non-interactive way).
I know that such a situation is probably the result of bad programming 
habits, but it may arise all the same.
If I understand correctly, what I am asking for is something different 
from any kind of garbage collection which would take place, if ever, 
only after the array has been calculated.
Any suggestions (or clarifications if I am on the wrong track) are 
appreciated.

Cheers

Lorenzo

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Strange output daply with empty strata

Hi:

Here's what I tried:

# data frame versions (aggregate, ddply):

aggregate(age ~ municipality + employed, data = data.test, FUN = mean)
  municipality employed  age
1B   no 55.57407
2C   no 44.67463
3A  yes 41.58759
4B  yes 43.59330
5C  yes 43.82545

ddply(data.test, .(municipality, employed), summarise, mean = mean(age))
  municipality employed mean
1A  yes 41.58759
2B   no 55.57407
3B  yes 43.59330
4C   no 44.67463
5C  yes 43.82545

It appears that aggregate() silently removes groups where no observations
are present, but ddply() has an option .drop, which when set to FALSE,
returns NaN for the not employed group in municipality A:

ddply(data.test, .(municipality, employed), summarise, avgage = mean(age),
.drop = FALSE)
  municipality employed   avgage
1A   no  NaN
2A  yes 41.58759
3B   no 55.57407
4B  yes 43.59330
5C   no 44.67463
6C  yes 43.82545

#  tapply/daply

with(data.test, tapply(age, list(municipality, employed), mean))
no  yes
A   NA 41.58759
B 55.57407 43.59330
C 44.67463 43.82545

daply(data.test, .(municipality, employed), function(d){mean(d$age)} )
employed
municipality   no  yes
   A 41.58759 44.67463
   B 55.57407 43.82545
   C 43.59330   NA

The .drop argument has a different meaning in daply. Some R functions have
an na.last argument, and it may be that somewhere in daply, there is a
function call that moves all NAs to the end. The means are in the right
order except for the first, where the NA is supposed to be, so everything is
offset in the table by 1. I've cc'ed Hadley on this.

HTH,
Dennis


On Thu, Sep 9, 2010 at 2:43 AM, Jan van der Laan rh...@eoos.dds.nl wrote:

 Dear list,

 I get some strange results with daply from the plyr package. In the example
 below, the average age per municipality for employed en unemployed is
 calculated. If I do this using tapply (see code below) I get the following
 result:

no  yes
 A   NA 36.94931
 B 51.22505 34.24887
 C 48.05759 51.00198

 If I do this using daply:

 municipality   no  yes
   A 36.94931 48.05759
   B 51.22505 51.00198
   C 34.24887   NA

 daply generates the same numbers. However, these are not in the correct
 cells. For example, in municipality A everybody is employed. Therefore, the
 NA should be in the cell for unemployed in municipality A.

 Am I using daply incorrectly or is there indeed something wrong with the
 output of daply?

 Regards,

 Jan


 I am using version 1.1 of the plyr-package.


 # Generate some test data
 data.test - data.frame(
  municipality=rep(LETTERS[1:3], each=10),
  employed=sample(c(yes, no), 30, replace=TRUE),
  age=runif(30,20,70))
 # Make sure everybody is employed in municipality A
 data.test$employed[data.test$municipality == A] - yes

 # Compare the output of tapply:
 tapply(data.test$age, list(data.test$municipality, data.test$employed),
 mean)
 # to that of daply:
 daply(data.test, .(municipality, employed), function(d){mean(d$age)} )
 # results of ddply are the samen as tapply
 ddply(data.test, .(municipality, employed), function(d){mean(d$age)} )

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] confidence intervals around p-values

2010-09-09 Thread Fernando Marmolejo Ramos

Dear all

I wonder if anyone has heard of confidence intervals around p-values...

Any pointer would be highly appreciated.

Best

Fer

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] R-help Digest, Vol 91, Issue 9

2010-09-09 Thread Fernando Marmolejo Ramos

dear all

I wonder if anyone has heard of confidence intervals around p-values...

Any pointer would be highly appreciated.

Best

Fer

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Calculating with tolerances (error propagation)

2010-09-09 Thread Jan private

Hello Bernardo,

-
If I understood  your problem this script solve your problem:

q-0.15 + c(-.1,0,.1)
h-10 + c(-.1,0,.1)
5*q*h
[1]  2.475  7.500 12.625
-

OK, this solves the simple example. 
But what if the example is not that simple. E.g.

P = 5 * q/h

Here, to get the maximum tolerances for P, we need to divide the maximum
value for q by the minimum value for h, and vice versa. Is there any way
to do this automatically, without thinking about every single step?

There is a thing called interval arithmetic (I saw it as an Octave
package) which would do something like this.

I would have thought that tracking how a (measuring) error propagates
through a complex calculation would be a standard problem of
statistics?? In other words, I am looking for a data type which is a
number with a deviation +- somehow attached to it, with binary operators
that automatically knows how to handle the deviation.

Thank you,  
Jan

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] UseR groups: NewJersey R - LondonR - BaselR

2010-09-09 Thread Sarah Lewis

NewJerseyR

Mango Solutions is pleased to announce the inaugural meeting of NewJerseyR a 
networking and social event for all local and regional R users.

Thank you to those of you already registered to attend the first NewJerseyR 
meeting on Thursday 16th September and to those of you who have already joined 
our mailing list for future NewJerseyR events.   Please note that we are 
announcing a CHANGE OF VENUE for the meeting next week. The new venue is 
adjacent to the originally advertised venue so we are confident that this late 
change should cause no inconvenience.

Date:Thursday 16th September 2010
Venue:  Kona Grill, 511 Route One South, Iselin, New Jersey  08830
Time:6:30pm - 9:30pm (talks start at 7pm)

Presentations at the event will be:

- Richard Pugh, Mango Solutions: MSToolkit 3.0: Clinical Trial Simulation 
using R
- Max Kuhn, Pfizer: The Caret Package: A Unified Interface for Predictive 
Models
- Mani Subramaniam, ATT Labs: tsX: An R package for the exploratory analysis 
of a large collection of time-series 
http://user2010.org/abstracts/Subramaniam+Varadhan+Urbanek+Epstein_3.pdf%20%20
- Brian McHugh, Bristol Myers Squibb: How R can orchestrate bootstrapping

Free drinks and snacks will be available.

Mailing List - To ensure you receive details of all future NewJerseyR meeting 
please ask to join our mailing list by emailing us at:  
newjers...@mango-solutions.com  


LondonR

Thank you to everyone who attended the July LondonR meeting and big thanks to 
Chris Campbell, Matthew Dowle and Andy Nicholls for presenting. 
Past presentations are available at 
http://londonr.org/LondonR-20090331/Agenda.html 

As ever, we need volunteers to present at all future meetings. If you feel you 
have something to input into this meeting or can recommend someone else, we 
would be delighted to hear from you.

The next LondonR meeting will be held on the 5th October 2010

Venue:  Counting House - 50 Cornhill, London, EC3V 3PD  Tel: 020 7283 
7123 
(Nearest tube is Bank exit 4 or 5. Opposite Star Bucks)
Time:6pm - 9pm
Agenda:to be confirmed

The following LondonR meetings will be held:

* 8th December 2010
* 9th March 2011
(Agenda and venue to be confirmed)

 To register, for more information or to speak at the next LondonR meeting 
please email us at lond...@mango-solutions.com

BaselR

Mango Solutions would like to thank all who came to the BaselR meeting on 
Wednesday 29th July. We were delighted to see so many in attendance.
Our particular thanks go to the following for their extremely interesting 
presentations:

Andrew Ellis, ETH Zurich - Desktop Publishing with Sweave
Dominik Locher, THETA AG - Professional Reporting with RExcel  
Sebastian Pérez Saaibi, ETH Zurich - R Generator Tool for Google Motion Charts

These presentations are available at http://www.baselr.org/Presentations.html 

The next BaselR meeting will be on Wednesday 13th October 2010.

Time:   6:30pm - 9:30pm (talks start at 7pm)
Venue: transBARent,  Viaduktstrasse 3  CH-4051 Basel

Full details of presentation for this next meeting will be published in due 
course.

If you would like to attend the next BaselR meeting we would ask you to please 
register ahead of the meeting in order to help us with our planning. Please 
register by emailing:  bas...@mango-solutions.com 

Mailing List - To ensure you receive details of all future BaselR meeting 
please ask to join our mailing list by emailing us at:  
bas...@mango-solutions.com 
 


All of our UseR group meetings are free. All we ask is for attendees to 
register prior to the event so that we can cater for everyone.

Mango Solutions run public R courses as well as private, customised R courses. 
Please visit http://mango-solutions.com/training.html 

For more information about Mango Solutions please contact us at 
i...@mango-solutions.com or visit our website www.mango-solutions.com 



Sarah Lewis

Hadley Wickham, Creator of ggplot2 - first time teaching in the UK. 1st - 2nd  
November 2010. 
To book your seat please go to http://mango-solutions.com/news.html 

T: +44 (0)1249 767700 Ext: 200
F: +44 (0)1249 767707
M: +44 (0)7746 224226
www.mango-solutions.com
Unit 2 Greenways Business Park 
Bellinger Close
Chippenham
Wilts
SN15 1BN
UK 

LEGAL NOTICE
This message is intended for the use o...{{dropped:9}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] try-error can not be test. Why?


On 08/09/2010 11:46 PM, Philippe Grosjean wrote:

On 08/09/10 19:25, David Winsemius wrote:

On Sep 8, 2010, at 1:18 PM, telm8 wrote:


Hi,

I am having some strange problem with detecting try-error. From what I
have read so far the following statement:


try( log(a) ) == try-error


should yield TRUE, however, it yields FALSE. I can not figure out why.
Can
someone help?


  class(try( log(a), silent=TRUE )) == try-error
[1] TRUE


This is perfectly correct in this case, but while we are mentioning a 
test on the class of an object, the better syntax is:


  inherits(try(log(a)), try-error)

In a more general context, class may be defined with multiple strings (R 
way of subclassing S3 objects). For instance, this does not work:


  if (class(Sys.time()) == POSIXct) ok else not ok

... because the class of a `POSIXct' object is defined as: c(POSIXt, 
POSIXct). This works:


Getting even further off track:  another advantage of inherits() is that 
the class can change.  For example, in the upcoming 2.12.0 release, the 
class of Sys.time() will be


 class(Sys.time())
[1] POSIXct POSIXt

Putting the names in the reverse order was a relic from ancient times 
that will soon be corrected.


The tests below won't care about this change, but some more fragile 
tests might.


Duncan Murdoch



  if (inherits(Sys.time(), POSIXct)) ok else not ok

Alternate valid tests would be (but a little bit less readable):

  if (any(class(Sys.time()) == POSIXct)) ok else not ok

or, by installing the operators package, a less conventional, but 
cleaner code:


  install.packages(operators)
  library(operators)
  if (Sys.time() %of% POSIXct) ok else not ok

Best,

Philippe Grosjean


Many thanks
--
View this message in context:
http://r.789695.n4.nabble.com/try-error-can-not-be-test-Why-tp2531675p2531675.html

Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Which language is faster for numerical computation?

2010-09-09 Thread Rainer M Krug

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 09/09/10 12:26, Christofer Bogaso wrote:
 Dear all, R offers integration mechanism with different programming
 languages like C, C++, Fortran, .NET etc. Therefore I am curious on,
 for heavy numerical computation which language is the fastest? Is
 there any study? I specially want to know because, if there is some
 study saying that C is the fastest language for numerical computation
 then I would change some of my R code into C.

As far as I am aware, the two choices are C and Fortran - where it
depends on the calculations, which one is faster.

Cheers,

Rainer

 
 Thanks for your time.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


- -- 
Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation
Biology, UCT), Dipl. Phys. (Germany)

Centre of Excellence for Invasion Biology
Natural Sciences Building
Office Suite 2039
Stellenbosch University
Main Campus, Merriman Avenue
Stellenbosch
South Africa

Tel:+33 - (0)9 53 10 27 44
Cell:   +27 - (0)8 39 47 90 42
Fax (SA):   +27 - (0)8 65 16 27 82
Fax (D) :   +49 - (0)3 21 21 25 22 44
Fax (FR):   +33 - (0)9 58 10 27 44
email:  rai...@krugs.de

Skype:  RMkrug
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkyIxHYACgkQoYgNqgF2egofwQCePvkN3xewbSNEeKUiuxlL7Utx
CxMAniGRwoAWfJ8VNTPHLXIbtpiZCrMd
=fmFx
-END PGP SIGNATURE-

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] modulo operation

2010-09-09 Thread José M. Blanco Moreno


 Dear R-users,
May be there is something that I am not understanding, missed or else...
Why do these operations yield these results?
 25%/%0.2
[1] 124
 25%%0.2
[1] 0.2

I would expect (although I know that what I do expect and what is really 
intended in the code may be different things)

 25/0.2
[1] 125
 25 - floor(25/0.25)*0.25
[1] 0

(At least this second one is what I would expect from the code in 
arithmetic.c, lines 168 to 178)


--
---
José M. Blanco-Moreno

Dept. de Biologia Vegetal (Botànica)
Facultat de Biologia
Universitat de Barcelona
Av. Diagonal 645
08028 Barcelona
SPAIN
---

phone: (+34) 934 039 863
fax: (+34) 934 112 842

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] createDataPartition

2010-09-09 Thread Trafim Vanishek

Dear all,

does anyone know how to define the structure of the required samples using
function createDataPartition, meaning proportions of different types of
variable in the partition?
Smth like this for iris data:

createDataPartition(y = c(setosa = .5, virginica = .3, versicolor = .2),
times = 10, p = .7, list = FALSE)

Thanks a lot for your help.

Regards,
Trafim

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Error in normalizePath(path) : with McAfee


On 09/09/2010 12:01 AM, Erin Hodgess wrote:

Dear R People:

I keep getting the Error in normalizePath(path) :  while trying to
obtain the necessary packages to use with the Applied Spatial
Statistics with R book.

I turned off the Firewall (from McAfee) but am still getting the same message.

Does anyone have any idea on a solution please?


I think you need to show us your code and the error in context.

Duncan Murdoch




sessionInfo()

R version 2.11.1 (2010-05-31)
i386-pc-mingw32

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] ctv_0.6-0

loaded via a namespace (and not attached):
[1] tools_2.11.1





Thanks,
Erin





__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] modulo operation

2010-09-09 Thread Barry Rowlingson

2010/9/9 José M. Blanco Moreno jmbla...@ub.edu:
  Dear R-users,
 May be there is something that I am not understanding, missed or else...
 Why do these operations yield these results?
 25%/%0.2
 [1] 124
 25%%0.2
 [1] 0.2

 I would expect (although I know that what I do expect and what is really
 intended in the code may be different things)
 25/0.2
 [1] 125
 25 - floor(25/0.25)*0.25
 [1] 0

 (At least this second one is what I would expect from the code in
 arithmetic.c, lines 168 to 178)

Did you read the documentation before you read the code?

 ‘%%’ and ‘x %/% y’ can be used for non-integer ‘y’, e.g. ‘1 %/%
 0.2’, but the results are subject to rounding error and so may be
 platform-dependent.  Because the IEC 60059 representation of ‘0.2’
 is a binary fraction slightly larger than ‘0.2’, the answer to ‘1
 %/% 0.2’ should be ‘4’ but most platforms give ‘5’.

I suspect that is relevant to your interests

Barry

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Making R lazy


On 09/09/2010 6:27 AM, Lorenzo Isella wrote:

Dear All,
I hope this is not too off-topic.
I am wondering if there is any possibility to make an R code lazy i.e. 
to prevent it from calculating quantities which are not used in the code.
As an example: you are in a rush to modify your code and at the end it 
ends up with dead branches, let's say a sequence which is calculated 
but not used in any following calculations, not printed on screen, not 
stored in a file etc...
It would be nice to teach R to automagically skip its calculation when I 
run the script (at least in a non-interactive way).
I know that such a situation is probably the result of bad programming 
habits, but it may arise all the same.
If I understand correctly, what I am asking for is something different 
from any kind of garbage collection which would take place, if ever, 
only after the array has been calculated.
Any suggestions (or clarifications if I am on the wrong track) are 
appreciated.


R does lazy evaluation of function arguments, so an ugly version of what 
you're asking for is to put all your code into arguments, either as 
default values or as actual argument values.  For example:


f - function(a = slow1, b = slow2, c = slow3) {
  a
  c
}

f()

will never calculate slow2, but it will calculate slow1 and slow3.  The 
other version of this is


f - function(a,b,c) {
  a
  c
}

f(slow1, slow2, slow3)

The big difference between the two versions is in scoping:  the first 
one evaluates the expressions in the local scope of f, the second one 
evaluates them in the scope of the caller.


Duncan Murdoch

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] average columns of data frame corresponding to replicates

2010-09-09 Thread jim holtman

try this:

 myData
   sample1.id1 sample1.id2 sample2.id1 sample1.id3 sample3.id1
sample1.id4 sample2.id2
11   2   2   1   1
  1   1
21   2   2   2   1
  2   1
31   2   2   3   1
  3   1
41   2   2   4   1
  4   1
51   2   2   5   1
  5   1
61   2   2   6   1
  6   1
71   2   2   7   1
  7   1
81   2   2   8   1
  8   1
91   2   2   9   1
  9   1
10   1   2   2  10   1
 10   1
 newData - NULL
 for (i in repeat_ids){
+ # determine the columns to use
+ colIndx - grep(paste(i, $, sep=''), colnames(myData))
+ if (length(colIndx) == 0) next  # make sure it exists
+ # create the average of the columns
+ newData - cbind(newData, rowMeans(myData[, colIndx], na.rm=TRUE))
+ colnames(newData)[ncol(newData)] - i  # add the name
+ }
 newData
   id1 id2
 [1,] 1.33 1.5
 [2,] 1.33 1.5
 [3,] 1.33 1.5
 [4,] 1.33 1.5
 [5,] 1.33 1.5
 [6,] 1.33 1.5
 [7,] 1.33 1.5
 [8,] 1.33 1.5
 [9,] 1.33 1.5
[10,] 1.33 1.5



On Tue, Sep 7, 2010 at 12:00 PM, Juliet Hannah juliet.han...@gmail.com wrote:
 Hi Group,

 I have a data frame below. Within this data frame there are  samples
 (columns) that are measured  more than once. Samples are indicated by
 idx. So id1 is present in columns 1, 3, and 5. Not every id is
 repeated. I would like to create a new data frame so that the repeated
  ids are averaged. For example, in the new data frame, columns 1, 3,
 and 5 of the original will be replaced by 1 new column  that is the
 mean of these three. Thanks for any suggestions.

 Juliet



 myData - data.frame(sample1.id1 =rep(1,10),
 sample1.id2=rep(2,10),
 sample2.id1 = rep(2,10),
 sample1.id3 = 1:10,
 sample3.id1 = rep(1,10),
 sample1.id4 = 1:10,
 sample2.id2 = rep(1,10))

 repeat_ids - c(id1,id2)

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] modulo operation


On 09/09/2010 7:56 AM, Barry Rowlingson wrote:

2010/9/9 José M. Blanco Moreno jmbla...@ub.edu:

 Dear R-users,
May be there is something that I am not understanding, missed or else...
Why do these operations yield these results?

25%/%0.2

[1] 124

25%%0.2

[1] 0.2

I would expect (although I know that what I do expect and what is really
intended in the code may be different things)

25/0.2

[1] 125

25 - floor(25/0.25)*0.25

[1] 0

(At least this second one is what I would expect from the code in
arithmetic.c, lines 168 to 178)


Did you read the documentation before you read the code?

 ‘%%’ and ‘x %/% y’ can be used for non-integer ‘y’, e.g. ‘1 %/%
 0.2’, but the results are subject to rounding error and so may be
 platform-dependent.  Because the IEC 60059 representation of ‘0.2’
 is a binary fraction slightly larger than ‘0.2’, the answer to ‘1
 %/% 0.2’ should be ‘4’ but most platforms give ‘5’.

I suspect that is relevant to your interests



Yes.  I think José is assuming that 25 %/% 0.2 and floor(25/0.2) are 
equal, but they are not, because rounding affects them differently. 
(The first is a single operation with no rounding except in the 
representation of 0.2; the second is two operations and is subject to 
another set of rounding.)


Duncan Murdoch

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Reproducible research

2010-09-09 Thread Erik Iverson


Another vote for org-mode here.  In addition the advantages the
other posts mentioned, you get multiple export engines (html, latex, ...)
all built in.

On 09/09/2010 12:47 AM, David Scott wrote:

I am investigating some approaches to reproducible research. I need in
the end to produce .html or .doc or .docx. I have used hwriter in the
past but have had some problems with verbatim output from R. Tables are
also not particularly convenient.

I am interested in R2HTML and R2wd in particular, and possibly odfWeave.

Does anyone have sample documents using any of these approaches which
they could let me have?

David Scott

_

David Scott Department of Statistics
The University of Auckland, PB 92019
Auckland 1142, NEW ZEALAND
Phone: +64 9 923 5055, or +64 9 373 7599 ext 85055
Email: d.sc...@auckland.ac.nz, Fax: +64 9 373 7018

Director of Consulting, Department of Statistics

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Failure to aggregate

2010-09-09 Thread Gabor Grothendieck

On Wed, Sep 8, 2010 at 4:48 AM, Dimitri Shvorob
dimitri.shvo...@gmail.com wrote:

 I was able to aggregate (with sqldf, at least), after saving and re-loading
 the dataframe. My first guess was that h (and/or price?) now being a factor
 - stringsAsFactors = T by default - made the difference, and I tried to
 convert x$h to factor, but received an error.

Please provide enough of x to reproduce your problem.  e.g.

x - head(x)
dput(x)
# repeat code and ensure it still shows problem

-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] modulo operation

2010-09-09 Thread José M. Blanco Moreno




Did you read the documentation before you read the code?

‘%%’ and ‘x %/% y’ can be used for non-integer ‘y’, e.g. ‘1 %/%
0.2’, but the results are subject to rounding error and so may be
platform-dependent. Because the IEC 60059 representation of ‘0.2’
is a binary fraction slightly larger than ‘0.2’, the answer to ‘1
%/% 0.2’ should be ‘4’ but most platforms give ‘5’.

I suspect that is relevant to your interests



Yes. I think José is assuming that 25 %/% 0.2 and floor(25/0.2) are 
equal, but they are not, because rounding affects them differently. 
(The first is a single operation with no rounding except in the 
representation of 0.2; the second is two operations and is subject to 
another set of rounding.)


Duncan Murdoch
Thank you (both) very much for the info. Indeed I wasn't aware of that 
piece of documentation and of the implications of rounding. Excuse me 
for my hasty question when facing this behaviour.


--
---
José M. Blanco-Moreno

Dept. de Biologia Vegetal (Botànica)
Facultat de Biologia
Universitat de Barcelona
Av. Diagonal 645
08028 Barcelona
SPAIN
---

phone: (+34) 934 039 863
fax: (+34) 934 112 842

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Emacs function argument hints

2010-09-09 Thread Ista Zahn

Hi Tim,
This works out of the box for me, with ESS 5.11 and Emacs 23.1

-Ista

On Thu, Sep 9, 2010 at 4:07 AM, Tim Elwell-Sutton tesut...@hku.hk wrote:
 Hi

 I've recently started using Emacs as my text editor for writing R script.

 I am looking for a feature which I have seen on the standard R text editor
 for Mac OS. In the Mac OS editor when you start typing a function, the
 possible arguments for that function appear at the bottom of the window.
 E.g. if you type table(   before you finish typing you can see at the
 bottom of the window:



 table(..., exclude = if (useNA == no) c(NA, NaN), useNA = c(no,
ifany, always), dnn = list.names(...), deparse.level = 1)



 I think this feature may be called function argument hints but I'm not
 sure and searching the archive with that term has not produced anything
 useful.

 Is this feature available in Emacs or any other windows text editor for R?



 Thanks very much

 Tim



 (Using Windows XP, R 2.11.1, GNU Emacs 23.2.1)




[[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] confidence intervals around p-values

2010-09-09 Thread Ben Bolker

Fernando Marmolejo Ramos fernando.marmolejoramos at adelaide.edu.au writes:

 
 Dear all
 
 I wonder if anyone has heard of confidence intervals around p-values...
 
 Any pointer would be highly appreciated.


  No, and my reflex is that it seems like a bad idea.
  If you are using p-values as an index of effect size (e.g. translating
a t- or Z-score into a p-value), why not calculate the confidence interval
on the effect size?
  This is off-topic for the group (not an R question), but if you
gave a sense of the problem you were trying to solve you might get
some answers.

  Ben Bolker

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] confidence intervals around p-values


 On 09/09/2010 6:44 AM, Fernando Marmolejo Ramos wrote:

Dear all

I wonder if anyone has heard of confidence intervals around p-values...


That doesn't really make sense.  p-values are statistics, not 
parameters.  You would compute a confidence interval around a population 
mean because that's a parameter, but you wouldn't compute a confidence 
interval around the sample mean:  you've observed it exactly.


Duncan Murdoch


Any pointer would be highly appreciated.

Best

Fer

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] [R-pkgs] New package for medical image registration: RNiftyReg

2010-09-09 Thread Jon Clayden

The first release of RNiftyReg, an R package for registration
(alignment and resampling) of medical images, is now available on CRAN
[1]. It may also be useful for other 3D array-like data sets.
RNiftyReg is built on top of the NiftyReg library [2], and is written
in a mixture of C, C++ and R. It currently supports 3D rigid-body and
affine registration, and support for 2D and nonlinear registration is
planned for a future release. NIfTI-format files can be read in and
passed to the registration algorithm using the oro.nifti package. In
testing I've found that a standard 12 degree-of-freedom affine
registration typically takes less than a minute, but timings will
depend on the dimensions of the images. Feedback on the package would
be very welcome at this stage.

Over the last few years a number of R packages for medical image
analysis have been produced, and R is gaining momentum as a platform
in this field [3]. I hope that this package will be a useful addition.

All the best,
Jon

--
[1] http://cran.r-project.org/web/packages/RNiftyReg/index.html
[2] http://sourceforge.net/projects/niftyreg/
[3] http://cran.r-project.org/web/views/MedicalImaging.html

___
R-packages mailing list
r-packa...@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-packages

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Invitation to the ICANNGA'11 Conference

2010-09-09 Thread ICANNGA organizing committee

Dear Colleague, 

The 10th ICANNGA conference, to be held April 14-16 2011 in Ljubljana,
Slovenia, is fast approaching and with it the paper submission deadline,
which is October 1st, 2010.

Let us kindly invite you to visit our web page: www.icannga.com, where
all the details on ICANNGA, its history, program topics, the keynote
speakers, and registration details can be found. Moreover, it offers
information on Slovenia and Ljubljana, its capital. 

We want to remind you also that the accepted papers will be published in
the Springer's Lecture Notes in Computer Science, and that the best
selected papers will appear in the Springer's Computing journal with SCI
impact factor.

We are looking forward to hosting you in our beautiful country and hope
you will be able to feel Slovenia and enjoy the conference and your
stay.

With best wishes, 

ICANNGA Organizing Committee 


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Strange output daply with empty strata

2010-09-09 Thread hadley wickham

 daply(data.test, .(municipality, employed), function(d){mean(d$age)} )
     employed
 municipality   no  yes
    A 41.58759 44.67463
    B 55.57407 43.82545
    C 43.59330   NA

 The .drop argument has a different meaning in daply. Some R functions have
 an na.last argument, and it may be that somewhere in daply, there is a
 function call that moves all NAs to the end. The means are in the right
 order except for the first, where the NA is supposed to be, so everything is
 offset in the table by 1. I've cc'ed Hadley on this.

This is a bug, which I've fixed in the development version (hopefully
to be released next week).
In the plyr 1.2:

 daply(data.test, .(municipality, employed), function(d){mean(d$age)} )
employed
municipality   no  yes
   A   NA 39.49980
   B 44.69291 51.63733
   C 57.38072 45.28978

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Highlighting a few bars in a barplot

2010-09-09 Thread Daniel Brewer

Hello,

I have a bar plot where I am already using colour to distinguish one set
of samples from another.  I would also like to highlight a few of these
bars as ones that should be looked at in detail.  I was thinking of
using hatching, but I can't work out how or if you can have a background
colour and hatching which is different between bars.  Any suggestions on
how I should do this?

Thanks

Dan

-- 
**
Daniel Brewer, Ph.D.

Institute of Cancer Research
Molecular Carcinogenesis
Email: daniel.bre...@icr.ac.uk
**

The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company 
Limited by Guarantee, Registered in England under Company No. 534147 with its 
Registered Office at 123 Old Brompton Road, London SW7 3RP.

This e-mail message is confidential and for use by the a...{{dropped:2}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] plot symbol +, but with variable bar lenghts

Look at my.symbols in the TeachingDemos package.

-Original Message-
From: Rainer Machne r...@tbi.univie.ac.at
Sent: Thursday, September 09, 2010 12:42 AM
To: R-help@r-project.org R-help@r-project.org
Subject: [R] plot symbol +, but with variable bar lenghts

Hi,

does anybody know of some plotting function or an easy way to
generate + symbols with individually settable bar lengths?
I tried just combining | and - as pch and setting the size via cex,
but that doesn't really work since the two symbols have different default
lengths. Is there a horizontal | or a longer - available?

Thanks,
Rainer

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Saving/loading custom R scripts

2010-09-09 Thread Bos, Roger

Josh,

I liked your idea of setting the repo in the .Rprofile file, so I tried it:

r - getOption(repos)
r[CRAN] - http://cran.stat.ucla.edu;
options(repos = r)
rm(r)

And now when I open R I get an error:

  Error in r[CRAN] - http://cran.stat.ucla.edu; : 
  cannot do complex assignments in base namespace
 

I am using R2.11.1pat in windows.  

Thanks,

Roger



-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Joshua Wiley
Sent: Wednesday, September 08, 2010 11:20 AM
To: DrCJones
Cc: r-help@r-project.org
Subject: Re: [R] Saving/loading custom R scripts

Hi,

Just create a file called .Rprofile that is located in your working directory 
(this means you could actually have different ones in each working directory).  
In that file, you can put in code just like any other code that would be 
source()d in.  For instance, all my .Rprofile files start with:

r - getOption(repos)
r[CRAN] - http://cran.stat.ucla.edu;
options(repos = r)
rm(r)

So that I do not have to pick my CRAN mirror.  Similarly you could merely add 
this line to the file:

source(file = 
http://www.r-statistics.com/wp-content/uploads/2010/02/Friedman-Test-with-Post-Hoc.r.txt;)

and R would go online, download that file and source it in (not that I am 
recommending re-downloading every time you start R).  Then whatever names they 
used to define the functions, would be in your workspace.

Note that in general, you will not get any output alerting you that it has 
worked; however, if you type ls() you should see those functions'
names.

Cheers,

Josh

On Wed, Sep 8, 2010 at 12:25 AM, DrCJones matthias.godd...@gmail.com wrote:

 Hi,
 How does R automatically load functions so that they are available 
 from the workspace? Is it anything like Matlab - you just specify a 
 directory path and it finds it?

 The reason I ask is because  I found a really nice script that I would 
 like to use on a regular basis, and it would be nice not to have to 
 'copy and paste' it into R on every startup:

 http://www.r-statistics.com/wp-content/uploads/2010/02/Friedman-Test-w
 ith-Post-Hoc.r.txt

 This would be for Ubuntu, if that makes any difference.

 Cheers
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Saving-loading-custom-R-scripts-tp253092
 4p2530924.html Sent from the R help mailing list archive at 
 Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
***

This message is for the named person's use only. It may\...{{dropped:20}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] multi-class for BRT

2010-09-09 Thread azam jaafari

Hi
 
I want to do boosted regression(classification) tree for categorical response 
(with 7 levels). Can I do this by GBM package?
 
please help me? 
 
Thanks alot
 


  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Determine Bounds of Current Graph

2010-09-09 Thread Isamoor

I'm having trouble determining the bounds of my current graph.  I know how
to set the bounds up front (ylim  xlim in most cases), but I would rather
be able to dynamically see what was chosen to use in later code.

Example:

library(maps)
map('state','Indiana')
map.axes()
??Something that lets me know the y-axis is from ~38 to ~42 and store this
information into a vector

Is there some way to query what the bounds of the current graph are?

Thanks!

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Newbie cross tabulation issue

2010-09-09 Thread John Kane

It would help if you included a bit of sample data. See ?dput as a way of doing 
this.

Also a good place to start is by looking at the package reshape. Have a look 
at http://had.co.nz/reshape/ for some information on the package.

--- On Wed, 9/8/10, Jonathan Finlay jmfinl...@gmail.com wrote:

 From: Jonathan Finlay jmfinl...@gmail.com
 Subject: [R] Newbie cross tabulation issue
 To: r-help@r-project.org
 Received: Wednesday, September 8, 2010, 6:40 PM
 hi, i'm new in R and i need some
 help. Please, ¿do you know a function how
 can process cross tables for many variables and show the
 result in one table
 who look like this?:

 ++
 |-- |        X
 variable           |
 |- | Xop1 | Xop2 | Xop3|.|
 ++
 |Yvar1 | Total | %row..|
 |         |  Op1 | 
 %row..|
 |         |  Op2 | 
 %row..|
 |+---+
 |Yvar2 | Op1 |  %row..|
 |         | Op2 | 
 %row...|
 ++
 |Yvar3 | Op1 |  %row..|
 |         | Op2 | 
 %row...|
 |         | Op3 | 
 %row...|
 |+---+

 Like a pivot table!

 thanks a lot.

 -- 
 Jonathan.

     [[alternative HTML version deleted]]

 -Inline Attachment Follows-

 __
 R-help@r-project.org
 mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained,
 reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Which language is faster for numerical computation?

2010-09-09 Thread Matt Shotwell

For the compiled languages, it depends heavily on the compiler. This
sort of comparison is rendered moot by the huge variety of compiler and
hardware specific optimizations. My suggestion is to use C, or possibly
C++ in conjunction with Rcpp, as these are most compatible with R. Also,
C and C++ are consistently rated highly (often in the top 3) in
popularity and use. Fortran is not. This would make a difference if you
want to collaborate or ask for help.

-Matt

On Thu, 2010-09-09 at 06:26 -0400, Christofer Bogaso wrote:
 Dear all, R offers integration mechanism with different programming
 languages like C, C++, Fortran, .NET etc. Therefore I am curious on,
 for heavy numerical computation which language is the fastest? Is
 there any study? I specially want to know because, if there is some
 study saying that C is the fastest language for numerical computation
 then I would change some of my R code into C.
 
 Thanks for your time.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
Matthew S. Shotwell
Graduate Student 
Division of Biostatistics and Epidemiology
Medical University of South Carolina

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Which language is faster for numerical computation?

2010-09-09 Thread Dirk Eddelbuettel


On 9 September 2010 at 13:26, Rainer M Krug wrote:
| -BEGIN PGP SIGNED MESSAGE-
| Hash: SHA1
| 
| On 09/09/10 12:26, Christofer Bogaso wrote:
|  Dear all, R offers integration mechanism with different programming
|  languages like C, C++, Fortran, .NET etc. Therefore I am curious on,
|  for heavy numerical computation which language is the fastest? Is
|  there any study? I specially want to know because, if there is some
|  study saying that C is the fastest language for numerical computation
|  then I would change some of my R code into C.
| 
| As far as I am aware, the two choices are C and Fortran - where it
| depends on the calculations, which one is faster.

Could it get any more un-scientific and un-empirical?  Maybe we should debate
whether it is faster on Thursdays than on Wednesdays too ?

FWIW the Rcpp package contains this benchmark example where (R and) C++ is
faster than (R and) C.  So it really all depends.  If someone wants to
contribute a Fortran version I'll gladly commit it too.

   test replications elapsed relative user.self sys.self
5Rcpp_New_ptr(a, b)1   0.213   1.  0.210
1 R_API_optimised(a, b)1   0.233   1.0939  0.230
4Rcpp_New_std(a, b)1   0.258   1.2113  0.260
3Rcpp_Classic(a, b)1   0.445   2.0892  0.450
2 R_API_naive(a, b)1   1.179   5.5352  1.170
6  Rcpp_New_sugar(a, b)1   1.260   5.9155  1.260
All results are equal
e...@max:~/svn/rcpp/pkg/Rcpp/inst/examples/ConvolveBenchmarks$ 

(That is a slightly reworked version from SVN and to be on CRAN soon. What is
on CRAN looks a little different as it doesn't use the rbenchmark package.)

Benchmark results are far from conclusive proofs but a carefully set-up study
can highlight and illuminate differences and/or lack thereof. If Christopher
has a particular problem in mind he should probably test and benchmark
approaches to that problem.

Lastly, the time the code runs is just one measure. For Rcpp we also aim to
minimise the time it takes to _write_ the code to solve the problem.

Dirk

-- 
Dirk Eddelbuettel | e...@debian.org | http://dirk.eddelbuettel.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Determine Bounds of Current Graph



On Sep 9, 2010, at 10:07 AM, Isamoor wrote:

I'm having trouble determining the bounds of my current graph.  I  
know how
to set the bounds up front (ylim  xlim in most cases), but I would  
rather

be able to dynamically see what was chosen to use in later code.

Example:

library(maps)
map('state','Indiana')
map.axes()


?par

 bounds - par(usr)
 bounds
[1] -88.12964 -84.77184  37.74583  41.82082


??Something that lets me know the y-axis is from ~38 to ~42 and  
store this

information into a vector

Is there some way to query what the bounds of the current graph are?

Thanks!



David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] confidence intervals around p-values

2010-09-09 Thread Ted Harding

On 09-Sep-10 13:21:07, Duncan Murdoch wrote:
   On 09/09/2010 6:44 AM, Fernando Marmolejo Ramos wrote:
 Dear all

 I wonder if anyone has heard of confidence intervals around
 p-values...
 
 That doesn't really make sense.  p-values are statistics, not 
 parameters. You would compute a confidence interval around a
 population mean because that's a parameter, but you wouldn't
 compute a confidence interval around the sample mean: you've
 observed it exactly.
 
 Duncan Murdoch

Duncan has succinctly stated the essential point in the standard
interpretation. The P-value is calculated from the sample in
hand, a definite null hypothesis, and the distribution of the
test statistic given the null hyptohesis, so (given all of these)
there is no scope for any other answer.

However, there are circumstances in which the notion of confidence
interval for a P-value makes some sense. One such might be the
Mann-Whitney test for identity of distribution of two samples
of continuous variables, where (because of discretisation of the
values when they were recorded) there are ties.

Then you know in theory that the underlying values are all
different, but because you don't know where these lie in the
discretisation intervals you don't know which way a tie may
split. So it would make sense to simulate by splitting ties
at random (e.g. uniformly distribute each 1.5 value over the
interval (1.5,1.6) or (1.45,1.55)). 

For each such simulated tie-broken sample, calculate the P-value.
Then you get a distribution of exact P-values calculated from
samples without ties which are consistent with the recorded data.
The central 95% of this distribution could be interpreted as a 95%
coinfidence interval for the true P-value.

To bring this closer to on-topic, here is an example in R
(rounding to intervals of 0.2):

  set.seed(51324)
  X - sort(2*round(0.5*rnorm(12),1))
  Y - sort(2*round(0.5*rnorm(12)+0.25,1))
  rbind(X,Y)
#   [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
# X -1.8 -1.2 -0.8 -0.6  0.00  0.2  0.2  1.2   1.8 2   2.2
# Y -1.2 -0.4 -0.2  0.4  0.41  1.0  1.0  1.2   1.8 2   2.6
# So several ties (-1.2,1.2,1.8,2.0), as well as 0.0, 0.4, 1.0
# which don't matter.
wilcox.test(X,Y,alternative=less,exact=TRUE,correct=FALSE)
# data:  X and Y   W = 54, p-value = 0.1488

  Ps - numeric(1000)
  for(i in (1:1000)){
Xr - (X-0.1) + 0.2*runif(10)
Yr - (Y-0.1) + 0.2*runif(10)
Ps[i] - wilcox.test(Xr,Yr,alternative=less,
 exact=TRUE,correct=FALSE)$p.value
  }
  hist(Ps)
  table(round(Ps,4))
  # 0.1328 0.1457 0.1593 0.1737 0.1888 
  # 81267336226 90 

So this gives you a picture of the uncertainty in the P-value
(0.1488, calculated from the rounded data) relative to what it
really should have been (if calculated from unrounded data).
Since each possible true (tie-broken) sample can be viewed
as a hypothesis about unobserved truth, it does make a certain
sense to view these results as a kind of confidence distribution
for the P-value you should have got. However, this is more of a
Bayesian argument, since the above calculation has assigned
equal prior probability to the tie-breaks!

One could also, I suppose, consider the question of what
distribution of P-values might arise if the/an alternative
huypothesis were true, and where in this does the P-value that
we actually got lie? But these are murkier waters ...

Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 09-Sep-10   Time: 15:24:29
-- XFMail --

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] optimized value worse than starting Value

2010-09-09 Thread Ravi Varadhan


Yes, Barry, we are aware of this issue.  It is caused by printing to console 
from FORTRAN in one of the optimization codes, ucminf.  If we set trace=FALSE 
in optimx, this problem goes away.

Ravi.



Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology
School of Medicine
Johns Hopkins University

Ph. (410) 502-2619
email: rvarad...@jhmi.edu


- Original Message -
From: Barry Rowlingson b.rowling...@lancaster.ac.uk
Date: Thursday, September 9, 2010 4:13 am
Subject: Re: [R] optimized value worse than starting Value
To: Michael Bernsteiner dethl...@hotmail.com
Cc: rvarad...@jhmi.edu, r-help@r-project.org


 On Wed, Sep 8, 2010 at 6:26 PM, Michael Bernsteiner
 dethl...@hotmail.com wrote:
  @Barry: Yes it is the Rosenbrock Function. I'm trying out some thing 
 I found
  here: 
 
  @Ravi: Thanks for your help. I will have a closer look at the BB 
 package. Am
  I right, that the optimx package is ofline atm? (Windows)
 
  It looks like the Windows build of optimx failed the R CMD check when
 running the examples:
 
 
 
 Barry

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Calculating with tolerances (error propagation)



On Sep 9, 2010, at 6:50 AM, Jan private wrote:


Hello Bernardo,

-
If I understood  your problem this script solve your problem:

q-0.15 + c(-.1,0,.1)
h-10 + c(-.1,0,.1)
5*q*h
[1]  2.475  7.500 12.625
-

OK, this solves the simple example.
But what if the example is not that simple. E.g.

P = 5 * q/h

Here, to get the maximum tolerances for P, we need to divide the  
maximum
value for q by the minimum value for h, and vice versa. Is there any  
way

to do this automatically, without thinking about every single step?

There is a thing called interval arithmetic (I saw it as an Octave
package) which would do something like this.

I would have thought that tracking how a (measuring) error propagates
through a complex calculation would be a standard problem of
statistics?? In other words, I am looking for a data type which is a
number with a deviation +- somehow attached to it, with binary  
operators

that automatically knows how to handle the deviation.

Thank you,  
Jan

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Calculating with tolerances (error propagation)

2010-09-09 Thread Peng, C


 q-0.15 + c(-.1,0,.1) 
 h-10 + c(-.1,0,.1) 
 
 5*q/h[3:1] 
[1] 0.02475248 0.0750 0.12626263
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Re-Calculating-with-tolerances-error-propagation-tp2532640p2532991.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Correlation question

2010-09-09 Thread Stephane Vaucher


Thank you Dennis,

You identified a factor (text column) that I was concerned with. 
I simplified my example to try and factor out possible causes. I 
eliminated the recurring values in columns (which were not the columns 
that caused problems). I produced three examples with simple data sets.


1. Correct output, 2 columns only:


test.notext = read.csv('test-notext.csv')
cor(test.notext, method='spearman')

   P3 HP_tot
P3  1.000 -0.2182876
HP_tot -0.2182876  1.000

dput(test.notext)

structure(list(P3 = c(2L, 2L, 2L, 4L, 2L, 3L, 2L, 1L, 3L, 2L,
2L, 2L, 3L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 2L),
HP_tot = c(10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 136L,
136L, 136L, 136L, 136L, 136L, 136L, 136L, 136L, 136L, 15L,
15L, 15L, 15L, 15L, 15L, 15L)), .Names = c(P3, HP_tot
), class = data.frame, row.names = c(NA, -25L))

2. Incorrect output where I introduced my P7 column containing text only 
the 'a' character:



test = read.csv('test.csv')
cor(test, method='spearman')

   P3 P7 HP_tot
P3  1.000 NA -0.2502878
P7 NA  1 NA
HP_tot -0.2502878 NA  1.000
Warning message:
In cor(test, method = spearman) : the standard deviation is zero

dput(test)

structure(list(P3 = c(2L, 2L, 2L, 4L, 2L, 3L, 2L, 1L, 3L, 2L,
2L, 2L, 3L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 2L),
P7 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
), .Label = a, class = factor), HP_tot = c(10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 136L, 136L, 136L, 136L, 136L,
136L, 136L, 136L, 136L, 136L, 15L, 15L, 15L, 15L, 15L, 15L,
15L)), .Names = c(P3, P7, HP_tot), class = data.frame, 
row.names = c(NA,

-25L))

3. Incorrect output with P7 containing a variety of alpha-numeric 
characters (ascii), to factor out equal valued column issue. Notice that 
the text column is interpreted as a numeric value.



test.number = read.csv('test-alpha.csv')
cor(test.number, method='spearman')

   P3 P7 HP_tot
P3  1.000  0.4093108 -0.2502878
P7  0.4093108  1.000 -0.3807193
HP_tot -0.2502878 -0.3807193  1.000

dput(test.number)

structure(list(P3 = c(2L, 2L, 2L, 4L, 2L, 3L, 2L, 1L, 3L, 2L,
2L, 2L, 3L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 2L),
P7 = structure(c(11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L,
19L, 20L, 21L, 22L, 23L, 24L, 25L, 1L, 2L, 3L, 4L, 5L, 6L,
7L, 8L, 9L, 10L), .Label = c(0, 1, 2, 3, 4, 5,
6, 7, 8, 9, a, b, c, d, e, f, g, h,
i, j, k, l, m, n, o), class = factor), HP_tot = c(10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 136L, 136L, 136L, 136L,
136L, 136L, 136L, 136L, 136L, 136L, 15L, 15L, 15L, 15L, 15L,
15L, 15L)), .Names = c(P3, P7, HP_tot), class = data.frame, 
row.names = c(NA,

-25L))

Correct output is obtained by avoiding matrix computation of correlation:

cor(test.number$P3, test.number$HP_tot, method='spearman')

[1] -0.2182876

It seems that a text column corrupts my correlation calculation (only in a 
matrix calculation). I assumed that text columns would not influence the 
result of the calculations.


Is this a correct behaviour? If not,I can submit a bug report? If it is, 
is there a known workaround?


cheers,
Stephane Vaucher

On Thu, 9 Sep 2010, Dennis Murphy wrote:


Did you try taking out P7, which is text? Moreover, if you get a message
saying ' the standard deviation is zero', it means that the entire column is
constant. By definition, the covariance of a constant with a random variable
is 0, but your data consists of values, so cor() understandably throws a
warning that one or more of your columns are constant. Applying the
following to your data (which I named expd instead),  we get

sapply(expd[, -12], var)
 P1   P2   P3   P4   P5
P6
5.43e-01 1.08e+00 5.77e-01 1.08e+00 6.43e-01
5.57e-01
 P8   P9  P10  P11  P12
SITE
5.73e-01 3.19e+00 5.07e-01 2.50e-01 5.50e+00
2.49e+00
 Errors warnings   ManualTotalH_tot
HP1.1
9.072840e+03 2.081334e+04 7.43e-01 3.823500e+04 3.880250e+03
2.676667e+00
  HP1.2HP1.3HP1.4   HP_totHO1.1
HO1.2
0.00e+00 2.008440e+03 3.057067e+02 3.827250e+03 8.40e-01
0.00e+00
  HO1.3HO1.4   HO_totHU1.1HU1.2
HU1.3
0.00e+00 0.00e+00 8.40e-01 0.00e+00 2.10e-01
2.27e-01
 HU_tot   HRL_totLP1.1LP1.2
LP1.3
6.23e-01 7.43e-01 3.754610e+03 3.209333e+01 0.00e+00
2.065010e+03
  LP1.4   LP_totLO1.1LO1.2LO1.3
LO1.4
2.246233e+02 3.590040e+03 3.684000e+01 0.00e+00 0.00e+00
2.84e+00
 LO_totLU1.1LU1.2LU1.3   LU_tot
LR_tot
6.00e+01 0.00e+00 1.44e+00 3.626667e+00 8.37e+00
4.94e+00
 SP_tot

Re: [R] Highlighting a few bars in a barplot

2010-09-09 Thread RINNER Heinrich

Hello Daniel,

something like that might work:

x - runif(6)
marker1 - rep(c(red, blue), 3)
marker2 - c(rep(0,5), 10)
barplot(x, col = marker1)
barplot(x, density = marker2, add=T)

But I'd be interested if you learn about other solutions...

-Heinrich.

 -Ursprüngliche Nachricht-
 Von: r-help-boun...@r-project.org
 [mailto:r-help-boun...@r-project.org] Im Auftrag von Daniel Brewer
 Gesendet: Donnerstag, 09. September 2010 16:03
 An: r-h...@stat.math.ethz.ch
 Betreff: [R] Highlighting a few bars in a barplot


 Hello,

 I have a bar plot where I am already using colour to
 distinguish one set
 of samples from another.  I would also like to highlight a
 few of these
 bars as ones that should be looked at in detail.  I was thinking of
 using hatching, but I can't work out how or if you can have a
 background
 colour and hatching which is different between bars.  Any
 suggestions on
 how I should do this?

 Thanks

 Dan

 --
 **
 Daniel Brewer, Ph.D.

 Institute of Cancer Research
 Molecular Carcinogenesis
 Email: daniel.bre...@icr.ac.uk
 **

 The Institute of Cancer Research: Royal Cancer Hospital, a
 charitable Company Limited by Guarantee, Registered in
 England under Company No. 534147 with its Registered Office
 at 123 Old Brompton Road, London SW7 3RP.

 This e-mail message is confidential and for use by the
 a...{{dropped:2}}

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide
 commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Calculating with tolerances (error propagation)



On Sep 9, 2010, at 6:50 AM, Jan private wrote:


Hello Bernardo,

-
If I understood  your problem this script solve your problem:

q-0.15 + c(-.1,0,.1)
h-10 + c(-.1,0,.1)
5*q*h
[1]  2.475  7.500 12.625
-

OK, this solves the simple example.
But what if the example is not that simple. E.g.

P = 5 * q/h

Here, to get the maximum tolerances for P, we need to divide the  
maximum

value for q by the minimum value for h, and vice versa.


Have you considered the division by zero problems?


Is there any way
to do this automatically, without thinking about every single step?

There is a thing called interval arithmetic (I saw it as an Octave
package) which would do something like this.


(Sorry for the blank reply posting. Serum caffeine has not yet reached  
optimal levels.)


Is it possible that interval arithmetic would produce statistically  
incorrect tolerance calculation, and that be why it has not been added  
to R? Those tolerance intervals are presumably some sort of  
(unspecified) prediction intervals (i.e. contain 95% or 63% or some  
fraction of a large sample) and combinations under mathematical  
operations are not going to be properly derived by c( min(XY),  
max(XY) ) since those are not calculated with any understanding of  
combining variances of functions on random variables.


--
David.


I would have thought that tracking how a (measuring) error propagates
through a complex calculation would be a standard problem of
statistics??


In probability theory, anyway.


In other words, I am looking for a data type which is a
number with a deviation +- somehow attached to it, with binary  
operators

that automatically knows how to handle the deviation.


There is the suite of packages that represent theoretic random  
variables and support mathematical operations on them.


See distrDoc and the rest of that suite.

--
David.


Thank you,  
Jan

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Reproducible research

2010-09-09 Thread Matt Shotwell

I have a little package I've been using to write template blog posts (in
HTML) with embedded R code. It's quite small but very flexible and
extensible, and aims to do something similar to Sweave and brew. In
fact, the package is heavily influenced by the brew package, though
implemented quite differently. It depends on the evaluate package,
available in the CRAN. The tentatively titled 'markup' package is
attached. After it's installed, see ?markup and the few examples in the
inst/ directory, or just example(markup).

-Matt

On Thu, 2010-09-09 at 01:47 -0400, David Scott wrote:
 I am investigating some approaches to reproducible research. I need in 
 the end to produce .html or .doc or .docx. I have used hwriter in the 
 past but have had some problems with verbatim output from  R. Tables are 
 also not particularly convenient.
 
 I am interested in R2HTML and R2wd in particular, and possibly odfWeave.
 
 Does anyone have sample documents using any of these approaches which 
 they could let me have?
 
 David Scott
 
 _
 
 David Scott   Department of Statistics
   The University of Auckland, PB 92019
   Auckland 1142,NEW ZEALAND
 Phone: +64 9 923 5055, or +64 9 373 7599 ext 85055
 Email:d.sc...@auckland.ac.nz,  Fax: +64 9 373 7018
 
 Director of Consulting, Department of Statistics
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
Matthew S. Shotwell
Graduate Student 
Division of Biostatistics and Epidemiology
Medical University of South Carolina
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Failure to aggregate

2010-09-09 Thread Petr PIKAL

Hi

you has to provide some more info about x e.g. str(x)

x-data.frame(price=1, h=Sys.time())



r-help-boun...@r-project.org napsal dne 08.09.2010 10:18:52:

 
 Mnay thanks fr suggestions. I am afraid this is one tough daatframe...
 
  t = sqldf(select h, count(*) from x group by h)
 Error in sqliteExecStatement(con, statement, bind.data) : 
   RS-DBI driver: (error in statement: no such table: x)
 In addition: Warning message:
 In value[[3L]](cond) : RAW() can only be applied to a 'raw', not a 
'double'

did not test

 
  t = aggregate(x[price], by = x[h], FUN = NROW) 
 Error in sort.list(y) : 'x' must be atomic for 'sort.list'
 Have you called 'sort' on a list?

works
 aggregate(x[price], by = x[h], FUN = NROW) 
h price
1 2010-09-09 16:58:04 1


 
  t = aggregate(x[price], by = x[h], FUN = length) 
 Error in sort.list(y) : 'x' must be atomic for 'sort.list'
 Have you called 'sort' on a list?

works

aggregate(x[price], by = x[h], FUN = length)
h price
1 2010-09-09 16:58:04 1


 
  t = tapply(x$price, by = x$h, FUN = length) 
 Error in is.list(INDEX) : 'INDEX' is missing

works
use INDEX instead of by

tapply(x$price, by = list(x$h), FUN = length)
Error in is.list(INDEX) : 'INDEX' is missing
tapply(x$price, x$h, FUN = length)
2010-09-09 16:58:04 
  1

Regards
Petr

 
  class(x)
 [1] data.frame
  class(x$h)
 [1] POSIXt  POSIXlt
  class(x$price)
 [1] integer
 
 -- 
 View this message in context: http://r.789695.n4.nabble.com/Failure-to-
 aggregate-tp2528613p2530963.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Calculating with tolerances (error propagation)

2010-09-09 Thread Keith Jewell

Jan private jrheinlaen...@gmx.de wrote in message 
news:1284029454.2740.361.ca...@localhost.localdomain...
 Hello Bernardo,

 -
 If I understood  your problem this script solve your problem:

 q-0.15 + c(-.1,0,.1)
 h-10 + c(-.1,0,.1)
 5*q*h
 [1]  2.475  7.500 12.625
 -

 OK, this solves the simple example.
 But what if the example is not that simple. E.g.

 P = 5 * q/h

 Here, to get the maximum tolerances for P, we need to divide the maximum
 value for q by the minimum value for h, and vice versa. Is there any way
 to do this automatically, without thinking about every single step?

 There is a thing called interval arithmetic (I saw it as an Octave
 package) which would do something like this.

 I would have thought that tracking how a (measuring) error propagates
 through a complex calculation would be a standard problem of
 statistics?? In other words, I am looking for a data type which is a
 number with a deviation +- somehow attached to it, with binary operators
 that automatically knows how to handle the deviation.

 Thank you,
 Jan

Ahhh! tracking how a (measuring) error propagates through a complex 
calculation

That doesn't depend only on values+errors, it also depends on the 
calculations, so - as you imply - you'd have to define a new data type and 
appropriate methods for all the mathematical operators (not just the binary 
ones!). Not a trivial task!

If you don't already know it, you should look at  Evaluation of measurement 
data - Guide to the expression of uncertainty in measurement 
http://www.bipm.org/utils/common/documents/jcgm/JCGM_100_2008_E.pdf 
especially section 5.

Hope that helps,

Keith J

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Newbie cross tabulation issue

2010-09-09 Thread Jonathan Finlay

2010/9/8 David Winsemius dwinsem...@comcast.net


 I hope you mean only two factors and an n x m table.

 Yes David I like say factor, but am new here.

-- 
Jonathan.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Error in normalizePath(path) : with McAfee

2010-09-09 Thread peter dalgaard


On Sep 9, 2010, at 13:52 , Duncan Murdoch wrote:

 On 09/09/2010 12:01 AM, Erin Hodgess wrote:
 Dear R People:
 I keep getting the Error in normalizePath(path) :  while trying to
 obtain the necessary packages to use with the Applied Spatial
 Statistics with R book.
 I turned off the Firewall (from McAfee) but am still getting the same 
 message.
 Does anyone have any idea on a solution please?
 
 I think you need to show us your code and the error in context.

Maybe not... We have been here before, and if I remember correctly, the issue 
is not the firewall, but antivirus software messing with temp directories. (As 
I understand it, it goes I see you have created a new directory, now let me 
put that in a safe place while I check it for malware. Oh, you were still using 
it? Too bad. Sort of like grabbing someone's new cup while they are trying to 
pour milk into it) 

Check back issues of R-help for details of the earlier incidents. 

 
 Duncan Murdoch
 
 sessionInfo()
 R version 2.11.1 (2010-05-31)
 i386-pc-mingw32
 locale:
 [1] LC_COLLATE=English_United States.1252
 [2] LC_CTYPE=English_United States.1252
 [3] LC_MONETARY=English_United States.1252
 [4] LC_NUMERIC=C
 [5] LC_TIME=English_United States.1252
 attached base packages:
 [1] stats graphics  grDevices utils datasets  methods   base
 other attached packages:
 [1] ctv_0.6-0
 loaded via a namespace (and not attached):
 [1] tools_2.11.1
 
 Thanks,
 Erin
 
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] problem with outer

2010-09-09 Thread Peng, C


Can you set the multinomial prob. to zero for p1+p2+p3 != 1 if you have to
use the multinomial distribution in guete(). Otherwise, I would say the
problem/guete() itself is problematic.
-- 
View this message in context: 
http://r.789695.n4.nabble.com/problem-with-outer-tp2532074p2533050.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Newbie cross tabulation issue



On Sep 8, 2010, at 7:32 PM, Jonathan Finlay wrote:

Thanks David, gmodels::Crosstable partially work because can show  
only 1 x 1

tablen
CrossTable(x,y,...)
I need something how can process at less 1 variable in X an 10 in Y.


A further thought (despite a lack of clarification on what your data  
situation really is.). The strong tendency in R is not to attempt  
replication of formats in SAS that were developed in an era of dot- 
matrix printers, but to target modern output devices. As such most of  
the table output facilities with any degree of sophistication have  
LaTeX or HTML as targets.


RSiteSearch(html tables) produces over 1000 links although they have  
many that are not for multiway tables where multi is greater than R  
x C. RSiteSearch(latex tables) produces many fewer.  You may want to  
look at xtable, Sweave, odfWeave, the various HTML utilities, and  
Harrell's Hmisc::summary.formula


--

David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Prediction confidence intervals for a Poisson GLM

2010-09-09 Thread stephenb

I am following up on an old post. Please, comment:

it appears that

predict(glm.model,type=response,se.fit=T)

will do all the conversions and give se on the scale of the response. This
only takes into account the error in parameter estimation. what a
prediction interval is meant to be usually means it has to capture the
error due to both parameter estimation and sampling variation ie it
encompasses the actual realizations.
for a given parameter there is sampling variation and that is not included
in the output of predict. the discreteness of models makes it quite
difficult to estimate a percentile interval, though. for binary outcomes, I
think it does not make sense. for Poisson and binomial (grouped binary) I
think it is possible to get approximations at least and this is what the
original poster needed I think.

so, let's say we have plow and pup for an observation from predict. if
size=100 for that obs.
predlow=qbinom(.025,100,plow)
predup=qbinom(.975,100,pup)

will give the prediction bounds. this I think partly ignores possible
overdispersion. Please, suggest a better way taking overdispersion into
account (in the qbinom part).

Thanks everybody.
Stephen B.
--
View this message in context:
http://r.789695.n4.nabble.com/Prediction-confidence-intervals-for-a-Poisson-GLM-tp841577p2533070.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Failure to aggregate

2010-09-09 Thread Dimitri Shvorob


 g = head(x)
 dput(g)
structure(list(price = c(500L, 500L, 501L, 501L, 500L, 501L), 
size = c(221000L, 2000L, 1000L, 13000L, 3000L, 3000L), src = c(R, 
R, R, R, R, R), t = structure(list(sec = c(24.133, 
47.096, 12.139, 18.142, 10.721, 28.713), min = c(0L, 0L, 
1L, 1L, 2L, 2L), hour = c(9L, 9L, 9L, 9L, 9L, 9L), mday = c(4L, 
4L, 4L, 4L, 4L, 4L), mon = c(0L, 0L, 0L, 0L, 0L, 0L), year = c(105L, 
105L, 105L, 105L, 105L, 105L), wday = c(2L, 2L, 2L, 2L, 2L, 
2L), yday = c(3L, 3L, 3L, 3L, 3L, 3L), isdst = c(0L, 0L, 
0L, 0L, 0L, 0L)), .Names = c(sec, min, hour, mday, 
mon, year, wday, yday, isdst), class = c(POSIXt, 
POSIXlt)), d = structure(list(sec = c(0, 0, 0, 0, 0, 0), 
min = c(0L, 0L, 0L, 0L, 0L, 0L), hour = c(0L, 0L, 0L, 
0L, 0L, 0L), mday = c(4L, 4L, 4L, 4L, 4L, 4L), mon = c(0L, 
0L, 0L, 0L, 0L, 0L), year = c(105L, 105L, 105L, 105L, 
105L, 105L), wday = c(2L, 2L, 2L, 2L, 2L, 2L), yday = c(3L, 
3L, 3L, 3L, 3L, 3L), isdst = c(0L, 0L, 0L, 0L, 0L, 0L
)), .Names = c(sec, min, hour, mday, mon, year, 
wday, yday, isdst), class = c(POSIXt, POSIXlt)), 
h = structure(list(sec = c(0, 0, 0, 0, 0, 0), min = c(0L, 
0L, 0L, 0L, 0L, 0L), hour = c(9L, 9L, 9L, 9L, 9L, 9L), mday = c(4L, 
4L, 4L, 4L, 4L, 4L), mon = c(0L, 0L, 0L, 0L, 0L, 0L), year = c(105L, 
105L, 105L, 105L, 105L, 105L), wday = c(2L, 2L, 2L, 2L, 2L, 
2L), yday = c(3L, 3L, 3L, 3L, 3L, 3L), isdst = c(0L, 0L, 
0L, 0L, 0L, 0L)), .Names = c(sec, min, hour, mday, 
mon, year, wday, yday, isdst), class = c(POSIXt, 
POSIXlt)), m = structure(list(sec = c(0, 0, 0, 0, 0, 0), 
min = c(0L, 0L, 1L, 1L, 2L, 2L), hour = c(9L, 9L, 9L, 
9L, 9L, 9L), mday = c(4L, 4L, 4L, 4L, 4L, 4L), mon = c(0L, 
0L, 0L, 0L, 0L, 0L), year = c(105L, 105L, 105L, 105L, 
105L, 105L), wday = c(2L, 2L, 2L, 2L, 2L, 2L), yday = c(3L, 
3L, 3L, 3L, 3L, 3L), isdst = c(0L, 0L, 0L, 0L, 0L, 0L
)), .Names = c(sec, min, hour, mday, mon, year, 
wday, yday, isdst), class = c(POSIXt, POSIXlt)), 
s = structure(list(sec = c(24, 47, 12, 18, 10, 28), min = c(0L, 
0L, 1L, 1L, 2L, 2L), hour = c(9L, 9L, 9L, 9L, 9L, 9L), mday = c(4L, 
4L, 4L, 4L, 4L, 4L), mon = c(0L, 0L, 0L, 0L, 0L, 0L), year = c(105L, 
105L, 105L, 105L, 105L, 105L), wday = c(2L, 2L, 2L, 2L, 2L, 
2L), yday = c(3L, 3L, 3L, 3L, 3L, 3L), isdst = c(0L, 0L, 
0L, 0L, 0L, 0L)), .Names = c(sec, min, hour, mday, 
mon, year, wday, yday, isdst), class = c(POSIXt, 
POSIXlt))), .Names = c(price, size, src, t, d, 
h, m, s), row.names = c(NA, 6L), class = data.frame)

 n = sqldf(select distinct h, src, count(*) from g group by h, src)
Loading required package: tcltk
Loading Tcl/Tk interface ... done
Error in sqliteExecStatement(con, statement, bind.data) : 
  RS-DBI driver: (error in statement: no such table: g)
In addition: Warning message:
In value[[3L]](cond) : RAW() can only be applied to a 'raw', not a 'double'

-- 
View this message in context: 
http://r.789695.n4.nabble.com/Failure-to-aggregate-tp2528613p2533051.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] regression function for categorical predictor data

2010-09-09 Thread karena


Hi, thank you very much for the help. 
one more quick question: is that, my predictor variable should be coded as
'factor' when using either 'lm' or 'glm'?

sincerely,

karena
-- 
View this message in context: 
http://r.789695.n4.nabble.com/regression-function-for-categorical-predictor-data-tp2532045p2533035.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] a question about replacing the value in the data.frame

2010-09-09 Thread karena


Thanks a lot!
-- 
View this message in context: 
http://r.789695.n4.nabble.com/a-question-about-replacing-the-value-in-the-data-frame-tp2532010p2533036.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Reproducible research

2010-09-09 Thread Matt Shotwell

Well, the attachment was a dud. Try this:

http://biostatmatt.com/R/markup_0.0.tar.gz

-Matt

On Thu, 2010-09-09 at 10:54 -0400, Matt Shotwell wrote:
 I have a little package I've been using to write template blog posts (in
 HTML) with embedded R code. It's quite small but very flexible and
 extensible, and aims to do something similar to Sweave and brew. In
 fact, the package is heavily influenced by the brew package, though
 implemented quite differently. It depends on the evaluate package,
 available in the CRAN. The tentatively titled 'markup' package is
 attached. After it's installed, see ?markup and the few examples in the
 inst/ directory, or just example(markup).
 
 -Matt
 
 On Thu, 2010-09-09 at 01:47 -0400, David Scott wrote:
  I am investigating some approaches to reproducible research. I need in 
  the end to produce .html or .doc or .docx. I have used hwriter in the 
  past but have had some problems with verbatim output from  R. Tables are 
  also not particularly convenient.
  
  I am interested in R2HTML and R2wd in particular, and possibly odfWeave.
  
  Does anyone have sample documents using any of these approaches which 
  they could let me have?
  
  David Scott
  
  _
  
  David Scott Department of Statistics
  The University of Auckland, PB 92019
  Auckland 1142,NEW ZEALAND
  Phone: +64 9 923 5055, or +64 9 373 7599 ext 85055
  Email:  d.sc...@auckland.ac.nz,  Fax: +64 9 373 7018
  
  Director of Consulting, Department of Statistics
  
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 

-- 
Matthew S. Shotwell
Graduate Student 
Division of Biostatistics and Epidemiology
Medical University of South Carolina

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Alignment of lines within barplot bars

2010-09-09 Thread Steve Murray


Dear all,

I have a barplot upon which I hope to superimpose horizontal lines extending 
across the width of each bar. I am able to partly achieve this through the 
following set of commands:

positions - barplot(bar_values, col=grey)
par(new=TRUE)
plot(positions, horiz_values, col=red, pch=_, ylim=c(min(bar_values), 
max(bar_values)))


...however this results in small, off-centred lines, which don't extend across 
the width of each bar. I've tried using 'cex' to increase the width, but of 
course this also increases the height of the line and results in it spanning a 
large range of y-axis values.


I'm sure this shouldn't be too tricky to achieve, nor that uncommon a problem! 
It may be that I'm taking the wrong approach.

Any help offered would be gratefully received.

Many thanks,

Steve

  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Axis break with gap.plot()

2010-09-09 Thread Filoche


Hi everyone.

I'm trying to break the y axis on a plot. For instance, I have 2 series
(points and a loess). Since the loess is a continuous set of points, it
passes in the break section. However, with gap.plot I cant plot the loess
because of this (I got the message some values of y will not be
displayed).

Here's my code: 

library(plotrix);


#generate some data
x = seq(-pi,pi,0.1);
sinx = sin(x);

#add leverage value
sinx = c(sinx,10);
xx = c(x,max(x) + 0.1);

#Loess
yy = loess(sinx ~ xx, span = 0.1);
yy = predict(yy);

#Add break between 2 and 8
gap.plot(xx,sinx,c(2,8)); #This line works fine
gap.plot(xx,yy,c(2,8), add = T); #This wont plot the loess

I did the graphic I would like to produce in Sigmaplot.  

http://img830.imageshack.us/img830/5206/breakaxis.jpg

Can it be done in R ?

With regards,
Phil






-- 
View this message in context: 
http://r.789695.n4.nabble.com/Axis-break-with-gap-plot-tp2533027p2533027.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] scalable delimiters in plotmath

2010-09-09 Thread baptiste auguie

Dear list,

I read in ?plotmath that I can use bgroup to draw scalable delimiters
such as [ ] and ( ). The same technique fails with   however, and I
cannot find a workaround,

grid.text(expression(bgroup(,atop(x,y),)))

Error in bgroup(, atop(x, y), ) : invalid group delimiter

Regards,

baptiste

sessionInfo()
R version 2.11.1 (2010-05-31)
x86_64-apple-darwin9.8.0

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/C/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] grid  stats graphics  grDevices utils datasets
methods   base

other attached packages:
[1] TeachingDemos_2.7

loaded via a namespace (and not attached):
[1] tools_2.11.1

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] See what is inside a matrix

The image function will create a plot with the values transformed to colors. Or 
the View function (note the capitol V) will let you look at it in a spreadsheet 
like window with scrollbars.

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
 project.org] On Behalf Of Alaios
 Sent: Thursday, September 09, 2010 1:23 AM
 To: Rhelp
 Subject: [R] See what is inside a matrix
 
 Hello everyone.. Is there any graphical tool to help me see what is
 inside a
 matrix? I have 100x100 dimensions matrix and as you already know as it
 does not
 fit on my screen R splits it into pieces.
 
 I would like to thank you in advance for your help
 Best Regards
 Alex
 
 
 
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Failure to aggregate

2010-09-09 Thread jim holtman

I think your main problem is that you have your time as POSIXlt which
is a multiple valued vector.  I converted the 't' to POSIXct, removed
the other POSIXlt value and created a 'h' as the character for the
hour and it works fine:

 str(g)
'data.frame':   6 obs. of  5 variables:
 $ price: int  500 500 501 501 500 501
 $ size : int  221000 2000 1000 13000 3000 3000
 $ src  : chr  R R R R ...
 $ time : POSIXct, format: 2005-01-04 09:00:24 2005-01-04 09:00:47
2005-01-04 09:01:12 2005-01-04 09:01:18 ...
 $ h: chr  09 09 09 09 ...
 sqldf(select distinct h, src, count(*) from g group by h,src)
   h src count(*)
1 09   R6


On Thu, Sep 9, 2010 at 11:16 AM, Dimitri Shvorob
dimitri.shvo...@gmail.com wrote:

 g = head(x)
 dput(g)
 structure(list(price = c(500L, 500L, 501L, 501L, 500L, 501L),
    size = c(221000L, 2000L, 1000L, 13000L, 3000L, 3000L), src = c(R,
    R, R, R, R, R), t = structure(list(sec = c(24.133,
    47.096, 12.139, 18.142, 10.721, 28.713), min = c(0L, 0L,
    1L, 1L, 2L, 2L), hour = c(9L, 9L, 9L, 9L, 9L, 9L), mday = c(4L,
    4L, 4L, 4L, 4L, 4L), mon = c(0L, 0L, 0L, 0L, 0L, 0L), year = c(105L,
    105L, 105L, 105L, 105L, 105L), wday = c(2L, 2L, 2L, 2L, 2L,
    2L), yday = c(3L, 3L, 3L, 3L, 3L, 3L), isdst = c(0L, 0L,
    0L, 0L, 0L, 0L)), .Names = c(sec, min, hour, mday,
    mon, year, wday, yday, isdst), class = c(POSIXt,
    POSIXlt)), d = structure(list(sec = c(0, 0, 0, 0, 0, 0),
        min = c(0L, 0L, 0L, 0L, 0L, 0L), hour = c(0L, 0L, 0L,
        0L, 0L, 0L), mday = c(4L, 4L, 4L, 4L, 4L, 4L), mon = c(0L,
        0L, 0L, 0L, 0L, 0L), year = c(105L, 105L, 105L, 105L,
        105L, 105L), wday = c(2L, 2L, 2L, 2L, 2L, 2L), yday = c(3L,
        3L, 3L, 3L, 3L, 3L), isdst = c(0L, 0L, 0L, 0L, 0L, 0L
        )), .Names = c(sec, min, hour, mday, mon, year,
    wday, yday, isdst), class = c(POSIXt, POSIXlt)),
    h = structure(list(sec = c(0, 0, 0, 0, 0, 0), min = c(0L,
    0L, 0L, 0L, 0L, 0L), hour = c(9L, 9L, 9L, 9L, 9L, 9L), mday = c(4L,
    4L, 4L, 4L, 4L, 4L), mon = c(0L, 0L, 0L, 0L, 0L, 0L), year = c(105L,
    105L, 105L, 105L, 105L, 105L), wday = c(2L, 2L, 2L, 2L, 2L,
    2L), yday = c(3L, 3L, 3L, 3L, 3L, 3L), isdst = c(0L, 0L,
    0L, 0L, 0L, 0L)), .Names = c(sec, min, hour, mday,
    mon, year, wday, yday, isdst), class = c(POSIXt,
    POSIXlt)), m = structure(list(sec = c(0, 0, 0, 0, 0, 0),
        min = c(0L, 0L, 1L, 1L, 2L, 2L), hour = c(9L, 9L, 9L,
        9L, 9L, 9L), mday = c(4L, 4L, 4L, 4L, 4L, 4L), mon = c(0L,
        0L, 0L, 0L, 0L, 0L), year = c(105L, 105L, 105L, 105L,
        105L, 105L), wday = c(2L, 2L, 2L, 2L, 2L, 2L), yday = c(3L,
        3L, 3L, 3L, 3L, 3L), isdst = c(0L, 0L, 0L, 0L, 0L, 0L
        )), .Names = c(sec, min, hour, mday, mon, year,
    wday, yday, isdst), class = c(POSIXt, POSIXlt)),
    s = structure(list(sec = c(24, 47, 12, 18, 10, 28), min = c(0L,
    0L, 1L, 1L, 2L, 2L), hour = c(9L, 9L, 9L, 9L, 9L, 9L), mday = c(4L,
    4L, 4L, 4L, 4L, 4L), mon = c(0L, 0L, 0L, 0L, 0L, 0L), year = c(105L,
    105L, 105L, 105L, 105L, 105L), wday = c(2L, 2L, 2L, 2L, 2L,
    2L), yday = c(3L, 3L, 3L, 3L, 3L, 3L), isdst = c(0L, 0L,
    0L, 0L, 0L, 0L)), .Names = c(sec, min, hour, mday,
    mon, year, wday, yday, isdst), class = c(POSIXt,
    POSIXlt))), .Names = c(price, size, src, t, d,
 h, m, s), row.names = c(NA, 6L), class = data.frame)

 n = sqldf(select distinct h, src, count(*) from g group by h, src)
 Loading required package: tcltk
 Loading Tcl/Tk interface ... done
 Error in sqliteExecStatement(con, statement, bind.data) :
  RS-DBI driver: (error in statement: no such table: g)
 In addition: Warning message:
 In value[[3L]](cond) : RAW() can only be applied to a 'raw', not a 'double'

 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Failure-to-aggregate-tp2528613p2533051.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Bug on chron

2010-09-09 Thread skan


hello

I think I've found a bug
I don't know if it's a chron bug or a R one.

(05/12/05 23:00:00) +1/24  gives
(05/12/05 24:00:00)
instead of 
(05/13/05 00:00:00)
it looks like the same but it's not because when you get the date of this
datetime it says day 12 instead of 13.


Please, forward it to the place where this bugs are supposed to be posted.

cheers
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Bug-on-chron-tp2533135p2533135.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] rgl and lighting

2010-09-09 Thread james.foadi

Dear R community (and Duncan more specifically),
I can't work out how to make additional light sources work in rgl.
Here is the example.

First I create a cube and visualize it:

 cubo - cube3d(col=black)
 shade3d(cubo)

Next I position the viewpoint at theta=0 and phi=30:

 view3d(theta=0,phi=30)

Next, I want to create a 2nd light source which diffuses red light from the 
front face. I thought I could do:

light3d(diffuse=red,theta=0,phi=0)

but...the front side doesn't show any red-iness. Same goes for specular and 
ambient.
What am I doing wrong here? How should the fron side show in red colour?

J

Dr James Foadi PhD
Membrane Protein Laboratory (MPL)
Diamond Light Source Ltd
Diamond House
Harewell Science and Innovation Campus
Chilton, Didcot
Oxfordshire OX11 0DE

Email:  james.fo...@diamond.ac.uk
Alt Email:  j.fo...@imperial.ac.uk


-- 
This e-mail and any attachments may contain confidential...{{dropped:8}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] coxph and ordinal variables?

2010-09-09 Thread Thomas Lumley


On Wed, 8 Sep 2010, Paul Johnson wrote:


run it with factor() instead of ordered().  You don't want the
orthogonal polynomial contrasts that result from ordered if you need
to compare against Stata.


If you don't want polynomial contrasts for ordered factors, you can just tell R 
not to use them.

options(contrasts=c(contr.treatment,contr.treatment))

It's like the Good Old Days when you had to use options() to tell S-PLUS not to 
use Helmert contrasts.


   -thomas

Thomas Lumley
Professor of Biostatistics
University of Washington, Seattle

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Saving/loading custom R scripts

On Thu, Sep 9, 2010 at 7:05 AM, Bos, Roger roger@rothschild.com wrote:
Josh,

I liked your idea of setting the repo in the .Rprofile file, so I tried it:

r - getOption(repos)
r[CRAN] - http://cran.stat.ucla.edu;
options(repos = r)
rm(r)

And now when I open R I get an error:

Error in r[CRAN] - http://cran.stat.ucla.edu; :
cannot do complex assignments in base namespace

II have been using that for several months now. I use a text editor
to create ~/.Rprofile (where ~ represents the path to my working
directory), and add those four lines of code. I don't know why it
would not work for you, and I cannot replicate the error myself so it
is hard to offer any suggestions.

I am using R2.11.1pat in windows.

Thanks,

Roger

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of Joshua Wiley
Sent: Wednesday, September 08, 2010 11:20 AM
To: DrCJones
Cc: r-help@r-project.org
Subject: Re: [R] Saving/loading custom R scripts

Hi,

Just create a file called .Rprofile that is located in your working directory
(this means you could actually have different ones in each working
directory). In that file, you can put in code just like any other code that
would be source()d in. For instance, all my .Rprofile files start with:

r - getOption(repos)
r[CRAN] - http://cran.stat.ucla.edu;
options(repos = r)
rm(r)

So that I do not have to pick my CRAN mirror. Similarly you could merely add
this line to the file:

source(file =
http://www.r-statistics.com/wp-content/uploads/2010/02/Friedman-Test-with-Post-Hoc.r.txt;)

and R would go online, download that file and source it in (not that I am
recommending re-downloading every time you start R). Then whatever names
they used to define the functions, would be in your workspace.

Note that in general, you will not get any output alerting you that it has
worked; however, if you type ls() you should see those functions'
names.

Cheers,

Josh

On Wed, Sep 8, 2010 at 12:25 AM, DrCJones matthias.godd...@gmail.com wrote:

Hi,
How does R automatically load functions so that they are available
from the workspace? Is it anything like Matlab - you just specify a
directory path and it finds it?

The reason I ask is because I found a really nice script that I would
like to use on a regular basis, and it would be nice not to have to
'copy and paste' it into R on every startup:

http://www.r-statistics.com/wp-content/uploads/2010/02/Friedman-Test-w
ith-Post-Hoc.r.txt

This would be for Ubuntu, if that makes any difference.

Cheers
--
View this message in context:
http://r.789695.n4.nabble.com/Saving-loading-custom-R-scripts-tp253092
4p2530924.html Sent from the R help mailing list archive at
Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/

This message is for the named person's use only. It may
contain confidential, proprietary or legally privileged
information. No right to confidential or privileged treatment
of this message is waived or lost by an error in transmission.
If you have received this message in error, please immediately
notify the the sender by e-mail, delete the message and all
copies from your system and destroy any hard copies. You must
not, directly or indirectly, use, disclose, distribute,
print or copy any part of this message if you are not
the intended recipient.

__
This email has been scanned by the MessageLabs Email Security System.
For more information please visit http://www.messagelabs.com/email
__

--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/

Re: [R] Bug on chron

2010-09-09 Thread Gabor Grothendieck

On Thu, Sep 9, 2010 at 11:59 AM, skan juanp...@gmail.com wrote:

 hello

 I think I've found a bug
 I don't know if it's a chron bug or a R one.

 (05/12/05 23:00:00) +1/24  gives
 (05/12/05 24:00:00)
 instead of
 (05/13/05 00:00:00)
 it looks like the same but it's not because when you get the date of this
 datetime it says day 12 instead of 13.



I can't reproduce such behavior:

 library(chron)
 x - chron(05/12/05, 23:00:00) + 1/24; x
[1] (05/13/05 00:00:00)
 month.day.year(x)$day
[1] 13
 packageDescription(chron)$Version
[1] 2.3-36
 R.version.string
[1] R version 2.11.1 Patched (2010-05-31 r52167)
 win.version()
[1] Windows Vista (build 6002) Service Pack 2



-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] regression function for categorical predictor data

Hi,

If your predictor variable is categorical than it should be converted
to a factor.  If it is continuous or being treated as such, you do not
need to.  It is generally quite easy to do:

varname - factor(varname)

or if it is in a data frame

yourdf$varname - factor(yourdf$varname)

Cheers,

Josh

On Thu, Sep 9, 2010 at 8:09 AM, karena dr.jz...@gmail.com wrote:

 Hi, thank you very much for the help.
 one more quick question: is that, my predictor variable should be coded as
 'factor' when using either 'lm' or 'glm'?

 sincerely,

 karena
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/regression-function-for-categorical-predictor-data-tp2532045p2533035.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Uncertainty analysis

2010-09-09 Thread Lathouri, Maria

Dear all

I would like to run in R an uncertainty/sensitivity analysis. I know that these 
two are performed together. I have a geochemical model where I have the inputs, 
the water variables (e.g. pH, temperature, oxygen ect) and as well an output of 
different variables. What I would like to do is to estimate the uncertainty 
that the output variables have considering the uncertainty of the input 
variables as well how the variations of my inputs contribute most to the 
variations of my output (probably though the sensitivity analysis). I was 
thinking perhaps with Monte Carlo analysis.

Is there a way to do that?

Thanks a lot.

Maria



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Saving/loading custom R scripts

2010-09-09 Thread Jakson A. Aquino

On Thu, Sep 9, 2010 at 1:14 PM, Joshua Wiley jwiley.ps...@gmail.com wrote:
 On Thu, Sep 9, 2010 at 7:05 AM, Bos, Roger roger@rothschild.com wrote:
 Josh,

 I liked your idea of setting the repo in the .Rprofile file, so I tried it:

 r - getOption(repos)
 r[CRAN] - http://cran.stat.ucla.edu;
 options(repos = r)
 rm(r)

I couldn't understand why to use 4 lines of code... You could try this:

options(repos = http://cran.stat.ucla.edu;)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] confidence intervals around p-values

One other case where a confidence interval on a p-value may make sense is 
permutation (or other resampling) tests.  The population parameter p-value 
would be the p-value that would be obtained from the distribution of all 
possible permutations, but in practice we just sample from that population and 
estimate a p-value.  The confidence interval would then be based on the number 
of sample permutations and could give an idea if that number was big enough.  
If the full confidence interval is less than alpha then you can be confident 
that the true p-value would give significance, if it is completely above 
alpha then it is not significant.  The real problem comes when the confidence 
interval includes alpha, that would indicate that B (the number of 
resamples/permutations) was not large enough.  Be careful, doing a small number 
of permutations then deciding to do more based on the CI would likely introduce 
bias (how much is another question).

The nice thing is that in this case the p-value is a simple proportion and the 
confidence interval can be computed using binom.test.

But, I fully agree that in most cases the idea of a CI for a p-value is not 
meaningful, you need to have some case where your p-value is an estimate of a 
population parameter p-value that has some meaning.

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
 project.org] On Behalf Of Ted Harding
 Sent: Thursday, September 09, 2010 8:25 AM
 To: r-help@r-project.org
 Cc: Fernando Marmolejo Ramos
 Subject: Re: [R] confidence intervals around p-values
 
 On 09-Sep-10 13:21:07, Duncan Murdoch wrote:
On 09/09/2010 6:44 AM, Fernando Marmolejo Ramos wrote:
  Dear all
 
  I wonder if anyone has heard of confidence intervals around
  p-values...
 
  That doesn't really make sense.  p-values are statistics, not
  parameters. You would compute a confidence interval around a
  population mean because that's a parameter, but you wouldn't
  compute a confidence interval around the sample mean: you've
  observed it exactly.
 
  Duncan Murdoch
 
 Duncan has succinctly stated the essential point in the standard
 interpretation. The P-value is calculated from the sample in
 hand, a definite null hypothesis, and the distribution of the
 test statistic given the null hyptohesis, so (given all of these)
 there is no scope for any other answer.
 
 However, there are circumstances in which the notion of confidence
 interval for a P-value makes some sense. One such might be the
 Mann-Whitney test for identity of distribution of two samples
 of continuous variables, where (because of discretisation of the
 values when they were recorded) there are ties.
 
 Then you know in theory that the underlying values are all
 different, but because you don't know where these lie in the
 discretisation intervals you don't know which way a tie may
 split. So it would make sense to simulate by splitting ties
 at random (e.g. uniformly distribute each 1.5 value over the
 interval (1.5,1.6) or (1.45,1.55)).
 
 For each such simulated tie-broken sample, calculate the P-value.
 Then you get a distribution of exact P-values calculated from
 samples without ties which are consistent with the recorded data.
 The central 95% of this distribution could be interpreted as a 95%
 coinfidence interval for the true P-value.
 
 To bring this closer to on-topic, here is an example in R
 (rounding to intervals of 0.2):
 
   set.seed(51324)
   X - sort(2*round(0.5*rnorm(12),1))
   Y - sort(2*round(0.5*rnorm(12)+0.25,1))
   rbind(X,Y)
 #   [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 # X -1.8 -1.2 -0.8 -0.6  0.00  0.2  0.2  1.2   1.8 2   2.2
 # Y -1.2 -0.4 -0.2  0.4  0.41  1.0  1.0  1.2   1.8 2   2.6
 # So several ties (-1.2,1.2,1.8,2.0), as well as 0.0, 0.4, 1.0
 # which don't matter.
 wilcox.test(X,Y,alternative=less,exact=TRUE,correct=FALSE)
 # data:  X and Y   W = 54, p-value = 0.1488
 
   Ps - numeric(1000)
   for(i in (1:1000)){
 Xr - (X-0.1) + 0.2*runif(10)
 Yr - (Y-0.1) + 0.2*runif(10)
 Ps[i] - wilcox.test(Xr,Yr,alternative=less,
  exact=TRUE,correct=FALSE)$p.value
   }
   hist(Ps)
   table(round(Ps,4))
   # 0.1328 0.1457 0.1593 0.1737 0.1888
   # 81267336226 90
 
 So this gives you a picture of the uncertainty in the P-value
 (0.1488, calculated from the rounded data) relative to what it
 really should have been (if calculated from unrounded data).
 Since each possible true (tie-broken) sample can be viewed
 as a hypothesis about unobserved truth, it does make a certain
 sense to view these results as a kind of confidence distribution
 for the P-value you should have got. However, this is more of a
 Bayesian argument, since the above calculation has assigned
 equal prior probability to the tie-breaks!
 
 One could also, I suppose, consider

Re: [R] Bug on chron

2010-09-09 Thread skan


Something strange.

Your example work but...
I have a zoo object.
I extract its element 21


 index(test[21])
 [1] (05/12/05 23:00:00)
 
 index(test[21])+1/24
 [1] (05/12/05 24:00:00)
 
 

Why 24:00 ?





 packageDescription(chron)$Version
 [1] 2.3-35
  R.version.string
 [1] R version 2.11.1 (2010-05-31)

 packageDescription(zoo)$Version
[1] 1.7-0


cheers
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Bug-on-chron-tp2533135p2533194.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Correlation question

Hi Stephane,

When I use your sample data (e.g., test, test.number), cor() throws an
error that x must be numeric (because of the factor or character
data).  Are you not getting any errors when trying to calculate the
correlation on these data?  If you are not, I wonder what version of R
are you using?  The quickest way to find out is sessionInfo().

As far as a work around, it would be relative simple to find out which
columns of your data frame were not numeric or integer and exclude
those (I'm happy to provide that code if you want).

Best regards,

Josh

On Thu, Sep 9, 2010 at 7:50 AM, Stephane Vaucher
vauch...@iro.umontreal.ca wrote:
 Thank you Dennis,

 You identified a factor (text column) that I was concerned with. I
 simplified my example to try and factor out possible causes. I eliminated
 the recurring values in columns (which were not the columns that caused
 problems). I produced three examples with simple data sets.

 1. Correct output, 2 columns only:

 test.notext = read.csv('test-notext.csv')
 cor(test.notext, method='spearman')

               P3     HP_tot
 P3      1.000 -0.2182876
 HP_tot -0.2182876  1.000

 dput(test.notext)

 structure(list(P3 = c(2L, 2L, 2L, 4L, 2L, 3L, 2L, 1L, 3L, 2L,
 2L, 2L, 3L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 2L),
    HP_tot = c(10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 136L,
    136L, 136L, 136L, 136L, 136L, 136L, 136L, 136L, 136L, 15L,
    15L, 15L, 15L, 15L, 15L, 15L)), .Names = c(P3, HP_tot
 ), class = data.frame, row.names = c(NA, -25L))

 2. Incorrect output where I introduced my P7 column containing text only the
 'a' character:

 test = read.csv('test.csv')
 cor(test, method='spearman')

               P3 P7     HP_tot
 P3      1.000 NA -0.2502878
 P7             NA  1         NA
 HP_tot -0.2502878 NA  1.000
 Warning message:
 In cor(test, method = spearman) : the standard deviation is zero

 dput(test)

 structure(list(P3 = c(2L, 2L, 2L, 4L, 2L, 3L, 2L, 1L, 3L, 2L,
 2L, 2L, 3L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 2L),
    P7 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
    ), .Label = a, class = factor), HP_tot = c(10L, 10L,
    10L, 10L, 10L, 10L, 10L, 10L, 136L, 136L, 136L, 136L, 136L,
    136L, 136L, 136L, 136L, 136L, 15L, 15L, 15L, 15L, 15L, 15L,
    15L)), .Names = c(P3, P7, HP_tot), class = data.frame, row.names
 = c(NA,
 -25L))

 3. Incorrect output with P7 containing a variety of alpha-numeric characters
 (ascii), to factor out equal valued column issue. Notice that the text
 column is interpreted as a numeric value.

 test.number = read.csv('test-alpha.csv')
 cor(test.number, method='spearman')

               P3         P7     HP_tot
 P3      1.000  0.4093108 -0.2502878
 P7      0.4093108  1.000 -0.3807193
 HP_tot -0.2502878 -0.3807193  1.000

 dput(test.number)

 structure(list(P3 = c(2L, 2L, 2L, 4L, 2L, 3L, 2L, 1L, 3L, 2L,
 2L, 2L, 3L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 2L),
    P7 = structure(c(11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L,
    19L, 20L, 21L, 22L, 23L, 24L, 25L, 1L, 2L, 3L, 4L, 5L, 6L,
    7L, 8L, 9L, 10L), .Label = c(0, 1, 2, 3, 4, 5,
    6, 7, 8, 9, a, b, c, d, e, f, g, h,
    i, j, k, l, m, n, o), class = factor), HP_tot = c(10L,
    10L, 10L, 10L, 10L, 10L, 10L, 10L, 136L, 136L, 136L, 136L,
    136L, 136L, 136L, 136L, 136L, 136L, 15L, 15L, 15L, 15L, 15L,
    15L, 15L)), .Names = c(P3, P7, HP_tot), class = data.frame,
 row.names = c(NA,
 -25L))

 Correct output is obtained by avoiding matrix computation of correlation:

 cor(test.number$P3, test.number$HP_tot, method='spearman')

 [1] -0.2182876

 It seems that a text column corrupts my correlation calculation (only in a
 matrix calculation). I assumed that text columns would not influence the
 result of the calculations.

 Is this a correct behaviour? If not,I can submit a bug report? If it is, is
 there a known workaround?

 cheers,
 Stephane Vaucher

 On Thu, 9 Sep 2010, Dennis Murphy wrote:

 Did you try taking out P7, which is text? Moreover, if you get a message
 saying ' the standard deviation is zero', it means that the entire column
 is
 constant. By definition, the covariance of a constant with a random
 variable
 is 0, but your data consists of values, so cor() understandably throws a
 warning that one or more of your columns are constant. Applying the
 following to your data (which I named expd instead),  we get

 sapply(expd[, -12], var)
         P1           P2           P3           P4           P5
 P6
 5.43e-01 1.08e+00 5.77e-01 1.08e+00 6.43e-01
 5.57e-01
         P8           P9          P10          P11          P12
 SITE
 5.73e-01 3.19e+00 5.07e-01 2.50e-01 5.50e+00
 2.49e+00
     Errors     warnings       Manual        Total        H_tot
 HP1.1
 9.072840e+03 2.081334e+04 7.43e-01 3.823500e+04 3.880250e+03
 2.676667e+00
      HP1.2        HP1.3        HP1.4       HP_tot        HO1.1

Re: [R] Saving/loading custom R scripts

On Thu, Sep 9, 2010 at 9:28 AM, Jakson A. Aquino jaksonaqu...@gmail.com wrote:
 On Thu, Sep 9, 2010 at 1:14 PM, Joshua Wiley jwiley.ps...@gmail.com wrote:
 On Thu, Sep 9, 2010 at 7:05 AM, Bos, Roger roger@rothschild.com wrote:
 Josh,

 I liked your idea of setting the repo in the .Rprofile file, so I tried it:

 r - getOption(repos)
 r[CRAN] - http://cran.stat.ucla.edu;
 options(repos = r)
 rm(r)

 I couldn't understand why to use 4 lines of code... You could try this:

You can have more than one repository, using repos = url, will
overwrite all of them.  For instance, I believe it is standard on
Windows to have CRAN and CRANextra.  The one line option probably
would be fine often.  In any case, reading through the documentation,
the code used there is:

local({r -
  getOption(repos); r[CRAN] - http://my.local.cran;;
  options(repos=r)})

perhaps wrapping it in local() will take care of your problem, Roger.


 options(repos = http://cran.stat.ucla.edu;)


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Bug on chron

Could this be a case of faq 7.31?  where rounding error means that you are 
seeing a time that is slightly before midnight (but printing shows it at 
midnight).

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
 project.org] On Behalf Of skan
 Sent: Thursday, September 09, 2010 10:00 AM
 To: r-help@r-project.org
 Subject: [R] Bug on chron
 
 
 hello
 
 I think I've found a bug
 I don't know if it's a chron bug or a R one.
 
 (05/12/05 23:00:00) +1/24  gives
 (05/12/05 24:00:00)
 instead of
 (05/13/05 00:00:00)
 it looks like the same but it's not because when you get the date of
 this
 datetime it says day 12 instead of 13.
 
 
 Please, forward it to the place where this bugs are supposed to be
 posted.
 
 cheers
 --
 View this message in context: http://r.789695.n4.nabble.com/Bug-on-
 chron-tp2533135p2533135.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Bug on chron

2010-09-09 Thread skan


I don't know.
You can look at the file, is very short.
http://r.789695.n4.nabble.com/file/n2533223/test test 
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Bug-on-chron-tp2533135p2533223.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] optimized value worse than starting Value

2010-09-09 Thread Hans W Borchers

Barry Rowlingson b.rowlingson at lancaster.ac.uk writes:

 
 On Wed, Sep 8, 2010 at 1:35 PM, Michael Bernsteiner
 dethlef1 at hotmail.com wrote:
 
  Dear all,
 
  I'm optimizing a relatively simple function. Using optimize the optimized
  parameter value is worse than the starting. why?


I would like to stress here that finding a global minimum is not as much
sorcery as this thread seems to suggest. A widely accepted procedure to 
provably identify a global minimum goes roughly as follows
(see Chapt. 4 in [1]):

  - Make sure the global minimum does not lie 'infinitely' for out.
  - Provide estimations for the derivatives/gradients.
  - Define a grid fine enough to capture or exclude minima.
  - Search grid cells coming into consideration and compare.

This can be applied to two- and higher-dimensional problems, but of course
may require enormous efforts. In science and engineering applications it is at
times necessary to really execute this approach.

Hans Werner

[1] F. Bornemann et al., The SIAM 100-Digit Challenge, 2004, pp. 79.

In fact, a slightly finer grid search will succeed in locating the
 proper minimum; several teams used such a search together with estimates
 based on the partial derivatives of f to show that the search was fine
 enough to guarantee capture of the answer.


  This looks familiar. Is this some 1-d version of the Rosenbrock
 Banana Function?
 
  http://en.wikipedia.org/wiki/Rosenbrock_function
 
  It's designed to be hard to find the minimum. In the real world one
 would hope that things would not have such a pathological behaviour.
 
  Numerical optimisations are best done using as many methods as
 possible - see optimise, nlm, optim, nlminb and the whole shelf of
 library books devoted to it.
 
 Barry


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] confidence intervals around p-values

2010-09-09 Thread JLucke

A confidence interval around the p-value makes no sense because there is 
no parameter being estimated, but the sampling distribution of the p-value 
makes a lot of sense.  The pre-observational P-value is a random variable 
that is a function of the underlying random variable being tested.  That 
is, P_X(t) = Pr(Xt) is itself a random variable with density, 
distribution, and moments.  Thus, one can compute the 95% sampling 
distribution around the expectation of P.

See 

Hung, H. M. J.; O'Neill, R. T.; Bauer, P.  Kohne, K. The behavior of the 
P-value when the alternative hypothesis is true Biometrics, 1997, 53, 1-22

Donahue, R. M. J. A note on information seldom reported via the p value. 
The American Statistician, American Statistical Association, 1999, 53, 
303-306 

 





Greg Snow greg.s...@imail.org 
Sent by: r-help-boun...@r-project.org
09/09/2010 12:29 PM

To
ted.hard...@manchester.ac.uk ted.hard...@manchester.ac.uk, 
r-help@r-project.org r-help@r-project.org
cc
Fernando Marmolejo Ramos fernando.marmolejora...@adelaide.edu.au
Subject
Re: [R] confidence intervals around p-values






One other case where a confidence interval on a p-value may make sense is 
permutation (or other resampling) tests.  The population parameter p-value 
would be the p-value that would be obtained from the distribution of all 
possible permutations, but in practice we just sample from that population 
and estimate a p-value.  The confidence interval would then be based on 
the number of sample permutations and could give an idea if that number 
was big enough.  If the full confidence interval is less than alpha then 
you can be confident that the true p-value would give significance, if 
it is completely above alpha then it is not significant.  The real problem 
comes when the confidence interval includes alpha, that would indicate 
that B (the number of resamples/permutations) was not large enough.  Be 
careful, doing a small number of permutations then deciding to do more 
based on the CI would likely introduce bias (how much is another 
question).

The nice thing is that in this case the p-value is a simple proportion and 
the confidence interval can be computed using binom.test.

But, I fully agree that in most cases the idea of a CI for a p-value is 
not meaningful, you need to have some case where your p-value is an 
estimate of a population parameter p-value that has some meaning.

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
 project.org] On Behalf Of Ted Harding
 Sent: Thursday, September 09, 2010 8:25 AM
 To: r-help@r-project.org
 Cc: Fernando Marmolejo Ramos
 Subject: Re: [R] confidence intervals around p-values
 
 On 09-Sep-10 13:21:07, Duncan Murdoch wrote:
On 09/09/2010 6:44 AM, Fernando Marmolejo Ramos wrote:
  Dear all
 
  I wonder if anyone has heard of confidence intervals around
  p-values...
 
  That doesn't really make sense.  p-values are statistics, not
  parameters. You would compute a confidence interval around a
  population mean because that's a parameter, but you wouldn't
  compute a confidence interval around the sample mean: you've
  observed it exactly.
 
  Duncan Murdoch
 
 Duncan has succinctly stated the essential point in the standard
 interpretation. The P-value is calculated from the sample in
 hand, a definite null hypothesis, and the distribution of the
 test statistic given the null hyptohesis, so (given all of these)
 there is no scope for any other answer.
 
 However, there are circumstances in which the notion of confidence
 interval for a P-value makes some sense. One such might be the
 Mann-Whitney test for identity of distribution of two samples
 of continuous variables, where (because of discretisation of the
 values when they were recorded) there are ties.
 
 Then you know in theory that the underlying values are all
 different, but because you don't know where these lie in the
 discretisation intervals you don't know which way a tie may
 split. So it would make sense to simulate by splitting ties
 at random (e.g. uniformly distribute each 1.5 value over the
 interval (1.5,1.6) or (1.45,1.55)).
 
 For each such simulated tie-broken sample, calculate the P-value.
 Then you get a distribution of exact P-values calculated from
 samples without ties which are consistent with the recorded data.
 The central 95% of this distribution could be interpreted as a 95%
 coinfidence interval for the true P-value.
 
 To bring this closer to on-topic, here is an example in R
 (rounding to intervals of 0.2):
 
   set.seed(51324)
   X - sort(2*round(0.5*rnorm(12),1))
   Y - sort(2*round(0.5*rnorm(12)+0.25,1))
   rbind(X,Y)
 #   [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 # X -1.8 -1.2 -0.8 -0.6  0.00  0.2  0.2  1.2   1.8 2   2.2
 # Y -1.2 -0.4 -0.2  0.4  0.41  1.0  1.0  1.2   1.8 2   2.6

Re: [R] rgl and lighting