Re: [R] Why was the ‘doSMP’ package removed from CRAN?

2012-01-26 Thread Uwe Ligges



On 25.01.2012 22:20, Tal Galili wrote:

Hello dear list,

I just noticed that:

Package ‘doSMP’ was removed from the CRAN repository.
http://cran.r-project.org/web/packages/doSMP/index.html

Does any one know the reason for this?
Is this a technical or a legal (e.g: license) issue?



If legal issues were the reason, you had not found it in the archives 
anymore.


Uwe




Thanks,
Tal



Contact
Details:---
Contact me: tal.gal...@gmail.com |  972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
www.r-statistics.com (English)
--

[[alternative HTML version deleted]]




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Why was the ‘doSMP’ package removed from CRAN?

2012-01-26 Thread Barry Rowlingson
On Thu, Jan 26, 2012 at 8:02 AM, Uwe Ligges
lig...@statistik.tu-dortmund.de wrote:


 On 25.01.2012 22:20, Tal Galili wrote:

 Does any one know the reason for this?
 Is this a technical or a legal (e.g: license) issue?



 If legal issues were the reason, you had not found it in the archives
 anymore.


 Licensing can change - as a copyright holder I could decide the next
release of my package is going to be under a proprietary license that
doesn't allow redistribution. Any code already released under
something like the GPL can't be forcibly removed, and people can make
new forks from those, but if I'm the only person writing it and I
decide to change the license and the latest free version isn't
compatible with the current version of R then I'd expect to see the
old versions in the archives and no version for the latest version of
R.

 Last checkin at R-forge was only six weeks ago, and 1.0-3 installs
fine on my latest R:

 https://r-forge.r-project.org/scm/?group_id=950

I suspect they just haven't pushed it to R-forge yet. Cockup before conspiracy.

Barry

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Why was the ‘doSMP’ package removed from CRAN?

2012-01-26 Thread Uwe Ligges



On 26.01.2012 09:39, Barry Rowlingson wrote:

On Thu, Jan 26, 2012 at 8:02 AM, Uwe Ligges
lig...@statistik.tu-dortmund.de  wrote:



On 25.01.2012 22:20, Tal Galili wrote:



Does any one know the reason for this?
Is this a technical or a legal (e.g: license) issue?




If legal issues were the reason, you had not found it in the archives
anymore.



  Licensing can change - as a copyright holder I could decide the next
release of my package is going to be under a proprietary license that
doesn't allow redistribution. Any code already released under
something like the GPL can't be forcibly removed, and people can make
new forks from those, but if I'm the only person writing it and I
decide to change the license and the latest free version isn't
compatible with the current version of R then I'd expect to see the
old versions in the archives and no version for the latest version of
R.

  Last checkin at R-forge was only six weeks ago, and 1.0-3 installs
fine on my latest R:

  https://r-forge.r-project.org/scm/?group_id=950

I suspect they just haven't pushed it to R-forge yet. Cockup before conspiracy.

Barry


Before people start with even more speculations: Reason is that the 
revoIPC package does no compile on modern versions of gcc and hence has 
been archived, doSMP depends on it and therefore went to the archives as 
well.


Uwe Ligges

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] How to remove rows representing concurrent sessions from data.frame?

2012-01-26 Thread johannes rara
I have a dataset like this (dput for this below) which represents user
computer sessions:

   username  machine   start end
1 user1 D5599.domain.com 2011-01-03 09:44:18 2011-01-03 09:47:27
2 user1 D5599.domain.com 2011-01-03 09:46:29 2011-01-03 10:09:16
3 user1 D5599.domain.com 2011-01-03 14:07:36 2011-01-03 14:56:17
4 user1 D5599.domain.com 2011-01-05 15:03:17 2011-01-05 15:23:15
5 user1 D5599.domain.com 2011-02-14 14:33:39 2011-02-14 14:40:16
6 user1 D5599.domain.com 2011-02-23 13:54:30 2011-02-23 13:58:23
7 user1 D5599.domain.com 2011-03-21 10:10:18 2011-03-21 10:32:22
8 user1 D5645.domain.com 2011-06-09 10:12:41 2011-06-09 10:58:59
9 user1 D5682.domain.com 2011-01-03 12:03:45 2011-01-03 12:29:43
10USER2 D5682.domain.com 2011-01-12 14:26:05 2011-01-12 14:32:53
11USER2 D5682.domain.com 2011-01-17 15:06:19 2011-01-17 15:44:22
12USER2 D5682.domain.com 2011-01-18 15:07:30 2011-01-18 15:42:43
13USER2 D5682.domain.com 2011-01-25 15:20:55 2011-01-25 15:24:38
14USER2 D5682.domain.com 2011-02-14 15:03:00 2011-02-14 15:07:43
15USER2 D5682.domain.com 2011-02-14 14:59:23 2011-02-14 15:14:47


There may be serveral concurrent sessions for same username from the
same computer. How can I remove those rows so that only one sessions
is left for this data? I have no idea how to do this, maybe using
difftime?

-J

structure(list(username = c(user1, user1, user1,
user1, user1, user1, user1, user1,
user1, USER2, USER2, USER2, USER2, USER2, USER2
), machine = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 3L,
3L, 3L, 3L, 3L, 3L, 3L), .Label = c(D5599.domain.com, D5645.domain.com,
D5682.domain.com, D5686.domain.com, D5694.domain.com, D5696.domain.com,
D5772.domain.com, D5772.domain.com, D5847.domain.com, D5855.domain.com,
D5871.domain.com, D5927.domain.com, D5927.domain.com, D5952.domain.com,
D5993.domain.com, D6012.domain.com, D6048.domain.com, D6077.domain.com,
D5688.domain.com, D5815.domain.com, D6106.domain.com, D6128.domain.com
), class = factor), start = structure(c(1294040658, 1294040789,
1294056456, 1294232597, 1297686819, 1298462070, 1300695018, 1307603561,
1294049025, 1294835165, 1295269579, 1295356050, 1295961655, 1297688580,
1297688363), class = c(POSIXct, POSIXt), tzone = ), end =
structure(c(1294040847,
1294042156, 1294059377, 1294233795, 1297687216, 1298462303, 1300696342,
1307606339, 1294050583, 1294835573, 1295271862, 1295358163, 1295961878,
1297688863, 1297689287), class = c(POSIXct, POSIXt), tzone = )),
.Names = c(username,
machine, start, end), row.names = c(NA, 15L), class = data.frame)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Workshop on Bayesian methods and WinBUGS. One week to go!

2012-01-26 Thread Polychronis Kostoulas
Workshop on Bayesian methods and WinBUGS
***

A two-day workshop on Bayesian methods is being held on Friday 3 - Saturday 4 
February 2012 at the University of Sydney. 

This course is suitable for graduate students, academics, researchers and 
professionals who wish to introduce themselves in the application of Bayesian 
methods and the use of WinBUGS software. The workshop is offered in association 
with the 11th International Colloquium on Paratuberculosis (ICP) but is also 
open to non-ICP participants.

Bayesian tools in the control of Paratuberculosis 

Program outline, Day 1 (3 February 2012):

-Introduction to Bayesian Inference (Probability theory; Bayes¢ theorem; The 
likelihood function; The prior  the posterior distribution).

-Overview of WinBUGS (Running the software; Writing code; Estimation of a 
single proportion; Hands-on WINBUGS)

-Bayesian prevalence estimation (One population; Multiple populations; 
Herd-level prevalence; Hands-on WINBUGS).


Program outline, Day 2 (4 February 2012):

-Bayesian latent class models in Se and Sp estimation (Conditionally 
independent  dependent tests; Continuous tests; Hands-on WinBUGS).

-Bayesian logistic regression (Ordinary logistic regression; Logistic 
regression adjusting for the Se and Sp of the diagnostic process; Hands-on 
WinBUGS). 

-Checklist of what should be present in a Bayesian analysis; Guidelines on 
informative prior specification; The ParaTB paradigm.

Tutors: 
Ian A. Gardner, University of Prince Edward Island, Canada
Polychronis Kostoulas, University of Thessaly, Greece

Local coordinator:
Navneet Dhand, University of Sydney, Australia

For the program and further information go to: 
http://www.icp2012.com.au/downloads/ICPWorkshopFlyer.pdf

Limited places (20) are available so be sure to register quickly. 

To register please go to: 
https://eiapac.certain.com/cem/getdemo.ei?id=7120002s=_2AC0SH5WT




[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?

2012-01-26 Thread Jhope
Thank you everyone for your dedication to improving 'R' - its function to
statistical analysis and comments. 

I have now 48 models (unique combinations of 1 to 6 variables) and have put
them into a list and gained the results for all models. Below is a sample of
my script  results:

m$model48 - Model48.glm
 sapply(m, extractAIC)

  model47  model48
[1,]8.000   46.000
[2,] 6789.863 3549.543

Q
1) How do you interpret these results? What is 1 and 2? 
2) I have been suggested to try a quasibinomial, once responses are fixed.
Is this necessary? If so is there a way I can do this by considering all
these models ?
3) Then gaussian?

Much appreciated! 
Saludos, J

--
View this message in context: 
http://r.789695.n4.nabble.com/How-do-I-compare-47-GLM-models-with-1-to-5-interactions-and-unique-combinations-tp4326407p4329798.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] How do I use the cut function to assign specific cut points?

2012-01-26 Thread citadel
I am new to R, and I am trying to cut a continuous variable BMI into
different categories and can't figure out how to use it. I would like to cut
it into four groups: 20, 20-25, 25-30 and = 30.  I am having difficulty
figuring the code for 20 and =30? Please help. Thank you.

--
View this message in context: 
http://r.789695.n4.nabble.com/How-do-I-use-the-cut-function-to-assign-specific-cut-points-tp4329788p4329788.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How do I use the cut function to assign specific cut points?

2012-01-26 Thread Jim Lemon

On 01/26/2012 07:23 PM, citadel wrote:

I am new to R, and I am trying to cut a continuous variable BMI into
different categories and can't figure out how to use it. I would like to cut
it into four groups:20, 20-25, 25-30 and= 30.  I am having difficulty
figuring the code for20 and=30? Please help. Thank you.


Hi citadel,
Assuming that you only have positive numbers and they are all less than 100:

BMIcut-cut(BMI,breaks=c(0,20,25,30,100),
 include.lowest=TRUE,right=FALSE)

This will give you
 =0-20, =20-25, =20-30, =30-100

Jim

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] extracting from data.frames for survival analysis

2012-01-26 Thread Philip Robinson
Hi,

I have a data frame:

 class(B27.vec)
[1] data.frame

 head(B27.vec)

  AGE Gend B27 AgeOn DD uveitis psoriasis IBD CD UC InI BASDAI BASFI Smok UV
1  571   119 38   2 1   1  1  1   1   5.40  8.08   NA  1
2  351   133  2   2 1   1  1  1   1   1.69  2.28   NA  1
3  492   140  9   1 1   1  1  1   1   8.30  9.40   NA  0
4  321   121 11   1 1   1  1  1   1   5.10  9.10   NA  0
5  311   124  7   1 1   1  1  1   1   6.63  6.52   NA  0
6  271   123  4   1 2   1  1  1   1   7.19  6.51   NA  0

I am trying to perform survival analysis but continually get errors
when extracting from this data.frame:

attempt 1:
 X - Surv(B27.vec$AgeOn,B27.vec$UV)
 survdiff(X,rho=0,data=uvf)
Error in x$terms : $ operator is invalid for atomic vectors

attempt 2:
 X - Surv(B27.vec[,4],B27.vec[,15])
 survdiff(X,rho=0,data=uvf)
Error in x$terms : $ operator is invalid for atomic vector

attempt 3:
 AO - B27.vec[[AgeOn, exact = TRUE]]
 UV - B27.vec[[UV,exact=TRUE]]
 X - Surv(AO,UV)
 survdiff(X,rho=0,data=uvf)
Error in x$terms : $ operator is invalid for atomic vectors

I have read ?data.frame  extract.data.frame but I cannot understand
how I might structure this differently so it extracts the required
columns from this dataframe. For the second 2 attempts I am not using
the $ term. Sorry if this seems basic but cannot understand why
attempt 1 or 2 doesn't work.

thanks
Philip

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] extracting from data.frames for survival analysis

2012-01-26 Thread Gerrit Eichner

Hi, Philip,

counter-questions:

1. Which/where is the grouping variable for the test of differences in 
survival?


2. Assume the grouping variable is Gend in B27.vec. Then, why aren't you 
using


survdiff( Surv( AgeOn, UV) ~ Gend, rho = 0, data = B27.vec)

?

Hth  -- Gerrit

On Thu, 26 Jan 2012, Philip Robinson wrote:


Hi,

I have a data frame:


class(B27.vec)

[1] data.frame


head(B27.vec)


 AGE Gend B27 AgeOn DD uveitis psoriasis IBD CD UC InI BASDAI BASFI Smok UV
1  571   119 38   2 1   1  1  1   1   5.40  8.08   NA  1
2  351   133  2   2 1   1  1  1   1   1.69  2.28   NA  1
3  492   140  9   1 1   1  1  1   1   8.30  9.40   NA  0
4  321   121 11   1 1   1  1  1   1   5.10  9.10   NA  0
5  311   124  7   1 1   1  1  1   1   6.63  6.52   NA  0
6  271   123  4   1 2   1  1  1   1   7.19  6.51   NA  0

I am trying to perform survival analysis but continually get errors
when extracting from this data.frame:

attempt 1:

X - Surv(B27.vec$AgeOn,B27.vec$UV)
survdiff(X,rho=0,data=uvf)

Error in x$terms : $ operator is invalid for atomic vectors

attempt 2:

X - Surv(B27.vec[,4],B27.vec[,15])
survdiff(X,rho=0,data=uvf)

Error in x$terms : $ operator is invalid for atomic vector

attempt 3:

AO - B27.vec[[AgeOn, exact = TRUE]]
UV - B27.vec[[UV,exact=TRUE]]
X - Surv(AO,UV)
survdiff(X,rho=0,data=uvf)

Error in x$terms : $ operator is invalid for atomic vectors

I have read ?data.frame  extract.data.frame but I cannot understand
how I might structure this differently so it extracts the required
columns from this dataframe. For the second 2 attempts I am not using
the $ term. Sorry if this seems basic but cannot understand why
attempt 1 or 2 doesn't work.

thanks
Philip

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Finding suspicious data points?

2012-01-26 Thread Fredrik Karlsson
Dear list,

I would like to find data points that at least should be checked one more
time before I process them further.
I've had a look at the outliers package for this, and the outliers function
in that package, but it appears to only return one value.

An example:

 outlier(c(1:3,rnorm(1000,mean=10,sd=300)))
[1] 1

I think at least 1,2 and 3 should be checked in this case.


Any ideas on how to achieve this in R?

Actually, the real data I will be investigating consist of vector norms and
angles (in an attempt to identify either very short, very long vectors, or
vectors pointing in an odd angle for the category to which it has been
assigned) so a 2D method would be even better.

I would much appreciate any help I can get on this,

/Fredrik



-- 
Life is like a trumpet - if you don't put anything into it, you don't get
anything out of it.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] (no subject)

2012-01-26 Thread peter dalgaard
That's not how to do it. Please follow the link in the footer and the correct 
way to subscribe should become clear.

-pd


On Jan 25, 2012, at 21:49 , Paul Johnston wrote:

 subscribe
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Finding suspicious data points?

2012-01-26 Thread Carl Witthoft

According to the help file for 'outlier'  ,  (quoting)

x a data sample, vector in most cases. If argument is a dataframe, then 
outlier is

calculated for each column by sapply. The same behavior is applied by apply
when the matrix is given.  (endquote)

Looks like you could create a matrix that looks like an upper 
triangular like


1   1   1
NA  2   2
NA  NA  3

and see the results.  However, since 'outlier' just returns the value 
furthest from the mean, this doesn't really provide much information. 
If I were to write a function to find genuine outliers,  I would do 
something like


x[ abs(x-mean(x))  3*sd(x)] , thus returning all values more than 
3-sigma from the mean.




quote

I would like to find data points that at least should be checked one 
more time before I process them further.
I've had a look at the outliers package for this, and the outliers 
function in that package, but it appears to only return one value.


An example:

 outlier(c(1:3,rnorm(1000,mean=10,sd=300)))
[1] 1

I think at least 1,2 and 3 should be checked in this case.

Any ideas on how to achieve this in R?

Actually, the real data I will be investigating consist of vector norms 
and angles (in an attempt to identify either very short, very long 
vectors, or vectors pointing in an odd angle for the category to which 
it has been assigned) so a 2D method would be even better.


I would much appreciate any help I can get on this,


--

Sent from my Cray XK6
Pendeo-navem mei anguillae plena est.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How do I use the cut function to assign specific cut points?

2012-01-26 Thread Frank Harrell
It is not valid to categorize BMI.  This will result in major loss of
information and residual confounding.  Plus there is huge heterogeneity in
the BMI = 30 group.   Details are at
http://biostat.mc.vanderbilt.edu/CatContinuous and see these articles:

@Article{fil07cat,
  author =   {Filardo, Giovanni and Hamilton, Cody and Hamman, 
Baron and
Ng, Hon K. T. and Grayburn, Paul},
  title ={Categorizing {BMI} may lead to biased results in 
studies
investigating in-hospital mortality after isolated {CABG}},
  journal =  J Clin Epi,
  year = 2007,
  volume =   60,
  pages ={1132-1139},
  annote =   {BMI;CABG;surgical adverse events;hospital
mortality;epidemiology;smoothing methods;categorization;categorizing
continuous variables;investigators should waive categorization entirely and
use smoothed functions for continuous variables;examples of non-monotonic
relationships}
}
@Article{roy06dic,
  author =   {Royston, Patrick and Altman, Douglas G. and
Sauerbrei, Willi},
  title ={Dichotomizing continuous predictors in multiple
regression: a bad idea},
  journal =  Stat in Med,
  year = 2006,
  volume =   25,
  pages ={127-141},
  annote =   {continuous
covariates;dichotomization;categorization;regression;efficiency;clinical
research;residual confounding;destruction of statistical inference
when cutpoints are chosen using the response variable;varying effect
estimates when change cutpoints;difficult to interpret effects
when dichotomize;nice plot showing effect of categorization;PBC data}
}

If you work with colleagues who tell you this is the way it's done  don't
go down without a fight.  In general, good statistical practice dictates
that categorization is only done for producing certain tables (for which
case you might use the cut2 function in the Hmisc package).  Even that will
change as we incorporate more micrographics (think of loess plots with BMI
on the x-axis) within table cells as is now done in the Hmisc
summary.formula function for purely categorical variables.

Frank


citadel wrote
 
 I am new to R, and I am trying to cut a continuous variable BMI into
 different categories and can't figure out how to use it. I would like to
 cut it into four groups: 20, 20-25, 25-30 and = 30.  I am having
 difficulty figuring the code for 20 and =30? Please help. Thank you.
 

-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: 
http://r.789695.n4.nabble.com/How-do-I-use-the-cut-function-to-assign-specific-cut-points-tp4329788p4330380.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Extracting data from trellis object

2012-01-26 Thread Jean V Adams
tsippel wrote on 01/25/2012 01:23:05 PM:

 Simple question (I hope?).  How does one extract the data from a trellis
 object? I need to get the data from a call to histogram().
 
 A simple example below...
 
 dat-data.frame(year=as.factor(round(runif(500, 1990, 2010))), 
x=rnorm(500,
 75, 35),
   y=rep(seq(1,50,by=1), times=10))
 histogram(data=dat, y~x | year, type='count')
 
 How do I get the actual numbers plotted in the histograms from this call 
to
 histogram()?
 
 Thanks,
 
 Tim


hist.stuff - histogram(data=dat, y~x | year, type='count')
lapply(hist.stuff, print)

Jean
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Conditional Random Fields

2012-01-26 Thread Ana
Are there other packages besides CRF available on R, for Conditional
Random Fields ?

Thanks in advance.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Request for help on manipulation large data sets

2012-01-26 Thread Chee Chen
 Dear All,
I would like to ask for help on how to read different files automatically and 
do analysis using scripts.

1. Description of the data
1.1.  there are 5 text files, each of which contains cleaned data for the same 
100 SNPs.  Observations (e.g., position on gnome, alelle type, ...) for SNPs  
are rows ordered by the SNP numbers,
1.2.  there are 1 text file, containing the expression level of mRNAs 9 (and 
other information), which are rows ordered by mRNA numbers,

So for each SNP the sample size is 5.

2.  Description of aim
Take SNP 1 and mRNA 1 for example. Write a scrip that  can
2.1 extract  row 1 from each of the 5 text files for SNP data
2.2 extract row 1 from the mRNA text file
2.3  regress mRNA 1 counts on SNP 1
2.4 store regression results, SNP number, mRNA number and their positions on 
gnome

3.  key thing I need help
automatic extraction of observations for one particular SNP from SNP data 
files, since there are actually 400 SNP data files, or more generally extract 
information automatically from different files and store it properly for 
operation.

Thanks a lot!
Chee
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R.lnk shortcut: Warning message when opening Microsoft Word 2003

2012-01-26 Thread Duncan Murdoch

On 12-01-25 3:46 PM, Grant Anderson wrote:



Whenever I open Microsoft Word (2003), I get this warning two times---and then 
must manually 'OK' it twice before I can proceed in Word (a pain to do):

Â

The drive or network connection that the shortcut 'R.lnk' refers to is unavailable. 
Make sure that the disk is properly inserted or the network resource is available, and 
then try again.

Â

Perhaps I linked Word with R, somehow, when I recently installed R and Word 
on my new computer???

Â

Anyhow, since I don't want any Word-R interaction, how can I get rid of these 
two annoying warnings? I've already searched my entire hard drive for 'R.lnk' 
(using X1.com) and found only those R-links I understand and want.


This is clearly a Microsoft bug, so you should ask them.

Duncan Murdoch

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R extracting regression coefficients from multiple regressions using lapply command

2012-01-26 Thread Jean V Adams
Michael Wither wrote on 01/26/2012 12:08:19 AM:

 Hi, I have a question about running multiple in regressions in R and 
then
 storing the coefficients.  I have a large dataset with several 
variables,
 one of which is a state variable, coded 1-50 for each state. I'd like to
 run a regression of 28 select variables on the remaining 27 variables of
 the dataset (there are 55 variables total), and specific for each state, 
ie
 run a regression of variable1 on covariate1, covariate2, ..., 
covariate27
 for observations where state==1. I'd then like to repeat this for 
variable1
 for states 2-50, and the repeat the whole process for variable2,
 variable3,..., variable28. I think I've written the correct R code to do
 this, but the next thing I'd like to do is extract the coefficients,
 ideally into a coefficient matrix. Could someone please help me with 
this?
 Here's the code I've written so far, I'm not sure if this is the best 
way
 to do this.  Please help me.
 
 for (num in 1:50) {
 
 #PUF is the data set I'm using
 
 #Subset the data by states
 PUFnum - subset(PUF, state==num)
 
 #Attach data set with state specific data
 attach(PUFnum)
 
 #Run our prediction regression
 #the variables class1 through e19700 are the 27 covariates I want to 
use
 regression - lapply(PUFnum,  function(z) lm(z ~
 class1+class2+class3+class4+class5+class6+class7+xtot+e00200+e00300
 +e00600+e00900+e01000+p04470+e04800+e09600+e07180+e07220+e07260
 +e06500+e10300+e59720+e11900+e18425+e18450+e18500+e19700))
 
 Beta - lapply(regression, function(d) d- coef(regression$d))
 
  detach(PUFnum)
 
 }
 Thanks,
 Mike


This should help you get started.

# You don't provide any sample data, so I made some up myself
nstates - 5
nobs - 30
nys - 3
nxs - 4
PUF - data.frame(matrix(rnorm(nstates*nobs*(nys+nxs)), nrow=nstates*nobs, 

dimnames=list(NULL, c(paste(Y, 1:nys, sep=), paste(X, 1:nxs, 
sep=)
PUF$state - rep(1:nstates, nobs)
head(PUF)

# create a character vector of all your covariate names
# separated by a plus sign
# this will serve as the right half of your regression equations
covariates - paste(names(PUF)[nys + (1:nxs)], collapse= + )

# create an empty array to be filled with coefficients
coefs - array(NA, dim=c(nstates, nys, nxs+1))

# fill the array with coefficients
# this will work for you if the first 28 columns of your PUF
# data frame are the response variables
for(i in 1:nstates) {
for(j in 1:nys) {
coefs[i, j, ] - lm(formula(paste(names(PUF)[j], covariates, sep= 
~ )),
data=PUF[PUF$state==i, ])$coef
}}
coefs

Jean
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Inserting a character into a character string XXXX

2012-01-26 Thread Dan Abner
Hello everyone,

I have a character vector of 24 hour time values in the format hm
without the delimiting :. How can I insert the : immediately to
the left of the second digit from the right?

mytimes-scan(what=)
 1457
 1457
 1310
 1158
 137
 1855


Thanks!

Dan

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] null distribution of binom.test p values

2012-01-26 Thread Chris Wallace

Dear R-help,

I must be missing something very obvious, but I am confused as to why 
the null distribution for p values generated by binom.test() appears to 
be non-uniform.  The histogram generated below has a trend towards 
values closer to 1 than 0.  I expected it to be flat.


hist(sapply(1:1000, function(i,n=100) 
binom.test(sum(rnorm(n)0),n,p=0.5,alternative=two)$p.value))


This trend is more pronounced for small n, and the distribution appears 
uniform for larger n, say n=1000.  I had expected the distribution to be 
discrete for small n, but not skewed.  Can anyone explain why?


Many thanks,

Chris.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] eRm package - Rasch simulation

2012-01-26 Thread James Holland
When I try to create a Rasch simulation of data using the sim.rasch
function, I get more items than I intend

#My code

library(eRm)

#Number of items
k - 20

#Number of participants
n - 100


#Create Rasch Data
#sim.rasch(persons, items, seed = NULL, cutpoint = randomized)

r.simulation - sim.rasch(n,k)

I end up with 20 participants and 100 items, but the instructions say
otherwise (and I believe I have the most up to date).  I reverse the n and
k

sim.rasch(k,n); it works out to how I intended; 100 participants and 20
items.


Can someone explain to me what I'm missing?  I've read the package
instructions and didn't see myself missing something (inputting vectors
instead of an integer).

James

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] eRm - Rasch modeling - First item missing from estimation

2012-01-26 Thread James Holland
I'm trying to kick off some of the rust and learn some of the R packages
for Rasch modeling.  When I tried using the eRm package, I get item
difficulty estimates for all my items accept the first (in terms of order)
item.

#Begin code

library(eRm)

r.simulation - sim.rasch(20,100)


r.data - r.simulation$items

#eRm results

erm.rasch - RM(r.data)
names(erm.rasch)

erm.items - erm.rasch$etapar

erm.items


#end code


All items accept V1 (the first item in the list) show up.


James

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] How do I compare a multiple staged response with multivariables to a Development Index?

2012-01-26 Thread Jhope
Hi R- listeners,

I should add that I would like also to compare my field data to an index
model. The index was created by using the following script:

devel.index - function(values, weights=c(1, 2, 3, 4, 5, 6)) {
  foo - values*weights
  return(apply(foo, 1, sum) / apply(values, 1, sum))
}

Background:
Surveyed turtle egg embryos have been categorized into 6 stages of
development in the field. The stages in the field data are named ST0, ST1,
ST2, ST3, ST4, Shells. from the data = data.to.analyze. 

Q?
1. What is the best way to analyze the field data on embryonic development
of 6 stages? 
2. Doing this while considering, testing the variables: Veg, HTL,
Aeventexhumed, Sector, Rayos, TotalEggs?
3. And then compare the results to a development index.

The goal is to determine hatching success in various areas of the beach. And
try to create a development index of these microenvironments. Seasonality
would play a key role. Is this possible?

Many thanks!
Saludos, Jean



--
View this message in context: 
http://r.789695.n4.nabble.com/How-do-I-compare-47-GLM-models-with-1-to-5-interactions-and-unique-combinations-tp4326407p4329909.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R-help Digest, Vol 107, Issue 25

2012-01-26 Thread Paul Miller
Hi Rui,

Thanks for your reply to my post. My code still has various shortcomings but at 
least now it is fully functional.

It may be that, as I transition to using R, I'll have to live with some less 
than ideal code, at least at the outset. I'll just have to write and re-write 
my code as I improve.

Appreciate your help.

Paul


Message: 66
Date: Tue, 24 Jan 2012 09:54:57 -0800 (PST)
From: Rui Barradas ruipbarra...@sapo.pt
To: r-help@r-project.org
Subject: Re: [R] Checking for invalid dates: Code works but needs
improvement
Message-ID: 1327427697928-4324533.p...@n4.nabble.com
Content-Type: text/plain; charset=us-ascii

Hello,

Point 3 is very simple, instead of 'print' use 'cat'.
Unlike 'print' it allows for several arguments and (very) simple formating.

  { cat(Error: Invalid date values in, DateNames[[i]], \n,
   TestDates[DateNames][[i]][TestDates$Invalid==1], \n) }

Rui Barradas

Message: 53
Date: Tue, 24 Jan 2012 08:54:49 -0800 (PST)
From: Paul Miller pjmiller...@yahoo.com
To: r-help@r-project.org
Subject: [R] Checking for invalid dates: Code works but needs
improvement
Message-ID:
1327424089.1149.yahoomailclas...@web161604.mail.bf1.yahoo.com
Content-Type: text/plain; charset=us-ascii

Hello Everyone,

Still new to R. Wrote some code that finds and prints invalid dates (see 
below). This code works but I suspect it's not very good. If someone could show 
me a better way, I'd greatly appreciate it.

Here is some information about what I'm trying to accomplish. My sense is that 
the R date functions are best at identifying invalid dates when fed character 
data in their default format. So my code converts the input dates to character, 
breaks them apart using strsplit, and then reformats them. It then identifies 
which dates are missing in the sense that the month or year are unknown and 
prints out any remaining invalid date values. 

As I see it, the code has at least 4 shortcomings.

1. It's too long. My understanding is that skilled programmers can usually or 
often complete tasks like this in a few lines.

2. It's not vectorized. I started out trying to do something that was 
vectorized but ran into problems with the strsplit function. I looked at the 
help file and it appears this function will only accept a single character 
vector.

3. It prints out the incorrect dates but doesn't indicate which date variable 
they belong to. I tried various things with paste but never came up with 
anything that worked. Ideally, I'd like to get something that looks roughly 
like:

Error: Invalid date values in birthDT

21931-11-23 
1933-06-31

Error: Invalid date values in diagnosisDT

2010-02-30

4. There's no way to specify names for input and output data. I imagine this 
would be fairly easy to specify this in the arguments to a function but am not 
sure how to incorporate it into a for loop.

Thanks,

Paul  

##
 Code for detecting invalid dates 
##

 Test Data 

connection - textConnection(
1 11/23/21931 05/23/2009 un/17/2011
2 06/20/1940  02/30/2010 03/17/2011
3 06/17/1935  12/20/2008 07/un/2011
4 05/31/1937  01/18/2007 04/30/2011
5 06/31/1933  05/16/2009 11/20/un
)

TestDates - data.frame(scan(connection, 
 list(Patient=0, birthDT=, diagnosisDT=, metastaticDT=)))

close(connection)

TestDates

class(TestDates$birthDT)
class(TestDates$diagnosisDT)
class(TestDates$metastaticDT)

 List of Date Variables 

DateNames - c(birthDT, diagnosisDT, metastaticDT)

 Read Dates 

for (i in seq(TestDates[DateNames])){
TestDates[DateNames][[i]] - as.character(TestDates[DateNames][[i]])
TestDates$ParsedDT - strsplit(TestDates[DateNames][[i]],/)
TestDates$Month - sapply(TestDates$ParsedDT,function(x)x[1])
TestDates$Day - sapply(TestDates$ParsedDT,function(x)x[2])
TestDates$Year - sapply(TestDates$ParsedDT,function(x)x[3])
TestDates$Day[TestDates$Day==un] - 15
TestDates[DateNames][[i]] - with(TestDates, paste(Year, Month, Day, sep = -))
is.na( TestDates[DateNames][[i]] [TestDates$Month==un] ) - T
is.na( TestDates[DateNames][[i]] [TestDates$Year==un] ) - T
TestDates$Date - as.Date(TestDates[DateNames][[i]], format=%Y-%m-%d)
TestDates$Invalid - ifelse(is.na(TestDates$Date)  
!is.na(TestDates[DateNames][[i]]), 1, 0)
if( sum(TestDates$Invalid)==0 ) 
{ TestDates[DateNames][[i]] - TestDates$Date } else
{ print ( TestDates[DateNames][[i]][TestDates$Invalid==1]) }
TestDates - subset(TestDates, select = -c(ParsedDT, Month, Day, Year, Date, 
Invalid))
}

TestDates

class(TestDates$birthDT)
class(TestDates$diagnosisDT)
class(TestDates$metastaticDT)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Inserting a character into a character string XXXX

2012-01-26 Thread Gerrit Eichner

Hello, Dan,

you could probably use a combination of nchar(), substr() (or substring()) 
and paste(). Take a look at their online help pages.


Hth  --  Gerrit



Hello everyone,

I have a character vector of 24 hour time values in the format hm
without the delimiting :. How can I insert the : immediately to
the left of the second digit from the right?

mytimes-scan(what=)
1457
1457
1310
1158
137
1855


Thanks!

Dan

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Inserting a character into a character string XXXX

2012-01-26 Thread William Dunlap
   sub(([[:digit:]]{2,2})$, :\\1, mytimes)
  [1] 14:57 14:57 13:10 11:58 1:37  18:55

That will convert 05 to :05 and will do nothing
to 5.  Pad with 0's before calling sub if that is
required.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
 Behalf Of Dan Abner
 Sent: Thursday, January 26, 2012 6:50 AM
 To: r-help@r-project.org
 Subject: [R] Inserting a character into a character string 
 
 Hello everyone,
 
 I have a character vector of 24 hour time values in the format hm
 without the delimiting :. How can I insert the : immediately to
 the left of the second digit from the right?
 
 mytimes-scan(what=)
  1457
  1457
  1310
  1158
  137
  1855
 
 
 Thanks!
 
 Dan
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Checking for invalid dates: Code works but needs improvement

2012-01-26 Thread Paul Miller
Sorry, sent this earlier but forgot to add an informative subject line. Am 
resending, in the hopes of getting further replies. My apologies. Hope this is 
OK.

Paul


Hi Rui,

Thanks for your reply to my post. My code still has various shortcomings but at 
least now it is fully functional.

It may be that, as I transition to using R, I'll have to live with some less 
than ideal code, at least at the outset. I'll just have to write and re-write 
my code as I improve.

Appreciate your help.

Paul


Message: 66
Date: Tue, 24 Jan 2012 09:54:57 -0800 (PST)
From: Rui Barradas ruipbarra...@sapo.pt
To: r-help@r-project.org
Subject: Re: [R] Checking for invalid dates: Code works but needs
improvement
Message-ID: 1327427697928-4324533.p...@n4.nabble.com
Content-Type: text/plain; charset=us-ascii

Hello,

Point 3 is very simple, instead of 'print' use 'cat'.
Unlike 'print' it allows for several arguments and (very) simple formating.

  { cat(Error: Invalid date values in, DateNames[[i]], \n,
   TestDates[DateNames][[i]][TestDates$Invalid==1], \n) }

Rui Barradas

Message: 53
Date: Tue, 24 Jan 2012 08:54:49 -0800 (PST)
From: Paul Miller pjmiller...@yahoo.com
To: r-help@r-project.org
Subject: [R] Checking for invalid dates: Code works but needs
improvement
Message-ID:
1327424089.1149.yahoomailclas...@web161604.mail.bf1.yahoo.com
Content-Type: text/plain; charset=us-ascii

Hello Everyone,

Still new to R. Wrote some code that finds and prints invalid dates (see 
below). This code works but I suspect it's not very good. If someone could show 
me a better way, I'd greatly appreciate it.

Here is some information about what I'm trying to accomplish. My sense is that 
the R date functions are best at identifying invalid dates when fed character 
data in their default format. So my code converts the input dates to character, 
breaks them apart using strsplit, and then reformats them. It then identifies 
which dates are missing in the sense that the month or year are unknown and 
prints out any remaining invalid date values. 

As I see it, the code has at least 4 shortcomings.

1. It's too long. My understanding is that skilled programmers can usually or 
often complete tasks like this in a few lines.

2. It's not vectorized. I started out trying to do something that was 
vectorized but ran into problems with the strsplit function. I looked at the 
help file and it appears this function will only accept a single character 
vector.

3. It prints out the incorrect dates but doesn't indicate which date variable 
they belong to. I tried various things with paste but never came up with 
anything that worked. Ideally, I'd like to get something that looks roughly 
like:

Error: Invalid date values in birthDT

21931-11-23 
1933-06-31

Error: Invalid date values in diagnosisDT

2010-02-30

4. There's no way to specify names for input and output data. I imagine this 
would be fairly easy to specify this in the arguments to a function but am not 
sure how to incorporate it into a for loop.

Thanks,

Paul  

##
 Code for detecting invalid dates 
##

 Test Data 

connection - textConnection(
1 11/23/21931 05/23/2009 un/17/2011
2 06/20/1940  02/30/2010 03/17/2011
3 06/17/1935  12/20/2008 07/un/2011
4 05/31/1937  01/18/2007 04/30/2011
5 06/31/1933  05/16/2009 11/20/un
)

TestDates - data.frame(scan(connection, 
 list(Patient=0, birthDT=, diagnosisDT=, metastaticDT=)))

close(connection)

TestDates

class(TestDates$birthDT)
class(TestDates$diagnosisDT)
class(TestDates$metastaticDT)

 List of Date Variables 

DateNames - c(birthDT, diagnosisDT, metastaticDT)

 Read Dates 

for (i in seq(TestDates[DateNames])){
TestDates[DateNames][[i]] - as.character(TestDates[DateNames][[i]])
TestDates$ParsedDT - strsplit(TestDates[DateNames][[i]],/)
TestDates$Month - sapply(TestDates$ParsedDT,function(x)x[1])
TestDates$Day - sapply(TestDates$ParsedDT,function(x)x[2])
TestDates$Year - sapply(TestDates$ParsedDT,function(x)x[3])
TestDates$Day[TestDates$Day==un] - 15
TestDates[DateNames][[i]] - with(TestDates, paste(Year, Month, Day, sep = -))
is.na( TestDates[DateNames][[i]] [TestDates$Month==un] ) - T
is.na( TestDates[DateNames][[i]] [TestDates$Year==un] ) - T
TestDates$Date - as.Date(TestDates[DateNames][[i]], format=%Y-%m-%d)
TestDates$Invalid - ifelse(is.na(TestDates$Date)  
!is.na(TestDates[DateNames][[i]]), 1, 0)
if( sum(TestDates$Invalid)==0 ) 
{ TestDates[DateNames][[i]] - TestDates$Date } else
{ print ( TestDates[DateNames][[i]][TestDates$Invalid==1]) }
TestDates - subset(TestDates, select = -c(ParsedDT, Month, Day, Year, Date, 
Invalid))
}

TestDates

class(TestDates$birthDT)
class(TestDates$diagnosisDT)
class(TestDates$metastaticDT)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 

Re: [R] Coloring Canada provinces (package maps?)

2012-01-26 Thread Dimitri Liakhovitski
Barry, thanks a lot!
I was able to read in Candian data set from gadm:

library(raster)
# Finding ISO3 code for Canada
getData('ISO3')  # Canada's code is CAN
# Reading in data at different levels
can0-getData('GADM', country=CAN, level=0)
can1-getData('GADM', country=CAN, level=1)
can2-getData('GADM', country=CAN, level=2)
class(can0)
str(can0)
class(can1)
str(can1)

Apologies for a novice question (I've never worked with maps and
raster before): what is the way in raster to see what are the
geographic units within each data set (can0, can1, can2)?
And what function allows to plot them and color them?

Thanks a lot for the hints!
Dimitri



On Wed, Jan 25, 2012 at 4:37 PM, Barry Rowlingson
b.rowling...@lancaster.ac.uk wrote:
 On Wed, Jan 25, 2012 at 9:25 PM, Dimitri Liakhovitski
 dimitri.liakhovit...@gmail.com wrote:
 Dear R'ers,

 I am wondering what is the smallest geographicterritorial unit
 available for formatting in Canada. Provinces?


 I know that in the US it is the county so that I can color US
 counties any way I want, for example:

 ### Example for coloring US counties
 ### Creating an ARTIFICIAL criterion for coloring US counties:
 library(maps)

 If you want to extend your skills beyond the map package then you can
 plot anything that you can get a shapefile, or other common geospatial
 data set of, using the sp packages and friends such as maptools and
 rgdal.

  gadm has four levels of Canadian boundaries, at first glance -
 country, province (black), something smaller than province (blue) and
 then red which looks like urban divisions.

  The province upper-left doesn't seem to have any blue subdivisions,
 but that's possibly because there would be more subdivisions than
 people who actually live there.

 http://www.gadm.org/download

  Gadm also has a facility to download the data as .Rdata objects that
 can load straight into R.

  You might want to ask questions about spatial data programming on
 R-sig-geo or even on www.stackoverflow.com with the R tag.

 Barry



-- 
Dimitri Liakhovitski
marketfusionanalytics.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] null distribution of binom.test p values

2012-01-26 Thread Greg Snow
I believe that what you are seeing is due to the discrete nature of the 
binomial test.  When I run your code below I see the bar between 0.9 and 1.0 is 
about twice as tall as the bar between 0.0 and 0.1, but the bar between 0.8 and 
0.9 is not there (height 0), if you average the top 2 bars (0.8-0.9 and 
0.9-1.0) then the average height is similar to that of the lowest bar.  The bar 
between 0.5 and 0.6 is also 0, if you average that one with the next 2 (0.6-0.7 
and 0.7-0.8) then they are also similar to the bars near 0.



-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of Chris Wallace
 Sent: Thursday, January 26, 2012 5:44 AM
 To: r-help@r-project.org
 Subject: [R] null distribution of binom.test p values
 
 Dear R-help,
 
 I must be missing something very obvious, but I am confused as to why
 the null distribution for p values generated by binom.test() appears to
 be non-uniform.  The histogram generated below has a trend towards
 values closer to 1 than 0.  I expected it to be flat.
 
 hist(sapply(1:1000, function(i,n=100)
 binom.test(sum(rnorm(n)0),n,p=0.5,alternative=two)$p.value))
 
 This trend is more pronounced for small n, and the distribution appears
 uniform for larger n, say n=1000.  I had expected the distribution to
 be
 discrete for small n, but not skewed.  Can anyone explain why?
 
 Many thanks,
 
 Chris.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Error in ifelse(append, a, w) : , (list) object cannot be coerced to type 'logical'

2012-01-26 Thread Juan Antonio Balbuena

   Hello
   I will appreciate your help with the following.
   Running a script in R 2.14.1 under windows vista I get the following error
   message:
Error in ifelse(append, a, w) :
 (list) object cannot be coerced to type 'logical'
   However, the very same script runs perfectly well under Ubuntu.
   My understanding is that this type of error is associated to writing data to
   disk. So the problem is likely caused by the following statements, which are
   inside a loop:
   P.vector - c(P1, P2, P3)
   write (P.vector, file = Type2_simul3_2012.txt, sep =\t , append =T).
   I  have read that including \ in such statements can be problematic,
   because it is a scape character in R, but trying sep=  instead of \t did
   not solve the problem.
   Any help will be much appreciated. Thanks in advance.
   Juan A. Balbuena

   --

   Dr. Juan A. Balbuena
   Marine Zoology Unit
   Cavanilles Institute of Biodiversity and Evolutionary Biology
   University of
   Valencia
   [1]http://www.uv.es/~balbuena
   P.O. Box 22085
   [2]http://www.uv.es/cavanilles/zoomarin/index.htm
   46071 Valencia, Spain
   [3]http://cetus.uv.es/mullpardb/index.html
   e-mail: [4]j.a.balbu...@uv.estel. +34 963 543 658fax +34 963 543 733
   
   NOTE! For shipments by EXPRESS COURIER use the following street address:
   C/ Catedrático José Beltrán 2, 46980 Paterna (Valencia), Spain.
   

References

   1. http://www.uv.es/%7Ebalbuena
   2. http://www.uv.es/cavanilles/zoomarin/index.htm
   3. http://cetus.uv.es/mullpardb/index.html
   4. mailto:j.a.balbu...@uv.es
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] adding additional information to histogram

2012-01-26 Thread Raphael Bauduin
Hi,

I am a beginner with R, and I think the answer to my question will
seem obvious, but after searching and trying without success I've
decided to post to the list.

I am working with data loaded from a csv filewith these fields:
  order_id, item_value
As an order can have multiple items, an order_id may be present
multiple times in the CSV.

I managed to compute the total value  and the number of items for each order:

  oli - read.csv(/tmp/order_line_items_data.csv, header=TRUE)
  orders_values - tapply(oli[[2]], oli[[1]], sum)
  items_per_order - tapply(oli[[2]], oli[[1]], length)

I then can display the histogram of the order values:

  hist(orders_values, breaks=c(10*0:20,800), xlim=c(0,200), prob=TRUE)

Now on this histogram, I would like to display the average number of
items of the orders in each group (defined with the breaks).
So for the bar of orders with value 0 to 10, I'd like to display the
average number of items of these orders.

Thanks in advance

Raph

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] function for grouping

2012-01-26 Thread yan
what if I don't need to store the combination results, I just want to get the
combination result for a given row.
e.g. for the 5 elements divided into 3 groups , and if I give another
parameter which is the row number of the results, in petr's example, say if
I set row number to 1, then I get 1,2,3,1,1.

So I need a systematic way of generating the combination, but I don't need
all the combinations in one go.

Many thanks


yan

--
View this message in context: 
http://r.789695.n4.nabble.com/function-for-grouping-tp4324436p4330877.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Error in ifelse(append, a, w) : , (list) object cannot be coerced to type 'logical'

2012-01-26 Thread David Winsemius


On Jan 26, 2012, at 10:17 AM, Juan Antonio Balbuena wrote:



  Hello
  I will appreciate your help with the following.
  Running a script in R 2.14.1 under windows vista I get the  
following error

  message:
   Error in ifelse(append, a, w) :
(list) object cannot be coerced to type 'logical'
  However, the very same script runs perfectly well under Ubuntu.
  My understanding is that this type of error is associated to  
writing data to
  disk. So the problem is likely caused by the following statements,  
which are

  inside a loop:
  P.vector - c(P1, P2, P3)
  write (P.vector, file = Type2_simul3_2012.txt, sep =\t ,  
append =T).


  I  have read that including \ in such statements can be  
problematic,
  because it is a scape character in R, but trying sep=  instead  
of \t did

  not solve the problem.



Isn't it much more likely that your failure to use TRUE and instead  
taking the lazy and failure-prone shortcut of using only T that is  
the root of the problem.




  Any help will be much appreciated. Thanks in advance.
  Juan A. Balbuena

  --

  Dr. Juan A. Balbuena
  Marine Zoology Unit
  Cavanilles Institute of Biodiversity and Evolutionary Biology
  University of
  Valencia
  [1]http://www.uv.es/~balbuena
  P.O. Box 22085
  [2]http://www.uv.es/cavanilles/zoomarin/index.htm
  46071 Valencia, Spain
  [3]http://cetus.uv.es/mullpardb/index.html
  e-mail: [4]j.a.balbu...@uv.estel. +34 963 543 658fax +34  
963 543 733

  
  NOTE! For shipments by EXPRESS COURIER use the following street  
address:

  C/ Catedrático José Beltrán 2, 46980 Paterna (Valencia), Spain.
  

References

  1. http://www.uv.es/%7Ebalbuena
  2. http://www.uv.es/cavanilles/zoomarin/index.htm
  3. http://cetus.uv.es/mullpardb/index.html
  4. mailto:j.a.balbu...@uv.es
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] lattice panels with grouped extra data as text?

2012-01-26 Thread Rainer Hurling
I have a problem with including extra data in a lattice graphic. I am 
trying to put some calculated values in an extra column of a boxplot. A 
minimal example should show what I am trying to do:



foo <- data.frame(
  Treatment=rnorm(1:12, 2),
  Variant=c("A","A","B","C","D","C","B","D","D","B","B","A"),
  Szenario=c("Dry","Wet","Dry","Dry","Wet","Dry","Wet","Wet","Dry","Wet","Dry","Dry"),
  Area=c(sample(100, 12))
  )

require(lattice)
require(latticeExtra)

bwplot(Treatment ~ Variant | Szenario,
       data=foo,
       panel = function(...) {
         # Marking the extra column yellow
         panel.xblocks(c(4.5, 5.0),
                       c(rgb(255,255,0, alpha=127, max=255),
                         rgb(255,255,0, alpha=127, max=255)))
         # Put in the extra values (instead of a string)
         panel.text(5, min(foo$Treatment), "sum of foo$area per panel",
                    srt=90, adj=c(0,0.5), cex=2)
         panel.bwplot(...)
       },
       # Create extra space for a column
       xlim=range(0.5, 5.5),
       scales = list(x = list(labels=c(NA,"A","B","C","D","")))
       )


I would like to put summarized area values (from Foo$Area) in the yellow 
coloured columns of each panel. So afterwards there should be one sum in 
the first panel and another sum in the next panel (and so on, if more 
than two panels).


I tried a little bit with groups and group.number but without success.

Is there any easy solution for this?

Any help is really appreciated.
Rainer Hurling
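
One possibility (a rough sketch, untested, not from the original post): compute
the per-panel sums of Area up front and index them with panel.number() inside
the panel function.

  area.sums <- tapply(foo$Area, foo$Szenario, sum)

  bwplot(Treatment ~ Variant | Szenario,
         data = foo,
         panel = function(...) {
           panel.xblocks(c(4.5, 5.0),
                         c(rgb(255,255,0, alpha=127, max=255),
                           rgb(255,255,0, alpha=127, max=255)))
           # panel.number() gives the current panel; this assumes panels appear
           # in the order of the sorted Szenario levels, which tapply() also uses
           panel.text(5, min(foo$Treatment),
                      paste("sum of Area =", area.sums[panel.number()]),
                      srt = 90, adj = c(0, 0.5), cex = 2)
           panel.bwplot(...)
         },
         xlim = range(0.5, 5.5),
         scales = list(x = list(labels = c(NA, "A", "B", "C", "D", ""))))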

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] List to Array: How to establish the dimension of the array

2012-01-26 Thread Brian Diggs
Please include the context of the discussion in your responses.  See 
inline below.


On 1/24/2012 11:33 PM, Ajay Askoolum wrote:

Thanks you,

I can get the length of aa with length(unlist(aa)). If aa has 4
dimensions, I imagine I'd need to do

max(sapply(aa, sapply, sapply, length))


Correct.


How can I do this in a generic way? That is in a loop. I am clear
about the exit condition for the loop.


By generic, I assume that you mean without knowing the depth of the 
lists (that is, dimensions) to begin with?



d <- 1


start loop

if d = length(unlist(aa)) then exit loop


Note that length(unlist(aa)) will only equal the product of the 
dimensions if the data is regular.  That is, there is an entry for 
every combination of indices (within the dimensions).



else
 d <- d * <expr>


How do I construct <expr> such that it does


d <- d * length(aa)  # first pass

d <- d * max(sapply(aa, length)) # second pass

d <- d * max(sapply(aa, sapply, length)) # third pass

# ? ## fourth pass etc






(Apologies for the pseudo code describing the loop; I am not near a
machine with R)


I don't really understand this.


One way I can think of is to create a loop counter variable, say
lc <- 1, and to increment it within the loop and then use a switch
statement to execute the appropriate expression. This seems like a
kludge to me.

Is there a neater way?


If you are trying to get back a vector of the dimensions, then this 
would work:


dimRecursiveList <- function(l) {
    if (class(l) == "list") {
        c(length(l), dimRecursiveList(l[[1]]))
    } else {
        NULL
    }
}

From previous context:

aa <- list(list(list(37531.52, 62787.32, 5503.184, 33832.8),
list(20469.60, 27057.27, 51160.25, 45165.24),
list(957.932,  21902.94, 37531.52, 62787.32)),
   list(list(5503.184, 33832.8,  20469.6,  27057.27),
list(51160.25, 45165.24, 957.932,  21902.94),
list(37531.52, 62787.32, 5503.184, 33832.8)))

Which then gives

 dimRecursiveList(aa)
[1] 2 3 4

In this case, the data is regular, so

 Reduce(`*`, dimRecursiveList(aa)) == length(unlist(aa))
[1] TRUE

If the data are not regular, and you want the dimension to be the 
largest, then it is more complicated (due to bookkeeping)


dimRecursiveListIrregular <- function(l) {
  if (class(l) == "list") {
    dims <- sapply(l, dimRecursiveListIrregular)
    if (is.null(dim(dims))) {
      if (all(is.na(dims))) {
        length(l)
      } else {
        c(length(l), max(dims, na.rm=TRUE))
      }
    } else {
      c(length(l), apply(dims, 1, max, na.rm=TRUE))
    }
  } else {
    NA
  }
}

If the data are regular, then it is better to convert this to an array. 
This function will do it for arbitrary depth


RecursiveListToArray <- function(l) {
  if (class(l) == "list") {
    laply(l, RecursiveListToArray)   # laply() is from the plyr package
  } else {
    l
  }
}

 aaa <- RecursiveListToArray(aa)
 dim(aaa)
[1] 2 3 4

--
Brian S. Diggs, PhD
Senior Research Associate, Department of Surgery
Oregon Health  Science University

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] eRm package - Rasch simulation

2012-01-26 Thread Achim Zeileis

On Thu, 26 Jan 2012, James Holland wrote:


When I try to create a Rasch simulation of data using the sim.rasch
function, I get more items than I intend

#My code

library(eRm)

#Number of items
k <- 20

#Number of participants
n <- 100


#Create Rasch Data
#sim.rasch(persons, items, seed = NULL, cutpoint = "randomized")

r.simulation <- sim.rasch(n, k)

I end up with 20 participants and 100 items, but the instructions say
otherwise (and I believe I have the most up to date).  I reverse the n and
k


If I use your code above, I get

R> dim(r.simulation)
[1] 100  20

which is exactly as expected: 100 rows = persons and 20 columns = items.

This is with the current CRAN version of eRm: 0.14-0.
Z


sim.rasch(k,n); it works out to how I intended; 100 participants and 20
items.


Can someone explain to me what I'm missing?  I've read the package
instructions and didn't see myself missing something (inputting vectors
instead of an integer).

James

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] eRm - Rasch modeling - First item missing from estimation

2012-01-26 Thread Achim Zeileis

On Thu, 26 Jan 2012, James Holland wrote:


I'm trying to kick off some of the rust and learn some of the R packages
for Rasch modeling.  When I tried using the eRm package, I get item
difficulty estimates for all my items except the first (in terms of order)
item.

#Begin code

library(eRm)

r.simulation <- sim.rasch(20, 100)

r.data <- r.simulation$items

#eRm results

erm.rasch <- RM(r.data)
names(erm.rasch)

erm.items <- erm.rasch$etapar

erm.items


#end code


All items except V1 (the first item in the list) show up.


As explained on the manual page ?RM, one of the parameters is not 
identified (as the Rasch scale is latent with arbitrary 0). Either you can 
restrict the sum of all item parameters to zero (default) or the first 
item parameter. In both cases, the print output of the model omits the 
first item parameter. But coef(erm.rasch) will give you the appropriate 
expanded parameter vector.


See also vignette("eRm", package = "eRm") for more details.
Z
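
For instance, continuing the code above:

  coef(erm.rasch)   # expanded vector of item parameters, first item included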



James

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Issues with PearsonIV distribution

2012-01-26 Thread gaurang.jitendrabhai
Hi Michael,
I am using the same version contributed by Martin Backer. I have been in touch 
with him via email as well. He said yesterday that he can help me with it but 
is busy for some time.
This distribution function does not behave properly as can be seen in the 
attached spreadsheet. Check sheet RcodeOutput for Pearson Distribution fits 
in comparison to hyperbolic and normal fits on the same sample data.
Regards,
Gaurang 

-Original Message-
From: R. Michael Weylandt [mailto:michael.weyla...@gmail.com] 
Sent: 25 January 2012 20:00
To: Jitendrabhai Gaurang (AGC)
Cc: R-help@r-project.org
Subject: Re: [R] Issues with PearsonIV distribution

Which implementation of Pearson IV are you using?

If it's trouble with a homebrewed one, you might want to use the p/q
functions here:
http://cran.r-project.org/web/packages/PearsonDS/index.html

Otherwise, you likely will have to share some code for any concrete help.

Michael

On Wed, Jan 25, 2012 at 12:16 PM,  gaurang.jitendrab...@aviva.com wrote:
 Hi team,
 I am facing issues with PearsonIV distribution fitting in R.
 I am applying Hyperbolic and PearsonIV distributions on the equity returns in 
 UK over a period of 30 years.
 For the same data set i am getting strikingly different results under which 
 Hyperbolic distribution does produce negative percentiles of the return after 
 fitting but PearsonIV distribution does not.
 I think there is an issue with PearsonIV distribution in R; also there are no 
 plots available for PearsonIV distribution.
 Can you help? I am ready to share my code.

 Gaurang Mehta
 Risk Calibration | Group Economic Capital | Aviva Group
 Email: gaurang.jitendrab...@aviva.com
 14th Floor, St Helen's
 1 Undershaft
 London, EC3P 3DQ


 Aviva plc, registered Office: St. Helen's, 1 Undershaft, London EC3P 3DQ. 
 Registered in England No. 02468686. www.aviva.com
 This message and any attachments may be confidential or ...{{dropped:10}}

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

Aviva plc, registered Office: St. Helen's, 1 Undershaft, London EC3P 3DQ. 
Registered in England No. 02468686. www.aviva.com
This message and any attachments may be confidential or legally privileged. If 
you are not the intended recipient, please telephone or e-mail the sender and 
delete this message and any attachments from your system. Also, if you are not 
the intended recipient you must not copy this message or attachments or 
disclose the contents to any other person. Any views or opinions expressed are 
solely those of the author and do not necessarily represent those of Aviva.__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] null distribution of binom.test p values

2012-01-26 Thread Chris Wallace

Greg, thanks for the reply.

Unfortunately, I remain unconvinced!

I ran a longer simulation, 100,000 reps.  The size of the test is 
consistently too small (see below) and the histogram shows increasing 
bars even within the parts of the histogram with even bar spacing.  See 
https://www-gene.cimr.cam.ac.uk/staff/wallace/hist.png


y <- sapply(1:100000, function(i, n=100)
  binom.test(sum(rnorm(n) > 0), n, p=0.5, alternative="two")$p.value)
mean(y < 0.01)
# [1] 0.00584
mean(y < 0.05)
# [1] 0.03431
mean(y < 0.1)
# [1] 0.08646

Can that really be due to the discreteness of the distribution?

C.

On 26/01/12 16:08, Greg Snow wrote:

I believe that what you are seeing is due to the discrete nature of the 
binomial test.  When I run your code below I see the bar between 0.9 and 1.0 is 
about twice as tall as the bar between 0.0 and 0.1, but the bar between 0.8 and 
0.9 is not there (height 0), if you average the top 2 bars (0.8-0.9 and 
0.9-1.0) then the average height is similar to that of the lowest bar.  The bar 
between 0.5 and 0.6 is also 0, if you average that one with the next 2 (0.6-0.7 
and 0.7-0.8) then they are also similar to the bars near 0.





__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] (no subject)

2012-01-26 Thread Harish Eswaran

Dear Rhelp,



When I try to plot a barplot, I get the following message:



Error in plot.new() : figure margins too large



What does this mean and how can I fix it?



I appreciate your help,

Horatio Gates

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] null distribution of binom.test p values

2012-01-26 Thread Greg Snow
Yes that is due to the discreteness of the distribution, consider the following:

 binom.test(39,100,.5)

Exact binomial test

data:  39 and 100 
number of successes = 39, number of trials = 100, p-value = 0.0352
alternative hypothesis: true probability of success is not equal to 0.5 
95 percent confidence interval:
 0.2940104 0.4926855 
sample estimates:
probability of success 
  0.39 

 binom.test(40,100,.5)

Exact binomial test

data:  40 and 100 
number of successes = 40, number of trials = 100, p-value = 0.05689
alternative hypothesis: true probability of success is not equal to 0.5 
95 percent confidence interval:
 0.3032948 0.5027908 
sample estimates:
probability of success 
   0.4

(you can do the same for 60 and 61)

So notice that the probability of getting 39 or more extreme is 0.0352, but 
anything less extreme will result in not rejecting the null hypothesis (because 
the probability of getting a 40 or a 60 (dbinom(40,100,.5)) is about 1% each, 
so we see a 2% jump there).  So the size/probability of a type I error will 
generally not be equal to alpha unless n is huge or alpha is chosen to 
correspond to a jump in the distribution rather than using common round values.

I have seen suggestions that instead of the standard test we use a test that 
rejects the null for values 39 and more extreme, don't reject the null for 41 
and less extreme, and if you see a 40 or 60 then you generate a uniform random 
number and reject if it is below a certain value (that value chosen to give an 
overall probability of type I error of 0.05).  This will correctly size the 
test, but becomes less reproducible (and makes clients nervous when they 
present their data and you pull out a coin, flip it, and tell them if they have 
significant results based on your coin flip (or more realistically a die 
roll)).  I think it is better in this case if you know your final sample size 
is going to be 100 to explicitly state that alpha will be 0.0352 (but then you 
need to justify why you are not using the common 0.05 to reviewers).
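
For what it's worth, the attainable size can also be computed directly (a
sketch, not from the original exchange):

  k <- 0:100
  p.vals <- sapply(k, function(x) binom.test(x, 100, 0.5)$p.value)
  sum(dbinom(k, 100, 0.5)[p.vals < 0.05])   # ~0.0352, close to the 0.03431 from the simulation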

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


 -Original Message-
 From: Chris Wallace [mailto:chris.wall...@cimr.cam.ac.uk]
 Sent: Thursday, January 26, 2012 9:36 AM
 To: Greg Snow
 Cc: r-help@r-project.org
 Subject: Re: [R] null distribution of binom.test p values
 
 Greg, thanks for the reply.
 
 Unfortunately, I remain unconvinced!
 
 I ran a longer simulation, 100,000 reps.  The size of the test is
 consistently too small (see below) and the histogram shows increasing
 bars even within the parts of the histogram with even bar spacing.  See
 https://www-gene.cimr.cam.ac.uk/staff/wallace/hist.png
 
  y <- sapply(1:100000, function(i, n=100)
    binom.test(sum(rnorm(n) > 0), n, p=0.5, alternative="two")$p.value)
  mean(y < 0.01)
  # [1] 0.00584
  mean(y < 0.05)
  # [1] 0.03431
  mean(y < 0.1)
  # [1] 0.08646
 
 Can that really be due to the discreteness of the distribution?
 
 C.
 
 On 26/01/12 16:08, Greg Snow wrote:
  I believe that what you are seeing is due to the discrete nature of
 the binomial test.  When I run your code below I see the bar between
 0.9 and 1.0 is about twice as tall as the bar between 0.0 and 0.1, but
 the bar between 0.8 and 0.9 is not there (height 0), if you average the
 top 2 bars (0.8-0.9 and 0.9-1.0) then the average height is similar to
 that of the lowest bar.  The bar between 0.5 and 0.6 is also 0, if you
 average that one with the next 2 (0.6-0.7 and 0.7-0.8) then they are
 also similar to the bars near 0.
 
 
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Inserting a character into a character string XXXX

2012-01-26 Thread Dan Abner
Hi Bill,

Thanks very much for your response.

Can you suggest an approach for the pre-padding? Here is a more
representative sample of the values:

mytimes <- scan(what="")
1334
2310
39
2300
1556
3
404
37
1320
4
211
2320


Thanks!

Dan



On Thu, Jan 26, 2012 at 10:41 AM, William Dunlap wdun...@tibco.com wrote:
   sub("([[:digit:]]{2,2})$", ":\\1", mytimes)
  [1] "14:57" "14:57" "13:10" "11:58" "1:37"  "18:55"

 That will convert 05 to :05 and will do nothing
 to 5.  Pad with 0's before calling sub if that is
 required.

 Bill Dunlap
 Spotfire, TIBCO Software
 wdunlap tibco.com

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
 Behalf Of Dan Abner
 Sent: Thursday, January 26, 2012 6:50 AM
 To: r-help@r-project.org
 Subject: [R] Inserting a character into a character string 

 Hello everyone,

 I have a character vector of 24 hour time values in the format hm
 without the delimiting :. How can I insert the : immediately to
 the left of the second digit from the right?

 mytimes-scan(what=)
  1457
  1457
  1310
  1158
  137
  1855


 Thanks!

 Dan

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Coloring Canada provinces (package maps?)

2012-01-26 Thread Barry Rowlingson
On Thu, Jan 26, 2012 at 4:03 PM, Dimitri Liakhovitski
dimitri.liakhovit...@gmail.com wrote:
 Barry, thanks a lot!
 I was able to read in Candian data set from gadm:

 library(raster)
 # Finding ISO3 code for Canada
 getData('ISO3')  # Canada's code is CAN
 # Reading in data at different levels
 can0 <- getData('GADM', country="CAN", level=0)
 can1 <- getData('GADM', country="CAN", level=1)
 can2 <- getData('GADM', country="CAN", level=2)
 class(can0)
 str(can0)
 class(can1)
 str(can1)

 Apologies for a novice question (I've never worked with maps and
 raster before): what is the way in raster to see what are the
 geographic units within each data set (can0, can1, can2)?
 And what function allows to plot them and color them?

 These are now SpatialPolygonsDataFrame objects - like data frames, but
each row has an associated polygonal geometry. These actually come
from the sp package which the raster package has loaded for you.

 class(can0) will tell you this.

 names(can1) will tell you the names of the attributes of the polygons
which you can use like columns of a data frame.

 plot(can1) will plot it

 spplot(can1, "columnname") will do a coloured plot.

 For more info, check the help for sp or read the R-Spatial Task View
on CRAN which has everything you need to know about maps and such in
R.

 Ask any more questions like this on the R-sig-geo mailing list where
the mapping R people hang out...

Barry
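
For example, to colour the provinces by a per-province value (a sketch, not
from the original reply; the 'value' column is made up here, so check
names(can1) for the attributes GADM actually supplies, such as NAME_1):

  can1$value <- rnorm(length(can1))   # length(can1) = number of provinces (features)
  spplot(can1, "value")               # provinces filled according to 'value'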

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] (no subject)

2012-01-26 Thread R. Michael Weylandt
I usually get that error when I'm replotting on a window/device that's
already really small:

e.g., on my Mac

plot(1:5)
## Make window really small
plot(1:6) # Throws error

Michael
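
A couple of other things that often help (suggestions not in the original
reply, assuming base graphics):

  dev.new(width = 8, height = 6)   # open a larger device
  par(mar = c(4, 4, 2, 1))         # shrink the margins: bottom, left, top, right
  barplot(1:5)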


On Thu, Jan 26, 2012 at 11:32 AM, Harish Eswaran hra...@att.net wrote:

                Dear Rhelp,



 When I try to plot a barplot, I get the following message:



 Error in plot.new() : figure margins too large



 What does this mean and how can I fix it?



 I appreciate your help,

 Horatio Gates

        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] aplpy recursive function on a list

2012-01-26 Thread Brian Diggs

On 1/25/2012 10:09 AM, patzoul wrote:

I have 2 series of data a,b and I would like to calculate a new series which
is z[t] = z[t-1]*a[t] + b[t] , z[1] = b[1].
How can I do that without using a loop?


A combination of Reduce and Map will work.  Map to stitch together the a 
and b vectors, Reduce to step along them, allowing for accumulation.


a <- c(2,4,3,5)
b <- c(1,3,5,7)

z <- Reduce(function(zm1, coef) {coef[1] * zm1 + coef[2]},
            Map(c, a, b),
            init = b[1], accumulate = TRUE)

 z
[1]   1   3  15  50 257



--
View this message in context: 
http://r.789695.n4.nabble.com/aplpy-recursive-function-on-a-list-tp4328017p4328017.html
Sent from the R help mailing list archive at Nabble.com.




--
Brian S. Diggs, PhD
Senior Research Associate, Department of Surgery
Oregon Health  Science University

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Inserting a character into a character string XXXX

2012-01-26 Thread William Dunlap
One way to pad with initial zeros is to convert your
strings to integers and format the integers:
   sprintf("%04d", as.integer(mytimes))
    [1] "1334" "2310" "0039" "2300" "1556" "0003" "0404"
    [8] "0037" "1320" "0004" "0211" "2320"
It has the added benefit that the call to as.integer
will warn you if any supposed integers don't look like
integers.  If you use this approach you don't need the
call to sub():
   tmp <- as.integer(mytimes)
   sprintf("%02d:%02d", tmp%/%100, tmp%%100)
    [1] "13:34" "23:10" "00:39" "23:00" "15:56" "00:03"
    [7] "04:04" "00:37" "13:20" "00:04" "02:11" "23:20"
You could also check that tmp%/%100 (the hour) and tmp%%100
(the minute) are in their expected ranges before calling
sprintf.
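A related route (a sketch, not from the original reply) that validates the
hour/minute ranges at the same time:

   tmp <- sprintf("%04d", as.integer(mytimes))
   format(strptime(tmp, format = "%H%M"), "%H:%M")

Impossible times such as "2575" simply come back as NA.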

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 

 -Original Message-
 From: Dan Abner [mailto:dan.abne...@gmail.com]
 Sent: Thursday, January 26, 2012 8:55 AM
 To: William Dunlap
 Cc: r-help@r-project.org
 Subject: Re: [R] Inserting a character into a character string 
 
 Hi Bill,
 
 Thanks very much for your response.
 
 Can you suggest an approach for the pre-padding? Here is a more
 respresentative sample of the values:
 
 mytimes-scan(what=)
 1334
 2310
 39
 2300
 1556
 3
 404
 37
 1320
 4
 211
 2320
 
 
 Thanks!
 
 Dan
 
 
 
 On Thu, Jan 26, 2012 at 10:41 AM, William Dunlap wdun...@tibco.com wrote:
    sub(([[:digit:]]{2,2})$, :\\1, mytimes)
   [1] 14:57 14:57 13:10 11:58 1:37  18:55
 
  That will convert 05 to :05 and will do nothing
  to 5.  Pad with 0's before calling sub if that is
  required.
 
  Bill Dunlap
  Spotfire, TIBCO Software
  wdunlap tibco.com
 
  -Original Message-
  From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] 
  On Behalf Of Dan Abner
  Sent: Thursday, January 26, 2012 6:50 AM
  To: r-help@r-project.org
  Subject: [R] Inserting a character into a character string 
 
  Hello everyone,
 
  I have a character vector of 24 hour time values in the format hm
  without the delimiting :. How can I insert the : immediately to
  the left of the second digit from the right?
 
  mytimes-scan(what=)
   1457
   1457
   1310
   1158
   137
   1855
 
 
  Thanks!
 
  Dan
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide 
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How do I compare a multiple staged response with multivariables to a Development Index?

2012-01-26 Thread R. Michael Weylandt
This might get more traction on the R-SIG-Ecology lists.

And best of luck to you; I quite like turtles.

Michael

On Thu, Jan 26, 2012 at 4:37 AM, Jhope jeanwaij...@gmail.com wrote:
 Hi R- listeners,

 I should add that I would like also to compare my field data to an index
 model. The index was created by using the following script:

 devel.index - function(values, weights=c(1, 2, 3, 4, 5, 6)) {
  foo - values*weights
  return(apply(foo, 1, sum) / apply(values, 1, sum))
 }

 Background:
 Surveyed turtle egg embryos have been categorized into 6 stages of
 development in the field. The stages in the field data are named ST0, ST1,
 ST2, ST3, ST4, Shells. from the data = data.to.analyze.

 Q?
 1. What is the best way to analyze the field data on embryonic development
 of 6 stages?
 2. Doing this while considering, testing the variables: Veg, HTL,
 Aeventexhumed, Sector, Rayos, TotalEggs?
 3. And then compare the results to a development index.

 The goal is to determine hatching success in various areas of the beach. And
 try to create a development index of these microenvironments. Seasonality
 would play a key role. Is this possible?

 Many thanks!
 Saludos, Jean



 --
 View this message in context: 
 http://r.789695.n4.nabble.com/How-do-I-compare-47-GLM-models-with-1-to-5-interactions-and-unique-combinations-tp4326407p4329909.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Coloring Canada provinces (package maps?)

2012-01-26 Thread Dimitri Liakhovitski
Thank you very much, Barry!
Dimitri

On Thu, Jan 26, 2012 at 12:00 PM, Barry Rowlingson
b.rowling...@lancaster.ac.uk wrote:
 On Thu, Jan 26, 2012 at 4:03 PM, Dimitri Liakhovitski
 dimitri.liakhovit...@gmail.com wrote:
 Barry, thanks a lot!
 I was able to read in Candian data set from gadm:

 library(raster)
 # Finding ISO3 code for Canada
 getData('ISO3')  # Canada's code is CAN
 # Reading in data at different levels
 can0-getData('GADM', country=CAN, level=0)
 can1-getData('GADM', country=CAN, level=1)
 can2-getData('GADM', country=CAN, level=2)
 class(can0)
 str(can0)
 class(can1)
 str(can1)

 Apologies for a novice question (I've never worked with maps and
 raster before): what is the way in raster to see what are the
 geographic units within each data set (can0, can1, can2)?
 And what function allows to plot them and color them?

  These are now SpatialPolygonDataFrame objects - like data frames, but
 each row has an associated polygonal geometry. These actually come
 from the sp package which the raster package has loaded for you.

  class(can0) will tell you this.

  names(can1) will tell you the names of the attributes of the polygons
 which you can use like columns of a data frame.

  plot(can1) will plot it

  spplot(can1, columnname) will do a coloured plot.

  For more info, check the help for sp or read the R-Spatial Task View
 on CRAN which has everything you need to know about maps and such in
 R.

  Ask any more questions like this on the R-sig-geo mailing list where
 the mapping R people hang out...

 Barry



-- 
Dimitri Liakhovitski
marketfusionanalytics.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?

2012-01-26 Thread Frank Harrell
To pretend that AIC solves this problem is to ignore that AIC is just a
restatement of P-values.
Frank

Rubén Roa wrote
 
 I think we have gone through this before.
 First, the destruction of all aspects of statistical inference is not at
 stake, Frank Harrell's post notwithstanding.
 Second, checking all pairs is a way to see for _all pairs_ which model
 outcompetes which in terms of predictive ability by -2AIC or more. Just
 sorting them by the AIC does not give you that if no model is better than
 the next best by less than 2AIC.
 Third, I was not implying that AIC differences play the role of
 significance tests. I agree with you that model selection is better not
 understood as a proxy or as a relative of significance tests procedures.
 Incidentally, when comparing many models the AIC is often inconclusive. If
 one is bent on selecting just _the model_ then I check numerical
 optimization diagnostics such as size of gradients, KKT criteria, and
 other issues such as standard errors of parameter estimates and the
 correlation matrix of parameter estimates. For some reasons I don't find
 model averaging appealing. I guess deep in my heart I expect more from my
 model than just the best predictive ability.
 
 Rubén
 
 -Mensaje original-
 De: r-help-bounces@ [mailto:r-help-bounces@] En nombre de Ben Bolker
 Enviado el: miércoles, 25 de enero de 2012 15:41
 Para: r-help@.ethz
 Asunto: Re: [R]How do I compare 47 GLM models with 1 to 5 interactions and
 unique combinations?
 
 Rubén Roa rroa at azti.es writes:
 
 A more 'manual' way to do it is this.
 
 If you have all your models named properly and well organized, say 
 Model1, Model2, ..., Model 47, with a slot with the AIC (glm produces 
 a list with one component named 'aic') then something like
 this:
  
 x <- matrix(0,1081,3)
 x[,1:2] <- t(combn(47,2))
 for(i in 1:n){x[,3] <-
   abs(unlist(combn(eval(as.name(paste("Model",i,sep="")))$aic,2)[1,]) -
       unlist(combn(eval(as.name(paste("Model",i,sep="")))$aic,2)[2,]))}
 
  
 will calculate all the 1081 AIC differences between paired comparisons 
 of the 47 models. It may not be pretty but works for me.
 
  
 An AIC difference equal or less than 2 is a tie, anything higher is 
 evidence for the model with the less AIC (Sakamoto et al., Akaike 
 Information Criterion Statistics, KTK Scientific Publishers, Tokyo).
  
 
   I wouldn't go quite as far as Frank Harrell did in his answer, but
 
 (1) it seems silly to do all pairwise comparisons of models; one of the
 big advantages of information-theoretic approaches is that you can just
 list the models in order of AIC ...
 
 (2) the dredge() function from the MuMIn package may be useful for
 automating this sort of thing.  There is an also an AICtab function in the
 emdbook package.
 
 (3) If you're really just interested in picking the single model with the
 best expected predictive capability (and all of the other assumptions of
 the AIC approach are met -- very large data set, good fit to the data,
 etc.), then just picking the model with the best AIC will work.  It's when
 you start to make inferences on the individual parameters within that
 model, without taking account of the model selection process, that you run
 into trouble.  If you are really concerned about good predictions then it
 may be better to do multi-model averaging *or* use some form of shrinkage
 estimator.
 
 (4) The delta-AIC<2 means "pretty much the same" rule of thumb is fine,
 but it drives me nuts when people use it as a quasi-significance-testing
 rule, as in "the simpler model is adequate if dAIC<2".  If you're going to
 work in the model selection arena, then don't follow hypothesis-testing
 procedures!  A smaller AIC means lower expected K-L distance, period.
 
 For the record, Brian Ripley has often expressed the (minority) opinion
 that AIC is *not* appropriate for comparing non-nested models (e.g. 
 <http://tolstoy.newcastle.edu.au/R/help/06/02/21794.html>).
 
 
 
 -Original Message-
 From: r-help-bounces at r-project.org on behalf of Milan 
 Bouchet-Valat
 Sent: Wed 1/25/2012 10:32 AM
 To: Jhope
 Cc: r-help at r-project.org
 
 Subject: Re: [R] How do I compare 47 GLM models with 1 to 5  
 interactions and unique combinations?
 
  
 Le mardi 24 janvier 2012 à 20:41 -0800, Jhope a écrit :
  Hi R-listers,
  
  I have developed 47 GLM models with different combinations of 
  interactions from 1 variable to 5 variables. I have manually made 
  each model separately and put them into individual tables (organized 
  by the number of variables) showing the AIC score. I want to compare
 all of these models.
  
  1) What is the best way to compare various models with unique 
  combinations and different number of variables?
 See ?step or ?stepAIC (from package MASS) if you want an automated way 
 of doing this.
 
  2) I am trying to develop the most simplest model ideally. Even 
  though adding another variable would lower the AIC,
 
   No, not necessarily.
 
 how do I interpret it is worth
  

Re: [R] function for grouping

2012-01-26 Thread Petr Savicky
On Thu, Jan 26, 2012 at 08:29:22AM -0800, yan wrote:
 what if I don't need to store the combination results, I just want to get the
 combination result for a given row.
 e.g. for the 5 elements divided into 3 groups , and if I give another
 parameter which is the row number of the results, in petr's example, say if
 I set row number to 1, then I get 1,2,3,1,1.
 
 So I need a systematic way of generating the combination, but I don't need
 all the combinations in one go.
 
 Many thanks

Hi.

I suppose that your question is meant for some larger
example than 5 elements into 3 groups, since for this
small case, we can store the whole table.

Considering, for example, 200 elements and 3 groups, which
you mentioned previously, this can probably be done using
some implementation of large integers. The reason is that
the final table has approximately 3^200/6 rows. So, the index
of the last row has approx log2(3^200/6) = 314.4 bits.
The standard numbers have 53 bits. For large integers,
one can use packages gmp, Rmpfr or some package
for symbolic algebra, for example Ryacas.

Do you want to use a large integer m for a search of
the exactly m-th partitioning in lexicographic order?

Let me point out that sampling random partitionings can
be done much easier.

Petr.
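
For the random-sampling route, a minimal sketch (not from the original post)
that draws one assignment of n elements to k non-empty groups, relabelled in
order of first appearance so it matches the 1,2,3,1,1 style used above:

  sample.partition <- function(n, k) {
    repeat {
      g <- sample(k, n, replace = TRUE)
      if (length(unique(g)) == k) break   # insist that every group is used
    }
    match(g, unique(g))   # canonical labels: groups numbered by first appearance
  }
  set.seed(1)
  sample.partition(5, 3)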

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Inserting a character into a character string XXXX

2012-01-26 Thread Dan Abner
Cheers Bill!!



On Thu, Jan 26, 2012 at 12:03 PM, William Dunlap wdun...@tibco.com wrote:
 One way to pad with initial zeros is to convert your
 strings to integers and format the integers:
   sprintf(%04d, as.integer(mytimes))
   [1] 1334 2310 0039 2300 1556 0003 0404
   [8] 0037 1320 0004 0211 2320
 It has the added benefit that the call to as.integer
 will warn you if any supposed integers don't look like
 integers.  If you use this approach you don't need the
 call to sub():
   tmp - as.integer(mytimes)
   sprintf(%02d:%02d, tmp%/%100, tmp%%100)
   [1] 13:34 23:10 00:39 23:00 15:56 00:03
   [7] 04:04 00:37 13:20 00:04 02:11 23:20
 You could also check that tmp%/%100 (the hour) and tmp%%100
 (the minute) are in their expected ranges before calling
 sprintf.

 Bill Dunlap
 Spotfire, TIBCO Software
 wdunlap tibco.com

 -Original Message-
 From: Dan Abner [mailto:dan.abne...@gmail.com]
 Sent: Thursday, January 26, 2012 8:55 AM
 To: William Dunlap
 Cc: r-help@r-project.org
 Subject: Re: [R] Inserting a character into a character string 

 Hi Bill,

 Thanks very much for your response.

 Can you suggest an approach for the pre-padding? Here is a more
 respresentative sample of the values:

 mytimes-scan(what=)
 1334
 2310
 39
 2300
 1556
 3
 404
 37
 1320
 4
 211
 2320


 Thanks!

 Dan



 On Thu, Jan 26, 2012 at 10:41 AM, William Dunlap wdun...@tibco.com wrote:
    sub(([[:digit:]]{2,2})$, :\\1, mytimes)
   [1] 14:57 14:57 13:10 11:58 1:37  18:55
 
  That will convert 05 to :05 and will do nothing
  to 5.  Pad with 0's before calling sub if that is
  required.
 
  Bill Dunlap
  Spotfire, TIBCO Software
  wdunlap tibco.com
 
  -Original Message-
  From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] 
  On Behalf Of Dan Abner
  Sent: Thursday, January 26, 2012 6:50 AM
  To: r-help@r-project.org
  Subject: [R] Inserting a character into a character string 
 
  Hello everyone,
 
  I have a character vector of 24 hour time values in the format hm
  without the delimiting :. How can I insert the : immediately to
  the left of the second digit from the right?
 
  mytimes-scan(what=)
   1457
   1457
   1310
   1158
   137
   1855
 
 
  Thanks!
 
  Dan
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide 
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Checking for invalid dates: Code works but needs improvement

2012-01-26 Thread Marc Schwartz
Paul,

I have a partial solution for you. It is partial in that I have not quite 
figured out the correct incantation to convert a 5 digit year (eg. 11/23/21931) 
properly using the R date functions. According to various sources (eg. man 
strptime and man strftime) as well as the R help for both functions, there are 
extended formats available, but I am having a bout of cerebral flatulence in 
getting them to work correctly and a search has not been fruitful. Perhaps 
someone else can offer some insights.

That being said, with the exception of correctly handling that one situation, 
which arguably IS a valid date a long time in the future and which would 
otherwise result in a truncated year (first four digits only)

 as.Date("11/23/21931", format = "%m/%d/%Y")
[1] "2193-11-23"

Here is one approach:

# Check the date. If as.Date() fails or the input is > 10 characters, return it
checkDate <- function(x) as.character(x[is.na(as.Date(x, format = "%m/%d/%Y")) |
                                         nchar(as.character(x)) > 10])

 lapply(TestDates[, -1], checkDate)
$birthDT
[1] "11/23/21931" "06/31/1933"

$diagnosisDT
[1] "02/30/2010"

$metastaticDT
[1] "un/17/2011" "07/un/2011" "11/20/un"


You could fine tune the checkDate() function to handle other formats, etc.

HTH,

Marc Schwartz


On Jan 26, 2012, at 9:54 AM, Paul Miller wrote:

 Sorry, sent this earlier but forgot to add an informative subject line. Am 
 resending, in the hopes of getting further replies. My apologies. Hope this 
 is OK.
 
 Paul
 
 
 Hi Rui,
 
 Thanks for your reply to my post. My code still has various shortcomings but 
 at least now it is fully functional.
 
 It may be that, as I transition to using R, I'll have to live with some less 
 than ideal code, at least at the outset. I'll just have to write and re-write 
 my code as I improve.
 
 Appreciate your help.
 
 Paul
 
 
 Message: 66
 Date: Tue, 24 Jan 2012 09:54:57 -0800 (PST)
 From: Rui Barradas ruipbarra...@sapo.pt
 To: r-help@r-project.org
 Subject: Re: [R] Checking for invalid dates: Code works but needs
improvement
 Message-ID: 1327427697928-4324533.p...@n4.nabble.com
 Content-Type: text/plain; charset=us-ascii
 
 Hello,
 
 Point 3 is very simple, instead of 'print' use 'cat'.
 Unlike 'print' it allows for several arguments and (very) simple formating.
 
  { cat(Error: Invalid date values in, DateNames[[i]], \n,
   TestDates[DateNames][[i]][TestDates$Invalid==1], \n) }
 
 Rui Barradas
 
 Message: 53
 Date: Tue, 24 Jan 2012 08:54:49 -0800 (PST)
 From: Paul Miller pjmiller...@yahoo.com
 To: r-help@r-project.org
 Subject: [R] Checking for invalid dates: Code works but needs
improvement
 Message-ID:
1327424089.1149.yahoomailclas...@web161604.mail.bf1.yahoo.com
 Content-Type: text/plain; charset=us-ascii
 
 Hello Everyone,
 
 Still new to R. Wrote some code that finds and prints invalid dates (see 
 below). This code works but I suspect it's not very good. If someone could 
 show me a better way, I'd greatly appreciate it.
 
 Here is some information about what I'm trying to accomplish. My sense is 
 that the R date functions are best at identifying invalid dates when fed 
 character data in their default format. So my code converts the input dates 
 to character, breaks them apart using strsplit, and then reformats them. It 
 then identifies which dates are missing in the sense that the month or year 
 are unknown and prints out any remaining invalid date values. 
 
 As I see it, the code has at least 4 shortcomings.
 
 1. It's too long. My understanding is that skilled programmers can usually or 
 often complete tasks like this in a few lines.
 
 2. It's not vectorized. I started out trying to do something that was 
 vectorized but ran into problems with the strsplit function. I looked at the 
 help file and it appears this function will only accept a single character 
 vector.
 
 3. It prints out the incorrect dates but doesn't indicate which date variable 
 they belong to. I tried various things with paste but never came up with 
 anything that worked. Ideally, I'd like to get something that looks roughly 
 like:
 
 Error: Invalid date values in birthDT
 
 21931-11-23 
 1933-06-31
 
 Error: Invalid date values in diagnosisDT
 
 2010-02-30
 
 4. There's no way to specify names for input and output data. I imagine this 
 would be fairly easy to specify this in the arguments to a function but am 
 not sure how to incorporate it into a for loop.
 
 Thanks,
 
 Paul  
 
 ##
  Code for detecting invalid dates 
 ##
 
  Test Data 
 
 connection - textConnection(
 1 11/23/21931 05/23/2009 un/17/2011
 2 06/20/1940  02/30/2010 03/17/2011
 3 06/17/1935  12/20/2008 07/un/2011
 4 05/31/1937  01/18/2007 04/30/2011
 5 06/31/1933  05/16/2009 11/20/un
 )
 
 TestDates - data.frame(scan(connection, 
 list(Patient=0, birthDT=, diagnosisDT=, metastaticDT=)))
 
 close(connection)
 
 

Re: [R] Checking for invalid dates: Code works but needs improvement

2012-01-26 Thread Gabor Grothendieck
On Tue, Jan 24, 2012 at 11:54 AM, Paul Miller pjmiller...@yahoo.com wrote:
 Hello Everyone,

 Still new to R. Wrote some code that finds and prints invalid dates (see 
 below). This code works but I suspect it's not very good. If someone could 
 show me a better way, I'd greatly appreciate it.

 Here is some information about what I'm trying to accomplish. My sense is 
 that the R date functions are best at identifying invalid dates when fed 
 character data in their default format. So my code converts the input dates 
 to character, breaks them apart using strsplit, and then reformats them. It 
 then identifies which dates are missing in the sense that the month or year 
 are unknown and prints out any remaining invalid date values.

 As I see it, the code has at least 4 shortcomings.

 1. It's too long. My understanding is that skilled programmers can usually or 
 often complete tasks like this in a few lines.

 2. It's not vectorized. I started out trying to do something that was 
 vectorized but ran into problems with the strsplit function. I looked at the 
 help file and it appears this function will only accept a single character 
 vector.

 3. It prints out the incorrect dates but doesn't indicate which date variable 
 they belong to. I tried various things with paste but never came up with 
 anything that worked. Ideally, I'd like to get something that looks roughly 
 like:

 Error: Invalid date values in birthDT

 21931-11-23
 1933-06-31

 Error: Invalid date values in diagnosisDT

 2010-02-30

 4. There's no way to specify names for input and output data. I imagine this 
 would be fairly easy to specify this in the arguments to a function but am 
 not sure how to incorporate it into a for loop.

 Thanks,

 Paul

 ##
  Code for detecting invalid dates 
 ##

  Test Data 

 connection - textConnection(
 1 11/23/21931 05/23/2009 un/17/2011
 2 06/20/1940  02/30/2010 03/17/2011
 3 06/17/1935  12/20/2008 07/un/2011
 4 05/31/1937  01/18/2007 04/30/2011
 5 06/31/1933  05/16/2009 11/20/un
 )

 TestDates - data.frame(scan(connection,
                 list(Patient=0, birthDT=, diagnosisDT=, metastaticDT=)))

 close(connection)

 TestDates

 class(TestDates$birthDT)
 class(TestDates$diagnosisDT)
 class(TestDates$metastaticDT)

  List of Date Variables 

 DateNames - c(birthDT, diagnosisDT, metastaticDT)

  Read Dates 

 for (i in seq(TestDates[DateNames])){
 TestDates[DateNames][[i]] - as.character(TestDates[DateNames][[i]])
 TestDates$ParsedDT - strsplit(TestDates[DateNames][[i]],/)
 TestDates$Month - sapply(TestDates$ParsedDT,function(x)x[1])
 TestDates$Day - sapply(TestDates$ParsedDT,function(x)x[2])
 TestDates$Year - sapply(TestDates$ParsedDT,function(x)x[3])
 TestDates$Day[TestDates$Day==un] - 15
 TestDates[DateNames][[i]] - with(TestDates, paste(Year, Month, Day, sep = 
 -))
 is.na( TestDates[DateNames][[i]] [TestDates$Month==un] ) - T
 is.na( TestDates[DateNames][[i]] [TestDates$Year==un] ) - T
 TestDates$Date - as.Date(TestDates[DateNames][[i]], format=%Y-%m-%d)
 TestDates$Invalid - ifelse(is.na(TestDates$Date)  
 !is.na(TestDates[DateNames][[i]]), 1, 0)
 if( sum(TestDates$Invalid)==0 )
        { TestDates[DateNames][[i]] - TestDates$Date } else
        { print ( TestDates[DateNames][[i]][TestDates$Invalid==1]) }
 TestDates - subset(TestDates, select = -c(ParsedDT, Month, Day, Year, Date, 
 Invalid))
 }

 TestDates

 class(TestDates$birthDT)
 class(TestDates$diagnosisDT)
 class(TestDates$metastaticDT)

If s is a vector of character strings representing dates, then bad is a
logical vector which is TRUE for the bad ones and FALSE for the good
ones (adjust as needed if a different date range is valid), so s[bad]
gives the bad inputs and the output d is a Date vector with NAs for the
bad ones:

x <- gsub("un", "15", s)
d <- as.Date(x, "%m/%d/%Y")
bad <- is.na(d) | d < as.Date("1900-01-01") | d > Sys.Date()
d[bad] <- NA
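
Wrapped as a function and applied to every date column (a sketch building on
the snippet above, using the TestDates and DateNames objects from the quoted
post):

cleanDate <- function(s) {
  x <- gsub("un", "15", s)
  d <- as.Date(x, "%m/%d/%Y")
  bad <- is.na(d) | d < as.Date("1900-01-01") | d > Sys.Date()
  d[bad] <- NA
  d
}
TestDates[DateNames] <- lapply(TestDates[DateNames], cleanDate)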

-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] aplpy recursive function on a list

2012-01-26 Thread Berend Hasselman

On 26-01-2012, at 17:58, Brian Diggs wrote:

 On 1/25/2012 10:09 AM, patzoul wrote:
 I have 2 series of data a,b and I would like to calculate a new series which
 is z[t] = z[t-1]*a[t] + b[t] , z[1] = b[1].
 How can I do that without using a loop?
 
 A combination of Reduce and Map will work.  Map to stitch together the a and 
 b vectors, Reduce to step along them, allowing for accumulation.
 
 a - c(2,4,3,5)
 b - c(1,3,5,7)
 
 z - Reduce(function(zm1, coef) {coef[1] * zm1 + coef[2]},
Map(c, a, b),
init = b[1], accumulate = TRUE)
 
  z
 [1]   1   3  15  50 257

I don't think so.

a <- c(2,4,3,5)
b <- c(1,3,5,7)

z <- rep(0, length(a))
z[1] <- b[1]
for( t in 2:length(a) ) { z[t] <- a[t] * z[t-1] + b[t] }
z

gives

[1]   1   7  26 137

and this agrees with a manual calculation.

You get a vector of length 5 as result. It should be of length 4 with your data.
If you change the Reduce expression to this

u <- Reduce(function(zm1, coef) {coef[1] * zm1 + coef[2]},
            Map(c, a[-1], b[-1]),
            init = b[1], accumulate = TRUE)

then you get the correct result.

 u
[1]   1   7  26 137

Berend

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] What does [[1]] mean?

2012-01-26 Thread Ajay Askoolum
I know that [] is used for indexing.
I know that [[]] is used for reference to a property of a COM object.

I cannot find any explanation of what [[1]] does or, more pertinently, where it 
should be used.

Thank you.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] aplpy recursive function on a list

2012-01-26 Thread Berend Hasselman

On 26-01-2012, at 19:10, Berend Hasselman wrote:

 
 On 26-01-2012, at 17:58, Brian Diggs wrote:
 
 On 1/25/2012 10:09 AM, patzoul wrote:
 I have 2 series of data a,b and I would like to calculate a new series which
 is z[t] = z[t-1]*a[t] + b[t] , z[1] = b[1].
 How can I do that without using a loop?
 
 
 
 I don't think so.
 
 a - c(2,4,3,5)
 b - c(1,3,5,7)
 
 z - rep(0,length(a))
 z[1] - b[1]
 for( t in 2:length(a) ) { z[t] - a[t] * z[t-1] + b[t] }
 z
 
 gives
 
 [1]   1   7  26 137
 
 and this agrees with a manual calculation.
 
 You get a vector of length 5 as result. It should be of length 4 with your 
 data.
 If you change the Reduce expression to this
 
 u - Reduce(function(zm1, coef) {coef[1] * zm1 + coef[2]},
   Map(c, a[-1], b[-1]),
   init = b[1], accumulate = TRUE)
 
 then you get the correct result.
 
 u
 [1]   1   7  26 137


And the loop especially if byte compiled with cmpfun  appears to be quite a bit 
quicker.

Nrep <- 1000

tfrml.loop <- function(a,b) {
  z <- rep(0, length(a))
  z[1] <- b[1]
  for( t in 2:length(a) ) {
    z[t] <- a[t] * z[t-1] + b[t]
  }
  z
}

tfrml.rdce <- function(a,b) {
  u <- Reduce(function(zm1, coef) {coef[1] * zm1 + coef[2]},
              Map(c, a[-1], b[-1]),
              init = b[1], accumulate = TRUE)
  u
}

library(compiler)
tfrml.loop.c <- cmpfun(tfrml.loop)
tfrml.rdce.c <- cmpfun(tfrml.rdce)

z.loop <- tfrml.loop(a,b)
z.rdce <- tfrml.rdce(a,b)
all.equal(z.loop, z.rdce)

library(rbenchmark)

N <- 500
set.seed(1)
a <- runif(N)
b <- runif(N)

benchmark(tfrml.loop(a,b), tfrml.rdce(a,b), tfrml.loop.c(a,b), tfrml.rdce.c(a,b),
          replications=Nrep, columns=c("test", "replications", "elapsed"))

test replications elapsed
1   tfrml.loop(a, b) 1000   2.665
3 tfrml.loop.c(a, b) 1000   0.554
2   tfrml.rdce(a, b) 1000   4.082
4 tfrml.rdce.c(a, b) 1000   3.143

Berend

R-2.14.1 (32-bits), Mac OS X 10.6.8.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] What does [[1]] mean?

2012-01-26 Thread Greg Snow
Have you read ?[[ ?

The short answer is that you can use both [] and [[]] on lists, the [] 
construct will return a subset of  the list (which will be a list) while [[]] 
will return a single element of the list (which could be a list or a vector or 
whatever that element may be):  compare:

 tmp <- list( a=1, b=letters )
 tmp[1]
$a
[1] 1

 tmp[1] + 1
Error in tmp[1] + 1 : non-numeric argument to binary operator
 tmp[[1]]
[1] 1
 tmp[[1]] + 1
[1] 2

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of Ajay Askoolum
 Sent: Thursday, January 26, 2012 11:27 AM
 To: R General Forum
 Subject: [R] What does [[1]] mean?
 
 I know that [] is used for indexing.
 I know that [[]] is used for reference to a property of a COM object.
 
 I cannot find any explanation of what [[1]] does or, more pertinently,
 where it should be used.
 
 Thank you.
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] merge multiple data frames

2012-01-26 Thread maxbre
This is my reproducible example (three data frames: a, b, c)

a <- structure(list(date = structure(1:6, .Label = c("2012-01-03",
"2012-01-04", "2012-01-05", "2012-01-06", "2012-01-07", "2012-01-08",
"2012-01-09", "2012-01-10", "2012-01-11", "2012-01-12", "2012-01-13",
"2012-01-14", "2012-01-15", "2012-01-16", "2012-01-17", "2012-01-18",
"2012-01-19", "2012-01-20", "2012-01-21", "2012-01-22", "2012-01-23"
), class = "factor"), so2 = c(0.799401398190476, 0, 0, 0.0100453950434783,
0.200154920565217, 0.473866969181818), nox = c(111.716109973913,
178.077239330435, 191.257829021739, 50.6799951473913, 115.284643540435,
110.425185027727), no = c(48.8543691516522, 88.7197448817391,
93.9931932472609, 13.9759949817391, 43.1395266865217, 41.7280296016364
), no2 = c(36.8673432865217, 42.37150668, 47.53311701, 29.3026882474783,
49.2986070321739, 46.5978461731818), co = c(0.618856168125, 0.99659347508,
0.66698741608, 0.38343731117, 0.281604928875, 0.155383408913043
), o3 = c(12.1393100029167, 12.3522739816522, 10.9908791203043,
26.9122200013043, 13.8421695947826, 12.3788847045455), ipa = c(167.541954974667,
252.7196257875, 231.802370709167, 83.4850259595833, 174.394613581667,
173.868599272609), ws = c(1.47191016429167, 0.765781205208333,
0.937053086791667, 1.581022406625, 0.909756802125, 0.959252831695652
), wd = c(45.2650019737732, 28.2493544114369, 171.049080544214,
319.753674830936, 33.8713897347193, 228.368119533759), temp = c(7.9197282588,
3.79434291520833, 2.1287644735, 6.733854600625, 3.136579722,
3.09864120704348), umr = c(86.11566638875, 94.5034087491667,
94.14451249375, 53.1016709004167, 65.63420423, 74.955669236087
)), .Names = c("date", "so2", "nox", "no", "no2", "co", "o3",
"ipa", "ws", "wd", "temp", "umr"), row.names = c(NA, 6L), class = "data.frame")


b <- structure(list(date = structure(1:6, .Label = c("2012-01-03",
"2012-01-04", "2012-01-05", "2012-01-06", "2012-01-07", "2012-01-08",
"2012-01-09", "2012-01-10", "2012-01-11", "2012-01-12", "2012-01-13",
"2012-01-14", "2012-01-15", "2012-01-16", "2012-01-17", "2012-01-18",
"2012-01-19", "2012-01-20", "2012-01-21", "2012-01-22", "2012-01-23"
), class = "factor"), so2 = c(0, 0, 0, 0, 0, 0), nox = c(13.74758511,
105.8060582, 61.22720599, 11.45280354, 56.86804174, 39.17917222
), no = c(0.882593766, 48.97037506, 9.732937217, 1.794549972,
16.32300019, 8.883637786), no2 = c(11.80447753, 25.35235381,
28.72990261, 8.590004034, 31.9003796, 25.50512403), co = c(0.113954917,
0.305985964, 0.064001839, 0, 1.86e-05, 0), o3 = c(5.570499897,
9.802379608, 5.729360104, 11.91304016, 12.13407993, 10.00961971
), ipa = c(6.065110207, 116.9079971, 93.21240234, 10.5777998,
66.40740204, 34.47359848), ws = c(0.122115001, 0.367668003, 0.494913995,
0.627124012, 0.473895013, 0.593913019), wd = c(238.485119317031,
221.645073036776, 220.372076815032, 237.868340917096, 209.532933617465,
215.752030286564), temp = c(4.044159889, 1.176810026, 0.142934993,
0.184606999, -0.935989976, -2.015399933), umr = c(72.29229736,
88.69879913, 87.49530029, 24.00079918, 44.8852005, 49.47729874
)), .Names = c("date", "so2", "nox", "no", "no2", "co", "o3",
"ipa", "ws", "wd", "temp", "umr"), row.names = c(NA, 6L), class = "data.frame")


c <- structure(list(date = structure(1:6, .Label = c("2012-01-03",
"2012-01-04", "2012-01-05", "2012-01-06", "2012-01-07", "2012-01-08",
"2012-01-09", "2012-01-10", "2012-01-11", "2012-01-12", "2012-01-13",
"2012-01-14", "2012-01-15", "2012-01-16", "2012-01-17", "2012-01-18",
"2012-01-19", "2012-01-20", "2012-01-21", "2012-01-22", "2012-01-23"
), class = "factor"), so2 = c(2.617839247, 0, 0, 0.231044086,
0.944608887, 2.12400444), nox = c(308.9046313, 275.6778849, 390.0824142,
178.7429364, 238.655832, 251.892601), no = c(156.0262489, 151.4412498,
221.0725021, 65.96049786, 106.541748, 119.3471241), no2 = c(74.80145447,
59.29991481, 66.5897975, 77.84267978, 75.68422569, 85.43044816
), co = c(1.628431197, 1.716231492, 1.264678366, 1.693460745,
0.780637084, 0.892724398), o3 = c(26.1473999, 15.91584015, 22.46199989,
37.39400101, 15.63426018, 17.51494026), ipa = c(538.414978, 406.4620056,
432.6459961, 275.2820129, 435.7909851, 436.8039856), ws = c(4.995530128,
1.355309963, 1.708899975, 3.131690025, 1.546270013, 1.571320057
), wd = c(58.15639877, 64.5657153143848, 39.9754269501381, 24.0739884380921,
55.9453098437477, 56.7648829092446), temp = c(10.24740028, 7.052690029,
4.33258009, 13.91609955, 8.762220383, 11.04300022), umr = c(97.60900116,
96.91899872, 96.20649719, 94.74620056, 82.04550171, 89.41320038
)), .Names = c("date", "so2", "nox", "no", "no2", "co", "o3",
"ipa", "ws", "wd", "temp", "umr"), row.names = c(NA, 6L), class = "data.frame")


Given the three data frames “a”, “b” and “c”, I need to merge them all by
the common field “date”.
The following attempt is working fine but…

# code start
all <- merge(a, b, by="date", suffixes=c(".a",".b"), all=T)
all <- merge(all, c, by="date", all=T)
# code end

…I would like to do it in just “one shot” (generalising the code to the
more complex case of handling many data frames) and also assign proper
suffixes to each variable (which is not possible with the previous code
snippet).

Then I also try 

[R] 3-parametric Weibull regression

2012-01-26 Thread Pfrengle Andreas (GS-SI/ENX7)
Hello,

I'm quite new to R and want to make a Weibull-regression with the survival 
package. I know how to build my Surv-object and how to make a 
standard-weibull regression with survreg.
However, I want to fit a translated or 3-parametric weibull dist to account for 
a failure-free time.
I think I would need a new object in survreg.distributions, but I don't know 
how to create that correctly. I've tried to inherit it from the extreme value 
dist, analogous to the implemented weibull-dist like so:
survreg.distributions$weibull3p$name <- "weibull"
survreg.distributions$weibull3p$dist <- "extreme"
survreg.distributions$weibull3p$trans <- function(y, x0) log(y) + x0
survreg.distributions$weibull3p$dtrans <- function(y) 1/y
survreg.distributions$weibull3p$itrans <- function(x, x0) exp(x-x0)
I then get a failure by survregDtest(survreg.distributions$weibull3p) that x0 
is missing (since the transformation function is expected to take only one 
argument).

Any ideas? Or is there maybe a package somewhere that supports 3-parametric 
weibull regression which I missed?
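
A hedged sketch of one workaround (not from the thread): fix the threshold x0 in
the enclosing environment, so that trans(), dtrans() and itrans() each take a
single argument, which is what survregDtest() expects. The value of x0 below is
an arbitrary placeholder.

library(survival)
x0 <- 5                                        # assumed, user-chosen failure-free time
weibull3p <- survreg.distributions$weibull     # start from the built-in entry
weibull3p$name   <- "weibull3p"
weibull3p$trans  <- function(y) log(y - x0)    # shift before taking the log
weibull3p$dtrans <- function(y) 1/(y - x0)     # derivative of the transform
weibull3p$itrans <- function(x) exp(x) + x0    # inverse transform
survregDtest(weibull3p)                        # the one-argument check should now pass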

Regards,
Andreas


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] merge multiple data frames

2012-01-26 Thread R. Michael Weylandt
I might do something like this:

mergeAll <- function(..., by = "date", all = TRUE) {
  dotArgs <- list(...)
  Reduce(function(x, y)
      merge(x, y, by = by, all = all,
            suffixes = paste(".", names(dotArgs), sep = "")),
      dotArgs)}

mergeAll(a = a, b = b, c = c)

str(.Last.value)

You also might be able to set it up to capture names without you
having to put a = a etc. using substitute.
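
A rough sketch of that substitute() idea (my own illustration, not code from the
thread; renaming the non-key columns up front also sidesteps merge()'s
requirement that suffixes be a length-two vector):

mergeAllNamed <- function(..., by = "date", all = TRUE) {
  dotArgs <- list(...)
  # capture the expressions supplied in ... and use them as per-column suffixes
  nms <- vapply(as.list(substitute(list(...)))[-1L], deparse, character(1))
  dotArgs <- Map(function(d, nm) {
    keep <- names(d) %in% by
    names(d)[!keep] <- paste(names(d)[!keep], nm, sep = ".")
    d
  }, dotArgs, nms)
  Reduce(function(x, y) merge(x, y, by = by, all = all), dotArgs)
}

str(mergeAllNamed(a, b, c))   # columns date, so2.a, so2.b, so2.c, ...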

On Thu, Jan 26, 2012 at 12:29 PM, maxbre mbres...@arpa.veneto.it wrote:
 This is my reproducible example (three data frames: a, b, c)

 [...]

Re: [R] What does [[1]] mean?

2012-01-26 Thread Patrick Burns

Here is a page that should help:

http://www.burns-stat.com/pages/Tutor/more_R_subscript.html

Also Circle 8.1.54 of 'The R Inferno'
http://www.burns-stat.com/pages/Tutor/R_inferno.pdf
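
In short (a small illustration, not taken from those pages): single brackets
on a list return a list containing the selected elements, while [[1]]
extracts the first element itself.

x <- list(a = 1:3, b = "hello")
x[1]       # '[' keeps the list: a list of length one holding 1:3
x[[1]]     # '[[' extracts the element itself: the vector 1:3
x[[1]][2]  # so this is 2
strsplit("a,b,c", ",")[[1]]   # a common use: pull the single result out of the list strsplit() returns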


On 26/01/2012 18:27, Ajay Askoolum wrote:

I know that [] is used for indexing.
I know that [[]] is used for reference to a property of a COM object.

I cannot find any explanation of what [[1]] does or, more pertinently, where it 
should be used.

Thank you.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Patrick Burns
pbu...@pburns.seanet.com
twitter: @portfolioprobe
http://www.portfolioprobe.com/blog
http://www.burns-stat.com
(home of 'Some hints for the R beginner'
and 'The R Inferno')

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] null distribution of binom.test p values

2012-01-26 Thread Thomas Lumley
On Fri, Jan 27, 2012 at 5:36 AM, Chris Wallace
chris.wall...@cimr.cam.ac.uk wrote:
 Greg, thanks for the reply.

 Unfortunately, I remain unconvinced!

 I ran a longer simulation, 100,000 reps.  The size of the test is
 consistently too small (see below) and the histogram shows increasing bars
 even within the parts of the histogram with even bar spacing.  See
 https://www-gene.cimr.cam.ac.uk/staff/wallace/hist.png

 y <- sapply(1:100000, function(i, n=100)
  binom.test(sum(rnorm(n) > 0), n, p=0.5, alternative="two")$p.value)
 mean(y < 0.01)
 # [1] 0.00584
 mean(y < 0.05)
 # [1] 0.03431
 mean(y < 0.1)
 # [1] 0.08646

 Can that really be due to the discreteness of the distribution?

Yes.  All so-called exact tests tend to be conservative due to
discreteness, and there's quite a lot of discreteness in the tails

The problem is far worse for Fisher's exact test, and worse still for
Fisher's other exact test (of Hardy-Weinberg equilibrium --
http://www.genetics.org/content/180/3/1609.full).

You don't need to rely on finite-sample simulations here: you can
evaluate the level exactly.  Using binom.test() you find that the
rejection regions are y <= 39 and y >= 61, so the level at nominal 0.05
is:
 pbinom(39,100,0.5)+pbinom(60,100,0.5,lower.tail=FALSE)
[1] 0.0352002
agreeing very well with your 0.03431

At nominal 0.01 the exact level is
 pbinom(36,100,0.5)+pbinom(63,100,0.5,lower.tail=FALSE)
[1] 0.006637121
and at 0.1 it is
 pbinom(41,100,0.5)+pbinom(58,100,0.5,lower.tail=FALSE)
[1] 0.08862608

Your result at nominal 0.01 is a bit low, but I think that's bad luck.
 When I ran your code I got 0.00659 for the estimated level at nominal
0.01, which matches the exact calculations very well


Theoreticians sweep this under the carpet by inventing randomized
tests, where you interpolate a random p-value between the upper and
lower values from a discrete distribution.  It's a very elegant idea
that I'm glad to say I haven't seen used in practice.
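
For completeness, a small sketch of that construction for a one-sided
binomial test (my own illustration, not code from the thread):

randomized.p <- function(x, n, p = 0.5) {
  p.gt  <- pbinom(x, n, p, lower.tail = FALSE)      # P(X >  x)
  p.geq <- pbinom(x - 1, n, p, lower.tail = FALSE)  # P(X >= x)
  # interpolate at random between the two attainable p-values;
  # under H0 the result is exactly Uniform(0,1)
  p.gt + runif(1) * (p.geq - p.gt)
}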

 -thomas

-- 
Thomas Lumley
Professor of Biostatistics
University of Auckland

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] apply recursive function on a list

2012-01-26 Thread Brian Diggs

On 1/26/2012 10:33 AM, Berend Hasselman wrote:


On 26-01-2012, at 19:10, Berend Hasselman wrote:



On 26-01-2012, at 17:58, Brian Diggs wrote:


On 1/25/2012 10:09 AM, patzoul wrote:

I have 2 series of data a,b and I would like to calculate a new series which
is z[t] = z[t-1]*a[t] + b[t] , z[1] = b[1].
How can I do that without using a loop?





I don't think so.

a <- c(2,4,3,5)
b <- c(1,3,5,7)

z <- rep(0,length(a))
z[1] <- b[1]
for( t in 2:length(a) ) { z[t] <- a[t] * z[t-1] + b[t] }
z

gives

[1]   1   7  26 137

and this agrees with a manual calculation.

You get a vector of length 5 as result. It should be of length 4 with your data.
If you change the Reduce expression to this

u <- Reduce(function(zm1, coef) {coef[1] * zm1 + coef[2]},
            Map(c, a[-1], b[-1]),
            init = b[1], accumulate = TRUE)

then you get the correct result.


u

[1]   1   7  26 137


You are correct; I had an off-by-one error.  It agreed with my manual 
calculation, which also had the same error.



And the loop, especially if byte-compiled with cmpfun, appears to be quite a
bit quicker.

Nrep <- 1000

tfrml.loop <- function(a, b) {
    z <- rep(0, length(a))
    z[1] <- b[1]
    for (t in 2:length(a)) {
        z[t] <- a[t] * z[t-1] + b[t]
    }
    z
}

tfrml.rdce <- function(a, b) {
    u <- Reduce(function(zm1, coef) {coef[1] * zm1 + coef[2]},
                Map(c, a[-1], b[-1]),
                init = b[1], accumulate = TRUE)
    u
}

library(compiler)
tfrml.loop.c <- cmpfun(tfrml.loop)
tfrml.rdce.c <- cmpfun(tfrml.rdce)

z.loop <- tfrml.loop(a, b)
z.rdce <- tfrml.rdce(a, b)
all.equal(z.loop, z.rdce)

library(rbenchmark)

N <- 500
set.seed(1)
a <- runif(N)
b <- runif(N)

benchmark(tfrml.loop(a,b), tfrml.rdce(a,b), tfrml.loop.c(a,b), tfrml.rdce.c(a,b),
          replications=Nrep, columns=c("test", "replications", "elapsed"))

                test replications elapsed
1   tfrml.loop(a, b)         1000   2.665
3 tfrml.loop.c(a, b)         1000   0.554
2   tfrml.rdce(a, b)         1000   4.082
4 tfrml.rdce.c(a, b)         1000   3.143

Berend

R-2.14.1 (32-bits), Mac OS X 10.6.8.


The timings are interesting; I would not have expected the loop to have 
outperformed Reduce, or at least not by that much.  The loop also 
benefits much more from compiling, which is not as surprising since 
Reduce and Map are already compiled. I would guess the difference is due 
to overhead changing the format of the a/b data and being able to 
specialize the code.


I also ran timings for comparison and got qualitatively the same thing:

benchmark(tfrml.loop(a,b), tfrml.rdce(a,b), tfrml.loop.c(a,b), tfrml.rdce.c(a,b),
          replications=Nrep,
          columns=c("test", "replications", "elapsed", "relative"),
          order="relative")

                test replications elapsed relative
3 tfrml.loop.c(a, b)         1000    0.34     1.00
1   tfrml.loop(a, b)         1000    1.89 5.558824
4 tfrml.rdce.c(a, b)         1000    2.12 6.235294
2   tfrml.rdce(a, b)         1000    2.79 8.205882

R version 2.14.1 (2011-12-22)
Platform: x86_64-pc-mingw32/x64 (64-bit)
(Windows 7)


--
Brian S. Diggs, PhD
Senior Research Associate, Department of Surgery
Oregon Health & Science University

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Quality of fit statistics for NLS?

2012-01-26 Thread Max Brondfield
Dear all,
I am trying to analyze some non-linear data to which I have fit a curve of
the following form:

dum <- nls(y~(A + (B*x)/(C+x)), start = list(A=370,B=100,C=23000))

I am wondering if there is any way to determine meaningful quality of fit
statistics from the nls function?

A summary yields highly significant p-values, but it is my impression that
these are questionable at best given the iterative nature of the fit:

 summary(dum)

Formula: y ~ (A + (B * x)/(C + x))

Parameters:
     Estimate Std. Error t value Pr(>|t|)
A     388.753      4.794  81.090  < 2e-16 ***
B     115.215      5.006  23.015  < 2e-16 ***
C   20843.832   4646.937   4.485 1.12e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 18.25 on 245 degrees of freedom

Number of iterations to convergence: 4
Achieved convergence tolerance: 2.244e-06


Is there any other means of determining the quality of the curve fit? I
have tried applying confidence intervals using confint(dum), but these
curves seem unrealistically narrow. Thanks so much for your help!
-Max

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Calculate a function repeatedly over sections of a ts object

2012-01-26 Thread Jorge Molinos

Hi,

I want to apply a function (in my case SDF; package “sapa”) repeatedly over 
discrete sections of a daily time series object by sliding a time window of 
constant length (e.g. 10 consecutive years or 1825 days) over the entire ts at 
increments of 1 time unit (e.g. 1 year or 365 days). So for example, the first 
SDF would be calculated for the daily values of my variable recorded between 
years 1 to 5, SDF2 to those for years 2 to 6 and so on until the total length 
of the series is covered. How can I implement this into a R script? Any help is 
much appreciated.

Jorge
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Why was the ‘doSMP’ package removed from CRAN?

2012-01-26 Thread Tal Galili
Thank you for the explanation Uwe.

With regards,
Tal


Contact
Details:---
Contact me: tal.gal...@gmail.com |  972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
www.r-statistics.com (English)
--




On Thu, Jan 26, 2012 at 10:42 AM, Uwe Ligges 
lig...@statistik.tu-dortmund.de wrote:



 On 26.01.2012 09:39, Barry Rowlingson wrote:

 On Thu, Jan 26, 2012 at 8:02 AM, Uwe Ligges
 lig...@statistik.tu-dortmund.de
  wrote:



 On 25.01.2012 22:20, Tal Galili wrote:


  Does any one know the reason for this?
 Is this a technical or a legal (e.g: license) issue?




 If legal issues were the reason, you had not found it in the archives
 anymore.


  Licensing can change - as a copyright holder I could decide the next
 release of my package is going to be under a proprietary license that
 doesn't allow redistribution. Any code already released under
 something like the GPL can't be forcibly removed, and people can make
 new forks from those, but if I'm the only person writing it and I
 decide to change the license and the latest free version isn't
 compatible with the current version of R then I'd expect to see the
 old versions in the archives and no version for the latest version of
 R.

  Last checkin at R-forge was only six weeks ago, and 1.0-3 installs
 fine on my latest R:

  
 https://r-forge.r-project.org/scm/?group_id=950

 I suspect they just haven't pushed it to R-forge yet. Cockup before
 conspiracy.

 Barry


 Before people start with even more speculation: the reason is that the
 revoIPC package does not compile on modern versions of gcc and hence has
 been archived; doSMP depends on it and therefore went to the archives as
 well.

 Uwe Ligges


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Calculate a function repeatedly over sections of a ts object

2012-01-26 Thread R. Michael Weylandt
I'm not sure if it's easily doable with a ts class, but the rollapply
function in the zoo package will do this easily. (Also, I find zoo to
be a much more natural time-series workflow than ts so it might make
the rest of your life easier as well)
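
A minimal sketch of what that would look like (x and dates are placeholders
for the daily series and its index; the call to sapa::SDF is left as a
comment, since handling its return value is up to you):

library(zoo)
z <- zoo(x, dates)                 # daily series as a zoo object
out <- rollapply(z, width = 1825,  # the 1825-day window from the question
                 by = 365,         # slide forward 365 days at a time
                 align = "left",
                 FUN = function(w) {
                   # replace with the SDF-based summary you need,
                   # e.g. something computed from sapa::SDF(coredata(w))
                   mean(w)
                 })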

Michael

On Thu, Jan 26, 2012 at 2:24 PM, Jorge Molinos jgarc...@tcd.ie wrote:

 Hi,

 I want to apply a function (in my case SDF; package “sapa”) repeatedly over 
 discrete sections of a daily time series object by sliding a time window of 
 constant length (e.g. 10 consecutive years or 1825 days) over the entire ts 
 at increments of 1 time unit (e.g. 1 year or 365 days). So for example, the 
 first SDF would be calculated for the daily values of my variable recorded 
 between years 1 to 5, SDF2 to those for years 2 to 6 and so on until the 
 total length of the series is covered. How can I implement this into a R 
 script? Any help is much appreciated.

 Jorge
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Calculate a function repeatedly over sections of a ts object

2012-01-26 Thread Gabor Grothendieck
On Thu, Jan 26, 2012 at 4:00 PM, R. Michael Weylandt
michael.weyla...@gmail.com wrote:
 I'm not sure if it's easily doable with a ts class, but the rollapply
 function in the zoo package will do this easily. (Also, I find zoo to
 be a much more natural time-series workflow than ts so it might make
 the rest of your life easier as well)


rollapply does have ts and default methods in addition to a zoo method.
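
For example (a minimal sketch; mean() stands in for the real SDF-based summary):

library(zoo)
tsx <- ts(rnorm(10 * 365), frequency = 365)    # assumed example series
out <- rollapply(tsx, width = 1825, by = 365, FUN = mean, align = "left")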

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] merge multiple data frames

2012-01-26 Thread maxbre
thank you for your reply
I'll study and test your code (which is a bit obscure to me so far);
by the way, do you think that merge_all is the wrong way to go?
thanks again
m

--
View this message in context: 
http://r.789695.n4.nabble.com/merge-multiple-data-frames-tp4331089p4331830.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?

2012-01-26 Thread Jhope
I ask when to stop adding variables because each variable I add lowers the
AIC further. How do I know when the model is a good fit? When should I stop
adding variables and keep the model simple?

Thanks, J

--
View this message in context: 
http://r.789695.n4.nabble.com/How-do-I-compare-47-GLM-models-with-1-to-5-interactions-and-unique-combinations-tp4326407p4331848.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to remove rows representing concurrent sessions from data.frame?

2012-01-26 Thread Jean V Adams
johannes rara wrote on 01/26/2012 02:46:57 AM:

 I have a dataset like this (dput for this below) which represents user
 computer sessions:
 
username  machine   start end
 1 user1 D5599.domain.com 2011-01-03 09:44:18 2011-01-03 09:47:27
 2 user1 D5599.domain.com 2011-01-03 09:46:29 2011-01-03 10:09:16
 3 user1 D5599.domain.com 2011-01-03 14:07:36 2011-01-03 14:56:17
 4 user1 D5599.domain.com 2011-01-05 15:03:17 2011-01-05 15:23:15
 5 user1 D5599.domain.com 2011-02-14 14:33:39 2011-02-14 14:40:16
 6 user1 D5599.domain.com 2011-02-23 13:54:30 2011-02-23 13:58:23
 7 user1 D5599.domain.com 2011-03-21 10:10:18 2011-03-21 10:32:22
 8 user1 D5645.domain.com 2011-06-09 10:12:41 2011-06-09 10:58:59
 9 user1 D5682.domain.com 2011-01-03 12:03:45 2011-01-03 12:29:43
 10USER2 D5682.domain.com 2011-01-12 14:26:05 2011-01-12 14:32:53
 11USER2 D5682.domain.com 2011-01-17 15:06:19 2011-01-17 15:44:22
 12USER2 D5682.domain.com 2011-01-18 15:07:30 2011-01-18 15:42:43
 13USER2 D5682.domain.com 2011-01-25 15:20:55 2011-01-25 15:24:38
 14USER2 D5682.domain.com 2011-02-14 15:03:00 2011-02-14 15:07:43
 15USER2 D5682.domain.com 2011-02-14 14:59:23 2011-02-14 15:14:47
 
 
 There may be several concurrent sessions for the same username from the
 same computer. How can I remove those rows so that only one session
 is left in this data? I have no idea how to do this, maybe using
 difftime?
 
 -J
 
 df <- structure(list(username = c("user1", "user1", "user1",
 "user1", "user1", "user1", "user1", "user1",
 "user1", "USER2", "USER2", "USER2", "USER2", "USER2", "USER2"
 ), machine = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 3L,
 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("D5599.domain.com", "D5645.domain.com",
 "D5682.domain.com", "D5686.domain.com", "D5694.domain.com", "D5696.domain.com",
 "D5772.domain.com", "D5772.domain.com", "D5847.domain.com", "D5855.domain.com",
 "D5871.domain.com", "D5927.domain.com", "D5927.domain.com", "D5952.domain.com",
 "D5993.domain.com", "D6012.domain.com", "D6048.domain.com", "D6077.domain.com",
 "D5688.domain.com", "D5815.domain.com", "D6106.domain.com", "D6128.domain.com"
 ), class = "factor"), start = structure(c(1294040658, 1294040789,
 1294056456, 1294232597, 1297686819, 1298462070, 1300695018, 1307603561,
 1294049025, 1294835165, 1295269579, 1295356050, 1295961655, 1297688580,
 1297688363), class = c("POSIXct", "POSIXt"), tzone = ""), end = structure(c(1294040847,
 1294042156, 1294059377, 1294233795, 1297687216, 1298462303, 1300696342,
 1307606339, 1294050583, 1294835573, 1295271862, 1295358163, 1295961878,
 1297688863, 1297689287), class = c("POSIXct", "POSIXt"), tzone = "")),
 .Names = c("username", "machine", "start", "end"),
 row.names = c(NA, 15L), class = "data.frame")


# rearrange the data, so that there is one date/time variable
# and another variable indicates start/end
library(reshape)
df2 <- melt(df)
# sort the data by user, machine, date/time
df3 <- df2[order(df2$username, df2$machine, df2$value), ]
# for each user and machine,
# keep only the first start record and the last end record
first <- function(x) {
    l <- length(x)
    c(1, 1-(x[-1]==x[-l]))
}
last <- function(x) {
    y <- rev(x)
    l <- length(y)
    rev(c(1, 1-(y[-1]==y[-l])))
}
df4 <- df3[(df3$variable=="start" & first(df3$variable)) |
    (df3$variable=="end" & last(df3$variable)), ]
# combine the results
df5 <- cbind(df4[df4$variable=="start",
    c("username", "machine", "value")],
    value2=df4$value[df4$variable=="end"])
df5

Jean
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Quality of fit statistics for NLS?

2012-01-26 Thread Bert Gunter
Inline below.

-- Bert

On Thu, Jan 26, 2012 at 12:16 PM, Max Brondfield
max.brondfi...@gmail.com wrote:
 Dear all,
 I am trying to analyze some non-linear data to which I have fit a curve of
 the following form:

 dum <- nls(y~(A + (B*x)/(C+x)), start = list(A=370,B=100,C=23000))

 I am wondering if there is any way to determine meaningful quality of fit
 statistics from the nls function?

 A summary yields highly significant p-values, but it is my impression that
 these are questionable at best given the iterative nature of the fit:
No. They are questionable primarily because there is no clear null
model. They are based on profile likelihoods (as ?confint tells you),
which may or may not be what you want for goodness of fit.

One can always get goodness of fit statistics but the question in
nonlinear models is: goodness of fit with respect to what? So the
answer to your question is: if you know what you're doing, certainly.
Otherwise, find someone who does.





 summary(dum)

 Formula: y ~ (A + (B * x)/(C + x))

 Parameters:
      Estimate Std. Error t value Pr(>|t|)
 A     388.753      4.794  81.090  < 2e-16 ***
 B     115.215      5.006  23.015  < 2e-16 ***
 C   20843.832   4646.937   4.485 1.12e-05 ***
 ---
 Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

 Residual standard error: 18.25 on 245 degrees of freedom

 Number of iterations to convergence: 4
 Achieved convergence tolerance: 2.244e-06


 Is there any other means of determining the quality of the curve fit? I
 have tried applying confidence intervals using confint(dum), but these
 curves seem unrealistically narrow. Thanks so much for your help!
 -Max

        [[alternative HTML version deleted]]


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?

2012-01-26 Thread Bert Gunter
Simple question. Eight million pages of answers in the statistical
literature. What, indeed, is the secret to life?

Post on a statistical help list (e.g. stats.stackexchange.com). This
has almost nothing to do with R. Be prepared for an onslaught of often
conflicting wisdom.

-- Bert

On Thu, Jan 26, 2012 at 1:25 PM, Jhope jeanwaij...@gmail.com wrote:
 I ask the question about when to stop adding another variable even though it
 lowers the AIC because each time I add a variable the AIC is lower. How do I
 know when the model is a good fit? When to stop adding variables, keeping
 the model simple?

 Thanks, J

 --
 View this message in context: 
 http://r.789695.n4.nabble.com/How-do-I-compare-47-GLM-models-with-1-to-5-interactions-and-unique-combinations-tp4326407p4331848.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] warning In fun(...) : no DISPLAY variable so Tk is not available

2012-01-26 Thread sjkimble

Nevil Amos wrote:
 I am getting the above warning after loading Geneland 3.1.5 on Unix.
 While a simple plot sends output to the pdf file (see attached code),
 no output results from the Geneland functions, resulting in empty pdf
 files.

That message is saying it can't find an X11 server for Tcltk to use.  If 
you do have X11 available, then just set the DISPLAY environment 
variable in the normal way; if you don't, then you're not going to be 
able to use that package on Unix.  (The Windows implementation is 
self-contained, so it should work there.)

I am having the same error message for Geneland, but I am implementing on a
Windows machine.

--
View this message in context: 
http://r.789695.n4.nabble.com/warning-In-fun-no-DISPLAY-variable-so-Tk-is-not-available-tp2235656p4331992.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] function for grouping

2012-01-26 Thread Petr Savicky
On Thu, Jan 26, 2012 at 08:29:22AM -0800, yan wrote:
 What if I don't need to store the combination results and just want the
 combination for a given row?
 E.g., for the 5 elements divided into 3 groups, if I give another
 parameter which is the row number of the result (in Petr's example, say
 I set the row number to 1), then I get 1,2,3,1,1.
 
 So I need a systematic way of generating the combinations, but I don't
 need all of them in one go.

Hi.

The following function partitioning() computes the m-th
partitioning of n elements into 'groups' groups in
lexicographic order.

  partitioning <- function(n, groups, m)
  {
      a <- matrix(nrow=n+1, ncol=groups+2)
      a[, groups + 2] <- 0
      a[1, 0:groups + 1] <- 0
      a[1, groups + 1] <- 1
      for (i in seq.int(from=2, length=n)) {
          for (j in 0:groups) {
              a[i, j + 1] <- j*a[i - 1, j + 1] + a[i - 1, j + 2]
          }
      }
      available <- a[n + 1, 1]
      stopifnot(m <= available)
      x <- rep(NA, times=n)
      for (k in seq.int(length=n)) {
          free <- max(x[seq.int(length=k-1)], 0)
          for (j in seq.int(length=min(free + 1, groups))) {
              subtree <- a[n - k + 1, max(free, j) + 1]
              if (subtree < m) {
                  available <- available - subtree
                  m <- m - subtree
              } else {
                  x[k] <- j
                  break
              }
          }
      }
      x
  }

  # the previous code for comparison, modified to
  # produce partitionings in the same order

  check.row <- function(x, k)
  {
      y <- unique(x)
      length(y) == k && all(y == 1:k)
  }

  allPart <- as.matrix(rev(expand.grid(x5=1:3, x4=1:3, x3=1:3, x2=1:2, x1=1)))
  ok <- apply(allPart, 1, check.row, k=3)
  allPart <- allPart[ok, ]

  # the comparison itself

  for (m in seq.int(length=nrow(allPart)+1)) {
      x <- partitioning(5, 3, m)
      cat(m, x, x - allPart[m, ], "\n")
  }

  # the output is 

  1 1 1 1 2 3 0 0 0 0 0 
  2 1 1 2 1 3 0 0 0 0 0 
  3 1 1 2 2 3 0 0 0 0 0 
  4 1 1 2 3 1 0 0 0 0 0 
  5 1 1 2 3 2 0 0 0 0 0 
  6 1 1 2 3 3 0 0 0 0 0 
  7 1 2 1 1 3 0 0 0 0 0 
  8 1 2 1 2 3 0 0 0 0 0 
  9 1 2 1 3 1 0 0 0 0 0 
  10 1 2 1 3 2 0 0 0 0 0 
  11 1 2 1 3 3 0 0 0 0 0 
  12 1 2 2 1 3 0 0 0 0 0 
  13 1 2 2 2 3 0 0 0 0 0 
  14 1 2 2 3 1 0 0 0 0 0 
  15 1 2 2 3 2 0 0 0 0 0 
  16 1 2 2 3 3 0 0 0 0 0 
  17 1 2 3 1 1 0 0 0 0 0 
  18 1 2 3 1 2 0 0 0 0 0 
  19 1 2 3 1 3 0 0 0 0 0 
  20 1 2 3 2 1 0 0 0 0 0 
  21 1 2 3 2 2 0 0 0 0 0 
  22 1 2 3 2 3 0 0 0 0 0 
  23 1 2 3 3 1 0 0 0 0 0 
  24 1 2 3 3 2 0 0 0 0 0 
  25 1 2 3 3 3 0 0 0 0 0 
  Error: m <= available is not TRUE

The zeros confirm that partitioning(5, 3, m) is exactly the m-th row of
allPart. The error at the end shows that the function partitioning()
recognizes the situation where m is larger than the number of
partitionings with the given parameters.

The idea of the algorithm is that it searches through the tree of all
encodings of the partitionings, and the precomputed matrix a[,] allows it
to determine the number of leaves under any node of the tree. Using this,
the algorithm can choose the right branch at each node.
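
So, for instance, a single partitioning can be obtained directly:

  partitioning(5, 3, 17)
  # [1] 1 2 3 1 1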

Hope this helps.

Petr.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Conditional cumulative sum

2012-01-26 Thread Axel Urbiz
Dear List,

I'll appreciate your help on this. I'm trying to create a variable as in
cumsum_y.cond1 below, which should compute the cumulative sum of y
conditional on the value of cond==1.

set.seed(1)
d <- data.frame(y= sample(c(0,1), 10, replace= T),
                cond= sample(c(0,1), 10, replace= T))


      y cond cumsum_y.cond1
   1  0    0              0
   2  0    0              0
   3  1    1              1
   4  1    0              1
   5  0    1              1
   6  1    0              1
   7  1    1              2
   8  1    1              3
   9  1    0              3
  10  0    1              3

Thank you.

Regards,
Axel.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Conditional cumulative sum

2012-01-26 Thread Pete Brecknock

Axel Urbiz wrote
 
 Dear List,
 
 I'll appreciate your help on this. I'm trying to create a variable as in
 cumsum_y.cond1 below, which should compute the cumulative sum of y
 conditional on the value of cond==1.
 
 set.seed(1)
 d <- data.frame(y= sample(c(0,1), 10, replace= T),
                 cond= sample(c(0,1), 10, replace= T))
 
 
       y cond cumsum_y.cond1
    1  0    0              0
    2  0    0              0
    3  1    1              1
    4  1    0              1
    5  0    1              1
    6  1    0              1
    7  1    1              2
    8  1    1              3
    9  1    0              3
   10  0    1              3
 
 Thank you.
 
 Regards,
 Axel.
 
   [[alternative HTML version deleted]]
 
 __
 R-help@ mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 


is this what you are looking for ...

set.seed(1)
d <- data.frame(y= sample(c(0,1), 10, replace= T),
                cond= sample(c(0,1), 10, replace= T))

d$cumsum_y.cond1 = cumsum(d$y & d$cond)

# Output
   y cond cumsum_y.cond1
1  0    0              0
2  0    0              0
3  1    1              1
4  1    0              1
5  0    1              1
6  1    0              1
7  1    1              2
8  1    1              3
9  1    0              3
10 0    1              3

HTH

Pete

--
View this message in context: 
http://r.789695.n4.nabble.com/Conditional-cumulative-sum-tp4332254p4332344.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Checking for invalid dates: Code works but needs improvement

2012-01-26 Thread Rui Barradas
Hello, again.

I now have a more complete answer to your points.

 1. It's too long. My understanding is that skilled programmers can usually
 or often complete tasks like this in a few lines.

It's not much shorter, but it's more readable. (The programmer is always
suspect.)

 2. It's not vectorized. I started out trying to do something that was
 vectorized
 but ran into problems with the strsplit function. I looked at the help
 file and
 it appears this function will only accept a single character vector. 

All but one of the instructions are vectorized, and the one that is not
only loops over a few column names.
Use 'unlist' on the 'strsplit' function's output to get a vector.

 4. There's no way to specify names for input and output data. I imagine
 this would
 be fairly easy to specify this in the arguments to a function but am not
 sure how to
 incorporate it into a for loop.

You can now specify any matrix or data.frame, but it will only process the
columns with dates. (Strictly speaking, it will process anything with a '/'
in it, so pay attention.)

Near the beginning of your code include the following:


 TestDates <- data.frame(scan(connection,
  list(Patient=0, birthDT="", diagnosisDT="",
  metastaticDT="")))

  close(connection)

TDSaved <- TestDates    # to avoid reopening the connection

And then, after all of it,

fun <- function(Dat){
    f <- function(jj, DF){
        x <- as.character(DF[, jj])
        x <- unlist(strsplit(x, "/"))
        n <- length(x)
        M <- x[seq(1, n, 3)]
        D <- x[seq(2, n, 3)]
        Y <- x[seq(3, n, 3)]
        D[D == "un"] <- 15
        Y <- ifelse(nchar(Y) < 4 | Y < 1900, NA, Y)
        x <- as.Date(paste(Y, M, D, sep="-"), format="%Y-%m-%d")
        if(any(is.na(x)))
            cat("Warning: Invalid date values in", jj, "\n",
                as.character(DF[is.na(x), jj]), "\n")
        x
    }
    colinx <- colnames(as.data.frame(Dat))
    Dat <- data.frame(sapply(colinx, function(j) f(j, Dat)))
    for(i in colinx) class(Dat[[i]]) <- "Date"
    Dat
}

TD <- TDSaved

TD[, DateNames] <- fun(TD[, DateNames])

TD

Had fun writing it.
Good luck.

Rui Barradas



--
View this message in context: 
http://r.789695.n4.nabble.com/Checking-for-invalid-dates-Code-works-but-needs-improvement-tp4324356p4332529.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Is there a R command for testing the difference of two liear regressions?

2012-01-26 Thread Michael
Hi all,

I am looking for an R command to test the difference of two linear
regression betas.

Let's say I have data x1, x2...x(n+1).
beta1 is obtained from regressing x1 to xn onto 1 to n.

beta2 is obtained from regressing x2 to x(n+1) onto 1 to n.

Is there a way in R to test whether beta1 and beta2 are statistically
different?

Thanks a lot!

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Is there a R command for testing the difference of two liear regressions?

2012-01-26 Thread Mark Leeds
I don't know what is available in R, but a relevant paper that provides a
test is:

Hotelling, H. (September 1940). The Selection of Variates For Use in
Prediction With Some Comments On The General Problem of Nuisance
Parameters. Annals of Mathematical Statistics, 11, 271-283.
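
As a rough sketch (my own illustration, not Hotelling's test): if the two
windows were independent samples, the slope difference could be tested with
an interaction term in a stacked regression; the overlap between x1..xn and
x2..x(n+1) is exactly the dependence that Hotelling's paper deals with.

set.seed(1)
x <- cumsum(rnorm(101))               # assumed example series
n <- 100
d <- data.frame(y = c(x[1:n], x[2:(n + 1)]),
                t = rep(1:n, 2),
                g = factor(rep(c("w1", "w2"), each = n)))
fit <- lm(y ~ t * g, data = d)
summary(fit)$coefficients["t:gw2", ]  # estimate, SE, t and p for the slope difference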




On Thu, Jan 26, 2012 at 11:59 PM, Michael comtech@gmail.com wrote:

 Hi al,

 I am looking for a R command to test the difference of two linear
 regressoon betas.

 Lets say I have data x1, x2...x(n+1).
 beta1 is obtained from regressing x1 to xn onto 1 to n.

 beta2 is obtained from regressing x2 to x(n+1) onto 1 to n.

 Is there a way in R to test whether beta1 and beta2 are statistically
 different?

 Thanks a lot!

[[alternative HTML version deleted]]


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] multiple column comparison

2012-01-26 Thread ryanfuller
Hello, 
I have a very large content analysis project, which I've just begun to
collect training data on. I have three coders, who are entering data on up
to 95 measurements. Traditionally, I've used Excel to check coder agreement
(e.g., percentage agreement), by lining up each coder's measurements
side-by-side, creating a new column with the results using if statements.
That is, if (a=b, 1, 0). With this many variables, I am clearly interested
in something that I don't have to create manually every time I check
percentage agreement for coders. 

The data are set up like this: 

ID    CODER   V1   V2    V3    V4  ... V95
ID1   C1      y    int   doc   y
ID2   C1      y    ext   doc   y
ID1   C2      n    int   doc   y
ID2   C2      n    int   doc   y
ID1   C3      n    int   doc   y
ID2   C3      n    int   doc   y

I would like to write a script to do the following:
For each variable, compare each pair of coders using if statements (e.g.,
if(C1.V1 == C2.V1, 1, 0))

ID    C1.V1  C2.V1  C3.V1
ID1   y      y      y
ID2   y      y      y

For each coding pair, enter the resulting 1s and 0s into a new column. 

The new column name would reflect the results of the comparison, such as
C1.C2.V1

I'd ideally like to create this so that it can handle any number of
variables and any number of coders. 
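
A rough sketch of one way to do this in R (the column names ID, CODER, V1, ...
and the long one-row-per-coder layout are assumptions based on the example
above):

coderAgreement <- function(dat, id = "ID", coder = "CODER") {
  vars  <- setdiff(names(dat), c(id, coder))
  # one row per ID, with columns named like V1.C1, V1.C2, ...
  wide  <- reshape(dat, idvar = id, timevar = coder, direction = "wide")
  pairs <- combn(unique(as.character(dat[[coder]])), 2, simplify = FALSE)
  for (v in vars) {
    for (p in pairs) {
      agree <- paste(p[1], p[2], v, sep = ".")   # e.g. C1.C2.V1
      wide[[agree]] <- as.integer(wide[[paste(v, p[1], sep = ".")]] ==
                                  wide[[paste(v, p[2], sep = ".")]])
    }
  }
  wide
}

Percentage agreement for a coder pair on a variable is then just the mean of
the corresponding 0/1 column of the returned data frame.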

I appreciate any thoughts, help, and pointers on this. 

Thanks in advance. 

Best,
Ryan Fuller
Doctoral Candidate, Communication
Graduate Student Researcher, Carsey-Wolf Center
http://carseywolf.ucsb.edu
University of California, Santa Barbara



--
View this message in context: 
http://r.789695.n4.nabble.com/multiple-column-comparison-tp4332604p4332604.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Clusters: How to build a Amalgamation Schedule (sequence of jointing )?

2012-01-26 Thread womak
http://r.789695.n4.nabble.com/file/n4332721/Рисунок1.jpg 
Ultimately, when I set the limits of the height's variation from *min* to
*max*, I want to get a set of clusters (*A, B, C, ...*) written to a file in
the form:

l1; 32, 33, 34, 30, 31, 55, 60, 58, 59, 57, 61
l2; 50, 51, 52, 54, 47, 53, 48, 49, 28, 6, 39, 4, 27, 3, 1, 25 2, 26
l3; 41, 44, 43, 45, 40, 46, 8, 13, 9, 15, 11, 35, 7, 14, 10, 12, 36
l4 ... ...
... ...
Lm
Where *min* < l1, l2, l3, ..., lm < *max*

/l1, l2, l3/ are the heights; each /number/ is the case's number in the
original data file
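
A rough sketch of how this could be produced with hclust() and cutree()
(dat, min.h and max.h below are placeholders for the original data and the
chosen height limits):

hc  <- hclust(dist(dat))
# merge heights that fall between the chosen limits
hts <- hc$height[hc$height > min.h & hc$height < max.h]
# for each such height, list the case numbers belonging to each cluster
res <- lapply(hts, function(h) {
  memb <- cutree(hc, h = h)
  split(seq_along(memb), memb)
})
names(res) <- round(hts, 3)
# each element of res can then be written out as "height; case numbers"
# in the format shown above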

--
View this message in context: 
http://r.789695.n4.nabble.com/Clusters-How-to-build-a-Amalgamation-Schedule-sequence-of-jointing-tp4319741p4332721.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.