Re: [R] Why was the ‘doSMP’ package removed from CRAN?
On 25.01.2012 22:20, Tal Galili wrote:
> Hello dear list,
>
> I just noticed that Package 'doSMP' was removed from the CRAN repository.
> http://cran.r-project.org/web/packages/doSMP/index.html
>
> Does anyone know the reason for this? Is this a technical or a legal (e.g. license) issue?
>
> Thanks,
> Tal

If legal issues were the reason, you would not have found it in the archives anymore.

Uwe

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Why was the ‘doSMP’ package removed from CRAN?
On Thu, Jan 26, 2012 at 8:02 AM, Uwe Ligges lig...@statistik.tu-dortmund.de wrote:
> On 25.01.2012 22:20, Tal Galili wrote:
>> Does anyone know the reason for this? Is this a technical or a legal (e.g. license) issue?
>
> If legal issues were the reason, you would not have found it in the archives anymore.

Licensing can change - as a copyright holder I could decide the next release of my package is going to be under a proprietary license that doesn't allow redistribution. Any code already released under something like the GPL can't be forcibly removed, and people can make new forks from it. But if I'm the only person writing it, and I decide to change the license, and the latest free version isn't compatible with the current version of R, then I'd expect to see the old versions in the archives and no version for the latest version of R.

Last checkin at R-forge was only six weeks ago, and 1.0-3 installs fine on my latest R: https://r-forge.r-project.org/scm/?group_id=950

I suspect they just haven't pushed it to CRAN yet. Cockup before conspiracy.

Barry
Re: [R] Why was the ‘doSMP’ package removed from CRAN?
On 26.01.2012 09:39, Barry Rowlingson wrote:
> Last checkin at R-forge was only six weeks ago, and 1.0-3 installs fine on my latest R: https://r-forge.r-project.org/scm/?group_id=950
>
> I suspect they just haven't pushed it yet. Cockup before conspiracy.
>
> Barry

Before people start with even more speculation: the reason is that the revoIPC package does not compile with modern versions of gcc and hence has been archived; doSMP depends on it and therefore went to the archives as well.

Uwe Ligges
[R] How to remove rows representing concurrent sessions from data.frame?
I have a dataset like this (dput for this below) which represents user computer sessions:

   username          machine               start                 end
1     user1 D5599.domain.com 2011-01-03 09:44:18 2011-01-03 09:47:27
2     user1 D5599.domain.com 2011-01-03 09:46:29 2011-01-03 10:09:16
3     user1 D5599.domain.com 2011-01-03 14:07:36 2011-01-03 14:56:17
4     user1 D5599.domain.com 2011-01-05 15:03:17 2011-01-05 15:23:15
5     user1 D5599.domain.com 2011-02-14 14:33:39 2011-02-14 14:40:16
6     user1 D5599.domain.com 2011-02-23 13:54:30 2011-02-23 13:58:23
7     user1 D5599.domain.com 2011-03-21 10:10:18 2011-03-21 10:32:22
8     user1 D5645.domain.com 2011-06-09 10:12:41 2011-06-09 10:58:59
9     user1 D5682.domain.com 2011-01-03 12:03:45 2011-01-03 12:29:43
10    USER2 D5682.domain.com 2011-01-12 14:26:05 2011-01-12 14:32:53
11    USER2 D5682.domain.com 2011-01-17 15:06:19 2011-01-17 15:44:22
12    USER2 D5682.domain.com 2011-01-18 15:07:30 2011-01-18 15:42:43
13    USER2 D5682.domain.com 2011-01-25 15:20:55 2011-01-25 15:24:38
14    USER2 D5682.domain.com 2011-02-14 15:03:00 2011-02-14 15:07:43
15    USER2 D5682.domain.com 2011-02-14 14:59:23 2011-02-14 15:14:47

There may be several concurrent sessions for the same username from the same computer (e.g. rows 1 and 2, or rows 14 and 15 above). How can I remove those rows so that only one session is left for this data? I have no idea how to do this, maybe using difftime?
-J

structure(list(username = c("user1", "user1", "user1", "user1",
"user1", "user1", "user1", "user1", "user1", "USER2", "USER2",
"USER2", "USER2", "USER2", "USER2"), machine = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L),
.Label = c("D5599.domain.com", "D5645.domain.com", "D5682.domain.com",
"D5686.domain.com", "D5694.domain.com", "D5696.domain.com",
"D5772.domain.com", "D5772.domain.com", "D5847.domain.com",
"D5855.domain.com", "D5871.domain.com", "D5927.domain.com",
"D5927.domain.com", "D5952.domain.com", "D5993.domain.com",
"D6012.domain.com", "D6048.domain.com", "D6077.domain.com",
"D5688.domain.com", "D5815.domain.com", "D6106.domain.com",
"D6128.domain.com"), class = "factor"), start = structure(c(1294040658,
1294040789, 1294056456, 1294232597, 1297686819, 1298462070,
1300695018, 1307603561, 1294049025, 1294835165, 1295269579,
1295356050, 1295961655, 1297688580, 1297688363), class = c("POSIXct",
"POSIXt"), tzone = ""), end = structure(c(1294040847, 1294042156,
1294059377, 1294233795, 1297687216, 1298462303, 1300696342,
1307606339, 1294050583, 1294835573, 1295271862, 1295358163,
1295961878, 1297688863, 1297689287), class = c("POSIXct", "POSIXt"),
tzone = "")), .Names = c("username", "machine", "start", "end"),
row.names = c(NA, 15L), class = "data.frame")
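One possible way to drop the overlapping rows (my sketch, not from the thread): sort within each username/machine group by start time, then drop any session that begins before the previous session in the group has ended. `dat` stands for the posted data frame; a two-row stand-in is built here so the code runs on its own.

```r
## Stand-in for the posted data frame (rows 1 and 2 of the real data)
dat <- data.frame(username = c("user1", "user1"),
                  machine  = c("D5599.domain.com", "D5599.domain.com"),
                  start = as.POSIXct(c("2011-01-03 09:44:18", "2011-01-03 09:46:29")),
                  end   = as.POSIXct(c("2011-01-03 09:47:27", "2011-01-03 10:09:16")))

## Sort so that sessions in the same group are adjacent and in time order
dat <- dat[order(dat$username, dat$machine, dat$start), ]

## A row overlaps if it is in the same group as the previous row
## and starts before the previous row's session has ended
prev.same <- c(FALSE, dat$username[-1] == dat$username[-nrow(dat)] &
                      dat$machine[-1]  == dat$machine[-nrow(dat)])
overlaps  <- prev.same & c(FALSE, dat$start[-1] < dat$end[-nrow(dat)])
dat.clean <- dat[!overlaps, ]   # here, row 2 overlaps row 1 and is dropped
```

Note this only compares each session with the one immediately before it after sorting; if overlaps can chain across three or more rows, repeat the step until no further rows are dropped.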
[R] Workshop on Bayesian methods and WinBUGS. One week to go!
Workshop on Bayesian methods and WinBUGS *** A two-day workshop on Bayesian methods is being held on Friday 3 - Saturday 4 February 2012 at the University of Sydney. This course is suitable for graduate students, academics, researchers and professionals who wish to introduce themselves to the application of Bayesian methods and the use of the WinBUGS software. The workshop is offered in association with the 11th International Colloquium on Paratuberculosis (ICP) but is also open to non-ICP participants. Bayesian tools in the control of Paratuberculosis.

Program outline, Day 1 (3 February 2012):
- Introduction to Bayesian Inference (Probability theory; Bayes' theorem; The likelihood function; The prior and the posterior distribution).
- Overview of WinBUGS (Running the software; Writing code; Estimation of a single proportion; Hands-on WinBUGS).
- Bayesian prevalence estimation (One population; Multiple populations; Herd-level prevalence; Hands-on WinBUGS).

Program outline, Day 2 (4 February 2012):
- Bayesian latent class models in Se and Sp estimation (Conditionally independent/dependent tests; Continuous tests; Hands-on WinBUGS).
- Bayesian logistic regression (Ordinary logistic regression; Logistic regression adjusting for the Se and Sp of the diagnostic process; Hands-on WinBUGS).
- Checklist of what should be present in a Bayesian analysis; Guidelines on informative prior specification; The ParaTB paradigm.

Tutors:
Ian A. Gardner, University of Prince Edward Island, Canada
Polychronis Kostoulas, University of Thessaly, Greece

Local coordinator: Navneet Dhand, University of Sydney, Australia

For the program and further information go to: http://www.icp2012.com.au/downloads/ICPWorkshopFlyer.pdf

Limited places (20) are available so be sure to register quickly.
To register please go to: https://eiapac.certain.com/cem/getdemo.ei?id=7120002s=_2AC0SH5WT
Re: [R] How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?
Thank you everyone for your dedication to improving R, for its contribution to statistical analysis, and for the comments. I now have 48 models (unique combinations of 1 to 6 variables), have put them into a list, and obtained results for all models. Below is a sample of my script results:

m$model48 <- Model48.glm
sapply(m, extractAIC)
      model47  model48
[1,]    8.000   46.000
[2,] 6789.863 3549.543

Questions:
1) How do you interpret these results? What are rows 1 and 2?
2) I have been advised to try a quasibinomial, once responses are fixed. Is this necessary? If so, is there a way I can do this considering all these models?
3) Then gaussian?

Much appreciated!
Saludos,
J

--
View this message in context: http://r.789695.n4.nabble.com/How-do-I-compare-47-GLM-models-with-1-to-5-interactions-and-unique-combinations-tp4326407p4329798.html
Sent from the R help mailing list archive at Nabble.com.
[R] How do I use the cut function to assign specific cut points?
I am new to R, and I am trying to cut a continuous variable, BMI, into different categories and can't figure out how to do it. I would like to cut it into four groups: <20, 20-25, 25-30 and >=30. I am having difficulty figuring out the code for <20 and >=30. Please help. Thank you.
Re: [R] How do I use the cut function to assign specific cut points?
On 01/26/2012 07:23 PM, citadel wrote:
> I am new to R, and I am trying to cut a continuous variable, BMI, into different categories and can't figure out how to do it. I would like to cut it into four groups: <20, 20-25, 25-30 and >=30. I am having difficulty figuring out the code for <20 and >=30. Please help. Thank you.

Hi citadel,

Assuming that you only have positive numbers and they are all less than 100:

BMIcut <- cut(BMI, breaks=c(0,20,25,30,100), include.lowest=TRUE, right=FALSE)

This will give you the intervals [0,20), [20,25), [25,30), [30,100].

Jim
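To make the suggestion above concrete, here is a small self-contained sketch; the BMI values and the labels are made up for illustration, not part of the thread.

```r
## Illustrative data; labels are a hypothetical choice matching the question
BMI <- c(17.5, 20, 24.9, 27.3, 30, 41.2)
BMIcut <- cut(BMI, breaks = c(0, 20, 25, 30, 100),
              right = FALSE, include.lowest = TRUE,
              labels = c("<20", "20-25", "25-30", ">=30"))
table(BMIcut)   # counts per category
```

With right=FALSE each interval is closed on the left, so a BMI of exactly 30 lands in ">=30", and include.lowest=TRUE keeps the value 100 (if it occurred) in the top group.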
[R] extracting from data.frames for survival analysis
Hi,

I have a data frame:

class(B27.vec)
[1] "data.frame"
head(B27.vec)
  AGE Gend B27 AgeOn DD uveitis psoriasis IBD CD UC InI BASDAI BASFI Smok UV
1  57    1   1    19 38       2         1   1  1  1   1   5.40  8.08   NA  1
2  35    1   1    33  2       2         1   1  1  1   1   1.69  2.28   NA  1
3  49    2   1    40  9       1         1   1  1  1   1   8.30  9.40   NA  0
4  32    1   1    21 11       1         1   1  1  1   1   5.10  9.10   NA  0
5  31    1   1    24  7       1         1   1  1  1   1   6.63  6.52   NA  0
6  27    1   1    23  4       1         2   1  1  1   1   7.19  6.51   NA  0

I am trying to perform survival analysis but continually get errors when extracting from this data.frame:

attempt 1:
X <- Surv(B27.vec$AgeOn, B27.vec$UV)
survdiff(X, rho=0, data=uvf)
Error in x$terms : $ operator is invalid for atomic vectors

attempt 2:
X <- Surv(B27.vec[,4], B27.vec[,15])
survdiff(X, rho=0, data=uvf)
Error in x$terms : $ operator is invalid for atomic vectors

attempt 3:
AO <- B27.vec[["AgeOn", exact = TRUE]]
UV <- B27.vec[["UV", exact = TRUE]]
X <- Surv(AO, UV)
survdiff(X, rho=0, data=uvf)
Error in x$terms : $ operator is invalid for atomic vectors

I have read ?data.frame and ?Extract.data.frame but I cannot understand how I might structure this differently so it extracts the required columns from this data frame. For the last two attempts I am not using the $ term. Sorry if this seems basic, but I cannot understand why attempt 1 or 2 doesn't work.

thanks
Philip
Re: [R] extracting from data.frames for survival analysis
Hi, Philip,

counter-questions:

1. Which/where is the grouping variable for the test of differences in survival?
2. Assume the grouping variable is Gend in B27.vec. Then, why aren't you using

survdiff( Surv( AgeOn, UV) ~ Gend, rho = 0, data = B27.vec)

?

Hth -- Gerrit

On Thu, 26 Jan 2012, Philip Robinson wrote:
> I am trying to perform survival analysis but continually get errors when
> extracting from this data.frame: [...]
> Sorry if this seems basic, but I cannot understand why attempt 1 or 2 doesn't work.
>
> thanks
> Philip
[R] Finding suspicious data points?
Dear list,

I would like to find data points that at least should be checked one more time before I process them further. I've had a look at the outliers package for this, and the outlier function in that package, but it appears to only return one value. An example:

outlier(c(1:3, rnorm(1000, mean=10, sd=300)))
[1] 1

I think at least 1, 2 and 3 should be checked in this case. Any ideas on how to achieve this in R? Actually, the real data I will be investigating consist of vector norms and angles (in an attempt to identify either very short or very long vectors, or vectors pointing in an odd angle for the category to which each has been assigned), so a 2D method would be even better.

I would much appreciate any help I can get on this,

/Fredrik

--
Life is like a trumpet - if you don't put anything into it, you don't get anything out of it.
Re: [R] (no subject)
That's not how to do it. Please follow the link in the footer and the correct way to subscribe should become clear.

-pd

On Jan 25, 2012, at 21:49 , Paul Johnston wrote:
> subscribe

--
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com
Re: [R] Finding suspicious data points?
According to the help file for 'outlier':

(quoting)
x: a data sample, vector in most cases. If argument is a dataframe, then outlier is calculated for each column by sapply. The same behavior is applied by apply when the matrix is given.
(endquote)

Looks like you could create a matrix that looks like an upper triangular, like

1  1  1
NA 2  2
NA NA 3

and see the results. However, since 'outlier' just returns the value furthest from the mean, this doesn't really provide much information. If I were to write a function to find genuine outliers, I would do something like

x[abs(x - mean(x)) > 3*sd(x)]

thus returning all values more than 3 sigma from the mean.

> I would like to find data points that at least should be checked one more
> time before I process them further. I've had a look at the outliers
> package for this, and the outlier function in that package, but it
> appears to only return one value. An example:
>
> outlier(c(1:3, rnorm(1000, mean=10, sd=300)))
> [1] 1
>
> I think at least 1, 2 and 3 should be checked in this case. Any ideas on
> how to achieve this in R? Actually, the real data I will be investigating
> consist of vector norms and angles (in an attempt to identify either very
> short or very long vectors, or vectors pointing in an odd angle for the
> category to which each has been assigned), so a 2D method would be even
> better. I would much appreciate any help I can get on this,

--
Sent from my Cray XK6
Pendeo-navem mei anguillae plena est.
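The 3-sigma rule suggested above can be run as-is; here is a self-contained sketch on toy data (the planted extreme points are made up for illustration):

```r
## Flag values more than 3 standard deviations from the mean
set.seed(1)
x <- c(rnorm(1000), 8, -9)                 # standard normal noise plus two planted extremes
suspects <- x[abs(x - mean(x)) > 3 * sd(x)]
suspects                                   # values worth a second look
```

One caveat worth keeping in mind: mean() and sd() are themselves inflated by the outliers they are meant to detect, so with heavily contaminated data a robust variant (e.g. median and mad()) flags points more reliably.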
Re: [R] How do I use the cut function to assign specific cut points?
It is not valid to categorize BMI. This will result in major loss of information and residual confounding. Plus there is huge heterogeneity in the BMI >= 30 group. Details are at http://biostat.mc.vanderbilt.edu/CatContinuous and see these articles:

@Article{fil07cat,
  author  = {Filardo, Giovanni and Hamilton, Cody and Hamman, Baron and Ng, Hon K. T. and Grayburn, Paul},
  title   = {Categorizing {BMI} may lead to biased results in studies investigating in-hospital mortality after isolated {CABG}},
  journal = J Clin Epi,
  year    = 2007,
  volume  = 60,
  pages   = {1132-1139},
  annote  = {BMI; CABG; surgical adverse events; hospital mortality; epidemiology; smoothing methods; categorization; categorizing continuous variables; investigators should waive categorization entirely and use smoothed functions for continuous variables; examples of non-monotonic relationships}
}

@Article{roy06dic,
  author  = {Royston, Patrick and Altman, Douglas G. and Sauerbrei, Willi},
  title   = {Dichotomizing continuous predictors in multiple regression: a bad idea},
  journal = Stat in Med,
  year    = 2006,
  volume  = 25,
  pages   = {127-141},
  annote  = {continuous covariates; dichotomization; categorization; regression; efficiency; clinical research; residual confounding; destruction of statistical inference when cutpoints are chosen using the response variable; varying effect estimates when change cutpoints; difficult to interpret effects when dichotomize; nice plot showing effect of categorization; PBC data}
}

If you work with colleagues who tell you "this is the way it's done", don't go down without a fight. In general, good statistical practice dictates that categorization is only done for producing certain tables (for which case you might use the cut2 function in the Hmisc package). Even that will change as we incorporate more micrographics (think of loess plots with BMI on the x-axis) within table cells, as is now done in the Hmisc summary.formula function for purely categorical variables.
Frank

citadel wrote:
> I am new to R, and I am trying to cut a continuous variable, BMI, into
> different categories and can't figure out how to do it. I would like to
> cut it into four groups: <20, 20-25, 25-30 and >=30. I am having
> difficulty figuring out the code for <20 and >=30. Please help. Thank you.

-
Frank Harrell
Department of Biostatistics, Vanderbilt University
Re: [R] Extracting data from trellis object
tsippel wrote on 01/25/2012 01:23:05 PM:
> Simple question (I hope?). How does one extract the data from a trellis
> object? I need to get the data from a call to histogram(). A simple
> example below...
>
> dat <- data.frame(year=as.factor(round(runif(500, 1990, 2010))),
>                   x=rnorm(500, 75, 35),
>                   y=rep(seq(1,50,by=1), times=10))
> histogram(data=dat, y~x | year, type='count')
>
> How do I get the actual numbers plotted in the histograms from this call
> to histogram()? Thanks, Tim

hist.stuff <- histogram(data=dat, y~x | year, type='count')
lapply(hist.stuff, print)

Jean
[R] Conditional Random Fields
Are there other packages besides CRF available on R, for Conditional Random Fields? Thanks in advance.
[R] Request for help on manipulation large data sets
Dear All,

I would like to ask for help on how to read different files automatically and do analysis using scripts.

1. Description of the data
1.1. There are 5 text files, each of which contains cleaned data for the same 100 SNPs. Observations (e.g., position on genome, allele type, ...) for SNPs are rows ordered by the SNP numbers.
1.2. There is 1 text file containing the expression levels of mRNAs (and other information), which are rows ordered by mRNA numbers.

So for each SNP the sample size is 5.

2. Description of aim
Take SNP 1 and mRNA 1 for example. Write a script that can:
2.1 extract row 1 from each of the 5 text files for SNP data
2.2 extract row 1 from the mRNA text file
2.3 regress mRNA 1 counts on SNP 1
2.4 store regression results, SNP number, mRNA number and their positions on the genome

3. Key thing I need help with: automatic extraction of observations for one particular SNP from the SNP data files, since there are actually 400 SNP data files - or, more generally, extracting information automatically from different files and storing it properly for operation.

Thanks a lot!
Chee
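A possible starting point for steps 2.1-2.4 (a sketch only: the directory, file names, and column names below are all made up, and the files are assumed to be whitespace-separated tables with one row per SNP):

```r
## Read every SNP file into a list of data frames, plus the one mRNA file
snp.files <- list.files("snp_data", pattern = "\\.txt$", full.names = TRUE)
snp.list  <- lapply(snp.files, read.table, header = TRUE)  # one data frame per file
mrna      <- read.table("mrna.txt", header = TRUE)

results <- vector("list", nrow(mrna))
for (i in seq_len(nrow(mrna))) {
  ## row i of every SNP file = the observations for SNP i (sample size = number of files)
  snp.obs  <- sapply(snp.list, function(d) d[i, "genotype"])        # "genotype" is a guessed column name
  mrna.obs <- as.numeric(mrna[i, c("s1", "s2", "s3", "s4", "s5")])  # guessed sample columns
  fit <- lm(mrna.obs ~ snp.obs)
  results[[i]] <- c(snp = i, mrna = i, coef(fit))
}
res.table <- do.call(rbind, results)  # one stored row per SNP/mRNA pair
```

The same pattern scales from 5 to 400 files: only the contents of `snp.files` change, so keeping the file list in one variable is the part that makes the extraction automatic.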
Re: [R] R.lnk shortcut: Warning message when opening Microsoft Word 2003
On 12-01-25 3:46 PM, Grant Anderson wrote:
> Whenever I open Microsoft Word (2003), I get this warning two times---and then must manually 'OK' it twice before I can proceed in Word (a pain to do):
>
> The drive or network connection that the shortcut 'R.lnk' refers to is unavailable. Make sure that the disk is properly inserted or the network resource is available, and then try again.
>
> Perhaps I linked Word with R, somehow, when I recently installed R and Word on my new computer? Anyhow, since I don't want any Word-R interaction, how can I get rid of these two annoying warnings? I've already searched my entire hard drive for 'R.lnk' (using X1.com) and found only those R-links I understand and want.

This is clearly a Microsoft bug, so you should ask them.

Duncan Murdoch
Re: [R] R extracting regression coefficients from multiple regressions using lapply command
Michael Wither wrote on 01/26/2012 12:08:19 AM:
> Hi, I have a question about running multiple regressions in R and then
> storing the coefficients. I have a large dataset with several variables,
> one of which is a state variable, coded 1-50 for each state. I'd like to
> run a regression of 28 select variables on the remaining 27 variables of
> the dataset (there are 55 variables total), specific to each state, i.e.
> run a regression of variable1 on covariate1, covariate2, ..., covariate27
> for observations where state==1. I'd then like to repeat this for
> variable1 for states 2-50, and then repeat the whole process for
> variable2, variable3, ..., variable28. I think I've written the correct R
> code to do this, but the next thing I'd like to do is extract the
> coefficients, ideally into a coefficient matrix. Could someone please
> help me with this? Here's the code I've written so far; I'm not sure if
> this is the best way to do this. Please help me.
>
> for (num in 1:50) {
>   # PUF is the data set I'm using
>   # Subset the data by states
>   PUFnum <- subset(PUF, state==num)
>   # Attach data set with state specific data
>   attach(PUFnum)
>   # Run our prediction regression
>   # the variables class1 through e19700 are the 27 covariates I want to use
>   regression <- lapply(PUFnum, function(z) lm(z ~
>     class1+class2+class3+class4+class5+class6+class7+xtot+e00200+e00300
>     +e00600+e00900+e01000+p04470+e04800+e09600+e07180+e07220+e07260
>     +e06500+e10300+e59720+e11900+e18425+e18450+e18500+e19700))
>   Beta <- lapply(regression, function(d) coef(d))
>   detach(PUFnum)
> }
>
> Thanks,
> Mike

This should help you get started.
# You don't provide any sample data, so I made some up myself
nstates <- 5
nobs <- 30
nys <- 3
nxs <- 4
PUF <- data.frame(matrix(rnorm(nstates*nobs*(nys+nxs)), nrow=nstates*nobs,
  dimnames=list(NULL, c(paste("Y", 1:nys, sep=""), paste("X", 1:nxs, sep="")))))
PUF$state <- rep(1:nstates, nobs)
head(PUF)

# create a character vector of all your covariate names
# separated by a plus sign
# this will serve as the right half of your regression equations
covariates <- paste(names(PUF)[nys + (1:nxs)], collapse=" + ")

# create an empty array to be filled with coefficients
coefs <- array(NA, dim=c(nstates, nys, nxs+1))

# fill the array with coefficients
# this will work for you if the first 28 columns of your PUF
# data frame are the response variables
for(i in 1:nstates) {
  for(j in 1:nys) {
    coefs[i, j, ] <- lm(formula(paste(names(PUF)[j], covariates, sep=" ~ ")),
                        data=PUF[PUF$state==i, ])$coef
  }
}
coefs

Jean
[R] Inserting a character into a character string XXXX
Hello everyone,

I have a character vector of 24-hour time values in the format hm, without the delimiting ":". How can I insert the ":" immediately to the left of the second digit from the right?

mytimes <- scan(what="")
1457 1457 1310 1158 137 1855

Thanks!
Dan
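One way to do this (my sketch, not from the thread): anchor a regular expression at the end of the string and capture the final two digits, which works for both 3- and 4-digit times.

```r
## Insert ":" to the left of the second digit from the right
mytimes <- c("1457", "1457", "1310", "1158", "137", "1855")
sub("(\\d{2})$", ":\\1", mytimes)
# "14:57" "14:57" "13:10" "11:58" "1:37"  "18:55"
```

If the values should afterwards be treated as times rather than strings, strptime(mytimes, format = "%H%M") parses the original 4-digit form directly (3-digit values would first need a leading zero, e.g. via sprintf("%04s", mytimes)).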
[R] null distribution of binom.test p values
Dear R-help,

I must be missing something very obvious, but I am confused as to why the null distribution for p-values generated by binom.test() appears to be non-uniform. The histogram generated below has a trend towards values closer to 1 than 0. I expected it to be flat.

hist(sapply(1:1000, function(i, n=100)
  binom.test(sum(rnorm(n) > 0), n, p=0.5, alternative="two.sided")$p.value))

This trend is more pronounced for small n, and the distribution appears uniform for larger n, say n=1000. I had expected the distribution to be discrete for small n, but not skewed. Can anyone explain why?

Many thanks,
Chris.
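One way to see what is going on (an illustration, not from the thread): with a discrete test statistic only certain p-values are achievable, and for an exact test each achievable p-value p satisfies P(P <= p) <= p under the null, so the p-value is stochastically larger than uniform and mass piles up near 1. The exact null distribution for small n makes this visible:

```r
## Exact null distribution of the two-sided binomial test p-value, n = 10
n <- 10
k <- 0:n
pvals <- sapply(k, function(x) binom.test(x, n, p = 0.5)$p.value)
probs <- dbinom(k, n, 0.5)           # probability of each outcome under H0
tapply(probs, round(pvals, 4), sum)  # mass carried by each achievable p-value
```

For n = 10, the single outcome k = 5 already yields p = 1 with probability choose(10,5)/2^10, roughly a quarter of all mass, which is why the histogram leans toward 1; as n grows the achievable p-values become dense and the distribution approaches uniform.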
[R] eRm package - Rasch simulation
When I try to create a Rasch simulation of data using the sim.rasch function, I get more items than I intend.

#My code
library(eRm)
#Number of items
k <- 20
#Number of participants
n <- 100
#Create Rasch Data
#sim.rasch(persons, items, seed = NULL, cutpoint = "randomized")
r.simulation <- sim.rasch(n, k)

I end up with 20 participants and 100 items, but the instructions say otherwise (and I believe I have the most up-to-date version). If I reverse the n and k, sim.rasch(k, n), it works out as I intended: 100 participants and 20 items. Can someone explain what I'm missing? I've read the package instructions and didn't see what I was missing (inputting vectors instead of an integer).

James
[R] eRm - Rasch modeling - First item missing from estimation
I'm trying to kick off some of the rust and learn some of the R packages for Rasch modeling. When I tried using the eRm package, I get item difficulty estimates for all my items except the first (in terms of order) item.

#Begin code
library(eRm)
r.simulation <- sim.rasch(20, 100)
r.data <- r.simulation$items

#eRm results
erm.rasch <- RM(r.data)
names(erm.rasch)
erm.items <- erm.rasch$etapar
erm.items
#end code

All items except V1 (the first item in the list) show up.

James
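If I recall the eRm internals correctly (this is an assumption worth verifying against the package documentation, not something stated in the thread): `etapar` holds the basic parameters, which omit the first item because it is used to identify the scale, whereas `betapar` holds an easiness parameter for every item. A quick check along those lines:

```r
## Assumes erm.rasch from the code above
length(erm.rasch$etapar)   # expected: one fewer than the number of items
length(erm.rasch$betapar)  # expected: all items, including V1
erm.rasch$betapar          # item easiness parameters for every item
```

If that holds, nothing is lost: V1's difficulty is recoverable from `betapar` (note the sign convention - easiness is the negative of difficulty).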
[R] How do I compare a multiple staged response with multivariables to a Development Index?
Hi R-listeners,

I should add that I would like also to compare my field data to an index model. The index was created using the following script:

devel.index <- function(values, weights=c(1, 2, 3, 4, 5, 6)) {
  foo <- values*weights
  return(apply(foo, 1, sum) / apply(values, 1, sum))
}

Background: Surveyed turtle egg embryos have been categorized into 6 stages of development in the field. The stages in the field data are named "ST0", "ST1", "ST2", "ST3", "ST4", "Shells" in the data = data.to.analyze.

Questions:
1. What is the best way to analyze the field data on embryonic development of 6 stages?
2. Doing this while considering/testing the variables Veg, HTL, Aeventexhumed, Sector, Rayos, TotalEggs?
3. And then compare the results to a development index?

The goal is to determine hatching success in various areas of the beach, and to create a development index of these microenvironments. Seasonality would play a key role. Is this possible?

Many thanks!
Saludos,
Jean
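For what it's worth, a hypothetical call of that index on the stage counts (the column names are taken from the post; that `values` is laid out with one nest per row and one stage per column is my assumption):

```r
## Hypothetical usage of devel.index on the posted stage columns
stages <- data.to.analyze[, c("ST0", "ST1", "ST2", "ST3", "ST4", "Shells")]
devel.index(stages)

## Caveat: with one nest per row, element-wise `values*weights` recycles the
## six weights DOWN each column rather than across the six stage columns.
## To apply one weight per stage column, sweep() is the usual idiom:
## foo <- sweep(values, 2, weights, "*")
```

If the recycling caveat applies to your layout, the sweep() variant inside devel.index would be the fix; if your rows are stages rather than nests, the original element-wise product is fine.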
Re: [R] R-help Digest, Vol 107, Issue 25
Hi Rui, Thanks for your reply to my post. My code still has various shortcomings but at least now it is fully functional. It may be that, as I transition to using R, I'll have to live with some less than ideal code, at least at the outset. I'll just have to write and re-write my code as I improve. Appreciate your help. Paul

Message: 66 Date: Tue, 24 Jan 2012 09:54:57 -0800 (PST) From: Rui Barradas ruipbarra...@sapo.pt To: r-help@r-project.org Subject: Re: [R] Checking for invalid dates: Code works but needs improvement

Hello, Point 3 is very simple: instead of 'print' use 'cat'. Unlike 'print' it allows for several arguments and (very) simple formatting.

{ cat("Error: Invalid date values in", DateNames[[i]], "\n",
      TestDates[DateNames][[i]][TestDates$Invalid==1], "\n") }

Rui Barradas

Message: 53 Date: Tue, 24 Jan 2012 08:54:49 -0800 (PST) From: Paul Miller pjmiller...@yahoo.com To: r-help@r-project.org Subject: [R] Checking for invalid dates: Code works but needs improvement

Hello Everyone, Still new to R. Wrote some code that finds and prints invalid dates (see below). This code works but I suspect it's not very good. If someone could show me a better way, I'd greatly appreciate it. Here is some information about what I'm trying to accomplish. My sense is that the R date functions are best at identifying invalid dates when fed character data in their default format. So my code converts the input dates to character, breaks them apart using strsplit, and then reformats them. It then identifies which dates are missing in the sense that the month or year are unknown, and prints out any remaining invalid date values. As I see it, the code has at least 4 shortcomings. 1. It's too long. My understanding is that skilled programmers can usually complete tasks like this in a few lines. 2. It's not vectorized. I started out trying to do something that was vectorized but ran into problems with the strsplit function. I looked at the help file and it appears this function will only accept a single character vector. 3. It prints out the incorrect dates but doesn't indicate which date variable they belong to. I tried various things with paste but never came up with anything that worked. Ideally, I'd like to get something that looks roughly like:

Error: Invalid date values in birthDT
21931-11-23 1933-06-31
Error: Invalid date values in diagnosisDT
2010-02-30

4. There's no way to specify names for input and output data. I imagine it would be fairly easy to specify this in the arguments to a function but am not sure how to incorporate it into a for loop. Thanks, Paul

## Code for detecting invalid dates

## Test Data
connection <- textConnection("
1 11/23/21931 05/23/2009 un/17/2011
2 06/20/1940 02/30/2010 03/17/2011
3 06/17/1935 12/20/2008 07/un/2011
4 05/31/1937 01/18/2007 04/30/2011
5 06/31/1933 05/16/2009 11/20/un
")
TestDates <- data.frame(scan(connection,
  list(Patient=0, birthDT="", diagnosisDT="", metastaticDT="")))
close(connection)
TestDates
class(TestDates$birthDT)
class(TestDates$diagnosisDT)
class(TestDates$metastaticDT)

## List of Date Variables
DateNames <- c("birthDT", "diagnosisDT", "metastaticDT")

## Read Dates
for (i in seq(TestDates[DateNames])){
  TestDates[DateNames][[i]] <- as.character(TestDates[DateNames][[i]])
  TestDates$ParsedDT <- strsplit(TestDates[DateNames][[i]], "/")
  TestDates$Month <- sapply(TestDates$ParsedDT, function(x) x[1])
  TestDates$Day   <- sapply(TestDates$ParsedDT, function(x) x[2])
  TestDates$Year  <- sapply(TestDates$ParsedDT, function(x) x[3])
  TestDates$Day[TestDates$Day=="un"] <- 15
  TestDates[DateNames][[i]] <- with(TestDates, paste(Year, Month, Day, sep = "-"))
  is.na( TestDates[DateNames][[i]][TestDates$Month=="un"] ) <- TRUE
  is.na( TestDates[DateNames][[i]][TestDates$Year=="un"] ) <- TRUE
  TestDates$Date <- as.Date(TestDates[DateNames][[i]], format="%Y-%m-%d")
  TestDates$Invalid <- ifelse(is.na(TestDates$Date) & !is.na(TestDates[DateNames][[i]]), 1, 0)
  if( sum(TestDates$Invalid)==0 ) {
    TestDates[DateNames][[i]] <- TestDates$Date
  } else {
    print( TestDates[DateNames][[i]][TestDates$Invalid==1] )
  }
  TestDates <- subset(TestDates, select = -c(ParsedDT, Month, Day, Year, Date, Invalid))
}
TestDates
class(TestDates$birthDT)
class(TestDates$diagnosisDT)
class(TestDates$metastaticDT)
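For comparison, the whole check can be done in one vectorized pass per column. This is only a sketch (not Paul's final code): it assumes the same m/d/Y input and the "un" convention, and addresses points 1-3. As in Paul's version, whether a five-digit year like 11/23/21931 is caught depends on the platform's strptime handling of %Y.

```r
# Hypothetical helper: validate m/d/Y date columns. Unknown day -> 15th,
# unknown month or year -> NA; anything that still fails as.Date is
# reported together with its column name.
check.dates <- function(df, date.names) {
  for (nm in date.names) {
    x <- as.character(df[[nm]])
    x <- sub("/un/", "/15/", x)          # unknown day -> mid-month
    x[grepl("^un/|/un$", x)] <- NA       # unknown month or year -> missing
    d <- as.Date(x, format = "%m/%d/%Y")
    bad <- is.na(d) & !is.na(x)
    if (any(bad))
      cat("Error: Invalid date values in", nm, "\n", x[bad], "\n")
    df[[nm]] <- d
  }
  df
}
# usage with the TestDates data frame and DateNames from the post:
# checked <- check.dates(TestDates, DateNames)
```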
Re: [R] Inserting a character into a character string
Hello, Dan, you could probably use a combination of nchar(), substr() (or substring()) and paste(). Take a look at their online help pages. Hth -- Gerrit

Hello everyone, I have a character vector of 24 hour time values in the format hm without the delimiting ":". How can I insert the ":" immediately to the left of the second digit from the right?

mytimes <- scan(what="")
1457 1457 1310 1158 137 1855

Thanks! Dan
Re: [R] Inserting a character into a character string
sub("([[:digit:]]{2,2})$", ":\\1", mytimes)
[1] "14:57" "14:57" "13:10" "11:58" "1:37"  "18:55"

That will convert "05" to ":05" and will do nothing to "5". Pad with 0's before calling sub if that is required.

Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Dan Abner Sent: Thursday, January 26, 2012 6:50 AM To: r-help@r-project.org Subject: [R] Inserting a character into a character string

Hello everyone, I have a character vector of 24 hour time values in the format hm without the delimiting ":". How can I insert the ":" immediately to the left of the second digit from the right?

mytimes <- scan(what="")
1457 1457 1310 1158 137 1855

Thanks! Dan
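An alternative that handles Bill's padding caveat in one go (a sketch): zero-pad to four digits first, then split hours and minutes with backreferences.

```r
mytimes <- c("1457", "1457", "1310", "1158", "137", "1855")
padded <- sprintf("%04d", as.integer(mytimes))  # "137" -> "0137"
sub("(..)(..)$", "\\1:\\2", padded)             # "14:57" ... "01:37" ...
```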
[R] Checking for invalid dates: Code works but needs improvement
Sorry, sent this earlier but forgot to add an informative subject line. Am resending, in the hopes of getting further replies. My apologies. Hope this is OK. Paul
Re: [R] Coloring Canada provinces (package maps?)
Barry, thanks a lot! I was able to read in the Canadian data set from gadm:

library(raster)
# Finding the ISO3 code for Canada
getData('ISO3')  # Canada's code is "CAN"
# Reading in data at different levels
can0 <- getData('GADM', country="CAN", level=0)
can1 <- getData('GADM', country="CAN", level=1)
can2 <- getData('GADM', country="CAN", level=2)
class(can0)
str(can0)
class(can1)
str(can1)

Apologies for a novice question (I've never worked with maps and raster before): what is the way in raster to see what the geographic units are within each data set (can0, can1, can2)? And what function allows one to plot them and color them? Thanks a lot for the hints! Dimitri

On Wed, Jan 25, 2012 at 4:37 PM, Barry Rowlingson b.rowling...@lancaster.ac.uk wrote: On Wed, Jan 25, 2012 at 9:25 PM, Dimitri Liakhovitski dimitri.liakhovit...@gmail.com wrote: Dear R'ers, I am wondering what is the smallest geographic/territorial unit available for formatting in Canada. Provinces? I know that in the US it is the county, so that I can color US counties any way I want, for example:

### Example for coloring US counties
### Creating an ARTIFICIAL criterion for coloring US counties:
library(maps)

If you want to extend your skills beyond the maps package then you can plot anything for which you can get a shapefile, or other common geospatial data set, using the sp packages and friends such as maptools and rgdal. gadm has four levels of Canadian boundaries, at first glance - country, province (black), something smaller than province (blue) and then red which looks like urban divisions. The province upper-left doesn't seem to have any blue subdivisions, but that's possibly because there would be more subdivisions than people who actually live there. http://www.gadm.org/download Gadm also has a facility to download the data as .RData objects that can load straight into R. You might want to ask questions about spatial data programming on R-sig-geo or even on www.stackoverflow.com with the R tag. Barry

-- Dimitri Liakhovitski marketfusionanalytics.com
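To answer the follow-up directly, a sketch: the getData() objects are SpatialPolygonsDataFrames, so their attribute table and plotting methods come from sp. The GADM column name NAME_1 is an assumption here; verify it with names().

```r
library(raster)
library(sp)
can1 <- getData('GADM', country = "CAN", level = 1)
names(can1)       # attribute columns describing the units
can1$NAME_1       # e.g. the province names at level 1
plot(can1)        # base plot of the province boundaries
# spplot() colors polygons by an attribute column
can1$crit <- rnorm(nrow(can1))   # artificial coloring criterion
spplot(can1, "crit")
```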
Re: [R] null distribution of binom.test p values
I believe that what you are seeing is due to the discrete nature of the binomial test. When I run your code below I see the bar between 0.9 and 1.0 is about twice as tall as the bar between 0.0 and 0.1, but the bar between 0.8 and 0.9 is not there (height 0); if you average the top 2 bars (0.8-0.9 and 0.9-1.0) then the average height is similar to that of the lowest bar. The bar between 0.5 and 0.6 is also 0; if you average that one with the next 2 (0.6-0.7 and 0.7-0.8) then they are also similar to the bars near 0.

-- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Chris Wallace Sent: Thursday, January 26, 2012 5:44 AM To: r-help@r-project.org Subject: [R] null distribution of binom.test p values

Dear R-help, I must be missing something very obvious, but I am confused as to why the null distribution for p values generated by binom.test() appears to be non-uniform. The histogram generated below has a trend towards values closer to 1 than 0. I expected it to be flat.

hist(sapply(1:1000, function(i, n=100)
  binom.test(sum(rnorm(n) > 0), n, p=0.5, alternative="two")$p.value))

This trend is more pronounced for small n, and the distribution appears uniform for larger n, say n=1000. I had expected the distribution to be discrete for small n, but not skewed. Can anyone explain why? Many thanks, Chris.
[R] Error in ifelse(append, "a", "w") : (list) object cannot be coerced to type 'logical'
Hello, I will appreciate your help with the following. Running a script in R 2.14.1 under Windows Vista I get the following error message:

Error in ifelse(append, "a", "w") : (list) object cannot be coerced to type 'logical'

However, the very same script runs perfectly well under Ubuntu. My understanding is that this type of error is associated with writing data to disk. So the problem is likely caused by the following statements, which are inside a loop:

P.vector <- c(P1, P2, P3)
write(P.vector, file = "Type2_simul3_2012.txt", sep = "\t", append = T)

I have read that including "\" in such statements can be problematic, because it is an escape character in R, but trying sep = " " instead of "\t" did not solve the problem. Any help will be much appreciated. Thanks in advance. Juan A. Balbuena

-- Dr. Juan A. Balbuena, Marine Zoology Unit, Cavanilles Institute of Biodiversity and Evolutionary Biology, University of Valencia, http://www.uv.es/~balbuena, P.O. Box 22085, 46071 Valencia, Spain. e-mail: j.a.balbu...@uv.es, tel. +34 963 543 658, fax +34 963 543 733
[R] adding additional information to histogram
Hi, I am a beginner with R, and I think the answer to my question will seem obvious, but after searching and trying without success I've decided to post to the list. I am working with data loaded from a CSV file with these fields: order_id, item_value. As an order can have multiple items, an order_id may be present multiple times in the CSV. I managed to compute the total value and the number of items for each order:

oli <- read.csv("/tmp/order_line_items_data.csv", header=TRUE)
orders_values <- tapply(oli[[2]], oli[[1]], sum)
items_per_order <- tapply(oli[[2]], oli[[1]], length)

I then can display the histogram of the order values:

hist(orders_values, breaks=c(10*0:20, 800), xlim=c(0,200), prob=TRUE)

Now on this histogram, I would like to display the average number of items of the orders in each group (defined with the breaks). So for the bar of orders with value 0 to 10, I'd like to display the average number of items of these orders. Thanks in advance, Raph
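One way to do this (a sketch; the data frame below is simulated because the CSV isn't available): bin the orders with cut() using the same breaks as hist(), average items_per_order within each bin with tapply(), and write the averages above the bars with text().

```r
set.seed(1)  # hypothetical stand-in for the CSV data
oli <- data.frame(order_id = rep(1:200, sample(1:5, 200, replace = TRUE)))
oli$item_value <- round(runif(nrow(oli), 1, 40), 2)

orders_values   <- tapply(oli$item_value, oli$order_id, sum)
items_per_order <- tapply(oli$item_value, oli$order_id, length)

breaks <- c(10 * 0:20, 800)
h <- hist(orders_values, breaks = breaks, xlim = c(0, 200), prob = TRUE)

# same breaks as the histogram, so the bins line up with h$mids
bin <- cut(orders_values, breaks = breaks, include.lowest = TRUE)
avg_items <- tapply(items_per_order, bin, mean)

# label each bar with the bin's average item count (blank for empty bins)
text(h$mids, h$density,
     labels = ifelse(is.na(avg_items), "", round(avg_items, 1)),
     pos = 3, cex = 0.7)
```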
Re: [R] function for grouping
What if I don't need to store the combination results, I just want to get the combination result for a given row? E.g., for the 5 elements divided into 3 groups, if I give another parameter which is the row number of the results - in Petr's example, say I set the row number to 1 - then I get 1,2,3,1,1. So I need a systematic way of generating the combinations, but I don't need all the combinations in one go. Many thanks, yan
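If the enumeration only needs to be systematic (not necessarily in Petr's order), the i-th assignment of n elements to k groups can be computed directly by writing i-1 in base k, without materializing all k^n rows. A sketch (the function name is hypothetical; the ordering is plain base-k counting, so row 1 comes out as all 1s rather than Petr's 1,2,3,1,1):

```r
# Return the i-th of the k^n assignments of n elements to k groups,
# enumerated in base-k counting order.
nth.grouping <- function(i, n, k) {
  idx <- i - 1
  digits <- integer(n)
  for (j in n:1) {          # fill digits from the least significant end
    digits[j] <- idx %% k
    idx <- idx %/% k
  }
  digits + 1                # group labels 1..k
}
nth.grouping(1, 5, 3)       # 1 1 1 1 1
nth.grouping(2, 5, 3)       # 1 1 1 1 2
```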
Re: [R] Error in ifelse(append, "a", "w") : (list) object cannot be coerced to type 'logical'
On Jan 26, 2012, at 10:17 AM, Juan Antonio Balbuena wrote: Hello, I will appreciate your help with the following. Running a script in R 2.14.1 under Windows Vista I get the following error message:

Error in ifelse(append, "a", "w") : (list) object cannot be coerced to type 'logical'

However, the very same script runs perfectly well under Ubuntu. My understanding is that this type of error is associated with writing data to disk. So the problem is likely caused by the following statements, which are inside a loop:

P.vector <- c(P1, P2, P3)
write(P.vector, file = "Type2_simul3_2012.txt", sep = "\t", append = T)

I have read that including "\" in such statements can be problematic, because it is an escape character in R, but trying sep = " " instead of "\t" did not solve the problem.

Isn't it much more likely that your failure to use TRUE, and instead taking the lazy and failure-prone shortcut of using only T, is the root of the problem?

Any help will be much appreciated. Thanks in advance. Juan A. Balbuena
David Winsemius, MD West Hartford, CT
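David's point is that T is an ordinary binding while TRUE is a reserved word, so `append = T` silently picks up whatever T currently is. A minimal sketch of that failure mode (the rebinding of T is artificial, for illustration only):

```r
# TRUE can never be rebound; T is just a variable that defaults to TRUE
# and can be overwritten by any earlier code in the script.
T <- list(1)   # simulate a script that accidentally reuses T
res <- tryCatch(write(1:3, file = tempfile(), append = T),
                error = conditionMessage)
res            # an error about coercing a list to 'logical'
rm(T)          # remove the shadowing binding; T is TRUE again
```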
[R] lattice panels with grouped extra data as text?
I have a problem with including extra data in a lattice graphic. I am trying to put some calculated values in an extra column of a boxplot. A minimal example should show what I am trying to do:

foo <- data.frame(
  Treatment=rnorm(1:12, 2),
  Variant=c("A","A","B","C","D","C","B","D","D","B","B","A"),
  Szenario=c("Dry","Wet","Dry","Dry","Wet","Dry","Wet","Wet","Dry","Wet","Dry","Dry"),
  Area=c(sample(100, 12))
)
require(lattice)
require(latticeExtra)
bwplot(Treatment ~ Variant | Szenario, data=foo,
  panel = function(...) {
    # Marking the extra column yellow
    panel.xblocks(c(4.5, 5.0), c(rgb(255,255,0, alpha=127, max=255),
                                 rgb(255,255,0, alpha=127, max=255)))
    # Put in the extra values (instead of a string)
    panel.text(5, min(foo$Treatment), "sum of foo$Area per panel",
               srt=90, adj=c(0,0.5), cex=2)
    panel.bwplot(...)
  },
  # Create extra space for a column
  xlim=range(0.5, 5.5),
  scales = list(x = list(labels=c(NA,"A","B","C","D","")))
)

I would like to put summarized Area values (from foo$Area) in the yellow coloured columns of each panel. So afterwards there should be one sum in the first panel and another sum in the next panel (and so on, if more than two panels). I tried a little bit with groups and group.number but without success. Is there any easy solution for this? Any help is really appreciated. Rainer Hurling
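One possible route (a sketch, using the foo data frame from the post): lattice's which.packet() tells the panel function which conditioning level it is currently drawing, so the per-panel sum of Area can be looked up inside the panel function.

```r
library(lattice)
library(latticeExtra)
bwplot(Treatment ~ Variant | Szenario, data = foo,
       panel = function(...) {
         panel.xblocks(c(4.5, 5.0), rgb(255, 255, 0, alpha = 127, max = 255))
         # which.packet() gives the index of the current Szenario level
         lev <- sort(unique(as.character(foo$Szenario)))[which.packet()]
         panel.text(5, min(foo$Treatment),
                    paste("sum:", sum(foo$Area[foo$Szenario == lev])),
                    srt = 90, adj = c(0, 0.5))
         panel.bwplot(...)
       },
       xlim = c(0.5, 5.5),
       scales = list(x = list(labels = c(NA, "A", "B", "C", "D", ""))))
```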
Re: [R] List to Array: How to establish the dimension of the array
Please include the context of the discussion in your responses. See inline below.

On 1/24/2012 11:33 PM, Ajay Askoolum wrote: Thank you. I can get the length of aa with length(unlist(aa)). If aa has 4 dimensions, I imagine I'd need to do max(sapply(aa, sapply, sapply, length)).

Correct.

How can I do this in a generic way? That is, in a loop. I am clear about the exit condition for the loop.

By generic, I assume that you mean without knowing the depth of the lists (that is, dimensions) to begin with?

d <- 1
start loop
if d >= length(unlist(aa)) then exit loop

Note that length(unlist(aa)) will only equal the product of the dimensions if the data is regular. That is, there is an entry for every combination of indices (within the dimensions).

else d <- d * expr

How do I construct expr such that it does

d <- d * length(aa)                      # first pass
d <- d * max(sapply(aa, length))         # second pass
d <- d * max(sapply(aa, sapply, length)) # third pass
# ?                                      # fourth pass etc.

(Apologies for the pseudo code describing the loop; I am not near a machine with R.) I don't really understand this. One way I can think of is to create a loop counter variable, say lc <- 1, and to increment it within the loop and then use a switch statement to execute the appropriate expression. This seems like a kludge to me. Is there a neater way?

If you are trying to get back a vector of the dimensions, then this would work:

dimRecursiveList <- function(l) {
  if (class(l) == "list") {
    c(length(l), dimRecursiveList(l[[1]]))
  } else {
    NULL
  }
}

From previous context:

aa <- list(list(list(37531.52, 62787.32, 5503.184, 33832.8),
                list(20469.60, 27057.27, 51160.25, 45165.24),
                list(957.932, 21902.94, 37531.52, 62787.32)),
           list(list(5503.184, 33832.8, 20469.6, 27057.27),
                list(51160.25, 45165.24, 957.932, 21902.94),
                list(37531.52, 62787.32, 5503.184, 33832.8)))

Which then gives

dimRecursiveList(aa)
[1] 2 3 4

In this case, the data is regular, so

Reduce(`*`, dimRecursiveList(aa)) == length(unlist(aa))
[1] TRUE

If the data are not regular, and you want the dimension to be the largest, then it is more complicated (due to bookkeeping):

dimRecursiveListIrregular <- function(l) {
  if (class(l) == "list") {
    dims <- sapply(l, dimRecursiveListIrregular)
    if (is.null(dim(dims))) {
      if (all(is.na(dims))) {
        length(l)
      } else {
        c(length(l), max(dims, na.rm=TRUE))
      }
    } else {
      c(length(l), apply(dims, 1, max, na.rm=TRUE))
    }
  } else {
    NA
  }
}

If the data are regular, then it is better to convert this to an array. This function will do it for arbitrary depth (laply is from the plyr package):

library(plyr)
RecursiveListToArray <- function(l) {
  if (class(l) == "list") {
    laply(l, RecursiveListToArray)
  } else {
    l
  }
}
aaa <- RecursiveListToArray(aa)
dim(aaa)
[1] 2 3 4

-- Brian S. Diggs, PhD Senior Research Associate, Department of Surgery, Oregon Health & Science University
Re: [R] eRm package - Rasch simulation
On Thu, 26 Jan 2012, James Holland wrote: When I try to create a Rasch simulation of data using the sim.rasch function, I get more items than I intend.

#My code
library(eRm)
#Number of items
k <- 20
#Number of participants
n <- 100
#Create Rasch Data
#sim.rasch(persons, items, seed = NULL, cutpoint = "randomized")
r.simulation <- sim.rasch(n, k)

I end up with 20 participants and 100 items, but the instructions say otherwise (and I believe I have the most up to date version). I reverse the n and k

If I use your code above, I get

dim(r.simulation)
[1] 100  20

which is exactly as expected: 100 rows = persons and 20 columns = items. This is with the current CRAN version of eRm: 0.14-0. Z

sim.rasch(k, n); it works out to how I intended: 100 participants and 20 items. Can someone explain to me what I'm missing? I've read the package instructions and didn't see myself missing something (inputting vectors instead of an integer). James
Re: [R] eRm - Rasch modeling - First item missing from estimation
On Thu, 26 Jan 2012, James Holland wrote: I'm trying to kick off some of the rust and learn some of the R packages for Rasch modeling. When I tried using the eRm package, I get item difficulty estimates for all my items except the first (in terms of order) item.

#Begin code
library(eRm)
r.simulation <- sim.rasch(20, 100)
r.data <- r.simulation$items
#eRm results
erm.rasch <- RM(r.data)
names(erm.rasch)
erm.items <- erm.rasch$etapar
erm.items
#end code

All items except V1 (the first item in the list) show up.

As explained on the manual page ?RM, one of the parameters is not identified (as the Rasch scale is latent with arbitrary 0). Either you can restrict the sum of all item parameters to zero (default) or the first item parameter. In both cases, the print output of the model omits the first item parameter. But coef(erm.rasch) will give you the appropriate expanded parameter vector. See also vignette("eRm", package = "eRm") for more details. Z

James
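A sketch of Zeileis's point (written against eRm 0.14-0, where sim.rasch() returns the response matrix directly; in the version James ran, take r.simulation$items first):

```r
library(eRm)
sim.data <- sim.rasch(100, 20)   # 100 persons, 20 items
fit <- RM(sim.data)
length(fit$etapar)   # 19: one parameter dropped for identification
length(coef(fit))    # 20: the expanded item parameter vector
```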
Re: [R] Issues with PearsonIV distribution
Hi Michael, I am using the same version, contributed by Martin Becker. I have been in touch with him via email as well. He said yesterday that he can help me with it but is busy for some time. This distribution function does not behave properly, as can be seen in the attached spreadsheet. Check sheet RcodeOutput for Pearson distribution fits in comparison to hyperbolic and normal fits on the same sample data. Regards, Gaurang

-----Original Message----- From: R. Michael Weylandt [mailto:michael.weyla...@gmail.com] Sent: 25 January 2012 20:00 To: Jitendrabhai Gaurang (AGC) Cc: R-help@r-project.org Subject: Re: [R] Issues with PearsonIV distribution

Which implementation of Pearson IV are you using? If it's trouble with a homebrewed one, you might want to use the p/q functions here: http://cran.r-project.org/web/packages/PearsonDS/index.html Otherwise, you likely will have to share some code for any concrete help. Michael

On Wed, Jan 25, 2012 at 12:16 PM, gaurang.jitendrab...@aviva.com wrote: Hi team, I am facing issues with Pearson IV distribution fitting in R. I am applying hyperbolic and Pearson IV distributions to the equity returns in the UK over a period of 30 years. For the same data set I am getting strikingly different results, under which the hyperbolic distribution does produce negative percentiles of the return after fitting but the Pearson IV distribution does not. I think there is an issue with the Pearson IV distribution in R; also there are no plots available for the Pearson IV distribution. Can you help? I am ready to share my code.

Gaurang Mehta, Risk Calibration | Group Economic Capital | Aviva Group. Email: gaurang.jitendrab...@aviva.com, 14th Floor, St Helen's, 1 Undershaft, London, EC3P 3DQ
Re: [R] null distribution of binom.test p values
Greg, thanks for the reply. Unfortunately, I remain unconvinced! I ran a longer simulation, 100,000 reps. The size of the test is consistently too small (see below) and the histogram shows increasing bars even within the parts of the histogram with even bar spacing. See https://www-gene.cimr.cam.ac.uk/staff/wallace/hist.png y <- sapply(1:10^5, function(i, n=100) binom.test(sum(rnorm(n) > 0), n, p=0.5, alternative="two")$p.value) mean(y < 0.01) # [1] 0.00584 mean(y < 0.05) # [1] 0.03431 mean(y < 0.1) # [1] 0.08646 Can that really be due to the discreteness of the distribution? C. On 26/01/12 16:08, Greg Snow wrote: I believe that what you are seeing is due to the discrete nature of the binomial test. When I run your code below I see the bar between 0.9 and 1.0 is about twice as tall as the bar between 0.0 and 0.1, but the bar between 0.8 and 0.9 is not there (height 0); if you average the top 2 bars (0.8-0.9 and 0.9-1.0) then the average height is similar to that of the lowest bar. The bar between 0.5 and 0.6 is also 0; if you average that one with the next 2 (0.6-0.7 and 0.7-0.8) then they are also similar to the bars near 0.
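Chris's empirical rejection rates can be checked exactly: with n = 100 and a fair coin, sum the null probabilities of every outcome whose binom.test p-value falls at or below the nominal level. A small sketch in base R:

```r
n <- 100
k <- 0:n

# p-value of every possible outcome under the null
pvals <- sapply(k, function(x) binom.test(x, n, p = 0.5)$p.value)

# exact size of the test "reject when p <= alpha"
size <- function(alpha) sum(dbinom(k[pvals <= alpha], n, 0.5))
round(c(size(0.01), size(0.05), size(0.1)), 4)
# the attainable sizes sit below the nominal levels, matching the
# simulated 0.00584, 0.03431 and 0.08646 up to Monte Carlo error
```

So yes: the undershoot in the simulation is exactly the discreteness Greg describes, not a bug in binom.test.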
[R] (no subject)
Dear Rhelp, When I try to plot a barplot, I get the following message: Error in plot.new() : figure margins too large What does this mean and how can I fix it? I appreciate your help, Horatio Gates
Re: [R] null distribution of binom.test p values
Yes that is due to the discreteness of the distribution, consider the following: binom.test(39,100,.5) Exact binomial test data: 39 and 100 number of successes = 39, number of trials = 100, p-value = 0.0352 alternative hypothesis: true probability of success is not equal to 0.5 95 percent confidence interval: 0.2940104 0.4926855 sample estimates: probability of success 0.39 binom.test(40,100,.5) Exact binomial test data: 40 and 100 number of successes = 40, number of trials = 100, p-value = 0.05689 alternative hypothesis: true probability of success is not equal to 0.5 95 percent confidence interval: 0.3032948 0.5027908 sample estimates: probability of success 0.4 (you can do the same for 60 and 61) So notice that the probability of getting 39 or more extreme is 0.0352, but anything less extreme will result in not rejecting the null hypothesis (because the probability of getting a 40 or a 60 (dbinom(40,100,.5)) is about 1% each, so we see a 2% jump there). So the size/probability of a type I error will generally not be equal to alpha unless n is huge or alpha is chosen to correspond to a jump in the distribution rather than using common round values. I have seen suggestions that instead of the standard test we use a test that rejects the null for values 39 and more extreme, don't reject the null for 41 and less extreme, and if you see a 40 or 60 then you generate a uniform random number and reject if it is below a certain value (that value chosen to give an overall probability of type I error of 0.05). This will correctly size the test, but becomes less reproducible (and makes clients nervous when they present their data and you pull out a coin, flip it, and tell them if they have significant results based on your coin flip (or more realistically a die roll)). 
I think it is better in this case, if you know your final sample size is going to be 100, to explicitly state that alpha will be 0.0352 (but then you need to justify to reviewers why you are not using the common 0.05). -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111 -Original Message- From: Chris Wallace [mailto:chris.wall...@cimr.cam.ac.uk] Sent: Thursday, January 26, 2012 9:36 AM To: Greg Snow Cc: r-help@r-project.org Subject: Re: [R] null distribution of binom.test p values Greg, thanks for the reply. Unfortunately, I remain unconvinced! I ran a longer simulation, 100,000 reps. The size of the test is consistently too small (see below) and the histogram shows increasing bars even within the parts of the histogram with even bar spacing. See https://www-gene.cimr.cam.ac.uk/staff/wallace/hist.png y <- sapply(1:10^5, function(i, n=100) binom.test(sum(rnorm(n) > 0), n, p=0.5, alternative="two")$p.value) mean(y < 0.01) # [1] 0.00584 mean(y < 0.05) # [1] 0.03431 mean(y < 0.1) # [1] 0.08646 Can that really be due to the discreteness of the distribution? C. On 26/01/12 16:08, Greg Snow wrote: I believe that what you are seeing is due to the discrete nature of the binomial test. When I run your code below I see the bar between 0.9 and 1.0 is about twice as tall as the bar between 0.0 and 0.1, but the bar between 0.8 and 0.9 is not there (height 0); if you average the top 2 bars (0.8-0.9 and 0.9-1.0) then the average height is similar to that of the lowest bar. The bar between 0.5 and 0.6 is also 0; if you average that one with the next 2 (0.6-0.7 and 0.7-0.8) then they are also similar to the bars near 0.
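The randomized test Greg describes can be written down directly for n = 100: reject outright for 39/61 or more extreme, never reject for 41-59, and reject with probability gamma when exactly 40 or 60 is observed, with gamma chosen so the overall size comes out at exactly 0.05. A sketch (the cutoffs follow Greg's numbers; the variable names are invented):

```r
n <- 100
# always reject: 39 or fewer, 61 or more successes
p.outer    <- pbinom(39, n, 0.5) + (1 - pbinom(60, n, 0.5))
# boundary outcomes 40 and 60: reject only with probability gamma
p.boundary <- dbinom(40, n, 0.5) + dbinom(60, n, 0.5)

gamma <- (0.05 - p.outer) / p.boundary
size  <- p.outer + gamma * p.boundary
c(gamma = round(gamma, 3), size = size)  # size is exactly 0.05 by construction
```

This shows why the procedure makes clients nervous: a nontrivial fraction of boundary results are decided by the auxiliary randomization, not the data.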
Re: [R] Inserting a character into a character string XXXX
Hi Bill, Thanks very much for your response. Can you suggest an approach for the pre-padding? Here is a more representative sample of the values: mytimes <- scan(what="") 1334 2310 39 2300 1556 3 404 37 1320 4 211 2320 Thanks! Dan On Thu, Jan 26, 2012 at 10:41 AM, William Dunlap wdun...@tibco.com wrote: sub("([[:digit:]]{2,2})$", ":\\1", mytimes) [1] "14:57" "14:57" "13:10" "11:58" "1:37" "18:55" That will convert "05" to ":05" and will do nothing to "5". Pad with 0's before calling sub if that is required. Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Dan Abner Sent: Thursday, January 26, 2012 6:50 AM To: r-help@r-project.org Subject: [R] Inserting a character into a character string Hello everyone, I have a character vector of 24 hour time values in the format hm without the delimiting ":". How can I insert the ":" immediately to the left of the second digit from the right? mytimes <- scan(what="") 1457 1457 1310 1158 137 1855 Thanks! Dan
Re: [R] Coloring Canada provinces (package maps?)
On Thu, Jan 26, 2012 at 4:03 PM, Dimitri Liakhovitski dimitri.liakhovit...@gmail.com wrote: Barry, thanks a lot! I was able to read in the Canadian data set from gadm: library(raster) # Finding ISO3 code for Canada getData('ISO3') # Canada's code is "CAN" # Reading in data at different levels can0 <- getData('GADM', country="CAN", level=0) can1 <- getData('GADM', country="CAN", level=1) can2 <- getData('GADM', country="CAN", level=2) class(can0) str(can0) class(can1) str(can1) Apologies for a novice question (I've never worked with maps and raster before): what is the way in raster to see what the geographic units are within each data set (can0, can1, can2)? And what function allows one to plot them and color them? These are now SpatialPolygonsDataFrame objects - like data frames, but each row has an associated polygonal geometry. These actually come from the sp package, which the raster package has loaded for you. class(can0) will tell you this. names(can1) will tell you the names of the attributes of the polygons, which you can use like columns of a data frame. plot(can1) will plot it; spplot(can1, "columnname") will do a coloured plot. For more info, check the help for sp or read the R Spatial Task View on CRAN, which has everything you need to know about maps and such in R. Ask any more questions like this on the R-sig-geo mailing list, where the mapping R people hang out... Barry
Re: [R] (no subject)
I usually get that error when I'm replotting on a window/device that's already really small: e.g., on my Mac plot(1:5) ## Make window really small plot(1:6) # Throws error Michael On Thu, Jan 26, 2012 at 11:32 AM, Harish Eswaran hra...@att.net wrote: Dear Rhelp, When I try to plot a barplot, I get the following message: Error in plot.new() : figure margins too large What does this mean and how can I fix it? I appreciate your help, Horatio Gates
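The error means the device is too small for the current figure margins, leaving plot.new() no room for the plot region. Enlarging the window (or opening a fresh device) fixes it, as does shrinking the margins with par(mar=). A minimal sketch, written to a throwaway PDF device so it runs anywhere:

```r
f <- tempfile(fileext = ".pdf")
pdf(f, width = 7, height = 5)   # a device with enough room
par(mar = c(4, 4, 2, 1))        # margins in lines: bottom, left, top, right
barplot(c(3, 1, 4, 1, 5), main = "fits once the device has room")
dev.off()
```

Interactively, simply resizing the plot window larger before replotting is usually enough.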
Re: [R] aplpy recursive function on a list
On 1/25/2012 10:09 AM, patzoul wrote: I have 2 series of data a, b and I would like to calculate a new series which is z[t] = z[t-1]*a[t] + b[t], z[1] = b[1]. How can I do that without using a loop? A combination of Reduce and Map will work: Map to stitch together the a and b vectors, Reduce to step along them, allowing for accumulation. a <- c(2,4,3,5) b <- c(1,3,5,7) z <- Reduce(function(zm1, coef) {coef[1] * zm1 + coef[2]}, Map(c, a, b), init = b[1], accumulate = TRUE) z [1] 1 3 15 50 257 -- Brian S. Diggs, PhD Senior Research Associate, Department of Surgery Oregon Health & Science University
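One subtlety worth noting: the stated recurrence starts at t = 2 (z[1] = b[1] is the seed), while the snippet above also folds a[1] and b[1] into the init, producing a longer, different series. Feeding Reduce only the tails a[-1] and b[-1] reproduces the recurrence exactly; a sketch checking it against an explicit loop:

```r
a <- c(2, 4, 3, 5)
b <- c(1, 3, 5, 7)

# Reference: explicit loop for z[1] = b[1], z[t] = z[t-1]*a[t] + b[t]
z.loop <- numeric(length(a))
z.loop[1] <- b[1]
for (t in 2:length(a)) z.loop[t] <- z.loop[t - 1] * a[t] + b[t]

# Loop-free: fold over t = 2..n only, so a[1], b[1] are not applied twice
z.red <- unlist(Reduce(function(zm1, coef) coef[1] * zm1 + coef[2],
                       Map(c, a[-1], b[-1]), init = b[1], accumulate = TRUE))
identical(z.loop, z.red)   # TRUE; both give 1 7 26 137
```

Which variant is wanted depends on whether a[1] and b[1] should act on the seed; for the recurrence as written, they should not.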
Re: [R] Inserting a character into a character string XXXX
One way to pad with initial zeros is to convert your strings to integers and format the integers: sprintf("%04d", as.integer(mytimes)) [1] "1334" "2310" "0039" "2300" "1556" "0003" "0404" [8] "0037" "1320" "0004" "0211" "2320" It has the added benefit that the call to as.integer will warn you if any supposed integers don't look like integers. If you use this approach you don't need the call to sub(): tmp <- as.integer(mytimes) sprintf("%02d:%02d", tmp %/% 100, tmp %% 100) [1] "13:34" "23:10" "00:39" "23:00" "15:56" "00:03" [7] "04:04" "00:37" "13:20" "00:04" "02:11" "23:20" You could also check that tmp %/% 100 (the hour) and tmp %% 100 (the minute) are in their expected ranges before calling sprintf. Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com -Original Message- From: Dan Abner [mailto:dan.abne...@gmail.com] Sent: Thursday, January 26, 2012 8:55 AM To: William Dunlap Cc: r-help@r-project.org Subject: Re: [R] Inserting a character into a character string Hi Bill, Thanks very much for your response. Can you suggest an approach for the pre-padding? Here is a more representative sample of the values: mytimes <- scan(what="") 1334 2310 39 2300 1556 3 404 37 1320 4 211 2320 Thanks! Dan On Thu, Jan 26, 2012 at 10:41 AM, William Dunlap wdun...@tibco.com wrote: sub("([[:digit:]]{2,2})$", ":\\1", mytimes) [1] "14:57" "14:57" "13:10" "11:58" "1:37" "18:55" That will convert "05" to ":05" and will do nothing to "5". Pad with 0's before calling sub if that is required. Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Dan Abner Sent: Thursday, January 26, 2012 6:50 AM To: r-help@r-project.org Subject: [R] Inserting a character into a character string Hello everyone, I have a character vector of 24 hour time values in the format hm without the delimiting ":". How can I insert the ":" immediately to the left of the second digit from the right? mytimes <- scan(what="") 1457 1457 1310 1158 137 1855 Thanks! Dan
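Once the values are zero-padded, they can also be parsed into real time objects instead of being reassembled by hand, which rejects impossible values like 2575 for free. A sketch on the sample values, base R only:

```r
mytimes <- c("1334", "2310", "39", "2300", "1556", "3")
padded  <- sprintf("%04d", as.integer(mytimes))

# strptime returns NA for anything that is not a valid clock time
tm <- strptime(padded, format = "%H%M")
format(tm, "%H:%M")
# [1] "13:34" "23:10" "00:39" "23:00" "15:56" "00:03"
```

The range checks Bill mentions (hour 0-23, minute 0-59) then come down to checking for NA in tm.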
Re: [R] How do I compare a multiple staged response with multivariables to a Development Index?
This might get more traction on the R-SIG-Ecology list. And best of luck to you; I quite like turtles. Michael On Thu, Jan 26, 2012 at 4:37 AM, Jhope jeanwaij...@gmail.com wrote: Hi R-listeners, I should add that I would like also to compare my field data to an index model. The index was created by using the following script: devel.index <- function(values, weights=c(1, 2, 3, 4, 5, 6)) { foo <- values*weights return(apply(foo, 1, sum) / apply(values, 1, sum)) } Background: Surveyed turtle egg embryos have been categorized into 6 stages of development in the field. The stages in the field data are named ST0, ST1, ST2, ST3, ST4, Shells, from the data = data.to.analyze. Q? 1. What is the best way to analyze the field data on embryonic development of 6 stages? 2. Doing this while considering, testing the variables: Veg, HTL, Aeventexhumed, Sector, Rayos, TotalEggs? 3. And then compare the results to a development index. The goal is to determine hatching success in various areas of the beach, and to try to create a development index of these microenvironments. Seasonality would play a key role. Is this possible? Many thanks! Saludos, Jean
Re: [R] Coloring Canada provinces (package maps?)
Thank you very much, Barry! Dimitri On Thu, Jan 26, 2012 at 12:00 PM, Barry Rowlingson b.rowling...@lancaster.ac.uk wrote: On Thu, Jan 26, 2012 at 4:03 PM, Dimitri Liakhovitski dimitri.liakhovit...@gmail.com wrote: Barry, thanks a lot! I was able to read in the Canadian data set from gadm: library(raster) # Finding ISO3 code for Canada getData('ISO3') # Canada's code is "CAN" # Reading in data at different levels can0 <- getData('GADM', country="CAN", level=0) can1 <- getData('GADM', country="CAN", level=1) can2 <- getData('GADM', country="CAN", level=2) class(can0) str(can0) class(can1) str(can1) Apologies for a novice question (I've never worked with maps and raster before): what is the way in raster to see what the geographic units are within each data set (can0, can1, can2)? And what function allows one to plot them and color them? These are now SpatialPolygonsDataFrame objects - like data frames, but each row has an associated polygonal geometry. These actually come from the sp package, which the raster package has loaded for you. class(can0) will tell you this. names(can1) will tell you the names of the attributes of the polygons, which you can use like columns of a data frame. plot(can1) will plot it; spplot(can1, "columnname") will do a coloured plot. For more info, check the help for sp or read the R Spatial Task View on CRAN, which has everything you need to know about maps and such in R. Ask any more questions like this on the R-sig-geo mailing list, where the mapping R people hang out... Barry -- Dimitri Liakhovitski marketfusionanalytics.com
Re: [R] How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?
To pretend that AIC solves this problem is to ignore that AIC is just a restatement of P-values. Frank Rubén Roa wrote: I think we have gone through this before. First, the destruction of all aspects of statistical inference is not at stake, Frank Harrell's post notwithstanding. Second, checking all pairs is a way to see, for _all pairs_, which model outcompetes which in terms of predictive ability by 2 AIC units or more. Just sorting them by AIC does not give you that if some model is better than the next best by less than 2 AIC. Third, I was not implying that AIC differences play the role of significance tests. I agree with you that model selection is better not understood as a proxy or as a relative of significance test procedures. Incidentally, when comparing many models the AIC is often inconclusive. If one is bent on selecting just _the model_ then I check numerical optimization diagnostics such as the size of gradients, KKT criteria, and other issues such as standard errors of parameter estimates and the correlation matrix of parameter estimates. For some reason I don't find model averaging appealing. I guess deep in my heart I expect more from my model than just the best predictive ability. Rubén -Original Message- From: r-help-bounces@ [mailto:r-help-bounces@] On behalf of Ben Bolker Sent: Wednesday, 25 January 2012 15:41 To: r-help@.ethz Subject: Re: [R] How do I compare 47 GLM models with 1 to 5 interactions and unique combinations? Rubén Roa rroa at azti.es writes: A more 'manual' way to do it is this. 
If you have all your models named properly and well organized, say Model1, Model2, ..., Model47, with a slot with the AIC (glm produces a list with one component named 'aic'), then something like this: x <- matrix(0, 1081, 3) x[,1:2] <- t(combn(47, 2)) for(i in 1:n){x[,3] <- abs(unlist(combn(eval(as.name(paste("Model",i,sep="")))$aic, 2)[1,]) - unlist(combn(eval(as.name(paste("Model",i,sep="")))$aic, 2)[2,]))} will calculate all the 1081 AIC differences between paired comparisons of the 47 models. It may not be pretty, but it works for me. An AIC difference equal to or less than 2 is a tie; anything higher is evidence for the model with the lower AIC (Sakamoto et al., Akaike Information Criterion Statistics, KTK Scientific Publishers, Tokyo). I wouldn't go quite as far as Frank Harrell did in his answer, but (1) it seems silly to do all pairwise comparisons of models; one of the big advantages of information-theoretic approaches is that you can just list the models in order of AIC ... (2) the dredge() function from the MuMIn package may be useful for automating this sort of thing. There is also an AICtab function in the emdbook package. (3) If you're really just interested in picking the single model with the best expected predictive capability (and all of the other assumptions of the AIC approach are met -- very large data set, good fit to the data, etc.), then just picking the model with the best AIC will work. It's when you start to make inferences on the individual parameters within that model, without taking account of the model selection process, that you run into trouble. If you are really concerned about good predictions then it may be better to do multi-model averaging *or* use some form of shrinkage estimator. (4) The "delta-AIC < 2 means pretty much the same" rule of thumb is fine, but it drives me nuts when people use it as a quasi-significance-testing rule, as in "the simpler model is adequate if dAIC < 2". 
If you're going to work in the model selection arena, then don't follow hypothesis-testing procedures! A smaller AIC means lower expected K-L distance, period. For the record, Brian Ripley has often expressed the (minority) opinion that AIC is *not* appropriate for comparing non-nested models (e.g. <http://tolstoy.newcastle.edu.au/R/help/06/02/21794.html>). -Original Message- From: r-help-bounces at r-project.org on behalf of Milan Bouchet-Valat Sent: Wed 1/25/2012 10:32 AM To: Jhope Cc: r-help at r-project.org Subject: Re: [R] How do I compare 47 GLM models with 1 to 5 interactions and unique combinations? On Tuesday, 24 January 2012 at 20:41 -0800, Jhope wrote: Hi R-listers, I have developed 47 GLM models with different combinations of interactions, from 1 variable to 5 variables. I have manually made each model separately and put them into individual tables (organized by the number of variables) showing the AIC score. I want to compare all of these models. 1) What is the best way to compare various models with unique combinations and different numbers of variables? See ?step or ?stepAIC (from package MASS) if you want an automated way of doing this. 2) I am trying to develop the simplest model ideally. Even though adding another variable would lower the AIC, No, not necessarily. how do I interpret it is worth
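For the pairwise AIC bookkeeping discussed above, base R's outer() gives all differences at once without assembling a combn() loop. A sketch with made-up model names and AIC values:

```r
aics <- c(Model1 = 210.3, Model2 = 211.9, Model3 = 215.0)

# |AIC_i - AIC_j| for every pair of models
delta <- abs(outer(aics, aics, "-"))
delta

# TRUE marks a "tie" under the delta-AIC <= 2 rule of thumb
delta <= 2

# the usual one-column summary: delta-AIC relative to the best model
sort(aics) - min(aics)
```

The last line is the ordering Ben recommends; the full matrix is only needed if, like Rubén, you specifically want every pairwise comparison.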
Re: [R] function for grouping
On Thu, Jan 26, 2012 at 08:29:22AM -0800, yan wrote: what if I don't need to store the combination results, I just want to get the combination result for a given row. e.g. for the 5 elements divided into 3 groups, and if I give another parameter which is the row number of the results, in Petr's example, say if I set row number to 1, then I get 1,2,3,1,1. So I need a systematic way of generating the combination, but I don't need all the combinations in one go. Many thanks Hi. I suppose that your question is meant for some larger example than 5 elements into 3 groups, since for this small case, we can store the whole table. Considering, for example, 200 elements and 3 groups, which you mentioned previously, this can probably be done using some implementation of large integers. The reason is that the final table has approximately 3^200/6 rows. So, the index of the last row has approx log2(3^200/6) = 314.4 bits. The standard numbers have 53 bits. For large integers, one can use packages gmp, Rmpfr or some package for symbolic algebra, for example Ryacas. Do you want to use a large integer m for a search of exactly the m-th partitioning in lexicographic order? Let me point out that sampling random partitionings can be done much more easily. Petr.
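The row counts Petr mentions are counts of set partitions: the number of ways to split n labelled elements into exactly k non-empty groups is the Stirling number of the second kind S(n, k), and for k = 3 it grows like 3^n/6, which is where the ~3^200/6 figure comes from. A sketch via the standard recurrence S(n, k) = k*S(n-1, k) + S(n-1, k-1):

```r
stirling2 <- function(n, k) {
  S <- matrix(0, n + 1, k + 1)   # S[i+1, j+1] holds S(i, j) (1-based indexing)
  S[1, 1] <- 1                   # S(0, 0) = 1
  for (i in 1:n)
    for (j in 1:min(i, k))
      S[i + 1, j + 1] <- j * S[i, j + 1] + S[i, j]
  S[n + 1, k + 1]
}

stirling2(5, 3)   # 25 ways to split 5 elements into 3 non-empty groups
stirling2(10, 3) == (3^10 - 3 * 2^10 + 3) / 6   # closed form for k = 3
```

For n = 200 these counts overflow doubles, which is exactly why Petr points to gmp or Rmpfr for indexing the m-th partition.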
Re: [R] Inserting a character into a character string XXXX
Cheers Bill!! On Thu, Jan 26, 2012 at 12:03 PM, William Dunlap wdun...@tibco.com wrote: One way to pad with initial zeros is to convert your strings to integers and format the integers: sprintf("%04d", as.integer(mytimes)) [1] "1334" "2310" "0039" "2300" "1556" "0003" "0404" [8] "0037" "1320" "0004" "0211" "2320" It has the added benefit that the call to as.integer will warn you if any supposed integers don't look like integers. If you use this approach you don't need the call to sub(): tmp <- as.integer(mytimes) sprintf("%02d:%02d", tmp %/% 100, tmp %% 100) [1] "13:34" "23:10" "00:39" "23:00" "15:56" "00:03" [7] "04:04" "00:37" "13:20" "00:04" "02:11" "23:20" You could also check that tmp %/% 100 (the hour) and tmp %% 100 (the minute) are in their expected ranges before calling sprintf. Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com -Original Message- From: Dan Abner [mailto:dan.abne...@gmail.com] Sent: Thursday, January 26, 2012 8:55 AM To: William Dunlap Cc: r-help@r-project.org Subject: Re: [R] Inserting a character into a character string Hi Bill, Thanks very much for your response. Can you suggest an approach for the pre-padding? Here is a more representative sample of the values: mytimes <- scan(what="") 1334 2310 39 2300 1556 3 404 37 1320 4 211 2320 Thanks! Dan On Thu, Jan 26, 2012 at 10:41 AM, William Dunlap wdun...@tibco.com wrote: sub("([[:digit:]]{2,2})$", ":\\1", mytimes) [1] "14:57" "14:57" "13:10" "11:58" "1:37" "18:55" That will convert "05" to ":05" and will do nothing to "5". Pad with 0's before calling sub if that is required. Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Dan Abner Sent: Thursday, January 26, 2012 6:50 AM To: r-help@r-project.org Subject: [R] Inserting a character into a character string Hello everyone, I have a character vector of 24 hour time values in the format hm without the delimiting ":". How can I insert the ":" immediately to the left of the second digit from the right? mytimes <- scan(what="") 1457 1457 1310 1158 137 1855 Thanks! Dan
Re: [R] Checking for invalid dates: Code works but needs improvement
Paul, I have a partial solution for you. It is partial in that I have not quite figured out the correct incantation to convert a 5-digit year (e.g. 11/23/21931) properly using the R date functions. According to various sources (e.g. man strptime and man strftime) as well as the R help for both functions, there are extended formats available, but I am having a bout of cerebral flatulence in getting them to work correctly and a search has not been fruitful. Perhaps someone else can offer some insights. That being said, with the exception of correctly handling that one situation, which arguably IS a valid date a long time in the future and which would otherwise result in a truncated year (first four digits only): as.Date("11/23/21931", format = "%m/%d/%Y") [1] "2193-11-23" Here is one approach: # Check the date. If as.Date() fails or the input is longer than 10 characters, return it checkDate <- function(x) as.character(x[is.na(as.Date(x, format = "%m/%d/%Y")) | nchar(as.character(x)) > 10]) lapply(TestDates[, -1], checkDate) $birthDT [1] "11/23/21931" "06/31/1933" $diagnosisDT [1] "02/30/2010" $metastaticDT [1] "un/17/2011" "07/un/2011" "11/20/un" You could fine-tune the checkDate() function to handle other formats, etc. HTH, Marc Schwartz On Jan 26, 2012, at 9:54 AM, Paul Miller wrote: Sorry, sent this earlier but forgot to add an informative subject line. Am resending, in the hopes of getting further replies. My apologies. Hope this is OK. Paul Hi Rui, Thanks for your reply to my post. My code still has various shortcomings, but at least now it is fully functional. It may be that, as I transition to using R, I'll have to live with some less-than-ideal code, at least at the outset. I'll just have to write and re-write my code as I improve. Appreciate your help. 
Paul Message: 66 Date: Tue, 24 Jan 2012 09:54:57 -0800 (PST) From: Rui Barradas ruipbarra...@sapo.pt To: r-help@r-project.org Subject: Re: [R] Checking for invalid dates: Code works but needs improvement Message-ID: 1327427697928-4324533.p...@n4.nabble.com Content-Type: text/plain; charset=us-ascii Hello, Point 3 is very simple: instead of 'print' use 'cat'. Unlike 'print' it allows for several arguments and (very) simple formatting. { cat("Error: Invalid date values in", DateNames[[i]], "\n", TestDates[DateNames][[i]][TestDates$Invalid==1], "\n") } Rui Barradas Message: 53 Date: Tue, 24 Jan 2012 08:54:49 -0800 (PST) From: Paul Miller pjmiller...@yahoo.com To: r-help@r-project.org Subject: [R] Checking for invalid dates: Code works but needs improvement Message-ID: 1327424089.1149.yahoomailclas...@web161604.mail.bf1.yahoo.com Content-Type: text/plain; charset=us-ascii Hello Everyone, Still new to R. Wrote some code that finds and prints invalid dates (see below). This code works but I suspect it's not very good. If someone could show me a better way, I'd greatly appreciate it. Here is some information about what I'm trying to accomplish. My sense is that the R date functions are best at identifying invalid dates when fed character data in their default format. So my code converts the input dates to character, breaks them apart using strsplit, and then reformats them. It then identifies which dates are missing in the sense that the month or year are unknown and prints out any remaining invalid date values. As I see it, the code has at least 4 shortcomings. 1. It's too long. My understanding is that skilled programmers can usually or often complete tasks like this in a few lines. 2. It's not vectorized. I started out trying to do something that was vectorized but ran into problems with the strsplit function. I looked at the help file and it appears this function will only accept a single character vector. 3. 
It prints out the incorrect dates but doesn't indicate which date variable they belong to. I tried various things with paste but never came up with anything that worked. Ideally, I'd like to get something that looks roughly like: Error: Invalid date values in birthDT 21931-11-23 1933-06-31 Error: Invalid date values in diagnosisDT 2010-02-30 4. There's no way to specify names for input and output data. I imagine it would be fairly easy to specify this in the arguments to a function, but am not sure how to incorporate it into a for loop. Thanks, Paul ## Code for detecting invalid dates ## Test Data connection <- textConnection(" 1 11/23/21931 05/23/2009 un/17/2011 2 06/20/1940 02/30/2010 03/17/2011 3 06/17/1935 12/20/2008 07/un/2011 4 05/31/1937 01/18/2007 04/30/2011 5 06/31/1933 05/16/2009 11/20/un ") TestDates <- data.frame(scan(connection, list(Patient=0, birthDT="", diagnosisDT="", metastaticDT=""))) close(connection)
Re: [R] Checking for invalid dates: Code works but needs improvement
On Tue, Jan 24, 2012 at 11:54 AM, Paul Miller pjmiller...@yahoo.com wrote: Hello Everyone, Still new to R. Wrote some code that finds and prints invalid dates (see below). This code works but I suspect it's not very good. If someone could show me a better way, I'd greatly appreciate it. Here is some information about what I'm trying to accomplish. My sense is that the R date functions are best at identifying invalid dates when fed character data in their default format. So my code converts the input dates to character, breaks them apart using strsplit, and then reformats them. It then identifies which dates are missing in the sense that the month or year are unknown and prints out any remaining invalid date values. As I see it, the code has at least 4 shortcomings. 1. It's too long. My understanding is that skilled programmers can usually or often complete tasks like this in a few lines. 2. It's not vectorized. I started out trying to do something that was vectorized but ran into problems with the strsplit function. I looked at the help file and it appears this function will only accept a single character vector. 3. It prints out the incorrect dates but doesn't indicate which date variable they belong to. I tried various things with paste but never came up with anything that worked. Ideally, I'd like to get something that looks roughly like: Error: Invalid date values in birthDT 21931-11-23 1933-06-31 Error: Invalid date values in diagnosisDT 2010-02-30 4. There's no way to specify names for input and output data. I imagine this would be fairly easy to specify this in the arguments to a function but am not sure how to incorporate it into a for loop. 
Thanks, Paul

## Code for detecting invalid dates

## Test Data
connection <- textConnection("
1 11/23/21931 05/23/2009 un/17/2011
2 06/20/1940 02/30/2010 03/17/2011
3 06/17/1935 12/20/2008 07/un/2011
4 05/31/1937 01/18/2007 04/30/2011
5 06/31/1933 05/16/2009 11/20/un
")
TestDates <- data.frame(scan(connection,
  list(Patient = 0, birthDT = "", diagnosisDT = "", metastaticDT = "")))
close(connection)

TestDates
class(TestDates$birthDT)
class(TestDates$diagnosisDT)
class(TestDates$metastaticDT)

## List of Date Variables
DateNames <- c("birthDT", "diagnosisDT", "metastaticDT")

## Read Dates
for (i in seq(TestDates[DateNames])) {
  TestDates[DateNames][[i]] <- as.character(TestDates[DateNames][[i]])
  TestDates$ParsedDT <- strsplit(TestDates[DateNames][[i]], "/")
  TestDates$Month <- sapply(TestDates$ParsedDT, function(x) x[1])
  TestDates$Day <- sapply(TestDates$ParsedDT, function(x) x[2])
  TestDates$Year <- sapply(TestDates$ParsedDT, function(x) x[3])
  TestDates$Day[TestDates$Day == "un"] <- 15
  TestDates[DateNames][[i]] <- with(TestDates, paste(Year, Month, Day, sep = "-"))
  is.na( TestDates[DateNames][[i]][TestDates$Month == "un"] ) <- T
  is.na( TestDates[DateNames][[i]][TestDates$Year == "un"] ) <- T
  TestDates$Date <- as.Date(TestDates[DateNames][[i]], format = "%Y-%m-%d")
  TestDates$Invalid <- ifelse(is.na(TestDates$Date) & !is.na(TestDates[DateNames][[i]]), 1, 0)
  if( sum(TestDates$Invalid) == 0 ) {
    TestDates[DateNames][[i]] <- TestDates$Date
  } else {
    print( TestDates[DateNames][[i]][TestDates$Invalid == 1] )
  }
  TestDates <- subset(TestDates, select = -c(ParsedDT, Month, Day, Year, Date, Invalid))
}

TestDates
class(TestDates$birthDT)
class(TestDates$diagnosisDT)
class(TestDates$metastaticDT)

If s is a vector of character strings representing dates then bad is a logical vector which is TRUE for the bad ones and FALSE for the good ones (adjust as needed if a different date range is valid), so s[bad] is the bad inputs and the output d is a Date vector with NAs for the bad ones:

x <- gsub("un", "15", s)
d <- as.Date(x, "%m/%d/%Y")
bad <- is.na(d) | d < as.Date("1900-01-01")
| d > Sys.Date()
d[bad] <- NA

-- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
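[Editor's note: the suggested three lines can be exercised end-to-end on values like those in the test data; a minimal sketch, where the vector s and the 1900 cutoff are illustrative assumptions:]

```r
# Sketch of the approach above: substitute "un" with mid-month, parse,
# then flag anything unparseable or outside a plausible range.
s <- c("11/23/21931", "02/30/2010", "un/17/2011", "05/31/1937", "06/31/1933")

x <- gsub("un", "15", s)        # unknown day (or month) -> 15
d <- as.Date(x, "%m/%d/%Y")     # NA wherever the string is not a real date
bad <- is.na(d) | d < as.Date("1900-01-01") | d > Sys.Date()
d[bad] <- NA

s[bad]  # every input except "05/31/1937" is flagged here
```

Note that an unknown month also ends up flagged, since "month 15" cannot be parsed; Paul's original code instead set unknown-month values to NA before checking validity, so the two treatments differ on that case.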
Re: [R] aplpy recursive function on a list
On 26-01-2012, at 17:58, Brian Diggs wrote: On 1/25/2012 10:09 AM, patzoul wrote: I have 2 series of data a,b and I would like to calculate a new series which is z[t] = z[t-1]*a[t] + b[t] , z[1] = b[1]. How can I do that without using a loop? A combination of Reduce and Map will work. Map to stitch together the a and b vectors, Reduce to step along them, allowing for accumulation.

a <- c(2,4,3,5)
b <- c(1,3,5,7)
z <- Reduce(function(zm1, coef) {coef[1] * zm1 + coef[2]},
            Map(c, a, b), init = b[1], accumulate = TRUE)
z
[1]   1   3  15  50 257

I don't think so.

a <- c(2,4,3,5)
b <- c(1,3,5,7)
z <- rep(0, length(a))
z[1] <- b[1]
for (t in 2:length(a)) {
  z[t] <- a[t] * z[t-1] + b[t]
}
z

gives [1] 1 7 26 137 and this agrees with a manual calculation. You get a vector of length 5 as result. It should be of length 4 with your data. If you change the Reduce expression to this

u <- Reduce(function(zm1, coef) {coef[1] * zm1 + coef[2]},
            Map(c, a[-1], b[-1]), init = b[1], accumulate = TRUE)

then you get the correct result.

u
[1]   1   7  26 137

Berend __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] What does [[1]] mean?
I know that [] is used for indexing. I know that [[]] is used for reference to a property of a COM object. I cannot find any explanation of what [[1]] does or, more pertinently, where it should be used. Thank you. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] aplpy recursive function on a list
On 26-01-2012, at 19:10, Berend Hasselman wrote: On 26-01-2012, at 17:58, Brian Diggs wrote: On 1/25/2012 10:09 AM, patzoul wrote: I have 2 series of data a,b and I would like to calculate a new series which is z[t] = z[t-1]*a[t] + b[t] , z[1] = b[1]. How can I do that without using a loop? I don't think so. [quoted loop and corrected Reduce examples snipped] And the loop, especially if byte compiled with cmpfun, appears to be quite a bit quicker.

Nrep <- 1000
tfrml.loop <- function(a,b) {
  z <- rep(0, length(a))
  z[1] <- b[1]
  for (t in 2:length(a)) {
    z[t] <- a[t] * z[t-1] + b[t]
  }
  z
}
tfrml.rdce <- function(a,b) {
  u <- Reduce(function(zm1, coef) {coef[1] * zm1 + coef[2]},
              Map(c, a[-1], b[-1]), init = b[1], accumulate = TRUE)
  u
}
library(compiler)
tfrml.loop.c <- cmpfun(tfrml.loop)
tfrml.rdce.c <- cmpfun(tfrml.rdce)
z.loop <- tfrml.loop(a,b)
z.rdce <- tfrml.rdce(a,b)
all.equal(z.loop, z.rdce)
library(rbenchmark)
N <- 500
set.seed(1)
a <- runif(N)
b <- runif(N)
benchmark(tfrml.loop(a,b), tfrml.rdce(a,b), tfrml.loop.c(a,b), tfrml.rdce.c(a,b),
          replications = Nrep, columns = c("test", "replications", "elapsed"))

                test replications elapsed
1   tfrml.loop(a, b)         1000   2.665
3 tfrml.loop.c(a, b)         1000   0.554
2   tfrml.rdce(a, b)         1000   4.082
4 tfrml.rdce.c(a, b)         1000   3.143

Berend R-2.14.1 (32-bits), Mac OS X 10.6.8. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] What does [[1]] mean?
Have you read ?"[[" ? The short answer is that you can use both [] and [[]] on lists; the [] construct will return a subset of the list (which will be a list) while [[]] will return a single element of the list (which could be a list or a vector or whatever that element may be). Compare:

tmp <- list( a = 1, b = letters )
tmp[1]
$a
[1] 1

tmp[1] + 1
Error in tmp[1] + 1 : non-numeric argument to binary operator

tmp[[1]]
[1] 1

tmp[[1]] + 1
[1] 2

-- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111 -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of Ajay Askoolum Sent: Thursday, January 26, 2012 11:27 AM To: R General Forum Subject: [R] What does [[1]] mean? I know that [] is used for indexing. I know that [[]] is used for reference to a property of a COM object. I cannot find any explanation of what [[1]] does or, more pertinently, where it should be used. Thank you. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] merge multiple data frames
This is my reproducible example (three data frames: a, b, c) a-structure(list(date = structure(1:6, .Label = c(2012-01-03, 2012-01-04, 2012-01-05, 2012-01-06, 2012-01-07, 2012-01-08, 2012-01-09, 2012-01-10, 2012-01-11, 2012-01-12, 2012-01-13, 2012-01-14, 2012-01-15, 2012-01-16, 2012-01-17, 2012-01-18, 2012-01-19, 2012-01-20, 2012-01-21, 2012-01-22, 2012-01-23 ), class = factor), so2 = c(0.799401398190476, 0, 0, 0.0100453950434783, 0.200154920565217, 0.473866969181818), nox = c(111.716109973913, 178.077239330435, 191.257829021739, 50.6799951473913, 115.284643540435, 110.425185027727), no = c(48.8543691516522, 88.7197448817391, 93.9931932472609, 13.9759949817391, 43.1395266865217, 41.7280296016364 ), no2 = c(36.8673432865217, 42.37150668, 47.53311701, 29.3026882474783, 49.2986070321739, 46.5978461731818), co = c(0.618856168125, 0.99659347508, 0.66698741608, 0.38343731117, 0.281604928875, 0.155383408913043 ), o3 = c(12.1393100029167, 12.3522739816522, 10.9908791203043, 26.9122200013043, 13.8421695947826, 12.3788847045455), ipa = c(167.541954974667, 252.7196257875, 231.802370709167, 83.4850259595833, 174.394613581667, 173.868599272609), ws = c(1.47191016429167, 0.765781205208333, 0.937053086791667, 1.581022406625, 0.909756802125, 0.959252831695652 ), wd = c(45.2650019737732, 28.2493544114369, 171.049080544214, 319.753674830936, 33.8713897347193, 228.368119533759), temp = c(7.9197282588, 3.79434291520833, 2.1287644735, 6.733854600625, 3.136579722, 3.09864120704348), umr = c(86.11566638875, 94.5034087491667, 94.14451249375, 53.1016709004167, 65.63420423, 74.955669236087 )), .Names = c(date, so2, nox, no, no2, co, o3, ipa, ws, wd, temp, umr), row.names = c(NA, 6L), class = data.frame) b-structure(list(date = structure(1:6, .Label = c(2012-01-03, 2012-01-04, 2012-01-05, 2012-01-06, 2012-01-07, 2012-01-08, 2012-01-09, 2012-01-10, 2012-01-11, 2012-01-12, 2012-01-13, 2012-01-14, 2012-01-15, 2012-01-16, 2012-01-17, 2012-01-18, 2012-01-19, 2012-01-20, 2012-01-21, 2012-01-22, 
2012-01-23 ), class = factor), so2 = c(0, 0, 0, 0, 0, 0), nox = c(13.74758511, 105.8060582, 61.22720599, 11.45280354, 56.86804174, 39.17917222 ), no = c(0.882593766, 48.97037506, 9.732937217, 1.794549972, 16.32300019, 8.883637786), no2 = c(11.80447753, 25.35235381, 28.72990261, 8.590004034, 31.9003796, 25.50512403), co = c(0.113954917, 0.305985964, 0.064001839, 0, 1.86e-05, 0), o3 = c(5.570499897, 9.802379608, 5.729360104, 11.91304016, 12.13407993, 10.00961971 ), ipa = c(6.065110207, 116.9079971, 93.21240234, 10.5777998, 66.40740204, 34.47359848), ws = c(0.122115001, 0.367668003, 0.494913995, 0.627124012, 0.473895013, 0.593913019), wd = c(238.485119317031, 221.645073036776, 220.372076815032, 237.868340917096, 209.532933617465, 215.752030286564), temp = c(4.044159889, 1.176810026, 0.142934993, 0.184606999, -0.935989976, -2.015399933), umr = c(72.29229736, 88.69879913, 87.49530029, 24.00079918, 44.8852005, 49.47729874 )), .Names = c(date, so2, nox, no, no2, co, o3, ipa, ws, wd, temp, umr), row.names = c(NA, 6L), class = data.frame) c-structure(list(date = structure(1:6, .Label = c(2012-01-03, 2012-01-04, 2012-01-05, 2012-01-06, 2012-01-07, 2012-01-08, 2012-01-09, 2012-01-10, 2012-01-11, 2012-01-12, 2012-01-13, 2012-01-14, 2012-01-15, 2012-01-16, 2012-01-17, 2012-01-18, 2012-01-19, 2012-01-20, 2012-01-21, 2012-01-22, 2012-01-23 ), class = factor), so2 = c(2.617839247, 0, 0, 0.231044086, 0.944608887, 2.12400444), nox = c(308.9046313, 275.6778849, 390.0824142, 178.7429364, 238.655832, 251.892601), no = c(156.0262489, 151.4412498, 221.0725021, 65.96049786, 106.541748, 119.3471241), no2 = c(74.80145447, 59.29991481, 66.5897975, 77.84267978, 75.68422569, 85.43044816 ), co = c(1.628431197, 1.716231492, 1.264678366, 1.693460745, 0.780637084, 0.892724398), o3 = c(26.1473999, 15.91584015, 22.46199989, 37.39400101, 15.63426018, 17.51494026), ipa = c(538.414978, 406.4620056, 432.6459961, 275.2820129, 435.7909851, 436.8039856), ws = c(4.995530128, 1.355309963, 1.708899975, 
3.131690025, 1.546270013, 1.571320057 ), wd = c(58.15639877, 64.5657153143848, 39.9754269501381, 24.0739884380921, 55.9453098437477, 56.7648829092446), temp = c(10.24740028, 7.052690029, 4.33258009, 13.91609955, 8.762220383, 11.04300022), umr = c(97.60900116, 96.91899872, 96.20649719, 94.74620056, 82.04550171, 89.41320038 )), .Names = c("date", "so2", "nox", "no", "no2", "co", "o3", "ipa", "ws", "wd", "temp", "umr"), row.names = c(NA, 6L), class = "data.frame")

Given the three data frames "a", "b" and "c", I need to merge them all by the common field "date". The following attempt is working fine but...

# code start
all <- merge(a, b, by = "date", suffixes = c(".a", ".b"), all = TRUE)
all <- merge(all, c, by = "date", all = TRUE)
# code end

...I would like to possibly do it in just "one shot" (a generalisation of the code for a more complex case of handling many data frames), also by assigning proper suffixes to each variable (not possible with the previous code snippet). Then I also try
[R] 3-parametric Weibull regression
Hello, I'm quite new to R and want to make a Weibull regression with the survival package. I know how to build my Surv object and how to make a standard Weibull regression with survreg. However, I want to fit a translated or 3-parametric Weibull distribution to account for a failure-free time. I think I would need a new object in survreg.distributions, but I don't know how to create that correctly. I've tried to inherit it from the extreme value distribution, analogous to the implemented Weibull distribution, like so:

survreg.distributions$weibull3p$name <- "weibull"
survreg.distributions$weibull3p$dist <- "extreme"
survreg.distributions$weibull3p$trans <- function(y, x0) log(y) + x0
survreg.distributions$weibull3p$dtrans <- function(y) 1/y
survreg.distributions$weibull3p$itrans <- function(x, x0) exp(x - x0)

I then get a failure from survregDtest(survreg.distributions$weibull3p) that x0 is missing (since the transformation function is expected to take only one argument). Any ideas? Or is there maybe a package somewhere that supports 3-parametric Weibull regression which I missed? Regards, Andreas [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
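[Editor's note: for the uncensored case, one alternative to extending survreg.distributions is to fit the translated (3-parameter) Weibull by direct maximum likelihood in base R; a hedged sketch, in which the reparameterisation and all names are illustrative and not part of the survival package:]

```r
# Direct ML fit of a 3-parameter Weibull: y = x0 + Weibull(shape, scale),
# with x0 the failure-free time. Parameters are transformed so that the
# optimizer works on an unconstrained scale.
set.seed(1)
y <- 5 + rweibull(500, shape = 1.5, scale = 10)  # simulated data, true x0 = 5

nll <- function(p) {
  shape <- exp(p[1])
  scale <- exp(p[2])
  x0    <- min(y) * plogis(p[3])   # keeps 0 < x0 < min(y)
  -sum(dweibull(y - x0, shape, scale, log = TRUE))
}

fit <- optim(c(0, 0, 0), nll)
c(shape = exp(fit$par[1]),
  scale = exp(fit$par[2]),
  x0    = min(y) * plogis(fit$par[3]))
```

For censored observations the likelihood also needs a survival-function term, which is exactly the machinery survreg provides internally; hence the appeal of getting a weibull3p entry into survreg.distributions to work.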
Re: [R] merge multiple data frames
I might do something like this:

mergeAll <- function(..., by = "date", all = TRUE) {
  dotArgs <- list(...)
  Reduce(function(x, y) merge(x, y, by = by, all = all,
                              suffixes = paste(".", names(dotArgs), sep = "")),
         dotArgs)
}
mergeAll(a = a, b = b, c = c)
str(.Last.value)

You also might be able to set it up to capture names without you having to put a = a etc. using substitute. On Thu, Jan 26, 2012 at 12:29 PM, maxbre mbres...@arpa.veneto.it wrote: This is my reproducible example (three data frames: a, b, c) [quoted dput definitions of a, b and c snipped] Given the three data
Re: [R] What does [[1]] mean?
Here is a page that should help: http://www.burns-stat.com/pages/Tutor/more_R_subscript.html Also Circle 8.1.54 of 'The R Inferno' http://www.burns-stat.com/pages/Tutor/R_inferno.pdf On 26/01/2012 18:27, Ajay Askoolum wrote: I know that [] is used for indexing. I know that [[]] is used for reference to a property of a COM object. I cannot find any explanation of what [[1]] does or, more pertinently, where it should be used. Thank you. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Patrick Burns pbu...@pburns.seanet.com twitter: @portfolioprobe http://www.portfolioprobe.com/blog http://www.burns-stat.com (home of 'Some hints for the R beginner' and 'The R Inferno') __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] null distribution of binom.test p values
On Fri, Jan 27, 2012 at 5:36 AM, Chris Wallace chris.wall...@cimr.cam.ac.uk wrote: Greg, thanks for the reply. Unfortunately, I remain unconvinced! I ran a longer simulation, 100,000 reps. The size of the test is consistently too small (see below) and the histogram shows increasing bars even within the parts of the histogram with even bar spacing. See https://www-gene.cimr.cam.ac.uk/staff/wallace/hist.png

y <- sapply(1:100000, function(i, n = 100)
  binom.test(sum(rnorm(n) > 0), n, p = 0.5, alternative = "two.sided")$p.value)
mean(y < 0.01) # [1] 0.00584
mean(y < 0.05) # [1] 0.03431
mean(y < 0.1)  # [1] 0.08646

Can that really be due to the discreteness of the distribution? Yes. All so-called exact tests tend to be conservative due to discreteness, and there's quite a lot of discreteness in the tails. The problem is far worse for Fisher's exact test, and worse still for Fisher's other exact test (of Hardy-Weinberg equilibrium -- http://www.genetics.org/content/180/3/1609.full). You don't need to rely on finite-sample simulations here: you can evaluate the level exactly. Using binom.test() you find that the rejection regions are y <= 39 and y >= 61, so the level at nominal 0.05 is:

pbinom(39, 100, 0.5) + pbinom(60, 100, 0.5, lower.tail = FALSE)
[1] 0.0352002

agreeing very well with your 0.03431. At nominal 0.01 the exact level is

pbinom(36, 100, 0.5) + pbinom(63, 100, 0.5, lower.tail = FALSE)
[1] 0.006637121

and at 0.1 it is

pbinom(41, 100, 0.5) + pbinom(58, 100, 0.5, lower.tail = FALSE)
[1] 0.08862608

Your result at nominal 0.01 is a bit low, but I think that's bad luck. When I ran your code I got 0.00659 for the estimated level at nominal 0.01, which matches the exact calculations very well. Theoreticians sweep this under the carpet by inventing randomized tests, where you interpolate a random p-value between the upper and lower values from a discrete distribution. It's a very elegant idea that I'm glad to say I haven't seen used in practice.
-thomas -- Thomas Lumley Professor of Biostatistics University of Auckland __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] aplpy recursive function on a list
On 1/26/2012 10:33 AM, Berend Hasselman wrote: On 26-01-2012, at 19:10, Berend Hasselman wrote: On 26-01-2012, at 17:58, Brian Diggs wrote: On 1/25/2012 10:09 AM, patzoul wrote: I have 2 series of data a,b and I would like to calculate a new series which is z[t] = z[t-1]*a[t] + b[t] , z[1] = b[1]. How can I do that without using a loop? I don't think so. a- c(2,4,3,5) b- c(1,3,5,7) z- rep(0,length(a)) z[1]- b[1] for( t in 2:length(a) ) { z[t]- a[t] * z[t-1] + b[t] } z gives [1] 1 7 26 137 and this agrees with a manual calculation. You get a vector of length 5 as result. It should be of length 4 with your data. If you change the Reduce expression to this u- Reduce(function(zm1, coef) {coef[1] * zm1 + coef[2]}, Map(c, a[-1], b[-1]), init = b[1], accumulate = TRUE) then you get the correct result. u [1] 1 7 26 137 You are correct; I had an off-by-one error. It agreed with my manual calculation, which also had the same error. And the loop especially if byte compiled with cmpfun appears to be quite a bit quicker. Nrep- 1000 tfrml.loop- function(a,b) { z- rep(0,length(a)) z[1]- b[1] for( t in 2:length(a) ) { z[t]- a[t] * z[t-1] + b[t] } z } tfrml.rdce- function(a,b) { u- Reduce(function(zm1, coef) {coef[1] * zm1 + coef[2]}, Map(c, a[-1], b[-1]), init = b[1], accumulate = TRUE) u } library(compiler) tfrml.loop.c- cmpfun(tfrml.loop) tfrml.rdce.c- cmpfun(tfrml.rdce) z.loop- tfrml.loop(a,b) z.rdce- tfrml.rdce(a,b) all.equal(z.loop, z.rdce) library(rbenchmark) N- 500 set.seed(1) a- runif(N) b- runif(N) benchmark(tfrml.loop(a,b), tfrml.rdce(a,b), tfrml.loop.c(a,b), tfrml.rdce.c(a,b), replications=Nrep, columns=c(test, replications, elapsed)) test replications elapsed 1 tfrml.loop(a, b) 1000 2.665 3 tfrml.loop.c(a, b) 1000 0.554 2 tfrml.rdce(a, b) 1000 4.082 4 tfrml.rdce.c(a, b) 1000 3.143 Berend R-2.14.1 (32-bits), Mac OS X 10.6.8. The timings are interesting; I would not have expected the loop to have outperformed Reduce, or at least not by that much. 
The loop also benefits much more from compiling, which is not as surprising since Reduce and Map are already compiled. I would guess the difference is due to overhead changing the format of the a/b data and being able to specialize the code. I also ran timings for comparison and got qualitatively the same thing: benchmark(tfrml.loop(a,b), tfrml.rdce(a,b), tfrml.loop.c(a,b), tfrml.rdce.c(a,b), replications=Nrep, columns=c(test, replications, elapsed, relative), order=relative) test replications elapsed relative 3 tfrml.loop.c(a, b) 10000.34 1.00 1 tfrml.loop(a, b) 10001.89 5.558824 4 tfrml.rdce.c(a, b) 10002.12 6.235294 2 tfrml.rdce(a, b) 10002.79 8.205882 R version 2.14.1 (2011-12-22) Platform: x86_64-pc-mingw32/x64 (64-bit) (Windows 7) -- Brian S. Diggs, PhD Senior Research Associate, Department of Surgery Oregon Health Science University __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Quality of fit statistics for NLS?
Dear all, I am trying to analyze some non-linear data to which I have fit a curve of the following form:

dum <- nls(y ~ (A + (B*x)/(C+x)), start = list(A = 370, B = 100, C = 23000))

I am wondering if there is any way to determine meaningful quality-of-fit statistics from the nls function? A summary yields highly significant p-values, but it is my impression that these are questionable at best given the iterative nature of the fit:

summary(dum)

Formula: y ~ (A + (B * x)/(C + x))

Parameters:
    Estimate Std. Error t value Pr(>|t|)
A    388.753      4.794  81.090  < 2e-16 ***
B    115.215      5.006  23.015  < 2e-16 ***
C  20843.832   4646.937   4.485 1.12e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 18.25 on 245 degrees of freedom

Number of iterations to convergence: 4
Achieved convergence tolerance: 2.244e-06

Is there any other means of determining the quality of the curve fit? I have tried applying confidence intervals using confint(dum), but these curves seem unrealistically narrow. Thanks so much for your help! -Max [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
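[Editor's note: two informal checks often used with nls are the residual standard error relative to the scale of y, and a descriptive pseudo-R-squared, alongside a residual-vs-fitted plot and an AIC comparison with a simpler model; a sketch on simulated data of the same functional form, with all numbers illustrative:]

```r
set.seed(42)
x <- seq(0, 2e5, length.out = 248)
y <- 370 + 100 * x / (23000 + x) + rnorm(length(x), sd = 18)

dum <- nls(y ~ A + (B * x)/(C + x), start = list(A = 370, B = 100, C = 23000))

summary(dum)$sigma                            # residual SE, in units of y
1 - sum(resid(dum)^2) / sum((y - mean(y))^2)  # pseudo-R^2 vs. a constant mean
AIC(dum, lm(y ~ 1))                           # vs. the intercept-only model
plot(fitted(dum), resid(dum))                 # look for leftover structure
```

Pseudo-R-squared has no clean interpretation for nonlinear models, so it is best treated as descriptive; bootstrapping the fit (for example by resampling residuals) tends to give more defensible interval estimates than the asymptotic summary() table.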
[R] Calculate a function repeatedly over sections of a ts object
Hi, I want to apply a function (in my case SDF; package “sapa”) repeatedly over discrete sections of a daily time series object by sliding a time window of constant length (e.g. 10 consecutive years or 1825 days) over the entire ts at increments of 1 time unit (e.g. 1 year or 365 days). So for example, the first SDF would be calculated for the daily values of my variable recorded between years 1 to 5, SDF2 to those for years 2 to 6 and so on until the total length of the series is covered. How can I implement this into a R script? Any help is much appreciated. Jorge __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Why was the ‘doSMP’ package removed from CRAN?
Thank you for the explanation Uwe. With regards, Tal Contact Details:--- Contact me: tal.gal...@gmail.com | 972-52-7275845 Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English) -- On Thu, Jan 26, 2012 at 10:42 AM, Uwe Ligges lig...@statistik.tu-dortmund.de wrote: On 26.01.2012 09:39, Barry Rowlingson wrote: On Thu, Jan 26, 2012 at 8:02 AM, Uwe Ligges lig...@statistik.tu-dortmund.de wrote: On 25.01.2012 22:20, Tal Galili wrote: Does any one know the reason for this? Is this a technical or a legal (e.g: license) issue? If legal issues were the reason, you would not have found it in the archives anymore. Licensing can change - as a copyright holder I could decide the next release of my package is going to be under a proprietary license that doesn't allow redistribution. Any code already released under something like the GPL can't be forcibly removed, and people can make new forks from those, but if I'm the only person writing it and I decide to change the license and the latest free version isn't compatible with the current version of R then I'd expect to see the old versions in the archives and no version for the latest version of R. Last checkin at R-forge was only six weeks ago, and 1.0-3 installs fine on my latest R: https://r-forge.r-project.org/scm/?group_id=950 I suspect they just haven't pushed it to R-forge yet. Cockup before conspiracy. Barry Before people start with even more speculations: the reason is that the revoIPC package does not compile on modern versions of gcc and hence has been archived; doSMP depends on it and therefore went to the archives as well. Uwe Ligges [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Calculate a function repeatedly over sections of a ts object
I'm not sure if it's easily doable with a ts class, but the rollapply function in the zoo package will do this easily. (Also, I find zoo to be a much more natural time-series workflow than ts so it might make the rest of your life easier as well) Michael On Thu, Jan 26, 2012 at 2:24 PM, Jorge Molinos jgarc...@tcd.ie wrote: Hi, I want to apply a function (in my case SDF; package “sapa”) repeatedly over discrete sections of a daily time series object by sliding a time window of constant length (e.g. 10 consecutive years or 1825 days) over the entire ts at increments of 1 time unit (e.g. 1 year or 365 days). So for example, the first SDF would be calculated for the daily values of my variable recorded between years 1 to 5, SDF2 to those for years 2 to 6 and so on until the total length of the series is covered. How can I implement this into a R script? Any help is much appreciated. Jorge __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Calculate a function repeatedly over sections of a ts object
On Thu, Jan 26, 2012 at 4:00 PM, R. Michael Weylandt michael.weyla...@gmail.com wrote: I'm not sure if it's easily doable with a ts class, but the rollapply function in the zoo package will do this easily. (Also, I find zoo to be a much more natural time-series workflow than ts so it might make the rest of your life easier as well) rollapply does have ts and default methods in addition to a zoo method. -- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
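[Editor's note: rollapply's width/by arguments map directly onto the sliding-window scheme described above; a sketch with sd() standing in for the SDF computation from sapa, and with the series, window, and step sizes made up:]

```r
library(zoo)

# ten years of fake daily data
d <- seq(as.Date("2000-01-01"), by = "day", length.out = 3650)
z <- zoo(rnorm(3650), order.by = d)

# 5-year (1825-day) window advanced in 1-year (365-day) steps;
# swap sd for the SDF-based statistic of interest
w <- rollapply(z, width = 1825, by = 365, FUN = sd, align = "left")
length(w)  # 6 overlapping windows for this series length
```

Each element of w is indexed by the date at which its window starts, so the result can be plotted or merged back against the original series directly.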
Re: [R] merge multiple data frames
thank you for your reply. I'll study and test your code (which is still a bit obscure to me); by the way, do you think that merge_all would be the wrong approach here? thanks again m -- View this message in context: http://r.789695.n4.nabble.com/merge-multiple-data-frames-tp4331089p4331830.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?
I ask about when to stop adding variables because each time I add one, the AIC is lower. How do I know when the model is a good fit? When should I stop adding variables to keep the model simple? Thanks, J -- View this message in context: http://r.789695.n4.nabble.com/How-do-I-compare-47-GLM-models-with-1-to-5-interactions-and-unique-combinations-tp4326407p4331848.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] How to remove rows representing concurrent sessions from data.frame?
johannes rara wrote on 01/26/2012 02:46:57 AM:

I have a dataset like this (dput for this below) which represents user computer sessions:

   username          machine               start                 end
1     user1 D5599.domain.com 2011-01-03 09:44:18 2011-01-03 09:47:27
2     user1 D5599.domain.com 2011-01-03 09:46:29 2011-01-03 10:09:16
3     user1 D5599.domain.com 2011-01-03 14:07:36 2011-01-03 14:56:17
4     user1 D5599.domain.com 2011-01-05 15:03:17 2011-01-05 15:23:15
5     user1 D5599.domain.com 2011-02-14 14:33:39 2011-02-14 14:40:16
6     user1 D5599.domain.com 2011-02-23 13:54:30 2011-02-23 13:58:23
7     user1 D5599.domain.com 2011-03-21 10:10:18 2011-03-21 10:32:22
8     user1 D5645.domain.com 2011-06-09 10:12:41 2011-06-09 10:58:59
9     user1 D5682.domain.com 2011-01-03 12:03:45 2011-01-03 12:29:43
10    USER2 D5682.domain.com 2011-01-12 14:26:05 2011-01-12 14:32:53
11    USER2 D5682.domain.com 2011-01-17 15:06:19 2011-01-17 15:44:22
12    USER2 D5682.domain.com 2011-01-18 15:07:30 2011-01-18 15:42:43
13    USER2 D5682.domain.com 2011-01-25 15:20:55 2011-01-25 15:24:38
14    USER2 D5682.domain.com 2011-02-14 15:03:00 2011-02-14 15:07:43
15    USER2 D5682.domain.com 2011-02-14 14:59:23 2011-02-14 15:14:47

There may be several concurrent sessions for the same username from the same computer. How can I remove those rows so that only one session is left in the data? I have no idea how to do this; maybe using difftime?

-J

structure(list(username = c("user1", "user1", "user1", "user1", "user1", "user1", "user1", "user1", "user1", "USER2", "USER2", "USER2", "USER2", "USER2", "USER2"), machine = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("D5599.domain.com", "D5645.domain.com", "D5682.domain.com", "D5686.domain.com", "D5694.domain.com", "D5696.domain.com", "D5772.domain.com", "D5772.domain.com", "D5847.domain.com", "D5855.domain.com", "D5871.domain.com", "D5927.domain.com", "D5927.domain.com", "D5952.domain.com", "D5993.domain.com", "D6012.domain.com", "D6048.domain.com", "D6077.domain.com", "D5688.domain.com", "D5815.domain.com", "D6106.domain.com", "D6128.domain.com"), class = "factor"), start = structure(c(1294040658, 1294040789, 1294056456, 1294232597, 1297686819, 1298462070, 1300695018, 1307603561, 1294049025, 1294835165, 1295269579, 1295356050, 1295961655, 1297688580, 1297688363), class = c("POSIXct", "POSIXt"), tzone = ""), end = structure(c(1294040847, 1294042156, 1294059377, 1294233795, 1297687216, 1298462303, 1300696342, 1307606339, 1294050583, 1294835573, 1295271862, 1295358163, 1295961878, 1297688863, 1297689287), class = c("POSIXct", "POSIXt"), tzone = "")), .Names = c("username", "machine", "start", "end"), row.names = c(NA, 15L), class = "data.frame")

# rearrange the data, so that there is one date/time variable
# and another variable indicates start/end
library(reshape)
df2 <- melt(df)
# sort the data by user, machine, date/time
df3 <- df2[order(df2$username, df2$machine, df2$value), ]
# for each user and machine,
# keep only the first start record and the last end record
first <- function(x) {
  l <- length(x)
  c(1, 1 - (x[-1] == x[-l]))
}
last <- function(x) {
  y <- rev(x)
  l <- length(y)
  rev(c(1, 1 - (y[-1] == y[-l])))
}
df4 <- df3[(df3$variable == "start" & first(df3$variable)) |
           (df3$variable == "end" & last(df3$variable)), ]
# combine the results
df5 <- cbind(df4[df4$variable == "start", c("username", "machine", "value")],
             value2 = df4$value[df4$variable == "end"])
df5

Jean

[[alternative HTML version deleted]]
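An alternative, base-R way to attack the same problem (a sketch on a hypothetical three-row subset, not the thread's solution): sort each user/machine group by start time and merge any session whose start falls before the previous session's end.

```r
# Hypothetical mini-dataset: the first two sessions overlap, the third does not
sessions <- data.frame(
  user  = c("u1", "u1", "u1"),
  start = as.POSIXct(c("2011-01-03 09:44", "2011-01-03 09:46", "2011-01-03 14:07")),
  end   = as.POSIXct(c("2011-01-03 09:47", "2011-01-03 10:09", "2011-01-03 14:56"))
)

merge_overlaps <- function(d) {
  d <- d[order(d$start), ]
  out <- d[1, ]
  for (i in seq_len(nrow(d))[-1]) {
    j <- nrow(out)
    if (d$start[i] <= out$end[j]) {
      # overlapping: extend the current merged session
      out$end[j] <- max(out$end[j], d$end[i])
    } else {
      out <- rbind(out, d[i, ])
    }
  }
  out
}

res <- do.call(rbind, lapply(split(sessions, sessions$user), merge_overlaps))
nrow(res)  # the two overlapping morning sessions collapse into one row
```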
Re: [R] Quality of fit statistics for NLS?
Inline below. -- Bert

On Thu, Jan 26, 2012 at 12:16 PM, Max Brondfield max.brondfi...@gmail.com wrote: Dear all, I am trying to analyze some non-linear data to which I have fit a curve of the following form:

dum <- nls(y ~ (A + (B*x)/(C+x)), start = list(A=370, B=100, C=23000))

I am wondering if there is any way to determine meaningful quality-of-fit statistics from the nls function? A summary yields highly significant p-values, but it is my impression that these are questionable at best given the iterative nature of the fit:

No. They are questionable primarily because there is no clear null model. They are based on profile likelihoods (as ?confint tells you), which may or may not be what you want for goodness of fit. One can always get goodness-of-fit statistics, but the question in nonlinear models is: goodness of fit with respect to what? So the answer to your question is: if you know what you're doing, certainly. Otherwise, find someone who does.

summary(dum)

Formula: y ~ (A + (B * x)/(C + x))

Parameters:
   Estimate Std. Error t value Pr(>|t|)
A   388.753      4.794  81.090  < 2e-16 ***
B   115.215      5.006  23.015  < 2e-16 ***
C 20843.832   4646.937   4.485 1.12e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 18.25 on 245 degrees of freedom

Number of iterations to convergence: 4
Achieved convergence tolerance: 2.244e-06

Is there any other means of determining the quality of the curve fit? I have tried applying confidence intervals using confint(dum), but these seem unrealistically narrow. Thanks so much for your help! -Max

-- Bert Gunter Genentech Nonclinical Biostatistics Internal Contact Info: Phone: 467-7374 Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
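For the archive, one rough residual-based diagnostic that is sometimes computed for nls fits (a sketch on simulated data, not an endorsement; as noted above, pseudo-R-squared has well-known caveats for nonlinear models):

```r
# Simulate data resembling the poster's model, then refit it
set.seed(1)
x <- seq(0, 1e5, length.out = 250)
y <- 370 + 115 * x / (21000 + x) + rnorm(250, sd = 18)

fit <- nls(y ~ A + (B * x)/(C + x), start = list(A = 370, B = 100, C = 23000))

# residual standard error and a residual-based pseudo-R-squared
rse <- summary(fit)$sigma
pseudo_r2 <- 1 - sum(resid(fit)^2) / sum((y - mean(y))^2)
c(rse = rse, pseudo_r2 = pseudo_r2)
```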
Re: [R] How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?
Simple question; 8 million pages of answers in the statistical literature. What, indeed, is the secret to life? Post on a statistical help list (e.g. stats.stackexchange.com) -- this has almost nothing to do with R. Be prepared for an onslaught of often conflicting wisdom. -- Bert

On Thu, Jan 26, 2012 at 1:25 PM, Jhope jeanwaij...@gmail.com wrote: I ask about when to stop adding variables because each time I add one, the AIC is lower. How do I know when the model is a good fit? When should I stop adding variables to keep the model simple? Thanks, J -- View this message in context: http://r.789695.n4.nabble.com/How-do-I-compare-47-GLM-models-with-1-to-5-interactions-and-unique-combinations-tp4326407p4331848.html Sent from the R help mailing list archive at Nabble.com.

-- Bert Gunter Genentech Nonclinical Biostatistics Internal Contact Info: Phone: 467-7374 Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
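For the archive, a small illustration of the mechanics being asked about (simulated data, not from the thread): comparing nested GLMs with AIC alongside a likelihood-ratio test, rather than adding terms indefinitely.

```r
set.seed(1)
d <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
d$y <- rbinom(100, 1, plogis(0.5 + d$x1))   # only x1 truly matters

m1 <- glm(y ~ x1,      family = binomial, data = d)
m2 <- glm(y ~ x1 + x2, family = binomial, data = d)

AIC(m1, m2)                    # smaller is better; a drop of < 2 is weak evidence
anova(m1, m2, test = "Chisq")  # likelihood-ratio test of the added term
```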
Re: [R] warning In fun(...) : no DISPLAY variable so Tk is not available
Nevil Amos wrote: I am getting the above warning after loading Geneland 3.1.5 on Unix; while a simple plot sends output to the pdf file (see attached code), no output results from the Geneland functions, resulting in empty pdf files.

That message is saying it can't find an X11 server for Tcl/Tk to use. If you do have X11 available, then just set the DISPLAY environment variable in the normal way; if you don't, then you're not going to be able to use that package on Unix. (The Windows implementation is self-contained, so it should work there.)

I am having the same error message for Geneland, but I am implementing on a Windows machine. -- View this message in context: http://r.789695.n4.nabble.com/warning-In-fun-no-DISPLAY-variable-so-Tk-is-not-available-tp2235656p4331992.html Sent from the R help mailing list archive at Nabble.com.
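For the archive, two quick base-R checks worth running before debugging the package itself (a sketch; results depend on how your R was built):

```r
# Was this R build compiled with Tcl/Tk and X11 support?
capabilities(c("tcltk", "X11"))

# On Unix, an empty DISPLAY means no X server is reachable,
# which is exactly what the warning is complaining about
Sys.getenv("DISPLAY")
```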
Re: [R] function for grouping
On Thu, Jan 26, 2012 at 08:29:22AM -0800, yan wrote: what if I don't need to store the combination results, I just want to get the combination result for a given row. e.g. for the 5 elements divided into 3 groups, if I give another parameter which is the row number of the results -- in Petr's example, say if I set row number to 1, then I get 1,2,3,1,1. So I need a systematic way of generating the combination, but I don't need all the combinations in one go.

Hi. The following function partitioning() computes the m-th partitioning of n elements into "groups" groups in lexicographic order.

partitioning <- function(n, groups, m)
{
    a <- matrix(nrow=n+1, ncol=groups+2)
    a[, groups + 2] <- 0
    a[1, 0:groups + 1] <- 0
    a[1, groups + 1] <- 1
    for (i in seq.int(from=2, length=n)) {
        for (j in 0:groups) {
            a[i, j + 1] <- j*a[i - 1, j + 1] + a[i - 1, j + 2]
        }
    }
    available <- a[n + 1, 1]
    stopifnot(m <= available)
    x <- rep(NA, times=n)
    for (k in seq.int(length=n)) {
        free <- max(x[seq.int(length=k-1)], 0)
        for (j in seq.int(length=min(free + 1, groups))) {
            subtree <- a[n - k + 1, max(free, j) + 1]
            if (subtree < m) {
                available <- available - subtree
                m <- m - subtree
            } else {
                x[k] <- j
                break
            }
        }
    }
    x
}

# the previous code for comparison, modified to
# produce partitionings in the same order
check.row <- function(x, k)
{
    y <- unique(x)
    length(y) == k && all(y == 1:k)
}
allPart <- as.matrix(rev(expand.grid(x5=1:3, x4=1:3, x3=1:3, x2=1:2, x1=1)))
ok <- apply(allPart, 1, check.row, k=3)
allPart <- allPart[ok, ]

# the comparison itself
for (m in seq.int(length=nrow(allPart)+1)) {
    x <- partitioning(5, 3, m)
    cat(m, x, x - allPart[m, ], "\n")
}

# the output is
1 1 1 1 2 3 0 0 0 0 0
2 1 1 2 1 3 0 0 0 0 0
3 1 1 2 2 3 0 0 0 0 0
4 1 1 2 3 1 0 0 0 0 0
5 1 1 2 3 2 0 0 0 0 0
6 1 1 2 3 3 0 0 0 0 0
7 1 2 1 1 3 0 0 0 0 0
8 1 2 1 2 3 0 0 0 0 0
9 1 2 1 3 1 0 0 0 0 0
10 1 2 1 3 2 0 0 0 0 0
11 1 2 1 3 3 0 0 0 0 0
12 1 2 2 1 3 0 0 0 0 0
13 1 2 2 2 3 0 0 0 0 0
14 1 2 2 3 1 0 0 0 0 0
15 1 2 2 3 2 0 0 0 0 0
16 1 2 2 3 3 0 0 0 0 0
17 1 2 3 1 1 0 0 0 0 0
18 1 2 3 1 2 0 0 0 0 0
19 1 2 3 1 3 0 0 0 0 0
20 1 2 3 2 1 0 0 0 0 0
21 1 2 3 2 2 0 0 0 0 0
22 1 2 3 2 3 0 0 0 0 0
23 1 2 3 3 1 0 0 0 0 0
24 1 2 3 3 2 0 0 0 0 0
25 1 2 3 3 3 0 0 0 0 0
Error: m <= available is not TRUE

The zeros confirm that partitioning(5, 3, m) is exactly the m-th row of allPart. The error at the end shows that partitioning() detects the situation where m is larger than the number of partitionings for the given parameters.

The idea of the algorithm is that it searches the tree of all encodings of the partitionings; the precomputed matrix a[,] gives the number of leaves under any node of the tree, so the algorithm can choose the right branch at each node.

Hope this helps. Petr.
[R] Conditional cumulative sum
Dear List,

I'll appreciate your help on this. I'm trying to create a variable as in cumsum_y.cond1 below, which should compute the cumulative sum of y conditional on the value of cond==1.

set.seed(1)
d <- data.frame(y = sample(c(0,1), 10, replace = T),
                cond = sample(c(0,1), 10, replace = T))

   y cond cumsum_y.cond1
1  0    0              0
2  0    0              0
3  1    1              1
4  1    0              1
5  0    1              1
6  1    0              1
7  1    1              2
8  1    1              3
9  1    0              3
10 0    1              3

Thank you. Regards, Axel.

[[alternative HTML version deleted]]
Re: [R] Conditional cumulative sum
Axel Urbiz wrote: I'm trying to create a variable as in cumsum_y.cond1 below, which should compute the cumulative sum of y conditional on the value of cond==1.

is this what you are looking for...

set.seed(1)
d <- data.frame(y = sample(c(0,1), 10, replace = T),
                cond = sample(c(0,1), 10, replace = T))
d$cumsum_y.cond1 <- cumsum(d$y & d$cond)

# Output
   y cond cumsum_y.cond1
1  0    0              0
2  0    0              0
3  1    1              1
4  1    0              1
5  0    1              1
6  1    0              1
7  1    1              2
8  1    1              3
9  1    0              3
10 0    1              3

HTH Pete -- View this message in context: http://r.789695.n4.nabble.com/Conditional-cumulative-sum-tp4332254p4332344.html Sent from the R help mailing list archive at Nabble.com.
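For the archive, the same logic checked on hand-typed vectors matching the table in the question (cumsum of the elementwise AND; multiplying the two columns gives the identical result since both are 0/1):

```r
y    <- c(0, 0, 1, 1, 0, 1, 1, 1, 1, 0)
cond <- c(0, 0, 1, 0, 1, 0, 1, 1, 0, 1)

cumsum(y & cond)   # running count of rows where both y and cond are 1
cumsum(y * cond)   # identical result for 0/1 data
```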
Re: [R] Checking for invalid dates: Code works but needs improvement
Hello, again. I now have a more complete answer to your points.

1. It's too long. My understanding is that skilled programmers can usually complete tasks like this in a few lines.

It's not much shorter, but it's more readable. (The programmer is always suspect.)

2. It's not vectorized. I started out trying to do something vectorized but ran into problems with the strsplit function. I looked at the help file and it appears this function will only accept a single character vector.

All but one of the instructions are vectorized, and the one that is not only loops over a few column names. Use 'unlist' on the 'strsplit' function's output to get a vector.

4. There's no way to specify names for input and output data. I imagine this would be fairly easy to specify in the arguments to a function, but I am not sure how to incorporate it into a for loop.

You can now specify any matrix or data.frame, but it will only process the columns with dates. (This is not true; it will process anything with a '/' in it. Pay attention.)

Near the beginning of your code include the following:

TestDates <- data.frame(scan(connection,
    list(Patient=0, birthDT="", diagnosisDT="", metastaticDT="")))
close(connection)
TDSaved <- TestDates  # to avoid reopening the connection

And then, after all of it,

fun <- function(Dat){
    f <- function(jj, DF){
        x <- as.character(DF[, jj])
        x <- unlist(strsplit(x, "/"))
        n <- length(x)
        M <- x[seq(1, n, 3)]
        D <- x[seq(2, n, 3)]
        Y <- x[seq(3, n, 3)]
        D[D == "un"] <- "15"
        Y <- ifelse(nchar(Y) < 4 | Y < 1900, NA, Y)
        x <- as.Date(paste(Y, M, D, sep="-"), format="%Y-%m-%d")
        if(any(is.na(x)))
            cat("Warning: Invalid date values in", jj, "\n",
                as.character(DF[is.na(x), jj]), "\n")
        x
    }
    colinx <- colnames(as.data.frame(Dat))
    Dat <- data.frame(sapply(colinx, function(j) f(j, Dat)))
    for(i in colinx) class(Dat[[i]]) <- "Date"
    Dat
}

TD <- TDSaved
TD[, DateNames] <- fun(TD[, DateNames])
TD

Had fun writing it. Good luck.

Rui Barradas -- View this message in context: http://r.789695.n4.nabble.com/Checking-for-invalid-dates-Code-works-but-needs-improvement-tp4324356p4332529.html Sent from the R help mailing list archive at Nabble.com.
[R] Is there an R command for testing the difference of two linear regressions?
Hi all, I am looking for an R command to test the difference of two linear regression betas. Let's say I have data x1, x2, ..., x(n+1). beta1 is obtained from regressing x1 to xn onto 1 to n. beta2 is obtained from regressing x2 to x(n+1) onto 1 to n. Is there a way in R to test whether beta1 and beta2 are statistically different? Thanks a lot!

[[alternative HTML version deleted]]
Re: [R] Is there an R command for testing the difference of two linear regressions?
I don't know what is available in R, but a relevant paper that provides a test is Hotelling, H. (September 1940), "The Selection of Variates for Use in Prediction, with Some Comments on the General Problem of Nuisance Parameters", Annals of Mathematical Statistics, 11, 271-283.

On Thu, Jan 26, 2012 at 11:59 PM, Michael comtech@gmail.com wrote: Hi all, I am looking for an R command to test the difference of two linear regression betas. Let's say I have data x1, x2, ..., x(n+1). beta1 is obtained from regressing x1 to xn onto 1 to n. beta2 is obtained from regressing x2 to x(n+1) onto 1 to n. Is there a way in R to test whether beta1 and beta2 are statistically different? Thanks a lot!

[[alternative HTML version deleted]]
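For the archive, one common approach for comparing two slopes (a sketch on simulated data, not from the thread): stack the two samples and test the slope difference via an interaction term. Note this assumes the two samples are independent; the poster's two regressions share overlapping data (x1..xn versus x2..x(n+1)), so the dependence would need separate handling.

```r
set.seed(1)
n  <- 50
x1 <- rnorm(n); y1 <- 1 + 2.0 * x1 + rnorm(n)
x2 <- rnorm(n); y2 <- 1 + 2.5 * x2 + rnorm(n)

d <- data.frame(y = c(y1, y2),
                x = c(x1, x2),
                g = factor(rep(c("a", "b"), each = n)))

fit <- lm(y ~ x * g, data = d)
# the "x:gb" row estimates beta2 - beta1 and tests whether it is zero
summary(fit)$coefficients["x:gb", ]
```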
[R] multiple column comparison
Hello, I have a very large content analysis project, on which I've just begun to collect training data. I have three coders, who are entering data on up to 95 measurements. Traditionally, I've used Excel to check coder agreement (e.g., percentage agreement) by lining up each coder's measurements side by side and creating a new column with the results using if statements. That is, if(a=b, 1, 0). With this many variables, I am clearly interested in something that I don't have to create manually every time I check percentage agreement for coders. The data are set up like this:

ID  CODER V1 V2  V3  V4 ... V95
ID1 C1    y  int doc y
ID2 C1    y  ext doc y
ID1 C2    n  int doc y
ID2 C2    n  int doc y
ID1 C3    n  int doc y
ID2 C3    n  int doc y

I would like to write a script to do the following: For each variable, compare each pair of coders using if statements (e.g., if C1.V1 == C2.V1, 1, 0)

ID  C1.V1 C2.V1 C3.V1
ID1 y     y     y
ID2 y     y     y

For each coding pair, enter the resulting 1s and 0s into a new column. The new column name would reflect the comparison, such as C1.C2.V1. I'd ideally like this to handle any number of variables and any number of coders. I appreciate any thoughts, help, and pointers. Thanks in advance.

Best, Ryan Fuller Doctoral Candidate, Communication Graduate Student Researcher, Carsey-Wolf Center http://carseywolf.ucsb.edu University of California, Santa Barbara -- View this message in context: http://r.789695.n4.nabble.com/multiple-column-comparison-tp4332604p4332604.html Sent from the R help mailing list archive at Nabble.com.
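For the archive, a base-R sketch of the pairwise percentage-agreement computation described above (hypothetical mini-data in the poster's long layout; variable names are illustrative, and the loop generalizes to any number of coders and variables):

```r
# One row per ID x coder, a column per variable
d <- data.frame(
  ID    = rep(c("ID1", "ID2"), 3),
  CODER = rep(c("C1", "C2", "C3"), each = 2),
  V1    = c("y", "y", "n", "n", "n", "n"),
  V2    = c("int", "ext", "int", "int", "int", "int")
)

coders <- as.character(unique(d$CODER))
vars   <- c("V1", "V2")          # in the real data, paste0("V", 1:95)

out <- list()
for (v in vars) {
  # one column per coder for this variable
  wide <- reshape(d[, c("ID", "CODER", v)], idvar = "ID",
                  timevar = "CODER", direction = "wide")
  for (p in combn(coders, 2, simplify = FALSE)) {
    nm <- paste(p[1], p[2], v, sep = ".")   # e.g. "C1.C2.V1"
    out[[nm]] <- mean(wide[[paste(v, p[1], sep = ".")]] ==
                      wide[[paste(v, p[2], sep = ".")]])
  }
}
unlist(out)  # proportion of IDs on which each coder pair agrees, per variable
```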
Re: [R] Clusters: How to build a Amalgamation Schedule (sequence of jointing )?
http://r.789695.n4.nabble.com/file/n4332721/Рисунок1.jpg

Ultimately, when I set the limits of height variation from *min* to *max*, I want to get into a file a set of clusters (*A, B, C, ...*) in the form:

l1; 32, 33, 34, 30, 31, 55, 60, 58, 59, 57, 61
l2; 50, 51, 52, 54, 47, 53, 48, 49, 28, 6, 39, 4, 27, 3, 1, 25, 2, 26
l3; 41, 44, 43, 45, 40, 46, 8, 13, 9, 15, 11, 35, 7, 14, 10, 12, 36
l4; ...
...
lm; ...

where min < l1, l2, l3, ..., lm < max are the cut heights and each number is the case's number in the original data file. -- View this message in context: http://r.789695.n4.nabble.com/Clusters-How-to-build-a-Amalgamation-Schedule-sequence-of-jointing-tp4319741p4332721.html Sent from the R help mailing list archive at Nabble.com.
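For the archive, a hedged base-R sketch of what the poster seems to describe (simulated data; cutree cuts the dendrogram at each height and returns a cluster id per case, which can then be written out in the "height; case numbers" format):

```r
set.seed(1)
m  <- matrix(rnorm(60 * 2), ncol = 2)        # 60 hypothetical cases
hc <- hclust(dist(m), method = "complete")

# three cut heights strictly between min and max of the merge heights
heights <- seq(min(hc$height), max(hc$height), length.out = 5)[2:4]
groups  <- sapply(heights, function(h) cutree(hc, h = h))
colnames(groups) <- sprintf("l%d", seq_along(heights))

# cluster membership at the first height, in the poster's "l; cases" shape
split(seq_len(nrow(m)), groups[, 1])
# write.csv(groups, "clusters.csv") would save all memberships to a file
```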