[R] calling the function which is stored in a list
Hi I'm storing two functions in a list # creating two function function1 - function(n) { return(sum(n)) } function2 - function(n) { return(mean(n)) } #storing the function function3 =c(function1,function2) is it possible to call the stored function and used it ? x=c(10,29) funtion3[1](x) Thanks - Thanks in Advance Arun -- View this message in context: http://r.789695.n4.nabble.com/calling-the-function-which-is-stored-in-a-list-tp4372131p4372131.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Error in Rd[[which]] : subscript out of bounds
On 08.02.2012 22:10, Martin Morgan wrote: On 02/08/2012 12:13 PM, Ben Ganzfried wrote: Hi Uwe, Thanks for the help. R version 2.14.0 (2011-10-31). The file in question looks like this (w/ a few minor edits for privacy): There are two 'description' entries; the second might be combined with the existing 'details'. Martin Yes, and let me add: you may want to use \dQuote{} and friends rather than real quotes, use markup for the e-mail address etc. Uwe Ligges \name{curatedData-package} \alias{curatedData-package} \alias{curatedData} \docType{package} \title{Cancer Gene Expression Analysis} \description{The curatedData package provides relevant functions and data for gene expression analysis in cancer patients.} \details{ \tabular{ll}{ Package: \tab curatedData\cr Type: \tab Package\cr Version: \tab 1.0\cr Date: \tab 2012-2-03\cr License: \tab Artistic-2.0\cr Depends: \tab R (= 2.10.0), affy\cr } } \author{ Benjamin F. Ganzfried, et al. Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Harvard School of Public Health Maintainer: ben.ganzfr...@gmail.com } \description{ Please refer to the following key. For summarygrade: low = 1, 2, LMP. High= 3,4,23. For summarystage: early = 1,2, 12. late=3,4,23,34. For T: Stage (1-4). If multiple stages given (eg 34), use the highest. For substage: substage (abcd). For cases like ab, bc, use highest given. For G: Grade (1-4): If multiple given, ie 12, 23, use highest given. For N: N (0/1): degree of spread to regional lymph nodes. For M: M (0/1): presence of metastasis. For pltx: patient treated with platin. For tax: patient treated with taxol. For neo: patient treated with neoadjuvant treatment. For primary_therapy_outcome_success: response to any kind of therapy (including radiation only). For chemo_response: platinum resistance: refractory=3mo or less, resistant=6mo or less, sensitive=12mo or higher. For inferred_chemo_response: inferred platinum resistance: refractory=death in 6mo or less, sensitive=survival for 12mo or more. For debulking: amount of residual disease (optimal =1mm, suboptimal=1mm). } 2012/2/8 Uwe Liggeslig...@statistik.tu-dortmund.de On 08.02.2012 18:44, Ben Ganzfried wrote: Hi-- I googled the above error and found previous postings about this error on the list. I was having a little difficulty implementing the advice though. The suggestions were to use: traceback() and checkRd(). I'm using R in the directory in which the .Rd file with the problem is located, but I'm having difficulty figuring out how to proceed. I've looked through the help pages for traceback() and checkRd(), and would greatly appreciate any advice for how to find the errors and fix them. The specific error I am getting when trying to check my package is: * checking Rd files ... WARNING Error in Rd[[which]] : subscript out of bounds Which R version (assuming R-2.14.1 or R-devel)? Can you make the file available? Uwe Ligges problem found in ‘curatedData-package.Rd’ Thanks in advance! Ben [[alternative HTML version deleted]] __** R-help@r-project.org mailing list https://stat.ethz.ch/mailman/**listinfo/r-helphttps://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/** posting-guide.htmlhttp://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] dropterm in MANOVA for MLM objects
Thanks for nice explanation. Unfortunately, matrix in my question is exactly similar to the one I posted earlier : mat - matrix(rnorm(700), ncol=5, dimnames=list( paste(f, c(1:140),sep=_), c(A, B, C, D, E))) Question here is which of the 140 characteristics (i.e. f_1...f_140) distinguish the most between the five plant species. Is it true that this matrix can't be regressed with factor responses (species) ? If so, what alternatives can be used ? - Vickie From: j...@mcmaster.ca To: is...@live.com CC: r-help@r-project.org Subject: RE: [R] dropterm in MANOVA for MLM objects Date: Wed, 8 Feb 2012 20:37:37 -0500 Dear Vickie, I'm afraid that the test problem that you've constructed makes no sense, and doesn't correspond to the problem that you initially described, in which a matrix of presumably 5 responses for presumably 140 observations is regressed on 6 predictors. You regressed your randomly generated matrix of 5 responses and 140 observations on a factor constructed from the distinct 140 observation names. That factor has 140 levels, and so the model uses 140 df, all the df in the data. It's therefore not surprising that the error SSP matrix has 0 df, which is exactly what Anova.mlm (actually, linearHypothesis.mlm, which it calls) tells you. The remark that you found about univariate tests that you apparently found on-line concerns repeated-measures designs and is not relevant to your data. And you can't do a univariate ANOVA when there's 0 df for error in any event. Here's a proper simulation of the kind of data that I think you have: set.seed(12345) E - matrix(rnorm(140*5), ncol=5) X - matrix(rnorm(140*6), ncol=6) Beta - matrix(runif(6*5), ncol=5) Y - X %*% Beta + E colnames(Y) - c(A, B, C, D, E) colnames(X) - c(syct, mmin, mmax, cach, chmin, chmax) Data - as.data.frame(cbind(Y, X)) mod - lm(cbind(A, B, C, D, E) ~ syct + mmin + mmax + cach + chmin + chmax, data=Data) Anova(mod) Type II MANOVA Tests: Pillai test statistic Df test stat approx F num Df den Df Pr(F) syct 1 0.41622 18.395 5 129 9.31e-14 *** mmin 1 0.48288 24.091 5 129 2.2e-16 *** mmax 1 0.62100 42.273 5 129 2.2e-16 *** cach 1 0.61711 41.583 5 129 2.2e-16 *** chmin 1 0.72547 68.180 5 129 2.2e-16 *** chmax 1 0.54825 31.311 5 129 2.2e-16 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Best, John -Original Message- From: Vickie S [mailto:is...@live.com] Sent: February-08-12 5:53 PM To: j...@mcmaster.ca Cc: r-help@r-project.org Subject: RE: [R] dropterm in MANOVA for MLM objects Dear Prof Fox, I tried anova but got the following error message: mat - matrix(rnorm(700), ncol=5, dimnames=list( paste(f, c(1:140), sep=_), c(A, B, C, D, E))) summary(Anova(lm(cbind(A, B, C, D, E) ~ factor(rownames(mat)), data=as.data.frame(mat Error in summary(Anova(lm(cbind(A, B, C, D, E) ~ factor(rownames(mat)), : error in evaluating the argument 'object' in selecting a method for function 'summary': Error in linearHypothesis.mlm(mod, hyp.matrix.2, SSPE = SSPE, V = V, ...) : The error SSP matrix is apparently of deficient rank = 0 5 I looked in previous forum and it seems like i have only option of performing the univariate test here. Therefore I used the following, but it still results in an error message: Anova(lm(cbind(A, B, C, D, E) ~ factor(rownames(mat)), data=as.data.frame(mat)), univariate=TRUE, multivariate=F) Error in linearHypothesis.mlm(mod, hyp.matrix.2, SSPE = SSPE, V = V, ...) : The error SSP matrix is apparently of deficient rank = 0 5 Any suggestions ? Thanks Vickie I think I am still missing some important clues here. Is it because the feww From: j...@mcmaster.ca To: is...@live.com CC: r-help@r-project.org Subject: RE: [R] dropterm in MANOVA for MLM objects Date: Wed, 8 Feb 2012 17:01:34 -0500 Dear Vicki, I think that the Anova() function in the car package will do what you want (and will also properly handle models with more structure, such as interactions). Best, John John Fox Senator William McMaster Professor of Social Statistics Department of Sociology McMaster University Hamilton, Ontario, Canada http://socserv.mcmaster.ca/jfox -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-bounces@r- project.org] On Behalf Of Vickie S Sent: February-08-12 3:57 PM To: r-help@r-project.org Subject: [R] dropterm in MANOVA for MLM objects Dear R fans, I have got a difficult sounding problem. For fitting a linear model using continuous response and then for re- fitting the model after excluding
Re: [R] Version control (git, mercurial) for R packages
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 09/02/12 00:01, Peter Langfelder wrote: Hi all, in particular package developers, I'm exploring using a version control system to keep better track of changes to the packages I maintain. I'm leaning towards git (although mercurial also looks good) but am not sure what is the best way to set up the repository. It seems I can't set the repository directly within the R package main directory, since it will be incompatible with the standard package structure. I can set the repository in the directory that contains my R packages, but then I have a single repository for all of my packages, which is not ideal. Git is nice - but if you ar looking for simplicity in usage for R packages, I guess r-forge and rforge are the easiest to use. But I would be interested in the workflow when using github as the main VCS. Of course, I could add one extra layer of directory structure (for example, if a package foo sit in directory RPackages/foo I could move it to directory RPackages/foo/foo and set the repository in RPackages/foo) but this seems roundabout and inelegant... I'd be grateful for any suggestions/pointers. I might be missing something here, but I am using r-forge (http://r-forge.r-project.org/) for exactly that. There is also rforge (http://www.rforge.net/) and some packages are on github. My folder looks as follow: - -local name for the package (selected folders / fiiles under VC) +--pkg (under VC) +--www (under VC) +--other fiolders (not under VC) If a project gets registered at r-forge, you can check it out and you have all the relevant info in a README file - don't know about rforge. Hope this helps, Cheers, Rainer Thanks, Peter __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. - -- Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology, UCT), Dipl. Phys. (Germany) Centre of Excellence for Invasion Biology Stellenbosch University South Africa Tel : +33 - (0)9 53 10 27 44 Cell: +33 - (0)6 85 62 59 98 Fax : +33 - (0)9 58 10 27 44 Fax (D):+49 - (0)3 21 21 25 22 44 email: rai...@krugs.de Skype: RMkrug -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk8zhzsACgkQoYgNqgF2egrKOQCfb2tpU0lvS2F394gGDv+4c+Lp m4EAnRO76W8Rm2OM6yQ2QoWF0xQcEfBD =3QHA -END PGP SIGNATURE- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] calling the function which is stored in a list
On 09.02.2012 08:54, arunkumar wrote: Hi I'm storing two functions in a list # creating two function function1- function(n) { return(sum(n)) } function2- function(n) { return(mean(n)) } #storing the function function3 =c(function1,function2) is it possible to call the stored function and used it ? x=c(10,29) funtion3[1](x) Yes, if you correct the typo and use list indexing with double brackets: function3[[1]](x) Uwe Ligges Thanks - Thanks in Advance Arun -- View this message in context: http://r.789695.n4.nabble.com/calling-the-function-which-is-stored-in-a-list-tp4372131p4372131.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Version control (git, mercurial) for R packages
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 09/02/12 00:33, Tom Roche wrote: Peter Langfelder Thu Feb 9 00:01:31 CET 2012 I'm exploring using a version control system +1! welcome to the new millenium :-) to keep better track of changes to the [R] packages I maintain. I'm leaning towards git I like 'git' too, but one thing to consider (though keep in mind that I'm new to R, so I Could Be Wrong): R-forge https://r-forge.r-project.org/ seems to be the canonical place to put R packages, and it's svn. That can be finessed, e.g., http://cameron.bracken.bz/git-with-r-forge Nice link - but I can't follow it. Actual Situation: - - Package on r-forge Want to have: - - git (github?) repo as my main cvs and - - branch (release_x.y.z) for new release for r-forge - - push branch release_x.y.z to r-forge to do it's magic. So there are two initial steps: 1) Initial import of r-forge repo to git(hub) (should be easy) 2) setup of local git repo for package (is easy) and then github is used as VCS When a release should be done, the following needs to be done: - - branch on git (easy) - - push new branch to r-forge (how?) now how could this be done easily? can I do this with git svn commit ? Cheers, Rainer FWIW, Tom Roche tom_ro...@pobox.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. - -- Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology, UCT), Dipl. Phys. (Germany) Centre of Excellence for Invasion Biology Stellenbosch University South Africa Tel : +33 - (0)9 53 10 27 44 Cell: +33 - (0)6 85 62 59 98 Fax : +33 - (0)9 58 10 27 44 Fax (D):+49 - (0)3 21 21 25 22 44 email: rai...@krugs.de Skype: RMkrug -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk8ziuQACgkQoYgNqgF2egr/0gCfaZLJ78RK2qOhdJAllixQYlgc Fw0An0lJnq2bdaCz+OlRDU5QYOMzWuut =C/DE -END PGP SIGNATURE- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] width and alignment of Latex table columns
Dear all, I'm using xtable package in order to produce Latex table. There is a way to specify not only the alignment but also the width of table colums?? Thanks for your attention. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Version control (git, mercurial) for R packages
Once upon a time r-forge had the option to sync from an external svn repository, e.g. hosted on googlecode. I haven't seen it available for some time, sadly. I'm sure many users would appreciate if this feature came back with the new interface. Not sure if it could work with git as well, though. b. On 9 February 2012 21:59, Rainer M Krug r.m.k...@gmail.com wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 09/02/12 00:33, Tom Roche wrote: Peter Langfelder Thu Feb 9 00:01:31 CET 2012 I'm exploring using a version control system +1! welcome to the new millenium :-) to keep better track of changes to the [R] packages I maintain. I'm leaning towards git I like 'git' too, but one thing to consider (though keep in mind that I'm new to R, so I Could Be Wrong): R-forge https://r-forge.r-project.org/ seems to be the canonical place to put R packages, and it's svn. That can be finessed, e.g., http://cameron.bracken.bz/git-with-r-forge Nice link - but I can't follow it. Actual Situation: - - Package on r-forge Want to have: - - git (github?) repo as my main cvs and - - branch (release_x.y.z) for new release for r-forge - - push branch release_x.y.z to r-forge to do it's magic. So there are two initial steps: 1) Initial import of r-forge repo to git(hub) (should be easy) 2) setup of local git repo for package (is easy) and then github is used as VCS When a release should be done, the following needs to be done: - - branch on git (easy) - - push new branch to r-forge (how?) now how could this be done easily? can I do this with git svn commit ? Cheers, Rainer FWIW, Tom Roche tom_ro...@pobox.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. - -- Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology, UCT), Dipl. Phys. (Germany) Centre of Excellence for Invasion Biology Stellenbosch University South Africa Tel : +33 - (0)9 53 10 27 44 Cell: +33 - (0)6 85 62 59 98 Fax : +33 - (0)9 58 10 27 44 Fax (D): +49 - (0)3 21 21 25 22 44 email: rai...@krugs.de Skype: RMkrug -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk8ziuQACgkQoYgNqgF2egr/0gCfaZLJ78RK2qOhdJAllixQYlgc Fw0An0lJnq2bdaCz+OlRDU5QYOMzWuut =C/DE -END PGP SIGNATURE- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Version control (git, mercurial) for R packages
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 09/02/12 10:09, baptiste auguie wrote: Once upon a time r-forge had the option to sync from an external svn repository, e.g. hosted on googlecode. I haven't seen it available for some time, sadly. I'm sure many users would appreciate if this feature came back with the new interface. Not sure if it could work with git as well, though. One option is obviously to use a local git repo and then use git svn - but my ideal setup would be to have one development repo on git(hub), and one package release repo on r-forge, and whenever I decide yes, that is a release, branch it on git and push this to r-forge. By doing this, r-forge only contains releases, and git(hub) the development history. But thinking about it, a different r-forge VCS backend (git) would also be an option. Why was that option scraped? Rainer b. On 9 February 2012 21:59, Rainer M Krug r.m.k...@gmail.com wrote: On 09/02/12 00:33, Tom Roche wrote: Peter Langfelder Thu Feb 9 00:01:31 CET 2012 I'm exploring using a version control system +1! welcome to the new millenium :-) to keep better track of changes to the [R] packages I maintain. I'm leaning towards git I like 'git' too, but one thing to consider (though keep in mind that I'm new to R, so I Could Be Wrong): R-forge https://r-forge.r-project.org/ seems to be the canonical place to put R packages, and it's svn. That can be finessed, e.g., http://cameron.bracken.bz/git-with-r-forge Nice link - but I can't follow it. Actual Situation: - Package on r-forge Want to have: - git (github?) repo as my main cvs and - branch (release_x.y.z) for new release for r-forge - push branch release_x.y.z to r-forge to do it's magic. So there are two initial steps: 1) Initial import of r-forge repo to git(hub) (should be easy) 2) setup of local git repo for package (is easy) and then github is used as VCS When a release should be done, the following needs to be done: - branch on git (easy) - push new branch to r-forge (how?) now how could this be done easily? can I do this with git svn commit ? Cheers, Rainer FWIW, Tom Roche tom_ro...@pobox.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. - -- Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology, UCT), Dipl. Phys. (Germany) Centre of Excellence for Invasion Biology Stellenbosch University South Africa Tel : +33 - (0)9 53 10 27 44 Cell: +33 - (0)6 85 62 59 98 Fax : +33 - (0)9 58 10 27 44 Fax (D):+49 - (0)3 21 21 25 22 44 email: rai...@krugs.de Skype: RMkrug -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk8zktEACgkQoYgNqgF2egrvIACeMwR6uIyJoLXvZSb4gvo7Opzm S7YAnimWJjfDgHUlpdn7rPRXMXmZltWf =gqtc -END PGP SIGNATURE- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] fixed effects linear model in R
Dear Andrew, Thanks for your suggestion. I will indeed have a look at Allison's booklet... Best, On 7 February 2012 23:39, Andrew Miles rstuff.mi...@gmail.com wrote: Based on Paul Allison's booklet Fixed Effect Regression Models (2009), the FE model can be estimated by person-mean centering all of your variables (but not the outcome), and then including a random intercept for each person. The centering gives you the FE model estimates, and the random intercept adjusts the standard errors for clustering by individuals. Note that your data must be in person-period (or long) format to do this. In case you are unfamiliar with person-mean centering, that simply means taking the mean of each person's values for a given variable for all of the periods in your data, and then calculating a deviation from that mean at each time period. For example, a person's average income over four years might be $50,000, but in each year their actual income would be slightly higher or lower than this (these would be the person-mean deviations). In symbolic form, your code might look something like this: library(lme4) variable_pmcentered = variable - person_mean mod = lmer(outcome ~ variable_pmcentered + person_mean + other predictors + (1|personID)) The advantage of this method (which Allison calls a hybrid method) over traditional FE models is that you get the benefits of a FE model (subtracting out time-invariant omitted variables) along with the benefits of random effect models (e.g., estimating coefficients for time-invariant variables, estimating interactions with time, letting intercepts and slopes varying randomly, etc.) See Allison's booklet for more details on this method. Allison, Paul D. 2009. Fixed Effects Regression Models. Los Angeles, C.A.: Sage. Andrew Miles On Feb 7, 2012, at 5:00 PM, caribou...@gmx.fr wrote: Dear R-helpers, First of all, sorry for those who have (eventually) already received that request. The mail has been bumped several times, so I am not sure the list has received it... and I need help (if you have time)! ;-) I have a very simple question and I really hope that someone could help me I would like to estimate a simple fixed effect regression model with clustered standard errors by individuals. For those using Stata, the counterpart would be xtreg with the fe option, or areg with the absorb option and in both case the clustering is achieved with vce(cluster id) My question is : how could I do that with R ? An important point is that I have too many individuals, therefore I cannot include dummies and should use the demeaning usual procedure. I tried with the plm package with the within option, but R quikcly tells me that the memory limits are attained (I have over 10go ram!) while the dataset is only 700mo (about 50 000 individuals, highly unbalanced) I dont understand... plm do indeed demean the data so the computation should be fast and light enough... and instead it saturates my memory and do not converge... Do you have an idea ? Moreover, it is possible to obtain cluster robust standard errors with plm ? Are there any other solutions for fixed effects linear models (with the demean trick in order not to create as many dummies as individuals) ? Many thanks in advance ! ;) John __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Force printing of excluded axis annotations
Justin Fincher wrote: Howdy, This should be simple, but I am finding that I can't find a simple solution. I have a plot to which I am manually adding the annotations to the y-axis with this command: axis(2, c(-4,-3,-2,-1,0,1,2,3,4,5,6,7),labels=c(-4,-3,-2,-1,0,1,2,3,4,5,6,7),cex.axis=8) The issue is that, apparently, R doesn't think that the -1 can fit, even though there is most certainly enough space. Is there a way to force R to print all the annotations I give it, regardless of proximity or to reduce the space it believes it needs? Thank you. - Fincher Hi Justin, The staxlab function (plotrix) will squeeze labels, just set nlines=1. Jim __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Hotelling T2 test extension for multigroup data
Hi all, I've got the following matrix : mat - matrix(rnorm(700), ncol=5, dimnames=list( paste(f, c(1:140), sep=_), c(A, B, C, D, E))) I can see that currently most of the multivariate Hotelling T2 tests are limited for application on two groups/samples. I wud appreciate if someone can provide me a suggestion how to implement the same for multiple groups. Thx, - Vickie __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] unsparse a vector
On Wed, Feb 08, 2012 at 05:01:01PM -0500, Sam Steingold wrote: loop is too slow. it appears that sparseMatrix does what I want: ll - lapply(l,length) i - rep(1:4, ll) vv - unlist(l) j1 - as.factor(substring(vv,1,1)) t - table(j1) j - position of elements of j1 in names(t) sparseMatrix(i,j,x=as.numeric(substring(vv,2,2)), dimnames = names(t)) so, the question is, how do I produce a vector of positions? i.e., from vectors [1] A B A C A B and [1] A B C I need to produce a vector [1] 1 2 1 3 1 2 of positions of the elements of the first vector in the second vector. This particular thing may be done as follows match(c(A, B, A, C, A, B), c(A, B, C)) [1] 1 2 1 3 1 2 PS. Of course, I would much prefer a dataframe to a matrix... As the final result or also as an intermediate result? Changing individual rows in a data frame is much slower than in a matrix. Compare n - 1 mat - matrix(1:(2*n), nrow=n) df - as.data.frame(mat) system.time( for (i in 1:n) { mat[i, 1] - 0 } ) user system elapsed 0.021 0.000 0.021 system.time( for (i in 1:n) { df[i, 1] - 0 } ) user system elapsed 4.997 0.069 5.084 This effect is specific to working with rows. Working with the whole columns is a different thing. system.time( { col1 - df[[1]] for (i in 1:n) { col1[i] - 0 } df[[1]] - col1 } ) user system elapsed 0.019 0.000 0.019 Hope this helps. Petr Savicky. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] width and alignment of Latex table columns
Hi, You can set the width of the columns the same way you would in any other LaTeX table, e.g., by setting align to p{width}. http://www.giss.nasa.gov/tools/latex/ltx-68.html has a good summary of LaTeX tables that you may find helpful. Best, Ista On Thursday, February 09, 2012 10:08:32 AM n.via...@libero.it wrote: Dear all, I'm using xtable package in order to produce Latex table. There is a way to specify not only the alignment but also the width of table colums?? Thanks for your attention. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] how to exclude rows with not-connected coalitions
Dear all, I have question but cannot explain without providing some context first: I want to calculate how many policy-connected coalitions between 7 parties are possible. I have positions on an one-dimensional scale for each party and I have sorted the parties on the positions (it is sorted from extreme left to extreme right, hence using a left-right scale). A policy-connected coalition consists of parties that are connected on this left-right scale, hence there cannot be a party outside the coalition that is in-between two parties in the coalition on the left-right scale. My question is: how do a make a matrix that excludes those coalitions that are not policy-connected. I made a matrix with each combination of 0 (out of coalition) and 1 (in coalition), for 7 parties. a - expand.grid(rep(list(c(0,1)),7)) a gives all possible coalitions between these 7 parties. Now I want to exclude those rows with coalition that are not policy-connected. Hence: this one should be out, because the fifth party is not in. 0 1 1 1 0 1 1 And this one should be in, 0 0 0 1 1 1 1. Hope this is clear and that someone has a good suggestion. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] factor level for non-existing value
Hello everybody! Let's assume I have the following factor with it's levels: a-factor(c(2,3,3,2,4,2,3,2,2,2,3,2,3)) mydata-data.frame(a) When I plot the vector a using barplot(table(mydata$a) unfortunately the value 1 does not show up, as it does not appear in my data. But still, it theoretically exists. How can I assign the following levels to the factor? 1: dislike very much 2: dislike 3: like 4: like very much I have already tried the following code, which does not work levels(data$a)-c(dislike very much,dislike,like,like very much) as 2 then becomes dislike very much. I hope you understand my problem. Thank you for any help! [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] width and alignment of Latex table columns
Hi Here is an example with something at hand if you use Sweave. Using Sweave you can embed the R in a chunk and properly format the headers and sometimes the table. the D{.}{.}{4.0} requires the use of \usepackage{dcolumn} \begin{table}[h] \caption[DNO and \rdate]% {DNO and \rdate with row totals}% \label{tab:CW2}% \centering \begin{tabular}{p{1.0in} *{7}{D{.}{.}{4.0}} } \toprule \addlinespace[3pt] % Header \multicolumn{1}{Measurement} \\ \addlinespace[3pt] \midrule \addlinespace[5pt] %Table2 Table2, keep.source=FALSE, results=tex, echo=FALSE= xx - addmargins(xx,2) print( xtable(xx, digits = rep(0,8) ), type= latex, floating = FALSE, tabular.environment = tabular, include.rownames = TRUE, include.colnames = FALSE, only.contents = TRUE, hline.after = NULL ) ## xtable @ % Table2 end \addlinespace[5pt] \bottomrule \end{tabular} \end{table} Regards Duncan Duncan Mackay Department of Agronomy and Soil Science University of New England ARMIDALE NSW 2351 Email: home mac...@northnet.com.au At 22:07 9/02/2012, you wrote: Hi, You can set the width of the columns the same way you would in any other LaTeX table, e.g., by setting align to p{width}. http://www.giss.nasa.gov/tools/latex/ltx-68.html has a good summary of LaTeX tables that you may find helpful. Best, Ista On Thursday, February 09, 2012 10:08:32 AM n.via...@libero.it wrote: Dear all, I'm using xtable package in order to produce Latex table. There is a way to specify not only the alignment but also the width of table colums?? Thanks for your attention. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] factor level for non-existing value
Hi Hello everybody! Let's assume I have the following factor with it's levels: a-factor(c(2,3,3,2,4,2,3,2,2,2,3,2,3)) mydata-data.frame(a) When I plot the vector a using barplot(table(mydata$a) unfortunately the value 1 does not show up, as it does not appear in my data. But still, it theoretically exists. How can I assign the following levels to the factor? 1: dislike very much 2: dislike 3: like 4: like very much I have already tried the following code, which does not work levels(data$a)-c(dislike very much,dislike,like,like very much) as 2 then becomes dislike very much. you can do it when constructing a factor a-factor(c(2,3,3,2,4,2,3,2,2,2,3,2,3), levels=1:4,labels=c(dislike very much,dislike,like,like very much)) or when you already have a factor a-factor(a, levels=1:4) I basically understand that factor is a vector of numeric values with levels and labels attribute. Each level can have some label which can be changed independently. All levels does not need to be present in a factor. However you shall not confuse it with function ?labels which has nothing to do with factors. Regards Petr I hope you understand my problem. Thank you for any help! [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Workflow when using git (e.g. github) as SVN repo for package
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Hi This question is towards development of packages, but I think it does not fit into R-devel mailing list, so I post it here. I would like to use git as the VCS for my packages, but still be able to use the magic of R-forge. I know how to mgrate the existing SVN repo from R-forge to e.g. github, and I am a little bit familiar with using git - so no problem here. But where to from there? How can I bring the magic of R-forge into the game? I thought about pushing only releases to the SVN on R-forge so that the package are then created - but what is the best workflow for this? As many packages are hosted on github, there should be an easy and way to do it? Thanks, Rainer - -- Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology, UCT), Dipl. Phys. (Germany) Centre of Excellence for Invasion Biology Stellenbosch University South Africa Tel : +33 - (0)9 53 10 27 44 Cell: +33 - (0)6 85 62 59 98 Fax : +33 - (0)9 58 10 27 44 Fax (D):+49 - (0)3 21 21 25 22 44 email: rai...@krugs.de Skype: RMkrug -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk8zv3QACgkQoYgNqgF2egpHBgCfYW6Hmtl35kE+OsLm7+GyC6yc GpsAniaJSoTNFGLDxAgu/upUpkakXlcH =wxEn -END PGP SIGNATURE- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] how to exclude rows with not-connected coalitions
On Thu, Feb 09, 2012 at 12:18:06PM +, Schumacher, G. wrote: Dear all, I have question but cannot explain without providing some context first: I want to calculate how many policy-connected coalitions between 7 parties are possible. I have positions on an one-dimensional scale for each party and I have sorted the parties on the positions (it is sorted from extreme left to extreme right, hence using a left-right scale). A policy-connected coalition consists of parties that are connected on this left-right scale, hence there cannot be a party outside the coalition that is in-between two parties in the coalition on the left-right scale. My question is: how do a make a matrix that excludes those coalitions that are not policy-connected. I made a matrix with each combination of 0 (out of coalition) and 1 (in coalition), for 7 parties. a - expand.grid(rep(list(c(0,1)),7)) a gives all possible coalitions between these 7 parties. Now I want to exclude those rows with coalition that are not policy-connected. Hence: this one should be out, because the fifth party is not in. 0 1 1 1 0 1 1 And this one should be in, 0 0 0 1 1 1 1. Hi. If i understand correctly, a connected coalition is an interval on your scale. If this is correct, try the following for generating only the intervals. n - 7 a - matrix(nrow=n+choose(n, 2), ncol=n) k - 1 for (i in 1:n) { for (j in (i+1):(n+1)) { x - rep(0, times=n+1) x[i] - 1 x[j] - 1 a[k,] - cumsum(x)[1:n] k - k+1 } } a[a == 2] - 0 a [,1] [,2] [,3] [,4] [,5] [,6] [,7] [1,]1000000 [2,]1100000 [3,]1110000 [4,]1111000 [5,]1111100 [6,]1111110 [7,]1111111 [8,]0100000 [9,]0110000 [10,]0111000 ... If you already have the matrix of all coalitions and want to detect the connected ones, try the function ?rle. There should be exactly one run of ones. Hope this helps. Petr Savicky. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] factor level for non-existing value
Hi David, On Thursday, February 09, 2012 01:21:48 PM David Studer wrote: Hello everybody! Let's assume I have the following factor with it's levels: a-factor(c(2,3,3,2,4,2,3,2,2,2,3,2,3)) mydata-data.frame(a) #You need to specify levels and labels here, like this: a - factor(c(2,3,3,2,4,2,3,2,2,2,3,2,3), levels = 1:4, labels = c(dislike very much, dislike, like, like very much)) Then barplot will plot all four factor levels. barplot(table(a)) This is pretty basic stuff, likely to be covered in most introductory texts, so I suggest you take some time to read an introduction to R. Best, Ista When I plot the vector a using barplot(table(mydata$a) unfortunately the value 1 does not show up, as it does not appear in my data. But still, it theoretically exists. How can I assign the following levels to the factor? 1: dislike very much 2: dislike 3: like 4: like very much I have already tried the following code, which does not work levels(data$a)-c(dislike very much,dislike,like,like very much) as 2 then becomes dislike very much. I hope you understand my problem. Thank you for any help! [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] search tutorial for function tt in cox.zph
It seems there is a function ?tt? in R which can help to find this function and integrate it in my Cox model. I am searching a good tutorial (or book reference) with example(s) to understand and apply the function ?tt? in R; There is a vignette describing the tt function. Due to an oversight of mine the vignette was not included in CRAN until the most recent update of the survival package (a few days ago). For diagnosis of time-dependence and even more for deciding how to deal with it, I find the plots from the cox.zph function to be the most helpful. These are discussed in my book with P Grambsch. Terry Therneau __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Hotelling T2 test extension for multigroup data
On Feb 9, 2012, at 12:11 , Vickie S wrote: Hi all, I've got the following matrix : mat - matrix(rnorm(700), ncol=5, dimnames=list( paste(f, c(1:140), sep=_), c(A, B, C, D, E))) I can see that currently most of the multivariate Hotelling T2 tests are limited for application on two groups/samples. I wud appreciate if someone can provide me a suggestion how to implement the same for multiple groups. Please don't start up a new thread, cloning the same misconceptions as an earlier one! It doesn't help, you will only get people misunderstanding you in the same way... Apparently, you have rows and columns reversed relative to the common layout in multivariate analysis. Also, you have only one observation for each of your groups (species)? What makes you think that T^2 would work for you if you had only A and B species? I don't think anything will work unless you have either replication or some sort of additional assumptions on the covariance structure of your 140-dimensional response. -- Peter Dalgaard, Professor Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] how to exclude rows with not-connected coalitions
This can be done without nested loops. a - expand.grid(rep(list(c(0,1)),7)) tmp - apply(a, 1, function(x){ var(cumsum(x == 0)[x == 1]) }) a[is.na(tmp) | tmp == 0, ] ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie Kwaliteitszorg / team Biometrics Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium + 32 2 525 02 51 + 32 54 43 61 85 thierry.onkel...@inbo.be www.inbo.be To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey -Oorspronkelijk bericht- Van: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] Namens Petr Savicky Verzonden: donderdag 9 februari 2012 13:44 Aan: r-help@r-project.org Onderwerp: Re: [R] how to exclude rows with not-connected coalitions On Thu, Feb 09, 2012 at 12:18:06PM +, Schumacher, G. wrote: Dear all, I have question but cannot explain without providing some context first: I want to calculate how many policy-connected coalitions between 7 parties are possible. I have positions on an one-dimensional scale for each party and I have sorted the parties on the positions (it is sorted from extreme left to extreme right, hence using a left-right scale). A policy-connected coalition consists of parties that are connected on this left-right scale, hence there cannot be a party outside the coalition that is in-between two parties in the coalition on the left-right scale. My question is: how do a make a matrix that excludes those coalitions that are not policy-connected. I made a matrix with each combination of 0 (out of coalition) and 1 (in coalition), for 7 parties. a - expand.grid(rep(list(c(0,1)),7)) a gives all possible coalitions between these 7 parties. Now I want to exclude those rows with coalition that are not policy-connected. Hence: this one should be out, because the fifth party is not in. 0 1 1 1 0 1 1 And this one should be in, 0 0 0 1 1 1 1. Hi. If i understand correctly, a connected coalition is an interval on your scale. If this is correct, try the following for generating only the intervals. n - 7 a - matrix(nrow=n+choose(n, 2), ncol=n) k - 1 for (i in 1:n) { for (j in (i+1):(n+1)) { x - rep(0, times=n+1) x[i] - 1 x[j] - 1 a[k,] - cumsum(x)[1:n] k - k+1 } } a[a == 2] - 0 a [,1] [,2] [,3] [,4] [,5] [,6] [,7] [1,]1000000 [2,]1100000 [3,]1110000 [4,]1111000 [5,]1111100 [6,]1111110 [7,]1111111 [8,]0100000 [9,]0110000 [10,]0111000 ... If you already have the matrix of all coalitions and want to detect the connected ones, try the function ?rle. There should be exactly one run of ones. Hope this helps. Petr Savicky. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] how to exclude rows with not-connected coalitions
Thanks Thierry and Petr for your solutions. I can continue now (until the next problem arises ;). From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] on behalf of ONKELINX, Thierry [thierry.onkel...@inbo.be] Sent: Thursday, February 09, 2012 2:09 PM To: Petr Savicky; r-help@r-project.org Subject: Re: [R] how to exclude rows with not-connected coalitions This can be done without nested loops. a - expand.grid(rep(list(c(0,1)),7)) tmp - apply(a, 1, function(x){ var(cumsum(x == 0)[x == 1]) }) a[is.na(tmp) | tmp == 0, ] ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie Kwaliteitszorg / team Biometrics Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium + 32 2 525 02 51 + 32 54 43 61 85 thierry.onkel...@inbo.be www.inbo.be To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey -Oorspronkelijk bericht- Van: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] Namens Petr Savicky Verzonden: donderdag 9 februari 2012 13:44 Aan: r-help@r-project.org Onderwerp: Re: [R] how to exclude rows with not-connected coalitions On Thu, Feb 09, 2012 at 12:18:06PM +, Schumacher, G. wrote: Dear all, I have question but cannot explain without providing some context first: I want to calculate how many policy-connected coalitions between 7 parties are possible. I have positions on an one-dimensional scale for each party and I have sorted the parties on the positions (it is sorted from extreme left to extreme right, hence using a left-right scale). A policy-connected coalition consists of parties that are connected on this left-right scale, hence there cannot be a party outside the coalition that is in-between two parties in the coalition on the left-right scale. My question is: how do a make a matrix that excludes those coalitions that are not policy-connected. I made a matrix with each combination of 0 (out of coalition) and 1 (in coalition), for 7 parties. a - expand.grid(rep(list(c(0,1)),7)) a gives all possible coalitions between these 7 parties. Now I want to exclude those rows with coalition that are not policy-connected. Hence: this one should be out, because the fifth party is not in. 0 1 1 1 0 1 1 And this one should be in, 0 0 0 1 1 1 1. Hi. If i understand correctly, a connected coalition is an interval on your scale. If this is correct, try the following for generating only the intervals. n - 7 a - matrix(nrow=n+choose(n, 2), ncol=n) k - 1 for (i in 1:n) { for (j in (i+1):(n+1)) { x - rep(0, times=n+1) x[i] - 1 x[j] - 1 a[k,] - cumsum(x)[1:n] k - k+1 } } a[a == 2] - 0 a [,1] [,2] [,3] [,4] [,5] [,6] [,7] [1,]1000000 [2,]1100000 [3,]1110000 [4,]1111000 [5,]1111100 [6,]1111110 [7,]1111111 [8,]0100000 [9,]0110000 [10,]0111000 ... If you already have the matrix of all coalitions and want to detect the connected ones, try the function ?rle. There should be exactly one run of ones. Hope this helps. Petr Savicky. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] how to pass weka classifier options with a meta classifier in RWeka?
Hi, I am trying to replicate a training of AttributeSelectedClassifier with CFsSubsetEval, BestFirst and NaiveBayes that I have initially done with Weka. Now, I am trying to use RWeka in R. I have a problem of passing arguments to the CfsSubsetEval, BestFirst and NaiveBayes. I have first created an interface for the classifier with: AS-make_Weka_classifier(weka/classifiers/meta/AttributeSelectedClassifier) And then I am trying to run the classifier with: nb.model-AS(class~.,data=ex, control=Weka_control( E=weka.attributeSelection.CfsSubsetEval, S=weka.attributeSelection.BestFirst -D 1, W=weka.classifiers.bayes.NaiveBayes -D)) But now, I get an error saying: Error in .jcall(classifier, V, buildClassifier, instances) : java.lang.Exception: Can't find class called: weka.classifiers.bayes.NaiveBayes -D indicating that the way I am passing the argument -D to the NaiveBayes is incorrect. I am uncertain from the RWeka documentation how the passing mechanism of Weka_control is supposed to work with meta classifiers. All help is greatly appreciated. Many thanks, Kari __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Tr: Re: how to pass weka classifier options with a meta classifier in RWeka?
On Thu, 2012-02-09 at 14:52 +0100, Milan Bouchet-Valat wrote: Le jeudi 09 février 2012 à 15:31 +0200, Kari Ruohonen a écrit : snip And then I am trying to run the classifier with: nb.model-AS(class~.,data=ex, control=Weka_control( E=weka.attributeSelection.CfsSubsetEval, S=weka.attributeSelection.BestFirst -D 1, W=weka.classifiers.bayes.NaiveBayes -D)) But now, I get an error saying: Error in .jcall(classifier, V, buildClassifier, instances) : java.lang.Exception: Can't find class called: weka.classifiers.bayes.NaiveBayes -D indicating that the way I am passing the argument -D to the NaiveBayes is incorrect. I am uncertain from the RWeka documentation how the passing mechanism of Weka_control is supposed to work with meta classifiers. All help is greatly appreciated. I've never tried it myself, but ?Weka_control says: One can use lists for options taking multiple arguments, see the documentation for ‘SMO’ for an example. So maybe nb.model-AS(class~.,data=ex, control=Weka_control( E=weka.attributeSelection.CfsSubsetEval, S=list(weka.attributeSelection.BestFirst, D=1), W=list(weka.classifiers.bayes.NaiveBayes, D=1))) Cheers Hi and thanks for the suggestion. Unfortunately, it results in a similar error: Error in .jcall(classifier, V, buildClassifier, instances) : java.lang.Exception: Can't find class called: weka.classifiers.bayes.NaiveBayes -D 1 regards, Kari __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] How to compare optimized models ?
Good afternoon everybody, I'm using optimization routines like optimix or nlminb to optimize the parameters of my dispersal curves which are working with different kernels : /anlb2002z1b - nlminb(c(0.5,2), objective=LogLiketot,lower=c(0,0), upper=c(1,200))/ where /LogLiketot/ contains my dispersal kernel parameters and /(0.5,2)/, an exemple of starting point to estimate these parameters. /anlb2002z1b/ returns me different values, including the AIC. On the same data, I'm comparing 3 dispersal kernels, exponential kernel (one parameter), exponential-power kernel (two-parameters) and geometric kernel (two-parameters). Is there a way, in R, to compare these dispersal kernels on the base of their AIC to find the best fitting one ? A Likelihood ratio test works only for nested models ... Thanks in advance for your help. Diane. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] coxme with frailty--variance of random effect?
Thank you very much for taking the time to answer. This pointed me in the right direction. For those interested, I found a useful explanation of the derivation of the estimated variance of the random effect in Hosmer Lemeshow's Applied Survival Analysis (1999), pp.321-26 (I love your book, Dr. Therneay, but I needed something easier...). They proceed in 4 steps: 1. Obtain the cumulative hazard function for each subject. 2. Choose an arbitrary value for the variance parameter of the frailties (call it theta). 3. Compute for each subject an estimate of the value of their frailties, USING this variance parameter theta: frailty_i= \frac(1+\theta \times c_i}{1+\theta \times H_i} (formula on p. 321), where H is the cumulative hazard for the subject. So if theta is 0 (no variance), then frailty=1 (i.e., no excess frailty). As theta goes to infinity, the estimated frailty is simply the ratio 1/(cumulative risk so far) or 1/(cumulative risk so far), depending on whether the subject is still alive or not. 4. fit the proportional hazard model with the same covariates as before, but this time including the frailties in the hazard function. 5. calculate a log-likelihood for the model. Repeat this for all possible values of the frailties (in practice, sweep through them according to some algorithm), and pick the one with the highest log-likelihood. SO IF I understand correctly, the frailties are calculated GIVEN a variance parameter of the frailties, and not the reverse. I.e., theta is chosen first, and the random effects are estimated accordingly. Which is why the variance of the estimated random effects is NOT the same as theta. Did I get this right? Thanks again, Thomas Best wishes, Thomas Chadefaux On Mon, Feb 6, 2012 at 1:56 PM, Terry Therneau thern...@mayo.edu wrote: In answer to the several questions: 1. Variance of the random effect: Your example is not reproducable since you didn't give the random number seed. Instead I'll use one of the data sets that comes with the survival package. library(coxme) fit - coxme(Surv(tstart, tstop, status) ~ treat + (1|center), cgd) VarCorr(fit)$center Intercept 0.1643779 var(ranef(fit)$center) Intercept [1] 0.06261608 Yes, it is true that for a random effects model, the estimated variance of the random effect is not equal to var(estimated per center effects). Exactly the same is true of a linear mixed effects model: try the same with lmer library(lme4) lfit(status ~ treat + (1|center), cgd) VarCorr(lfit)$center var(ranef(lfit)$center) Why? It's a statistical insight that took me a while, so I don't think I can explain it over email. Find someone familiar with mixed efffects models and have a chat. 2. coxph( +frailty(center)) and coxme give different results. Read the documentation. One of them is fitting a gamma distribution for the random effect, the other a Gaussian. Terry Therneau __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Rearanging Data
On Feb 8, 2012, at 9:48 PM, kevin123 wrote: Hi, This is only a small portion of the Data i am working on I want to make a subset of this data set( Data Set=Claims) MemberID ProviderID Vendor PCPYear Specialty 1 422869788013252 172193 37796 Y1Surgery 2 979032483316066 726296 5300Y3Internal 3 27594272997752 140343 91972Y1 Internal 4 735705597053364 240043 70119 Y3 Laboratory I want to put all the rows containing Y1 into a subset. I tried this but it does not work sub - subset(Claims, Year=Y1) You (as well , apparently, as Tal) don't seem to have learned that = is an assignment function and the needed Comparison operator is == ?Comparison Try instead: sub - subset(Claims, Year==Y1) -- David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Hotelling T2 test extension for multigroup data
Hey Peter, I agree that Hotelling's T test, although technically possible for two groups but rejecting the Ho in this case is not going to convey any meaning. But I am sure there is a possibility. My assumption is based on using simulated reference set for each comparison. -- Vickie Subject: Re: [R] Hotelling T2 test extension for multigroup data From: pda...@gmail.com Date: Thu, 9 Feb 2012 14:08:22 +0100 CC: r-help@r-project.org To: is...@live.com On Feb 9, 2012, at 12:11 , Vickie S wrote: Hi all, I've got the following matrix : mat - matrix(rnorm(700), ncol=5, dimnames=list( paste(f, c(1:140), sep=_), c(A, B, C, D, E))) I can see that currently most of the multivariate Hotelling T2 tests are limited for application on two groups/samples. I wud appreciate if someone can provide me a suggestion how to implement the same for multiple groups. Please don't start up a new thread, cloning the same misconceptions as an earlier one! It doesn't help, you will only get people misunderstanding you in the same way... Apparently, you have rows and columns reversed relative to the common layout in multivariate analysis. Also, you have only one observation for each of your groups (species)? What makes you think that T^2 would work for you if you had only A and B species? I don't think anything will work unless you have either replication or some sort of additional assumptions on the covariance structure of your 140-dimensional response. -- Peter Dalgaard, Professor Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] dropterm in MANOVA for MLM objects
Dear Vickie, No one will be able to wave a magic wand over your data to allow you to usefully estimate linear models with 0 df for error, and you certainly can't perform statistical tests. As Peter Dalgaard pointed out, the same confusion was reflected in your subsequent question about Hotelling's T^2. Hotelling T^2 is equivalent to MANOVA when there are two groups. Best, John On Thu, 9 Feb 2012 09:38:51 +0100 Vickie S is...@live.com wrote: Thanks for nice explanation. Unfortunately, matrix in my question is exactly similar to the one I posted earlier : mat - matrix(rnorm(700), ncol=5, dimnames=list( paste(f, c(1:140),sep=_), c(A, B, C, D, E))) Question here is which of the 140 characteristics (i.e. f_1...f_140) distinguish the most between the five plant species. Is it true that this matrix can't be regressed with factor responses (species) ? If so, what alternatives can be used ? - Vickie From: j...@mcmaster.ca To: is...@live.com CC: r-help@r-project.org Subject: RE: [R] dropterm in MANOVA for MLM objects Date: Wed, 8 Feb 2012 20:37:37 -0500 Dear Vickie, I'm afraid that the test problem that you've constructed makes no sense, and doesn't correspond to the problem that you initially described, in which a matrix of presumably 5 responses for presumably 140 observations is regressed on 6 predictors. You regressed your randomly generated matrix of 5 responses and 140 observations on a factor constructed from the distinct 140 observation names. That factor has 140 levels, and so the model uses 140 df, all the df in the data. It's therefore not surprising that the error SSP matrix has 0 df, which is exactly what Anova.mlm (actually, linearHypothesis.mlm, which it calls) tells you. The remark that you found about univariate tests that you apparently found on-line concerns repeated-measures designs and is not relevant to your data. And you can't do a univariate ANOVA when there's 0 df for error in any event. Here's a proper simulation of the kind of data that I think you have: set.seed(12345) E - matrix(rnorm(140*5), ncol=5) X - matrix(rnorm(140*6), ncol=6) Beta - matrix(runif(6*5), ncol=5) Y - X %*% Beta + E colnames(Y) - c(A, B, C, D, E) colnames(X) - c(syct, mmin, mmax, cach, chmin, chmax) Data - as.data.frame(cbind(Y, X)) mod - lm(cbind(A, B, C, D, E) ~ syct + mmin + mmax + cach + chmin + chmax, data=Data) Anova(mod) Type II MANOVA Tests: Pillai test statistic Df test stat approx F num Df den Df Pr(F) syct 1 0.41622 18.395 5 129 9.31e-14 *** mmin 1 0.48288 24.091 5 129 2.2e-16 *** mmax 1 0.62100 42.273 5 129 2.2e-16 *** cach 1 0.61711 41.583 5 129 2.2e-16 *** chmin 1 0.72547 68.180 5 129 2.2e-16 *** chmax 1 0.54825 31.311 5 129 2.2e-16 *** --- Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1 Best, John -Original Message- From: Vickie S [mailto:is...@live.com] Sent: February-08-12 5:53 PM To: j...@mcmaster.ca Cc: r-help@r-project.org Subject: RE: [R] dropterm in MANOVA for MLM objects Dear Prof Fox, I tried anova but got the following error message: mat - matrix(rnorm(700), ncol=5, dimnames=list( paste(f, c(1:140), sep=_), c(A, B, C, D, E))) summary(Anova(lm(cbind(A, B, C, D, E) ~ factor(rownames(mat)), data=as.data.frame(mat Error in summary(Anova(lm(cbind(A, B, C, D, E) ~ factor(rownames(mat)), : error in evaluating the argument 'object' in selecting a method for function 'summary': Error in linearHypothesis.mlm(mod, hyp.matrix.2, SSPE = SSPE, V = V, ...) : The error SSP matrix is apparently of deficient rank = 0 5 I looked in previous forum and it seems like i have only option of performing the univariate test here. Therefore I used the following, but it still results in an error message: Anova(lm(cbind(A, B, C, D, E) ~ factor(rownames(mat)), data=as.data.frame(mat)), univariate=TRUE, multivariate=F) Error in linearHypothesis.mlm(mod, hyp.matrix.2, SSPE = SSPE, V = V, ...) : The error SSP matrix is apparently of deficient rank = 0 5 Any suggestions ? Thanks Vickie I think I am still missing some important clues here. Is it because the feww From: j...@mcmaster.ca To: is...@live.com CC: r-help@r-project.org Subject: RE: [R] dropterm in MANOVA for MLM objects Date: Wed, 8 Feb 2012 17:01:34 -0500 Dear Vicki, I think that the Anova() function in the car package will do what you want (and will also properly handle models with more structure, such as interactions). Best, John
Re: [R] dropterm in MANOVA for MLM objects
Anywez, I have figured out a solution. Whether it is a magic wand or something else, that I don't know but I wish it would work. Thx for the critical stand. -- Vickie From: j...@mcmaster.ca Subject: Re: [R] dropterm in MANOVA for MLM objects To: is...@live.com CC: r-help@r-project.org Date: Thu, 9 Feb 2012 09:43:37 -0500 Dear Vickie, No one will be able to wave a magic wand over your data to allow you to usefully estimate linear models with 0 df for error, and you certainly can't perform statistical tests. As Peter Dalgaard pointed out, the same confusion was reflected in your subsequent question about Hotelling's T^2. Hotelling T^2 is equivalent to MANOVA when there are two groups. Best, John On Thu, 9 Feb 2012 09:38:51 +0100 Vickie S is...@live.com wrote: Thanks for nice explanation. Unfortunately, matrix in my question is exactly similar to the one I posted earlier : mat - matrix(rnorm(700), ncol=5, dimnames=list( paste(f, c(1:140),sep=_), c(A, B, C, D, E))) Question here is which of the 140 characteristics (i.e. f_1...f_140) distinguish the most between the five plant species. Is it true that this matrix can't be regressed with factor responses (species) ? If so, what alternatives can be used ? - Vickie From: j...@mcmaster.ca To: is...@live.com CC: r-help@r-project.org Subject: RE: [R] dropterm in MANOVA for MLM objects Date: Wed, 8 Feb 2012 20:37:37 -0500 Dear Vickie, I'm afraid that the test problem that you've constructed makes no sense, and doesn't correspond to the problem that you initially described, in which a matrix of presumably 5 responses for presumably 140 observations is regressed on 6 predictors. You regressed your randomly generated matrix of 5 responses and 140 observations on a factor constructed from the distinct 140 observation names. That factor has 140 levels, and so the model uses 140 df, all the df in the data. It's therefore not surprising that the error SSP matrix has 0 df, which is exactly what Anova.mlm (actually, linearHypothesis.mlm, which it calls) tells you. The remark that you found about univariate tests that you apparently found on-line concerns repeated-measures designs and is not relevant to your data. And you can't do a univariate ANOVA when there's 0 df for error in any event. Here's a proper simulation of the kind of data that I think you have: set.seed(12345) E - matrix(rnorm(140*5), ncol=5) X - matrix(rnorm(140*6), ncol=6) Beta - matrix(runif(6*5), ncol=5) Y - X %*% Beta + E colnames(Y) - c(A, B, C, D, E) colnames(X) - c(syct, mmin, mmax, cach, chmin, chmax) Data - as.data.frame(cbind(Y, X)) mod - lm(cbind(A, B, C, D, E) ~ syct + mmin + mmax + cach + chmin + chmax, data=Data) Anova(mod) Type II MANOVA Tests: Pillai test statistic Df test stat approx F num Df den Df Pr(F) syct 1 0.41622 18.395 5 129 9.31e-14 *** mmin 1 0.48288 24.091 5 129 2.2e-16 *** mmax 1 0.62100 42.273 5 129 2.2e-16 *** cach 1 0.61711 41.583 5 129 2.2e-16 *** chmin 1 0.72547 68.180 5 129 2.2e-16 *** chmax 1 0.54825 31.311 5 129 2.2e-16 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Best, John -Original Message- From: Vickie S [mailto:is...@live.com] Sent: February-08-12 5:53 PM To: j...@mcmaster.ca Cc: r-help@r-project.org Subject: RE: [R] dropterm in MANOVA for MLM objects Dear Prof Fox, I tried anova but got the following error message: mat - matrix(rnorm(700), ncol=5, dimnames=list( paste(f, c(1:140), sep=_), c(A, B, C, D, E))) summary(Anova(lm(cbind(A, B, C, D, E) ~ factor(rownames(mat)), data=as.data.frame(mat Error in summary(Anova(lm(cbind(A, B, C, D, E) ~ factor(rownames(mat)), : error in evaluating the argument 'object' in selecting a method for function 'summary': Error in linearHypothesis.mlm(mod, hyp.matrix.2, SSPE = SSPE, V = V, ...) : The error SSP matrix is apparently of deficient rank = 0 5 I looked in previous forum and it seems like i have only option of performing the univariate test here. Therefore I used the following, but it still results in an error message: Anova(lm(cbind(A, B, C, D, E) ~ factor(rownames(mat)), data=as.data.frame(mat)), univariate=TRUE, multivariate=F) Error in linearHypothesis.mlm(mod, hyp.matrix.2, SSPE =
[R] Question about measurment invariance in a multigroup SEM
Hello All, I have a question about measurement invariance in a multigroup SEM. I am of the impression that one cannot test the equality of structural paths across groups without demonstrating measurement invariance of latent factors first. At a minimum, this would involve demonstrating that corresponding factor loadings do not differ across groups using, say, a difference in chi-square test. Am I right about this? Is there any published work that argues it is possible to compare the structural paths without demonstrating measurement invariance first? Thanks, Paul __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] February-March 2012 ***R/S-PLUS Courses***by XLSolutions Corp at 9 USA locations
Happy New Year ! XLSolutions February-March 2012 R/S-PLUS courses schedule is now available online at 9 USA cities for with 13 new courses: *** Suggest a future course date/city (1) R-PLUS: A Point-and-Click Approach to R (2) S-PLUS / R : Programming Essentials. (3) R/S+ Fundamentals and Programming Techniques (4) R/S-PLUS Functions by Example. (5) S/R-PLUS Programming 3: Advanced Techniques and Efficiencies. (6) R/S+ System: Advanced Programming. (7) R/S-PLUS Graphics: Essentials. (8) R/S-PLUS Graphics for SAS Users (9) R/S-PLUS Graphical Techniques for Marketing Research. (10) Multivariate Statistical Methods in R/S-PLUS: Practical Research Applications (11) Introduction to Applied Econometrics with R/S-PLUS (12) Exploratory Analysis for Large and Complex Problems in R/S-PLUS (13) Determining Power and Sample Size Using R/S-PLUS. (14) R/S-PLUS: Data Preparation for Data Mining (15) Data Cleaning Techniques in R/S-PLUS (16) R/S-PLUS: Applied Clustering Techniques More on website http://www.xlsolutions-corp.com/rplus.asp Ask for group discount and reserve your seat Now - Earlybird Rates. Payment due after the class! Email Sue Turner: sue at xlsolutions-corp.com Phone: 206-686-1578 Please let us know if you and your colleagues are interested in this class to take advantage of group discount. Register now to secure your seat. Cheers, Elvis Miller, PhD Manager Training. XLSolutions Corporation 206 686 1578 www.xlsolutions-corp.com elvis at xlsolutions-corp.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Any R course in New York or Washington DC in March?
Good day Eugene and R-useRs ! We have a few course coming up in New York City and Washington DC during the week of March 11th, 2012. http://www.xlsolutions-corp.com/rplus.asp R Rocks ! Sue From: eugene dalt eugened...@yahoo.com To: r-help@r-project.org r-help@r-project.org Sent: Monday, February 6, 2012 11:57 AM Subject: [R] Any R course in New York or Washington DC in March? Hey Folks, I am looking for a R course in New York or Washington DC, please email me any course announcement you know of for these dates. Best - Eugene [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Java heap space Error while reading table from postgres database using RJDBC
Hi List, I am reading table from postgres database into R session using RJDBC, table contains 150 columns and 20 rows. Sample code is as below, which works fine with smaller tables. db_driver - mydir$db_driver db_jar_file - mydir$db_jar_file db_server - mydir$db_server db_server_lgn - mydir$db_server_lgn db_server_pwd - mydir$db_server_pwd library(RJDBC) drv - JDBC(paste(db_driver, sep = ), paste(db_jar_file, sep = ), identifier.quote=`) conn - dbConnect(drv, paste(db_server, sep = ), paste(db_server_lgn, sep = ), paste(db_server_pwd, sep = )) cs_input_abt - dbReadTable(conn, cs_input_abt) Following are the different error occurs after executing above script, every time different error when above script is executed. 1. Error in .jcall(rp, I, fetch, stride) : java.lang.OutOfMemoryError: Java heap space 2. Error in .verify.JDBC.result(r, Unable to retrieve JDBC result set for , : Unable to retrieve JDBC result set for SELECT * FROM cs_input_abt (Could not initialize class org.postgresql.util.PSQLException) 3. Error in .verify.JDBC.result(r, Unable to retrieve JDBC result set for , : Unable to retrieve JDBC result set for SELECT * FROM bs_modelling_abt (GC overhead limit exceeded) Where am I going wrong? Is there any option which I had not used in the RJDBC connection or needed to add? [[elided Yahoo spam]] Ajit -- View this message in context: http://r.789695.n4.nabble.com/Java-heap-space-Error-while-reading-table-from-postgres-database-using-RJDBC-tp4372816p4372816.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] calling the function which is stored in a list
Thanks it works - Thanks in Advance Arun -- View this message in context: http://r.789695.n4.nabble.com/calling-the-function-which-is-stored-in-a-list-tp4372131p4372557.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] ifelse
Hello, I need to print a screen message after a test. list-c(10,1,100,10) ifelse(all(list %in% c(1,10,100)==TRUE),cat(YES\n),cat(NO\n)) YES Error in ifelse(all(l %in% c(1, 10, 100) == TRUE), cat(YES\n), cat(NO\n)) : replacement has length zero I have the correct answer, YES, but with an Error. Why? TY for any help Arnaud Gaboury A2CT2 Ltd. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] generate matrices
Dear all, I would like to generate 500 matrices of 20 numbers from standard normal distribution with names x1,x2,x3,â¦.x500.  I tried with loop for, but I donât know how to name matices : for (i in 1:500) {   x[[i]] - matrix(rnorm(20), 4)    } Any suggestion? Thanks, Blaž [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] fixed effects with clustered standard errors
Dear Giovanni, Many thanks for your interesting suggestions. Your guess is indeed right, I only use the 'within' fixed effects specification. I will soon send to this list all the additional information you requested in order to understand what might cause this problem, but I would say as a first guess that the inefficiency is (probably?) due to individuals with too many datapoints : the median number of points is 10, but I have some individuals with more than 1000, 5000 or even 80 000 points overall! So basically my dataset is probably too strange, as you suggested, compared to the standard panel dataset in social sciences... To be continued... ;-) Many thanks again Best, On 8 February 2012 18:55, Millo Giovanni [via R] ml-node+s789695n4370302...@n4.nabble.com wrote: Dear John, interesting. There must be a bottleneck somewhere, which possibly went unnoticed because econometricians seldom use so many data points. In fact 'plm' wasn't designed to handle only 700 Megs of data at a time; but we're happy to investigate in this direction too. E.g., I was aware of some efficiency problems if effect=twoways but I seem to understand that you are using effect=individual? -- which takes me to the main point. I understand that enclosing the data for a reproducible report, as requested by the posting guide, is awkward for such a big dataset. Yet it would be of great help if you at least produced: - an output of your procedure, in order to see what goes wrong and where - the output of traceback() called immediately after you got the error (idem) and possibly gave it a try with lm() applied to the very same formula and data, maybe into a system.time( ... ) statement. Else, the information you provide is way too scant to even make an educated guess. For example, it isn't clear whether the problem is related to plm() or to vcovHC.plm etc. As far as simple demeaning is concerned, you might try the following code, which really does only that. Be aware that **standard errors are biased** etc. etc., this is not meant to be a proper function but just a computational test for your data and a quick demonstration of demeaning. 'plm()' is far more structured, for a number of reasons. Please execute it inside system.time() again. # test function for within model, BIASED SEs !! # ## ## ## example: ## data(Produc, package=plm) ## mod - FEmod(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, index=Produc$state, data=Produc) ## summary(mod) ## ## compare with: ## library(plm) ## example(plm) demean-function(x,index,lambda=1,na.rm=F) { as.vector(x-lambda*tapply(x,index,mean,na.rm=na.rm)[unclass(as.factor(in dex))])  } FEmod-function(formula,index,data=ls()) {  ## fit a model without intercept in any case  formula-as.formula(paste(deparse(formula(formula)),-1,sep=))  X-model.matrix(formula,data=data)  y-model.response(model.frame(formula,data=data))  ## reduce index accordingly  names(index)-row.names(data)  ind-index[which(names(index)%in%row.names(X))]  ## within transf.  MX-matrix(NA,ncol=dim(X)[[2]],nrow=dim(X)[[1]])  for(i in 1:dim(X)[[2]]) {   MX[,i]-demean(X[,i],index=ind,lambda=1)   }  My-demean(y,index=ind,lambda=1)  ## estimate within model  femod-lm(My~MX-1)  return(femod) } ### end test function Best, Giovanni ### original message # -- Message: 28 Date: Tue, 07 Feb 2012 15:35:07 +0100 From: [hidden email] To: [hidden email] Subject: [R] fixed effects with clustered standard errors Message-ID: [hidden email] Content-Type: text/plain; charset=utf-8 Dear R-helpers, I have a very simple question and I really hope that someone could help me I would like to estimate a simple fixed effect regression model with clustered standard errors by individuals. For those using Stata, the counterpart would be xtreg with the fe option, or areg with the absorb option and in both case the clustering is achieved with vce(cluster id) My question is : how could I do that with R ? An important point is that I have too many individuals, therefore I cannot include dummies and should use the demeaning usual procedure. I tried with the plm package with the within option, but R quikcly tells me that the memory limits are attained (I have over 10go ram!) while the dataset is only 700mo (about 50 000 individuals, highly unbalanced) I dont understand... plm do indeed demean the data so the computation should be fast and light enough... ?! Are there any other solutions ? Many thanks in advance ! ;) John end original message Giovanni Millo, PhD Research Dept., Assicurazioni Generali SpA Via Machiavelli 4, 34132 Trieste (Italy) tel. +39 040 671184 fax  +39 040 671160 Ai sensi del D.Lgs. 196/2003 si precisa che le informazi...{{dropped:12}} __
[R] Row-wise kronecker product with Matrix package
I'm trying to calculate the row-wise kronecker product A \Box B of two sparse matrices A and B, and am struggling to find a quick way to do this that takes advantage of sparseness. I thought a good idea would be to use rep to construct 2 matrices of the same dimension of the end product, and multiply these two together: library(Matrix) A-Matrix(c(1,0,0,0,0,1,2,0), 2, 4) B-Matrix(c(2,5,0,0,0,1,0,0,0,0), 2, 5) A[, rep(seq(ncol(A)), each = ncol(B))] * B[, rep(seq(ncol(B)),ncol(A))] This works, but for much larger problems is slow (compared to keeping A and B dense). I was wondering why this happens, and whether there might be a way around it? Thanks in advance for any advice, Alastair -- View this message in context: http://r.789695.n4.nabble.com/Row-wise-kronecker-product-with-Matrix-package-tp4373437p4373437.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Cumulative R2 and Q2 values?
Greetings R users, I have been working on running plsda and I would like to have the R2 and Q2 values. I know the function R2 from the 'pls' package will generate both R2 and Q2 but they are for each separate class. Is there a way to get the cumulative R2 and Q2 for the whole model? R2(pls.new, estimate=all) Response: B (Intercept)1 comps 2 comps train 0.0 0.001759 0.3812 CV-0.03419 -0.082686 0.2983 Response: FR8 (Intercept) 1 comps 2 comps train 0.0 0.4362 0.4362 CV-0.03419 0.3936 0.3745 Response: S45 (Intercept) 1 comps 2 comps train 0.0 0.3682 0.7749 CV-0.03419 0.3061 0.7313 Again, what I am looking for is a way to get an overall R2 and Q2 values for the entire model. My thanks to any who can assist me, Charles [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] computing scores from a factor analysis
William, I had a problem similar to Wolfgang and I solved it through your help. Many thanks! Just an observation which sounded strange to me ( I am not a statistician, just a wildlife biologist) I have noticed that running the pca using principal with raw data (and therefore using scores=TRUE in the command line) gives different pca scores than running the same pca with the correlation matrix (using scores=FALSE in the command line and therefore calculating the scores in the way you suggested to Wolfgang). Is that normal? Thanks francesca -- View this message in context: http://r.789695.n4.nabble.com/computing-scores-from-a-factor-analysis-tp4306234p4372993.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Subset a datafram according to time
PREFERED WAY OF DOING IT I have a data set of observations every second for a month long period, I want to extract the observations according to the date time of another data frame ( the other data frame is in the same format). I want to do this to match these observations to my test observations (in the other data frame) which are done every 6 minutes. So basically im shrinking the data frame of second observations to only display the date time observation every 6 minutes. ALTERNATIVE WAY In other words I just wanna extract every 6 minute observation value from this dataframe of everysecond observations datetimeobservations 02/08/2011 00:00 1.165 02/08/2011 00:01 1.241 02/08/2011 00:02 1.232 Im pretty new to the porgram done 2 days on it learning and learning, im getting my head around it. Help would be much appreciated. -- View this message in context: http://r.789695.n4.nabble.com/Subset-a-datafram-according-to-time-tp4372293p4372293.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] generate matrices
What about this? B - 500 X - matrix(rnorm(B*20), ncol = B) colnames(X) - paste('x', 1:B, sep = ) head(X) Thus, each column correspond to a sample of 20 numbers from a standard normal distribution. HTH, Jorge.- On Thu, Feb 9, 2012 at 8:38 AM, Blaz Simcic wrote: Dear all, I would like to generate 500 matrices of 20 numbers from standard normal distribution with names x1,x2,x3,â¦.x500. I tried with loop for, but I donât know how to name matices : for (i in 1:500) { x[[i]] - matrix(rnorm(20), 4) } Any suggestion? Thanks, Blaž [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Outlier removal techniques
Hello, I need to analyse a data matrix with dimensions of 30x100. Before analysing the data there is, however, a need to remove outliers from the data. I read quite a lot about outlier removal already and I think the most common technique for that seems to be Principal Component Analysis (PCA). However, I think that these technqiue is quite subjective. When is an outlier an outlier? I uploaded an example PCA plot here: http://s14.postimage.org/oknyya1ld/pca.png Should we treat the green and red dots as outliers already or only the blue one which lies outside the 95% confidence interval. It seems very arbitrary how people remove outliers using PCA. I also thought about fitting a linear model through my data and look at distribution of the residuals. However, the problem with using linear models is that one can actually never be sure that the model used is the one which describes the data best. In model A, for instance, we might treat sample 1 as and outlier but fitting a different model B sample 1 might not be an outlier at all. I had a brief look at k-means clustering as well but I think it's not the right thing to go for. Again, how do one decide which cluster is an outler? And also it is known that different cluster analysis lead to totally different results. So which one to choose? Is there any other way to non-subjectively remove outliers from data? I would really appreciated any ideas/comments you might have on that topic. Cheers -- View this message in context: http://r.789695.n4.nabble.com/Outlier-removal-techniques-tp4372652p4372652.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] poLCA and conditional dependence
Dear all, I'm an Sri Lankan undergraduate student. I'm also a total newbie to R. My aim is to use the poLCA package to do a latent class analysis. I found the documentation very helpful, but need to make a small clarification that has stumped me awhile. In my work, I need to make provision for conditional dependence. I'm told that poLCA lets you do that. Unfortunately, I couldn't find a specific example on how to do this. However, I did find a reference to latent class regression using cbind(Y1,Y2,Y3)~X1+X2*X3 etc. Forgive my ignorance, but is this the same as conditional dependence ? I dont think so -- Best Regards, Suranga [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] GLM - guess the distribution of the response variable
Dear all, I have question regarding GLMs: I have a discrete response variable and a continuous explaining variable. Like this: http://www.myimg.de/?img=example1db0f.jpg I want to use a GLM to investigate. I have to specify the familiy of the distribution of the response variable - or, maybe more precise, the family of the distribution of the residuals of the response variable (the mean of the response should be explained by the deterministic model part of the GLM, as far as I understood). How do I know the distribution family of the response? OK - its needs to be a discrete distribution, but there are Poisson, Binomial, Neg. Binomial. Any ideas? Many thanks, wonko __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Arial font in eps figures in R
Hi, I am trying to create a graph using Arial font (as required by PLoS One). I have read probably all posts in R-help on this topic, as wells as R-news 2006. The code I have been trying is following: Arial - Type1Font(family=Arial, metrics=c(ArialMT.afm, arial-BoldMT.afm, Arial-ItalicMT.afm, Arial-BoldItalicMT.afm)) postscriptFonts(Arial=Arial) postscript(testArial.eps, horizontal=F, onefile=F, width=4, height=4) par(family=Arial) plot(1:10, 1:10) dev.off() but getting the following error message: Error in axis(side = side, at = at, labels = labels, ...) : family 'Arial' not included in PostScript device Alternatively I've also tried: Arial - Type1Font(family=Arial, metrics=c(ArialMT.afm, arial-BoldMT.afm, Arial-ItalicMT.afm, Arial-BoldItalicMT.afm)) postscriptFonts(Arial=Arial) postscript(testArial.eps, horizontal=F, onefile=F, width=4, height=4, family=Arial) plot(1:10, 1:10) dev.off() which resulted in: Error in postscript(testArial.eps, horizontal = F, onefile = F, width = 4, : Failed to initialise default PostScript font The *.afm files were downloaded from http://www.koders.com/, since I wasn't able to run ttf2afm or ttf2pt1 on my computer. The files are saved under ../R/library/grDevices/afm. I have also tried saving the files with LF line endings (instead of CRLF) as indicated in README under /grDevices/afm, and zipping them. That didn't make a difference. I am currently running R-2.14.1 on Windows7. If I run postscriptFonts(), the Arial font is included in the list, very much as other types of fonts. I've seen that some R packages are now using arial as default (e.g. googleVis and R2PPT) but I cannot see how it is done. Thanks in advance for any suggestions. /Sharon __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] package does not have a NAMESPACE
A new version of the kopls package has been made available. This installs without problems in R 2.14 (windows 32-bit). Download from here: https://sourceforge.net/projects/kopls/files/K-OPLS%20for%20R/1.1.2/ [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] subset select=variable with a list of names
Hello, I would like to make a function which extracts a subset, from a dataset, with only the columns that I want (specifying their names). For example, having this matrix: mydata-matrix(c(22,1,3,2001,24,5,7,2002,26,7,8,2002,28,5,7,2003), byrow=TRUE, ncol=4, dimnames=list(c(1,2,3,4), c(age,day,month,year))) mydata age day month year 1 22 1 3 2001 2 24 5 7 2002 3 26 7 8 2002 4 28 5 7 2003 I would like to create a function like: x-function(names) {subset(mydata, select=names) } So I can choose every time which columns select, i.e. when I call: x(age,day) it would returns: age day 1 22 1 2 24 5 3 26 7 4 28 5 Obviously it is not working, and I don't know how to do to fix it. Do you have any suggestion? Thank you very much __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] sample points - sp package
When I make a sample in sp, for example a random samping of 5 points in dataset1: sample.dataset1=spsample(dataset1, n = 5, random) I have a sp object for the selected sampling... My question is : How can I extract from dataset2 the points data for the coordinates(sample.dataset1) ? The dataset2 would be an object with the same area and coordinate system, but I would like to extract for the coordinates of sample.dataset1 the data in each point for dataset2. I didn't found a way to do this in the sp package docs, but I might have missed it. -- View this message in context: http://r.789695.n4.nabble.com/sample-points-sp-package-tp4372673p4372673.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] which R package is used for browsing web pages through coding
i know RCurl pakage to retrieve web content,it has limited use, i want interactive package like (in perl---Mechanize, In java---Watij,Prowser,HTMLunit,HTTPunit, in RubyWatir ,etc) this modules/packages opens appropriate browser,which can create queries,retrieves output, clicks buttons, fill up form automatically,searches keyword in search engine, Downloads many items from internet All this is by coding if find ,kindly give me sample examples(codes) at list two/five if not found,give me RCurl's sample codes starting from how to import library to closing browser, with explanation for each line of code -- View this message in context: http://r.789695.n4.nabble.com/which-R-package-is-used-for-browsing-web-pages-through-coding-tp4372888p4372888.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Help with TimeSeries
Hello everyone! I´ve started using R last week and I´m having really persistent problems trying to make predictions using the HoltWinters function. I´m sending my script and the errors I´m getting. My data is: Jan Feb Mar Apr May Jun Jul Ago Sep Oct Nov Dec 2004 118 143 169 158 143 135 135 140 135 125. 120. 120. 2005 143 158 180 180 150 150 153 148 150 145. 145. 143. 2006 165 180 190 168 163 163 168 175 180 160. 155. 150. 2007 152 163 185 163 158 175 175 178 168 168. 160. 160. 2008 155 170 195 195 195 185 185 165 165 153. 153. 153. The data I´m receiving using AP.ts - ts(AP,start=2004,end=2008,frequency=2) is: Jan Feb Mar Apr May Jun Jul Ago Sep Oct Nov Dec 2004.0 118 143 169 158 143 135 135 140 135 125. 120. 120. 2004.5 143 158 180 180 150 150 153 148 150 145. 145. 143. 2005.0 165 180 190 168 163 163 168 175 180 160. 155. 150. 2005.5 152 163 185 163 158 175 175 178 168 168. 160. 160. 2006.0 155 170 195 195 195 185 185 165 165 153. 153. 153. 2006.5 118 143 169 158 143 135 135 140 135 125. 120. 120. 2007.0 143 158 180 180 150 150 153 148 150 145. 145. 143. 2007.5 165 180 190 168 163 163 168 175 180 160. 155. 150. 2008.0 152 163 185 163 158 175 175 178 168 168. 160. 160. However that´s not what I want. Using fr=1 I get my data fine but can´t plot due to the error: plot(AP.ts) Error in plotts(x = x, y = y, plot.type = plot.type, xy.labels = xy.labels, : cannot plot more than 10 series as multiple AND AP.hw - HoltWinters(AP.ts,seasonal=mult) Error in decompose(ts(x[1L:wind], start = start(x), frequency = f), seasonal) : time series has no or less than 2 periods plot(AP.hw) Error in xy.coords(x, y) : 'x' and 'y' lengths differ FILE=AWB1.csv #FILE2=AWB2.csv AP - read.csv(FILE,header=T) AP is(AP) AP.ts - ts(AP,start=2004,end=2008,frequency=2) #AP.ts -ts(structure(c(read.csv(FILE),start=c(2004,1),end=c(2008,1),frequency=12))) is(AP.ts) AP.ts dput(AP.ts) plot(AP.ts) library(zoo) plot(zoo(AP.ts), ylim = c(0, 20)) AP.ts #is(AP.ts) #start(AP.ts);frequency(AP.ts);end(AP.ts) dput(AP.ts) AP.hw - HoltWinters(AP.ts,seasonal=mult) plot(AP.hw) AP.predict - predict(AP.hw,n.ahead=10) AP.predict ts.plot(AP.ts,AP.predict, lty=1:4) ### TESTE data(AirPassengers) aa - AirPassengers aa is(aa) start(aa);frequency(aa);end(aa) Thank you guys! -- Vinicius Macedo Magalhães (21) 9584-1533 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Tr: Re: how to pass weka classifier options with a meta classifier in RWeka?
Le jeudi 09 février 2012 à 15:31 +0200, Kari Ruohonen a écrit : Hi, I am trying to replicate a training of AttributeSelectedClassifier with CFsSubsetEval, BestFirst and NaiveBayes that I have initially done with Weka. Now, I am trying to use RWeka in R. I have a problem of passing arguments to the CfsSubsetEval, BestFirst and NaiveBayes. I have first created an interface for the classifier with: AS-make_Weka_classifier(weka/classifiers/meta/AttributeSelectedClassifier) And then I am trying to run the classifier with: nb.model-AS(class~.,data=ex, control=Weka_control( E=weka.attributeSelection.CfsSubsetEval, S=weka.attributeSelection.BestFirst -D 1, W=weka.classifiers.bayes.NaiveBayes -D)) But now, I get an error saying: Error in .jcall(classifier, V, buildClassifier, instances) : java.lang.Exception: Can't find class called: weka.classifiers.bayes.NaiveBayes -D indicating that the way I am passing the argument -D to the NaiveBayes is incorrect. I am uncertain from the RWeka documentation how the passing mechanism of Weka_control is supposed to work with meta classifiers. All help is greatly appreciated. I've never tried it myself, but ?Weka_control says: One can use lists for options taking multiple arguments, see the documentation for ‘SMO’ for an example. So maybe nb.model-AS(class~.,data=ex, control=Weka_control( E=weka.attributeSelection.CfsSubsetEval, S=list(weka.attributeSelection.BestFirst, D=1), W=list(weka.classifiers.bayes.NaiveBayes, D=1))) Cheers __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Weibull Distribution
Hi, I want to estimate parameter of weibull distribution using mle for below density function, The PDF is, f(x) = b a^-b x^(b-1) exp -(x/a)^b In R ,density of the weibull distribution is, f(x) = (a/b) (x/b)^(a-1) exp(- (x/b)^a) which is different from my density, I trying to estimate mle parmeter from R using fitdistr and fitdist functions. Can anyone tell me whatever parameter estimated from R are correct for my density function or not? Thanks Yogendra -- View this message in context: http://r.789695.n4.nabble.com/Weibull-Distribution-tp4373016p4373016.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] how to specify the Oy axis length
Hi, I'd like to plot a point: plot(1,2) I use RStudio. I'd like to know how to specify the length of Oy axis. I 'd like to put 20 as length of Oy but , by default, I 've got 2.5. Thanks for your help -- View this message in context: http://r.789695.n4.nabble.com/how-to-specify-the-Oy-axis-length-tp4372937p4372937.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] problem with arrows( )
Hello, I have got an issue with this code: plot(...) xMontant = as.POSIXlt(ecs$startAt[i]) xDescendant = as.POSIXlt(ecs$endAt[i]) arrows(xMontant, 0, xMontant, ecsPOW, col=green) Here is the error: Erreur dans arrows(xMontant, 0, xMontant, ecsPOW, col = green) : premier argument incorrect which means: Error in arrows(xMontant, 0, xMontant, ecsPOW, col = green) : first argument is inccorrect I have got the same error with segment(...) , usign the same arguments. Does anybody find where is the problem ? I checked if xMontant is in the right intervale, so it is. That's OK thanks for your help -- View this message in context: http://r.789695.n4.nabble.com/problem-with-arrows-tp4373152p4373152.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] AUC, C-index and p-value of Wilcoxon
Dear all, I am using the ROCR library to compute the AUC and also the Hmisc library to compute the C-index of a predictor and a group variable. The results of AUC and C-index are similar and give a value of about 0.57. The Wilcoxon p-value is 0.001! Why the AUC is showing small value and the p-value is high significant? The AUC is based on Wilcoxon calculation? Many thanks, Lina [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Weibull Distribution
Hello, I want to estimate parameter of weibull distribution using mle for below density function, The PDF is, f(x) = b a^-b x^(b-1) exp -(x/a)^b In R ,density of the weibull distribution is, f(x) = (a/b) (x/b)^(a-1) exp(- (x/b)^a) which is different from my density, Is this homework? It seems you need to learn (portuguese) 8th grade math: your 'a' in R is 'b' and your 'b' in R, 'a'. Mathematically, they are exactly the same... Rui Barradas -- View this message in context: http://r.789695.n4.nabble.com/Weibull-Distribution-tp4373016p4373198.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Some question about rpart and splitting criterion
Dear All, I have questions about the function rpart to construct a regression tree in R code. My problem is how to change the splitting criteria. In the rpart we have : parms=list(split=..) , I ask you if in this command is it possible to use an another splitting criterion to substitute the default criteria( gini or information)? Does someone can help me ? Thank you, Myriam Tabasso __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] problem with arrows( )
I found the issue: arrows( ) , segments( ) do not accept argument of type POSIXlt. One must convert the argument into POSIXct and it works well. -- View this message in context: http://r.789695.n4.nabble.com/problem-with-arrows-tp4373152p4373422.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] ifelse
cat() returns a null value so it's problematic inside ifelse. Here you probably need a regular if statement: if(all(list %in% c(1, 10, 100))) cat(YES\n) else cat(NO\n) # The == TRUE is redundant. Michael On Thu, Feb 9, 2012 at 11:37 AM, Arnaud Gaboury arnaud.gabo...@a2ct2.com wrote: Hello, I need to print a screen message after a test. list-c(10,1,100,10) ifelse(all(list %in% c(1,10,100)==TRUE),cat(YES\n),cat(NO\n)) YES Error in ifelse(all(l %in% c(1, 10, 100) == TRUE), cat(YES\n), cat(NO\n)) : replacement has length zero I have the correct answer, YES, but with an Error. Why? TY for any help Arnaud Gaboury A2CT2 Ltd. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] calling the function which is stored in a list
On Thu, Feb 9, 2012 at 2:54 AM, arunkumar akpbond...@gmail.com wrote: Hi I'm storing two functions in a list # creating two function function1 - function(n) { return(sum(n)) } function2 - function(n) { return(mean(n)) } #storing the function function3 =c(function1,function2) is it possible to call the stored function and used it ? x=c(10,29) funtion3[1](x) In addition to [[ as in the other response also try this: L - list(function1 = sum, function2 = mean) L$function1(1:3) Note that the proto package is somewhat similar but stores functions in environments rather than lists and also supports a type of object oriented programming: library(proto) p - proto(function1 = function(., n) sum(n), function2 = function(., n) mean(n), function3 = function(.) mean(.$x), x = 1:10 ) p$function1(1:3) p$function3() # mean of 1:10 p$x - 1:5 p$function3() # mean of 1:5 # define child p2 with its own x overriding p's x p2 - p$proto(x = 10:15) # p2 has its own x so this is mean of 10:15 # p2 inherits its methods from p so its has a # function3 too p2$function3() p$function3() # p unchanged. Still mean of 1:5. -- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] ifelse
TY much. Works for me. Arnaud Gaboury A2CT2 Ltd. -Original Message- From: R. Michael Weylandt [mailto:michael.weyla...@gmail.com] Sent: jeudi 9 février 2012 17:47 To: Arnaud Gaboury Cc: r-help@r-project.org Subject: Re: [R] ifelse cat() returns a null value so it's problematic inside ifelse. Here you probably need a regular if statement: if(all(list %in% c(1, 10, 100))) cat(YES\n) else cat(NO\n) # The == TRUE is redundant. Michael On Thu, Feb 9, 2012 at 11:37 AM, Arnaud Gaboury arnaud.gabo...@a2ct2.com wrote: Hello, I need to print a screen message after a test. list-c(10,1,100,10) ifelse(all(list %in% c(1,10,100)==TRUE),cat(YES\n),cat(NO\n)) YES Error in ifelse(all(l %in% c(1, 10, 100) == TRUE), cat(YES\n), cat(NO\n)) : replacement has length zero I have the correct answer, YES, but with an Error. Why? TY for any help Arnaud Gaboury A2CT2 Ltd. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] ifelse
On Feb 9, 2012, at 11:37 AM, Arnaud Gaboury wrote: Hello, I need to print a screen message after a test. list-c(10,1,100,10) ifelse(all(list %in% c(1,10,100)==TRUE),cat(YES\n),cat(NO\n)) YES Error in ifelse(all(l %in% c(1, 10, 100) == TRUE), cat(YES\n), cat(NO\n)) : replacement has length zero I have the correct answer, YES, but with an Error. Why? cat() returns NULL. Why ar enou not just returning the strings themselves or if you want to see the results as well as assign, then use print() which does return its arguments. TY for any help Arnaud Gaboury A2CT2 Ltd. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Outlier removal techniques
On Thu, 9 Feb 2012, mails wrote: I need to analyse a data matrix with dimensions of 30x100. Before analysing the data there is, however, a need to remove outliers from the data. I read quite a lot about outlier removal already and I think the most common technique for that seems to be Principal Component Analysis (PCA). However, I think that these technqiue is quite subjective. When is an outlier an outlier? I uploaded an example PCA plot here: Those more expert than I will certainly provide answers. What I do will new data is create box-and-whisker plots (I use the lattice package) which defines outliers as those data beyond 1.5x the first or third quartile values. No one but you can answer your question on when an outlier is an outlier. It depends on your data set and the context of the data. For example, a water chemistry value that far exceeds a regulartory threshold might be meaningful in the context of a one-off excursion (in which case it's not an outlier but a real data point) or it might result from a handling, instrumentation, or analytical error (in which case toss it as an outlier). Rich __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] generate matrices
On Feb 9, 2012, at 8:38 AM, Blaz Simcic wrote: Dear all, I would like to generate 500 matrices of 20 numbers from standard normal distribution with names x1,x2,x3,….x500. I tried with loop for, but I don’t know how to name matices : for (i in 1:500) { x[[i]] - matrix(rnorm(20), 4) } Any suggestion? Don't do it that way. Create an array-object instead. x - array(NA, dim=c(4,5,500) ) x[] - rnorm(4*5*500) str(x) num [1:4, 1:5, 1:500] -0.721 0.896 -1.432 0.386 -1.111 ... dimnames(x)[[3]] - paste(x, 1:500, sep=) str(x) num [1:4, 1:5, 1:500] -0.721 0.896 -1.432 0.386 -1.111 ... - attr(*, dimnames)=List of 3 ..$ : NULL ..$ : NULL ..$ : chr [1:500] x1 x2 x3 x4 ... -- David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] how to specify the Oy axis length
Perhaps the ylim argument to plot() Michael On Thu, Feb 9, 2012 at 9:04 AM, ikuzar raz...@hotmail.fr wrote: Hi, I'd like to plot a point: plot(1,2) I use RStudio. I'd like to know how to specify the length of Oy axis. I 'd like to put 20 as length of Oy but , by default, I 've got 2.5. Thanks for your help -- View this message in context: http://r.789695.n4.nabble.com/how-to-specify-the-Oy-axis-length-tp4372937p4372937.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] GLM - guess the distribution of the response variable
On Feb 9, 2012, at 7:32 AM, wo...@posteo.de wrote: Dear all, I have question regarding GLMs: I have a discrete response variable and a continuous explaining variable. Like this: http://www.myimg.de/?img=example1db0f.jpg I want to use a GLM to investigate. I have to specify the familiy of the distribution of the response variable - or, maybe more precise, the family of the distribution of the residuals of the response variable (the mean of the response should be explained by the deterministic model part of the GLM, as far as I understood). How do I know the distribution family of the response? OK - its needs to be a discrete distribution, but there are Poisson, Binomial, Neg. Binomial. Is this homework, ... in which case please read the Posting Guide. -- David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Wriritng to a CSV file
On Feb 8, 2012, at 21:56 , David Winsemius wrote: On Feb 8, 2012, at 2:29 PM, Ron Michael wrote: Okay, so I understood that appending can only happen row-wise. Therefore I tried with following code: write.csv(matrix(1:5, 1), dat.csv) write.csv(matrix(1:5, 1), dat.csv, append = TRUE) Warning message: In write.csv(matrix(1:5, 1), dat.csv, append = TRUE) : attempt to set 'append' ignored It is destroying my previous file. Where I have done wrong? Failed to read the help page. `write.csv` has some of its setting hard coded and will prevent you from changing them. Exactly. Note, however, that write.csv is really just write.table with a particular set of arguments. Nothing is keeping you from using a similar set of arguments with append=TRUE in an explicit write.table() call. It is of course debatable whether the behavior of write.csv is undue patronizing, but as I understand it, the rationale is that you can guarantee that write.csv creates a proper CSV file, but once you start appending, multiple things can go wrong: Forgetting to omit headers, different number of columns, different column types, etc. It is not realistic to have write.csv(...append=TRUE) make the necessary checks, and as it can't be sure to write a properly formed CSV, it just won't do it, period. -- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] subset select=variable with a list of names
-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-bounces@r- project.org] On Behalf Of Francisco Sent: Thursday, February 09, 2012 4:52 AM To: r-help@r-project.org Subject: [R] subset select=variable with a list of names Hello, I would like to make a function which extracts a subset, from a dataset, with only the columns that I want (specifying their names). For example, having this matrix: mydata-matrix(c(22,1,3,2001,24,5,7,2002,26,7,8,2002,28,5,7,2003), byrow=TRUE, ncol=4, dimnames=list(c(1,2,3,4), c(age,day,month,year))) mydata age day month year 1 22 1 3 2001 2 24 5 7 2002 3 26 7 8 2002 4 28 5 7 2003 I would like to create a function like: x-function(names) {subset(mydata, select=names) } So I can choose every time which columns select, i.e. when I call: x(age,day) it would returns: age day 1 22 1 2 24 5 3 26 7 4 28 5 Obviously it is not working, and I don't know how to do to fix it. Do you have any suggestion? Thank you very much Given your function definition, the function call needs to be x(c(age,day)) Whether it is good form to write a function like this, I will leave to others to comment. Hope this is helpful, Dan Daniel J. Nordlund Washington State Department of Social and Health Services Planning, Performance, and Accountability Research and Data Analysis Division Olympia, WA 98504-5204 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Outlier removal techniques
I wonder why it is still standard practice in some circles to search for outliers as opposed to using robust/resistent methods. Here is a great paper with a scientific approach to outliers: @Article{fin06cal, author = {Finney, David J.}, title ={Calibration guidelines challenge outlier practices}, journal = The American Statistician, year = 2006, volume = 60, pages ={309-313}, annote = {anticoagulant therapy;bias;causation;ethics;objectivity;outliers;guidelines for treatment of outliers;overview of types of outliers;letter to the editor and reply 61:187 May 2007} } Frank Rich Shepard wrote On Thu, 9 Feb 2012, mails wrote: I need to analyse a data matrix with dimensions of 30x100. Before analysing the data there is, however, a need to remove outliers from the data. I read quite a lot about outlier removal already and I think the most common technique for that seems to be Principal Component Analysis (PCA). However, I think that these technqiue is quite subjective. When is an outlier an outlier? I uploaded an example PCA plot here: Those more expert than I will certainly provide answers. What I do will new data is create box-and-whisker plots (I use the lattice package) which defines outliers as those data beyond 1.5x the first or third quartile values. No one but you can answer your question on when an outlier is an outlier. It depends on your data set and the context of the data. For example, a water chemistry value that far exceeds a regulartory threshold might be meaningful in the context of a one-off excursion (in which case it's not an outlier but a real data point) or it might result from a handling, instrumentation, or analytical error (in which case toss it as an outlier). Rich __ R-help@ mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/Outlier-removal-techniques-tp4372652p4373592.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] AUC, C-index and p-value of Wilcoxon
On Thu, Feb 09, 2012 at 02:05:08PM +, linda Porz wrote: Dear all, I am using the ROCR library to compute the AUC and also the Hmisc library to compute the C-index of a predictor and a group variable. The results of AUC and C-index are similar and give a value of about 0.57. The Wilcoxon p-value is 0.001! Why the AUC is showing small value and the p-value is high significant? The AUC is based on Wilcoxon calculation? Hi. There is no direct relationship between AUC and p-value of Wilcoxon. AUC measures, how well two distributions may be separated. The p-value measures, to which extent it is clear that the distributions are different. The test is significant, even if it is very clear that there is a tiny difference between the two distributions. This may happen for a large sample size. If the sample size increases, then AUC for separating variables X, Y converges to P(X Y), which may be 0.57 and still, the p-value may converge to 0. Hope this helps. Can you send a concrete numerical example? Petr Savicky. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] object size versus saved size?
Posting under new subject. I’ve a question on saving a earth object to disk. Say m is the model then print(object.size(m), units=”Mb”) 163.8 Mb and save it off save(m, “tmp.rda”) and “tmp.rda” is roughly 171Mb file. I would like this to be much smaller. So I try m$bx-NULL then print(object.size(m), units=”Mb”) 14.9 Mb Great! however when I save it off save(m, “tmp.rda”) and “tmp.rda” is roughly 171Mb file – the same size. I do not care about the size of the file per se, I care about loading a smaller object at a future date. -- View this message in context: http://r.789695.n4.nabble.com/object-size-versus-saved-size-tp4373478p4373478.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Grouping together a time variable
I have the following variable, time, which is a character variable and it's structured as follows. head(as.character(dat$time), 30) [1] 00:00:01 00:00:16 00:00:24 00:00:25 00:00:25 00:00:40 00:01:50 00:01:54 00:02:33 00:02:43 00:03:22 [12] 00:03:31 00:03:41 00:03:42 00:03:43 00:04:04 00:05:09 00:05:17 00:05:19 00:05:21 00:05:22 00:05:22 [23] 00:05:28 00:05:44 00:05:54 00:06:54 00:06:54 00:07:10 00:08:15 00:08:26 What I am trying to do is group the data into one hour increment. So 5:01-6:00am, 6:01-7:00am, 7:01-8:00a, and so forth. However, I'm not sure if there's a simple route to do this in R or how to do it. Can anyone point me in the right direction? -- *Abraham Mathew Statistical Analyst www.amathew.com 720-648-0108 @abmathewks* [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Rearanging Data
Hi, This worked great: sub - subset(Claims, Year==Y1) Thanks for your help Kevin On Thu, Feb 9, 2012 at 6:12 AM, David Winsemius dwinsem...@comcast.netwrote: On Feb 8, 2012, at 9:48 PM, kevin123 wrote: Hi, This is only a small portion of the Data i am working on I want to make a subset of this data set( Data Set=Claims) MemberID ProviderID Vendor PCPYear Specialty 1 422869788013252 172193 37796 Y1Surgery 2 979032483316066 726296 5300Y3Internal 3 27594272997752 140343 91972Y1 Internal 4 735705597053364 240043 70119 Y3 Laboratory I want to put all the rows containing Y1 into a subset. I tried this but it does not work sub - subset(Claims, Year=Y1) You (as well , apparently, as Tal) don't seem to have learned that = is an assignment function and the needed Comparison operator is == ?Comparison Try instead: sub - subset(Claims, Year==Y1) -- David Winsemius, MD West Hartford, CT [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] To write a function for getting the node numbers in rpart analysis..
## To read .csv file(data) and performing rpart ## # library(Hmisc) library(rpart) read.csv(D:/Training/RRE/Revolution(My Projects)/Project1/RProject(pmml from SPSS)/telco_churn.csv, header = TRUE, sep = ,, quote=\, dec=.) # help(read.csv) Telco- read.table(file = D:/Training/RRE/Revolution(My Projects)/Project1/RProject(pmml from SPSS)/telco_churn.csv, header = TRUE, sep = ,) names(Telco) ## ## ## To make factor variable ## Telco$region-factor(Telco$region,levels=c(1,2,3,4,5),labels=c(Zone1,Zone2,Zone3,Zone4,Zone5)) Telco$ed-factor(Telco$ed,levels=c(1,2,3,4,5),labels=c(DidNotHighSchool,HighSchoolDegree,SomeCollege, CollegeDegree,PostUndergraduateDegree)) Telco$marital-factor(Telco$marital,,levels=c(0,1),labels=c(Married,Un-married)) Telco$churn-factor(Telco$churn,levels=c(0,1),labels=c(NO,YES)) ## ## ## To perform rpart ## Telco.rpart - rpart(churn ~ region+ed+marital+tenure+age+address+income,data=Telco,method=class) plot(Telco.rpart,compress=FALSE,uniform=TRUE) text(Telco.rpart,use.n = TRUE, cex = .75) ## ## ## To predict rpart ## TelcoPredicted - predict(Telco.rpart,type=class) # Prediction/Scoring ## ## In the above computation, I would prefer to get the list of node numbers parallely for the each of the predictions of TelcoPredicted. Please help me writing a function for this. -- View this message in context: http://r.789695.n4.nabble.com/To-write-a-function-for-getting-the-node-numbers-in-rpart-analysis-tp4372508p4372508.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Row-wise kronecker product with Matrix package
Maybe one of these will improve: help.search('kronecker') ... spam::kronecker Kronecker Products on Sparse Matrices spam::spam.classClass spam base::kronecker Kronecker Products on Arrays Matrix::kronecker-methods Methods for Function 'kronecker()' in Package 'Matrix' I doubt it because they will calculate the full Kronecker prod and it will be up to you to index the rows, but you never know... system.time(A[, rep(seq(ncol(A)), each = ncol(B))] * B[,rep(seq(ncol(B)),ncol(A))]) user system elapsed 0.016 0.000 0.019 system.time(kronecker(A,B)[c(1,4),]) user system elapsed 0.008 0.000 0.008 system.time(spam::kronecker(A,B)[c(1,4),]) user system elapsed 0.008 0.000 0.009 Cheers On Thu, Feb 9, 2012 at 9:38 AM, Ally a.rushwo...@stats.gla.ac.uk wrote: I'm trying to calculate the row-wise kronecker product A \Box B of two sparse matrices A and B, and am struggling to find a quick way to do this that takes advantage of sparseness. I thought a good idea would be to use rep to construct 2 matrices of the same dimension of the end product, and multiply these two together: library(Matrix) A-Matrix(c(1,0,0,0,0,1,2,0), 2, 4) B-Matrix(c(2,5,0,0,0,1,0,0,0,0), 2, 5) A[, rep(seq(ncol(A)), each = ncol(B))] * B[, rep(seq(ncol(B)),ncol(A))] This works, but for much larger problems is slow (compared to keeping A and B dense). I was wondering why this happens, and whether there might be a way around it? Thanks in advance for any advice, Alastair -- View this message in context: http://r.789695.n4.nabble.com/Row-wise-kronecker-product-with-Matrix-package-tp4373437p4373437.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] GLM - guess the distribution of the response variable
Below. -- Bert On Thu, Feb 9, 2012 at 9:06 AM, David Winsemius dwinsem...@comcast.net wrote: On Feb 9, 2012, at 7:32 AM, wo...@posteo.de wrote: Dear all, I have question regarding GLMs: I have a discrete response variable and a continuous explaining variable. Like this: http://www.myimg.de/?img=example1db0f.jpg I want to use a GLM to investigate. I have to specify the familiy of the distribution of the response variable - or, maybe more precise, the family of the distribution of the residuals of the response variable (the mean of the response should be explained by the deterministic model part of the GLM, as far as I understood). How do I know the distribution family of the response? OK - its needs to be a discrete distribution, but there are Poisson, Binomial, Neg. Binomial. Is this homework, ... in which case please read the Posting Guide. -- and if not, please do your homework by doing some reading and/or consulting your local statistician. -- David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Bert Gunter Genentech Nonclinical Biostatistics Internal Contact Info: Phone: 467-7374 Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] loading packages in a function
Hello, I sourced successfully my function(). I need to load libraries, so I wrote this inside my function(): function() { #load needed library library(plyr) library(car) /... } It is OK, but I have this on my invite command when running the function: function() Loading required package: MASS Loading required package: nnet YOU DID A GOOD JOB,SEND EMAIL Last line is the supposed result of my function, so it ok. How to get rid of the first two lines, only for esthetic purpose? TY for your time. Arnaud Gaboury A2CT2 Ltd. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Setting up infile for R CMD BATCH
Hi Elai, yes, the approach works out pretty well. Thanks a lot for spending time on this and for the great help! Gang On Wed, Feb 8, 2012 at 5:43 PM, ilai ke...@math.montana.edu wrote: I'm going to declare this SOLVED. Yes, if you don't want a separate script for batch, you will need to modify the original script so it either readline or skips it. Here is an example: # Save in file myTest.R # Add this local function to the beginning of your original program and replace all the readline prompts with myreadline. # The new function takes on two arguments: # what: the original object (in your example it was type-...) # prompt: The same string from readline # All it is doing is searching for Answers.R, sourcing if available or prompting if not. myreadline - function(what,prompt){ ans - length(grep('Answers.R',list.files(),fixed=T))0 # add a warning for multiple files if(ans) source('Answers.R') else{ ret - as.integer(readline(prompt)) assign(what,ret,1) } } # here is an interactive program to square only negative values print(x - rnorm(1)) myreadline('type','x0 ? 1:T,2:F \n') print(x^type) ### END myTest.R Running myTest interactivly: source('myTest.R') [1] -0.3712215 x0 ? 1:T,2:F 2 [1] 0.1378054 # -.37^2 source('myTest.R') [1] 0.3160747 x0 ? 1:T,2:F 1 [1] 0.3160747 # .316^1 # Create a list of answers dump('type',file='Answers.R') # run again this time with answers available source('myTest.R') [1] -1.088665 # skips prompt [1] -1.088665 # -1.088^1 (type in Answer.R ==1) # Now you can also run as batch $ R CMD BATCH myTest.R out.R $ cat out.R # ... print(x - rnorm(1)) [1] -1.248487 myreadline('type','x0 ? 1:T,2:F \n') print(x^type) [1] -1.248487 That's it. Only thing is to keep the file names in the working directory straight. Enjoy On Wed, Feb 8, 2012 at 12:39 PM, Gang Chen gangch...@gmail.com wrote: Sorry Elai for the confusions. Let me try to reframe my predicament. The main program myTest.R has been written in interactive mode with many readline() lines embedded. Suppose a user has already run the program once before in interactive mode with all the answers saved in a text file called answer.R. Now s/he does not want to go through the interactive again because it's a tedious process, and would like to run the program but in batch mode with answer.R. And that's why I tried the following which didn't pan out: R CMD BATCH answer.R output.Rout Of couse I could rewrite a separate program specifically for batch mode as you suggested previously with eval(), for example. However, is there an approach to keeping the original program so that the user could run both interactive and batch mode? That probably requires modifying the readline() part, but how? Thanks, Gang On Wed, Feb 8, 2012 at 2:08 PM, ilai ke...@math.montana.edu wrote: Gang, Maybe someone here has a different take on things. I'm afraid I have no more insights on this unless you explain exactly what you are trying to achieve, or more importantly why? That may help understand what the problem really is. Do you want to save an interactive session for future runs? then ?save.image and all your answers are in the environment. In this case consider putting an if(!exists('type') | length(type)1 | is.na(type)) before type- readline(...) in your script so type wouldn't be overwritten in subsequent runs. If your goal is to batch evaluate multiple answer files from users (why else would you ask questions with readline?), then you should have enough to go on with my answer and the examples in ?eval. Elai On Wed, Feb 8, 2012 at 9:04 AM, Gang Chen gangch...@gmail.com wrote: Hi Elai, Thanks a lot for the suggestions! I really appreciate it... Your suggestion of using eval() and creating those answers in a list would work, but there is no alternative to readline() with which I could read the input in batch mode? I'm asking this because I'd like to have the program work in both interactive and batch mode. Thanks again, Gang On Wed, Feb 8, 2012 at 12:50 AM, ilai ke...@math.montana.edu wrote: Ahh, I think I'm getting it now. Well, readlines() is not going to work for you. The help file ?readline clearly states In non-interactive use the result is as if the response was RETURN and the value is ‘’. The implication is you cannot use it to insert different answers as if you were really there. How about using eval() instead? You will need to make the answers a named list (or just assigned objects). test - expression({ if(a2) print('+') else print('I got more') b - b+3 # reassign b in the environment print(b) print(c) d^2 }) dump('test',file='myTest.R') ; rm(test) # make the answers.R file: a=5 ; b=2 ; c=2 ; d=3 source(myTest.R) eval(test) # Now, from the terminal R CMD BATCH answers.R out.R # And here is my $ cat out.R ... flushed a=5 ; b=2 ; c=2 ; d=3
Re: [R] Outlier removal techniques
-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-bounces@r- project.org] On Behalf Of Frank Harrell Sent: Thursday, February 09, 2012 9:19 AM To: r-help@r-project.org Subject: Re: [R] Outlier removal techniques I wonder why it is still standard practice in some circles to search for outliers as opposed to using robust/resistent methods. Here is a great paper with a scientific approach to outliers: @Article{fin06cal, author = {Finney, David J.}, title = {Calibration guidelines challenge outlier practices}, journal =The American Statistician, year = 2006, volume = 60, pages = {309-313}, annote = {anticoagulant therapy;bias;causation;ethics;objectivity;outliers;guidelines for treatment of outliers;overview of types of outliers;letter to the editor and reply 61:187 May 2007} } Frank Rich Shepard wrote On Thu, 9 Feb 2012, mails wrote: I need to analyse a data matrix with dimensions of 30x100. Before analysing the data there is, however, a need to remove outliers from the data. I read quite a lot about outlier removal already and I think the most common technique for that seems to be Principal Component Analysis (PCA). However, I think that these technqiue is quite subjective. When is an outlier an outlier? I uploaded an example PCA plot here: Those more expert than I will certainly provide answers. What I do will new data is create box-and-whisker plots (I use the lattice package) which defines outliers as those data beyond 1.5x the first or third quartile values. No one but you can answer your question on when an outlier is an outlier. It depends on your data set and the context of the data. For example, a water chemistry value that far exceeds a regulartory threshold might be meaningful in the context of a one-off excursion (in which case it's not an outlier but a real data point) or it might result from a handling, instrumentation, or analytical error (in which case toss it as an outlier). Rich __ R-help@ mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. I would echo what Frank says. I would also add that in the absence of demonstrated measurement/recording errors, there is good reason to explain the extreme values as well as the typical values. If a model can't deal with extreme values, then it may be good enough for some purposes, but it is not a complete explanation and may fail at the worst time. I would highly recommend the book The Black Swan by Nassim Nicholas Taleb (NOT the ballet story). Dan Daniel J. Nordlund Washington State Department of Social and Health Services Planning, Performance, and Accountability Research and Data Analysis Division Olympia, WA 98504-5204 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] fill an array by rows
I've dug around but not been able to find anything, am probably missing something obvious. How can I fill a three-dimensional (or higher dimension) array by rows instead of columns. eg new1 - array(c(1:125), c(5,5,5)) works fine for me but fills it by columns and new2 - array(c(1:125), c(5,5,5), byrow=TRUE) throws an error. Am I missing something obvious? I also tried transposing the 3-d array but that didn't work either. TIA Jim -- Dr. Jim Maas University of East Anglia __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] generate matrices
Inline below. -- Bert On Thu, Feb 9, 2012 at 8:59 AM, David Winsemius dwinsem...@comcast.net wrote: On Feb 9, 2012, at 8:38 AM, Blaz Simcic wrote: Dear all, I would like to generate 500 matrices of 20 numbers from standard normal distribution with names x1,x2,x3,….x500. I tried with loop for, but I don’t know how to name matices : for (i in 1:500) { x[[i]] - matrix(rnorm(20), 4) } Any suggestion? Don't do it that way. Create an array-object instead. x - array(NA, dim=c(4,5,500) ) x[] - rnorm(4*5*500) Inefficient. Do it in one go: x - array(rnorm(20*500), dim = c(4,5,500)) str(x) num [1:4, 1:5, 1:500] -0.721 0.896 -1.432 0.386 -1.111 ... dimnames(x)[[3]] - paste(x, 1:500, sep=) str(x) num [1:4, 1:5, 1:500] -0.721 0.896 -1.432 0.386 -1.111 ... - attr(*, dimnames)=List of 3 ..$ : NULL ..$ : NULL ..$ : chr [1:500] x1 x2 x3 x4 ... -- David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Bert Gunter Genentech Nonclinical Biostatistics Internal Contact Info: Phone: 467-7374 Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] fill an array by rows
Use aperm on the output of array: aperm(array(1:8, dim=c(2,2,2)), perm=c(2,1,3)) , , 1 [,1] [,2] [1,]12 [2,]34 , , 2 [,1] [,2] [1,]56 [2,]78 Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Jim Maas Sent: Thursday, February 09, 2012 9:47 AM To: r-help@r-project.org Subject: [R] fill an array by rows I've dug around but not been able to find anything, am probably missing something obvious. How can I fill a three-dimensional (or higher dimension) array by rows instead of columns. eg new1 - array(c(1:125), c(5,5,5)) works fine for me but fills it by columns and new2 - array(c(1:125), c(5,5,5), byrow=TRUE) throws an error. Am I missing something obvious? I also tried transposing the 3-d array but that didn't work either. TIA Jim -- Dr. Jim Maas University of East Anglia __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] fill an array by rows
On Feb 9, 2012, at 12:47 PM, Jim Maas wrote: I've dug around but not been able to find anything, am probably missing something obvious. How can I fill a three-dimensional (or higher dimension) array by rows instead of columns. eg new1 - array(c(1:125), c(5,5,5)) works fine for me but fills it by columns and new2 - array(c(1:125), c(5,5,5), byrow=TRUE) throws an error. Am I missing something obvious? I also tried transposing the 3-d array but that didn't work either. You might want to look at aperm: Perhaps: new2 - aperm(new1, c(2,1,3)) -- David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] complex subscript/superscript on axis labels
Hi All, I am having trouble getting a complex subscript to work. I'm sure it's possible. Here is what I have: ylab=expression(paste(log ,L[peak], [erg ,s^{-1},])), I would like to have the subscript read peak,gamma where the gamma would be the greek symbol. I do want the comma to show as well. Thanks, EM __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Version control (git, mercurial) for R packages
On Thu, Feb 9, 2012 at 1:46 AM, Gregory Jefferis jeffe...@gmail.com wrote: Dear Peter, Trying to respond to your original question Thanks for staying on thread :) I have a git repositiory in the root of my packages: ie package-foo/.git package-foo/R package-foo/inst Running make check in place warns about the presence of binary files, but that is the only problem I have found – have you come across another problem. Same warning here. Which made me think that R CMD build will probably tar up the git repository along with the package, which is not something I would like to do, and which CRAN people most likely won't tolerate in a package on CRAN. I have also switched to using Hadley Wickham's devtools (+ roxygen for docs). Both highly recommended. devtools can build a package and check it in a temporary location – this is safer than checking in place and avoids that warning. I'll check that out. You can see layouts for Hadley's packages on github e.g. : https://github.com/hadley/devtools Best wishes, Greg. Thanks for the suggestions. It occurred to me that I can write a short script that will copy the package minus the git repository into a temporary location and check or build it there, but I'll look into devtools as well. Peter __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] complex subscript/superscript on axis labels
On Feb 9, 2012, at 12:53 PM, Eileen Meyer wrote: Hi All, I am having trouble getting a complex subscript to work. I'm sure it's possible. Here is what I have: ylab=expression(paste(log ,L[peak], [erg ,s^{-1},])), I would like to have the subscript read peak,gamma where the gamma would be the greek symbol. I do want the comma to show as well. (I do not see a gamma so using your verbal dsecription. The rest of your intent is muddied by the multiple subscripts. It's always difficult to know what people are thinking when they construct plotmath expressions and do not describe their full intent in English. Here's my guess: ylab=expression(log~L[peak*,*gamma][~erg~s^-1]) plot(1,1,ylab=ylab) The paste function is rarely needed and generally confuses the intent of the author. Using ~ and * generally results in more comprehensible expressions. -- David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Version control (git, mercurial) for R packages
Same warning here. Which made me think that R CMD build will probably tar up the git repository along with the package, which is not something I would like to do, and which CRAN people most likely won't tolerate in a package on CRAN. It doesn't. And you can always use .Rbuildignore to ignore other files. I have also switched to using Hadley Wickham's devtools (+ roxygen for docs). Both highly recommended. devtools can build a package and check it in a temporary location – this is safer than checking in place and avoids that warning. I'll check that out. Yes, I strongly advise building and checking, or building or installing. From what I can tell, this is what R-core does, and it makes life much easier. Hadley -- Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University http://had.co.nz/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] loading packages in a function
Perhaps the longest function in base R I know of: suppressPackageStartupMessages e.g., suppressPackageStartupMessages(library(zoo)) Michael On Thu, Feb 9, 2012 at 12:44 PM, Arnaud Gaboury arnaud.gabo...@a2ct2.com wrote: Hello, I sourced successfully my function(). I need to load libraries, so I wrote this inside my function(): function() { #load needed library library(plyr) library(car) /... } It is OK, but I have this on my invite command when running the function: function() Loading required package: MASS Loading required package: nnet YOU DID A GOOD JOB,SEND EMAIL Last line is the supposed result of my function, so it ok. How to get rid of the first two lines, only for esthetic purpose? TY for your time. Arnaud Gaboury A2CT2 Ltd. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] complex subscript/superscript on axis labels
On Feb 9, 2012, at 1:20 PM, David Winsemius wrote: On Feb 9, 2012, at 12:53 PM, Eileen Meyer wrote: Hi All, I am having trouble getting a complex subscript to work. I'm sure it's possible. Here is what I have: ylab=expression(paste(log ,L[peak], [erg ,s^{-1},])), I would like to have the subscript read peak,gamma where the gamma would be the greek symbol. I do want the comma to show as well. (I do not see a gamma so using your verbal dsecription. The rest of your intent is muddied by the multiple subscripts. It's always difficult to know what people are thinking when they construct plotmath expressions and do not describe their full intent in English. Here's my guess: ylab=expression(log~L[peak*,*gamma][~erg~s^-1]) plot(1,1,ylab=ylab) The paste function is rarely needed and generally confuses the intent of the author. Using ~ and * generally results in more comprehensible expressions. And it might look more physics-al if you had a cdot between the units: [~erg %.% s^-1] -- David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Version control (git, mercurial) for R packages
I'm exploring using a version control system to keep better track of changes to the packages I maintain. I'm leaning towards git (although mercurial also looks good) but am not sure what is the best way to set up the repository. It seems I can't set the repository directly within the R package main directory, since it will be incompatible with the standard package structure. I can set the repository in the directory that contains my R packages, but then I have a single repository for all of my packages, which is not ideal. Git is nice - but if you ar looking for simplicity in usage for R packages, I guess r-forge and rforge are the easiest to use. I think git + github is substantially easier to use, especially if you want to incorporate patches from the community. But I would be interested in the workflow when using github as the main VCS. The thing that has made this really successful for me is the install_github function in devtools. Then it's easy for people to try out development versions: install.packages(devtools) library(devtools) install_github(myrepo, myusername) This requires a working R development environment but thanks to the hard work of Simon Urbanek, Duncan Murdoch, Brian Ripley and others, this is a one-install process on all major platforms. Hadley -- Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University http://had.co.nz/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Grouping together a time variable
Perhaps cut.POSIXt (which is a generic so you can just call cut) depending on the unstated form of your time object. Michael On Thu, Feb 9, 2012 at 12:15 PM, Abraham Mathew abmathe...@gmail.com wrote: I have the following variable, time, which is a character variable and it's structured as follows. head(as.character(dat$time), 30) [1] 00:00:01 00:00:16 00:00:24 00:00:25 00:00:25 00:00:40 00:01:50 00:01:54 00:02:33 00:02:43 00:03:22 [12] 00:03:31 00:03:41 00:03:42 00:03:43 00:04:04 00:05:09 00:05:17 00:05:19 00:05:21 00:05:22 00:05:22 [23] 00:05:28 00:05:44 00:05:54 00:06:54 00:06:54 00:07:10 00:08:15 00:08:26 What I am trying to do is group the data into one hour increment. So 5:01-6:00am, 6:01-7:00am, 7:01-8:00a, and so forth. However, I'm not sure if there's a simple route to do this in R or how to do it. Can anyone point me in the right direction? -- *Abraham Mathew Statistical Analyst www.amathew.com 720-648-0108 @abmathewks* [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Rearanging Data
Try sub - subset(Claims, Year==Y1) In R, the equality test is performed by two equal signs, not one. - Phil Spector Statistical Computing Facility Department of Statistics UC Berkeley spec...@stat.berkeley.edu On Wed, 8 Feb 2012, kevin123 wrote: Hi, This is only a small portion of the Data i am working on I want to make a subset of this data set( Data Set=Claims) MemberID ProviderID Vendor PCPYear Specialty 1 422869788013252 172193 37796 Y1Surgery 2 979032483316066 726296 5300Y3Internal 3 27594272997752 140343 91972Y1 Internal 4 735705597053364 240043 70119 Y3 Laboratory I want to put all the rows containing Y1 into a subset. I tried this but it does not work sub - subset(Claims, Year=Y1) I would greatly appreciate any help Kevin -- View this message in context: http://r.789695.n4.nabble.com/Rearanging-Data-tp4371717p4371717.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] AUC, C-index and p-value of Wilcoxon
On Thu, Feb 09, 2012 at 06:33:09PM +0100, Petr Savicky wrote: On Thu, Feb 09, 2012 at 02:05:08PM +, linda Porz wrote: Dear all, I am using the ROCR library to compute the AUC and also the Hmisc library to compute the C-index of a predictor and a group variable. The results of AUC and C-index are similar and give a value of about 0.57. The Wilcoxon p-value is 0.001! Why the AUC is showing small value and the p-value is high significant? The AUC is based on Wilcoxon calculation? Hi. There is no direct relationship between AUC and p-value of Wilcoxon. AUC measures, how well two distributions may be separated. The p-value measures, to which extent it is clear that the distributions are different. The test is significant, even if it is very clear that there is a tiny difference between the two distributions. This may happen for a large sample size. If the sample size increases, then AUC for separating variables X, Y converges to P(X Y), which may be 0.57 and still, the p-value may converge to 0. This effect may be demonstrated as follows. Try n - 50 for (i in 1:10) { x - rnorm(n) y - rnorm(n) + 0.25 out - wilcox.test(x, y, paired=FALSE) AUC - 1 - out$statistic/n^2 cat(AUC, out$p.value, \n) } The result may be 0.6132 0.05147433 0.5396 0.4971117 0.54 0.492754 0.5444 0.446199 0.5528 0.3646515 0.5692 0.2343673 0.6168 0.044479 0.5748 0.1985487 0.5152 0.796007 0.5528 0.3646515 The p-values are moderately significant or not significant. Try the same with n - 1000 0.564124 6.84383e-07 0.572522 1.953299e-08 0.575895 4.170283e-09 0.5651 4.623185e-07 0.584841 5.029007e-11 0.567354 1.829507e-07 0.601411 4.053585e-15 0.608903 3.356404e-17 0.583801 8.610077e-11 0.570637 4.497502e-08 The AUC estimates have lower variance, but otherwise have similar values. However, the p-values are now small, since the sample is larger. With a larger sample, the test is more sensitive and detects even a small difference as a significant one. Hope this helps. Petr Savicky. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] loading packages in a function
TY for your answer, but here what i did : #load needed lybrary suppressPackageStartupMessages(library(plyr)) suppressPackageStartupMessages(library(car)) library(plyr) library(car) But I still get : Loading required package: MASS Loading required package: nnet Arnaud Gaboury A2CT2 Ltd. -Original Message- From: R. Michael Weylandt [mailto:michael.weyla...@gmail.com] Sent: jeudi 9 février 2012 19:26 To: Arnaud Gaboury Cc: r-help@r-project.org Subject: Re: [R] loading packages in a function Perhaps the longest function in base R I know of: suppressPackageStartupMessages e.g., suppressPackageStartupMessages(library(zoo)) Michael On Thu, Feb 9, 2012 at 12:44 PM, Arnaud Gaboury arnaud.gabo...@a2ct2.com wrote: Hello, I sourced successfully my function(). I need to load libraries, so I wrote this inside my function(): function() { #load needed library library(plyr) library(car) /... } It is OK, but I have this on my invite command when running the function: function() Loading required package: MASS Loading required package: nnet YOU DID A GOOD JOB,SEND EMAIL Last line is the supposed result of my function, so it ok. How to get rid of the first two lines, only for esthetic purpose? TY for your time. Arnaud Gaboury A2CT2 Ltd. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Arial font in eps figures in R
So close. The first version should work if you switch the order of the postscript() and par() functions: Arial - Type1Font(family=Arial, metrics=c(ArialMT.afm, arial-BoldMT.afm, Arial-ItalicMT.afm, Arial-BoldItalicMT.afm)) postscriptFonts(Arial=Arial) par(family=Arial) postscript(testArial.eps, horizontal=F, onefile=F, width=4, height=4) plot(1:10, 1:10) dev.off() -- David L Carlson Associate Professor of Anthropology Texas AM University College Station, TX 77843-4352 -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Sharon Kuhlmann-B Sent: Thursday, February 09, 2012 4:53 AM To: r-help@R-project.org Subject: [R] Arial font in eps figures in R Hi, I am trying to create a graph using Arial font (as required by PLoS One). I have read probably all posts in R-help on this topic, as wells as R-news 2006. The code I have been trying is following: Arial - Type1Font(family=Arial, metrics=c(ArialMT.afm, arial-BoldMT.afm, Arial-ItalicMT.afm, Arial-BoldItalicMT.afm)) postscriptFonts(Arial=Arial) postscript(testArial.eps, horizontal=F, onefile=F, width=4, height=4) par(family=Arial) plot(1:10, 1:10) dev.off() but getting the following error message: Error in axis(side = side, at = at, labels = labels, ...) : family 'Arial' not included in PostScript device Alternatively I've also tried: Arial - Type1Font(family=Arial, metrics=c(ArialMT.afm, arial-BoldMT.afm, Arial-ItalicMT.afm, Arial-BoldItalicMT.afm)) postscriptFonts(Arial=Arial) postscript(testArial.eps, horizontal=F, onefile=F, width=4, height=4, family=Arial) plot(1:10, 1:10) dev.off() which resulted in: Error in postscript(testArial.eps, horizontal = F, onefile = F, width = 4, : Failed to initialise default PostScript font The *.afm files were downloaded from http://www.koders.com/, since I wasn't able to run ttf2afm or ttf2pt1 on my computer. The files are saved under ../R/library/grDevices/afm. I have also tried saving the files with LF line endings (instead of CRLF) as indicated in README under /grDevices/afm, and zipping them. That didn't make a difference. I am currently running R-2.14.1 on Windows7. If I run postscriptFonts(), the Arial font is included in the list, very much as other types of fonts. I've seen that some R packages are now using arial as default (e.g. googleVis and R2PPT) but I cannot see how it is done. Thanks in advance for any suggestions. /Sharon __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.