date:20120209


Thanks for nice explanation. 
Unfortunately, matrix in my question is exactly similar to the one I posted 
earlier : 

mat - matrix(rnorm(700), ncol=5, dimnames=list( paste(f, c(1:140),sep=_), 
c(A, B, C, D, E))) 


Question here is which of the 140 characteristics (i.e. f_1...f_140) 
distinguish the most between the five plant
species.

Is it true that this matrix can't be regressed with factor responses (species) 
? If so, what alternatives can be used ?

 
-  Vickie



 From: j...@mcmaster.ca

 To: is...@live.com

 CC: r-help@r-project.org

 Subject: RE: [R] dropterm in MANOVA for MLM objects

 Date: Wed, 8 Feb 2012 20:37:37 -0500



 Dear Vickie,



 I'm afraid that the test problem that you've constructed makes no sense, and

 doesn't correspond to the problem that you initially described, in which a

 matrix of presumably 5 responses for presumably 140 observations is

 regressed on 6 predictors. You regressed your randomly generated matrix of 5

 responses and 140 observations on a factor constructed from the distinct 140

 observation names. That factor has 140 levels, and so the model uses 140 df,

 all the df in the data. It's therefore not surprising that the error SSP

 matrix has 0 df, which is exactly what Anova.mlm (actually,

 linearHypothesis.mlm, which it calls) tells you.



 The remark that you found about univariate tests that you apparently found

 on-line concerns repeated-measures designs and is not relevant to your data.

 And you can't do a univariate ANOVA when there's 0 df for error in any

 event.



 Here's a proper simulation of the kind of data that I think you have:



  set.seed(12345)

  E - matrix(rnorm(140*5), ncol=5)

  X - matrix(rnorm(140*6), ncol=6)

  Beta - matrix(runif(6*5), ncol=5)

  Y - X %*% Beta + E

  colnames(Y) - c(A, B, C, D, E)

  colnames(X) - c(syct, mmin, mmax, cach, chmin, chmax)

  Data - as.data.frame(cbind(Y, X))

  mod - lm(cbind(A, B, C, D, E) ~ syct + mmin + mmax + cach + chmin +

 chmax, data=Data)

  Anova(mod)



 Type II MANOVA Tests: Pillai test statistic

 Df test stat approx F num Df den Df Pr(F)

 syct 1 0.41622 18.395 5 129 9.31e-14 ***

 mmin 1 0.48288 24.091 5 129  2.2e-16 ***

 mmax 1 0.62100 42.273 5 129  2.2e-16 ***

 cach 1 0.61711 41.583 5 129  2.2e-16 ***

 chmin 1 0.72547 68.180 5 129  2.2e-16 ***

 chmax 1 0.54825 31.311 5 129  2.2e-16 ***

 ---

 Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1



 Best,

 John



  -Original Message-

  From: Vickie S [mailto:is...@live.com]

  Sent: February-08-12 5:53 PM

  To: j...@mcmaster.ca

  Cc: r-help@r-project.org

  Subject: RE: [R] dropterm in MANOVA for MLM objects

 

 

  Dear Prof Fox,

  I tried anova but got the following error message:

 

  mat - matrix(rnorm(700), ncol=5, dimnames=list( paste(f, c(1:140),

  sep=_), c(A, B, C, D, E))) summary(Anova(lm(cbind(A, B, C,

  D, E) ~ factor(rownames(mat)), data=as.data.frame(mat

 

  Error in summary(Anova(lm(cbind(A, B, C, D, E) ~

  factor(rownames(mat)),  :

error in evaluating the argument 'object' in selecting a method for

  function 'summary': Error in linearHypothesis.mlm(mod, hyp.matrix.2,

  SSPE = SSPE, V = V, ...) :

The error SSP matrix is apparently of deficient rank = 0  5

 

  I looked in previous forum and it seems like i have only option of

  performing the univariate test here.

 

  Therefore I used the following, but it still results in an error

  message:

  Anova(lm(cbind(A, B, C, D, E) ~ factor(rownames(mat)),

  data=as.data.frame(mat)), univariate=TRUE, multivariate=F) Error in

  linearHypothesis.mlm(mod, hyp.matrix.2, SSPE = SSPE, V = V, ...) :

The error SSP matrix is apparently of deficient rank = 0  5

 

  Any suggestions ?

 

  Thanks

  Vickie

 

 

 

  I think I am still missing some important clues here. Is it because

  the feww

 

   From: j...@mcmaster.ca

   To: is...@live.com

   CC: r-help@r-project.org

   Subject: RE: [R] dropterm in MANOVA for MLM objects

   Date: Wed, 8 Feb 2012 17:01:34 -0500

  

   Dear Vicki,

  

   I think that the Anova() function in the car package will do what

  you

   want (and will also properly handle models with more structure, such

   as interactions).

  

   Best,

   John

  

   

   John Fox

   Senator William McMaster

   Professor of Social Statistics

   Department of Sociology

   McMaster University

   Hamilton, Ontario, Canada

   http://socserv.mcmaster.ca/jfox

  

  

  

  

-Original Message-

From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-

project.org] On Behalf Of Vickie S

Sent: February-08-12 3:57 PM

To: r-help@r-project.org

Subject: [R] dropterm in MANOVA for MLM objects

   

   

Dear R fans,

I have got a difficult sounding problem.

   

For fitting a linear model using continuous response and then for

re- fitting the model after excluding

Re: [R] Version control (git, mercurial) for R packages

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 09/02/12 00:01, Peter Langfelder wrote:
 Hi all, in particular package developers,
 
 I'm exploring using a version control system to keep better track
 of changes to the packages I maintain. I'm leaning towards git
 (although mercurial also looks good) but am not sure what is the
 best way to set up the repository. It seems I can't set the
 repository directly within the R package main directory, since it
 will be incompatible with the standard package structure. I can set
 the repository in the directory that contains my R packages, but
 then I have a single repository for all of my packages, which is
 not ideal.

Git is nice - but if you ar looking for simplicity in usage for R
packages, I guess r-forge and rforge are the easiest to use.

But I would be interested in the workflow when using github as the
main VCS.

 
 Of course, I could add one extra layer of directory structure (for 
 example, if a package foo sit in directory RPackages/foo I could
 move it to directory RPackages/foo/foo and set the repository in 
 RPackages/foo) but this seems roundabout and inelegant...
 
 I'd be grateful for any suggestions/pointers.

I might be missing something here, but I am using r-forge
(http://r-forge.r-project.org/) for exactly that. There is also rforge
(http://www.rforge.net/) and some packages are on github.


My folder looks as follow:

- -local name for the package (selected folders / fiiles under VC)
  +--pkg (under VC)
  +--www (under VC)
  +--other fiolders (not under VC)

If a project gets registered at r-forge, you can check it out and you
have all the relevant info in a README file - don't know about rforge.

Hope this helps,

Cheers,

Rainer

 
 Thanks,
 
 Peter
 
 __ R-help@r-project.org
 mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do
 read the posting guide http://www.R-project.org/posting-guide.html 
 and provide commented, minimal, self-contained, reproducible code.


- -- 
Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation
Biology, UCT), Dipl. Phys. (Germany)

Centre of Excellence for Invasion Biology
Stellenbosch University
South Africa

Tel :   +33 - (0)9 53 10 27 44
Cell:   +33 - (0)6 85 62 59 98
Fax :   +33 - (0)9 58 10 27 44

Fax (D):+49 - (0)3 21 21 25 22 44

email:  rai...@krugs.de

Skype:  RMkrug
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk8zhzsACgkQoYgNqgF2egrKOQCfb2tpU0lvS2F394gGDv+4c+Lp
m4EAnRO76W8Rm2OM6yQ2QoWF0xQcEfBD
=3QHA
-END PGP SIGNATURE-

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] calling the function which is stored in a list

2012-02-09 Thread Uwe Ligges




On 09.02.2012 08:54, arunkumar wrote:

Hi

I'm storing two functions in a list

# creating two function
   function1- function(n) {
   return(sum(n))
}

function2- function(n) {
   return(mean(n))
}

#storing the function
function3 =c(function1,function2)

is it possible to call the stored function and used it ?

  x=c(10,29)
funtion3[1](x)



Yes, if you correct the typo and use list indexing with double brackets:

function3[[1]](x)

Uwe Ligges




Thanks

-
Thanks in Advance
 Arun
--
View this message in context: 
http://r.789695.n4.nabble.com/calling-the-function-which-is-stored-in-a-list-tp4372131p4372131.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Version control (git, mercurial) for R packages

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 09/02/12 00:33, Tom Roche wrote:
 
 Peter Langfelder Thu Feb 9 00:01:31 CET 2012
 I'm exploring using a version control system
 
 +1! welcome to the new millenium :-)
 
 to keep better track of changes to the [R] packages I maintain.
 I'm leaning towards git
 
 I like 'git' too, but one thing to consider (though keep in mind
 that I'm new to R, so I Could Be Wrong): R-forge
 
 https://r-forge.r-project.org/
 
 seems to be the canonical place to put R packages, and it's svn. 
 That can be finessed, e.g.,
 
 http://cameron.bracken.bz/git-with-r-forge

Nice link - but I can't follow it.

Actual Situation:
- - Package on r-forge

Want to have:
- - git (github?) repo as my main cvs and
- - branch (release_x.y.z) for new release for r-forge
- - push branch release_x.y.z to r-forge to do it's magic.

So there are two initial steps:
1) Initial import of r-forge repo to git(hub) (should be easy)
2) setup of local git repo for package (is easy)

and then github is used as VCS

When a release should be done, the following needs to be done:
- - branch on git (easy)
- - push new branch to r-forge (how?)

now how could this be done easily?

can I do this with

  git svn commit

?

Cheers,

Rainer


 
 FWIW, Tom Roche tom_ro...@pobox.com
 
 __ R-help@r-project.org
 mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do
 read the posting guide http://www.R-project.org/posting-guide.html 
 and provide commented, minimal, self-contained, reproducible code.


- -- 
Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation
Biology, UCT), Dipl. Phys. (Germany)

Centre of Excellence for Invasion Biology
Stellenbosch University
South Africa

Tel :   +33 - (0)9 53 10 27 44
Cell:   +33 - (0)6 85 62 59 98
Fax :   +33 - (0)9 58 10 27 44

Fax (D):+49 - (0)3 21 21 25 22 44

email:  rai...@krugs.de

Skype:  RMkrug
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk8ziuQACgkQoYgNqgF2egr/0gCfaZLJ78RK2qOhdJAllixQYlgc
Fw0An0lJnq2bdaCz+OlRDU5QYOMzWuut
=C/DE
-END PGP SIGNATURE-

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] width and alignment of Latex table columns

2012-02-09 Thread n.via...@libero.it

Dear all,
I'm using xtable package in order to produce Latex table. There is a way to 
specify not only the alignment but also the width of table colums??
Thanks for your attention.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Version control (git, mercurial) for R packages

2012-02-09 Thread baptiste auguie

Once upon a time r-forge had the option to sync from an external svn
repository, e.g. hosted on googlecode. I haven't seen it available for
some time, sadly. I'm sure many users would appreciate if this feature
came back with the new interface. Not sure if it could work with git
as well, though.

b.

On 9 February 2012 21:59, Rainer M Krug r.m.k...@gmail.com wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1

 On 09/02/12 00:33, Tom Roche wrote:

 Peter Langfelder Thu Feb 9 00:01:31 CET 2012
 I'm exploring using a version control system

 +1! welcome to the new millenium :-)

 to keep better track of changes to the [R] packages I maintain.
 I'm leaning towards git

 I like 'git' too, but one thing to consider (though keep in mind
 that I'm new to R, so I Could Be Wrong): R-forge

 https://r-forge.r-project.org/

 seems to be the canonical place to put R packages, and it's svn.
 That can be finessed, e.g.,

 http://cameron.bracken.bz/git-with-r-forge

 Nice link - but I can't follow it.

 Actual Situation:
 - - Package on r-forge

 Want to have:
 - - git (github?) repo as my main cvs and
 - - branch (release_x.y.z) for new release for r-forge
 - - push branch release_x.y.z to r-forge to do it's magic.

 So there are two initial steps:
 1) Initial import of r-forge repo to git(hub) (should be easy)
 2) setup of local git repo for package (is easy)

 and then github is used as VCS

 When a release should be done, the following needs to be done:
 - - branch on git (easy)
 - - push new branch to r-forge (how?)

 now how could this be done easily?

 can I do this with

  git svn commit

 ?

 Cheers,

 Rainer



 FWIW, Tom Roche tom_ro...@pobox.com

 __ R-help@r-project.org
 mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do
 read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


 - --
 Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation
 Biology, UCT), Dipl. Phys. (Germany)

 Centre of Excellence for Invasion Biology
 Stellenbosch University
 South Africa

 Tel :       +33 - (0)9 53 10 27 44
 Cell:       +33 - (0)6 85 62 59 98
 Fax :       +33 - (0)9 58 10 27 44

 Fax (D):    +49 - (0)3 21 21 25 22 44

 email:      rai...@krugs.de

 Skype:      RMkrug
 -BEGIN PGP SIGNATURE-
 Version: GnuPG v1.4.11 (GNU/Linux)
 Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

 iEYEARECAAYFAk8ziuQACgkQoYgNqgF2egr/0gCfaZLJ78RK2qOhdJAllixQYlgc
 Fw0An0lJnq2bdaCz+OlRDU5QYOMzWuut
 =C/DE
 -END PGP SIGNATURE-

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Version control (git, mercurial) for R packages

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 09/02/12 10:09, baptiste auguie wrote:
 Once upon a time r-forge had the option to sync from an external 
 svn repository, e.g. hosted on googlecode. I haven't seen it 
 available for some time, sadly. I'm sure many users would 
 appreciate if this feature came back with the new interface. Not 
 sure if it could work with git as well, though.

One option is obviously to use a local git repo and then use git svn -
but my ideal setup would be to have one development repo on
git(hub), and one package release repo on r-forge, and whenever I
decide yes, that is a release, branch it on git and push this to
r-forge.
By doing this, r-forge only contains releases, and git(hub) the
development history.

But thinking about it, a different r-forge VCS backend (git) would
also be an option.

Why was that option scraped?

Rainer


 
 b.
 
 On 9 February 2012 21:59, Rainer M Krug r.m.k...@gmail.com wrote:
 On 09/02/12 00:33, Tom Roche wrote:
 
 Peter Langfelder Thu Feb 9 00:01:31 CET 2012
 I'm exploring using a version control system
 
 +1! welcome to the new millenium :-)
 
 to keep better track of changes to the [R] packages I 
 maintain. I'm leaning towards git
 
 I like 'git' too, but one thing to consider (though keep in 
 mind that I'm new to R, so I Could Be Wrong): R-forge
 
 https://r-forge.r-project.org/
 
 seems to be the canonical place to put R packages, and
 it's svn. That can be finessed, e.g.,
 
 http://cameron.bracken.bz/git-with-r-forge
 
 Nice link - but I can't follow it.
 
 Actual Situation: - Package on r-forge
 
 Want to have: - git (github?) repo as my main cvs and - branch 
 (release_x.y.z) for new release for r-forge - push branch 
 release_x.y.z to r-forge to do it's magic.
 
 So there are two initial steps: 1) Initial import of r-forge repo 
 to git(hub) (should be easy) 2) setup of local git repo for
 package (is easy)
 
 and then github is used as VCS
 
 When a release should be done, the following needs to be done: - 
 branch on git (easy) - push new branch to r-forge (how?)
 
 now how could this be done easily?
 
 can I do this with
 
 git svn commit
 
 ?
 
 Cheers,
 
 Rainer
 
 
 
 FWIW, Tom Roche tom_ro...@pobox.com
 
 __ 
 R-help@r-project.org mailing list 
 https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read 
 the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible
 code.
 
 
 
 __ 
 R-help@r-project.org mailing list 
 https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the 
 posting guide http://www.R-project.org/posting-guide.html and 
 provide commented, minimal, self-contained, reproducible code.

- -- 
Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation
Biology, UCT), Dipl. Phys. (Germany)

Centre of Excellence for Invasion Biology
Stellenbosch University
South Africa

Tel :   +33 - (0)9 53 10 27 44
Cell:   +33 - (0)6 85 62 59 98
Fax :   +33 - (0)9 58 10 27 44

Fax (D):+49 - (0)3 21 21 25 22 44

email:  rai...@krugs.de

Skype:  RMkrug
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk8zktEACgkQoYgNqgF2egrvIACeMwR6uIyJoLXvZSb4gvo7Opzm
S7YAnimWJjfDgHUlpdn7rPRXMXmZltWf
=gqtc
-END PGP SIGNATURE-

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] fixed effects linear model in R

2012-02-09 Thread Francesco

Dear Andrew,

Thanks for your suggestion.
I will indeed have a look at Allison's booklet...

Best,

On 7 February 2012 23:39, Andrew Miles rstuff.mi...@gmail.com wrote:
 Based on Paul Allison's booklet Fixed Effect Regression Models (2009), the
 FE model can be estimated by person-mean centering all of your variables
 (but not the outcome), and then including a random intercept for each
 person.  The centering gives you the FE model estimates, and the random
 intercept adjusts the standard errors for clustering by individuals.  Note
 that your data must be in person-period (or long) format to do this.

 In case you are unfamiliar with person-mean centering, that simply means
 taking the mean of each person's values for a given variable for all of the
 periods in your data, and then calculating a deviation from that mean at
 each time period.  For example, a person's average income over four years
 might be $50,000, but in each year their actual income would be slightly
 higher or lower than this (these would be the person-mean deviations).  In
 symbolic form, your code might look something like this:

 library(lme4)
 variable_pmcentered = variable - person_mean
 mod = lmer(outcome ~ variable_pmcentered + person_mean + other predictors +
 (1|personID))

 The advantage of this method (which Allison calls a hybrid method) over
 traditional FE models is that you get the benefits of a FE model
 (subtracting out time-invariant omitted variables) along with the benefits
 of random effect models (e.g., estimating coefficients for time-invariant
 variables, estimating interactions with time, letting intercepts and slopes
 varying randomly, etc.)  See Allison's booklet for more details on this
 method.

 Allison, Paul D. 2009. Fixed Effects Regression Models. Los Angeles, C.A.:
 Sage.


 Andrew Miles


 On Feb 7, 2012, at 5:00 PM, caribou...@gmx.fr wrote:

 Dear R-helpers,

 First of all, sorry for those who have (eventually) already received that
 request.
 The mail has been bumped several times, so I am not sure the list has
 received it... and I need help (if you have time)! ;-)

 I have a very simple question and I really hope that someone could help me

 I would like to estimate a simple fixed effect regression model with
 clustered standard errors by individuals.
 For those using Stata, the counterpart would be xtreg with the fe option,
 or areg with the absorb option and in both case the clustering is achieved
 with vce(cluster id)

 My question is : how could I do that with R ?
 An important point is that I have too many individuals, therefore I cannot
 include dummies and should use the demeaning usual procedure.
 I tried with the plm package with the within option, but R quikcly tells
 me that the memory limits are attained (I have over 10go ram!) while the
 dataset is only 700mo (about 50 000 individuals, highly unbalanced)
 I dont understand... plm do indeed demean the data so the computation should
 be fast and light enough... and instead it saturates my memory and do not
 converge...

 Do you have an idea ?
 Moreover, it is possible to obtain cluster robust standard errors with plm ?

 Are there any other solutions for fixed effects linear models (with the
 demean trick in order not to create as many dummies as individuals) ?
 Many thanks in advance ! ;)
 John



 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Force printing of excluded axis annotations

2012-02-09 Thread Jim Lemon


Justin Fincher wrote:
 Howdy,
 This should be simple, but I am finding that I can't find a simple
 solution.  I have a plot to which I am manually adding the annotations
 to the y-axis with this command:

 axis(2, 
c(-4,-3,-2,-1,0,1,2,3,4,5,6,7),labels=c(-4,-3,-2,-1,0,1,2,3,4,5,6,7),cex.axis=8)


 The issue is that, apparently, R doesn't think that the -1 can fit,
 even though there is most certainly enough space.  Is there a way to
 force R to print all the annotations I give it, regardless of
 proximity or to reduce the space it believes it needs? Thank you.
 - Fincher

Hi Justin,
The staxlab function (plotrix) will squeeze labels, just set nlines=1.

Jim

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Hotelling T2 test extension for multigroup data


Hi all,
I've got the following matrix :
 
mat - matrix(rnorm(700), ncol=5, dimnames=list( paste(f, c(1:140), sep=_), 
c(A, B, C, D, E)))

I can see that currently most of the multivariate Hotelling T2 tests are 
limited for application on two groups/samples. 

I wud appreciate if someone can provide me a suggestion how to implement the 
same for multiple groups.


Thx,
- Vickie
  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] unsparse a vector

On Wed, Feb 08, 2012 at 05:01:01PM -0500, Sam Steingold wrote:
 loop is too slow.
 it appears that sparseMatrix does what I want:
 
 ll - lapply(l,length)
 i - rep(1:4, ll)
 vv - unlist(l)
 j1 - as.factor(substring(vv,1,1))
 t - table(j1)
 j - position of elements of j1 in names(t)
 sparseMatrix(i,j,x=as.numeric(substring(vv,2,2)), dimnames = names(t))
 
 so, the question is, how do I produce a vector of positions?
 
 i.e., from vectors
 [1] A B A C A B
 and
 [1] A B C
 I need to produce a vector
 [1] 1 2 1 3 1 2
 of positions of the elements of the first vector in the second vector.

This particular thing may be done as follows

  match(c(A, B, A, C, A, B), c(A, B, C))
  [1] 1 2 1 3 1 2

 PS. Of course, I would much prefer a dataframe to a matrix...

As the final result or also as an intermediate result?

Changing individual rows in a data frame is much slower
than in a matrix.

Compare

  n - 1
  mat - matrix(1:(2*n), nrow=n)
  df - as.data.frame(mat)

  system.time( for (i in 1:n) { mat[i, 1] - 0 } )

 user  system elapsed 
0.021   0.000   0.021 

  system.time( for (i in 1:n) { df[i, 1] - 0 } )

 user  system elapsed 
4.997   0.069   5.084 

This effect is specific to working with rows. Working
with the whole columns is a different thing.

  system.time( {
  col1 - df[[1]]
  for (i in 1:n) { col1[i] - 0 }
  df[[1]] - col1
  } )

user  system elapsed 
   0.019   0.000   0.019 

Hope this helps.

Petr Savicky.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] width and alignment of Latex table columns

2012-02-09 Thread Ista Zahn

Hi,

You can set the width of the columns the same way you would in any other LaTeX 
table, e.g., by setting align to p{width}. 
http://www.giss.nasa.gov/tools/latex/ltx-68.html has a good summary of LaTeX 
tables that you may find helpful.

Best,
Ista

On Thursday, February 09, 2012 10:08:32 AM n.via...@libero.it wrote:
 Dear all,
 I'm using xtable package in order to produce Latex table. There is a way to
 specify not only the alignment but also the width of table colums??
 Thanks for your attention.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] how to exclude rows with not-connected coalitions

2012-02-09 Thread Schumacher, G.

Dear all,

I have question but cannot explain without providing some context first:

I want to calculate how many policy-connected coalitions between 7 parties are 
possible. I have positions on an one-dimensional scale for each party and I 
have sorted the parties on the positions (it is sorted from extreme left to 
extreme right, hence using a left-right scale). A policy-connected coalition 
consists of parties that are connected on this left-right scale, hence there 
cannot be a party outside the coalition that is in-between two parties in the 
coalition on the left-right scale. My question is: how do a make a matrix that 
excludes those coalitions that are not policy-connected.

I made a matrix with each combination of 0 (out of coalition) and 1 (in 
coalition), for 7 parties.

a - expand.grid(rep(list(c(0,1)),7))

a gives all possible coalitions between these 7 parties.

Now I want to exclude those rows with coalition that are not policy-connected.

Hence: this one should be out, because the fifth party is not in.

0 1 1 1 0 1 1

And this one should be in,

0 0 0 1 1 1 1.

Hope this is clear and that someone has a good suggestion.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] factor level for non-existing value

2012-02-09 Thread David Studer

Hello everybody!

Let's assume I have the following factor with it's levels:

a-factor(c(2,3,3,2,4,2,3,2,2,2,3,2,3))
mydata-data.frame(a)

When I plot the vector a using

barplot(table(mydata$a)

unfortunately the value 1 does not
show up, as it does not appear in my data.
But still, it theoretically exists.

How can I assign the following levels to the factor?

1: dislike very much
2: dislike
3: like
4: like very much

I have already tried the following code, which does not work
levels(data$a)-c(dislike very much,dislike,like,like very much)
as 2 then becomes dislike very much.

I hope you understand my problem.

Thank you for any help!

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] width and alignment of Latex table columns

2012-02-09 Thread Duncan Mackay


Hi

Here is an example with something at hand if you use Sweave.
Using Sweave you can embed the R in a chunk and properly format the 
headers and sometimes the table.


the D{.}{.}{4.0} requires the use of \usepackage{dcolumn}

\begin{table}[h]
\caption[DNO and \rdate]%
{DNO and \rdate with row totals}%
\label{tab:CW2}%
\centering
\begin{tabular}{p{1.0in} *{7}{D{.}{.}{4.0}}  }
\toprule
\addlinespace[3pt]
% Header
\multicolumn{1}{Measurement} \\
\addlinespace[3pt]
\midrule
\addlinespace[5pt]
%Table2
Table2, keep.source=FALSE, results=tex, echo=FALSE=

  xx - addmargins(xx,2)

  print(
  xtable(xx,
 digits = rep(0,8)
 ),
 type= latex,
 floating = FALSE,
 tabular.environment = tabular,
 include.rownames = TRUE,
 include.colnames = FALSE,
 only.contents = TRUE,
 hline.after = NULL
  ) ## xtable

@ % Table2 end
\addlinespace[5pt]
\bottomrule
\end{tabular}
\end{table}

Regards

Duncan

Duncan Mackay
Department of Agronomy and Soil Science
University of New England
ARMIDALE NSW 2351
Email: home mac...@northnet.com.au



At 22:07 9/02/2012, you wrote:

Hi,

You can set the width of the columns the same way you would in any 
other LaTeX

table, e.g., by setting align to p{width}.
http://www.giss.nasa.gov/tools/latex/ltx-68.html has a good summary of LaTeX
tables that you may find helpful.

Best,
Ista

On Thursday, February 09, 2012 10:08:32 AM n.via...@libero.it wrote:
 Dear all,
 I'm using xtable package in order to produce Latex table. There is a way to
 specify not only the alignment but also the width of table colums??
 Thanks for your attention.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] factor level for non-existing value

2012-02-09 Thread Petr PIKAL

Hi

 
 Hello everybody!
 
 Let's assume I have the following factor with it's levels:
 
 a-factor(c(2,3,3,2,4,2,3,2,2,2,3,2,3))
 mydata-data.frame(a)
 
 When I plot the vector a using
 
 barplot(table(mydata$a)
 
 unfortunately the value 1 does not
 show up, as it does not appear in my data.
 But still, it theoretically exists.
 
 How can I assign the following levels to the factor?
 
 1: dislike very much
 2: dislike
 3: like
 4: like very much
 
 I have already tried the following code, which does not work
 levels(data$a)-c(dislike very much,dislike,like,like very much)
 as 2 then becomes dislike very much.

you can do it when constructing a factor

a-factor(c(2,3,3,2,4,2,3,2,2,2,3,2,3), levels=1:4,labels=c(dislike very 
much,dislike,like,like very much))

or when you already have a factor

a-factor(a, levels=1:4)

I basically understand that factor is a vector of numeric values with 
levels and labels attribute. Each level can have some label which can be 
changed independently. All levels does not need to be present in a factor.

However you shall not confuse it with function ?labels which has nothing 
to do with factors.

Regards
Petr




 
 I hope you understand my problem.
 
 Thank you for any help!
 
[[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Workflow when using git (e.g. github) as SVN repo for package

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Hi

This question is towards development of packages, but I think it does
not fit into R-devel mailing list, so I post it here.

I would like to use git as the VCS for my packages, but still be able
to use the magic of R-forge.

I know how to mgrate the existing SVN repo from R-forge to e.g.
github, and I am a little bit familiar with using git - so no problem
here.

But where to from there? How can I bring the magic of R-forge into
the game?

I thought about pushing only releases to the SVN on R-forge so that
the package are then created - but what is the best workflow for this?

As many packages are hosted on github, there should be an easy and way
to do it?

Thanks,

Rainer


- -- 
Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation
Biology, UCT), Dipl. Phys. (Germany)

Centre of Excellence for Invasion Biology
Stellenbosch University
South Africa

Tel :   +33 - (0)9 53 10 27 44
Cell:   +33 - (0)6 85 62 59 98
Fax :   +33 - (0)9 58 10 27 44

Fax (D):+49 - (0)3 21 21 25 22 44

email:  rai...@krugs.de

Skype:  RMkrug
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk8zv3QACgkQoYgNqgF2egpHBgCfYW6Hmtl35kE+OsLm7+GyC6yc
GpsAniaJSoTNFGLDxAgu/upUpkakXlcH
=wxEn
-END PGP SIGNATURE-

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] how to exclude rows with not-connected coalitions

On Thu, Feb 09, 2012 at 12:18:06PM +, Schumacher, G. wrote:
 Dear all,
 
 I have question but cannot explain without providing some context first:
 
 I want to calculate how many policy-connected coalitions between 7 parties 
 are possible. I have positions on an one-dimensional scale for each party and 
 I have sorted the parties on the positions (it is sorted from extreme left to 
 extreme right, hence using a left-right scale). A policy-connected coalition 
 consists of parties that are connected on this left-right scale, hence there 
 cannot be a party outside the coalition that is in-between two parties in the 
 coalition on the left-right scale. My question is: how do a make a matrix 
 that excludes those coalitions that are not policy-connected.
 
 I made a matrix with each combination of 0 (out of coalition) and 1 (in 
 coalition), for 7 parties.
 
 a - expand.grid(rep(list(c(0,1)),7))
 
 a gives all possible coalitions between these 7 parties.
 
 Now I want to exclude those rows with coalition that are not policy-connected.
 
 Hence: this one should be out, because the fifth party is not in.
 
 0 1 1 1 0 1 1
 
 And this one should be in,
 
 0 0 0 1 1 1 1.

Hi.

If i understand correctly, a connected coalition is
an interval on your scale. If this is correct, try
the following for generating only the intervals.

  n - 7
  a - matrix(nrow=n+choose(n, 2), ncol=n)
  k - 1
  for (i in 1:n) {
  for (j in (i+1):(n+1)) {
  x - rep(0, times=n+1)
  x[i] - 1
  x[j] - 1
  a[k,] - cumsum(x)[1:n]
  k - k+1
  }
  }
  a[a == 2] - 0
  a

[,1] [,2] [,3] [,4] [,5] [,6] [,7]
   [1,]1000000
   [2,]1100000
   [3,]1110000
   [4,]1111000
   [5,]1111100
   [6,]1111110
   [7,]1111111
   [8,]0100000
   [9,]0110000
  [10,]0111000
  ...

If you already have the matrix of all coalitions
and want to detect the connected ones, try the
function ?rle. There should be exactly one
run of ones.

Hope this helps.

Petr Savicky.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] factor level for non-existing value

2012-02-09 Thread Ista Zahn

Hi David,

On Thursday, February 09, 2012 01:21:48 PM David Studer wrote:
 Hello everybody!
 
 Let's assume I have the following factor with it's levels:
 
 a-factor(c(2,3,3,2,4,2,3,2,2,2,3,2,3))
 mydata-data.frame(a)

#You need to specify levels and labels here, like this:

a - factor(c(2,3,3,2,4,2,3,2,2,2,3,2,3),
levels = 1:4,
labels = c(dislike very much,
  dislike,
  like,
  like very much))

Then barplot will plot all four factor levels.

barplot(table(a))

This is pretty basic stuff, likely to be covered in most introductory texts, 
so I suggest you take some time to read an introduction to R.

Best,
Ista

 
 When I plot the vector a using
 
 barplot(table(mydata$a)
 
 unfortunately the value 1 does not
 show up, as it does not appear in my data.
 But still, it theoretically exists.
 
 How can I assign the following levels to the factor?
 
 1: dislike very much
 2: dislike
 3: like
 4: like very much
 
 I have already tried the following code, which does not work
 levels(data$a)-c(dislike very much,dislike,like,like very much)
 as 2 then becomes dislike very much.
 
 I hope you understand my problem.
 
 Thank you for any help!
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] search tutorial for function tt in cox.zph

2012-02-09 Thread Terry Therneau


 It seems there is a function ?tt? in R which can help to find this
 function and integrate it in my Cox model.
 I am searching a good tutorial (or book reference) with example(s) to
 understand and apply the function ?tt? in R;

There is a vignette describing the tt function.  Due to an oversight of
mine the vignette was not included in CRAN until the most recent update
of the survival package (a few days ago).  
For diagnosis of time-dependence and even more for deciding how to deal
with it, I find the plots from the cox.zph function to be the most
helpful.  These are discussed in my book with P Grambsch.

Terry Therneau

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Hotelling T2 test extension for multigroup data

2012-02-09 Thread peter dalgaard


On Feb 9, 2012, at 12:11 , Vickie S wrote:

 
 Hi all,
 I've got the following matrix :
  
 mat - matrix(rnorm(700), ncol=5, dimnames=list( paste(f, c(1:140), 
 sep=_), c(A, B, C, D, E)))
 
 I can see that currently most of the multivariate Hotelling T2 tests are 
 limited for application on two groups/samples. 
 
 I wud appreciate if someone can provide me a suggestion how to implement the 
 same for multiple groups.
 

Please don't start up a new thread, cloning the same misconceptions as an 
earlier one! It doesn't help, you will only get people misunderstanding you in 
the same way...

Apparently, you have rows and columns reversed relative to the common layout in 
multivariate analysis. Also, you have only one observation for each of your 
groups (species)? 

What makes you think that T^2 would work for you if you had only A and B 
species? I don't think anything will work unless you have either replication or 
some sort of additional assumptions on the covariance structure of your 
140-dimensional response.

-- 
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] how to exclude rows with not-connected coalitions

2012-02-09 Thread ONKELINX, Thierry

This can be done without nested loops.

a - expand.grid(rep(list(c(0,1)),7))
tmp - apply(a, 1, function(x){
  var(cumsum(x == 0)[x == 1])
})
a[is.na(tmp) | tmp == 0, ]


ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and 
Forest
team Biometrie  Kwaliteitszorg / team Biometrics  Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium
+ 32 2 525 02 51
+ 32 54 43 61 85
thierry.onkel...@inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more than 
asking him to perform a post-mortem examination: he may be able to say what the 
experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not ensure 
that a reasonable answer can be extracted from a given body of data.
~ John Tukey


-Oorspronkelijk bericht-
Van: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] Namens 
Petr Savicky
Verzonden: donderdag 9 februari 2012 13:44
Aan: r-help@r-project.org
Onderwerp: Re: [R] how to exclude rows with not-connected coalitions

On Thu, Feb 09, 2012 at 12:18:06PM +, Schumacher, G. wrote:
 Dear all,
 
 I have question but cannot explain without providing some context first:
 
 I want to calculate how many policy-connected coalitions between 7 parties 
 are possible. I have positions on an one-dimensional scale for each party and 
 I have sorted the parties on the positions (it is sorted from extreme left to 
 extreme right, hence using a left-right scale). A policy-connected coalition 
 consists of parties that are connected on this left-right scale, hence there 
 cannot be a party outside the coalition that is in-between two parties in the 
 coalition on the left-right scale. My question is: how do a make a matrix 
 that excludes those coalitions that are not policy-connected.
 
 I made a matrix with each combination of 0 (out of coalition) and 1 (in 
 coalition), for 7 parties.
 
 a - expand.grid(rep(list(c(0,1)),7))
 
 a gives all possible coalitions between these 7 parties.
 
 Now I want to exclude those rows with coalition that are not policy-connected.
 
 Hence: this one should be out, because the fifth party is not in.
 
 0 1 1 1 0 1 1
 
 And this one should be in,
 
 0 0 0 1 1 1 1.

Hi.

If i understand correctly, a connected coalition is an interval on your scale. 
If this is correct, try the following for generating only the intervals.

  n - 7
  a - matrix(nrow=n+choose(n, 2), ncol=n)
  k - 1
  for (i in 1:n) {
  for (j in (i+1):(n+1)) {
  x - rep(0, times=n+1)
  x[i] - 1
  x[j] - 1
  a[k,] - cumsum(x)[1:n]
  k - k+1
  }
  }
  a[a == 2] - 0
  a

[,1] [,2] [,3] [,4] [,5] [,6] [,7]
   [1,]1000000
   [2,]1100000
   [3,]1110000
   [4,]1111000
   [5,]1111100
   [6,]1111110
   [7,]1111111
   [8,]0100000
   [9,]0110000
  [10,]0111000
  ...

If you already have the matrix of all coalitions and want to detect the 
connected ones, try the function ?rle. There should be exactly one run of ones.

Hope this helps.

Petr Savicky.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] how to exclude rows with not-connected coalitions

2012-02-09 Thread Schumacher, G.

Thanks Thierry and Petr for your solutions. I can continue now (until the next 
problem arises ;). 

From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] on behalf of 
ONKELINX, Thierry [thierry.onkel...@inbo.be]
Sent: Thursday, February 09, 2012 2:09 PM
To: Petr Savicky; r-help@r-project.org
Subject: Re: [R] how to exclude rows with not-connected coalitions

This can be done without nested loops.

a - expand.grid(rep(list(c(0,1)),7))
tmp - apply(a, 1, function(x){
  var(cumsum(x == 0)[x == 1])
})
a[is.na(tmp) | tmp == 0, ]

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and 
Forest
team Biometrie  Kwaliteitszorg / team Biometrics  Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium
+ 32 2 525 02 51
+ 32 54 43 61 85
thierry.onkel...@inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more than 
asking him to perform a post-mortem examination: he may be able to say what the 
experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not ensure 
that a reasonable answer can be extracted from a given body of data.
~ John Tukey

-Oorspronkelijk bericht-
Van: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] Namens 
Petr Savicky
Verzonden: donderdag 9 februari 2012 13:44
Aan: r-help@r-project.org
Onderwerp: Re: [R] how to exclude rows with not-connected coalitions

On Thu, Feb 09, 2012 at 12:18:06PM +, Schumacher, G. wrote:
 Dear all,

 I have question but cannot explain without providing some context first:

 I want to calculate how many policy-connected coalitions between 7 parties 
 are possible. I have positions on an one-dimensional scale for each party and 
 I have sorted the parties on the positions (it is sorted from extreme left to 
 extreme right, hence using a left-right scale). A policy-connected coalition 
 consists of parties that are connected on this left-right scale, hence there 
 cannot be a party outside the coalition that is in-between two parties in the 
 coalition on the left-right scale. My question is: how do a make a matrix 
 that excludes those coalitions that are not policy-connected.

 I made a matrix with each combination of 0 (out of coalition) and 1 (in 
 coalition), for 7 parties.

 a - expand.grid(rep(list(c(0,1)),7))

 a gives all possible coalitions between these 7 parties.

 Now I want to exclude those rows with coalition that are not policy-connected.

 Hence: this one should be out, because the fifth party is not in.

 0 1 1 1 0 1 1

 And this one should be in,

 0 0 0 1 1 1 1.

Hi.

If i understand correctly, a connected coalition is an interval on your scale. 
If this is correct, try the following for generating only the intervals.

  n - 7
  a - matrix(nrow=n+choose(n, 2), ncol=n)
  k - 1
  for (i in 1:n) {
  for (j in (i+1):(n+1)) {
  x - rep(0, times=n+1)
  x[i] - 1
  x[j] - 1
  a[k,] - cumsum(x)[1:n]
  k - k+1
  }
  }
  a[a == 2] - 0
  a

[,1] [,2] [,3] [,4] [,5] [,6] [,7]
   [1,]1000000
   [2,]1100000
   [3,]1110000
   [4,]1111000
   [5,]1111100
   [6,]1111110
   [7,]1111111
   [8,]0100000
   [9,]0110000
  [10,]0111000
  ...

If you already have the matrix of all coalitions and want to detect the 
connected ones, try the function ?rle. There should be exactly one run of ones.

Hope this helps.

Petr Savicky.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] how to pass weka classifier options with a meta classifier in RWeka?

2012-02-09 Thread Kari Ruohonen

Hi,
I am trying to replicate a training of AttributeSelectedClassifier with
CFsSubsetEval, BestFirst and NaiveBayes that I have initially done with
Weka. Now, I am trying to use RWeka in R.

I have a problem of passing arguments to the CfsSubsetEval, BestFirst
and NaiveBayes. I have first created an interface for the classifier
with:

AS-make_Weka_classifier(weka/classifiers/meta/AttributeSelectedClassifier)

And then I am trying to run the classifier with:

nb.model-AS(class~.,data=ex,
 control=Weka_control(
   E=weka.attributeSelection.CfsSubsetEval,
   S=weka.attributeSelection.BestFirst -D 1,
   W=weka.classifiers.bayes.NaiveBayes -D))

But now, I get an error saying:

Error in .jcall(classifier, V, buildClassifier, instances) : 
  java.lang.Exception: Can't find class called:
weka.classifiers.bayes.NaiveBayes -D

indicating that the way I am passing the argument -D to the NaiveBayes
is incorrect. I am uncertain from the RWeka documentation how the
passing mechanism of Weka_control is supposed to work with meta
classifiers. All help is greatly appreciated.

Many thanks, Kari

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Tr: Re: how to pass weka classifier options with a meta classifier in RWeka?

2012-02-09 Thread Kari Ruohonen

On Thu, 2012-02-09 at 14:52 +0100, Milan Bouchet-Valat wrote:
 Le jeudi 09 février 2012 à 15:31 +0200, Kari Ruohonen a écrit :
snip
  
  And then I am trying to run the classifier with:
  
  nb.model-AS(class~.,data=ex,
   control=Weka_control(
 E=weka.attributeSelection.CfsSubsetEval,
 S=weka.attributeSelection.BestFirst -D 1,
 W=weka.classifiers.bayes.NaiveBayes -D))
  
  But now, I get an error saying:
  
  Error in .jcall(classifier, V, buildClassifier, instances) : 
java.lang.Exception: Can't find class called:
  weka.classifiers.bayes.NaiveBayes -D
  
  indicating that the way I am passing the argument -D to the NaiveBayes
  is incorrect. I am uncertain from the RWeka documentation how the
  passing mechanism of Weka_control is supposed to work with meta
  classifiers. All help is greatly appreciated.
 I've never tried it myself, but ?Weka_control says:
  One can use lists for options taking multiple arguments, see the
  documentation for ‘SMO’ for an example.
 
 So maybe
 nb.model-AS(class~.,data=ex,
control=Weka_control(
 E=weka.attributeSelection.CfsSubsetEval,
 S=list(weka.attributeSelection.BestFirst, D=1),
 W=list(weka.classifiers.bayes.NaiveBayes, D=1)))
 
 
 Cheers

Hi and thanks for the suggestion. Unfortunately, it results in a similar
error:

Error in .jcall(classifier, V, buildClassifier, instances) : 
  java.lang.Exception: Can't find class called:
weka.classifiers.bayes.NaiveBayes -D 1

regards, Kari

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] How to compare optimized models ?

2012-02-09 Thread Diane Bailleul

Good afternoon everybody,
I'm using optimization routines like optimix or nlminb to optimize the 
parameters of my dispersal curves which are working with different 
kernels :

/anlb2002z1b - nlminb(c(0.5,2), objective=LogLiketot,lower=c(0,0), 
upper=c(1,200))/
where /LogLiketot/ contains my dispersal kernel parameters and 
/(0.5,2)/, an exemple of starting point to estimate these parameters.
/anlb2002z1b/ returns me different values, including the AIC.

On the same data, I'm comparing 3 dispersal kernels, exponential kernel 
(one parameter), exponential-power kernel (two-parameters) and geometric 
kernel (two-parameters).

Is there a way, in R, to compare these dispersal kernels on the base of 
their AIC to find the best fitting one ? A Likelihood ratio test works 
only for nested models ...

Thanks in advance for your help.
Diane.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] coxme with frailty--variance of random effect?

2012-02-09 Thread Thomas Chadefaux

Thank you very much for taking the time to answer. This pointed me in
the right direction.
For those interested, I found a useful explanation  of the derivation
of the estimated variance of the random effect in Hosmer  Lemeshow's
Applied Survival Analysis (1999), pp.321-26 (I love your book, Dr.
Therneay, but I needed something easier...). They proceed in 4 steps:

1. Obtain the cumulative hazard function for each subject.
2. Choose an arbitrary value for the variance parameter of the
frailties (call it theta).
3. Compute for each subject an estimate of the value of their
frailties, USING this variance parameter theta:
frailty_i= \frac(1+\theta \times c_i}{1+\theta \times H_i} (formula on
p. 321), where H is the cumulative hazard for the subject. So if theta
is 0 (no variance), then frailty=1 (i.e., no excess frailty). As theta
goes to infinity, the estimated frailty is simply the ratio
1/(cumulative risk so far) or 1/(cumulative risk so far), depending on
whether the subject is still alive or not.
4. fit the proportional hazard model with the same covariates as
before, but this time including the frailties in the hazard function.
5. calculate a log-likelihood for the model.

Repeat this for all possible values of the frailties (in practice,
sweep through them according to some algorithm), and pick the one with
the highest log-likelihood.

SO IF I understand correctly, the frailties are calculated GIVEN a
variance parameter of the frailties, and not the reverse. I.e., theta
is chosen first, and the random effects are estimated accordingly.
Which is why the variance of the estimated random effects is NOT the
same as theta. Did I get this right?

Thanks again,
Thomas


Best wishes,
Thomas Chadefaux

On Mon, Feb 6, 2012 at 1:56 PM, Terry Therneau thern...@mayo.edu wrote:
 In answer to the several questions:

 1. Variance of the random effect:  Your example is not reproducable
 since you didn't give the random number seed.  Instead I'll use one of
 the data sets that comes with the survival package.
 library(coxme)
 fit - coxme(Surv(tstart, tstop, status) ~ treat + (1|center), cgd)
 VarCorr(fit)$center
 Intercept
 0.1643779

 var(ranef(fit)$center)
 Intercept
 [1] 0.06261608

  Yes, it is true that for a random effects model, the estimated variance
 of the random effect is not equal to var(estimated per center effects).
 Exactly the same is true of a linear mixed effects model: try the same
 with lmer
         library(lme4)
         lfit(status ~ treat + (1|center), cgd)
         VarCorr(lfit)$center
         var(ranef(lfit)$center)

 Why? It's a statistical insight that took me a while, so I don't think I
 can explain it over email.  Find someone familiar with mixed efffects
 models and have a chat.

 2. coxph( +frailty(center)) and coxme give different results.
  Read the documentation.  One of them is fitting a gamma distribution
 for the random effect, the other a Gaussian.

 Terry Therneau



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Rearanging Data



On Feb 8, 2012, at 9:48 PM, kevin123 wrote:


Hi,

This is only a small portion of the Data i am working on
I want to make a subset of this data set( Data Set=Claims)

 MemberID   ProviderID Vendor   PCPYear  Specialty
1 422869788013252 172193 37796   Y1Surgery
2 979032483316066 726296  5300Y3Internal
3  27594272997752 140343 91972Y1   Internal
4 735705597053364 240043 70119   Y3   Laboratory

I want to put all the rows containing Y1 into a subset. I tried this  
but it

does not work

sub - subset(Claims, Year=Y1)


You (as well , apparently, as Tal) don't seem to have learned that =  
is an assignment function and the needed Comparison operator is ==


?Comparison

Try instead:

sub - subset(Claims, Year==Y1)

--
David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Hotelling T2 test extension for multigroup data


Hey Peter,

I agree that Hotelling's T test, although technically possible for two groups 
but rejecting the Ho in this case is not going to convey any meaning.

But I am sure there is a possibility. My assumption is based on using simulated 
reference set for each comparison.

-- Vickie 





 Subject: Re: [R] Hotelling T2 test extension for multigroup data

 From: pda...@gmail.com

 Date: Thu, 9 Feb 2012 14:08:22 +0100

 CC: r-help@r-project.org

 To: is...@live.com





 On Feb 9, 2012, at 12:11 , Vickie S wrote:



 

  Hi all,

  I've got the following matrix :

 

  mat - matrix(rnorm(700), ncol=5, dimnames=list( paste(f, c(1:140), 
  sep=_), c(A, B, C, D, E)))

 

  I can see that currently most of the multivariate Hotelling T2 tests are 
  limited for application on two groups/samples.

 

  I wud appreciate if someone can provide me a suggestion how to implement 
  the same for multiple groups.

 



 Please don't start up a new thread, cloning the same misconceptions as an 
 earlier one! It doesn't help, you will only get people misunderstanding you 
 in the same way...



 Apparently, you have rows and columns reversed relative to the common layout 
 in multivariate analysis. Also, you have only one observation for each of 
 your groups (species)?



 What makes you think that T^2 would work for you if you had only A and B 
 species? I don't think anything will work unless you have either replication 
 or some sort of additional assumptions on the covariance structure of your 
 140-dimensional response.



 --

 Peter Dalgaard, Professor

 Center for Statistics, Copenhagen Business School

 Solbjerg Plads 3, 2000 Frederiksberg, Denmark

 Phone: (+45)38153501

 Email: pd@cbs.dk Priv: pda...@gmail.com



  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] dropterm in MANOVA for MLM objects

2012-02-09 Thread John Fox

Dear Vickie,

No one will be able to wave a magic wand over your data to allow you to 
usefully estimate linear models with 0 df for error, and you certainly can't 
perform statistical tests. As Peter Dalgaard pointed out, the same confusion 
was reflected in your subsequent question about Hotelling's T^2. Hotelling T^2 
is equivalent to MANOVA when there are two groups.

Best,
 John

On Thu, 9 Feb 2012 09:38:51 +0100
 Vickie S is...@live.com wrote:
 
 Thanks for nice explanation. 
 Unfortunately, matrix in my question is exactly similar to the one I posted 
 earlier : 
 
 mat - matrix(rnorm(700), ncol=5, dimnames=list( paste(f, 
 c(1:140),sep=_), c(A, B, C, D, E))) 
 
 
 Question here is which of the 140 characteristics (i.e. f_1...f_140) 
 distinguish the most between the five plant
 species.
 
 Is it true that this matrix can't be regressed with factor responses 
 (species) ? If so, what alternatives can be used ?
 
  
 -  Vickie
 
 
 
  From: j...@mcmaster.ca
 
  To: is...@live.com
 
  CC: r-help@r-project.org
 
  Subject: RE: [R] dropterm in MANOVA for MLM objects
 
  Date: Wed, 8 Feb 2012 20:37:37 -0500
 
 
 
  Dear Vickie,
 
 
 
  I'm afraid that the test problem that you've constructed makes no sense, and
 
  doesn't correspond to the problem that you initially described, in which a
 
  matrix of presumably 5 responses for presumably 140 observations is
 
  regressed on 6 predictors. You regressed your randomly generated matrix of 5
 
  responses and 140 observations on a factor constructed from the distinct 140
 
  observation names. That factor has 140 levels, and so the model uses 140 df,
 
  all the df in the data. It's therefore not surprising that the error SSP
 
  matrix has 0 df, which is exactly what Anova.mlm (actually,
 
  linearHypothesis.mlm, which it calls) tells you.
 
 
 
  The remark that you found about univariate tests that you apparently found
 
  on-line concerns repeated-measures designs and is not relevant to your data.
 
  And you can't do a univariate ANOVA when there's 0 df for error in any
 
  event.
 
 
 
  Here's a proper simulation of the kind of data that I think you have:
 
 
 
   set.seed(12345)
 
   E - matrix(rnorm(140*5), ncol=5)
 
   X - matrix(rnorm(140*6), ncol=6)
 
   Beta - matrix(runif(6*5), ncol=5)
 
   Y - X %*% Beta + E
 
   colnames(Y) - c(A, B, C, D, E)
 
   colnames(X) - c(syct, mmin, mmax, cach, chmin, chmax)
 
   Data - as.data.frame(cbind(Y, X))
 
   mod - lm(cbind(A, B, C, D, E) ~ syct + mmin + mmax + cach + chmin +
 
  chmax, data=Data)
 
   Anova(mod)
 
 
 
  Type II MANOVA Tests: Pillai test statistic
 
  Df test stat approx F num Df den Df Pr(F)
 
  syct 1 0.41622 18.395 5 129 9.31e-14 ***
 
  mmin 1 0.48288 24.091 5 129  2.2e-16 ***
 
  mmax 1 0.62100 42.273 5 129  2.2e-16 ***
 
  cach 1 0.61711 41.583 5 129  2.2e-16 ***
 
  chmin 1 0.72547 68.180 5 129  2.2e-16 ***
 
  chmax 1 0.54825 31.311 5 129  2.2e-16 ***
 
  ---
 
  Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1   1
 
 
 
  Best,
 
  John
 
 
 
   -Original Message-
 
   From: Vickie S [mailto:is...@live.com]
 
   Sent: February-08-12 5:53 PM
 
   To: j...@mcmaster.ca
 
   Cc: r-help@r-project.org
 
   Subject: RE: [R] dropterm in MANOVA for MLM objects
 
  
 
  
 
   Dear Prof Fox,
 
   I tried anova but got the following error message:
 
  
 
   mat - matrix(rnorm(700), ncol=5, dimnames=list( paste(f, c(1:140),
 
   sep=_), c(A, B, C, D, E))) summary(Anova(lm(cbind(A, B, C,
 
   D, E) ~ factor(rownames(mat)), data=as.data.frame(mat
 
  
 
   Error in summary(Anova(lm(cbind(A, B, C, D, E) ~
 
   factor(rownames(mat)),  :
 
 error in evaluating the argument 'object' in selecting a method for
 
   function 'summary': Error in linearHypothesis.mlm(mod, hyp.matrix.2,
 
   SSPE = SSPE, V = V, ...) :
 
 The error SSP matrix is apparently of deficient rank = 0  5
 
  
 
   I looked in previous forum and it seems like i have only option of
 
   performing the univariate test here.
 
  
 
   Therefore I used the following, but it still results in an error
 
   message:
 
   Anova(lm(cbind(A, B, C, D, E) ~ factor(rownames(mat)),
 
   data=as.data.frame(mat)), univariate=TRUE, multivariate=F) Error in
 
   linearHypothesis.mlm(mod, hyp.matrix.2, SSPE = SSPE, V = V, ...) :
 
 The error SSP matrix is apparently of deficient rank = 0  5
 
  
 
   Any suggestions ?
 
  
 
   Thanks
 
   Vickie
 
  
 
  
 
  
 
   I think I am still missing some important clues here. Is it because
 
   the feww
 
  
 
From: j...@mcmaster.ca
 
To: is...@live.com
 
CC: r-help@r-project.org
 
Subject: RE: [R] dropterm in MANOVA for MLM objects
 
Date: Wed, 8 Feb 2012 17:01:34 -0500
 
   
 
Dear Vicki,
 
   
 
I think that the Anova() function in the car package will do what
 
   you
 
want (and will also properly handle models with more structure, such
 
as interactions).
 
   
 
Best,
 
John

Re: [R] dropterm in MANOVA for MLM objects


Anywez, I have figured out a solution. Whether it is a magic wand or something 
else, that I don't know but I wish it would work.

Thx for the critical stand.

-- Vickie




 From: j...@mcmaster.ca

 Subject: Re: [R] dropterm in MANOVA for MLM objects

 To: is...@live.com

 CC: r-help@r-project.org

 Date: Thu, 9 Feb 2012 09:43:37 -0500



 Dear Vickie,



 No one will be able to wave a magic wand over your data to allow you to 
 usefully estimate linear models with 0 df for error, and you certainly can't 
 perform statistical tests. As Peter Dalgaard pointed out, the same confusion 
 was reflected in your subsequent question about Hotelling's T^2. Hotelling 
 T^2 is equivalent to MANOVA when there are two groups.



 Best,

 John



 On Thu, 9 Feb 2012 09:38:51 +0100

 Vickie S is...@live.com wrote:

 

  Thanks for nice explanation.

  Unfortunately, matrix in my question is exactly similar to the one I posted 
  earlier :

 

  mat - matrix(rnorm(700), ncol=5, dimnames=list( paste(f, 
  c(1:140),sep=_), c(A, B, C, D, E)))

 

 

  Question here is which of the 140 characteristics (i.e. f_1...f_140) 
  distinguish the most between the five plant

  species.

 

  Is it true that this matrix can't be regressed with factor responses 
  (species) ? If so, what alternatives can be used ?

 

 

  -  Vickie

 

 

  

   From: j...@mcmaster.ca

 

   To: is...@live.com

 

   CC: r-help@r-project.org

 

   Subject: RE: [R] dropterm in MANOVA for MLM objects

 

   Date: Wed, 8 Feb 2012 20:37:37 -0500

 

  

 

   Dear Vickie,

 

  

 

   I'm afraid that the test problem that you've constructed makes no sense, 
   and

 

   doesn't correspond to the problem that you initially described, in which a

 

   matrix of presumably 5 responses for presumably 140 observations is

 

   regressed on 6 predictors. You regressed your randomly generated matrix 
   of 5

 

   responses and 140 observations on a factor constructed from the distinct 
   140

 

   observation names. That factor has 140 levels, and so the model uses 140 
   df,

 

   all the df in the data. It's therefore not surprising that the error SSP

 

   matrix has 0 df, which is exactly what Anova.mlm (actually,

 

   linearHypothesis.mlm, which it calls) tells you.

 

  

 

   The remark that you found about univariate tests that you apparently found

 

   on-line concerns repeated-measures designs and is not relevant to your 
   data.

 

   And you can't do a univariate ANOVA when there's 0 df for error in any

 

   event.

 

  

 

   Here's a proper simulation of the kind of data that I think you have:

 

  

 

set.seed(12345)

 

E - matrix(rnorm(140*5), ncol=5)

 

X - matrix(rnorm(140*6), ncol=6)

 

Beta - matrix(runif(6*5), ncol=5)

 

Y - X %*% Beta + E

 

colnames(Y) - c(A, B, C, D, E)

 

colnames(X) - c(syct, mmin, mmax, cach, chmin, chmax)

 

Data - as.data.frame(cbind(Y, X))

 

mod - lm(cbind(A, B, C, D, E) ~ syct + mmin + mmax + cach + chmin +

 

   chmax, data=Data)

 

Anova(mod)

 

  

 

   Type II MANOVA Tests: Pillai test statistic

 

   Df test stat approx F num Df den Df Pr(F)

 

   syct 1 0.41622 18.395 5 129 9.31e-14 ***

 

   mmin 1 0.48288 24.091 5 129  2.2e-16 ***

 

   mmax 1 0.62100 42.273 5 129  2.2e-16 ***

 

   cach 1 0.61711 41.583 5 129  2.2e-16 ***

 

   chmin 1 0.72547 68.180 5 129  2.2e-16 ***

 

   chmax 1 0.54825 31.311 5 129  2.2e-16 ***

 

   ---

 

   Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

 

  

 

   Best,

 

   John

 

  

 

-Original Message-

 

From: Vickie S [mailto:is...@live.com]

 

Sent: February-08-12 5:53 PM

 

To: j...@mcmaster.ca

 

Cc: r-help@r-project.org

 

Subject: RE: [R] dropterm in MANOVA for MLM objects

 

   

 

   

 

Dear Prof Fox,

 

I tried anova but got the following error message:

 

   

 

mat - matrix(rnorm(700), ncol=5, dimnames=list( paste(f, c(1:140),

 

sep=_), c(A, B, C, D, E))) summary(Anova(lm(cbind(A, B, C,

 

D, E) ~ factor(rownames(mat)), data=as.data.frame(mat

 

   

 

Error in summary(Anova(lm(cbind(A, B, C, D, E) ~

 

factor(rownames(mat)), :

 

error in evaluating the argument 'object' in selecting a method for

 

function 'summary': Error in linearHypothesis.mlm(mod, hyp.matrix.2,

 

SSPE = SSPE, V = V, ...) :

 

The error SSP matrix is apparently of deficient rank = 0  5

 

   

 

I looked in previous forum and it seems like i have only option of

 

performing the univariate test here.

 

   

 

Therefore I used the following, but it still results in an error

 

message:

 

Anova(lm(cbind(A, B, C, D, E) ~ factor(rownames(mat)),

 

data=as.data.frame(mat)), univariate=TRUE, multivariate=F) Error in

 

linearHypothesis.mlm(mod, hyp.matrix.2, SSPE =

[R] Question about measurment invariance in a multigroup SEM

2012-02-09 Thread Paul Miller

Hello All,

I have a question about measurement invariance in a multigroup SEM. I am of the 
impression that one cannot test the equality of structural paths across groups 
without demonstrating measurement invariance of latent factors first. At a 
minimum, this would involve demonstrating that corresponding factor loadings do 
not differ across groups using, say, a difference in chi-square test. 

Am I right about this? 

Is there any published work that argues it is possible to compare the 
structural paths without demonstrating measurement invariance first?

Thanks,

Paul

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] February-March 2012 R/S-PLUS Coursesby XLSolutions Corp at 9 USA locations

2012-02-09 Thread s...@xlsolutions-corp.com

Happy New Year !

XLSolutions February-March 2012 R/S-PLUS courses schedule is now
available
online at 9 USA cities for with 13 new courses: *** Suggest a future
course
date/city

(1) R-PLUS: A Point-and-Click Approach to R
(2) S-PLUS / R : Programming Essentials.
(3) R/S+ Fundamentals and Programming Techniques
(4) R/S-PLUS Functions by Example.
(5) S/R-PLUS Programming 3: Advanced Techniques and Efficiencies.
(6) R/S+ System: Advanced Programming.
(7) R/S-PLUS Graphics: Essentials.
(8) R/S-PLUS Graphics for SAS Users
(9) R/S-PLUS Graphical Techniques for Marketing Research.
(10) Multivariate Statistical Methods in R/S-PLUS: Practical Research
Applications
(11) Introduction to Applied Econometrics with R/S-PLUS
(12) Exploratory Analysis for Large and Complex Problems in R/S-PLUS
(13) Determining Power and Sample Size Using R/S-PLUS.
(14) R/S-PLUS: Data Preparation for Data Mining
(15) Data Cleaning Techniques in R/S-PLUS
(16) R/S-PLUS: Applied Clustering Techniques


More on website

http://www.xlsolutions-corp.com/rplus.asp 

Ask for group discount and reserve your seat Now - Earlybird Rates.
Payment due after the class! Email Sue Turner: sue at
xlsolutions-corp.com

Phone: 206-686-1578


Please let us know if you and your colleagues are interested in this
class to take advantage of group discount. Register now to secure your
seat.

Cheers,
Elvis Miller, PhD
Manager Training.
XLSolutions Corporation
206 686 1578
www.xlsolutions-corp.com
elvis at xlsolutions-corp.com


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Any R course in New York or Washington DC in March?

2012-02-09 Thread s...@xlsolutions-corp.com

Good day Eugene and R-useRs !


We have  a few course coming up in New York City and Washington DC
during the week of March 11th, 2012.
http://www.xlsolutions-corp.com/rplus.asp

R Rocks !

Sue




From: eugene dalt eugened...@yahoo.com
 To: r-help@r-project.org r-help@r-project.org 
Sent: Monday, February 6, 2012 11:57 AM
 Subject: [R] Any R course in New York or Washington DC in March?
 
Hey Folks,
 
I am looking for a R course in New York or Washington DC,  please email
me any course announcement you know of for these dates.
 
Best - Eugene
[[alternative HTML version deleted]]


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Java heap space Error while reading table from postgres database using RJDBC

2012-02-09 Thread aajit75


Hi List,
I am reading table from postgres database into R session using RJDBC, table
contains 150 columns  and 20 rows.
Sample code is as below, which works fine with smaller tables.

db_driver - mydir$db_driver
db_jar_file - mydir$db_jar_file
db_server - mydir$db_server
db_server_lgn - mydir$db_server_lgn
db_server_pwd - mydir$db_server_pwd

library(RJDBC)

drv - JDBC(paste(db_driver,  sep = ),
   paste(db_jar_file,  sep = ),
   identifier.quote=`)

conn - dbConnect(drv, paste(db_server,  sep = ),
paste(db_server_lgn,  sep = ), 
paste(db_server_pwd,  sep = ))

cs_input_abt - dbReadTable(conn, cs_input_abt)

Following are the different error occurs after executing above script, every
time different error when above script is executed.
1. Error in .jcall(rp, I, fetch, stride) : 
  java.lang.OutOfMemoryError: Java heap space
2. Error in .verify.JDBC.result(r, Unable to retrieve JDBC result set for
,  : 
  Unable to retrieve JDBC result set for SELECT * FROM cs_input_abt (Could
not initialize class org.postgresql.util.PSQLException)
3. Error in .verify.JDBC.result(r, Unable to retrieve JDBC result set for
,  : 
  Unable to retrieve JDBC result set for SELECT * FROM bs_modelling_abt (GC
overhead limit exceeded)

Where am I going wrong? Is there any option which I had not used in the
RJDBC connection or needed to add?
[[elided Yahoo spam]]
Ajit



--
View this message in context: 
http://r.789695.n4.nabble.com/Java-heap-space-Error-while-reading-table-from-postgres-database-using-RJDBC-tp4372816p4372816.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] calling the function which is stored in a list

2012-02-09 Thread arunkumar1111

Thanks it works

-
Thanks in Advance
Arun
--
View this message in context: 
http://r.789695.n4.nabble.com/calling-the-function-which-is-stored-in-a-list-tp4372131p4372557.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] ifelse

Hello,

I need to print a screen message after a test.

list-c(10,1,100,10)
ifelse(all(list %in% c(1,10,100)==TRUE),cat(YES\n),cat(NO\n))
YES
Error in ifelse(all(l %in% c(1, 10, 100) == TRUE), cat(YES\n), cat(NO\n)) : 
  replacement has length zero


I have the correct answer, YES, but with an Error.

Why?

TY for any help


Arnaud Gaboury
 
A2CT2 Ltd.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] generate matrices

2012-02-09 Thread Blaz Simcic

Dear all,
I would like to generate 500 matrices of 20 numbers from
standard normal distribution with names x1,x2,x3,â¦.x500. Â 
I tried with loop for, but I donât know how to name matices :
for (i in 1:500)Â  {
Â Â  x[[i]] - matrix(rnorm(20), 4)Â Â Â  Â }
Any suggestion?
Thanks,Â  BlaÅ¾
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] fixed effects with clustered standard errors

2012-02-09 Thread John L

Dear Giovanni,

Many thanks for your interesting suggestions.
Your guess is indeed right, I only use the 'within' fixed effects specification.

I will soon send to this list all the additional information you
requested in order to understand what might cause this problem, but I
would say as a first guess that the inefficiency is (probably?) due to
individuals with too many datapoints : the median number of points is
10, but I have some individuals with more than 1000, 5000 or even 80
000 points overall!
So basically my dataset is probably too strange, as you suggested,
compared to the standard panel dataset in social sciences...

To be continued... ;-)
Many thanks again

Best,

On 8 February 2012 18:55, Millo Giovanni [via R]
ml-node+s789695n4370302...@n4.nabble.com wrote:
 Dear John,

 interesting. There must be a bottleneck somewhere, which possibly went
 unnoticed because econometricians seldom use so many data points. In
 fact 'plm' wasn't designed to handle only 700 Megs of data at a time;
 but we're happy to investigate in this direction too. E.g., I was aware
 of some efficiency problems if effect=twoways but I seem to understand
 that you are using effect=individual? -- which takes me to the main
 point.

 I understand that enclosing the data for a reproducible report, as
 requested by the posting guide, is awkward for such a big dataset. Yet
 it would be of great help if you at least produced:

 - an output of your procedure, in order to see what goes wrong and where
 - the output of traceback() called immediately after you got the error
 (idem)

 and possibly gave it a try with lm() applied to the very same formula
 and data, maybe into a system.time( ... ) statement.

 Else, the information you provide is way too scant to even make an
 educated guess. For example, it isn't clear whether the problem is
 related to plm() or to vcovHC.plm etc.

 As far as simple demeaning is concerned, you might try the following
 code, which really does only that. Be aware that **standard errors are
 biased** etc. etc., this is not meant to be a proper function but just a
 computational test for your data and a quick demonstration of demeaning.
 'plm()' is far more structured, for a number of reasons. Please execute
 it inside system.time() again.

 # test function for within model, BIASED SEs !! #
 ##
 ## ## example:
 ## data(Produc, package=plm)
 ## mod - FEmod(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp,
 index=Produc$state, data=Produc)
 ## summary(mod)
 ## ## compare with:
 ## library(plm)
 ## example(plm)

 demean-function(x,index,lambda=1,na.rm=F) {

 as.vector(x-lambda*tapply(x,index,mean,na.rm=na.rm)[unclass(as.factor(in
 dex))])
 Â  }
 FEmod-function(formula,index,data=ls()) {

 Â  ## fit a model without intercept in any case
 Â  formula-as.formula(paste(deparse(formula(formula)),-1,sep=))
 Â  X-model.matrix(formula,data=data)
 Â  y-model.response(model.frame(formula,data=data))
 Â  ## reduce index accordingly
 Â  names(index)-row.names(data)
 Â  ind-index[which(names(index)%in%row.names(X))]

 Â  ## within transf.
 Â  MX-matrix(NA,ncol=dim(X)[[2]],nrow=dim(X)[[1]])
 Â  for(i in 1:dim(X)[[2]]) {
 Â  Â  MX[,i]-demean(X[,i],index=ind,lambda=1)
 Â  Â  }
 Â  My-demean(y,index=ind,lambda=1)

 Â  ## estimate within model
 Â  femod-lm(My~MX-1)

 Â  return(femod)
 }
 ### end test function 


 Best,
 Giovanni

 ### original message #

 --

 Message: 28
 Date: Tue, 07 Feb 2012 15:35:07 +0100
 From: [hidden email]
 To: [hidden email]
 Subject: [R] fixed effects with clustered standard errors
 Message-ID: [hidden email]
 Content-Type: text/plain; charset=utf-8

 Dear R-helpers,

 I have a very simple question and I really hope that someone could help
 me

 I would like to estimate a simple fixed effect regression model with
 clustered standard errors by individuals.
 For those using Stata, the counterpart would be xtreg with the fe
 option, or areg with the absorb option and in both case the clustering
 is achieved with vce(cluster id)

 My question is : how could I do that with R ? An important point is that
 I have too many individuals, therefore I cannot include dummies and
 should use the demeaning usual procedure.
 I tried with the plm package with the within option, but R quikcly
 tells me that the memory limits are attained (I have over 10go ram!)
 while the dataset is only 700mo (about 50 000 individuals, highly
 unbalanced)
 I dont understand... plm do indeed demean the data so the computation
 should be fast and light enough... ?!

 Are there any other solutions ?
 Many thanks in advance ! ;)
 John


  end original message 
 Giovanni Millo, PhD
 Research Dept.,
 Assicurazioni Generali SpA
 Via Machiavelli 4,
 34132 Trieste (Italy)
 tel. +39 040 671184
 fax Â +39 040 671160


 Ai sensi del D.Lgs. 196/2003 si precisa che le informazi...{{dropped:12}}

 __

[R] Row-wise kronecker product with Matrix package

2012-02-09 Thread Ally

 
I'm trying to calculate the row-wise kronecker product A \Box B of two
sparse matrices A and B, and am struggling to find a quick way to do this
that takes advantage of sparseness.  I thought a good idea would be to use
rep to construct 2 matrices of the same dimension of the end product, and
multiply these two together: 

library(Matrix)
A-Matrix(c(1,0,0,0,0,1,2,0), 2, 4)
B-Matrix(c(2,5,0,0,0,1,0,0,0,0), 2, 5)

A[, rep(seq(ncol(A)), each = ncol(B))] * B[, rep(seq(ncol(B)),ncol(A))]

This works, but for much larger problems is slow (compared to keeping A and
B dense).  I was wondering why this happens, and whether there might be a
way around it?  

Thanks in advance for any advice, 

Alastair

--
View this message in context: 
http://r.789695.n4.nabble.com/Row-wise-kronecker-product-with-Matrix-package-tp4373437p4373437.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Cumulative R2 and Q2 values?

2012-02-09 Thread Charles Determan Jr

Greetings R users,

I have been working on running plsda and I would like to have the R2 and Q2
values.  I know the function R2 from the 'pls' package will generate both
R2 and Q2 but they are for each separate class.  Is there a way to get the
cumulative R2 and Q2 for the whole model?

R2(pls.new, estimate=all)

Response: B
   (Intercept)1 comps  2 comps
train  0.0   0.001759   0.3812
CV-0.03419  -0.082686   0.2983

Response: FR8
   (Intercept)  1 comps  2 comps
train  0.0   0.4362   0.4362
CV-0.03419   0.3936   0.3745

Response: S45
   (Intercept)  1 comps  2 comps
train  0.0   0.3682   0.7749
CV-0.03419   0.3061   0.7313

Again, what I am looking for is a way to get an overall R2 and Q2 values
for the entire model.  My thanks to any who can assist me,

Charles

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] computing scores from a factor analysis

2012-02-09 Thread francesca

William,
I had a problem similar to Wolfgang and I solved it through your help.
Many thanks!

Just an observation which sounded strange to me ( I am not a statistician,
just a wildlife biologist)
I have noticed  that  running the pca using principal with raw data  (and
therefore using scores=TRUE in the command line) gives different pca scores
than running the same pca with the correlation matrix (using scores=FALSE in
the command line and therefore calculating the scores in the way you
suggested to Wolfgang). Is that normal?

Thanks

francesca




--
View this message in context: 
http://r.789695.n4.nabble.com/computing-scores-from-a-factor-analysis-tp4306234p4372993.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Subset a datafram according to time

2012-02-09 Thread NickNz125

PREFERED WAY OF DOING IT 
I have a data set of observations every second for a month long period, I
want to extract the observations according to the  date  time of another
data frame ( the other data frame is in the same format). I want to do this
to match these  observations   to my test observations (in the other data
frame) which are done every 6 minutes. So basically im shrinking the data
frame of second observations to only display the date time observation every
6 minutes.

ALTERNATIVE WAY
In other words I just wanna extract every 6 minute observation value from
this dataframe of everysecond observations

datetimeobservations
02/08/2011 00:00 1.165  
02/08/2011 00:01 1.241
02/08/2011 00:02 1.232


Im pretty new to the porgram done 2 days on it learning and learning, im
getting my head around it.

Help would be much appreciated.

--
View this message in context: 
http://r.789695.n4.nabble.com/Subset-a-datafram-according-to-time-tp4372293p4372293.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] generate matrices

2012-02-09 Thread Jorge I Velez

What about this?

B - 500
X - matrix(rnorm(B*20), ncol = B)
colnames(X) - paste('x', 1:B, sep = )
head(X)

Thus, each column correspond to a sample of 20 numbers from a standard
normal distribution.

HTH,
Jorge.-


On Thu, Feb 9, 2012 at 8:38 AM, Blaz Simcic  wrote:

 Dear all,
 I would like to generate 500 matrices of 20 numbers from
 standard normal distribution with names x1,x2,x3,â¦.x500.
 I tried with loop for, but I donât know how to name matices :
 for (i in 1:500)  {
x[[i]] - matrix(rnorm(20), 4) }
 Any suggestion?
 Thanks,  BlaÅ¾
[[alternative HTML version deleted]]


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Outlier removal techniques

2012-02-09 Thread mails

Hello,


I need to analyse a data matrix with dimensions of 30x100.
Before analysing the data there is, however, a need to remove outliers from
the data.
I read quite a lot about outlier removal already and I think the most common
technique for that seems to be Principal Component Analysis (PCA). However,
I think that these technqiue is quite subjective. When is an outlier an
outlier?
I uploaded an example PCA plot here: 

http://s14.postimage.org/oknyya1ld/pca.png

Should we treat the green and red dots as outliers already or only the blue
one which
lies outside the 95% confidence interval. It seems very arbitrary how people
remove outliers using PCA.

I also thought about fitting a linear model through my data and look at
distribution of the residuals. 
However, the problem with using linear models is that one can actually never
be sure that the model
 used is the one which describes the data best. In model A, for instance, we
might treat sample 1 as and 
outlier but fitting a different model B sample 1 might not be an outlier at
all.

I had a brief look at k-means clustering as well but I think it's not the
right thing to go for. 
Again, how do one decide which cluster is an outler? And also it is known
that different 
cluster analysis lead to totally different results. So which one to choose?


Is there any other way to non-subjectively remove outliers from data?
I would really appreciated any ideas/comments you might have on that topic.


Cheers

--
View this message in context: 
http://r.789695.n4.nabble.com/Outlier-removal-techniques-tp4372652p4372652.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] poLCA and conditional dependence

2012-02-09 Thread Suranga Kasthurirathne

Dear all,

I'm an Sri Lankan undergraduate student. I'm also a total newbie to R.

My aim is to use the poLCA package to do a latent class analysis.
I found the documentation very helpful, but need to make a small
clarification that has stumped me awhile.

In my work, I need to make provision for conditional dependence. I'm told
that poLCA lets you do that. Unfortunately, I couldn't find a
specific example on how to do this.

However, I did find a reference to latent class regression using
 cbind(Y1,Y2,Y3)~X1+X2*X3 etc.

Forgive my ignorance, but is this the same as conditional dependence ? I
dont think so


-- 
Best Regards,

Suranga

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] GLM - guess the distribution of the response variable

2012-02-09 Thread wonko


Dear all,

I have question regarding GLMs:
I have a discrete response variable and a continuous explaining 
variable. Like this:

http://www.myimg.de/?img=example1db0f.jpg

I want to use a GLM to investigate. I have to specify the familiy of 
the distribution of the response variable - or, maybe more precise, the 
family of the distribution of the residuals of the response variable 
(the mean of the response should be explained by the deterministic model 
part of the GLM, as far as I understood).


How do I know the distribution family of the response? OK - its needs 
to be a discrete distribution, but there are Poisson, Binomial, Neg. 
Binomial.


Any ideas?

Many thanks,

wonko

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Arial font in eps figures in R

2012-02-09 Thread Sharon Kuhlmann-B

Hi,

I am trying to create a graph using Arial font (as required by PLoS One). I 
have read probably all posts in R-help on this topic, as wells as R-news 2006.

The code I have been trying is following:

Arial - Type1Font(family=Arial,
metrics=c(ArialMT.afm, arial-BoldMT.afm, 
Arial-ItalicMT.afm, Arial-BoldItalicMT.afm))
postscriptFonts(Arial=Arial)
postscript(testArial.eps, horizontal=F, onefile=F, width=4, height=4)
par(family=Arial)
plot(1:10, 1:10)
dev.off()

but getting the following error message:

Error in axis(side = side, at = at, labels = labels, ...) : 
  family 'Arial' not included in PostScript device


Alternatively I've also tried:

Arial - Type1Font(family=Arial,
metrics=c(ArialMT.afm, arial-BoldMT.afm, 
Arial-ItalicMT.afm, Arial-BoldItalicMT.afm))
postscriptFonts(Arial=Arial)
postscript(testArial.eps, horizontal=F, onefile=F, width=4, height=4, 
family=Arial)
plot(1:10, 1:10)
dev.off()

which resulted in:

Error in postscript(testArial.eps, horizontal = F, onefile = F, width = 4,  : 
  Failed to initialise default PostScript font


The *.afm files were downloaded from http://www.koders.com/, since I wasn't 
able to run ttf2afm or ttf2pt1 on my computer. The files are saved under 
../R/library/grDevices/afm.  I have also tried saving the files with LF line 
endings (instead of CRLF) as indicated in README under /grDevices/afm, and 
zipping them. That didn't make a difference.

I am currently running R-2.14.1 on Windows7. If I run postscriptFonts(), the 
Arial font is included in the list, very much as other types of fonts.

I've seen that some R packages are now using arial as default (e.g. googleVis 
and R2PPT) but I cannot see how it is done.

Thanks in advance for any suggestions.

/Sharon

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] package does not have a NAMESPACE

2012-02-09 Thread Reinker, Stefan

A new version of the kopls package has been made available. This installs 
without problems in R 2.14 (windows 32-bit). Download from here:
https://sourceforge.net/projects/kopls/files/K-OPLS%20for%20R/1.1.2/

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] subset select=variable with a list of names

2012-02-09 Thread Francisco


Hello,
I would like to make a function which extracts a subset, from a dataset, 
with only the columns that I want (specifying their names).


For example, having this matrix:
 mydata-matrix(c(22,1,3,2001,24,5,7,2002,26,7,8,2002,28,5,7,2003), 
byrow=TRUE, ncol=4, dimnames=list(c(1,2,3,4), 
c(age,day,month,year)))


 mydata

  age day month year
1  22   1 3 2001
2  24   5 7 2002
3  26   7 8 2002
4  28   5 7 2003


I would like to create a function like:
x-function(names) {subset(mydata, select=names) }

So I can choose every time which columns select, i.e. when I call:
x(age,day)

it would returns:
  age day
1  22   1
2  24   5
3  26   7
4  28   5

Obviously it is not working, and I don't know how to do to fix it. Do 
you have any suggestion?


Thank you very much

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] sample points - sp package

2012-02-09 Thread anaraster

When I make a sample in sp, for example a random samping of 5 points in
dataset1: 

sample.dataset1=spsample(dataset1, n = 5, random)

I have a sp object for the selected sampling...

My question is : How can I extract from dataset2 the points  data for the
coordinates(sample.dataset1) ?

The dataset2 would be an object with the same area and coordinate system,
but I would like to extract for the coordinates of sample.dataset1 the data
in each point for dataset2.

I didn't found a way to do this in the sp package docs, but I might have
missed it.

--
View this message in context: 
http://r.789695.n4.nabble.com/sample-points-sp-package-tp4372673p4372673.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] which R package is used for browsing web pages through coding

2012-02-09 Thread sagarnikam123

i know RCurl pakage to retrieve web content,it has limited use, i want
interactive package like

(in perl---Mechanize,
In java---Watij,Prowser,HTMLunit,HTTPunit,
in RubyWatir ,etc)

this modules/packages opens appropriate browser,which can create
queries,retrieves output, clicks buttons, fill up form
automatically,searches keyword in search engine, Downloads many items from
internet
All this is by coding 

if find ,kindly give me sample examples(codes) at list two/five
if not found,give me RCurl's sample codes starting from how to import
library to closing browser,
with explanation for each line of code



--
View this message in context: 
http://r.789695.n4.nabble.com/which-R-package-is-used-for-browsing-web-pages-through-coding-tp4372888p4372888.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Help with TimeSeries

2012-02-09 Thread Vinicius Magalhães

Hello everyone!

I´ve started using R last week and I´m having really persistent problems
trying to make predictions using the HoltWinters function.

I´m sending my script and the errors I´m getting.

My data is:

 Jan Feb Mar Apr May Jun Jul Ago Sep  Oct  Nov  Dec
2004 118 143 169 158 143 135 135 140 135 125. 120. 120.
2005 143 158 180 180 150 150 153 148 150 145. 145. 143.
2006 165 180 190 168 163 163 168 175 180 160. 155. 150.
2007 152 163 185 163 158 175 175 178 168 168. 160. 160.
2008 155 170 195 195 195 185 185 165 165 153. 153. 153.


The data I´m receiving using AP.ts -
ts(AP,start=2004,end=2008,frequency=2)  is:

  Jan Feb Mar Apr May Jun Jul Ago Sep  Oct  Nov  Dec
2004.0 118 143 169 158 143 135 135 140 135 125. 120. 120.
2004.5 143 158 180 180 150 150 153 148 150 145. 145. 143.
2005.0 165 180 190 168 163 163 168 175 180 160. 155. 150.
2005.5 152 163 185 163 158 175 175 178 168 168. 160. 160.
2006.0 155 170 195 195 195 185 185 165 165 153. 153. 153.
2006.5 118 143 169 158 143 135 135 140 135 125. 120. 120.
2007.0 143 158 180 180 150 150 153 148 150 145. 145. 143.
2007.5 165 180 190 168 163 163 168 175 180 160. 155. 150.
2008.0 152 163 185 163 158 175 175 178 168 168. 160. 160.

However that´s not what I want. Using fr=1 I get my data fine but can´t
plot due to the error:

plot(AP.ts)
Error in plotts(x = x, y = y, plot.type = plot.type, xy.labels =
xy.labels,  :
  cannot plot more than 10 series as multiple

AND

 AP.hw - HoltWinters(AP.ts,seasonal=mult)
Error in decompose(ts(x[1L:wind], start = start(x), frequency = f),
seasonal) :
  time series has no or less than 2 periods
 plot(AP.hw)
Error in xy.coords(x, y) : 'x' and 'y' lengths differ


FILE=AWB1.csv
#FILE2=AWB2.csv

AP - read.csv(FILE,header=T)
AP
is(AP)

AP.ts - ts(AP,start=2004,end=2008,frequency=2)
#AP.ts
-ts(structure(c(read.csv(FILE),start=c(2004,1),end=c(2008,1),frequency=12)))
is(AP.ts)
AP.ts
dput(AP.ts)

plot(AP.ts)

library(zoo)
plot(zoo(AP.ts), ylim = c(0, 20))

AP.ts #is(AP.ts) #start(AP.ts);frequency(AP.ts);end(AP.ts)
dput(AP.ts)

AP.hw - HoltWinters(AP.ts,seasonal=mult)
plot(AP.hw)

AP.predict - predict(AP.hw,n.ahead=10)
AP.predict

ts.plot(AP.ts,AP.predict, lty=1:4)

### TESTE
data(AirPassengers)
aa - AirPassengers
aa
is(aa)
start(aa);frequency(aa);end(aa)

Thank you guys!
-- 
Vinicius Macedo Magalhães
(21) 9584-1533

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Tr: Re: how to pass weka classifier options with a meta classifier in RWeka?

2012-02-09 Thread Milan Bouchet-Valat

Le jeudi 09 février 2012 à 15:31 +0200, Kari Ruohonen a écrit :
 Hi,
 I am trying to replicate a training of AttributeSelectedClassifier with
 CFsSubsetEval, BestFirst and NaiveBayes that I have initially done with
 Weka. Now, I am trying to use RWeka in R.
 
 I have a problem of passing arguments to the CfsSubsetEval, BestFirst
 and NaiveBayes. I have first created an interface for the classifier
 with:
 
 AS-make_Weka_classifier(weka/classifiers/meta/AttributeSelectedClassifier)
 
 And then I am trying to run the classifier with:
 
 nb.model-AS(class~.,data=ex,
          control=Weka_control(
            E=weka.attributeSelection.CfsSubsetEval,
            S=weka.attributeSelection.BestFirst -D 1,
            W=weka.classifiers.bayes.NaiveBayes -D))
 
 But now, I get an error saying:
 
 Error in .jcall(classifier, V, buildClassifier, instances) : 
   java.lang.Exception: Can't find class called:
 weka.classifiers.bayes.NaiveBayes -D
 
 indicating that the way I am passing the argument -D to the NaiveBayes
 is incorrect. I am uncertain from the RWeka documentation how the
 passing mechanism of Weka_control is supposed to work with meta
 classifiers. All help is greatly appreciated.
I've never tried it myself, but ?Weka_control says:
     One can use lists for options taking multiple arguments, see the
     documentation for ‘SMO’ for an example.

So maybe
nb.model-AS(class~.,data=ex,
                           control=Weka_control(
                            E=weka.attributeSelection.CfsSubsetEval,
                            S=list(weka.attributeSelection.BestFirst, D=1),
                            W=list(weka.classifiers.bayes.NaiveBayes, D=1)))


Cheers

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Weibull Distribution

2012-02-09 Thread Yogendra

Hi,

I want to estimate parameter of weibull distribution using mle for below
density function,
The PDF is,
f(x) = b a^-b x^(b-1)  exp -(x/a)^b  

In R ,density of the weibull distribution is,
f(x) = (a/b) (x/b)^(a-1) exp(- (x/b)^a)

which is different from my density,

I trying to estimate mle parmeter from R using fitdistr and fitdist
functions.
Can anyone tell me whatever parameter estimated from R are correct for my
density function or not?

Thanks 
Yogendra

--
View this message in context: 
http://r.789695.n4.nabble.com/Weibull-Distribution-tp4373016p4373016.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] how to specify the Oy axis length

2012-02-09 Thread ikuzar

Hi, 

I'd like to plot a point: plot(1,2)
I use RStudio. 
I'd like to know how to specify the length of Oy axis. I 'd like to put 20
as length of Oy but , by default, I 've got 2.5.

Thanks for your help

--
View this message in context: 
http://r.789695.n4.nabble.com/how-to-specify-the-Oy-axis-length-tp4372937p4372937.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] problem with arrows( )

2012-02-09 Thread ikuzar

Hello, 

I have got an issue with this code:

plot(...)
xMontant = as.POSIXlt(ecs$startAt[i]) 
xDescendant = as.POSIXlt(ecs$endAt[i])
arrows(xMontant, 0, xMontant, ecsPOW, col=green)

Here is the error:
Erreur dans arrows(xMontant, 0, xMontant, ecsPOW, col = green) : 
  premier argument incorrect

which means:
 Error in arrows(xMontant, 0, xMontant, ecsPOW, col = green) : 
first argument is inccorrect

I have got the same error with segment(...) , usign the same arguments.

Does anybody find where is the problem ?
I checked if xMontant is in the right intervale, so it is. That's OK

thanks for your help





--
View this message in context: 
http://r.789695.n4.nabble.com/problem-with-arrows-tp4373152p4373152.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] AUC, C-index and p-value of Wilcoxon

2012-02-09 Thread linda Porz

Dear all,

I am using the ROCR library to compute the AUC and also the Hmisc library
to compute the C-index of a predictor and a group variable. The results of
AUC and C-index are similar and give a value of about 0.57. The Wilcoxon
p-value is 0.001! Why the AUC is showing small value and the p-value is
high significant? The AUC is based on Wilcoxon calculation?

Many thanks,
Lina

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Weibull Distribution

2012-02-09 Thread Rui Barradas

Hello,


 I want to estimate parameter of weibull distribution using mle for below
 density function,
 The PDF is,
 f(x) = b a^-b x^(b-1)  exp -(x/a)^b  
 
 In R ,density of the weibull distribution is,
 f(x) = (a/b) (x/b)^(a-1) exp(- (x/b)^a)
 
 which is different from my density,


Is this homework?
It seems you need to learn (portuguese) 8th grade math: your 'a' in R is 'b'
and your 'b' in R, 'a'.
Mathematically, they are exactly the same...

Rui Barradas


--
View this message in context: 
http://r.789695.n4.nabble.com/Weibull-Distribution-tp4373016p4373198.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Some question about rpart and splitting criterion

2012-02-09 Thread Myriam Tabasso

Dear All,
I have questions about the function rpart  to construct a regression tree in 
R code.
My problem is how to change the splitting criteria.

In the rpart we have : parms=list(split=..) , I ask you if in this 
command is it possible to use an another splitting criterion to substitute the 
default criteria( gini or information)?

Does someone can help me ?
Thank you,
Myriam Tabasso

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] problem with arrows( )

2012-02-09 Thread ikuzar

I found the issue:

arrows( ) , segments( ) do not accept argument of type POSIXlt. One must
convert the argument into POSIXct and it works well.

--
View this message in context: 
http://r.789695.n4.nabble.com/problem-with-arrows-tp4373152p4373422.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] ifelse

cat() returns a null value so it's problematic inside ifelse. Here you
probably need a regular if statement:

if(all(list %in% c(1, 10, 100))) cat(YES\n) else cat(NO\n) # The
== TRUE is redundant.

Michael

On Thu, Feb 9, 2012 at 11:37 AM, Arnaud Gaboury
arnaud.gabo...@a2ct2.com wrote:
 Hello,

 I need to print a screen message after a test.

list-c(10,1,100,10)
ifelse(all(list %in% c(1,10,100)==TRUE),cat(YES\n),cat(NO\n))
 YES
 Error in ifelse(all(l %in% c(1, 10, 100) == TRUE), cat(YES\n), cat(NO\n)) 
 :
  replacement has length zero


 I have the correct answer, YES, but with an Error.

 Why?

 TY for any help


 Arnaud Gaboury

 A2CT2 Ltd.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] calling the function which is stored in a list

2012-02-09 Thread Gabor Grothendieck

On Thu, Feb 9, 2012 at 2:54 AM, arunkumar akpbond...@gmail.com wrote:
 Hi

 I'm storing two functions in a list

 # creating two function
  function1 - function(n) {
  return(sum(n))
 }

 function2 - function(n) {
  return(mean(n))
 }

 #storing the function
 function3 =c(function1,function2)

 is it possible to call the stored function and used it ?

  x=c(10,29)
 funtion3[1](x)


In addition to [[ as in the other response also try this:

L - list(function1 = sum, function2 = mean)
L$function1(1:3)


Note that the proto package is somewhat similar but stores functions
in environments rather than lists and also supports a type of object
oriented programming:

library(proto)

p - proto(function1 = function(., n) sum(n),
  function2 = function(., n) mean(n),
  function3 = function(.) mean(.$x),
  x = 1:10
)

p$function1(1:3)
p$function3() # mean of 1:10
p$x - 1:5
p$function3() # mean of 1:5

# define child p2 with its own x overriding p's x
p2 - p$proto(x = 10:15)

# p2 has its own x so this is mean of 10:15
# p2 inherits its methods from p so its has a
#   function3 too
p2$function3()

p$function3() # p unchanged. Still mean of 1:5.



-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] ifelse

TY much.

Works for me.

Arnaud Gaboury
 
A2CT2 Ltd.


-Original Message-
From: R. Michael Weylandt [mailto:michael.weyla...@gmail.com] 
Sent: jeudi 9 février 2012 17:47
To: Arnaud Gaboury
Cc: r-help@r-project.org
Subject: Re: [R] ifelse

cat() returns a null value so it's problematic inside ifelse. Here you probably 
need a regular if statement:

if(all(list %in% c(1, 10, 100))) cat(YES\n) else cat(NO\n) # The == TRUE is 
redundant.

Michael

On Thu, Feb 9, 2012 at 11:37 AM, Arnaud Gaboury arnaud.gabo...@a2ct2.com 
wrote:
 Hello,

 I need to print a screen message after a test.

list-c(10,1,100,10)
ifelse(all(list %in% c(1,10,100)==TRUE),cat(YES\n),cat(NO\n))
 YES
 Error in ifelse(all(l %in% c(1, 10, 100) == TRUE), cat(YES\n), cat(NO\n)) 
 :
  replacement has length zero


 I have the correct answer, YES, but with an Error.

 Why?

 TY for any help


 Arnaud Gaboury

 A2CT2 Ltd.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] ifelse



On Feb 9, 2012, at 11:37 AM, Arnaud Gaboury wrote:


Hello,

I need to print a screen message after a test.


list-c(10,1,100,10)
ifelse(all(list %in% c(1,10,100)==TRUE),cat(YES\n),cat(NO\n))

YES
Error in ifelse(all(l %in% c(1, 10, 100) == TRUE), cat(YES\n),  
cat(NO\n)) :

 replacement has length zero


I have the correct answer, YES, but with an Error.

Why?


cat() returns NULL. Why ar enou not just returning the strings  
themselves or if you want to see the results as well as assign, then  
use print() which does return its arguments.





TY for any help


Arnaud Gaboury

A2CT2 Ltd.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Outlier removal techniques

2012-02-09 Thread Rich Shepard


On Thu, 9 Feb 2012, mails wrote:


I need to analyse a data matrix with dimensions of 30x100. Before
analysing the data there is, however, a need to remove outliers from the
data. I read quite a lot about outlier removal already and I think the
most common technique for that seems to be Principal Component Analysis
(PCA). However, I think that these technqiue is quite subjective. When is
an outlier an outlier? I uploaded an example PCA plot here:


  Those more expert than I will certainly provide answers. What I do will
new data is create box-and-whisker plots (I use the lattice package) which
defines outliers as those data beyond 1.5x the first or third quartile
values.

  No one but you can answer your question on when an outlier is an outlier.
It depends on your data set and the context of the data. For example, a
water chemistry value that far exceeds a regulartory threshold might be
meaningful in the context of a one-off excursion (in which case it's not an
outlier but a real data point) or it might result from a handling,
instrumentation, or analytical error (in which case toss it as an outlier).

Rich

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] generate matrices



On Feb 9, 2012, at 8:38 AM, Blaz Simcic wrote:


Dear all,
I would like to generate 500 matrices of 20 numbers from
standard normal distribution with names x1,x2,x3,….x500.
I tried with loop for, but I don’t know how to name matices :
for (i in 1:500)  {
   x[[i]] - matrix(rnorm(20), 4) }
Any suggestion?


Don't do it that way. Create an array-object instead.

 x - array(NA, dim=c(4,5,500) )
 x[] - rnorm(4*5*500)
 str(x)
 num [1:4, 1:5, 1:500] -0.721 0.896 -1.432 0.386 -1.111 ...
 dimnames(x)[[3]] - paste(x, 1:500, sep=)
 str(x)
 num [1:4, 1:5, 1:500] -0.721 0.896 -1.432 0.386 -1.111 ...
 - attr(*, dimnames)=List of 3
  ..$ : NULL
  ..$ : NULL
  ..$ : chr [1:500] x1 x2 x3 x4 ...

--
David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] how to specify the Oy axis length

Perhaps the ylim argument to plot()

Michael

On Thu, Feb 9, 2012 at 9:04 AM, ikuzar raz...@hotmail.fr wrote:
 Hi,

 I'd like to plot a point: plot(1,2)
 I use RStudio.
 I'd like to know how to specify the length of Oy axis. I 'd like to put 20
 as length of Oy but , by default, I 've got 2.5.

 Thanks for your help

 --
 View this message in context: 
 http://r.789695.n4.nabble.com/how-to-specify-the-Oy-axis-length-tp4372937p4372937.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] GLM - guess the distribution of the response variable



On Feb 9, 2012, at 7:32 AM, wo...@posteo.de wrote:


Dear all,

I have question regarding GLMs:
I have a discrete response variable and a continuous explaining  
variable. Like this:

http://www.myimg.de/?img=example1db0f.jpg

I want to use a GLM to investigate. I have to specify the familiy  
of the distribution of the response variable - or, maybe more  
precise, the family of the distribution of the residuals of the  
response variable (the mean of the response should be explained by  
the deterministic model part of the GLM, as far as I understood).


How do I know the distribution family of the response? OK - its  
needs to be a discrete distribution, but there are Poisson,  
Binomial, Neg. Binomial.


Is this homework, ...  in which case please read the Posting Guide.

--
David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Wriritng to a CSV file

2012-02-09 Thread peter dalgaard


On Feb 8, 2012, at 21:56 , David Winsemius wrote:

 
 On Feb 8, 2012, at 2:29 PM, Ron Michael wrote:
 
 Okay, so I understood that appending can only happen row-wise. Therefore I 
 tried with following code:
 
 write.csv(matrix(1:5, 1), dat.csv)
 write.csv(matrix(1:5, 1), dat.csv, append = TRUE)
 Warning message:
 In write.csv(matrix(1:5, 1), dat.csv, append = TRUE) :
  attempt to set 'append' ignored
 
 It is destroying my previous file. Where I have done wrong?
 
 Failed to read the help page. `write.csv` has some of its setting hard coded 
 and will prevent you from changing them.

Exactly. Note, however, that write.csv is really just write.table with a 
particular set of arguments. Nothing is keeping you from using a similar set of 
arguments with append=TRUE in an explicit write.table() call.

It is of course debatable whether the behavior of write.csv is undue 
patronizing, but as I understand it, the rationale is that you can guarantee 
that write.csv creates a proper CSV file, but once you start appending, 
multiple things can go wrong: Forgetting to omit headers, different number of 
columns, different column types, etc. It is not realistic to have 
write.csv(...append=TRUE) make the necessary checks, and as it can't be sure to 
write a properly formed CSV, it just won't do it, period.


-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] subset select=variable with a list of names

2012-02-09 Thread Nordlund, Dan (DSHS/RDA)

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of Francisco
 Sent: Thursday, February 09, 2012 4:52 AM
 To: r-help@r-project.org
 Subject: [R] subset select=variable with a list of names

 Hello,
 I would like to make a function which extracts a subset, from a
 dataset,
 with only the columns that I want (specifying their names).

 For example, having this matrix:
   mydata-matrix(c(22,1,3,2001,24,5,7,2002,26,7,8,2002,28,5,7,2003),
 byrow=TRUE, ncol=4, dimnames=list(c(1,2,3,4),
 c(age,day,month,year)))

   mydata

age day month year
 1  22   1 3 2001
 2  24   5 7 2002
 3  26   7 8 2002
 4  28   5 7 2003

 I would like to create a function like:
 x-function(names) {subset(mydata, select=names) }

 So I can choose every time which columns select, i.e. when I call:
 x(age,day)

 it would returns:
age day
 1  22   1
 2  24   5
 3  26   7
 4  28   5

 Obviously it is not working, and I don't know how to do to fix it. Do
 you have any suggestion?

 Thank you very much

Given your function definition, the function call needs to be

x(c(age,day))

Whether it is good form to write a function like this, I will leave to others 
to comment. 

Hope this is helpful,

Dan

Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA 98504-5204

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Outlier removal techniques

2012-02-09 Thread Frank Harrell

I wonder why it is still standard practice in some circles to search for
outliers as opposed to using robust/resistent methods.  

Here is a great paper with a scientific approach to outliers:

@Article{fin06cal,
  author =   {Finney, David J.},
  title ={Calibration guidelines challenge outlier practices},
  journal =  The American Statistician,
  year = 2006,
  volume =   60,
  pages ={309-313},
  annote =   {anticoagulant
therapy;bias;causation;ethics;objectivity;outliers;guidelines for
treatment of outliers;overview of types of outliers;letter to the editor and
reply 61:187 May 2007}
}

Frank

Rich Shepard wrote
 
 On Thu, 9 Feb 2012, mails wrote:
 
 I need to analyse a data matrix with dimensions of 30x100. Before
 analysing the data there is, however, a need to remove outliers from the
 data. I read quite a lot about outlier removal already and I think the
 most common technique for that seems to be Principal Component Analysis
 (PCA). However, I think that these technqiue is quite subjective. When is
 an outlier an outlier? I uploaded an example PCA plot here:
 
Those more expert than I will certainly provide answers. What I do will
 new data is create box-and-whisker plots (I use the lattice package) which
 defines outliers as those data beyond 1.5x the first or third quartile
 values.
 
No one but you can answer your question on when an outlier is an
 outlier.
 It depends on your data set and the context of the data. For example, a
 water chemistry value that far exceeds a regulartory threshold might be
 meaningful in the context of a one-off excursion (in which case it's not
 an
 outlier but a real data point) or it might result from a handling,
 instrumentation, or analytical error (in which case toss it as an
 outlier).
 
 Rich
 
 __
 R-help@ mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 


-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: 
http://r.789695.n4.nabble.com/Outlier-removal-techniques-tp4372652p4373592.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] AUC, C-index and p-value of Wilcoxon

On Thu, Feb 09, 2012 at 02:05:08PM +, linda Porz wrote:
 Dear all,
 
 I am using the ROCR library to compute the AUC and also the Hmisc library
 to compute the C-index of a predictor and a group variable. The results of
 AUC and C-index are similar and give a value of about 0.57. The Wilcoxon
 p-value is 0.001! Why the AUC is showing small value and the p-value is
 high significant? The AUC is based on Wilcoxon calculation?

Hi.

There is no direct relationship between AUC and p-value of 
Wilcoxon. AUC measures, how well two distributions may be
separated. The p-value measures, to which extent it is
clear that the distributions are different. The test is
significant, even if it is very clear that there is a tiny
difference between the two distributions. This may happen
for a large sample size. If the sample size increases,
then AUC for separating variables X, Y converges to P(X  Y),
which may be 0.57 and still, the p-value may converge to 0.

Hope this helps.

Can you send a concrete numerical example?

Petr Savicky.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] object size versus saved size?

2012-02-09 Thread luckjiff

Posting under new subject.

I’ve a question on saving a earth object to disk.
Say m is the model then
print(object.size(m), units=”Mb”)
163.8 Mb
and save it off
save(m, “tmp.rda”)
and “tmp.rda” is roughly 171Mb file.

I would like this to be much smaller.
So I try
m$bx-NULL
then
print(object.size(m), units=”Mb”)
14.9 Mb Great!
however when I save it off
save(m, “tmp.rda”)
and “tmp.rda” is roughly 171Mb file – the same size.

I do not care about the size of the file per se, I care about loading a
smaller object at a future date. 

--
View this message in context: 
http://r.789695.n4.nabble.com/object-size-versus-saved-size-tp4373478p4373478.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Grouping together a time variable

2012-02-09 Thread Abraham Mathew

I have the following variable, time, which is a character variable and it's
structured as follows.

 head(as.character(dat$time), 30) [1] 00:00:01 00:00:16 00:00:24 
 00:00:25 00:00:25 00:00:40 00:01:50 00:01:54 00:02:33 00:02:43 
 00:03:22
[12] 00:03:31 00:03:41 00:03:42 00:03:43 00:04:04 00:05:09
00:05:17 00:05:19 00:05:21 00:05:22 00:05:22
[23] 00:05:28 00:05:44 00:05:54 00:06:54 00:06:54 00:07:10
00:08:15 00:08:26


What I am trying to do is group the data into one hour increment. So
5:01-6:00am, 6:01-7:00am, 7:01-8:00a,
and so forth.

However, I'm not sure if there's a simple route to do this in R or how to
do it.
Can anyone point me in the right direction?

-- 
*Abraham Mathew
Statistical Analyst
www.amathew.com
720-648-0108
@abmathewks*

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Rearanging Data

2012-02-09 Thread Kevin Corry

Hi,

This worked great:

sub - subset(Claims, Year==Y1)

Thanks for your help

Kevin

On Thu, Feb 9, 2012 at 6:12 AM, David Winsemius dwinsem...@comcast.netwrote:


 On Feb 8, 2012, at 9:48 PM, kevin123 wrote:

  Hi,

 This is only a small portion of the Data i am working on
 I want to make a subset of this data set( Data Set=Claims)

  MemberID   ProviderID Vendor   PCPYear  Specialty
 1 422869788013252 172193 37796   Y1Surgery
 2 979032483316066 726296  5300Y3Internal
 3  27594272997752 140343 91972Y1   Internal
 4 735705597053364 240043 70119   Y3   Laboratory

 I want to put all the rows containing Y1 into a subset. I tried this but
 it
 does not work

 sub - subset(Claims, Year=Y1)


 You (as well , apparently, as Tal) don't seem to have learned that = is
 an assignment function and the needed Comparison operator is ==

 ?Comparison

 Try instead:

 sub - subset(Claims, Year==Y1)

 --
 David Winsemius, MD
 West Hartford, CT



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] To write a function for getting the node numbers in rpart analysis..

2012-02-09 Thread Yashwanth M.R

## To read .csv file(data) and performing rpart ##
# library(Hmisc)
library(rpart)

read.csv(D:/Training/RRE/Revolution(My Projects)/Project1/RProject(pmml
from SPSS)/telco_churn.csv, header = TRUE, sep = ,, quote=\, dec=.)
# help(read.csv)
Telco- read.table(file = D:/Training/RRE/Revolution(My
Projects)/Project1/RProject(pmml from SPSS)/telco_churn.csv, header = TRUE,
sep = ,)
names(Telco)

##
##

## To make factor variable ##
Telco$region-factor(Telco$region,levels=c(1,2,3,4,5),labels=c(Zone1,Zone2,Zone3,Zone4,Zone5))

Telco$ed-factor(Telco$ed,levels=c(1,2,3,4,5),labels=c(DidNotHighSchool,HighSchoolDegree,SomeCollege,

CollegeDegree,PostUndergraduateDegree))

Telco$marital-factor(Telco$marital,,levels=c(0,1),labels=c(Married,Un-married))

Telco$churn-factor(Telco$churn,levels=c(0,1),labels=c(NO,YES))

##
##

## To perform rpart ##

Telco.rpart - rpart(churn ~
region+ed+marital+tenure+age+address+income,data=Telco,method=class)
plot(Telco.rpart,compress=FALSE,uniform=TRUE)
text(Telco.rpart,use.n = TRUE, cex = .75) 

##
##

## To predict rpart ##

TelcoPredicted - predict(Telco.rpart,type=class) # Prediction/Scoring


##
##





In the above computation, I would prefer to get the list of node numbers
parallely  for the each of the predictions of TelcoPredicted. Please help
me writing a function for this.

--
View this message in context: 
http://r.789695.n4.nabble.com/To-write-a-function-for-getting-the-node-numbers-in-rpart-analysis-tp4372508p4372508.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Row-wise kronecker product with Matrix package

2012-02-09 Thread ilai

Maybe one of these will improve:
help.search('kronecker')
...
spam::kronecker Kronecker Products on Sparse Matrices
spam::spam.classClass spam
base::kronecker Kronecker Products on Arrays
Matrix::kronecker-methods
Methods for Function 'kronecker()' in Package
'Matrix'

I doubt it because they will calculate the full Kronecker prod and
it will be up to you to index the rows, but you never know...

 system.time(A[, rep(seq(ncol(A)), each = ncol(B))] *
B[,rep(seq(ncol(B)),ncol(A))])
   user  system elapsed
  0.016   0.000   0.019
 system.time(kronecker(A,B)[c(1,4),])
   user  system elapsed
  0.008   0.000   0.008
 system.time(spam::kronecker(A,B)[c(1,4),])
   user  system elapsed
  0.008   0.000   0.009

Cheers

On Thu, Feb 9, 2012 at 9:38 AM, Ally a.rushwo...@stats.gla.ac.uk wrote:

 I'm trying to calculate the row-wise kronecker product A \Box B of two
 sparse matrices A and B, and am struggling to find a quick way to do this
 that takes advantage of sparseness.  I thought a good idea would be to use
 rep to construct 2 matrices of the same dimension of the end product, and
 multiply these two together:

 library(Matrix)
 A-Matrix(c(1,0,0,0,0,1,2,0), 2, 4)
 B-Matrix(c(2,5,0,0,0,1,0,0,0,0), 2, 5)

 A[, rep(seq(ncol(A)), each = ncol(B))] * B[, rep(seq(ncol(B)),ncol(A))]

 This works, but for much larger problems is slow (compared to keeping A and
 B dense).  I was wondering why this happens, and whether there might be a
 way around it?

 Thanks in advance for any advice,

 Alastair

 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Row-wise-kronecker-product-with-Matrix-package-tp4373437p4373437.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] GLM - guess the distribution of the response variable

2012-02-09 Thread Bert Gunter

Below. -- Bert

On Thu, Feb 9, 2012 at 9:06 AM, David Winsemius dwinsem...@comcast.net wrote:

On Feb 9, 2012, at 7:32 AM, wo...@posteo.de wrote:

Dear all,

I have question regarding GLMs:
I have a discrete response variable and a continuous explaining variable.
Like this:
http://www.myimg.de/?img=example1db0f.jpg

I want to use a GLM to investigate. I have to specify the familiy of the
distribution of the response variable - or, maybe more precise, the family
of the distribution of the residuals of the response variable (the mean of
the response should be explained by the deterministic model part of the GLM,
as far as I understood).

How do I know the distribution family of the response? OK - its needs to
be a discrete distribution, but there are Poisson, Binomial, Neg. Binomial.

Is this homework, ... in which case please read the Posting Guide.

-- and if not, please do your homework by doing some reading and/or
consulting your local statistician.

--
David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

[R] loading packages in a function

Hello,

I sourced successfully my function().

I need to load libraries, so I wrote this inside my function():

function()

{

#load needed library
library(plyr)
library(car)

/...

}

It is OK, but I have this on my invite command when running the function:

 function()
Loading required package: MASS
Loading required package: nnet
YOU DID A GOOD JOB,SEND EMAIL

Last line is the supposed result of my function, so it ok.


How to get rid of the first two lines, only for esthetic purpose?

TY for your time.


Arnaud Gaboury
 
A2CT2 Ltd.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Setting up infile for R CMD BATCH

2012-02-09 Thread Gang Chen

Hi Elai, yes, the approach works out pretty well. Thanks a lot for
spending time on this and for the great help!

Gang

On Wed, Feb 8, 2012 at 5:43 PM, ilai ke...@math.montana.edu wrote:
 I'm going to declare this SOLVED. Yes, if you don't want a separate
 script for batch, you will need to modify the original script so it
 either readline or skips it. Here is an example:

 # Save in file myTest.R
 
 # Add this local function to the beginning of your original program
 and replace all the readline prompts with myreadline.
 # The new function takes on two arguments:
 # what: the original object (in your example it was type-...)
 # prompt: The same string from readline
 # All it is doing is searching for Answers.R, sourcing if available
 or prompting if not.

 myreadline - function(what,prompt){
  ans - length(grep('Answers.R',list.files(),fixed=T))0  # add a
 warning for multiple files
  if(ans) source('Answers.R')
  else{
    ret - as.integer(readline(prompt))
    assign(what,ret,1)
  }
 }

 # here is an interactive program to square only negative values

 print(x - rnorm(1))
 myreadline('type','x0 ? 1:T,2:F \n')
 print(x^type)

 ### END myTest.R 

 Running myTest interactivly:

 source('myTest.R')
 [1] -0.3712215
 x0 ? 1:T,2:F
 2
 [1] 0.1378054           # -.37^2
 source('myTest.R')
 [1] 0.3160747
 x0 ? 1:T,2:F
 1
 [1] 0.3160747          # .316^1

 # Create a list of answers
 dump('type',file='Answers.R')

 # run again this time with answers available
 source('myTest.R')
 [1] -1.088665   # skips prompt
 [1] -1.088665   # -1.088^1 (type in Answer.R ==1)

 # Now you can also run as batch
 $ R CMD BATCH myTest.R out.R
 $ cat out.R
 # ...
 print(x - rnorm(1))
 [1] -1.248487
 myreadline('type','x0 ? 1:T,2:F \n')
 print(x^type)
 [1] -1.248487

 That's it. Only thing is to keep the file names in the working
 directory straight.

 Enjoy


 On Wed, Feb 8, 2012 at 12:39 PM, Gang Chen gangch...@gmail.com wrote:
 Sorry Elai for the confusions.

 Let me try to reframe my predicament. The main program myTest.R has
 been written in interactive mode with many readline() lines embedded.
 Suppose a user has already run the program once before in interactive
 mode with all the answers saved in a text file called answer.R. Now
 s/he does not want to go through the interactive again because it's a
 tedious process, and would like to run the program but in batch mode
 with answer.R. And that's why I tried the following which didn't pan
 out:

 R CMD BATCH answer.R output.Rout

 Of couse I could rewrite a separate program specifically for batch
 mode as you suggested previously with eval(), for example. However, is
 there an approach to keeping the original program so that the user
 could run both interactive and batch mode? That probably requires
 modifying the readline() part, but how?

 Thanks,
 Gang


 On Wed, Feb 8, 2012 at 2:08 PM, ilai ke...@math.montana.edu wrote:
 Gang,
 Maybe someone here has a different take on things. I'm afraid I have
 no more insights on this unless you explain exactly what you are
 trying to achieve, or more importantly why? That may help understand
 what the problem really is.

 Do you want to save an interactive session for future runs? then
 ?save.image and all your answers are in the environment. In this
 case consider putting an if(!exists('type') | length(type)1 |
 is.na(type)) before type- readline(...)  in your script so type
 wouldn't be overwritten in subsequent runs.

 If your goal is to batch evaluate multiple answer files from users
 (why else would you ask questions with readline?), then you should
 have enough to go on with my answer and the examples in ?eval.

 Elai


 On Wed, Feb 8, 2012 at 9:04 AM, Gang Chen gangch...@gmail.com wrote:
 Hi Elai,

 Thanks a lot for the suggestions! I really appreciate it...

 Your suggestion of using eval() and creating those answers in a list
 would work, but there is no alternative to readline() with which I
 could read the input in batch mode? I'm asking this because I'd like
 to have the program work in both interactive and batch mode.

 Thanks again,
 Gang


 On Wed, Feb 8, 2012 at 12:50 AM, ilai ke...@math.montana.edu wrote:
 Ahh,
 I think I'm getting it now. Well, readlines() is not going to work for
 you. The help file ?readline clearly states In non-interactive use
 the result is as if the response was RETURN and the value is ‘’.
 The implication is you cannot use it to insert different answers as
 if you were really there.
 How about using eval() instead? You will need to make the answers a
 named list (or just assigned objects).

 test - expression({
  if(a2) print('+')
  else print('I got more')
  b - b+3   # reassign b in the environment
  print(b)
  print(c)
  d^2
 })
 dump('test',file='myTest.R') ; rm(test)

 # make the answers.R file:

 a=5 ; b=2 ; c=2 ; d=3
 source(myTest.R)
 eval(test)

 # Now, from the terminal  R CMD BATCH answers.R out.R
 # And here is my $ cat out.R
 ... flushed
 a=5 ; b=2 ; c=2 ; d=3

Re: [R] Outlier removal techniques

2012-02-09 Thread Nordlund, Dan (DSHS/RDA)

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of Frank Harrell
 Sent: Thursday, February 09, 2012 9:19 AM
 To: r-help@r-project.org
 Subject: Re: [R] Outlier removal techniques

 I wonder why it is still standard practice in some circles to search
 for
 outliers as opposed to using robust/resistent methods.

 Here is a great paper with a scientific approach to outliers:

 @Article{fin06cal,
   author = {Finney, David J.},
   title =  {Calibration guidelines challenge outlier
 practices},
   journal =The American Statistician,
   year =   2006,
   volume = 60,
   pages =  {309-313},
   annote = {anticoagulant
 therapy;bias;causation;ethics;objectivity;outliers;guidelines for
 treatment of outliers;overview of types of outliers;letter to the
 editor and
 reply 61:187 May 2007}
 }

 Frank

 Rich Shepard wrote

  On Thu, 9 Feb 2012, mails wrote:

  I need to analyse a data matrix with dimensions of 30x100. Before
  analysing the data there is, however, a need to remove outliers from
 the
  data. I read quite a lot about outlier removal already and I think
 the
  most common technique for that seems to be Principal Component
 Analysis
  (PCA). However, I think that these technqiue is quite subjective.
 When is
  an outlier an outlier? I uploaded an example PCA plot here:

 Those more expert than I will certainly provide answers. What I do
 will
  new data is create box-and-whisker plots (I use the lattice package)
 which
  defines outliers as those data beyond 1.5x the first or third
 quartile
  values.

 No one but you can answer your question on when an outlier is an
  outlier.
  It depends on your data set and the context of the data. For example,
 a
  water chemistry value that far exceeds a regulartory threshold might
 be
  meaningful in the context of a one-off excursion (in which case it's
 not
  an
  outlier but a real data point) or it might result from a handling,
  instrumentation, or analytical error (in which case toss it as an
  outlier).

  Rich

  __
  R-help@ mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.

I would echo what Frank says.  I would also add that in the absence of 
demonstrated measurement/recording errors, there is good reason to explain 
the extreme values as well as the  typical values.  If a model can't deal with 
extreme values, then it may be good enough for some purposes, but it is not a 
complete explanation and may fail at the worst time.  I would highly 
recommend the book The Black Swan by Nassim Nicholas Taleb (NOT the ballet 
story).

Dan

Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA 98504-5204

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] fill an array by rows

2012-02-09 Thread Jim Maas

I've dug around but not been able to find anything, am probably missing 
something obvious.


How can I fill a three-dimensional (or higher dimension) array by rows 
instead of columns.


eg

new1 - array(c(1:125), c(5,5,5))

works fine for me but fills it by columns and

new2 - array(c(1:125), c(5,5,5), byrow=TRUE)

throws an error.

Am I missing something obvious?  I also tried transposing the 3-d array 
but that didn't work either.


TIA

Jim

--
Dr. Jim Maas
University of East Anglia

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] generate matrices

2012-02-09 Thread Bert Gunter

Inline below.

-- Bert

On Thu, Feb 9, 2012 at 8:59 AM, David Winsemius dwinsem...@comcast.net wrote:

 On Feb 9, 2012, at 8:38 AM, Blaz Simcic wrote:

 Dear all,
 I would like to generate 500 matrices of 20 numbers from
 standard normal distribution with names x1,x2,x3,….x500.
 I tried with loop for, but I don’t know how to name matices :
 for (i in 1:500)  {
   x[[i]] - matrix(rnorm(20), 4)     }
 Any suggestion?


 Don't do it that way. Create an array-object instead.

 x - array(NA, dim=c(4,5,500) )
 x[] - rnorm(4*5*500)

Inefficient. Do it in one go:

x - array(rnorm(20*500), dim = c(4,5,500))



 str(x)
  num [1:4, 1:5, 1:500] -0.721 0.896 -1.432 0.386 -1.111 ...
 dimnames(x)[[3]] - paste(x, 1:500, sep=)
 str(x)
  num [1:4, 1:5, 1:500] -0.721 0.896 -1.432 0.386 -1.111 ...
  - attr(*, dimnames)=List of 3
  ..$ : NULL
  ..$ : NULL
  ..$ : chr [1:500] x1 x2 x3 x4 ...

 --
 David Winsemius, MD
 West Hartford, CT

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] fill an array by rows

2012-02-09 Thread William Dunlap

Use aperm on the output of array:
   aperm(array(1:8, dim=c(2,2,2)), perm=c(2,1,3))
  , , 1
  
   [,1] [,2]
  [1,]12
  [2,]34

  , , 2

   [,1] [,2]
  [1,]56
  [2,]78

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
 Behalf Of Jim Maas
 Sent: Thursday, February 09, 2012 9:47 AM
 To: r-help@r-project.org
 Subject: [R] fill an array by rows
 
 I've dug around but not been able to find anything, am probably missing
 something obvious.
 
 How can I fill a three-dimensional (or higher dimension) array by rows
 instead of columns.
 
 eg
 
 new1 - array(c(1:125), c(5,5,5))
 
 works fine for me but fills it by columns and
 
 new2 - array(c(1:125), c(5,5,5), byrow=TRUE)
 
 throws an error.
 
 Am I missing something obvious?  I also tried transposing the 3-d array
 but that didn't work either.
 
 TIA
 
 Jim
 
 --
 Dr. Jim Maas
 University of East Anglia
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] fill an array by rows



On Feb 9, 2012, at 12:47 PM, Jim Maas wrote:

I've dug around but not been able to find anything, am probably  
missing something obvious.


How can I fill a three-dimensional (or higher dimension) array by  
rows instead of columns.


eg

new1 - array(c(1:125), c(5,5,5))

works fine for me but fills it by columns and

new2 - array(c(1:125), c(5,5,5), byrow=TRUE)

throws an error.

Am I missing something obvious?  I also tried transposing the 3-d  
array but that didn't work either.


You might want to look at aperm:

 Perhaps:
new2 - aperm(new1, c(2,1,3))





--

David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] complex subscript/superscript on axis labels

2012-02-09 Thread Eileen Meyer


Hi All,

I am having trouble getting  a complex subscript to work.  I'm sure it's 
possible.  Here is what I have:



 ylab=expression(paste(log ,L[peak], [erg ,s^{-1},])),

I would like to have the subscript read peak,gamma where the gamma 
would be the greek symbol.  I do want the comma to show as well.


Thanks,

EM

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Version control (git, mercurial) for R packages

2012-02-09 Thread Peter Langfelder

On Thu, Feb 9, 2012 at 1:46 AM, Gregory Jefferis jeffe...@gmail.com wrote:
 Dear Peter,

 Trying to respond to your original question

Thanks for staying on thread :)




 I have a git repositiory in the root of my packages:

 ie

 package-foo/.git
 package-foo/R
 package-foo/inst

 Running make check in place warns about the presence of binary files, but
 that is the only problem I have found – have you come across another
 problem.

Same warning here. Which made me think that R CMD build will probably
tar up the git repository along with the package, which is not
something I would like to do, and which CRAN people most likely won't
tolerate in a package on CRAN.

 I have also switched to using Hadley Wickham's devtools (+ roxygen
 for docs). Both highly recommended. devtools can build a package and check
 it in a temporary location – this is safer than checking in place and avoids
 that warning.

I'll check that out.


 You can see layouts for Hadley's packages on github e.g. :

 https://github.com/hadley/devtools

 Best wishes,

 Greg.

Thanks for the suggestions. It occurred to me that I can write a short
script that will copy the package minus the git repository into a
temporary location and check or build it there, but I'll look into
devtools as well.

Peter

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] complex subscript/superscript on axis labels



On Feb 9, 2012, at 12:53 PM, Eileen Meyer wrote:


Hi All,

I am having trouble getting  a complex subscript to work.  I'm sure  
it's possible.  Here is what I have:



ylab=expression(paste(log ,L[peak], [erg ,s^{-1},])),

I would like to have the subscript read peak,gamma where the gamma  
would be the greek symbol.  I do want the comma to show as well.


(I do not see a gamma so using your verbal dsecription. The rest of  
your intent is muddied by the multiple subscripts. It's always  
difficult to know what people are thinking when they construct  
plotmath expressions and do not describe their full intent in English.  
Here's my guess:


ylab=expression(log~L[peak*,*gamma][~erg~s^-1])
 plot(1,1,ylab=ylab)

The paste function is rarely needed and generally confuses the intent  
of the author. Using ~ and * generally results in more  
comprehensible expressions.


--

David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Version control (git, mercurial) for R packages

2012-02-09 Thread Hadley Wickham

 Same warning here. Which made me think that R CMD build will probably
 tar up the git repository along with the package, which is not
 something I would like to do, and which CRAN people most likely won't
 tolerate in a package on CRAN.

It doesn't.  And you can always use .Rbuildignore to ignore other files.

  I have also switched to using Hadley Wickham's devtools (+ roxygen
 for docs). Both highly recommended. devtools can build a package and check
 it in a temporary location – this is safer than checking in place and avoids
 that warning.

 I'll check that out.

Yes, I strongly advise building and checking, or building or
installing.  From what I can tell, this is what R-core does, and it
makes life much easier.

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] loading packages in a function

Perhaps the longest function in base R I know of:

suppressPackageStartupMessages

e.g.,

suppressPackageStartupMessages(library(zoo))

Michael

On Thu, Feb 9, 2012 at 12:44 PM, Arnaud Gaboury
arnaud.gabo...@a2ct2.com wrote:
 Hello,

 I sourced successfully my function().

 I need to load libraries, so I wrote this inside my function():

 function()

 {

 #load needed library
 library(plyr)
 library(car)

 /...

 }

 It is OK, but I have this on my invite command when running the function:

 function()
 Loading required package: MASS
 Loading required package: nnet
 YOU DID A GOOD JOB,SEND EMAIL

 Last line is the supposed result of my function, so it ok.


 How to get rid of the first two lines, only for esthetic purpose?

 TY for your time.


 Arnaud Gaboury

 A2CT2 Ltd.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] complex subscript/superscript on axis labels



On Feb 9, 2012, at 1:20 PM, David Winsemius wrote:



On Feb 9, 2012, at 12:53 PM, Eileen Meyer wrote:


Hi All,

I am having trouble getting  a complex subscript to work.  I'm sure  
it's possible.  Here is what I have:



ylab=expression(paste(log ,L[peak], [erg ,s^{-1},])),

I would like to have the subscript read peak,gamma where the  
gamma would be the greek symbol.  I do want the comma to show as  
well.


(I do not see a gamma so using your verbal dsecription. The rest of  
your intent is muddied by the multiple subscripts. It's always  
difficult to know what people are thinking when they construct  
plotmath expressions and do not describe their full intent in  
English. Here's my guess:


ylab=expression(log~L[peak*,*gamma][~erg~s^-1])
plot(1,1,ylab=ylab)

The paste function is rarely needed and generally confuses the  
intent of the author. Using ~ and * generally results in more  
comprehensible expressions.


And it might look more physics-al if you had a cdot between the  
units:


[~erg %.% s^-1]


--

David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Version control (git, mercurial) for R packages

2012-02-09 Thread Hadley Wickham

 I'm exploring using a version control system to keep better track
 of changes to the packages I maintain. I'm leaning towards git
 (although mercurial also looks good) but am not sure what is the
 best way to set up the repository. It seems I can't set the
 repository directly within the R package main directory, since it
 will be incompatible with the standard package structure. I can set
 the repository in the directory that contains my R packages, but
 then I have a single repository for all of my packages, which is
 not ideal.

 Git is nice - but if you ar looking for simplicity in usage for R
 packages, I guess r-forge and rforge are the easiest to use.

I think git + github is substantially easier to use, especially if you
want to incorporate patches from the community.

 But I would be interested in the workflow when using github as the
 main VCS.

The thing that has made this really successful for me is the
install_github function in devtools.  Then it's easy for people to try
out development versions:

install.packages(devtools)
library(devtools)
install_github(myrepo, myusername)

This requires a working R development environment but thanks to the
hard work of Simon Urbanek, Duncan Murdoch, Brian Ripley and others,
this is a one-install process on all major platforms.

Hadley


-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Grouping together a time variable

Perhaps cut.POSIXt (which is a generic so you can just call cut)
depending on the unstated form of your time object.

Michael

On Thu, Feb 9, 2012 at 12:15 PM, Abraham Mathew abmathe...@gmail.com wrote:
 I have the following variable, time, which is a character variable and it's
 structured as follows.

 head(as.character(dat$time), 30) [1] 00:00:01 00:00:16 00:00:24 
 00:00:25 00:00:25 00:00:40 00:01:50 00:01:54 00:02:33 00:02:43 
 00:03:22
 [12] 00:03:31 00:03:41 00:03:42 00:03:43 00:04:04 00:05:09
 00:05:17 00:05:19 00:05:21 00:05:22 00:05:22
 [23] 00:05:28 00:05:44 00:05:54 00:06:54 00:06:54 00:07:10
 00:08:15 00:08:26


 What I am trying to do is group the data into one hour increment. So
 5:01-6:00am, 6:01-7:00am, 7:01-8:00a,
 and so forth.

 However, I'm not sure if there's a simple route to do this in R or how to
 do it.
 Can anyone point me in the right direction?

 --
 *Abraham Mathew
 Statistical Analyst
 www.amathew.com
 720-648-0108
 @abmathewks*

        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Rearanging Data

2012-02-09 Thread Phil Spector


Try

  sub - subset(Claims, Year==Y1)

In R, the equality test is performed by two
equal signs, not one.

- Phil Spector
 Statistical Computing Facility
 Department of Statistics
 UC Berkeley
 spec...@stat.berkeley.edu



On Wed, 8 Feb 2012, kevin123 wrote:


Hi,

This is only a small portion of the Data i am working on
I want to make a subset of this data set( Data Set=Claims)

 MemberID   ProviderID Vendor   PCPYear  Specialty
1 422869788013252 172193 37796   Y1Surgery
2 979032483316066 726296  5300Y3Internal
3  27594272997752 140343 91972Y1   Internal
4 735705597053364 240043 70119   Y3   Laboratory

I want to put all the rows containing Y1 into a subset. I tried this but it
does not work

sub - subset(Claims, Year=Y1)

I would greatly appreciate any help

Kevin

--
View this message in context: 
http://r.789695.n4.nabble.com/Rearanging-Data-tp4371717p4371717.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] AUC, C-index and p-value of Wilcoxon

On Thu, Feb 09, 2012 at 06:33:09PM +0100, Petr Savicky wrote:
 On Thu, Feb 09, 2012 at 02:05:08PM +, linda Porz wrote:
  Dear all,
  
  I am using the ROCR library to compute the AUC and also the Hmisc library
  to compute the C-index of a predictor and a group variable. The results of
  AUC and C-index are similar and give a value of about 0.57. The Wilcoxon
  p-value is 0.001! Why the AUC is showing small value and the p-value is
  high significant? The AUC is based on Wilcoxon calculation?
 
 Hi.
 
 There is no direct relationship between AUC and p-value of 
 Wilcoxon. AUC measures, how well two distributions may be
 separated. The p-value measures, to which extent it is
 clear that the distributions are different. The test is
 significant, even if it is very clear that there is a tiny
 difference between the two distributions. This may happen
 for a large sample size. If the sample size increases,
 then AUC for separating variables X, Y converges to P(X  Y),
 which may be 0.57 and still, the p-value may converge to 0.

This effect may be demonstrated as follows. Try

  n - 50
  for (i in 1:10) {
  x - rnorm(n)
  y - rnorm(n) + 0.25
  out - wilcox.test(x, y, paired=FALSE)
  AUC - 1 - out$statistic/n^2
  cat(AUC, out$p.value, \n)
  }

The result may be

  0.6132 0.05147433 
  0.5396 0.4971117 
  0.54 0.492754 
  0.5444 0.446199 
  0.5528 0.3646515 
  0.5692 0.2343673 
  0.6168 0.044479 
  0.5748 0.1985487 
  0.5152 0.796007 
  0.5528 0.3646515 

The p-values are moderately significant or not
significant.

Try the same with n - 1000

  0.564124 6.84383e-07 
  0.572522 1.953299e-08 
  0.575895 4.170283e-09 
  0.5651 4.623185e-07 
  0.584841 5.029007e-11 
  0.567354 1.829507e-07 
  0.601411 4.053585e-15 
  0.608903 3.356404e-17 
  0.583801 8.610077e-11 
  0.570637 4.497502e-08 

The AUC estimates have lower variance, but otherwise
have similar values. However, the p-values are now
small, since the sample is larger. With a larger sample,
the test is more sensitive and detects even a small
difference as a significant one.

Hope this helps.

Petr Savicky.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] loading packages in a function