Re: [R] length of a string

2007-09-05 Thread Bill.Venables
sLengths <- with(dataFrame, nchar(as.character(SEQUENCE))) 
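A minimal sketch of the idea on a two-row stand-in for the poster's data (the data frame here is made up to mirror the example in the question below):

```r
# hypothetical data frame mirroring the question's example
dataFrame <- data.frame(SEQUENCE = c("TGCTCCCATCTCCACGG", "ACTGAACTCCCATCTCCAAT"),
                        ID = c("HR04FS00645", "HR0595847847"))
sLengths <- with(dataFrame, nchar(as.character(SEQUENCE)))
sLengths  # 17 20
```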


Bill Venables
CSIRO Laboratories
PO Box 120, Cleveland, 4163
AUSTRALIA
Office Phone (email preferred): +61 7 3826 7251
Fax (if absolutely necessary):  +61 7 3826 7304
Mobile: +61 4 8819 4402
Home Phone: +61 7 3286 7700
mailto:[EMAIL PROTECTED]
http://www.cmis.csiro.au/bill.venables/ 

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of João Fadista
Sent: Wednesday, 5 September 2007 11:51 PM
To: r-help@stat.math.ethz.ch
Subject: [R] length of a string

Dear all,
 
I would like to know how I can compute the length of a string in a dataframe. 
Example:
 
SEQUENCE              ID
TGCTCCCATCTCCACGG     HR04FS00645
ACTGAACTCCCATCTCCAAT  HR0595847847
 
I would like to know how to compute the length of each SEQUENCE.
 

Best regards,
João Fadista

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] t-distribution

2007-08-01 Thread Bill.Venables
for the upper tail:

> 1 - pt(1.11, 9)
[1] 0.1478873


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Nair, Murlidharan
T
Sent: Thursday, 2 August 2007 4:43 AM
To: r-help@stat.math.ethz.ch
Subject: [R] t-distribution

If I have a calculated t can I get the probability associated with it
using an R function by giving it the df and t? I know I can do the whole
calculation using t.test() or get the t-distribution using qt().  If
t=1.11 and df =9 can I get the probability? 

 

Thanks../Murli 


[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] t-distribution

2007-08-01 Thread Bill.Venables
Well, is t = 1.11 all that accurate in the first place?  :-)

In fact, reading between the lines of the original enquiry, what the
person probably wanted was something like

ta <- pt(-1.11, 9) + pt(1.11, 9, lower.tail = FALSE)

which is the two-sided t-test tail area.
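Since the t distribution is symmetric, the same area is 2 * pt(-1.11, 9); a quick check (my addition, not part of the original exchange):

```r
ta <- pt(-1.11, 9) + pt(1.11, 9, lower.tail = FALSE)
all.equal(ta, 2 * pt(-1.11, 9))  # the two tails are equal by symmetry
```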

The teller of the parable will usually leave some things unexplained...

Bill. 


Bill Venables
CSIRO Laboratories
PO Box 120, Cleveland, 4163
AUSTRALIA
Office Phone (email preferred): +61 7 3826 7251
Fax (if absolutely necessary):  +61 7 3826 7304
Mobile: +61 4 8819 4402
Home Phone: +61 7 3286 7700
mailto:[EMAIL PROTECTED]
http://www.cmis.csiro.au/bill.venables/ 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Ben Bolker
Sent: Thursday, 2 August 2007 4:57 AM
To: r-help@stat.math.ethz.ch
Subject: Re: [R] t-distribution

Bill.Venables at csiro.au writes:

> for the upper tail:
>
> > 1 - pt(1.11, 9)
> [1] 0.1478873

wouldn't

pt(1.11, 9, lower.tail = FALSE)

be more accurate?

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] automatically generating and accessing data frames of varying dimensions

2007-07-12 Thread Bill.Venables
I'm not sure why you want to do it this way (it would probably help if
we had a more complete picture of what you are really trying to do), but
here are a few possibilities for the questions you ask.

1. generating data frames.

rw <- c(4, 5, 2)
cl <- c(3, 3, 3)

for(i in 1:length(rw))
  assign(paste("auto.data", i, sep = "."),
         as.data.frame(array(0, dim = c(rw[i], cl[i]),
                             dimnames = list(NULL, paste("X", 1:cl[i],
                                                         sep = "")))))

check: 

> auto.data.1
  X1 X2 X3
1  0  0  0
2  0  0  0
3  0  0  0
4  0  0  0
> auto.data.2
  X1 X2 X3
1  0  0  0
2  0  0  0
3  0  0  0
4  0  0  0
5  0  0  0
> auto.data.3
  X1 X2 X3
1  0  0  0
2  0  0  0

2. filling them up (... are you sure you want to do it this way?)

The simplest way is probably through an intermediary

for(nam in paste("auto.data", 1:3, sep = ".")) { # loop over the names
  tmp <- get(nam)
  for(i in 1:nrow(tmp))
    for(j in 1:ncol(tmp))
      tmp[i, j] <- i + j - i*j # 'some value'
  assign(nam, tmp)
  rm(tmp)
}

check:

> auto.data.1
  X1 X2 X3
1  1  1  1
2  1  0 -1
3  1 -1 -3
4  1 -2 -5
> auto.data.2
  X1 X2 X3
1  1  1  1
2  1  0 -1
3  1 -1 -3
4  1 -2 -5
5  1 -3 -7
> auto.data.3
  X1 X2 X3
1  1  1  1
2  1  0 -1
 

It may work, but I have to say I'm almost sure this is a
mistake.  There has to be a better way using the facilities that R
provides for avoiding heavy loops like this.

Just a hunch...
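For what it's worth, one such better way (my sketch, not part of the original reply) keeps the data frames in a named list and fills each one with a vectorised outer() call, so no assign()/get() gymnastics are needed:

```r
rw <- c(4, 5, 2)   # rows wanted for each data frame
cl <- 3            # columns, the same for all

# build the list of zero-filled data frames
auto.data <- lapply(rw, function(r)
  as.data.frame(matrix(0, r, cl, dimnames = list(NULL, paste0("X", 1:cl)))))
names(auto.data) <- paste("auto.data", seq_along(rw), sep = ".")

# fill every cell of every data frame without explicit element loops
auto.data <- lapply(auto.data, function(d) {
  d[] <- outer(seq_len(nrow(d)), seq_len(ncol(d)), function(i, j) i + j - i*j)
  d
})
auto.data$auto.data.2
```

The elements are then reached by name, e.g. auto.data[["auto.data.2"]], which is what the loop over names was simulating.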

Bill Venables
CSIRO Laboratories
PO Box 120, Cleveland, 4163
AUSTRALIA
Office Phone (email preferred): +61 7 3826 7251
Fax (if absolutely necessary):  +61 7 3826 7304
Mobile: +61 4 8819 4402
Home Phone: +61 7 3286 7700
mailto:[EMAIL PROTECTED]
http://www.cmis.csiro.au/bill.venables/ 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Drescher, Michael
(MNR)
Sent: Friday, 13 July 2007 9:19 AM
To: r-help@stat.math.ethz.ch
Subject: [R] automatically generating and accessing data frames of
varying dimensions

Hi All,

I want to automatically generate a number of data frames, each with an
automatically generated name and an automatically generated number of
rows. The number of rows has been calculated before and is different for
all data frames (e.g. c(4,5,2)). The number of columns is known a priori
and the same for all data frames (e.g. c(3,3,3)). The resulting data
frames could look something like this:

> auto.data.1
  X1 X2 X3
1  0  0  0
2  0  0  0
3  0  0  0
4  0  0  0

> auto.data.2
  X1 X2 X3
1  0  0  0
2  0  0  0
3  0  0  0
4  0  0  0
5  0  0  0

> auto.data.3
  X1 X2 X3
1  0  0  0
2  0  0  0

Later, I want to fill the elements of the data frames with values read
from somewhere else, automatically looping through the previously
generated data frames.

I know that I can automatically generate variables with the right number
of elements with something like this:

> auto.length <- c(12, 15, 6)
> for(i in 1:3) {
+ nam <- paste("auto.data", i, sep = ".")
+ assign(nam, 1:auto.length[i])
+ }
> auto.data.1
 [1]  1  2  3  4  5  6  7  8  9 10 11 12
> auto.data.2
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
> auto.data.3
[1] 1 2 3 4 5 6

But how do I turn these variables into data frames or give them any
dimensions? Any commands such as 'as.matrix', 'data.frame', or 'dim' do
not seem to work. I also seem not to be able to access the variables
with something like auto.data.i since:

> auto.data.i
Error: object "auto.data.i" not found

Thus, how would I be able to automatically write to the elements of the
data frames later in a loop such as ...

> for(i in 1:3) {
+ for(j in 1:nrow(auto.data.i)) {   ### this obviously does not work
since 'Error in nrow(auto.data.i) : object "auto.data.i" not found'
+ for(k in 1:ncol(auto.data.i)) {
+ auto.data.i[j,k] <- 'some value'
+ }}}

Thanks a bunch for all your help.

Best, Michael


Michael Drescher
Ontario Forest Research Institute
Ontario Ministry of Natural Resources
1235 Queen St East
Sault Ste Marie, ON, P6A 2E3
Tel: (705) 946-7406
Fax: (705) 946-2030

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] type III ANOVA for a nested linear model

2007-07-10 Thread Bill.Venables
The message from this cute little data set is very clear.  Consider

> fm <- aov(resp ~ A*B + A/C, mydata)
>
> drop1(fm, test = "F")
Single term deletions

Model:
resp ~ A * B + A/C
       Df Sum of Sq     RSS    AIC F value Pr(>F)
<none>               65.540 47.261
A:B     2    16.132  81.672 47.222  0.7384 0.5168
A:C     6   199.501 265.041 60.411  3.0440 0.1007

So neither of the non-marginal terms is significant.  To address
questions about the main effects the natural next step is to remove the
interactions.  By orthogonality you can safely cut a few corners and do
both at once:


> drop1(update(fm, . ~ A + B), test = "F")
Single term deletions

Model:
resp ~ A + B
       Df Sum of Sq     RSS   AIC F value    Pr(>F)
<none>              281.17 57.47
A       2     33.12  314.30 55.48  0.8246    0.4586
B       1    915.21 1196.38 81.54 45.5695 9.311e-06

There is a very obvious, even trivial, B main effect, but nothing else.
All this becomes even more glaring if you take the unusual step of
plotting the data.
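Entering the data from the message quoted below, the fit can be reproduced; note that A, B and C must be factors (this reconstruction is my own):

```r
# data transcribed from the original question in this thread
mydata <- data.frame(
  A    = factor(rep(1:3, each = 6)),
  B    = factor(rep(rep(1:2, each = 3), 3)),
  C    = factor(c(1,2,3, 1,2,3, 6,7,8, 6,7,8, 9,10,11, 9,10,11)),
  resp = c(34.12, 32.45, 44.55, 20.88, 22.32, 27.71,
           38.20, 31.62, 38.71, 18.93, 20.57, 31.55,
           40.81, 42.23, 41.26, 28.41, 24.07, 21.16))

fm <- aov(resp ~ A*B + A/C, mydata)
drop1(fm, test = "F")
```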

What sort of editor would overlook this clear and demonstrable message
leaping out from the data in favour of some arcane argument about types
of sums of squares?  Several answers come to mind: a power freak, a SAS
aficionado, an idiot.

If you get nowhere with this editor, my suggestion, hard as it may seem,
is that you do not submit to that kind of mindless ideology and make
fatuous compromises for the sake of immediate publication. If necessary,
part company with that editor and find somewhere else to publish where
the editor has some inkling of what statistical inference is all about.

Bill Venables.
 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Carsten Jaeger
Sent: Tuesday, 10 July 2007 4:15 AM
To: R help list
Subject: [R] type III ANOVA for a nested linear model

Hello,

is it possible to obtain type III sums of squares for a nested model as
in the following:

lmod <- lm(resp ~ A * B + (C %in% A), mydata)

I have tried

library(car)
Anova(lmod, type = "III")

but this gives me an error (and I also understand from the documentation
of Anova as well as from a previous request
(http://finzi.psych.upenn.edu/R/Rhelp02a/archive/64477.html) that it is
not possible to specify nested models with car's Anova).

anova(lmod) works, of course.

My data (given below) is balanced so I expect the results to be similar
for both type I and type III sums of squares. But are they *exactly* the
same? The editor of the journal which I'm sending my manuscript to
requests what he calls "conventional type III tests" and I'm not sure if
I can convince him to accept my type I analysis.

R> mydata
   A B  C  resp
1  1 1  1 34.12
2  1 1  2 32.45
3  1 1  3 44.55
4  1 2  1 20.88
5  1 2  2 22.32
6  1 2  3 27.71
7  2 1  6 38.20
8  2 1  7 31.62
9  2 1  8 38.71
10 2 2  6 18.93
11 2 2  7 20.57
12 2 2  8 31.55
13 3 1  9 40.81
14 3 1 10 42.23
15 3 1 11 41.26
16 3 2  9 28.41
17 3 2 10 24.07
18 3 2 11 21.16

Thanks a lot,

Carsten

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] a little problem on selecting a subset from dataset A accordingto dataset B?

2007-07-09 Thread Bill.Venables
> AB <- with(B, subset(A, coords.x1 %in% X1))
> AB
   coords.x1 coords.x2
0   542250.9   3392404
7   541512.5   3394722
8   541479.3   3394878
9   538903.4   3395943
18  543274.0   3389919
19  543840.8   3392012
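The reason the original `==` comparison misbehaves is recycling: comparing a length-47 vector with a length-6 one recycles the shorter vector elementwise, which is a positional test, not set membership. A tiny illustration (my own numbers, not from the thread):

```r
x <- 1:6
y <- c(2, 4)
x == y      # compares 1:6 with the recycled c(2, 4, 2, 4, 2, 4), position by position
x %in% y    # TRUE wherever x takes a value occurring anywhere in y
```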

 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of zhijie zhang
Sent: Monday, 9 July 2007 2:43 AM
To: R-help@stat.math.ethz.ch
Subject: [R] a little problem on selecting a subset from dataset A
accordingto dataset B?

Dear Friends,
   I want to extract the records from A according to B, but the results
are
not correct because R warns:
  "longer object length is not a multiple of shorter object length"
Has anybody met the same problem? How can I do it correctly?

length(A)=47
length(B)=6

A[A$coords.x1 == B$X1, ]   # my attempt at the above task. I should get 6
records, but I only get the first 4 of them, for the above reason.

Thanks.
The following shows datasets A and B.


> A
   coords.x1 coords.x2
0  542250.89 3392404.1
1  538813.87 3388339.0
2  536049.19 3385821.6
3  533659.62 3383194.2
4  530642.30 3376834.9
5  529573.15 3378177.8
6  530853.82 3394838.8
7  541512.51 3394721.6
8  541479.33 3394877.8
9  538903.39 3395942.5
10 536019.95 3396286.1
11 538675.23 3384213.2
12 535127.95 3381255.4
13 533852.24 3378660.4
14 531360.91 3379273.8
15 539289.14 3375759.8
16 543410.51 3384353.1
17 543089.27 3388170.1
18 543274.03 3389919.2
19 543840.77 3392012.4
20 553383.55 3402401.8
21 554621.51 3397938.9
22 564096.42 3397524.4
23 567529.64 3398702.9
24 561798.76 3404864.0
25 562868.34 3405502.2
26 563145.22 3403192.1
27 562419.87 3404090.4
28 558321.85 3403879.9
29 567050.74 3404973.1
30 570609.70 3408742.4
31 556777.57 3397858.0
32 531353.38 3368596.6
33 533513.50 3372749.3
34 537543.19 3364284.8
35 538779.41 3368224.8
36 525930.09 3374067.7
37 522990.85 3369213.1
38 528826.37 3359019.0
39 533865.85 3362595.4
40 531200.25 3365053.0
41 551054.10 3377181.3
42 546974.19 3369284.8
43 572315.59 3359541.1
44 562703.63 3355173.4
45 558959.31 3357804.4
46 558531.39 3361741.1


> B
 X1X2
1 542250.89 3392404.1
2 541512.51 3394721.6
3 541479.33 3394877.8
4 538903.39 3395942.5
5 543274.03 3389919.2
6 543840.77 3392012.4

-- 
With Kind Regards,

oooO:
(..):
:\.(:::Oooo::
::\_)::(..)::
:::)./:::
::(_/
:
[***
]
Zhi Jie,Zhang ,PHD
Tel:86-21-54237149
Dept. of Epidemiology,School of Public Health,Fudan University
Address:No. 138 Yi Xue Yuan Road,Shanghai,China
Postcode:200032
Email:[EMAIL PROTECTED]
Website: www.statABC.com
[***
]
oooO:
(..):
:\.(:::Oooo::
::\_)::(..)::
:::)./:::
::(_/
:

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Within matrix

2007-07-09 Thread Bill.Venables
Unless you want to do this millions of times, efficiency is probably not
a big issue here, but simplicity always pays off.

I'm presuming you are dealing with a single classification setup.

Let f (n x 1) be a *factor* defining the classes
Let X (n x p) be the data matrix.

Then the steps I would use to find the between and within SSP matrices,
'by hand' are as follows:

Tot <- scale(X, scale = FALSE)  # sweep out the grand means
Res <- resid(aov(X ~ f))        # sweep out the class means

WSS <- crossprod(Res)       # within SSP matrix
BSS <- crossprod(Tot - Res) # between SSP matrix
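As a quick sanity check (my addition): the two matrices should add up to the total SSP, since the residuals are orthogonal to the fitted class means:

```r
set.seed(1)
f <- factor(rep(1:3, each = 10))   # three classes of 10 observations
X <- matrix(rnorm(30 * 2), 30, 2)  # n = 30, p = 2

Tot <- scale(X, scale = FALSE)  # deviations from the grand means
Res <- resid(aov(X ~ f))        # deviations from the class means
WSS <- crossprod(Res)
BSS <- crossprod(Tot - Res)

# between + within = total, up to rounding error
all.equal(WSS + BSS, crossprod(Tot), check.attributes = FALSE)
```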

 


Bill Venables
CSIRO Laboratories
PO Box 120, Cleveland, 4163
AUSTRALIA
Office Phone (email preferred): +61 7 3826 7251
Fax (if absolutely necessary):  +61 7 3826 7304
Mobile: +61 4 8819 4402
Home Phone: +61 7 3286 7700
mailto:[EMAIL PROTECTED]
http://www.cmis.csiro.au/bill.venables/ 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of
[EMAIL PROTECTED]
Sent: Tuesday, 10 July 2007 9:25 AM
To: r-help@stat.math.ethz.ch
Subject: [R] Within matrix

Hi all,

I am working on clustering; I am trying to evaluate a within and between
matrix. Is there any facility for that? I wrote my own function, but I
am not a programmer, so I am afraid I am not really able to program an
efficient and fast function...

Thanks

Christophe


Ce message a ete envoye par IMP, grace a l'Universite Paris 10 Nanterre

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to calculate the index the number of species combinations?

2007-07-07 Thread Bill.Venables
Here is a step by step explanation.  

The way you present the data is as a species (rows) by sites (columns)
data frame:

> dim(species_x_sites)
[1] 17 20

There are in fact only 19 sites as one of the columns of the data frame
is the species name:

> names(species_x_sites)
 [1] "Species"      "Cuba"         "Hispaniola"   "Jamaica"      "Puerto_Rico" 
 [6] "Guadeloupe"   "Martinique"   "Dominica"     "St._Lucia"    "Barbados"    
[11] "St._Vincent"  "Grenada"      "Antigua"      "St._Croix"    "Grand_Cayman"
[16] "St._Kitts"    "Barbuda"      "Montserrat"   "St._Martin"   "St._Thomas"  

To use the standard tools you need to turn it around and make a
site_x_species matrix

> site_x_species <- t(as.matrix(species_x_sites[, -1]))

(The [, -1] simply omits the species column and t() transposes it.)

> dim(site_x_species)
[1] 19 17

Now how many unique combinations are there?

> unique_combos <- unique(site_x_species) # unique *rows*

Now to find out how many we could use

> dim(unique_combos)
[1] 10 17

or

> nrow(unique_combos)
[1] 10

which I believe corresponds to your index.
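The same recipe on a tiny made-up presence-absence matrix (sites as rows, my own example):

```r
m <- rbind(site1 = c(1, 0, 1),
           site2 = c(1, 0, 1),   # same species combination as site1
           site3 = c(0, 1, 1),
           site4 = c(0, 0, 0))   # the empty combination also counts
nrow(unique(m))   # 3 distinct species combinations among 4 sites
```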



-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Zhang Jian
Sent: Saturday, 7 July 2007 4:20 PM
To: Sarah Goslee; r-help
Subject: Re: [R] How to calculate the index the number of
speciescombinations?

Sorry, I cannot understand your reply very clearly.
How do I compute the number of unique sites?
Can you give me a simple example, or do a simple analysis using one dataset?
Thanks very much.
   Jian Zhang


On 7/7/07, Sarah Goslee [EMAIL PROTECTED] wrote:

 It should be the number of unique sites. In this case, the number of
 unique columns in the data frame. See ?unique. (Interestingly,
 convention is usually that species are columns and sites are rows.)

 For your sample data you only see 10 of the 2^17 possible combinations
 of 17 species (not 2n).

 Sarah

 On 7/7/07, Zhang Jian [EMAIL PROTECTED] wrote:
  I want to analyze the co-occurrence of some species. In some papers, the
  authors said that the index "the number of species combinations" (COMBO)
  is a good index. I try to calculate the index by R language. But I can
  not get the right value. I think that I do not understand the concept of
  the index because my English is not good.
 
  The concept:
  *The number of species combinations   *This index scans the columns
of
 the
  presence-absence matrix and keeps track of the number of unique
species
  combinations that are represented in different sites. For an
assemblage
 of n
  species, there are 2^n possible species combinations, including the
  combination of no species being present (Pielou and Pielou 1968). In
 most
  real matrices, the number of sites (= columns) is usually
substantially
 less
  than 2^n, which places an upper bound on the number of species
 combinations
  that can be found in both the observed and the simulated matrices.
 
  Presence-absence Data (Each row represents different species and
each
 column
  represents a different site. A 1 indicates a species is present at
a
  particular site, and a 0 indicates that a species is absent from a
  particular site):
  Species Cuba Hispaniola Jamaica Puerto_Rico Guadeloupe Martinique
 Dominica
  St._Lucia Barbados St._Vincent Grenada Antigua St._Croix
Grand_Cayman
  St._Kitts Barbuda Montserrat St._Martin St._Thomas
  Carduelis_dominicensis 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  Loxia_leucoptera 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  Volatinia_jacarina 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
  Sporophila_nigricollis 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
  Melopyrrha_nigra 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
  Loxigilla_portoricensis 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  Loxigilla_violacea 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  Loxigilla_noxis 0 0 0 0 1 1 1 1 1 1 1 1 0 0 1 1 1 1 0
  Melanospiza_richardsoni 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
  Tiara_olivacea 1 1 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
  Tiara_bicolor 0 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1
  Tiara_canora 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  Loxipasser_anoxanthus 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  Saltator_albicollis 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0
  Torreornis_inexpectata 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  Ammodramus_savannarum 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  Zonotrichia_capensis 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 

 --
 Sarah Goslee
 http://www.functionaldiversity.org


[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] termplot with uniform y-limits

2007-07-02 Thread Bill.Venables
Does anyone have, or has anyone ever considered making, a version of
'termplot' that allows the user to specify that all plots should have
the same y-limits?

This seems a natural thing to ask for, as the plots share a y-scale.  If
you don't have the same y-axes you can easily misread the comparative
contributions of the different components. 

Notes: the current version of termplot does not allow the user to
specify ylim.  I checked.

  the plot tools that come with mgcv do this by default.  Thanks
Simon.
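In the meantime a workaround is not hard to sketch by hand (my addition, using a standard dataset): predict(type = "terms") gives the centred term contributions that termplot draws, so a shared range can be imposed directly:

```r
fit <- lm(Volume ~ Girth + Height, trees)  # trees: a standard R dataset
tt  <- predict(fit, type = "terms")        # one column of contributions per term
ylim <- range(tt)                          # common y-limits across all panels

op <- par(mfrow = c(1, 2))
for (v in colnames(tt))
  plot(trees[[v]], tt[, v], ylim = ylim,
       xlab = v, ylab = paste("Partial for", v))
par(op)
```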


Bill Venables
CSIRO Laboratories
PO Box 120, Cleveland, 4163
AUSTRALIA
Office Phone (email preferred): +61 7 3826 7251
Fax (if absolutely necessary):  +61 7 3826 7304
Mobile: +61 4 8819 4402
Home Phone: +61 7 3286 7700
mailto:[EMAIL PROTECTED]
http://www.cmis.csiro.au/bill.venables/

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question about PCA with prcomp

2007-07-02 Thread Bill.Venables
...but with 500 variables and only 20 'entities' (observations) you will
have 481 PCs with dead zero eigenvalues.  How small is 'smaller' and how
many is a few?

Everyone who has responded to this seems to accept the idea that PCA is
the way to go here, but that is not clear to me at all.  There is a
2-sample structure in the 20 observations that you have.  If you simply
ignore that in doing your PCA you are making strong assumptions about
sampling that would seem to me unlikely to be met.  If you allow for the
structure and project orthogonal to it then you are probably throwing
the baby out with the bathwater - you want to choose variables which
maximise separation between the 2 samples (and now you are up to 482
zero principal variances, if that matters...).
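The rank deficiency is easy to see (my illustration, on random data):

```r
set.seed(42)
X <- matrix(rnorm(20 * 500), 20, 500)  # 20 'entities', 500 variables
pc <- prcomp(X)
sum(pc$sdev > 1e-8)  # 19: after centring, at most n - 1 = 19 PCs can be non-zero
```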

I think this problem probably needs a bit of a re-think.  Some variant
on singular LDA, for example, may be a more useful way to think about
it.

Bill Venables.  

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Ravi Varadhan
Sent: Monday, 2 July 2007 1:29 PM
To: 'Patrick Connolly'
Cc: r-help@stat.math.ethz.ch; 'Mark Difford'
Subject: Re: [R] Question about PCA with prcomp

The PCs that are associated with the smaller eigenvalues. 



---

Ravi Varadhan, Ph.D.

Assistant Professor, The Center on Aging and Health

Division of Geriatric Medicine and Gerontology 

Johns Hopkins University

Ph: (410) 502-2619

Fax: (410) 614-9625

Email: [EMAIL PROTECTED]

Webpage:
http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html

 





-Original Message-
From: Patrick Connolly [mailto:[EMAIL PROTECTED]
Sent: Monday, July 02, 2007 4:23 PM
To: Ravi Varadhan
Cc: 'Mark Difford'; r-help@stat.math.ethz.ch
Subject: Re: [R] Question about PCA with prcomp

On Mon, 02-Jul-2007 at 03:16PM -0400, Ravi Varadhan wrote:

| Mark,
| 
| What you are referring to deals with the selection of covariates,
| since PC doesn't do dimensionality reduction in the sense of covariate
| selection. But what Mark is asking for is to identify how much each
| data point contributes to individual PCs.  I don't think that Mark's
| query makes much sense, unless he meant to ask: which individuals have
| high/low scores on PC1/PC2.  Here are some comments that may be
| tangentially related to Mark's question:
| 
| 1.  If one is worried about a few data points contributing heavily to
| the estimation of PCs, then one can use robust PCA, for example,
| using robust covariance matrices.  MASS has some tools for this.
| 2.  The biplot for the first 2 PCs can give some insights.
| 3.  PCs, especially the last few PCs, can be used to identify outliers.

What is meant by "last few PCs"?

-- 
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.

   ___Patrick Connolly   
 {~._.~} Great minds discuss ideas
 _( Y )_Middle minds discuss events 
(:_~*~_:)Small minds discuss people  
 (_)-(_)   . Anon
  
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Subscripting specified variables in a function

2007-06-26 Thread Bill.Venables
I think what you are trying to do is quite tricky.  Here is one way you
might like to think about.

> tdat <- data.frame(a = 1:5, b = c(1:3, 101, 101))
> tdat
  a   b
1 1   1
2 2   2
3 3   3
4 4 101
5 5 101
> test.fx <- function(dta, expvar, expval) {
+   lang <- substitute(subset(dat, v1 > v2),
+                      list(dat = substitute(dta),
+                           v1 = substitute(expvar),
+                           v2 = substitute(expval)))
+   newdta <- eval.parent(lang)
+   summary(newdta[deparse(substitute(expvar))])
+ }
> test.fx(tdat, b, 100)
       b      
 Min.   :101  
 1st Qu.:101  
 Median :101  
 Mean   :101  
 3rd Qu.:101  
 Max.   :101  
> test.fx(tdat, b, 2)
       b        
 Min.   :  3.00  
 1st Qu.: 52.00  
 Median :101.00  
 Mean   : 68.33  
 3rd Qu.:101.00  
 Max.   :101.00  
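If passing the variable name as a character string is acceptable, no non-standard evaluation is needed at all (my sketch of an alternative, not Bill's code):

```r
# same effect with the variable named as a string
test.fx2 <- function(dta, expvar, expval) {
  newdta <- dta[dta[[expvar]] > expval, , drop = FALSE]
  summary(newdta[[expvar]])
}

tdat <- data.frame(a = 1:5, b = c(1:3, 101, 101))
test.fx2(tdat, "b", 100)   # summarises the two rows with b == 101
```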


Bill Venables
CSIRO Laboratories
PO Box 120, Cleveland, 4163
AUSTRALIA
Office Phone (email preferred): +61 7 3826 7251
Fax (if absolutely necessary):  +61 7 3826 7304
Mobile: +61 4 8819 4402
Home Phone: +61 7 3286 7700
mailto:[EMAIL PROTECTED]
http://www.cmis.csiro.au/bill.venables/ 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Zodet, Marc W.
(AHRQ)
Sent: Wednesday, 27 June 2007 12:43 AM
To: r-help@stat.math.ethz.ch
Subject: [R] Subscripting specified variables in a function

I'm trying to create a function which will allow me to subset a data set
based on values of various specified variables.  I also want to then
apply some other function(s) (e.g., summary).

 

This is what I've tried so far

 

> test.fx <- function(dta, expvar, expval) {

+ newdta <- subset(dta, eval(expvar) > expval)

+ summary(newdta$eval(expvar))

+ }

 

> test.fx(fyc04s, quote(totexp04), 100)

Error in summary(newdta$eval(expvar)) : attempt to apply non-function

 

 

The subset works fine, but the my attempt to access the specified
variable bombs.  

 

Is there a syntactical change I can make?

Is it better to attach newdta?

 

Thanks in advance for any guidance.

 

Marc

 

Marc W. Zodet, MS   

Senior Health Statistician

Agency for Healthcare Research and Quality

Center for Financing, Access, and Cost Trends

301-427-1563 (Telephone)

301-427-1276 (Fax)

[EMAIL PROTECTED]

 


[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] correlation comparison one more time

2007-06-17 Thread Bill.Venables
Hypothesis tests are normally set up to test a null hypothesis within a broader 
class of alternatives, which includes the null as a special case.  Roughly 
speaking the logic is

We assume that the outer class includes the truth. We have a simple special 
case of this we call the null hypothesis that in some sense represents 'no 
effect'. Does the data provide cogent evidence that the special case is not 
adequate?

A standard way to address this question is, for example, to maximise the 
likelihood under null and alternative and to use the difference in log 
likelihood as the basis of a test statistic known as the likelihood ratio.

The way you have set up your hypotheses does not match this paradigm.  Your 
hypothesis is, in essence, that the squared correlation between A and D is 
larger than any other squared correlation involving two different variables, 
which include A or D.

It is clear enough what you are asking, but since it doesn't match the standard 
paradigm it is unlikely that any standard procedure will be available to 
address it.  It is unclear, for example, how you might go about setting up a 
likelihood ratio test. 

I think the answer to your question is "no, not off the shelf at least", and 
you probably need to think about the problem in the null and alternative 
hypothesis framework to make progress.
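One ad hoc route, offered with the same caveat (my sketch, not a standard procedure): bootstrap the gap between the A-D squared correlation and its strongest competitor involving A or D, and see whether the interval excludes zero. The data here are a random stand-in for the real samples:

```r
set.seed(7)
# stand-in data: 100 cases on the four variables (the real samples go here)
dat <- as.data.frame(matrix(rnorm(100 * 4), 100, 4,
                            dimnames = list(NULL, c("A", "B", "C", "D"))))

gap <- function(d) {
  r2 <- cor(d)^2
  r2["A", "D"] - max(r2["A", "B"], r2["A", "C"], r2["B", "D"], r2["C", "D"])
}
boot <- replicate(999, gap(dat[sample(nrow(dat), replace = TRUE), ]))
quantile(boot, c(0.025, 0.975))  # a positive lower bound would support the hypothesis
```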

Bill Venables
CSIRO Laboratories
PO Box 120, Cleveland, 4163
AUSTRALIA
Office Phone (email preferred): +61 7 3826 7251
Fax (if absolutely necessary):  +61 7 3826 7304
Mobile:  (I don't have one!)
Home Phone: +61 7 3286 7700
mailto:[EMAIL PROTECTED]
http://www.cmis.csiro.au/bill.venables/ 

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of rafael
Sent: Sunday, 17 June 2007 8:20 PM
To: r-help@stat.math.ethz.ch
Subject: [R] correlation comparison one more time

I would like to ask again,
because I can't find the answer.

I have such problem:

My data containing 4 variables (A,B,C,D) and are completed from 4 samples.
Each of matrix is such:
   A   B   C   D
A  1   ab  ac  ad
B  ab  1   bc  bd
C  ac  bc  1   cd
D  ad  bd  cd  1

My hypothesis are that

ad is the strongest correlation for A and for D (sign doesn't matter)
bc is the strongest correlation for B and for C (sign doesn't matter)

across samples.

Is it possible test these hypothesis?

Any help would be appreciated

Rafał Bartczuk
[EMAIL PROTECTED]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Retain names in conversion of matrix to vector

2007-06-14 Thread Bill.Venables
There is a slightly surprising way to do this in one step.  Here's an
example

> tmp <- matrix(1:16, 4, 4)
> dimnames(tmp) <- list(letters[1:4], letters[1:4])
> tmp
  a b  c  d
a 1 5  9 13
b 2 6 10 14
c 3 7 11 15
d 4 8 12 16
> as.data.frame(as.table(tmp))
   Var1 Var2 Freq
1     a    a    1
2     b    a    2
3     c    a    3
4     d    a    4
5     a    b    5
6     b    b    6
7     c    b    7
8     d    b    8
9     a    c    9
10    b    c   10
11    c    c   11
12    d    c   12
13    a    d   13
14    b    d   14
15    c    d   15
16    d    d   16
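A possible follow-up for export: the columns Var1/Var2/Freq are just as.table()'s default names, so one might rename them before writing the result out (the file name below is only illustrative):

```r
tmp <- matrix(1:16, 4, 4)
dimnames(tmp) <- list(letters[1:4], letters[1:4])

long <- as.data.frame(as.table(tmp))
names(long) <- c("row", "col", "value")   # replace the default Var1/Var2/Freq

# write.table(long, "tmp_long.txt", quote = FALSE, row.names = FALSE)
long[1:2, ]
```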
  


Bill Venables

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of philozine
Sent: Friday, 15 June 2007 9:04 AM
To: r-help@stat.math.ethz.ch
Subject: [R] Retain names in conversion of matrix to vector

Hi R-listers,

I'm using R only for a few basic functions but am having difficulty
doing something that *should* be simple. I have an nxn matrix, Q, where
Q[i,j] is a directed value (in this case, oil exports from i to j). Note
that Q[i,j]~=Q[j,i]. I imported column names along with the matrix then
copied them to the rows using rownames(Q) - colnames(Q). Simple so far.

What I'd like to do now is convert Q for export into a vector of values
with the original row and column names intact. Having one vector each
for row, column, and cell would be ideal, e.g., [1,1] = i's name, [1,2]
= j's name, and [1,3] = Q[i, j]. But just being able to export my matrix
data in vector form with the correct row/col names for each observation
would be sufficient.

Thus far I've tried c(), vector(), and a few others, but can't get the
correct results. They do generate the correct vector of matrix values,
but they do not appear to retain both row and column names. (Or, rather,
I have not discovered how to make them do so.)

To illustrate, my data currently look something like this:

     A    B    C    D
A |  0 | .1 | .4 | .6 |
B | .2 |  0 | .2 | .1 |
C | .5 | .9 |  0 | .9 |
D | .7 |  0 | .3 |  0 |

I would like them to look like this (at least when exported as a .txt
file, if not necessary when displayed within R):

  i   j   Q
| A | A | 0 |
| A | B |.1 |
| A | C |.4 |
| A | D |.6 |
| B | A |.2 |
| B | B | 0 |
| B | C |.2 |
[...] and so on

If anybody knows how to do this, I will be extremely appreciative!

Best regards,


   

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] AIC consistency with S-PLUS

2007-05-31 Thread Bill.Venables
I think we need some clarification here.

extractAIC() is available on both systems, in the stats package in R and
in the MASS library in S-PLUS.  If you use extractAIC() on both systems,
do you get the same ordering of models? 

AIC() is also available on both systems, in the stats package again in
R, and in the nlme3 library in S-PLUS.  On R there are not too many
classes where methods are available for both, but on S-PLUS there are a
few.  

So what are you comparing with what?  You need to say what classes of
objects you are dealing with, that is very important, and what generic
functions you are using for the comparison on each system.  The capacity
for confusion in this area is immense.
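As a concrete illustration of why orderings can still agree within one system: for ordinary lm() fits in R, AIC() and extractAIC() differ only by a constant that depends on n, not on the model, so differences between models on the same data coincide. A quick check on a built-in data set:

```r
m1 <- lm(dist ~ speed, data = cars)
m2 <- lm(dist ~ poly(speed, 2), data = cars)

# AIC() uses the full Gaussian log-likelihood; extractAIC() uses
# n*log(RSS/n) + 2*edf.  The offset between them depends only on n,
# so model *differences* (and hence orderings) agree:
d.full    <- AIC(m2) - AIC(m1)
d.extract <- extractAIC(m2)[2] - extractAIC(m1)[2]
all.equal(d.full, unname(d.extract))
```

The same cannot be assumed when mixing criteria across classes or across systems, which is the point above.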

Bill Venables

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Alice Shelly
Sent: Friday, 1 June 2007 10:07 AM
To: r-help@stat.math.ethz.ch
Subject: [R] AIC consistency with S-PLUS

Hello-
I understand that log-likelihoods are bound to differ by constants, but
if i estimate AIC for a set of simple nested linear models using the
following 4 methods, shouldn't at least two of them produce the same
ordering of models?

in R:
extractAIC
AIC

in S-PLUS:
AIC
n*log(deviance(mymodel)/n) + 2*p

I find it troubling that these methods all give me different answers as
to the best model or even short set of models.

Thanks for your comments.

Alice Shelly

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] scan a directory and then make a string vector consisting offile names

2007-05-31 Thread Bill.Venables
Would you believe it's 

 list.files()

See ?list.files for variants.
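A self-contained illustration, using a temporary directory in place of c:\temp (the directory and file names are made up):

```r
d <- file.path(tempdir(), "list-files-demo")
dir.create(d)
file.create(file.path(d, c("a.txt", "b.txt", "c.txt", "d.txt")))

list.files(d)   # "a.txt" "b.txt" "c.txt" "d.txt"
# list.files(d, pattern = "\\.txt$", full.names = TRUE)  # full paths, .txt only
```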


Bill Venables

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Taka Matzmoto
Sent: Friday, 1 June 2007 12:19 PM
To: r-help@stat.math.ethz.ch
Subject: [R] scan a directory and then make a string vector consisting
offile names

Dear R-users,

I am looking for a way to scan a directory and then make a string vector

consisting of the file names in the directory.

For example, under c:\temp\
there are 4 files
a.txt, b.txt, c.txt, and d.txt

I would like to have a string vector
c("a.txt", "b.txt", "c.txt", "d.txt")

How do I do that?

Thanks

Taka,


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Factor function: odd behavior when labels argument containsduplicates?

2007-05-30 Thread Bill.Venables
There is a difference between levels and labels.  I think this is what
you want.

> x <- factor(rep(0:5, 2))
> x
 [1] 0 1 2 3 4 5 0 1 2 3 4 5
Levels: 0 1 2 3 4 5
> levels(x) <- c(1, 1:5)
> x
 [1] 1 1 2 3 4 5 1 1 2 3 4 5
Levels: 1 2 3 4 5
> table(x)
x
1 2 3 4 5 
4 2 2 2 2 
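(In more recent versions of R, factor() itself is documented to allow duplicated labels, mapping different values of x to the same level in one step, so the levels<- assignment above becomes unnecessary; check your version. A sketch:)

```r
x <- rep(0:5, 2)
y <- factor(x, levels = 0:5, labels = c(1, 1:5))  # 0 and 1 both labelled "1"
table(y)
```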

 


Bill Venables

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Steen Ladelund
Sent: Wednesday, 30 May 2007 6:27 PM
To: r-help@stat.math.ethz.ch
Subject: [R] Factor function: odd behavior when labels argument
containsduplicates?

Hi all.

I think it would be nice to be able to combine levels of a factor on
creation a la

> x <- rep(0:5, 5)

> y <- factor(x, levels = 0:5, labels = c('1', '1', 2:5))  ## (1)

> y
 [1] 1 1 2 3 4 5 1 1 2 3 4 5 1 1 2 3 4 5 1 1 2 3 4 5 1 1 2 3 4 5
Levels: 1 1 2 3 4 5

I thought this would (should?) create a factor with 5 levels, i.e.
combining 0 and 1 in x into one level in y.

I find it hard to predict the behavior of the following lines:

> table(factor(y))  ## Odd?
 1  1  2  3  4  5 
10  0  5  5  5  5 
> table(factor(factor(y)))  ## This is what I want
 1  2  3  4  5 
10  5  5  5  5 

It also seems strange that this should be the way to go:
> levels(y) <- levels(y)  ## does what I want following (1), even though
                          ## it appears to be a void statement
> y
 [1] 1 1 2 3 4 5 1 1 2 3 4 5 1 1 2 3 4 5 1 1 2 3 4 5 1 1 2 3 4 5
Levels: 1 2 3 4 5

Am I missing an important point about factors or the factor function?

steen

Steen Ladelund, statistician
+4543233275 [EMAIL PROTECTED]
Research Center for Prevention and Health
Glostrup University Hospital, Denmark
www.fcfs.kbhamt.dk

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] linear model by month

2007-05-28 Thread Bill.Venables
> dat <- "
+ Month  ExcessReturn  Return  STO
+ 8  0.047595875  0.05274292  0.854352503
+ 8  0.016134874  0.049226941  4.399372005
+ 8  -0.000443869  0.004357305  -1.04980297
+ 9  0.002206554  -0.089068828  0.544809429
+ 9  0.021296551  0.003795071  0.226875834
+ 9  0.006741578  0.014104606  0.721986383
+ "
> dat <- read.table(textConnection(dat), header = TRUE)
> dat <- transform(dat, Month = factor(paste("M", Month, sep = "")))
>
> FM <- lm(ExcessReturn ~ Month/(Return + STO) - 1, dat)
>
> coef(FM)
       MonthM8        MonthM9 
  -0.014043930    0.028057112 
MonthM8:Return MonthM9:Return 
   1.291688338    0.097940939 
   MonthM8:STO    MonthM9:STO 
  -0.007593598   -0.031436815 

#
The coefficients are two intercepts, two slopes on Return, and two slopes
on STO.

Why must it be lm?  It might be simpler to use lmList from the nlme
package.
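For reference, the lmList route might look like this (a sketch only, re-creating the tiny six-row data set above; with just three rows per month the three-parameter fits are exact, so per-month standard errors are not available):

```r
library(nlme)

dat <- data.frame(
  Month        = factor(c("M8", "M8", "M8", "M9", "M9", "M9")),
  ExcessReturn = c(0.047595875, 0.016134874, -0.000443869,
                   0.002206554, 0.021296551, 0.006741578),
  Return       = c(0.05274292, 0.049226941, 0.004357305,
                   -0.089068828, 0.003795071, 0.014104606),
  STO          = c(0.854352503, 4.399372005, -1.04980297,
                   0.544809429, 0.226875834, 0.721986383))

# one separate least-squares fit per level of Month
fits <- lmList(ExcessReturn ~ Return + STO | Month, data = dat)
coef(fits)   # one row of (Intercept), Return, STO per month
```

The coefficients should agree with those from the single nested lm() fit above.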

 


Bill Venables

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Benoit Chemineau
Sent: Monday, 28 May 2007 11:48 PM
To: r-help@stat.math.ethz.ch
Subject: [R] linear model by month

Hi R-programmers !

I would like to perform a linear regression using the 'lm' function, and
I don't know how to do it.

The data is organised as below:
Month  ExcessReturn  Return  STO
8  0.047595875  0.05274292  0.854352503
8  0.016134874  0.049226941  4.399372005
8  -0.000443869  0.004357305  -1.04980297
9  0.002206554  -0.089068828  0.544809429
9  0.021296551  0.003795071  0.226875834
9  0.006741578  0.014104606  0.721986383

the model is:
ExcessReturn= a + b1*Return + b2*STO + u, u is the error term, a is the
intercept.

I would like to have monthly estimates of b1 and b2 using the least
squares
estimation. (b1month8 and b2month8, b1month9 and b2month9).

Thank you for your help !

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] looking for the na.omit equivalent for a matrix of characters

2007-05-28 Thread Bill.Venables
Surely all you need to do is change them to missing and use na.omit.
e.g.

x <- matrix(letters, 13, 2)
x[4] <- "NA"   # put one in the middle

is.na(x[x == "NA"]) <- TRUE
(x <- na.omit(x))


Bill Venables

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Andrew Yee
Sent: Tuesday, 29 May 2007 12:50 PM
To: r-help@stat.math.ethz.ch
Subject: [R] looking for the na.omit equivalent for a matrix of
characters

I have a matrix of characters (actually numbers that have been read in as
characters), and I'd like to remove the NA.

I'm familiar with na.omit, but is there an equivalent of na.omit when the
NA are the actual characters "NA"?

Thanks,
Andrew

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] Speeding up resampling of rows from a large matrix

2007-05-25 Thread Bill.Venables
Here is a possibility.  The only catch is that if a pair of rows is
selected twice you will get the results in a block, not scattered at
random throughout the columns of G.  I can't see that as a problem.

### --- start code excerpt ---
nSNPs <- 1000
H <- matrix(sample(0:1, 120*nSNPs, replace = TRUE), nrow = 120)

# G <- matrix(0, nrow = 3, ncol = nSNPs)

# Keep in mind that the real H is 120 x 65000

ij <- as.matrix(subset(expand.grid(i = 1:120, j = 1:120), i < j))

nResamples <- 3000
sel <- sample(1:nrow(ij), nResamples, rep = TRUE)
repf <- table(sel)                   # replication factors
ij <- ij[as.numeric(names(repf)), ]  # distinct choices made

G <- matrix(0, nrow = 3, ncol = nrow(ij))  # for now

for(j in 1:ncol(G))
  G[,j] <- rowSums(outer(0:2, colSums(H[ij[j, ], ]), "=="))

G <- G[, rep(1:ncol(G), repf)]  # bulk up the result

# The original code, for reference:
# pair <- replicate(nResamples, sample(1:120, 2))
# gen <- function(x){g <- sum(x); c(g==0, g==1, g==2)}
# for (i in 1:nResamples){
#   G <- G + apply(H[pair[,i],], 2, gen)
# }
### --- end of code excerpt ---

I did a timing on my machine which is a middle-of-the range windows
monstrosity...

> system.time({
+ 
+ nSNPs <- 1000
+ H <- matrix(sample(0:1, 120*nSNPs, replace = TRUE), nrow = 120)
+ 
+ # Keep in mind that the real H is 120 x 65000
+ 
+ ij <- as.matrix(subset(expand.grid(i = 1:120, j = 1:120), i < j))
+ 
+ nResamples <- 3000
+ sel <- sample(1:nrow(ij), nResamples, rep = TRUE)
+ repf <- table(sel)                   # replication factors
+ ij <- ij[as.numeric(names(repf)), ]  # distinct choices made
+ 
+ G <- matrix(0, nrow = 3, ncol = nrow(ij))  # for now
+ 
+ for(j in 1:ncol(G))
+   G[,j] <- rowSums(outer(0:2, colSums(H[ij[j, ], ]), "=="))
+ 
+ G <- G[, rep(1:ncol(G), repf)]  # bulk up the result
+ 
+ })
   user  system elapsed 
   0.97    0.00    0.99 


Less than a second.  Somewhat of an improvement on the 80 minutes, I
reckon.  The time will increase, of course, when you step the H matrix up
from 1,000 to 65,000 columns.

Bill Venables

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Juan Pablo
Lewinger
Sent: Friday, 25 May 2007 4:04 PM
To: r-help@stat.math.ethz.ch
Subject: [R] Speeding up resampling of rows from a large matrix

I'm trying to:

Resample with replacement pairs of distinct rows from a 120 x 65,000 
matrix H of 0's and 1's. For each resampled pair sum the resulting 2 
x 65,000 matrix by column:

    0 1 0 1 ...
+   0 0 1 1 ...
_______________
=   0 1 1 2 ...

For each column accumulate the number of 0's, 1's and 2's over the 
resamples to obtain a 3 x 65,000 matrix G.

For those interested in the background, H is a matrix of haplotypes, 
each pair of haplotypes forms a genotype, and each column corresponds 
to a SNP. I'm using resampling to compute the null distribution of 
the maximum over correlated SNPs of a simple statistic.


The code:
#---

nSNPs <- 1000
H <- matrix(sample(0:1, 120*nSNPs, replace = TRUE), nrow = 120)
G <- matrix(0, nrow = 3, ncol = nSNPs)
# Keep in mind that the real H is 120 x 65000

nResamples <- 3000
pair <- replicate(nResamples, sample(1:120, 2))

gen <- function(x){ g <- sum(x); c(g==0, g==1, g==2) }

for (i in 1:nResamples){
  G <- G + apply(H[pair[,i],], 2, gen)
}
#---

The problem is that the loop takes about 80 mins to complete and I 
need to repeat the whole thing 10,000 times, which would then take 
over a year and a half!

Is there a way to speed this up so that the full 10,000 iterations 
take a reasonable amount of time (say a week)?

My machine has an Intel Xeon 3.40GHz CPU with 1GB of RAM

> sessionInfo()
R version 2.5.0 (2007-04-23)
i386-pc-mingw32

I would greatly appreciate any help.

Juan Pablo Lewinger
Department of Preventive Medicine
Keck School of Medicine
University of Southern California

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] name of object in quotes

2007-05-23 Thread Bill.Venables
This is a long-standing idiom:

> foo <- function(xvar) deparse(substitute(xvar))
> foo(x1)
[1] "x1"

 


Bill Venables

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Gonzalo Rivero
Sent: Wednesday, 23 May 2007 8:14 PM
To: R-help@stat.math.ethz.ch
Subject: [R] name of object in quotes

I am writing a function in which, at some point, I need to recover the
name of an object as a quoted string. I am currently using the function
Cs() from the Hmisc library but the result is:

foo <- function(xvar) {
  variable <- Cs(xvar)
  return(variable)
}

foo(x1)
[1] "xvar"

when I would expect to obtain "x1".

Any suggestion?

Thanks


-- 
*Gonzalo Rivero*
Ph.D. candidate

Centro de Estudios Avanzados en Ciencias Sociales
Juan March Institute
Castelló 77, 2nd floor
28006, Madrid

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] converting a row of a data.frame to a vector

2007-05-15 Thread Bill.Venables
If the data frame has factors and numeric vectors, there is a question
on what form you want the row vector to be in. Only a data frame (list)
can have a mixture of the two. 

Consider:

> dat <- data.frame(x = 1:3, y = 4:6, z = letters[1:3])
> (r1 <- dat[1,])
  x y z
1 1 4 a
> class(r1)
[1] "data.frame"
> (r1 <- data.matrix(dat)[1,])
x y z 
1 4 1 
> class(r1)
[1] "integer"
> (r1 <- as.matrix(dat)[1,])
  x   y   z 
"1" "4" "a" 
> class(r1)
[1] "character"


Bill Venables

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Andrew Yee
Sent: Wednesday, 16 May 2007 9:37 AM
To: r-help@stat.math.ethz.ch
Subject: [R] converting a row of a data.frame to a vector

I've searched for the answer to this in the help list archive, but
wasn't
able to get the answer to work.

I'm interested in converting a row of a data.frame into a vector.

However, when I use as.vector(x[1,]) I get another data.frame, instead
of a vector.  (On the other hand, when I use as.vector(x[,1]), I get a
vector.)

Thanks,
Andrew

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] number of days

2007-05-14 Thread Bill.Venables
The standard answer is to use

> as.POSIXlt("2007-05-14") - as.POSIXlt("1950-01-01")
Time difference of 20952 days

If you want to ensure the time difference is in daily units, then

> difftime(as.POSIXlt("2007-05-14"),
+          as.POSIXlt("1950-01-01"), units = "days")
Time difference of 20952 days

If you want to use the answer in numerical computations, then one more
coercion is usually needed

> as.numeric(difftime(as.POSIXlt("2007-05-14"),
+                     as.POSIXlt("1950-01-01"), units = "days"))
[1] 20952 
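Since only whole days are involved here, the Date class is a lighter-weight alternative to POSIXlt (no time-of-day or timezone component to worry about):

```r
d <- as.Date("2007-05-14") - as.Date("1950-01-01")
d              # Time difference of 20952 days
as.numeric(d)  # 20952, ready for numerical work
```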


Bill Venables

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of threshold
Sent: Monday, 14 May 2007 7:16 PM
To: r-help@stat.math.ethz.ch
Subject: [R] number of days


Hi, does anyone know how to calculate in R the number of days between two
given dates?  The issue is relevant in option pricing.
thanks, robert
-- 
View this message in context:
http://www.nabble.com/number-of-days-tf3751163.html#a10600371
Sent from the R help mailing list archive at Nabble.com.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] Simple question about function with glm

2007-05-06 Thread Bill.Venables
There is a statistical problem and a code problem.

The statistical problem is that if your 'x' has a mean that depends
non-trivially on predictors, then you would not expect its distribution
ignoring predictors to be normal.  You would expect the residuals after
modelling to be normal.  Basically you cannot sensibly test normality of
the response before you fit the model.  It's a very common mistake, even
if rather an obvious one.

The code problem is that, do you really know whether your models have
been fitted or not?  The 'summary(xmodel)' part of your function below
will not print anything out, so if you were expecting something from
that you would be disappointed.

You might try replacing

    summary(xmodel)
    confint(xmodel)

by

    print(summary(xmodel))
    print(confint(xmodel))

But this is not really a very good paradigm in general.

Finally, I'm a bit puzzled why you use glm() when the simpler lm() would
have done the job.  You are fitting a linear model and do not need the
extra paraphernalia that generalized linear models require.
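A minimal reworking along those lines (a sketch, not the poster's code: the function name u.lm is invented, and the model is passed as a formula plus a data frame rather than free-standing vectors):

```r
u.lm <- function(formula, data) {
  xmodel <- lm(formula, data = data)
  print(summary(xmodel))   # explicit print(): auto-printing is off inside functions
  print(confint(xmodel))
  invisible(xmodel)        # return the fit for any further use
}

# usage sketch, with a built-in data set standing in for the real one:
fit <- u.lm(dist ~ speed, data = cars)
```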

Bill Venables. 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Chung-hong Chan
Sent: Sunday, 6 May 2007 6:47 PM
To: r-help@stat.math.ethz.ch
Subject: [R] Simple question about function with glm

Dear all,

I coded a function called u.glm

u.glm <- function (x, ahi, age, bmiz, gender) {
  library(nortest)
  lil.rslt <- lillie.test(x)
  if (lil.rslt$p.value < 0.05)
  {
    cat("Logtrans=0, lillie=", lil.rslt$p.value, "\n")
    xmodel <- glm(x ~ ahi + age + bmiz + as.factor(gender))
    summary(xmodel)
    confint(xmodel)

  }
  else
  {
    cat("Logtrans=1, lillie=", lil.rslt$p.value, "\n")
    xmodel <- glm(x ~ ahi + age + bmiz + as.factor(gender))
    summary(xmodel)
    confint(xmodel)
  }

}

Basically I just want to test the response variable for normality before
modeling.  When I try to use this function, it runs the Lilliefors test
but fails to do the glm.  What's wrong with my code?

Regards,
CH


--
The scientists of today think deeply instead of clearly. One must be
sane to think clearly, but one can think deeply and be quite insane.
Nikola Tesla
http://www.macgrass.com

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] Hotelling T-Squared vs Two-Factor Anova

2007-04-15 Thread Bill.Venables
I take it all subjects are measured at the same time points, or
Hotelling's T^2 becomes rather messy.

The essential difference lies in the way the variance matrix is
modelled.  The usual repeated measures model would model the variance
matrix as equal variances and equal covariances, i.e. with two
parameters, (though you can vary this using, e.g. lme).  Hotelling's T^2
would model the variance matrix as a general symmetric matrix, i.e. for
the 4x4 case using 4+3+2+1 = 10 parameters.  If it is appropriate, the
repeated measures model is much more parsimonious.
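In nlme terms the two covariance models can be fitted and compared directly with gls(); a sketch with simulated long-format data (all variable names here are hypothetical, and treatment is taken as between-subject purely for brevity):

```r
library(nlme)
set.seed(42)

# 20 subjects x 4 time points, in long format, sorted by subject
longdat <- expand.grid(time = factor(1:4), subject = factor(1:20))
longdat$treatment <- factor(rep(c("a", "b"), each = 40))
# a subject effect induces within-subject correlation
longdat$bp <- rnorm(80, sd = 1) + rep(rnorm(20, sd = 2), each = 4)

## compound symmetry: equal variances, equal covariances (2 parameters)
fit.cs <- gls(bp ~ treatment * time, data = longdat,
              correlation = corCompSymm(form = ~ 1 | subject))

## general symmetric matrix, as in Hotelling's T^2 (10 parameters for 4 times)
fit.un <- gls(bp ~ treatment * time, data = longdat,
              correlation = corSymm(form = ~ 1 | subject),
              weights = varIdent(form = ~ 1 | time))

anova(fit.cs, fit.un)   # likelihood-ratio comparison of the two structures
```

If the likelihood-ratio test does not reject compound symmetry, the more parsimonious repeated-measures model is supported.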

Bill Venables. 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Sean Scanlan
Sent: Saturday, 14 April 2007 5:38 PM
To: [EMAIL PROTECTED]
Subject: [R] Hotelling T-Squared vs Two-Factor Anova

Hi,

I am a graduate student at Stanford University and I have a general
statistics question.  What exactly is the difference between doing a
two-factor repeated measures ANOVA and a Hotelling T-squared test for a
paired comparison of mean vectors?

Given:

Anova: repeated measures on both factors, 1st factor = two different
treatments, 2nd factor = 4 time points, where you are measuring the
blood pressure at each of the time points.

Hotelling T^2: You look at the difference in the 4x1 vector of blood
pressure measurements for the two different treatments, where the four
rows in the vector are the four time points.


I am mainly interested in the main effects of the two treatments.  Can
someone please explain if there would be a difference in the two methods
or any advantage in using one over the other?

Thanks,
Sean

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] resolving expand.grid NA errors

2007-03-25 Thread Bill.Venables
Hi Bob,

Instead of fitting the model as 

 mod.multacute <- multinom(kc$group ~ kc$in.acute.danger *
                           kc$violent.convictions, na.rm = T)

you might have more success fitting it as

 mod.multacute <- multinom(group ~ in.acute.danger *
                           violent.convictions, data = kc, na.action = na.omit)

This separates the variables from the data frame in which they occur.
When you predict with a new data frame, the process will look for
factors named in.acute.danger rather than kc$in.acute.danger.

Note also that na.rm is not an argument for multinom.  You probably mean
na.action.

I suspect this is the problem, but without the data I can't be sure.
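A self-contained sketch of the recommended pattern, with synthetic data standing in for kc (note too that the prediction grid need not include the response group — including group=1:3 is another source of the 12-row mismatch):

```r
library(nnet)
set.seed(1)
kc2 <- data.frame(
  group               = factor(sample(1:3, 160, replace = TRUE)),
  in.acute.danger     = factor(sample(c("y", "n"), 160, replace = TRUE)),
  violent.convictions = factor(sample(c("y", "n"), 160, replace = TRUE)))

fit <- multinom(group ~ in.acute.danger * violent.convictions,
                data = kc2, na.action = na.omit, trace = FALSE)

newd <- expand.grid(in.acute.danger     = c("y", "n"),
                    violent.convictions = c("y", "n"))
predict(fit, newdata = newd, type = "probs")   # a 4 x 3 matrix of probabilities
```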

Bill Venables

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Bob Green
Sent: Monday, 26 March 2007 6:30 AM
To: r-help@stat.math.ethz.ch
Subject: [R] resolving expand.grid  NA errors

I am hoping for some advice regarding resolving error messages I have 
received when trying to use the expand.grid command.

library(nnet)
library(MASS)
library(car)
mod.multacute <- multinom(kc$group ~ kc$in.acute.danger *
    kc$violent.convictions, na.rm = T)
summary(mod.multacute, cor = F, Wald = T)
Anova(mod.multacute)
confint(mod.multacute)

predictors <- expand.grid(group = 1:3, in.acute.danger = c("y", "n"),
    violent.convictions = c("y", "n"))
p.fit <- predict(mod.multacute, predictors, type = 'probs')
Error in predict.multinom(mod.multacute, predictors, type = "probs") :
  NAs are not allowed in subscripted assignments
In addition: Warning message:
'newdata' had 12 rows but variable(s) found have 160 rows

There are two errors - the NA error which I thought would have been 
removed in line 3 above. I also tried ra.omit.
I know there will be a difference between the raw data and the new 
data but do not know what I need to change to be able to successfully 
run the command.

What I want to do is obtain the fitted probabilities and plot them.

Any suggestions are appreciated,

Bob Green

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] Problem dropping rows based on values in a column

2007-03-25 Thread Bill.Venables
I think you want

delete <- c(14772, 14744)
jdata <- subset(jdata, !(PID %in% delete))



Bill Venables

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of John Sorkin
Sent: Monday, 26 March 2007 12:19 PM
To: r-help@stat.math.ethz.ch
Subject: [R] Problem dropping rows based on values in a column

I am trying to drop rows of a dataframe based on values of the column
PID, but my strategy is not working. I hope someoen can tell me what I
am doing incorrectly.


# Values of PID column
> jdata[, "PID"]
 [1] 16608 16613 16355 16378 16371 16280 16211 16169 16025 11595 15883
15682 15617 15615 15212 14862 16539
[18] 12063 16755 16720 16400 16257 16209 16200 16144 11598 13594 15419
15589 15982 15825 15834 15491 15822
[35] 15803 15795 10202 15680 15587 15552 15588 15375 15492 15568 15196
10217 15396 15477 15446 15374 14092
[52] 14033 15141 14953 15473 10424 13445 14854 10481 14793 14744 14772

#Prepare to drop the last two rows, rows that have 14744 and 14772 in the
#PID column
> delete <- c(14772, 14744)

#Try to delete last two rows, but as you will see, I am not able to drop
the last two rows.
 jdata[jdata$PID != delete, "PID"]
 [1] 16608 16613 16355 16378 16371 16280 16211 16169 16025 11595 15883
15682 15617 15615 15212 14862 16539
[18] 12063 16755 16720 16400 16257 16209 16200 16144 11598 13594 15419
15589 15982 15825 15834 15491 15822
[35] 15803 15795 10202 15680 15587 15552 15588 15375 15492 15568 15196
10217 15396 15477 15446 15374 14092
[52] 14033 15141 14953 15473 10424 13445 14854 10481 14793 14744 14772
 


Thanks,
John

John Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
Baltimore VA Medical Center GRECC,
University of Maryland School of Medicine Claude D. Pepper OAIC,
University of Maryland Clinical Nutrition Research Unit, and
Baltimore VA Center Stroke of Excellence

University of Maryland School of Medicine
Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524

(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)
[EMAIL PROTECTED]

Confidentiality Statement:
This email message, including any attachments, is for the\ s...{{dropped}}

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Need to fit a regression line using orthogonal residuals

2007-01-30 Thread Bill.Venables
Jonathon Kopecky asks:

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Jonathon Kopecky
Sent: Tuesday, 30 January 2007 5:52 AM
To: r-help@stat.math.ethz.ch
Subject: [R] Need to fit a regression line using orthogonal residuals

I'm trying to fit a simple linear regression of just Y ~ X, but both X
and Y are noisy.  Thus instead of fitting a standard linear model
minimizing vertical residuals, I would like to minimize
orthogonal/perpendicular residuals.  I have tried searching the
R-packages, but have not found anything that seems suitable.  I'm not
sure what these types of residuals are typically called (they seem to
have many different names), so that may be my trouble.  I do not want to
use Principal Components Analysis (as was answered to a previous
questioner a few years ago), I just want to minimize the combined noise
of my two variables.  Is there a way for me to do this in R?
[WNV] There's always a way if you are prepared to program it.  Your
question is a bit like asking "Is there a way to do this in Fortran?"
The most direct way to do it is to define a function that gives you the
sum of the perpendicular distances and minimise it using, say, optim().
E.g.
ppdis <- function(b, x, y) sum((y - b[1] - b[2]*x)^2/(1 + b[2]^2))
b0 <- lsfit(x, y)$coef  # initial value
op <- optim(b0, ppdis, method = "BFGS", x = x, y = y)
op  # now to check the results
plot(x, y, asp = 1)  # why 'asp = 1'?? exercise
abline(b0, col = "red")
abline(op$par, col = "blue")
There are a couple of things about this you should be aware of, though.
First, this is just a fiddly way of finding the first principal
component, so your desire not to use Principal Component Analysis is
somewhat thwarted, as it must be.
Second, the result is sensitive to scale - if you change the scales of
either x or y, e.g. from lbs to kilograms, the answer is different.
This also means that unless your measurement units for x and y are
comparable it's hard to see how the result can make much sense.  A
related issue is that you have to take some care when plotting the
result or orthogonal distances will not appear to be orthogonal.
Third, the resulting line is not optimal for either predicting y for a
new x or x from a new y.  It's hard to see why it is ever of much
interest.
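As a numerical check on the first point (a sketch added here, not part of the original reply; the data are simulated), the optim() slope can be compared with the slope implied by the leading eigenvector of the covariance matrix:

```r
set.seed(1)
x <- rnorm(50)
y <- 0.5 + 2 * x + rnorm(50, sd = 0.5)

## sum of squared perpendicular distances, as in the reply above
ppdis <- function(b, x, y) sum((y - b[1] - b[2]*x)^2/(1 + b[2]^2))
op <- optim(lsfit(x, y)$coef, ppdis, method = "BFGS", x = x, y = y)

## slope of the first principal component of (x, y)
e <- eigen(cov(cbind(x, y)))$vectors[, 1]

op$par[2]   # orthogonal-regression slope from optim()
e[2]/e[1]   # agrees, up to optimisation tolerance
```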
Bill Venables.


Jonathon Kopecky
University of Michigan

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] comparing two matrices

2007-01-21 Thread Bill.Venables
But this is using paste() the wrong way round. A better way would be

 join <- function(x) do.call(paste, c(as.data.frame(x), sep = "\r"))
 which(join(mat1) %in% join(mat2))
[1]  8 13 16 19 24

This is essentially the technique used by duplicated.data.frame
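A self-contained version of the same idea (a sketch; object names follow the quoted example), with match() added to recover the positions in mat2's own row order:

```r
mat1 <- data.matrix(expand.grid(0:2, 0:2, 0:2))
mat2 <- mat1[c(19, 16, 13, 24, 8), ]

## '\r' cannot occur in the data, so pasted rows collide only
## when the underlying rows are identical
join <- function(x) do.call(paste, c(as.data.frame(x), sep = "\r"))

which(join(mat1) %in% join(mat2))   # in mat1's order: 8 13 16 19 24
match(join(mat2), join(mat1))       # in mat2's order: 19 16 13 24 8
```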

Bill Venables 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Adrian Dusa
Sent: Sunday, 21 January 2007 8:17 PM
To: Dimitris Rizopoulos
Cc: [EMAIL PROTECTED]; r-help@stat.math.ethz.ch
Subject: Re: [R] comparing two matrices

On Sunday 21 January 2007 12:04, Dimitris Rizopoulos wrote:
 I think the following should work in your case:

 mat1 <- data.matrix(expand.grid(0:2, 0:2, 0:2))
 mat2 <- mat1[c(19, 16, 13, 24, 8), ]
 
 ind1 <- apply(mat1, 1, paste, collapse = "/")
 ind2 <- apply(mat2, 1, paste, collapse = "/")
 match(ind2, ind1)


Oh yes, I thought about that too.
It works fast enough for small matrices, but I deal with very large
ones. 
Using paste() on such matrices decreases the speed dramatically.

Thanks again,
Adrian

-- 
Adrian Dusa
Romanian Social Data Archive
1, Schitu Magureanu Bd
050025 Bucharest sector 5
Romania
Tel./Fax: +40 21 3126618 \
  +40 21 3120210 / int.101

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] Vectorize rearrangement within each column

2007-01-19 Thread Bill.Venables
As with most things like this, you can trade memory for speed.  Here is
an obfuscated solution that appears to eschew loops entirely.

 ma <- matrix(10:15, nr = 3)
 idx <- matrix(c(1,3,2, 2,3,1), nr = 3)
 mb <- ma
 mb[] <- as.vector(ma)[as.vector(idx + 
outer(rep(nrow(ma), nrow(ma)), 1:ncol(ma) - 1, '*'))]
 mb
 [,1] [,2]
[1,]   10   14
[2,]   12   15
[3,]   11   13

Ordinarily, though, my preferred solution would be the for() loop.
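Another loop-free route (a sketch added here, not in the original reply) uses a two-column (row, column) index matrix, which R accepts directly when subscripting a matrix:

```r
ma <- matrix(10:15, nrow = 3)
idx <- matrix(c(1, 3, 2, 2, 3, 1), nrow = 3)

## pair each row index from 'idx' with the column it belongs to
mb <- ma
mb[] <- ma[cbind(as.vector(idx), rep(seq_len(ncol(ma)), each = nrow(ma)))]
mb
#      [,1] [,2]
# [1,]   10   14
# [2,]   12   15
# [3,]   11   13
```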

Bill Venables
CMIS, CSIRO Laboratories,
PO Box 120, Cleveland, Qld. 4163
AUSTRALIA
Office Phone (email preferred): +61 7 3826 7251
Fax (if absolutely necessary):+61 7 3826 7304
Mobile (rarely used):+61 4 1963 4642
Home Phone:  +61 7 3286 7700
mailto:[EMAIL PROTECTED]
http://www.cmis.csiro.au/bill.venables/ 
 
 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Osorio Roberto
Sent: Friday, 19 January 2007 4:15 PM
To: r-help@stat.math.ethz.ch
Subject: [R] Vectorize rearrangement within each column

Consider a matrix like

  ma = matrix(10:15, nr = 3)
  ma
  [,1] [,2]
[1,]   10   13
[2,]   11   14
[3,]   12   15

I want to rearrange each column according to row indexes (1 to 3)  
given in another matrix, as in

  idx = matrix(c(1,3,2, 2,3,1), nr = 3)
  idx
  [,1] [,2]
[1,]12
[2,]33
[3,]21

The new matrix mb will have for each column the corresponding column  
of ma indexed by the corresponding column of idx, as in

  mb = ma
  for (j in 1:2) mb[,j] = ma[idx[,j], j]   
  mb
  [,1] [,2]
[1,]   10   14
[2,]   12   15
[3,]   11   13

Can I avoid the for() loop? I'm specially interested to find out if a  
fast implementation using lapply() would be feasible for large input  
matrices (analogues of ma and idx) transformed into data frames.

Roberto Osorio

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] if ... problem with compound instructions

2006-12-31 Thread Bill.Venables
Step 1:

quadrant <- 1 + (X[, 1] < 0) + 2*(X[, 2] < 0)

This is not the usual labelling of the quadrants as '3' and '4' are 
interchanged. If you want to be picky about it

quadrant <- ifelse(quadrant > 2, 7 - quadrant, quadrant)

Step 2:

angle <- atan2(X[,2], X[,1]) %% (2*pi)  # I think this is what you want

(why did you want to know the quadrant?)

Oh, then you might do

X[, 3:4] <- cbind(quadrant, angle)
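On the poster's second question, which the reply above leaves aside: in R a compound statement must be grouped with braces, not parentheses. `(expr1; expr2)` is a parse error because parentheses enclose a single expression, while `{expr1; expr2}` evaluates both. A minimal sketch:

```r
x <- c(0, 0, 1, 2)

## Parentheses hold one expression only, so this would not parse:
##   if (x[1] == 0) (x[3] <- NA; x[4] <- NA)

## Braces group several expressions into one compound statement:
if (x[1] == 0) {
  x[3] <- NA
  x[4] <- NA
}
x
# [1]  0  0 NA NA
```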

---

Bill Venables
CMIS, CSIRO Laboratories,
PO Box 120, Cleveland, Qld. 4163
AUSTRALIA
Office Phone (email preferred): +61 7 3826 7251
Fax (if absolutely necessary):    +61 7 3826 7304
Mobile (rarely used):    +61 4 1963 4642
Home Phone:  +61 7 3286 7700
mailto:[EMAIL PROTECTED]
http://www.cmis.csiro.au/bill.venables/ 
 

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Richard Rowe
Sent: Monday, 1 January 2007 12:36 PM
To: r-help@stat.math.ethz.ch
Subject: [R] if ... problem with compound instructions

I am having problems with the 'if' syntax.

I have an n x 4 matrix, X say.  The first two columns hold x, y values and 
I am attempting to fill the second two columns with the quadrant in which 
the datapoint (x, y) is and with the heading angle.

So I have two problems
1) how to do this elegantly (at which I've failed as I can't seem to 
vectorize the problem) and
2) how to accomplish the task in a for loop ...

for (i in 1: length(X[,1])) (
  if ((X[i,1] == 0) && (X[i,2] == 0)) (X[i,3] <- NA; X[i,4] <- NA) else (
removing the pathological case ... then a series of nested if statements 
assigning quadrant and calculating heading

e.g.
if ((X[i,1] < 0) && (X[i,2] >= 0)) (X[i,3] <- 4; X[i,4] <- 
atan(X[i,1]/X[i,2]) + 2*pi) else (


In the first instance the ';'  seems to the source of a syntax 
error.  Removing the second elements of the compound statement solves the 
syntax problem and the code runs.

As the R syntax is supposed to be 'Algol-like' I had thought

if A then B else C
should work for compound B ... i.e. that the bracket (X[i,3] <- NA; X[i,4] 
<- NA) should be actioned

1) any elegant solutions to what must be a common task?

2) any explanations for the ';' effect ?

thanks

Richard Rowe

Dr Richard Rowe
Zoology  Tropical Ecology
School of Tropical Biology
James Cook University
Townsville 4811
AUSTRALIA

ph +61 7 47 81 4851
fax +61 7 47 25 1570
JCU has CRICOS Provider Code 00117J

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] strange logical results

2006-12-29 Thread Bill.Venables
Hi Erin,

You would be safe on a machine that represented floating point numbers
in base 10, and I haven't seen one of those for such a long time... All
modern machines use base 2 for floating point numbers.

The moral of the story is not to believe what you see printed.  The
number you see printed innocently as '-0.4' has been arrived at by two
different processes and uses two different *approximations* to the real
thing on a binary machine, and by chance they may have arrived at a
slightly different result.  Slight, but enough to make '==' ring the
alarm.  Here is a demo.

 x <- seq(-1, 1, by = 0.1)
 x[7] - (-0.4)
[1] 1.110223e-16

So the method used by seq() to arrive at an approximation to -0.4 is
just slightly different from the method used by the parser when it reads
the characters '-0.4' and translates them into a floating point number.
It just so happens that for the others you checked the two
approximations agreed, but you can't trust that to happen all the time.

Moral of the story: don't use the '==' or '!=' operators with floating
point numbers.  It's an old tale but still current.

OK, so what can you do to implement the idea of checking equality
'within a tolerance'?  I'm glad you asked.  You can write a couple of
binary operators yourself.  There is an object called .Machine that is a
list of machine constants.  The obvious one to compare the difference
with is .Machine$double.eps

 `%~=%` <- function(a, b) abs(a - b) < .Machine$double.eps
 `%~!%` <- function(a, b) abs(a - b) >= .Machine$double.eps
 x %~=% -0.4
 [1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE
[15] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 x %~!% -0.4
 [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE
TRUE  TRUE  TRUE
[15]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE

The world is approximately sane once more.

Bill Venables.

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Erin Hodgess
Sent: Friday, 29 December 2006 8:36 PM
To: r-help@stat.math.ethz.ch
Subject: [R] strange logical results

Dear R People:  

I am getting some odd results when using logical operators:

 x <- seq(from = -1, to = 1, by = 0.1)

  x
 [1] -1.0 -0.9 -0.8 -0.7 -0.6 -0.5 -0.4 -0.3 -0.2 -0.1  0.0  0.1  0.2
0.3  0.4
[16]  0.5  0.6  0.7  0.8  0.9  1.0
 x == -1
 [1]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE
[13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 x == -0.9
 [1] FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE
[13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 x == -0.8
 [1] FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE
[13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 x == -0.7
 [1] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE
[13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 x == -0.6
 [1] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE
[13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 x == -0.5
 [1] FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
FALSE
[13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 x == -0.4
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE
[13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 

Should this show as true also, please?

I saw this in both Windows and LINUX Versions 2.4.0

Thanks in advance,
Sincerely,
Erin Hodgess
Associate Professor
Department of Computer and Mathematical Sciences
University of Houston - Downtown
mailto: [EMAIL PROTECTED]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] Higher Dimensional Matrices

2006-12-25 Thread Bill.Venables
Lars from space [sic] asks:
 -Original Message-

 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of downunder
 Sent: Monday, 25 December 2006 12:31 PM To: r-help@stat.math.ethz.ch
 Subject: [R] Higher Dimensional Matrices
 
 
 Hi all.
 
 I want to calculate partial correlations while controlling for one
 or more variables. That works already fine when I control for
 example just for x[,1] and x[,2] that gives me one single
 correlation matrix and i have to to it for x [,1]...x[,10]. That
 would give me out 10 matrices. Controlling for 2 Variables 100
 matrices. how can I run a loop to get f.e the 10 or 100 matrices at
 once? I appreciate for every hint. have nice holidays.

I don't quite understand this.  You have 10 variables and you want to
find the partial correlations controlling for two of them at a time.
If you take each possible set of two variables to condition upon at a
time, this would give you choose(10, 2) = 45 matrices, wouldn't it?
Where do you get '10 or 100' matrices from?

 
 greetings lars
 
 x <- read.table("C:/.dat")
 dim(x) # 200 x 10
 a <- matrix(0, 200, 10)
 for (i in 1:10)
 a[,i] <- residuals(lm(x[,i] ~ 1 + x[,1] + x[,2]))
 b <- matrix(0, 200, 10)
 for (i in 1:10)
 b[,i] <- residuals(lm(x[,i] ~ 1 + x[,1] + x[,2]))
 #a = round(a, 5)
 #b = round(b, 5)
 d <- cor(a, b)
 d

But a and b are the same, aren't they?  Why do you need to compute
them twice?  Why not just use cor(a, a) [which is the same as cor(a),
of course]?

There is a more serious problem here, though.  Your residuals are
after regression on x[, 1:2], so if you *select* x[, 1:2] as the
y-variables in your regression then the residuals are going to be
zero, but in practice round-off error.  So the first two rows and
columns of d will be correlations with round-off error,
i.e. misleading junk.  It doesn't make sense to include the
conditioning variables in the correlation matrix *conditioning on
them*.  Only the 8 x 8 matrix of the others among themselves is
defined, really.

So how do we do it?  Here are a few pointers.

To start, here is a function that uses a somewhat more compact way of
finding the partial correlations than your method.  Sorting out how it
works should be a useful exercise.

partialCorr <- function (cond, mat) 
cor(qr.resid(qr(cbind(1, mat[, cond])), mat[, -cond]))

To find the matrix of partial correlations conditioning on x[, 1:2]
you would use

d <- partialCorr(c(1,2), x)

So how to do it for all possible conditioning pairs of variables.
Well you could do it in an obvious loop:

cmats <- array(0, c(8, 8, 45))
k <- 0
for(i in 1:9) for(j in (i+1):10) {
k <- k + 1
cmats[, , k] <- partialCorr(c(i, j), x)
}

Now the results are in a 3-way array, but without any kind of labels.
Perhaps you should think about how to fix that one yourself...

With more than two conditioning variables you should probably use a
function to generate the subsets of the appropriate size rather than
trying to write ever more deeply nested loops.  There are plenty of
functions around to do this.
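One such function is combn() in base R. A labelled version of the loop above (a sketch added here; a simulated x stands in for the poster's 200 x 10 data):

```r
set.seed(42)
x <- matrix(rnorm(200 * 10), 200, 10)

partialCorr <- function(cond, mat)
    cor(qr.resid(qr(cbind(1, mat[, cond])), mat[, -cond]))

pairs <- combn(10, 2)   # 2 x 45 matrix of conditioning pairs
cmats <- array(0, c(8, 8, ncol(pairs)),
               dimnames = list(NULL, NULL,
                               apply(pairs, 2, paste, collapse = ",")))
for (k in seq_len(ncol(pairs)))
    cmats[, , k] <- partialCorr(pairs[, k], x)

dim(cmats)                 # 8 8 45
dimnames(cmats)[[3]][1:3]  # "1,2" "1,3" "1,4"
```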

Bill Venables.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to sum one column in a data frame keyed on other columns

2006-12-12 Thread Bill.Venables
Here is an elementary way of doing it: 

 dat
  url time somethingirrelevant visits
1 www.foo.com 1:00 xxx100
2 www.foo.com 1:00 yyy 50
3 www.foo.com 2:00 xyz 25
4 www.bar.com 1:00 xxx200
5 www.bar.com 1:00 zzz200
6 www.foo.com 2:00 xxx500
 dat <- transform(dat, key = paste(url, time))
 total_visits <- with(dat, tapply(visits, key, sum))
 m <- match(names(total_visits), dat$key)
 tdat <- cbind(dat[m, c("url", "time")], total_visits)
 tdat
  url time total_visits
4 www.bar.com 1:00  400
1 www.foo.com 1:00  150
3 www.foo.com 2:00  525
 

This should not be too difficult to morph into a fairly general
function.  Here's what I might do [warning: somewhat obscure code
follows]

sumUp <- function(dat, key_list, sum_list) {
  key <- with(dat, do.call(paste, dat[, key_list, drop = FALSE]))
  totals <- as.matrix(sapply(dat[, sum_list, drop = FALSE], tapply, key,
sum))
  dimnames(totals)[[2]] <- paste("total", sum_list, sep = "_")
  m <- match(dimnames(totals)[[1]], key)
  cbind(dat[m, key_list, drop = FALSE], totals)
}

check:

 sumUp(dat, c("url", "time"), "visits")
  url time total_visits
4 www.bar.com 1:00  400
1 www.foo.com 1:00  150
3 www.foo.com 2:00  525

 sumUp(dat, "url", "visits")
  url total_visits
4 www.bar.com  400
1 www.foo.com  675

Question for the reader: why do you need 'drop = FALSE' (in three
places)?
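For comparison (an addition, not part of the original reply): aggregate() does the grouping and summing in one call. The formula interface shown here arrived in R well after this thread, so treat it as a modern alternative:

```r
dat <- data.frame(
  url    = c("www.foo.com", "www.foo.com", "www.foo.com",
             "www.bar.com", "www.bar.com", "www.foo.com"),
  time   = c("1:00", "1:00", "2:00", "1:00", "1:00", "2:00"),
  visits = c(100, 50, 25, 200, 200, 500)
)

## one row per unique (url, time) tuple, visits summed within each
aggregate(visits ~ url + time, data = dat, FUN = sum)
```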

Bill Venables. 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of George Nachman
Sent: Wednesday, 13 December 2006 9:35 AM
To: r-help@stat.math.ethz.ch
Subject: [R] How to sum one column in a data frame keyed on other
columns

I have a data frame that looks like this:

url time somethingirrelevant visits
www.foo.com 1:00 xxx 100
www.foo.com 1:00 yyy 50
www.foo.com 2:00 xyz 25
www.bar.com 1:00 xxx 200
www.bar.com 1:00 zzz 200
www.foo.com 2:00 xxx 500

I'd like to write some code that takes this as input and outputs
something like this:

url time total_vists
www.foo.com 1:00 150
www.foo.com 2:00 525
www.bar.com 1:00 400

In other words, I need to calculate the sum of visits for each unique
tuple of (url,time).

I can do it with this code, but it's very slow, and doesn't seem like
the right approach:

keys = list()
getkey = function(m, cols, index) { paste(m[index, cols], collapse = ",") }
for (i in 1:nrow(data)) { keys[[getkey(data, 1:2, i)]] = 0 }
for (i in 1:nrow(data)) { keys[[getkey(data, 1:2, i)]] =
keys[[getkey(data, 1:2, i)]] + data[i, 4] }

I'm sure there's a more functional-programming approach to this
problem! Any ideas?

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] strings as factors

2006-12-11 Thread Bill.Venables
Here is a possibility:

 test <- expand.grid(id = 1:2, sex = c('male', 'female'))
 sapply(test, class)
       id       sex 
"integer"  "factor" 
 test <- transform(test, sex = as.character(sex))
 sapply(test, class)
         id         sex 
  "integer" "character" 


But I am surprised at the reason you give for needing it as a character
vector, because factors often act as character vectors under matching
anyway.

 sexf <- factor(test[[2]])
 sexf
[1] male   male   female female
Levels: female male

 which(sexf %in% "male")
[1] 1 2
 which(sexf == "male")
[1] 1 2
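A closing note (added here; this argument postdates the thread): expand.grid() later gained its own stringsAsFactors switch, which answers the question directly:

```r
## keep character columns as character (newer versions of R)
test <- expand.grid(ID = 1:2, sex = c("male", "female"),
                    stringsAsFactors = FALSE)
class(test$sex)
# [1] "character"
```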

Bill Venables
 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Alexander Nervedi
Sent: Tuesday, 12 December 2006 10:09 AM
To: r-help@stat.math.ethz.ch
Subject: [R] strings as factors

Hi,
To be able to match cases with a benchmark I need to have a data.frame
with a character id variable.  However, I am surprised at why this seems
to be so hard.  In fact I was unable to succeed.  Here is what I tried:

test1 <- expand.grid(ID = 1:2, sex = c("male", "female"))
is(test1[,2])
[1] "factor"   "oldClass"
test2 <- expand.grid(ID = 1:2, sex = c('male', 'female'))
is(test2[,2])
[1] "factor"   "oldClass"
test3 <- expand.grid(ID = 1:2, sex = I(c("male", "female")))
is(test3[,2])
[1] "factor"   "oldClass"
test4 <- expand.grid(ID = 1:2, sex = I(c('male', 'female')))
is(test4[,2])
[1] "factor"   "oldClass"
options(stringsAsFactors = FALSE)
options("stringsAsFactors")
$stringsAsFactors
[1] FALSE

test5 <- expand.grid(ID = 1:2, sex = I(c('male', 'female')))
is(test5[,2])
[1] "factor"   "oldClass"


is there anyway I can get sex to be a character?

Arnab


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] Force square crosstabulation

2006-12-02 Thread Bill.Venables
Use factors with specified levels.

 lev <- letters[1:4]
 table(factor(letters[1:4], levels = lev),
   factor(letters[c(1:3,3)], levels = lev)) 
   
a b c d
  a 1 0 0 0
  b 0 1 0 0
  c 0 0 1 0
  d 0 0 1 0

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Manuel Morales
Sent: Sunday, 3 December 2006 11:27 AM
To: r-help@stat.math.ethz.ch
Subject: [R] Force square crosstabulation

Hello list members,

I'm looking for a way to force the results of a crosstabulation to be
square - that is, to include 0 values.

For example:

table(letters[1:4],letters[c(1:3,3)])

yields:
a b c
  a 1 0 0
  b 0 1 0
  c 0 0 1
  d 0 0 1

I would like to return:
a b c d
  a 1 0 0 0
  b 0 1 0 0
  c 0 0 1 0
  d 0 0 1 0

Any suggestions?

Thanks!
-- 
Manuel A. Morales
http://mutualism.williams.edu

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Command Line Prompt Symbol

2006-11-09 Thread Bill.Venables
The simplest way to do this is to put

options(prompt = "R> ")

in your .Rprofile file.  If you don't have an .Rprofile file, 
then persuading Windows to let you call a file by that name can 
be frustrating.  I suggest you delegate the job to R itself and 
use something like this

---

wd <- getwd()
setwd(Sys.getenv("R_USER"))  ## change working directories 
cat('\noptions(prompt = "R> ")\n',
    file = ".Rprofile", append = TRUE)
setwd(wd)

--- 
Putting the .Rprofile in the R_USER directory ensures that it 
will be used prior to all invocations of R you make.

Bill Venables, 
CMIS, CSIRO Laboratories, 
PO Box 120, Cleveland, Qld. 4163 
AUSTRALIA 
Office Phone (email preferred): +61 7 3826 7251 
Fax (if absolutely necessary):+61 7 3826 7304 
Mobile (rarely used):+61 4 1963 4642 
Home Phone:  +61 7 3286 7700 
mailto:[EMAIL PROTECTED] 
http://www.cmis.csiro.au/bill.venables/ 


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Jacob van Wyk
Sent: Friday, 10 November 2006 3:33 PM
To: r-help@stat.math.ethz.ch
Subject: [R] Command Line Prompt Symbol

Hi
I run R in Windows.
Is there a simple way of changing the prompt symbol > to, say, R> ?
(Not just for a temporary session, but every time R command window is
opened.) The documentation of doing this is rather sparse.
Much appreciated for your assistance.
Jacob
 
 
Jacob L van Wyk
Department of Statistics
University of Johannesburg, APK
P O Box 524
Auckland Park 2006
South Africa
Tel: +27 11 489 3080
Fax: +27 11 489 2832
 
 

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] Type II and III sum of square in Anova (R, car package)

2006-08-29 Thread Bill.Venables
I cannot resist a very brief entry into this old and seemingly
immortal issue, but I will be very brief, I promise!

Amasco Miralisus suggests:

 As I understood form R FAQ, there is disagreement among Statisticians
 which SS to use

(http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-does-the-output-from-a
nova_0028_0029-depend-on-the-order-of-factors-in-the-model_003f).

To let this go is to concede way too much.  The 'disagreement' is
really over whether this is a sensible question to ask in the first
place.  One side of the debate suggests that the real question is what
hypotheses does it make sense to test and within what outer
hypotheses.  Settle that question and no issue on types of sums of
squares arises.

This is often a hard question to get your head around, and the
attraction of offering a variety of 'types of sums of squares' holds
out the false hope that perhaps you don't need to do so.  The bad
news is that for good science and good decision making, you do.

Bill Venables.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] singular matrix

2006-08-28 Thread Bill.Venables
Nongluck Klibbua reports:

 -Original Message- 

 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Nongluck
 Klibbua
 Sent: Tuesday, 29 August 2006 11:12 AM
 To: R-help@stat.math.ethz.ch
 Subject: [R] singular matrix
 
 Dear R-users,
 I try to use solve() to get the solution of this system, but the error
 below happens.
 How can I solve this problem?
 [1] a
   [,1]
 [1,] 0.8109437
 [2,] 5.7569740
 [1] b
   [,1]  [,2]
 [1,] 0.3141293  2.230037
 [2,] 2.2300367 15.831264
 
 Error in solve.default(b, a) : system is computationally singular:
 reciprocal condition number = 1.37415e-018

The irony seems to be that if you quote them to only this number of
digits then the system is no longer quite singular enough for the
problem to appear, at least on Windows R 2.3.1:

 a
  [,1]
[1,] 0.8109437
[2,] 5.7569740
 b
  [,1]  [,2]
[1,] 0.3141293  2.230037
[2,] 2.2300367 15.831264
 solve(b, a)
  [,1]
[1,]  2.5831242104
[2,] -0.0002203103
 b %*% solve(b, a) - a  ### check
  [,1]
[1,] -1.110223e-16
[2,] -8.881784e-16

In general, though, in dealing with singular systems you might want to
look at the function ginv in the MASS library.
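A sketch of that route (added here; ginv() gives the Moore-Penrose pseudoinverse, hence a minimum-norm least-squares solution even when b is numerically singular):

```r
library(MASS)  # for ginv()

## the values reported in the question
a <- matrix(c(0.8109437, 5.7569740), 2, 1)
b <- matrix(c(0.3141293, 2.2300367, 2.230037, 15.831264), 2, 2)

## well-defined even when solve(b, a) reports computational singularity
sol <- ginv(b) %*% a
b %*% sol - a   # residual (small if a lies in the numerical range of b)
```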

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Check values in colums matrix

2006-08-24 Thread Bill.Venables
As a minor footnote to both of these, I would add that both assume
that all the columns of the dataset are numeric.  It doesn't cost much
to generalize it to cover any matrix structure, of any mode:

constantColumns <- function(Xmat) 
which(apply(Xmat, 2, function(z) length(unique(z)) == 1))
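Applied to the poster's data below, this picks out exactly the constant columns (a usage sketch; the function is restated so the example is self-contained):

```r
constantColumns <- function(Xmat)
    which(apply(Xmat, 2, function(z) length(unique(z)) == 1))

obj.tr <- t(cbind(c(1,1,1,4,0,0,1,4,-1),
                  c(0,1,1,4,1,0,1,4,-1),
                  c(1,1,1,4,2,0,1,4,-1),
                  c(1,1,1,4,3,0,1,4,-1),
                  c(1,1,1,4,6,0,1,5,-1),
                  c(1,1,1,4,6,0,1,6,-1),
                  c(1,1,1,4,6,0,1,7,-1),
                  c(1,1,1,4,6,0,1,8,-1)))

constantColumns(obj.tr)
# [1] 2 3 4 6 7 9
```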

 -Original Message-
 From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Berton Gunter
 Sent: Friday, 25 August 2006 9:37 AM
 To: 'Gabor Grothendieck'; 'Muhammad Subianto'
 Cc: r-help@stat.math.ethz.ch
 Subject: Re: [R] Check values in colums matrix
 
 Absolutely. But do note that if the values in obj are the product of
 numerical computations then columns of equal values may turn out to be
only
 **nearly** equal and so the sd may turn out to be **nearly** 0 and not
 exactly 0. This is a standard issue in numerical computation, of
course, and
 has been commented on in this list at least dozens of times, but it's
still
 a gotcha for the unwary (so now dozens +1).
 
 -- Bert Gunter
 Genentech Non-Clinical Statistics
 South San Francisco, CA
  
  
 
  -Original Message-
  From: [EMAIL PROTECTED] 
  [mailto:[EMAIL PROTECTED] On Behalf Of Gabor 
  Grothendieck
  Sent: Thursday, August 24, 2006 4:28 PM
  To: Muhammad Subianto
  Cc: r-help@stat.math.ethz.ch
  Subject: Re: [R] Check values in colums matrix
  
  Try sd(obj.tr) which will give a vector of standard 
  deviations, one per column.
  A column's entry will be zero if and only if all values in the
column
  are the same.
  
  On 8/24/06, Muhammad Subianto [EMAIL PROTECTED] wrote:
   Dear all,
   I apologize if my question is quite simple.
   I have a dataset (20 columns x 1000 rows) in which
   some of the columns have the same value and the others
   have different values.
   Here are some piece of my dataset:
   obj <- cbind(c(1,1,1,4,0,0,1,4,-1),
   c(0,1,1,4,1,0,1,4,-1),
   c(1,1,1,4,2,0,1,4,-1),
   c(1,1,1,4,3,0,1,4,-1),
   c(1,1,1,4,6,0,1,5,-1),
   c(1,1,1,4,6,0,1,6,-1),
   c(1,1,1,4,6,0,1,7,-1),
   c(1,1,1,4,6,0,1,8,-1))
   obj.tr <- t(obj)
   obj.tr
        [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
   [1,]    1    1    1    4    0    0    1    4   -1
   [2,]    0    1    1    4    1    0    1    4   -1
   [3,]    1    1    1    4    2    0    1    4   -1
   [4,]    1    1    1    4    3    0    1    4   -1
   [5,]    1    1    1    4    6    0    1    5   -1
   [6,]    1    1    1    4    6    0    1    6   -1
   [7,]    1    1    1    4    6    0    1    7   -1
   [8,]    1    1    1    4    6    0    1    8   -1
   
  
   How can I do to check columns 2,3,4,6,7 and 9 have
   the same value, and columns 1,5 and 8 have different values.
  
   Best, Muhammad Subianto
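
The suggestion can be exercised on the data from the post. Note that in current versions of R, sd() is no longer applied column-wise to a matrix, so apply() is used here; this is a sketch of the same idea:

```r
obj <- cbind(c(1,1,1,4,0,0,1,4,-1),
             c(0,1,1,4,1,0,1,4,-1),
             c(1,1,1,4,2,0,1,4,-1),
             c(1,1,1,4,3,0,1,4,-1),
             c(1,1,1,4,6,0,1,5,-1),
             c(1,1,1,4,6,0,1,6,-1),
             c(1,1,1,4,6,0,1,7,-1),
             c(1,1,1,4,6,0,1,8,-1))
obj.tr <- t(obj)

# standard deviation of each column; zero iff the column is constant
col.sd <- apply(obj.tr, 2, sd)
which(col.sd == 0)   # columns 2, 3, 4, 6, 7 and 9, as in the question
```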

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] glm inside one self-defined function

2006-08-22 Thread Bill.Venables
Mike Wolfgang asks:


 From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Mike Wolfgang
 Sent: Wednesday, 23 August 2006 1:31 PM
 To: R-help list
 Subject: [R] glm inside one self-defined function
 
 Hi list,
 
 I've searched in R-help and found some related discussions but still
could
 not understand this type of error. My own function is pretty complex,
so I
 would not put it here, but the basic algorithm is like this:
 myfun <- function(k){
   mydata <- ...  # by some way I create a data frame
   mymodel <- glm(y ~ ., family = binomial(), data = mydata)
   ...  # some other stuff
 }

I think you are leaving out something.  Here is a test of what you
claim gives a problem (R 2.3.1, Windows):

> myfun <- function(n) {
+   z <- rnorm(n)
+   mydata <- data.frame(x = z, 
+ y = rbinom(n, size = 1, prob = exp(z)/(1 + exp(z))))
+   fm <- glm(y ~ x, binomial, mydata)
+   fm
+ }
>
> myfun(100)

Call:  glm(formula = y ~ x, family = binomial, data = mydata) 

Coefficients:
(Intercept)x  
 0.1587   1.0223  

Degrees of Freedom: 99 Total (i.e. Null);  98 Residual
Null Deviance:  137.6 
Residual Deviance: 118.3    AIC: 122.3 

Not even a murmur of complaint.  (This also works in S-PLUS 7.0 but
earlier versions of S-PLUS gave a problem rather like the one you note,
curiously.)

Look again at your code and see if the abstract version you give
really matches what you did, may I suggest?

 
 as I execute this function, it gives error like this
 Error in inherits(x, "data.frame") : object "mydata" not found
 
 So I guess glm here tries to find mydata in the parent environment.
Why
 doesn't it take mydata inside the function? How to let glm correctly
 locate it? Is this (scope/environment) mentioned in R manual? Thanks,
 
 Mike



Re: [R] Help with workaround for: Function '`[`' is not in the derivatives table

2006-08-15 Thread Bill.Venables
Earl F. Glynn asks: 
 -Original Message-
 From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Earl F. Glynn
 Sent: Tuesday, 15 August 2006 8:44 AM
 To: r-help@stat.math.ethz.ch
 Subject: [R] Help with workaround for: Function '`[`' is not in
the derivatives table
 
 # This works fine:
  a <- 1
  b <- 2
  c <- 3
  E <- expression(a * exp(b*X) + c)
  X <- c(0.5, 1.0, 2.0)
  eval(E)
 [1]  5.718282 10.389056 57.598150
  D(E, "b")
 a * (exp(b * X) * X)
  eval(D(E, "b"))
 [1]   1.359141   7.389056 109.196300
 
 # But if (a,b,c) are replaced with (A[1], A[2], A[3]), how can I get a

 derivative using D?

It's well to note what D can differentiate with respect to.  The
second argument is, to quote the help page, "a character string giving
the name of a variable".  "A[1]" is a character string, but it is
not the name of a variable.  When parsed it becomes a call to the
function, literally, `[`.

  A <- c(1, 2, 3)
  E <- expression(A[1] * exp(A[2]*X) + A[3])
  X <- c(0.5, 1.0, 2.0)
  eval(E)
 [1]  5.718282 10.389056 57.598150

No problem, because the evaluator does know how to evaluate `[`, along
with everything else in this expr.

 
 
 
 # Why doesn't this work?  Any workarounds?
  D(E, "A[2]")
 Error in D(E, "A[2]") : Function '`[`' is not in the derivatives table

It fails because 'A[2]' is not a (character string giving the) name of
a variable.  I think the error message could be a bit more
informative, but ... all you need to do is read the help page, really.

  If I want to have a long vector of coefficients, A, (perhaps
 dozens) how can I use D to compute partial derivatives?

Essentially you need to turn calls to '[' into names of variables, do
the derivative business and then turn them back again.  This is not
easy to do in complete generality, but if you are only talking about
singly subscripted arrays, you can get somewhere.  Here is an outline
of what I mean.

> Ident <- "([A-z][A-z0-9_.]*)"
> Subsc <- "([A-z][A-z0-9_.]*|[0-9]+)"
> patn <- paste(Ident, "\\[", Subsc, "\\]", sep = "")
> repl <- "\\1__\\2"
> E <- expression(A[1] * exp(A[2]*X) + A[3])
> Es <- deparse(E[[1]])
> Es
[1] "A[1] * exp(A[2] * X) + A[3]"
> Ess <- gsub(patn, repl, Es)
> Ess
[1] "A__1 * exp(A__2 * X) + A__3"
> Ex <- parse(text = Ess)[[1]]
> Ex
A__1 * exp(A__2 * X) + A__3

OK, the calls to `[` have been replaced by variables with two
underscores in the middle.  We hope this works - there is a strong
assumption here on just how complicated your indices are, for example.
We are assuming they are either identifiers (not using two successive
underscores in their names) or numbers.  If they are not, you need to
get even craftier.


> Ex1 <- D(Ex, "A__2")
> Ex1
A__1 * (exp(A__2 * X) * X)
> Ex1s <- deparse(Ex1)
> Ex1s
[1] "A__1 * (exp(A__2 * X) * X)"
> pat1 <- paste(Ident, "__", Subsc, sep = "")
> rep1 <- "\\1\\[\\2\\]"
> Ex1ss <- gsub(pat1, rep1, Ex1s)
> Ex1ss
[1] "A[1] * (exp(A[2] * X) * X)"
> Ex2 <- parse(text = Ex1ss)[[1]]
> Ex2
A[1] * (exp(A[2] * X) * X)

Which is the required result.  This is messy and gets messier if you
are looking for some kind of generality, but you need to remember, R
is not, and never will be, a replacement for Maple.
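
As a numerical sanity check on the round-trip above, the recovered derivative can be compared against a central-difference approximation (a sketch; the values of A and X are arbitrary):

```r
A <- c(1, 2, 3)
X <- c(0.5, 1.0, 2.0)
E   <- quote(A[1] * exp(A[2] * X) + A[3])    # the original expression
Ex2 <- quote(A[1] * (exp(A[2] * X) * X))     # recovered derivative w.r.t. A[2]

# evaluate E with A[2] perturbed, everything else held fixed
f <- function(a2) eval(E, list(A = replace(A, 2, a2), X = X))
h <- 1e-6
num <- (f(A[2] + h) - f(A[2] - h)) / (2 * h)  # central difference

all.equal(eval(Ex2), num, tolerance = 1e-6)   # symbolic and numeric agree
```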

 
 
 Thanks for any help with this.

Best of luck in automating this, but the tools are there.

Bill Venables.



Re: [R] Constrain coefs. in linear model to sum to 0

2006-08-07 Thread Bill.Venables
 
Gorjanc Gregor asks:

 
 Hello!
 
 I would like to use a constraint to sum coefficients of a factor to 0
instead
 of the classical corner constraint, i.e. I would like to fit a model like
 
 lm(y ~ 1 + effectA + effectB)
 
 and say get parameters
 
 intercept
 effectA_1
 effectA_2
 effectB_1
 effectB_2
 effectB_3
 
 where effectA_1 represents deviation of level A_1 from intercept and 
 sum(effectA_1, effectA_2) = 0 and the same for factor B.
 
 Is this possible to do?

Yes, but not quite as simply as you would like.  If you set

options(contrasts = c("contr.sum", "contr.poly"))

for example, then factor models are parametrised as you wish above,
BUT you don't get all the effects directly

In your case above, for example, if fm is the fitted model object, then

coef(fm)

will give you intercept, effectA_2, effectB_2, effectB_3.  The
remaining effects*_1 you will need to calculate as the negative of the
sum of all the others.

This gets a bit more complicated when you have crossed terms, a*b, but
the same principle applies.

Bill Venables.
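
A small illustration of this bookkeeping (made-up data; the grep() patterns assume the coefficient names that contr.sum produces, such as A1, B1, B2):

```r
old <- options(contrasts = c("contr.sum", "contr.poly"))
set.seed(1)
dat <- data.frame(y = rnorm(12), A = gl(2, 6), B = gl(3, 4))
fm <- lm(y ~ A + B, dat)
cf <- coef(fm)

# one sum-to-zero effect per factor is omitted from coef(fm);
# recover it as minus the sum of the reported ones
A.eff <- cf[grep("^A", names(cf))]
B.eff <- cf[grep("^B", names(cf))]
effectsA <- c(A.eff, omitted = -sum(A.eff))   # full set, sums to zero
effectsB <- c(B.eff, omitted = -sum(B.eff))

options(old)   # restore the previous contrasts setting
```

The parameterisation changes but the fit itself does not: refitting with the default treatment contrasts gives identical fitted values.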


 
 Lep pozdrav / With regards,
 Gregor Gorjanc
 
 --
 University of Ljubljana PhD student
 Biotechnical FacultyURI: http://www.bfro.uni-lj.si/MR/ggorjan
 Zootechnical Department mail: gregor.gorjanc at bfro.uni-lj.si
 Groblje 3   tel: +386 (0)1 72 17 861
 SI-1230 Domzale fax: +386 (0)1 72 17 888
 Slovenia, Europe
 --
 One must learn by doing the thing; for though you think you know it,
  you have no certainty until you try. Sophocles ~ 450 B.C.


Well, now's your chance!



Re: [R] Fitting models in a loop

2006-08-01 Thread Bill.Venables
 
Markus Gesmann writes:

 Murray,
 
 How about creating an empty list and filling it during your loop:
 
  mod <- list()
  for (i in 1:6) {
 mod[[i]] <- lm(y ~ poly(x,i))
 print(summary(mod[[i]]))
 }
 
 All your models are then stored in one object and you can use lapply
to
 do something on them, like:
  lapply(mod, summary) or lapply(mod, coef)

I think it is important to see why this deceptively simple 
solution does not achieve the result that Murray wanted.

Take any fitted model object, say mod[[4]].  For this object the 
formula component of the call will be, literally, "y ~ poly(x, i)", 
and not "y ~ poly(x, 4)", as would be required to use the object,
e.g. for prediction.  In fact all objects have the same formula.

You could, of course, re-create i and some things would be OK, 
but getting pretty messy.

You would still have a problem if you wanted to plot the fit with 
termplot(), for example, as it would try to do a two-dimensional 
plot of the component if both arguments to poly were variables.

 
 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of
 [EMAIL PROTECTED]
 Sent: 01 August 2006 06:16
 To: [EMAIL PROTECTED]; r-help@stat.math.ethz.ch
 Subject: Re: [R] Fitting models in a loop
 
 
 Murray,
 
 Here is a general paradigm I tend to use for such problems.  It
extends
to fairly general model sequences, including different responses, &c
 
 First a couple of tiny, tricky but useful functions:
 
 subst <- function(Command, ...) do.call("substitute", list(Command,
 list(...)))
 
 abut <- function(...)  ## jam things tightly together
   do.call("paste", c(lapply(list(...), as.character), sep = "")) 
 
 Name <- function(...) as.name(do.call("abut", list(...)))
 
 Now the gist.
 
 fitCommand <- quote({
 MODELi <- lm(y ~ poly(x, degree = i), theData)
 print(summary(MODELi))
 })
 for(i in 1:6) {
 thisCommand <- subst(fitCommand, MODELi = Name("model_", i), i = i)
 print(thisCommand)  ## only as a check
 eval(thisCommand)
 }
 
 At this point you should have the results and
 
 objects(pat = "^model_")
 
 should list the fitted model objects, all of which can be updated,
 summarised, plotted, &c, because the information on their construction
 is all embedded in the call.
 
 Bill.
 
 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Murray
Jorgensen
 Sent: Tuesday, 1 August 2006 2:09 PM
 To: r-help@stat.math.ethz.ch
 Subject: [R] Fitting models in a loop
 
 If I want to display a few polynomial regression fits I can do
something
 
 like
 
 for (i in 1:6) {
 mod <- lm(y ~ poly(x,i))
 print(summary(mod))
 }
 
 Suppose that I don't want to over-write the fitted model objects, 
 though. How do I create a list of blank fitted model objects for later

 use in a loop?
 
 Murray Jorgensen
 --



Re: [R] Fitting models in a loop

2006-08-01 Thread Bill.Venables
This (below) also runs into trouble if you try to predict with new data
since you have no rule for re-constructing the formula.  Also, you can't
plot the term as a single contributor to the linear predictor with
termplot().

I'm sure given enough ingenuity you can get round these two, but why
avoid the language manipulation solution, when it does the lot?

Bill.


-Original Message-
From: Gabor Grothendieck [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, 2 August 2006 12:01 PM
To: Venables, Bill (CMIS, Cleveland)
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED];
r-help@stat.math.ethz.ch
Subject: Re: [R] Fitting models in a loop

A simple way around this is to pass it as a data frame.
In the code below the only change we made was to change
the formula from y ~ poly(x, i) to y ~ . and pass poly(x,i)
in a data frame as argument 2 of lm:

# test data
set.seed(1)
x <- 1:10
y <- x^3 + rnorm(10)

# run same code except change the lm call
mod <- list()
for (i in 1:3) {
mod[[i]] <- lm(y ~ ., data.frame(poly(x, i)))
print(summary(mod[[i]]))
}

After running the above we can test that it works:

> for(i in 1:3) print(formula(mod[[i]]))
y ~ X1
y ~ X1 + X2
y ~ X1 + X2 + X3

On 8/1/06, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:

 Markus Gesmann writes:

  Murray,
 
  How about creating an empty list and filling it during your loop:
 
   mod <- list()
   for (i in 1:6) {
  mod[[i]] <- lm(y ~ poly(x,i))
  print(summary(mod[[i]]))
  }
 
  All your models are then stored in one object and you can use lapply
 to
  do something on them, like:
   lapply(mod, summary) or lapply(mod, coef)

 I think it is important to see why this deceptively simple
 solution does not achieve the result that Murray wanted.

 Take any fitted model object, say mod[[4]].  For this object the
 formula component of the call will be, literally, "y ~ poly(x, i)",
 and not "y ~ poly(x, 4)", as would be required to use the object,
 e.g. for prediction.  In fact all objects have the same formula.

 You could, of course, re-create i and some things would be OK,
 but getting pretty messy.

 You would still have a problem if you wanted to plot the fit with
 termplot(), for example, as it would try to do a two-dimensional
 plot of the component if both arguments to poly were variables.

 
  -Original Message-
  From: [EMAIL PROTECTED]
  [mailto:[EMAIL PROTECTED] On Behalf Of
  [EMAIL PROTECTED]
  Sent: 01 August 2006 06:16
  To: [EMAIL PROTECTED]; r-help@stat.math.ethz.ch
  Subject: Re: [R] Fitting models in a loop
 
 
  Murray,
 
  Here is a general paradigm I tend to use for such problems.  It
 extends
  to fairly general model sequences, including different responses, &c
 
  First a couple of tiny, tricky but useful functions:
 
  subst <- function(Command, ...) do.call("substitute", list(Command,
  list(...)))
 
  abut <- function(...)  ## jam things tightly together
    do.call("paste", c(lapply(list(...), as.character), sep = ""))
 
  Name <- function(...) as.name(do.call("abut", list(...)))
 
  Now the gist.
 
  fitCommand <- quote({
  MODELi <- lm(y ~ poly(x, degree = i), theData)
  print(summary(MODELi))
  })
  for(i in 1:6) {
  thisCommand <- subst(fitCommand, MODELi = Name("model_", i), i = i)
  print(thisCommand)  ## only as a check
  eval(thisCommand)
  }
 
  At this point you should have the results and
 
  objects(pat = "^model_")
 
  should list the fitted model objects, all of which can be updated,
  summarised, plotted, &c, because the information on their
construction
  is all embedded in the call.
 
  Bill.
 
  -Original Message-
  From: [EMAIL PROTECTED]
  [mailto:[EMAIL PROTECTED] On Behalf Of Murray
 Jorgensen
  Sent: Tuesday, 1 August 2006 2:09 PM
  To: r-help@stat.math.ethz.ch
  Subject: [R] Fitting models in a loop
 
  If I want to display a few polynomial regression fits I can do
 something
 
  like
 
  for (i in 1:6) {
  mod <- lm(y ~ poly(x,i))
  print(summary(mod))
  }
 
  Suppose that I don't want to over-write the fitted model objects,
  though. How do I create a list of blank fitted model objects for
later

  use in a loop?
 
  Murray Jorgensen
  --

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




Re: [R] Fitting models in a loop

2006-07-31 Thread Bill.Venables
Murray,

Here is a general paradigm I tend to use for such problems.  It extends
to fairly general model sequences, including different responses, &c

First a couple of tiny, tricky but useful functions:

subst <- function(Command, ...) do.call("substitute", list(Command,
list(...)))

abut <- function(...)  ## jam things tightly together
  do.call("paste", c(lapply(list(...), as.character), sep = "")) 

Name <- function(...) as.name(do.call("abut", list(...)))

Now the gist.

fitCommand <- quote({
MODELi <- lm(y ~ poly(x, degree = i), theData)
print(summary(MODELi))
})
for(i in 1:6) {
thisCommand <- subst(fitCommand, MODELi = Name("model_", i), i = i)
print(thisCommand)  ## only as a check
eval(thisCommand)
}

At this point you should have the results and

objects(pat = "^model_")

should list the fitted model objects, all of which can be updated,
summarised, plotted, &c, because the information on their construction
is all embedded in the call.

Bill.
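
An alternative sketch of the same idea using bquote(), which likewise splices the loop index into each recorded call so every fitted object knows its own degree (hypothetical data):

```r
set.seed(1)
theData <- data.frame(x = runif(30))
theData$y <- rnorm(30, mean = theData$x)

models <- vector("list", 3)
for (i in 1:3) {
    # .(i) substitutes the current value of i into the call before evaluation
    cmd <- bquote(lm(y ~ poly(x, degree = .(i)), theData))
    models[[i]] <- eval(cmd)
}

# each formula now records its actual degree
sapply(models, function(m) deparse(formula(m)))
```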

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Murray Jorgensen
Sent: Tuesday, 1 August 2006 2:09 PM
To: r-help@stat.math.ethz.ch
Subject: [R] Fitting models in a loop

If I want to display a few polynomial regression fits I can do something

like

for (i in 1:6) {
mod <- lm(y ~ poly(x,i))
print(summary(mod))
}

Suppose that I don't want to over-write the fitted model objects, 
though. How do I create a list of blank fitted model objects for later 
use in a loop?

Murray Jorgensen
-- 
Dr Murray Jorgensen  http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [EMAIL PROTECTED]Fax 7 838 4155
Phone  +64 7 838 4773 wk   Home +64 7 825 0441   Mobile 021 1395 862



Re: [R] Creating 3D Gaussian Plot

2006-01-28 Thread Bill.Venables
Getting a picture like this is pretty easy.  e.g.

x <- y <- seq(-5, 5, len = 200)
X <- expand.grid(x = x, y = y)
X <- transform(X, z = dnorm(x, -2.5)*dnorm(y) - dnorm(x, 2.5)*dnorm(y))
z <- matrix(X$z, nrow = 200)

persp(x, y, z, col = "lightgoldenrod", border = NA,
   theta = 30, phi = 15, ticktype = "detailed", 
   ltheta = -120, shade = 0.25)

You can vary things as you wish.  
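
On the follow-up point about picking grid points at random: assuming plain uniform sampling over the grid is what is wanted, matrix indexing does it directly (a sketch building on the surface construction above):

```r
x <- y <- seq(-5, 5, len = 200)
X <- expand.grid(x = x, y = y)
X <- transform(X, z = dnorm(x, -2.5)*dnorm(y) - dnorm(x, 2.5)*dnorm(y))
z <- matrix(X$z, nrow = 200)   # rows index x, columns index y

set.seed(42)                   # arbitrary seed, for reproducibility
n <- 10
ii <- cbind(sample(length(x), n, replace = TRUE),
            sample(length(y), n, replace = TRUE))
samp <- data.frame(x = x[ii[, 1]], y = y[ii[, 2]], z = z[ii])
samp                           # n randomly chosen grid points
```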

I don't follow the remark about picking grid points at random for
analysis, though.  On simple, entirely deterministic things like this
wouldn't you just be analysing the randomness that you inject into it by
the choice process, effectively?

Bill Venables.



-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Laura Quinn
Sent: Sunday, 29 January 2006 12:28 AM
To: Duncan Murdoch
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] Creating 3D Gaussian Plot


My apologies.

With further apologies for the poor graphics, this link demonstrates the
sort of 3d mesh which I am hoping to replicate - I would like to be able
to replicate a number of these of varying intensity. Demonstrating
different levels of potential via the steepness of the slopes.

http://maxwell.ucdavis.edu/~electro/potential/images/steep.jpg

I then wish to pick a number of grid points at random from the output to
perform a further analysis upon.

I hope this makes things a little clearer!

Again, any help gratefully received, thank you.


Laura Quinn
Institute of Atmospheric Science
School of Earth and Environment
University of Leeds
Leeds
LS2 9JT

tel: +44 113 343 1596
fax: +44 113 343 6716
mail: [EMAIL PROTECTED]

On Sat, 28 Jan 2006, Duncan Murdoch wrote:

 On 1/28/2006 8:55 AM, Laura Quinn wrote:
  Hello,
 
  I requested help a couple of weeks ago creating a dipole field in R
but
  received no responses. Eventually I opted to create a 3d sinusoidal
plot
  and concatenate this with its inverse as a means for a next best
  situation. It seems that this isn't sufficient for my needs and I'm
really
  after creating a continuous 3d gaussian mesh with a positive and
  negative dipole.

 The names you're using don't mean anything to me; perhaps there just
 aren't enough atmospheric scientists on the list and that's why you
 didn't get any response.  If you don't get a response this time, you
 should describe what you want in basic terms, and/or point to examples
 of it on the web.

 Duncan Murdoch

 
  Can anyone offer any pointers at all?
 
  Laura Quinn
  Institute of Atmospheric Science
  School of Earth and Environment
  University of Leeds
  Leeds
  LS2 9JT
 
  tel: +44 113 343 1596
  fax: +44 113 343 6716
  mail: [EMAIL PROTECTED]
 
  __
  R-help@stat.math.ethz.ch mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html





Re: [R] How do you convert this from S Plus to R - any help appreciated . thanks

2006-01-27 Thread Bill.Venables
This particular call to exportData is probably equivalent in effect to
the R call:

write.table(MU.Cost, 
file = paste("C:/RAUDSL/S",
as.character(MU.Cost$Run.Id[1]),
".", as.character(MU.Cost$MU.Id[1]),
".MU.PRICE.OUTPUT.txt", sep = ""),
append = FALSE)

as Barry sort of guesses, but in general exportData handles a wide
variety of data export facilities, including writing to odbc connexions,
so other uses of exportData would need the RODBC library in R and
functions like sqlSave or sqlUpdate.

write.table is also an S-PLUS function as well, but exportData is
generally a whole lot faster.

Bill Venables.
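
For concreteness, this is how the paste() call assembles the output file name; the Run.Id and MU.Id values here are made up:

```r
MU.Cost <- data.frame(Run.Id = 7, MU.Id = "A3")   # hypothetical values
fname <- paste("C:/RAUDSL/S", as.character(MU.Cost$Run.Id[1]),
               ".", as.character(MU.Cost$MU.Id[1]),
               ".MU.PRICE.OUTPUT.txt", sep = "")
fname   # "C:/RAUDSL/S7.A3.MU.PRICE.OUTPUT.txt"
```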


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Barry Rowlingson
Sent: Friday, 27 January 2006 7:57 PM
To: r-help@stat.math.ethz.ch
Subject: Re: [R] How do you convert this from S Plus to R - any help
appreciated . thanks


Briggs, Meredith M wrote:


exportData(MU.Cost, paste("C:/RAUDSL/S", as.character(MU.Cost$Run.Id[1]),
".", as.character(MU.Cost$MU.Id[1]), ".MU.PRICE.OUTPUT.txt", sep = ""),
append = FALSE, type = "ASCII", quote = FALSE)

  Looks like perfectly good R to me.

  Except there's no exportData function. I assume this is an Splus 
function that R doesn't have, in which case telling us what it does 
might help. What does the Splus manual have to say about it?

I'm guessing R's write.table might be of use.

  Assuming its exportData that has you stuck - the other bits should 
all work in R no problem, all it does is construct a path from parts of 
the MU.Cost object.

Barry



Re: [R] D(dnorm...)?

2006-01-25 Thread Bill.Venables
Hi Spencer,

I think if you have a problem that needs a lot of symbolic manipulation
you are probably better off driving it directly from something like
Maple or Mathematica (I prefer maple, actually) than trying to drive it
from R.  It just gets too clumsy.  On the other hand it is very handy
having a simple differentiator available in R, like D(...) for small
jobs that are not worth taking to a big system like Maple.  The point I
was trying to make in the previous message was that with a little
thought it could be made a lot more convenient.  

This arose in connexion with a real problem.  We needed to differentiate
a pdf that had the normal density function in it, but was otherwise
quite simple and we had to hack the code in another system (not unlike
R, as it happens) to handle it.  The hack was quite small and it became
clear that with a slight change of design users would not need to do
hacks like this for simple extensions such as the one we needed.  As it
was a hack, we only put it in for the standard density and I suspect
that is the reason why even now the derivative tables in both R and the
other system (not unlike R) only handle normal density and distribution
functions in one variable.

I'm sort of avoiding your question, because I don't know how hard it
would be to link R with Yacas, either, but if you really wanted to go
that way I see that Yacas can be driven over the net via a Java applet.
Something like this might provide the simplest link, if not the most
efficient.  But note that Yacas uses the eccentric Mathematica notation,
where the functions are Capitalised, for example, as nouns are in
German.  That's a small bother you could do without, too.

Regards,
Bill Venables.


-Original Message-
From: Spencer Graves [mailto:[EMAIL PROTECTED] 
Sent: Thursday, 26 January 2006 2:14 PM
To: Venables, Bill (CMIS, Cleveland)
Cc: [EMAIL PROTECTED]; r-help@stat.math.ethz.ch;
[EMAIL PROTECTED]
Subject: Re: [R] D(dnorm...)?


Hello, Bill:

  I'm not qualified to make this suggestion since I'm incapable
of 
turning it into reality, but what about creating a link between R and 
one of the Mathematica clones like Yacas?  I can immagine that it could 
be substantially more difficult than linking R to other software like 
Excel, but ... .

  Spencer Graves

[EMAIL PROTECTED] wrote:
 Yes Bert, this time you are missing something (unusually) ...
 
 As Brian Ripley pointed out 'dnorm' is in the derivative table, *but*
 only as a function of one variable.  So if you want to find the
 derivative of 
 
 dnorm(x, mean, sigma)
 
 you have to write it as 1/sigma * dnorm((x - mu)/sigma).  Here is a
 little example:
 
 
D(Quote(pnorm((x-mu)/sigma)), "x")
 
 dnorm((x - mu)/sigma) * (1/sigma)
 
 
D(D(Quote(pnorm((x-mu)/sigma)), "x"), "mu")
 
 (x - mu)/sigma * (dnorm((x - mu)/sigma) * (1/sigma)) * (1/sigma)
 
 ---
 
 Like Brian, I recall the suggestion that we make D(...) extensible.  I
 still think it is a good idea and worth considering.  Under one scheme
 you would specify an object such as
 
 Fnorm <- structure(quote(pnorm(x, mu, sigma)), 
   deriv = 
   list(x = Quote(dnorm(x, mu, sigma)/sigma),
   mu = Quote(-dnorm(x, mu, sigma)/sigma),
   sigma = Quote(-(x - mu)*dnorm(x, mu, sigma)/sigma^2)),
   class = "dfunction")
 
 and write a generic "differentiate" function with a "dfunction" method
 and "D" as the default.
 
 I don't think it's quite that easy, but the plan is clear enough.
 
 Bill.
 
 
 
 
 
 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Berton Gunter
 Sent: Thursday, 26 January 2006 8:58 AM
 To: 'Spencer Graves'; r-help@stat.math.ethz.ch
 Subject: Re: [R] D(dnorm...)?
 
 
 dnorm() is an internal function, so I don't see how D (or deriv) can
do
 anything with it symbolically. Am I missing something?
 
 -- Bert
  
  
 
 
-Original Message-
From: [EMAIL PROTECTED] 
[mailto:[EMAIL PROTECTED] On Behalf Of Spencer Graves
Sent: Wednesday, January 25, 2006 2:43 PM
To: r-help@stat.math.ethz.ch
Subject: [R] D(dnorm...)?

Can someone help me understand the following:

  D(expression(dnorm(x, mean)), "mean")
[1] 0
  sessionInfo()

R version 2.2.1, 2005-12-20, i386-pc-mingw32

attached base packages:
[1] methods   stats graphics  grDevices utils   
  datasets
[7] base

By my computations, this should be something like 
((mean-x)/sd^2)*dnorm(...).

Thanks for your help.
Spencer Graves

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! 
http://www.R-project.org/posting-guide.html

 
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide!
 http://www.R-project.org/posting-guide.html


Re: [R] D(dnorm...)?

2006-01-25 Thread Bill.Venables
While symbolic computation is handy, I actually think a more pressing
addition to R is some kind of automatic differentiation facility,
particularly 'reverse mode' AD, which can be spectacular.  There are
free tools available for it as well, though I don't know how well
developed they are.  See:

http://www-unix.mcs.anl.gov/autodiff/AD_Tools/

I admit this is not quite the same thing, but for statistical
computations this is, in my experience, the key thing you need.  (Well,
for frequentist estimation at any rate...)

There are commercial systems that use this idea already, of course.  Two
that I know of are 'ADMB' (and its associated 'ADMB-RE' for random
effects) estimation and of course the 'S-NUOPT' module for another
system not unlike R.

ADMB is, frankly, difficult to use but it performs so well and so
quickly once you get it going nothing else seems to come close to it.  It
has become almost a de-facto standard at the higher end of the fishery
stock assessment game, for example, where they are always fitting huge,
highly complex and very non-linear models.

Bill V.

-Original Message-
From: Berwin A Turlach [mailto:[EMAIL PROTECTED] On
Behalf Of Berwin A Turlach
Sent: Thursday, 26 January 2006 4:50 PM
To: Spencer Graves
Cc: Venables, Bill (CMIS, Cleveland); r-help@stat.math.ethz.ch;
[EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: [R] D(dnorm...)?


G'day Spencer,

 SG == Spencer Graves [EMAIL PROTECTED] writes:

SG I'm not qualified to make this suggestion since I'm incapable
SG of turning it into reality, [...]
This statement applies to me too, nevertheless I would like to point
out the following GPL library:

http://www.gnu.org/software/libmatheval/

I am wondering since some time how hard it would be to incorporate
that library into R and let it provide symbolic differentiation
capabilities for R.

Cheers,

Berwin



Re: [R] D(dnorm...)?

2006-01-25 Thread Bill.Venables
Yes Bert, this time you are missing something (unusually) ...

As Brian Ripley pointed out 'dnorm' is in the derivative table, *but*
only as a function of one variable.  So if you want to find the
derivative of 

dnorm(x, mean, sigma)

you have to write it as 1/sigma * dnorm((x - mu)/sigma).  Here is a
little example:

> D(Quote(pnorm((x-mu)/sigma)), "x")
dnorm((x - mu)/sigma) * (1/sigma)

> D(D(Quote(pnorm((x-mu)/sigma)), "x"), "mu")
(x - mu)/sigma * (dnorm((x - mu)/sigma) * (1/sigma)) * (1/sigma)
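
Both symbolic results can be checked numerically against central differences (a quick sketch; the parameter values are arbitrary):

```r
x <- 1.3; mu <- 0.4; sigma <- 2; h <- 1e-6

# d/dx pnorm((x - mu)/sigma), symbolically and numerically
d1  <- eval(D(quote(pnorm((x - mu)/sigma)), "x"))
num <- (pnorm((x + h - mu)/sigma) - pnorm((x - h - mu)/sigma)) / (2 * h)

all.equal(d1, num, tolerance = 1e-6)
```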

---

Like Brian, I recall the suggestion that we make D(...) extensible.  I
still think it is a good idea and worth considering.  Under one scheme
you would specify an object such as

Fnorm <- structure(quote(pnorm(x, mu, sigma)), 
deriv = 
list(x = Quote(dnorm(x, mu, sigma)/sigma),
mu = Quote(-dnorm(x, mu, sigma)/sigma),
sigma = Quote(-(x - mu)*dnorm(x, mu, sigma)/sigma^2)),
class = "dfunction")

and write a generic "differentiate" function with a "dfunction" method
and "D" as the default.

I don't think it's quite that easy, but the plan is clear enough.

Bill.





-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Berton Gunter
Sent: Thursday, 26 January 2006 8:58 AM
To: 'Spencer Graves'; r-help@stat.math.ethz.ch
Subject: Re: [R] D(dnorm...)?


dnorm() is an internal function, so I don't see how D (or deriv) can do
anything with it symbolically. Am I missing something?

-- Bert
 
 

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Spencer Graves
 Sent: Wednesday, January 25, 2006 2:43 PM
 To: r-help@stat.math.ethz.ch
 Subject: [R] D(dnorm...)?
 
 Can someone help me understand the following:
 
   D(expression(dnorm(x, mean)), "mean")
 [1] 0
   sessionInfo()
 
 R version 2.2.1, 2005-12-20, i386-pc-mingw32
 
 attached base packages:
 [1] methods   stats graphics  grDevices utils   
   datasets
 [7] base
 
 By my computations, this should be something like 
 ((mean-x)/sd^2)*dnorm(...).
 
 Thanks for your help.
 Spencer Graves
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! 
 http://www.R-project.org/posting-guide.html


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Making a markov transition matrix

2006-01-22 Thread Bill.Venables
That solution for the case 'with gaps' merely omits transitions where
the transition information is not for a single time step.  (Mine can be
modified for this as well - see below.)

But if you know that a firm went from state i in year y to state j in
year y+3, say, without knowing the intermediate states, that must tell
you something about the 1-step transition matrix as well.  How do you
use this information?

That's a much more difficult problem but you can do it using maximum
likelihood, e.g.  You think about how to calculate the likelihood
function - and then to optimise it.  This is getting a bit away from the
original 'programming trick' question, but it is an interesting problem
that occurs more often than I had realised.  I'd be interested in
knowing if anyone had done anything slick in this area.
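One rough way to use the multi-step information is direct maximum
likelihood, along the lines suggested above (a sketch only, with made-up
function names): parametrize the one-step matrix P through row-wise
multinomial logits, and for each observed transition from state i to
state j over d steps add log((P^d)[i, j]) to the log-likelihood:

```r
## Sketch: ML estimate of a one-step transition matrix from possibly
## multi-step transitions.  'trans' has one row per observed
## transition, with columns (from, to, steps).
nstates <- 3

make.P <- function(theta) {
  ## multinomial-logit parametrization, stabilised against overflow
  eta <- cbind(matrix(theta, nstates, nstates - 1), 0)
  eta <- eta - apply(eta, 1, max)
  P <- exp(eta)
  P / rowSums(P)
}

matpow <- function(P, d) {
  R <- diag(nrow(P))
  for (k in seq_len(d)) R <- R %*% P
  R
}

negloglik <- function(theta, trans) {
  P <- make.P(theta)
  -sum(apply(trans, 1, function(z)
      log(max(matpow(P, z[3])[z[1], z[2]], 1e-300))))
}

## toy data: two one-step, one two-step and one three-step transition
trans <- cbind(from = c(1, 1, 2, 3), to = c(2, 3, 2, 1),
               steps = c(1, 3, 1, 2))
fit <- optim(rep(0, nstates * (nstates - 1)), negloglik, trans = trans)
make.P(fit$par)   # estimated one-step transition matrix
```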

Bill Venables.

-Original Message-
From: Ajay Narottam Shah [mailto:[EMAIL PROTECTED] 
Sent: Sunday, 22 January 2006 5:15 PM
To: R-help
Cc: [EMAIL PROTECTED]; Venables, Bill (CMIS, Cleveland)
Subject: Re: [R] Making a markov transition matrix


On Sun, Jan 22, 2006 at 01:47:00PM +1100, [EMAIL PROTECTED] wrote:
 If this is a real problem, here is a slightly tidier version of the
 function I gave on R-help:
 
 transitionM <- function(name, year, state) {
   raw <- data.frame(name = name, state = state)[order(name, year), ]
   raw01 <- subset(data.frame(raw[-nrow(raw), ], raw[-1, ]), 
                   name == name.1)
   with(raw01, table(state, state.1))
 }

To modify this solution for the 'with gaps' case, omitting multiple step
transitions, you need to include the year in the 'raw' data frame and
then just change the subset condition to

name == name.1 & year == year.1 - 1


 
 Notice that this does assume there are 'no gaps' in the time series
 within firms, but it does not require that each firm have responses
for
 the same set of years.
 
 Estimating the transition probability matrix when there are gaps
within
 firms is a more interesting problem, both statistically and, when you
 figure that out, computationally.

With help from Gabor, here's my best effort. It should work even if
there are gaps in the timeseries within firms, and it allows different
firms to have responses in different years. It is wrapped up as a
function which eats a data frame. Somebody should put this function
into Hmisc or gtools or something of the sort.

# Problem statement:
#
# You are holding a dataset where firms are observed for a fixed
# (and small) set of years. The data is in long format - one
# record for one firm for one point in time. A state variable is
# observed (a factor).
# You wish to make a markov transition matrix about the time-series
# evolution of that state variable.

set.seed(1001)

# Raw data in long format --
raw <- data.frame(name=c("f1","f1","f1","f1","f2","f2","f2","f2"),
                  year=c(83,   84,  85,  86,  83,  84,  85,  86),
                  state=sample(1:3, 8, replace=TRUE)
                  )

transition.probabilities <- function(D, timevar="year",
                                     idvar="name", statevar="state") {
  merged <- merge(D, cbind(nextt=D[,timevar] + 1, D),
                  by.x = c(timevar, idvar), by.y = c("nextt", idvar))
  t(table(merged[, grep(statevar, names(merged), value = TRUE)]))
}

transition.probabilities(raw, timevar="year", idvar="name",
                         statevar="state")

-- 
Ajay Shah
http://www.mayin.org/ajayshah  
[EMAIL PROTECTED]
http://ajayshahblog.blogspot.com
*(:-? - wizard who doesn't know the answer.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Making a markov transition matrix

2006-01-21 Thread Bill.Venables
If you can be sure that there are no missing years within firms, I think
I would do it this way:

> raw <- raw[do.call("order", raw), ]  # not needed here

> raw01 <- subset(data.frame(raw[-nrow(raw), ], raw[-1, ]),
                  name == name.1)
> with(raw01, table(state, state.1))

 state.1
state 1 2 3
1 1 0 0
2 0 2 1
3 1 1 0

So what would the general function look like?  I suppose

transitionM <- function(name, year, state) {
  raw <- data.frame(name = name, year = year, state = state)
  raw <- raw[do.call("order", raw), ]  # needed in general
  raw01 <- subset(data.frame(raw[-nrow(raw), ], raw[-1, ]),
                  name == name.1)
  with(raw01, table(state, state.1))
}

give it a burl:

> with(raw, transitionM(name, year, state))
 state.1
state 1 2 3
1 1 0 0
2 0 2 1
3 1 1 0

(NB no 'for' loops.)  ezy peezy.

W.


Bill Venables, 
CMIS, CSIRO Laboratories, 
PO Box 120, Cleveland, Qld. 4163 
AUSTRALIA 
Office Phone (email preferred): +61 7 3826 7251 
Fax (if absolutely necessary):+61 7 3826 7304 
Mobile (rarely used):+61 4 1963 4642 
Home Phone:  +61 7 3286 7700 
mailto:[EMAIL PROTECTED] 
http://www.cmis.csiro.au/bill.venables/ 



-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of jim holtman
Sent: Sunday, 22 January 2006 11:20 AM
To: Ajay Narottam Shah
Cc: R-help
Subject: Re: [R] Making a markov transition matrix


Ignore last reply.  I sent the wrong script.

> set.seed(1001)

> # Raw data in long format --
> raw <- data.frame(name=c("f1","f1","f1","f1","f2","f2","f2","f2"),
+                   year=c(83,   84,  85,  86,  83,  84,  85,  86),
+                   state=sample(1:3, 8, replace=TRUE)
+                   )
> # Shift to wide format --
> fixedup <- reshape(raw, timevar="year", idvar="name", v.names="state",
+                    direction="wide")

> trans <- as.matrix(fixedup)
> result <- NULL
> # loop through all the columns and build up the 'result'
> for (i in 2:(ncol(trans) - 1)){
+     result <- rbind(result, cbind(name=trans[,1], PREV=trans[,i],
+                                   NEXT=trans[,i+1]))
+ }
> result
  name PREV NEXT
1 f1 3  2
5 f2 2  3
1 f1 2  2
5 f2 3  1
1 f1 2  2
5 f2 1  1

> (markov <- table(result[,"PREV"], result[,"NEXT"]))

    1 2 3
  1 1 0 0
  2 0 2 1
  3 1 1 0



On 1/21/06, jim holtman [EMAIL PROTECTED] wrote:

 Is this what you want:


 set.seed(1001)

 # Raw data in long format --
 raw <- data.frame(name=c("f1","f1","f1","f1","f2","f2","f2","f2"),
                   year=c(83,   84,  85,  86,  83,  84,  85,  86),
                   state=sample(1:3, 8, replace=TRUE)
                   )
 # Shift to wide format --
 fixedup <- reshape(raw, timevar="year", idvar="name", v.names="state",
                    direction="wide")

 trans <- as.matrix(fixedup)
 result <- NULL
 for (i in 2:(ncol(trans) - 1)){
   result <- rbind(result, cbind(name=trans[,1], prev=trans[,i],
                                 "next"=trans[,i+1]))
 }

 result

 markov <- table(try$prev.state, try$new.state)





  On 1/21/06, Ajay Narottam Shah [EMAIL PROTECTED] wrote:
 
  Folks,
 
  I am holding a dataset where firms are observed for a fixed (and
  small) set of years. The data is in long format - one record for
one
  firm for one point in time. A state variable is observed (a factor).
 
  I wish to make a markov transition matrix about the time-series
  evolution of that state variable. The code below does this. But it's
  hardcoded to the specific years that I observe. How might one
  generalise this and make a general function which does this? :-)
 
-ans.
 
 
 
   set.seed(1001)
  
   # Raw data in long format --
   raw <- data.frame(name=c("f1","f1","f1","f1","f2","f2","f2","f2"),
                     year=c(83,   84,  85,  86,  83,  84,  85,  86),
                     state=sample(1:3, 8, replace=TRUE)
                     )
   # Shift to wide format --
   fixedup <- reshape(raw, timevar="year", idvar="name",
                      v.names="state", direction="wide")
   # Now tediously build up records for an intermediate data structure
   try <- rbind(
       data.frame(prev=fixedup$state.83, new=fixedup$state.84),
       data.frame(prev=fixedup$state.84, new=fixedup$state.85),
       data.frame(prev=fixedup$state.85, new=fixedup$state.86)
   )
   # This is a bad method because it is hardcoded to the specific
   # values of year.
   markov <- table(try$prev, try$new)
 
  --
  Ajay Shah
http://www.mayin.org/ajayshah
 
  [EMAIL PROTECTED]
  http://ajayshahblog.blogspot.com
  *(:-? - wizard who doesn't know the answer.
 
  __
  R-help@stat.math.ethz.ch mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide!
 
http://www.R-project.org/posting-guide.html
 



 --
 Jim Holtman
 Cincinnati, OH
 +1 513 247 0281

What is the problem you are trying to solve?




--
Jim Holtman
Cincinnati, OH
+1 513 247 0281

What is the problem you are trying to solve?

[[alternative HTML version deleted]]

Re: [R] problems with glm

2006-01-15 Thread Bill.Venables
Most of your problems seem to come from 'link = log' whereas you
probably mean 'link = logit' (which is the default).  Hence:

##
> success <- c(13,12,11,14,14,11,13,11,12)
> failure <- c(0,0,0,0,0,0,0,2,2)
> predictor <- c(0,80*5^(0:7))
> glm(cbind(success,failure) ~ predictor,
+     family = binomial, #(link=log),
+     control = glm.control(epsilon=1e-8,trace=TRUE,maxit=50))
Deviance = 7.621991 Iterations - 1 
Deviance = 6.970934 Iterations - 2 
Deviance = 6.941054 Iterations - 3 
Deviance = 6.940945 Iterations - 4 
Deviance = 6.940945 Iterations - 5 

Call:  glm(formula = cbind(success, failure) ~ predictor, family =
binomial,  control = glm.control(epsilon = 1e-08, trace = TRUE,
maxit = 50)) 

Coefficients:
(Intercept)predictor  
  4.180e+00   -4.106e-07  

Degrees of Freedom: 8 Total (i.e. Null);  7 Residual
Null Deviance:  12.08 
Residual Deviance: 6.941AIC: 15.85 
 

###

Bill Venables, 
CMIS, CSIRO Laboratories, 
PO Box 120, Cleveland, Qld. 4163 
AUSTRALIA 
Office Phone (email preferred): +61 7 3826 7251 
Fax (if absolutely necessary):+61 7 3826 7304 
Mobile (rarely used):+61 4 1963 4642 
Home Phone:  +61 7 3286 7700 
mailto:[EMAIL PROTECTED] 
http://www.cmis.csiro.au/bill.venables/ 



-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Roland R Regoes
Sent: Sunday, 15 January 2006 11:04 PM
To: r-help@stat.math.ethz.ch
Subject: [R] problems with glm


Dear R users,

 I am having some problems with glm. The first is an error message
subscript out of bounds. The second is the fact that reasonable
starting values are not accepted by the function.

 To be more specific, here is an example:

> success <- c(13,12,11,14,14,11,13,11,12)
> failure <- c(0,0,0,0,0,0,0,2,2)
> predictor <- c(0,80*5^(0:7))
> glm(cbind(success,failure) ~ predictor + 0,
+     family=binomial(link=log),
+     control=glm.control(epsilon=1e-8,trace=TRUE,maxit=50))
Deviance = 3.348039 Iterations - 1 
Error: subscript out of bounds
In addition: Warning message:
step size truncated: out of bounds 


The model with intercept yields:

> glm(cbind(success,failure) ~ predictor ,
+     family=binomial(link=log),
+     control=glm.control(epsilon=1e-8,trace=FALSE,maxit=50))

Call:  glm(formula = cbind(success, failure) ~ predictor, family =
binomial(link = log),  control = glm.control(epsilon = 1e-08,
trace = FALSE, maxit = 50)) 

Coefficients:
(Intercept)predictor  
 -5.830e-17   -4.000e-08  

Degrees of Freedom: 8 Total (i.e. Null);  7 Residual
Null Deviance:  12.08 
Residual Deviance: 2.889AIC: 11.8 
There were 33 warnings (use warnings() to see them)
> warnings()
1: step size truncated: out of bounds
...
31: step size truncated: out of bounds
32: algorithm stopped at boundary value in: glm.fit(x = X, y = Y,
weights = weights, start = start, etastart = etastart,   ...
33: fitted probabilities numerically 0 or 1 occurred in: glm.fit(x = X,
y = Y, weights = weights, start = start, etastart = etastart,   ...


Since the intercept in the above fit is fairly small I thought I
could use -4e-8 as a reasonable starting value in a model without
intercept. But to no avail:

> glm(cbind(success,failure) ~ predictor + 0, start=-4e-8,
+     family=binomial(link=log),
+     control=glm.control(epsilon=1e-8,trace=FALSE,maxit=50))
Error in glm.fit(x = X, y = Y, weights = weights, start = start,
etastart = etastart,  : 
cannot find valid starting values: please specify some


I am stuck here. Am I doing something wrong when specifying the
starting value? I would appreciate any help. (I could not find anything
relevant in the documentation of glm and the mailing list archives, but
I did not read the source code of glm yet.)

Roland


PS: I am using R Version 2.2.0 (R Cocoa GUI 1.13 (1915)) on MacOSX
10.4.4

-- 
---
Roland Regoes
Theoretical Biology
Universitaetsstr. 16 
ETH Zentrum, CHN H76.1 
CH-8092 Zurich, Switzerland

tel: +41-44-632-6935
fax: +41-44-632-1271

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] reduce levels

2005-11-07 Thread Bill.Venables
sub$mm <- factor(sub$mm)

is the simplest way to change a single factor in this way.  To do a
whole data frame you just need a loop:

drop_unused_levels <- function(data) 
    as.data.frame(lapply(data, function(x) if(is.factor(x))
                                    factor(x) else x))

Here's your example again, but witn a slightly different idiom...

test <- data.frame(mm = letters[1:3])
sub <- subset(test, mm == "b")

fixedSub <- drop_unused_levels(sub)

Bill Venables.

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Claus Atzenbeck
Sent: Tuesday, 8 November 2005 9:52 AM
To: R-help
Subject: [R] reduce levels


Hi all:

I have an example that shows my problem:

> test <- data.frame(c("a", "b", "c"))
> colnames(test) <- "mm"
> sub <- subset(test, mm == "b")
> sub$mm
[1] b
Levels: a b c
> levels(sub$mm)
[1] "a" "b" "c"

How can I reduce the levels to exclusively those which are part of the
data frame? That is in the above named example exclusively b.

Reason: I want to iterate exclusively through those levels that exist
within a subset, but leave away all others.

Thanks for any hint.
Claus

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] Roots of quadratic system.

2005-05-01 Thread Bill.Venables
Are you looking for a unique solution or families of solutions?

Can't you turn a root-finding problem for a system of equations 
with a unique solution into an optimisation problem, anyway?

E.g.  You want to solve

f1(x) = g1
f2(x) = g2
...

Why not optimise L(x) = (f1(x) - g1)^2 + (f2(x) - g2)^2 + ... 
with respect to x?  If the minimum value is zero, then you are
done; if it is greater than zero your original system does not
have a solution.

If you are in the complex domain the changes needed are obvious.
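A minimal sketch of this sum-of-squares approach with optim() (the
two-equation system here is invented purely for illustration):

```r
## Sketch: solve f1(x, y) = 0, f2(x, y) = 0 by minimising the sum of
## squared residuals, as described above.
L <- function(p) {
  x <- p[1]; y <- p[2]
  f1 <- x^2 + y^2 - 2      # want f1 = 0
  f2 <- x - y              # want f2 = 0
  f1^2 + f2^2
}
fit <- optim(c(0.5, 0.5), L)   # Nelder-Mead by default
fit$par    # near c(1, 1), one of the two solutions
fit$value  # near 0, so the system does have a solution here
```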

V.

: -Original Message-
: From: [EMAIL PROTECTED] 
: [mailto:[EMAIL PROTECTED] On Behalf Of John Janmaat
: Sent: Monday, 2 May 2005 12:48 AM
: To: r-help@stat.math.ethz.ch
: Subject: [R] Roots of quadratic system.
: 
: 
: Hello,
: 
: I have a system of quadratic equations (results of a 
: Hamiltonian optimization)
: which I need to find the roots for.  Is there a package 
: and/or function which
: will find the roots for a quadratic system?  Note that I am 
: not opimizing, but
: rather solving the first order conditions which come from a 
: Hamiltonian.  I am
: basically looking for something in R that will do the same 
: thing as fsolve in
: Matlab.
: 
: Thanks,
: 
: John.
: 
: ==
: Dr. John Janmaat
: Department of Economics
: Acadia University
: Tel: 902-585-1461
: 
: __
: R-help@stat.math.ethz.ch mailing list
: https://stat.ethz.ch/mailman/listinfo/r-help
: PLEASE do read the posting guide! 
: http://www.R-project.org/posting-guide.html
:

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] How to extract function arguments literally

2005-04-29 Thread Bill.Venables
instead of as.character(substitute(arg)) use deparse(substitute(arg))
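A quick illustration of the difference between the two idioms:
deparse() keeps the expression as the string that was typed, while
as.character() splits a call into its parts:

```r
## compare the two ways of turning an unevaluated argument into text
f <- function(cmd) as.character(substitute(cmd))
g <- function(cmd) deparse(substitute(cmd))

f(x + y)   # c("+", "x", "y")  -- the call broken into pieces
g(x + y)   # "x + y"           -- the expression as typed
```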

: -Original Message-
: From: [EMAIL PROTECTED] 
: [mailto:[EMAIL PROTECTED] On Behalf Of Shigeru Mase
: Sent: Saturday, 30 April 2005 1:24 PM
: To: r-help@stat.math.ethz.ch
: Subject: [R] How to extract function arguments literally
: 
: 
: Dear all,
: 
: One of my friends asked me if it is possible to extract actual R
: function arguments literally (precisely, as strings). The reason is
: simple. He feels sometimes awkward to attach quotation marks :-). What
: he actually wants is to pass R command arguments to XLisp subroutines
: (He has been an enthusiastic XLisp user for a long time and 
: still tends
: to use R as a wrapper to XLisp). Is it possible, or should I 
: advise him
: not to be so lazy? The following is his simple example to explain the
: situation.
: 
: R function lc which passes its argument to an Xlisp function:
: 
:  lc=function(cmd){
: cmd=as.character(substitute(cmd))
: .XLisp("lcommand", cmd)}
: 
: Corresponding XLisp function lcommand:
: 
:  (defun lcommand (str) (eval (with-input-from-string (s str) 
: (read s
: 
: If the argument cmd is a string (i.e. with quotation marks) or has no
: space, it works fine.
: 
:  > as.character(substitute(abc))
: [1] "abc"
: 
: But, if it has no quotation marks or contains spaces, it 
: yields an error.
: 
:  as.character(substitute((def x1 x2))
: 
: Error: syntax error
: 
: He wants to use the syntax like lc((def x1 x2)), not like
: lc("(def x1 x2)").
: 
: Thanks in advance.
: 
: ==
: Shigeru MASE [EMAIL PROTECTED]
: Tokyo Institute of Technology
: Dept. of Math. and Comp. Sciences
: O-Okayama 2-12-1-W8-28, Tokyo, 152-8550, Japan
: 
: __
: R-help@stat.math.ethz.ch mailing list
: https://stat.ethz.ch/mailman/listinfo/r-help
: PLEASE do read the posting guide! 
: http://www.R-project.org/posting-guide.html
:

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] The eigen function

2005-04-25 Thread Bill.Venables
To me this is not at all surprising.

If you read the help info for eigen it says clearly that calculating the
eigenvectors is the slow part.  So it is entirely likely that completely
different algorithms will be used if you are asking for only the
eigenvalues, or if you are asking for both eigenvalues and eigenvectors.

At least that's what I would do...

(You should check what happens with EISPACK = TRUE as well, though.)
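In comparisons like the one quoted below, it is in any case safer to
test computed eigenvalues against zero with a tolerance rather than
with a bare >= or ==; a small sketch:

```r
## Tiny negative values such as -3e-15 are numerical noise; compare
## against zero with a tolerance scaled to the problem.
test <- matrix(c(1000, 500, 500, 250), 2, 2)
ev <- eigen(test, symmetric = TRUE)$values
tol <- 1e-8 * max(abs(ev))
ev >= -tol       # TRUE TRUE: both eigenvalues nonnegative up to noise
zapsmall(ev)     # 1250 0
```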

Bill Venables.

: -Original Message-
: From: [EMAIL PROTECTED] 
: [mailto:[EMAIL PROTECTED] On Behalf Of 
: Louisell, Paul T.
: Sent: Tuesday, 26 April 2005 6:26 AM
: To: 'r-help@stat.math.ethz.ch'
: Subject: [R] The eigen function
: 
: 
: I'm using R version 2.0.1 on a Windows 2000 operating system. 
: Here is some
: actual code I executed:
: 
:  > test
:       [,1] [,2]
: [1,] 1000  500
: [2,]  500  250
:  > eigen(test, symmetric=T)$values
: [1]  1.250000e+03 -3.153033e-15
:  > eigen(test, symmetric=T)$values[2] >= 0
: [1] FALSE
:  > eigen(test, symmetric=T, only.values=T)$values
: [1] 1250    0
:  > eigen(test, symmetric=T, only.values=T)$values[2] >= 0
: [1] TRUE
: 
: I'm wondering why the 'eigen' function is returning different values
: depending on whether the parameter only.values=T. This is 
: probably some
: numerical quirk of the code; it must do things differently 
: when it has to
: compute eigenvectors than it does when only computing 
: eigenvalues. It's
: easily checked that the exact eigenvalues are 1250 and 0. Can 
: one of the
: developers tell me whether this should be regarded as a bug or not?
: 
: Thanks,
: 
: Paul Louisell
: Pratt  Whitney
: Statistician
: TechNet: 435-5417
: e-mail: [EMAIL PROTECTED]
: 
: __
: R-help@stat.math.ethz.ch mailing list
: https://stat.ethz.ch/mailman/listinfo/r-help
: PLEASE do read the posting guide! 
: http://www.R-project.org/posting-guide.html
:

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] question about about the drop1

2005-04-23 Thread Bill.Venables
Ronggui asks:

: -Original Message-
: From: [EMAIL PROTECTED] 
: [mailto:[EMAIL PROTECTED] On Behalf Of ronggui
: Sent: Saturday, 23 April 2005 5:41 PM
: To: r help
: Subject: [R] question about about the drop1
: 
: 
: the data is :
:  table.8.3 <- data.frame(expand.grid( 
: marijuana=factor(c("Yes","No"), levels=c("No","Yes")), 
: cigarette=factor(c("Yes","No"), levels=c("No","Yes")), 
: alcohol=factor(c("Yes","No"), levels=c("No","Yes"))), 
: count=c(911,538,44,456,3,43,2,279))
: 
:  fit3 <- glm(count ~ .^3, poisson, table.8.3)
:  summary(fit3)
: ...
: Residual deviance: -1.5543e-15  on 0  degrees of freedom
: AIC: 65.043
: 
:  drop1(fit3, .~., test="Chisq")

Putting in the dummy formula, . ~ ., will force inappropriate 
terms (marginal terms) to be dropped from the model.


: Single term deletions
: 
: Model:
: count ~ (marijuana + cigarette + alcohol)^3
:                             Df   Deviance     AIC    LRT   Pr(Chi)
: <none>                           -1.554e-15  65.04 
: marijuana                    1   365.78     428.83 365.78 < 2.2e-16 ***
:    ~~
: marijuana:cigarette:alcohol  1   0.37  63.42   0.37   0.54084
: ~
: 
:  update(fit3,.~.-marijuana )
: ...
: Residual Deviance: -1.188e-13   AIC: 65.04
:~

Yes. This is the same as the full model, indicating that
the update has not changed the model at all.  You still
have higher order interactions in the model so the main
effect term has to be left in (or it's equivalent).


   
: 
: update(fit3,.~.-marijuana:cigarette:alcohol )
:  ...
: Residual Deviance: 0.374AIC: 63.42 
:~


This update has removed the only non-marginal term from the 
model and so has made a change.  This is why you are seeing
the same deviance and AIC here as you saw with drop1.  (This
line from drop1 was the only appropriate one to be shown.)


: 
: i find 0.374 and 63.42 are equal the result from drop1,but   
: Residual Deviance: -1.188e-13   AIC: 65.04is not.
: 
: so i wonder why? what does the drop1 work exactly?

If you tried

drop1(fit3, test = "Chisq")

at the initial stage you would have only seen the three-way
interaction.  The other terms you are seeing with your
inappropriate invocation of drop1 are not what you think
they are.

: 
: thank you !
: 
: __
: R-help@stat.math.ethz.ch mailing list
: https://stat.ethz.ch/mailman/listinfo/r-help
: PLEASE do read the posting guide! 
: http://www.R-project.org/posting-guide.html
:

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] A question on the library lme4

2005-04-23 Thread Bill.Venables
Luis Fernando Chaves asks:

: -Original Message-
: From: [EMAIL PROTECTED] 
: [mailto:[EMAIL PROTECTED] On Behalf Of Luis 
: Fernando Chaves
: Sent: Sunday, 24 April 2005 11:29 AM
: To: [EMAIL PROTECTED]
: Subject: [R] A question on the library lme4
: 
: 
: Hi,
: 
: I ran the following model using nlme:
: 
: model2 <- lme(log(malrat1) ~ I(year-1982), 
: random = ~1|Continent/Country, data=wbmal10)
: 
: I'm trying to run a Poisson GlMM to avoid the above 
: transformation but I 
: don't know how to specify the model using lmer in the lme4 library:
: 
: model3 <- lmer((malrat1) ~ I(year-1982) + ??, 
: data=wbmal10, family=poisson)

(I asked the same question of Doug Bates, as it happens.)

Try the following:

wbmal10 <- transform(wbmal10, Cont_Country = Continent:Country)
require(lme4)
model3 <- lmer(malrat1 ~ I(year - 1982) + (1|Continent)
               + (1|Cont_Country), family = poisson, data = wbmal10)

: 
: How can I introduce a random factor of the form= ~1|Continent/Country?
: 
: Thanks;
: 
: -- 
: Luis Fernando Chaves
: 
: __
: R-help@stat.math.ethz.ch mailing list
: https://stat.ethz.ch/mailman/listinfo/r-help
: PLEASE do read the posting guide! 
: http://www.R-project.org/posting-guide.html
:

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] Anova - interpretation of the interaction term

2005-04-22 Thread Bill.Venables


: -Original Message-
: From: [EMAIL PROTECTED] 
: [mailto:[EMAIL PROTECTED] On Behalf Of 
: michael watson (IAH-C)
: Sent: Friday, 22 April 2005 7:47 PM
: To: r-help@stat.math.ethz.ch
: Subject: [R] Anova - interpretation of the interaction term
: 
: 
: Hi
: 
: So carrying on my use of analysis of variance to check for the effects
: of two factors.  It's made simpler by the fact that both my 
: factors have
: only two levels each, creating four unique groups.
: 
: I have a highly significant interaction term.  In the context of the
: experiment, this makes sense.  I can visualise the data 
: graphically, and
: sure enough I can see that both factors have different effects on the
: data DEPENDING on what the value of the other factor is.  
: 
: I explain this all to my colleague - and she asks but which ones are
: different?  This is best illustrated with an example.  We have either
: infected | uninfected, and vaccinated | unvaccinated (the two 
: factors).
: We're measuring expression of a gene.  Graphically, in the infected
: group, vaccination makes expression go up.  In the uninfected group,
: vaccination makes expression go down.  In both the vaccinated and
: unvaccinated groups, infection makes expression go down, but it goes
: down further in unvaccinated than it does in vaccinated.
: 
: So from a statistical point of view, I can see exactly why the
: interaction term is significant, but what my colleage wants to know is
: that WITHIN the vaccinated group, does infection decrease expression
: significantly?  And within the unvaccinated group, does infection
: decrease expression significantly?  Etc etc etc  Can I get this
: information from the output of the ANOVA, or do I carry out a separate
: test on e.g. just the vaccinated group? (seems a cop out to me)

No, you can't get this kind of specific information out of the anova
table and yes, anova tables *are* a bit of a cop out.  (I sometimes 
think they should only be allowed between consenting adults in private.)

What you are asking for is a non-standard, but perfectly reasonable
partition of the degrees of freedom between the classes of a single
factor with four levels got by pairing up the levels of vaccination and
innoculation.  Of course you can get this information, but you have to
do a bit of work for it.  

Before I give the example which I don't expect too many people to read
entirely, let me issue a little challenge, namely to write tools to 
automate a generalized version of the procedure below.

Here is the example, (drawing from the explanation given in a certain 
book, to wit chapter 6):

> dat <- expand.grid(vac = c("N", "Y"), inf = c("-", "+"))
> dat <- rbind(dat, dat)  # to get a bit of replication

Now we make a 4-level factor from vaccination and infection and
generate a bit of data with an infection effect built into it:

> dat <- transform(dat, vac_inf = vac:inf, 
      y = as.numeric(inf) + rnorm(8))
> dat
   vac inf vac_inf          y
1    N   -     N:-  0.2285096
2    Y   -     Y:-  1.3504610
3    N   +     N:+  2.5581254
4    Y   +     Y:+  2.9208313
11   N   -     N:- -0.8403039
21   Y   -     Y:- -0.2440574
31   N   +     N:+  2.4844055
41   Y   +     Y:+  2.0772671

Now give the joint factor contrasts reflecting the partition
we want to effect:

> levels(dat$vac_inf)
[1] "N:-" "N:+" "Y:-" "Y:+"
> m <- matrix(scan(), ncol = 4, byrow = T)
1: -1  1  0  0
5:  0  0 -1  1
9:  1  1 -1 -1
13: 
Read 12 items
> fractions(ginv(m))  ## just to see what it looks like
     [,1] [,2] [,3]
[1,] -1/2    0  1/4
[2,]  1/2    0  1/4
[3,]    0 -1/2 -1/4
[4,]    0  1/2 -1/4

Note that we could have simply used t(m), but this
is not always possible.  Associate these contrasts, fit
and analyse:

> contrasts(dat$vac_inf) <- ginv(m)
> gm <- aov(y ~ vac_inf, dat)
> summary(gm)
            Df  Sum Sq Mean Sq F value  Pr(>F)
vac_inf      3 12.1294  4.0431   7.348 0.04190
Residuals    4  2.2009  0.5502

This doesn't tell us too much other than there are differences,
probably.  Now to specify the partition:


> summary(gm, 
      split = list(vac_inf = list("- vs +|N" = 1, 
                                  "- vs +|Y" = 2)))
                    Df  Sum Sq Mean Sq F value  Pr(>F)
vac_inf              3 12.1294  4.0431  7.3480 0.04190
  vac_inf: - vs +|N  1  7.9928  7.9928 14.5262 0.01892
  vac_inf: - vs +|Y  1  3.7863  3.7863  6.8813 0.05860
Residuals            4  2.2009  0.5502


As expected, infection changes the mean for both vaccinated and
unvaccinated, as we arranged when we generated the data.
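In the spirit of the challenge above, a rough, untested sketch of a
helper that attaches such a partition to a factor (the name
make.split.contrasts is made up here):

```r
## Hypothetical helper: given a factor and a matrix whose rows are the
## desired contrast vectors, attach ginv() of it as the contrasts so
## that summary(..., split = ...) can partition the degrees of freedom.
make.split.contrasts <- function(fac, m) {
  stopifnot(nrow(m) == nlevels(fac) - 1, ncol(m) == nlevels(fac))
  require(MASS)                       # for ginv()
  contrasts(fac) <- ginv(m)
  fac
}

## e.g. for the 4-level factor in the example above:
## dat$vac_inf <- make.split.contrasts(dat$vac_inf,
##     rbind(c(-1, 1, 0, 0), c(0, 0, -1, 1), c(1, 1, -1, -1)))
```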

: 
: Many thanks, and sorry, but it's Friday.
: 
: Mick
: 
: __
: R-help@stat.math.ethz.ch mailing list
: https://stat.ethz.ch/mailman/listinfo/r-help
: PLEASE do read the posting guide! 
: http://www.R-project.org/posting-guide.html
:

__
R-help@stat.math.ethz.ch mailing list

RE: [R] Using R to illustrate the Central Limit Theorem

2005-04-21 Thread Bill.Venables
Here's a bit of a refinement on Ted's first suggestion.

 
N <- 10000
graphics.off()
par(mfrow = c(1,2), pty = "s")
for(k in 1:20) {
    m <- (rowMeans(matrix(runif(N*k), N, k)) - 0.5)*sqrt(12*k)
    hist(m, breaks = "FD", xlim = c(-4,4), main = k,
         prob = TRUE, ylim = c(0,0.5), col = "lemonchiffon")
    pu <- par("usr")[1:2]
    x <- seq(pu[1], pu[2], len = 500)
    lines(x, dnorm(x), col = "red")
    qqnorm(m, ylim = c(-4,4), xlim = c(-4,4), pch = ".", col = "blue")
    abline(0, 1, col = "red")
    Sys.sleep(1)
}



-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of
[EMAIL PROTECTED]
Sent: Friday, 22 April 2005 4:48 AM
To: Paul Smith
Cc: r-help@stat.math.ethz.ch
Subject: RE: [R] Using R to illustrate the Central Limit Theorem


On 21-Apr-05 Paul Smith wrote:
 Dear All
 
 I am totally new to R and I would like to know whether R is able and
 appropriate to illustrate to my students the Central Limit Theorem,
 using for instance 100 independent variables with uniform distribution
 and showing that their sum is a variable with an approximated normal
 distribution.
 
 Thanks in advance,
 
 Paul

Similar to Francisco's suggestion:

  m <- numeric(10000)
  for(k in (1:20)){
    for(i in (1:10000)){ m[i] <- (mean(runif(k))-0.5)*sqrt(12*k) }
    hist(m, breaks=0.3*(-15:15), xlim=c(-4,4), main=sprintf("%d",k))
  }

(On my slowish laptop, this ticks over at a satisfactory rate,
about 1 plot per second. If your machine is much faster, then
simply increase 10000 to a larger number.)

The real problem with demos like this, starting with the
uniform distribution, is that the result is, to the eye,
already approximately normal when k=3, and it's only out
in the tails that the improvement shows for larger values
of k.

This was in fact the way we used to simulate a normal
distribution in the old days: look up 3 numbers in
Kendall  Babington-Smith's Tables of Random Sampling
Numbers, which are in effect pages full of integers
uniform on 00-99, and take their mean.

It's the one book I ever encountered which contained
absolutely no information -- at least, none that I ever
spotted.

A more dramatic illustration of the CLT effect might be
obtained if, instead of runif(k), you used rbinom(k,1,p)
for p > 0.5, say:

  m <- numeric(10000)
  p <- 0.75
  for(j in (1:50)){ k <- j*j
    for(i in (1:10000)){ m[i] <- (mean(rbinom(k,1,p))-p)/sqrt(p*(1-p)/k) }
    hist(m, breaks=41, xlim=c(-4,4), main=sprintf("%d",k))
  }

Cheers,
Ted.



E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 21-Apr-05   Time: 19:48:05
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] generalized linear mixed models - how to compare?

2005-04-19 Thread Bill.Venables
Andrew Criswell asks:

Hello All:

Should I conclude from this discussion that there is no practical
means by which nested generalized mixed models can be compared from
output produced through glmmPQL or GLMM?

[WNV]  The picture is, in my view, not as bleak as this, but there are
certainly many open questions in this area and much research left to do.

 What is one then to do???

[WNV]  Research in statistics, perhaps?  Every little bit helps.  It is
a
mistake to assume that everything is known about even the common
approximations
used in statistical practice, and this area is still opening up.

Andrew


On Sun, 17 Apr 2005, Deepayan Sarkar wrote:

 On Sunday 17 April 2005 08:39, Nestor Fernandez wrote:


 I want to evaluate several generalized linear mixed models,
including
 the null model, and select the best approximating one. I have
tried
 glmmPQL (MASS library) and GLMM (lme4) to fit the models.
Both result
 in similar parameter estimates but fairly different
likelihood
 estimates.
 My questions:
 1- Is it correct to calculate AIC for comparing my models,
given that
 they use quasi-likelihood estimates? If not, how can I
compare them?
 2- Why the large differences in likelihood estimates between
the two
 procedures?


 The likelihood reported by glmmPQL is wrong, as it's the
likelihood of
 an incorrect model (namely, an lme model that approximates the
correct
 glmm model).


Actually glmmPQL does not report a likelihood.  It returns an
object
of class lme, but you need to refer to the reference for how
to
interpret that.  It *is* support software for a book.

 GLMM uses (mostly) the same procedure to get parameter
estimates, but as a final step calculates the likelihood for the correct
model for those estimates (so the likelihood reported by it should be
fairly reliable).


Well, perhaps but I need more convincing.  The likelihood
involves
many high-dimensional non-analytic integrations, so I do not see
how
GLMM can do those integrals -- it might approximate them, but
that
would not be `calculates the likelihood for the correct model'.
It
would be helpful to have a clarification of this claim.  (Our
experiments show that finding an accurate value of the
log-likelihood
is difficult and many available pieces of software differ in
their
values by large amounts.)

Further, since neither procedure does ML fitting, this is not a
maximized likelihood as required to calculate an AIC value.  And
even
if it were, you need to be careful as often one GLMM is a
boundary
value for another, in which case the theory behind AIC needs
adjustment.

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,
http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html



-- 
Andrew R. Criswell, Ph.D.
Graduate School, Bangkok University

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html



RE: [R] array indexing and which

2005-04-17 Thread Bill.Venables
You need to think about it just a bit harder.

[Hint: what happens if you leave out the first 'which' and just make

ids <- (d[, 1] > 0)

does it work then...?]
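To see the hint in action, here is a toy example (the data are my own
invention, not from the original post) showing why the two-step which()
gives indices into the subset rather than into gk:

```r
d  <- matrix(c(-1, 2, 3, -4), ncol = 1)   # toy 2-D array
gk <- c(1, 0, 1, 1)                       # toy vector, one entry per row of d

ids  <- which(d[, 1] > 0)    # 2 3 : row positions in d
ids2 <- which(gk[ids] == 1)  # 2   : a position *within the subset* gk[ids]
ids[ids2]                    # 3   : translated back to a position in gk
which(gk == 1 & d[, 1] > 0)  # 3   : the one-step equivalent
```

So ids2 is not wrong; it simply indexes gk[ids], not gk itself.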

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Werner Wernersen
Sent: Monday, 18 April 2005 3:13 AM
To: r-help@stat.math.ethz.ch
Subject: [R] array indexing and which


Hi R friends!

I am stuck with a stupid question: I can circumvent it
but I would like to 
understand why it is wrong. It would be nice if you
could give me a hint...

I have an 2D array d and do the following:
ids <- which(d[,1] > 0)

then I have a vector gk with same column size as d and
do:
ids2 <- which(gk[ids] == 1)

but I can't interpret the result I get in ids2.

I get the expected result when I use:
which(gk == 1 & d[,1] > 0)

Why is the first version wrong?

The reason why I try to use the ids vectors is that I
want to avoid recomputation.

Thanks for your help!
   Werner

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html



RE: [R] nls segmented model with unknown joint points

2005-04-16 Thread Bill.Venables
This is how I'd write the formula for use with nls/nlme: 

y ~ b41*(x - 1) + b42*(x^2 - 1) + 
     ifelse((a41 - x) >= 0, b43*(a41 - x)^2, 0) +
     ifelse((a42 - x) >= 0, b44*(a42 - x)^2, 0)

This is a direct translation from your funny foreign-looking code below
that probably makes it clear what's going on.  A more swish R form might
be

y ~ b41*(x - 1) + b42*(x^2 - 1) + 
b43*pmax(a41 - x, 0)^2 + b44*pmax(a42 - x, 0)^2
 
You mention nlm, too.  Here you would use a function rather than a
formula, but the idea is the same.

V.

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of andy
Sent: Sunday, 17 April 2005 1:09 PM
To: r-help@stat.math.ethz.ch
Subject: [R] nls segmented model with unknown joint points


Hello,

I am interested in fitting a segmented model with unknown joint points
in nls and perhaps eventually in nlme.  I can fit this model in sas (see
below, joint points to be estimated are a41 and a42), but am unsure how
to specify this in the nlm function.  I would really appreciate any
suggestions or example code.  Thanks a lot. -andy  

proc nlin data=Stems.Trees;
 params  b41=-3 b42=1.5 b43=-1.5 b44=50 a41=0.75 a42=0.1;

 term1 = (b41*(x - 1) + b42*(x**2 -1));

 if (a41 - x) >= 0 then 
term2 = (b43*(a41 - x)**2);
 else
term2 = 0; 

 if (a42 - x) >= 0 then
term3 = (b44*(a42 - x)**2); 
 else 
term3 = 0;

 model y = term1+term2+term3;
run;

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html



RE: [R] Getting subsets of a data frame

2005-04-15 Thread Bill.Venables
You should look at 

 ?"["

and look very carefully at the drop argument.  For your example

 sw[, 1]

is the first component of the data frame, but 

 sw[, 1, drop = FALSE]

is a data frame consisting of just the first component, as
mathematically fastidious people would expect.

This is a convention, and like most arbitrary conventions it can be very
useful most of the time, but some of the time it can be a very nasty
trap.  Caveat emptor.
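A quick illustration, with the built-in swiss data frame standing in for
the poster's sw (an assumption; any data frame behaves the same way):

```r
sw <- swiss
class(sw[, 1:3])              # "data.frame" -- several columns, nothing dropped
class(sw[, 1])                # "numeric"    -- one column, dimension dropped
class(sw[, 1, drop = FALSE])  # "data.frame" -- one column, dropping suppressed
```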

Bill Venables.

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Fernando Saldanha
Sent: Saturday, 16 April 2005 1:07 PM
To: Submissions to R help
Subject: [R] Getting subsets of a data frame


I was reading in the Reference Manual about Extract.data.frame.

There is a list of examples of expressions using [ and [[, with the
outcomes. I was puzzled by the fact that, if sw is a data frame, then

sw[, 1:3]

is also a data frame,

but 

sw[, 1]

is just a vector.

Since R has no scalars, it must be the case that 1 and 1:1 are the same:

> 1 == 1:1
[1] TRUE
 
Then why isn't sw[,1] = sw[, 1:1] a data frame?

FS

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html



RE: [R] String in data frame

2005-04-15 Thread Bill.Venables
str <- as.character(d$name)

should do the trick.  (damn... my shift key just broke as well...)
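The behaviour the poster saw comes from c() coercing a factor to its
underlying integer codes; a small reproduction (toy values, not the
poster's data):

```r
name <- factor(c("bal", "bal"))
c(NA, name[1])                # NA 1      -- the factor's integer code appears
c(NA, as.character(name[1]))  # NA "bal"  -- convert to character first
```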

bill venables.

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Cuichang Zhao
Sent: Saturday, 16 April 2005 2:00 PM
To: r-help@stat.math.ethz.ch
Subject: [R] String in data frame


hello, 
how can take the string in the data frame.
right now i have a table that create as a data frame and stored in the
file called data.xls and now i want to read data frame as a table in
my another r program, i used the following command:
the first column of the data frame is just one number called num, but
the second one a list of string, called name. 
 
d <- read.table("data.xls", header = T);
 
what i get for d is a table of 2 columns: num and name: 
 
the one for d$num is just a normal vector without any levels, and
this is what i want. but the second, d$name, is a factor with levels.
the list i have for d$name is: d$name = (bal bal bal bal bal bal),
levels: bal. however, when i run the following code, something
different happens:
namelist <- NA;
a <- d$name[1]; # this outputs a = bal, levels: bal
namelist <- c(namelist, a); # this does not output (NA, bal), instead it
outputs (NA, 1). if i keep adding a to the namelist, it keeps adding 1
to the namelist instead of bal. However, i want to add bal to the
namelist, not 1, so how can i do this?
 
Thank you very much.
 
Cuichang Zhao
 
April 15, 2005 


-


[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html



RE: [R] chronological ordering of factor in lm() and plot()

2005-04-15 Thread Bill.Venables
First put

 day.names <- c("sun", "mon", "tue", "wed", "thu", "fri", "sat")

then

 days <- factor(as.character(days), levels = day.names)

will ensure the ordering you want.
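A quick check with made-up data (not from the original post):

```r
days <- factor(c("wed", "mon", "fri", "mon", "tue"))
levels(days)                 # alphabetical: "fri" "mon" "tue" "wed"

day.names <- c("sun", "mon", "tue", "wed", "thu", "fri", "sat")
days <- factor(as.character(days), levels = day.names)
levels(days)                 # now in week order, so lm(), aov() and
table(days)                  # plot() all see the days chronologically
```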

Bill Venables.

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Eric C. Jennings
Sent: Saturday, 16 April 2005 2:29 PM
To: R-help
Subject: [R] chronological ordering of factor in lm() and plot()


I am trying to do some basic regression and ANOVA on cycle times
(numeric
vectors) across weekdays (character vector), where I have simply
labelled my
days as:
days <- c("mon", "tue", "wed"...etc).
(NOTE: There are actually multiple instances of each day, and the data
is
read-in from a .dat file.)

I have no trouble at all with the actual number crunching, It is the
proper ordering of the factor that I am asking about. R first
alphabetizes
it(fri,mon,thu...) before doing the work of lm(), aov() and
especially
plot().

I have tried as.ordered(factor( )), but that doesn't do anything.
If I re-assign levels() in the way that I want, that just renames the
the
levels of the factor but does not reorder it internally.
I've looked at chron(), but that seems to entail using a numeric vector
instead of a character vector.

How can I get it to properly (chronologically) order the factor. (In
some
ways I'm thinking that all I can do is:
days <- c("a.mon", "b.tues", "c.wed"...etc)

Thanks for all that you can do
Eric Jennings
[EMAIL PROTECTED]
[EMAIL PROTECTED]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html



RE: [R] factors in multinom function (nnet)

2005-04-12 Thread Bill.Venables
Alexandre,

I have a couple of remarks to make, not all of which you might find
immediately helpful, I regret to say.

* The choice between using predictors linearly or in factor versions is
a modelling choice that is in no way specific to multinom.  It is a
general aspect of modelling that has to be faced in a whole variety of
situations.  Indeed the full spectrum of choices is much wider than
this: linear, polynomials, splines, different sorts of splines, harmonic
terms, factors, ...  In fact the idea behind gam's was really to allow
some of this extensive field of choices to be model driven, but I
digress.  Point 1: you need to learn about modelling first and then
apply it to multinom.

* It is curious to me that someone could be interested in multinomial
models per se.  Usually people have a context where multinomial models
might be one approach to describing the situation in a statistically
useful way.  Another could be something like classification trees.  The
context is really what decides what modelling choices of this kind might
be sensible.

* There is an obvious suggestion for one reference, a certain notorious
blue and yellow book for which multinom is part of the support software.
I believe they discuss some of the alternatives as well, like
classification trees, and some of the principles of modelling, but it's
been a while since I read it...

* Frank Harrell recently issued an excellent article on this list on
brain surgery in a hurry to which you may usefully refer.  I believe it
was on April 1.

Bill Venables.

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Alexandre Brito
Sent: Wednesday, 13 April 2005 8:20 AM
To: r-help@stat.math.ethz.ch
Subject: [R] factors in multinom function (nnet)


Dear All:

I am interested in multinomial logit models (function multinon, library
nnet) but I'm having troubles in choose whether to define the predictors
as factors or not.

I had posted earlier this example (thanks for the reply ronggui):

worms <- data.frame(year = rep(2000:2004, c(3,3,3,3,3)),
age = rep(1:3, 5), 
mud = c(2,5,0,8,7,7,5,9,14,12,8,7,5,13,11),
sand = c(4,7,13,4,14,13,20,17,15,23,20,9,35,27,18), 
rocks = c(2,6,7,9,3,2,2,10,5,19,13,17,11,20,29))

k <- as.matrix(worms[,3:5])

(mud, sand and rocks are factors;  age and year are predictors)

Now there are several possibilities:

m1 <- multinom(k ~ year+age, data = worms)
m2 <- multinom(k ~ factor(year)+age, data = worms)
m3 <- multinom(k ~ year+factor(age), data = worms)
m4 <- multinom(k ~ factor(year)+factor(age), data = worms)
m5 <- multinom(k ~ year:age, data = worms)
m6 <- multinom(k ~ year*age, data = worms)
m7 <- multinom(k ~ factor(year):age, data = worms)
m8 <- multinom(k ~ year:factor(age), data = worms) 

and so on.

I am far from an expert on this, and I would like to learn more about
the utilization of multinom function in R and the kind of doubts I
described above. So I hope that someone can recommend me some references
in this matter (internet, books...) if any is available. 

Thanks in advance, best wishes 

Alexandre
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html



RE: [R] glm family=binomial logistic sigmoid curve problem

2005-04-11 Thread Bill.Venables
Couple of points:

* If you provide relative frequencies for the binomial response, you
need also to give weights so that the actual counts can be
reconstructed.  This is what the warning message is telling you: if you
reconstruct the counts using the default (unity) weights, the counts are
not integers...  In this case the simplest work-around is to use a
quasibinomial family, which at least shuts up the warning message.

* predict with glm objects, by default, predicts *linear predictors*.
You need to predict responses.

Here's how I would (minimally) correct your script:


year <- c(2003+(6/12), 2004+(2/12), 2004+(10/12), 2005+(4/12))
percent <- c(0.31, 0.43, 0.47, 0.50)
plot(year, percent, xlim = c(2003, 2007), ylim = c(0, 1))
Lm <- lm(percent ~ year)
abline(Lm)
bm <- glm(percent ~ year, family = quasibinomial)
points(year, fitted(bm), pch = 3)
curve(predict(bm, data.frame(year = x), type = "resp"), add = TRUE)
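If the underlying counts were known, the warning could be addressed
directly rather than silenced: supply the number of trials as weights.
The n = 100 below is purely an assumption for illustration; the post
does not say how many observations each percentage is based on.

```r
n   <- rep(100, length(year))                  # hypothetical trial counts
bm2 <- glm(percent ~ year, family = binomial, weights = n)
```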

Supplementary points:

* It is a good idea to work with data frames, despite the fact that you
need not.

* Using lm as the name for a fitted linear model object can cause
problems, not to say confusion. 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of James Salsman
Sent: Monday, 11 April 2005 4:51 PM
To: r-help@stat.math.ethz.ch
Subject: [R] glm family=binomial logistic sigmoid curve problem


I'm trying to plot an extrapolated logistic sigmoid curve using
glm(..., family=binomial) as follows, but neither the fitted()
points or the predict()ed curve are plotting correctly:

  year <- c(2003+(6/12), 2004+(2/12), 2004+(10/12), 2005+(4/12))
  percent <- c(0.31, 0.43, 0.47, 0.50)
  plot(year, percent, xlim=c(2003, 2007), ylim=c(0, 1))
  lm <- lm(percent ~ year)
  abline(lm)
  bm <- glm(percent ~ year, family=binomial)
Warning message:
non-integer #successes in a binomial glm! in: eval(expr, envir, enclos)
  points(year, fitted(bm), pch="+")
NULL
  curve(predict(bm, data.frame(year=x)), add=TRUE)

All four of the binomial-fitted points fall exactly on the simple
linear regression line, and the predict() curve is nowhere near any
of the data points.  What am I doing wrong?

What does the warning mean?  Do I need more points?

I am using R on Windows, Version 2.0.1  (2004-11-15)

Thank you for your kind help.

Sincerely,
James Salsman

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html



RE: [R] Re: beta distribution in terms of it's mean and standarddeviation

2005-04-10 Thread Bill.Venables
For three of the beta distribution functions in R the parameters
defining the distribution are alpha, beta and 'ncp'.  Pretty trivially,
there is no bijection between these three and the mean and variance, but
for the special case of ncp = 0, I think there is.

Rather than just write it down, it's probably a good idea to see how to
get it.

Note that

mu = alpha/(alpha+beta)
s2 = alpha*beta/((alpha+beta)^2*(alpha+beta+1)) 
   = mu*(1-mu)/(alpha+beta+1)

So as an intermediate result put

alpha + beta = mu*(1-mu)/s2 - 1 (= ab, say, which must be positive or
you are in trouble)

giving

alpha = mu*ab
beta = (1-mu)*ab

Go forth and write your versions of the (central) beta distribution
support functions...
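The recipe translates directly into a pair of helpers; the function
names here are my own, not from any package:

```r
## mean/variance -> shape parameters of a central beta
beta.ab <- function(mu, s2) {
  ab <- mu*(1 - mu)/s2 - 1
  if (ab <= 0) stop("no beta distribution has this mean and variance")
  c(alpha = mu*ab, beta = (1 - mu)*ab)
}

## density in the (mean, sd) parameterization
dbeta.ms <- function(x, mean, sd) {
  p <- beta.ab(mean, sd^2)
  dbeta(x, p[["alpha"]], p[["beta"]])
}

beta.ab(0.4, 0.04)   # Beta(2, 3): mean 2/5 = 0.4, variance 6/(25*6) = 0.04
```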

Bill Venables.

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Tolga Uzuner
Sent: Monday, 11 April 2005 9:01 AM
To: [EMAIL PROTECTED]
Cc: r-help@stat.math.ethz.ch
Subject: [R] Re: beta distribution in terms of it's mean and
standarddeviation


Tolga Uzuner wrote:

 Hi,

 Is the beta distribution implemented in terms of it's mean and 
 standard deviation, as opposed to alpha and beta, in any R package ?

 Thanks
 Tolga

Hmm... answering my own question... guess there is no bijection between 
{alpha,beta} and {mean,variance} which is why... ocurred to me after I 
sent the question, unless someone disagrees.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html



RE: [R] plotting Principal components vs individual variables.

2005-04-10 Thread Bill.Venables
At the cost of breaking the thread I'm going to change your subject and
replace 'Principle' by 'Principal'.  I just can't stand it any longer...

OK, here is how I would solve your other problems.  First put

 wh <- c("USA", "New Zealand", "Dominican Republic", 
  "Western Samoa", "Cook Islands")
 ind <- match(wh, row.names(running))

Now 'ind' has the indices of the special set as numbers.  You need this
because although your data frame is indexed by the country names, your
'principal' [NB] component scores are not.

Plot the scores against the first variable:

 plot(running$X100m, running.pca$scores[, 1])  # you got this far

Finally, pick out the special set:

 points(running$X100m[ind], running.pca$scores[ind, 1], pch = 4,
col = "red", cex = 2)
 text(running$X100m[ind], running.pca$scores[ind, 1], pos = 3, cex =
0.7) # optional

and you can forget all about the subset data frame running2.

Bill Venables.

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Brett Stansfield
Sent: Monday, 11 April 2005 10:04 AM
To: R help (E-mail)
Subject: [R] plotting Principle components vs individual variables.


Dear R,

I'm trying to plot the first principle component of an analysis vs the
first
variable but am having trouble. I have no trouble doing the  initial
plot
but have difficulty thereafter.

First I want to highlight some points of the following data set

list(running)
[[1]]
   X100m X200m X400m X800m X1500m   X5K  X10K Marathon
Argentina  10.39 20.81 46.84  1.81   3.70 14.04 29.36   137.72
Australia  10.31 20.06 44.84  1.74   3.57 13.28 27.66   128.30
Austria10.44 20.81 46.82  1.79   3.60 13.26 27.72   135.90
Belgium10.34 20.68 45.04  1.73   3.60 13.22 27.45   129.95
Bermuda10.28 20.58 45.91  1.80   3.75 14.68 30.55   146.62
Brazil 10.22 20.43 45.21  1.73   3.66 13.62 28.62   133.13
Burma  10.64 21.52 48.30  1.80   3.85 14.45 30.28   139.95
Canada 10.17 20.22 45.68  1.76   3.63 13.55 28.09   130.15
Chile  10.34 20.80 46.20  1.79   3.71 13.61 29.30   134.03
China  10.51 21.04 47.30  1.81   3.73 13.90 29.13   133.53
Columbia   10.43 21.05 46.10  1.82   3.74 13.49 27.88   131.35
Cook Islands   12.18 23.20 52.94  2.02   4.24 16.70 35.38   164.70
Costa Rica 10.94 21.90 48.66  1.87   3.84 14.03 28.81   136.58
Czechoslovakia 10.35 20.65 45.64  1.76   3.58 13.42 28.19   134.32
Denmark10.56 20.52 45.89  1.78   3.61 13.50 28.11   130.78
Dominican Republic 10.14 20.65 46.80  1.82   3.82 14.91 31.45   154.12
Finland10.43 20.69 45.49  1.74   3.61 13.27 27.52   130.87
France 10.11 20.38 45.28  1.73   3.57 13.34 27.97   132.30
East Germany   10.12 20.33 44.87  1.73   3.56 13.17 27.42   129.92
West Germany   10.16 20.37 44.50  1.73   3.53 13.21 27.61   132.23
United Kingdom 10.11 20.21 44.93  1.70   3.51 13.01 27.51   129.13
Greece 10.22 20.71 46.56  1.78   3.64 14.59 28.45   134.60
Guatemala  10.98 21.82 48.40  1.89   3.80 14.16 30.11   139.33
Hungary10.26 20.62 46.02  1.77   3.62 13.49 28.44   132.58
India  10.60 21.42 45.73  1.76   3.73 13.77 28.81   131.98
Indonesia  10.59 21.49 47.80  1.84   3.92 14.73 30.79   148.83
Ireland10.61 20.96 46.30  1.79   3.56 13.32 27.81   132.35
Israel 10.71 21.00 47.80  1.77   3.72 13.66 28.93   137.55
Italy  10.01 19.72 45.26  1.73   3.60 13.23 27.52   131.08
Japan  10.34 20.81 45.86  1.79   3.64 13.41 27.72   128.63
Kenya  10.46 20.66 44.92  1.73   3.55 13.10 27.38   129.75
South Korea10.34 20.89 46.90  1.79   3.77 13.96 29.23   136.25
North Korea10.91 21.94 47.30  1.85   3.77 14.13 29.67   130.87
Luxembourg 10.35 20.77 47.40  1.82   3.67 13.64 29.08   141.27
Malaysia   10.40 20.92 46.30  1.82   3.80 14.64 31.01   154.10
Mauritius  11.19 22.45 47.70  1.88   3.83 15.06 31.77   152.23
Mexico 10.42 21.30 46.10  1.80   3.65 13.46 27.95   129.20
Netherlands10.52 20.95 45.10  1.74   3.62 13.36 27.61   129.02
New Zealand10.51 20.88 46.10  1.74   3.54 13.21 27.70   128.98
Norway 10.55 21.16 46.71  1.76   3.62 13.34 27.69   131.48
Papua New Guinea   10.96 21.78 47.90  1.90   4.01 14.72 31.36   148.22
Philippines10.78 21.64 46.24  1.81   3.83 14.74 30.64   145.27
Poland 10.16 20.24 45.36  1.76   3.60 13.29 27.89   131.58
Portugal   10.53 21.17 46.70  1.79   3.62 13.13 27.38   128.65
Rumania10.41 20.98 45.87  1.76   3.64 13.25 27.67   132.50
Singapore  10.38 21.28 47.40  1.88   3.89 15.11 31.32   157.77
Spain  10.42 20.77 45.98  1.76   3.55 13.31 27.73   131.57
Sweden 10.25 20.61 45.63  1.77   3.61 13.29 27.94   130.63
Switzerland10.37 20.46 45.78  1.78   3.55 13.22 27.91   131.20
Taiwan 10.59 

RE: [R] axis colors in pairs plot

2005-04-07 Thread Bill.Venables
Hi Anne,

Here's one suggestion, use a simple panel function:

cols <- c("red", "green3", "blue")

with(iris, 
  pairs(iris[, -5], main = "Anderson's Iris Data - 3 species",
  panel = function(x, y, ...)
  points(x, y, pch = (2:4)[Species], col = cols[Species], ...)
  ))
 
Bill Venables

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Anne York
Sent: Friday, 8 April 2005 8:51 AM
To: Help R
Subject: [R] axis colors in pairs plot


The following command produces a red axis line in a pairs 
plot:

pairs(iris[1:4], main = "Anderson's Iris Data -- 3 species",
pch = "+", col = c("red", "green3", "blue")[unclass(iris$Species)])


Trying to fool pairs in the following  way  produces the 
same plot as above:

pairs(iris[1:4], main = "Anderson's Iris Data -- 3 species", pch = "+",
col = c("black", "red", "green3", "blue")[ 1+ unclass(iris$Species)])

One very kludgy work-around is to define a new level, say 
"foo", in the first row of iris:

iris2 = iris
iris2$Species = as.character(iris2$Species)
iris2$Species[1] = "foo"
iris2$Species = factor(iris2$Species)

pairs(iris2[1:4], main = "Anderson's Iris Data -- 3 species", pch = "+",
col = c("black", "red", "green3", "blue")[unclass(iris2$Species)])

However, if any other row is redefined, the red-axis 
persists. For example:

iris2 = iris
iris2$Species = as.character(iris2$Species)
iris2$Species[3] = "foo"
iris2$Species = factor(iris2$Species)


pairs(iris2[1:4], main = "Anderson's Iris Data -- 3 species", pch = "+",
col = c("black", "red", "green3", "blue")[unclass(iris2$Species)])

I'd appreciate suggestions for a simpler work-around.

Thanks,
Anne

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html



RE: [R] two methods for regression, two different results

2005-04-05 Thread Bill.Venables
This is possible if x and z are orthogonal, but in general it doesn't
work as you have noted.  (If it did work it would almost amount to a way
of inverting general square matrices by working one row at a time, no
going back...)

It is possible to fit a bivariate regression using simple linear
regression techniques iteratively like this, but it is a bit more
involved than your two step process.

1. regress y on x and take the residuals: ryx <- resid(lm(y ~ x))

2. regress z on x and take the residuals: rzx <- resid(lm(z ~ x))

3. regress ryx on rzx: fitz <- lm(ryx ~ rzx)

4. this gives you the estimate of the coefficient on z (what you call
below b2):
b2 <- coef(fitz)[2]

5. regress y - b2*z on x: fitx <- lm(I(y - b2*z) ~ x)

This last step gets you the estimates of b0 and b1.
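A quick numerical check of steps 1-5 on made-up data (mine, not the
poster's):

```r
set.seed(1)
x <- rnorm(50); z <- x + rnorm(50)
y <- 1 + 2*x - 3*z + rnorm(50)

ryx <- resid(lm(y ~ x))          # step 1
rzx <- resid(lm(z ~ x))          # step 2
b2  <- coef(lm(ryx ~ rzx))[2]    # steps 3-4
fitx <- lm(I(y - b2*z) ~ x)      # step 5

c(coef(fitx), b2 = unname(b2))   # agrees coefficient-for-coefficient with:
coef(lm(y ~ x + z))
```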

None of this works with significances, though, because in going about it
this way you have essentially disguised the degrees of freedom involved.
So you can get the right estimates, but the standard errors,
t-statistics and residual variances are all somewhat inaccurate (though
usually not by much).

If x and z are orthogonal the (curious looking) step 2 is not needed.

This kind of idea lies behind some algorithms (e.g. Stevens' algorithm)
for fitting very large regressions essentially by iterative processes to
avoid constructing a huge model matrix.

Bill Venables

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of John Sorkin
Sent: Wednesday, 6 April 2005 12:55 PM
To: r-help@stat.math.ethz.ch
Subject: [R] two methods for regression, two different results


Please forgive a straight stats question, and the informal notation.
 
let us say we wish to perform a liner regression:
y=b0 + b1*x + b2*z
 
There are two ways this can be done, the usual way, as a single
regression, 
fit1 <- lm(y ~ x + z)
or by doing two regressions. In the first regression we could have y as
the dependent variable and x as the independent variable 
fit2 <- lm(y ~ x). 
The second regression would be a regression in which the residuals from
the first regression would be the dependent variable, and the
independent variable would be z.
fit2 <- lm(fit2$residuals ~ z)
 
I would think the two methods would give the same p value and the same
beta coefficient for z. They don't. Can someone help me understand why
the two methods do not give the same results. Additionally, could
someone tell me when one method might be better than the other, i.e.
what question does the first method answer, and what question does the
second method answer. I have searched a number of textbooks and have not
found this question addressed.
 
Thanks,
John
 
John Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
Baltimore VA Medical Center GRECC and
University of Maryland School of Medicine Claude Pepper OAIC
 
University of Maryland School of Medicine
Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
 
410-605-7119 
-- NOTE NEW EMAIL ADDRESS:
[EMAIL PROTECTED]

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html



RE: [R] how to draw a 45 degree line on qqnorm() plot?

2005-04-03 Thread Bill.Venables
Remember this is just a diagnostic procedure and all you are really
looking for is whether the normal scores plot is pretty straight.  The
reason for anchoring the guiding line at the quartiles is really the
same reason that boxplots use quartiles for the central chunk.  You
don't want the guiding line to be influenced by the tails too much.
It's a robustness thing.

In order for the 45 degree line to be useful, you have to *standardize*
your residuals so that they look like *standard* normal residuals (just
as the normal scores are standardized).  If you do that then the 45
degree line is likely to be useful, but no more so in practice than the
usual qqline, which applies irrespective of standardization.

Bill Venables.

-Original Message-
From: Terry Mu [mailto:[EMAIL PROTECTED] 
Sent: Sunday, 3 April 2005 6:10 PM
To: Venables, Bill (CMIS, Cleveland)
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] how to draw a 45 degree line on qqnorm() plot?


Thank you. I didn't know scale(). 

qqline passes through the 1st and 3rd quartiles. It doesn't seem very
useful to me. I thought a diagonal line would demonstrate the deviation
from a standard normal. Correct me if I was wrong. Thanks.



On Apr 3, 2005 2:40 AM, [EMAIL PROTECTED] [EMAIL PROTECTED]
wrote:
 You didn't try very hard.
 
 Try this, look at it and think about it:
 
 jj <- scale(sample(1:100, 10))
 qqnorm(jj)
 abline(0, 1)
 
 Rather than abline, however, most people would use
 
 qqline(jj)
 
 in which case you don't need the scaling.
 
 V.
 
 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Terry Mu
 Sent: Sunday, 3 April 2005 4:00 PM
 To: R-Help
 Subject: [R] how to draw a 45 degree line on qqnorm() plot?
 
 # I can not draw a 45 degree line on a qqnorm() plot,
 
 jj <- sample(c(1:100), 10)
 qqnorm(jj)
 
 abline() don't work.
 Thank you.
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide!
 http://www.R-project.org/posting-guide.html


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] how to draw a 45 degree line on qqnorm() plot?

2005-04-02 Thread Bill.Venables
You didn't try very hard.  

Try this, look at it and think about it:

jj <- scale(sample(1:100, 10))
qqnorm(jj)
abline(0, 1)  

Rather than abline, however, most people would use

qqline(jj)

in which case you don't need the scaling.

V.

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Terry Mu
Sent: Sunday, 3 April 2005 4:00 PM
To: R-Help
Subject: [R] how to draw a 45 degree line on qqnorm() plot?


# I can not draw a 45 degree line on a qqnorm() plot,

jj <- sample(c(1:100), 10)
qqnorm(jj)

abline() doesn't work.
Thank you.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] NA's?

2005-03-30 Thread Bill.Venables
Your message doesn't help us very much.  You haven't said what kind of
calculation it is you want to do, and that certainly matters.  For
example, for some kinds of computations the solution you started below
would work fine:

> M <- matrix(1:16, 4, 4)
> is.na(diag(M)) <- TRUE
> M
     [,1] [,2] [,3] [,4]
[1,]   NA    5    9   13
[2,]    2   NA   10   14
[3,]    3    7   NA   15
[4,]    4    8   12   NA
> rowSums(M, na.rm = TRUE)
[1] 27 26 25 24
> colSums(M, na.rm = TRUE)
[1]  9 20 31 42

You can also use apply( ) with functions that will accept missing values
(and ignore them) for computations on either the rows or the columns.
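For example (a sketch with the same 4 x 4 matrix as above):

```r
## Row/column summaries that skip the NA diagonal
M <- matrix(1:16, 4, 4)
is.na(diag(M)) <- TRUE
apply(M, 1, mean, na.rm = TRUE)  # row means, diagonal ignored
apply(M, 2, var,  na.rm = TRUE)  # column variances, diagonal ignored
```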

Hoping for a general mechanism that would somehow signal the diagonal
values as values to be ignored in a general way is not a possibility.
Just as a curiosity, what were you hoping that na.omit(M) would do?

V.

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Asha Jayanthi
Sent: Thursday, 31 March 2005 11:22 AM
To: r-help@stat.math.ethz.ch
Subject: [R] NA's?


I have a large matrix of data .

The size of the matrix ranges from 100 x 100 to 1000 x 1000

Now I have to do computations on it, and should not consider the diagonal
elements.

I tried setting diag(M) = NA  and M = na.omit(M).

But this omits all the rows. I only want to omit the diagonal elements
but keep the whole row.

diag(M) = 0 seems like a good option but this will affect my result.

How do I proceed with this? How can I just ignore some specific values? What
if I want to consider only the upper / lower triangular matrix?

Asha


http://adfarm.mediaplex.com/ad/ck/4686-26272-10936-31?ck=RegSell Start
your 
business.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] NA's?

2005-03-30 Thread Bill.Venables
is.na(diag(M)) <- TRUE

cmeans <- colMeans(M, na.rm = TRUE)
csd <- apply(M, 2, sd, na.rm = TRUE)
cvar <- csd^2

(or

cvar <- apply(M, 2, var, na.rm = TRUE)

)

Using 'kmeans' on a matrix but 'ignoring the diagonal entries' just
doesn't make sense as it stands, so I can't help you there.

V.

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Asha Jayanthi
Sent: Thursday, 31 March 2005 3:48 PM
To: r-help@stat.math.ethz.ch
Subject: RE: [R] NA's?


I am sorry about that.


I would like to do column mean, sd, var

as well as kmeans on the matrix

does this na.rm = TRUE work for such functions, with only the diagonal
ignored?

From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED],r-help@stat.math.ethz.ch
Subject: RE: [R] NA's?
Date: Thu, 31 Mar 2005 11:42:05 +1000

Your message doesn't help us very much.  You haven't said what kind of
calculation it is you want to do, and that certainly matters.  For
example, for some kinds of computations the solution you started below
would work fine:

  M <- matrix(1:16, 4, 4)
  is.na(diag(M)) <- TRUE
  M
     [,1] [,2] [,3] [,4]
[1,]   NA    5    9   13
[2,]    2   NA   10   14
[3,]    3    7   NA   15
[4,]    4    8   12   NA
  rowSums(M, na.rm = TRUE)
[1] 27 26 25 24
  colSums(M, na.rm = TRUE)
[1]  9 20 31 42

You can also use apply( ) with functions that will accept missing
values
(and ignore them) for computations on either the rows or the columns.

Hoping for a general mechanism that would somehow signal the diagonal
values as values to be ignored in a general way is not a possibility.
Just as a curiosity, what were you hoping that na.omit(M) would do?

V.

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Asha Jayanthi
Sent: Thursday, 31 March 2005 11:22 AM
To: r-help@stat.math.ethz.ch
Subject: [R] NA's?


I have a large matrix of data .

The size of the matrix ranges from 100 x 100 to 1000 x 1000

Now I have to do computations on it, and should not consider the diagonal
elements.

I tried setting diag(M) = NA  and M = na.omit(M).

But this omits all the rows. I only want to omit the diagonal elements
but keep the whole row.

diag(M) = 0 seems like a good option but this will affect my result.

How do I proceed with this? How can I just ignore some specific values? What
if I want to consider only the upper / lower triangular matrix?

Asha


http://adfarm.mediaplex.com/ad/ck/4686-26272-10936-31?ck=RegSell Start
your
business.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html


http://ads.mediaturf.net/event.ng/Type=clickFlightID=17307AdID=44925T
argetID=9763Targets=9763Values=414,868,1093,2385Redirect=http:%2F%2Fw
ww.icicibanknripromotions.com%2Fm2i_feb%2Fnri_M2I_feb.jsp%3Fadid%3D44925
%26siteid%3D1093%26flightid%3D17307 
Get a FREE 30 minute India Calling Card.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] Generating list of vector coordinates

2005-03-28 Thread Bill.Venables
> rev(expand.grid(k = 1:5, j = 1:4, i = 1:3))
   i j k
1  1 1 1
2  1 1 2
3  1 1 3
4  1 1 4
5  1 1 5
6  1 2 1
7  1 2 2
8  1 2 3

...

55 3 3 5
56 3 4 1
57 3 4 2
58 3 4 3
59 3 4 4
60 3 4 5
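If a plain matrix of coordinate triples is wanted, mirroring Mathematica's Flatten[Table[...]] output more closely, the data frame converts directly:

```r
## One (i, j, k) triple per row, as a numeric matrix
coords <- as.matrix(rev(expand.grid(k = 1:5, j = 1:4, i = 1:3)))
dim(coords)   # 60 rows, 3 columns
```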
 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Ronnen Levinson
Sent: Tuesday, 29 March 2005 9:21 AM
To: r-help@stat.math.ethz.ch
Subject: [R] Generating list of vector coordinates



   Hi.
   Can anyone suggest a simple way to obtain in R a list of vector
   coordinates of the following form? The code below is Mathematica.

 In[5]:=
 Flatten[Table[{i,j,k},{i,3},{j,4},{k,5}], 2]
 Out[5]=
 {{1,1,1},{1,1,2},{1,1,3},{1,1,4},{1,1,5},{1,2,1},{1,2,2},{1,2,3},{1
 ,2,4},{1,2,

 5},{1,3,1},{1,3,2},{1,3,3},{1,3,4},{1,3,5},{1,4,1},{1,4,2},{1,4,3},
 {1,4,

 4},{1,4,5},{2,1,1},{2,1,2},{2,1,3},{2,1,4},{2,1,5},{2,2,1},{2,2,2},
 {2,2,

 3},{2,2,4},{2,2,5},{2,3,1},{2,3,2},{2,3,3},{2,3,4},{2,3,5},{2,4,1},
 {2,4,

 2},{2,4,3},{2,4,4},{2,4,5},{3,1,1},{3,1,2},{3,1,3},{3,1,4},{3,1,5},
 {3,2,

 1},{3,2,2},{3,2,3},{3,2,4},{3,2,5},{3,3,1},{3,3,2},{3,3,3},{3,3,4},
 {3,3,
 5},{3,4,1},{3,4,2},{3,4,3},{3,4,4},{3,4,5}}

   I've been futzing with apply(), outer(), and so on but haven't found
   an elegant solution.
   Thanks,
   Ronnen.
   P.S. E-mailed CCs of posted replies appreciated.
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] Robust multivariate regression with rlm

2005-03-23 Thread Bill.Venables
lm works for multivariate responses
rlm does not - check what the help file says about the response.

That's about it, really.
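One common work-around, sketched here with simulated data (rlm itself offers no multivariate interface), is to fit each response column separately:

```r
library(MASS)  # for rlm

## Hypothetical data: 20 cases, 2 predictors, 2 response columns
set.seed(42)
Mx <- matrix(rnorm(40), ncol = 2)
My <- cbind(Mx %*% c(1, -1), Mx %*% c(2, 0.5)) + rnorm(40, sd = 0.1)

## One univariate rlm() fit per response column
fits <- lapply(seq_len(ncol(My)), function(j) rlm(My[, j] ~ Mx))
sapply(fits, coef)   # coefficients, one column per response
```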

Bill Venables.

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Markku
Mielityinen
Sent: Thursday, 24 March 2005 5:20 PM
To: r-help@stat.math.ethz.ch
Subject: [R] Robust multivariate regression with rlm


Dear Group,

I am having trouble with using rlm on multivariate data sets. When I
call rlm I get 

Error in lm.wfit(x, y, w, method = "qr") : 
incompatible dimensions

lm on the same data sets seems to work well (see code example). Am I
doing something wrong?

I have already browsed through the forums and google but could not find
any related discussions.

I use Windows XP and R Version 2.0.1  (2004-11-15) (if that makes a
difference).

Example code:

> Mx
  [,1]  [,2]
[1,]  49.10899  45.75513
[2,] 505.92018  48.81037
[3,] 973.30659  50.28478
[4,]  55.99533 508.94504
[5,] 964.96028 513.69579
[6,]  48.25670 975.94972
[7,] 510.21291 967.62767
[8,] 977.12363 978.29216
> My
 [,1] [,2]
[1,]   50   50
[2,]  512   50
[3,]  974   50
[4,]   50  512
[5,]  974  512
[6,]   50  974
[7,]  512  974
[8,]  974  974
> model <- lm(My ~ Mx)
> model

Call:
lm(formula = My ~ Mx)

Coefficients:
                 [,1]      [,2]
(Intercept)  0.934727  3.918421
Mx1          1.003517 -0.004202
Mx2         -0.002624  0.998155

> model <- rlm(My ~ Mx)
Error in lm.wfit(x, y, w, method = "qr") : 
incompatible dimensions
> model <- rlm(My ~ Mx, psi = psi.bisquare)
Error in lm.wfit(x, y, w, method = "qr") : 
incompatible dimensions

Another example (this one seems to work):

> Mx <- matrix(c(0, 0, 1, 0, 0, 1), ncol = 2, byrow = TRUE) + 1
> My <- matrix(c(0, 0, 1, 1, -1, 1), ncol = 2, byrow = TRUE) + 1
> model <- rlm(My ~ Mx)
> model
Call:
rlm(formula = My ~ Mx)
Converged in 0 iterations

Coefficients:
            [,1] [,2]
(Intercept)    1   -1
Mx1            1    1
Mx2           -1    1

Degrees of freedom: 6 total; 0 residual
Scale estimate: 0

Best regards,
Markku Mielityinen

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] error with polr()

2005-03-21 Thread Bill.Venables
This is always tricky.  Here is a work-around.  

Try asking for the Hessian with the original fit:

> fm <- polr(factor(y) ~ lx, data = ord.dat, Hess = TRUE)
> summary(fm)
Call:
polr(formula = factor(y) ~ lx, data = ord.dat, Hess = T)

Coefficients:
  Value Std. Error  t value
lx 2.420614  0.8146359 2.971406

Intercepts:
Value  Std. Error t value
0|1 0.5865 0.8118 0.7224 
1|2 4.8966 1.7422 2.8106 

Residual Deviance: 20.43286 
AIC: 26.43286 

---

[I have no idea if this is the same as SAS but if not, please report
the problem to SAS Inc.]

Bill Venables.


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Chaehyung Ahn
Sent: Tuesday, 22 March 2005 11:44 AM
To: r-help@stat.math.ethz.ch
Subject: [R] error with polr()


Dear Sir,

I get an error message when I use polr() in MASS package.

My data is ord.dat.  I made y a factor.

   y y1 y2   x   lx
1  0  0  0 3.2e-02 -1.49485
2  0  0  0 3.2e-02 -1.49485
3  0  0  0 1.0e-01 -1.0
4  0  0  0 1.0e-01 -1.0
5  0  0  0 3.2e-01 -0.49485
6  0  0  0 3.2e-01 -0.49485
7  1  1  0 1.0e+00  0.0
8  0  0  0 1.0e+00  0.0
9  1  1  0 3.2e+00  0.50515
10 1  1  0 3.2e+00  0.50515
11 0  0  0 1.0e+01  1.0
12 1  1  0 1.0e+01  1.0
13 1  1  0 3.2e+01  1.50515
14 2  1  1 3.2e+01  1.50515
15 2  1  1 1.0e+02  2.0
16 1  1  0 1.0e+02  2.0
17 2  1  1 3.2e+02  2.50515
18 1  1  0 3.2e+02  2.50515
19 2  1  1 1.0e+03  3.0
20 2  1  1 1.0e+03  3.0

When I try,
> polr(y ~ lx, data = ord.dat)

It gives me an output, which is the same as that from SAS.

But when I try,
> summary(polr(y ~ lx, data = ord.dat))

Re-fitting to get Hessian

Error in optim(start, fmin, gmin, method = "BFGS", hessian = Hess, ...) :
initial value in vmmin is not finite

And the weird thing is that it's fine if I use "x" instead of
"lx", where lx = log10(x).

thanks

Sincerely,

cahn

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] A rude question

2005-01-27 Thread Bill.Venables
When Haydn was asked about his 100+ symphonies he is reputed to have
replied "sunt mala bona mixta", which is a kind of dog Latin for "there are
good ones and bad ones all mixed together".  It's certainly the same
with R packages, so to continue the Latin motif: caveat emptor.

The R engine, on the other hand, is pretty well uniformly excellent code
but you have to take my word for that.  Actually, you don't.  The whole
engine is open source so, if you wish, you can check every line of it.
If people were out to push dodgy software, this is not the way they'd go
about it.

Bill Venables.

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED]
Sent: Thursday, 27 January 2005 3:10 PM
To: r-help@stat.math.ethz.ch
Subject: [R] A rude question


Dear all, 
 I am a beginner using R. I have a question about it. When you use it,
since it is written by so many authors, how do you know that the
results are trustworthy? (I don't want to offend anyone, and I do trust
people.) But I think this should be a question.

 Thanks,
 Ming

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] AIC, glm, lognormal distribution

2004-12-12 Thread Bill.Venables
Benjamin,

A couple of points:

 fit <- glm(y ~ x, gaussian(link = log))

does NOT fit a model with a lognormal response distribution.  It fits a
non-linear regression model with an ordinary gaussian response
distribution.  This model has constant variance, whereas the lognormal
model (which you would fit by transforming the response) has constant
coefficient of variation.  You would transform the response for two
reasons, namely it should linearize the relationship between
(transformed) response and predictors AND it should change a constant CV
into homoscedasticity, or constant variance.  This latter property is as
important as the first, usually.  You should not think of a glm with log
link as a kind of handy alternative to a log-transformed regression as
they are in reality very different models.
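A small simulated illustration of the difference (the data and names here are hypothetical, not the poster's):

```r
## Hypothetical data generated with a constant coefficient of variation
set.seed(1)
x <- runif(100)
y <- exp(1 + 2 * x) * exp(rnorm(100, sd = 0.2))

fit.glm   <- glm(y ~ x, family = gaussian(link = log))  # E[y] = exp(a+bx), Var(y) constant
fit.lnorm <- lm(log(y) ~ x)                             # lognormal: log(y) normal, constant CV
## The two fits differ because the error assumptions differ.
```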

Second point: you claim that the calls

 fitA <- glm(y ~ x, gaussian(link = log))
 fitB <- glm(y ~ x, gaussian)
 fitC <- lm(y ~ x)

give identical results.  This is NOT true for me on R 2.0.1 (Windows),
so you may care to check that, (although fitB and fitC are fully
equivalent, of course).

When you sort out the model you really need to use, you may find stepAIC
in the MASS library useful as one tool for model selection, or at least
for steps towards that generally rather complex goal.

Bill Venables.

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Benjamin M.
Osborne
Sent: Monday, 13 December 2004 12:40 PM
To: [EMAIL PROTECTED]
Subject: [R] AIC, glm, lognormal distribution


I'm attempting to do model selection with AIC, using a glm and a
lognormal
distribution, but:

fit1 <- glm(BA ~ Year, data = pdat.sp1.65.04, family = gaussian(link = log))

## gives the same result as either of the following:
fit1 <- glm(BA ~ Year, data = pdat.sp1.65.04, family = gaussian)
fit1 <- lm(BA ~ Year, data = pdat.sp1.65.04)

fit1
#Coefficients:
#(Intercept) Year2004
#-1.6341  -0.2741

#Degrees of Freedom: 84 Total (i.e. Null);  83 Residual
#Null Deviance:  1.521
#Residual Deviance: 1.476AIC: -97.31


fit1 <- glm(BA ~ Year, data = pdat.sp1.65.04, family = quasi(link = log))
# also gives the same result but returns AIC: NA


## Is it possible to model a lognormal distribution without having to
transform
## the data themselves?  (i.e.:

fit1 <- lm(log(BA) ~ Year, data = pdat.sp1.65.04)



Thanks in advance,
Ben Osborne

--
Botany Department
University of Vermont
109 Carrigan Drive
Burlington, VT 05405

[EMAIL PROTECTED]
phone: 802-656-0297
fax: 802-656-0440

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] Can't seem to finish a randomForest.... Just goes and goes!

2004-04-05 Thread Bill.Venables
Alternatively, if you can arrive at a sensible ordering of the levels
you can declare them ordered factors and make the computation feasible
once again.
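A runnable sketch of this with synthetic stand-in data (the real gain depends on choosing a sensible ordering for the poster's actual factor):

```r
library(randomForest)

## Synthetic stand-in: one 30-level factor predictor
set.seed(1)
lev <- paste0("L", 1:30)
d <- data.frame(f = factor(sample(lev, 500, replace = TRUE), levels = lev),
                x = rnorm(500))
d$y <- factor(as.integer(unclass(d$f) + rnorm(500) > 15))

## Declaring the factor ordered (here using its natural level order)
## reduces the split search from 2^(30-1) - 1 subsets to 29 cut points.
d$f <- factor(d$f, levels = lev, ordered = TRUE)
rf <- randomForest(y ~ ., data = d, ntree = 50)
```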

Bill Venables.

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Torsten Hothorn
Sent: Monday, 5 April 2004 4:27 PM
To: David L. Van Brunt, Ph.D.
Cc: R-Help
Subject: Re: [R] Can't seem to finish a randomForest.... Just goes and
goes!


On Sun, 4 Apr 2004, David L. Van Brunt, Ph.D. wrote:

 Playing with randomForest, samples run fine. But on real data, no go.

 Here's the setup: OS X, same behavior whether I'm using R-Aqua 1.8.1 
 or the Fink compile-of-my-own with X-11, R version 1.8.1.

 This is on OS X 10.3 (aka Panther), G4 800Mhz with 512M physical 
 RAM.

 I have not altered the Startup options of R.

 Data set is read in from a text file with read.table, and has 46 
 variables and 1,855 cases. Trying the following:

 The DV is categorical, 0 or 1. Most of the IV's are either continuous,

 or correctly read in as factors. The largest factor has 30 levels 
 Only the
^

This means: there are 2^(30-1) = 536.870.912 possible splits to be
evaluated everytime this variable is picked up (minus something due to
empty levels). At least the last time I looked at the code, randomForest
used an exhaustive search over all possible splits. Try reducing the
number of levels to something reasonable (or for a first shot: remove
this variable from the learning sample).

Best,

Torsten


 DV seems to need identifying as a factor to force class trees over
 regresssion:

 Mydata$V46 <- as.factor(Mydata$V46)
 Myforest.rf <- randomForest(V46 ~ ., data = Mydata, ntrees = 100, mtry = 7,
   proximities = FALSE, importance = FALSE)

 5 hours later, R.bin was still taking up 75% of my processor.  When 
 I've tried this with larger data, I get errors referring to the buffer

 (sorry, not in front of me right now).

 Any ideas on this? The data don't seem horrifically large. Seems like 
 there are a few options for setting memory size, but I'm  not sure 
 which of them to try tweaking, or if that's even the issue.

 __
 [EMAIL PROTECTED] mailing list 
 https://www.stat.math.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide! 
 http://www.R-project.org/posting-guide.html



__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] binding vectors or matrix using their names

2004-03-24 Thread Bill.Venables
Goodness Patrick, this must surely qualify for the obfuscated R competition finals.  I 
love it!

There are two solutions I can think of with do.call and here they are:

 x <- 1
 x2 <- runif(10)
 x12 <- c("x", "x2")

> do.call("cbind", lapply(x12, as.name))
      x         x2
 [1,] 1 0.99327265
 [2,] 1 0.63260097
 [3,] 1 0.17411170
 [4,] 1 0.54635634
 [5,] 1 0.75603670
 [6,] 1 0.27739270
 [7,] 1 0.32125068
 [8,] 1 0.01326344
 [9,] 1 0.37519602
[10,] 1 0.11133052

> do.call("cbind", lapply(x12, get))
      [,1]       [,2]
 [1,]    1 0.99327265
 [2,]    1 0.63260097
 [3,]    1 0.17411170
 [4,]    1 0.54635634
 [5,]    1 0.75603670
 [6,]    1 0.27739270
 [7,]    1 0.32125068
 [8,]    1 0.01326344
 [9,]    1 0.37519602
[10,]    1 0.11133052
 

I suspect the first offers a few advantages over the second, (which some other people 
have implicitly suggested).  The first preserves the names in the result.  Also, if 
the vectors are large, the second constructs a large language object and then 
evaluates it to get a large result.  The first uses the names rather than the objects 
themselves, so at least the language object is small, even if the result is not.

A more serious, philosophical word on Patrick's solution.  It is rarely necessary (in 
my limited experience, sure) to have to use parse() like this.  Where it provides a 
quick (kludgy?) solution I often find it a useful exercise to consider alternatives.  
They often come out simpler and you nearly always pick up something useful in the 
process.

Bill V.



-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Patrick Burns
Sent: Thursday, 25 March 2004 7:02 AM
To: Stephane DRAY
Cc: [EMAIL PROTECTED]; Stephane DRAY; Tom Blackwell
Subject: Re: [R] binding vectors or matrix using their names


I think you are looking for the eval-parse-text idiom:

eval(parse(text = paste("cbind(", paste(my.names, collapse = ", "), ")")))

Patrick Burns

Burns Statistics
[EMAIL PROTECTED]
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of S Poetry and A Guide for the Unwilling S User)

Stephane DRAY wrote:

 Hi Tom,

 Your approach did not work,

 do.call("cbind", as.list(my.names))
      [,1] [,2]
 [1,] "x"  "x2"

 but it helps me a lot to find the good one:

 do.call(cbind, as.list(parse(text=my.names)))

 Thanks,


 At 14:56 24/03/2004, Tom Blackwell wrote:

 I believe the syntax is

 result <- do.call("cbind", as.list(my.names))

 Haven't checked this on your example, though.

 -  tom blackwell  -  u michigan medical school  -  ann arbor  -

 On Wed, 24 Mar 2004, Stephane DRAY wrote:

  Hello list,
  I have two vectors x and x2:
 
  x=runif(10)
  x2=runif(10)
 
  and one vectors with their names :
 
  my.names = c("x", "x2")
 
  I would like to cbind these two vectors using their names contained
 in the
  vector my.names.
  I can create a string with comma
  ncomma = paste(my.names, collapse = ",")
 
  and now, I just need a function to transform this string into a
 adequate
  argument for cbind:
 
  cbind(afunction(ncomma))
 
  Is there in R a function that can do the job ? If not, how can I do
 it ??
 
  Thanks in advance,
  Sincerely.
 
 
  Stéphane DRAY
  
 -
 -

 
  Département des Sciences Biologiques
  Université de Montréal, C.P. 6128, succursale centre-ville 
  Montréal, Québec H3C 3J7, Canada
 
  Tel : 514 343 6111 poste 1233
  E-mail : [EMAIL PROTECTED]
  
 -
 -

 
  Web  
 http://www.steph280.freesurf.fr/
 
  __
  [EMAIL PROTECTED] mailing list 
  https://www.stat.math.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide!
 http://www.R-project.org/posting-guide.html
 

 __
 [EMAIL PROTECTED] mailing list 
 https://www.stat.math.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide!
 http://www.R-project.org/posting-guide.html


 Stéphane DRAY
 --
 

 Département des Sciences Biologiques
 Université de Montréal, C.P. 6128, succursale centre-ville Montréal, 
 Québec H3C 3J7, Canada

 Tel : 514 343 6111 poste 1233
 E-mail : [EMAIL PROTECTED]
 --
 

 Web  
 http://www.steph280.freesurf.fr/

 __
 [EMAIL PROTECTED] mailing list 
 https://www.stat.math.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide!
 http://www.R-project.org/posting-guide.html



__
[EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! 

RE: [R] profile error on an nls object

2004-03-18 Thread Bill.Venables
Adrian Dragulescu asks:
 
 Hello all,
 
 This is the error message that I get.
> hyp.res <- nls(log(y) ~ log(pdf.hyperb(theta, X)), data = dataModel,
 +  start=list(theta=thetaE0),
 +  trace=TRUE)
 45.54325 :   0.100  1.3862944 -4.5577142  0.0005503
 3.728302 :   0.0583857346  0.4757772859 -4.9156128701  0.0005563154
 1.584317 :   0.0194149477  0.3444648833 -4.9365149150  0.0004105426
 1.569333 :   0.0139310639  0.3824648048 -4.9024001228  0.0004089738
 1.569311 :   0.0137155342  0.3888648619 -4.8979817546  0.0004137501
 1.569311 :   0.0136895846  0.3893564152 -4.8976182201  0.0004141057
 1.569311 :   0.0136876315  0.3894059947 -4.8975821760  0.0004141343
> hyp.res.S <- summary(hyp.res)
> hyp.res.S
 
 Formula: log(y) ~ log(pdf.hyperb(theta, X))
 
 Parameters:
           Estimate Std. Error  t value Pr(>|t|)
 theta1   0.0136876  0.0359964    0.380    0.705
 theta2   0.3894060  0.3079860    1.264    0.211
 theta3  -4.8975822  0.2219928  -22.062   <2e-16 ***
 theta4   0.0004141  0.0005457    0.759    0.451
 ---
 Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
 
 Residual standard error: 0.1542 on 66 degrees of freedom
 
 Correlation of Parameter Estimates:
           theta1    theta2    theta3
 theta2  -0.02168
 theta3  -0.02029  0.997736
 theta4  -0.97182 -0.008054 -0.008952
 
> pr1 <- profile(hyp.res)
 1.825584 :   0.3894059947 -4.8975821760  0.0004141343
 ...
 1.875413 :  0.000300086 1.830592703 0.000608001
 Error in prof$getProfile() : step factor 0.000488281 reduced below
`minFactor' of 0.000976563
 
 
 Why is there an error on profile, which should only evaluate the
 objective function for different parameter values?

profile() doesn't merely evaluate the objective function at
different parameter values. It holds one of the parameters constant
and optimises the objective function with respect to all the others.
This is repeated for a sequence of values passing through the mle, and
the same for each parameter.  

If you fix a parameter at a very unrealistic value (as can happen) it
is not surprising that the optimisation with respect to all the others
runs into trouble.  This is a computationally difficult area and I am
more surprised by the cases that work without a hitch than by the ones
that don't.  You can expect to have to tweak things a bit in many cases.

Bill Venables.

 
 Thanks a lot.
 
 Adrian
 
 __

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] Recursive partitioning with multicollinear variables

2004-02-09 Thread Bill.Venables
No, for regression trees collinearity is a non-issue, because it is not a linear 
procedure.  Having variables that are linearly dependent (even exactly so) merely 
widens the scope of choice that the algorithm has to make cuts.

I'm not sure what you mean by "Multicollinear variables should appear as
alternate splits".  Do you mean that every second split should be in one
variable of a particular set?  Perhaps you mean "alternative" instead of
"alternate"?  In either case I think you are worrying over nothing.  Just
go ahead and do the tree-based model analysis and don't worry about it.

Here is a little picture that might clarify things.  Suppose Latitude and Longitude 
are two variables on which the algorithm may choose to split.  This means that splits 
in these geographical variables can only occur in a North-South or an East-West 
direction.  Let's suppose you add in two extra variables that are completely dependent 
on the first, namely

LatPlusLong <- Latitude + Longitude
LatMinusLong <- Latitude - Longitude

and now offer all four variables as potential split variables.  Now the algorithm may 
split North-South, East-West, NorthEast-SouthWest or NorthWest-SouthEast.  All you 
have done is increase the scope of choice for the algorithm to make splits.  Not only 
does the linear dependence not matter, but I'd argue it could be a pretty good thing.
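The picture above can be tried directly with a recursive-partitioning fit; this sketch uses simulated data and the rpart package as one implementation (names here are hypothetical):

```r
library(rpart)

## Hypothetical geographic data
set.seed(1)
d <- data.frame(Latitude = runif(200), Longitude = runif(200))
d$response <- with(d, ifelse(Latitude + Longitude > 1, 1, 0) + rnorm(200, sd = 0.1))

## Exactly collinear derived variables cause no error; the tree simply
## gains NE-SW / NW-SE split directions to choose from
d$LatPlusLong  <- d$Latitude + d$Longitude
d$LatMinusLong <- d$Latitude - d$Longitude
fit <- rpart(response ~ ., data = d)
```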

One serious message to take from this as well, though, is to use regression trees for 
prediction.  Don't read too much into the variables that the algorithm has chosen to 
use at any stage.

Bill Venables.

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Jean-Noel
Sent: Monday, 9 February 2004 8:25 PM
To: [EMAIL PROTECTED]
Subject: [R] Recursive partitioning with multicollinear variables


Dear all,
I would like to perform a regression tree analysis on a dataset with multicollinear 
variables (as climate variables often are). The questions that I am asking are:
 1- Is there any particular statistical problem in using multicollinear variables in a 
regression tree?
 2- Multicollinear variables should appear as alternate splits. Would it be more 
accurate to present these alternate splits in the results of the analysis or apply a 
variable selection or reduction procedure before the regression tree?

Thank you in advance,

Jean-Noel Candau

INRA - Unité de Recherches Forestières Méditerranéennes
Avenue A. Vivaldi
84000 AVIGNON
Tel: (33) 4 90 13 59 22
Fax: (33) 4 90 13 59 59

__
[EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html