Re: [R] Extract elements from objects in a list
x <- lapply(1:100, function(i) summary(runif(100)))
head(x, 4)
[[1]]
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
0.02922 0.38330 0.58120 0.58230 0.83430 0.99870

[[2]]
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
0.004903 0.281400 0.478900 0.497100 0.729900 0.990700

[[3]]
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
0.002561 0.283100 0.567400 0.516500 0.736800 0.986700

[[4]]
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
0.004827 0.264000 0.499500 0.503200 0.732300 0.998500

sapply(x, "[", 3)[1:4]
Median Median Median Median
0.5812 0.4789 0.5674 0.4995

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Jeremy Miles
Sent: Wednesday, 29 June 2011 9:23 AM
To: r-help
Subject: [R] Extract elements from objects in a list

Hi All,

I want to extract elements of elements in a list. Here's an example of what I mean. If I create a list:

x <- as.list(100)
for (loop in c(1:100)) {
  x[[loop]] <- summary(runif(100))
}
head(x)
[[1]]
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
0.02271 0.25260 0.58130 0.52120 0.77270 0.99670

[[2]]
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
0.006796 0.259700 0.528100 0.515500 0.781900 0.993100

[[3]]
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
0.00927 0.22800 0.40780 0.46410 0.69460 0.98780

I want to extract (say) the medians as a vector. This would be:

x[[1]][[3]]
x[[2]][[3]]
x[[3]][[3]]

I thought there would be a way of doing this with something like apply(), but I cannot work it out. Is there a way of doing this without a loop?

Thanks,

Jeremy

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
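A footnote on the `sapply(x, "[", 3)` idiom above: since each summary() result is a named vector, the same extraction can be done by name, which is more robust than a positional index. A minimal sketch (the fixed seed is an assumption, added only so the example is reproducible):

```r
set.seed(1)  # assumption: seed fixed for reproducibility
x <- lapply(1:100, function(i) summary(runif(100)))

# extract the medians by name rather than by position
med <- vapply(x, function(s) s[["Median"]], numeric(1))

stopifnot(length(med) == 100, all(med == sapply(x, "[", 3)))
```

vapply also type-checks each result, which sapply does not.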
Re: [R] New to R, trying to use agnes, but can't load my distance matrix
The first problem is that you are using a character string as the first argument to agnes(). The help information for agnes says that its first argument, x, is

  x: data matrix or data frame, or dissimilarity matrix, depending on
     the value of the 'diss' argument.

Not a character string. So first you have to read your data into R and hold it as a data matrix or data frame. Then you have a choice. Either you can calculate your own distance matrix with it and then call agnes() with that as the first argument (and with diss = TRUE), or you can get agnes() to calculate the distance matrix for you, in which case you need to specify how, using the metric = argument.

With 10000 entities to cluster, your distance matrix will require

10000 * 9999/2
[1] 49995000

numbers to be stored at once. I hope you are using a 64-bit OS! With such large numbers of entities to cluster, the usual advice is to try something more suited to the job. clara() is designed for this kind of problem.

It might be useful to keep in mind that R is not a package. (Repeat: R is NOT a package - I cannot stress that strongly enough.) It is a programming language. To use it effectively you really need to know something about how it works first. It might pay you to spend a little time getting used to the protocols, and how to do simple things in R like reading in data and manipulating it, before you tackle such a large and potentially tricky clustering problem.

Bill Venables.

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Karen R. Khar
Sent: Monday, 27 June 2011 5:44 PM
To: r-help@r-project.org
Subject: [R] New to R, trying to use agnes, but can't load my distance matrix

Hi, I'm mighty new to R. I'm using it on Windows. I'm trying to cluster using a distance matrix I created from the data on my own and called it D10.dist. I loaded the cluster package. Then tried the following command...
agnes("E:D10.dist", diss = TRUE, metric = "euclidean", stand = FALSE,
      method = "average", par.method, keep.diss = n < 100, keep.data = !diss)

And it responded...

Error in agnes("E:D10.dist", diss = TRUE, metric = "euclidean", stand = FALSE, :
  x is not and cannot be converted to class "dissimilarity"

D10.dist has the following data...

D1  0
D2  0.608392 0
D3  0.497451 0.537662 0
D4  0.634548 0.393343 0.537426 0
D5  0.558785 0.543399 0.632221 0.726633 0
D6  0.659483 0.701778 0.741425 0.668624 0.655914 0
D7  0.603012 0.659173 0.571776 0.687599 0.383712 0.683948 0
D8  0.611919 0.665357 0.526453 0.715093 0.457496 0.698213 0.317039 0
D9  0.415010 0.652117 0.552011 0.689690 0.485988 0.702738 0.428190 0.442598 0
D10 0.376512 0.600607 0.517857 0.673515 0.530421 0.667736 0.537025 0.480620 0.240559 0

I would appreciate any suggestions. Please assume I know virtually nothing about R.

Thanks, Karen

PS I'll eventually be using ~10,000 species to cluster. I'll need to have within- and between-cluster distance info, and I'll want a plot colored by cluster. Is agnes the right R tool to use?

--
View this message in context: http://r.789695.n4.nabble.com/New-to-R-trying-to-use-agnes-but-can-t-load-my-ditance-matrix-tp3627154p3627154.html
Sent from the R help mailing list archive at Nabble.com.
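As a sketch of the first route described above (hold the data in R, build the dissimilarity yourself, then hand it to agnes() with diss = TRUE). The data here are entirely made up; only the cluster-package calls are the point:

```r
library(cluster)

set.seed(42)                                  # assumption: made-up example data
dat <- matrix(runif(50), nrow = 10,
              dimnames = list(paste0("D", 1:10), NULL))

d <- dist(dat, method = "euclidean")          # a "dist" object (lower triangle)
ag <- agnes(d, diss = TRUE, method = "average")

stopifnot(inherits(d, "dist"), length(ag$order) == 10)
```

An existing full symmetric matrix of distances can be converted with as.dist() before the agnes() call.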
Re: [R] Accessing variables in a data frame
Just to start things off:

var.name <- c("gdp", "inf", "unp")
var.id <- c("w", "i")
x <- paste(var.name, rep(var.id, each = length(var.name)), sep = "_")
x
[1] "gdp_w" "inf_w" "unp_w" "gdp_i" "inf_i" "unp_i"

Now the three differences

gdp_w - gdp_i
inf_w - inf_i
unp_w - unp_i

can be got using

dwi <- dat[, x[1:3]] - dat[, x[4:6]]

and the other three differences

gdp - gdp_w
inf - inf_w
unp - unp_w

by

dw <- dat[, var.name] - dat[, x[1:3]]

The results, in both cases, should be data frames.

Bill Venables.

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Serguei Kaniovski
Sent: Sunday, 26 June 2011 10:01 PM
To: r-help@r-project.org
Subject: [R] Accessing variables in a data frame

Hello,

My data.frame (dat) contains many variables named var.names and others named var.names_var.id. For example:

var.name <- c("gdp", "inf", "unp")
var.id <- c("w", "i")
x <- paste(var.name, rep(var.id, each = length(var.name)), sep = "_")

How can I access variables in the data.frame by the names listed in x, for example to compute

gdp_w - gdp_i
inf_w - inf_i
unp_w - unp_i

or

gdp - gdp_w
inf - inf_w
unp - unp_w

without needing to code each difference separately? Thanks for your help!

Serguei Kaniovski

[[alternative HTML version deleted]]
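The two one-liners above can be checked end to end on a made-up data frame that follows the same naming scheme (all values random; only the indexing is the point):

```r
set.seed(1)  # assumption: toy data, seeded for reproducibility
var.name <- c("gdp", "inf", "unp")
var.id <- c("w", "i")
x <- paste(var.name, rep(var.id, each = length(var.name)), sep = "_")

dat <- as.data.frame(matrix(rnorm(5 * 9), nrow = 5))
names(dat) <- c(var.name, x)            # gdp, inf, unp, gdp_w, ..., unp_i

dwi <- dat[, x[1:3]] - dat[, x[4:6]]    # gdp_w - gdp_i, etc.
dw  <- dat[, var.name] - dat[, x[1:3]]  # gdp - gdp_w, etc.

stopifnot(is.data.frame(dwi),
          all(dwi[, 1] == dat$gdp_w - dat$gdp_i),
          all(dw[, 1]  == dat$gdp  - dat$gdp_w))
```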
Re: [R] extract worksheet names from an Excel file
Package XLConnect appears to provide this kind of thing.

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Shi, Tao
Sent: Friday, 24 June 2011 2:42 PM
To: r-help@r-project.org
Subject: [R] extract worksheet names from an Excel file

Hi list,

Is there an R function I can use to extract the worksheet names from an Excel file? If not, any other automatic ways (not using R) to do this?

thanks!

...Tao
Re: [R] omitting columns from a data frame
Suppose

names(xm1) <- c("alpha", "beta", "gamma", "delta")

then

xm2 <- subset(xm1, select = alpha:gamma)

or

xm2 <- subset(xm1, select = -delta)

will do the same job as xm1[, -4].

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Erin Hodgess
Sent: Tuesday, 21 June 2011 1:45 PM
To: R help
Subject: [R] omitting columns from a data frame

Dear R People:

I have a data frame, xm1, which has 12 rows and 4 columns. If I type xm1[, -4], I get all rows, and columns 1 - 3, which is as it should be. Now, is there a way to use the names of the columns to omit them, please?

Thanks so much in advance!

Sincerely,
Erin

--
Erin Hodgess
Associate Professor
Department of Computer and Mathematical Sciences
University of Houston - Downtown
mailto: erinm.hodg...@gmail.com
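A self-contained check of the equivalence claimed above (the toy values are an assumption, chosen to match the 12 x 4 frame in the question):

```r
xm1 <- as.data.frame(matrix(1:48, nrow = 12))   # made-up 12 x 4 frame
names(xm1) <- c("alpha", "beta", "gamma", "delta")

a <- subset(xm1, select = alpha:gamma)   # keep a contiguous range of columns
b <- subset(xm1, select = -delta)        # drop one column by name
p <- xm1[, -4]                           # positional equivalent

stopifnot(identical(names(a), c("alpha", "beta", "gamma")),
          all(a == b), all(b == p))
```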
Re: [R] and now a cut question
con <- textConnection("13053 13068 13068 13053 14853 14853 14850 14850 13053 13053 13068 13068")
x <- scan(con)
Read 12 items
cut(x, 4)
 [1] (1.31e+04,1.35e+04] (1.31e+04,1.35e+04] (1.31e+04,1.35e+04]
 [4] (1.31e+04,1.35e+04] (1.44e+04,1.49e+04] (1.44e+04,1.49e+04]
 [7] (1.44e+04,1.49e+04] (1.44e+04,1.49e+04] (1.31e+04,1.35e+04]
[10] (1.31e+04,1.35e+04] (1.31e+04,1.35e+04] (1.31e+04,1.35e+04]
4 Levels: (1.31e+04,1.35e+04] (1.35e+04,1.4e+04] ... (1.44e+04,1.49e+04]
cut(x, 4, dig.lab = 5)
 [1] (13051,13502] (13051,13502] (13051,13502] (13051,13502] (14404,14855]
 [6] (14404,14855] (14404,14855] (14404,14855] (13051,13502] (13051,13502]
[11] (13051,13502] (13051,13502]
Levels: (13051,13502] (13502,13953] (13953,14404] (14404,14855]

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Erin Hodgess
Sent: Tuesday, 21 June 2011 10:46 AM
To: R help
Subject: [R] and now a cut question

Hello again R People:

I have the following:

xm1[,1]
 [1] 13053 13068 13068 13053 14853 14853 14850 14850 13053 13053 13068 13068
cut(xm1[,1], 4)
 [1] (1.31e+04,1.35e+04] (1.31e+04,1.35e+04] (1.31e+04,1.35e+04]
 [4] (1.31e+04,1.35e+04] (1.44e+04,1.49e+04] (1.44e+04,1.49e+04]
 [7] (1.44e+04,1.49e+04] (1.44e+04,1.49e+04] (1.31e+04,1.35e+04]
[10] (1.31e+04,1.35e+04] (1.31e+04,1.35e+04] (1.31e+04,1.35e+04]
4 Levels: (1.31e+04,1.35e+04] (1.35e+04,1.4e+04] ... (1.44e+04,1.49e+04]

Is there any way to have the levels print as 13100, 13500, etc., please?

Thanks,
Erin

--
Erin Hodgess
Associate Professor
Department of Computer and Mathematical Sciences
University of Houston - Downtown
mailto: erinm.hodg...@gmail.com
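Beyond dig.lab, round labels such as 13000, 13500, ... can be had by supplying the breaks (and labels) explicitly rather than asking cut() for 4 equal-width intervals; a sketch using the same twelve values (the particular breaks are an assumption):

```r
x <- c(13053, 13068, 13068, 13053, 14853, 14853,
       14850, 14850, 13053, 13053, 13068, 13068)

brks <- seq(13000, 15000, by = 500)        # round, human-readable cut points
f <- cut(x, breaks = brks,
         labels = paste(head(brks, -1), brks[-1], sep = "-"))

levels(f)   # "13000-13500" "13500-14000" "14000-14500" "14500-15000"
stopifnot(as.character(f[1]) == "13000-13500",
          as.character(f[5]) == "14500-15000")
```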
Re: [R] converting character to numeric
...or something like that. Without more details it is hard to know just what is going on.

Firstly, in R the object is a 'data frame' (or object of class data.frame, to be formal). There is no standard object in R called a 'database'. If you read in your data using read.csv, then mydata is going to be a data frame. Character columns in the original .csv file will (most likely) be factors in the R object. (This varies with how you import it, though.) This means you will need to convert them to character before you convert them to numeric. If they really are character, this initial conversion will not do anything (good or bad).

If you want to operate on the individual columns of the data frame, then I would recommend you do it using something like:

mydata <- within(mydata, {
  apples <- as.numeric(as.character(apples))
  oranges <- as.numeric(as.character(oranges))
  ...
})

Bill Venables.

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Steven Kennedy
Sent: Wednesday, 22 June 2011 6:52 AM
To: Alina Sheyman
Cc: r-help@r-project.org
Subject: Re: [R] converting character to numeric

You need:

mydata$apples <- as.numeric(mydata$apples)

On Wed, Jun 22, 2011 at 6:38 AM, Alina Sheyman alina...@gmail.com wrote:

I'm trying to convert data from character to numeric. I've imported the data as a csv file. I'm assuming that the import is a database - are all the columns in a database considered vectors, and can they be operated on individually? Therefore I've tried the following:

mydata <- as.numeric(mydata$apples)

When I then look at mydata again, the named column is still in character format. If I do

mydata2 <- as.numeric(mydata$apples)

the new object mydata2 is empty. Am I missing something about the structure of R?
alina

[[alternative HTML version deleted]]
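The within() recipe above, run on a made-up frame, also shows the classic trap it avoids: as.numeric() applied directly to a factor returns the underlying level codes, not the printed numbers.

```r
mydata <- data.frame(apples  = c("3", "10", "7"),   # numbers stored as text
                     oranges = c("1", "4", "2"),
                     stringsAsFactors = TRUE)        # columns become factors

wrong <- as.numeric(mydata$apples)                   # level codes, not 3 10 7

mydata <- within(mydata, {
  apples  <- as.numeric(as.character(apples))
  oranges <- as.numeric(as.character(oranges))
})

stopifnot(!all(wrong == c(3, 10, 7)),
          identical(mydata$apples, c(3, 10, 7)))
```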
Re: [R] converting character to numeric
The point I would make is that for safety it's much better to use FALSE rather than F. FALSE is a reserved word in R; F is a pre-set variable, but it can easily be changed at any time by the user.

Secondly, doesn't this do the same as yours?

readFF.csv <- function(..., stringsAsFactors = FALSE)
    read.csv(..., stringsAsFactors = stringsAsFactors)

(Warning: untested code...)

Personally, I much prefer stringsAsFactors = TRUE, but then that's the way I was brought up...

Bill Venables.

-----Original Message-----
From: Don McKenzie [mailto:d...@u.washington.edu]
Sent: Wednesday, 22 June 2011 8:40 AM
To: Venables, Bill (CMIS, Dutton Park)
Cc: stevenkennedy2...@gmail.com; alina...@gmail.com; r-help@r-project.org
Subject: Re: [R] converting character to numeric

I have to chime in here with a slight revision to read.csv(), for those of us like me who have whined over time about stringsAsFactors. (Tested on my machine, macOS X 10.6.)

readFF.csv <- function(file, header = TRUE, sep = ",", quote = "\"", dec = ".",
                       fill = TRUE, comment.char = "", ...)
    read.table(file = file, header = header, sep = sep, quote = quote,
               dec = dec, fill = fill, stringsAsFactors = FALSE,
               comment.char = comment.char, ...)

On 21-Jun-11, at 3:16 PM, bill.venab...@csiro.au wrote:

...or something like that. Without more details it is hard to know just what is going on. Firstly, in R the object is a 'data frame' (or object of class data.frame, to be formal). There is no standard object in R called a 'database'. If you read in your data using read.csv, then mydata is going to be a data frame. Character columns in the original .csv file will (most likely) be factors in the R object. (This varies with how you import it, though.) This means you will need to convert them to character before you convert them to numeric. If they really are character, this initial conversion will not do anything (good or bad).
Science is organized skepticism in the reliability of expert opinion -- Richard Feynman

Don McKenzie, Research Ecologist
Pacific Wildland Fire Sciences Lab
US Forest Service

Affiliate Professor
School of Forest Resources, College of the Environment
CSES Climate Impacts Group
University of Washington

phone: 206-732-7824
d...@uw.edu
Re: [R] Unreasonable syntax error
The advice is always NOT to use Microsoft Word to edit an R file. That stuff is poisonous. Microsoft Word, typical of all Microsoft software, does not do what you tell it to do but helpfully does what it thinks you meant to ask it to do, but were too dumb to do so. Even Notepad, gawdelpus, would be better.

When I look at your script, the first sign of trouble I see is:

Error: unexpected input in:
"s[1,3,k] <- (0.1) * runif(1) + 0.1; s[1,1,k] <- (0.02) * runif(1) + 0.98 ¨"

which is the spurious character Microsoft Word has helpfully inserted to make it all look nice. It's downhill all the way from there.

(R does not expect Microsoft Word, just as nobody expects the Spanish Inquisition. See http://www.youtube.com/watch?v=CSe38dzJYkY).

Bill Venables.

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of james198877
Sent: Tuesday, 21 June 2011 7:13 AM
To: r-help@r-project.org
Subject: [R] Unreasonable syntax error

http://r.789695.n4.nabble.com/file/n3612530/PSC.r PSC.r

Hi all,

I just wrote a program in R by editing it in Microsoft Word and then pasting it into the text editor of R. The above is the file, and below is what the console complains about. Why doesn't it recognise 'r'?? I have to mention that when I typed at least the first several lines into the console directly, the first error didn't appear. I didn't try the later errors, as there would be too many lines to type... I'm not sure if this is something about Word.

http://r.789695.n4.nabble.com/file/n3612530/lastsave.txt lastsave.txt

Thanks a lot for your help!!!

--
View this message in context: http://r.789695.n4.nabble.com/Unreasonable-syntax-error-tp3612530p3612530.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] setting breaks in hist
The way to guarantee a specific number of panels in the histogram, say n, is to specify n + 1 breaks which cover the range of the data. As far as I know this is the only way.

Bill Venables.

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Erin Hodgess
Sent: Tuesday, 21 June 2011 10:17 AM
To: R help
Subject: [R] setting breaks in hist

Dear R People:

Is there a way to guarantee that breaks = n will give you exactly n breaks, please? I'm fairly certain that the answer is no, but thought I'd check.

Thanks,
Erin

--
Erin Hodgess
Associate Professor
Department of Computer and Mathematical Sciences
University of Houston - Downtown
mailto: erinm.hodg...@gmail.com
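A sketch of that advice: to get exactly n bars, hand hist() a vector of n + 1 breaks spanning the data (the data here are made up):

```r
set.seed(1)  # assumption: made-up data
x <- rnorm(100)

n <- 7                                           # want exactly 7 bars
brks <- seq(min(x), max(x), length.out = n + 1)  # n + 1 breaks cover the range
h <- hist(x, breaks = brks, plot = FALSE)

stopifnot(length(h$counts) == n, sum(h$counts) == length(x))
```

With an explicit breaks vector, hist() uses it exactly; a single number is only a suggestion passed to pretty().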
Re: [R] please help! what is the difference between using a log-link function and a log transformation?
The two commands you give below are certain to lead to very different results, because they are fitting very different models. The first is a gaussian model for the response with a log link and constant variance. The second is a gaussian model for a log-transformed response with identity link. On the original scale, this second model implies a constant coefficient of variation, and hence a variance proportional to the square of the mean, not constant.

Your problem is not particularly an R issue, but a difficulty with understanding generalized linear models (and hence generalized additive models, which are based on them).

Bill Venables.

From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of pigpigmeow [gloryk...@hotmail.com]
Sent: 19 June 2011 17:39
To: r-help@r-project.org
Subject: [R] please help! what is the difference between using a log-link function and a log transformation?

I'm a new R user and I need to use the gam function.

y <- gam(a ~ s(b), family = gaussian(link = log), data)
y <- gam(log(a) ~ s(b), family = gaussian(link = identity), data)

Why do these two commands give different results? I guessed the results would be the same, but actually they are different. Why?

--
View this message in context: http://r.789695.n4.nabble.com/please-help-what-are-the-different-using-log-link-function-and-log-transformation-tp3608931p3608931.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] For loop by factor.
If I understand you correctly, you are trying to find the cumulative maximum from the end within each level of the factor. If this is what you are trying to do, then here is one way you might like to do it.

First, define the function:

cumMax <- function(x) Reduce(max, x, right = TRUE, accumulate = TRUE)

Here is how you might use it:

test
  A B
1 a 3
2 a 2
3 a 1
4 b 3
5 b 2
6 c 2
7 c 3
8 c 1
9 c 1
(test <- within(test, B <- ave(B, A, FUN = cumMax)))
  A B
1 a 3
2 a 2
3 a 1
4 b 3
5 b 2
6 c 3
7 c 3
8 c 1
9 c 1

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Christopher Peters
Sent: Monday, 20 June 2011 6:21 AM
To: r-help@r-project.org
Subject: [R] For loop by factor.

I have a data.frame as follows:

a 3
a 2
a 1
b 3
b 2
c 2
c 3
c 1
c 1

Each factor level (a, b, c) should be monotonically decreasing; notice that factor 'c' is not. I could use some help figuring out the logical structure (mostly just syntax) that will check each 'next value' within each level to see if it is less than the previous value. If it is less than the previous value, do nothing; otherwise carry the larger value back over the preceding entries so that each level's sequence becomes non-increasing. So basically the data.frame would look like:

a 3
a 2
a 1
b 3
b 2
c 3
c 3
c 1
c 1

Thanks for your help!

__
Christopher P. Peters
Research Associate
Center for Energy Studies
http://www.enrg.lsu.edu/staff/peters
Energy, Coast Environment Building
Louisiana State University
Baton Rouge, LA 70803
Telephone: 225-578-4400
Fax: 225-578-4541

[[alternative HTML version deleted]]
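The Reduce()/ave() combination above can be checked end to end on the thread's data:

```r
cumMax <- function(x) Reduce(max, x, right = TRUE, accumulate = TRUE)

test <- data.frame(A = c("a", "a", "a", "b", "b", "c", "c", "c", "c"),
                   B = c(3, 2, 1, 3, 2, 2, 3, 1, 1))

# within each level of A, replace B by the cumulative maximum from the end
test <- within(test, B <- ave(B, A, FUN = cumMax))

stopifnot(identical(test$B, c(3, 2, 1, 3, 2, 3, 3, 1, 1)))
```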
Re: [R] can this sequence be generated easier?
Here is an idea that might be useful to adapt:

fixedSumCombinations <- function(N, terms) {
  if (terms == 1) return(N)
  if (terms == 2) return(cbind(0:N, N:0))
  X <- NULL
  for (i in 0:N) X <- rbind(X, cbind(i, Recall(N - i, terms - 1)))
  X
}

Example:

fixedSumCombinations(4, 3)
      i
 [1,] 0 0 4
 [2,] 0 1 3
 [3,] 0 2 2
 [4,] 0 3 1
 [5,] 0 4 0
 [6,] 1 0 3
 [7,] 1 1 2
 [8,] 1 2 1
 [9,] 1 3 0
[10,] 2 0 2
[11,] 2 1 1
[12,] 2 2 0
[13,] 3 0 1
[14,] 3 1 0
[15,] 4 0 0

Bill Venables.

From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of pdb [ph...@philbrierley.com]
Sent: 18 June 2011 14:34
To: r-help@r-project.org
Subject: [R] can this sequence be generated easier?

I have 'x' variables that I need to find the optimum combination of, with the constraint that the sum of all x variables needs to be exactly 100. I need to test all combinations to get the optimal mix. This is easy if I know how many variables I have - I can hard code as below. But what if I don't know the number of variables and want this to be a flexible parameter? Is there a sexy recursive way that this can be done in R?

# for combinations of 2 variables
vars <- 2
for (i in 0:100) {
  for (j in 0:(100 - i)) {
    # ...do some test on i,j combination
  }
}

# for combinations of 3 variables
vars <- 3
for (i in 0:100) {
  for (j in 0:(100 - i)) {
    for (k in 0:(100 - (i + j))) {
      # ...do some test on i,j,k combination
    }
  }
}

--
View this message in context: http://r.789695.n4.nabble.com/can-this-sequence-be-generated-easier-tp3607240p3607240.html
Sent from the R help mailing list archive at Nabble.com.
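Two quick properties sanity-check the recursion: every row sums to N, and the number of rows is the number of weak compositions of N into `terms` parts, choose(N + terms - 1, terms - 1).

```r
fixedSumCombinations <- function(N, terms) {
  if (terms == 1) return(N)
  if (terms == 2) return(cbind(0:N, N:0))
  X <- NULL
  for (i in 0:N) X <- rbind(X, cbind(i, Recall(N - i, terms - 1)))
  X
}

combs <- fixedSumCombinations(4, 3)
stopifnot(nrow(combs) == choose(4 + 3 - 1, 3 - 1),  # 15 rows
          all(rowSums(combs) == 4))
```

For the poster's problem, each row of fixedSumCombinations(100, vars) is one candidate mix summing to 100.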
Re: [R] functions for polynomial and rational interpolation?
Neville's algorithm is not an improvement on Lagrange interpolation; it is simply one way of calculating it that has some useful properties. The result is still the Lagrange interpolating polynomial, though, with all its flaws. Implementing Neville's algorithm is fairly easy using the PolynomF package, but I'm not sure it really offers much advantage. YMMV.

Bill Venables.

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of dslowik
Sent: Tuesday, 14 June 2011 10:34 AM
To: r-help@r-project.org
Subject: [R] functions for polynomial and rational interpolation?

Are there implementations of, e.g., Neville's algorithm for interpolating polynomials through some data points? Neville's is an improvement on Lagrange interpolation. And how about interpolating rational functions? I could not find anything at rseek.org or at crantastic.org.

thanks

--
View this message in context: http://r.789695.n4.nabble.com/functions-for-polynomial-and-rational-interplation-tp3595334p3595334.html
Sent from the R help mailing list archive at Nabble.com.
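For evaluating the interpolating polynomial at a single point (as opposed to building it symbolically with PolynomF), Neville's tableau is only a few lines of base R. This is a generic sketch of the standard algorithm, not code from the thread:

```r
# Neville's algorithm: evaluate at x0 the unique polynomial through (x, y)
neville <- function(x, y, x0) {
  n <- length(x)
  p <- y
  for (k in 1:(n - 1))
    for (i in 1:(n - k))
      p[i] <- ((x0 - x[i + k]) * p[i] + (x[i] - x0) * p[i + 1]) /
        (x[i] - x[i + k])
  p[1]
}

# the unique quadratic through three points on y = x^2 reproduces x^2
xs <- c(0, 1, 3)
stopifnot(isTRUE(all.equal(neville(xs, xs^2, 2), 4)))
```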
Re: [R] Score Test Function
The score test looks at the effect of adding extra columns to the model matrix. The function glm.scoretest takes the fitted model object as the first argument and the extra column, or columns, as the second argument. Your x2 argument has length only 3. Is this really what you want? I would have expected you need to specify a vector of length nrow(DF), [as in the help information for glm.scoretest itself].

Bill Venables.

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Tyler Rinker
Sent: Sunday, 12 June 2011 2:46 PM
To: r-help@r-project.org
Subject: [R] Score Test Function

Greetings R Community,

I'm trying to learn logistic regression on my own and am using "An Introduction to Logistic Regression Analysis and Reporting" (Peng, C., Lee, K., Ingersoll, G., 2002). This article uses a score test statistic as a measure of overall fit for a logistic regression model. The author calculates this statistic using SAS. I am looking for an [R] function that can compute this statistic and a p-value (please don't critique the stat itself). As far as I can tell, glm.scoretest() from library(statmod) is the only function for this, but it does not give a p-value and appears not to correspond with the author's example. Some chat on the discussion board a while back indicated that this was a well known test but difficult to calculate in [R]. I'm wondering if there is a way to do this fairly easily in [R] at this time?

I am working with the data set from the study and am looking to get close to the 9.5177 score test stat and .0086 p-value with 2 degrees of freedom. Below is the code for the data set and some descriptives so you can see the data set is the same as in the study. I highlighted my attempt to get a score test statistic and am curious if this is it (minus the p-value).
# BEGINNING OF CODE
id <- factor(1:189)
gender <- factor(c(
  "Boy","Boy","Girl","Girl","Girl","Boy","Girl","Boy","Girl","Girl","Boy","Boy","Boy","Boy","Girl","Girl","Boy","Girl","Boy","Girl","Boy","Girl","Girl",
  "Boy","Boy","Girl","Girl","Girl","Boy","Boy","Boy","Girl","Boy","Girl","Boy","Girl","Girl","Girl","Girl","Girl","Boy","Girl","Boy","Girl","Girl","Girl",
  "Girl","Boy","Girl","Boy","Girl","Boy","Girl","Girl","Boy","Boy","Boy","Boy","Boy","Boy","Boy","Boy","Boy","Girl","Boy","Boy","Boy","Boy","Girl","Boy",
  "Girl","Boy","Boy","Boy","Girl","Boy","Girl","Girl","Boy","Girl","Girl","Girl","Boy","Boy","Boy","Boy","Boy","Girl","Girl","Girl","Girl","Boy","Girl",
  "Girl","Girl","Girl","Girl","Girl","Girl","Girl","Girl","Girl","Boy","Girl","Boy","Boy","Girl","Girl","Girl","Boy","Girl","Boy","Girl","Girl","Girl","Boy",
  "Girl","Boy","Girl","Boy","Girl","Boy","Girl","Girl","Girl","Girl","Girl","Girl","Girl","Girl","Boy","Girl","Boy","Boy","Boy","Boy","Boy","Boy","Boy","Girl",
  "Girl","Girl","Boy","Boy","Girl","Girl","Boy","Girl","Boy","Boy","Boy","Girl","Girl","Girl","Girl","Boy","Boy","Girl","Boy","Boy","Girl","Boy","Boy","Boy",
  "Boy","Girl","Boy","Boy","Girl","Girl","Boy","Boy","Boy","Boy","Boy","Girl","Girl","Girl","Girl","Boy","Boy","Boy","Girl","Boy","Girl","Boy","Boy","Boy","Girl"))
reading.score <- c(
  91.0,77.5,52.5,54.0,53.5,62.0,59.0,51.5,61.5,56.5,47.5,75.0,47.5,53.5,50.0,50.0,49.0,59.0,60.0,60.0,
  60.5,50.0,101.0,60.0,60.0,83.5,61.0,75.0,84.0,56.5,56.5,45.0,60.5,77.5,62.5,70.0,69.0,62.0,107.5,54.5,92.5,94.5,65.0,
  80.0,45.0,45.0,66.0,66.0,57.5,42.5,60.0,64.0,65.0,47.5,57.5,55.0,55.0,76.5,51.5,59.5,59.5,59.5,55.0,70.0,66.5,84.5,
  57.5,125.0,70.5,79.0,56.0,75.0,57.5,56.0,67.5,114.5,70.0,67.0,60.5,95.0,65.5,85.0,55.0,63.5,61.5,60.0,52.5,65.0,87.5,
  62.5,66.5,67.0,117.5,47.5,67.5,67.5,77.0,73.5,73.5,68.5,55.0,92.0,55.0,55.0,60.0,120.5,56.0,84.5,60.0,85.0,93.0,60.0,
  65.0,58.5,85.0,67.0,67.5,65.0,60.0,47.5,79.0,80.0,57.5,64.5,65.0,60.0,85.0,60.0,58.0,61.5,60.0,65.0,93.5,52.5,42.5,
  75.0,48.5,64.0,66.0,82.5,52.5,45.5,57.5,65.0,46.0,75.0,100.0,77.5,51.5,62.5,44.5,51.0,56.0,58.5,69.0,65.0,60.0,65.0,
  65.0,40.0,55.0,52.5,54.5,74.0,55.0,60.5,50.0,48.0,51.0,55.0,93.5,61.0,52.5,57.5,60.0,71.0,65.0,60.0,55.0,60.0,77.0,
  52.5,95.0,50.0,47.5,50.0,47.0,71.0,65.0)
reading.recommendation <- as.factor(c(rep("no", 130), rep("yes", 59)))
DF <- data.frame(id, gender, reading.score, reading.recommendation)
head(DF)
#=
# DESCRIPTIVES
#=
table(DF[2:4])  # BREAKDOWN OF SCORES BY GENDER AND REMEDIAL READING RECOMMENDATIONS
table(DF[c(2)])  # TABLE OF GENDER
print(table(DF[c(2)])/sum(table(DF[c(4)]))*100, digits = 4)  # PERCENT GENDER BREAKDOWN
table(DF[c(4)])  # TABLE OF THOSE RECOMMENDED FOR REMEDIAL READING
print(table(DF[c(4)])/sum(table(DF[c(4)]))*100, digits = 4)  # Probability of recommended
table(DF[c(2,4)])  # TABLE OF GIRLS AND BOYS RECOMMENDED FOR REMEDIAL READING
print(prop.table(table(DF[c(2,4)]), 1)*100, digits = 4)  # Probability of recommended
#=
# ANALYSIS
#=
(mod1 <- glm(reading.recommendation ~ reading.score + gender, family = binomial, data = DF))
library(statmod)
with(DF, glm.scoretest(mod1, c(0,2,3), dispersion = NULL))^2  # If I move the decimal over 2 to the right
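For the p-value Tyler is missing: glm.scoretest() in the statmod package returns a z-statistic, so a two-sided p-value follows from the normal distribution (or, equivalently, from z^2 against a chi-square on 1 df). A hedged sketch with simulated data, not the study's data set; note the candidate column x2 has length nrow of the data, as Bill points out, and the code is guarded in case statmod is not installed:

```r
# Score test for adding one column to a fitted binomial glm.
set.seed(1)
n  <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)                      # candidate column: length n, not 3
y  <- rbinom(n, 1, plogis(0.5 * x1))
fit <- glm(y ~ x1, family = binomial)

if (requireNamespace("statmod", quietly = TRUE)) {
  z <- statmod::glm.scoretest(fit, x2)      # a z-statistic
  p <- 2 * pnorm(-abs(z))                   # same as pchisq(z^2, 1, lower.tail = FALSE)
  c(score.z = z, p.value = p)
}
```

The two p-value formulas agree exactly; for a multi-column (multi-df) test, as in the 2-df example from the paper, the single-column statistics do not simply add unless the columns are orthogonal, so treat this as a per-column sketch.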
Re: [R] Custom Sort on a Table object
Here is one way.

tab
                     f        m
0 to 5       11.328000 6.900901
15 to 24      6.100570 5.190058
25 to 34      9.428707 6.567280
35 to 44     10.462158 7.513270
45 to 54      7.621988 5.692905
5 to 14       6.502741 6.119663
55 to 64      5.884737 4.319905
65 to 74      5.075606 4.267810
75 to 84      4.702020 3.602362
85 and over   4.750000 3.877551

lowAge <- as.numeric(sapply(strsplit(rownames(tab), " "), "[", 1))
(tab <- tab[order(lowAge), ])
                     f        m
0 to 5       11.328000 6.900901
5 to 14       6.502741 6.119663
15 to 24      6.100570 5.190058
25 to 34      9.428707 6.567280
35 to 44     10.462158 7.513270
45 to 54      7.621988 5.692905
55 to 64      5.884737 4.319905
65 to 74      5.075606 4.267810
75 to 84      4.702020 3.602362
85 and over   4.750000 3.877551

-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Galen Moore Sent: Tuesday, 7 June 2011 1:23 PM To: r-help@r-project.org Subject: [R] Custom Sort on a Table object Greetings - I've got the following table (the result of a two-way table operation):

                     f        m
0 to 5       11.328000 6.900901
15 to 24      6.100570 5.190058
25 to 34      9.428707 6.567280
35 to 44     10.462158 7.513270
45 to 54      7.621988 5.692905
5 to 14       6.502741 6.119663
55 to 64      5.884737 4.319905
65 to 74      5.075606 4.267810
75 to 84      4.702020 3.602362
85 and over   4.750000 3.877551

which I'd like to sort so that the column of rownames (which represent age bands) and their corresponding f and m values appear in logical order. I've tried a bunch of things: merging with a separate df bearing age bands paired with a sequence number, stripping out row vectors and rbind-ing a new df, etc., all to no avail. It seems to be very difficult to spin a table object into a data frame without being stuck with the table's rownames! I haven't yet tried writing to an external file and then reading it back (so as to get R to forget that it's a table object), and then merging on the group bands to pull in a sequence vector upon which to do an order(). Seems like it should be easier.
Many thanks, Galen __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
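Bill's order-by-leading-number idea works on any matrix or table with age-band rownames; a small self-contained sketch (hypothetical values, not Galen's data):

```r
# Sort rows by the number at the start of each rowname.
tab <- matrix(1:6, ncol = 2,
              dimnames = list(c("15 to 24", "0 to 5", "5 to 14"),
                              c("f", "m")))
lowAge <- as.numeric(sapply(strsplit(rownames(tab), " "), "[", 1))
tab[order(lowAge), ]
#          f m
# 0 to 5   2 5
# 5 to 14  3 6
# 15 to 24 1 4
```

A band such as "85 and over" still sorts correctly, because its first space-separated token is "85".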
Re: [R] Matrix Question
Here is one way you might do it.

con <- textConnection("
characteristics_ch1.3 Stage: T1N0 Stage: T2N1
Stage: T0N0 Stage: T1N0 Stage: T0N3
")
txt <- scan(con, what = "")
## Read 11 items
close(con)
Ts <- grep("^T", txt, value = TRUE)
Ts <- sub("T([[:digit:]]+)N([[:digit:]]+)", "\\1x\\2", Ts)
out <- do.call(rbind, strsplit(Ts, "x"))
mode(out) <- "numeric"
dimnames(out) <- list(rep("", nrow(out)), c("T", "N"))
out
##  T N
##  1 0
##  2 1
##  0 0
##  1 0
##  0 3

Now you can print 'out' however you want it, e.g.

sink("outfile.txt")
out
sink()

This is slightly more complex than it might be as I have allowed for the possibility that your numbers have more than one digit. Bill Venables. -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Ben Ganzfried Sent: Friday, 3 June 2011 4:54 AM To: r-help@r-project.org Subject: [R] Matrix Question Hi, First of all, I would like to introduce myself as I will probably have many questions over the next few weeks and want to thank you guys in advance for your help. I'm a cancer researcher and I need to learn R to complete a few projects. I have an introductory background in Python. My questions at the moment are based on the following sample input file: *Sample_Input_File* characteristics_ch1.3 Stage: T1N0 Stage: T2N1 Stage: T0N0 Stage: T1N0 Stage: T0N3 characteristics_ch1.3 is a column header in the input Excel file. T's represent stage and N's represent degree of disease spreading.
I want to create output that looks like this:

*Sample_Output_File*
T N
1 0
2 1
0 0
1 0
0 3

As it currently stands, my code is the following:

rm(list = ls())
source("../../functions.R")
uncurated <- read.csv("../uncurated/Sample_Input_File_full_pdata.csv", as.is = TRUE, row.names = 1)
## initial creation of curated dataframe
curated <- initialCuratedDF(rownames(uncurated), template.filename = "Sample_Template_File.csv")
##
## start the mappings
##
## title -> alt_sample_name
curated$alt_sample_name <- uncurated$title
# T
tmp <- uncurated$characteristics_ch1.3
tmp <- *??*
curated$T <- tmp
# N
tmp <- uncurated$characteristics_ch1.3
tmp <- *??*
curated$N <- tmp
write.table(curated, row.names = FALSE, file = "../curated/Sample_Output_File_curated_pdata.txt", sep = "\t")

My question is the following: What code gets me the desired output (replacing the *??*'s above)? I want to: a) find the integer value one element to the right of T; and b) find the integer value one element to the right of N. I've read the regular expression tutorial for R, but could only figure out how to grab an integer value if it is the only integer value in the row (i.e. more than one integer value makes this basic regular expression unsuccessful). Thank you very much for any help you can provide. Sincerely, Ben Ganzfried
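An alternative to the scan()/strsplit() route, staying entirely with base R regular expressions (a sketch on the sample values): capture the digits after T and after N directly with sub() and back-references.

```r
x <- c("Stage: T1N0", "Stage: T2N1", "Stage: T0N0", "Stage: T1N0", "Stage: T0N3")
# "\\1" substitutes whatever the first parenthesised group matched.
T.val <- as.numeric(sub(".*T([0-9]+)N[0-9]+.*", "\\1", x))
N.val <- as.numeric(sub(".*T[0-9]+N([0-9]+).*", "\\1", x))
cbind(T = T.val, N = N.val)
#      T N
# [1,] 1 0
# [2,] 2 1
# [3,] 0 0
# [4,] 1 0
# [5,] 0 3
```

In Ben's script, T.val and N.val computed from uncurated$characteristics_ch1.3 could stand in for the two *??* placeholders, assuming every entry really contains a TxNy code.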
Re: [R] count value changes in a column
I thought so too. If so, here is one way you could do it:

fixSeq <- function(state) {
  shift1 <- function(x) c(1, x[-length(x)])
  repeat {
    change <- state %in% c(4, 5) & shift1(state) == 3
    if (any(change)) state[change] <- 3 else break
  }
  state
}

e.g.

state
 [1] 1 3 3 5 5 3 2 4 2 1 5 3 3 5
fixSeq(state)
 [1] 1 3 3 3 3 3 2 4 2 1 5 3 3 3

For the data frame:

set.seed(144)
dfr <- data.frame(state = sample(rep(1:5, 200)))
dfr <- within(dfr, changed <- fixSeq(state))
head(dfr, 11)
   state changed
1      5       5
2      2       2
3      1       1
4      5       5
5      2       2
6      1       1
7      1       1
8      4       4
9      3       3
10     5       3  ### --- change
11     3       3

-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of jim holtman Sent: Wednesday, 1 June 2011 11:20 AM To: Justin Haynes Cc: r-help@r-project.org Subject: Re: [R] count value changes in a column Why isn't the sequence: 1 3 3 3 3 3 2 4 2 1 5 3 3 3 according to your rule about 5s following 3s? On Tue, May 31, 2011 at 6:23 PM, Justin Haynes jto...@gmail.com wrote: Is there a way to look for value changes in a column?

set.seed(144)
df <- data.frame(state = sample(rep(1:5, 200), 1000))

Any of the five states are acceptable. However if, for example, states 4 or 5 follow state 3, I want to overwrite them with 3. Changes from 1 to any value and 2 to any value are acceptable, as are changes from any value to 1 or 2. By way of an example: the sequence 1 3 3 5 5 3 2 4 2 1 5 3 3 5 should read 1 3 3 3 3 3 2 4 2 1 5 5 5 5. Thanks for the help! Justin -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve?
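The fixed-point loop in fixSeq can be checked against the example vector (a sketch; the function body is copied from the reply above). Each pass replaces any 4 or 5 that immediately follows a 3, and repeats until nothing changes, so a run like 3 5 5 collapses to 3 3 3 over two passes:

```r
fixSeq <- function(state) {
  shift1 <- function(x) c(1, x[-length(x)])  # lag by one position
  repeat {
    change <- state %in% c(4, 5) & shift1(state) == 3
    if (any(change)) state[change] <- 3 else break
  }
  state
}

state <- c(1, 3, 3, 5, 5, 3, 2, 4, 2, 1, 5, 3, 3, 5)
fixSeq(state)
# 1 3 3 3 3 3 2 4 2 1 5 3 3 3
```

Note the result matches Jim's reading of the rule, not the 1 5 5 5 5 tail in Justin's original example, which is exactly the discrepancy Jim queried.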
Re: [R] Forcing a negative slope in linear regression?
If you want to go ahead with this in cold blood, you might look at the 'nnls' package. It fits regressions with non-negative coefficients. This might seem like the very opposite of what you want, but it essentially gets you there. You have to be prepared for the coefficient to go to zero though, if according to the data it really needs to be positive to minimise the residual SSQ. Here's what you do: * For any predictor, x, for which you want the regression coefficient to be non-positive, use -x as the predictor in the model. Think about it. * (The real trick) For any predictor, z, whose coefficient is not to be constrained at all, put *both* z and -z in as predictors. The algorithm will choose only one of them. nnls is now quite an old package and the interface is rather klunky, but the method is still OK. Bill Venables. -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Jeff Newmiller Sent: Wednesday, 1 June 2011 11:38 AM To: J S; r-help@r-project.org Subject: Re: [R] Forcing a negative slope in linear regression? If you force the slope, it is no longer a regression, so no. It is best to add those other dependent variables to the regression and evaluate whether their presence causes the fit to improve and yield signs of coefficients that match what you expect. --- Jeff Newmiller The . . Go Live... DCN:jdnew...@dcn.davis.ca.us Basics: ##.#. ##.#. Live Go... Live: OO#.. Dead: OO#.. Playing Research Engineer (Solar/Batteries O.O#. #.O#. with /Software/Embedded Controllers) .OO#. .OO#. rocks...1k --- Sent from my phone. Please excuse my brevity. J S yulya...@gmail.com wrote: Dear forum members, How can I force a negative slope in a linear regression even though the slope might be positive? I will need it for the purpose of determining the trend due reasons other than biological because the biological (genetic) trend is not positive for these data. Thanks. 
Julia

Example of the data: [1] 1.254 1.235 1.261 0.952 1.202 1.152 0.801 0.424 0.330 0.251 0.229 0.246 [13] 0.414 0.494 0.578 0.628 0.514 0.594 0.827 0.812 0.629 0.928 0.707 0.976 [25] 1.099 1.039 1.272 1.398 1.926 1.987 2.132 1.644 2.174 2.453 2.392 3.002 [37] 3.352 2.410 2.206 2.692 2.653 1.604 2.536 3.070 3.137 4.187 4.803 4.575 [49] 4.580 3.779 4.201 5.685 4.915 5.929 5.474 6.140 5.182 5.524 5.848 5.830 [61] 5.800 7.517 6.422
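Bill's two sign tricks can be sketched in a few lines (a hypothetical example; nnls() from the 'nnls' package solves min ||Ax - b||^2 subject to x >= 0, and the call is guarded in case the package is absent):

```r
set.seed(1)
n <- 60
x <- runif(n)   # coefficient to be constrained non-positive
z <- rnorm(n)   # coefficient left unconstrained
y <- 2 - 1.5 * x + 0.8 * z + rnorm(n, sd = 0.2)

# Trick 1: enter -x, so a non-negative coefficient on -x is a
#          non-positive coefficient on x.
# Trick 2: enter both signs of the unconstrained terms (intercept and z);
#          the algorithm will use at most one of each pair.
A <- cbind(1, -1, -x, z, -z)

if (requireNamespace("nnls", quietly = TRUE)) {
  fit <- nnls::nnls(A, y)
  b <- fit$x
  c(intercept = b[1] - b[2],
    beta.x    = -b[3],        # guaranteed <= 0
    beta.z    = b[4] - b[5])  # free to be either sign
}
```

As Bill warns, beta.x comes out exactly zero whenever an unconstrained fit would have wanted it positive.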
Re: [R] Value of 'pi'
There is an urban legend that says Indiana passed a law implying pi = 3. (Because it says so in the bible...) -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Joshua Wiley Sent: Monday, 30 May 2011 4:10 PM To: Vincy Pyne Cc: r-help@r-project.org Subject: Re: [R] Value of 'pi' Dear Vincy, I hope that in school you also learned that 22/7 is an approximation. Please consult your local mathematician for a proof that pi != 22/7. A quick search will provide you with volumes of information on what pi is, how it may be calculated, and calculations out to thousands of digits. Cheers, Josh On Sun, May 29, 2011 at 11:01 PM, Vincy Pyne vincy_p...@yahoo.ca wrote: Dear R helpers, I have one basic doubt about the value of pi. In school, we have learned that pi = 22/7 (which is = 3.142857). However, if I type pi in R, I get pi = 3.141593. So which value of pi should be considered? Regards Vincy [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://www.joshuawiley.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
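Josh's point is easy to verify at the console (a quick sketch; R's pi is the double-precision value, accurate to about 16 significant digits, while 22/7 is only a school approximation):

```r
print(pi, digits = 15)     # 3.14159265358979
print(22/7, digits = 15)   # 3.14285714285714
pi - 22/7                  # about -0.00126: 22/7 overshoots in the third decimal
```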
Re: [R] Basic question about three factor Anova
This is really a question about the help file for gl. The arguments are

gl(n, k, length = n*k, labels = 1:n, ordered = FALSE)

'n' is the number of factor levels. That seems to be easy enough. 'k' is called the number of replications. This is perhaps not the best way to express what it is: k is the number of times each of the n levels is to be repeated before starting again. In your example the 'a' levels are repeated 3 times (to cover 'b'), the 'b' levels are repeated once since you read in the values b1 b2 b3 b1 b2 ..., and the levels of 'c' are repeated 60 times each since the top 60 values are all c1 and the bottom 60 values are all c2. 'length' is the overall length of the factor you are generating. By default it is just n*k, but in this case it has to be 4 (A levels) x 3 (B levels) x 2 (C levels) x 5 (reps in each A:B:C subgroup). The other two arguments are clear enough. Bill Venables. -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Bogdan Lataianu Sent: Tuesday, 31 May 2011 8:05 AM To: r-help@r-project.org Subject: [R] Basic question about three factor Anova Read the data using scan():

#
#       a1            a2            a3            a4
#  -----------   -----------   -----------   -----------
#  b1  b2  b3    b1  b2  b3    b1  b2  b3    b1  b2  b3
#  --- --- ---   --- --- ---   --- --- ---   --- --- ---
#
# c1:
# 4.1 4.6 3.7    4.9 5.2 4.7   5.0 6.1 5.5   3.9 4.4 3.7
# 4.3 4.9 3.9    4.6 5.6 4.7   5.4 6.2 5.9   3.3 4.3 3.9
# 4.5 4.2 4.1    5.3 5.8 5.0   5.7 6.5 5.6   3.4 4.7 4.0
# 3.8 4.5 4.5    5.0 5.4 4.5   5.3 5.7 5.0   3.7 4.1 4.4
# 4.3 4.8 3.9    4.6 5.5 4.7   5.4 6.1 5.9   3.3 4.2 3.9
#
# c2:
# 4.8 5.6 5.0    4.9 5.9 5.0   6.0 6.0 6.1   4.1 4.9 4.3
# 4.5 5.8 5.2    5.5 5.3 5.4   5.7 6.3 5.3   3.9 4.7 4.1
# 5.0 5.4 4.6    5.5 5.5 4.7   5.5 5.7 5.5   4.3 4.9 3.8
# 4.6 6.1 4.9    5.3 5.7 5.1   5.7 5.9 5.8   4.0 5.3 4.7
# 5.0 5.4 4.7    5.5 5.5 4.9   5.5 5.7 5.6   4.3 4.3 3.8
#
# NOTE: Cut and paste the numbers without the leading # or labels
#
Y <- scan()
A <- gl(4, 3, 4*3*2*5, labels = c("a1", "a2", "a3", "a4"))
B <- gl(3, 1, 4*3*2*5, labels = c("b1", "b2", "b3"))
C <- gl(2, 60, 4*3*2*5, labels = c("c1", "c2"))
anova(lm(Y ~ A*B*C)) # all effects
and interactions

In the above example, why is the number of replications 3 for A, 1 for B and 60 for C? And why 4*3*2*5? Is the 5 because there are 5 lines in each 4*3*2 group? What is the logic behind this?
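Bill's description of n, k and length is quick to see on a small case (a sketch, not from the original data): with n = 2 levels, each level is repeated in blocks of k = 3, and the blocks are recycled until the requested length is reached.

```r
# gl(n, k, length): 2 levels, blocks of 3, recycled out to length 12.
f <- gl(2, 3, 12, labels = c("low", "high"))
as.character(f)
# "low" "low" "low" "high" "high" "high" "low" "low" "low" "high" "high" "high"
```

Setting k = 1 instead would give the fastest-cycling pattern low high low high ..., which is exactly how B is generated in the example above.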
Re: [R] DateTime Math in R - POSIXct
Perhaps because the timezone is specified as a character string and not a date-time object complete with timezone. From the help file for as.POSIXct.numeric:

origin: a date-time object, or something which can be coerced by as.POSIXct(tz = "GMT") to such an object.

Note the coercion. Bill Venables. -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Galen Moore Sent: Tuesday, 31 May 2011 12:20 PM To: r-help@r-project.org Subject: [R] DateTime Math in R - POSIXct Greetings - I'm battling POSIXct, as per the code below. My input is actually an XL file, but the weird results below correctly model what I am seeing in my program. Before I punt and use lubridate or timeDate, could anyone please help me understand why POSIXct forces my variable back to GMT? I suspect that I'm not properly coding the tzone value, but it does not throw an error as-is.

tstamp <- "2011-05-22 11:45:00 MDT"
mode(tstamp)
[1] "character"
dateP <- as.POSIXct(tstamp, origin = "1970-01-01", tzone = "MDT")
mode(dateP)
[1] "numeric"
dateP
[1] "2011-05-22 11:45:00 MDT"
dateN <- as.numeric(dateP)
dateN
[1] 1306086300
dateP2 <- as.POSIXct(dateN, origin = "1970-01-01", tzone = "MDT")
dateP2
[1] "2011-05-22 18:45:00 MDT"

Many thanks. Galen Moore
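A further point worth checking here: as.POSIXct takes its timezone through an argument named tz, so a misspelled tzone= is silently swallowed by the ... argument and the conversion falls back to the default zone. A hedged sketch, using the Olson name "America/Denver" rather than the abbreviation "MDT", which many systems do not accept as a tz value:

```r
dateN <- 1306086300                    # seconds since the epoch, from Galen's session
# tz (not tzone) is the argument as.POSIXct actually consults:
dateP <- as.POSIXct(dateN, origin = "1970-01-01", tz = "America/Denver")
format(dateP)  # "2011-05-22 11:45:00"
```

With the correctly named argument the round trip numeric -> POSIXct recovers the original local time instead of drifting by the zone offset.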
Re: [R] Adding a numeric to all values in the dataframe
Oops. The first line of my template should use data.matrix() rather than data.frame(). data.matrix() is guaranteed to return a numerical matrix from a data frame, making arithmetic always possible. Bill Venables. From: Venables, Bill (CMIS, Dutton Park) Sent: 20 May 2011 14:30 To: 'Ramya'; 'r-help@r-project.org' Subject: RE: [R] Adding a numeric to all values in the dataframe For that kind of operation (unusual as it is) work with numeric matrices. When you are finished, if you still want a data frame, make it then, not before. If your data starts off as a data frame to begin with, turn it into a matrix first. E.g.

myMatrix <- data.frame(myData)
myMatrix2 <- myMatrix + 2
myData2 <- data.frame(myMatrix2)

Bill Venables. -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Ramya Sent: Friday, 20 May 2011 1:54 PM To: r-help@r-project.org Subject: [R] Adding a numeric to all values in the dataframe Hi there I just want to add 2 to all the values in the dataframe. I tried using sapply but it seems to die all the time. Any help would be appreciated. Thanks Ramya -- View this message in context: http://r.789695.n4.nabble.com/Adding-a-numeric-to-all-values-in-the-dataframe-tp3537594p3537594.html Sent from the R help mailing list archive at Nabble.com.
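The corrected template in full, on a tiny hypothetical data frame (a sketch): data.matrix() coerces every column to numeric, so the whole-object arithmetic is guaranteed to work.

```r
myData <- data.frame(a = 1:3, b = c(4.5, 5.5, 6.5))
myMatrix <- data.matrix(myData)   # numeric matrix, per the correction above
myMatrix2 <- myMatrix + 2
myData2 <- as.data.frame(myMatrix2)
myData2$a  # 3 4 5
```

For a data frame whose columns are already all numeric, `myData + 2` happens to work directly as well; the data.matrix() route is the one that is safe in general.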
Re: [R] Contrasts in Penalized Package
The key line is

prep <- .checkinput(match.call(), parent.frame())

Among other things the model matrix is built in .checkinput( ), which is not exported from the package namespace. So you have to get rough with it and use penalized:::.checkinput, and then you see this line of code:

options(contrasts = c(unordered = "contr.none", ordered = "contr.diff"))

and the scene is set. The selection of contrasts is hardwired, and to get around it you would need to hack the package; not a great idea. The namespace will get you if you try to mask contr.none with a function of your own, for example. Bill Venables. From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of Lars Bishop [lars...@gmail.com] Sent: 20 May 2011 20:52 To: r-help@r-project.org Subject: [R] Contrasts in Penalized Package Hi, The penalized documentation says that "Unordered factors are turned into as many dummy variables as the factor has levels." This is done by a function in the package called contr.none. I'm trying to figure out how exactly a model matrix is created with this contrast option when the user calls the function with a formula. I typed library(penalized); penalized but couldn't find the part of the code where this is done. I'll appreciate any help on this. Lars.
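What "one dummy variable per factor level" produces can be imitated in base R with an identity-matrix contrast (a sketch that mimics the effect described for penalized's contr.none; it is not the package's own function):

```r
f <- factor(c("a", "b", "c", "a"))
contr.id <- diag(nlevels(f))            # one column per level
dimnames(contr.id) <- list(levels(f), levels(f))
mm <- model.matrix(~ f, data = data.frame(f = f),
                   contrasts.arg = list(f = contr.id))
mm   # an intercept column plus one indicator column for every level
```

The resulting matrix is rank-deficient under ordinary least squares, which is exactly why this coding only makes sense with a penalized fit.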
Re: [R] [r] regression coefficient for different factors
You have received suggestions about this already, but you may want to consider something like this as an alternative:

require(english)
lev <- as.character(as.english(0:9))
dat <- data.frame(f = factor(sample(lev, 500, rep = TRUE), levels = lev),
                  B = rnorm(500))
dat <- within(dat, A <- 2 + 3*B + rnorm(B))

### method 1: using a loop
coefs <- sapply(levels(dat$f),
                function(x) coef(lm(A ~ B, dat, subset = f == x)))
t(coefs)
      (Intercept)        B
zero     1.967234 2.795218
one      1.864298 3.048861
two      1.978757 2.893950
three    2.035777 2.796963
four     2.092047 2.826677
five     2.263936 3.229843
six      1.740911 3.114069
seven    1.975918 3.090971
eight    2.064802 3.048225
nine     2.030697 3.059960

### Greg Snow's suggestion - use lmList
require(nlme)
coef(lmList(A ~ B | f, dat))
      (Intercept)        B
zero     1.967234 2.795218
one      1.864298 3.048861
two      1.978757 2.893950
three    2.035777 2.796963
four     2.092047 2.826677
five     2.263936 3.229843
six      1.740911 3.114069
seven    1.975918 3.090971
eight    2.064802 3.048225
nine     2.030697 3.059960

Bill Venables From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of Francesco Nutini [nutini.france...@gmail.com] Sent: 20 May 2011 19:17 To: [R] help Subject: [R] [r] regression coefficient for different factors Dear R-helpers, In my dataset I have two continuous variables (A and B) and one factor. I'm investigating the regression between the two variables using the command lm(A ~ B, ...) but now I want to know the regression coefficient (r2) of A vs. B for each level of the factor. I know that I can obtain this information with Excel, but the factor has 68 levels... maybe [r] has a useful command. Thanks, Francesco Nutini
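Francesco actually asked for r2 rather than the coefficients; the same looping idea delivers that too (a sketch with simulated data, since the original data are not shown):

```r
set.seed(1)
dat <- data.frame(f = factor(sample(letters[1:3], 300, replace = TRUE)),
                  B = rnorm(300))
dat$A <- 2 + 3 * dat$B + rnorm(300)

# R-squared of A ~ B within each level of f
r2 <- sapply(levels(dat$f), function(l)
  summary(lm(A ~ B, dat, subset = f == l))$r.squared)
r2
```

With 68 levels this is exactly the same call; only levels(dat$f) gets longer.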
Re: [R] *not* using attach() *but* in one case ....
Martin Maechler writes: Well, then you don't know *THE ONE* case where modern users of R should use attach() ... as I have been teaching for a while, but seem not to have got enough students listening ;-) ... --- Use it instead of load() {for save()d R objects} --- The advantage of attach() over load() there is that loaded objects (and there may be a bunch!) are put into a separate place in the search path and will not accidentally overwrite objects in the global workspace. Of course, there are still quite a few situations {e.g. in typical BATCH use of R for simulations, or Sweaving, etc} where load() is good enough, and the extras of using attach() are not worth it. But the unconditional "do not use attach()" is not quite ok, at least not when you talk to non-beginners. Martin Maechler, ETH Zurich Curiously, this is why I wrote the SOAR package. Instead of save() you use Store() (Frank Harrell had already taken 'Save') and instead of attach() use Attach(). The objects are saved as separate rda files in a special subdirectory and when Attach() is used on it they are placed on the search path, normally at position 2, as promises. The global environment is kept relatively free of extraneous objects (if you want to work like that) and the operation is fast and very low on memory use, (unless you want to use every object you see all at once, of course). The track package from Tony Plate achieves somewhat similar results but with a different method. Essentially it implements something very like the old S-PLUS style of keeping everything out of memory as files in a .Data subdirectory, (although Tony uses the name 'rdatadir'), unless actually in use. Bill Venables.
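Martin's recommended pattern in miniature (a sketch using a temporary file): attach() a saved image instead of load()ing it, and the objects sit at position 2 of the search path rather than landing in, and possibly clobbering, the global workspace.

```r
x <- 1:10
f <- tempfile(fileext = ".rda")
save(x, file = f)
rm(x)

attach(f)                              # saved objects go to search position 2
has.x     <- exists("x")               # TRUE: found via the search path
in.global <- "x" %in% ls(globalenv())  # FALSE: global workspace stays clean
detach()                               # and they are dropped again in one step
```

load(f), by contrast, would have recreated x directly in the global environment, silently overwriting any existing x.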
Re: [R] extraction of mean square value from ANOVA
That only applies if you have the same factors a and b each time. If this is the case you can do things in a much more slick way.

u <- matrix(rnorm(5000), nrow = 10)  ## NB, nrow
AB <- expand.grid(a = letters[1:2], b = letters[1:5])
M <- lm(u ~ a + b, AB)
rmsq <- colSums(resid(M)^2)/M$df.resid

and Bob's your uncle. If you really want to do it quickly you would bypass lm() altogether and use something like lsfit() or, at an even lower level, qr() and qr.resid(). There are umpteen ways of fitting linear models. Bill Venables. -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Dennis Murphy Sent: Friday, 20 May 2011 1:38 PM To: Cheryl Johnson Cc: r-help@r-project.org Subject: Re: [R] extraction of mean square value from ANOVA Hi: It's easier to use an apply family function than a loop for this type of problem, as illustrated below:

# Generate 30 random samples of size 10 from a standard
# normal distribution and put them into a matrix
u <- matrix(rnorm(300), ncol = 10)
a <- factor(rep(1:5, each = 2))
b <- factor(rep(1:2, 5))

# Side note: It's not a good idea to name an object 'c' because a commonly used function in the base package has that name already, as in c(1, 3, 5)...

# A function to fit the model and to compute the MSE
# deviance() returns the residual sum of squares in a linear model
meansq <- function(y) {
  m <- lm(y ~ a + b)
  deviance(m)/df.residual(m)
}

# Apply the function to each row of u; the result is a vector of MSEs
msevec <- apply(u, 1, meansq)
msevec

Note 2: The function works if a and b are in the same environment as the meansq() function. If you do all of this in the console, there should be no problem. If you decide to put all of this into a function, then you need to be more careful. HTH, Dennis On Thu, May 19, 2011 at 6:46 PM, Cheryl Johnson johnson.cheryl...@gmail.com wrote: Hello, I am randomly generating values and then using an ANOVA table to find the mean square value.
I would like to form a loop that extracts the mean square value from ANOVA in each iteration. Below is an example of what I am doing.

a <- rnorm(10)
b <- factor(c(1,1,2,2,3,3,4,4,5,5))
c <- factor(c(1,2,1,2,1,2,1,2,1,2))
mylm <- lm(a ~ b + c)
anova(mylm)

Since I would like to use a loop to generate this several times it would be helpful to know how to extract the mean square value from ANOVA. Thanks
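Since anova() returns a data frame, the mean squares can also be read off directly by row and column name (a sketch based on Cheryl's code; 'd' is used here in place of 'c', following Dennis's advice about not masking base c()):

```r
set.seed(1)
a <- rnorm(10)
b <- factor(c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5))
d <- factor(c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2))
mylm <- lm(a ~ b + d)
tab <- anova(mylm)
tab[, "Mean Sq"]              # all mean squares, in table order
tab["Residuals", "Mean Sq"]   # the residual mean square alone
```

The last value is identical to deviance(mylm)/df.residual(mylm), the quantity Dennis's meansq() computes.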
Re: [R] Adding a numeric to all values in the dataframe
For that kind of operation (unusual as it is) work with numeric matrices. When you are finished, if you still want a data frame, make it then, not before. If your data starts off as a data frame to begin with, turn it into a matrix first. E.g.

myMatrix <- data.frame(myData)
myMatrix2 <- myMatrix + 2
myData2 <- data.frame(myMatrix2)

Bill Venables. -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Ramya Sent: Friday, 20 May 2011 1:54 PM To: r-help@r-project.org Subject: [R] Adding a numeric to all values in the dataframe Hi there I just want to add 2 to all the values in the dataframe. I tried using sapply but it seems to die all the time. Any help would be appreciated. Thanks Ramya -- View this message in context: http://r.789695.n4.nabble.com/Adding-a-numeric-to-all-values-in-the-dataframe-tp3537594p3537594.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] R Style Guide -- Was Post-hoc tests in MASS using glm.nb
Hi Bert, I think people should know about the Google Style Guide for R because, as I said, it represents a thoughtful contribution to the debate. Most of its advice is very good (meaning I agree with it!) but some is a bit too much (for example, the blanket advice never to use S4 classes and methods - that's just resisting progress, in my view). The advice on using <- for the (normal) assignment operator rather than = is also good advice, (according to me), but people who have to program in both C and R about equally often may find it a bit tedious. We can argue over that one. I suggest it has a place in the R FAQ but with a suitable warning that this is just one view, albeit a thoughtful one. I don't think it need be included in the posting guide, though. It would take away some of the fun. :-) Bill Venables. -Original Message- From: Bert Gunter [mailto:gunter.ber...@gene.com] Sent: Wednesday, 18 May 2011 11:47 PM To: Venables, Bill (CMIS, Dutton Park) Cc: r-help@r-project.org Subject: R Style Guide -- Was Post-hoc tests in MASS using glm.nb Thanks Bill. Do you and others think that a link to this guide (or another) should be included in the Posting Guide and/or R FAQ? -- Bert On Tue, May 17, 2011 at 4:07 PM, bill.venab...@csiro.au wrote: Amen to all of that, Bert. Nicely put. The Google style guide (not perfect, but a thoughtful contribution on these kinds of issues) has avoiding attach() as its very first line. See http://google-styleguide.googlecode.com/svn/trunk/google-r-style.html) I would add, though, that not enough people seem yet to be aware of within(...), a companion of with(...) in a way, but used for modifying data frames or other kinds of list objects. It should be seen as a more flexible replacement for transform() (well, almost). The difference between with() and within() is as follows: with(data, expr, ...) allows you to evaluate 'expr' with 'data' providing the primary source for variables, and returns *the evaluated expression* as the result.
By contrast, within(data, expr, ...) again uses 'data' as the primary source for variables when evaluating 'expr', but now 'expr' is used to modify the variables in 'data' and returns *the modified data set* as the result. I use this a lot in the data preparation phase of a project, especially, which is usually the longest, trickiest, most important, but least discussed aspect of any data analysis project. Here is a simple example using within() for something you cannot do in one step with transform():

polyData <- within(data.frame(x = runif(500)), {
  x2 <- x^2
  x3 <- x * x2
  b <- runif(4)
  eta <- cbind(1, x, x2, x3) %*% b
  y <- eta + rnorm(x, sd = 0.5)
  rm(b)
})

Check:

str(polyData)
'data.frame': 500 obs. of 5 variables:
 $ x  : num 0.5185 0.185 0.5566 0.2467 0.0178 ...
 $ y  : num [1:500, 1] 1.343 0.888 0.583 0.187 0.855 ...
 $ eta: num [1:500, 1] 1.258 0.788 1.331 0.856 0.63 ...
 $ x3 : num 1.39e-01 6.33e-03 1.72e-01 1.50e-02 5.60e-06 ...
 $ x2 : num 0.268811 0.034224 0.309802 0.060844 0.000315 ...

Bill Venables.

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Bert Gunter Sent: Wednesday, 18 May 2011 12:08 AM To: Peter Ehlers Cc: R list Subject: Re: [R] Post-hoc tests in MASS using glm.nb

Folks: Only if the user hasn't yet been introduced to the with() function, which is linked to on the ?attach page. Note also this sentence from the ?attach page: "attach can lead to confusion." I can't remember the last time I needed attach(). Peter Ehlers

Yes. But perhaps it might be useful to flesh this out with a bit of commentary. To this end, I invite others to correct or clarify the following. The potential confusion comes from requiring R to search for the data. There is a rigorous process by which this is done, of course, but it requires that the runtime environment be consistent with that process, and the programmer who wrote the code may not have control over that environment.
The usual example is that one has an object named, say, 'a' in the formula and in the attached data, and another 'a' also in the global environment. Then the wrong 'a' would be found. The same thing can happen if another data set gets attached in a position before the one of interest. (Like Peter, I haven't used attach() in so long that I don't know whether any warning messages are issued in such cases.) Using the data = argument when available, or the with() function when not, avoids this potential confusion and tightly couples the data to be analyzed with the analysis. I hope this clarifies the previous posters' comments. Cheers, Bert

[... non-germane material snipped ...]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
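To make the masking hazard described above concrete, here is a small sketch (the names 'a' and 'dat' are invented for illustration):

```r
## A sketch of the masking problem with attach() (names are invented).
a <- 1:10                               # an 'a' in the global environment
dat <- data.frame(a = 101:110, y = rnorm(10))

attach(dat)
mean(a)        # finds the global 'a' (1:10), not dat$a -- attach() does warn
               # that 'a' is masked by .GlobalEnv, but the warning is easy to miss
detach(dat)

## Safer alternatives: the data is tightly coupled to the call
with(dat, mean(a))                      # uses dat$a
fm <- lm(y ~ a, data = dat)             # uses dat$a via the data= argument
```

The point is exactly Bert's: at top level the global environment is searched before the attached data frame, so which 'a' wins depends on the runtime environment rather than on the code.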
Re: [R] R Style Guide -- Was Post-hoc tests in MASS using glm.nb
I used to think like that. However, I have recently re-read John Chambers' Software for Data Analysis and now I'm starting to see the point. S4 classes and methods do require you to plan your classes and methods well, and they do impose a discipline that can seem rigid and unnecessary. But I have found that to program well you do need to exercise a lot of discipline, mainly because it can take quite some time to spot all the inadequacies and even traps in your code that an ill-disciplined approach lets you get away with at first. IMHO, of course. Perhaps you can all see the traps that elude me. Cheers, Bill. PS Rolf Turner? Respectful? Goodness, what's going on? :-)

-----Original Message----- From: Rolf Turner [mailto:rolf.tur...@xtra.co.nz] Sent: Thursday, 19 May 2011 9:34 AM To: Venables, Bill (CMIS, Dutton Park) Cc: r-help@r-project.org Subject: Re: [R] R Style Guide -- Was Post-hoc tests in MASS using glm.nb

On 19/05/11 10:26, bill.venab...@csiro.au wrote: SNIP Most of [the Google style guide's] advice is very good (meaning I agree with it!) but some is a bit too much (for example, the blanket advice never to use S4 classes and methods - that's just resisting progress, in my view). SNIP

I must respectfully disagree with this view, and concur heartily with the style guide. S4 classes and methods are a ball-and-chain that one has to drag along. See also fortune("S4 methods"). :-) cheers, Rolf
Re: [R] Post-hoc tests in MASS using glm.nb
Amen to all of that, Bert. Nicely put. The Google style guide (not perfect, but a thoughtful contribution on these kinds of issues) has avoiding attach() as its very first line. See http://google-styleguide.googlecode.com/svn/trunk/google-r-style.html I would add, though, that not enough people seem yet to be aware of within(...), a companion of with(...) in a way, but used for modifying data frames or other kinds of list objects. It should be seen as a more flexible replacement for transform() (well, almost). The difference between with() and within() is as follows: with(data, expr, ...) allows you to evaluate 'expr' with 'data' providing the primary source for variables, and returns *the evaluated expression* as the result. By contrast, within(data, expr, ...) again uses 'data' as the primary source for variables when evaluating 'expr', but now 'expr' is used to modify the variables in 'data' and returns *the modified data set* as the result. I use this a lot in the data preparation phase of a project, especially, which is usually the longest, trickiest, most important, but least discussed aspect of any data analysis project. Here is a simple example using within() for something you cannot do in one step with transform():

polyData <- within(data.frame(x = runif(500)), {
  x2 <- x^2
  x3 <- x * x2
  b <- runif(4)
  eta <- cbind(1, x, x2, x3) %*% b
  y <- eta + rnorm(x, sd = 0.5)
  rm(b)
})

Check:

str(polyData)
'data.frame': 500 obs. of 5 variables:
 $ x  : num 0.5185 0.185 0.5566 0.2467 0.0178 ...
 $ y  : num [1:500, 1] 1.343 0.888 0.583 0.187 0.855 ...
 $ eta: num [1:500, 1] 1.258 0.788 1.331 0.856 0.63 ...
 $ x3 : num 1.39e-01 6.33e-03 1.72e-01 1.50e-02 5.60e-06 ...
 $ x2 : num 0.268811 0.034224 0.309802 0.060844 0.000315 ...

Bill Venables.
-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Bert Gunter Sent: Wednesday, 18 May 2011 12:08 AM To: Peter Ehlers Cc: R list Subject: Re: [R] Post-hoc tests in MASS using glm.nb

Folks: Only if the user hasn't yet been introduced to the with() function, which is linked to on the ?attach page. Note also this sentence from the ?attach page: "attach can lead to confusion." I can't remember the last time I needed attach(). Peter Ehlers

Yes. But perhaps it might be useful to flesh this out with a bit of commentary. To this end, I invite others to correct or clarify the following. The potential confusion comes from requiring R to search for the data. There is a rigorous process by which this is done, of course, but it requires that the runtime environment be consistent with that process, and the programmer who wrote the code may not have control over that environment. The usual example is that one has an object named, say, 'a' in the formula and in the attached data, and another 'a' also in the global environment. Then the wrong 'a' would be found. The same thing can happen if another data set gets attached in a position before the one of interest. (Like Peter, I haven't used attach() in so long that I don't know whether any warning messages are issued in such cases.) Using the data = argument when available, or the with() function when not, avoids this potential confusion and tightly couples the data to be analyzed with the analysis. I hope this clarifies the previous posters' comments. Cheers, Bert

[... non-germane material snipped ...]

-- Men by nature long to get on to the ultimate truths, and will often be impatient with elementary studies or fight shy of them.
If it were possible to reach the ultimate truths without the elementary studies usually prefixed to them, these would not be preparatory studies but superfluous diversions. -- Maimonides (1135-1204) Bert Gunter Genentech Nonclinical Biostatistics
Re: [R] Post-hoc tests in MASS using glm.nb
PS I should have followed the example with one using with() for something that would often be done with attach(). Consider:

with(polyData, {
  plot(x, y, pch = ".")
  o <- order(x)
  lines(x[o], eta[o], col = "red")
})

I use this kind of dodge a lot, too, but now you can mostly use data= arguments on the individual functions. Bill Venables.

-----Original Message----- From: Venables, Bill (CMIS, Dutton Park) Sent: Wednesday, 18 May 2011 9:07 AM To: 'Bert Gunter'; 'Peter Ehlers' Cc: 'R list' Subject: RE: [R] Post-hoc tests in MASS using glm.nb

Amen to all of that, Bert. Nicely put. The Google style guide (not perfect, but a thoughtful contribution on these kinds of issues) has avoiding attach() as its very first line. See http://google-styleguide.googlecode.com/svn/trunk/google-r-style.html I would add, though, that not enough people seem yet to be aware of within(...), a companion of with(...) in a way, but used for modifying data frames or other kinds of list objects. It should be seen as a more flexible replacement for transform() (well, almost). The difference between with() and within() is as follows: with(data, expr, ...) allows you to evaluate 'expr' with 'data' providing the primary source for variables, and returns *the evaluated expression* as the result. By contrast, within(data, expr, ...) again uses 'data' as the primary source for variables when evaluating 'expr', but now 'expr' is used to modify the variables in 'data' and returns *the modified data set* as the result. I use this a lot in the data preparation phase of a project, especially, which is usually the longest, trickiest, most important, but least discussed aspect of any data analysis project. Here is a simple example using within() for something you cannot do in one step with transform():

polyData <- within(data.frame(x = runif(500)), {
  x2 <- x^2
  x3 <- x * x2
  b <- runif(4)
  eta <- cbind(1, x, x2, x3) %*% b
  y <- eta + rnorm(x, sd = 0.5)
  rm(b)
})

Check:

str(polyData)
'data.frame': 500 obs.
of 5 variables:
 $ x  : num 0.5185 0.185 0.5566 0.2467 0.0178 ...
 $ y  : num [1:500, 1] 1.343 0.888 0.583 0.187 0.855 ...
 $ eta: num [1:500, 1] 1.258 0.788 1.331 0.856 0.63 ...
 $ x3 : num 1.39e-01 6.33e-03 1.72e-01 1.50e-02 5.60e-06 ...
 $ x2 : num 0.268811 0.034224 0.309802 0.060844 0.000315 ...

Bill Venables.

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Bert Gunter Sent: Wednesday, 18 May 2011 12:08 AM To: Peter Ehlers Cc: R list Subject: Re: [R] Post-hoc tests in MASS using glm.nb

Folks: Only if the user hasn't yet been introduced to the with() function, which is linked to on the ?attach page. Note also this sentence from the ?attach page: "attach can lead to confusion." I can't remember the last time I needed attach(). Peter Ehlers

Yes. But perhaps it might be useful to flesh this out with a bit of commentary. To this end, I invite others to correct or clarify the following. The potential confusion comes from requiring R to search for the data. There is a rigorous process by which this is done, of course, but it requires that the runtime environment be consistent with that process, and the programmer who wrote the code may not have control over that environment. The usual example is that one has an object named, say, 'a' in the formula and in the attached data, and another 'a' also in the global environment. Then the wrong 'a' would be found. The same thing can happen if another data set gets attached in a position before the one of interest. (Like Peter, I haven't used attach() in so long that I don't know whether any warning messages are issued in such cases.) Using the data = argument when available, or the with() function when not, avoids this potential confusion and tightly couples the data to be analyzed with the analysis. I hope this clarifies the previous posters' comments. Cheers, Bert

[... non-germane material snipped ...]
-- Men by nature long to get on to the ultimate truths, and will often be impatient with elementary studies or fight shy of them. If it were possible to reach the ultimate truths without the elementary studies usually prefixed to them, these would not be preparatory studies but superfluous diversions. -- Maimonides (1135-1204) Bert Gunter Genentech Nonclinical Biostatistics
Re: [R] simprof test using jaccard distance
The documentation for simprof says, with respect to the method.distance argument, "This value can also be any function which returns a dist object." So you should be able to use the Jaccard index by setting up your own function to compute it, e.g.

Jaccard <- function(X) vegan::vegdist(X, method = "jaccard")
tst <- simprof( ..., method.distance = Jaccard, ...)

by rights, ought to do the job. Bill Venables.

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of maia.ber...@csiro.au Sent: Tuesday, 17 May 2011 6:03 PM To: r-help@r-project.org Subject: [ExternalEmail] [R] simprof test using jaccard distance

Dear All, I would like to use the simprof function (clustsig package) but the available distances do not include Jaccard distance, which is the most appropriate for pres/abs community data. Here is the core of the function:

simprof
function (data, num.expected = 1000, num.simulated = 999, method.cluster = "average", method.distance = "euclidean", method.transform = "identity", alpha = 0.05, sample.orientation = "row", const = 0, silent = TRUE, increment = 100)
{
    if (!is.matrix(data)) data <- as.matrix(data)
    rawdata <- data
    if (sample.orientation == "column") rawdata <- t(rawdata)
    if (is.function(method.distance)) rawdata.dist <- method.distance(rawdata)
    else if (method.distance == "braycurtis") {
        rawdata.dist <- braycurtis(rawdata, const)
        if (!is.null(rownames(rawdata))) attr(rawdata.dist, "Labels") <- rownames(rawdata)
    }
    else rawdata.dist <- dist(rawdata, method = method.distance)
    if (!method.transform == "identity") rawdata <- trans(rawdata, method.transform)
    hclust.results <- hclust(rawdata.dist, method = method.cluster)
    pMatrix <- cbind(hclust.results$merge, matrix(data = NA, nrow = nrow(hclust.results$merge), ncol = 1))
    simprof.results <- simprof.body(rawdata = rawdata, num.expected = num.expected, num.simulated = num.simulated, method.cluster = method.cluster, method.distance = method.distance, originaldata = rawdata, alpha = alpha, clust.order =
hclust.results$merge, startrow = nrow(hclust.results$merge), pMatrix = pMatrix, const = const, silent = silent, increment = increment)
    results <- list()
    results[["numgroups"]] <- length(simprof.results$samples)
    results[["pval"]] <- simprof.results$pval
    results[["hclust"]] <- hclust.results
    if (!is.null(rownames(rawdata))) {
        for (i in 1:length(simprof.results$samples)) {
            for (j in 1:length(simprof.results$samples[[i]])) {
                simprof.results$samples[[i]][j] <- rownames(rawdata)[as.numeric(simprof.results$samples[[i]][j])]
            }
        }
    }
    results[["significantclusters"]] <- simprof.results$samples
    return(results)
}
<environment: namespace:clustsig>

I tried to trick the function by bypassing the input of a raw community matrix (sites x species) by inputting instead a distance matrix already calculated with vegdist (vegan package) (Jaccard distance is available in vegdist, but not in the function dist used in simprof...). I therefore modified the function as follows:

simprof2 <- function (data, num.expected = 1000, num.simulated = 999, method.cluster = "average", alpha = 0.05, sample.orientation = "row", const = 0, silent = TRUE, increment = 100)
{
    hclust.results <- hclust(data, method = method.cluster)
    pMatrix <- cbind(hclust.results$merge, matrix(data = NA, nrow = nrow(hclust.results$merge), ncol = 1))
    simprof.results <- simprof.body(data = data, num.expected = num.expected, num.simulated = num.simulated, method.cluster = method.cluster, originaldata = data, alpha = alpha, clust.order = hclust.results$merge, startrow = nrow(hclust.results$merge), pMatrix = pMatrix, const = const, silent = silent, increment = increment)
    results <- list()
    results[["numgroups"]] <- length(simprof.results$samples)
    results[["pval"]] <- simprof.results$pval
    results[["hclust"]] <- hclust.results
    if (!is.null(rownames(data))) {
        for (i in 1:length(simprof.results$samples)) {
            for (j in 1:length(simprof.results$samples[[i]])) {
                simprof.results$samples[[i]][j] <- rownames(data)[as.numeric(simprof.results$samples[[i]][j])]
            }
        }
    }
    results[["significantclusters"]] <- simprof.results$samples
    return(results)
}

But upon running it on my vegdist object, I get the following error:

simprof2(G3_jaccard)
Error in simprof2(G3_jaccard) : could not find function "simprof.body"

I guess this function is part of the initial environment in which simprof runs, but how do I access it in order to call it when I run my own function outside of the initial environment? Another possible but maybe more complex way would be to define the Jaccard distance (as J = 2B/(1+B), B being the Bray-Curtis distance already used in the initial function)
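Bill's suggestion above, put together as a runnable sketch (the presence/absence matrix `commMat` is a hypothetical stand-in for the poster's community data; passing a function as method.distance is per the documentation quoted above):

```r
library(clustsig)   # provides simprof()
library(vegan)      # provides vegdist()

## Any function returning a "dist" object can be passed as method.distance
Jaccard <- function(X) vegan::vegdist(X, method = "jaccard")

## commMat: hypothetical sites x species presence/absence matrix
set.seed(42)
commMat <- matrix(rbinom(100, 1, 0.4), nrow = 10,
                  dimnames = list(paste0("site", 1:10), paste0("sp", 1:10)))

res <- simprof(commMat, method.distance = Jaccard)
res$numgroups   # number of significant clusters found
```

This avoids modifying simprof at all, so simprof.body is found in the package namespace as usual.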
Re: [R] Multiple plots on one device using stl
If you ?plot.stl you will see that the second argument, set.pars, is a list of argument settings for par(), including a (variable) default setting for mfrow. I.e. plot.stl overrides your external setting (which will also override any layout() setting). It looks like to override it back, you may need to take charge yourself. Here is the gist of a partial way round it, perhaps, sort of. ...

par(mar = c(0, 6, 0, 6), oma = c(6, 0, 4, 0), tck = -0.01, mfcol = c(4, 2))
plot(stl.1, set.pars = list())
plot(stl.2, set.pars = list())

Things might be a bit more flexible if you were to use xyplot.stl in the latticeExtra package. With lattice the operations of making the plot and displaying it are more cleanly separated. Bill Venables.

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Ben Madin Sent: Wednesday, 18 May 2011 11:55 AM To: r-help@r-project.org Subject: [R] Multiple plots on one device using stl

G'day, I am looking at monthly reports, and have three series of monthly data from 2007 to 2009. I would like to show the seasonal decomposition of these two series side by side on the one device, however using plot doesn't seem to respect any use of layout(matrix(1:3, ncol=3)) or par(mfcol=c(1,3)). I'm guessing that this means that plot(stl) perhaps uses them, but I can't find anywhere the / a plot.stl() - i.e., I can't work out where the plot() call is going? I've attached a small example of data and some probably overly verbose code. There are only two sets of example data, not three as mentioned above.
# load the data
vals.1 <- structure(list(year = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("2007", "2008", "2009"), class = c("ordered", "factor")), month = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L), .Label = c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"), class = c("ordered", "factor")), count = c(105, 100, 64, 44, 49, 65, 88, 90, 99, 92, 93, 88, 212, 146, 96, 131, 220, 143, 137, 138, 395, 362, 349, 222, 294, 268, 298, 310, 426, 348, 287, 101, 66, 112, 105, 4)), .Names = c("year", "month", "count"), row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24", "25", "26", "27", "28", "29", "30", "31", "32", "33", "34", "35", "36"), class = "data.frame")

vals.2 <- structure(list(year = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("2007", "2008", "2009"), class = c("ordered", "factor")), month = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L), .Label = c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"), class = c("ordered", "factor")), count = c(45, 34, 17, 6, 7, 16, 12, 11, 14, 17, 11, 20, 27, 12, 10, 14, 22, 23, 92, 144, 385, 274, 320, 252, 240, 146, 222, 142, 122, 117, 163, 89, 51, 89, 108)), .Names = c("year", "month", "count"), row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24", "25", "26", "27", "28", "29", "30", "31", "32", "33", "34", "35"), class = "data.frame")

# create the time series
series.1 <- ts(vals.1$count, start = c(2007, 1), freq = 12)
series.2 <- ts(vals.2$count, start = c(2007, 1), freq = 12)

# apply the seasonal decomposition
stl.1 <- stl(series.1, "per", robust = TRUE)
stl.2 <- stl(series.2, "per", robust = TRUE)

# set up the device for side by side display
layout(matrix(1:2, ncol = 2))
# little check
layout.show(2)

# plot the first series and second series
plot(stl.1, labels = c('Count by month', 'Seasonal Component', 'Trend', 'Remainder'), main = 'Series.1 Decomposition')
plot(stl.2, labels = c('Count by month', 'Seasonal Component', 'Trend', 'Remainder'), main = 'Series.2 Decomposition')

# hmm, used whole device twice, try again
par(mfcol = c(1, 2))
# and now the second
plot(stl.1, labels = c('Count by month', 'Seasonal Component', 'Trend', 'Remainder'), main = 'Series.1 Decomposition')
plot(stl.2, labels = c('Count by month', 'Seasonal Component', 'Trend', 'Remainder'), main = 'Series.2 Decomposition')

# oh what about split.screen()
split.screen(c(1, 2))
screen(1)
# now plot
plot(stl.1, labels = c('Count by month', 'Seasonal Component', 'Trend', 'Remainder'), main = 'Series.1 Decomposition')
# something wrong with the plot, not seeing original series at the top.
screen(2)
# Error in par(split.screens[[n]]) : parameter "j" in "mfg" is out of range

cheers Ben

sessionInfo()
R version 2.13.0 Patched (2011-05-05 r55784) Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) locale: [1] en_AU.UTF-8/en_AU.UTF-8/C/C/en_AU.UTF-8/en_AU.UTF-8 attached base packages: [1]
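The latticeExtra route Bill mentions can be sketched as follows (assuming the xyplot() method for "stl" objects that latticeExtra provides, and the stl.1 and stl.2 objects from above; whether extra arguments such as main are honoured may vary):

```r
library(lattice)
library(latticeExtra)   # supplies an xyplot() method for "stl" objects

## Trellis objects are created first, displayed later -- so splitting the
## device is under your control rather than plot.stl's
p1 <- xyplot(stl.1)
p2 <- xyplot(stl.2)

## split = c(column, row, ncolumns, nrows); more = TRUE keeps the page open
print(p1, split = c(1, 1, 2, 1), more = TRUE)
print(p2, split = c(2, 1, 2, 1))
```

This sidesteps the par()/layout() override entirely, since lattice does not use base-graphics mfrow settings.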
Re: [R] Extracting the dimnames of an array with variable dimensions
Here is an alternative solution:

foo <- array(data = rnorm(32), dim = c(4, 4, 2),
             dimnames = list(letters[1:4], LETTERS[1:4], letters[5:6]))
ind <- which(foo > 0, arr.ind = TRUE)
row.names(ind) <- NULL  ## to avoid warnings.
mapply("[", dimnames(foo), data.frame(ind))
      [,1] [,2] [,3]
 [1,] "a"  "A"  "e"
 [2,] "b"  "A"  "e"
 [3,] "a"  "B"  "e"
 [4,] "c"  "B"  "e"
 [5,] "a"  "C"  "e"
 [6,] "c"  "C"  "e"
 [7,] "b"  "D"  "e"
 [8,] "a"  "A"  "f"
 [9,] "b"  "B"  "f"
[10,] "c"  "B"  "f"
[11,] "d"  "B"  "f"
[12,] "a"  "C"  "f"
[13,] "d"  "C"  "f"
[14,] "a"  "D"  "f"
[15,] "b"  "D"  "f"

Bill Venables.

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Dennis Murphy Sent: Monday, 16 May 2011 3:14 PM To: Pierre Roudier Cc: r-help@r-project.org Subject: Re: [R] Extracting the dimnames of an array with variable dimensions

Hi: Does it have to be an array? If all you're interested in is the dimnames, how about this?

library(plyr)
foo <- array(data = rnorm(32), dim = c(4, 4, 2),
             dimnames = list(letters[1:4], LETTERS[1:4], letters[5:6]))
foo
, , e
           A          B          C          D
a -0.2183877 -0.8912908 -2.0175612 -0.8080548
b  0.4870784 -0.8626293 -0.5641368 -0.5219722
c  0.8821044  0.3187850  1.2203297 -0.3151186
d -0.9894656 -1.1779108  0.9853935  0.3560747

, , f
           A          B          C          D
a  0.7357773 -1.7591637  1.6320887  1.2248529
b  0.4662315  0.1131432 -0.9790887 -0.6575306
c -0.3564725 -0.9202688  0.1017894  0.7382683
d  0.2825117  0.9242299  0.3577063 -1.3297339

# flatten array into a data frame with dimnames as factors
# adply() converts an array to a data frame, applying a function
# along the stated dimensions
u <- adply(foo, c(1, 2, 3), as.vector)
subset(u, V1 > 0)[, 1:3]
   X1 X2 X3
2   b  A  e
3   c  A  e
7   c  B  e
11  c  C  e
12  d  C  e
16  d  D  e
17  a  A  f
18  b  A  f
20  d  A  f
22  b  B  f
24  d  B  f
25  a  C  f
27  c  C  f
28  d  C  f
29  a  D  f
31  c  D  f

HTH, Dennis

On Sun, May 15, 2011 at 9:20 PM, Pierre Roudier pierre.roud...@gmail.com wrote: Hi list, In a function I am writing, I need to extract the dimension names of an array.
I know this can be achieved easily using dimnames() but my problem is that I want my function to be robust when the number of dimensions varies. Consider the following case:

foo <- array(data = rnorm(32), dim = c(4, 4, 2),
             dimnames = list(letters[1:4], LETTERS[1:4], letters[5:6]))

# What I want is to extract the *names of the dimensions* for which foo has positive values:
ind <- which(foo > 0, arr.ind = TRUE)

# A first solution is:
t(apply(ind, 1, function(x) unlist(dimnames(foo[x[1], x[2], x[3], drop = FALSE]))))

# But it does require to know the dimensions of foo. I would like to do something like:
ind <- which(foo > 0, arr.ind = TRUE)
t(apply(ind, 1, function(x) unlist(dimnames(foo[x, drop = FALSE]))))
# but in that case the dimnames are dropped.

Any suggestion? Cheers, Pierre -- Scientist Landcare Research, New Zealand
Re: [R] Post-hoc tests in MASS using glm.nb
?relevel Also, you might want to fit the models as follows:

Model1 <- glm.nb(Cells ~ Cryogel*Day, data = myData)
myData2 <- within(myData, Cryogel <- relevel(Cryogel, ref = 2))
Model2 <- update(Model1, data = myData2)

You should always specify the data set when you fit a model if at all possible. I would recommend you NEVER use attach() to put it on the search path (under all but the most exceptional circumstances). You could fit your model as

Model0 <- glm.nb(Cells ~ interaction(Cryogel, Day) - 1, data = myData)

This will give you the subclass means as the regression coefficients. You can then use vcov(Model0) to get the variance matrix and compare any two you like using directly calculated t-statistics. This is pretty straightforward as well. Bill Venables.

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of bryony Sent: Tuesday, 17 May 2011 3:46 AM To: r-help@r-project.org Subject: [R] Post-hoc tests in MASS using glm.nb

I am struggling to generate p values for comparisons of levels (post-hoc tests) in a glm with a negative binomial distribution. I am trying to compare cell counts on different days as grown on different media (e.g. types of cryogel), so I have 2 explanatory variables (Day and Cryogel), which are both factors, and an over-dispersed count variable (number of cells) as the response. I know that both variables are significant, and that there is a significant interaction between them. However, I seem unable to generate multiple comparisons between the days and cryogels. So my model is

Model1 <- glm.nb(Cells ~ Cryogel + Day + Day:Cryogel)

The output gives me comparisons between levels of the factors relative to a reference level (Day 0 and Cryogel 1) as follows:

Coefficients:
               Estimate Std. Error z value Pr(>|z|)
(Intercept)      1.2040     0.2743   4.389 1.14e-05 ***
Day14            3.3226     0.3440   9.658  < 2e-16 ***
Day28            3.3546     0.3440   9.752  < 2e-16 ***
Day7             3.3638     0.3440   9.779  < 2e-16 ***
Cryogel2         0.7097     0.3655   1.942  0.05215 .
Cryogel3         0.7259     0.3651   1.988  0.04677 *
Cryogel4         1.4191     0.3539   4.010 6.07e-05 ***
Day14:Cryogel2  -0.7910     0.4689  -1.687  0.09162 .
Day28:Cryogel2  -0.5272     0.4685  -1.125  0.26053
Day7:Cryogel2   -1.1794     0.4694  -2.512  0.01199 *
Day14:Cryogel3  -1.0833     0.4691  -2.309  0.02092 *
Day28:Cryogel3   0.1735     0.4733   0.367  0.71395
Day7:Cryogel3   -1.0907     0.4690  -2.326  0.02003 *
Day14:Cryogel4  -1.2834     0.4655  -2.757  0.00583 **
Day28:Cryogel4  -0.6300     0.4591  -1.372  0.16997
Day7:Cryogel4   -1.3436     0.4596  -2.923  0.00347 **

HOWEVER I want ALL the comparisons, e.g. Cryogel 2 versus 4, 3 versus 2 etc. on each of the days. I realise that such multiple comparisons need to be approached with care to avoid Type 1 error, however it is easy to do this in other programmes (e.g. SPSS, Genstat) and I'm frustrated that it appears to be difficult in R. I have tried the glht (multcomp) function but it gives me the same results. I assume that there is some way of entering the data differently so as to tell R to use a different reference level each time and re-run the analysis for each level, but don't know what this is. Please help! Many thanks for your input Bryony

-- View this message in context: http://r.789695.n4.nabble.com/Post-hoc-tests-in-MASS-using-glm-nb-tp3526934p3526934.html Sent from the R help mailing list archive at Nabble.com.
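To make the vcov() approach from the reply above concrete, here is a sketch (the data frame myData with Cells, Cryogel and Day is the original poster's and is hypothetical here):

```r
library(MASS)   # glm.nb()

## Cell-means parameterisation: one coefficient per Cryogel x Day subclass
Model0 <- glm.nb(Cells ~ interaction(Cryogel, Day) - 1, data = myData)

b <- coef(Model0)
V <- vcov(Model0)

## Wald z-statistic comparing any two subclass means, say coefficients i and j
i <- 1; j <- 2
z <- (b[i] - b[j]) / sqrt(V[i, i] + V[j, j] - 2 * V[i, j])
p <- 2 * pnorm(-abs(z))   # two-sided, unadjusted for multiple comparisons
```

Any pair of subclass means can be compared this way; adjusting for the many comparisons (e.g. via p.adjust()) remains the user's responsibility, as the thread notes.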
Re: [R] Filtering out bad data points
You could use a function to do the job:

withinRange <- function(x, r = quantile(x, c(0.05, 0.95))) x >= r[1] & x <= r[2]
dftest2 <- subset(dftest, withinRange(x))

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Robert A'gata Sent: Tuesday, 10 May 2011 10:57 AM To: r-help@r-project.org Subject: [R] Filtering out bad data points

Hi, I always have a question about how to do this best in R. I have a data frame and a set of criteria to filter points out. My procedure is to always locate indices of those points, check if the index vector length is greater than 0 or not, and then remove them. Meaning:

dftest <- data.frame(x = rnorm(100), y = rnorm(100));
qtile <- quantile(dftest$x, probs = c(0.05, 0.95));
badIdx <- which((dftest$x < qtile[1]) | (dftest$x > qtile[2]));
if (length(badIdx) > 0) {
  dftest <- dftest[-badIdx, ];
}

My question is that is there a more streamlined way to achieve this? Thank you. Cheers, Robert
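A quick usage check of the helper above (the exact row count depends on the random data and on how the quantiles fall, so none is asserted):

```r
set.seed(1)
dftest <- data.frame(x = rnorm(100), y = rnorm(100))

## Keep rows whose x lies inside the central 90% range, in one step
withinRange <- function(x, r = quantile(x, c(0.05, 0.95)))
  x >= r[1] & x <= r[2]

dftest2 <- subset(dftest, withinRange(x))
nrow(dftest2)   # roughly 90 of the 100 rows remain
```

The helper also documents intent better than an index-based delete, and there is no need for the length(badIdx) > 0 guard: subset() with an all-TRUE condition simply returns everything.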
Re: [R] Creating Observation ID
Here is one way:

df <- data.frame(Value = rnorm(30), Group = sample(c('A', 'B', 'C'), 30, replace = TRUE))

## make a little function to do the job
iNumber <- function(f) {
  f <- as.factor(f)
  X <- outer(f, levels(f), "==") + 0
  rowSums(X * apply(X, 2, cumsum))
}

## add the numbering column
df <- within(df, internalNumber <- iNumber(Group))

## Check that it works
head(df)
       Value Group internalNumber
1 -1.5014788     C              1
2  0.6035679     C              2
3 -0.6953930     C              3
4 -0.2413863     A              1
5 -0.1170961     A              2
6  1.5110721     C              4

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Robert Baer Sent: Tuesday, 10 May 2011 7:22 AM To: R-help@r-project.org Subject: [R] Creating Observation ID

If I have a data frame something like:

Value = rnorm(30)
Group = sample(c('A', 'B', 'C'), 30, replace = TRUE)
df = data.frame(Value, Group)

It seems like it should be simple to create an 'ObsID' column which indicates the observation order of each Value within each of the 3 groups. Somehow, I can't quite see how to do it without manually sub-setting the parent data frame and then putting it back together again. Anyone able to get me started on a cleaner (more R-like) approach? Thanks, Rob

-- Robert W. Baer, Ph.D. Professor of Physiology Kirksville College of Osteopathic Medicine A. T. Still University of Health Sciences 800 W. Jefferson St. Kirksville, MO 63501 660-626-2322 FAX 660-626-2965
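The same within-group numbering can also be had in one line with ave(), a standard base-R idiom for this task:

```r
df <- data.frame(Value = rnorm(30),
                 Group = sample(c("A", "B", "C"), 30, replace = TRUE))

## ave() applies FUN within each level of Group and returns the results
## in the original row order, so each row gets its rank within its group
df$ObsID <- ave(seq_along(df$Group), df$Group, FUN = seq_along)
head(df)
```

This avoids building the indicator matrix altogether; for 30 rows either approach is instant, but ave() also scales cleanly to large data.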
Re: [R] is this an ANOVA ?
You probably want to do something like this:

fm <- lm(y ~ x, MD)
anova(fm)
Analysis of Variance Table

Response: y
          Df Sum Sq Mean Sq F value    Pr(>F)
x          2    250   125.0      50 1.513e-06
Residuals 12     30     2.5

Answers to questions: 1. No. 2. Yes. (whoever you are).

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Ubuntu Diego
Sent: Wednesday, 13 April 2011 10:10 AM
To: r-help@r-project.org
Subject: [R] is this an ANOVA ?

Hi all, I have a very easy question (I hope). I had measured a property of plants growing in three different substrates (A, B and C). The rest of the conditions remained constant. There was very high variation in the results. I want to address whether there is any difference in the response (my measurement) from substrate to substrate.

x <- c('A','A','A','A','A','B','B','B','B','B','C','C','C','C','C')  # substrate type
y <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15)  # results of the measurement
MD <- data.frame(x, y)

I wrote a linear model for this:

summary(lm(y ~ x, data = MD))

This is the output:

Call: lm(formula = y ~ x, data = MD)

Residuals:
       Min         1Q     Median         3Q        Max
-2.000e+00 -1.000e+00  5.551e-17  1.000e+00  2.000e+00

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   3.0000     0.7071   4.243 0.001142 **
xB            5.0000     1.0000   5.000 0.000309 ***
xC           10.0000     1.0000  10.000 3.58e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.581 on 12 degrees of freedom
Multiple R-squared: 0.8929, Adjusted R-squared: 0.875
F-statistic: 50 on 2 and 12 DF, p-value: 1.513e-06

I conclude that there is an effect of substrate type (x). NOW the questions: 1) Does the fact that all the p-values are significant mean that all the groups are different from each other? 2) Is there an (easy) way to plot mean plus/minus 2*sd for each substrate type, with asterisks denoting significant differences? THANKS !
version
platform       x86_64-apple-darwin9.8.0
arch           x86_64
os             darwin9.8.0
system         x86_64, darwin9.8.0
status
major          2
minor          11.1
year           2010
month          05
day            31
svn rev        52157
language       R
version.string R version 2.11.1 (2010-05-31)
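Regarding question 2 in the original post, here is a minimal base-graphics sketch (not from the thread; assumes the MD data frame defined above) of mean plus/minus 2*sd per substrate:

```r
means <- with(MD, tapply(y, x, mean))
sds   <- with(MD, tapply(y, x, sd))
bp <- barplot(means, xlab = "substrate", ylab = "y",
              ylim = range(0, means - 2 * sds, means + 2 * sds))
## error bars at mean +/- 2*sd
arrows(bp, means - 2 * sds, bp, means + 2 * sds,
       angle = 90, code = 3, length = 0.1)
```

Asterisks for pairwise differences would need a multiple-comparison procedure on top of this, e.g. TukeyHSD(aov(y ~ x, data = MD)).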
Re: [R] multinom() residual deviance
The residual deviance from a multinomial model is numerically equal (up to round-off error) to the one you would get had you fitted the model as a surrogate Poisson generalized linear model. Here is a short demo building on your example:

set.seed(101)
df <- data.frame(f = sample(letters[1:3], 500, rep = TRUE),
                 x = rnorm(500))
library(nnet)
mm <- multinom(f ~ x, df)
# weights:  9 (4 variable)
initial  value 549.306144
final  value 548.246724
converged

X <- with(df, outer(f, levels(f), "==") + 0)
DF <- within(expand.grid(a = 1:500, b = letters[1:3]),
             a <- factor(a))
DF <- cbind(DF, f = as.vector(X), x = df$x)
gm <- glm(f ~ a + b/x, poisson, DF)  ## surrogate Poisson equivalent model

deviance(gm)  ## surrogate Poisson
[1] 1096.493
deviance(mm)  ## multinomial from nnet package
[1] 1096.493

Bill Venables

From: Sascha Vieweg [saschav...@gmail.com]
Sent: 09 April 2011 17:06
To: Venables, Bill (CMIS, Dutton Park)
Cc: saschav...@gmail.com; r-help@r-project.org
Subject: RE: [R] multinom() residual deviance

Hello. I am aware of the differences between the two models; excuse me for being imprecise about that in my first posting. My question only concerns the nature or structure of the deviance, and thus whether the residual deviance of the multinomial model is the same residual deviance as reported by, say, glm. This is particularly important, since I want to calculate a pseudo R-squared as follows (data from below):

library(nnet)
m <- multinom(y ~ ., data = df)
(llnull <- deviance(update(m, . ~ 1, trace = F)))
(llmod <- deviance(m))
(RMcFadden <- 1 - (llmod/llnull))

(I know that many statisticians here highly discourage people from using the pseudo R-squareds; however, not many editors read this mailing list and still insist.) The MASS book warning alerted me that the multinom residual deviance may be of a principally different structure/nature than the glm one. Thus, while the calculation above holds for a glm, it does not for a multinom model. Am I right? And how to fix it?
On 11-04-09 05:43, bill.venab...@csiro.au wrote:

The two models you fit are quite different. The first is a binomial model equivalent to

fm <- glm(I(y == "a") ~ x, binomial, df)

which you can check leads to the same result. I.e. this model amalgamates classes b and c into one. The second is a multivariate logistic model that considers all three classes defined by your factor y (and has twice the number of parameters, among other things). The three classes a, b and c remain separate in the model. Hence the two models are not directly comparable, so why should the deviances be?

Bill Venables.

From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of Sascha Vieweg [saschav...@gmail.com]
Sent: 09 April 2011 01:14
To: r-help@r-project.org
Subject: [R] multinom() residual deviance

Running a binary logit model on the data

df <- data.frame(y = sample(letters[1:3], 100, repl = T), x = rnorm(100))

reveals some residual deviance:

summary(glm(y ~ ., data = df, family = binomial(logit)))

However, running a multinomial model on that data (multinom, nnet) also reveals a residual deviance:

summary(multinom(y ~ ., data = df))

On page 203, the MASS book says that here the deviance is comparing with the model that correctly predicts each person, not the multinomial response for each cell of the minimal model, followed by an instruction on how to compare with the saturated model. For me as a beginner, this sounds like an important warning; however, I don't know what the warning exactly means and hence do not know what the difference between the residual deviance of the former (binary) and the latter (multinomial) model is. (I need the deviances to calculate some of the pseudo R-squares with function pR2(), package pscl.) Could you give good advice?
Thanks *S*
--
Sascha Vieweg, saschav...@gmail.com
Re: [R] multinom() residual deviance
The two models you fit are quite different. The first is a binomial model equivalent to

fm <- glm(I(y == "a") ~ x, binomial, df)

which you can check leads to the same result. I.e. this model amalgamates classes b and c into one. The second is a multivariate logistic model that considers all three classes defined by your factor y (and has twice the number of parameters, among other things). The three classes a, b and c remain separate in the model. Hence the two models are not directly comparable, so why should the deviances be?

Bill Venables.

From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of Sascha Vieweg [saschav...@gmail.com]
Sent: 09 April 2011 01:14
To: r-help@r-project.org
Subject: [R] multinom() residual deviance

Running a binary logit model on the data

df <- data.frame(y = sample(letters[1:3], 100, repl = T), x = rnorm(100))

reveals some residual deviance:

summary(glm(y ~ ., data = df, family = binomial(logit)))

However, running a multinomial model on that data (multinom, nnet) also reveals a residual deviance:

summary(multinom(y ~ ., data = df))

On page 203, the MASS book says that here the deviance is comparing with the model that correctly predicts each person, not the multinomial response for each cell of the minimal model, followed by an instruction on how to compare with the saturated model. For me as a beginner, this sounds like an important warning; however, I don't know what the warning exactly means and hence do not know what the difference between the residual deviance of the former (binary) and the latter (multinomial) model is. (I need the deviances to calculate some of the pseudo R-squares with function pR2(), package pscl.) Could you give good advice?

Thanks *S*
--
Sascha Vieweg, saschav...@gmail.com
Re: [R] Pulling strings from a Flat file
Isn't all you need read.fwf?

From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of Kalicin, Sarah [sarah.kali...@intel.com]
Sent: 06 April 2011 09:48
To: r-help@r-project.org
Subject: [R] Pulling strings from a Flat file

Hi, I have a flat file that contains a bunch of strings that look like this (the file was originally in Unix and brought over into Windows):

E123456E234567E345678E456789E567891E678910E. . . .

Basically each string starts with E and is followed by 6 numbers: one string = E123456, length = 7 characters. This file contains 10,000's of these strings. I want to separate them into one vector, the length of the number of strings in the flat file, where each string is its own unique value.

cc <- c(7,7,7,7,7,7,7)
aa <- file("Master", "r", raw = TRUE)
readChar(aa, cc, useBytes = FALSE)
[1] "E123456" "\nE23456" "7\nE3456" "78\nE456" "789\nE56" "7891\nE6" "78910\nE"
close(aa)
unlink("Master")

The biggest issue is that I am getting \n added into the strings, which I am not sure where it is coming from, and which splices the strings. Any suggestions on getting rid of the \n, and on creating an indefinitely long sequence of 7's for the string lengths in the cc vector? Is there a better way to do this?

Sarah
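One way around the embedded newlines (a sketch, not from the thread; assumes the file name "Master" as in the question) is to read the whole file as text, drop the line breaks, and then split on fixed width:

```r
txt <- readLines("Master", warn = FALSE)   # one element per physical line
txt <- paste(txt, collapse = "")           # removes the troublesome newlines
starts <- seq(1, nchar(txt), by = 7)
ids <- substring(txt, starts, starts + 6)  # the 7-character codes
```

Alternatively, regmatches(txt, gregexpr("E[0-9]{6}", txt))[[1]] extracts the codes by pattern rather than by position, which is more robust if the widths ever vary.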
Re: [R] fraction with timelag
How about simply

df <- data.frame(id = 1:6,
                 xout = c(12.34, 21.34, 2.34, 4.56, 3.24, 3.45),
                 xin = c(NA, 34, 67, 87, 34, NA))
with(df, c(NA, xin[-1] / xout[-length(xout)]))
[1]        NA  2.755267  3.139644 37.179487  7.456140        NA

BTW you seem to have some decimal points missing in your example. BTW2: zoo?

From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of Patrick Hausmann [patrick.hausm...@uni-bremen.de]
Sent: 24 March 2011 21:31
To: r-help@r-project.org
Subject: [R] fraction with timelag

Dear r-help, I'm having this DF

df <- data.frame(id = 1:6,
                 xout = c(1234, 2134, 234, 456, 324, 345),
                 xin = c(NA, 34, 67, 87, 34, NA))

and would like to calculate the fraction (xin_t / xout_t-1). The result should be:

# NA, 2.76, 3.14, 37.18, 7.46, NA

I am sure there is a solution using zoo... but I don't know how... Thanks for any help! Patrick
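Since the poster asked about zoo, here is a sketch (assumes the zoo package is installed, and uses the answer's corrected data):

```r
library(zoo)
df <- data.frame(id = 1:6,
                 xout = c(12.34, 21.34, 2.34, 4.56, 3.24, 3.45),
                 xin  = c(NA, 34, 67, 87, 34, NA))
z <- zoo(df[c("xout", "xin")], order.by = df$id)
## lag(..., k = -1) aligns each time t with the value at t-1,
## so this computes xin_t / xout_{t-1}; indices align automatically.
frac <- z[, "xin"] / lag(z[, "xout"], k = -1)
```

The division drops the non-overlapping first index, matching the leading NA in the base-R answer.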
Re: [R] R as a non-functional language
That's not the point. The point is that R has functions which have side-effects and hence does not meet the strict requirements for a functional language.

-----Original Message-----
From: ONKELINX, Thierry [mailto:thierry.onkel...@inbo.be]
Sent: Monday, 21 March 2011 7:20 PM
To: russ.abb...@gmail.com; Venables, Bill (CMIS, Dutton Park)
Cc: r-help@r-project.org
Subject: RE: [R] R as a non-functional language

Dear Russ,

Why not simply use

pH <- c(area1 = 4.5, area2 = 7, mud = 7.3, dam = 8.2, middle = 6.3)

That notation is IMHO the most readable for students.

Best regards,
Thierry

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Gaverstraat 4, 9500 Geraardsbergen, Belgium
tel. +32 54/436 185
thierry.onkel...@inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Russ Abbott
Sent: Sunday, 20 March 2011 6:46
To: bill.venab...@csiro.au
Cc: r-help@r-project.org
Subject: Re: [R] R as a non-functional language

I'm afraid I disagree. As a number of people have shown, it's certainly possible to get the end result

pH <- c(4.5,7,7.3,8.2,6.3)
names(pH) <- c('area1','area2','mud','dam','middle')
pH
 area1  area2    mud    dam middle
   4.5    7.0    7.3    8.2    6.3

using a single expression. But what makes this non-functional is that the names() function operates on a reference to the pH object/entity/element.
In other words, the names() function has a side effect, which is not permitted in strictly functional programming. I don't know whether R has threads, but imagine it did and that one ran

names(pH) <- c('area1','area2','mud','dam','middle')

and another ran

names(pH) <- c('areaA','areaB','dirt','blockage','center')

in simultaneous threads. Since both operate on the same pH element, it is uncertain what the result would be. That's one of the things that functional programming prevents.

-- Russ

On Sat, Mar 19, 2011 at 10:22 PM, bill.venab...@csiro.au wrote:

PS the form names(p) <- c(...) is still functional, of course. It is just a bit of syntactic sugar for the clumsier

p <- `names<-`(p, c(...))

e.g.

pH <- `names<-`(pH, letters[1:5])
pH
  a   b   c   d   e
4.5 7.0 7.3 8.2 6.3

-----Original Message-----
From: Venables, Bill (CMIS, Dutton Park)
Sent: Sunday, 20 March 2011 3:09 PM
To: 'Gabor Grothendieck'; 'russ.abb...@gmail.com'
Cc: 'r-help@r-project.org'
Subject: RE: [R] R as a non-functional language

The idiom I prefer is

pH <- structure(c(4.5,7,7.3,8.2,6.3),
                names = c('area1','area2','mud','dam','middle'))

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Gabor Grothendieck
Sent: Sunday, 20 March 2011 2:33 PM
To: russ.abb...@gmail.com
Cc: r-help@r-project.org
Subject: Re: [R] R as a non-functional language

On Sun, Mar 20, 2011 at 12:20 AM, Russ Abbott russ.abb...@gmail.com wrote: I'm reading Torgo (2010) *Data Mining with R* http://www.liaad.up.pt/~ltorgo/DataMiningWithR/code.html in preparation for a class I'll be teaching next quarter. Here's an example that is very non-functional.

pH <- c(4.5,7,7.3,8.2,6.3)
names(pH) <- c('area1','area2','mud','dam','middle')
pH
 area1  area2    mud    dam middle
   4.5    7.0    7.3    8.2    6.3

This sort of thing seems to be quite common in R.

Try this:

pH <- setNames(c(4.5,7,7.3,8.2,6.3), c('area1','area2','mud','dam','middle'))

--
Statistics Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
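The three idioms in this thread (the `names<-` replacement form, structure(), and setNames()) produce identical objects; a quick sketch to confirm:

```r
vals <- c(4.5, 7, 7.3, 8.2, 6.3)
nms  <- c('area1', 'area2', 'mud', 'dam', 'middle')

p1 <- setNames(vals, nms)                # Gabor's suggestion
p2 <- structure(vals, names = nms)       # Bill's preferred idiom
p3 <- `names<-`(vals, nms)               # the replacement-function form
identical(p1, p2) && identical(p1, p3)   # TRUE
```

All three are single expressions with no side effect on an existing binding, which is the point at issue in the thread.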
Re: [R] Help with POSIXct
You might try

dat$F1 <- format(as.Date(dat$F1), format = "%b-%y")

although it rather depends on the class of F1 as it has been read.

Bill Venables. (It would be courteous of you to give us your name, by the way.)

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of i...@mathewanalytics.com
Sent: Tuesday, 22 March 2011 6:31 AM
To: r-help@r-project.org
Subject: [R] Help with POSIXct

I rarely work with dates in R, so I know very little about the POSIXct and POSIXlt classes. I'm importing an Excel file into R using the RODBC package, and am having issues reformatting the dates.

1. The important info:

sessionInfo()
R version 2.12.2 (2011-02-25)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] RODBC_1.3-2

loaded via a namespace (and not attached):
[1] tools_2.12.2

2. My question: My data looks like the following once it is imported into R.

one <- odbcConnectExcel("marketshare.xls")
dat <- sqlFetch(one, "Sheet1")
close(one)
dat
          F1 Marvel DC
1 2010-01-01     42 34
2 2010-02-01     45 34
3 2010-03-01     47 29
4 2010-04-01     45 32
5 2010-05-01     45 35
6 2010-06-01     42 34

Variable F1 is supposed to be Jan-10, Feb-10, Mar-10, etc. However, in the process of importing the .xls file, R is reformatting the dates. How can I retrieve the original month-year format that I have in my Excel file?

3. While the R help list discourages asking multiple questions in one inquiry, it'd be really helpful if someone could point me to any good online resources on the POSIXct and POSIXlt classes.
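A further option (a sketch, not from the thread; assumes the zoo package is installed and that F1 converts cleanly with as.Date): zoo's yearmon class keeps month-year values sortable and computable, instead of turning them into plain character strings as format() does:

```r
library(zoo)
dat$F1 <- as.yearmon(as.Date(dat$F1))   # yearmon prints like "Jan 2010"
format(dat$F1, "%b-%y")                 # the "Jan-10" style asked for
```

With yearmon, the column still sorts chronologically and can be plotted on a time axis; convert with format() only at presentation time.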
Re: [R] How to substract a valur from dataframe with condition
dat <- within(dat, {
  X2 <- ifelse(X2 > 50, 100 - X2, X2)
  X3 <- ifelse(X3 > 50, 100 - X3, X3)
})

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of joe82
Sent: Tuesday, 22 March 2011 7:40 AM
To: r-help@r-project.org
Subject: [R] How to substract a valur from dataframe with condition

Hello All, I need help with my dataframe. It is big, but here I am using a small table as an example. My dataframe df looks like:

       X1   X2    X3
1 2011-02 0.00 96.00
2 2011-02 0.00  2.11
3 2011-02 2.00  3.08
4 2011-02 0.06  2.79
5 2011-02 0.00 96.00
6 2011-02 0.00 97.00
7 2011-02 0.08  2.23

I want the values in columns X2 and X3 to be checked for being greater than 50; if so, subtract them from 100. df should then look like:

       X1   X2   X3
1 2011-02 0.00 4.00
2 2011-02 0.00 2.11
3 2011-02 2.00 3.08
4 2011-02 0.06 2.79
5 2011-02 0.00 4.00
6 2011-02 0.00 3.00
7 2011-02 0.08 2.23

Please help, I will really appreciate it. Thanks, Joe

--
View this message in context: http://r.789695.n4.nabble.com/How-to-substract-a-valur-from-dataframe-with-condition-tp3394907p3394907.html
Sent from the R help mailing list archive at Nabble.com.
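If many columns need the same fold-over, a sketch (hypothetical; column names as in the question) that treats every numeric column at once rather than naming each:

```r
num <- sapply(dat, is.numeric)    # skips the character/date column X1
dat[num] <- lapply(dat[num], function(v) ifelse(v > 50, 100 - v, v))
```

This scales to the poster's "big" data frame without repeating the ifelse() line per column.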
Re: [R] R as a non-functional language
The idiom I prefer is

pH <- structure(c(4.5,7,7.3,8.2,6.3),
                names = c('area1','area2','mud','dam','middle'))

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Gabor Grothendieck
Sent: Sunday, 20 March 2011 2:33 PM
To: russ.abb...@gmail.com
Cc: r-help@r-project.org
Subject: Re: [R] R as a non-functional language

On Sun, Mar 20, 2011 at 12:20 AM, Russ Abbott russ.abb...@gmail.com wrote: I'm reading Torgo (2010) *Data Mining with R* http://www.liaad.up.pt/~ltorgo/DataMiningWithR/code.html in preparation for a class I'll be teaching next quarter. Here's an example that is very non-functional.

pH <- c(4.5,7,7.3,8.2,6.3)
names(pH) <- c('area1','area2','mud','dam','middle')
pH
 area1  area2    mud    dam middle
   4.5    7.0    7.3    8.2    6.3

This sort of thing seems to be quite common in R.

Try this:

pH <- setNames(c(4.5,7,7.3,8.2,6.3), c('area1','area2','mud','dam','middle'))

--
Statistics Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
Re: [R] making dataframes
Firstly, the way you have constructed your data frame in the example will convert everything to factors. What you need to do is actually a bit simpler:

###
dum <- data.frame(date, col1, col2)
###

One way to turn this into the kind of data frame you want is to convert the main part of it to a table first, and then coerce that into a data frame:

###
tab <- as.table(as.matrix(dum[, -1]))
row.names(tab) <- date
names(dimnames(tab)) <- c("date", "category")
Dum <- as.data.frame(tab, responseName = "rainfall")
Dum$date <- factor(Dum$date, levels = date)
###

Here is a check:

head(Dum)
  date category rainfall
1  jan     col1      8.2
2  feb     col1      5.4
3  mar     col1      4.3
4  apr     col1      4.1
5  may     col1      3.1
6 june     col1      2.5

with(Dum, tapply(rainfall, date, mean))
 jan  feb  mar  apr  may june july  aug  sep  oct  nov  dec
5.65 3.85 4.50 5.50 5.30 1.80 2.35 6.50 5.35 2.20 5.95 4.40

Bill Venables.

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of pelt
Sent: Thursday, 17 March 2011 12:28 AM
To: r-help@r-project.org
Subject: [R] making dataframes

Dear all, I have a dataframe which looks like this (dummy):

date <- c("jan","feb","mar","apr","may","june","july","aug","sep","oct","nov","dec")
col1 <- c(8.2,5.4,4.3,4.1,3.1,2.5,1.1,4.5,3.2,1.9,7.8,6.5)
col2 <- c(3.1,2.3,4.7,6.9,7.5,1.1,3.6,8.5,7.5,2.5,4.1,2.3)
dum <- data.frame(cbind(date, col1, col2))
dum
   date col1 col2
1   jan  8.2  3.1
2   feb  5.4  2.3
3   mar  4.3  4.7
4   apr  4.1  6.9
5   may  3.1  7.5
6  june  2.5  1.1
7  july  1.1  3.6
8   aug  4.5  8.5
9   sep  3.2  7.5
10  oct  1.9  2.5
11  nov  7.8  4.1
12  dec  6.5  2.3

I would like to convert this data.frame into something that looks like this:

   date rainfall category
1   jan      8.2     col1
2   feb      5.4     col1
3   mar      4.3     col1
4   apr      4.1     col1
5   may      3.1     col1
6  june      2.5     col1
7  july      1.1     col1
8   aug      4.5     col1
9   sep      3.2     col1
10  oct      1.9     col1
11  nov      7.8     col1
12  dec      6.5     col1
1   jan      3.1     col2
2   feb      2.3     col2
3   mar      4.7     col2
4   apr      6.9     col2
5   may      7.5     col2
6  june      1.1     col2
7  july      3.6     col2
8   aug      8.5     col2
9   sep      7.5     col2
10  oct      2.5     col2
11  nov      4.1     col2
12  dec      2.3     col2

So the column-names become categories. The dataset is rather large, with many columns and a lengthy date-string. Is there an easy way to do this? Thank you for your help, Kind regards, Saskia van Pelt
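A shorter route to the same long format (a sketch, not from the thread) uses base R's stack(), again starting from the non-factor data frame recommended in the reply:

```r
date <- c("jan","feb","mar","apr","may","june",
          "july","aug","sep","oct","nov","dec")
col1 <- c(8.2,5.4,4.3,4.1,3.1,2.5,1.1,4.5,3.2,1.9,7.8,6.5)
col2 <- c(3.1,2.3,4.7,6.9,7.5,1.1,3.6,8.5,7.5,2.5,4.1,2.3)
dum <- data.frame(date, col1, col2)   # no cbind(), so columns stay numeric

## stack() gives values/ind columns; date recycles over the stacked rows
long <- cbind(date = dum$date, stack(dum[c("col1", "col2")]))
names(long) <- c("date", "rainfall", "category")
```

With many value columns, stack(dum[-1]) generalizes without listing the column names.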
Re: [R] Why doesn't this work ?
It doesn't work (in R) because it is not written in R. It's written in some other language that looks a bit like R.

t <- 3
z <- t %in% 1:3
z
[1] TRUE
t <- 4
z <- t %in% 1:3
z
[1] FALSE

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of eric
Sent: Thursday, 17 March 2011 1:26 PM
To: r-help@r-project.org
Subject: [R] Why doesn't this work ?

Why doesn't this work, and is there a better way?

z <- ifelse(t == 1 || 2 || 3, 1, 0)
t <- 3
z
[1] 1
t <- 4
z
[1] 1

Trying to say: if t == 1 or if t == 2 or if t == 3 then true, otherwise false.

--
View this message in context: http://r.789695.n4.nabble.com/Why-doesn-t-this-work-tp3383656p3383656.html
Sent from the R help mailing list archive at Nabble.com.
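Why the original always yields 1 (a brief sketch, not from the thread): `||` coerces the bare numbers 2 and 3 to TRUE, so the condition reduces to (t == 1) || TRUE || TRUE, which is TRUE regardless of t:

```r
t <- 4
(t == 1) || 2 || 3   # FALSE || TRUE short-circuits to TRUE for any t
```

Hence `t %in% 1:3`, which tests membership element-wise, is the idiomatic spelling of "t is 1, 2, or 3".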
Re: [R] Subset using grepl
subset(data, grepl("[1-5]", section) & !grepl("0", section))

BTW grepl("[1:5]", section) does "work". It checks for the characters "1", ":", or "5".

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Kang Min
Sent: Thursday, 17 March 2011 1:29 PM
To: r-help@r-project.org
Subject: Re: [R] Subset using grepl

I have a new question, also regarding grepl. I would like to subset rows with numbers from 1 to 5 in the section column, so I used

subset(data, grepl("[1:5]", section))

but this gave me rows with 10, 1 and 5. (Why is this so?) So I tried

subset(data, grepl("[1,2,3,4,5]", section))

which worked. But I also got 10 in the dataframe as well. How can I exclude 10?

data
section piece LTc1 LTc2 10a 10-1 0.729095368 NA 10a 10-2 59.53292189 13.95612454 10h 10-3 0.213756661 NA 10i 10-4 NA NA 10b NA NA NA 10c NA NA NA 10d NA NA NA 10e NA NA NA 10f NA NA NA 10g NA NA NA 10h NA NA NA 10j NA NA NA 1b 1-1 NA NA 1d 1-2 29.37971303 12.79688209 1g 1-6 NA 7.607911603 1h 1-3 0.298059164 27.09896941 1i 1-4 25.11261782 19.87149991 1j 1-5 36.66969601 42.28507923 1a NA NA NA 1c NA NA NA 1e NA NA NA 1f NA NA NA 2a 2-1 15.98582117 10.58696146 2a 2-2 0.557308341 41.52650718 2c 2-3 14.99499024 10.0896793 2e 2-4 148.4530636 56.45493191 2f 2-5 25.27493551 12.98808577 2i 2-6 20.32857108 22.76075728 2b NA NA NA 2d NA NA NA 2g NA NA NA 2h NA NA NA 2j NA NA NA 3a 3-1 13.36602867 11.47541439 3a 3-7 NA 111.9007822 3c 3-2 10.57406701 5.58567 3d 3-3 11.73240891 10.73833651 3e 3-8 NA 14.54214165 3h 3-4 21.56072089 21.59748884 3i 3-5 15.42846935 16.62715409 3i 3-6 129.7367193 121.8206045 3b NA NA NA 3f NA NA NA 3g NA NA NA 3j NA NA NA 5b 5-1 18.61733498 18.13545293 5d 5-3 NA 7.81018526 5f 5-2 12.5158971 14.37884817 5a NA NA NA 5c NA NA NA 5e NA NA NA 5g NA NA NA 5h NA NA NA 5i NA NA NA 5j NA NA NA 9h 9-1 NA NA 9a NA NA NA 9b NA NA NA 9c NA NA NA 9d NA NA NA 9e NA NA NA 9f NA NA NA 9g NA NA NA 9i NA NA NA 9j NA NA NA 8a 8-1 14.29712852 12.83178905 8e 8-2 23.46594953 9.097377872 8f 8-3 NA NA 8f 8-4 22.20001584 20.39646766 8h 8-5 50.54497551 56.93752065 8b NA NA NA 8c NA NA NA 8d NA NA NA 8g NA NA NA 8i NA NA NA 8j NA NA NA 4b 4-1 40.83468857 35.99017683 4f 4-3 NA 182.8060799 4f 4-4 NA 36.81401955 4h 4-2 17.13625062 NA 4a NA NA NA 4c NA NA NA 4d NA NA NA 4e NA NA NA 4g NA NA NA 4i NA NA NA 4j NA NA NA 7b 7-1 8.217605633 8.565035083 7a NA NA NA 7c NA NA NA 7d NA NA NA 7e NA NA NA 7f NA NA NA 7g NA NA NA 7h NA NA NA 7i NA NA NA 7j NA NA NA 6b 6-6 NA 11.57887288 6c 6-1 27.32608984 17.17778959 6c 6-2 78.21988783 61.80558768 6d 6-7 NA 3.599685625 6f 6-3 26.78838281 23.33258286 6h 6-4 NA NA 6h 6-5 NA NA 6a NA NA NA 6e NA NA NA 6g NA NA NA 6i NA NA NA 6j NA NA NA

On Jan 29, 10:43 pm, Prof Brian Ripley rip...@stats.ox.ac.uk wrote: On Sat, 29 Jan 2011, Kang Min wrote: Thanks Prof Ripley, the condition worked! Btw I tried to search ?repl but I don't have documentation for it. Is it in a non-basic package?

I meant grepl: the edit messed up (but not on my screen, as sometimes happens when working remotely). The point is that 'perl=TRUE' guarantees that [A-J] is interpreted in ASCII order.

On Jan 29, 6:54 pm, Prof Brian Ripley rip...@stats.ox.ac.uk wrote: The grep condition is "[A-J]". BTW, why are there lots of unnecessary steps here, including using cbind() and subset()?

x <- rep(LETTERS[1:20], 3)
y <- rep(1:3, 20)
z <- paste(x, y, sep = "")
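An alternative to combining two grepl() calls (a sketch; assumes section values of the form digit(s)-plus-letter such as "10a", "1b", "5j", as in the posted data): anchor the pattern so a following digit cannot match, which excludes 10 directly:

```r
## "^" pins the match to the start; requiring a letter right after the
## digit means "10a" (second char "0") cannot match, while "1b" .. "5j" do.
subset(data, grepl("^[1-5][a-j]$", section))
```

Character-class notes from the thread apply here too: inside `[...]`, `1:5` and `1,2,3,4,5` are literal character lists, not ranges; only `1-5` is a range.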
Re: [R] data.frame transformation
It is possible to do it with numeric comparisons as well, but to make life comfortable you need to turn off the warning system temporarily.

df <- data.frame(q1 = c(0, 0, 33.33, "check"),
                 q2 = c(0, 33.33, "check", 9.156),
                 q3 = c("check", "check", 25, 100),
                 q4 = c(7.123, 35, 100, "check"))

conv <- function(x, cutoff) {
  oldOpt <- options(warn = -1)
  on.exit(options(oldOpt))
  x <- as.factor(x)
  lev <- as.numeric(levels(x))
  levels(x)[!is.na(lev) & lev < cutoff] <- "."
  x
}

Check:

(df1 <- data.frame(lapply(df, conv, cutoff = 10)))
     q1    q2    q3    q4
1     .     . check     .
2     . 33.33 check    35
3 33.33 check    25   100
4 check     .   100 check

Bill Venables.

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of David Winsemius
Sent: Tuesday, 15 March 2011 6:29 AM
To: andrija djurovic
Cc: r-help@r-project.org
Subject: Re: [R] data.frame transformation

On Mar 14, 2011, at 3:51 PM, andrija djurovic wrote:

I would like to hide cells with values less than 10%, so "." or just "" makes no difference to me. Also I used apply combined with as.character:

apply(df, 2, function(x) ifelse(as.character(x) < 10, ".", x))

This is probably not a good solution, but it works, except that I lose the row names, and because of that I was wondering if there is some other way to do this. Anyway, thank you both; I will try to do this before combining numbers and strings.

I saw your later assertion that it didn't work, which surprised me. My version of your data followed my advice not to use factors, and your effort did succeed when the columns were character rather than factor. I put back the row numbers by coercing back to a data.frame; `apply` returns a matrix.

df <- data.frame(q1 = c(0, 0, 33.33, "check"), q2 = c(0, 33.33, "check", 9.156),
                 q3 = c("check", "check", 25, 100), q4 = c(7.123, 35, 100, "check"),
                 stringsAsFactors = FALSE)
as.data.frame(apply(df, 2, function(x) ifelse(as.character(x) < 10, ".", x)))
     q1    q2    q3    q4
1     .     . check 7.123
2     . 33.33 check    35
3 33.33     .    25   100
4 check 9.156   100 check

There is a danger in using character collation, in that if there are any leading characters in those strings that collate below "1", such as a blank or any other punctuation, they will get dotted.

"," < 1
[1] TRUE
"." < 1
[1] TRUE
"-" < 1
[1] TRUE

And "1.check" would also get dotted:

"1.check" < 10
[1] TRUE

Andrija

On Mon, Mar 14, 2011 at 8:11 PM, David Winsemius dwinsem...@comcast.net wrote: On Mar 14, 2011, at 2:52 PM, andrija djurovic wrote:

Hi R users, I have the following data frame

df <- data.frame(q1 = c(0, 0, 33.33, "check"), q2 = c(0, 33.33, "check", 9.156),
                 q3 = c("check", "check", 25, 100), q4 = c(7.123, 35, 100, "check"))

and I would like to replace every element that is less than 10 with "." (dot) in order to obtain this:

     q1    q2    q3    q4
1     .     . check     .
2     . 33.33 check    35
3 33.33 check    25   100
4 check     .   100 check

I had a lot of difficulties because each variable is a factor.

Right, so comparisons with < will throw an error. I would sidestep the factor problem with stringsAsFactors=FALSE in the data.frame call. You might want to reconsider the "." as a missing value. If you are coming from a SAS background, you should try to get comfortable with NA or NA_character_ as a value.

df <- data.frame(q1 = c(0, 0, 33.33, "check"), q2 = c(0, 33.33, "check", 9.156),
                 q3 = c("check", "check", 25, 100), q4 = c(7.123, 35, 100, "check"),
                 stringsAsFactors = FALSE)
is.na(df) <- t(apply(df, 1, function(x) as.numeric(x) < 10))
Warning messages:
1: In FUN(newX[, i], ...) : NAs introduced by coercion
2: In FUN(newX[, i], ...) : NAs introduced by coercion
3: In FUN(newX[, i], ...) : NAs introduced by coercion
4: In FUN(newX[, i], ...) : NAs introduced by coercion
df
     q1    q2    q3    q4
1    NA    NA check    NA
2    NA 33.33 check    35
3 33.33 check    25   100
4 check    NA   100 check

Could someone help me with this? Thanks in advance for any help.
Andrija

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT
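Combining the two replies above, here is a small sketch (a toy illustration, not from the original thread) that keeps the row names and uses NA rather than "." for the suppressed cells, wrapping the coercion in suppressWarnings() instead of toggling options():

```r
df <- data.frame(q1 = c(0, 0, 33.33, "check"), q2 = c(0, 33.33, "check", 9.156),
                 q3 = c("check", "check", 25, 100), q4 = c(7.123, 35, 100, "check"),
                 stringsAsFactors = FALSE)

## mark every cell that parses as a number below 10; non-numeric cells give NA
## from as.numeric() and are skipped by which(..., arr.ind = TRUE)
low <- suppressWarnings(sapply(df, function(x) as.numeric(x) < 10))
df[which(low, arr.ind = TRUE)] <- NA
df   # same result as the is.na<- approach, without any warnings printed
```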
Re: [R] Stepwise Discriminant... in R
If you want to do a stepwise selection there is a function in the klaR package to do it. This is not what you are asking for, though. You want a way of finding the successive error rates as additional variables are added in the forward selection process. As far as I can see you have to do this yourself, and it is a mildly interesting little exercise in R programming. Here is one possible way to do it. First you need a couple of functions:

###

errorRate <- function(object, ...) {
  if(!require(MASS)) stop("you need the MASS package installed")
  UseMethod("errorRate")
}

errorRate.lda <- function(object, data = eval.parent(object$call$data),
                          type = "plug-in") {
  pred <- predict(object, data, type = type)$class
  actu <- eval(formula(object)[[2]], data)
  conf <- table(pred, actu)
  1 - sum(diag(conf))/sum(conf)
}

eRates <- function(object, data = eval.parent(object$call$data),
                   type = "plug-in") {
  f <- formula(object)
  r <- data.frame(formula = deparse(f[[3]]),
                  Error = errorRate(object, data, type = type))
  while(length(f[[3]]) > 1) {
    f[[3]] <- f[[3]][[2]]
    object$call$formula <- f
    object <- update(object, data = data)
    r <- rbind(data.frame(formula = deparse(f[[3]]),
                          Error = errorRate(object, data, type = type)), r)
  }
  r
}

###

(I have made errorRate generic as it is potentially a generic sort of operation.) Now look at your trivial example (extended a bit):

###

require(klaR)
QRBdfa <- data.frame(LANDUSE = sample(c("A", "B", "C"), 270, rep = TRUE),
                     Al = runif(270, 0, 125),
                     Sb = runif(270, 0, 1),
                     Ba = runif(270, 0, 235),
                     Bi = runif(270, 0, 0.11),
                     Cr = runif(270, 0, 65))
gw_obj <- greedy.wilks(LANDUSE ~ ., data = QRBdfa, niveau = 1)  ## NB large 'niveau'

###

To use the functions you need an lda fit with the same formula as for the gw object and the same data argument as in the original call. (If you try to do this the way suggested in the help file for greedy.wilks, the functions used here will not work. "No dollars in formulae" is always a good rule to follow.)
The way greedy.wilks is written makes this a bit tricky, but unless you want to just type it in, here is a partly automated way of doing it:

###

require(MASS)
fit <- do.call("lda", list(formula = formula(gw_obj), data = quote(QRBdfa)))

###

To use the functions:

> errorRate(fit)   ## for one error rate
[1] 0.5962963

> eRates(fit)      ## for a sequence of error rates
                 formula     Error
1                     Ba 0.6148148
2                Ba + Bi 0.6296296
3           Ba + Bi + Al 0.6074074
4      Ba + Bi + Al + Cr 0.5740741
5 Ba + Bi + Al + Cr + Sb 0.5962963

Since this example uses very artificial random data, the output will be different every time you re-create the data. Note also that the error rates are not necessarily monotonically decreasing.

Bill Venables.

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Ty Smith
Sent: Monday, 14 March 2011 3:51 AM
To: r-help@r-project.org
Subject: [R] Stepwise Discriminant... in R

Hello R list,

I'm looking to do some stepwise discriminant function analysis (DFA) based on the minimization of Wilks' lambda in R, to end up with a composite signature (of the metals Al, Sb, Bi, Cr, Ba) capable of discriminating 100% of the source factors (LANDUSE: A, B, C). The Wilks' lambda portion seems straightforward. I am using the following:

gw_obj <- greedy.wilks(LANDUSE ~ ., data = QRBdfa, niveau = 0.1)
gw_obj

thus determining the stepwise order of metals. But I can't seem to figure out how to coerce the DFA to give me an output with the % of factors which each successive metal (variable) correctly classifies (discriminates), e.g.

Step  Metal  % correctly classified
1     Al     25
2     Sb     75
3     Bi     89
4     Cr     100

I've worked up a trivial example below. Can anyone offer any suggestions on how I might go about doing this in R? I am working in a Mac OS environment with a current version of R.

Many thanks in advance!
Tyler

#Example
library(scatterplot3d)
library(klaR)

Al <- runif(27, 0, 125)
QRBdfa <- as.data.frame(Al)
QRBdfa$LANDUSE <- factor(c("A", "A", "A", "B", "B", "B", "C", "C", "C"))
QRBdfa$Sb <- runif(27, 0, 1)
QRBdfa$Ba <- runif(27, 0, 235)
QRBdfa$Bi <- runif(27, 0, 0.11)
QRBdfa$Cr <- runif(27, 0, 65)

gw_obj <- greedy.wilks(LANDUSE ~ ., data = QRBdfa, niveau = 0.1)
gw_obj

fit <- lda(LANDUSE ~ Al + Sb + Bi + Cr + Ba, data = QRBdfa)

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] troubles with logistic regression
It means you have selected a response variable from one data frame (unmarried.male) and a predictor from another data frame (fieder.male), and they have different lengths. You might be better off if you used the names in the data frame rather than selecting columns in a form such as 'some.data.frame[, 3]'. This just confuses the issue and makes it very easy to make mistakes - as indeed you have done.

Also, to fit models on subsets of the data, you do not have to create separate data frames. See the 'subset' argument of glm, which is standard for most fitting functions. This is also a way to avoid problems and would have helped you here as well.

Bill Venables.

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of gked
Sent: Monday, 14 March 2011 4:33 AM
To: r-help@r-project.org
Subject: [R] troubles with logistic regression

hello everyone, I am working on the dataset for my project in class and got stuck trying to run a logistic regression.
here is my code:

data <- read.csv(file = "C:/Users/fieder.data.2000.csv")

# creating subset of men
fieder.male <- subset(data, data[, 8] == 1)
unmarried.male <- subset(data, data[, 8] == 1 & data[, 6] == 1)

# glm fit
agesq.male <- (unmarried.male[, 5])^2
male.sqrtincome <- sqrt(unmarried.male[, 9])
fieder.male.mar.glm <- glm(as.factor(unmarried.male[, 6]) ~
    factor(fieder.male[, 7]) + fieder.male[, 5] + agesq.male +
    male.sqrtincome, binomial(link = "logit"))

par(mfrow = c(1, 1))
plot(c(0, 300), c(0, 1), pch = " ",
     xlab = "sqrt income, truncated at 9",
     ylab = "modeled probability of being never-married")
junk <- lowess(male.sqrtincome,
               log(fieder.male.mar.glm$fitted.values /
                   (1 - fieder.male.mar.glm$fitted.values)))
lines(junk$x, exp(junk$y)/(1 + exp(junk$y)))
title(main = "probability of never marrying\n males, by sqrt(income)")
points(male.sqrtincome[unmarried.male == 0],
       fieder.male.mar.glm$fitted.values[unmarried.male == 0], pch = 16)
points(male.sqrtincome[unmarried.male == 1],
       fieder.male.mar.glm$fitted.values[unmarried.male == 1], pch = 1)

The error says:

Error in model.frame.default(formula = as.factor(unmarried.male[, 6]) ~ :
  variable lengths differ (found for 'factor(fieder.male[, 7])')

What does it mean? Where am I making a mistake?

Thank you

P.S. I am also attaching the data file in .csv format http://r.789695.n4.nabble.com/file/n3352356/fieder.data.2000.csv fieder.data.2000.csv

--
View this message in context: http://r.789695.n4.nabble.com/troubles-with-logistic-regression-tp3352356p3352356.html
Sent from the R help mailing list archive at Nabble.com.

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
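Following the advice in the reply, the same fit can be sketched with column names and the `subset` argument instead of mixed column indices. The names used here (`married`, `sex`, `educ`, `age`, `income`) are hypothetical stand-ins for `data[, 6]`, `data[, 8]`, `data[, 7]`, `data[, 5]`, and `data[, 9]`, since the real names are only known from the attached CSV:

```r
## hypothetical column names standing in for the numeric indices above;
## subset = sex == 1 replaces the separate fieder.male data frame, and
## keeping everything in one data frame avoids the length mismatch
fit <- glm(as.factor(married) ~ factor(educ) + age + I(age^2) + sqrt(income),
           family = binomial, data = data, subset = sex == 1)
```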
Re: [R] Cleaning date columns
Here is one possible way (I think - untested code):

cData <- do.call(rbind, lapply(split(data, data$prochi), function(dat) {
  dat <- dat[order(dat$date_admission), ]
  while(any(d <- (diff(dat$date_admission) <= 3)))
    dat <- dat[-(min(which(d)) + 1), ]
  dat
}))

(It would be courteous of you to give us your real name, by the way.)

Bill Venables.

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Newbie19_02
Sent: Wednesday, 9 March 2011 9:20 PM
To: r-help@r-project.org
Subject: [R] Cleaning date columns

Hi Everyone,

I have the following problem:

data <- structure(list(prochi = c("IND1", "IND1", "IND1", "IND2", "IND2",
    "IND2", "IND2", "IND3", "IND4", "IND5"),
    date_admission = structure(c(6468, 6470, 7063, 9981, 9983, 14186,
        14372, 5129, 9767, 11168), class = "Date")),
    .Names = c("prochi", "date_admission"),
    row.names = c("27", "28", "21", "86", "77", "80", "1", "114", "192", "322"),
    class = "data.frame")

I have records for individuals that were taken on specific dates. Some of the dates are within 3 days of each other. I want to be able to clean my date column and select the earliest of the dates that occur within 3 days of each other per individual, as a single observation that represents the N observations. So for example:

input:
IND1 1987-09-17
IND1 1987-09-19
IND1 1989-05-04

output:
IND1 1987-09-17
IND1 1989-05-04

I'm not sure where to start with this?

Thanks, Nat

--
View this message in context: http://r.789695.n4.nabble.com/Cleaning-date-columns-tp3343359p3343359.html
Sent from the R help mailing list archive at Nabble.com.

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
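Since the reply is flagged as untested, here is a quick check (not part of the original thread) of the split/while approach on the small input/output example from the question:

```r
## the three-row example from the question
data <- data.frame(prochi = c("IND1", "IND1", "IND1"),
                   date_admission = as.Date(c("1987-09-17", "1987-09-19",
                                              "1989-05-04")))

do.call(rbind, lapply(split(data, data$prochi), function(dat) {
  dat <- dat[order(dat$date_admission), ]
  ## drop the later of any pair of dates at most 3 days apart,
  ## repeating until no such pair remains
  while (any(d <- (diff(dat$date_admission) <= 3)))
    dat <- dat[-(min(which(d)) + 1), ]
  dat
}))
## keeps 1987-09-17 and 1989-05-04 for IND1, dropping 1987-09-19
```

Note that `diff()` on a Date column returns a difftime in days, so the `<= 3` comparison behaves as intended.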
Re: [R] Extracting only odd columns from a matrix
Xonly <- XY[, grep("^X", dimnames(XY)[[2]])]

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Nixon, Matthew
Sent: Thursday, 10 March 2011 12:20 AM
To: r-help@R-project.org
Subject: [R] Extracting only odd columns from a matrix

Hi,

This might seem like a simple question but at the moment I am stuck for ideas. The columns of my matrix in which some data is stored are of this form:

X1 Y1 X2 Y2 X3 Y3 ... Xn Yn

with n ~ 100. I would like to look at just the X values (i.e. odd column numbers). Is there an easy way to loop round extracting only these columns?

Any help would be appreciated. Thank you.

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
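If the columns lack X/Y names to grep on, positional indexing works as well; a small self-contained sketch:

```r
## toy matrix with alternating X/Y columns, standing in for the real data
XY <- matrix(1:12, nrow = 2,
             dimnames = list(NULL, c("X1", "Y1", "X2", "Y2", "X3", "Y3")))

Xonly <- XY[, seq(1, ncol(XY), by = 2)]   # odd-numbered columns: the X's
colnames(Xonly)
## [1] "X1" "X2" "X3"
```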
Re: [R] attr question
Erin,

You could use

as.vector(t.test(buzz$var1, conf.level = 0.98)$conf.int)

Bill Venables.

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Erin Hodgess
Sent: Monday, 7 March 2011 3:12 PM
To: R help
Subject: [R] attr question

Dear R People:

When I want to produce a small sample confidence interval using t.test, I get the following:

> t.test(buzz$var1, conf.level = .98)$conf.int
[1] 2.239337 4.260663
attr(,"conf.level")
[1] 0.98

How do I keep the attr statement from printing, please? I'm sure it's something really simple.

Thanks,
Sincerely,
Erin

--
Erin Hodgess
Associate Professor
Department of Computer and Mathematical Sciences
University of Houston - Downtown
mailto: erinm.hodg...@gmail.com

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
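For completeness, `c()` and `as.numeric()` also strip attributes here; a small sketch with simulated data standing in for `buzz$var1` (which is not shown in the thread):

```r
set.seed(1)                         # reproducible toy data
var1 <- rnorm(20, mean = 3)         # stand-in for buzz$var1
ci <- t.test(var1, conf.level = 0.98)$conf.int

c(ci)           # prints just the two endpoints, no conf.level attribute
as.numeric(ci)  # likewise
```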
Re: [R] extract rows with unique values from data.frame
Here is possibly one method (if I have understood you correctly):

> con <- textConnection("
+ xloc yloc   gonad ind  Ene     W Agent
+ 1 23 20  516.74 1 0.02 20.21 0.25
+ 2 23 20 1143.20 1 0.02 20.21 0.50
+ 3 21 19  250.00 1 0.02 20.21 0.25
+ 4 22 15  251.98 1 0.02 18.69 0.25
+ 5 24 18  598.08 1 0.02 18.69 0.25
+ ")
> pop <- read.table(con, header = TRUE)
> close(con)
> i <- with(pop, cumsum(!duplicated(cbind(xloc, yloc))))
> k <- 2   ## how many unique cells do you want?
> no <- min(which(i == k))
> pop[1:no, ]
  xloc yloc   gonad ind  Ene     W Agent
1   23   20  516.74   1 0.02 20.21  0.25
2   23   20 1143.20   1 0.02 20.21  0.50
3   21   19  250.00   1 0.02 20.21  0.25

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Nicolas Gutierrez
Sent: Tuesday, 8 March 2011 1:52 PM
To: R help
Subject: [R] extract rows with unique values from data.frame

Hello! I have the data frame pop:

  xloc yloc   gonad ind  Ene     W Agent
1   23   20  516.74   1 0.02 20.21  0.25
2   23   20 1143.20   1 0.02 20.21  0.50
3   21   19  250.00   1 0.02 20.21  0.25
4   22   15  251.98   1 0.02 18.69  0.25
5   24   18  598.08   1 0.02 18.69  0.25

And I need to extract the number of rows that have unique (xloc, yloc) values. For example, if I need 2 unique cells (xloc, yloc), the number of rows to extract would be 3, as follows:

  xloc yloc   gonad ind  Ene     W Agent
1   23   20  516.74   1 0.02 20.21  0.25
2   23   20 1143.20   1 0.02 20.21  0.50
3   21   19  250.00   1 0.02 20.21  0.25

Thanks for any help!

Nico

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] creating a count variable in R
You can probably simplify this if you can assume that the dates are in sorted order. Here is a way of doing it even if the days are in arbitrary order. The count refers to the number of times that this date has appeared so far in the sequence.

con <- textConnection("
01/01/2011
01/01/2011
02/01/2011
02/01/2011
02/01/2011
02/01/2011
03/01/2011
03/01/2011
03/01/2011
03/01/2011
03/01/2011
03/01/2011
03/01/2011
")
days <- scan(con, what = "")
close(con)

X <- model.matrix(~ days - 1)
XX <- apply(X, 2, cumsum)
dat <- data.frame(days = days, count = rowSums(X * XX))
dat

### This uses days as a character string vector. If they are actual dates, then convert them to character strings for this operation.

Bill Venables.

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of JonC
Sent: Friday, 4 March 2011 7:58 AM
To: r-help@r-project.org
Subject: [R] creating a count variable in R

Hi R helpers,

I'm trying to create a count in R, but as there is no retain function like in SAS I'm running into difficulties. I have a Date_var column and wish to obtain a Count_var alongside it:

Date_var    Count_var
01/01/2011  1
01/01/2011  2
02/01/2011  1
02/01/2011  2
02/01/2011  3
02/01/2011  4
03/01/2011  1
03/01/2011  2
03/01/2011  3
03/01/2011  4
03/01/2011  5
03/01/2011  6
03/01/2011  7

As can be seen above, the count variable is reinitialised every time a new date is found. I hope this is easy. Many thanks in advance for assistance. It is appreciated.

Cheers, Jon

--
View this message in context: http://r.789695.n4.nabble.com/creating-a-count-variable-in-R-tp3334288p3334288.html
Sent from the R help mailing list archive at Nabble.com.
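As an aside (not from the original reply), the same within-date counter can be had more directly with `ave()`, which also works when the dates arrive in arbitrary order:

```r
days <- c("01/01/2011", "01/01/2011", "02/01/2011", "02/01/2011")
## for each group of equal dates, replace the entries with 1, 2, 3, ...
ave(seq_along(days), days, FUN = seq_along)
## [1] 1 2 1 2
```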
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R usage survey
No. That's not answering the question. ALL surveys are for collecting information. The substantive issue is what purpose do you have in seeking this information in the first place and what are you going to do with it when you get it? Do you have some commercial purpose in mind? If so, what is it? -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Harsh Sent: Friday, 4 March 2011 1:13 AM To: rex.dw...@syngenta.com Cc: r-help@r-project.org Subject: Re: [R] R usage survey Hi Rex and useRs, The purpose of the survey has been mentioned on the survey link goo.gl/jw1ig but I will also reproduce it here. - Geographical distribution of R users - Application areas where R is being used - Supporting technology being used along with R - Academic background distribution of R users The potential personally identifiable information such as name and employer name are optional fields. Actually all the fields in the survey are optional. Some of the analysis output(s) could be along the lines of :- - Usage statistics of various R packages - Distribution of R users across countries/cities - Mapping various applications to packages - Text Mining of the responses to create informative word clouds Personally, I am excited about the kind of data I will receive through this survey and the various insights that could be derived. As already mentioned, the results will be shared with the community. Thank you Rex for raising an important point. It is indeed necessary for me to personally assure the user community that the results will be shared in a manner that will not contain any personally identifiable information. Those who wish to gain access to the raw data will be provided with all the fields but not the name and employer name fields. Just out of curiosity : It is possible to get name, employer name, location, usage information and academic background details when searching for R users on LinkedIn and the many R related groups there. 
Does this also provide potential opportunities for misuse and outrageous analyses, since almost anyone can get onto LinkedIn and access user profiles ? Thank you for your interest and support. Regards, Harsh On Thu, Mar 3, 2011 at 8:02 PM, rex.dw...@syngenta.com wrote: Harsh, Suitably analyzed for whose purposes? One man's suitable is another's outrageous. That's why people want to see the gowns at the Oscars. Under what auspices are you conducting this survey? What do you intend to do with it? You don't give any assurance that the results you post won't have personally identifiable information. I don't get the impression that you know much about survey design. -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Harsh Sent: Thursday, March 03, 2011 5:53 AM To: r-help@r-project.org Subject: [R] R usage survey Hi R users, I request members of the R community to consider filling a short survey regarding the use of R. The survey can be found at http://goo.gl/jw1ig Please accept my apologies for posting here for a non-technical reason. The data collected will be suitably analyzed and I'll post a link to the results in the coming weeks. Thank you all for your interest and for sharing your R usage information. Regards, Harsh Singhal [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. message may contain confidential information. If you are not the designated recipient, please notify the sender immediately, and delete the original and any copies. Any use of the message by you is prohibited. 
[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Non-conformable arrays
Here is one way.

1. Make sure y.test is a factor.
2. Use table(y.test, factor(PredictedTestCurrent, levels = levels(y.test))).
3. If PredictedTestCurrent is already a factor with the wrong levels, turn it back into a character string vector first.

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Gregory Ryslik
Sent: Thursday, 3 March 2011 1:46 PM
To: r-help Help
Subject: [R] Non-conformable arrays

Hi Everyone,

I'm running some simulations where eventually I need to table the results. The problem is that while in most simulations I have at least one predicted outcome for each of the six possible categories, sometimes the algorithm assigns all the outcomes and one category is left out. Thus when I try to add the misclassification matrices I get an error. Is there a simple way I can make sure that all my tables have the same size (with a row or column of zeros if appropriate) without a messy if structure checking each condition?

To be more specific, here's my line of code for the table command and the two matrices that I sometimes have. Notice that in the second matrix, the fad column is missing. Basically, I want all the columns and rows to be predetermined so that no columns/rows go missing.

Thanks for your help!

Kind regards,
Greg

table(y.test, PredictedTestCurrent):

       PredictedTestCurrent
y.test  adi car con fad gla mas
   adi    9   0   0   0   0   0
   car    0   6   1   0   0   3
   con    1   0   3   0   0   0
   fad    0   0   0   2   5   4
   gla    0   1   0   0   6   3
   mas    0   0   0   1   4   4

       PredictedTestCurrent
y.test  adi car con gla mas
   adi    8   0   0   0   0
   car    0   8   0   0   1
   con    2   0   3   0   0
   fad    0   1   0   4   7
   gla    0   0   0   3   5
   mas    0   2   0   6   3

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
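A minimal sketch of point 2 with toy data (not from the original thread), showing how fixing the factor levels keeps a dropped category as a column of zeros:

```r
## two of the six categories, plus one ("fad") that never occurs in this run
y.test <- factor(c("adi", "car", "con"), levels = c("adi", "car", "con", "fad"))
pred   <- c("adi", "car", "car")          # "fad" never predicted here

## give the prediction vector the same levels as y.test before tabling
tab <- table(y.test, factor(pred, levels = levels(y.test)))
dim(tab)   # 4 x 4 even though "fad" appears in neither vector's data
```

Because every table built this way has identical dimensions and dimnames, the misclassification matrices from different simulation runs can be added directly.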
Re: [R] What am I doing wrong with this loop ?
Here is a start:

> x <- as.data.frame(runif(2000, 12, 38))
> length(x)
[1] 1
> names(x)
[1] "runif(2000, 12, 38)"

Why are you turning x and y into data frames? It also looks as if you should be using if(...) ... else ... rather than ifelse(., ., .), too. You need to sort out a few issues, it seems.

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of eric
Sent: Thursday, 3 March 2011 1:19 PM
To: r-help@r-project.org
Subject: [R] What am I doing wrong with this loop ?

What is wrong with this loop? I am getting an error saying "incorrect number of dimensions" at y[i, 2].

x <- as.data.frame(runif(2000, 12, 38))
z <- numeric(length(x))
y <- as.data.frame(z)

for(i in 1:length(x)) {
  y <- ifelse(i < 500,
              as.data.frame(lowess(x[1:i, 1], f = 1/9)),
              as.data.frame(lowess(x[(i-499):i, 1], f = 1/9)))
  z[i] <- y[i, 2]
}

--
View this message in context: http://r.789695.n4.nabble.com/What-am-I-doing-wrong-with-this-loop-tp3332703p3332703.html
Sent from the R help mailing list archive at Nabble.com.

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Logistic Stepwise Criterion
The probability OF the residual deviance is zero. The significance level for the residual deviance according to its asymptotic Chi-squared distribution is a possible criterion, but a silly one. If you want to minimise that, just fit no variables at all. That's the best you can do. If you want to maximise it, just minimise the deviance itself, which means including all possible variables in the regression, together with as many interactions as you can as well. (Incidentally, R doesn't have restrictions on how many interaction terms it can handle; those are imposed by your computer.)

I suggest you think again about what criterion you really want to use. Somehow you need to balance fit in the training sample against some complexity measure. AIC and BIC are commonly used criteria, but not the only ones. I suggest you start with these and see if either does the kind of job you want. Stepwise regression with interaction terms can be a bit tricky if you want to impose the marginality constraints, but that is a bigger issue.

Bill Venables.

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Mano Gabor Toth
Sent: Wednesday, 2 March 2011 6:44 AM
To: r-help@r-project.org
Subject: [R] Logistic Stepwise Criterion

Dear R-help members,

I'd like to run a binomial logistic stepwise regression with ten explanatory variables and as many interaction terms as R can handle. I'll come up with the right R command sooner or later, but my real question is whether and how the criterion for the evaluation of the different models can be set to be the probability of the residual deviance in the Chi-square distribution (which would be more informative of overall model fit than AIC).

Thanks in advance for all your help.
Kind regards,

Mano Gabor TOTH
MA Political Science
Central European University

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to prove the MLE estimators are normal distributed?
This is a purely statistical question and you should try asking it on some statistics list. This list is for help with using R, mostly for data analysis and graphics. A glance at the posting guide (see the footnote below) might be a good idea.

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Ning Cheng
Sent: Wednesday, 2 March 2011 8:21 AM
To: r-help@r-project.org
Subject: [R] How to prove the MLE estimators are normal distributed?

Dear List,

I'm now working on MLE and OLS estimators. I just noticed that the textbook argues they are jointly normally distributed. But how to prove the conclusion?

Thanks for your time in advance!

Best, Ning

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Transforming list into an array
One way to do it is to use the 'abind' package:

> NCurvas <- 10
> NumSim <- 15
> dW <- replicate(NumSim, matrix(rnorm(NCurvas * 3), NCurvas, 3),
+                 simplify = FALSE)
> library(abind)
> DW <- do.call(abind, c(dW, rev.along = 0))
> dim(DW)
[1] 10  3 15

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Luis Felipe Parra
Sent: Monday, 28 February 2011 10:59 AM
To: r-help
Subject: [R] Transforming list into an array

Hello. I have the following object, which is a list of length NumSim with each entry being a matrix of dimensions Ncurvas x 3:

dW <- replicate(NumSim,
                cbind(rnorm(Ncurvas), rnorm(Ncurvas), rnorm(Ncurvas)),
                simplify = FALSE)

I would like to transform it into an array of dimension Ncurvas x 3 x NumSim. Does anybody know how to do this? Or how to generate directly an array composed of independent random normal numbers of dimensions Ncurvas x 3 x NumSim?

Thank you, Felipe Parra

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
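It is worth noting that `replicate()`'s default `simplify = "array"` already builds the array directly, so the `abind` step can be skipped when the list form isn't needed:

```r
NCurvas <- 10
NumSim  <- 15

## replicate() simplifies a matrix-valued expression to a 3-D array by default
DW <- replicate(NumSim, matrix(rnorm(NCurvas * 3), NCurvas, 3))
dim(DW)
## [1] 10  3 15
```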
Re: [R] Combinations
You can compute the logarithm of it easily enough:

> lchoose(54323456, 2345)
[1] 25908.4

Now, what did you want to do with it?

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Jim Silverton
Sent: Monday, 28 February 2011 10:38 AM
To: r-help@r-project.org
Subject: Re: [R] Combinations

any idea how to get R to compute a combination like choose(54323456, 2345)?

--
Thanks,
Jim.

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
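For a sense of why only the logarithm is practical here, the natural log can be converted to base 10 to see how many digits the exact integer would have:

```r
lc <- lchoose(54323456, 2345)  # natural log of choose(54323456, 2345)
lc / log(10)                   # base-10 log: the exact value has ~11252 digits,
                               # far beyond what a double can represent
```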
Re: [R] Weighted Mean By Factor Using BY
Here is the party line, perhaps:

by(data, data$TYPE, function(dat) with(dat, weighted.mean(MEASURE, COUNT)))

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Mike Schumacher
Sent: Thursday, 24 February 2011 9:40 AM
To: r-help@r-project.org
Subject: Re: [R] Weighted Mean By Factor Using BY

I withdraw this question; I was able to accomplish this by creating a new function. Now if only I could get the by output into a dataframe...

Mike

On Wed, Feb 23, 2011 at 2:25 PM, Mike Schumacher mike.schumac...@gmail.com wrote:

Hello R folks,

Reproducible code below - I'm trying to do a weighted mean by a factor and can't figure it out. Thanks in advance for your assistance.

Mike

data <- data.frame(c(5, 5, 1, 1, 1), c(10, 8, 9, 5, 3), c("A", "A", "A", "B", "B"))
names(data) <- c("COUNT", "MEASURE", "TYPE")

## UNWEIGHTED MEAN BY TYPE
by(data$MEASURE, data$TYPE, mean)

## WEIGHTED MEAN WITHOUT TYPE
weighted.mean(x = data$MEASURE, w = data$COUNT)

## WEIGHTED MEAN BY TYPE - DOESN'T WORK
by(data$MEASURE, data$TYPE, weighted.mean(x = data$MEASURE, w = data$COUNT))

--
Michael Schumacher
Manager, Data and Analytics
ValueClick
mike.schumac...@gmail.com

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
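As for the follow-up ("if only I could get the by output into a dataframe"), one sketch (not from the original thread) is to unlist the `by` result and rebuild a data frame from its names:

```r
data <- data.frame(COUNT   = c(5, 5, 1, 1, 1),
                   MEASURE = c(10, 8, 9, 5, 3),
                   TYPE    = c("A", "A", "A", "B", "B"))

res <- by(data, data$TYPE, function(dat) with(dat, weighted.mean(MEASURE, COUNT)))

## the by result is named by TYPE, so it flattens cleanly
data.frame(TYPE = names(res), wmean = unlist(res), row.names = NULL)
##   TYPE wmean
## 1    A     9
## 2    B     4
```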
Re: [R] problem with rbind when data frame contains an date-time variable POSIXt POSIXlt
The solution is probably to make the date-time columns POSIXct: x <- read.table(textConnection(" + ID event.date.time + 1 '2009-07-23 00:20:00' + 2 '2009-08-18 16:25:00' + 3 '2009-08-13 08:30:00' + "), header = TRUE) y <- read.table(textConnection(" + ID event.date.time + 4 '2009-08-25 10:25:00' + 5 '2009-08-10 06:20:00' + 6 '2009-10-09 08:20:00' + "), header = TRUE) closeAllConnections() x <- within(x, + event.date.time <- as.POSIXct(as.character(event.date.time), + format = "%Y-%m-%d %H:%M:%S")) y <- within(y, + event.date.time <- as.POSIXct(as.character(event.date.time), + format = "%Y-%m-%d %H:%M:%S")) z <- rbind(x, y) z ID event.date.time 1 1 2009-07-23 00:20:00 2 2 2009-08-18 16:25:00 3 3 2009-08-13 08:30:00 4 4 2009-08-25 10:25:00 5 5 2009-08-10 06:20:00 6 6 2009-10-09 08:20:00 No problems. -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Andrew Anglemyer Sent: Friday, 18 February 2011 3:54 PM To: r-help@r-project.org Subject: [R] problem with rbind when data frame contains an date-time variable POSIXt POSIXlt I'm trying to rbind two data frames, both with the same column names. One of the columns is a variable with date-time, and this variable is causing the rbind to fail--giving the error Error in names(value[[jj]])[ri] <- nm : 'names' attribute [7568] must be the same length as the vector [9] Is there a way to stack or rbind these two data frames even with this extended date-time variable? The class of event.date.time in each data frame is POSIXt POSIXlt. 
x ID event.date.time 1 2009-07-23 00:20:00 2 2009-08-18 16:25:00 3 2009-08-13 08:30:00 y ID event.date.time 4 2009-08-25 10:25:00 5 2009-08-10 06:20:00 6 2009-10-09 08:20:00 I would like to get z ID event.date.time 1 2009-07-23 00:20:00 2 2009-08-18 16:25:00 3 2009-08-13 08:30:00 4 2009-08-25 10:25:00 5 2009-08-10 06:20:00 6 2009-10-09 08:20:00 I've looked at stripping the dates and times, but it would be really helpful for my purposes to keep the extended variable date-time variable (have to eventually get 24 hours from baseline.date.time). thanks for any and all help! Andy [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
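A brief sketch of why POSIXlt trips rbind up while POSIXct does not (my illustration, not from the thread):

```r
## POSIXlt stores a date-time as a *list* of components (sec, min, hour,
## mday, mon, year, ...), so a POSIXlt column is effectively a list column,
## and rbind() on data frames cannot align it row-wise -- hence the
## 'names' attribute error above.
t1 <- as.POSIXlt("2009-07-23 00:20:00")
unclass(t1)$hour   # one of the list components; here 0

## POSIXct is a single number (seconds since 1970-01-01 UTC), which
## rbind() handles like any other atomic column:
t2 <- as.POSIXct("2009-07-23 00:20:00", tz = "UTC")
unclass(t2)
```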
Re: [R] Saturated model in binomial glm
This is a very good question. You have spotted something that not many people see, and it is important. The bland assertion that the deviance can be used as a test of fit can be seriously misleading. For this data the response is clearly binary, Admitted (success) or Rejected (failure), and the other two factors are explanatory variables. Any binomial model can be fitted by expressing the data as a binary response. In this case there are sum(UCBAd$Freq) [1] 4526 4526 trials, corresponding to the individual applicants for admission. We can expand the data frame right out to this level and fit the model with the data in that form, and in this case the weights will be the default, i.e. all 1's. We can also *group* the data into subsets which are homogeneous with respect to the explanatory variables. The most succinct grouping would be into 12 groups corresponding to the 2 x 6 distinct classes defined by the two explanatory variables. In this case you would specify the response either as a two-column matrix of successes/failures, or as a proportion with the totals for each of the 12 cases as weights. Another succinct grouping is into 2 x 2 x 6 classes, as you do in your example. In this case your response is the factor and the weights are the frequencies. For all three cases a) the estimates will be the same, and so the predictions will be identical, b) the deviance will also be the same, but c) the degrees of freedom attributed to the deviance will be different. The reason for c) is, as you have intuited, that the saturated model is different. Literally, the saturated model is a model with one mean parameter for each value taken as an observation when the model is fitted. So the saturated model is *not* invariant with respect to grouping. 
Let's look at two of these cases computationally: UCB_Expanded <- UCBAd[rep(1:24, UCBAd$Freq), 1:3] ## expand the data frame dim(UCB_Expanded) [1] 4526 3 ### now fit your model m1 <- glm(Admit ~ Gender + Dept, binomial, UCBAd, weights = Freq) ### and the same model using the binary data form m2 <- glm(Admit ~ Gender + Dept, binomial, UCB_Expanded) ### as predicted, the coefficients are identical (up to round off) coef(m1) (Intercept) GenderFemale DeptB DeptC DeptD DeptE -0.58205140 -0.09987009 0.04339793 1.26259802 1.29460647 1.73930574 DeptF 3.30648006 coef(m2) (Intercept) GenderFemale DeptB DeptC DeptD DeptE -0.58205140 -0.09987009 0.04339793 1.26259802 1.29460647 1.73930574 DeptF 3.30648006 ### and so are the deviances, perhaps surprisingly: deviance(m1) [1] 5187.488 deviance(m2) [1] 5187.488 ### but the degrees of freedom attributed to the deviance are different! m1$df.resid [1] 17 m2$df.resid [1] 4519 If you were to fit the model in the most succinct form, with 12 relative frequencies, then you would get the same deviance again, but the degrees of freedom would be only 5. So you need to be very careful in taking the deviance, even in binomial models, as a test of fit. The way the data are grouped is relevant. If you have two fixed models, e.g. Admit ~ Gender and Admit ~ Gender + Dept, then the estimated coefficients and their standard errors (and vcov matrix), the deviance, and hence *differences* in deviances and *differences* in degrees of freedom, will be the same however the data are grouped, and so the usual tests and CI processes go ahead fine. But the deviance itself can be misleading as a test of fit, since the outer hypothesis, the saturated model, is not fixed and depends on grouping. For the ungrouped binary case it is *usually* misleading when taken simply at face value as chi-squared distributed under the null hypothesis. I think there is a book or two around that discusses this issue, but probably not well enough. Bill Venables. 
-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Giovanni Petris Sent: Thursday, 17 February 2011 7:58 AM To: r-help@r-project.org Subject: [R] Saturated model in binomial glm Hi all, Could somebody be so kind as to explain to me what is the saturated model on which deviance and degrees of freedom are calculated when fitting a binomial glm? Everything makes sense if I fit the model using as response a vector of proportions or a two-column matrix. But when the response is a factor and counts are specified via the weights argument, I am kind of lost as far as what the implied saturated model is. Here is a simple example, based on the UCBAdmissions data. UCBAd <- as.data.frame(UCBAdmissions) UCBAd <- glm(Admit ~ Gender + Dept, family = binomial, + weights = Freq, data = UCBAd) UCBAd$deviance [1] 5187.488 UCBAd$df.residual [1] 17 I can see that the data frame UCBAd has 24 rows and using 1+1+5 parameters to fit the model leaves me with 17 degrees of freedom.
Re: [R] How to get warning about implicit factor to integer coercion?
Your complaint is based on what you think a factor should be rather than what it actually is and how it works. The trick with R (BTW I think it's version 2.12.x rather than 12.x at this stage...) is learning to work *with* it as it is rather than making it work the way you would like it to. Factors are a bit tricky. They are numeric objects, even if arithmetic is inhibited. f <- factor(letters) is.numeric(f) ## this looks strange [1] FALSE mode(f) ## but at a lower level [1] numeric Take a simple example. x <- structure(1:26, names = sample(letters)) x h o u w l z a j e n k i s v t g f x c b y d m q p r 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 If you use the factor f as an index, it behaves as a numeric vector of indices: x[f] h o u w l z a j e n k i s v t g f x c b y d m q p r 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 That's just the way it is. That's the reality. You have to learn to deal with it. Sometimes a factor behaves as a character string vector, when no other interpretation would make sense, e.g. which(f == "o") [1] 15 but in other cases they do not. In this case you can make the coercion explicit, of course, if that is your bent: which(as.character(f) == "o") [1] 15 but here there is no need. There are cases where you *do* need to make an explicit coercion, though, if you want it to behave as a character string vector, and indexing is one: x[as.character(f)] a b c d e f g h i j k l m n o p q r s t u v w x y z 7 20 19 22 9 17 16 1 12 8 11 5 23 10 2 25 24 26 13 15 3 14 4 18 21 6 If you want factors to behave universally as character string vectors, the solution is not to use factors at all but to use character string vectors instead. You can get away with a surprising amount this way, e.g. character string vectors used in model formulae are silently coerced to factors anyway. 
What you need to learn is how to read in data frames keeping the character string columns as is and stop them from being made into factors at that stage. That is a lesson for another day... Bill Venables. -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of WB Kloke Sent: Monday, 14 February 2011 8:31 PM To: r-h...@stat.math.ethz.ch Subject: [R] How to get warning about implicit factor to integer coercion? Is there a way in R (12.x) to avoid the implicit coercion of factors to integers in the context of subscripts? If this is not possible, is there a way to get at least a warning, if any coercion of this type happens, given that the action of this coercion is almost never what is wanted? Of course, in the rare case that as.integer() is applied explicitly onto a factor, the warning is not needed, but certainly not as disastrous as in the other case. Probably, in a future version of R, an option similar to that for partial matches of subscripts might be useful to change the default from maximal unsafety to safe use. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
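A hedged sketch of that "lesson for another day" — keeping character columns as characters on input (note that from R 4.0.0 this is the default behaviour):

```r
## Read a small CSV; stringsAsFactors = FALSE keeps 'grp' as character
## rather than converting it to a factor on input.
tc <- textConnection("id,grp
1,a
2,b
3,a")
d <- read.csv(tc, stringsAsFactors = FALSE)
close(tc)
class(d$grp)   # "character", not "factor"
```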
Re: [R] Predictions with missing inputs
With R it is always possible to shoot yourself squarely in the foot, as you seem keen to do, but R does at least often make it difficult. When you predict, you need to have values for ALL variables used in the model. Just leaving out the coefficients corresponding to absent predictors is equivalent to assuming that those coefficients are zero, and there is no basis whatever for so assuming. (In this constructed example things are different because the missing variable is a nonsense variable and the coefficient should be roughly zero, as it is, but in general that is not going to be the case.) So you need to supply some value for each of the missing predictors if you are going to use the standard prediction tools. An obvious plug is the mean of that variable in the training data, though more sophisticated alternatives would often be available. Here is a suggestion for your case. ## fit some linear model to random data x <- matrix(rnorm(100*3), 100, 3) y <- sample(1:2, 100, replace = TRUE) mydata <- data.frame(y, x) library(splines) ## missing from your code. mymodel <- lm(y ~ ns(X1, df = 3) + X2 + X3, data = mydata) summary(mymodel) ## create new data with 1 missing input mynewdata <- within(data.frame(matrix(rnorm(100*2), 100, 2)), ## add in an X3 X3 <- mean(mydata$X3)) mypred <- predict(mymodel, mynewdata) From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of Axel Urbiz [axel.ur...@gmail.com] Sent: 12 February 2011 11:51 To: R-help@r-project.org Subject: [R] Predictions with missing inputs Dear users, I'll appreciate your help with this (hopefully) simple problem. I have a model object which was fitted to inputs X1, X2, X3. Now, I'd like to use this object to make predictions on a new data set where only X1 and X2 are available (just use the estimated coefficients for these variables in making predictions and ignore the coefficient on X3). Here's my attempt but, of course, it didn't work. 
#fit some linear model to random data x=matrix(rnorm(100*3),100,3) y=sample(1:2,100,replace=TRUE) mydata - data.frame(y,x) mymodel - lm(y ~ ns(X1, df=3) + X2 + X3, data=mydata) summary(mymodel) #create new data with 1 missing input mynewdata - data.frame(matrix(rnorm(100*2),100,2)) mypred - predict(mymodel, mynewdata) Thanks in advance for your help! Axel. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] if a variable is defined
!is.null(my.obj...@my.data.frame$my.var) -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Kushan Thakkar Sent: Friday, 11 February 2011 10:30 AM To: r-help@r-project.org Subject: [R] if a variable is defined I have an object type my.object. One of its slots contains a data frame (i.e. my.obj...@my.data.frame) .. I want to check if one of the variables exists in this data frame (i.e. my.obj...@my.data.frame$my.var) I am trying to use the exists function but can't figure out how the arguments work. Please help. So far I have tried exists(my.obj...@my.data.frame$my.var) exists(my.obj...@my.data.frame$my.var) exists(my.var,where=my.obj...@my.data.frame) [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
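A minimal sketch of the idiom with an ordinary data frame standing in for the slot (names hypothetical):

```r
df <- data.frame(a = 1:3)   # stand-in for my.object@my.data.frame

!is.null(df$a)       # TRUE  -- column present
!is.null(df$zzz)     # FALSE -- column absent ($ on a missing name gives NULL)

## An exact-name alternative, immune to partial matching of $ on lists:
"a" %in% names(df)   # TRUE
```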
Re: [R] Getting p-value from summary output
Hi Alice, You can use pvals <- summary(myprobit)$coefficients[, "Pr(>|z|)"] Notice that if the p-value is very small, the printed version is abbreviated, but the object itself has full precision (not that it matters). Bill Venables. -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Allie818 Sent: Friday, 11 February 2011 9:46 AM To: r-help@r-project.org Subject: [R] Getting p-value from summary output I can get this summary of a model that I am running: summary(myprobit) Call: glm(formula = Response_Slot ~ trial_no, family = binomial(link = probit), data = neg_data, na.action = na.pass) Deviance Residuals: Min 1Q Median 3Q Max -0.9528 -0.8934 -0.8418 1.4420 1.6026 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -0.340528 0.371201 -0.917 0.359 trial_no -0.005032 0.012809 -0.393 0.694 (Dispersion parameter for binomial family taken to be 1) Null deviance: 62.687 on 49 degrees of freedom Residual deviance: 62.530 on 48 degrees of freedom AIC: 66.53 Number of Fisher Scoring iterations: 4 But I would like to get the p-value [column heading Pr(>|z|)] for the estimate. I can get the coefficient estimates with myprobit$coefficients. Is there something similar to get the p-value? Thank you in advance. -- View this message in context: http://r.789695.n4.nabble.com/Getting-p-value-from-summary-output-tp3300503p3300503.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
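An equivalent hedged sketch using the accessor `coef()` on the summary object, with a small made-up probit fit (data and names are mine, not from the thread):

```r
set.seed(1)
d <- data.frame(y = rbinom(50, 1, 0.5), x = rnorm(50))
fit <- glm(y ~ x, family = binomial(link = "probit"), data = d)

## coef() on a summary.glm returns the printed coefficient matrix,
## whose fourth column holds the p-values:
ctab  <- coef(summary(fit))
pvals <- ctab[, "Pr(>|z|)"]   # named numeric vector, full precision
pvals["x"]
```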
Re: [R] factor.scores
The function factor.scores does not inherit anything. It is a generic function that provides methods for a number of classes, including those you mention. (The terminology is important if you are to understand what is going on here): library(ltm) Loading required package: MASS Loading required package: msm Loading required package: mvtnorm Loading required package: polycor Loading required package: sfsmisc This is package 'ltm' version '0.9-5' find("factor.scores") [1] "package:ltm" factor.scores function (object, ...) { UseMethod("factor.scores") } <environment: namespace:ltm> methods(factor.scores) [1] factor.scores.gpcm factor.scores.grm factor.scores.ltm factor.scores.rasch [5] factor.scores.tpm So what you have to do, if you want factor.scores() to work on objects that you create, is two things: 1. Give your objects an S3 class attribute, say "bifactor", and 2. Write your own S3 method function factor.scores.bifactor <- function(object, z, y, x, ...) { } to perform the task. The name of the function has to be factor.scores.bifactor and the name of the first formal argument has to be object, and you must have a ... in the argument list, but from that point on you have a pretty free hand. What you probably want to do is use some of the existing method functions to do the hard work once you have munged your object into a form that makes that possible. Bill Venables. -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of James Lewis Sent: Thursday, 10 February 2011 12:09 PM To: r-help@R-project.org Subject: [R] factor.scores The function factor.scores is used with package ltm and others to estimate IRT type scores for various models. It inherits objects of class grm, gpcm and a few others. What I would like to do is to use the factor.scores function, but feed it my own item parameters (from a bifactor model where the 2PL parameters are adjusted for the bifactor structure). Does anybody have an idea of how this might be done? 
I can, of course, create a list, matrix or other object containing the item parameters, but factor.scores only inherits particular object classes. Any thoughts would be great. Thanks, Jimmy [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
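A skeleton of the two steps Bill describes (the class name "bifactor", the slot name `ipar`, and the body are all hypothetical, to be filled in per problem):

```r
## 2. The S3 method: first argument must be `object`, and `...` must appear.
factor.scores.bifactor <- function(object, ...) {
    ## Munge `object` (the adjusted 2PL parameters) into the shape one of
    ## the existing ltm fits has, then delegate the hard work to an
    ## existing method such as ltm's factor.scores.ltm().
    stop("body is problem-specific; see the discussion above")
}

## 1. The class attribute on your own object:
myfit <- structure(list(ipar = matrix(0, 5, 2)), class = "bifactor")

## factor.scores(myfit) would now dispatch to factor.scores.bifactor().
class(myfit)   # "bifactor"
```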
Re: [R] merge multiple .csv files
You want advice? 1. Write sentences that contain a subject and, where appropriate, an object as well. This makes your email just that bit more polite. This list is not a paid service. 2. The sheets may have variables in common, but do they have the same name in both, and the same class, and some values in common? You do not give us very much to go on. -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Benjamin Caldwell Sent: Thursday, 10 February 2011 1:22 PM To: r-help Subject: [R] merge multiple .csv files I am trying to merge about 15 .csv tables - I tried a test run but came up with 0 rows (no data for each variable/column head) CAHSEE.EA.feb.2009 <- read.csv("2009 CAHSEE EA feb 2009.csv", header=TRUE) CAHSEE.IM.MATH.2009 <- read.csv("2009 CAHSEE Impact Math.csv", header=TRUE) testmerge <- merge(CAHSEE.EA.feb.2009, CAHSEE.IM.MATH.2009) testmerge [1] Grade LocalStudentID MathPassed MathScaleScore SchoolCode [6] LastName FirstName ELAPassed ELAScaleScore MathTestDate 0 rows (or 0-length row.names) I know several variables are shared in both sheets. Please advise. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
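A hedged sketch of the likely failure mode (my example; column names modelled loosely on the post): `merge()` joins on the *intersection* of column names by default, so if the shared columns have no values in common — or share names that mean different things — the result has 0 rows. Being explicit about the key avoids the surprise:

```r
a <- data.frame(LocalStudentID = 1:3, MathScaleScore = c(350, 380, 410))
b <- data.frame(LocalStudentID = 2:4, ELAScaleScore  = c(300, 320, 340))

merge(a, b, by = "LocalStudentID")              # inner join: only IDs 2 and 3
merge(a, b, by = "LocalStudentID", all = TRUE)  # full outer join: IDs 1-4, NAs filled in
```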
Re: [R] Newb Prediction Question using stepAIC and predict(), is R wrong?
Using complex names, like res[, 3+i] or res$var, in the formula for a model is a very bad idea, especially if you eventually want to predict to new data. (In fact it won't work, so that makes it very bad indeed.) So do not use '$' or '[..]' terms in model formulae - this is going to cause problems when it comes to predict, because the names in the model's formula will not match the names in the new data frame. When you think about it, this is obvious. In your case you will have to identify the actual names and build the formula that way. So your model will be fitted with a call something like fm <- lm(paid ~ x3i + xi + Sun + Fri + Sat, data = reservesub) (but you will have to use the real names for the first two, of course). If you are doing this in some kind of loop, there are ways to handle it without using terms such as reservesub[, 3+i], but they are not all that simple. Still, if you want to predict from the model to new data, there is no way round it. Interactions are generally included with the * or the / linear model operators. Bill Venables. -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of BSanders Sent: Thursday, 10 February 2011 2:49 PM To: r-help@r-project.org Subject: [R] Newb Prediction Question using stepAIC and predict(), is R wrong? I'm using stepAIC to fit a model. Then I'm trying to use that model to predict future happenings. My first few variables are labeled as their column. (Is this a problem?) The dataframe that I use to build the model is the same as the data I'm using to predict with. Here is a portion of what is happening.. This is the value it is predicting = [1] 9.482975 Summary of the model Call: lm(formula = reservesub$paid ~ reservesub[, 3 + i] + reservesub$grads[, i] + reservesub$Sun + reservesub$Fri + reservesub$Sat) Residuals: Min 1Q Median 3Q Max -15.447 -4.993 -1.090 3.910 27.454 Coefficients: Estimate Std. 
Error t value Pr(>|t|) (Intercept) 5.71370 1.46449 3.902 0.000149 *** reservesub[, 3 + i] 1.00868 0.01643 61.391 < 2e-16 *** reservesub$grads[, i] 0.44649 0.12131 3.681 0.000333 *** reservesub$Sun 8.63606 1.95100 4.426 1.93e-05 *** reservesub$Fri 3.76928 2.00079 1.884 0.061682 . reservesub$Sat 4.03103 2.12754 1.895 0.060225 . --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 7.842 on 138 degrees of freedom (131 observations deleted due to missingness) Multiple R-squared: 0.9794, Adjusted R-squared: 0.9787 F-statistic: 1312 on 5 and 138 DF, p-value: < 2.2e-16 Here is the data that is being fed into predicted[p] = predict(stepsaicguess[[p]], newdata = reservesubpred[p,]) V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 paid Mon Tue Wed Thu 276 10/3/2010 155 84 76 68 64 63 53 42 42 42 42 38 38 38 35 31 31 NA 84 0 0 0 0 Fri Sat Sun grads.1 grads.2 grads.3 grads.4 grads.5 grads.6 grads.7 0 0 1 8 4 1 10 11 0 0 grads.8 grads.9 grads.10 grads.11 grads.12 grads.13 grads.14 0 4 0 0 3 4 0 In this case, i = 1, so I calculate the predicted value should be 5.7137 + 1.00868*84 + 0.44649*8 + 1*8.636 + 0*3.769 + 0*4.03 = 102. But R is giving me 9.482975 for a predicted value .. (which, interestingly, is 5.7137 + 3.769*1, i.e. Intercept + Sat). Another question I have is, if I were to include interactions in this model, would I have to make those variables in my prediction dataframe, or would R 'know' what to do? Thanks in advance for your expert assistance. -- View this message in context: http://r.789695.n4.nabble.com/Newb-Prediction-Question-using-stepAIC-and-predict-is-R-wrong-tp3298569p3298569.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
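One hedged way to build the formula from plain column names inside a loop, so that predict() can find the same names in newdata (the variable names here are hypothetical stand-ins for the real ones):

```r
## reformulate() builds `response ~ term1 + term2 + ...` from strings:
vars <- c("x3i", "gradsi", "Sun", "Fri", "Sat")
f <- reformulate(vars, response = "paid")
f   # paid ~ x3i + gradsi + Sun + Fri + Sat
```

The fitted model then carries ordinary column names, so `predict(fm, newdata)` matches them against `newdata` directly.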
Re: [R] leap year and order function
yearLength <- function(year) 365 + (year %% 4 == 0) yearLength(1948:2010) [1] 366 365 365 365 366 365 365 365 366 365 365 365 366 365 365 365 366 365 365 365 366 [22] 365 365 365 366 365 365 365 366 365 365 365 366 365 365 365 366 365 365 365 366 365 [43] 365 365 366 365 365 365 366 365 365 365 366 365 365 365 366 365 365 365 366 365 365 -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Bobby Lee Sent: Monday, 31 January 2011 2:23 PM To: R-help@r-project.org Subject: [R] leap year and order function I'm trying to write a for loop so that every leap year the number of days becomes 366 instead of 365. Could someone help me out? Also, this set of data has 99.99s. I set all the 99.99 values to NA; however, when I'm using the order function to find the max value of the year, it still reads 99.99 as the max value. Thank you very much maxday <- matrix(NA, 63, 5) for (a in 1948:2010){ maxday[,1] <- 1948:2010 yearly <- na.omit(dat.mat[dat.mat[,1]==a,]) maxday[a-1947,2] <- yearly[order(yearly[,4])[365],2] maxday[a-1947,3] <- yearly[order(yearly[,4])[365],3] maxday[63,2] <- yearly[order(yearly[,4])[127],2] maxday[63,3] <- yearly[order(yearly[,4])[127],3] maxday[a-1947,4] <- max(yearly[,4]) maxday[,5] <- len[,2] } [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
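A caveat worth noting (my addition): the one-liner above is exactly right for 1948-2010, where every fourth year is a leap year, but the full Gregorian rule also excludes most century years. A sketch of the general version:

```r
## Gregorian rule: divisible by 4, except centuries, except multiples of 400.
isLeap <- function(year)
    (year %% 4 == 0 & year %% 100 != 0) | year %% 400 == 0

yearLength2 <- function(year) 365 + isLeap(year)

yearLength2(c(1900, 2000, 2004, 2010))   # 365 366 366 365
```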
Re: [R] Unexpected Gap in simple line plot
You do have missing values. Setting xlim does not subset the data. How about link <- "http://processtrends.com/files/RClimate_CTS_latest.csv" cts <- read.csv(link, header = TRUE) scts <- subset(cts, !is.na(GISS) & !is.na(yr_frac)) ## remove defectives plot(GISS ~ yr_frac, scts, type = "l", xlim = c(1982, 1983), xaxs = "i", yaxs = "i") points(GISS ~ yr_frac, scts, type = "p", col = "red") ? Bill Venables -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of D Kelly O'Day Sent: Friday, 21 January 2011 11:12 AM To: r-help@r-project.org Subject: [R] Unexpected Gap in simple line plot I am getting an unexpected gap in a simple plot of a monthly data series. I have created a csv file of monthly climate observations that I store on-line. When I download the csv file and plot one of the series, I get a gap even though there is data for the missing point. Here is a snippet to show the problem. ## Strange plot results link <- "http://processtrends.com/files/RClimate_CTS_latest.csv" cts <- read.csv(link, header=TRUE) ## Simple line plot - gap for no reason plot(cts$yr_frac, cts$GISS, type="l", xlim=c(1982, 1983), xaxs="i", yaxs="i") ## May, 1982 observation missing ## Add points to plot in red, to see if May shows up points(cts$yr_frac, cts$GISS, type="p", col="red") ## yes, May shows up in points ## Look at cts data.frame. No obvious problems to me?? small <- subset(cts, yr_frac > 1982 & yr_frac < 1983) small[,c(1,3)] The same problem occurs in the other data vectors (HAD, NOAA, RSS, UAH, etc.). I have not been able to figure out why I am getting the gap and how to show a continuous line through the May, 1982 data points. Any suggestions? D Kelly O'Day http://chartsgraphs.worpress.com http://chartsgraphs.wordprss.com -- View this message in context: http://r.789695.n4.nabble.com/Unexpected-Gap-in-simple-line-plot-tp3228853p3228853.html Sent from the R help mailing list archive at Nabble.com. 
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Log difference in a dataframe column
lag and as.ts are separate operations (which in fact commute) lag(as.ts(1:10), 1) Time Series: Start = 0 End = 9 Frequency = 1 [1] 1 2 3 4 5 6 7 8 9 10 as.ts(lag(1:10, 1)) Time Series: Start = 0 End = 9 Frequency = 1 [1] 1 2 3 4 5 6 7 8 9 10 You do NOT need to call as.ts at all if that's your bent: lag(1:10, 1) [1] 1 2 3 4 5 6 7 8 9 10 attr(,"tsp") [1] 0 9 1 but as lag is an operation designed for time series objects, it's a pretty good bet that to make any sense of the result you need to start (or end up) with a time series object. The key is the concept of an 'object' in R. You need to know what kind of objects you are dealing with. -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of eric Sent: Wednesday, 19 January 2011 1:08 PM To: r-help@r-project.org Subject: Re: [R] Log difference in a dataframe column Just learning, so excuse me if I'm being too basic here. But I'm wondering how I should know that as.ts would be needed for lag? Is there a thought process or way to inspect that I should have gone through to know that log would work on y[,5] but lag would not work on [,5]? Is the general rule that lag is not in the base package and therefore would not work? Thanks for the comments -- View this message in context: http://r.789695.n4.nabble.com/Log-difference-in-a-dataframe-column-tp3221225p3224478.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] plot continuous data vs clock time
plot(y ~ x, type = "p", xlim = x[c(2,4)]) ? -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of wangxipei Sent: Tuesday, 18 January 2011 1:27 PM To: r-help Subject: [R] plot continuous data vs clock time Dear R users, I have a question about plotting clock time; an example is below: y <- seq(from=1, to=30, by=5) x <- c("0:01", "1:20", "8:40", "9:25", "15:30", "21:23") x <- as.POSIXct(strptime(paste(x), "%H:%M")) plot(y ~ x, type="p") I got the plot, but if I want to plot the x range from 1:20 to 9:25, how can I set the xlim argument? Any suggestions about clock time plotting would be appreciated. Xipei Wang [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] data prep question
Here is one way:

con <- textConnection("
ID TIME OBS
001 2200 23
001 2400 11
001 3200 10
001 4500 22
003 3900 45
003 5605 32
005 1800 56
005 1900 34
005 2300 23")
dat <- read.table(con, header = TRUE,
                  colClasses = c("factor", "numeric", "numeric"))
closeAllConnections()
tmp <- lapply(split(dat, dat$ID),
              function(x) within(x, TIME <- TIME - min(TIME)))
split(dat, dat$ID) <- tmp
dat
   ID TIME OBS
1 001    0  23
2 001  200  11
3 001 1000  10
4 001 2300  22
5 003    0  45
6 003 1705  32
7 005    0  56
8 005  100  34
9 005  500  23

From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of Matthew Strother [rstro...@gmail.com]
Sent: 16 January 2011 07:26
To: r-help@r-project.org
Subject: [R] data prep question

I have a data set with several thousand observations across time, grouped by subject (example format below):

ID  TIME  OBS
001 2200  23
001 2400  11
001 3200  10
001 4500  22
003 3900  45
003 5605  32
005 1800  56
005 1900  34
005 2300  23
...

I would like to identify the first time for each subject, and then subtract this value from each subsequent time. However, the number of observations per subject varies widely (from 1 to 20), and the intervals between times vary widely. Is there a package that can help do this, or a loop that can be set up to evaluate ID, then calculate the values? The outcome I would like is presented below.

ID  TIME  OBS
001    0  23
001  200  11
001 1000  10
001 2300  22
003    0  45
003 1705  32
005    0  56
005  100  34
005  500  23
...

Any help appreciated.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
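The same rebasing can also be written as a one-liner with ave(), which applies a function within groups and recycles the group results back to full length (a sketch on a cut-down copy of the example data):

```r
## Cut-down copy of the thread's example data
dat <- data.frame(ID   = factor(c("001", "001", "003", "005", "005")),
                  TIME = c(2200, 2400, 3900, 1800, 1900),
                  OBS  = c(23, 11, 45, 56, 34))

## subtract each subject's first (minimum) time within its own group
dat$TIME <- dat$TIME - ave(dat$TIME, dat$ID, FUN = min)
```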
Re: [R] Rounding variables in a data frame
If you can specify the omitted columns as numbers there is a quick way to do it, e.g.

d
          d1       d2        d3        d4
1   9.586524 4.833417 0.8142588 -3.237877
2  11.481521 6.536360 2.3361894 -4.042314
3  10.243192 5.506440 2.0443788 -3.478543
4   9.969548 6.159666 3.0449121 -4.827746
5   9.193832 4.085796 3.0176759 -2.199658
6  10.821571 5.442870 1.2630331 -3.483939
7   9.288714 5.756906 1.2432079 -4.387761
8   8.258251 7.729755 0.9170335 -5.416554
9  10.371080 6.234921 2.4462774 -4.433582
10  9.378106 5.389561 4.0592613 -3.433946

out <- d
out[-3] <- round(d[-3])
out
   d1 d2        d3 d4
1  10  5 0.8142588 -3
2  11  7 2.3361894 -4
3  10  6 2.0443788 -3
4  10  6 3.0449121 -5
5   9  4 3.0176759 -2
6  11  5 1.2630331 -3
7   9  6 1.2432079 -4
8   8  8 0.9170335 -5
9  10  6 2.4462774 -4
10  9  5 4.0592613 -3

If you want to specify the column names as a character vector, this is almost as easy, e.g.

omit <- c("d1", "d3")
leave <- match(omit, names(d))
out <- d
out[-leave] <- round(d[-leave])
out
          d1 d2        d3 d4
1   9.586524  5 0.8142588 -3
2  11.481521  7 2.3361894 -4
3  10.243192  6 2.0443788 -3
4   9.969548  6 3.0449121 -5
5   9.193832  4 3.0176759 -2
6  10.821571  5 1.2630331 -3
7   9.288714  6 1.2432079 -4
8   8.258251  8 0.9170335 -5
9  10.371080  6 2.4462774 -4
10  9.378106  5 4.0592613 -3

Bill Venables

From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of Pete B [peter.breckn...@bp.com]
Sent: 15 January 2011 14:53
To: r-help@r-project.org
Subject: [R] Rounding variables in a data frame

Hi All

I am trying to use the round function on some columns of a dataframe while leaving others unchanged. I wish to specify those columns to leave unchanged. My attempt is below - here, I would like the column d3 to be left alone but columns d1, d2 and d4 to be rounded to 0 decimal places. I would welcome any suggestions for a nicer way of doing this.

d1 = rnorm(10,10)
d2 = rnorm(10,6)
d3 = rnorm(10,2)
d4 = rnorm(10,-4)
d = data.frame(d1,d2,d3,d4)
x = NULL
for (i in 1:ncol(d)){
  if (colnames(d)[i] == "d3") {
    x[i] = d[i]
  } else {
    x[i] = round(d[i])
  }
  out = do.call(cbind, x)
}
colnames(out) = colnames(d)

Thanks and regards
Pete

--
View this message in context: http://r.789695.n4.nabble.com/Rounding-variables-in-a-data-frame-tp3218729p3218729.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
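A compact variant of the same idea, selecting the kept columns by name with setdiff() rather than by negative index (a sketch; the column names follow the thread's example):

```r
## Reproducible toy data in the shape of the question's
set.seed(1)
d <- data.frame(d1 = rnorm(10, 10), d2 = rnorm(10, 6),
                d3 = rnorm(10, 2),  d4 = rnorm(10, -4))

omit <- "d3"                       # columns to leave unrounded
keep <- setdiff(names(d), omit)    # everything else gets rounded
d[keep] <- lapply(d[keep], round)
```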
Re: [R] Weighted least squares regression for an exponential decay function
nls in the stats package.

?nls

From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of Erik Thulin [ethu...@gmail.com]
Sent: 15 January 2011 16:16
To: r-help@r-project.org
Subject: [R] Weighted least squares regression for an exponential decay function

Hello, I have a data set which is best fit by an exponential decay function. I would like to use a nonlinear weighted least squares regression. What function should I be using? Thank you!

[[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Problems creating a PNG file for a dendrogram: Error in plot.window(...) : need finite 'xlim' values
I very much doubt your first example works. The value of plot() is NULL, which, if you plot it again, will give the error message you see in your second example. What were you trying to achieve with

p <- plot(hc)
plot(p)  ### this one is trying to plot NULL

? Here is an example (such as you might have given, according to the posting guide):

x <- matrix(rnorm(500*5), 500, 5)
dc <- as.dist(1 - cor(x))
hc <- hclust(dc)
p <- plot(hc)
plot(p)
Error in plot.window(...) : need finite 'xlim' values
In addition: Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf
3: In min(x) : no non-missing arguments to min; returning Inf
4: In max(x) : no non-missing arguments to max; returning -Inf

Look familiar? This is why:

p
NULL

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Richard Vlasimsky
Sent: Wednesday, 12 January 2011 10:01 AM
To: r-help@r-project.org
Subject: [R] Problems creating a PNG file for a dendrogram: Error in plot.window(...) : need finite 'xlim' values

Has anyone successfully created a PNG file for a dendrogram? I am able to successfully launch and view a dendrogram in Quartz. However, the dendrogram is quite large (too large to read on a computer screen), so I am trying to save it to a file (1000x4000 pixels) for viewing in other apps. However, whenever I try to initiate a PNG device, I get a "need finite 'xlim' values" error. Here is some example code to illustrate my point:

cor.matrix <- cor(mydata, method = "pearson", use = "pairwise.complete.obs");
distance <- as.dist(1.0 - cor.matrix);
hc <- hclust(distance);
p <- plot(hc);
plot(p);  # This works! Plot is generated in quartz no problem.

# Now, try this:
png(filename = "delme.png", width = 4000, height = 1000);
cor.matrix <- cor(mydata, method = "pearson", use = "pairwise.complete.obs");
distance <- as.dist(1.0 - cor.matrix);
hc <- hclust(distance);
p <- plot(hc);
plot(p);
# Error in plot.window(...) : need finite 'xlim' values
# In addition: Warning messages:
# 1: In min(x) : no non-missing arguments to min; returning Inf
# 2: In max(x) : no non-missing arguments to max; returning -Inf
# 3: In min(x) : no non-missing arguments to min; returning Inf
# 4: In max(x) : no non-missing arguments to max; returning -Inf

This is the exact same code, only a prior call to png() causes the seemingly unrelated xlim to fail. Why is this?

Thanks,
Richard Vlasimsky

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
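For completeness, the working pattern the reply implies, as a self-contained sketch (simulated data stand in for the poster's `mydata`; the output path is a temporary file):

```r
## The fix: open the device, draw the plot directly (don't capture
## plot()'s NULL return value), then close the device
set.seed(1)
x  <- matrix(rnorm(50 * 5), 50, 5)
hc <- hclust(as.dist(1 - cor(x)))

f <- tempfile(fileext = ".png")   # illustrative output path
png(filename = f, width = 800, height = 400)
plot(hc)                          # draw while the png device is open
dev.off()                         # closing the device writes the file
```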
Re: [R] Plotting Factors -- Sorting x-axis
That rather depends on what kind of plot you want to use. Here is one option that you can use without any changes:

con <- textConnection("
   Months  Prec
1     Jan 102.1
2     Feb  69.7
3     Mar  44.7
4     Apr  32.1
5     May  24.0
6     Jun  18.7
7     Jul  14.0
8     Aug  20.0
9     Sep  32.4
10    Oct  58.9
11    Nov  94.5
12    Dec 108.2
")
tab <- read.table(con)
closeAllConnections()
rm(con)
with(tab, barplot(Prec, names = as.character(Months)))

If you want to adjust the factor so that the levels remain in natural month order, then

tab <- within(tab, Months <- factor(Months, levels = month.abb))
with(tab, barplot(Prec, names = levels(Months)))

From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of Taylor, Eric HLS:EX [eric.tay...@gov.bc.ca]
Sent: 07 January 2011 09:13
To: 'r-h...@stat.math.ethz.ch'
Subject: [R] Plotting Factors -- Sorting x-axis

Hello; How do I plot these data in R without the Months being ordered alphabetically?

   Months  Prec
1     Jan 102.1
2     Feb  69.7
3     Mar  44.7
4     Apr  32.1
5     May  24.0
6     Jun  18.7
7     Jul  14.0
8     Aug  20.0
9     Sep  32.4
10    Oct  58.9
11    Nov  94.5
12    Dec 108.2

Eric

Eric Taylor, Air Quality Meteorologist, Health Protection, Ministry of Health Services

[[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
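The levels trick in isolation, as a tiny sketch: a factor keeps whatever level order you give it, so supplying `month.abb` preserves calendar order rather than the default alphabetical sort.

```r
## Without explicit levels these would sort as Dec, Feb, Jan, Mar
m <- factor(c("Mar", "Jan", "Feb", "Dec"), levels = month.abb)
sort(m)  # calendar order: Jan, Feb, Mar, Dec
```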
Re: [R] anova vs aov commands for anova with repeated measures
lm() and aov() are not fully equivalent. They both fit linear models, but they use different algorithms, and this allows aov, for example, to handle some simple multistratum models. The algorithm used by lm does not allow this, but it has other advantages for simpler models. If you want to fit a multistratum model, such as a repeated measures model, you need to use aov.

When it comes to finding the residuals, you have to be explicit about which residuals you mean, too. You get residuals for each stratum in a multistratum model. Using plain old resid() will not work, as that function can only be used when there is only one kind of residual vector defined. (It could be made to do something sensible, but that's another issue. Right now, it doesn't.)

The function proj() can be used on a fitted model object to obtain the projections, at each stratum, onto the subspaces defined by the terms in the model, and includes the residuals (the projection onto the orthogonal complement of the model space in R^n) as the final column of the matrix of projections. These are easy to dig out and you can analyse away. See ?proj for an example of its use, but you will need to dig out the appropriate column yourself.

From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of Frodo Jedi [frodo.j...@yahoo.com]
Sent: 08 January 2011 01:51
To: r-help@r-project.org
Subject: [R] anova vs aov commands for anova with repeated measures

Dear all, I need to understand something about the behaviour of the two functions aov and anova in the following case involving an analysis of ANOVA with repeated measures. If I use the following command I don't get any problem:

aov1 = aov(response ~ stimulus*condition + Error(subject/(stimulus*condition)), data=scrd)
summary(aov1)

Instead if I try to fit the same model for the regression I get an error:

fit1 <- lm(response ~ stimulus*condition + Error(subject/(stimulus*condition)), data=scrd)
Error in eval(expr, envir, enclos) : could not find function "Error"

so I cannot run the command anova(fit1) afterwards. I want to use

fit1 <- lm(response ~ stimulus*condition + Error(subject/(stimulus*condition)), data=scrd)

because I want to analyse the residuals in order to check normality, and see if the anova assumption of normality still holds. Could you please help me understand how to do this?

Thanks in advance
Best regards

[[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
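A minimal sketch of the proj() route described above, on simulated data (the variable names echo the poster's, but the data and the simplified one-factor Error term are invented for illustration):

```r
## Invented balanced repeated-measures data: 6 subjects x 2 stimuli x 2 reps
set.seed(42)
scrd <- data.frame(subject  = factor(rep(1:6, each = 4)),
                   stimulus = factor(rep(1:2, times = 12)),
                   response = rnorm(24))

## multistratum fit; stimulus is tested in the subject:stimulus stratum
aov1 <- aov(response ~ stimulus + Error(subject/stimulus), data = scrd)

## proj() returns one projection matrix per stratum; the "Residuals"
## column of the lowest ("Within") stratum holds the within-cell residuals,
## which can then be fed to qqnorm() or shapiro.test() to check normality
pr  <- proj(aov1)
res <- pr[["Within"]][, "Residuals"]
```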
Re: [R] packagename:::functionname vs. importFrom
If you use ::: to access non-exported functions, as Frank confesses he does, then you can't complain if in the next release of the package involved the non-exported objects are missing and things are being done another way entirely. That's the deal.

On the other hand, sometimes package authors do not envisage all the ways their package will be used, and neglecting to export some object is mostly because the author simply did not anticipate that anyone would ever need to use it. But sometimes they do. A common case is when you need to do some operations very efficiently and there are simplifications in the input of which you can take advantage to cut down on the overheads. In that case you usually need the cut-down (non-exported) workhorse rather than the (exported) show-pony front end.

The documentation suggests that if you ever need to use ::: perhaps you should be contacting the package maintainer to have the article in question exported. This makes a lot of sense, but it can also create quite a bit of work for the maintainers if they agree to do it. It's a very grey area, in my experience.

Bill Venables.

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Hadley Wickham
Sent: Tuesday, 4 January 2011 3:06 PM
To: Frank Harrell
Cc: r-help@r-project.org
Subject: Re: [R] packagename:::functionname vs. importFrom

> Correct. I'm doing this because of non-exported functions in other packages, so I need :::

But you really really shouldn't be doing that. Is there a reason that the package authors won't export the functions?

> I'd still appreciate any insight about whether importFrom in NAMESPACE defers package loading so that if the package is not actually used (and is not installed) there will be no problem.

Imported packages need to be installed - but it's the import vs. suggests vs. depends statement in DESCRIPTION that controls this behaviour, not the namespace.

Hadley

--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Changing column names
You don't give us much to go on, but some variant of

country <- c("US", "France", "UK", "NewZealand", "Germany", "Austria", "Italy", "Canada")
result <- read.csv("result.csv", header = FALSE)
names(result) <- country

should do what you want.

From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of Vincy Pyne [vincy_p...@yahoo.ca]
Sent: 31 December 2010 16:07
To: r-help@r-project.org
Subject: [R] Changing column names

Dear R helpers

Wish you all a very Happy and Prosperous New Year 2011. I have the following query.

country = c("US", "France", "UK", "NewZealand", "Germany", "Austria", "Italy", "Canada")

Through some other R process, the result.csv file is generated as

result.csv
  var1 var2 var3 var4 var5 var6 var7 var8
1   25   45   29   92  108  105   65   56
2   80  132   83   38   38   11   47   74
3  135   11   74   56   74   74   74   29

I need the country names to be column heads, i.e. I need an output like

result_new
   US France UK NewZealand Germany Austria Italy Canada
1  25     45 29         92     108     105    65     56
2  80    132 83         38      38      11    47     74
3 135     11 74         56      74      74    74     29

The number of countries, i.e. length(country), matches the total number of variables (i.e. number of columns in 'result.csv'). One way of doing this is to use country names as column names while writing the 'result.csv' file:

write.csv(data.frame(US = ..., France = ...), 'result.csv', row.names = FALSE)

However, the problem is I don't know in what order the country names will appear, and also there could be addition or deletion of some country names. Also, if there are say 150 country names, the above way (i.e. writing the csv) of defining the column names is not practical. Basically I want to change the column heads after the 'result.csv' is generated. Kindly guide.

Regards
Vincy

[[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] access a column of a dataframe without qualifying the name of the column
Here is an alternative approach that is closer to that used by lm and friends.

df <- data.frame(x = 1:10, y = 11:20)
test <- function(col, dat) eval(substitute(col), envir = dat)
test(x, df)
 [1]  1  2  3  4  5  6  7  8  9 10
test(y, df)
 [1] 11 12 13 14 15 16 17 18 19 20

There is a slight added bonus this way:

test(x + y + 1, df)
 [1] 13 15 17 19 21 23 25 27 29 31

(Well, I did say 'slight'.)

Bill Venables.

From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of David Winsemius [dwinsem...@comcast.net]
Sent: 30 December 2010 10:44
To: John Sorkin
Cc: r-help@r-project.org
Subject: Re: [R] access a column of a dataframe without qualifying the name of the column

On Dec 29, 2010, at 7:11 PM, John Sorkin wrote:

> I am trying to write a function that will access a column of a data frame without having to qualify the name of the data frame column, as long as the name of the dataframe is passed to the function. As can be seen from the code below, my function is not working:

Not sure what the verb "qualify" means in programming. Quoting?

> df <- data.frame(x = 1:10, y = 11:20)
> df
> test <- function(column, data) {
>   print(data$column)
> }
> test(x, df)
>
> I am trying to model my function after the way that lm works, where one need not qualify column names, i.e.

df <- data.frame(x = 1:10, y = 11:20)
test <- function(column, dat) {
  print(colname <- deparse(substitute(column)))
  dat[[colname]]
}
test(x, df)
[1] "x"
 [1]  1  2  3  4  5  6  7  8  9 10

-- David.

> fit1 <- lm(y ~ x, data = df)
>
> John David Sorkin M.D., Ph.D.
> Chief, Biostatistics and Informatics
> University of Maryland School of Medicine Division of Gerontology
> Baltimore VA Medical Center
> 10 North Greene Street
> GRECC (BT/18/GR)
> Baltimore, MD 21201-1524
> (Phone) 410-605-7119
> (Fax) 410-605-7913 (Please call phone number above prior to faxing)
> Confidentiality Statement: This email message, including any attachments, is for th...{{dropped: 6}}

David Winsemius, MD
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
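A side note, not from the original replies: for interactive use, base R's with() packages the same eval/substitute idiom, so a helper function is often unnecessary.

```r
## with() evaluates an expression in an environment built from the data,
## so columns can be referenced by bare name
df  <- data.frame(x = 1:10, y = 11:20)
res <- with(df, x + y + 1)
```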
Re: [R] filling up holes
Dear 'analyst41' (it would be a courtesy to know who you are),

Here is a low-level way to do it. First create some dummy data:

allDates <- seq(as.Date("2010-01-01"), by = 1, length.out = 50)
client_ID <- sample(LETTERS[1:5], 50, rep = TRUE)
value <- 1:50
date <- sample(allDates)
clientData <- data.frame(client_ID, date, value)

At this point clientData has 50 rows, with 5 clients, each with a sample of dates. Everything is in random order except value. Now write a little function to fill out a subset of the data consisting of one client's data only:

fixClient <- function(cData) {
  dateRange <- range(cData$date)
  dates <- seq(dateRange[1], dateRange[2], by = 1)
  fullSet <- data.frame(client_ID = as.character(cData$client_ID[1]),
                        date = dates, value = NA)
  fullSet$value[match(cData$date, dates)] <- cData$value
  fullSet
}

Now split up the data, apply the fixClient function to each section and re-combine them again:

allData <- do.call(rbind,
                   lapply(split(clientData, clientData$client_ID), fixClient))

Check:

head(allData)
    client_ID       date value
A.1         A 2010-01-04    36
A.2         A 2010-01-05    18
A.3         A 2010-01-06    NA
A.4         A 2010-01-07    NA
A.5         A 2010-01-08    NA
A.6         A 2010-01-09    49

Seems OK. At this point the data are in sorted order by client and date, but that should not matter.

Bill Venables.

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of analys...@hotmail.com
Sent: Wednesday, 29 December 2010 10:45 AM
To: r-help@r-project.org
Subject: [R] filling up holes

I have a data frame with three columns:

client ID | date | value

For each client ID I want to determine the min date and max date, and for any dates in between that are missing I want to insert a row

client ID | date | NA

Any help would be appreciated.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
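An alternative sketch of the same fill-in using merge(): build the full client-by-date grid first and then left-join the observed values onto it, leaving NA where there was no observation (tiny invented data, for illustration only):

```r
## Invented data: client A has a one-day hole, client B has a single date
clientData <- data.frame(client_ID = c("A", "A", "B"),
                         date  = as.Date(c("2010-01-01", "2010-01-03",
                                           "2010-01-02")),
                         value = 1:3)

## full min..max date sequence per client
grid <- do.call(rbind, lapply(split(clientData, clientData$client_ID),
  function(cd) data.frame(client_ID = cd$client_ID[1],
                          date = seq(min(cd$date), max(cd$date), by = 1))))

## left join: rows with no matching observation get value = NA
filled <- merge(grid, clientData, all.x = TRUE)
```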
Re: [R] linear regression for grouped data
library(nlme)
lmList(y ~ x | factor(ID), myData)

This gives a list of fitted model objects.

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Entropi ntrp
Sent: Wednesday, 29 December 2010 12:24 PM
To: r-help@r-project.org
Subject: [R] linear regression for grouped data

Hi, I have been examining large data and need to do simple linear regression with the data, which is grouped based on the values of a particular attribute. For instance, consider three columns: ID, x, y, where I need to regress x on y for each distinct value of ID. Specifically, for the set of data corresponding to each of the 4 values of ID (76, 111, 121, 168) in the data below, I should invoke linear regression 4 times. The challenge is that the length of the ID vector is around 2 and therefore linear regression must be done automatically for each distinct value of ID.

 ID     x    y
 76 36476 15.8
 76 36493 66.9
 76 36579 65.6
111 35465 10.3
111 35756  4.8
121 38183 16
121 38184 15
121 38254  9.6
121 38255  7
168 37727 21.9
168 37739 29.7
168 37746 97.4

I was wondering whether there is an easy way to group data based on the values of ID in R so that linear regression can be done easily for each group determined by each value of ID. Or is the only way to construct loops with 'for' or 'while' in which a matrix is generated for each distinct value of ID that stores corresponding values of x and y by screening the entire ID vector?

Thanks in advance,
Yasin

[[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
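The same per-group fits can be done in base R without nlme, via split() and lapply() (a sketch using the first two IDs from the thread's data):

```r
## First two groups of the question's data
myData <- data.frame(ID = c(76, 76, 76, 111, 111),
                     x  = c(36476, 36493, 36579, 35465, 35756),
                     y  = c(15.8, 66.9, 65.6, 10.3, 4.8))

## one lm() per distinct ID; each element of 'fits' is a fitted model
fits   <- lapply(split(myData, myData$ID), function(d) lm(y ~ x, data = d))
slopes <- sapply(fits, function(f) coef(f)[["x"]])
```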
Re: [R] monthly median in a daily dataset
I find this function useful for digging out months from Date objects:

Month <- function(date, ...)
  factor(month.abb[as.POSIXlt(date)$mon + 1], levels = month.abb)

For the little data set below, this is what it gives:

with(data, tapply(value, Month(date), median, na.rm = TRUE))
  Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
   NA    NA    NA    NA    NA    NA    NA    NA    NA    NA 0.035 0.030

Here is another useful little one:

Year <- function(date, ...) as.POSIXlt(date)$year + 1900

So if you wanted the median by year and month you could do

with(data, tapply(value, list(Year(date), Month(date)), median, na.rm = TRUE))
     Jan Feb Mar Apr May Jun Jul Aug Sep Oct   Nov  Dec
2008  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA 0.035 0.03

(The result is a matrix, which in this case has only one row, of course.)

See how you go.

Bill Venables.

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of HUXTERE
Sent: Monday, 20 December 2010 8:32 AM
To: r-help@r-project.org
Subject: [R] monthly median in a daily dataset

Hello, I have a multi-year dataset (see below) with date, a data value and a flag for the data value. I want to find the monthly median for each month in this dataset and then plot it. If anyone has suggestions they would be greatly appreciated. It should be noted that there are some dates with no values and they should be removed.

Thanks
Emily

print(str(data$flow$daily))
'data.frame':   16071 obs. of  3 variables:
 $ date :Class 'Date'  num [1:16071] -1826 -1825 -1824 -1823 -1822 ...
 $ value: num  NA NA NA NA NA NA NA NA NA NA ...
 $ flag : chr  ...
NULL
520   2008-11-01 0.034
1041  2008-11-02 0.034
1562  2008-11-03 0.034
2083  2008-11-04 0.038
2604  2008-11-05 0.036
3125  2008-11-06 0.035
3646  2008-11-07 0.036
4167  2008-11-08 0.039
4688  2008-11-09 0.039
5209  2008-11-10 0.039
5730  2008-11-11 0.038
6251  2008-11-12 0.039
6772  2008-11-13 0.039
7293  2008-11-14 0.038
7814  2008-11-15 0.037
8335  2008-11-16 0.037
8855  2008-11-17 0.037
9375  2008-11-18 0.037
9895  2008-11-19 0.034 B
10415 2008-11-20 0.034 B
10935 2008-11-21 0.033 B
11455 2008-11-22 0.034 B
11975 2008-11-23 0.034 B
12495 2008-11-24 0.034 B
13016 2008-11-25 0.034 B
13537 2008-11-26 0.033 B
14058 2008-11-27 0.033 B
14579 2008-11-28 0.033 B
15068 2008-11-29 0.034 B
15546 2008-11-30 0.035 B
521   2008-12-01 0.035 B
1042  2008-12-02 0.034 B
1563  2008-12-03 0.033 B
2084  2008-12-04 0.031 B
2605  2008-12-05 0.031 B
3126  2008-12-06 0.031 B
3647  2008-12-07 0.032 B
4168  2008-12-08 0.032 B
4689  2008-12-09 0.032 B
5210  2008-12-10 0.033 B
5731  2008-12-11 0.033 B
6252  2008-12-12 0.032 B
6773  2008-12-13 0.031 B
7294  2008-12-14 0.030 B
7815  2008-12-15 0.030 B
8336  2008-12-16 0.029 B
8856  2008-12-17 0.028 B
9376  2008-12-18 0.028 B
9896  2008-12-19 0.028 B
10416 2008-12-20 0.027 B
10936 2008-12-21 0.027 B
11456 2008-12-22 0.028 B
11976 2008-12-23 0.028 B
12496 2008-12-24 0.029 B
13017 2008-12-25 0.029 B
13538 2008-12-26 0.029 B
14059 2008-12-27 0.030 B
14580 2008-12-28 0.030 B
15069 2008-12-29 0.030 B
15547 2008-12-30 0.031 B
15851 2008-12-31 0.031 B

--
View this message in context: http://r.789695.n4.nabble.com/monthly-median-in-a-daily-dataset-tp3094917p3094917.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
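The same month grouping can also be done with format() instead of a helper function, which keeps year and month together in one key (a sketch on invented data):

```r
## Invented daily data spanning two months
d <- data.frame(date  = as.Date(c("2008-11-01", "2008-11-02", "2008-12-01")),
                value = c(0.034, 0.036, 0.030))

## "%Y-%m" yields keys like "2008-11", so tapply groups by calendar month
med <- tapply(d$value, format(d$date, "%Y-%m"), median, na.rm = TRUE)
```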
Re: [R] Use generalised additive model to plot curve
Dear Lurker,

If all you are trying to do is to plot something, isn't all you need something like the following?

x <- c( 30,  50,  80,  90, 100)
y <- c(160, 180, 250, 450, 300)
sp <- spline(x, y, n = 500)
plot(sp, type = "l", xlab = "x", ylab = "y", las = 1,
     main = "A Spline Interpolation")
points(x, y, pch = 3, col = "red", lwd = 2)

Bill Venables.

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of e-letter
Sent: Wednesday, 15 December 2010 8:37 AM
To: r-help@r-project.org
Subject: [R] Use generalised additive model to plot curve

Readers,

I have been reading 'the r book' by Crawley and think that the generalised additive model is appropriate for this problem. The package 'gam' was installed using the command (as root):

install.packages("gam")
...
library(gam)
Loading required package: splines
Loading required package: akima
library(mgcv)
This is mgcv 1.3-25
Attaching package: 'mgcv'
The following object(s) are masked from package:gam :
 gam, gam.control, gam.fit, plot.gam, predict.gam, s, summary.gam

x <- c(30,50,80,90,100)
y <- c(160,180,250,450,300)
model <- gam(y ~ s(x))
Error in smooth.construct.tp.smooth.spec(object, data, knots) :
  A term has fewer unique covariate combinations than specified maximum degrees of freedom

The objective is to plot y against x, finally to produce a graph with a smooth curve (and then remove the data points). What is my mistake please?

yours,
r251
gnu/linux mandriva2008

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] LaTeX, MiKTeX, LyX: A Guide for the Perplexed
A new fortune is born? Sharing LaTeX documents with people using word processors only is no more difficult than giving driving directions to someone who is blindfolded and has all 4 limbs tied behind their back. Collaboration with people who insist on using programs that process their words much like a food processor processes food is the one legitimate reason to not use LaTeX (but untying them and removing the blindfold is much better). Gets my vote. -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Greg Snow Sent: Wednesday, 8 December 2010 2:24 PM To: Paul Miller; r-help@r-project.org Subject: Re: [R] LaTeX, MiKTeX, LyX: A Guide for the Perplexed See inline below -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r- project.org] On Behalf Of Paul Miller Sent: Tuesday, December 07, 2010 4:29 PM To: r-help@r-project.org Subject: [R] LaTeX, MiKTeX, LyX: A Guide for the Perplexed Hello Everyone, Been learning R over the past several months. Read several books and have learned a great deal about data manipulation, statistical analysis, and graphics. Now I want to learn how to make nice looking documents and about literate programming. My understanding is that R users normally do this using LaTeX, MiKTeX, LyX, etc. in conjuction with Sweave. An alternative might be to use the R2wd package to create Word documents. So I guess I have four questions: 1. How do I choose between the various options? Why would someone decide to use LaTeX instead of MiKTeX or vice versa for example? MikTeX is a flavor of LaTeX for windows, so there is not much to decide between them, MikTeX is LaTeX (though there are other implementations on windows) 2. What are the best sources of information about LaTeX, MiKTeX, LyX, etc.? There are many documents about LaTeX, including books. But I would suggest starting with the Not So Short Introduction to LaTeX that is free and comes with MikTeX (and probably the others). 
LyX should have its own docs with it. There are several other free guides that would be good to read after that. If you want a paper book then I would suggest the original by Lamport.

3. What is the learning curve like for each of these? What do you get for the time you put in learning something that is more difficult?

That depends on your background: if you have never worked from the command line or with a markup language, then it is pretty steep; if you have worked with those types of tools then it is not as bad. But either way it is worth it. When I first learned LaTeX I was tempted to go back and rewrite my master's thesis in it instead of the word processor. I did not, but my dissertation was in LaTeX.

4. How do people who use LaTeX, MiKTeX, LyX, etc. share documents with people who are just using Word? How difficult does using LaTeX, MiKTeX, LyX, etc. make it to collaborate on projects with others?

Sharing LaTeX documents with people using word processors only is no more difficult than giving driving directions to someone who is blindfolded and has all 4 limbs tied behind their back. Collaboration with people who insist on using programs that process their words much like a food processor processes food is the one legitimate reason to not use LaTeX (but untying them and removing the blindfold is much better). If you just need basic input or approval then give them a paper version or a pdf file and make the changes yourself. If they are going to be writing major portions or doing a lot of editing, then using LaTeX without all people understanding it will be a headache.

Some other packages to consider are odfWeave (odf is the OpenDocument Format used by the open-source office suite; it can read and write MSWord documents, but still Sweave with R); R2HTML, which works with Sweave where the base document is in HTML; and SWord, a tool from the same group as RExcel that gives the same general idea as Sweave but using MSWord (Windows only).

Thanks, Paul

-- Gregory (Greg) L. Snow Ph.D.
Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
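To make the Sweave workflow discussed above concrete, here is a minimal .Rnw document (a sketch; the file name, chunk name, and contents are illustrative). Running Sweave("example.Rnw") in R produces example.tex, which pdflatex then typesets:

```latex
\documentclass{article}
\begin{document}

Sweave weaves R results into \LaTeX:

% An R code chunk; echo=TRUE prints the code as well as its output
<<summary-chunk, echo=TRUE>>=
x <- rnorm(100)
summary(x)
@

% \Sexpr{} inlines a single R value into the prose
The sample mean was \Sexpr{round(mean(x), 2)}.

\end{document}
```

The `<<...>>= ... @` chunk is replaced in the output by the echoed R code and its results, which is what makes the document "literate": text and computation live in one source file.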
Re: [R] Newbie - want to view code for a function
For a substantial calculation like this the algorithms will likely be in C or Fortran. You will need to download the source for the stats package from CRAN (as a tar.gz file), expand it, and look at the source code in the appropriate sub-directories. You can get a bit of a road map in R by

find("ar.yw")
methods(ar.yw)
stats:::ar.yw.default

but you can't see the underlying C or Fortran code, which is where the real action will be. (Of course you might want to read the help information first and follow up on the references there. Only in desperation, of course.)

-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Dick Knox Sent: Wednesday, 8 December 2010 2:58 PM To: r-help@r-project.org Subject: [R] Newbie - want to view code for a function

(I am) Brand new to R. (I) Want to understand the algorithm used in the Yule-Walker time series autoregression model. I assume there is a way to see the source for ar.yw. I also assume that everybody except me knows how. Could someone suggest to me how to find out? I've looked through some of the documentation - there's a lot - and apparently I haven't looked in the right place. Thanks in advance Dick

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Summing up Non-numeric column
?unique

-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of zhiji19 Sent: Wednesday, 8 December 2010 2:57 PM To: r-help@r-project.org Subject: [R] Summing up Non-numeric column

Dear All, If I have the following dataset

V1 V2
x  y
y  x
z  b
a  c
b  j
d  l
c  o

how do I use an R command to get the total number of different letters in column V1? Column V1 has 7 different letters. Thank you

-- View this message in context: http://r.789695.n4.nabble.com/Summing-up-Non-numeric-column-tp3077710p3077710.html Sent from the R help mailing list archive at Nabble.com.

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
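The ?unique pointer above leads directly to the answer; a minimal sketch (the data frame name df is illustrative, re-creating the posted example):

```r
# Re-create the posted two-column dataset
df <- data.frame(V1 = c("x", "y", "z", "a", "b", "d", "c"),
                 V2 = c("y", "x", "b", "c", "j", "l", "o"))

# Count the distinct letters in column V1
length(unique(df$V1))  # 7
```

unique() drops duplicate values; length() then counts what remains.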
Re: [R] Two time measures
I think all you need is ?split

-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Eduardo de Oliveira Horta Sent: Sunday, 28 November 2010 8:02 AM To: r-help@r-project.org Subject: [R] Two time measures

Hello! I have a csv file of intra-day financial data (5-min closing prices) that looks like this: (obs - the dates are formatted as day/month/year, as is usual here in Brazil)

Date;Time;Close
01/09/2009;10:00;56567
01/09/2009;10:05;56463
01/09/2009;10:10;56370
##(goes on all day)
01/09/2009;16:45;55771
01/09/2009;16:50;55823
01/09/2009;16:55;55814
##(jumps to the subsequent day)
02/09/2009;10:00;55626
02/09/2009;10:05;55723
02/09/2009;10:10;55659
##(goes on all day)
02/09/2009;16:45;55742
02/09/2009;16:50;55717
02/09/2009;16:55;55385
## (and so on to the next day)

I would like to store the intra-day 5-min prices in a list, where each element would represent one day, something like

list[[1]]
price at time 1, day 1
price at time 2, day 1
...
price at time n_1, day 1
list[[2]]
price at time 1, day 2
price at time 2, day 2
...
price at time n_2, day 2

and so on. As the n_1, n_2, etc. suggest, each day has its own number of observations (this reflects the fact that the market may open and close at varying daytimes). I have guessed that a list would be a better way to store my data, since a matrix would be filled with NA's and that is not exactly what I'm looking for. Thanks in advance, and best regards, Eduardo Horta

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
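A sketch of the ?split suggestion above, using a few rows of the posted data inlined as text so the example is self-contained (read.csv2 reads semicolon-separated files; the object names d and prices are illustrative):

```r
# A few rows of the posted file, inlined in place of the real csv
txt <- "Date;Time;Close
01/09/2009;10:00;56567
01/09/2009;10:05;56463
01/09/2009;16:55;55814
02/09/2009;10:00;55626
02/09/2009;10:05;55723"
d <- read.csv2(textConnection(txt), stringsAsFactors = FALSE)

# One list element per day; days may contain different numbers of rows
prices <- split(d$Close, as.Date(d$Date, format = "%d/%m/%Y"))
lengths(prices)  # 3 observations on day 1, 2 on day 2
```

split() groups the Close vector by day, giving exactly the ragged list-of-days structure asked for, with no NA padding.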
Re: [R] Gap between graph and axis
Perhaps you need something like this:

par(yaxs = "i")
plot(runif(10), type = "h", ylim = c(0, 1.1))

-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Sebastian Rudnick Sent: Tuesday, 23 November 2010 10:37 AM To: r-help@r-project.org Subject: [R] Gap between graph and axis

Hi everyone! I want to plot some precipitation data via plot(type="h"). Unfortunately there is always a gap between the bars and the x-axis, so that the bars reach into the negative area below 0 on the y-axis, which is very misleading. The ylim parameter is set to 0 and the max of precipitation; the min value of precipitation is 0 as well. I tried to fix this via the fig parameter, but I have no idea how to do it at all. I hope anyone can help. Thanks a lot, Sebastian

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] a philosophy R question
The conventional view used to be that S is the language and that R and S-PLUS are implementations of it. R is usually described as 'a programming environment for data analysis and graphics' (as was S-PLUS before it). However, as the language that R implements diverges inexorably from the classical definition of S, it is now probably more accurate to describe the language itself as R as well - now a dialect of S, if you will. The only thing most would agree upon, though, is that neither R nor S-PLUS should *ever* be described as a stats package. Such a description is definitely to be avoided. In fact calling R a 'package' at all would be both confusing and misleading (not to mention demeaning!). Bill Venables.

-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Erin Hodgess Sent: Sunday, 21 November 2010 11:56 AM To: R help Subject: [R] a philosophy R question

Dear R People: Do we say that R is a programming language or a programming environment, please? Which is correct, please? Thanks in advance, Sincerely, Erin

-- Erin Hodgess Associate Professor Department of Computer and Mathematical Sciences University of Houston - Downtown mailto: erinm.hodg...@gmail.com

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Can't invert matrix
What you show below is only a representation of the matrix to 7dp. If you look at that, though, the condition number is suspiciously large (i.e. the matrix is very ill-conditioned):

txt <- textConnection("
0.99252358 0.93715047 0.7540535 0.4579895
0.01607797 0.09616267 0.2452471 0.3088614
0.09772828 0.58451468 1.4907090 1.8773815
-0.0100000 0.0000000 0.0900000 0.1700000
")
mat <- matrix(scan(txt, quiet = TRUE), ncol = 4, byrow = TRUE)
close(txt)
with(svd(mat), d[1]/d[4])
[1] 82473793

With this representation, though, the matrix does seem to allow inversion on a Windows 32-bit machine:

solve(mat)
           [,1]      [,2]     [,3]        [,4]
[1,]  1.4081192  -7826819  1287643  21.5115344
[2,] -0.3987761  18796863 -3092403 -25.7757002
[3,] -0.1208272 -18836093  3098860  -0.9159397
[4,]  0.1467979   9511648 -1564829   7.6326466

and horrible as it is (look closely at columns 2 and 3) it does check out pretty well:

mat %*% solve(mat)
              [,1]          [,2]          [,3]          [,4]
[1,]  1.000000e+00 -3.060450e-10  1.108447e-10 -1.025872e-15
[2,]  3.571091e-18  1.000000e+00 -5.269385e-11 -1.840975e-16
[3,]  8.201989e-17  2.559318e-09  1.000000e+00 -1.284563e-15
[4,] -3.203479e-18 -8.867573e-11  1.813305e-11  1.000000e+00

but a generalized inverse (with default tolerances) is very different:

library(MASS)
ginv(mat)
            [,1]       [,2]       [,3]      [,4]
[1,]  1.27299552 -0.2800302 -1.7022000  16.38665
[2,] -0.07426349  0.1804989  1.0971974 -13.46780
[3,] -0.44601712  0.2273789  1.3821521 -13.24953
[4,]  0.31100880 -0.1368457 -0.8318576  13.86073

which emphasises the delicacy of dealing with an ill-conditioned matrix. Incidentally, this also checks out fairly well according to the definition of a g-inverse:

range(mat - mat %*% ginv(mat) %*% mat)
[1] -2.132894e-08 2.128452e-08

When dealing with numerical matrices you have to be prepared for the unexpected. Bill Venables.
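The condition number above was computed from the SVD by hand; base R's kappa() reports the same diagnostic directly (a sketch re-entering the matrix from the post):

```r
# The matrix from the original post, entered row by row
mat <- matrix(c( 0.99252358, 0.93715047, 0.7540535, 0.4579895,
                 0.01607797, 0.09616267, 0.2452471, 0.3088614,
                 0.09772828, 0.58451468, 1.4907090, 1.8773815,
                -0.01,       0.00,       0.09,      0.17),
              ncol = 4, byrow = TRUE)

# Exact 2-norm condition number: ratio of largest to smallest singular value.
# A value around 8e7 signals severe ill-conditioning in double precision.
kappa(mat, exact = TRUE)
```

With exact = FALSE (the default) kappa() uses a cheaper estimate; exact = TRUE matches the d[1]/d[4] ratio from the SVD.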
-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Noah Silverman Sent: Sunday, 21 November 2010 2:56 PM To: r-help@r-project.org Subject: [R] Can't invert matrix

Hi, I'm trying to use the solve() function in R to invert a matrix. I get the following error:

Lapack routine dgesv: system is exactly singular

However, my matrix doesn't appear to be singular:

           [,1]       [,2]      [,3]      [,4]
[1,] 0.99252358 0.93715047 0.7540535 0.4579895
[2,] 0.01607797 0.09616267 0.2452471 0.3088614
[3,] 0.09772828 0.58451468 1.4907090 1.8773815
[4,] -0.0100000 0.0000000  0.0900000 0.1700000

Can anyone help me understand what is happening here? Thanks! -N

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] density at particular values
It's actually not too difficult to write the density function itself as returning a function rather than a list of x and y values. Here is a no-frills (well, few-frills) version:

### cut here ###
densityfun <- local({
  normd <- function(value, bw) {
    force(value); force(bw)
    function(z) dnorm(z, mean = value, sd = bw)
  }
  function(x, bw = bw.nrd0, adjust = 1) {
    bandw <- bw(x) * adjust
    flist <- lapply(x, normd, bw = bandw)
    function(z)
      if (length(z) <= 1)
        mean(sapply(flist, function(fun) fun(z)))
      else
        rowMeans(sapply(flist, function(fun) fun(z)))
  }
})
### cut here ###

To test it:

library(MASS)
x <- faithful$eruptions
dx <- density(x, n = 500)
plot(dx)
dfun <- densityfun(x)
sx <- sample(dx$x, 25)
points(sx, dfun(sx), pch = 4)
curve(dfun, add = TRUE, col = "blue", n = 500)

This idea could be extended to provide essentially the same features as density() itself. The details are left as an exercise... If anyone ever needs to integrate a kernel density estimate (and for now I can't see why they would, but if), then this would provide a way to do it with integrate(). Bill Venables.

-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of David Winsemius Sent: Sunday, 21 November 2010 3:21 PM To: Shant Ch Cc: r-help@r-project.org Subject: Re: [R] density at particular values

On Nov 20, 2010, at 9:34 PM, Shant Ch wrote: David, I did look at ?density many times. I think I didn't explain what I have to find. Suppose I have a data:

x <- c(rnorm(40,5,3), rcauchy(30,0,4), rexp(30,6))

Suppose I don't have information about the mixture; I have been given only the data. density(x) will give the 6-number summary of the data, given as x, and also the 6-number summary of the density values, given as y.

I am not sure what the number six (6) represents in the two places it occurs:

str(density(x))
List of 7
 $ x: num [1:512] -59.2 -59 -58.8 -58.5 -58.3 ...
 $ y: num [1:512] 3.30e-05 5.27e-05 8.19e-05 1.24e-04 1.82e-04 ...
 $ bw : num 1.37
 $ n: int 100
 $ call : language density.default(x = x)
 $ data.name: chr "x"
 $ has.na : logi FALSE
 - attr(*, "class")= chr "density"

Perhaps you want to look at: ?approxfun It would let you interpolate at points that are not part of the set density(x)$x

__ David.

I want to find the density of the given data at x=1. I basically want the value of y (= density) for x=1, i.e. the kernel density at x=1. Shant

From: David Winsemius dwinsem...@comcast.net To: Shant Ch sha1...@yahoo.com Cc: r-help@r-project.org Sent: Sat, November 20, 2010 8:54:32 PM Subject: Re: [R] density at particular values

On Nov 20, 2010, at 8:07 PM, Shant Ch wrote: Hello everyone! I want to use the density function of R to compute the density at x (=0, say). But it is giving me the 5-number summary and mean of the data and densities at that point. I just want the densities at different values specified by me. Can anyone let me know how I can find that?

Here's what you should have done (even before posting): ?density Read the help page to see the structure of what density() returns.

Value
x: the n coordinates of the points where the density is estimated.
y: the estimated density values. These will be non-negative, but can be zero.

Realize that the "specified by me" part is either going to be modified to pick an existing estimate near my specification or that you will need to approximate the value. So what is the actual problem (and the actual data setup)? --David.

For example Thanks in advance for your help. Shant

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
David Winsemius, MD West Hartford, CT

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
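The ?approxfun route suggested in this thread can be sketched as follows: estimate the density once, then interpolate it at any point, e.g. x = 1 (set.seed is added here for reproducibility; it is not in the original post):

```r
set.seed(1)
x <- c(rnorm(40, 5, 3), rcauchy(30, 0, 4), rexp(30, 6))

dx <- density(x)
# Linear interpolation between the 512 grid points density() returns
dfun <- approxfun(dx$x, dx$y)
dfun(1)  # estimated density at x = 1
```

This answers the original question directly: dfun() gives a density value at any point inside the estimation grid, not just at the grid points density() happened to use.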