Re: [R] roc and lattice

2007-01-10 Thread Deepayan Sarkar
On 1/9/07, rhk [EMAIL PROTECTED] wrote:
 Hello, I am afraid I do not fully understand all intricacies of
 programming in lattice plots. In the code below I try to plot an ROC
 curve, following R-news 4(1). When I condition on the variable 'group' I
 get the error message below, when I plot the curve for all data (i.e., y
 ~ pred.prob), I get the plot I want. Can someone point out why
 conditioning gives that message? Thanks, Ruud

  plot.a <- xyplot(y ~ pred.prob|group, data=x.df,
 +  xlim=c(0,1), xlab="1-specificiteit",
 +  ylab="sensitiviteit",
 +  panel=function(x,y,subscripts,...){
 +   DD <- table(-x,y)
 +   sens <- cumsum(DD[,2])/sum(DD[,2])
 +   mspec <- cumsum(DD[,1])/sum(DD[,1])
 +   panel.xyplot(mspec,sens,type="l",...)
 +   panel.abline(0,1)
 +  })
  print(plot.a)
 Error in panel(x = c(0.000265710002003069, 0.000345712857778025,
 0.000265710002003069,  :
 subscript out of bounds


Since you haven't bothered to give us a reproducible example, all we
can say is that somewhere in your code, a subscript is out of bounds.
Since the only place where you use subscripts is as DD[,2] etc, I
would start by checking if DD indeed has two columns and two rows as
you seem to expect.
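For illustration, a self-contained sketch along those lines (simulated data standing in for the poster's x.df): forcing the table to have both outcome columns avoids an out-of-bounds subscript when a panel's group contains only one level of y.

library(lattice)
set.seed(1)
x.df <- data.frame(y         = rbinom(200, 1, 0.4),
                   pred.prob = runif(200),
                   group     = gl(2, 100))
xyplot(y ~ pred.prob | group, data = x.df,
       xlim = c(0, 1), xlab = "1-specificiteit", ylab = "sensitiviteit",
       panel = function(x, y, ...) {
           DD    <- table(-x, factor(y, levels = 0:1))   # force both columns
           sens  <- cumsum(DD[, 2]) / sum(DD[, 2])
           mspec <- cumsum(DD[, 1]) / sum(DD[, 1])
           panel.xyplot(mspec, sens, type = "l", ...)
           panel.abline(0, 1)
       })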

Deepayan



[R] where is the NIR dataset?

2007-01-10 Thread Carmen Meier
I have just downloaded the pls package, but the NIR dataset is not 
available:

> require(pls)
[1] TRUE
> data(NIR)
Warning message:
data set 'NIR' not found in: data(NIR)

is there another package with the dataset for the examples?
 
With regards Carmen



Re: [R] where is the NIR dataset?

2007-01-10 Thread talepanda
NIR does not seem to be a dataset itself; it is a component of the dataset yarn.
Try:

library(pls)
data(yarn)
str(yarn)

On 1/10/07, Carmen Meier [EMAIL PROTECTED] wrote:
 I did just the download of the pls package, but the NIR dataset is not
 available

  require(pls)
 [1] TRUE
  data(NIR)
 Warning message:
 data set 'NIR' not found in: data(NIR)

 is there another package with the dataset for the examples?

 With regards Carmen





Re: [R] where is the NIR dataset?

2007-01-10 Thread talepanda
See changelog of pls 2.0-0

- The 'NIR' and 'sensory' data sets have been renamed to 'yarn' and 'oliveoil'.

You can see this in the package sources on CRAN:

http://cran.r-project.org/src/contrib/Descriptions/pls.html
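A quick check of the renaming, assuming pls >= 2.0-0:

library(pls)
data(yarn)
str(yarn$NIR)    # the former 'NIR' data, now stored as a matrix inside 'yarn'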

HTH

On 1/10/07, Carmen Meier [EMAIL PROTECTED] wrote:
 talepanda schrieb:
  NIR seems to be not dataset. It is a member of dataset yarn.
  Try:
 but it seems that NIR used to be a dataset;
 have a look at a former question:
 http://finzi.psych.upenn.edu/R/Rhelp02a/archive/47269.html
 > require(pls)
 > data(NIR)
 > class(NIR)
 [1] "data.frame"

 With regards Carmen






Re: [R] randomForest and missing data

2007-01-10 Thread Torsten Hothorn


On Tue, 9 Jan 2007, Bálint Czúcz wrote:


There is an improved version of the original random forest algorithm
available in the party package (you can find some additional
information on the details here:
http://www.stat.uni-muenchen.de/sfb386/papers/dsp/paper490.pdf ).

I do not know whether it yields a solution to your problem with
missing data, but maybe it's worth a check...



yes, `cforest()' is able to deal with missing values. More specifically, 
the implementation is based on conditional trees (`ctree()') which are 
able to set up surrogate splits.
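A hedged sketch of this on a built-in data set with NAs in the predictors (the exact control arguments may differ between party versions; see ?ctree_control and ?cforest_control):

library(party)
dat <- airquality                        # Ozone and Solar.R contain NAs
ct <- ctree(Temp ~ Ozone + Solar.R + Wind, data = dat,
            controls = ctree_control(maxsurrogate = 2))
cf <- cforest(Temp ~ Ozone + Solar.R + Wind, data = dat,
              controls = cforest_control(ntree = 50, mtry = 2))
mean((predict(cf, OOB = TRUE) - dat$Temp)^2)   # out-of-bag error of the forest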


Torsten


Best regards:

Bálint

On 1/4/07, Darin A. England [EMAIL PROTECTED] wrote:


Does anyone know a reason why, in principle, a call to randomForest
cannot accept a data frame with missing predictor values? If each
individual tree is built using CART, then it seems like this
should be possible. (I understand that one may impute missing values
using rfImpute or some other method, but I would like to avoid doing
that.)

If this functionality were available, then when the trees are being
constructed and when subsequent data are put through the forest, one
would also specify an argument for the use of surrogate rules, just
like in rpart.

I realize this question is very specific to randomForest, as opposed
to R in general, but any comments are appreciated. I suppose I am
looking for someone to say "It's not appropriate, and here's why
..." or "Good idea. Please implement and post your code."

Thanks,

Darin England, Senior Scientist
Ingenix







Re: [R] posthoc tests with ANCOVA

2007-01-10 Thread Torsten Hothorn


On Tue, 9 Jan 2007, Walter Durka wrote:


dear all,



Walter,

_please_ cc questions on contributed packages to the maintainer, since not 
everybody follows r-help closely...



I want to perform a posthoc test for my ANCOVA:
a1 <- aov(seeds ~ treatment*length)

With
summary(glht(a1, linfct = mcp(treatment = "Tukey")))
R tells me: covariate interactions found -- please choose appropriate
contrast



one needs to specify a certain value for `length' which, I assume, is a 
numeric covariate, right? The current interface doesn't support this (it is 
on my to-do list); however, you can set up the matrix of linear functions 
yourself (contact me privately if you have problems doing that).
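For the archive, a hedged, self-contained sketch of that approach with simulated data (a 3-level treatment, default treatment contrasts); the rows of K evaluate the pairwise treatment comparisons at a fixed value of the covariate, here its mean:

library(multcomp)
set.seed(1)
dd <- data.frame(treatment = gl(3, 20, labels = c("A", "B", "C")),
                 length    = runif(60, 5, 15))
dd$seeds <- rpois(60, lambda = 10 + as.numeric(dd$treatment) + 0.2 * dd$length)
a1 <- aov(seeds ~ treatment * length, data = dd)
L0 <- mean(dd$length)      # covariate value at which treatments are compared
## coefficient order: (Intercept), treatmentB, treatmentC, length,
##                    treatmentB:length, treatmentC:length
K <- rbind("B - A" = c(0,  1, 0, 0,  L0,  0),
           "C - A" = c(0,  0, 1, 0,   0, L0),
           "C - B" = c(0, -1, 1, 0, -L0, L0))
summary(glht(a1, linfct = K))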



How do I build these contrasts?

Ideally, I would like to have the posthoc test for the ANCOVA including
a block-effect
a2 <- aov(seeds ~ treatment*length + Error(site))

How do I make a posthoc test here?



it's on the to-do list as well :-(

Torsten


Thanks for any comments
Walter


--

*
Dr. Walter Durka
Department Biozönoseforschung
Department of community ecology

Helmholtz-Zentrum für Umweltforschung GmbH - UFZ
Helmholtz Centre for Environmental Research - UFZ
Theodor-Lieser-Str. 4 / 06120 Halle / Germany

[EMAIL PROTECTED] / http://www.ufz.de/index.php?en=798
phone +49 345 558 5314 / Fax +49 345 558 5329

+

Das UFZ hat einen neuen Namen:
Helmholtz-Zentrum für Umweltforschung GmbH - UFZ

The UFZ has a new name:
Helmholtz Centre for Environmental Research - UFZ

+




[R] select subsets in data frame

2007-01-10 Thread Mark Hempelmann
Dear WizaRds!

A trivial question indeed on selecting subsets in data frames. I am 
sorry. Unfortunately, I did not find any helpful information on the 
introduction, searched the help archive and read in introductory books. 
Please help:

I want to select column KB which is read via read.csv2 as a data.frame 
into d. I checked that it is indeed a data.frame object and included the 
correct header information in line 1. For example purposes, look at this 
small object:
*= (4)
d <- data.frame(A=1:3, Date=c("01.01.07","02.01.07","03.01.07"),
KB=c("Eenie", "Meenie", "Miney") )

d["KB"=="Eenie",] # gives
@
output-start
[1] A    Date  KB
<0 rows> (or 0-length row.names)
output-end
@
If I follow Venables/ Ripley in Modern Applied Statistics with S, it 
should look like this:

*= (5)
library(MASS)
attach(painters)
painters[Colour >= 17,]
@
gives the correct subset. But
d[KB=="Eenie",] # gives

Error in `[.data.frame`(d, KB == "Eenie", ) :
 object "KB" not found

I need every KB named Eenie. What did I do wrong? The alternative I 
found seems to be quite complicated:

*= (6)
d[which( d[,"KB"]=="Eenie" ), ]
@
output-start
  A     Date    KB
1 1 01.01.07 Eenie
output-end

Thank you so much for your help.

cheers
mark


I believe I found the missing link between animal and civilized man. 
It's us. -- Konrad Lorenz



Re: [R] select subsets in data frame

2007-01-10 Thread Chuck Cleland
Mark Hempelmann wrote:
 Dear WizaRds!
 
 A trivial question indeed on selecting subsets in data frames. I am 
 sorry. Unfortunately, I did not find any helpful information on the 
 introduction, searched the help archive and read in introductory books. 
 Please help:
 
 I want to select column KB which is read via read.csv2 as a data.frame 
 into d. I checked that it is indeed a data.frame object and included the 
 correct header information in line 1. For example purposes, look at this 
 small object:
 *= (4)
 d - data.frame(A=1:3, Date=c(01.01.07,02.01.07,03.01.07),
 KB=c(Eenie, Meenie, Miney) )
 
 d[KB==Eenie,] # gives
 @
 output-start
 [1] ADate KB
 0 rows (or 0-length row.names)
 output-end
 @

  Try this instead:

subset(d, KB == "Eenie")

  A     Date    KB
1 1 01.01.07 Eenie

?subset

 If I follow Venables/ Ripley in Modern Applied Statistics with S, it 
 should look like this:
 
 *= (5)
 library(MASS)
 attach(painters)
 painters[Colour=17,]
 @
 gives the correct subset. But
 d[KB==Eenie,] # gives
 
 Error in `[.data.frame`(d, KB == Eenie, ) :
  object KB not found

  Works for me if I attach the data frame first:

attach(d)

d[KB == "Eenie",]

  A     Date    KB
1 1 01.01.07 Eenie

 I need every KB named Eenie. What did I do wrong? The alternative I 
 found seems to be quite complicated:
 
 *= (6)
 d[which( d[,KB]==Eenie ), ]
 @
 output-start
A DateKB
 1 1 01.01.07 Eenie
 output-end
 
 Thank you so much for your help.
 
 cheers
 mark
 
 
 I believe I found the missing link between animal and civilized man. 
 It's us. -- Konrad Lorenz
 

-- 
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894



Re: [R] dimensions of a all objects

2007-01-10 Thread BXC (Bendix Carstensen)
Generally it is difficult to get an overview of what's there.
But the following function I acquired from (???) ages ago does a nice
job:

lls <-
function (pos = 1, pat = "")
{
    dimx <- function(dd) if (is.null(dim(dd)))
        length(dd)
    else dim(dd)
    lll <- ls(pos = pos, pat = pat)
    cat(formatC("mode", 1, 15), formatC("class", 1, 18),
        formatC("name", 1, max(nchar(lll)) + 1),
        "size\n---\n")
    if (length(lll) > 0) {
        for (i in 1:length(lll)) {
            cat(formatC(eval(parse(t = paste("mode(", lll[i],
                    ")"))), 1, 15),
                formatC(paste(eval(parse(t = paste("class(",
                    lll[i], ")"))), collapse = " "), 1, 18),
                formatC(lll[i], 1, max(nchar(lll)) + 1), " ",
                eval(parse(t = paste("dimx(", lll[i], ")"))), "\n")
        }
    }
}

Just say 

lls()

and you get a reasonable listing of objects.
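As a side note on the question that started the thread: sapply(objects(), dim) fails because it hands dim() the object *names* (character strings), not the objects themselves; get() fixes that. A minimal sketch:

sapply(ls(), function(nm) {
    obj <- get(nm)
    if (is.null(dim(obj))) length(obj) else dim(obj)
})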

Best,
Bendix
__

Bendix Carstensen
Senior Statistician
Steno Diabetes Center
Niels Steensens Vej 2-4
DK-2820 Gentofte
Denmark
+45 44 43 87 38 (direct)
+45 30 75 87 38 (mobile)
+45 44 43 73 13 (fax)
[EMAIL PROTECTED]   http://www.biostat.ku.dk/~bxc
__


 -Original Message-
 From: Farrel Buchinsky [mailto:[EMAIL PROTECTED] 
 Sent: Tuesday, January 09, 2007 3:30 PM
 To: r-help@stat.math.ethz.ch
 Subject: [R] dimensions of a all objects
 
 Why will the following command not work
 sapply(objects(),dim)
 What does it say about the objects list? What does it say 
 about the dim command?
 
 Likewise, the following also does not work
 all <- ls()
 for (f in all) print(dim(f))
 --
 Farrel Buchinsky
 
 
 




Re: [R] scripts with littler

2007-01-10 Thread John Lawrence Aspden
John Lawrence Aspden wrote:

 I've got a library (brainwaver), installed locally in ~/R/library, and
 this information is recorded in the ~/.Renviron file.

 In my script I load the library, but if I call it using
 #!/usr/bin/r --vanilla, this stops working.

(Various private e-mails exchanged. Again, thanks Dirk!)

Just in case anyone else is trying to do this, it turns out that if you can
persuade your end users to install the library to ~/R/library, then you can
say:

#!/usr/bin/r --vanilla
library(brainwaver, lib.loc='~/R/library')

although in my case, brainwaver depends on another library, which it now
can't find, so actually I have to load them in order:

#!/usr/bin/r --vanilla

library(waveslim, lib.loc='~/R/library')
library(brainwaver, lib.loc='~/R/library')





Alternatively, 

#!/usr/bin/r --vanilla

.libPaths('~/R/library')
library(brainwaver)

works, although be careful, I've noticed that it seems to behave a bit
strangely on my debian setup.

e.g.

#!/usr/bin/r --vanilla
cat(.Library, '*', .libPaths(), "\n")
.libPaths('~/R/library')
cat(.Library, '*', .libPaths(), "\n")

gives output
/usr/lib/R/library
* /usr/local/lib/R/site-library /usr/lib/R/site-library /usr/lib/R/library
/usr/lib/R/library * ~/R/library /usr/lib/R/library

that is, it seems to have removed /usr/local/lib/R/site-library
and /usr/lib/R/site-library as well as added ~/R/library

Cheers, John.

-- 
Contractor in Cambridge UK -- http://www.aspden.com



[R] problems with optim, for-loops and machine precision

2007-01-10 Thread Simon Ruegg
Dear R experts,

 

I have been encountering problems with the optim routine using for
loops. I am determining the optimal parameters of several nested models by
minimizing the negative Log-Likelihood (NLL) of a dataset. 

 

The aim is to find the model which best describes the data. To this end, I
am simulating artificial data sets based on the model with the least number
of parameters (6) and the parameters determined with the field data. For
each artificial set I estimate the parameters of the model with 6 parameters
and the next more complex model with 7 parameters (two of these parameters
are equal in the 6-parameter model) by minimizing the corresponding NLL with
optim. In theory the 7-parameter model should fit the data either equally
or better than the 6-parameter model. Therefore the difference of the
minimal NLLs should be 0 or larger.

For 500 data sets I use the following code:

 

require(odesolve)

res = matrix(nrow=500, ncol=18)

colnames(res) = c("a_23","beta_23","mu_23","d_23","I_23","M_23","NLL_23",
                  "a_21","beta_21","mu_21","e_21","d_21","I_21","M_21","NLL_21",
                  "NLL23_min_21","conv23","conv21")

for (s in 1:500)
{
  assign("data",
         read.table(paste("populations/TE23simset_", s, ".txt", sep="")),
         env=MyEnv)                       # reading a data set

  M23 = optim(rep(0.1,6), NLL23, method="L-BFGS-B", lower=0,
              upper=c(Inf,Inf,Inf,Inf,1,1), control=c(maxit=150))

  if (M23$convergence==0)
  {
    M21 = optim(rep(0.1,7), NLL21, method="L-BFGS-B", lower=0,
                upper=c(Inf,Inf,Inf,Inf,Inf,1,1), control=c(maxit=150))
  }

  res[s,1]  = M23$par[1]
  res[s,2]  = M23$par[2]
  res[s,3]  = M23$par[3]
  res[s,4]  = M23$par[4]
  res[s,5]  = M23$par[5]
  res[s,6]  = M23$par[6]
  res[s,7]  = M23$value
  res[s,8]  = M21$par[1]
  res[s,9]  = M21$par[2]
  res[s,10] = M21$par[3]
  res[s,11] = M21$par[4]
  res[s,12] = M21$par[5]
  res[s,13] = M21$par[6]
  res[s,14] = M21$par[7]
  res[s,15] = M21$value
  res[s,16] = M23$value - M21$value
  res[s,17] = M23$convergence
  res[s,18] = M21$convergence

  write.table(res, "compare23_21TE061221.txt")

  rm(M23, M21)
}

 

For some strange reason the results do not correspond to what I expect:
about 10% of the solutions have a difference of NLL smaller than 0. I have
verified the optimisation of these results manually and found that a minimal
NLL was ignored and a higher NLL was returned at convergence. To check
what was happening I inserted a printing line in the NLL function to print
all parameters and the NLL as the procedure goes on. To my surprise optim
then stopped at the minimal NLL which had been ignored before. I have then
reduced the machine precision to .Machine$double.digits=8 thinking, that the
printing was slowing down the procedure and by reducing the machine
precision to speed up the calculations. For an individual calculation this
solved my problem. However if I implemented the same procedure in the loop
above, the same impossible results occurred again.

 

Can anyone tell me where I should be looking for the problem, or what it is
and how I could solve it?

 

Thanks a lot for your help

 

 

Sincerely

 

Simon

 

 

 



Simon Ruegg

Dr.med.vet.,  PhD candidate

Institute for Parasitology

Winterthurstr. 266a

8057 Zurich

Switzerland

 

phone: +41 44 635 85 93

fax: +41 44 635 89 07

e-mail: [EMAIL PROTECTED]

 





[R] Fw: Memory problem on a linux cluster using a large data set [Broadcast]

2007-01-10 Thread Iris Kolder
Hi 

I listened to all your advice and ran my data on a computer with a 64-bit 
processor, but I still get the same error saying it cannot allocate a vector of 
that size (1240 kb). I don't want to cut my data into smaller pieces because we 
are looking at interactions. So are there any other options for me to try out, or 
should I wait for the development of more advanced computers?

Thanks,

Iris


- Forwarded Message 
From: Iris Kolder [EMAIL PROTECTED]
To: r-help@stat.math.ethz.ch
Sent: Thursday, December 21, 2006 2:07:08 PM
Subject: Re: [R] Memory problem on a linux cluster using a large data set 
[Broadcast]


Thank you all for your help! 
 
So with all your suggestions we will try to run it on a computer with a 64-bit 
processor. But I've been told that the new R versions all work on a 32-bit 
processor. I read in other posts that only the old R versions were capable of 
handling larger data sets and were running under 64-bit processors. I also read that 
they are adapting the new R version for 64-bit processors again, so does anyone 
know if there is a version available that we could use?
 
Iris Kolder

- Original Message 
From: Liaw, Andy [EMAIL PROTECTED]
To: Martin Morgan [EMAIL PROTECTED]; Iris Kolder [EMAIL PROTECTED]
Cc: r-help@stat.math.ethz.ch; N.C. Onland-moret [EMAIL PROTECTED]
Sent: Monday, December 18, 2006 7:48:23 PM
Subject: RE: [R] Memory problem on a linux cluster using a large data set 
[Broadcast]


In addition to my off-list reply to Iris (pointing her to an old post of
mine that detailed the memory requirement of RF in R), she might
consider the following:

- Use larger nodesize
- Use sampsize to control the size of bootstrap samples

Both of these have the effect of reducing sizes of trees grown.  For a
data set that large, it may not matter to grow smaller trees.

Still, with data of that size, I'd say 64-bit is the better solution.

Cheers,
Andy

From: Martin Morgan
 
 Iris --
 
 I hope the following helps; I think you have too much data 
 for a 32-bit machine.
 
 Martin
 
 Iris Kolder [EMAIL PROTECTED] writes:
 
  Hello,
   
  I have a large data set 320.000 rows and 1000 columns. All the data 
  has the values 0,1,2.
 
 It seems like a single copy of this data set will be at least 
 a couple of gigabytes; I think you'll have access to only 4 
 GB on a 32-bit machine (see section 8 of the R Installation 
 and Administration guide), and R will probably end up, even 
 in the best of situations, making at least a couple of copies 
 of your data. Probably you'll need a 64-bit machine, or 
 figure out algorithms that work on chunks of data.
 
  on a linux cluster with R version R 2.1.0.  which operates on a 32
 
 This is quite old, and in general it seems like R has become 
 more sensitive to big-data issues and tracking down 
 unnecessary memory copying.
 
  cannot allocate vector size 1240 kb. I've searched through
 
 use traceback() or options(error=recover) to figure out where 
 this is actually occurring.
 
 SNP <- read.table("file.txt", header=FALSE, sep="")   # read in data file
 
 This makes a data.frame, and data frames have several aspects 
 (e.g., automatic creation of row names on sub-setting) that 
 can be problematic in terms of memory use. Probably better to 
 use a matrix, for which:
 
  'read.table' is not the right tool for reading large matrices,
  especially those with many columns: it is designed to read _data
  frames_ which may have columns of very different classes. Use
  'scan' instead.
 
 (from the help page for read.table). I'm not sure of the 
 details of the algorithms you'll invoke, but it might be a 
 false economy to try to get scan to read in 'small' versions 
 (e.g., integer, rather than
 numeric) of the data -- the algorithms might insist on 
 numeric data, and then make a copy during coercion from your 
 small version to numeric.
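A minimal sketch of the scan() route (file name and dimensions taken from the thread; whether it all fits in memory is of course the very point at issue):

nvars <- 1000
SNP <- matrix(scan("file.txt", what = double()),
              ncol = nvars, byrow = TRUE)   # one row of the matrix per line of the file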
 
 SNP$total.NAs = rowSums(is.na(SNP))   # calculate the number of NAs per row and add a column with the total NAs
 
 This adds a column to the data.frame or matrix, probably 
 causing at least one copy of the entire data. Create a 
 separate vector instead, even though this unties the 
 coordination between columns that a data frame provides.
 
  SNP  = t(as.matrix(SNP))  # 
 transpose rows and columns
 
 This will also probably trigger a copy; 
 
 snp.na <- SNP
 
 R might be clever enough to figure out that this simple 
 assignment does not trigger a copy. But it probably means 
 that any subsequent modification of snp.na or SNP *will* 
 trigger a copy, so avoid the assignment if possible.
 
 snp.roughfix <- na.roughfix(snp.na)

 fSNP <- factor(snp.roughfix[, 1])     # assigns factor to case control status

 snp.narf <- randomForest(snp.roughfix[,-1], fSNP,
 na.action=na.roughfix, ntree=500, mtry=10, importance=TRUE,
 keep.forest=FALSE, do.trace=100)
 
 Now you're entirely in the hands of the randomForest. If 
 

[R] prime in expression in plot

2007-01-10 Thread Thomas Steiner
I want to write something like (LaTeX style)
b_{norm}=\frac{F\prime(0)}{R\prime(0)}

how do I add the prime (first derivative) to a R-plot? The help of
plotmath just talks about partialdiff. Can you complete this
command?

text( 30,0.05,labels=expression(b[plain(norm)]==frac(F(0),R(0))) )

Thanks,
Thomas



Re: [R] select subsets in data frame

2007-01-10 Thread jim holtman
try:

d[d[,"KB"] == "Eenie",]

or

d[d$KB=="Eenie",]

What you were comparing is just two character strings ("KB"=="Eenie"), which
of course is FALSE and won't select anything.  You have to explicitly
compare values from the KB column of your data.


On 1/10/07, Mark Hempelmann [EMAIL PROTECTED] wrote:

 Dear WizaRds!

 A trivial question indeed on selecting subsets in data frames. I am
 sorry. Unfortunately, I did not find any helpful information on the
 introduction, searched the help archive and read in introductory books.
 Please help:

 I want to select column KB which is read via read.csv2 as a data.frame
 into d. I checked that it is indeed a data.frame object and included the
 correct header information in line 1. For example purposes, look at this
 small object:
 *= (4)
 d - data.frame(A=1:3, Date=c(01.01.07,02.01.07,03.01.07),
 KB=c(Eenie, Meenie, Miney) )

 d[KB==Eenie,] # gives
 @
 output-start
 [1] ADate KB
 0 rows (or 0-length row.names)
 output-end
 @
 If I follow Venables/ Ripley in Modern Applied Statistics with S, it
 should look like this:

 *= (5)
 library(MASS)
 attach(painters)
 painters[Colour=17,]
 @
 gives the correct subset. But
 d[KB==Eenie,] # gives

 Error in `[.data.frame`(d, KB == Eenie, ) :
 object KB not found

 I need every KB named Eenie. What did I do wrong? The alternative I
 found seems to be quite complicated:

 *= (6)
 d[which( d[,KB]==Eenie ), ]
 @
 output-start
   A DateKB
 1 1 01.01.07 Eenie
 output-end

 Thank you so much for your help.

 cheers
 mark


 I believe I found the missing link between animal and civilized man.
 It's us. -- Konrad Lorenz





-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?




Re: [R] prime in expression in plot

2007-01-10 Thread Thomas Steiner
 how do I add the prime (first derivative) to a R-plot?

sorry for the noise, I found it myself:
http://finzi.psych.upenn.edu/R/Rhelp02a/archive/20984.html
I use now (works fine)
plot(1,1,xlab=expression(frac("F'(0)","R'(0)")),xaxt="n")
Thomas



[R] correlation value and map

2007-01-10 Thread Jenny Barnes
Dear R-help community,

I have 2 different arrays of precipitation data each of the same dimensions of 
[longitude, latitude, time] dim=[30,32,43], called array1 and array2. I need to 
correlate them. This is the code I used to get one overall correlation value 
for 
the whole of the area of interest:

 result <- cor(array1, array2, use="complete.obs")
 result

This gives me a single value, but I'm not convinced it is actually a correlation 
value for the whole area over the whole time period of 43 years... Can anybody 
tell me if I am indeed wrong in my coding and/or in my limited knowledge of the 
statistics of correlation?

Also, I wanted to produce a correlation map over the 43 years. Could you also 
advise me if this is correct, I am more confident that this is than the above 
code:

 result <- array(NA, c(30,32))
 
 for(i in 1:30){
 for(j in 1:32){
   array1.ts <- array1[i,j,]
   array2.ts <- array2[i,j,]
   result[i,j] <- cor(array1.ts, array2.ts, use="complete.obs")
 }
 }

I appreciate your time very much. If I don't iron out this problem now the 
ground-work for my entire PhD will not be stable at all,

Many thanks for reading my problem, happy 2007 :-)

Jenny Barnes






Jennifer Barnes
PhD student - long range drought prediction
Climate Extremes
Department of Space and Climate Physics
University College London
Holmbury St Mary, Dorking
Surrey
RH5 6NT
01483 204149
Web: http://climate.mssl.ucl.ac.uk



[R] Fractional brownian motion

2007-01-10 Thread ED
Dear All;

I have used fbmSim to simulate a fbm sequence, however, when I tried to
estimate the Hurst effect, none of the nine procedures gave me an answer
close enough to the real value, which is 0.5 (n=1000). So, would you please
advise:

1. Which is the best method to estimate H among the 9 methods: R/S,
Higuchi, or Whittle?

2. how to choose the levels (default=50), minnpts, cutoff values or if there
is any other issues to consider? Would you please send your code if you can
get the right estimate for the attached dataset?

3. I also simulated multiple sequences, but some of the estimated results
have H > 1; how would you explain this, and how can it be corrected?

4. if I have a sample size of 30, what are your suggestions for estimating
the hurst effect.

I really appreciate your help, and thanks in advance.

Sincerely;

Qing

[The simulated fBm series follows here as list(c(0, 0.0602883614054335, ...)), roughly 200 values pasted inline; the listing is garbled and truncated in the archive.]

[R] Column names in Zoo object

2007-01-10 Thread Shubha Vishwanath Karanth
Hi,

 

I am downloading Bloomberg data from R. This data will be stored in a
zoo object by default. The command is

 

dat <- blpGetData(con, c("NOK1V FH Equity", "AUA AV Equity"), "PX_OPEN",
                  start=as.chron(as.Date("12/1/2006", "%m/%d/%Y")),
                  end=as.chron(as.Date("12/28/2006", "%m/%d/%Y")))

Here I am downloading the data for two tickers ("NOK1V FH Equity" and
"AUA AV Equity") simultaneously. Then the column names of my zoo object
will be NOK1V.FH.Equity and AUA.AV.Equity respectively, which are the
ticker names themselves.

 

But if I download for only one ticker by the code,

dat <- blpGetData(con, c("NOK1V FH Equity"), "PX_OPEN",
                  start=as.chron(as.Date("12/1/2006", "%m/%d/%Y")),
                  end=as.chron(as.Date("12/28/2006", "%m/%d/%Y")))

Now, the column name of my zoo object is PX_OPEN, the field name
instead of the ticker name NOK1V FH Equity.

 

I need my second code to work as the first one. i.e., if only one ticker
is used instead of two or more, then the column name taken by my zoo
object should be the ticker name and not the field name. Could somebody
help me on this?

 

Thanks in advance.

 

 

 

 





Re: [R] scripts with littler

2007-01-10 Thread Brian Ripley
[By now way off the subject line.  Something like 'how to set the 
libraries inside an R session'.]

On Wed, 10 Jan 2007, John Lawrence Aspden wrote:

 John Lawrence Aspden wrote:

 I've got a library (brainwaver), installed locally in ~/R/library, and
 this information is recorded in the ~/.Renviron file.

 In my script I load the library, but if I call it using
 #!/usr/bin/r --vanilla, this stops working.

 (Various private e-mails exchanged. Again, thanks Dirk!)

 Just in case anyone else is trying to do this, it turns out that if you can
 persuade your end users to install the library to ~/R/library, then you can
 say:

 #!/usr/bin/r --vanilla
 library(brainwaver, lib.loc='~/R/library')

 although in my case, brainwaver depends on another library, which it now
 can't find, so actually I have to load them in order:

 #!/usr/bin/r --vanilla

 library(waveslim, lib.loc='~/R/library')
 library(brainwaver, lib.loc='~/R/library')

 Alternatively,

 #!/usr/bin/r --vanilla

 .libPaths('~/R/library')
 library(brainwaver)

 works, although be careful, I've noticed that it seems to behave a bit
 strangely on my debian setup.

'It' (R) is behaving as you asked it to.  Most likely you intended to ask 
for

.libPaths(c("~/R/library", .libPaths()))

 e.g.

 #!/usr/bin/r --vanilla
 cat(.Library,'*', .libPaths(),\n)
 .libPaths('~/R/library')
 cat(.Library,'*', .libPaths(),\n)

 gives output
 /usr/lib/R/library
 * /usr/local/lib/R/site-library /usr/lib/R/site-library /usr/lib/R/library
 /usr/lib/R/library * ~/R/library /usr/lib/R/library

 that is, it seems to have removed /usr/local/lib/R/site-library
 and /usr/lib/R/site-library as well as added ~/R/library

Exactly as documented.  The argument is named 'new' and not 'add', BTW.
Please 'be careful' in what you say about the work of others.


 Cheers, John.



-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595



Re: [R] Column names in Zoo object/BlpGetata problem

2007-01-10 Thread Shubha Vishwanath Karanth
I think the problem is in the blpGetData function rather than in the zoo
object, because the problem persists even when the data are put into a data
frame instead of a zoo object.
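A hedged workaround sketch (this is not part of RBloomberg itself; it simply overwrites the column name after a single-ticker download, assuming the connection con from the original post):

tickers <- c("NOK1V FH Equity")
dat <- blpGetData(con, tickers, "PX_OPEN",
                  start = as.chron(as.Date("12/1/2006", "%m/%d/%Y")),
                  end   = as.chron(as.Date("12/28/2006", "%m/%d/%Y")))
colnames(dat) <- make.names(tickers)   # "NOK1V.FH.Equity", as in the multi-ticker case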

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Shubha Vishwanath
Karanth
Sent: Wednesday, January 10, 2007 7:29 PM
To: r-help@stat.math.ethz.ch
Subject: [R] Column names in Zoo object

Hi,

 

I am downloading Bloomberg data from R. This data will be stored in a
zoo object by default. The command is

 

dat-blpGetData(con,c(NOK1V FH Equity,AUA AV
Equity),PX_OPEN,start=as.chron(as.Date(12/1/2006,
%m/%d/%Y)),end=as.chron(as.Date(12/28/2006, %m/%d/%Y)))

Here I am downloading the data for two tickers, (NOK1V FH Equity and
AUA AV Equity) simultaneously. Then the column names of my zoo object
will be NOK1V.FH.Equity and AUA.AV.Equity respectively, which are the
ticker names itself.

 

But if I download for only one ticker by the code,

dat-blpGetData(con,c(NOK1V FH Equity
),PX_OPEN,start=as.chron(as.Date(12/1/2006,
%m/%d/%Y)),end=as.chron(as.Date(12/28/2006, %m/%d/%Y)))

Now, the column name of my zoo object is PX_OPEN, the field name
instead of the ticker name NOK1V FH Equity.

 

I need my second code to work as the first one. i.e., if only one ticker
is used instead of two or more, then the column name taken by my zoo
object should be the ticker name and not the field name. Could somebody
help me on this?

 

Thanks in advance.

 

 

 

 






Re: [R] Fractional brownian motion

2007-01-10 Thread RABINOVITCH Peter
You have applied the Hurst parameter estimating function to the fBm, not
the fGn as described in the help page for Long Range Dependence
Modelling in fSeries.

The function aggvarFit computes the Hurst exponent from the variance of
an aggregated FGN or FARIMA time series process

Trying perFit(diff(x)) (where x is your fBm series) gives an estimate of
the Hurst Exponent as 0.53527922 
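A hedged sketch of that suggestion (function names as used in this thread, from fSeries, later moved to fArma; argument defaults may differ by version):

library(fSeries)                  # the simulator and Hurst estimators
set.seed(1)
x <- fbmSim(n = 1000, H = 0.5)    # fractional Brownian motion
g <- diff(as.numeric(x))          # its increments: fractional Gaussian noise
perFit(g)                         # periodogram estimate of H
aggvarFit(g)                      # aggregated-variance estimate of H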

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of ED
Sent: Wednesday, January 10, 2007 8:53 AM
To: r-help@stat.math.ethz.ch
Subject: [R] Fractional brownian motion

Dear All;

I have used fbmSim to simulate a fbm sequence, however, when I tried to
estimate the Hurst effect, none of the nine procedures gave me an answer
close enough to the real value, which is 0.5 (n=1000). So, would you
please advice,

1. which is the best method to estimate the H among the 9 mehods, R/S,
higuchi or Whittle?

2. how to choose the levels (default=50), minnpts, cutoff values or if
there is any other issues to consider? Would you please send your code
if you can get the right estimate for the attached dataset?

3. I also simulated multiple sequences, but some of the estimated
results have H1, how would you explain this, and how to correct it.

4. if I have a sample size of 30, what are your suggestions for
estimating the hurst effect.

I really appreciate your help, and thanks in advance.

Sincerely;

Qing

[The same simulated series is quoted again here; garbled and truncated in the archive.]

Re: [R] prime in expression in plot

2007-01-10 Thread Prof Brian Ripley
Two possibilities are ' (which is the usual way to write \prime in TeX) 
and minute (which is the nearest character in the Adobe symbol font that 
plotmath uses).  TeX has its own symbol fonts, and most R devices are 
based rather on the Adobe one.
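A short sketch of both suggestions, reusing the b[norm] expression from the question (quoting the literal "(0)" keeps plotmath from treating it as a function call):

plot(1:10, type = "n")
text(5, 7, expression(b[plain(norm)] == frac(F * minute * "(0)", R * minute * "(0)")))
text(5, 4, expression(b[plain(norm)] == frac(F * "'(0)", R * "'(0)")))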

On Wed, 10 Jan 2007, Thomas Steiner wrote:

 I want to write something like (LaTeX style)
 b_{norm}=\frac{F\prime(0)}{R\prime(0)}

 how do I add the prime (first derivative) to a R-plot? The help of
 plotmath just talks about partialdiff. Can you complete this
 command?

 text( 30,0.05,labels=expression(b[plain(norm)]==frac(F(0),R(0))) )

 Thanks,
 Thomas

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595



Re: [R] Fw: Memory problem on a linux cluster using a large data set [Broadcast]

2007-01-10 Thread Prof Brian Ripley
On Wed, 10 Jan 2007, Iris Kolder wrote:

 Hi

 I listened to all your advise and ran my data on a computer with a 64 
 bits procesor but i still get the same error saying it cannot allocate 
 a vector of that size 1240 kb . I don't want to cut my data in smaller 
 pieces because we are looking at interaction. So are there any other 
 options for me to try out or should i wait for the development of more 
 advanced computers!

Did you use a 64-bit build of R on that machine?  If the message is the 
same, I strongly suspect not.  64-bit builds are not the default on most 
OSes.


 Thanks,

 Iris


 - Forwarded Message 
 From: Iris Kolder [EMAIL PROTECTED]
 To: r-help@stat.math.ethz.ch
 Sent: Thursday, December 21, 2006 2:07:08 PM
 Subject: Re: [R] Memory problem on a linux cluster using a large data set 
 [Broadcast]


 Thank you all for your help!

 So with all your suggestions we will try to run it on a computer with a 
 64 bits proccesor. But i've been told that the new R versions all work 
 on a 32bits processor. I read in other posts that only the old R 
 versions were capable of larger data sets and were running under 64 bit 
 proccesors. I also read that they are adapting the new R version for 64 
 bits proccesors again so does anyone now if there is a version available 
 that we could use?

 Iris Kolder

 - Original Message 
 From: Liaw, Andy [EMAIL PROTECTED]
 To: Martin Morgan [EMAIL PROTECTED]; Iris Kolder [EMAIL PROTECTED]
 Cc: r-help@stat.math.ethz.ch; N.C. Onland-moret [EMAIL PROTECTED]
 Sent: Monday, December 18, 2006 7:48:23 PM
 Subject: RE: [R] Memory problem on a linux cluster using a large data set 
 [Broadcast]


 In addition to my off-list reply to Iris (pointing her to an old post of
 mine that detailed the memory requirement of RF in R), she might
 consider the following:

 - Use larger nodesize
 - Use sampsize to control the size of bootstrap samples

 Both of these have the effect of reducing sizes of trees grown.  For a
 data set that large, it may not matter to grow smaller trees.

 Still, with data of that size, I'd say 64-bit is the better solution.

 Cheers,
 Andy

 From: Martin Morgan

 Iris --

 I hope the following helps; I think you have too much data
 for a 32-bit machine.

 Martin

 Iris Kolder [EMAIL PROTECTED] writes:

 Hello,

 I have a large data set 320.000 rows and 1000 columns. All the data
 has the values 0,1,2.

 It seems like a single copy of this data set will be at least
 a couple of gigabytes; I think you'll have access to only 4
 GB on a 32-bit machine (see section 8 of the R Installation
 and Administration guide), and R will probably end up, even
 in the best of situations, making at least a couple of copies
 of your data. Probably you'll need a 64-bit machine, or
 figure out algorithms that work on chunks of data.

 on a linux cluster with R version R 2.1.0.  which operates on a 32

 This is quite old, and in general it seems like R has become
 more sensitive to big-data issues and tracking down
 unnecessary memory copying.

 cannot allocate vector size 1240 kb. I've searched through

 use traceback() or options(error=recover) to figure out where
 this is actually occurring.

 SNP - read.table(file.txt, header=FALSE, sep=)#
 read in data file

 This makes a data.frame, and data frames have several aspects
 (e.g., automatic creation of row names on sub-setting) that
 can be problematic in terms of memory use. Probably better to
 use a matrix, for which:

  'read.table' is not the right tool for reading large matrices,
  especially those with many columns: it is designed to read _data
  frames_ which may have columns of very different classes. Use
  'scan' instead.

 (from the help page for read.table). I'm not sure of the
 details of the algorithms you'll invoke, but it might be a
 false economy to try to get scan to read in 'small' versions
 (e.g., integer, rather than
 numeric) of the data -- the algorithms might insist on
 numeric data, and then make a copy during coercion from your
 small version to numeric.

 SNP$total.NAs = rowSums(is.na(SN # calculate the
 number of NA per row and adds a colum with total Na's

 This adds a column to the data.frame or matrix, probably
 causing at least one copy of the entire data. Create a
 separate vector instead, even though this unties the
 coordination between columns that a data frame provides.

 SNP  = t(as.matrix(SNP))  #
 transpose rows and columns

 This will also probably trigger a copy;

 snp.na-SNP

 R might be clever enough to figure out that this simple
 assignment does not trigger a copy. But it probably means
 that any subsequent modification of snp.na or SNP *will*
 trigger a copy, so avoid the assignment if possible.

 snp.roughfix-na.roughfix(snp.na)

 fSNP-factor(snp.roughfix[, 1])# Asigns
 factor to case control status

 snp.narf- randomForest(snp.roughfix[,-1], fSNP,
 na.action=na.roughfix, ntree=500, mtry=10, 

Re: [R] contingency table analysis; generalized linear model

2007-01-10 Thread Trevor Hastie
 Date: Tue, 9 Jan 2007 11:13:41 + (GMT)
 From: Mark Difford [EMAIL PROTECTED]
 Subject: Re: [R] contingency table analysis; generalized linear model

 Dear List,

 I would appreciate help on the following matter:

 I am aware that higher dimensional contingency tables can be  
 analysed using either log-linear models or as a poisson regression  
 using a generalized linear model:

 log-linear:
 loglm(~Age+Site, data=xtabs(~Age+Site, data=SSites.Rev,  
 drop.unused.levels=T))

 GLM:
 glm.table - as.data.frame(xtabs(~Age+Site, data=SSites.Rev,  
 drop.unused.levels=T))
 glm(Freq ~ Age + Site, data=glm.table, family='poisson')

 where Site is a factor and Age is cast as a factor by xtabs() and  
 treated as such.

 **Question**:
 Is it acceptable to step away from contingency table analysis by  
 recasting Age as a numerical variable, and redoing the analysis as:

 glm(Freq ~ as.numeric(Age) + Site, data=glm.table, family='poisson')

 My reasons for wanting to do this are to be able to include non- 
 linear terms in the model, using say restricted or natural cubic  
 splines.

 Thank you in advance for your help.
 Regards,
 Mark Difford.


 ---
 Mark Difford
 Ph.D. candidate, Botany Department,
 Nelson Mandela Metropolitan University,
 Port Elizabeth, SA.

Yes it is, and it is often the preferred way to view the analysis.
In this case it looks like Freq is measuring something like species  
abundance,
and it is natural to model this as a Poisson count via a log-link glm.
As such you are free to include any reasonable functions of your  
predictors
in modeling the mean.

Log-linear models are typically presented as ways of  analyzing  
dependence between
categorical variables, when represented as multi-way tables. The  
appropriate multinomial
models, conditioning on certain marginals, happen to be equivalent to  
Poisson glms with
appropriate terms included.

I would suggest in your data preparation that you
glm.table[,"Age"] <- as.numeric(glm.table[,"Age"])
at the start, so that now you can think of your data in the right way.
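A minimal sketch of that suggestion, with simulated counts standing in for the poster's table (ns() from the splines package supplies the natural cubic spline basis; Age must take enough distinct values for the chosen df):

library(splines)
set.seed(1)
glm.table <- expand.grid(Age = 1:10, Site = factor(paste("S", 1:4, sep = "")))
glm.table$Freq <- rpois(nrow(glm.table), lambda = 5)
glm.table$Age  <- as.numeric(glm.table$Age)
fit <- glm(Freq ~ ns(Age, df = 3) + Site, data = glm.table, family = poisson)
summary(fit)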

Trevor Hastie


---
   Trevor Hastie   [EMAIL PROTECTED]
   Professor  Chair, Department of Statistics, Stanford University
   Phone: (650) 725-2231 (Statistics)  Fax: (650) 725-8977
   (650) 498-5233 (Biostatistics)   Fax: (650) 725-6951
   URL: http://www-stat.stanford.edu/~hastie
address: room 104, Department of Statistics, Sequoia Hall
390 Serra Mall, Stanford University, CA 94305-4065
  






[R] Implementation of Wei-Lachin Test

2007-01-10 Thread Dalton, Jarrod
Greetings -

 

Does anybody know if there is an R implementation of the Wei-Lachin
tests for incomplete multivariate observations (JASA 79: 653-661)?

 

The authors' example of data where this test can be applied is in a
study comparing treatment to placebo where the outcome is a series of
time-to-event observations - i.e., time to development of a nonfatal
symptom in each of several major body systems (dermal, musculoskeletal,
neurologic, etc.).  The null hypothesis here is that the joint
distributions for each of the treatment and control groups is equal.

 

Regards,

 

Jarrod Dalton

 









[R] ARIMA and xreg

2007-01-10 Thread sj
Hello,

I am fairly new to time series analysis, and I am wondering if anyone could
point me to some good references on the mechanics of what is going on
when you fit a model using arima() that includes a set of external regressors
(i.e., using the xreg= argument).
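For reference (not from the thread), a minimal example of the mechanics: arima() fits a regression on the xreg columns with ARMA errors, here an AR(1) plus a hypothetical linear trend regressor on a built-in series.

y <- LakeHuron
trend <- seq_along(y)
fit <- arima(y, order = c(1, 0, 0), xreg = trend)
fit   # the 'trend' coefficient is the regression slope; ar1 models the errors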

Thank you,

Spencer




[R] logistic regression packages

2007-01-10 Thread Feng Qiu
Hi All:
   I'm testing the set of data classification algorithms compared in this paper 
(www.stat.wisc.edu/~loh/treeprogs/quest1.7/mach1317.pdf).
I couldn't find the following algorithms in R packages:
   1. LOG: polytomous logistic regression (there was one in the MASS 
library: multinom. But after I updated the MASS library, multinom was lost.)
   2. POL: the POLYCLASS algorithm. There is an S-PLUS package (polyclass 
library) for this algorithm, so there should be a corresponding package in 
R, but I haven't found it so far.
   Any advice is appreciated.
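For the archive, a hedged pointer (not from the original thread): multinom() lives in the nnet package, which is part of the VR bundle installed alongside MASS, and the POLYCLASS methodology is available in the polspline package on CRAN.

library(nnet)
fit <- multinom(Species ~ ., data = iris)   # polytomous logistic regression
## install.packages("polspline"); library(polspline)
## see ?polyclass there for the POLYCLASS interface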

Best,

Feng



[R] go or goto command

2007-01-10 Thread Thomas L Jones
Some computer languages, including C, have a go or go to command 
which can be used to shift control to a different part of a function.

Question: Does the R language have a corresponding command? (Yes, I am 
aware that it can be abused; but, in the hands of a good programmer, 
it is simple to use.)

Tom Jones

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] correlation value and map

2007-01-10 Thread Zoltan Kmetty
Hi Jenny!

So if I understand your data file correctly, you have 960 cases per year,
and you have 43 years. Yes?

I'm not sure you should use correlation in this situation because of the
autocorrelation of the data. There is strong autocorrelation in spatial data
like yours, and there is also very strong autocorrelation in time series
data. I think you have to decompose your time series, remove the trend (and
maybe some kind of seasonality), and then compute the correlation on the
residuals. You also have to filter out the autocorrelation in the spatial
data in some way.

And because of the above problems, don't calculate the correlation on the
entire databases!
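
To make that concrete, a small sketch (invented series, not the actual
precipitation data) of removing a common trend before correlating:

set.seed(1)
yrs <- 1961:2003                     # 43 years
a <- 0.05 * yrs + rnorm(43)          # two series that share a trend
b <- 0.03 * yrs + rnorm(43)
cor(a, b)                            # inflated by the common trend
cor(resid(lm(a ~ yrs)), resid(lm(b ~ yrs)))   # correlation of the detrended residuals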

bye,
Zoltan


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] map data.frame() data after having linked them to a read.shape() object

2007-01-10 Thread Tord Snäll
Dear all,
I try to first link data in a data.frame() to a (polygon) read.shape() 
object and then to plot a polygon map showing the data in the 
data.frame. The first linking is called "link" in ArcView and "relate"
in ArcMap. I use the code shown below, though without success.

Help with this would be greatly appreciated.

Thanks!

Tord

require(maptools)
# Read shape file (one row per county)
a=read.shape("myshp.shp", dbf.data=TRUE, verbose=TRUE)
str(a)
  ..- attr(*, maxbb)= num [1:4] -100   4900
  ..- attr(*, class)= chr ShapeList
$ att.data:'data.frame':   490 obs. of  60 variables:
  ..$ STATE_FIPS: Factor w/ 12 levels 04,06,08,..: 11 11 11 11 4 5
5 5 5 5 ...
[snip]
  ..$ STACOUNT4 : Factor w/ 489 levels ArizonaApache,..: 437 460 451
453 147 207 195 198 231 206 ...
  ..- attr(*, data_types)= chr [1:60] C C C C ...
- attr(*, class)= chr Map


# Read case data (one row per case)
cases = read.table("cases.txt", h=T)
str(cases)
'data.frame':   431 obs. of  8 variables:
$ Year: int  1950 1950 1950 1951 1956 1957 1959 1959 1959 1959 ...
$ Case: int  3 1 2 1 1 1 2 4 1 3 ...
$ stacount: Factor w/ 106 levels ArizonaApache,..: 1 66 76 66 26 29
15 25 30 60 ...

# table the cases data PER Year, PER County (County = stacount)
temp = t(table(cases[,c("Year","stacount")]))
stacount = dimnames(temp)$stacount
temp = cbind(stacount, as.data.frame(temp[,1:ncol(temp)],row.names=F))
str(temp)
'data.frame':   106 obs. of  50 variables:
$ stacount: Factor w/ 106 levels ArizonaApache,..: 1 2 3 4 5 6 7 8 9
10 ...
$ 1950: int  1 0 0 0 0 0 0 0 0 0 ...
[snip]
$ 2005: int  0 0 0 0 0 0 0 0 0 0 ...

# Pick out a temporary attribute data.frame
datfr = a$att.data

# Merge the temporaty data frame with tabled cases
for(i in 2:ncol(temp)){
 datfr = merge(datfr, temp[,c(1,i)], by.x="STACOUNT4",
               by.y="stacount", all.x=T, all.y=F)
}

#Replace NAs with 0:
for(i in 61:109){
 datfr[,i] = ifelse(is.na(datfr[,i])==T,0,datfr[,i])
}

str(a$att.data)
'data.frame':   490 obs. of  60 variables:
$ NAME  : Factor w/ 416 levels Ada,Adams,..: 120 352 265 277 33
210 122 135 372 209 ...
[snip]
$ STACOUNT4 : Factor w/ 489 levels ArizonaApache,..: 437 460 451 453
147 207 195 198 231 206 ...
- attr(*, data_types)= chr  C C C C ...
# Note that the above data is of attribute type

str(datfr)
'data.frame':   490 obs. of  109 variables:
$ STACOUNT4 : Factor w/ 489 levels ArizonaApache,..: 1 2 3 4 5 6 7 8
9 10 ...
[snip]
$ 1951  : num  0 0 0 0 0 0 0 0 0 0 ...
[snip]
$ 2005  : num  0 0 0 0 0 0 0 0 0 0 ...
# Note that at the end of this, data type is not described - it is a 
simple data frame

# bind data together:
#Alternative 1:
a$att.data = cbind(a$att.data, datfr[,61:109])
# Other alternatives:
test = matrix(ncol=49)
a$att.data[,61:109] = test
a$att.data[,61:109] = datfr[,61:109]

# plot:
plot(a, auxvar=a$att.data[,61], xlim=c(-125,-99), ylim=c(28,52), xlab="",
     ylab="", frame.plot=F, axes=F)
There were 50 or more warnings (use warnings() to see the first 50)
warnings()
49: "axes" is not a graphical parameter in: polygon(xy$x, xy$y,
col, border, lty, ...)
50: "frame.plot" is not a graphical parameter in: polygon(xy$x,
xy$y, col, border, lty, ...)

# The a$att.data type has changed to becoming a typical data.frame - 
attr is not mentioned:
str(a$att.data)
[snip]
$ 2003  : num  0 0 0 0 0 0 0 0 0 0 ...
$ 2004  : num  0 0 0 0 0 0 0 0 0 0 ...
$ 2005  : num  0 0 0 0 0 0 0 0 0 0 ...



-- 

Tord Snäll
Department of Conservation Biology
Swedish University of Agricultural Sciences (SLU)
P.O. 7002, SE-750 07 Uppsala, Sweden
Office/Mobile/Fax
+46-18-672612/+46-730-891356/+46-18-673537
E-mail: [EMAIL PROTECTED]
www.nvb.slu.se/staff_tordsnall

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] correlation value and map

2007-01-10 Thread Jenny Barnes
Hi Zoltan,

Right, I have 30x32=960 data points per year (it is actually the mean February
precipitation total, in case you were wondering) at each grid point over the
world, so I have 960 data points for each of the 43 years. Therefore, can I do
anything with a trend and residuals? I don't think I can if it's just mean
February precipitation, one data point per grid square per year... I appreciate
your help very much, although I do still need to perform a spatial correlation,
if anyone else can help?

Many thanks,

Jenny





Jennifer Barnes
PhD student - long range drought prediction
Climate Extremes
Department of Space and Climate Physics
University College London
Holmbury St Mary, Dorking
Surrey
RH5 6NT
01483 204149
07916 139187
Web: http://climate.mssl.ucl.ac.uk

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] problems with optim, for-loops and machine precision

2007-01-10 Thread Setzer . Woodrow
Without more detail - a reproducible example - it is hard to give you
concrete advice.

I wonder if the functions NLL23 and NLL21 depend on numerical solutions
of a system of ODEs, since you invoke the odesolve package?  If so, try
switching to the Nelder-Mead optimizer, enforcing the parameter
constraints using transformation.  Probably you are using the finite
difference derivatives calculated internally to optim for the gradient
information used in the L-BFGS-B optimizer.  These can be unstable when
based on numerical solutions of odes, causing the optimizer to fail, or
sometimes to converge to a non-optimal point.
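
A minimal sketch of that suggestion, with an invented objective rather than
the poster's NLL23: enforce positivity by optimising on the log scale and use
Nelder-Mead instead of L-BFGS-B (parameters bounded in (0,1) can be handled
the same way with qlogis()/plogis()):

## invented two-parameter objective, just to show the pattern
nll <- function(p) {
  a <- p[1]; b <- p[2]
  sum((a * exp(-b * (1:20)) - exp(-0.3 * (1:20)))^2)
}

nll.log <- function(lp) nll(exp(lp))   # a, b > 0 is now automatic
fit <- optim(log(c(0.1, 0.1)), nll.log, method = "Nelder-Mead",
             control = list(maxit = 500))
exp(fit$par)                           # back-transform to the original scale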

Some other points:
- you cannot change machine precision by changing values in .Machine.
To change the number of digits printed, use options(digits=8).
- use 'library()' instead of 'require()', unless you need to use the
return value from 'require()'

R. Woodrow Setzer, Ph. D.
National Center for Computational Toxicology
US Environmental Protection Agency
Mail Drop B205-01/US EPA/RTP, NC 27711
Ph: (919) 541-0128Fax: (919) 541-1194



Simon Ruegg <[EMAIL PROTECTED]> (unizh.ch) wrote on 01/10/2007 07:18 AM,
Subject: [R] problems with optim, for-loops and machine precision:

Dear R experts,



I have been encountering problems with the optim routine using for loops. I
am determining the optimal parameters of several nested models by minimizing
the negative Log-Likelihood (NLL) of a dataset.

The aim is to find the model which best describes the data. To this end, I
am simulating artificial data sets based on the model with the least number
of parameters (6) and the parameters determined with the field data. For
each artificial set I estimate the parameters of the model with 6 parameters
and the next more complex model with 7 parameters (two of these parameters
are equal in the 6-parameter model) by minimizing the corresponding NLL with
optim. In theory the 7-parameter model should fit the data either equally or
better than the 6-parameter model. Therefore the difference of the minimal
NLLs should be 0 or larger.

For 500 data sets I use the following code:



require(odesolve)

res = matrix(nrow=500, ncol=18)
colnames(res) = c("a_23","beta_23","mu_23","d_23","I_23","M_23","NLL_23",
                  "a_21","beta_21","mu_21","e_21","d_21","I_21","M_21","NLL_21",
                  "NLL23_min_21","conv23","conv21")

for (s in 1:500)
{
  # reading a data set
  assign("data",
         read.table(paste("populations/TE23simset_", s, ".txt", sep="")),
         env=MyEnv)

  M23 = optim(rep(0.1,6), NLL23, method="L-BFGS-B", lower=0,
              upper=c(Inf,Inf,Inf,Inf,1,1), control=c(maxit=150))

  if (M23$convergence==0)
  {
    M21 = optim(rep(0.1,7), NLL21, method="L-BFGS-B", lower=0,
                upper=c(Inf,Inf,Inf,Inf,Inf,1,1), control=c(maxit=150))
  }

  res[s,1]=M23$par[1]
  res[s,2]=M23$par[2]
  res[s,3]=M23$par[3]
  res[s,4]=M23$par[4]
  res[s,5]=M23$par[5]
  res[s,6]=M23$par[6]
  res[s,7]=M23$value
  res[s,8]=M21$par[1]
  res[s,9]=M21$par[2]
  res[s,10]=M21$par[3]
  res[s,11]=M21$par[4]
  res[s,12]=M21$par[5]
  res[s,13]=M21$par[6]
  res[s,14]=M21$par[7]
  res[s,15]=M21$value
  res[s,16]=M23$value-M21$value
  res[s,17]=M23$convergence
  res[s,18]=M21$convergence

  write.table(res, "compare23_21TE061221.txt")
  rm(M23, M21)
}



For some strange reason the results do not correspond to what I expect:
about 10% of the solutions have a difference of NLL smaller than 0. I have
verified the optimisation of these results manually and found that a minimal
NLL was ignored and a higher NLL was returned at convergence. To check what
was happening I inserted a printing line in the NLL function to print all
parameters and the NLL as the procedure goes on. To my surprise optim then
stopped at the minimal NLL which had been ignored before. I have then reduced
the machine precision to .Machine$double.digits=8, thinking that the printing
was slowing down the procedure and that reducing the machine precision would
speed up the calculations. For an individual calculation this solved my
problem. However, if I implemented the same procedure in the loop above, the
same impossible results occurred again.

Can anyone tell me where I should be looking for the problem, or what it is
and how I could solve it?

Thanks a lot for your help

Simon

Re: [R] a question of substitute

2007-01-10 Thread Gabor Grothendieck
Looks like oneway.test has been changed for R 2.5.0.

Paste the code in this file:

   https://svn.r-project.org/R/trunk/src/library/stats/R/oneway.test.R

into your session.  Then fun.2 from your post will work without
the workarounds I posted:

   fun.2(values ~ group)


On 1/9/07, Adrian Dusa [EMAIL PROTECTED] wrote:
 On Tuesday 09 January 2007 15:14, Gabor Grothendieck wrote:
  oneway.test is using substitute on its arguments so its literally
  getting formula rather than the value of formula.

 Ah-haa... I understand now. Thanks for the tips, they both work as expected.
 Best,
 Adrian

  Try these:
 
  fun.3 <- function(formula) {
    mc <- match.call()
    mc[[1]] <- as.name("oneway.test")
    eval.parent(mc)
  }
  fun.3(values ~ group)
 
  fun.4 <- function(formula) {
    do.call("oneway.test", list(formula))
  }
  fun.4(values ~ group)
 
  On 1/9/07, Adrian Dusa [EMAIL PROTECTED] wrote:
   Hi all,
  
   I want to write a wrapper for an analysis of variance and I face a
   curious problem. Here are two different wrappers:
  
   fun.1 <- function(formula) {
     summary(aov(formula))
   }
  
   fun.2 <- function(formula) {
     oneway.test(formula)
   }
  
   values <- c(15, 8, 17, 7, 26, 12, 8, 11, 16, 9, 16,
               24, 20, 19, 9, 17, 11, 8, 15, 6, 14)
   group <- rep(1:3, each=7)
  
   # While the first wrapper works just fine:
   fun.1(values ~ group)
  
   # the second throws an error:
   fun.2(values ~ group)
   Error in substitute(formula)[[2]] : object is not subsettable
  
   ###
  
   I also tried binding the two vectors in a data.frame, with no avail.
   I did find a hack, creating two new vectors inside the function and
   creating a fresh formula, so I presume this has something to do with
   environments.
  
   Could anybody give me a hint on this?
   Thank you,
   Adrian
  
   --
   Adrian Dusa
   Romanian Social Data Archive
   1, Schitu Magureanu Bd
   050025 Bucharest sector 5
   Romania
   Tel./Fax: +40 21 3126618 \
+40 21 3120210 / int.101
  
   __
   R-help@stat.math.ethz.ch mailing list
   https://stat.ethz.ch/mailman/listinfo/r-help
   PLEASE do read the posting guide
   http://www.R-project.org/posting-guide.html and provide commented,
   minimal, self-contained, reproducible code.

 --
 Adrian Dusa
 Arhiva Romana de Date Sociale
 Bd. Schitu Magureanu nr.1
 050025 Bucuresti sectorul 5
 Romania
 Tel./Fax: +40 21 3126618 \
  +40 21 3120210 / int.101


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] go or goto command

2007-01-10 Thread Duncan Murdoch
On 1/10/2007 12:13 AM, Thomas L Jones wrote:
 Some computer languages, including C, have a go or go to command 
 which can be used to shift control to a different part of a function.
 
 Question: Does the R language have a corresponding command? (Yes, I am 
 aware that it can be abused; but, in the hands of a good programmer, 
 it is simple to use.)

No, it doesn't.

Duncan Murdoch

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] logistic regression packages

2007-01-10 Thread David Barron
1. multinom is in the nnet package

2. There is a polyclass function in package polspline
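
For instance, a quick sketch of the first one on a built-in data set:

library(nnet)
fit <- multinom(Species ~ Sepal.Length + Sepal.Width, data = iris)
summary(fit)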

On 10/01/07, Feng Qiu [EMAIL PROTECTED] wrote:
 Hi All:
I'm testing a set of data classification algorithms in this paper
 (www.stat.wisc.edu/~loh/treeprogs/quest1.7/mach1317.pdf )
 I couldn't find such algorithms in R packages:
1. LOG: polytomous logistic regression (there was one in MASS
 library: multinom. But after I update MASS library, multinom was lost.)
2. POL: POLYCLASS algorithm. There is a S-Plus package(polyclass
 library) for this algorithm, so there should be a corresponding package in
 R, but I haven't found it so far.
Any advice is appreciated.

 Best,

 Feng

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



--
=
David Barron
Said Business School
University of Oxford
Park End Street
Oxford OX1 1HP


-- 
=
David Barron
Said Business School
University of Oxford
Park End Street
Oxford OX1 1HP

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] SAS and R code hazard ratios

2007-01-10 Thread Colleen . Ross
Greetings,

I am new to R and have been comparing CPH survival analysis hazard ratios 
between R and SAS PHREG.  The binary covariates' HRs are the same; however,
the continuous variables, for example age, have quite different HRs,
although in the same direction.  SAS PHREG produces HRs which are the
change in risk for every one-increment change in the independent variable.
How do I interpret the HRs produced by R?  Thanks much, C

Colleen Ross, MS
Clinical Research Unit
Kaiser Permanente Colorado
(303) 614-1244



__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] a question of substitute

2007-01-10 Thread Adrian Dusa
On Wednesday 10 January 2007 19:03, Gabor Grothendieck wrote:
 Looks like oneway.test has been changed for R 2.5.0.
 Paste the code in this file:
https://svn.r-project.org/R/trunk/src/library/stats/R/oneway.test.R
 into your session.  Then fun.2 from your post will work without
 the workarounds I posted:
fun.2(values ~ group)

Brilliant :)
Super fast change, this is why I love R.
Cheers,
Adrian

-- 
Adrian Dusa
Romanian Social Data Archive
1, Schitu Magureanu Bd
050025 Bucharest sector 5
Romania
Tel./Fax: +40 21 3126618 \
  +40 21 3120210 / int.101

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] scripts with littler

2007-01-10 Thread John Lawrence Aspden
Brian Ripley wrote:

 Exactly as documented.  The argument is named 'new' and not 'add', BTW.
 Please 'be careful' in what you say about the work of others.

Agreed, no criticism intended. I really like R. Sorry.

Cheers, John.

-- 
Contractor in Cambridge UK -- http://www.aspden.com

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] 2 problems with latex.table (quantreg package) - reproducible

2007-01-10 Thread Kati Schweitzer
Dear all,

When using latex.table from the quantreg package, I don't seem to be able to
set table.env=FALSE: when I don't specify caption (as I think I should, if I
understand the R help correctly), I get an error message, and when I do
specify it, of course I get one as well.
The funny thing is that a table is indeed produced in the first case, so I
get a nice tabular; but as I'm using the command within a for-loop, the loop
stops due to the error and only one latex table is produced.

Example R-Code:

library(quantreg)

v1 <- c("val1","val1","val2")
v2 <- c("val1","val2","val2")
tab <- table(v1,v2)

latex.table(tab, table.env=FALSE)
#error - German R error message (saying that "caption" is missing and
#has no default :-) ):
#Fehler in cat(caption, "\n", file = fi, append = TRUE) :
#   Argument "caption" fehlt (ohne Standardwert)

latex.table(tab, table.env=FALSE, caption="nothing")
#error - German R error message:
#Fehler in latex.table(tab, table.env = FALSE, caption = "nothing") :
#   you must have table.env=TRUE if caption is given


The second problem is, that - when using latex.table to produce a
tabular within a table environment - I would like to specify cgroup
with only one value - one multicolumn being a heading for both columns
in the table.
But I'm not able to produce latex-compilable code:

latex.table(tab, cgroup="v2", caption="my table")

gives me the following latex code:
\begin{table}[hptb]
\begin{center}
\begin{tabular}{|l||c|c|} \hline
\multicolumn{1}{|l||}{\bf
tab}&\multicolumn{}{c||}{}&\multicolumn{2}{c|}{\bf v2}\\ \cline{2-3}
\multicolumn{1}{|l||}{}&\multicolumn{1}{c|}{val1}&\multicolumn{1}{c|}{val2}\\
\hline
val1&1&1\\
val2&0&1\\
\hline
\end{tabular}
\vspace{3mm}
\caption{my table\label{tab}}
\end{center}
\end{table}

and within this code the problem is the second multicolumn
(\multicolumn{}{c||}{}), as it has no number specifying how many
columns the multicolumn should cover. Latex (at least my version)
complains.
When deleting this part of the code, the table is compiled and looks
exactly how I want it to look. I'm doing this with a system call and
a shell script right now, but this seems pretty ugly to me...

When I specify 2 columns, this problem doesn't occur:
latex.table(tab, cgroup=c("blah","v2"), caption="my table")

I'm running R Version 2.3.0 (2006-04-24) on a linux machine Fedora
Core 5 (i386).

Can anyone help me find my mistakes?

Thanks a lot
... and sorry for my bad English and potential newbie mistakes!!
Kati

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] using DBI

2007-01-10 Thread Bond, Stephen
can anybody provide a reference to a document on using DBI and what is
needed to make it work?
 
I try: drv = dbDriver("Oracle")
 
and it complains about not finding a function.
The DBI.pdf does not say that anything else needs to be installed, but looking
at the archive I discovered there is more to be installed.
This is on Solaris 5.10 with a full Oracle install, so all Oracle libs
are available and sqlplus works.
I am trying to connect remotely to another Oracle instance. 
 
Thank you very much
Stephen


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] 2 problems with latex.table (quantreg package) - reproducible

2007-01-10 Thread roger koenker
The usual R-help etiquette recommends:

1.  questions about packages go to the maintainer, not to R-help.

2.  examples should be reproducible:  ie self contained.

if you look carefully at the function latex.summary.rqs  you will see
that there is a failure to pass the argument ... on to  
latex.table.  This
_may_ be the source of your problem if in fact your v1 and v2 were
summary.rqs objects, but I doubt that they are.

You might try caption = "".  More generally there are much improved
latex tools elsewhere in R; if you aren't making tables that are  
specific
to quantreg, you might want to use them.


url:   www.econ.uiuc.edu/~roger      Roger Koenker
email: [EMAIL PROTECTED]             Department of Economics
vox:   217-333-4558                  University of Illinois
fax:   217-244-6678                  Champaign, IL 61820



__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Meeting announcement: An Introduction to Data Analysis Using R

2007-01-10 Thread Paul Mathews
*An Introduction to Data Analysis Using R*
by Paul Mathews
for The Quality Managers Network, Lakeland Community College, Kirtland, Ohio
26 January 2007, 7:30-9:00 AM, Room T136.

R is free software used for graphical and statistical data analysis that 
can be downloaded from the Internet. Because R code is written and 
shared by many people around the world, it has a tremendous range of 
capabilities. Consequently, most graphical and statistical methods that 
have been implemented in any commercially-available statistical software 
package have been implemented in R.

Paul Mathews will present an introduction to data analysis using R 
including:
  * How to obtain and install R and R packages.
  * How to access limited R functions from the R Commander GUI package.
  * How to enter your data into R.
  * How to graph your data.
  * How to perform basic statistical analyses.
  * How to analyze designed experiments.
  * How to save your work.
  * How to learn more about R.

-- 
Paul Mathews
Mathews Malnar and Bailey, Inc.
217 Third Street, Fairport Harbor, OH 44077
Phone: 440-350-0911
Fax: 440-350-7210
E-mail:
Paul: [EMAIL PROTECTED], [EMAIL PROTECTED]
Rebecca: [EMAIL PROTECTED], [EMAIL PROTECTED]
Web: www.mmbstatistical.com

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] using DBI

2007-01-10 Thread chao gai
For MySQL I use RMySQL to connect, which in turn uses DBI.
There is a package ROracle which, from the manual, works the same way. Hence I
would start by looking at ROracle for the connection and proceed from there.
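
A rough sketch of the usual DBI workflow (the connection arguments below are
placeholders; check the ROracle documentation for the exact form they take):

library(DBI)
library(ROracle)              # the driver package; installed separately from DBI

drv <- dbDriver("Oracle")     # this is the call that fails if ROracle is missing
con <- dbConnect(drv, "user", "password", "remote_tns_alias")   # placeholders
dbGetQuery(con, "select sysdate from dual")
dbDisconnect(con)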

Kees

On Wednesday 10 January 2007 19:39, Bond, Stephen wrote:
 can anybody provide a reference to a document on using DBI and what is
 needed to make it work?

 I try: drv = dbDriver("Oracle")

 and it complains about not finding a function.
 The DBI.pdf does not require anything else installed, but looking at the
 archive I discovered there is more stuff to be installed.
 This is on Solaris 5.10 with a full Oracle install, so all Oracle libs
 are available and sqlplus works.
 I am trying to connect remotely to another Oracle instance.

 Thank you very much
 Stephen


 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html and provide commented, minimal,
 self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] map data.frame() data after having linked them to a read.shape() object

2007-01-10 Thread Roger Bivand
On Wed, 10 Jan 2007, Tord Snäll wrote:

 Dear all,
 I try to first link data in a data.frame() to a (polygon) read.shape() 
 object and then to plot a polygon map showing the data in the 
 data.frame. The first linking is called link in ArcView and relate 
 in ArcMap. I use the code shown below, though without success.
 
 Help with this would be greatly appreciated.

Please consider continuing this thread on the R-sig-geo list. Searching 
the archives might also have given you some leads. For now, and apart from 
not using sp classes (see note in R News in 2005), you have 490 polygons 
in the shapefile - probably one duplicate and 489 unique entities in 
STACOUNT4, but only 106 unique stacount of 431 observations in the data 
frame. The plot method for Map objects is deprecated. The classes in the 
sp package use S4, not S3, specifically to help users avoid putting things 
inside objects that break methods.

In maptools, see ?readShapePoly, and ?spCbind to see how to read a
shapefile into an sp object (check the 490/489 issue), and how the
Polygons IDs can be used as a key with the data frame row names to make
this easier to do. Please also consider using FIPS numbers as IDs for
counties; the five digit ssccc ID is fairly standard, and avoids the
problem of repetitive county names across US states.
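
A rough sketch of that approach (illustrative names only; 'cases.by.year'
stands for the per-county table of counts built from the case data):

library(maptools)   # readShapePoly(), spCbind()
library(sp)         # spplot()

counties <- readShapePoly("myshp.shp")   # a SpatialPolygonsDataFrame
ids <- sapply(slot(counties, "polygons"), function(p) slot(p, "ID"))

## 'cases.by.year' is assumed to have one row per county, columns such as
## 'cases1950', ..., ordered to match the polygons; spCbind() requires the
## row names of the two objects to agree
row.names(cases.by.year) <- ids
counties <- spCbind(counties, cases.by.year)

spplot(counties, "cases1950")            # choropleth of one year's counts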

 
-- 
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: [EMAIL PROTECTED]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Installation problem with package mixtools

2007-01-10 Thread David
I am trying to install mixtools on Debian Etch and get the following error

dell-2 /usr/lib # R CMD INSTALL mixtools_0.1.0.tar.gz
* Installing *source* package 'mixtools' ...
** libs
gcc -I/usr/share/R/include -I/usr/share/R/include -fpic  -g -O2 -std=gnu99 
-c new_svalues.c -o new_svalues.o
gfortran   -fpic  -g -O2 -c sphericaldepth.f -o sphericaldepth.o
make: gfortran: Command not found
make: *** [sphericaldepth.o] Error 127
ERROR: compilation failed for package 'mixtools'
** Removing '/usr/local/lib/R/site-library/mixtools'

I have upgraded my installation of libgfortran and have added /usr/lib
to PATH but the error persists.

The same file has installed perfectly on a similar machine. I am not
sure how to proceed from here.

thanks
Dave

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] logistic regression packages

2007-01-10 Thread Feng Qiu
Hi David:
 Thanks for your information. Two further questions:
  1. I found out that multinom is not doing polytomous logistic
regression; do you know which function does this?
  2. polyclass in polspline does polychotomous regression,
while I'm looking for polytomous regression. Do you think these two are
similar in terms of prediction?

Best,

Feng


- Original Message - 
From: David Barron [EMAIL PROTECTED]
To: r-help r-help@stat.math.ethz.ch
Sent: Wednesday, January 10, 2007 12:14 PM
Subject: Re: [R] logistic regression packages


 1. multinom is in the nnet package

 2. There is a polyclass function in package polspline

 On 10/01/07, Feng Qiu [EMAIL PROTECTED] wrote:
 Hi All:
I'm testing a set of data classification algorithms in this 
 paper
 (www.stat.wisc.edu/~loh/treeprogs/quest1.7/mach1317.pdf )
 I couldn't find such algorithms in R packages:
1. LOG: polytomous logistic regression (there was one in MASS
 library: multinom. But after I update MASS library, multinom was lost.)
2. POL: POLYCLASS algorithm. There is a S-Plus 
 package(polyclass
 library) for this algorithm, so there should be a corresponding package 
 in
 R, but I haven't found it so far.
Any advice is appreciated.

 Best,

 Feng

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



 --
 =
 David Barron
 Said Business School
 University of Oxford
 Park End Street
 Oxford OX1 1HP


 -- 
 =
 David Barron
 Said Business School
 University of Oxford
 Park End Street
 Oxford OX1 1HP

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] go or goto command

2007-01-10 Thread Peter Dalgaard
Thomas L Jones wrote:
 Some computer languages, including C, have a go or go to command 
 which can be used to shift control to a different part of a function.

 Question: Does the R language have a corresponding command? (Yes, I am 
 aware that it can be abused; but, in the hands of a good programmer, 
 it is simple to use.)
   
Nope. Only break, next, and return.
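
For example, the usual substitutes for a jump are the loop constructs (a
small illustration, not from the thread):

for (i in 1:10) {
  if (i %% 2 == 0) next    # jump to the next iteration
  if (i > 7) break         # jump out of the loop entirely
  cat("processing", i, "\n")
}

## 'repeat' plus 'break' covers most remaining goto-style patterns
i <- 0
repeat {
  i <- i + 1
  if (i >= 3) break
}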

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] SAS and R code hazard ratios

2007-01-10 Thread Peter Dalgaard
[EMAIL PROTECTED] wrote:
 Greetings,

 I am new to R and have been comparing CPH survival analysis hazard ratios 
 between R and SAS PhReg.  The binary covariates' HRs are the same, however 
 the continuous variables, for example age, have quite different HRs 
 although in the same direction.  SAS PhReg produces HRs which are the 
 change in risk for every one increment change in the independent variable. 
  How do I interpret the HRs produced by R?Thanks much, C

   
I'm not aware of peculiarities. You're not giving us much to go on 
though. In fact, not even the function used to fit the model with.
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
   
Exactly...

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] SAS and R code hazard ratios

2007-01-10 Thread Thomas Lumley
On Wed, 10 Jan 2007, [EMAIL PROTECTED] wrote:
 I am new to R and have been comparing CPH survival analysis hazard ratios
 between R and SAS PhReg.  The binary covariates' HRs are the same, however
 the continuous variables, for example age, have quite different HRs
 although in the same direction.  SAS PhReg produces HRs which are the
 change in risk for every one increment change in the independent variable.
 How do I interpret the HRs produced by R?Thanks much, C


What function did you use to fit the model in R?  If you used coxph() in
the survival package, then you should get the same answers as SAS. If you
used cph() in the Design package, then the output says what the increment
is that corresponds to the quoted hazard ratio, and the default is the
interquartile range.
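
For example, with coxph() the per-unit interpretation carries over directly
(a sketch using the ovarian data shipped with the survival package):

library(survival)
fit <- coxph(Surv(futime, fustat) ~ age + rx, data = ovarian)
summary(fit)     # the exp(coef) column is the HR per 1-year increase in age
exp(coef(fit))   # the same hazard ratios extracted directly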

-thomas

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] TCL/TK and R documentation?

2007-01-10 Thread Michael Prager

I am hoping something has changed since I last asked about this.

Is there any good source of documentation, including examples, of using 
tcl/tk as a gui for R applications? 

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] axis labels at subset of tick marks

2007-01-10 Thread Darren Weber
For example, this works:

x = seq(-100, 1000, 25)
y = x * x
plot(x,y, xaxt="n")
axis(1,x,FALSE,tcl=-0.3)
axis(1,x[x %% 100 ==0])

It creates two axis objects and the values of the x-axis are the
labels.  The following scenario is more difficult, because it uses
'image' to plot a grid of values:

a = matrix(rnorm(100),ncol=10)
image(1:ncol(a), 1:nrow(a), a, xaxt="n", yaxt="n")
# adding arbitrary categorical labels to y-axis
ylabels=c('A','B','C','D','E','F','G','H','I','J')
axis(2, at=1:10, labels=ylabels, las=HORIZONTAL-1)
# adding arbitrary categorical labels to x-axis,
# for every nth category; first add tick marks,
# then the labels
axis(1, at=1:10, FALSE)
xlabels=c('A','','C','','E','','G','','I','')  # this is length=10
axis(1, at=1:10, labels=xlabels)

This works, but when using axis, the 'at=1:10' and the length(xlabels)
must be the same length.  Is it ever possible to specify axis when
these values are not the same length?  That is, the following does not
work:

x = seq(-100, 1000, 25)
y = x * x
plot(x,y, xaxt="n")
axis(1,x,labels=x[x %% 100 ==0])

Error in axis(side, at, labels, tick, line, pos, outer, font, lty, lwd,  :
'at' and 'label' lengths differ, 45 != 12

Would it be possible to have axis automatically find the right
location within the x-axis for these labels?  Perhaps something like:

xt = (x %% 100 == 0)
xlabels = x
xlabels[!xt] = ''

This would leave the intermittent label values.  I guess a for loop is
needed to generate xt for irregular intervals in the labels (eg, c(10,
50, 75, 150, 400)).  If the axis command could do this, it would not
be necessary to call it twice to get a subset of labels for all the
tick marks.  That is, it would create tick marks for the 'at'
specification and labels at the 'labels' specification (with no
restraint on these being equal lengths).
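
For what it's worth, the blank-label idea sketched above can be written as a
single call for the first example (at the price of one tick length for all
marks):

x <- seq(-100, 1000, 25)
y <- x * x
plot(x, y, xaxt = "n")
## ticks at every value of x, labels only at multiples of 100
axis(1, at = x, labels = ifelse(x %% 100 == 0, x, ""))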

Regards, Darren

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] problems with optim, for-loops and machine precision

2007-01-10 Thread Ken Beath
Two possibilities for why your 7-parameter model fits worse than the
6-parameter one are that you are finding a local maximum, which might suggest
using a different parameterisation, or that the functions are accessing some
global data and so aren't behaving as expected. This could be why they work
properly when run on their own.

I would also check what happens if convergence fails for the 7-parameter
model; in your code this isn't handled well. Also, if the constraint on
parameters of >=0 is actually >0, it may be better to work with the
parameters on the log scale, avoiding the constraints.

I have found with simulations it is useful to save the fitted objects,
so they can be inspected later, or for the parameters to be extracted
after the models are fitted. This method allows restarting in case of
computer crashes.

Ken


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] TCL/TK and R documentation?

2007-01-10 Thread Peter Dalgaard
Michael Prager wrote:
 I am hoping something has changed since I last asked about this.

 Is there any good source of documentation, including examples, of 
 using tcl/tk as a gui for R applications?

Well, this probably hasn't changed, but did you see it in the first place?

http://bioinf.wehi.edu.au/~wettenhall/RTclTkExamples/
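
For a first taste of what those examples cover, a minimal tcltk snippet
(assuming the tcltk package loads on your build of R):

library(tcltk)
tt <- tktoplevel()                      # a new top-level window
tkwm.title(tt, "Hello from R")
tkpack(tklabel(tt, text = "Press the button"))
tkpack(tkbutton(tt, text = "Quit", command = function() tkdestroy(tt)))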

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] A question about R environment

2007-01-10 Thread François Pinard
[Philippe Grosjean]

Please, don't reinvent the wheel: putting functions in a dedicated 
environment is one of the things done by R packages (together with a 
good documentation of the function, and making them easily installable 
on any R implementation).

[...] this is probably the time for you to read the Writing 
R extensions manual, and to start implementing your own R package!

Hi, Philippe, and gang!

I read this manual long ago and used it to create packages already.  You 
really got the impression I did not read it? :-)

You know, there are small wheels, and huge wheels.  I do not see why 
I would use complex devices for tiny problems, merely because those 
complex devices exist.  R packages undoubtedly have their virtues, of 
course.  But just like many statistical tests, they do not always apply.

Why go at length organising package directories populated with many 
files, resorting to initialisation scripts, using package validators,
creating documentation files and processing them, go through the cycle 
of creating a package and installing it, all that merely for a few small 
quickies that fit very well in the ubiquitous .Rprofile file?  Why worry 
about installation on any R implementation, for little things only meant 
for myself, and too simple to warrant publication anyway?

Keep happy, all.

-- 
François Pinard   http://pinard.progiciels-bpi.ca

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] axis date format in lattice

2007-01-10 Thread karl . sommer
Hello list,

plotting the following example 1 in lattice only labels the x-axis with one
tick mark at the beginning of February.
In contrast the standard plot() function labels the x-axis with equally
spaced tick marks.

#example 1 ---
library(lattice)
date <- as.Date(c("1/10/2006", "01/11/2006", "01/12/2006",
  "01/01/2007", "01/02/2007"), format="%d/%m/%Y")
xx <- c(1,3,5,6,7)
xx1 <- as.data.frame(cbind(date, xx))
xyplot(xx~date, scales=list(x=list(format = "%b")))
plot(xx~date)

Interestingly the behaviour in lattice only seems to arise when there is a
change in calendar years.  This is demonstrated by the second example with
dates from within 1 calendar year only where lattice works as expected.

I was wondering if the behaviour in example 1 was due to a bug or am I
missing something and, more importantly, is there a way around it?

#---example 2 ---
date <- as.Date(c("1/08/2006", "01/09/2006", "01/10/2006",
  "01/11/2006", "01/12/2006"), format="%d/%m/%Y")
xx <- c(1,3,5,6,7)
xx1 <- as.data.frame(cbind(date, xx))
xyplot(xx~date, scales=list(x=list(format = "%b")))
plot(xx~date)

Cheers
Karl

_
Dr Karl J Sommer,
Department of Primary Industries,
Catchment  Agriculture Services,
PO Box 905
Mildura, VIC, Australia 3502

Tel: +61 (0)3 5051 4390
Fax +61 (0)3 5051 4534

Email: [EMAIL PROTECTED]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] a question of substitute

2007-01-10 Thread Prof Brian Ripley
The 'Right Thing' is for oneway.test() to allow a variable for the first 
argument, and I have altered it in R-patched and R-devel to do so. So if 
your students can make use of R-patched that would be the best solution. 
If not, perhaps you could make a copy of oneway.test from R-patched 
available to them.  Normally I would worry about namespace issues, but it 
seems unlikely they would matter here: if they did assignInNamespace is 
likely to work to insert the fix.

Grothendieck's suggestions are steps towards a morass: they may work in 
simple cases but can make more complicated ones worse (such as looking for 
'data' in the wrong place).  These model fitting functions have rather 
precise requirements for where they look for their components:

'data'
the environment of 'formula'
the environment of the caller

and that includes where they look for 'data'.  It is easy to use 
substitute or such to make a literal formula out of 'formula', but doing 
so changes its environment.  So one needs to either

(a) fix up an environment within which to evaluate the modified call that 
emulates the scoping rules or

(b) create a new 'data' that has references to all the variables needed, 
and just call the function with the new 'formula' and new 'data'.

At first sight model.frame() looks the way to do (b), but it is not, since 
if there are function calls in the formula (e.g. log()) the model frame 
includes the derived variables and not the original ones.  There are 
workarounds (e.g. in glmmPQL), like using all.vars, creating a formula 
from that, setting its environment to that of the original function and 
then calling model.frame.
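
As a rough illustration of the shape of (b) (a sketch, not the actual code in
glmmPQL, and it assumes a oneway.test() that accepts a formula stored in a
variable, i.e. R-patched as above):

fun.5 <- function(formula, ...) {
  ## rebuild a data frame holding the original (untransformed) variables
  vars <- reformulate(all.vars(formula))      # e.g. ~ values + group
  environment(vars) <- environment(formula)   # look them up where the formula was created
  dat <- model.frame(vars)
  oneway.test(formula, data = dat, ...)
}

values <- c(15, 8, 17, 7, 26, 12, 8, 11, 16, 9, 16,
            24, 20, 19, 9, 17, 11, 8, 15, 6, 14)
group  <- rep(1:3, each = 7)
fun.5(values ~ group)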

This comes up often enough that I have contemplated adding a solution to 
(b) to the stats package.

Doing either of these right is really pretty complicated, and not 
something to dash off code in a fairly quick reply (or even to check that 
the code in glmmPQL was general enough to be applicable).

On Tue, 9 Jan 2007, Adrian Dusa wrote:

 On Tuesday 09 January 2007 15:41, Prof Brian Ripley wrote:
 oneway.test expects a literal formula, not a variable containing a
 formula.  The help page says

   formula: a formula of the form 'lhs ~ rhs' where 'lhs' gives the
sample values and 'rhs' the corresponding groups.

 Furthermore, if you had

 foo.2 - function() oneway.test(value ~ group)

 it would still not work, as

  data: an optional matrix or data frame (or similar: see
'model.frame') containing the variables in the formula
'formula'.  By default the variables are taken from
'environment(formula)'.

 I could show you several complicated workarounds, but why do you want to
 do this?

 Thank you for your reply. The data argument was exactly the next problem I
 faced. My workaround involves checking if(missing(data)) then uses different
 calls to oneway.test(). I am certainly interested in other solutions, this
 one is indeed limited.

 I do this for the students in the anova class, checking first the homogeneity
 of variances with fligner.test(), printing the p.value and based on that
 changing the var.equal argument in the oneway.test()
 It's just for convenience, but they do like having it all-in-one.

 Best regards,
 Adrian



-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] posthoc tests with ANCOVA

2007-01-10 Thread Richard M. Heiberger
## The WoodEnergy example in package HH (available on CRAN) is similar
## to the example you asked about.  I have two factors and a
## covariate, and the interaction between the factors and covariate is
## significant.  I construct the multiple comparisons of factor Stove
## at each level of factor Wood at a specified value of the covariate.


## open a trellis.device() with history recording on.
## There are many plots and you will need to move among them.

require(HH)

source(paste(file.path(.path.package(package="HH")[1], "scripts"),
 "MMC.WoodEnergy-aov.R", sep="/"))
## This call prints an Error message that you can ignore because
## the relevant call is inside a try() expression.
anova(energy.aov.4)  ## source() needs a print() statement, demo() doesn't.


source(paste(file.path(.path.package(package="HH")[1], "scripts"),
 "MMC.WoodEnergy.R", sep="/"))
## The contrast matrix is developed beginning on line 91 of file
## MMC.WoodEnergy.R


## The MMC (mean-mean multiple comparisons) plots are described in R
## with ?MMC.  The MMC plots in R use the glht package.

## Both files are commented.  Please read the comments.

## Those source() commands in HH_1.17 will be replaced by demo()
## commands in the next version of HH.
##
## demo("MMC.WoodEnergy-aov", package="HH")
## demo("MMC.WoodEnergy", package="HH")


## glht is not currently able to work with aov objects that have the
## Error() function.  You will need to respecify your model formula
## using the terms() function.  See the maiz.aov example in the ?MMC
## help page.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] zero margin / marginless plots

2007-01-10 Thread David Forrest
Hi,

I'd like to produce a marginless or zero margin plot so that the pixel 
coordinates represent the mathematics.

xy <- data.frame(x=c(0,1,1,0,0), y=c(0,1,0,0,1))
png('junk.png', width=300, height=300)
par(mar=c(0,0,0,0))
plot(xy$x, xy$y, xlim=c(0,1), ylim=c(0,1))
dev.off()

The resultant file has about a 10 pixel margin around these lines, and I'm 
not sure what parameter or function is controlling this offset.  Any 
hints?

Thanks for your time,
Dave
-- 
  Dr. David Forrest
  [EMAIL PROTECTED](804)684-7900w
  [EMAIL PROTECTED] (804)642-0662h
http://maplepark.com/~drf5n/

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] zero margin / marginless plots

2007-01-10 Thread Marc Schwartz
On Wed, 2007-01-10 at 21:18 -0600, David Forrest wrote:
 Hi,
 
 I'd like to produce a marginless or zero margin plot so that the pixel 
 coordinates represent the mathematics.
 
 xy <- data.frame(x=c(0,1,1,0,0), y=c(0,1,0,0,1))
 png('junk.png', width=300, height=300)
 par(mar=c(0,0,0,0))
 plot(xy$x, xy$y, xlim=c(0,1), ylim=c(0,1))
 dev.off()
 
 The resultant file has about a 10 pixel margin around these lines, and I'm 
 not sure what parameter or function is controlling this offset.  Any 
 hints?
 
 Thanks for your time,
 Dave

By default, the axis ranges are extended by +/- 4%.  You can change this
by using:

  plot(xy$x, xy$y, xlim = c(0, 1), ylim = c(0, 1),
       xaxs = "i", yaxs = "i")

where 'xaxs' and 'yaxs' set the axis ranges to the actual data ranges.

See ?par for more information.
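
Putting that together with the zero margins from the original example (a
sketch; axes and annotation are suppressed so nothing needs space outside the
plot region):

xy <- data.frame(x = c(0, 1, 1, 0, 0), y = c(0, 1, 0, 0, 1))
png("junk.png", width = 300, height = 300)
par(mar = c(0, 0, 0, 0))
plot(xy$x, xy$y, xlim = c(0, 1), ylim = c(0, 1),
     xaxs = "i", yaxs = "i", axes = FALSE, ann = FALSE)
dev.off()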

HTH,

Marc Schwartz

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.