[R] degree of freedom of interaction effect

2007-03-15 Thread genomenet
Hi There,

Suppose two factors, A and B.
A has n levels, B has m levels.

Why is the degree of freedom of the interaction effect of A and B (here I
mean A:B, not A*B) equal to (n-1)*(m-1) rather than n*m-1?

I know this is a basic statistical question. Please recommend some
good websites or books.

Thank you very much.

Fan
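
As a quick sanity check of the bookkeeping (a hedged sketch, not part of the
original question): the n*m cell means carry n*m - 1 degrees of freedom in
total, and the interaction is what remains once the two main effects are
accounted for:

n <- 3; m <- 4
(n * m - 1) - (n - 1) - (m - 1) == (n - 1) * (m - 1)   # TRUE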

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] cex in xlab, ylab and zlab of persp

2007-03-15 Thread Joseph Retzer
I've had no success using what I think is the correct code to change the 
default size of the x, y and z labels in a persp graph (i.e. cex.lab).   Is 
there a particular specification to accomplish this?
Many thanks,
Joe Retzer



Re: [R] replacing all NA's in a dataframe with zeros...

2007-03-15 Thread Gavin Simpson
On Wed, 2007-03-14 at 20:16 -0700, Steven McKinney wrote:
 Since you can index a matrix or dataframe with
 a matrix of logicals, you can use is.na()
 to index all the NA locations and replace them
 all with 0 in one command.
 

A quicker solution that, IIRC, was posted to the list by Peter
Dalgaard several years ago is:

sapply(mydata.df, function(x) {x[is.na(x)] <- 0; x})

Some timings on a larger problem with 100 columns:

> mydata.df <- as.data.frame(matrix(sample(c(as.numeric(NA), 1),
                                           size = 1000*100, replace = TRUE),
                                    nrow = 1000))

> system.time(retval <- sapply(mydata.df,
                               function(x) {x[is.na(x)] <- 0; x}))
[1] 0.108 0.008 0.120 0.000 0.000

> system.time(mydata.df[is.na(mydata.df)] <- 0)
[1] 2.460 0.028 2.498 0.000 0.000

And a larger problem still, 1000 columns

> mydata.df <- as.data.frame(matrix(sample(c(as.numeric(NA), 1),
                                           size = 1000*1000, replace = TRUE),
                                    nrow = 1000))

> system.time(retval <- sapply(mydata.df, function(x) {x[is.na(x)] <- 0; x}))
[1] 0.908 0.068 2.657 0.000 0.000
> system.time(mydata.df[is.na(mydata.df)] <- 0)
[1] 43.127  0.332 46.440  0.000  0.000

Profiling mydata.df[is.na(mydata.df)] <- 0 shows that it spends most of
this time subsetting the individual cells of the data frame in turn
and setting the NA ones to 0.

HTH

G

  mydata.df <- as.data.frame(matrix(sample(c(as.numeric(NA), 1), size = 30,
  replace = TRUE), nrow = 6))
  mydata.df
   V1 V2 V3 V4 V5
 1  1 NA  1  1  1
 2  1 NA NA NA  1
 3 NA NA  1 NA NA
 4 NA NA NA NA  1
 5 NA  1 NA NA  1
 6  1 NA NA  1  1
  is.na(mydata.df)
  V1V2V3V4V5
 1 FALSE  TRUE FALSE FALSE FALSE
 2 FALSE  TRUE  TRUE  TRUE FALSE
 3  TRUE  TRUE FALSE  TRUE  TRUE
 4  TRUE  TRUE  TRUE  TRUE FALSE
 5  TRUE FALSE  TRUE  TRUE FALSE
 6 FALSE  TRUE  TRUE FALSE FALSE
  mydata.df[is.na(mydata.df)] <- 0
  mydata.df
   V1 V2 V3 V4 V5
 1  1  0  1  1  1
 2  1  0  0  0  1
 3  0  0  1  0  0
 4  0  0  0  0  1
 5  0  1  0  0  1
 6  1  0  0  1  1
  
 
 Steven McKinney
 
 Statistician
 Molecular Oncology and Breast Cancer Program
 British Columbia Cancer Research Centre
 
 email: [EMAIL PROTECTED]
 
 tel: 604-675-8000 x7561
 
 BCCRC
 Molecular Oncology
 675 West 10th Ave, Floor 4
 Vancouver B.C. 
 V5Z 1L3
 Canada
 
 
 
 
 -Original Message-
 From: [EMAIL PROTECTED] on behalf of David L. Van Brunt, Ph.D.
 Sent: Wed 3/14/2007 5:22 PM
 To: R-Help List
 Subject: [R] replacing all NA's in a dataframe with zeros...
  
 I've seen how to  replace the NA's in a single column with a data frame
 
 mydata$ncigs[is.na(mydata$ncigs)] <- 0
 
 But this is just one column... I have thousands of columns (!) for which I
 need to do this, and I can't figure out a way, outside of the dreaded loop,
 to replace all NA's in an entire data frame (all vars) without naming each
 var separately. Yikes.
 
 I'm racking my brain on this, seems like I must be staring at the obvious,
 but it eludes me. Searches have come up CLOSE, but not quite what I need..
 
 Any pointers?
 
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Gavin Simpson [t] +44 (0)20 7679 0522
ECRC  [f] +44 (0)20 7679 0565
UCL Department of Geography
Pearson Building  [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street
London, UK[w] http://www.ucl.ac.uk/~ucfagls/
WC1E 6BT  [w] http://www.freshwaters.org.uk/
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%



Re: [R] replacing all NA's in a dataframe with zeros...

2007-03-15 Thread Peter Dalgaard
Gavin Simpson wrote:
 On Wed, 2007-03-14 at 20:16 -0700, Steven McKinney wrote:
   
 Since you can index a matrix or dataframe with
 a matrix of logicals, you can use is.na()
 to index all the NA locations and replace them
 all with 0 in one command.

 

 A quicker solution, that, IIRC,  was posted to the list by Peter
 Dalgaard several years ago is:

 sapply(mydata.df, function(x) {x[is.na(x)] <- 0; x})
   
I hope your memory fails you, because it doesn't actually work.

> sapply(test.df, function(x) {x[is.na(x)] <- 0; x})
 x1 x2 x3
[1,]  0  1  1
[2,]  2  2  0
[3,]  3  3  0
[4,]  0  4  4

is a matrix, not a data frame.

Instead:

> test.df[] <- lapply(test.df, function(x) {x[is.na(x)] <- 0; x})
> test.df
  x1 x2 x3
1  0  1  1
2  2  2  0
3  3  3  0
4  0  4  4

Speedwise, sapply() is doing lapply() internally, and the assignment
overhead should be small, so I'd expect similar timings.

-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])  FAX: (+45) 35327907



Re: [R] Connecting R-help and Google Groups?

2007-03-15 Thread Gabor Csardi
On Wed, Mar 14, 2007 at 06:30:53PM -0500, Ranjan Maitra wrote:
 I agree with Bert on this one! Any commercial entity's future policies will 
 not be decided by some group's past understanding. Everything can be 
 explained in terms of shareholder value.
 

Personally, I would go even further. I think it is very dangerous to
allow Google and other search engines to search the R web pages and
mailing lists; who knows what they'll do with this information some day,
or maybe they're already doing it right now!
All search engines should be banned from these sites IMHO.

Personally, I wouldn't even allow these pages to be downloaded with
Internet Explorer, Safari, Netscape, Firefox or any commercial
web browser.

Also, the R team shouldn't support R on any commercial platforms like
MS Windows, OSX, Redhat & Debian Linux, etc. There is a considerable
chance that the policies of these companies will affect the R community
negatively.

 I don't see any advantages with tying up to Google groups. We get enough 
 posts every day here to keep us all busy with even a fraction of them. I also 
 think people should be encouraged to follow the policies such as read the 
 basics An Intro etc, before running off and posting. Besides, and most 
 importantly, I prefer having statisticians or those interested in statistics 
 applied to their problems discuss their issues and software, and I learn a 
 lot in this mailing list even in lurk mode. I could do without random 
 posters. 

I completely agree. The R user community is big enough right now, and
it is clear that we want no more newbies with annoying questions.
Nor do we want more statistics professors; it is our right and duty to
answer all the questions about R!

 Btw, anyone using R should be encouraged to use RSiteSearch to search this 
 mailing list on some topic.

Yes. I don't even understand why it is possible to search the mailing list
via a web form. It would be a great user filter to stop this service!
Only advanced users who are able to install R and find RSiteSearch
should be allowed to see the posts. As for posting, I recommend setting up
a test with questions about statistics, programming and other relevant
topics, and only users passing the test would be allowed to submit posts
to the lists. It would also be wise to make them repeat the test
every year, just in case they forget something.

Best,
Gabor

 Best,
 Ranjan
 
 
 
 On Wed, 14 Mar 2007 13:36:33 -0400 Paul Lynch [EMAIL PROTECTED] wrote:
 
  Well, I don't see what danger could arise from the fact that Google
  Groups is owned by a company.  Google Groups provides access to all of
  usenet, plus many mailing lists (e.g. the ruby-talk mailing list for
   Ruby programmers).  They don't control any of the newsgroups or mailing
  lists that they provide access to.  It is a free service, supported by
  advertising.
  
  As for the issue of whether there might be future access problems
  (e.g. if Google goes bankrupt, which currently seems unlikely)  R
  users would still have access to the r-help list through the means
  that they have now.  I am not recommending replacing any of the
  current means of access to the r-help list; I am just asking about
  adding an additional means of access.
  
--Paul
  
  On 3/14/07, Bert Gunter [EMAIL PROTECTED] wrote:
   I know nothing about Google Groups, but FWIW, I think it would be most
   unwise for R/CRAN to hook up to **any** commercially sponsored web 
   portals.
    Future changes in their policies, interfaces, or access conditions may make
    them inaccessible or unfriendly to R users. So long as we have folks 
   willing
   and able to host and maintain our lists as part of the CRAN 
   infrastructure,
   CRAN maintains control. I think this is wise and prudent.
  
   I am happy to be educated to the contrary if I misunderstand how this 
   would
   work.
  
   Bert Gunter
   Genentech Nonclinical Statistics
   South San Francisco, CA 94404
   650-467-7374
  
  
   -Original Message-
   From: [EMAIL PROTECTED]
   [mailto:[EMAIL PROTECTED] On Behalf Of Paul Lynch
   Sent: Wednesday, March 14, 2007 8:48 AM
   To: R-help@stat.math.ethz.ch
   Subject: [R] Connecting R-help and Google Groups?
  
   This morning I tried to see if I could find the r-help mailing list on
   Google Groups, which has an interface that I like.  I found three
    Google Groups ("The R Project for Statistical Computing", "rproject",
    and "rhelp") but none of them are connected to the r-help list.
  
   Is there perhaps some reason why it wouldn't be a good thing for there
   to be a connected Google Group?  I think it should be possible to set
   things up so that a post to the Google Group goes to the r-help
   mailing list, and vice-versa.
  
   Also, does anyone know why the three existing R Google Groups failed
   to get connected to r-help?  It might require some action on the part
   of the r-help list administrator.
  
   Thanks,
   --Paul
  
   --
   Paul Lynch
   Aquilent, 

Re: [R] fitting of all possible models

2007-03-15 Thread Rense Nieuwenhuis
Dear Lukas,

although I'm intrigued by the purpose of what you are trying to do,
as were some of the other people on this list, I liked the
challenge of writing such a function.

I came up with the following during some train-traveling this morning:


tum <- function(x)
{
    tum <- matrix(data=NA, nrow=2^x, ncol=x)

    for (i in 1:x)
    {
        tum[,i] <- c(rep(NA, 2^i/2), rep(i+1, 2^i/2))
    }

    return(tum)
}

###

all.models <- function(model)
{
    npred <- length(model$coefficients) - 1
    matr.model <- tum(npred)
    output <- matrix(data=NA, nrow=2^npred, ncol=1)

    for (t in 2:2^npred)
    {
        preds <- names(model$coefficients)
        interc <- names(model$coefficients)[1]
        form <- as.formula(paste(". ~",
                                 paste(preds[na.omit(matr.model[t,])],
                                       collapse="+")))

        model2 <- update(model, form)
        output[t,] <- mean(resid(model2)^2)
    }

    return(output)
}

##


As you can see, I used a helper function (tum, "the ultimate matrix")
to feed the actual function. Also, I wrote it using lm instead of glm, but
I suppose that you can easily alter that. As well, the function now
returns just a single regular fit measure, but that is not all that
essential, I think.

The main point is: it works! Using this on my G4 Mac, with an lm of 10
predictors and 18 cases, it returns the output quite fast (1 minute).
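
A more compact way to enumerate the subsets (a hedged sketch, not from the
original post; all.models2 is an illustrative name) uses combn() and
reformulate():

all.models2 <- function(model)
{
    preds <- attr(terms(model), "term.labels")
    subsets <- unlist(lapply(seq_along(preds),
                             function(k) combn(preds, k, simplify = FALSE)),
                      recursive = FALSE)
    # mean squared residual for every non-empty predictor subset
    sapply(subsets, function(s)
        mean(resid(update(model, reformulate(s, response = ".")))^2))
}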

I hope you can put this to use. It needs some easy adapting to your  
specific needs, but I don't expect that to be a problem. If you need  
help with that, please contact me.

I'd appreciate hearing from you if this function is helpful in any way.

sincerely,

Rense Nieuwenhuis





On Feb 27, 2007, at 8:46 , Indermaur Lukas wrote:

 Hi,
 Fitting all possible models (GLM) with 10 predictors will result in  
 loads of (2^10 - 1) models. I want to do that in order to get the  
 importance of variables (having an unbalanced variable design) by  
 summing the up the AIC-weights of models including the same  
 variable, for every variable separately. It's time consuming and  
 annoying to define all possible models by hand.

 Is there a command, or easy solution to let R define the set of all  
 possible models itself? I defined models in the following way to  
 process them with a batch job:

 # e.g. model 1
 preference <- formula(Y ~ Lwd + N + Sex + YY)
 # e.g. model 2
 preference_heterogeneity <- formula(Y ~ Ri + Lwd + N + Sex + YY)
 etc.
 etc.


 I appreciate any hint
 Cheers
 Lukas





 °°°
 Lukas Indermaur, PhD student
 eawag / Swiss Federal Institute of Aquatic Science and Technology
 ECO - Department of Aquatic Ecology
 Überlandstrasse 133
 CH-8600 Dübendorf
 Switzerland

 Phone: +41 (0) 71 220 38 25
 Fax: +41 (0) 44 823 53 15
 Email: [EMAIL PROTECTED]
 www.lukasindermaur.ch






Re: [R] cex in xlab, ylab and zlab of persp

2007-03-15 Thread talepanda
Internally, the labels in persp are drawn the same way as text drawn with
the text function, so you cannot change their size via cex.lab, but you can
via cex:

persp(x, y, z, cex=1.5)

gives larger labels in a persp 3D plot.
There may of course be side effects, because cex can change the size of
text other than the labels.
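
For instance (a hedged sketch, using the sinc surface from the persp
examples):

x <- seq(-10, 10, length = 30)
y <- x
z <- outer(x, y, function(x, y) { r <- sqrt(x^2 + y^2); 10 * sin(r)/r })
z[is.na(z)] <- 1
persp(x, y, z, theta = 30, phi = 30,
      xlab = "X", ylab = "Y", zlab = "Height",
      cex = 1.5)   # enlarges the axis labels (and other text persp draws)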

HTH

On 3/15/07, Joseph Retzer [EMAIL PROTECTED] wrote:
 I've had no success using what I think is the correct code to change the
 default size of the x, y and z labels in a persp graph (i.e. cex.lab).   Is
 there a particular specification to accomplish this?
 Many thanks,
 Joe Retzer





Re: [R] replacing all NA's in a dataframe with zeros...

2007-03-15 Thread Joerg van den Hoff
On Thu, Mar 15, 2007 at 10:21:22AM +0100, Peter Dalgaard wrote:
 Gavin Simpson wrote:
  On Wed, 2007-03-14 at 20:16 -0700, Steven McKinney wrote:

  Since you can index a matrix or dataframe with
  a matrix of logicals, you can use is.na()
  to index all the NA locations and replace them
  all with 0 in one command.
 
  
 
  A quicker solution, that, IIRC,  was posted to the list by Peter
  Dalgaard several years ago is:
 
  sapply(mydata.df, function(x) {x[is.na(x)] <- 0; x})

 I hope your memory fails you, because it doesn't actually work.
 
  sapply(test.df, function(x) {x[is.na(x)] <- 0; x})
  x1 x2 x3
 [1,]  0  1  1
 [2,]  2  2  0
 [3,]  3  3  0
 [4,]  0  4  4
 
 is a matrix, not a data frame.
 
 Instead:
 
  test.df[] <- lapply(test.df, function(x) {x[is.na(x)] <- 0; x})
  test.df
   x1 x2 x3
 1  0  1  1
 2  2  2  0
 3  3  3  0
 4  0  4  4
 
 Speedwise, sapply() is doing lapply() internally, and the assignment
 overhead should be small, so I'd expect similar timings.

just an idea:
given the order-of-magnitude difference (a factor of 17 or so) in runtime
between the obvious solution and the fast one: wouldn't it be
possible/sensible to modify the corresponding subsetting method
([<-.data.frame) so that it recognizes the case when it is called with an
arbitrary index matrix (the problem is not restricted to indexing with a
logical matrix, I presume?) and switches internally to the fast solution
given above?

in my (admittedly limited) experience it seems that one of the not-so-nice
properties of R is that one encounters exactly the above situation in quite
a few settings: unexpected massive differences in run time between different
solutions (I'm not talking about the explicit loop penalty). What concerns
me most are the very basic scenarios (not complex algorithms): data frames
vs. matrices, naming vector components or not, subsetting, read.table vs.
scan, etc. If there were a concise HOWTO list for the cases when speed
matters, that would be helpful, too.

I understand that part of the uneven performance is unavoidable and one must
expect the user to go to the trouble of understanding the reasons, e.g. for
the differences between handling purely numerical data in either matrices or
data frames. But a factor of 17 between the obvious approach and the wise one
seems a trap into which 99% of people will step (probably never thinking that
there might be a faster approach).

joerg



Re: [R] replacing all NA's in a dataframe with zeros...

2007-03-15 Thread Gavin Simpson
On Thu, 2007-03-15 at 10:21 +0100, Peter Dalgaard wrote:
 Gavin Simpson wrote:
  On Wed, 2007-03-14 at 20:16 -0700, Steven McKinney wrote:

  Since you can index a matrix or dataframe with
  a matrix of logicals, you can use is.na()
  to index all the NA locations and replace them
  all with 0 in one command.
 
  
 
  A quicker solution, that, IIRC,  was posted to the list by Peter
  Dalgaard several years ago is:
 
  sapply(mydata.df, function(x) {x[is.na(x)] <- 0; x})

 I hope your memory fails you, because it doesn't actually work.

Ah, yes, apologies Peter. I have the sapply version embedded in a
package function that I happened to be working on (where I wanted the
result to be a matrix) and pasted directly from there, not from my crib
sheet of useful R-help snippets, where I do have it as lapply(...). I'd
forgotten I'd changed Peter's suggestion slightly in my function.

That'll teach me to reply before my morning cup of Earl Grey.

All the best,

G

 
  sapply(test.df, function(x) {x[is.na(x)] <- 0; x})
  x1 x2 x3
 [1,]  0  1  1
 [2,]  2  2  0
 [3,]  3  3  0
 [4,]  0  4  4
 
 is a matrix, not a data frame.
 
 Instead:
 
  test.df[] <- lapply(test.df, function(x) {x[is.na(x)] <- 0; x})
  test.df
   x1 x2 x3
 1  0  1  1
 2  2  2  0
 3  3  3  0
 4  0  4  4
 
 Speedwise, sapply() is doing lapply() internally, and the assignment
 overhead should be small, so I'd expect similar timings.
 
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%



Re: [R] reading raw matrix saved with writeBin

2007-03-15 Thread Henrik Bengtsson
FYI, to save data as bitmap images, see the EBImage package on Bioconductor.

/H

On 3/15/07, Ranjan Maitra [EMAIL PROTECTED] wrote:
 On Wed, 14 Mar 2007 18:45:53 -0700 (PDT) Milton Cezar Ribeiro [EMAIL 
 PROTECTED] wrote:

  Dear Friends,
 
  I saved a matrix - which contains values 0 and 1 - using the following
  command:
  writeBin(as.integer(mymatrix), "myfile.raw", size=1)
 
  It is working fine and I can see the matrix using Photoshop. But now I need
  to read the matrices (in fact I have a thousand of them) back into R as
  matrices, but when I try something like
  mat.dat <- readBin("myfile.raw", size=1) I can't access the matrix.
 
  Kind regards,
 
  Miltinho
 
 

 Look up the help file. There is an explicit example. Basically, you need to
 tell readBin what type of data to read in.

 In fact, I am a little surprised your first command works while writing.

 HTH!
 Ranjan
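
Something like the following round trip works (a hedged sketch; the file
name and dimensions are illustrative, and readBin needs what, n and size
spelled out):

m <- matrix(sample(0:1, 12, replace = TRUE), nrow = 3)
writeBin(as.integer(m), "myfile.raw", size = 1)
vals <- readBin("myfile.raw", what = "integer", n = 12, size = 1)
m2 <- matrix(vals, nrow = 3)   # the dimensions must be supplied again
identical(m, m2)               # TRUE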





[R] regarding cointegration

2007-03-15 Thread gyadav

Hi All

Thanks for supporting people like me.
What is cointegration, and what is its connection with the Granger causality
test? What is its use, and what is the mathematical methodology behind it?
Secondly, is a cointegration test such as the Phillips-Ouliaris cointegration
test in the tseries package (or in the urca package) the same as
cointegration? Please tell me how to go about it and how to interpret the
results.

Thanks in advance
cheers :-)
-gaurav





[R] What happened to the rpm package ?

2007-03-15 Thread Olivier ETERRADOSSI
Hi List,
sorry if this is a FAQ: I could not make my way to it :-(

Once upon a time there was a package named "rpm" for Robust
Point Matching.
I can find a few traces of it in the CRAN archives (works by Saussen
et al., "Aligning spectra with R"), but I can't find the package anymore.
There is no trace at all in the Old or Orphaned lists.
Does any useR know about it, or even use it?

Thanks in advance, and regards to all. Olivier

-- 
Olivier ETERRADOSSI
Maître-Assistant
CMGD / Equipe Propriétés Psycho-Sensorielles des Matériaux
Ecole des Mines d'Alès
Hélioparc, 2 av. P. Angot, F-64053 PAU CEDEX 9
tel std: +33 (0)5.59.30.54.25
tel direct: +33 (0)5.59.30.90.35 
fax: +33 (0)5.59.30.63.68
http://www.ema.fr



[R] Creating q and p functions from a self-defined distribution

2007-03-15 Thread Eli Gurarie
Hello all,

I am fishing for some suggestions on efficient ways to make qdist and 
pdist type functions from an arbitrary distribution whose probability 
density function I've defined myself.

For example, let's say I have a distribution whose pdf is:

dRN <- function(x, d, v, s)
    # d, v, and s are parameters
    return(d/x^2/s/sqrt(2*pi) * exp(-(d - v*x)^2/(2*s^2*x^2)))

this is a legitimate distribution over the reals (though it has a 
singularity at x=0)

at the moment, to get a p value, I sum dRN over a grid with some arbitrary
spacing dt:

pRN <- function(x, d, v, s, dt=.001)
{
    xs <- seq(dt, x, dt)   # start at dt: the pdf has a singularity at 0
    ds <- dRN(xs, d, v, s)
    return(sum(ds)*dt)
}

which is OK, except I sometimes want a vector of p values for a vector
of x's, say pRN(xs=seq(0,10,.1), ...), which involves an unwieldy loop.

Getting a quantile value is a real nightmare!  Right now I have a thing
that involves using approx() and matching rounded numbers, and it requires
a loop and is slow.

It seems surprising that it would be so hard to invert a function that 
is perfectly defined!

Are there packages/functions/algorithms that allow one to manipulate 
arbitrarily defined distributions?

Any suggestions appreciated,
Thanks,
   Eli




*
  Eli Gurarie
  Quantitative Ecology and Resource Management
  University of Washington, Seattle



[R] expm() within the Matrix package

2007-03-15 Thread Laura Hill
Hi

Could anybody give me a bit of advice on some code I'm having trouble with?

I've been trying to calculate the log-likelihood of a function iterated over
a vector of survival times, and I seem to experience difficulty when I use
the function expm(). Here's an example of what I am trying to do:


y <- c(5,10)    # vector of 2 survival times

p <- Matrix(c(1,0),1,2)     # 1x2 matrix

Q <- Matrix(c(1,2,3,4),2,2) # 2x2 matrix

q <- Matrix(c(1,2),2,1)     # 2x1 matrix

Loglik <- rep(0,length(y))  # creating empty vector of same length as y

for(i in 1:length(y)){

    Loglik[i] <- log((p %*% expm(Q*y[i])) %*% q)  # calculating the
                                                  # log-likelihood for each y[i]
}

The code works perfectly if I use exp(Q*y[i]), but not with expm().


If anyone has any advice they could give that would be great.
I would like to thank Gad Abraham also for helping me get this far.

Thanks in advance

Laura



[R] Occasional problems with starting batch mode

2007-03-15 Thread Bos, Roger
I use a windows Server 2003 machine to run R code in batch mode every
night using the following command:

"F:\Program Files\R\R-2.4.1pat\bin\R.exe" CMD BATCH --vanilla --slave
batch_master_dr.R

This produces an output file, of which the first three lines look like
this:

Loading required package: tcltk
Loading Tcl/Tk interface ... done
Loading required package: R2HTML

99% of the time this works great.  Every once in a while, I get the
following error message instead and it does not run any of my subsequent
code:

Loading required package: tcltk
Loading Tcl/Tk interface ... Unable to register TkTopLevel class

This application has requested the Runtime to terminate it in an unusual
way.
Please contact the application's support team for more information.

Does anyone have any clue where I should look to fix this problem?

Thanks,

Roger
   
version.string R version 2.4.1 Patched (2007-02-04 r40647)




Re: [R] regarding cointegration

2007-03-15 Thread Ranjan Maitra
Please do not mess up the thread by posting as a reply to some other topic.

Thanks,
Ranjan


On Thu, 15 Mar 2007 16:51:20 +0530 [EMAIL PROTECTED] wrote:

 
 Hi All
 
 Thanks for supporting people like me.
 What is cointegration, and what is its connection with the Granger causality
 test? What is its use, and what is the mathematical methodology behind it?
 Secondly, is a cointegration test such as the Phillips-Ouliaris cointegration
 test in the tseries package (or in the urca package) the same as
 cointegration? Please tell me how to go about it and how to interpret the
 results.
 
 Thanks in advance
 cheers :-)
 -gaurav
 
 
 




[R] Mauchly.test

2007-03-15 Thread sebastien boutry
Hello,

They correspond to data from different treatments (EU) taken at different
dates. I would like to compare k means according to these two variables,
date and EU. An n-factor ANOVA is not the right solution, so I would like to
know which tests can address this problem (matched samples). I started with
Mauchly's test, without success. Can you help me?
Thank you,
Sébastien

Hello,
I would like to compare the k means of DW across two factors (date and
treatment), but I cannot use an n-factor ANOVA (aov) because I have a
link between wk2 and wk1, wk3 and wk1-wk2, ... (matched samples).
Do you know which test I can use (mauchly.test)?
Thanks.
Sébastien

Here are my data:
date  EU   DW
wk1 EU1 5.324547829
wk1 EU1 7.321253265
wk1 EU1 4.431712065
wk1 EU2 8.230322407
wk1 EU2 8.546873269
wk1 EU2 5.657332069
wk1 EU3 3.165508618
wk1 EU3 4.431712065
wk1 EU3 1.899305171
wk2 EU1 2.163097556
wk2 EU1 17.61379438
wk2 EU1 15.82754309
wk2 EU2 16.46064481
wk2 EU2 19.30960257
wk2 EU2 13.92823792
wk2 EU3 6.014466374
wk2 EU3 7.280669822
wk2 EU3 5.064813789
wk4 EU1 11.03179753
wk4 EU1 29.75578101
wk4 EU1 22.71252433
wk4 EU2 27.85647584
wk4 EU2 36.71989997
wk4 EU2 20.11680727
wk4 EU3 13.59661321
wk4 EU3 13.2951362
wk4 EU3 14.56133964
wk6 EU1 30.73875474
wk6 EU1 33.27842393
wk6 EU1 35.27512937
wk6 EU2 31.2817185
wk6 EU2 41.53147307
wk6 EU2 35.57093758
wk6 EU3 8.652390223
wk6 EU3 18.6359174
wk6 EU3 15.30807501




Re: [R] Creating q and p functions from a self-defined distribut

2007-03-15 Thread Ted Harding
On 15-Mar-07 12:09:42, Eli Gurarie wrote:
 Hello all,
 
 I am fishing for some suggestions on efficient ways to make qdist and 
 pdist type functions from an arbitrary distribution whose probability 
 density function I've defined myself.
 
 For example, let's say I have a distribution whose pdf is:
 
  dRN <- function(x, d, v, s)
      # d, v, and s are parameters
      return(d/x^2/s/sqrt(2*pi) * exp(-(d - v*x)^2/(2*s^2*x^2)))
 
 this is a legitimate distribution over the reals (though it has a 
 singularity at x=0)
 [...]
 It seems surprising that it would be so hard to invert a function that 
 is perfectly defined!
 
 Are there packages/functions/algorithms that allow one to manipulate 
 arbitrarily defined distributions?

Do not be surprised! The Normal distribution itself, with
pdf (1/(s*sqrt(2*pi)))*exp(-((x - mu)^2)/(2*s^2)), is perfectly
well defined. Yet the literature of computation over decades has
presented procedure after procedure for computing the cumulative
function, and its inverse, to desired precision. None of these is
simple. Indeed (to use your own word), the Normal distribution
is a nightmare from this point of view.

So being well-defined is no guarantee that computing its p and q
values will be a simple or easy problem. And what kind of method
is good for a particular distribution will depend strongly on
what the distribution is (for instance, whether it has tails
which tend rapidly to 0 like the Normal, whether there are good
asymptotic formulae, etc.).

In the case of the example you give above, the transformation
from x to u = 1/x will translate it into a Normal distribution
problem, after which you can use (circumspectly ... ) the R
functions pnorm and qnorm (which are based on the best from
the above literature) to deal with it.
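
Concretely (a hedged sketch for the positive branch only; pRN and qRN are
illustrative names): with u = 1/x the density above becomes Normal with
mean v/d and sd s/d, so for x > 0

pRN <- function(x, d, v, s) 1 - pnorm(1/x, mean = v/d, sd = s/d)
qRN <- function(p, d, v, s) 1/qnorm(1 - p, mean = v/d, sd = s/d)

and both are vectorised for free.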

But you give it only as an example ... and you are asking for
methods for a general user-defined distribution. For the reasons
given above, a good general-purpose method is unlikely to exist.

With best wishes,
Ted.


E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 15-Mar-07   Time: 13:26:27
-- XFMail --



[R] Hardware for a new Workstation for best performance using R

2007-03-15 Thread Sascha Morach
Hi,

we are looking for a new workstation for big datasets/applications (easily
up to 100'000 records and up to 300 Variables) using R. As an example:
Variable Selection for a multivariate regression using stepAIC.

What is the best configuration for a workstation to reach a high performance
level for computing with R?

Single core or multi-core (is R together with the nws package really able to
take advantage of multi-core processors; any experience/benchmarks on that)?

Shall we use Linux instead of Windows? If yes, how compatible are graphics
produced on Linux if we want to use them afterwards on Windows? And what are
the advantages of using Linux instead of Windows?

What kind of workstations are you using (hardware and operating system) for
big data computations? And are you satisfied with it?

I'm quite familiar with pc or server hardware.

Thanks in advance

Sascha Morach



[R] Error in upgrade

2007-03-15 Thread Giovanni Parrinello
Dear All,
update.packages(ask='graphics')
--- Please select a CRAN mirror for use in this session ---
Error in .readRDS(pfile) : unknown input format.
???
TIA
Giovanni

-- 
dr. Giovanni Parrinello
Department of Biotecnologies
Medical Statistics Unit
University of Brescia
Viale Europa, 11 25123 Brescia
email: [EMAIL PROTECTED]
Phone: +390303717528
Fax: +390303717488



Re: [R] Hardware for a new Workstation for best performance using R

2007-03-15 Thread Andrew Perrin
I can speak to some of these issues. I don't know about how much benefit 
you can get from SMP for *single* instances of R, though.

1.) Multicore will be helpful, at least, if you are running several 
instances of R at once.  So, for example, if you have people running two 
different models at the same time, the OS can use separate processors or 
cores for each instance.

2.) Yes, by all means you should use linux instead of windows. The 
graphics output is completely compatible with whatever applications you 
want to paste them into on Windows.  Linux is cheaper, stabler, and better 
at using the system's resources.

3.) If you're doing big datasets, you certainly need a 64-bit processor, 
operating system, and R.  Consider, perhaps, a dual-Athlon XP 64 machine 
with a big pile of RAM?

Andy

--
Andrew J Perrin - andrew_perrin (at) unc.edu - http://perrin.socsci.unc.edu
Assistant Professor of Sociology; Book Review Editor, _Social Forces_
University of North Carolina - CB#3210, Chapel Hill, NC 27599-3210 USA
New Book: http://www.press.uchicago.edu/cgi-bin/hfs.cgi/00/178592.ctl



On Thu, 15 Mar 2007, Sascha Morach wrote:

 Hi,

 we are looking for a new workstation for big datasets/applications (easily
 up to 100'000 records and up to 300 Variables) using R. As an example:
 Variable Selection for a multivariate regression using stepAIC.

 What is the best configuration for a workstation to reach a high performance
 level for computing with R?

 Single core or multi-core (is R together with the nws package really able to
 take advantage of multi-core processors; any experience/benchmarks on that)?

 Shall we use Linux instead of Windows? If yes, how compatible are graphics
 produced on Linux if we want to use them afterwards on Windows? And what are
 the advantages of using Linux instead of Windows?

 What kind of workstations are you using (hardware and operating system) for
 big data computations? And are you satisfied with it?

 I'm quite familiar with pc or server hardware.

 Thanks in advance

 Sascha Morach





Re: [R] cex in xlab, ylab and zlab of persp

2007-03-15 Thread José Rafael Ferrer Paris
I had similar problems; it is actually very difficult to customize persp
graphics. You should try wireframe from the lattice package instead.
This reference might help to customize the wireframe plots: 
http://www.polisci.ohio-state.edu/faculty/lkeele/3dinR.pdf

JR

On Wed, 2007-03-14 at 20:40 -0700, Joseph Retzer wrote:
 I've had no success using what I think is the correct code to change the 
 default size of the x, y and z labels in a persp graph (i.e. cex.lab).   Is 
 there a particular specification to accomplish this?
 Many thanks,
 Joe Retzer
 



Re: [R] Error in upgrade

2007-03-15 Thread Sundar Dorai-Raj


Giovanni Parrinello said the following on 3/15/2007 6:43 AM:
 Dear All,
 update.packages(ask='graphics')
 --- Please select a CRAN mirror for use in this session ---
 Error in .readRDS(pfile) : unknown input format.
 ???
 TIA
 Giovanni
 

I cannot replicate this in R-2.4.1. What version of R and repository are 
you using?

update.packages(ask = "graphics", repos = "http://cran.cnr.berkeley.edu")

--sundar



Re: [R] Redirecting output to the screen

2007-03-15 Thread Prof Brian Ripley
This is much easier in R-devel: just use message() and scan("stdin").

gannet% cat Test.R
message("Enter file name: ", appendLF=FALSE)
fn <- scan("stdin", what="", n=1)

works for me in R-devel via  R --vanilla -f Test.R > Rout.txt
I believe it also works under Windows.

On Wed, 14 Mar 2007, John Schuenemeyer wrote:

 A simple example follows.  The file is called Test.R
 # Example
 rm(list=ls(all=TRUE))
 cat("Enter file name")
 fn <- scan(what="")

 I execute the following:
 @C:\PROGRA~1\R\R-2.4.1\bin\Rterm.exe --no-restore --no-save  Test.R  
 Rout.txt

 I do not see the "Enter file name" prompt or get the opportunity to enter
 the file name.  I am running R on Windows XP.

 Thanks for your help.

 John H Schuenemeyer
 Southwest Statistical Consulting, LLC
 960 Sligo St
 Cortez, CO 81321-2558

 Phone: 970.565.0179

 URL: www.swstatconsult.com

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

PLEASE do, and not send HTML.

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595



[R] how to...

2007-03-15 Thread [EMAIL PROTECTED]
I have to perform ANOVAs on many different data sets organized in a data
frame. I can run an ANOVA for each sample, but I've got hundreds of them and
I would like to avoid carrying out each test manually. In addition, I would
like to have the results organized in a simple way, for example in a table
which would be easy to export. Thank you for your assistance.

simone 
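
A minimal sketch of one way to do this (hedged: it assumes a data frame dat
whose first column grp is the grouping factor and whose remaining columns
are the responses):

pvals <- sapply(names(dat)[-1], function(v) {
    fit <- aov(reformulate("grp", response = v), data = dat)
    summary(fit)[[1]][["Pr(>F)"]][1]   # p-value of the grp effect
})
# one row per response variable, ready to export
write.csv(data.frame(variable = names(pvals), p.value = pvals),
          "anova_results.csv", row.names = FALSE)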





[R] Seemingly Unrelated Regressions question - not R question

2007-03-15 Thread Leeds, Mark \(IED\)
Does anyone know where I can find a proof of the fact that when each X
matrix in a SUR is the same, SUR estimation is equivalent to OLS estimation?
The proof is supposedly in William Greene's book, but that book costs 157.00
and has mixed reviews, so I am reluctant to purchase it.
Thanks.
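
A sketch of the standard Kronecker-product argument (hedged; the notation is
mine, not from any particular text): stack the M equations as
y = (I_M \otimes X_0)\beta + u with Var(u) = \Sigma \otimes I_n, where X_0 is
the common regressor matrix. Then

\hat\beta_{GLS}
  = \bigl[(I_M \otimes X_0)'(\Sigma \otimes I_n)^{-1}(I_M \otimes X_0)\bigr]^{-1}
        (I_M \otimes X_0)'(\Sigma \otimes I_n)^{-1} y
  = \bigl(\Sigma^{-1} \otimes X_0'X_0\bigr)^{-1}\bigl(\Sigma^{-1} \otimes X_0'\bigr) y
  = \bigl(I_M \otimes (X_0'X_0)^{-1}X_0'\bigr) y,

which is exactly equation-by-equation OLS; the only tool needed is the
mixed-product rule (A \otimes B)(C \otimes D) = AC \otimes BD.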





Re: [R] Creating q and p functions from a self-defined distribut

2007-03-15 Thread S Ellison


 [EMAIL PROTECTED] 15/03/2007 13:26:52 
On 15-Mar-07 12:09:42, Eli Gurarie wrote:
 Hello all,
 
 I am fishing for some suggestions on efficient ways to make qdist and 
 pdist type functions from an arbitrary distribution whose probability 
 density function I've defined myself.

Ted Harding accurately said there is unlikely to be an efficient general
answer.

However, if you want a dreadfully inefficient but very general answer, you
could use a numerical integral to calculate the cumulative probabilities, and
a root-finder to find quantiles from the integral function. (Strictly, the
quantile is the root of {cumulative integral} - p, where p is the probability
at which you want the quantile.)

uniroot is the general-purpose root-finder in R; it isn't ideal for this
purpose because it won't like an interval of +/-Inf, which you will certainly
need for general quantiles. But you should be able to make it work with
minimal tinkering if you use it on logit(p) and apply it to the interval
[0,1]. Converting back from logit after uniroot should then give you your
quantile. You may also want to give it a smaller tolerance; the default looks
like it's geared to a sum-of-squares minimiser, and may be a bit over-generous
for your purpose.

Steve Ellison.
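
A minimal sketch of that approach for the dRN example earlier in the thread
(hedged; pRN2/qRN2 are illustrative names, and the lower integration limit
dodges the singularity at 0):

pRN2 <- function(q, d, v, s)
    sapply(q, function(qi)
        integrate(dRN, lower = 1e-8, upper = qi, d = d, v = v, s = s)$value)

qRN2 <- function(p, d, v, s, upper = 1e3)
    sapply(p, function(pr)
        uniroot(function(x) pRN2(x, d, v, s) - pr,
                interval = c(1e-8, upper))$root)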




[R] Cannot allocate vector size of... ?

2007-03-15 Thread Wingfield, Jerad G.
Hello all, 

 

I've been working with R and Fridolin Wild's lsa package a bit over the
past few months, but I'm still pretty much a novice. I have a lot of
files that I want to use to create a semantic space. When I begin to run
the initial textmatrix( ), it runs for about 3-4 hours and eventually
gives me an error. It's always "ERROR: cannot allocate vector of size
xxx Kb". I imagine this might be my computer running out of memory, but
I'm not sure. So I thought I would send this to the community at large
for any help/thoughts.

 

I searched the archives and didn't really find anything that specifically
speaks to my situation. So I guess I have a few questions. First, is
this actually an issue with the machine running out of memory? If not,
what might be the cause of the error? If so, is there a way to minimize
the amount of memory used by the vector data structures (e.g., Berkeley
DB)?

 

Thanks,

Gabe Wingfield

IT and Program Specialist I

Center for Applied Social Research

University of Oklahoma

2 Partners Place

3100 Monitor, Suite 100

Norman, OK 73072




Re: [R] Density estimation graphs

2007-03-15 Thread Mark Wardle
Mark Wardle wrote:
 Dear all,
 
 I'm struggling with a plot and would value any help!
 ...
 
 Is there a better way? As always, I'm sure there's a one-liner rather
 than my crude technique!
 

As always, I've spent ages trying to sort this, and then the minute
after sending an email, I find the polygon() function.

Ignore previous message!

Best wishes,

Mark



[R] Density estimation graphs

2007-03-15 Thread Mark Wardle
Dear all,

I'm struggling with a plot and would value any help!

I'm attempting to highlight a histogram and density plot to show a
proportion of cases above a threshold value. I wanted to cross-hatch the
area below the density curve. The breaks and bandwidth are deliberate
integer values because of the type of data I'm looking at.

I've managed to do this, but I don't think it is very good! It would be
difficult, for example, to do a cross-hatch using this technique.

allele.plot <- function(x, threshold=NULL, hatch.col='black',
                        hatch.border=hatch.col, lwd=par('lwd'), ...) {
    h <- hist(x, breaks=max(x), plot=F)
    d <- density(x, bw=1)
    plot(d, lwd=lwd, ...)

    if (!is.null(threshold)) {
        d.t <- d$x > threshold
        d.x <- d$x[d.t]
        d.y <- d$y[d.t]
        d.l <- length(d.x)
        # draw all but first line of hatch
        for (i in 2:d.l) {
            lines(c(d.x[i], d.x[i]), c(0, d.y[i]),
                  col=hatch.col, lwd=1)
        }
        # draw first line in hatch border colour
        lines(c(d.x[1], d.x[1]), c(0, d.y[1]),
              col=hatch.border, lwd=lwd)

        # and now re-draw density plot lines
        lines(d, lwd=lwd)
    }
}

# some pretend data
s8 = rnorm(100, 15, 5)
threshold = 19  # an arbitrary cut-off
allele.plot(s8, threshold, hatch.col='grey',hatch.border='black')


Is there a better way? As always, I'm sure there's a one-liner rather
than my crude technique!

Best wishes,

Mark
-- 
Dr. Mark Wardle
Clinical research fellow and specialist registrar, Neurology
University Hospital Wales and Cardiff University, UK



Re: [R] Cannot allocate vector size of... ?

2007-03-15 Thread Bos, Roger
Yes, your error is due to running out of memory.  This is probably one
of the most frequent questions asked here, so if you search again you
can find a lot of advice on how to get around it.

As you learn more about R programming you will learn how to store data
more efficiently, use rm() to remove variables you no longer need, and
gc() to garbage-collect and free up memory.  Try to open only the files
you need, do some of the analysis, then get rid of everything you don't
need, then do some more analysis.
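
For instance (a hedged illustration; the object names are just examples):

big1 <- matrix(rnorm(1e6), ncol = 100)   # a large intermediate object
# ... work with big1 ...
rm(big1)          # drop objects you no longer need
gc()              # hand the freed memory back to the allocator
memory.limit()    # on Windows, reports (and can raise) the allocation cap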

Thanks,

Roger 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Wingfield, Jerad
G.
Sent: Thursday, March 15, 2007 12:27 PM
To: r-help@stat.math.ethz.ch
Subject: [R] Cannot allocate vector size of... ?

Hello all, 

 

I've been working with R and Fridolin Wild's lsa package a bit over the
past few months, but I'm still pretty much a novice. I have a lot of
files that I want to use to create a semantic space. When I begin to run
the initial textmatrix( ), it runs for about 3-4 hours and eventually
gives me an error. It's always "ERROR: cannot allocate vector of size
xxx Kb". I imagine this might be my computer running out of memory, but
I'm not sure. So I thought I would send this to the community at large
for any help/thoughts.

 

I searched the archives and didn't really find anything that specifically
speaks to my situation. So I guess I have a few questions. First, is
this actually an issue with the machine running out of memory? If not,
what might be the cause of the error? If so, is there a way to minimize
the amount of memory used by the vector data structures (e.g., Berkeley
DB)?

 

Thanks,

Gabe Wingfield

IT and Program Specialist I

Center for Applied Social Research

University of Oklahoma

2 Partners Place

3100 Monitor, Suite 100

Norman, OK 73072






[R] [e1071] predict.svm bug ?

2007-03-15 Thread Charles Hebert
Hi all,

I'm using SVM to classify data (2 classes) and I get strange results :

> model = svm(x, y, probability = TRUE)

> pred = predict(model, x, decision.values = TRUE, probability = FALSE)
> table(pred, y)
      y
pred  ctl nuc
  ctl  82   3
  nuc   5  84

> pred
  1   2   3   4   5   6   7   8
nuc nuc nuc nuc nuc nuc nuc ctl

And now, with probability computation:

> pred = predict(model, x, decision.values = TRUE, probability = TRUE)
> table(pred, y)
      y
pred  ctl nuc
  ctl   7  84
  nuc  80   3

> pred
  1   2   3   4   5   6   7   8
ctl ctl ctl ctl ctl ctl ctl nuc ...

However, model, x, and y didn't change!  And the decision values didn't
change either:

      nuc/ctl
1    0.505289854
2    0.265975135
3    0.863270144
4    0.354181677
5    0.868119168
6    0.702989607
7    0.206018067
8   -0.271452937   <- ctl is correct!


Is it a bug ? Could you explain the difference ?

Best regards,

charles

-- 
Charles Hébert

DYnamique et Organisation des GENomes,
Laboratoire de Génétique Moléculaire, CNRS UMR 8541
Ecole Normale Supérieure 46, rue d'Ulm 75005 Paris, France



[R] plot.randomForest default mtry values

2007-03-15 Thread Joseph Retzer
When using the plot.randomForest method, 3 error series (by number of trees) 
are plotted. I suspect they are associated with the 3 default values of mtry 
that are used, for example, in the tuneRF method but I'm not sure. Could 
someone confirm?

Also, is it possible to force different values of mtry to be used when creating 
the plots? I specified them explicitly in the randomForest statement but it did 
not seem to have an effect.
Many thanks,
Joe Retzer



Re: [R] Density estimation graphs

2007-03-15 Thread Charilaos Skiadas
On Mar 15, 2007, at 12:37 PM, Mark Wardle wrote:

 Dear all,

 I'm struggling with a plot and would value any help!

 I'm attempting to highlight a histogram and density plot to show a
 proportion of cases above a threshold value. I wanted to cross-hatch the
 area below the density curve. The breaks and bandwidth are deliberate
 integer values because of the type of data I'm looking at.

 I've managed to do this, but I don't think it is very good! It would be
 difficult, for example, to do a cross-hatch using this technique.

Don't know about a cross-hatch, but in general I use polygon for  
highlighting areas like that:

allele.plot <- function(x, threshold=NULL, hatch.col='black',
                        hatch.border=hatch.col, lwd=par('lwd'), ...) {
    h <- hist(x, breaks=max(x), plot=F)
    d <- density(x, bw=1)
    plot(d, lwd=lwd, ...)
    if (!is.null(threshold)) {
        d.t <- d$x > threshold
        d.x <- d$x[d.t]
        d.y <- d$y[d.t]
        polygon(c(d.x[1], d.x, d.x[1]), c(0, d.y, 0), col=hatch.col, lwd=1)
    }
}
# some pretend data
s8 = rnorm(100, 15, 5)
threshold = 19  # an arbitrary cut-off
allele.plot(s8, threshold, hatch.col='grey',hatch.border='black')


Perhaps this can help a bit. Btw, what was d.l for?

Haris Skiadas
Department of Mathematics and Computer Science
Hanover College

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] plot.randomForest default mtry values

2007-03-15 Thread Jim Porzak
Joe,

I'm guessing you are doing a 2-category problem. The three lines are
OOB errors for the overall error rate and for each of the two categories.

There is only one default value of mtry. You can specify a different
mtry when the forest is built (in your call to randomForest()), but it
applies to the entire forest.
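
For example (a hedged sketch assuming a two-class factor response y and a
predictor matrix x):

library(randomForest)
rf <- randomForest(x, y, mtry = 4, ntree = 500)  # one mtry for the whole forest
plot(rf)   # one curve per column of rf$err.rate: OOB plus one per class
legend("topright", legend = colnames(rf$err.rate), lty = 1:3, col = 1:3)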

On 3/15/07, Joseph Retzer [EMAIL PROTECTED] wrote:
 When using the plot.randomForest method, 3 error series (by number of trees) 
 are plotted. I suspect they are associated with the 3 default values of mtry 
 that are used, for example, in the tuneRF method but I'm not sure. Could 
 someone confirm?

 Also, is it possible to force different values of mtry to be used when 
 creating the plots? I specified them explicitly in the randomForest statement 
 but it did not seem to have an effect.
 Many thanks,
 Joe Retzer




-- 
HTH,
Jim Porzak
Loyalty Matrix Inc.
San Francisco, CA
http://www.linkedin.com/in/jimporzak



[R] How to use result of approxfun in a package?

2007-03-15 Thread Steve Buyske
I am working on a project where we start with 2 long, equal-length
vectors, massage them in various ways, and end up with a function
mapping one interval to another. I'll call that function f1. The last
step in R is to generate f1 as the value of the approxfun function. I
would like to put f1 into a package, but without having the package
redo the creation of function f1. The value of approxfun is a
function; for example, I have

  > f1
function (v)
.C("R_approx", as.double(x), as.double(y), as.integer(n), xout = as.double(v),
    as.integer(length(v)), as.integer(method), as.double(yleft),
    as.double(yright), as.double(f), NAOK = TRUE, PACKAGE = "base")$xout
<environment: 0x17719324>

I don't really understand how this is stored, and in particular, how  
I should handle it so as to include the function f1 in a package. I  
would like the users to be able to load the package and use f1  
directly, rather than re-generate it using approxfun.
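
One route I am considering (a sketch only; "mypkg" is a placeholder
package name) is to build f1 once and ship the serialized object in the
package's data directory, since save() keeps a closure together with its
environment:

x <- seq(0, 1, length = 100)   # stand-ins for our massaged vectors
y <- sin(2 * pi * x)
f1 <- approxfun(x, y)          # x, y, etc. are captured in f1's environment
save(f1, file = "mypkg/data/f1.rda")
# users would then do: library(mypkg); data(f1); f1(0.25)

Is that a reasonable approach, or does the captured environment cause
trouble when serialized?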

Thanks,
Steve
---
Steve Buyske (rhymes with nice key)
Associate Research Professor
Department of Statistics & Biostatistics
Rutgers University
Hill Center, 110 Frelinghuysen Rd
Piscataway, NJ 08854
[EMAIL PROTECTED]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Calculating Deviance from log-likelihood

2007-03-15 Thread Benjamin Dickgiesser

Hi,

This is a bit unrelated to R; I want, however, to use the results from
this in an R function.

I am trying to figure out a formula for the deviance given a
likelihood function. In my derivation I end up with ln(y-1) appearing
twice in my expression for the deviance (see attached pdf), which
makes it impossible to calculate a value for the deviance, since y can
be <= 1.

I know this expression can be simplified differently, avoiding ln(y-1).
Does someone have a suggestion?

I would appreciate your help very much!

Thank you,
Benjamin


deviance.pdf
Description: Adobe PDF document
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] logistic regression

2007-03-15 Thread Milton Cezar Ribeiro
Dear All,

I would like to fit the following presence/absence data and find its R2:

x <- 1:10
y <- c(0,0,0,0,0,0,0,1,1,1)

I tried to use clogit (from the survival package) but it didn't work.

Any idea?

miltinho

__


[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] vars :VARMA, multivariate time series?

2007-03-15 Thread sj
I have a multivariate time series, and I would like to build a forecasting
model with both AR and MA terms; I think this is possible in R. I have
looked at the vars package, and it looks like it is possible to estimate MA
terms using the Phi and Psi functions, but I am not sure how to incorporate
the estimated terms into a forecasting model. I have also looked at the dse
package, but have not been able to determine from the documentation how to
estimate a VARMA. Any direction on which packages to use for VARMA would be
appreciated.

thanks,

Spencer

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] plot.randomForest default mtry values

2007-03-15 Thread Wensui Liu
Joe,
here is a piece of junk copied from my blog, showing how to use mtry and hth.
library(MASS);
library(randomForest);
data(Boston);

set.seed(2007);

# SEARCH FOR BEST VALUE OF MTRY FOR RANDOM FORESTS
mtry <- tuneRF(Boston[, -14], Boston[, 14], mtryStart = 1,
               stepFactor = 2, ntreeTry = 500, improve = 0.01);
best.m <- mtry[mtry[, 2] == min(mtry[, 2]), 1];

# FIT A RF MODEL
rf <- randomForest(medv ~ ., data = Boston, mtry = best.m, ntree = 1000,
                   importance = TRUE);

# EXTRACT VARIABLE IMPORTANCE
imp.tmp <- importance(rf, type = 1);
rf.imp <- imp.tmp[order(imp.tmp[, 1], decreasing = TRUE), ];
par(mar = c(3, 0, 4, 0));
barplot(rf.imp, col = gray(0:(ncol(Boston) - 1)/(ncol(Boston) - 1)),
        names.arg = names(rf.imp), yaxt = "n", cex.names = 1);
title(main = list("Importance Rank of Predictors", font = 4, cex = 1.5));

# PLOT PARTIAL DEPENDENCE OF EACH PREDICTOR
par(mfrow = c(3, 5), mar = c(2, 2, 2, 2), pty = "s");
for (i in 1:(ncol(Boston) - 1))
  {
    partialPlot(rf, Boston, names(Boston)[i], xlab = names(Boston)[i],
                main = NULL);
  }

On 3/15/07, Joseph Retzer [EMAIL PROTECTED] wrote:
 When using the plot.randomForest method, 3 error series (by number of trees) 
 are plotted. I suspect they are associated with the 3 default values of mtry 
 that are used, for example, in the tuneRF method but I'm not sure. Could 
 someone confirm?

 Also, is it possible to force different values of mtry to be used when 
 creating the plots? I specified them explicitly in the randomForest statement 
 but it did not seem to have an effect.
 Many thanks,
 Joe Retzer

 [[alternative HTML version deleted]]

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
WenSui Liu
A lousy statistician who happens to know a little programming
(http://spaces.msn.com/statcompute/blog)

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Cannot allocate vector size of... ?

2007-03-15 Thread Wingfield, Jerad G.

Oops. Yep, I totally forgot my specs and such. I'm currently running
R-2.4.1 on a 64-bit Linux box (Fedora Core 6) with 4GB of RAM. The files
are 10-50Kb on average, but this error came about when only working with
~16,000 of them. The final size of the corpus is ~1.7M files. So,
obviously, this memory thing is going to be a large issue for me.

I'm going back through the help-list archives, and now it looks
like I have S Poetry to read as well.

Thanks for all the suggestions. Any others are greatly appreciated as
well.

Gabe Wingfield
IT and Program Specialist I
Center for Applied Social Research
University of Oklahoma
2 Partners Place
3100 Monitor, Suite 100
Norman, OK 73072

-Original Message-
From: Patrick Burns [mailto:[EMAIL PROTECTED] 
Sent: Thursday, March 15, 2007 12:31 PM
To: Wingfield, Jerad G.
Subject: Re: [R] Cannot allocate vector size of... ?

You can find a few things not to do (things that waste memory)
in S Poetry.  You don't say how much memory your machine
has, nor how big your objects are.  However, it is possible that
getting more memory for your machine might be the best thing
to do.
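
For a first look at where the memory is going, base R alone can tell you
a fair amount (a generic sketch, nothing specific to your packages):

gc()    # forces a collection; reports current and peak ("max used") memory
# sizes of the five largest objects in the workspace:
rev(sort(sapply(ls(), function(nm) object.size(get(nm)))))[1:5]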


Patrick Burns
[EMAIL PROTECTED]
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of S Poetry and A Guide for the Unwilling S User)

Wingfield, Jerad G. wrote:

Hello all, 

 

I've been working with R & Fridolin Wild's lsa package a bit over the
past few months, but I'm still pretty much a novice. I have a lot of
files that I want to use to create a semantic space. When I begin to run
the initial textmatrix(), it runs for about 3-4 hours and eventually
gives me an error. It's always "ERROR: cannot allocate vector size of
xxx Kb". I imagine this might be my computer running out of memory, but
I'm not sure. So I thought I would send this to the community at large
for any help/thoughts.

I searched the archives and didn't really find anything that
specifically speaks to my situation. So I guess I have a few questions.
First, is this actually an issue with the machine running out of memory?
If not, what might be the cause of the error? If so, is there a way to
minimize the amount of memory used by the vector data structures (e.g.,
Berkeley DB)?

 

Thanks,

Gabe Wingfield

IT and Program Specialist I

Center for Applied Social Research

University of Oklahoma

2 Partners Place

3100 Monitor, Suite 100

Norman, OK 73072


   [[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


  


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] logistic regression

2007-03-15 Thread Ted Harding
On 15-Mar-07 17:03:50, Milton Cezar Ribeiro wrote:
 Dear All,
 
 I would like to fit the following presence/absence data and find
 its R2:
 
 x <- 1:10
 y <- c(0,0,0,0,0,0,0,1,1,1)
 
 I tried to use clogit (from the survival package) but it didn't work.
 
 Any idea?
 
 miltinho

You are trying to fit an equation

  P[y = 1 ; x] = exp((x-a)/b)/(1 + exp((x-a)/b))

to data

  x =   1   2   3   4   5   6   7   8   9  10

  y =   0   0   0   0   0   0   0   1   1   1

by what amounts to a maximum-likelihood method, i.e. which
chooses the parameter values to maximize the probability of
the observed values of y (given the values of x).

The maximum probability possible is 1, so if you can find
parameters which make P[y = 1] = 0 for x = 1, 2, ... , 7
and P[y = 1] = 1 for x = 8, 9, 10 then you have done it.

This will be approximated as closely as you please for any
value of a between 7 and 8, and sufficiently small values of b,
since for such parameter values P[y = 1 ; x] -> 0 for x < a,
and -> 1 for x > a.

You therefore have a solution which is both indeterminate
(any a such that 7 < a < 8) and singular (b -> 0). So it
will defeat standard estimation methods.
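
You can see this directly in R (a quick illustration):

x <- 1:10
y <- c(0,0,0,0,0,0,0,1,1,1)
fit <- glm(y ~ x, family = binomial)
# glm warns, along the lines of "fitted probabilities numerically
# 0 or 1 occurred", and the slope and its standard error blow up.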

That is the source of your problem. In a more general context,
this is an instance of the linear separation problem in
logistic regression (and similar methods, such as probit
analysis). Basically, this situation implies that, according
to the data, there is a perfect prediction for the results.

There is no well-defined way of dealing with it; any approach
starts from the proposition "this perfect prediction is not
a reasonable result in the context of my data", and continues
by following up what you think should be meant by "not a
reasonable result". What this is likely to mean would be on
the lines of "b should not be that small", which then imposes
upon you the need to be more specific about how small b may
reasonably be. Then carry on from there (perhaps by fixing
the value of b at different reasonable levels, and simply
fitting a for each value of b).

Hoping this helps ... but I'm wondering how it happens that
you have such data ... ??

best wishes,
Ted.


E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 15-Mar-07   Time: 19:38:51
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to...

2007-03-15 Thread Petr Pikal
Hi

I suppose you will not get a useful response to such a poorly specified
question.

For automating procedures on data frames you can either do looping or
use lapply; maybe do.call can also provide some functionality. A sketch
follows.
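
For instance (a sketch only; 'mydata' and 'group' are made-up names,
assuming the grouping factor is in the first column and the responses
in the rest):

responses <- names(mydata)[-1]
results <- do.call(rbind, lapply(responses, function(v) {
    fit <- aov(mydata[[v]] ~ mydata$group)
    p <- summary(fit)[[1]][["Pr(>F)"]][1]     # p-value for the factor
    data.frame(variable = v, p.value = p)
}))
write.csv(results, "anova_results.csv")       # easy to export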

If you elaborate on what you did and in what respect it was
unsatisfactory, maybe you will get a better answer.

Anyway, before your next post you should look at the posting guide.

Regards
Petr



On 15 Mar 2007 at 17:20, [EMAIL PROTECTED] wrote:

Date sent:  Thu, 15 Mar 2007 17:20:57 +0100
From:   [EMAIL PROTECTED] [EMAIL PROTECTED]
To: R Help R-help@stat.math.ethz.ch
Subject:[R] how to...

 I have to perform ANOVAs on many different data sets organized in a
 dataframe. I can run an ANOVA for each sample, but I've got hundreds
 of data sets and I would like to avoid manually carrying out each test.
 In addition, I would like to have the results organized in a simple
 way, for example in a table, which would be easy to export. Thank you
 for your assistance
 
 simone 
 
 
 --
 Leggi GRATIS le tue mail con il telefonino i-mode  di Wind
 http://i-mode.wind.it
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html and provide commented,
 minimal, self-contained, reproducible code.

Petr Pikal
[EMAIL PROTECTED]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] logistic regression TRY LOGISTF

2007-03-15 Thread Jeff Miller
If Ted is right, then one work-around is to use Firth's method for penalized
log-likelihood. The technique was originally intended to reduce small-sample
bias. However, it's now being extended to deal with complete and
quasi-separation problems.

I believe the library is called logistf, but I haven't had a chance to try
it... I know the SAS version (called the fl macro) works fine.

Reference --
http://www.meduniwien.ac.at/user/georg.heinze/techreps/tr2_2004.pdf
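
Untested by me, but from the package docs I'd expect usage along these
lines (a sketch, not verified):

library(logistf)              # Firth-penalized logistic regression
x <- 1:10
y <- c(0,0,0,0,0,0,0,1,1,1)
fit <- logistf(y ~ x)
summary(fit)                  # finite estimates even under complete separation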


Hope this helps,

Jeff Miller
University of Florida
AlphaPoint05, Inc.


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Ted Harding
Sent: Thursday, March 15, 2007 2:39 PM
To: R-help
Subject: Re: [R] logistic regression

On 15-Mar-07 17:03:50, Milton Cezar Ribeiro wrote:
 Dear All,
 
 I would like to fit the following presence/absence data and find
 its R2:
 
 x <- 1:10
 y <- c(0,0,0,0,0,0,0,1,1,1)
 
 I tried to use clogit (from the survival package) but it didn't work.
 
 Any idea?
 
 miltinho

You are trying to fit an equation

  P[y = 1 ; x] = exp((x-a)/b)/(1 + exp((x-a)/b))

to data

  x =   1   2   3   4   5   6   7   8   9  10

  y =   0   0   0   0   0   0   0   1   1   1

by what amounts to a maximum-likelihood method, i.e. which chooses the
parameter values to maximize the probability of the observed values of y
(given the values of x).

The maximum probability possible is 1, so if you can find parameters which
make P[y = 1] = 0 for x = 1, 2, ... , 7 and P[y = 1] = 1 for x = 8, 9, 10
then you have done it.

This will be approximated as closely as you please for any value of a
between 7 and 8, and sufficiently small values of b, since for such
parameter values P[y = 1 ; x] -> 0 for x < a, and -> 1 for x > a.

You therefore have a solution which is both indeterminate (any a such that
7 < a < 8) and singular (b -> 0). So it will defeat standard estimation
methods.

That is the source of your problem. In a more general context, this is an
instance of the linear separation problem in logistic regression (and
similar methods, such as probit analysis). Basically, this situation implies
that, according to the data, there is a perfect prediction for the results.

There is no well-defined way of dealing with it; any approach starts from
the proposition "this perfect prediction is not a reasonable result in the
context of my data", and continues by following up what you think should be
meant by "not a reasonable result". What this is likely to mean would be on
the lines of "b should not be that small", which then imposes upon you the
need to be more specific about how small b may reasonably be. Then carry on
from there (perhaps by fixing the value of b at different reasonable levels,
and simply fitting a for each value of b).

Hoping this helps ... but I'm wondering how it happens that you have such
data ... ??

best wishes,
Ted.


E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 15-Mar-07   Time: 19:38:51
-- XFMail --

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] unbiased sandwich variance estimator for GEE estimates

2007-03-15 Thread Qiong Yang
Hi,

Does anyone know of an existing package/program that has implemented the
unbiased (or bias-reduced) sandwich variance estimator (Wu (1986) and
Carroll (2001)) for GEE estimates?

Thanks
Qiong

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Model selection in LASSO (cross-validation)

2007-03-15 Thread WQ Y
Hi, I know how to use LASSO for model selection based on the Cp criterion.
I heard that we can also use cross-validation as a criterion too.  I used
cv.lars to give me the fraction with the lowest predicted error.  But I'm
short of a step to arrive at the number of variables to be included in the
final model.  How do we do that?  Is it the predict.lars function?  I tried
logprostate.plars.cv = predict.lars(logprostate.lars.cv, M, type = "fit",
mode = "fraction") but it gives me the error message:
Error in dim(data) <- dim : attempt to set an attribute on NULL.  Please
help!
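
(From the docs I suspect cv.lars() returns only the CV curve, not a
fitted lars object, so predict.lars() would need the lars() fit itself.
A sketch of what I think is intended, with X and yvec as placeholders
for my predictors and response:

obj <- lars(X, yvec)                    # the actual fit
cv  <- cv.lars(X, yvec)                 # CV error at each fraction
s   <- cv$fraction[which.min(cv$cv)]    # fraction with smallest CV error
coef(obj, s = s, mode = "fraction")     # nonzero coefficients = selected

Is that right?)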

thanks!

_
With tax season right around the corner, make sure to follow these few 
simple tips.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Covariance matrix calc method question

2007-03-15 Thread Leeds, Mark \(IED\)
I have been comparing the output of an R package to S+Finmetrics, and I
notice that the covariance matrix output by the two procedures is
different. The R package computes the covariance matrix using Method 1,
and I think (but I'm not sure) that S+Finmetrics computes it using
Method 2.

I put a correction factor (see below) into Method 2 in order to deal
with the fact that the var function calculates the unbiased estimate of
the variance. This gives the same answer in both methods for the data
shown, which I just made up. But my hope is that, for much larger
problems, leaving the correction factor out can cause big differences
in the determinant; that would explain why some of the output (AIC, BIC)
differs in the two packages. I'm not really sure how to check this
easily, but if someone has an idea it would be appreciated. Basically, I
want to run some sort of simulation along the lines of the code below to
check whether this could be the reason for the differences. Thanks.

x <- c(11,12,13,14,16)
y <- c(2,4,6,8,12)
z <- c(14,18,22,50,20)
LHS <- cbind(x,y)
sample <- nrow(LHS)   # sample size ('sample' masks the base function of the same name)

# METHOD 1: ML estimate from the multivariate lm residuals

output1 <- lm(LHS ~ z)
resids <- resid(output1)
sigma.hat1 <- crossprod(resids)/sample
print(sigma.hat1)

print(det(sigma.hat1))

# METHOD 2: per-equation fits; var() rescaled from the unbiased (n-1)
# divisor back to n

fit1 <- lm(LHS[,1] ~ z)
fit2 <- lm(LHS[,2] ~ z)

correctionfactor <- (sample - 1)/sample

sigma.hat2 <- correctionfactor*var(cbind(resid(fit1), resid(fit2)))

print(sigma.hat2)
print(det(sigma.hat2))
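
# A check on my own reasoning (worth verifying): det(c*A) = c^p * det(A)
# for a p-column response, so dropping the correction scales the
# determinant by exactly (sample/(sample-1))^p and the log-determinant
# (hence AIC/BIC) by p*log(sample/(sample-1)):
print(det(sigma.hat2)/det(sigma.hat1))    # should be 1 with the correction
print((sample/(sample - 1))^ncol(LHS))    # the factor without it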

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] expm() within the Matrix package

2007-03-15 Thread Gad Abraham
Laura Hill wrote:
 Hi
 
 Could anybody give me a bit of advice on some code I'm having trouble with?
 
 I've been trying to calculate the loglikelihood of a function iterated over
 a data of time values and I seem to be experiencing difficulty when I use
 the function expm(). Here's an example of what I am trying to do
 
 
 y <- c(5,10)                  # vector of 2 survival times
 
 p <- Matrix(c(1,0),1,2)       # 1x2 matrix
 
 Q <- Matrix(c(1,2,3,4),2,2)   # 2x2 matrix
 
 q <- Matrix(c(1,2),2,1)       # 2x1 matrix
 
 Loglik <- rep(0,length(y))    # creating empty vector of same length as y
 
 for(i in 1:length(y)){
 
 Loglik[i] <- log((p %*% (expm(Q*y[i]))) %*% q)   # calculating
                                                  # loglikelihood for each y[i]
 }
 
 The code works perfectly if I use exp(Q*y[i]) but not for expm()

You need to do a type conversion here, because you get the loglik as a 
1x1 Matrix, instead of a scalar:

(after running your code)

  > log(p %*% expm(Q * y[i]) %*% q)
1 x 1 Matrix of class "dgeMatrix"
         [,1]
[1,] 134.5565


If you convert to numeric, you can then assign it to Loglik:
  > Loglik[1] <- as.numeric(log(p %*% expm(Q * y[i]) %*% q))
  > Loglik[1]
[1] 134.5565



-- 
Gad Abraham
Department of Mathematics and Statistics
The University of Melbourne
Parkville 3010, Victoria, Australia
email: [EMAIL PROTECTED]
web: http://www.ms.unimelb.edu.au/~gabraham

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] replacing all NA's in a dataframe with zeros...

2007-03-15 Thread David L. Van Brunt, Ph.D.
Thanks, one and all... I knew it had to be simple.

On 3/14/07, Jason Barnhart [EMAIL PROTECTED] wrote:

 This should work.

 > test.df <- data.frame(x1=c(NA,2,3,NA), x2=c(1,2,3,4),
 +                       x3=c(1,NA,NA,4))
 > test.df
   x1 x2 x3
 1 NA  1  1
 2  2  2 NA
 3  3  3 NA
 4 NA  4  4

 > test.df[is.na(test.df)] <- 1000

 > test.df
     x1 x2   x3
 1 1000  1    1
 2    2  2 1000
 3    3  3 1000
 4 1000  4    4



 The following search string, "cran r replace data.frame NA", in Google
 (as a US user) yielded some good results (5th and 7th entries), but there
 was another example that explicitly showed this technique.  I can't
 seem to recall my exact search string.


 - Original Message -
 From: David L. Van Brunt, Ph.D. [EMAIL PROTECTED]
 To: R-Help List r-help@stat.math.ethz.ch
 Sent: Wednesday, March 14, 2007 5:22 PM
 Subject: [R] replacing all NA's in a dataframe with zeros...


  I've seen how to replace the NA's in a single column of a data
  frame:
 
  mydata$ncigs[is.na(mydata$ncigs)] <- 0
 
  But this is just one column... I have thousands of columns (!) that
  I need to do this to, and I can't figure out a way, outside of the
  dreaded loop, to replace all NA's in an entire data frame (all vars)
  without naming each var separately. Yikes.
 
  I'm racking my brain on this; it seems like I must be staring at the
  obvious, but it eludes me. Searches have come up CLOSE, but not quite
  what I need..
 
  Any pointers?
 
  --
  ---
  David L. Van Brunt, Ph.D.
  mailto:[EMAIL PROTECTED]
 
  If Tyranny and Oppression come to this land, it will be in the
  guise of
  fighting a foreign enemy.
  --James Madison
 
  [[alternative HTML version deleted]]
 
  __
  R-help@stat.math.ethz.ch mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 




-- 
---
David L. Van Brunt, Ph.D.
mailto:[EMAIL PROTECTED]

If Tyranny and Oppression come to this land, it will be in the guise of
fighting a foreign enemy.
--James Madison

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] multiple scores per subject

2007-03-15 Thread Christopher Brown
Hi,

I have a data set that looks like this:

  data
vara varb S  PC
1  None  250 1  80
2  None  250 1  70
3  Some  250 1  60
4  Some  250 1  70
5  None 1000 1  90
6  None 1000 1  90
7  Some 1000 1  80
8  Some 1000 1  70
9  None  250 2 100
10 None  250 2  80
11 Some  250 2  70
12 Some  250 2  70
13 None 1000 2 100
14 None 1000 2  90
15 Some 1000 2  50
16 Some 1000 2  40

...

And so on. The last column is the dependent variable, and I have made
the other columns factors. As you can see, there are multiple scores for 
each subject in each combination of conditions. How can I reduce the 
dataset so that there is only 1 score per subject, per condition, for 
further analysis? I can use tapply to get means, but I need a data.frame 
for analysis (aov). Any ideas?

-- 
Chris

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] how to determine the relative distance between a DNA sequence and know genes

2007-03-15 Thread Roger Liu
Hi,

recently, I got a bunch of DNA sequences (with a chromosome coordinate for
each sequence), and I would like to know the relative distance between these
sequences and all the genes (genomic sequences) on the human genome, i.e.,
whether these sequences are located upstream of the genes (5' end),
downstream of the genes (3' end), or within the genes. I have about 8000
sequences. Can any package or method help me? Thanks.

-zrl

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] multiple scores per subject

2007-03-15 Thread jim holtman
Does this do what you want?

> x <- "  vara varb S  PC
+ 1  None  250 1  80
+ 2  None  250 1  70
+ 3  Some  250 1  60
+ 4  Some  250 1  70
+ 5  None 1000 1  90
+ 6  None 1000 1  90
+ 7  Some 1000 1  80
+ 8  Some 1000 1  70
+ 9  None  250 2 100
+ 10 None  250 2  80
+ 11 Some  250 2  70
+ 12 Some  250 2  70
+ 13 None 1000 2 100
+ 14 None 1000 2  90
+ 15 Some 1000 2  50
+ 16 Some 1000 2  40
+ "
> x.in <- read.table(textConnection(x), header=TRUE)
> (x.agg <- aggregate(x.in$PC, list(vara=x.in$vara, varb=x.in$varb,
                                    S=x.in$S), mean))
  vara varb S  x
1 None  250 1 75
2 Some  250 1 65
3 None 1000 1 90
4 Some 1000 1 75
5 None  250 2 90
6 Some  250 2 70
7 None 1000 2 95
8 Some 1000 2 45
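
x.agg is an ordinary data frame, so (if I follow what you're after) it
can feed aov() directly, e.g.:

aov(x ~ vara * varb, data = x.agg)   # one mean score per subject/condition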



On 3/15/07, Christopher Brown [EMAIL PROTECTED] wrote:

 Hi,

 I have a data set that looks like this:

  data
vara varb S  PC
 1  None  250 1  80
 2  None  250 1  70
 3  Some  250 1  60
 4  Some  250 1  70
 5  None 1000 1  90
 6  None 1000 1  90
 7  Some 1000 1  80
 8  Some 1000 1  70
 9  None  250 2 100
 10 None  250 2  80
 11 Some  250 2  70
 12 Some  250 2  70
 13 None 1000 2 100
 14 None 1000 2  90
 15 Some 1000 2  50
 16 Some 1000 2  40

 ...

 And so on. The last column is the dependent variable, and I have made
 the other columns factors. As you can see, there are multiple scores for
 each subject in each combination of conditions. How can I reduce the
 dataset so that there is only 1 score per subject, per condition, for
 further analysis? I can use tapply to get means, but I need a data.frame
 for analysis (aov). Any ideas?

 --
 Chris

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] MANOVA permutation testing

2007-03-15 Thread Tyler Smith
Hi,

I've got a dataset with 7 variables for 8 different species. I'd like
to test the null hypothesis of no difference among species for these
variables. MANOVA seems like the appropriate test, but since I'm
unsure of how well the data fit the assumptions of equal
variance/covariance and multivariate normality, I want to use a
permutation test. 

I've been through CRAN looking at packages boot, bootstrap, coin,
permtest, but they all seem to be doing more than I need. Is the
following code an appropriate way to test my hypothesis:

result.vect <- numeric(1000)     # pre-allocated rather than grown with c()

for (i in 1:1000){
  # refit with species labels permuted; keep the Wilks' lambda statistic
  wilks <- summary.manova(manova(maxent ~ sample(max.spec)),
                          test = "Wilks")$stats[1, 2]
  result.vect[i] <- wilks
}
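
# Then, since a smaller Wilks' lambda means stronger evidence against the
# null, I compare against the unpermuted statistic for a permutation p-value:
obs.wilks <- summary.manova(manova(maxent ~ max.spec),
                            test = "Wilks")$stats[1, 2]
p.value <- mean(result.vect <= obs.wilks)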

maxent is the data, max.spec is a vector of species names. Comparing
the result.vect with the wilks value for the unpermuted data suggests
there are very significant differences among species -- but did I do
this properly?

-- 
Regards,

Tyler Smith

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] How can I place axis annotations away from axis?

2007-03-15 Thread Achiko

Hello Experts

I have the following codes and data for 2 interpolation plots.

http://www.nabble.com/file/7206/3d_plot_data.txt 3d_plot_data.txt 

data <- read.table("3d_plot_data.txt", header=TRUE)
attach(data)

par(mfrow=c(1,2))

library(akima)

interpolation <- interp(rr, veg_r, predict)

persp(interpolation, theta = -45, phi = 30, ticktype = "detailed", nticks=4,
      cex=0.8, expand=0.5, xlab="\n\n\nPrecipitation", ylab="\n\n\nVegetation",
      zlab="\n\n\nDensity", shade=0.4)

interpolation <- interp(tc, veg_r, predict, duplicate="mean")

persp(interpolation, theta = -45, phi = 30, ticktype = "detailed", nticks=4,
      cex=0.8, expand=0.5, xlab="\n\n\nTemperature", ylab="\n\n\nVegetation",
      zlab="\n\n\nDensity", shade=0.4)


Now as you can see, especially when exported as EPS, the axis annotations
at the tick marks overlap with the Z axis. As it's for publication, the
font needs to be this big. I could move the axis labels away from the axes,
but cannot find how to place the axis annotations further from the axes, or
whether it's even possible to adjust the distance between an axis and its
annotation.

Your help is much appreciated! 
-- 
View this message in context: 
http://www.nabble.com/How-can-I-place-axis-annotations-away-from-axis--tf3412293.html#a9507709
Sent from the R help mailing list archive at Nabble.com.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] ARIMA standard error

2007-03-15 Thread Gad Abraham
Hi,

Can anyone explain how the standard error in arima() is calculated?

Also, how can I extract it from the Arima object? I don't see it in there.

  > x <- rnorm(1000)
  > a <- arima(x, order = c(4, 0, 0))
  > a

Call:
arima(x = x, order = c(4, 0, 0))

Coefficients:
         ar1     ar2     ar3      ar4  intercept
      -0.0451  0.0448  0.0139  -0.0688     0.0010
s.e.   0.0316  0.0316  0.0317   0.0316     0.0296

sigma^2 estimated as 0.9775:  log likelihood = -1407.56,  aic = 2827.12
  > names(a)
 [1] "coef"      "sigma2"    "var.coef"  "mask"      "loglik"    "aic"
 [7] "arma"      "residuals" "call"      "series"    "code"      "n.cond"
[13] "model"


Thanks,
Gad

-- 
Gad Abraham
Department of Mathematics and Statistics
The University of Melbourne
Parkville 3010, Victoria, Australia
email: [EMAIL PROTECTED]
web: http://www.ms.unimelb.edu.au/~gabraham

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] ARIMA standard error

2007-03-15 Thread Andrew Robinson
Hi Gad,

try:


> class(a)
[1] "Arima"
> getAnywhere(print.Arima)

...

Cheers,

Andrew


On Fri, Mar 16, 2007 at 01:47:25PM +1100, Gad Abraham wrote:
 Hi,
 
 Can anyone explain how the standard error in arima() is calculated?
 
 Also, how can I extract it from the Arima object? I don't see it in there.
 
  > x <- rnorm(1000)
  > a <- arima(x, order = c(4, 0, 0))
  > a
 
 Call:
 arima(x = x, order = c(4, 0, 0))
 
 Coefficients:
          ar1     ar2     ar3      ar4  intercept
       -0.0451  0.0448  0.0139  -0.0688     0.0010
 s.e.   0.0316  0.0316  0.0317   0.0316     0.0296
 
 sigma^2 estimated as 0.9775:  log likelihood = -1407.56,  aic = 2827.12
  > names(a)
  [1] "coef"      "sigma2"    "var.coef"  "mask"      "loglik"    "aic"
  [7] "arma"      "residuals" "call"      "series"    "code"      "n.cond"
 [13] "model"
 
 
 Thanks,
 Gad
 
 -- 
 Gad Abraham
 Department of Mathematics and Statistics
 The University of Melbourne
 Parkville 3010, Victoria, Australia
 email: [EMAIL PROTECTED]
 web: http://www.ms.unimelb.edu.au/~gabraham
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
Andrew Robinson  
Department of Mathematics and StatisticsTel: +61-3-8344-9763
University of Melbourne, VIC 3010 Australia Fax: +61-3-8344-4599
http://www.ms.unimelb.edu.au/~andrewpr
http://blogs.mbs.edu/fishing-in-the-bay/

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] ARIMA standard error

2007-03-15 Thread Gad Abraham
Andrew Robinson wrote:
 Hi Gad,
 
 try:
 
 
 class(a)
 [1] Arima
 getAnywhere(print.Arima)

Thanks Andrew.

For the record, the standard error is the square root of the diagonal of 
the covariance matrix a$var.coef (itself obtained through some magic):

ses[x$mask] <- round(sqrt(diag(x$var.coef)), digits = digits)
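
So, outside of print.Arima, they can be pulled straight from the object:

se <- sqrt(diag(a$var.coef))   # same numbers as the s.e. row in the printout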


Cheers,
Gad

-- 
Gad Abraham
Department of Mathematics and Statistics
The University of Melbourne
Parkville 3010, Victoria, Australia
email: [EMAIL PROTECTED]
web: http://www.ms.unimelb.edu.au/~gabraham

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Complex superscript plot text

2007-03-15 Thread Bob Farmer
Hi all:

I would like to create a line of plot margin text (using mtext()) that
features both a superscript and a subset of an object. However, I
cannot seem to do both things at once, nor have I found a way to paste
the two results together.

(I pull the object subset because this is part of a for-next loop to 
make multiple graphs)

This example doesn't give me what I want from mtext():

GoodnessOfFits <- c(1, 0.75)

graphNumber <- 1  # first loop of the for-next loop (not shown)

x <- seq(-10, 10, 1)
y <- x^2
plot(x, y)
lines(x, predict(lm(y ~ I(x^2))))
mtext(text =
    expression(R^2 * ": " * GoodnessOfFits[graphNumber]))
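
(The closest I've come, though I haven't verified it on my real graphs,
is substituting the value in with bquote(), where .() forces evaluation
before the expression is built:

mtext(bquote(R^2 * ": " * .(GoodnessOfFits[graphNumber])))

but maybe there is a cleaner way.)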


I recognize that in this example, I could extract the R-squared value 
from the model in each loop, however this does not apply to my more 
complicated real scenario.

Any suggestions?

Thanks.
--Bob Farmer

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] error code 5 from Lapack routine 'dsyevr'

2007-03-15 Thread Richard D. Morey
While using the rmvnorm function, I get the error:

Error in eigen(sigma, sym = TRUE) : error code 5 from Lapack routine 
'dsyevr'

The same thing happens when I try the eigen() function on my covariance 
matrix. The matrix is a symmetric 111x111 matrix. Well, it is almost 
symmetric; there are slight deviations from symmetry (the largest is 
3e-18).  I have this in an MCMC loop, and it happens about once in every 
40 iterations or so.
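
(One workaround I have seen suggested, which I could try, is forcing
exact symmetry before the decomposition:

sigma <- (sigma + t(sigma))/2   # symmetrize, then call eigen()/rmvnorm()

but I would still like to understand the error itself.)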

What does this error mean?

Thanks.


-- 
Richard D. Morey, M.A.
Research Assistant, Perception and Cognition Lab
University of Missouri-Columbia

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.