[R] Finding matches in 2 files

2007-07-26 Thread jenny tan


I have 2 files containing data analysed by 2 different methods. I would like to 
find out which genes appear in both analyses. Can someone show me how to do 
this?

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] qda(MASS) function error

2007-07-26 Thread Uwe Ligges


Mauro Rossi wrote:
 Dear R user,
  I'm using qda (quadratic discriminant analysis) function (package MASS) 
 to classify 58 explanatory variables (numeric type with different 
 ranges) using a grouping variable (factor 2 levels 0 1). I'm using 
 the qda method for class 'data.frame' (in this way I don't need to 
 specify a formula).
 Using the call:
 result.qda <- qda(explanatory.variables, grouping.variable, method = "moment")
 I obtain the following error message:
 Error in qda.default(x, grouping, ...) : rank deficiency in group 0
 I ran the script excluding some variables and identified 2 
 explanatory variables that cause the problem, but I don't understand why. 
 The two excluded variables are numeric with two possible 
 values, 0 and 1, but the remaining variables include some similar 
 ones that are accepted.
 
 I don't have this problem using lda  function for linear discriminant 
 analysis.
 
 What does this error message mean?
 What types of variables does qda function consider?

Well, qda assumes real values (and not factors) in the explanatory 
variables. If you think it makes sense to ignore this assumption (and I 
doubt it makes sense), then the error message tells you there is a rank 
deficiency, i.e. some variables might be collinear.
Hence at least one of the covariance matrices cannot be inverted.
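
A minimal sketch of how this can arise, on simulated data (the variable names below are made up for illustration and are not Mauro's data): a 0/1 variable that happens to be constant within one group has zero variance there, so that group's covariance matrix is singular. lda pools the covariance over groups, which is one plausible reason why the same data can still go through lda.

```r
library(MASS)

set.seed(1)
n   <- 40
grp <- factor(rep(c(0, 1), each = n))

# Hypothetical predictors: two continuous ones and a 0/1 flag that is
# constant (all zero) within group 0 but varies within group 1
X <- data.frame(x1   = rnorm(2 * n),
                x2   = rnorm(2 * n),
                flag = c(rep(0, n), rbinom(n, 1, 0.5)))

# qda estimates one covariance matrix per group, so it fails on group 0
res <- tryCatch(qda(X, grp), error = function(e) conditionMessage(e))
print(res)

# Rank of the centred data within each group; a value below ncol(X)
# points at the group that triggers the error
ranks <- sapply(split(X, grp), function(d) qr(scale(d, scale = FALSE))$rank)
print(ranks)

# lda uses a pooled covariance matrix and accepts the same data
fit.lda <- lda(X, grp)
```

Checking the within-group ranks like this is one way to find which variables to drop or recode before calling qda.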

Uwe Ligges


 Thanks in advance,
 Mauro Rossi
 
 
 
 


Re: [R] Finding matches in 2 files

2007-07-26 Thread Christophe Pallier
Maybe with 'merge', but your message is too vague (see
http://www.catb.org/~esr/faqs/smart-questions.html).

On 7/26/07, jenny tan [EMAIL PROTECTED] wrote:



 I have 2 files containing data analysed by 2 different methods. I would
 like to find out which genes appear in both analyses. Can someone show me
 how to do this?




-- 
Christophe Pallier (http://www.pallier.org)



Re: [R] For loops

2007-07-26 Thread Patrick Burns
Any time you are calling a function one value at a time,
it is worth asking if you can eliminate a loop (or more).

If 'G.fun' is vectorized in its first argument, then you can
easily get rid of the three inner loops.  Just generate a
vector of all of the values and do:

gj <- sum(G.fun(long.vector, ff))

If 'G.fun' is not vectorized and can't be vectorized, then
you might save some time by still creating a vector of the
first argument first.  Whether that will be a significant
reduction depends on how time consuming 'G.fun' is.

There is a caveat.  If 'n' is large, then you could create a
vector that strains the amount of memory (RAM) that you
have.  If that is the case, then there will be some compromise
between loops and vectorization that will be optimal.
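
As a rough sketch of the idea above, assuming 'G.fun' is vectorized in its first argument: the three inner loops only ever see sums of one term from each of three vectors, so all n^3 sums can be generated once with outer() and reused for every i. The data, the coefficients and the G.fun below are hypothetical stand-ins, not the poster's objects.

```r
# Hypothetical stand-ins for the poster's objects
set.seed(42)
n  <- 6
X  <- matrix(rnorm(n * 5), nrow = n)
p  <- c(0.5, -1, 0.3, 0.8, -0.2)
XB <- as.vector(X[, 1:2] %*% p[1:2])
ff <- 2
G.fun <- function(z, ff) pnorm(z / ff)   # vectorized in its first argument

# Reference: the original four nested loops
G.loop <- rep(NA_real_, n)
for (i in 1:n) {
  gj <- 0
  for (j in 1:n) for (l in 1:n) for (m in 1:n)
    gj <- gj + G.fun(XB[i] + p[3]*X[j,3] + p[4]*X[l,4] + p[5]*X[m,5], ff)
  G.loop[i] <- gj / n^3
}

# Vectorized: build the n^3 inner sums once, then one call to G.fun per i
long.vector <- as.vector(outer(outer(p[3]*X[,3], p[4]*X[,4], "+"),
                               p[5]*X[,5], "+"))
G.vec <- sapply(XB, function(xb) sum(G.fun(xb + long.vector, ff)) / n^3)

print(all.equal(G.loop, G.vec))
```

Note that long.vector has n^3 elements, which is where the memory caveat above comes in.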

Patrick Burns
[EMAIL PROTECTED]
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of S Poetry and A Guide for the Unwilling S User)

Joaquim J. S. Ramalho wrote:

Hi,

is there a way of simplifying the following code:

G <- rep(NA, n)

for (i in 1:n)
{
  gj <- 0
  for (j in 1:n)
  {
    for (l in 1:n)
    {
      for (m in 1:n)
      {
        gj <- gj + G.fun(XB[i] + p[3]*X[j,3] + p[4]*X[l,4] + p[5]*X[m,5], ff)
      }
    }
  }
  G[i] <- gj/n^3
}

Thanks.

Joaquim Santos



Re: [R] ROC curve in R

2007-07-26 Thread Tobias Sing
You might also want to try the ROCR package (http://rocr.bioinf.mpi-sb.mpg.de/).
Tutorial slides: http://rocr.bioinf.mpi-sb.mpg.de/ROCR_Talk_Tobias_Sing.ppt
Overview paper:
http://bioinformatics.oxfordjournals.org/cgi/content/full/21/20/3940

Good luck,
  Tobias


On 7/26/07, Rithesh M. Mohan [EMAIL PROTECTED] wrote:
 Hi,



 I need to build ROC curve in R, can you please provide data steps / code
 or guide me through it.



 Thanks and Regards

 Rithesh M Mohan





-- 
Tobias Sing
Computational Biology and Applied Algorithmics
Max Planck Institute for Informatics
Saarbrucken, Germany
Phone: +49 681 9325 315
Fax: +49 681 9325 399
http://www.tobiassing.net



[R] Submatrices Extraction

2007-07-26 Thread Bruno C\.
Hello,
Given a matrix containing only 0s and 1s,
I need to extract the indexes of all the diagonal submatrices,
i.e. for each submatrix one of the two diagonals must contain only 1s.
Any help?

Thanks in advance
Bruno




[R] Regression with Missing values. na.action?

2007-07-26 Thread Vaibhav Gathibandhe
Hi all,

Can you please tell me what the problem is here?

My regression equation is  y = B0 + B1*X1 + B2*X2 + e
and I am interested in the coefficient B1.

I am running the regression in two ways:

1) reg <- lm(y ~ X1 + X2, sam), where sam is the data

2) reg <- lm(y ~ X1 + X2, sam, na.action = na.exclude). I have missing values
in X1.


But the value of the coefficient is not consistent between the two cases.

Actually, B1 in case 1 should be smaller than B1 in case 2, but sometimes it
comes out greater.

I can't figure it out. Is there some problem with na.action? My sample
size is 100.


Regards,
Vaibhav



Re: [R] Regression with Missing values. na.action?

2007-07-26 Thread David Barron
na.exclude should give the same results as na.omit, which is the
default na.action.  Is the number of complete cases the same in these
two regressions?
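
A small simulated check of this point (the data frame below is made up and only mimics the poster's setup of 100 rows with some NAs in X1): both calls fit on the same complete cases, so the coefficients agree, and the only visible difference is that na.exclude pads residuals and fitted values back to the original length.

```r
set.seed(1)
# Hypothetical data: 100 rows, 10 missing values in X1
sam <- data.frame(X1 = rnorm(100), X2 = rnorm(100))
sam$y <- 1 + 2 * sam$X1 - sam$X2 + rnorm(100)
sam$X1[sample(100, 10)] <- NA

fit.omit    <- lm(y ~ X1 + X2, data = sam)   # uses the default na.action
fit.exclude <- lm(y ~ X1 + X2, data = sam, na.action = na.exclude)

# Same complete cases, hence the same estimates
print(all.equal(coef(fit.omit), coef(fit.exclude)))

# The difference shows up only in the length of the residuals
n.omit    <- length(residuals(fit.omit))     # complete cases only
n.exclude <- length(residuals(fit.exclude))  # padded with NA
print(c(n.omit, n.exclude))
```

If the coefficients differ between two runs, it is worth checking that both calls really use the same data frame.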

On 26/07/07, Vaibhav Gathibandhe [EMAIL PROTECTED] wrote:
 Hi all,

 Can you please tell me what the problem is here?

 My regression equation is  y = B0 + B1*X1 + B2*X2 + e
 and I am interested in the coefficient B1.

 I am running the regression in two ways:

 1) reg <- lm(y ~ X1 + X2, sam), where sam is the data

 2) reg <- lm(y ~ X1 + X2, sam, na.action = na.exclude). I have missing values
 in X1.


 But the value of the coefficient is not consistent between the two cases.

 Actually, B1 in case 1 should be smaller than B1 in case 2, but sometimes it
 comes out greater.

 I can't figure it out. Is there some problem with na.action? My sample
 size is 100.


 Regards,
 Vaibhav




-- 
=
David Barron
Said Business School
University of Oxford
Park End Street
Oxford OX1 1HP



Re: [R] Finding matches in 2 files

2007-07-26 Thread john seers \(IFR\)
 

Something like:

# Sample data
g1 <- c("gene1", "gene2", "gene3", "gene4", "gene5", "gene9", "gene10",
        "geneA")
g2 <- c("gene6", "gene9", "gene1", "gene2", "gene7", "gene8", "gene9",
        "gene1", "gene10")
# data.frame() rather than cbind(), so that expr stays numeric
df1 <- data.frame(gene = g1, expr = runif(length(g1)))
df2 <- data.frame(gene = g2, expr = runif(length(g2)))

# Merge
mdf <- merge(df1, df2, by = "gene", sort = TRUE)
# Unique list
ug <- unique(mdf[, "gene"])


You may find the match command useful and/or the %in% operator.
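
For illustration, here is how match and %in% behave on the same sample vectors (repeated here so the snippet runs on its own):

```r
g1 <- c("gene1", "gene2", "gene3", "gene4", "gene5", "gene9", "gene10",
        "geneA")
g2 <- c("gene6", "gene9", "gene1", "gene2", "gene7", "gene8", "gene9",
        "gene1", "gene10")

# %in% gives a logical vector: which elements of g1 also occur in g2
common <- g1[g1 %in% g2]
print(common)

# match() gives the position of the first occurrence in g2, or NA
pos <- match(g1, g2)
print(pos)
```

%in% is the convenient form when only membership matters; match is useful when the matching rows of the second data set are needed as well.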


JS 




 
---
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of jenny tan
Sent: 26 July 2007 04:35
To: r-help@stat.math.ethz.ch
Subject: [R] Finding matches in 2 files



I have 2 files containing data analysed by 2 different methods. I would
like to find out which genes appear in both analyses. Can someone show
me how to do this?


[R] Download multiple stock quotes in a loop

2007-07-26 Thread Owe Jessen
Hi all,

this should be a simple question, but I haven't been able to get it 
right. I am trying to download multiple stock quotes in a loop, so that 
every time series is saved under the symbol of the stock. Can anybody help 
me out? Here's the code:

require(tseries)
startd <- "2000-06-01"
stocks <- c("bmw.de", "vow.de", "dte.de")
for (stock in stocks) {
  stock <- as.timeSeries(get.hist.quote(instrument = stock, start = startd,
                                        quote = "Close", compress = "d"))
}

Thanks in advance,
Owe

-- 
Owe Jessen
Diplom-Volkswirt
Hanssenstraße 17
24106 Kiel

[EMAIL PROTECTED]
http://www.econinfo.de



Re: [R] Download multiple stock quotes in a loop

2007-07-26 Thread Vladimir Eremeev



Owe Jessen wrote:
 
 Hi all,
 
 this should be a simple question, but I haven't been able to get it 
 right. I am trying to download multiple stock quotes in a loop, so that 
 every time series is saved under the symbol of the stock. Can anybody help 
 me out? Here's the code:
 
 require(tseries)
 startd <- "2000-06-01"
 stocks <- c("bmw.de", "vow.de", "dte.de")
 for (stock in stocks) {
   stock <- as.timeSeries(get.hist.quote(instrument = stock, start = startd,
                                         quote = "Close", compress = "d"))
 }
 
 Thanks in advance,
 Owe
 

The variable stock is assigned twice in each iteration of the loop.
First, it gets the value "bmw.de", and immediately after that it is
overwritten with the result returned by as.timeSeries( ... ).

If you replace the body of the loop with

  assign(paste("stock.", stock, sep = ""), as.timeSeries(get.hist.quote( [etc] )))

you will get three variables, namely stock.bmw.de, stock.vow.de and
stock.dte.de.
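
A sketch of the assign() approach that runs offline. The function fake.quote is a hypothetical stand-in for the as.timeSeries(get.hist.quote(...)) download step and is not part of tseries; a named list is shown as an alternative way to keep one result per symbol.

```r
stocks <- c("bmw.de", "vow.de", "dte.de")

# Hypothetical stand-in for the download, so the sketch needs no network
fake.quote <- function(symbol) data.frame(symbol = symbol, close = runif(3))

# Variant 1: one variable per symbol, named stock.<symbol>
for (stock in stocks) {
  assign(paste("stock.", stock, sep = ""), fake.quote(stock))
}
print(exists("stock.bmw.de"))

# Variant 2: collect the results in a list named by symbol
quotes <- sapply(stocks, fake.quote, simplify = FALSE)
print(names(quotes))
```

With the list, a single series is retrieved as quotes[["bmw.de"]], and all of them can be processed together with lapply.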


Re: [R] Finding matches in 2 files

2007-07-26 Thread jim holtman
Is this what you want?

> g1 <- c("gene1", "gene2", "gene3", "gene4", "gene5", "gene9", "gene10",
+ "geneA")
> g2 <- c("gene6", "gene9", "gene1", "gene2", "gene7", "gene8", "gene9",
+ "gene1", "gene10")
> intersect(g1, g2)
[1] "gene1"  "gene2"  "gene9"  "gene10"


On 7/25/07, jenny tan [EMAIL PROTECTED] wrote:


 I have 2 files containing data analysed by 2 different methods. I would like 
 to find out which genes appear in both analyses. Can someone show me how to 
 do this?



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?



[R] multiple graphs

2007-07-26 Thread Daniele Amberti
Does anyone have a simple explanation and example of how to add histograms or 
bar charts to another graph, as in this example from the R Graph Gallery:

http://addictedtor.free.fr/graphiques/RGraphGallery.php?graph=109

Looking at the code, I don't understand very well how to add graphs at an 
arbitrary (or cleverly chosen) position with an adequate scale.

If somebody has a simpler example with explanations, it would be highly 
appreciated.

Best
Daniele




[R] logistic regression

2007-07-26 Thread Sullivan, Mary M
Greetings,
 

I am working on a logistic regression model in R and I am struggling with the 
code, as R is a relatively new program for me.  In searching Google for 
'logistic regression diagnostics' I came across Elizabeth Brown's Lecture 14 
from her Winter 2004 Biostatistics 515 course 
(http://courses.washington.edu/b515/l14.pdf).  I found most of the code to be 
very helpful, but I am struggling with the lines that calculate the observed 
and expected values in the 10 groups created by the cut function.  I get error 
messages when trying to create the E and O matrices:  R won't accept assignment 
of fi1c==j and it won't calculate the sum.

I am wondering whether someone might be able to offer me some assistance; my 
search of the archives was not fruitful.

Here is the code that I adapted from the lecture notes:

fit <- fitted(glm.lyme)
fitc <- cut(fit, breaks = c(0, quantile(fit, probs = seq(.1, .9, .1)), 1))
t <- table(fitc)
fitc <- cut(fit, breaks = c(0, quantile(fit, probs = seq(.1, .9, .1)), 1), labels = FALSE)
t <- table(fitc)

# Calculate observed and expected values in each group
E <- matrix(0, nrow = 10, ncol = 2)
O <- matrix(0, nrow = 10, ncol = 2)
for (j in 1:10) {
  E[j, 2] <- sum(fit[fitc == j])
  E[j, 1] <- sum((1 - fit)[fitc == j])
  O[j, 2] <- sum(pcdata$lymdis[fitc == j])
  O[j, 1] <- sum((1 - pcdata$lymdis)[fitc == j])
}

Here is the error message:  Error in Summary.factor(..., na.rm = na.rm) : 
sum not meaningful for factors

I understand what it means; I just can't figure out how to get around it or how 
to get the output printed in table form.  Thank you in advance for any 
assistance.

 

Mary Sullivan



Re: [R] substituting dots in the names of the columns (sub, gsub, regexpr)

2007-07-26 Thread Gabor Grothendieck
Use \\. or [.] with quotes to denote a literal dot (#1)
or can use fixed = TRUE to remove the meaning of dot (#2) or
use a zero-width lookahead assertion (?=[.]) which will be matched
but is not added to the string to be replaced (#3).  Try ?regexpr .
Also the links on the gsubfn home page (http://code.google.com/p/gsubfn/)
point to a number of good resources on regular expressions.

Str <- c("y..m.", "BD..g.cm3.", "PR..Mpa.", "Ks..m.s.", "SP.g..g.",
"P..m3.m3.", "theta1..g.g.", "theta2..g.g.", "AWC..g.g.")

# 1
tmp <- gsub("[.]+", ".", Str)
sub("[.]+$", "", tmp)

# 2
tmp <- gsub("..", ".", Str, fixed = TRUE)
sub("[.]+$", "", tmp)

# 3 - both done at once using zero-width lookahead
gsub("[.]*$|[.]*(?=[.])", "", Str, perl = TRUE)
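
To connect this with the SP.g..g. puzzle in the question below, here is a small sketch on that single name. In the pattern "\\.." only the first dot is escaped; the second one still matches any character, so the first match is ".g" rather than "..".

```r
x <- "SP.g..g."

# Literal dot followed by ANY character: matches ".g", giving "SP...g."
first.try <- sub("\\..", ".", x)
print(first.try)

# Both dots escaped: matches the literal "..", giving "SP.g.g."
both.escaped <- sub("\\.\\.", ".", x)
print(both.escaped)

# Approach #1 from above: squeeze runs of dots, then drop trailing dots
clean <- sub("[.]+$", "", gsub("[.]+", ".", x))
print(clean)
```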


On 7/26/07, 8rino-Luca Pantani [EMAIL PROTECTED] wrote:
 Dear R users,
 I have the following two problems, related to the functions sub, grep,
 regexpr and the like.

 The header of the file(s) I have to import looks like this:

 c("y (m)", "BD (g/cm3)", "PR (Mpa)", "Ks (m/s)", "SP g./g.",
 "P (m3/m3)", "theta1 (g/g)", "theta2 (g/g)", "AWC (g/g)")

 To get rid of spaces and symbols in the names of the columns,
 I use read.table(... check.names=TRUE) and I get:
 str <- c("y..m.", "BD..g.cm3.", "PR..Mpa.", "Ks..m.s.", "SP.g..g.",
 "P..m3.m3.", "theta1..g.g.", "theta2..g.g.", "AWC..g.g.")

 Now, my problem is to remove the trailing dots, as well as the double
 dots, in order to get names like the following:
 c("y.m", "BD.g.cm3", "PR.Mpa", "Ks.m.s", "SP.g.g", "P.m3.m3.",
 "theta1.g.g", "theta2.g.g", "AWC.g.g")

 I've searched the help pages for sub, regexpr and the like, and also
 searched the help archives.
 I understand that the dot is a special character, since
 sub("..", ".", str)
 [1] "..m."        "...g.cm3."   "...Mpa."     "...m.s."     "..g..g."
 [6] "..m3.m3."    ".eta1..g.g." ".eta2..g.g." ".C..g.g."

 Therefore I tried
 sub("\\..", ".", str)
 [1] "y.m."        "BD.g.cm3."   "PR.Mpa."     "Ks.m.s."     "SP...g."
 [6] "P.m3.m3."    "theta1.g.g." "theta2.g.g." "AWC.g.g."
 and I was surprised by the (to me) strange behaviour: SP.g..g.
 is turned into SP...g.
 And this is the first problem I cannot solve.

 Then there's the problem of removing the trailing dot.
 In
 http://tolstoy.newcastle.edu.au/R/e2/help/07/01/8665.html
 I found a somewhat similar problem, but the solution does not work in this
 case, since:
 gsub("[.].*", "", str)
 [1] "y"      "BD"     "PR"     "Ks"     "SP"     "P"      "theta1" "theta2"
 [9] "AWC"
 And this is the second problem.

 Apart from these particular problems, I would like to know more about
 regexp, sub and so on, since each time
 I have strings to manipulate, I must face my ignorance of
 regular expressions and their syntax.

 Is there any page with examples, where I can improve my knowledge and
 stop being frustrated each time I have to manipulate strings?

 8rino

 --
 Ottorino-Luca Pantani, Università di Firenze
 Dip. Scienza del Suolo e Nutrizione della Pianta
 P.zle Cascine 28 50144 Firenze Italia
 Tel 39 055 3288 202 (348 lab) Fax 39 055 333 273
 [EMAIL PROTECTED]



[R] Fit t Copula

2007-07-26 Thread livia

Hi, I am trying to fit a t copula to some data, and I am using the following
functions from library(QRMlib).
Udatac <- apply(datac, 2, edf, adjust = 1)
tcopulac <- fit.tcopula.rank(Udatac)

But I get the error message "Error in fit.tcopula.rank(Udatac) : Non
p.s.d. covariance matrix".

Could anyone give me some advice? In fact, I am not sure what adjust=1
is used for.
Many thanks.



Re: [R] Create Strings of Column Id's

2007-07-26 Thread jim holtman
Is this what you want:

> paste("-", paste(colnames(MyMatrix)[COL], collapse = '-'), sep = '')
[1] "-E-T"
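
For comparison, a sketch of both routes on the example from the question quoted below. A for loop can be made to work if the assignment happens inside the loop body and the starting value is an empty string; the value of the for expression itself is not the accumulated string.

```r
MyMatrix <- matrix(NA, ncol = 4, nrow = 1,
                   dimnames = list(NULL, c("E", "R", "T", "Y")))
COL <- c(1, 3)   # columns to extract

# Loop version: update Filename inside the body
Filename <- ""
for (i in colnames(MyMatrix)[COL]) {
  Filename <- paste(Filename, "-", i, sep = "")
}
print(Filename)

# Loop-free version using collapse
Filename2 <- paste("-", paste(colnames(MyMatrix)[COL], collapse = "-"), sep = "")
print(Filename2)
```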


On 7/26/07, Tom.O [EMAIL PROTECTED] wrote:

 Does anyone know how this is done?

 I have a large matrix from which I extract specific columns into txt files for
 further use. To keep track of which txt files contain which
 columns, I want the filenames to include the column IDs.

 The most basic approach would be to use a for() loop together with paste(),
 but the result is blank. Not even NULL.

 This is the concept of the code I use:

 For example

 MyMatrix <- matrix(NA, ncol=4, nrow=1, dimnames=list(NULL, c("E","R","T","Y")))
 COL <- c(1,3) # a vector of columns I want to extract

 Filename <- NULL # the starting variable, so I can use paste
 Filename <- for(i in colnames(MyMatrix)[COL]) {paste(Filename,"-",i,sep="")}

 The result is "-T", but I want it to be "-E-T"

 Does anyone have a clue?

 Thanks Tom





-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?



Re: [R] Help with Dates

2007-07-26 Thread Jeffrey J. Hallman
Are you using the latest version of fame?  1.05 and earlier had a bug in
tisFromCsv that was fixed in 1.08.

Below I show what I get with fame version 1.08.  There is still a problem in
that the frequency-figuring logic appears to think the frequency is bwsunday
(biweekly with weeks ending on Sunday) rather than semimonthly, which would
appear to be a better fit.  That's why the 19860330 observation is getting
filled in with NA's.

Jeff

> Lines <- "Date  Price Open.Int. Comm.Long Comm.Short net.comm
+ 15-Jan-86 673.25 175645 65910 28425 37485
+ 31-Jan-86 677.00 167350 54060 27120 26940
+ 14-Feb-86 680.25 157985 37955 25425 12530
+ 28-Feb-86 691.75 162775 49760 16030 33730
+ 14-Mar-86 706.50 163495 54120 27995 26125
+ 31-Mar-86 709.75 164120 54715 30390 24325
+ "
> boink <- tisFromCsv(textConnection(Lines), dateFormat = "%d-%b-%y", dateCol = "Date", sep = "")
> boink
$Price
   [,1]
19860119 673.25
19860202 677.00
19860216 680.25
19860302 691.75
19860316 706.50
19860330 NA
19860413 709.75
class: tis

$Open.Int.
   [,1]
19860119 175645
19860202 167350
19860216 157985
19860302 162775
19860316 163495
19860330 NA
19860413 164120
class: tis

$Comm.Long
  [,1]
19860119 65910
19860202 54060
19860216 37955
19860302 49760
19860316 54120
19860330    NA
19860413 54715
class: tis

$Comm.Short
  [,1]
19860119 28425
19860202 27120
19860216 25425
19860302 16030
19860316 27995
19860330    NA
19860413 30390
class: tis

$net.comm
  [,1]
19860119 37485
19860202 26940
19860216 12530
19860302 33730
19860316 26125
19860330    NA
19860413 24325
class: tis


Gabor Grothendieck [EMAIL PROTECTED] writes:

 On 26 Jul 2007 09:59:31 -0400, Jeffrey J. Hallman [EMAIL PROTECTED] wrote:
  zoo is nice.  'tisFromCsv()' in the fame package is nicer.
 
  Jeff
 
 
 1. What am I doing wrong here?  I only get one data column.
 2. I assume the regularized dates which do not exactly match the input ones
 are intended so as to make this a regularly spaced series.  Is that right?
 3. What is the cause of the warning message?
 4. Why is a list returned with a single component containing the output?
 Thanks.
 
  > library(fame)
  > Lines <- "Date  Price Open.Int. Comm.Long Comm.Short net.comm
  + 15-Jan-86 673.25 175645 65910 28425 37485
  + 31-Jan-86 677.00 167350 54060 27120 26940
  + 14-Feb-86 680.25 157985 37955 25425 12530
  + 28-Feb-86 691.75 162775 49760 16030 33730
  + 14-Mar-86 706.50 163495 54120 27995 26125
  + 31-Mar-86 709.75 164120 54715 30390 24325
  + "
  > tisFromCsv(textConnection(Lines), dateFormat = "%d-%b-%y", dateCol = "Date", sep = "")
 [[1]]
[,1]
 19860119 673.25
 19860202 677.00
 19860216 680.25
 19860302 691.75
 19860316 706.50
 19860330 709.75
 class: tis
 
 Warning message:
 number of items to replace is not a multiple of replacement length in:
 x[i] - value
 
 

-- 
Jeff



Re: [R] How to auto-scale cex of y-axis labels in lattice dotplot?

2007-07-26 Thread Deepayan Sarkar
On 7/25/07, Kevin Wright [EMAIL PROTECTED] wrote:
 When I create a dotplot in lattice, I frequently observe overplotting
 of the labels along the vertical axis.  On my screen, this illustrates
 overplotting of the letters:

 windows()
 reps=6
 dat=data.frame(let=rep(letters,each=reps), grp=rep(1:reps, 26),
   y=runif(26*reps))
 dotplot(let~y|grp, dat)

 Is there a way to automatically scale the labels so that they are not
 over-plotted?

Not that I can think of.

 I currently do something like this:
 Calculate or guess the number of panel rows: NumPanelRows
 cexLab <- min(1, .9*par()$pin[2]/
   (nlevels(dat$let)*NumPanelRows*strheight("A", units="in")))
 dotplot(..., scales=list(y=list(cex=cexLab)))

 Is there an easier way?

 Is there a function that I can call which calculates the layout of the
 panels that will be used in the dotplot?

Not really. The eventual layout is calculated inside print.trellis as
follows (where 'x' is the trellis object being plotted):


panel.layout <-
    compute.layout(x$layout, dim(x), skip = x$skip)

[...]

if (panel.layout[1] == 0)
{
    ddim <- par("din")
    device.aspect <- ddim[2] / ddim[1]
    panel.aspect <- panel.height[[1]] / panel.width[[1]]

    plots.per.page <- panel.layout[2]
    m <- max (1, round(sqrt(panel.layout[2] * device.aspect /
                            panel.aspect)))
    n <- ceiling(plots.per.page/m)
    m <- ceiling(plots.per.page/n)
    panel.layout[1] <- n
    panel.layout[2] <- m
}
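
The arithmetic above can be tried outside lattice. What follows is a minimal sketch, not lattice's own code: guess_layout() is a hypothetical helper, and the device and panel aspect ratios are passed in as plain numbers instead of being read from par("din") and the trellis object.

```r
# Hypothetical helper mirroring the arithmetic quoted from print.trellis.
# n_panels      : number of panels to place on one page
# device_aspect : device height / device width
# panel_aspect  : panel height / panel width
guess_layout <- function(n_panels, device_aspect = 1, panel_aspect = 1) {
  m <- max(1, round(sqrt(n_panels * device_aspect / panel_aspect)))
  n <- ceiling(n_panels / m)
  m <- ceiling(n_panels / n)
  c(columns = n, rows = m)
}

guess_layout(6)        # six panels on a square device
guess_layout(6, 0.25)  # a very wide device puts all panels in one row
```

If the default layout is in use, the rows element could stand in for NumPanelRows in the cex calculation quoted earlier.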

-Deepayan



Re: [R] Constructing bar charts with standard error bars

2007-07-26 Thread Ben Bolker

John Zabroski wrote:
 On 7/25/07, Ben Bolker [EMAIL PROTECTED] wrote:

 Thanks a lot!  I tried all three and they all seem very dependable.
 Also, I appreciate you rewriting my solution and adding elegance.

 Is there a way to extend the tick marks to the ylim values, such that
 the yscale ymax tickmark is something like max(xbar+se)?  In the
 documentation, I thought par(yaxp=c(y0,y1,n)) would do the trick, but
 after trying to use it I am not sure I understand what yaxp even does.

 It took me quite a while to figure this out, I'm not surprised you 
didn't ...

The very easiest way to do this is simply to set ylim to (0,0.4) -- since
you probably want to extend the axes upward to a pretty number
anyway.

The other standard way to do this is to use barplot with axes=FALSE and
then add the axes yourself, with the ticks specified wherever you want:

barplot(...,ylim=c(0,0.4),axes=FALSE)
axis(side=1)
axis(side=2,at=seq(0,0.4,length=8))

  However, I was wondering what was up with yaxp, and why setting
it didn't seem to do anything.  The answer is lurking in ?par:

## This parameter is reset when a user coordinate system is set
##   up, for example by starting a new page or by calling
##   'plot.window' or setting 'par("usr")': 'n' is taken from
##   'par("lab")'.  It affects the default behaviour of subsequent
##   calls to 'axis' for sides 1 or 3.

 Thus, when barplot starts up and plots a new set of axes it RESETS
par("yaxp").  Thus

par(yaxp=...); barplot(...)

doesn't work.

 However,

barplot(...,yaxp=...)  does work.
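
A minimal sketch of the axes=FALSE approach, with made-up means and standard errors (xbar, se) standing in for the real data. The tick spacing of 0.05 and the use of arrows() for the error bars are assumptions for illustration, not taken from the earlier solutions in this thread.

```r
# Made-up means and standard errors, for illustration only
xbar <- c(a = 0.12, b = 0.25, c = 0.31)
se   <- c(0.02, 0.03, 0.05)

ylim  <- c(0, 0.4)                # a "pretty" upper limit above max(xbar + se)
ticks <- seq(0, 0.4, by = 0.05)   # tick positions chosen by hand

# Draw the bars without the default axes, then add the y axis ourselves
mids <- barplot(xbar, ylim = ylim, axes = FALSE)
axis(side = 2, at = ticks)

# Error bars: flat-headed arrows from xbar - se to xbar + se
arrows(mids, xbar - se, mids, xbar + se, angle = 90, code = 3, length = 0.05)
```

barplot() returns the bar midpoints, which is what lets the error bars line up with the bars.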

 It would actually be nice to have an axis style (xaxs,yaxs) that extended
the axis out beyond the range of the data until it found pretty labels that
extended beyond the data range -- for example, set the range according
to xaxs="r", find the pretty axis ticks, and then add another tick ...

 cheers
   Ben Bolker



[R] offset in coxph

2007-07-26 Thread Michael Gormley
The offset argument used in glm and other functions seems to have been 
removed from the argument list for coxph.  I am wondering if there is a 
reason for this and if there is a possible work-around in order to produce a 
cox-ph object without fitting coefficients?

Thanks,
Mike



Re: [R] R CMD check sh: line 1: make: command not found

2007-07-26 Thread Prof Brian Ripley
On Thu, 26 Jul 2007, David Peltier wrote:

 hello,

 I am using R 2.5.0 under OS X.

 I am having a "sh: line 1: make: command not found" error message when
 I run "R CMD check":

 Any help would be appreciated.

Well, that is easy: 'make' is missing.  It should be there in the OS, so 
you need to talk to your OS support for help in finding/installing it.
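
One way to confirm the diagnosis from inside R, as a sketch: Sys.which() reports the path of an executable as R sees it and gives an empty string when nothing is found on the PATH.

```r
# Look up 'make' on the PATH that R passes to the shell
make_path <- Sys.which("make")

if (nzchar(make_path)) {
  cat("make found at", make_path, "\n")
} else {
  cat("make not found on PATH:", Sys.getenv("PATH"), "\n")
}
```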

BTW, the list for MacOS-specific questions is r-sig-mac.


 R CMD check backtest

 * checking for working latex ... OK
 * using log directory '/backtest/trunk/backtest.Rcheck'
 * using R version 2.5.0 (2007-04-23)
 * checking for file 'backtest/DESCRIPTION' ... OK
 * checking extension type ... Package
 * this is package 'backtest' version '0.2-0'
 * checking package dependencies ... OK
 * checking if this is a source package ... OK
 * checking whether package 'backtest' can be installed ... OK
 * checking package directory ... OK
 * checking for portable file names ... OK
 * checking for sufficient/correct file permissions ... OK
 * checking DESCRIPTION meta-information ... OK
 * checking top-level files ... OK
 * checking index information ... OK
 * checking package subdirectories ... OK
 * checking R files for non-ASCII characters ... OK
 * checking R files for syntax errors ... OK
 * checking whether the package can be loaded ... OK
 * checking whether the package can be loaded with stated
 dependencies ... OK
 * checking whether the name space can be loaded with stated
 dependencies ... OK
 * checking for unstated dependencies in R code ... OK
 * checking S3 generic/method consistency ... OK
 * checking replacement functions ... OK
 * checking foreign function calls ... OK
 * checking R code for possible problems ... OK
 * checking Rd files ... OK
 * checking Rd cross-references ... OK
 * checking for missing documentation entries ... OK
 * checking for code/documentation mismatches ... OK
 * checking Rd \usage sections ... OK
 * checking data for non-ASCII characters ... OK
 * creating backtest-Ex.R ... OK
 * checking examples ... OK
 * checking tests ...
 sh: line 1: make: command not found
 ERROR


-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



Re: [R] Convert string to list?

2007-07-26 Thread jim holtman
Is this what you want:

 str <- "P = 0.0, T = 0.0, Q = 0.0"
 x <- eval(parse(text=paste('list(', str, ')')))
 str(x)
List of 3
 $ P: num 0
 $ T: num 0
 $ Q: num 0



On 7/26/07, Manuel Morales [EMAIL PROTECTED] wrote:
 Let's say I have the following string:

 str <- "P = 0.0, T = 0.0, Q = 0.0"

 I'd like to find a function that generates the following object from
 'str'.

 list(P = 0.0, T = 0.0, Q = 0.0)

 Thanks!

 --
 http://mutualism.williams.edu




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?



Re: [R] Convert string to list?

2007-07-26 Thread Ross Darnell
Manuel

Jim's may be what you want. Do you want a list of numerics with names P,
T and Q, or a list of character strings?

 str <- "P = 0.0, T = 0.0, Q = 0.0"

 str(as.vector(unlist(strsplit(str,",")),mode="list"))
List of 3
 $ : chr "P = 0.0"
 $ : chr " T = 0.0"
 $ : chr " Q = 0.0"
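
A third option, sketched here under the assumption that every entry has the form name = number: split the string by hand and build the named list directly, which avoids calling eval(parse()) on arbitrary text. The variable names are illustrative.

```r
s <- "P = 0.0, T = 0.0, Q = 0.0"

# Split into "name = value" entries, then split each entry at "="
entries <- strsplit(s, ",")[[1]]
parts   <- strsplit(entries, "=")

# Trim whitespace from the names and convert the values to numeric
keys <- trimws(sapply(parts, `[`, 1))
vals <- as.numeric(sapply(parts, `[`, 2))

x <- setNames(as.list(vals), keys)
str(x)
```

The result has the same shape as list(P = 0.0, T = 0.0, Q = 0.0); entries that are not plain numbers would come back as NA with a warning.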



-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of jim holtman
Sent: Friday, 27 July 2007 11:20 AM
To: Manuel Morales
Cc: r-help
Subject: Re: [R] Convert string to list?

Is this what you want:

 str <- "P = 0.0, T = 0.0, Q = 0.0"
 x <- eval(parse(text=paste('list(', str, ')')))
 str(x)
List of 3
 $ P: num 0
 $ T: num 0
 $ Q: num 0



On 7/26/07, Manuel Morales [EMAIL PROTECTED] wrote:
 Let's say I have the following string:

 str <- "P = 0.0, T = 0.0, Q = 0.0"

 I'd like to find a function that generates the following object from
 'str'.

 list(P = 0.0, T = 0.0, Q = 0.0)

 Thanks!

 --
 http://mutualism.williams.edu




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?



Re: [R] Minitab Parametric Distribution Analysis in R

2007-07-26 Thread Tom La Bone
After a bit of coaching I found what I was looking for: the fitdistr()
function in the MASS package. It appears to be a bit easier to use than
mle() for my application. Thanks all.

Tom


-Original Message-
From: Thomas Lumley [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, July 25, 2007 12:03 PM
To: Tom La Bone
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] Minitab Parametric Distribution Analysis in R


The survival package (survreg() function) will fit quite a few parametric
models under censoring.

If you aren't doing regression, but just one-sample fitting, you can feed
the appropriate censored or truncated likelihood to mle() in the stats4
package.

Both packages should be part of your R distribution.
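
As a sketch of the one-sample route, the right-censored Weibull log-likelihood can be written down and maximized directly. The example below uses optim() from base R in place of mle() to stay self-contained; the simulated data, the censoring time of 12 and the log-scale parametrization are all assumptions made for illustration. It handles right censoring only, not the interval censoring that "arbitrary censoring" can include.

```r
set.seed(1)
true_time <- rweibull(200, shape = 2, scale = 10)
cens_time <- 12                                # fixed censoring time (assumed)
time   <- pmin(true_time, cens_time)           # observed time
status <- as.numeric(true_time <= cens_time)   # 1 = event seen, 0 = censored

# Negative log-likelihood; parameters live on the log scale so that
# shape and scale stay positive during the search
negll <- function(par) {
  shape <- exp(par[1])
  scale <- exp(par[2])
  ll_event <- dweibull(time, shape, scale, log = TRUE)
  ll_cens  <- pweibull(time, shape, scale, lower.tail = FALSE, log.p = TRUE)
  -sum(status * ll_event + (1 - status) * ll_cens)
}

fit <- optim(c(0, log(mean(time))), negll)
est <- exp(fit$par)
names(est) <- c("shape", "scale")
est
```

Events contribute the log density and censored observations contribute the log survival probability; the same function could be handed to mle() with a start list.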

-thomas



On Wed, 25 Jul 2007, Tom La Bone wrote:

 Minitab can perform a Parametric Distribution Analysis - Arbitrary
 Censoring with one of eight distributions (e.g., weibull), giving the
 maximum likelihood estimates of the parameters in the distribution for a
 given dataset. Does R have a package that provides equivalent
functionality?
 Thanks for any advice you can offer.



 Tom La Bone




Thomas Lumley   Assoc. Professor, Biostatistics
[EMAIL PROTECTED]   University of Washington, Seattle



[R] Convert string to list?

2007-07-26 Thread Manuel Morales
Let's say I have the following string:

str <- "P = 0.0, T = 0.0, Q = 0.0"

I'd like to find a function that generates the following object from
'str'.

list(P = 0.0, T = 0.0, Q = 0.0)

Thanks!

-- 
http://mutualism.williams.edu



[R] Survival analysis with 60% random censoring

2007-07-26 Thread zhongmiao wang
Hello,
My study is to predict the likelihood that an insurance policy holder will
not renew his policy at the coming expiration date.
My data have about 60% censoring, and the censoring is random: customers
buy insurance at different times, but the study has to be
terminated on a single date.  Any suggestion or reference is greatly
appreciated. Thanks in advance.

Best Regards
Zhongmiao Wang



Re: [R] offset in coxph

2007-07-26 Thread Prof Brian Ripley
Removed?  That it was ever there is not my recollection and seems very 
unlikely given that survival is ported from S where glm() does not have 
it.

As far as I know it has only ever been in glm() and lm() in R: the way 
which is described in the White Book is to use the offset() function, and 
this is preferred (it works correctly for prediction, for example).  The 
function form is supported for coxph, and used in the test suite.

Please forget you ever knew 'offset' could be an argument, and use 
offset() in your formulae instead.
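
To illustrate the formula form, here is a small Poisson sketch with simulated data (all variable names are made up). It uses glm() because that needs no extra packages; per the reply above, the same kind of offset() term is what coxph accepts in its formula.

```r
set.seed(42)
exposure <- runif(50, 1, 10)
x <- rnorm(50)
y <- rpois(50, exposure * exp(0.3 * x))

# Offset written inside the formula (the recommended form)
fit_formula <- glm(y ~ x + offset(log(exposure)), family = poisson)

# Offset passed as an argument, shown only for comparison
fit_arg <- glm(y ~ x, offset = log(exposure), family = poisson)

all.equal(coef(fit_formula), coef(fit_arg))

# Hypothetical coxph analogue (needs the survival package, not run here):
#   coxph(Surv(time, status) ~ x + offset(lp))
```

Both fits give the same coefficients; the formula version also carries the offset along when predicting, which is the advantage mentioned above.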


On Thu, 26 Jul 2007, Michael Gormley wrote:

 The offset argument used in glm and other functions seems to have been
 removed from the argument list for coxph.  I am wondering if there is a
 reason for this and if there is a possible work-around in order to produce a
 cox-ph object without fitting coefficients?

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



[R] Creating a cross table out of a large dataset

2007-07-26 Thread celine

Dear all,

I want to make a cross table out of a data set which is 2 columns wide and
more than 15 rows long. When I use the table() function I get an error
message

This is the code I have used:

Dataset <- read.table("test.txt", header=TRUE, sep=",", na.strings="NA",
dec=".", strip.white=TRUE) 

 .T <- table(Dataset$K1,Dataset$K2) 

This is the error message I have received

Error in vector("integer", length) : vector size specified is too large 
In addition: Warning messages: 
1: NAs introduced by coercion 
2: NAs introduced by coercion 

Is it possible to make a cross table with the table() function on a large
dataset or should I consider using another function? I have had a look at
the ?table help file but I could not find any information on the size of the
dataset.

Thanks very much in advance for any help:-)

Kind regards,
Céline.


[R] R CMD check sh: line 1: make: command not found

2007-07-26 Thread David Peltier
hello,

I am using R 2.5.0 under OS X.

I am having a "sh: line 1: make: command not found" error message when  
I run "R CMD check":

Any help would be appreciated.

R CMD check backtest

* checking for working latex ... OK
* using log directory '/backtest/trunk/backtest.Rcheck'
* using R version 2.5.0 (2007-04-23)
* checking for file 'backtest/DESCRIPTION' ... OK
* checking extension type ... Package
* this is package 'backtest' version '0.2-0'
* checking package dependencies ... OK
* checking if this is a source package ... OK
* checking whether package 'backtest' can be installed ... OK
* checking package directory ... OK
* checking for portable file names ... OK
* checking for sufficient/correct file permissions ... OK
* checking DESCRIPTION meta-information ... OK
* checking top-level files ... OK
* checking index information ... OK
* checking package subdirectories ... OK
* checking R files for non-ASCII characters ... OK
* checking R files for syntax errors ... OK
* checking whether the package can be loaded ... OK
* checking whether the package can be loaded with stated  
dependencies ... OK
* checking whether the name space can be loaded with stated  
dependencies ... OK
* checking for unstated dependencies in R code ... OK
* checking S3 generic/method consistency ... OK
* checking replacement functions ... OK
* checking foreign function calls ... OK
* checking R code for possible problems ... OK
* checking Rd files ... OK
* checking Rd cross-references ... OK
* checking for missing documentation entries ... OK
* checking for code/documentation mismatches ... OK
* checking Rd \usage sections ... OK
* checking data for non-ASCII characters ... OK
* creating backtest-Ex.R ... OK
* checking examples ... OK
* checking tests ...
sh: line 1: make: command not found
ERROR


Re: [R] Function to separate effect in AOV

2007-07-26 Thread Greg Snow
You may want to look at the interaction function (a quick way to make the 
single factor with 4 levels that you mention).

You can create your own sets of contrasts and set them using the C or contrasts 
functions, then use the split argument to summary.aov to look at the individual 
degrees of freedom.

You may also be interested in the multcomp package for looking at the 
comparisons.
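
A minimal sketch of the single-factor approach, using simulated data loosely shaped like the cell means in the question. TukeyHSD() from base R is used here in place of multcomp to keep the example self-contained; the data frame, the noise level and the cell means are assumptions.

```r
set.seed(1)
# Simulated data: 3 replicates in each of the 4 cells
dados <- expand.grid(rep = 1:3, T1 = c("A", "B"), T2 = c("C", "D"))
dados$trt <- interaction(dados$T1, dados$T2)   # levels A.C, B.C, A.D, B.D
cell_mean <- c(A.C = 2.2, B.C = 2.2, A.D = 10.2, B.D = 20.3)
dados$Y <- cell_mean[as.character(dados$trt)] + rnorm(nrow(dados), sd = 0.1)

# One-way ANOVA on the combined factor, then all pairwise comparisons
m  <- aov(Y ~ trt, data = dados)
tk <- TukeyHSD(m)
tk$trt[, c("diff", "p adj")]
```

The row B.C-A.C compares A and B inside C, and B.D-A.D compares them inside D, so each within-level question gets its own line in one analysis.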

Hope this helps,

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
(801) 408-8111
 
 

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of 
 Ronaldo Reis Junior
 Sent: Monday, July 23, 2007 4:05 PM
 To: R-Help
 Subject: [R] Function to separate effect in AOV
 
 Hi,
 
 I have a dummy question.
 
 Suppose that I have two explanatory variable, T1 (A, B) and 
 T2 (C, D) and one response variable.
 
  attach(dados)
 
  tapply(Y,list(T1,T2),mean)
      C        D
 A 2.20 10.2
 B 2.22 20.26667
 
 In this case, A and B inside C have no difference, but 
 have differences inside D
 
 I make this model:
 
  m <- aov(Y~T1*T2)
  
  summary(m)
             Df Sum Sq Mean Sq F value    Pr(>F)
 T1           1  76.36   76.36  5617.9 1.119e-12 ***
 T2           1 508.69  508.69 37426.7 5.704e-16 ***
 T1:T2        1  75.65   75.65  5566.0 1.161e-12 ***
 Residuals    8   0.11    0.01
 ---
 Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
 
 
 This result doesn't show the reality of the data, because I 
 can't see that A and B inside C are the same.
 
 The ANOVA result is the same as for fully different levels, like this:
 
  attach(dados2)
  
  tapply(Y,list(T1,T2),mean)
      C        D
 A 6.10 10.2
 B 2.22 20.26667
  
  m <- aov(Y~T1*T2)
  
  summary(m)
             Df Sum Sq Mean Sq F value    Pr(>F)
 T1           1  28.74   28.74  2114.3 5.529e-11 ***
 T2           1 367.75  367.75 27056.7 2.088e-15 ***
 T1:T2        1 145.81  145.81 10728.1 8.433e-14 ***
 Residuals    8   0.11    0.01
 ---
 Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
 
 In this case all levels are different, C to D and A to B.
 
 The question is:
 
 The only way to find this real difference is:
 
 1) make T1 and T2 like a Treatment variable with 4 levels 
 (AC,BC,AD,BD)?
 
 or
 
 2) make 3 anova:
   a) Anova (A,B) inside C
   b) Anova (A,B) inside D
   c) Full factorial Anova (like this in the e-mail)
 
 or
 
 3) Is there any other way to do this in only one analysis, to 
 find all differences and interactions? In other words, to find 
 differences in A and B inside C, A and B inside D, C and D 
 inside A, and C and D inside B?
 
 Thanks
 Ronaldo
 --
  Prof. Ronaldo Reis Júnior
 |  .''`. UNIMONTES/Depto. Biologia Geral/Lab. de Ecologia
 | : :'  : Campus Universitário Prof. Darcy Ribeiro, Vila Mauricéia `. 
 | `'` CP: 126, CEP: 39401-089, Montes Claros - MG - Brasil
 |   `- Fone: (38) 3229-8187 | [EMAIL PROTECTED] | 
 | [EMAIL PROTECTED] http://www.ppgcb.unimontes.br/ | ICQ#: 5692561 | 
 | LinuxUser#: 205366
 


[R] Create Strings of Column Id's

2007-07-26 Thread Tom.O

Does anyone know how this is done?

I have a large matrix where I extract specific columns into txt files for
further use. To be able to keep track of which txt files contain which
columns I want to name the filenames with the column Id's.

The most basic example would be to use a for() loop together with paste(),
but the result is blank. Not even NULL.

This is the concept of the code I use:

for example

MyMatrix <- matrix(NA,ncol=4,nrow=1,dimnames=list(NULL,c("E","R","T","Y")))
COL <- c(1,3) # a vector of columns I want to extract,

Filename <- NULL # the starting variable, so I can use paste
Filename <- for(i in colnames(MyMatrix)[COL]) {paste(Filename,"-",i,sep="")}

The result is -T, but I want it to be -E-T

Anyone have a clue?

Thanks Tom
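
A possible fix, offered as a sketch: a for() loop returns NULL, so assigning its value to Filename throws away whatever paste() produced inside it. paste0() is vectorized over the column names, and its collapse argument joins the pieces into one string without a loop.

```r
MyMatrix <- matrix(NA, ncol = 4, nrow = 1,
                   dimnames = list(NULL, c("E", "R", "T", "Y")))
COL <- c(1, 3)   # columns to extract

# Prefix each selected column name with "-" and glue the results together
Filename <- paste0("-", colnames(MyMatrix)[COL], collapse = "")
Filename
```

If the loop form is preferred, the assignment would have to happen inside the braces, one paste() per iteration.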




[R] Large dataset + randomForest

2007-07-26 Thread Florian Nigsch
[Please CC me in any replies as I am not currently subscribed to the  
list. Thanks!]

Dear all,

I did a bit of searching on the question of large datasets but did  
not come to a definite conclusion. What I am trying to do is the  
following: I want to read in a dataset with approx. 100 000 rows and  
approx 150 columns. The file size is ~ 33MB, which one would deem not  
too big a file for R. To speed up the reading in of the file I do not  
use read.table but a loop that does reading with scan() into a buffer  
and some preprocessing and then adds the data into a dataframe.

When I then want to run randomForest() R complains that I cannot  
allocate a vector of size 313.0 MB. I am aware that randomForest  
needs all data in memory, but
1) why should that suddenly be 10 times the size of the data (I  
acknowledge the need for some internal data of R, but 10 times seems a  
bit too much) and
2) there is still physical memory free on the machine (in total 4GB  
available, even though R is limited to 2GB if I correctly remember  
the help pages - still 2GB should be enough!) - it doesn't seem to  
work either with changed settings done via mem.limits(), or run-time  
arguments --min-vsize --max-vsize - what do these have to be set to  
to work in my case??

  rf <- randomForest(V1 ~ ., data=df[trainindices,], do.trace=5)
Error: cannot allocate vector of size 313.0 Mb
  object.size(df)/1024/1024
[1] 129.5390


Any help would be greatly appreciated,

Florian

--
Florian Nigsch [EMAIL PROTECTED]
Unilever Centre for Molecular Sciences Informatics
Department of Chemistry
University of Cambridge
http://www-mitchell.ch.cam.ac.uk/
Telephone: +44 (0)1223 763 073






[R] zeroinfl() or zicounts() error

2007-07-26 Thread Rachel Davidson
I'm trying to fit a zero-inflated poisson model using zeroinfl() from the
pscl library. It works fine for most models I try, but when I include either
of 2 covariates, I get an error.

When I include PopulationDensity, I get this error: Error in solve.default
(as.matrix(fit$hessian)) :system is computationally singular:
reciprocal condition number = 1.91306e-34

When I include BuildingArea, I get this error: Error in optim(fn =
loglikfun, par = c(start$count, start$zero, if (dist ==  :
non-finite finite-difference value [2]

 I tried fitting the models using zicounts in the zicounts library as well
and had the same difficulty.

 When I include PopulationDensity, it runs, but outputs only the parameter
estimates, not the standard errors or p-values (those have NaN).

When I include BuildingArea, I get this error: Error in
solve.default(z0$hessian)
: system is computationally singular: reciprocal condition number =
2.58349e-25

Can anyone suggest what it is about these 2 covariates that might be causing
the problem? I don't see any obvious problems with them. They are both
nonnegative with smooth probability distributions and no missing (NA)
values. The dataset has 3211 observations. It doesn't matter if there are
other covariates in the models or not. If one of these is included, I get
the errors.

Thanks!



Re: [R] Redirecting print output

2007-07-26 Thread Greg Snow
You may want to look at the R2HTML package as one approach (others have
already told you about sink and cat).

Another approach is to use the variations on Sweave.  Here you set up a
template file with the code you want run as well as any explanatory text
(you can even write an entire report), then process this with Sweave and
the output will be included.  The original Sweave works with LaTeX,
there is an HTML driver for Sweave in the R2HTML package (so the source
and final documents are HTML) and there is an odfWeave package that lets
you create the template and output in a word processor (it uses the
OpenOffice word processor, but since you can convert from and to MS Word
from there, this is not much of a limitation).
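
For the plain sink() route mentioned above, a minimal sketch: the file name is a temporary stand-in and the regression on the built-in cars data is only a placeholder for the real series of tests.

```r
out_file <- tempfile(fileext = ".txt")   # stand-in for a real report file

sink(out_file)                           # start diverting console output
fit <- lm(dist ~ speed, data = cars)     # 'cars' ships with R
print(summary(fit))
cat("Finished regression 1\n")
sink()                                   # restore output to the console

head(readLines(out_file))
```

Inside functions, results have to be print()ed explicitly for sink() to capture them, since auto-printing only happens at top level.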

Hope this helps,

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
(801) 408-8111
 
 

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Stan Hopkins
 Sent: Monday, July 23, 2007 9:35 PM
 To: R help
 Subject: [R] Redirecting print output
 
 I see a rich set of graphic device functions to redirect that 
 output.  Are there commands to redirect text as well.  I have 
 a set of functions that execute many linear regression tests 
 serially and I want to capture this in a file for printing.
 
 Thanks,
 
 Stan Hopkins


Re: [R] multiple graphs

2007-07-26 Thread Greg Snow
One of the nice things about the R Graph Gallery is that if you click on
the R logo underneath the graph (may need to scroll down a bit) it will
show you the code used to create that particular graph.

You may also want to look at the subplot function in the TeachingDemos
package for another way to add histograms to a plot:

Here is one possible example of this:

x <- rep(1:10, each=25)
y <- rexp(250, 1/x)

library(TeachingDemos)

tmp1 <- hist(y, plot=FALSE)
r <- range(tmp1$breaks)
w <- diff(tmp1$breaks)
plot(x,y, type='n', xlim=c(0.5,10.5), ylim=r)
for(i in 1:10){
    tmp2 <- hist( y[x==i], breaks=tmp1$breaks, plot=FALSE )
    subplot( barplot(tmp2$counts, ylim=r, width=w,
                     horiz=TRUE, space=0, xaxt='n', yaxs='i'),
             c(i-0.45, i+.45), r
           )
}

points(x,y) # just to compare



Hope this helps,

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
(801) 408-8111
 
 

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Daniele Amberti
 Sent: Thursday, July 26, 2007 1:26 AM
 To: r-help
 Subject: [R] multiple graphs
 
 Does anyone have a simple explanation and example on how to 
 add histograms or barcharts to an other graph like in the 
 example at the R-graph gallery:
 
 http://addictedtor.free.fr/graphiques/RGraphGallery.php?graph=109
 
 Looking at the code, I do not understand very well how to add 
 graphs in arbitrary/clever positions with an adequate scale.
 
 If somebody has a simpler example with explanations, it will 
 be highly appreciated.
 
 Best
 Daniele
 
 


Re: [R] Help with Dates

2007-07-26 Thread Gabor Grothendieck
Yes, I was using 1.05.  I get the same result as you with 1.08.

On 26 Jul 2007 11:39:41 -0400, Jeffrey J. Hallman [EMAIL PROTECTED] wrote:
 Are you using the latest version of fame?  1.05 and earlier had a bug in
 tisFromCsv that was fixed in 1.08.

 Below I show what I get with fame version 1.08.  There is still a problem in
 that the frequency-figuring logic appears to think the frequency is bwsunday
 (biweekly with weeks ending on Sunday) rather than semimonthly, which would
 appear to be a better fit.  That's why the 19860330 observation is getting
 filled in with NA's.

 Jeff

  Lines <- "Date  Price Open.Int. Comm.Long Comm.Short net.comm
 15-Jan-86 673.25 175645 65910 28425 37485
 31-Jan-86 677.00 167350 54060 27120 26940
 14-Feb-86 680.25 157985 37955 25425 12530
 28-Feb-86 691.75 162775 49760 16030 33730
 14-Mar-86 706.50 163495 54120 27995 26125
 31-Mar-86 709.75 164120 54715 30390 24325
 "
  boink <- tisFromCsv(textConnection(Lines), dateFormat = "%d-%b-%y", dateCol 
  = "Date", sep = "")
  boink
 $Price
   [,1]
 19860119 673.25
 19860202 677.00
 19860216 680.25
 19860302 691.75
 19860316 706.50
 19860330 NA
 19860413 709.75
 class: tis

 $Open.Int.
   [,1]
 19860119 175645
 19860202 167350
 19860216 157985
 19860302 162775
 19860316 163495
 19860330 NA
 19860413 164120
 class: tis

 $Comm.Long
  [,1]
 19860119 65910
 19860202 54060
 19860216 37955
 19860302 49760
 19860316 54120
 19860330    NA
 19860413 54715
 class: tis

 $Comm.Short
  [,1]
 19860119 28425
 19860202 27120
 19860216 25425
 19860302 16030
 19860316 27995
 19860330    NA
 19860413 30390
 class: tis

 $net.comm
  [,1]
 19860119 37485
 19860202 26940
 19860216 12530
 19860302 33730
 19860316 26125
 19860330    NA
 19860413 24325
 class: tis


 Gabor Grothendieck [EMAIL PROTECTED] writes:

  On 26 Jul 2007 09:59:31 -0400, Jeffrey J. Hallman [EMAIL PROTECTED] wrote:
   zoo is nice.  'tisFromCsv()' in the fame package is nicer.
  
   Jeff
 
 
  1. What am I doing wrong here?  I only get one data column.
  2. I assume the regularized dates which do not exactly match the input ones
  are intended so as to make this a regularly spaced series.  Is that 
  right?
  3. What is the cause of the warning message?
  4. Why is a list returned with a single component containing the output?
  Thanks.
 
   library(fame)
   Lines <- "Date  Price Open.Int. Comm.Long Comm.Short net.comm
  + 15-Jan-86 673.25 175645 65910 28425 37485
  + 31-Jan-86 677.00 167350 54060 27120 26940
  + 14-Feb-86 680.25 157985 37955 25425 12530
  + 28-Feb-86 691.75 162775 49760 16030 33730
  + 14-Mar-86 706.50 163495 54120 27995 26125
  + 31-Mar-86 709.75 164120 54715 30390 24325
  + "
   tisFromCsv(textConnection(Lines), dateFormat = "%d-%b-%y", dateCol = 
   "Date", sep = "")
  [[1]]
 [,1]
  19860119 673.25
  19860202 677.00
  19860216 680.25
  19860302 691.75
  19860316 706.50
  19860330 709.75
  class: tis
 
  Warning message:
  number of items to replace is not a multiple of replacement length in:
  x[i] <- value
 
 

 --
 Jeff



Re: [R] significance test for difference of two correlations

2007-07-26 Thread Viechtbauer Wolfgang (STAT)
Let r_1 be the correlation between the two variables for the first group with 
n_1 subjects and let r_2 be the correlation for the second group with n_2 
subjects. Then a simple way to test H0: rho_1 = rho_2 is to convert r_1 and r_2 
via Fisher's variance stabilizing transformation ( z = 1/2 * ln[ (1+r)/(1-r)] ) 
and then calculate:

(z_1 - z_2) / sqrt( 1/(n_1 - 3) + 1/(n_2 - 3) )

which is (approximately) N(0,1) under H0. So, using alpha = .05, you can reject 
H0 if the absolute value of the test statistic above is larger than 1.96.
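
As a rough sketch of the test described above in R: the function name fisher_z_test and the example correlations and sample sizes are made up for illustration, they are not from the thread. Base R's atanh() computes the same value as 1/2 * ln[ (1+r)/(1-r) ].

```r
# Compare two independent correlations via Fisher's z transformation.
# r1, r2: sample correlations; n1, n2: group sizes (hypothetical inputs).
fisher_z_test <- function(r1, n1, r2, n2) {
  z1 <- atanh(r1)                                  # Fisher transform of r1
  z2 <- atanh(r2)                                  # Fisher transform of r2
  stat <- (z1 - z2) / sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
  p <- 2 * pnorm(-abs(stat))                       # two-sided p-value under N(0,1)
  c(z = stat, p.value = p)
}

res <- fisher_z_test(r1 = 0.6, n1 = 50, r2 = 0.3, n2 = 60)
print(res)
```

With alpha = .05, H0 is rejected when the absolute value of the z component exceeds 1.96, which is the same as p.value falling below .05.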

-- 
Wolfgang Viechtbauer
 Department of Methodology and Statistics
 University of Maastricht, The Netherlands
 http://www.wvbauer.com/



-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Timo Stolz
Sent: Thursday, July 26, 2007 16:13
To: r-help@stat.math.ethz.ch
Subject: [R] significance test for difference of two correlations

 Dear R users,
 
 how can I test, whether two correlations differ significantly. (I
 want to prove, that variables are correlated differently, depending
 on the group a person is in.)  
 
 Greetings from Freiburg im Breisgau (Germany),
 Timo Stolz



[R] error in using R2WinBUGS on Ubuntu 6.10 Linux

2007-07-26 Thread meyerjp
I am trying to run WinBUGS 1.4 on the Ubuntu 6.10 Linux distribution. I am 
using the R2WinBUGS package with the source file listed below. WinBUGS 
appears to run properly, but I get the following messages after WinBUGS starts 
in WINE. Does anyone know what may be causing this error and how to 
correct it?

Thanks

ERROR MESSAGE:

fixme:ole:GetHGlobalFromILockBytes cbSize is 13824
err:ole:CoGetClassObject class {0003000a---c000-0046} not 
registered
err:ole:CoGetClassObject class {0003000a---c000-0046} not 
registered
err:ole:CoGetClassObject no class object {0003000a---c000-0046} 
could be created for context 0x3
fixme:keyboard:RegisterHotKey (0x10032,13,0x0002,3): stub
fixme:ntdll:RtlNtStatusToDosErrorNoTeb no mapping for 800a
err:ole:local_server_thread Failure during ConnectNamedPipe 317



R SOURCE FILE:

rm(list=ls(all=TRUE))

library(R2WinBUGS)

inits <- function() {
  list(alpha0 = 0, alpha1 = 0, alpha2 = 0, alpha12 = 0, sigma = 1)
}

data <- list(
  r  = c(10, 23, 23, 26, 17, 5, 53, 55, 32, 46, 10, 8, 10, 8, 23,
         0, 3, 22, 15, 32, 3),
  n  = c(39, 62, 81, 51, 39, 6, 74, 72, 51, 79, 13, 16, 30, 28, 45,
         4, 12, 41, 30, 51, 7),
  x1 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1),
  x2 = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0,
         0, 1, 1, 1, 1, 1),
  N  = 21)

test <- bugs(data, inits,
  model.file = "/home/meyerjp/rasch/test.bug",
  parameters = c("alpha0", "alpha1", "alpha12", "alpha2", "sigma"),
  n.chains = 2, n.iter = 1, n.burnin = 1000,
  bugs.directory = "/home/meyerjp/.wine/drive_c/Program Files/WinBUGS14/",
  working.directory = "/home/meyerjp/rasch/working",
  debug = FALSE,
  WINEPATH = "/usr/bin/winepath",
  newWINE = TRUE)



[R] significance test for difference of two correlations

2007-07-26 Thread Timo Stolz
Dear R users,

how can I test, whether two correlations differ significantly. (I want to 
prove, that variables are correlated differently, depending on the group a 
person is in.)

Greetings from Freiburg im Breisgau (Germany),
Timo Stolz



[R] lmer and scale parameters....

2007-07-26 Thread orzack
I'm using lmer to fit mixed-effect logistic regression models. This 
is for a small data set.

First, I fit a constant:

Generalized linear mixed model fit using Laplace
Formula: propm ~ (1 | study)
Data: inducedSR71507.dat
  Family: binomial(logit link)
   AIC   BIC logLik deviance
 183.7 189.4 -89.84    179.7
Random effects:
 Groups Name        Variance Std.Dev.
 study  (Intercept) 0.035812 0.18924
number of obs: 127, groups: study, 21

Estimated scale (compare to  1 )  1.028571

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  0.11256    0.04979   2.261   0.0238 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

So far, so good.

Next, I fit a model with a fixed effect:

Generalized linear mixed model fit using Laplace
Formula: propm ~ 1 + c.age + (1 | study)
Data: inducedSR71507.dat
  Family: binomial(logit link)
   AIC  BIC logLik deviance
  5339 5348  -2667 5333
Random effects:
 Groups Name        Variance Std.Dev.
 study  (Intercept) 0.44094  0.66404
number of obs: 127, groups: study, 21

Estimated scale (compare to  1 )  314587114

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.058093   1.033273 0.05622    0.955
c.age       0.007262   0.095393 0.07613    0.939

That is one heck of a large scale parameter!

I would be glad to be shown what I am doing wrong, but I am thinking 
that this is a bug..

study is entered as a factor in the data frame.

here is the session info

> sessionInfo()
R version 2.5.1 (2007-06-27)
i386-apple-darwin8.9.1

locale:
en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
     mlmRev        lme4        MASS      Matrix     lattice        nlme
    0.995-1   0.99875-4      7.2-34  0.999375-0     0.15-11      3.1-83


Any and all help is very much appreciated!


-- 
Steven Orzack

The Fresh Pond Research Institute
173 Harvey Street
Cambridge, MA. 02140
617 864-4307

www.freshpond.org


Re: [R] Help with Dates

2007-07-26 Thread Gabor Grothendieck
On 26 Jul 2007 09:59:31 -0400, Jeffrey J. Hallman [EMAIL PROTECTED] wrote:
 zoo is nice.  'tisFromCsv()' in the fame package is nicer.

 Jeff


1. What am I doing wrong here?  I only get one data column.
2. I assume the regularized dates which do not exactly match the input ones
are intended so as to make this a regularly spaced series.  Is that right?
3. What is the cause of the warning message?
4. Why is a list returned with a single component containing the output?
Thanks.

> library(fame)
> Lines <- "Date      Price  Open.Int. Comm.Long Comm.Short net.comm
+ 15-Jan-86 673.25 175645    65910     28425      37485
+ 31-Jan-86 677.00 167350    54060     27120      26940
+ 14-Feb-86 680.25 157985    37955     25425      12530
+ 28-Feb-86 691.75 162775    49760     16030      33730
+ 14-Mar-86 706.50 163495    54120     27995      26125
+ 31-Mar-86 709.75 164120    54715     30390      24325
+ "
> tisFromCsv(textConnection(Lines), dateFormat = "%d-%b-%y", dateCol = "Date",
+            sep = "")
[[1]]
   [,1]
19860119 673.25
19860202 677.00
19860216 680.25
19860302 691.75
19860316 706.50
19860330 709.75
class: tis

Warning message:
number of items to replace is not a multiple of replacement length in:
x[i] <- value



Re: [R] substituting dots in the names of the columns (sub, gsub, regexpr)

2007-07-26 Thread Felix Andrews

Hi,

A dot in a regular expression matches any character, so you have to
escape each dot with backslash \\ (which itself is escaped in the
string, to confuse things...).
A plus symbol will match one or more of the preceding characters.
A dollar symbol will match the end of a string.

So:

> gsub("\\.$", "", gsub("\\.+", ".", str))
[1] "y.m"        "BD.g.cm3"   "PR.Mpa"     "Ks.m.s"     "SP.g.g"     "P.m3.m3"    "theta1.g.g"
[8] "theta2.g.g" "AWC.g.g"

Learn more at ?regexp
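
The two substitutions can also be run one at a time, which makes it easier to see what each pattern does. This is only a sketch: the intermediate names step1 and clean are introduced here for illustration, and str is the vector of column names from the original question.

```r
str <- c("y..m.", "BD..g.cm3.", "PR..Mpa.", "Ks..m.s.", "SP.g..g.",
         "P..m3.m3.", "theta1..g.g.", "theta2..g.g.", "AWC..g.g.")

# "\\.+" is a literal dot repeated one or more times:
# collapse every run of dots into a single dot.
step1 <- gsub("\\.+", ".", str)

# "\\.$" is a literal dot at the end of the string: drop it.
clean <- gsub("\\.$", "", step1)
print(clean)

# In "\\..", only the first dot is escaped; the second still matches any
# character, so the pattern consumes ".g" here instead of "..".
sub("\\..", ".", "SP.g..g.")   # "SP...g."
```

An alternative to the backslashes is a character class such as "[.]+", where the dot inside the brackets is also taken literally.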

Felix


On 7/26/07, 8rino-Luca Pantani [EMAIL PROTECTED] wrote:

Dear R users,
I have the following two problems, related to the functions sub, grep,
regexpr and similar.

The header of the file(s) I have to import is like this:

c("y (m)", "BD (g/cm3)", "PR (Mpa)", "Ks (m/s)", "SP g./g.", "P (m3/m3)",
  "theta1 (g/g)", "theta2 (g/g)", "AWC (g/g)")

To get rid of spaces and symbols in the names of the columns,
I use read.table(... check.names=TRUE) and I get:
str <- c("y..m.", "BD..g.cm3.", "PR..Mpa.", "Ks..m.s.", "SP.g..g.",
         "P..m3.m3.", "theta1..g.g.", "theta2..g.g.", "AWC..g.g.")

Now, my problem is to remove the trailing dots, as well as the double
dots, in order to get names like the following:
c("y.m", "BD.g.cm3", "PR.Mpa", "Ks.m.s", "SP.g.g", "P.m3.m3.",
  "theta1.g.g", "theta2.g.g", "AWC.g.g")

I've searched the help pages for sub, regexpr and similar, and also
searched the help archives.
I understand that the dot is a special character, since
> sub("..", ".", str)
[1] "..m."        "...g.cm3."   "...Mpa."     "...m.s."     "..g..g."
[6] "..m3.m3."    ".eta1..g.g." ".eta2..g.g." ".C..g.g."

Therefore I tried
> sub("\\..", ".", str)
[1] "y.m."        "BD.g.cm3."   "PR.Mpa."     "Ks.m.s."     "SP...g."
[6] "P.m3.m3."    "theta1.g.g." "theta2.g.g." "AWC.g.g."
and I was surprised by the (to me) strange behaviour with "SP.g..g.",
which became "SP...g.".
And this is the first problem I cannot solve.

Then there's the problem of removing the trailing dot.
In
http://tolstoy.newcastle.edu.au/R/e2/help/07/01/8665.html
I found a somewhat similar problem, but its solution does not work in this
case, since:
> gsub("[.].*", "", str)
[1] "y"      "BD"     "PR"     "Ks"     "SP"     "P"      "theta1" "theta2"
[9] "AWC"
And this is the second problem.

Apart from these particular problems, I would like to know more about
regexp, sub and so on, since each time I have strings to manipulate,
I must face my ignorance of regular expressions and their syntax.

Is there any page with examples, where I can improve my knowledge and
stop being frustrated each time I have to manipulate strings?

8rino

--
Ottorino-Luca Pantani, Università di Firenze
Dip. Scienza del Suolo e Nutrizione della Pianta
P.zle Cascine 28 50144 Firenze Italia
Tel 39 055 3288 202 (348 lab) Fax 39 055 333 273
[EMAIL PROTECTED]





--
Felix Andrews / 安福立
PhD candidate
Integrated Catchment Assessment and Management Centre
The Fenner School of Environment and Society
The Australian National University (Building 48A), ACT 0200
Beijing Bag, Locked Bag 40, Kingston ACT 2604
http://www.neurofractal.org/felix/
voice:+86_1051404394 (in China)
mobile:+86_13522529265 (in China)
mobile:+61_410400963 (in Australia)
xmpp:[EMAIL PROTECTED]
3358 543D AAC6 22C2 D336  80D9 360B 72DD 3E4C F5D8



[R] Average plan

2007-07-26 Thread Nok Noy

Hello, 

I'm looking for a method to compute an average plane from 4 or 5 points in a
Cartesian space. I'm sure it can be done using a least-squares method, but
maybe a function already exists in R to get this plane.
Can somebody help me solve this problem? (I've been looking on the net for
hours but didn't find anything really satisfying.)
Thanks




Re: [R] Help with Dates

2007-07-26 Thread Jeffrey J. Hallman
zoo is nice.  'tisFromCsv()' in the fame package is nicer.

Jeff



[R] Odp: multiple graphs

2007-07-26 Thread Petr PIKAL
Hi

this particular graph is a combination of several approaches
see

layout # how to split plot window (or ?split)
par(new=TRUE) # how to plot several times to the same window without 
erasing previous plot

and of course sophisticated use of all other stuff which is available in 
R.

See also

par(fig=...)

plot(1:10)
par(fig=c(0.1,.5,0.1,.5), new=T)
boxplot(rnorm(10))
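
For the layout() route mentioned above, here is a small sketch that puts marginal histograms next to a scatter plot. The data are simulated and the panel sizes are arbitrary choices for illustration; pdf(NULL) is used only so that the sketch runs without opening a window, so drop it (and dev.off()) to see the plot on screen.

```r
pdf(NULL)                       # assumption: null device, for non-interactive runs
set.seed(42)
x <- rnorm(100)
y <- x + rnorm(100)

# 2 x 2 grid: panel 1 = top histogram, 2 = scatter plot,
# 3 = right histogram, 0 = cell left empty.
nf <- layout(matrix(c(1, 0,
                      2, 3), nrow = 2, byrow = TRUE),
             widths = c(3, 1), heights = c(1, 3))

hx <- hist(x, plot = FALSE)     # compute bin counts only, draw them later
hy <- hist(y, plot = FALSE)

par(mar = c(0, 4, 1, 1))
barplot(hx$counts, space = 0, axes = FALSE)                 # panel 1
par(mar = c(4, 4, 1, 1))
plot(x, y)                                                  # panel 2
par(mar = c(4, 0, 1, 1))
barplot(hy$counts, space = 0, axes = FALSE, horiz = TRUE)   # panel 3
dev.off()
```

layout() fixes the positions in advance, whereas par(fig = ..., new = TRUE) lets you drop a plot at arbitrary coordinates on top of an existing one.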

Petr

[EMAIL PROTECTED] napsal dne 26.07.2007 09:26:16:

 Does anyone have a simple explanation and example on how to add 
histograms or 
 barcharts to an other graph like in the example at the R-graph gallery:
 
 http://addictedtor.free.fr/graphiques/RGraphGallery.php?graph=109
 
 looking at the code I'not undertand very well how to add graphs in 
 arbitrary/clever position with an adequate scale.
 
 If somebody have a simplier example with explanations it will be highly 
appreciate.
 
 Best
 Daniele
 
 


[R] substituting dots in the names of the columns (sub, gsub, regexpr)

2007-07-26 Thread 8rino-Luca Pantani
Dear R users,
I have the following two problems, related to the functions sub, grep,
regexpr and similar.

The header of the file(s) I have to import is like this:

c("y (m)", "BD (g/cm3)", "PR (Mpa)", "Ks (m/s)", "SP g./g.", "P (m3/m3)",
  "theta1 (g/g)", "theta2 (g/g)", "AWC (g/g)")

To get rid of spaces and symbols in the names of the columns,
I use read.table(... check.names=TRUE) and I get:
str <- c("y..m.", "BD..g.cm3.", "PR..Mpa.", "Ks..m.s.", "SP.g..g.",
         "P..m3.m3.", "theta1..g.g.", "theta2..g.g.", "AWC..g.g.")

Now, my problem is to remove the trailing dots, as well as the double
dots, in order to get names like the following:
c("y.m", "BD.g.cm3", "PR.Mpa", "Ks.m.s", "SP.g.g", "P.m3.m3.",
  "theta1.g.g", "theta2.g.g", "AWC.g.g")

I've searched the help pages for sub, regexpr and similar, and also
searched the help archives.
I understand that the dot is a special character, since
> sub("..", ".", str)
[1] "..m."        "...g.cm3."   "...Mpa."     "...m.s."     "..g..g."
[6] "..m3.m3."    ".eta1..g.g." ".eta2..g.g." ".C..g.g."

Therefore I tried
> sub("\\..", ".", str)
[1] "y.m."        "BD.g.cm3."   "PR.Mpa."     "Ks.m.s."     "SP...g."
[6] "P.m3.m3."    "theta1.g.g." "theta2.g.g." "AWC.g.g."
and I was surprised by the (to me) strange behaviour with "SP.g..g.",
which became "SP...g.".
And this is the first problem I cannot solve.

Then there's the problem of removing the trailing dot.
In
http://tolstoy.newcastle.edu.au/R/e2/help/07/01/8665.html
I found a somewhat similar problem, but its solution does not work in this
case, since:
> gsub("[.].*", "", str)
[1] "y"      "BD"     "PR"     "Ks"     "SP"     "P"      "theta1" "theta2"
[9] "AWC"
And this is the second problem.

Apart from these particular problems, I would like to know more about
regexp, sub and so on, since each time I have strings to manipulate,
I must face my ignorance of regular expressions and their syntax.

Is there any page with examples, where I can improve my knowledge and
stop being frustrated each time I have to manipulate strings?

8rino

-- 
Ottorino-Luca Pantani, Università di Firenze
Dip. Scienza del Suolo e Nutrizione della Pianta
P.zle Cascine 28 50144 Firenze Italia
Tel 39 055 3288 202 (348 lab) Fax 39 055 333 273 
[EMAIL PROTECTED]



Re: [R] logistic regression

2007-07-26 Thread Frank E Harrell Jr
Mary,

The 10-group approach results in a low-resolution and fairly arbitrary 
calibration curve.  Also, it is the basis of the original 
Hosmer-Lemeshow goodness of fit statistic, which has been superseded by 
the Hosmer et al. single degree of freedom GOF test that does not require 
any binning.  The Design package handles both.  Do ?calibrate.lrm, 
?residuals.lrm, ?lrm for details.

Frank Harrell


Sullivan, Mary M wrote:
 Greetings,
  
 
 I am working on a logistic regression model in R and I am struggling with the 
 code, as it is a relatively new program for me.  In searching Google for 
 'logistic regression diagnostics' I came across Elizabeth Brown's Lecture 14 
 from her Winter 2004 Biostatistics 515 course 
 (http://courses.washington.edu/b515/l14.pdf).  I found most of the code to 
 be very helpful, but I am struggling with the lines that calculate the 
 observed and expected values in the 10 groups created by the cut function.  I 
 get error messages in trying to create the E and O matrices:  R won't accept 
 assignment of fi1c==j and it won't calculate the sum.
 
  
 
 I am wondering whether someone might be able to offer me some assistance...my 
 search of the archives was not fruitful.
 
  
 
 Here is the code that I adapted from the lecture notes:
 
  
 
 fit <- fitted(glm.lyme)
 
 fitc <- cut(fit, br = c(0, quantile(fit, p = seq(.1, .9, .1)), 1))
 
 t <- table(fitc)
 
 fitc <- cut(fit, br = c(0, quantile(fit, p = seq(.1, .9, .1)), 1), labels = F)
 
 t <- table(fitc)
 
  
 
 #Calculate observed and expected values in ea group
 
 E <- matrix(0, nrow=10, ncol = 2)
 
 O <- matrix(0, nrow=10, ncol=2)
 
 for (j in 1:10) {
 
   E[j, 2] = sum(fit[fitc==j])
 
   E[j, 1] = sum((1- fit)[fitc==j])
 
   O[j, 2] = sum(pcdata$lymdis[fitc==j])
 
   O[j, 1] = sum((1-pcdata$lymdis)[fitc==j])
 
 
 
 }
 
  
 
 Here is the error message:  Error in Summary.factor(..., na.rm = na.rm) : 
 sum not meaningful for factors
 
  
 
  
 
 I understand what it means; I just can't figure out how to get around it or 
 how to get the output printed in table form.  Thank you in advance for any 
 assistance.
 
  
 
 Mary Sullivan
 
 


-- 
Frank E Harrell Jr   Professor and Chair   School of Medicine
  Department of Biostatistics   Vanderbilt University



[R] ROC curve in R

2007-07-26 Thread Rithesh M. Mohan
Hi,

 

I need to build ROC curve in R, can you please provide data steps / code
or guide me through it.

 

Thanks and Regards

Rithesh M Mohan




Re: [R] Convert string to list?

2007-07-26 Thread Ross Darnell
Is this what you want?


as.vector(unlist(strsplit(str, ",")), mode = "list")
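
Note that splitting on commas gives a list of character strings such as "P = 0.0", not the named numeric list asked for in the question. Two other routes are sketched here, as assumptions rather than as the thread's answer: the names res, parts and res2 are introduced only for illustration, and the eval(parse()) route should only be used on input you trust.

```r
str <- "P = 0.0, T = 0.0, Q = 0.0"

# Route 1: wrap the text in a list() call and evaluate it.
res <- eval(parse(text = paste("list(", str, ")")))

# Route 2: split into name/value pairs by hand, without evaluating code.
parts <- strsplit(strsplit(str, ",")[[1]], "=")
values <- lapply(parts, function(p) as.numeric(p[2]))
res2 <- setNames(values, trimws(sapply(parts, `[`, 1)))

str(res)
str(res2)
```

Route 2 is longer but never runs text as code, which matters if the string comes from a file or from user input.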


Ross Darnell

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Manuel Morales
Sent: Friday, 27 July 2007 10:39 AM
To: r-help
Subject: [R] Convert string to list?

Let's say I have the following string:

str <- "P = 0.0, T = 0.0, Q = 0.0"

I'd like to find a function that generates the following object from
'str'.

list(P = 0.0, T = 0.0, Q = 0.0)

Thanks!

-- 
http://mutualism.williams.edu



[R] Creating windows binary R package (PowerArchiver vs. zip -r9X)

2007-07-26 Thread Tao Shi
Hi list,

I apologize if you see funny fonts, b/c I'm using the new Windows Live 
Hotmail and don't know how to turn off the rich text mode.

I have successfully built and installed an R package in Windows XP for 
R-2.5.1.  But when I tried to create a .zip file so I can use 
"Packages/install package(s) from local .zip files..." to install it, it 
seems R only recognizes the .zip file created by "zip -r9X", not by 
PowerArchiver.  Do you know why?  I vaguely remember I used WinZip before 
and it worked fine.

The two threads I found on R-help and R-devel helped me a lot, but don't 
really answer my question:
http://tolstoy.newcastle.edu.au/R/help/06/06/29587.html
http://tolstoy.newcastle.edu.au/R/devel/05/12/3336.html

Thanks,
...Tao





[R] princomp error

2007-07-26 Thread Bricklemyer, Ross S
I am attempting to run principal components analysis on a dataset of
spectral reflectance (6 decimal places).  I imported the data using
read.table and there are both column and row headers.  When I run
princomp I receive the following error:

 

Error in cov.wt(z) : 'x' must contain finite values only

 

Where am I going wrong?

 

Ross

 

***
Ross Bricklemyer
Dept. of Crop and Soil Sciences
Washington State University
291D Johnson Hall
PO Box 646420
Pullman, WA 99164-6420
Work: 509.335.3661
Cell/Home: 406.570.8576
Fax: 509.335.8674
Email: [EMAIL PROTECTED]









Re: [R] zeroinfl() or zicounts() error

2007-07-26 Thread Achim Zeileis
On Thu, 26 Jul 2007, Rachel Davidson wrote:

 I'm trying to fit a zero-inflated poisson model using zeroinfl() from the
 pscl library. It works fine for most models I try, but when I include either
 of 2 covariates, I get an error.

 When I include PopulationDensity, I get this error: Error in solve.default
 (as.matrix(fit$hessian)) :system is computationally singular:
 reciprocal condition number = 1.91306e-34

 When I include BuildingArea, I get this error: Error in optim(fn =
 loglikfun, par = c(start$count, start$zero, if (dist ==  :
 non-finite finite-difference value [2]

Might be due to some close to linear dependencies in your regressor
matrix...

  I tried fitting the models using zicounts in the zicounts library as well
 and had the same difficulty.

If I recall correctly, zicounts() uses a very similar type of
optimization compared to zeroinfl(), hence the similar problems.

  When I include PopulationDensity, it runs, but outputs only the parameter
 estimates, not the standard errors or p-values (those have NaN).

This is due to the same problem as above for zeroinfl(), the Hessian
matrix is (close to) singular.

 When I include BuildingArea, I get this error: Error in
 solve.default(z0$hessian)
 : system is computationally singular: reciprocal condition number =
 2.58349e-25

 Can anyone suggest what it is about these 2 covariates that might be causing
 the problem? I don't see any obvious problems with them. They are both
 nonnegative with smooth probability distributions and no missing (NA)
 values. The dataset has 3211 observations. It doesn't matter if there are
 other covariates in the models or not. If one of these is included, I get
 the errors.

Even if you include just one covariate and nothing else?
  zeroinfl(y ~ PopulationDensity, data = ...)
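
One way to look for such near-dependencies before fitting is to check the condition number of the regressor matrix. The following is only a sketch with simulated data: the variables x1, x2 and x3 are hypothetical stand-ins, not the covariates from the original data set.

```r
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 1e-8)   # almost an exact copy of x1
x3 <- rnorm(n)                   # unrelated to x1

X_bad  <- model.matrix(~ x1 + x2)   # nearly collinear columns
X_good <- model.matrix(~ x1 + x3)   # well-behaved columns

# kappa() returns the condition number; very large values signal a
# (numerically) singular design, and hence a near-singular Hessian.
k_bad  <- kappa(X_bad,  exact = TRUE)
k_good <- kappa(X_good, exact = TRUE)
print(c(bad = k_bad, good = k_good))
```

A covariate measured on a very different scale from the others can inflate the condition number in the same way, so rescaling it (for example with scale()) before fitting may also be worth trying.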

Z



Re: [R] Large dataset + randomForest

2007-07-26 Thread Kuhn, Max
Florian,

The first thing that you should change is how you call randomForest.
Instead of specifying the model via a formula, use the randomForest(x,
y) interface.

When a formula is used, there is a terms object created so that a model
matrix can be created for these and future observations. That terms
object can get big (I think it would be a matrix of size 151 x 150) and
is diagonal. 

That might not solve it, but it should help.
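
To illustrate the difference, here is a small sketch on simulated data: 20 rows stand in for the real 100 000, and the column names V1 to V151 are assumed, with V1 as the response. The randomForest call is skipped if the package is not installed, and ntree = 50 is an arbitrary choice to keep the example quick.

```r
set.seed(1)
df <- as.data.frame(matrix(rnorm(20 * 151), ncol = 151))  # V1 .. V151

# What the formula interface builds behind the scenes:
tt <- terms(V1 ~ ., data = df)
dim(attr(tt, "factors"))      # variables x terms

# The x/y interface avoids the terms object and model frame altogether.
x <- df[, -1]                 # the 150 predictors
y <- df$V1                    # the response

if (requireNamespace("randomForest", quietly = TRUE)) {
  rf <- randomForest::randomForest(x, y, ntree = 50)
  print(rf)
}
```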

Max

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Florian Nigsch
Sent: Thursday, July 26, 2007 2:07 PM
To: r-help@stat.math.ethz.ch
Subject: [R] Large dataset + randomForest

[Please CC me in any replies as I am not currently subscribed to the  
list. Thanks!]

Dear all,

I did a bit of searching on the question of large datasets but did  
not come to a definite conclusion. What I am trying to do is the  
following: I want to read in a dataset with approx. 100 000 rows and  
approx 150 columns. The file size is ~ 33MB, which one would deem not  
too big a file for R. To speed up the reading in of the file I do not  
use read.table but a loop that does reading with scan() into a buffer  
and some preprocessing and then adds the data into a dataframe.

When I then want to run randomForest() R complains that I cannot  
allocate a vector of size 313.0 MB. I am aware that randomForest  
needs all data in memory, but
1) why should that suddenly be 10 times the size of the data (I  
acknowledge the need for some internal data of R, but 10 times seems a  
bit too much) and
2) there is still physical memory free on the machine (in total 4GB  
available, even though R is limited to 2GB if I correctly remember  
the help pages - still 2GB should be enough!) - it doesn't seem to  
work either with changed settings done via mem.limits(), or run-time  
arguments --min-vsize --max-vsize - what do these have to be set to  
to work in my case??

> rf <- randomForest(V1 ~ ., data=df[trainindices,], do.trace=5)
Error: cannot allocate vector of size 313.0 Mb
> object.size(df)/1024/1024
[1] 129.5390


Any help would be greatly appreciated,

Florian

--
Florian Nigsch [EMAIL PROTECTED]
Unilever Centre for Molecular Sciences Informatics
Department of Chemistry
University of Cambridge
http://www-mitchell.ch.cam.ac.uk/
Telephone: +44 (0)1223 763 073






Re: [R] ROC curve in R

2007-07-26 Thread Dylan Beaudette
On Thursday 26 July 2007 06:01, Frank E Harrell Jr wrote:
 Note that even though the ROC curve as a whole is an interesting
 'statistic' (its area is a linear translation of the
 Wilcoxon-Mann-Whitney-Somers-Goodman-Kruskal rank correlation
 statistics), each individual point on it is an improper scoring rule,
 i.e., a rule that is optimized by fitting an inappropriate model.  Using
 curves to select cutoffs is a low-precision and arbitrary operation, and
 the cutoffs do not replicate from study to study.  Probably the worst
 problem with drawing an ROC curve is that it tempts analysts to try to
 find cutoffs where none really exist, and it makes analysts ignore the
 whole field of decision theory.

 Frank Harrell

Frank,

This thread has caught my attention for a couple of reasons, possibly related 
to my novice-level experience. 

1. in a logistic regression study, where i am predicting the probability of 
the response being 1 (for example) - there exists a continuum of probability 
values - and a finite number of {1,0} realities when i either look within the 
original data set, or with a new 'verification' data set. I understand that 
drawing a line through the probabilities returned from the logistic 
regression is a loss of information, but there are times when a 'hard' 
decision requiring prediction of {1,0} is required. I have found that the 
ROCR package (not necessarily the ROC Curve) can be useful in identifying the 
probability cutoff where accuracy is maximized. Is this an unreasonable way 
of using logistic regression as a predictor? 

2. The ROC curve can be a helpful way of communicating false positives / false 
negatives to other users who are less familiar with the output and 
interpretation of logistic regression. 


3. I have been using the area under the ROC Curve, kendall's tau, and cohen's 
kappa to evaluate the accuracy of a logistic regression based prediction, the 
last two statistics based on a some probability cutoff identified before 
hand. 


How does the topic of decision theory relate to some of the circumstances 
described above? Is there a better way to do some of these things?

Cheers,

Dylan




 [EMAIL PROTECTED] wrote:
  http://search.r-project.org/cgi-bin/namazu.cgi?query=ROC&max=20&result=normal&sort=score&idxname=Rhelp02a&idxname=functions&idxname=docs
 
  there is a lot of help; try help.search("ROC curve"), which gave
  Help files with alias or concept or title matching 'ROC curve' using
  fuzzy matching:
 
 
 
  granulo(ade4) Granulometric Curves
  plot.roc(analogue)Plot ROC curves and associated
  diagnostics
  roc(analogue) ROC curve analysis
  colAUC(caTools)   Column-wise Area Under ROC
  Curve (AUC)
  DProc(DPpackage)  Semiparametric Bayesian ROC
  curve analysis
  cv.enet(elasticnet)   Computes K-fold cross-validated
  error curve for elastic net
  ROC(Epi)  Function to compute and draw
  ROC-curves.
  lroc(epicalc) ROC curve
  cv.lars(lars) Computes K-fold cross-validated
  error curve for lars
  roc.demo(TeachingDemos)   Demonstrate ROC curves by
  interactively building one
 
  HTH
  see the help and examples those will suffice
 
  Type 'help(FOO, package = PKG)' to inspect entry 'FOO(PKG) TITLE'.
 
 
 
  Regards,
 
  Gaurav Yadav
  +++
  Assistant Manager, CCIL, Mumbai (India)
  Mob: +919821286118 Email: [EMAIL PROTECTED]
  Bhagavad Gita:  Man is made by his Belief, as He believes, so He is
 
 
 
  Rithesh M. Mohan [EMAIL PROTECTED]
  Sent by: [EMAIL PROTECTED]
  07/26/2007 11:26 AM
 
  To
  R-help@stat.math.ethz.ch
  cc
 
  Subject
  [R] ROC curve in R
 
 
 
 
 
 
  Hi,
 
 
 
  I need to build ROC curve in R, can you please provide data steps / code
  or guide me through it.
 
 
 
  Thanks and Regards
 
  Rithesh M Mohan
 
 

 -
 Frank E Harrell Jr   Professor and Chair   School of Medicine
   Department of Biostatistics   Vanderbilt University



Re: [R] significance test for difference of two correlations

2007-07-26 Thread Gabor Grothendieck
There is R code for both the Fisher transform and the corresponding bootstrap
procedure in the vignette for the proto package:
http://cran.r-project.org/doc/vignettes/proto/proto.pdf

On 7/26/07, Viechtbauer Wolfgang (STAT)
[EMAIL PROTECTED] wrote:
 Let r_1 be the correlation between the two variables for the first group with 
 n_1 subjects and let r_2 be the correlation for the second group with n_2 
 subjects. Then a simple way to test H0: rho_1 = rho_2 is to convert r_1 and 
 r_2 via Fisher's variance stabilizing transformation ( z = 1/2 * ln[ 
 (1+r)/(1-r)] ) and then calculate:

 (z_1 - z_2) / sqrt( 1/(n_1 - 3) + 1/(n_2 - 3) )

 which is (approximately) N(0,1) under H0. So, using alpha = .05, you can 
 reject H0 if the absolute value of the test statistic above is larger than 
 1.96.
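
 The test described above can be sketched in a few lines of R. The function name `cor_diff_test` and the example numbers are hypothetical; `atanh()` is used because it equals Fisher's transformation 1/2 * ln[(1+r)/(1-r)]:

```r
# Two-sided z test of H0: rho_1 = rho_2 for correlations from two
# independent groups (function name is illustrative only).
cor_diff_test <- function(r1, n1, r2, n2) {
  z1 <- atanh(r1)                      # Fisher's z for group 1
  z2 <- atanh(r2)                      # Fisher's z for group 2
  stat <- (z1 - z2) / sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
  p <- 2 * pnorm(-abs(stat))           # approximately N(0,1) under H0
  list(statistic = stat, p.value = p)
}

# Made-up example: r = 0.60 with n = 50 versus r = 0.30 with n = 60
res <- cor_diff_test(0.60, 50, 0.30, 60)
res$statistic
res$p.value
```

 The approximation assumes the two groups are independent samples; correlations measured on the same subjects need a different test.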

 --
 Wolfgang Viechtbauer
 Department of Methodology and Statistics
 University of Maastricht, The Netherlands
 http://www.wvbauer.com/



 Original Message
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Timo Stolz Sent:
 Thursday, July 26, 2007 16:13 To: r-help@stat.math.ethz.ch
 Subject: [R] significance test for difference of two correlations

  Dear R users,
 
  how can I test, whether two correlations differ significantly. (I
  want to prove, that variables are correlated differently, depending
  on the group a person is in.)
 
  Greetings from Freiburg im Breisgau (Germany),
  Timo Stolz



Re: [R] error in using R2WinBUGS on Ubuntu 6.10 Linux

2007-07-26 Thread Uwe Ligges


[EMAIL PROTECTED] wrote:
 I am trying to run WinBUGS 1.4 from the Ubuntu 6.10 Linux distribution. I am 
 using the R2WinBUGS packages with the  source file listed below. WinBUGS 
 appears to run properly, but I get the following message after WinBUGS starts 
 in WINE. Does anyone know what may be causing this error and what the 
 correction may be?
 
 Thanks
 
 ERROR MESSAGE:
 
 fixme:ole:GetHGlobalFromILockBytes cbSize is 13824
 err:ole:CoGetClassObject class {0003000a---c000-0046} not 
 registered
 err:ole:CoGetClassObject class {0003000a---c000-0046} not 
 registered
 err:ole:CoGetClassObject no class object 
 {0003000a---c000-0046} could be created for context 0x3
 fixme:keyboard:RegisterHotKey (0x10032,13,0x0002,3): stub
 fixme:ntdll:RtlNtStatusToDosErrorNoTeb no mapping for 800a
 err:ole:local_server_thread Failure during ConnectNamedPipe 317


I fear this comes from wine, not from R2WinBUGS, WinBUGS or R, and the 
"fixme:" messages suggest the problem may go away in a more recent version 
of wine...

Uwe Ligges

 
 
 R SOURCE FILE:
 
 rm(list=ls(all=TRUE))
 
 library(R2WinBUGS)
 
 inits <- function(){
   list(alpha0 = 0, alpha1 = 0, alpha2 = 0, alpha12 = 0, sigma = 1)
 }
 
 data <- list(
   r  = c(10, 23, 23, 26, 17, 5, 53, 55, 32, 46, 10, 8, 10, 8, 23, 0, 3, 22,
          15, 32, 3),
   n  = c(39, 62, 81, 51, 39, 6, 74, 72, 51, 79, 13, 16, 30, 28, 45, 4, 12, 41,
          30, 51, 7),
   x1 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
   x2 = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1),
   N  = 21)
 
 test <- bugs(data, inits,
   model.file="/home/meyerjp/rasch/test.bug",
   parameters=c("alpha0","alpha1","alpha12","alpha2","sigma"),
   n.chains=2, n.iter=1, n.burnin=1000,
   bugs.directory="/home/meyerjp/.wine/drive_c/Program Files/WinBUGS14/",
   working.directory="/home/meyerjp/rasch/working",
   debug=FALSE,
   WINEPATH="/usr/bin/winepath",
   newWINE=TRUE)
 


Re: [R] aggregate.ts

2007-07-26 Thread Achim Zeileis
Jeff,

I'm really not a fan of subjective "mine is bigger than yours"
discussions. Just three comments that I try to keep as objective as
possible.

 Bottom line: use 'tis' series from the fame package, or 'zoo' stuff from
 Gabor's zoo package.

The last time I checked
  packageDescription("zoo")$Author
had more than one entry.

 As the author of the fame package, I hope you'll excuse
 me for asserting that the 'tis' class is easier to understand and use than the
 zoo stuff,

That surely depends on the user and the task he has to do...

 which takes a more general approach.  Some day Gabor or I or some
 other enterprising soul should try combining the best ideas from zoo and fame
 into a package that is better than either one.

I think combination should be straightforward: zoo is general enough to
allow for time indexes of class "ti". Overall, "ti" seems to be
well-written and only some methods might need to be added/improved to
cooperate fully with zoo. Maybe some of the functionality that is
currently available for "tis" but is not available for all conceivable
zoo+arbitrary_index objects might be special-cased for zoo+ti or zooreg or
zooreg+ti etc.

Best,
Z



[R] Problem installing tseries package

2007-07-26 Thread Michael Cassin
Hi,

I'm running R 2.4.1 on Fedora Core 6 and am unable to install the tseries
package.  I've resolved a few problems getting to this point, by running a
yum update, installing the gcc-gfortran dependency, but now I'm stuck.
Could someone please point me in the right direction?


===== R install.packages output =====

> install.packages("tseries")

trying URL '
http://www.sourcekeg.co.uk/cran/src/contrib/tseries_0.10-11.tar.gz'
Content type 'application/x-tar' length 182043 bytes
opened URL
==
downloaded 177Kb

* Installing *source* package 'tseries' ...
** libs
gcc -I/usr/lib/R/include -I/usr/lib/R/include  -I/usr/local/include
-fpic  -O3 -g -std=gnu99 -c arma.c -o arma.o
gcc -I/usr/lib/R/include -I/usr/lib/R/include  -I/usr/local/include
-fpic  -O3 -g -std=gnu99 -c bdstest.c -o bdstest.o
gcc -I/usr/lib/R/include -I/usr/lib/R/include  -I/usr/local/include
-fpic  -O3 -g -std=gnu99 -c boot.c -o boot.o
gfortran   -fpic  -O2 -g -c dsumsl.f -o dsumsl.o
 In file dsumsl.f:450

  IF (IV(1) - 2) 30, 40, 50
   1
Warning: Obsolete: arithmetic IF statement at (1)
 In file dsumsl.f:3702

   10 ASSIGN 30 TO NEXT
   1
Warning: Obsolete: ASSIGN statement at (1)
 In file dsumsl.f:3707

   20GO TO NEXT,(30, 50, 70, 110)
  1
Warning: Obsolete: Assigned GOTO statement at (1)
 In file dsumsl.f:3709

  ASSIGN 50 TO NEXT
   1
Warning: Obsolete: ASSIGN statement at (1)
 In file dsumsl.f:3718

  ASSIGN 70 TO NEXT
   1
Warning: Obsolete: ASSIGN statement at (1)
 In file dsumsl.f:3724

  ASSIGN 110 TO NEXT
   1
Warning: Obsolete: ASSIGN statement at (1)
 In file dsumsl.f:4552

  IF (IV(1) - 2) 999, 30, 70
   1
Warning: Obsolete: arithmetic IF statement at (1)
 In file dsumsl.f:4714

  IF (IRC) 140, 100, 210
   1
Warning: Obsolete: arithmetic IF statement at (1)
gcc -I/usr/lib/R/include -I/usr/lib/R/include  -I/usr/local/include
-fpic  -O3 -g -std=gnu99 -c garch.c -o garch.o
gcc -I/usr/lib/R/include -I/usr/lib/R/include  -I/usr/local/include
-fpic  -O3 -g -std=gnu99 -c ppsum.c -o ppsum.o
gcc -I/usr/lib/R/include -I/usr/lib/R/include  -I/usr/local/include
-fpic  -O3 -g -std=gnu99 -c tsutils.c -o tsutils.o
gcc -shared -Bdirect,--hash-stype=both,-Wl,-O1 -o tseries.so arma.o
bdstest.o boot.o dsumsl.o garch.o ppsum.o tsutils.o -L/usr/lib/R/lib -lRblas
-lgfortran -lm -lgcc_s -lgfortran -lm -lgcc_s -L/usr/lib/R/lib -lR
/usr/bin/ld: skipping incompatible /usr/lib/R/lib/libRblas.so when searching
for -lRblas
/usr/bin/ld: skipping incompatible /usr/lib/R/lib/libRblas.so when searching
for -lRblas
/usr/bin/ld: cannot find -lRblas
collect2: ld returned 1 exit status
make: *** [tseries.so] Error 1
ERROR: compilation failed for package 'tseries'
** Removing '/usr/lib/R/library/tseries'


=====


I presume the priority is addressing the error: /usr/bin/ld: cannot find
-lRblas

I have the libRblas.so file with R 2.4. Do I need to upgrade to R 2.5 - In
which case I'll be asking how to fix the problems I'm having doing that  ;)

[~]# yum provides libRblas.so
snip

R.x86_64 2.5.1-2.fc6extras
Matched from:
/usr/lib64/R/lib/libRblas.so
libRblas.so()(64bit)

R.x86_64 2.5.1-2.fc6extras
Matched from:
/usr/lib64/R/lib/libRblas.so
libRblas.so()(64bit)

R.i386   2:2.4.1-1.fc6  installed
Matched from:
/usr/lib/R/lib/libRblas.so
libRblas.so

R.x86_64 2.4.1-4.fc6installed
Matched from:
/usr/lib64/R/lib/libRblas.so
libRblas.so()(64bit)


Regards,
Mike



Re: [R] dispersion_parameter_GLMM's

2007-07-26 Thread Senor_Felix

I agree with David. A dispersion parameter of 25 suggests that you have
mainly 0's in your data set and that your model is not adequate. Perhaps you
should dichotomize your data into 0's and 1's and use a logistic mixed model,
but be aware of small numbers of events. 


That amount of overdispersion would make the use of a poisson model
very questionable, and will very likely result in estimated standard
errors that are too low, hence the change in statistical significance
when you switch to quasipoisson.
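
A small sketch of the point about standard errors, assuming simulated overdispersed counts and an ordinary (non-mixed) GLM rather than the GLMM under discussion; all variable names and parameter values are made up. The Pearson dispersion estimate is computed from the Poisson fit, and the quasipoisson standard errors are the Poisson ones inflated by the square root of that estimate:

```r
# Simulated overdispersed counts (negative binomial); illustration only.
set.seed(42)
n <- 300
x <- rnorm(n)
y <- rnbinom(n, mu = exp(0.5 + 0.3 * x), size = 0.5)

fit_p <- glm(y ~ x, family = poisson)
fit_q <- glm(y ~ x, family = quasipoisson)

# Pearson chi-square divided by residual degrees of freedom
dispersion <- sum(residuals(fit_p, type = "pearson")^2) / df.residual(fit_p)

se_p <- coef(summary(fit_p))[, "Std. Error"]
se_q <- coef(summary(fit_q))[, "Std. Error"]
se_q / se_p          # ratio of standard errors
sqrt(dispersion)     # the same inflation factor
```

A dispersion estimate far above 1 indicates that the Poisson standard errors are too small, which is why significance can change under quasipoisson.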

O


Re: [R] aggregate.ts

2007-07-26 Thread Jeffrey J. Hallman
Your troubles with 'aggregate' for a ts are one of the reasons I created the
'tis' and 'ti' classes in the fame package.  If you do this:

> x1 <- tis(1:24, start = c(2000, 10), freq = 12)
> x2 <- tis(1:24, start = c(2000, 11), freq = 12)
> y1 <- aggregate(x1, nfreq = 4)
> y2 <- aggregate(x2, nfreq = 4)
> x1
     Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2000                                       1   2   3
2001   4   5   6   7   8   9  10  11  12  13  14  15
2002  16  17  18  19  20  21  22  23  24
class: tis
> x2
     Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2000                                           1   2
2001   3   4   5   6   7   8   9  10  11  12  13  14
2002  15  16  17  18  19  20  21  22  23  24
class: tis
> y1
     Qtr1 Qtr2 Qtr3 Qtr4
2000                   6
2001   15   24   33   42
2002   51   60   69
class: tis
> y2
     Qtr1 Qtr2 Qtr3 Qtr4
2001   12   21   30   39
2002   48   57   66
class: tis

Everything pretty much works as you would expect.  One thing to notice is
that, even using a 'tis' rather than a 'ts', aggregate will only sum up the
monthly observations for a quarter if all three of the months are there.
That's why y2 starts with 2001Q1, rather than 2000Q4.  If you really want the
2000Q4 observation to be the sum of the first two x2 months, the convert()
function in fame can handle that.

> convert(x2, tif = "quarterly", observed = "summed", ignore = T)
      Qtr1  Qtr2  Qtr3      Qtr4
2000                        4.03
2001 12.00 21.00 30.00     39.00
2002 48.00 57.00 66.00 71.225806
class: tis

Now back to ts.  If you look deeper into what's happening here:

> y3 <- aggregate(as.ts(x2), nf = 4)
> y3
Error in rep.int("", start.pad) : invalid number of copies in rep.int()

Enter a frame number, or 0 to exit   

1: print(c(6, 15, 24, 33, 42, 51, 60, 69))
2: print.ts(c(6, 15, 24, 33, 42, 51, 60, 69))
3: matrix(c(rep.int("", start.pad), format(x, ...), rep.int("", end.pad)), nc 
4: as.vector(data)
5: rep.int("", start.pad)

Selection: 0
> unclass(y3)
[1]  6 15 24 33 42 51 60 69
attr(,"tsp")
[1] 2000.833 2002.583    4.000

what you see is that aggregate() did indeed create a quarterly series, but the
quarters cover (Nov-Jan, Feb-Apr, May-Jul, Aug-Oct), not the usual (Jan-Mar,
Apr-Jun, Jul-Sep, Oct-Dec).  The author of the print.ts code evidently never
even thought of this possibility.  Not that I blame him.  I work with monthly
and quarterly data all the time, and the behavior of aggregate.ts() is so
counter-intuitive that I wouldn't have imagined it either.
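
For readers who want to stay with plain ts objects, here is a hedged sketch of the behaviour described above and one possible workaround, using only base R. Trimming the series with `window()` so that it starts on a calendar quarter is my suggestion, not something from the fame or zoo packages:

```r
# Monthly series starting in November 2000, as in the x2 example
x2 <- ts(1:24, start = c(2000, 11), frequency = 12)

# aggregate.ts() sums consecutive blocks of three months counted from
# the first observation, so the "quarters" run Nov-Jan, Feb-Apr, ...
y3 <- aggregate(x2, nfrequency = 4)
as.vector(y3)   # block sums: 6 15 24 33 42 51 60 69
tsp(y3)         # start is 2000 + 10/12, not a calendar quarter

# Possible workaround: drop the leading partial quarter first, so the
# three-month blocks line up with Jan-Mar, Apr-Jun, Jul-Sep, Oct-Dec
x2_aligned <- window(x2, start = c(2001, 1))
y2 <- aggregate(x2_aligned, nfrequency = 4)
as.vector(y2)   # 12 21 30 39 48 57 66, matching the 'tis' result above
```

The trailing incomplete quarter is dropped by aggregate.ts() itself, because it only uses complete blocks.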

Bottom line: use 'tis' series from the fame package, or 'zoo' stuff from
Gabor's zoo package.  As the author of the fame package, I hope you'll excuse
me for asserting that the 'tis' class is easier to understand and use than the
zoo stuff, which takes a more general approach.  Some day Gabor or I or some
other enterprising soul should try combining the best ideas from zoo and fame
into a package that is better than either one.

Jeff




-- 
Jeff



Re: [R] logistic regression

2007-07-26 Thread Mike Lawrence
Maybe try making sure the data is numeric:

fac.to.num=function(x) as.numeric(as.character(x))
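
A short illustration of why the conversion goes through as.character() first, assuming the outcome was read in as a factor with levels "0" and "1" (the vector `lymdis` below is a made-up stand-in for pcdata$lymdis):

```r
fac.to.num <- function(x) as.numeric(as.character(x))

# Made-up stand-in for a 0/1 outcome that was stored as a factor
lymdis <- factor(c("0", "1", "1", "0", "1"))

as.numeric(lymdis)    # internal level codes: 1 2 2 1 2
fac.to.num(lymdis)    # the values actually meant: 0 1 1 0 1

# sum(lymdis) stops with "sum not meaningful for factors",
# whereas the converted vector can be summed.
total <- sum(fac.to.num(lymdis))
```

If the column is converted once, for example with pcdata$lymdis <- fac.to.num(pcdata$lymdis), the loop that fills the O matrix should no longer trip over the factor.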


On 26-Jul-07, at 9:34 AM, Sullivan, Mary M wrote:

 Greetings,


 I am working on a logistic regression model in R and I am  
 struggling with the code, as it is a relatively new program for  
 me.  In searching Google for 'logistic regression diagnostics' I  
 came Elizabeth Brown's Lecture 14 from her Winter 2004  
 Biostatistics 515 course  (http://courses.washington.edu/b515/ 
 l14.pdf) .  I found most of the code to be very helpful, but I am  
 struggling with the lines on to calculate the observed and expected  
 values in the 10 groups created by the cut function.  I get error  
 messages in trying to create the E and O matrices:  R won't accept  
 assignment of fi1c==j and it won't calculate the sum.



 I am wondering whether someone might be able to offer me some  
 assistance...my search of the archives was not fruitful.



 Here is the code that I adapted from the lecture notes:



 fit <- fitted(glm.lyme)
 fitc <- cut(fit, br = c(0, quantile(fit, p = seq(.1, .9, .1)), 1))
 t <- table(fitc)
 fitc <- cut(fit, br = c(0, quantile(fit, p = seq(.1, .9, .1)), 1),
             labels = F)
 t <- table(fitc)

 #Calculate observed and expected values in ea group
 E <- matrix(0, nrow=10, ncol = 2)
 O <- matrix(0, nrow=10, ncol=2)
 for (j in 1:10) {
   E[j, 2] = sum(fit[fitc==j])
   E[j, 1] = sum((1- fit)[fitc==j])
   O[j, 2] = sum(pcdata$lymdis[fitc==j])
   O[j, 1] = sum((1-pcdata$lymdis)[fitc==j])
 }



 Here is the error message:  Error in Summary.factor(..., na.rm =  
 na.rm) :
 sum not meaningful for factors





 I understand what it means; I just can't figure out how to get  
 around it or how to get the output printed in table form.  Thank  
 you in advance for any assistance.



 Mary Sullivan


--
Mike Lawrence
Graduate Student, Department of Psychology, Dalhousie University

Website: http://memetic.ca

Public calendar: http://icalx.com/public/informavore/Public

The road to wisdom? Well, it's plain and simple to express:
Err and err and err again, but less and less and less.
- Piet Hein



Re: [R] Obtaining summary of frequencies of value occurrences for a variable in a multivariate dataset.

2007-07-26 Thread Allan Kamau
Thanks so much Jim, Adaikalavan, Gabor and others for the help and suggestions.
The solution will result in a matrix containing nested matrices, so that each 
variable name, each variable's distinct values and the count of each distinct 
value are accessible individually.
The main matrix will contain the variable names, the first-level nested 
matrices will consist of each variable's unique values, and each such value 
entry will contain a one-element vector holding the count (occurrence 
frequency).
This matrix can then be used to compare variable values and their frequencies 
with other similar datasets.

Building on the input received so far, a probable solution for building the 
matrix will include the following.


1) I read the csv file (containing column headers)
my_data=read.table("path/to/my/data.csv",header=TRUE,sep=",",dec=".",fill=TRUE)

2) I group the values in each variable, producing an occurrence count (frequency)
x.val<-apply(my_data,2,table)

3) I obtain a vector of the names of the variables in the table
names(x.val)

4) Now I make use of the names (obtained in step 3) to obtain a vector of 
distinct values in a given variable (in the example below the variable name is 
PR14)
names(x.val$PR14)

5) I obtain a vector (with one element) of the frequency of a value obtained 
from the step above (in our example the value is "V")
as.vector(x.val$PR14["V"])

Todo:
Now I will need to place the steps above in a script (consisting of loops) to 
build the matrix; steps 4 and 5 seem tricky to do programmatically.
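
One way the loop over steps 4 and 5 might look, as a sketch only. It rebuilds the ten example rows quoted further down in this thread instead of reading a csv file, and it keeps the result as a named list of tables rather than a nested matrix, which is an assumption about what is convenient:

```r
# The ten example rows from the thread, one character per variable
rows <- c("VTIKVGD", "VSIKVGG", "VTIRVGG", "VSIKIGG", "VSIKVGG",
          "VSIRVGG", "VTIKIGG", "VSIKVEG", "VSIKVGG", "VSIKVGG")
my_data <- as.data.frame(do.call(rbind, strsplit(rows, "")),
                         stringsAsFactors = FALSE)
names(my_data) <- paste0("PR", 10:16)

# Step 2: one frequency table per variable (always a list with lapply)
x.val <- lapply(my_data, table)

# Steps 3 to 5 in a loop: variable name, distinct value, count
for (v in names(x.val)) {
  for (val in names(x.val[[v]])) {
    cat(v, val, as.vector(x.val[[v]][val]), "\n")
  }
}

# Compact "value:count" summary per variable
summary_str <- sapply(x.val, function(tb)
  paste(names(tb), tb, sep = ":", collapse = ","))
summary_str
```

Using lapply() on the data frame avoids the case where apply() simplifies the result to a matrix when every variable happens to have the same set of values.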

Allan.


- Original Message 
From: jim holtman [EMAIL PROTECTED]
To: Allan Kamau [EMAIL PROTECTED]
Cc: Adaikalavan Ramasamy [EMAIL PROTECTED]; r-help@stat.math.ethz.ch
Sent: Wednesday, July 25, 2007 1:50:55 PM
Subject: Re: [R] Obtaining summary of frequencies of value occurrences for a 
variable in a multivariate dataset.

Also if you want to access the individual values, you can just leave
it as a list:

> x.val <- apply(x, 2, table)
> # access each value
> x.val$PR14["V"]
V
8



On 7/25/07, Allan Kamau [EMAIL PROTECTED] wrote:
 A subset of the data looks as follows

  > df[1:10,14:20]
     PR10 PR11 PR12 PR13 PR14 PR15 PR16
  1     V    T    I    K    V    G    D
  2     V    S    I    K    V    G    G
  3     V    T    I    R    V    G    G
  4     V    S    I    K    I    G    G
  5     V    S    I    K    V    G    G
  6     V    S    I    R    V    G    G
  7     V    T    I    K    I    G    G
  8     V    S    I    K    V    E    G
  9     V    S    I    K    V    G    G
  10    V    S    I    K    V    G    G

 The result I would like is as follows

 PR10PR11  PR12   ...
 [V:10][S:7,T:3][I:10]

 The result can be in a matrix or a vector and each variablename, value and 
 frequency should be accessible so as to be used for comparisons with another 
 dataset later.
 The frequency can be a count or a percentage.


 Allan.


 - Original Message 
 From: Adaikalavan Ramasamy [EMAIL PROTECTED]
 To: Allan Kamau [EMAIL PROTECTED]
 Cc: r-help@stat.math.ethz.ch
 Sent: Tuesday, July 24, 2007 10:21:51 PM
 Subject: Re: [R] Obtaining summary of frequencies of value occurrences for a 
 variable in a multivariate dataset.

 The name of the table should give you the value. And if you have a
 matrix, you just need to convert it into a vector first.

 > m <- matrix( LETTERS[ c(1:3, 3:5, 2:4) ], nc=3 )
 > m
      [,1] [,2] [,3]
 [1,] "A"  "C"  "B"
 [2,] "B"  "D"  "C"
 [3,] "C"  "E"  "D"
 > tb <- table( as.vector(m) )
 > tb

 A B C D E
 1 2 3 2 1
 > paste( names(tb), ":", tb, sep="" )
 [1] "A:1" "B:2" "C:3" "D:2" "E:1"

 If this is not what you want, then please give a simple example.

 Regards, Adai



 Allan Kamau wrote:
  Hi all,
  If the question below as been answered before I
  apologize for the posting.
  I would like to get the frequencies of occurrence of
  all values in a given variable in a multivariate
  dataset. In short for each variable (or field) a
  summary of values contained with in a value:frequency
  pair, there can be many such pairs for a given
  variable. I would like to do the same for several such
  variables.
  I have used table() but am unable to extract the
  individual value and frequency values.
  Please advise.
 
  Allan.
 
 
 
 




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


[R] colored heights in 3D plot (persp)

2007-07-26 Thread Juliane Willert
Hello everybody,

I have a matrix with measurement values and plot them with persp.
I want to highlight different heights in different colors. At least 
everything above and under a certain z-level shall have a different 
color to make the differences in height more obvious.
How can I do that or do I have to use another package?

Best regards,
Juliane



Re: [R] Creating a cross table out of a large dataset

2007-07-26 Thread Marc Schwartz
On Thu, 2007-07-26 at 13:32 -0700, celine wrote:
 Dear all,
 
 I want to make a cross table out of a data set which is 2 columns wide and
 more than 15 rows long. When I use the table() function I get an error
 message
 
 This is the code I have used:
 
 Dataset <- read.table("test.txt", header=TRUE, sep=",", na.strings="NA",
 dec=".", strip.white=TRUE) 
 
  .T <- table(Dataset$K1,Dataset$K2) 
 
 This is the error message I have received
 
 Error in vector("integer", length) : vector size specified is too large 
 In addition: Warning messages: 
 1: NAs introduced by coercion 
 2: NAs introduced by coercion 
 
 Is it possible to make a cross table with the table() function on a large
 dataset or should I consider using another function? I have had a look at
 the ?table help file but I could not find any information on the size of the
 dataset.
 
 Thanks very much in advance for any help:-)
 
 Kind regards,
 Céline.

A wild guess here, but it sounds like your data does not likely contain
a relatively small set of repeated discrete entries.

Thus, your cross-tabulation results in a large number of combinations,
the number of which exceeds the largest representable integer in R,
which is:

 .Machine$integer.max
[1] 2147483647

or

 2^31 - 1
[1] 2147483647


An R table is a two (or possibly more) dimension matrix with additional
class attributes.  A matrix is in turn, a vector with 'dim' attributes.
A vector is indexed using integers and thus is limited in size to the
above number.

If the above assumptions are correct, I am struggling to think of a
scenario where the visual representation of a cross-tabulation of your
data will be of value, but that may be just due to a severe lack of sleep
of late.

You might want to run:

 length(unique(Dataset$K1))

and 

  length(unique(Dataset$K2))

which will tell you how many unique values are in each of the two
vectors. That will begin to give you some idea as to what you are
dealing with.
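
A rough sketch of that check, assuming made-up stand-ins for Dataset$K1 and Dataset$K2. It estimates how many cells the full cross table would need, compares that with the integer limit, and, as one possible alternative that is my suggestion only, counts just the pairs that actually occur:

```r
# Made-up stand-ins for the two columns; the real data are not available
set.seed(1)
K1 <- sample(1:500, 2000, replace = TRUE)
K2 <- sample(1:400, 2000, replace = TRUE)

n1 <- length(unique(K1))
n2 <- length(unique(K2))

# Number of cells in the full table, computed as a double so the
# product itself cannot overflow the integer range
cells <- as.numeric(n1) * n2
fits <- cells <= .Machine$integer.max

# Counting only observed (K1, K2) pairs keeps the result no longer
# than the data, however many distinct values there are
pairs <- data.frame(K1 = K1, K2 = K2, count = 1L)
pair_counts <- aggregate(count ~ K1 + K2, data = pairs, FUN = sum)
head(pair_counts)
```

With, say, 50000 distinct values in each column the full table would need 2.5e9 cells, which is beyond the limit shown above.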

HTH,

Marc Schwartz



[R] Diagonal Submatrices Extraction

2007-07-26 Thread Bruno C\.
Yes you are right ... an example is mandatory.

So ... I have a matrix of 0 with just a single 1 per row and per column
I need to extract all maximal 'diagonal' submatrices

Let's say I have the following matrix

  A B C D E
a 0 1 0 0 0
b 1 0 0 0 0
c 0 0 1 0 0
d 0 0 0 1 0
e 0 0 0 0 1

well I would like to get, for this example, the two following submatrices
  A B       C D E
a 0 1     c 1 0 0
b 1 0     d 0 1 0
          e 0 0 1
Of course some of the extracted submatrices will have in some situations 
dim=c(1,1) ...
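
One possible reading of the task, sketched in R under stated assumptions: each row holds exactly one 1, and a block is a maximal run of consecutive rows whose 1s sit in consecutive columns, either descending (diagonal) or ascending (anti-diagonal). The function name `diag_blocks` is hypothetical:

```r
# Split a 0/1 matrix with a single 1 per row and column into maximal
# blocks of consecutive rows whose 1s form a diagonal or anti-diagonal.
diag_blocks <- function(m) {
  pos <- apply(m, 1, which.max)        # column of the 1 in each row
  step <- diff(pos)                    # column shift between rows
  n <- length(pos)
  blocks <- list()
  start <- 1
  for (i in seq_len(n)) {
    # the block continues while the shift is +1 or -1 and keeps its sign
    cont <- i < n && abs(step[i]) == 1 &&
      (i == start || step[i] == step[i - 1])
    if (!cont) {
      blocks[[length(blocks) + 1]] <-
        m[start:i, sort(pos[start:i]), drop = FALSE]
      start <- i + 1
    }
  }
  blocks
}

m <- matrix(c(0, 1, 0, 0, 0,
              1, 0, 0, 0, 0,
              0, 0, 1, 0, 0,
              0, 0, 0, 1, 0,
              0, 0, 0, 0, 1),
            nrow = 5, byrow = TRUE,
            dimnames = list(letters[1:5], LETTERS[1:5]))
blocks <- diag_blocks(m)
blocks
```

Rows that do not continue a run come back as 1 x 1 blocks, which matches the remark about dim=c(1,1).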

Thanks in advance
Bruno
 Hi :  I think you need to give an example because I don't understand
 below and my guess is that, since noone else replied,
 I don't think they understood it either. I don't mean to be rude. I've
 just noticed  from being on the list
 That, if something is not clear, people won't even tell you. They just
 won't respond. The list is great
 But people don't want to spend time trying to figure out what you want.
 An example and possibly code is really
 helpful in getting responses.
 
 
 
 
 
 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Bruno C.
 Sent: Thursday, July 26, 2007 4:47 AM
 To: R-help
 Subject: [R] Submatrices Extraction
 
 Hello,
 Given a submatrix containing 0 or 1
 I need to extract the indexes of all the diagonal submatrices so one of
 the two diagonals must contains only 1 for each submatrix ...
 Any help?
 
 Thanks in advance
 Bruno
 
 
 




Re: [R] Problem installing tseries package

2007-07-26 Thread Prof Brian Ripley
On Thu, 26 Jul 2007, Michael Cassin wrote:

 Hi,

 I'm running R 2.4.1 on Fedora Core 6 and am unable to install the tseries
 package.  I've resolved a few problems getting to this point, by running a
 yum update, installing the gcc-gfortran dependency, but now I'm stuck.
 Could someone please point me in the right direction?

Please read the posting guide and provide the information you were asked 
for: only then we may be able to help you.

You seem to have a system which installed R in /usr/lib/R but has x86_64 
components on it.  So what architecture is it that you are trying to run?

My guess is that you installed a i386 RPM on a x86_64 OS.  That will 
install and R will run *but* you will not be able to use it to install 
packages.  If you installed the i386 RPM after the x86_64 one, it will 
have overwritten some crucial files including /usr/bin/R.

It is possible to have i386 and x86_64 R coexisting on x86_64 Linux, but 
not by installing RPMs for different architectures.
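
A quick way to answer the architecture question from inside the running R session, as a sketch; it only reports what the current R binary was built for and where it lives, and makes no assumption about how it was installed:

```r
# Architecture the running R was built for (e.g. i386 or x86_64)
arch <- R.version$arch

# Pointer size in bits: 32 for a 32-bit build, 64 for a 64-bit build
bits <- 8 * .Machine$sizeof.pointer

# Where this R keeps its shared libraries such as libRblas.so
lib_dir <- file.path(R.home(), "lib")

cat("arch:", arch, "\nbits:", bits, "\nlib dir:", lib_dir, "\n")
```

If this reports a 32-bit build on a 64-bit operating system, the linker message about an incompatible libRblas.so is what one would expect.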



 R install.packages output ===
 ==

 install.packages(tseries)

 trying URL '
 http://www.sourcekeg.co.uk/cran/src/contrib/tseries_0.10-11.tar.gz'
 Content type 'application/x-tar' length 182043 bytes
 opened URL
 ==
 downloaded 177Kb

 * Installing *source* package 'tseries' ...
 ** libs
 gcc -I/usr/lib/R/include -I/usr/lib/R/include  -I/usr/local/include
 -fpic  -O3 -g -std=gnu99 -c arma.c -o arma.o
 gcc -I/usr/lib/R/include -I/usr/lib/R/include  -I/usr/local/include
 -fpic  -O3 -g -std=gnu99 -c bdstest.c -o bdstest.o
 gcc -I/usr/lib/R/include -I/usr/lib/R/include  -I/usr/local/include
 -fpic  -O3 -g -std=gnu99 -c boot.c -o boot.o
 gfortran   -fpic  -O2 -g -c dsumsl.f -o dsumsl.o
 In file dsumsl.f:450

  IF (IV(1) - 2) 30, 40, 50
   1
 Warning: Obsolete: arithmetic IF statement at (1)
 In file dsumsl.f:3702

   10 ASSIGN 30 TO NEXT
   1
 Warning: Obsolete: ASSIGN statement at (1)
 In file dsumsl.f:3707

   20GO TO NEXT,(30, 50, 70, 110)
  1
 Warning: Obsolete: Assigned GOTO statement at (1)
 In file dsumsl.f:3709

  ASSIGN 50 TO NEXT
   1
 Warning: Obsolete: ASSIGN statement at (1)
 In file dsumsl.f:3718

  ASSIGN 70 TO NEXT
   1
 Warning: Obsolete: ASSIGN statement at (1)
 In file dsumsl.f:3724

  ASSIGN 110 TO NEXT
   1
 Warning: Obsolete: ASSIGN statement at (1)
 In file dsumsl.f:4552

  IF (IV(1) - 2) 999, 30, 70
   1
 Warning: Obsolete: arithmetic IF statement at (1)
 In file dsumsl.f:4714

  IF (IRC) 140, 100, 210
   1
 Warning: Obsolete: arithmetic IF statement at (1)
 gcc -I/usr/lib/R/include -I/usr/lib/R/include  -I/usr/local/include
 -fpic  -O3 -g -std=gnu99 -c garch.c -o garch.o
 gcc -I/usr/lib/R/include -I/usr/lib/R/include  -I/usr/local/include
 -fpic  -O3 -g -std=gnu99 -c ppsum.c -o ppsum.o
 gcc -I/usr/lib/R/include -I/usr/lib/R/include  -I/usr/local/include
 -fpic  -O3 -g -std=gnu99 -c tsutils.c -o tsutils.o
 gcc -shared -Bdirect,--hash-stype=both,-Wl,-O1 -o tseries.so arma.o
 bdstest.o boot.o dsumsl.o garch.o ppsum.o tsutils.o -L/usr/lib/R/lib -lRblas
 -lgfortran -lm -lgcc_s -lgfortran -lm -lgcc_s -L/usr/lib/R/lib -lR
 /usr/bin/ld: skipping incompatible /usr/lib/R/lib/libRblas.so when searching
 for -lRblas
 /usr/bin/ld: skipping incompatible /usr/lib/R/lib/libRblas.so when searching
 for -lRblas
 /usr/bin/ld: cannot find -lRblas
 collect2: ld returned 1 exit status
 make: *** [tseries.so] Error 1
 ERROR: compilation failed for package 'tseries'
 ** Removing '/usr/lib/R/library/tseries'




 I presume the priority is addressing the error: /usr/bin/ld: cannot find
 -lRblas

 I have the libRblas.so file with R 2.4. Do I need to upgrade to R 2.5 - In
 which case I'll be asking how to fix the problems I'm having doing that  ;)

 [~]# yum provides libRblas.so
 snip

 R.x86_64 2.5.1-2.fc6extras
 Matched from:
 /usr/lib64/R/lib/libRblas.so
 libRblas.so()(64bit)

 R.x86_64 2.5.1-2.fc6extras
 Matched from:
 /usr/lib64/R/lib/libRblas.so
 libRblas.so()(64bit)

 R.i386   2:2.4.1-1.fc6  installed
 Matched from:
 /usr/lib/R/lib/libRblas.so
 libRblas.so

 R.x86_64 2.4.1-4.fc6  installed
 Matched from:
 /usr/lib64/R/lib/libRblas.so
 libRblas.so()(64bit)
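
The two "skipping incompatible" lines in the log usually mean the linker found a libRblas.so built for a different architecture than the objects being linked, and the yum output lists both an i386 and an x86_64 R. A minimal sketch for checking which build of R is actually running the install; it only inspects the running session, and the idea that the 32-bit R under /usr/lib/R is being paired with a 64-bit compiler is an assumption, not something the log proves:

```r
# Report the architecture of the running R and where it is installed.
arch     <- R.version$arch               # e.g. "i386" or "x86_64"
ptr_bits <- 8 * .Machine$sizeof.pointer  # 32 or 64
r_home   <- R.home()                     # e.g. /usr/lib/R or /usr/lib64/R

# Path where this R would expect its own BLAS shared library.
rblas <- file.path(r_home, "lib",
                   paste0("libRblas", .Platform$dynlib.ext))

cat("arch:            ", arch, "\n")
cat("pointer size:    ", ptr_bits, "bit\n")
cat("R home:          ", r_home, "\n")
cat("libRblas present:", file.exists(rblas), "\n")
```

If the R home turns out to be /usr/lib/R while the compiler produces 64-bit objects, running the install from the x86_64 R would be the first thing to try.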


 Regards,
 

Re: [R] Constructing bar charts with standard error bars

2007-07-26 Thread John Zabroski
On 7/25/07, Ben Bolker [EMAIL PROTECTED] wrote:
 John Zabroski johnzabroski at gmail.com writes:

  The best clue I have so far is Rtips #5.9:
  http://pj.freefaculty.org/R/Rtips.html#5.9 which is what I based my present
  solution off of.
 
  However, I do not understand how this works.  It seems like there is no
  concrete way to determine the arrow drawing parameters x0 and x1 for a
  barplot.  Moreover, the bars seem to be cut off.
 

   barplot() returns the x values you need for x0 and x1.
 barplot(...,ylim=c(0,max(xbar+se))) will set the upper y limit so
 the bars don't get cut off.

   P.S. I hope you're not hoping to infer a statistically
 significant difference among these groups ...

   cheers
Ben Bolker

Thanks a lot!  I tried all three and they all seem very dependable.
Also, I appreciate you rewriting my solution and adding elegance.

Is there a way to extend the tick marks to the ylim values, such that
the yscale ymax tickmark is something like max(xbar+se)?  In the
documentation, I thought par(yaxp=c(y0,y1,n)) would do the trick, but
after trying to use it I am not sure I understand what yaxp even does.

P.S. I am not looking for statistically significant differences.  I am
trying to learn how to leverage R's graphing capabilities.  I also
appreciate Frank Harrell referring me to the link about Dynamite Plots
and associated weaknesses.
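
A minimal sketch of the approach discussed above, with the y axis drawn by hand so that the top tick lands on max(xbar+se). The data are made up, and using axis() instead of par(yaxp=) is one possible route, not necessarily the one Ben had in mind:

```r
# Hypothetical group means and standard errors (made-up numbers).
xbar <- c(A = 4.2, B = 5.1, C = 3.7)
se   <- c(0.4, 0.6, 0.3)

ymax <- max(xbar + se)

# barplot() returns the x midpoints of the bars; axes = FALSE suppresses
# the default y axis so it can be drawn by hand below.
mids <- barplot(xbar, ylim = c(0, ymax), axes = FALSE)

# Error bars: flat heads (angle = 90) at both ends (code = 3).
arrows(x0 = mids, y0 = xbar - se, x1 = mids, y1 = xbar + se,
       angle = 90, code = 3, length = 0.05)

# Five equally spaced ticks, the last one exactly at ymax.
ticks <- seq(0, ymax, length.out = 5)
axis(2, at = ticks, labels = format(round(ticks, 2)))
```

The same `mids` vector supplies both x0 and x1, which is why the arrows line up with the bars.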



Re: [R] ROC curve in R

2007-07-26 Thread Frank E Harrell Jr
Note that even though the ROC curve as a whole is an interesting 
'statistic' (its area is a linear translation of the 
Wilcoxon-Mann-Whitney-Somers-Goodman-Kruskal rank correlation 
statistics), each individual point on it is an improper scoring rule, 
i.e., a rule that is optimized by fitting an inappropriate model.  Using 
curves to select cutoffs is a low-precision and arbitrary operation, and 
the cutoffs do not replicate from study to study.  Probably the worst 
problem with drawing an ROC curve is that it tempts analysts to try to 
find cutoffs where none really exist, and it makes analysts ignore the 
whole field of decision theory.

Frank Harrell


[EMAIL PROTECTED] wrote:
 http://search.r-project.org/cgi-bin/namazu.cgi?query=ROC&max=20&result=normal&sort=score&idxname=Rhelp02a&idxname=functions&idxname=docs
 
 there is a lot of help try help.search("ROC curve") gave
 Help files with alias or concept or title matching 'ROC curve' using fuzzy 
 matching:
 
 
 
 granulo(ade4)              Granulometric Curves
 plot.roc(analogue)         Plot ROC curves and associated diagnostics
 roc(analogue)              ROC curve analysis
 colAUC(caTools)            Column-wise Area Under ROC Curve (AUC)
 DProc(DPpackage)           Semiparametric Bayesian ROC curve analysis
 cv.enet(elasticnet)        Computes K-fold cross-validated error curve
                            for elastic net
 ROC(Epi)                   Function to compute and draw ROC-curves.
 lroc(epicalc)              ROC curve
 cv.lars(lars)              Computes K-fold cross-validated error curve
                            for lars
 roc.demo(TeachingDemos)    Demonstrate ROC curves by interactively
                            building one
 
 HTH
 see the help and examples those will suffice
 
 Type 'help(FOO, package = PKG)' to inspect entry 'FOO(PKG) TITLE'.
 
 
 
 Regards,
 
 Gaurav Yadav
 +++
 Assistant Manager, CCIL, Mumbai (India)
 Mob: +919821286118 Email: [EMAIL PROTECTED]
 Bhagavad Gita:  Man is made by his Belief, as He believes, so He is
 
 
 
 Rithesh M. Mohan [EMAIL PROTECTED] 
 Sent by: [EMAIL PROTECTED]
 07/26/2007 11:26 AM
 
 To
 R-help@stat.math.ethz.ch
 cc
 
 Subject
 [R] ROC curve in R
 
 
 
 
 
 
 Hi,
 
  
 
 I need to build ROC curve in R, can you please provide data steps / code
 or guide me through it.
 
  
 
 Thanks and Regards
 
 Rithesh M Mohan
 
 
  [[alternative HTML version deleted]]
 
-
Frank E Harrell Jr   Professor and Chair   School of Medicine
  Department of Biostatistics   Vanderbilt University



[R] R codes for g-and-h distribution

2007-07-26 Thread filame uyaco

hi!


I would like to ask for help generating random numbers from the g-and-h
distribution. This distribution is like the normal distribution but spans
more of the skewness-kurtosis plane. Does R have a package for generating them?
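
No package is named in this thread, but the usual definition of Tukey's g-and-h distribution is a transformation of a standard normal Z: X = A + B * (exp(g*Z) - 1)/g * exp(h*Z^2/2). A minimal sketch under that definition; the function name rgh and its argument names are made up for illustration:

```r
# Draw n values from Tukey's g-and-h distribution.
#   A, B : location and scale
#   g    : controls skewness
#   h    : controls tail heaviness (h >= 0)
rgh <- function(n, A = 0, B = 1, g = 0, h = 0) {
  z <- rnorm(n)
  # (exp(g*z) - 1)/g tends to z as g -> 0, so g == 0 is handled separately.
  core <- if (g == 0) z else expm1(g * z) / g
  A + B * core * exp(h * z^2 / 2)
}

set.seed(1)
x <- rgh(1000, g = 0.5, h = 0.1)
summary(x)
```

With g = 0 and h = 0 the function reduces to plain rnorm() draws, which is a quick sanity check.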

Any help will be greatly appreciated.  Thank you so much!


Form,

Filame Uyaco


   
-


[[alternative HTML version deleted]]



Re: [R] using contrasts on matrix regressions (using gmodels, perhaps): 2 Solutions

2007-07-26 Thread Ranjan Maitra
Dear list,

I got two responses to my post. One was from Soren with a follow-up on personal 
e-mail, and the other I leave anonymous since he contacted me on personal 
e-mail. Anyway, here we go:

The first (Soren):

library(doBy)

Y <- as.data.frame(Y)
 
lapply(Y,function(y){reg <- lm(y~X); esticon(reg, c(0,0, 0, 1, 0, -1) )})
 
Confidence interval ( WALD ) level = 0.95 
Confidence interval ( WALD ) level = 0.95 
Confidence interval ( WALD ) level = 0.95 
Confidence interval ( WALD ) level = 0.95 
Confidence interval ( WALD ) level = 0.95 
$V1
  beta0  Estimate Std.Error  t.value DF  Pr(>|t|)  Lower.CI Upper.CI
1     0 0.6701771  0.517921 1.293976  4 0.2653302 -0.767802 2.108156
$V2
  beta0   Estimate Std.Error    t.value DF Pr(>|t|)  Lower.CI Upper.CI
1     0 -0.2789954   0.64481 -0.4326784  4    0.687 -2.069275 1.511284
$V3
  beta0   Estimate Std.Error    t.value DF  Pr(>|t|)  Lower.CI Upper.CI
1     0 -0.7677927 0.9219688 -0.8327751  4 0.4518055 -3.327588 1.792003
$V4
  beta0   Estimate Std.Error   t.value DF Pr(>|t|)  Lower.CI  Upper.CI
1     0 -0.6026635 0.4960805 -1.214850  4  0.29123 -1.980004 0.7746768
$V5
  beta0 Estimate Std.Error  t.value DF Pr(>|t|)  Lower.CI Upper.CI
1     0 2.001558  1.004574 1.992444  4 0.117123 -0.787587 4.790703

 

One thing I do not know how to handle is the output "Confidence interval
( WALD ) level = 0.95", which shows up for every regression. When I do millions
of regressions, this seriously slows it all down. Any idea how I can suppress that?
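
One general way to silence printed output is to wrap the call in capture.output() and keep the result by assigning it inside the captured expression. The sketch below uses a made-up stand-in function, noisy_estimate, because how esticon() emits that line is not shown in this thread. It assumes the line is written with cat() or print(); if it is emitted with message(), suppressMessages() would be the tool instead.

```r
# Stand-in for a fitting function that prints a status line on every call
# (hypothetical; it only mimics the behaviour described above).
noisy_estimate <- function(y) {
  cat("Confidence interval ( WALD ) level = 0.95 \n")
  mean(y)
}

Y <- as.data.frame(matrix(rnorm(50), 10, 5))

# capture.output() collects what would have been printed to the console;
# the assignment inside the call keeps the actual results in `res`.
printed <- capture.output(
  res <- lapply(Y, noisy_estimate)
)

length(printed)  # number of captured lines, one per column of Y
names(res)
```

Capturing still costs a little time per call, so for millions of regressions it may be worth checking whether the printing can be switched off at the source.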



The second solution uses gmodels, with a lucid explanation which I reproduce. 
Thanks!
 

The second (anon):

For a standard (non-matrix) regression, you could test the hypothesis  
X3=X5 using

estimable(reg, c("(Intercept)"=0, X1=0, X2=0, X3=1, X4=0, X5=-1) )

but this won't currently work with the mlm object created by a matrix  
regression.

The best way to solve this problem is to write an estimable.mlm()  
function that simply extracts the individual regressions from the mlm  
object and then calls estimable on each of these, pasting the results  
back together appropriately.

Something like this should do the trick:

`estimable.mlm` <-
   function (object, ...)
{
   coef <- coef(object)
   ny <- ncol(coef)
   effects <- object$effects
   resid <- object$residuals
   fitted <- object$fitted
   ynames <- colnames(coef)
   # If the responses are unnamed, derive names from the cbind() call or
   # fall back to Y1, Y2, ...
   if (is.null(ynames)) {
     lhs <- object$terms[[2]]
     if (mode(lhs) == "call" && lhs[[1]] == "cbind")
       ynames <- as.character(lhs)[-1]
     else ynames <- paste("Y", seq(ny), sep = "")
   }
   value <- vector("list", ny)
   names(value) <- paste("Response", ynames)
   # Drop the "mlm" class so each response is treated as an ordinary lm.
   cl <- oldClass(object)
   class(object) <- cl[match("mlm", cl):length(cl)][-1]
   for (i in seq(ny)) {
     object$coefficients <- coef[, i]
     object$residuals <- resid[, i]
     object$fitted.values <- fitted[, i]
     object$effects <- effects[, i]
     object$call$formula[[2]] <- object$terms[[2]] <- as.name(ynames[i])
     value[[i]] <- estimable(object, ...)
   }
   class(value) <- "listof"
   value
}

Now this all works:

  library(gmodels)
  X <- matrix(rnorm(50),10,5)
  Y <- matrix(rnorm(50),10,5)
  reg <- lm(Y~X)
  estimable(reg, c("(Intercept)"=0, X1=0, X2=0, X3=1, X4=0, X5=-1) )

Response Y1 :
                 Estimate Std. Error   t value DF  Pr(>|t|)
(0 0 0 1 0 -1) -0.9024065  0.4334235 -2.082043  4 0.1057782

Response Y2 :
                 Estimate Std. Error   t value DF   Pr(>|t|)
(0 0 0 1 0 -1) -0.7017988  0.2199234 -3.191106  4 0.03318115

Response Y3 :
                Estimate Std. Error  t value DF  Pr(>|t|)
(0 0 0 1 0 -1) 0.5412863  0.2632527 2.056147  4 0.1089276

Response Y4 :
                 Estimate Std. Error    t value DF Pr(>|t|)
(0 0 0 1 0 -1) -0.1028162  0.5973959 -0.1721073  4  0.87171

Response Y5 :
                Estimate Std. Error  t value DF  Pr(>|t|)
(0 0 0 1 0 -1) 0.2493330  0.2024061 1.231845  4 0.2854716











On Wed, 25 Jul 2007 18:30:36 -0500 Ranjan Maitra [EMAIL PROTECTED]
wrote:

 Hi, 
 
 I want to test for a contrast from a regression where I am regressing the 
 columns of a matrix. In short, the following.
 
 X <- matrix(rnorm(50),10,5)
 Y <- matrix(rnorm(50),10,5)
 lm(Y~X)  
 
 Call:
 lm(formula = Y ~ X)
 
 Coefficients:
              [,1]     [,2]     [,3]     [,4]     [,5]
 (Intercept)   0.3350  -0.1989  -0.1932   0.7528   0.0727
 X1            0.2007  -0.8505   0.0520   0.1501   0.3248
 X2            0.3212   0.7008  -0.0963  -0.2584   0.6711
 X3            0.3781  -0.7321   0.1907  -0.1721   0.3073
 X4           -0.1778   0.2822  -0.0644  -0.2649  -0.4140
 X5           -0.1079  -0.0475   0.6047  -0.8369  -0.5928
 
 
 I want to test for c'b = 0 where c is (let's say) the contrast (0, 0, 1, 0, 
 -1). Is it possible to do so, in one shot, using gmodels or something else?
 
 Many thanks and best wishes,
 Ranjan
 

Re: [R] ROC curve in R

2007-07-26 Thread Frank E Harrell Jr
Dylan Beaudette wrote:
 On Thursday 26 July 2007 06:01, Frank E Harrell Jr wrote:
 Note that even though the ROC curve as a whole is an interesting
 'statistic' (its area is a linear translation of the
 Wilcoxon-Mann-Whitney-Somers-Goodman-Kruskal rank correlation
 statistics), each individual point on it is an improper scoring rule,
 i.e., a rule that is optimized by fitting an inappropriate model.  Using
 curves to select cutoffs is a low-precision and arbitrary operation, and
 the cutoffs do not replicate from study to study.  Probably the worst
 problem with drawing an ROC curve is that it tempts analysts to try to
 find cutoffs where none really exist, and it makes analysts ignore the
 whole field of decision theory.

 Frank Harrell
 
 Frank,
 
 This thread has caught my attention for a couple of reasons, possibly related 
 to my novice-level experience. 
 
 1. in a logistic regression study, where i am predicting the probability of 
 the response being 1 (for example) - there exists a continuum of probability 
 values - and a finite number of {1,0} realities when i either look within the 
 original data set, or with a new 'verification' data set. I understand that 
 drawing a line through the probabilities returned from the logistic 
 regression is a loss of information, but there are times when a 'hard' 
 decision requiring prediction of {1,0} is required. I have found that the 
 ROCR package (not necessarily the ROC Curve) can be useful in identifying the 
 probability cutoff where accuracy is maximized. Is this an unreasonable way 
 of using logistic regression as a predictor? 

Logistic regression (with suitable attention to not assuming linearity 
and to avoiding overfitting) is a great way to estimate P[Y=1].  Given 
good predicted P[Y=1] and utilities (losses, costs) for incorrect 
positive and negative decisions, an optimal decision is one that 
optimizes expected utility.  The ROC curve does not play a direct role 
in this regard.  If per-subject utilities are not available, the analyst 
may make various assumptions about utilities (including the unreasonable 
but often used assumption that utilities do not vary over subjects) to 
find a cutoff on P[Y=1].  A very nice feature of P[Y=1] is that error 
probabilities are self-contained.  For example if P[Y=1] = .02 for a 
single subject and you predict Y=0, the probability of an error is .02 
by definition.  One doesn't need to compute an overall error probability 
over the whole distribution of subjects' risks.  If the cost of a false 
negative is C, the expected cost is .02*C in this example.
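
The expected-cost argument above can be written out as a short calculation. The sketch below assumes a single pair of costs shared by all subjects (the simplification the text itself calls unreasonable but common); the cost values and probabilities are made up:

```r
# Hypothetical costs of the two kinds of error (made-up numbers).
cost_fn <- 10   # predicting Y=0 when Y=1 (false negative)
cost_fp <- 1    # predicting Y=1 when Y=0 (false positive)

p <- c(0.02, 0.05, 0.20, 0.60)   # predicted P[Y=1] for four subjects

# Expected cost of each possible decision, subject by subject.
exp_cost_pred0 <- p * cost_fn         # predicting 0 is wrong with probability p
exp_cost_pred1 <- (1 - p) * cost_fp   # predicting 1 is wrong with probability 1 - p

# Choose the decision with the smaller expected cost.
decision <- ifelse(exp_cost_pred1 < exp_cost_pred0, 1, 0)

data.frame(p, exp_cost_pred0, exp_cost_pred1, decision)
```

With these costs the implied cutoff on P[Y=1] is cost_fp / (cost_fp + cost_fn), about 0.09, and it comes from the costs rather than from an ROC curve.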

 
 2. The ROC curve can be a helpful way of communicating false positives / 
 false 
 negatives to other users who are less familiar with the output and 
 interpretation of logistic regression. 

What is more useful than that is a rigorous calibration curve estimate 
to demonstrate the faithfulness of predicted P[Y=1] and a histogram 
showing the distribution of predicted P[Y=1].  Models that put a lot of 
predictions near 0 or 1 are the most discriminating.  Calibration curves 
and risk distributions are easier to explain than ROC curves.  Too often 
a statistician will solve for a cutoff on P[Y=1], imposing her own 
utility function without querying any subjects.

 
 
 3. I have been using the area under the ROC Curve, Kendall's tau, and Cohen's 
 kappa to evaluate the accuracy of a logistic regression based prediction, the 
 last two statistics based on some probability cutoff identified 
 beforehand. 

ROC area (equiv. to Wilcoxon-Mann-Whitney and Somers' Dxy rank 
correlation between pred. P[Y=1] and Y) is a measure of pure 
discrimination, not a measure of accuracy per se.  Rank correlation 
(concordance) measures do not require the use of cutoffs.
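
The equivalence mentioned above can be checked numerically in base R without choosing any cutoff. The sketch below uses made-up predictions and outcomes; it computes the ROC area from the rank sum of the Y=1 group, from the proportion of concordant pairs, and converts it to Somers' Dxy as 2*AUC - 1:

```r
# Made-up predicted probabilities and observed outcomes.
p <- c(0.9, 0.8, 0.4, 0.7, 0.3, 0.2)
y <- c(1,   1,   1,   0,   0,   0)

n1 <- sum(y == 1)
n0 <- sum(y == 0)

# Wilcoxon-Mann-Whitney form: rank sum of the Y=1 group, recentred and scaled.
auc_rank <- (sum(rank(p)[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)

# Direct form: share of (Y=1, Y=0) pairs ordered correctly; ties count 1/2.
d <- outer(p[y == 1], p[y == 0], "-")
auc_pairs <- mean((d > 0) + 0.5 * (d == 0))

dxy <- 2 * auc_rank - 1   # Somers' Dxy rank correlation

c(auc_rank = auc_rank, auc_pairs = auc_pairs, dxy = dxy)
```

Both forms give 8/9 here, since 8 of the 9 possible pairs are ordered correctly.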

 
 
 How does the topic of decision theory relate to some of the circumstances 
 described above? Is there a better way to do some of these things?

See above re: expected losses/utilities.

Good questions.

Frank
 
 Cheers,
 
 Dylan
 
 
 
 [EMAIL PROTECTED] wrote:
 http://search.r-project.org/cgi-bin/namazu.cgi?query=ROC&max=20&result=normal&sort=score&idxname=Rhelp02a&idxname=functions&idxname=docs

 there is a lot of help try help.search("ROC curve") gave
 Help files with alias or concept or title matching 'ROC curve' using
 fuzzy matching:



 granulo(ade4)              Granulometric Curves
 plot.roc(analogue)         Plot ROC curves and associated diagnostics
 roc(analogue)              ROC curve analysis
 colAUC(caTools)            Column-wise Area Under ROC Curve (AUC)
 DProc(DPpackage)           Semiparametric Bayesian ROC curve analysis
 cv.enet(elasticnet)        Computes K-fold cross-validated error curve
                            for elastic net
 ROC(Epi)                   Function to compute and draw ROC-curves.
 lroc(epicalc)              ROC curve
 cv.lars(lars) 

Re: [R] dates() is a great date function in R

2007-07-26 Thread Jeffrey J. Hallman
Mr Natural [EMAIL PROTECTED] writes:

Just save the spreadsheet as a csv file and use tisFromCsv() in the fame
package.  One of the arguments tisFromCsv() takes is a dateFormat, so you can
tell it what format the date column is in.  You can also tell it the name of
the date column if it isn't some variation of DATE, Date, or date.

tisFromCsv() looks at the dates coming in and automatically figures out what
frequency the data are (quarterly, monthly, weekly, daily, etc.) and creates a
univariate or multivariate (if the spreadsheet has more than one data column)
'tis' (Time Indexed Series) object. 

Jeff

 Proper calendar dates in R are great for plotting and calculating. 
 However for the non-wonks among us, they can be very frustrating.
 I have recently discussed the pains that people in my lab have had 
 with dates in R. Especially the frustration of bringing date data into R 
 from Excel, which we have to do a lot. 
 
 Please find below a simple analgesic for R date importation that I
 discovered 
 over the last 1.5 days (Learning new stuff in R is calculated in 1/2 days).
 
 The function dates() gives the simplest way to get calendar dates into
 R from Excel that I can find.
 But straight importation of Excel dates, via a csv or txt file, can be a
 huge pain (I'll give details for anyone who cares to know). 
 
 My pain killer is:
 Consider that you have Excel columns in month, day, year format. Note that R
 hates date data that does not lead with the year. 
 
 a. Load the chron library by typing   library(chron)   in the console.
 You know that you need this library from information revealed by 
 performing the query,
 ?dates()    in the Console window. This gives the R documentation 
 help file for this and related time, date functions.  In the upper left 
 of the documentation, one sees dates(chron). This tells you that you
 need the library chron. 
 
 b. Change the format of the dates in Excel to General, which gives 
 5 digit Julian dates. Import the csv file (I use read.csv()) with the 
 Julian dates and other data of interest.
 
 c.  Now, change the Julian dates that came in with the csv file into 
 calendar dates with the dates() function. Below is my code for performing 
 this activity, concerning an R data file called ss,
 
 ss holds the Julian dates, illustrated below from the column MPdate,
 
 ss$MPdate[1:5]
 [1] 34252 34425 34547 34759 34773
 
 The dates() function makes calendar dates from Julian dates,
 
 dmp <- dates(ss$MPdate,origin=c(month = 1, day = 1, year = 1900))
 
  dmp[1:5]
 [1] 10/12/93 04/03/94 08/03/94 03/03/95 03/17/95
 
 I would appreciate the comments of more sophisticated programmers who
 can suggest streamlining or shortcutting this operation.
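
One possible shortcut in base R, with no extra package: as.Date() accepts a numeric day count plus an origin. The sketch below assumes the numbers are Excel serial dates from the Windows 1900 date system, for which the origin "1899-12-30" is commonly used. With that origin the results come out two days earlier than the chron output shown above, so it is worth checking one known date against the spreadsheet before trusting either version.

```r
# Excel serial day numbers as read from the csv file (values from the post).
serial <- c(34252, 34425, 34547, 34759, 34773)

# Origin commonly used for Excel's 1900 date system on Windows (assumption:
# the workbook uses that system rather than the 1904 one).
d <- as.Date(serial, origin = "1899-12-30")

d
format(d, "%m/%d/%y")   # same month/day/year layout as the chron output
```

The result is a Date vector, so it works directly with plot(), diff(), seq() and friends.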
 
 regards, Don
 
 
 
  
 -- 
 View this message in context: 
 http://www.nabble.com/dates%28%29-is-a-great-date-function-in-R-tf4105322.html#a11675205
 Sent from the R help mailing list archive at Nabble.com.
 
 

-- 
Jeff



Re: [R] ROC curve in R

2007-07-26 Thread gyadav

http://search.r-project.org/cgi-bin/namazu.cgi?query=ROC&max=20&result=normal&sort=score&idxname=Rhelp02a&idxname=functions&idxname=docs

there is a lot of help try help.search("ROC curve") gave
Help files with alias or concept or title matching 'ROC curve' using fuzzy 
matching:



granulo(ade4)              Granulometric Curves
plot.roc(analogue)         Plot ROC curves and associated diagnostics
roc(analogue)              ROC curve analysis
colAUC(caTools)            Column-wise Area Under ROC Curve (AUC)
DProc(DPpackage)           Semiparametric Bayesian ROC curve analysis
cv.enet(elasticnet)        Computes K-fold cross-validated error curve
                           for elastic net
ROC(Epi)                   Function to compute and draw ROC-curves.
lroc(epicalc)              ROC curve
cv.lars(lars)              Computes K-fold cross-validated error curve
                           for lars
roc.demo(TeachingDemos)    Demonstrate ROC curves by interactively
                           building one

HTH
see the help and examples those will suffice

Type 'help(FOO, package = PKG)' to inspect entry 'FOO(PKG) TITLE'.



Regards,

Gaurav Yadav
+++
Assistant Manager, CCIL, Mumbai (India)
Mob: +919821286118 Email: [EMAIL PROTECTED]
Bhagavad Gita:  Man is made by his Belief, as He believes, so He is



Rithesh M. Mohan [EMAIL PROTECTED] 
Sent by: [EMAIL PROTECTED]
07/26/2007 11:26 AM

To
R-help@stat.math.ethz.ch
cc

Subject
[R] ROC curve in R






Hi,

 

I need to build ROC curve in R, can you please provide data steps / code
or guide me through it.

 

Thanks and Regards

Rithesh M Mohan


 [[alternative HTML version deleted]]



Re: [R] Convert string to list?

2007-07-26 Thread Gabor Grothendieck
Try this.  It pastes "list(" onto the front and ")" onto the end giving
list( P = 0.0, T = 0.0, Q = 0.0 )
and then parses and evaluates that as an R expression.

Str <- "P = 0.0, T = 0.0, Q = 0.0"
eval(parse(text = paste("list(", Str, ")")))


On 7/26/07, Manuel Morales [EMAIL PROTECTED] wrote:
 Let's say I have the following string:

 str <- "P = 0.0, T = 0.0, Q = 0.0"

 I'd like to find a function that generates the following object from
 'str'.

 list(P = 0.0, T = 0.0, Q = 0.0)

 Thanks!

 --
 http://mutualism.williams.edu





[R] reading stata files: preserving values of variables converted to factors

2007-07-26 Thread Ben Saylor
Hi,

I am a Stata user new to R.  I am using read.dta to read a Stata file 
that has variables with value labels.  read.dta converts them to 
factors, but seems to recode them with values from 1 to number of 
factor levels (looking at the output of unclass(varname)), so the 
original numerical values are lost.  Using convert.factors=FALSE 
preserves the values, but seems to discard the labels.  Is it possible 
to get these variables into R while preserving both the values and the 
labels?
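
One way to keep both pieces of information, sketched in base R with made-up data: store the value labels as a named vector of codes, build the factor from it, and recover the original codes by lookup. The names codes and label_tab are hypothetical. The help page for read.dta mentions a "label.table" attribute on the returned data frame, which may be able to play the role of label_tab when convert.factors=FALSE is used; that is an assumption to check against ?read.dta for your version of the foreign package.

```r
# Hypothetical Stata-style variable: numeric codes plus a value-label table.
codes     <- c(0, 5, 9, 5, 0)
label_tab <- c(never = 0, sometimes = 5, always = 9)   # label -> original code

# Factor whose levels follow the original codes and carry the Stata labels.
f <- factor(codes, levels = label_tab, labels = names(label_tab))

# as.integer(f) gives 1..3, but the original codes come back by lookup.
original <- unname(label_tab[as.character(f)])

data.frame(label = f, level_index = as.integer(f), stata_value = original)
```

Keeping label_tab next to the factor means the 1..k recoding never loses the original values.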

Thanks,
Ben
