Re: [R] plot rownames
On Tue, 27 May 2008, T.D.Rudolph wrote:

 In the following example:
  x <- rnorm(1:100)
  y <- seq(from=-2.5, to=3.5, by=0.5)
  z <- as.matrix(table(cut(x, c(-Inf, y, +Inf))))
  ## I wish to transform the values in z
  j <- log(z)
  ## Yet retain the row names
  row.names(j) <- row.names(z)

Hmm. The rownames were retained, and row.names() is for data frames, rownames() for matrices.

 Now, how can I go about creating a scatterplot with row.names(j) on the x-axis and j(m:nrow(j)) on the y-axis? Keep in mind the transformation I am conducting is more complicated and therefore cannot be plotted directly using log(z), which places me in this unique position.

You will need to explain what exactly you want plotted. A scatterplot is of two continuous variables and you only have one (and 'j(m:nrow(j))' is invalid R). But here is one possibility to get you started:

 m <- nrow(j)
 plot(j, xaxt="n", xlab="")
 axis(1, 1:m, rownames(j), las=2)

 Tyler
 --
 View this message in context: http://www.nabble.com/plot-rownames-tp17504882p17504882.html
 Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK, Fax: +44 1865 272595
Re: [R] Linear Programming.
Hi Marcus! On Wednesday 28 May 2008 05:56, Marcus Vinicius wrote:

 Dear all, may anyone explain to me how I run a linear programming or Data Envelopment Analysis (DEA) model in R?

Package DEA (on CRAN): http://cran.r-project.org/web/packages/DEA/index.html
Package FEAR (NOT on CRAN): http://www.clemson.edu/economics/faculty/wilson/Software/FEAR/fear.html

Best wishes,
Arne

 Thanks a lot. Best Regards. Marcus Vinicius

--
Arne Henningsen
http://www.arne-henningsen.name
[R] How to make R run faster
Hi everyone, I run R loops on Windows XP and Vista. Both machines are Intel Core 2 Duo 2.2 GHz with 2 GB RAM, and XP is significantly faster than Vista. Does anyone know how to speed up R loops in Vista? Thank you in advance. Chunhao Tu
Re: [R] Tukey HSD (or other post hoc tests) following repeated measures ANOVA
Ullrich Ecker ullrich.ecker at uwa.edu.au writes:

 I am fairly new to R, and I am aware that others have had this problem before, but I have failed to solve the problem from previous replies I found in the archives. Now, I want to calculate a post-hoc test following up a within-subjects ANOVA.

Probably the best is package multcomp. By default it gives confidence intervals, which is certainly much better than p-values. I haven't tried whether you can get p-values.

Dieter

 library(multcomp)
 amod <- aov(breaks ~ wool + tension, data = warpbreaks)
 wht <- glht(amod, linfct = mcp(tension = "Tukey"))
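On the open question of p-values: a short sketch extending Dieter's multcomp example (same built-in warpbreaks data), showing that summary() on the glht object does report adjusted p-values, and confint() the simultaneous confidence intervals:

```r
## Extends the example from the reply above: multcomp's glht object
## provides both adjusted p-values and simultaneous confidence intervals.
library(multcomp)
amod <- aov(breaks ~ wool + tension, data = warpbreaks)
wht  <- glht(amod, linfct = mcp(tension = "Tukey"))
summary(wht)   # adjusted p-values for the pairwise comparisons
confint(wht)   # simultaneous confidence intervals
```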
[R] R News, volume 8, issue 1 is now available
Dear R users, The May 2008 issue of `R News' is now available on CRAN under the Documentation/Newsletter link. John (on behalf of the R News Editorial Board)

John Fox, Professor
Department of Sociology
McMaster University
Hamilton ON Canada L8S 4M4
web: socserv.mcmaster.ca/jfox

___
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-announce
[R] can I do this with R?
Hello, I am just about to install R and was wondering about a few things. I have only worked in Matlab, where I wanted to do a logistic regression. However, Matlab does not do logistic regression with the stepwise forward method, so I thought about testing R. My question is: can I do logistic regression with stepwise forward selection in R? Thanks /M
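The answer is yes: fit the model with glm() and do the selection with step(). A minimal sketch, using made-up variable names (y, x1..x3 in a data frame dat) since the poster gives none:

```r
## Hedged sketch of forward stepwise logistic regression.
## 'dat', 'y', 'x1'..'x3' are hypothetical names, not from the post.
null <- glm(y ~ 1, family = binomial, data = dat)          # intercept-only start
full <- glm(y ~ x1 + x2 + x3, family = binomial, data = dat)
fwd  <- step(null, scope = formula(full), direction = "forward")
summary(fwd)   # the AIC-selected model
```

step() uses AIC by default; MASS::stepAIC offers the same interface with a few extras.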
Re: [R] hash or other quick lookup function?
Hi Duncan, Duncan Murdoch wrote:

 Esmail Bonakdarian wrote:
  Hello all, I have a matrix of bit values. I compute certain values based on the bits in each row. There may be *duplicate* entries in the matrix, i.e. several rows may be identical. These rows change over time, and identical entries may not be next to each other. Computing these values is time consuming, so I would like a way to store a value once it's computed. That way I would be able to look up the value based on a given bitstring (row in the matrix) and avoid having to re-compute the same value. So, my thought is a hash function based on a bit string - does R have such a thing? Or is there an alternative approach that would do the same in R? The lookup should be quick, otherwise the gains from avoiding recomputing identical values would be lost to some extent.

 Environments can be hashed. To use this, you'd convert the bit string into a character string, and use that as the name of a variable in an environment. For example:

  cache <- new.env(hash=TRUE, parent=emptyenv())
  ...
  bits <- "0101"
  if (exists(bits, envir=cache)) {
    return(get(bits, envir=cache))
  } else {
    value <- long.computation()
    assign(bits, value)

Whoops, that assignment should have been in the cache:

    assign(bits, value, envir=cache)
    return(value)
  }

Thank you for posting this, I'll have to give it a try. If I can get this to work it should save some time spent on recomputing values. (Sorry for the delayed reply, I was traveling for the whole last day - suffering from some reverse jet lag at the moment :-) Can anyone else think of other alternatives? Always happy to learn something new. Thanks! Esmail

ps: do you think the conversion to a character string before comparison/search will cause a noticeable performance hit? I still lack an intuitive feel for R the programming language. I'd have more of an idea with regular high-level programming languages.
Duncan Murdoch
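Putting the fragments from the thread together, a self-contained sketch of the environment-as-hash-table idea; lookup() and the placeholder compute function are illustrative names, not from the thread:

```r
## Memoization via a hashed environment, as suggested in the thread.
## 'lookup' is a hypothetical wrapper; the expensive row-wise calculation
## is passed in as 'compute'.
cache <- new.env(hash = TRUE, parent = emptyenv())

lookup <- function(bits, compute) {
  key <- paste(bits, collapse = "")          # bit vector -> character key
  if (exists(key, envir = cache, inherits = FALSE)) {
    get(key, envir = cache)                  # cache hit: no recomputation
  } else {
    value <- compute(bits)                   # cache miss: compute and store
    assign(key, value, envir = cache)
    value
  }
}

## usage: first call computes, later identical rows are looked up
lookup(c(0, 1, 0, 1), function(b) sum(b))
```

The string conversion is cheap relative to any non-trivial per-row computation, so the performance hit Esmail worries about should be negligible.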
Re: [R] Unexpected behaviour in reading genomic coordinate files of R-2.7.0
From the NEWS file for R-patched:

 o A field containing just a sign is no longer regarded as numeric (it was on all platforms in 2.7.0, but not on most in earlier versions of R).

So the default behaviour has already been changed. The right way to overcome this was (as you discovered) to say what type you want the variables to be rather than let R guess for you. But you can also update your R (as the posting guide suggests).

On Wed, 28 May 2008, Margherita wrote:

 Great R people, I have noticed a strange behaviour in read.delim() and friends in R 2.7.0. I will describe the problem and also the solution I already found, just to be sure it is expected behaviour, and also to tell people who may experience the same difficulty a way to overcome it. And also to see whether it is proper behaviour or maybe a correction is needed. Here is the problem: I have some genomic coordinate files (bed files, a standard format, for example) containing a column (Strand) in which there is either a + or a -. In R-2.6.2 patched (and every past version I have used) I never had problems reading them in, for example:

  a <- read.table("coords.bed", skip=1)
  disp(a)
  class data.frame
  dimensions are 38650 6
  first rows:
      V1        V2        V3   V4 V5 V6
  1 chr1 100088396 100088446 seq1  0  +
  2 chr1 100088764 100088814 seq2  0  -

 If I do exactly the same command on the same file in R-2.7.0 the result I obtain is:

  a <- read.table("coords.bed", skip=1)
  disp(a)
  class data.frame
  dimensions are 38650 6
  first rows:
      V1        V2        V3   V4 V5 V6
  1 chr1 100088396 100088446 seq1  0  0
  2 chr1 100088764 100088814 seq2  0  0

 and I completely lose the strand information: they are all zeros! I have also tried to put quotes around + and - in the file before reading it, to set stringsAsFactors=FALSE in the read.table() call, and to set the encoding to a few different alternatives, but the result was always the same: they are all transformed into 0.
 Then I tried scan() and I saw it was reading the + character properly:

  scan("coords.bed", skip=1, nlines=1, what="character")
  Read 6 items
  [1] chr1 100088396 100088446 seq1 0
  [6] +

 ...my conclusion is that a lone + or - is not taken as a character in the data-frame creation step; it is taken as numeric but, being without digits, is converted to 0. Is it correct that this behaviour happens even if they are surrounded by quotes? Anyway, my temporary solution (which works without the need to change the files) is:

  a <- read.table("coords.bed", skip=1,
                  colClasses=c("character", "numeric", "numeric",
                               "character", "numeric", "character"))
  a[1:2,]
      V1        V2        V3   V4 V5 V6
  1 chr1 100088396 100088446 seq1  0  +
  2 chr1 100088764 100088814 seq2  0  -

 Another way to avoid losing the strand information was to manually substitute an R for - and an F for + in the file before reading it into R. But that is much more cumbersome, since the use of + and - is, for example, a standard in bed files accepted and generated by the Genome Browser and other genome sites. Please let me know what you think. Ps. I saw this first in the Fedora version (rpm automatically updated), but it is reproduced also in the Windows version. Thank you all for your work and for making R the wonderful tool it is! Cheers, Margherita

 --
 Margherita Mutarelli, PhD
 Seconda Universita' di Napoli
 Dipartimento di Patologia Generale
 via L. De Crecchio, 7
 80138 Napoli - Italy
 Tel/Fax +39.081.5665802

--
Brian D. Ripley
Re: [R] request: which integer in each column is in majority
Muhammad Azam wrote:

 Respected R helpers/users, I am a new R user. I have a problem: I want to know which integer in each column of the following matrix is in the majority, e.g. in the first column 1 is in the majority; similarly in the third column 4 is in the majority. What is a suitable way to get the desired integer for each column? I am looking for some kind reply. Thanks. Example:

  x = matrix(c(1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,4,1,2,3,3), ncol=4)
  x
       [,1] [,2] [,3] [,4]
  [1,]    1    2    3    4
  [2,]    1    2    4    1
  [3,]    1    3    4    2
  [4,]    2    3    4    3
  [5,]    2    3    4    3

Probably bad but working example:

 apply(x, 2, function(x) as.numeric(names(which.max(table(x)))[1]))

Uwe Ligges

 best regards, Muhammad Azam, Ph.D. Student, Department of Medical Statistics, Informatics and Health Economics, University of Innsbruck, Austria
Re: [R] rmeta package: metaplot or forestplot of meta-analysis under DSL (random) model
I don't know if this helps, but recently I tried to use rmeta to make a forest plot and gave up because the data I had were not in the right format, so I simulated a forest plot using gplots. I did it all sideways and then rotated the PostScript. See http://www.sas.upenn.edu/~baron/journal/jdm71128.pdf Figure 4 on p. 300 for the result, and http://www.sas.upenn.edu/~baron/journal/71128/figs.R for the code (Figure 4). The last line in the comment shows how to rotate the PostScript. Jon

On 05/27/08 20:59, Shi, Jiajun [BSD] - KNP wrote:

 Dear all, I could not draw a forest plot for meta-analysis under random models using the rmeta package. rmeta has a default function for the MH (fixed-effect) model. Has the rmeta package been updated with such a function? Or has someone revised it and kept the code private?

--
Jonathan Baron, Professor of Psychology, University of Pennsylvania
Home page: http://www.sas.upenn.edu/~baron
Editor: Judgment and Decision Making (http://journal.sjdm.org)
[R] Unexpected behaviour in reading genomic coordinate files of R-2.7.0
Great R people, I have noticed a strange behaviour in read.delim() and friends in R 2.7.0. I will describe the problem and also the solution I already found, just to be sure it is expected behaviour, and also to tell people who may experience the same difficulty a way to overcome it. And also to see whether it is proper behaviour or maybe a correction is needed. Here is the problem: I have some genomic coordinate files (bed files, a standard format, for example) containing a column (Strand) in which there is either a + or a -. In R-2.6.2 patched (and every past version I have used) I never had problems reading them in, for example:

 a <- read.table("coords.bed", skip=1)
 disp(a)
 class data.frame
 dimensions are 38650 6
 first rows:
     V1        V2        V3   V4 V5 V6
 1 chr1 100088396 100088446 seq1  0  +
 2 chr1 100088764 100088814 seq2  0  -

If I do exactly the same command on the same file in R-2.7.0 the result I obtain is:

 a <- read.table("coords.bed", skip=1)
 disp(a)
 class data.frame
 dimensions are 38650 6
 first rows:
     V1        V2        V3   V4 V5 V6
 1 chr1 100088396 100088446 seq1  0  0
 2 chr1 100088764 100088814 seq2  0  0

and I completely lose the strand information: they are all zeros! I have also tried to put quotes around + and - in the file before reading it, to set stringsAsFactors=FALSE in the read.table() call, and to set the encoding to a few different alternatives, but the result was always the same: they are all transformed into 0. Then I tried scan() and I saw it was reading the + character properly:

 scan("coords.bed", skip=1, nlines=1, what="character")
 Read 6 items
 [1] chr1 100088396 100088446 seq1 0
 [6] +

...my conclusion is that a lone + or - is not taken as a character in the data-frame creation step; it is taken as numeric but, being without digits, is converted to 0. Is it correct that this behaviour happens even if they are surrounded by quotes? Anyway, my temporary solution (which works without the need to change the files) is:

 a <- read.table("coords.bed", skip=1,
                 colClasses=c("character", "numeric", "numeric",
                              "character", "numeric", "character"))
 a[1:2,]
     V1        V2        V3   V4 V5 V6
 1 chr1 100088396 100088446 seq1  0  +
 2 chr1 100088764 100088814 seq2  0  -

Another way to avoid losing the strand information was to manually substitute an R for - and an F for + in the file before reading it into R. But that is much more cumbersome, since the use of + and - is, for example, a standard in bed files accepted and generated by the Genome Browser and other genome sites. Please let me know what you think. Ps. I saw this first in the Fedora version (rpm automatically updated), but it is reproduced also in the Windows version. Thank you all for your work and for making R the wonderful tool it is! Cheers, Margherita

--
Margherita Mutarelli, PhD
Seconda Universita' di Napoli
Dipartimento di Patologia Generale
via L. De Crecchio, 7
80138 Napoli - Italy
Tel/Fax +39.081.5665802
[R] Tukey HSD (or other post hoc tests) following repeated measures ANOVA
Hi everyone, I am fairly new to R, and I am aware that others have had this problem before, but I have failed to solve the problem from previous replies I found in the archives. As this is such a standard procedure in psychological science, there must be an elegant solution to this...I think. I would much appreciate a solution that even I could understand... ;-) Now, I want to calculate a post-hoc test following up a within-subjects ANOVA. The dv is reaction time (RT), there is a 3-level Condition factor (Cond; within-subjects), a number of subjects (Subj), and the dataframe is called WMU3C. The model is

 RT.aov <- aov(RT ~ Cond + Error(Subj/Cond), WMU3C)

I understand that TukeyHSD only works with an aov object, but RT.aov is an aovlist object:

 class(RT.aov)
 [1] "aovlist" "listof"

I've tried to work around it using the maiz example in the MMC documentation of the HH package (a solution previously recommended), but I couldn't get it to work. My best shot was to calculate another aov avoiding the error term (I don't see how this could be a feasible approach, but that's how I understood the MMC example) and a contrast vector (contrasting conditions 2 and 3). I have to admit that I don't quite understand what I'm doing here (not that you couldn't tell):

 RT2.aov <- aov(terms(RT ~ Subj*Cond, WMU3C))
 Cond.lmat <- c(0,1,-1)
 Tukey <- glht.mmc(RT2.aov, focus = "Cond", focus.lmat = Cond.lmat)

yielding

 Error in mvt(lower = carg$lower, upper = carg$upper, df = df, corr = carg$corr, :
   NA/NaN/Inf in foreign function call (arg 6)
 In addition: Warning message:
 In cov2cor(covm) : diagonal has non-finite entries

Thank you very much for your help!
Ullrich

Dr Ullrich Ecker
Postdoctoral Research Associate
Cognitive Science Laboratories
School of Psychology (Mailbag M304)
Room 211 Sanders Building
University of Western Australia
35 Stirling Hwy
Crawley WA 6009
Australia
Office: +61 8 6488 3266
Mobile: +61 4 5822 0072
Fax: +61 8 6488 1006
E-mail: [EMAIL PROTECTED]
Web: www.cogsciwa.com
[R] confidence interval for the logit - predict.glm
Hello all, I've come across an online posting http://www.biostat.wustl.edu/archives/html/s-news/2001-10/msg00119.html that described how to get confidence intervals for predicted values from predict.glm. These instructions were meant for S-Plus. It generally seems to work with R too, but I am encountering some problems. I explain my procedure in the following and would be most grateful if anyone could reply. I have run glm models to predict fish species from a number of environmental predictors, using a stepwise approach (I am aware that there is some controversy about this). I have a data set called testfishes.txt; predy.txt contains the predictors for the test fishes only, and predallx.txt contains all predictors for all planning units. I used the following commands (stepmodfi is the output of the stepwise approach):

 predictionlogit <- predict(stepmodfi, predallx, type="link", se.fit=TRUE)
 upperlogit <- predictionlogit$fit + 2*predictionlogit$se.fit
 lowerlogit <- predictionlogit$fit - 2*predictionlogit$se.fit
 upperresponse[,i] <- exp(upperlogit)/(1+exp(upperlogit))
 lowerresponse[,i] <- exp(lowerlogit)/(1+exp(lowerlogit))
 predictionresponse <- predict(stepmodfi, predallx, type="response", se.fit=FALSE)
 fishoccur[,i] <- predictionresponse

type="link" should be the equivalent of the S-Plus type=1 indicated in the online posting. So I first tried to get predictions on the logit scale, and there is already the point where I am a bit bewildered: predictionlogit results in only ONE value for every planning unit contained in predallx.txt, when I would actually expect planning-units-times-fishes predicted values - at least this is what I get from predictionresponse. Predictionresponse generates as many predicted values as there are planning units times fishes (1680 planning units x 205 fish species, which are predicted over all these planning units), which I put into a constructed matrix named fishoccur. In fact this is what I did in the very beginning, generating predicted values with the command predictionresponse <- predict(stepmodfi, predallx, type="response", se.fit=FALSE). Then I wanted to construct confidence intervals around these values, so I first generated the logit predicted values, but that already is the point I am unsure about. The problem in fact is that quite a few of my upper confidence-interval values come out as NAs; the lower interval seems fine. Yet I am not even sure whether my approach is correct, given this single value produced by the prediction on the logit scale. I hope my description was clear enough and would greatly appreciate suggestions on how to tackle this, whether the approach itself is fine (which I believe it is), and how to get proper lower and upper confidence values. Many thanks in advance! Christine
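For reference, the link-scale approach the poster describes can be written compactly with plogis() as the inverse-logit, which avoids the exp()/(1+exp()) overflow that can produce NA in the upper bound; a hedged sketch with hypothetical model and data names (fit, traindata, newdata), since each species' model would be handled in turn:

```r
## Hedged sketch of an approximate CI for predict.glm on the response scale.
## 'fit', 'traindata', 'newdata' are illustrative names, not from the post.
fit <- glm(y ~ x1 + x2, family = binomial, data = traindata)
pr  <- predict(fit, newdata = newdata, type = "link", se.fit = TRUE)

## +/- 2 SE on the link scale, then back-transform with the inverse logit;
## plogis() is numerically stable where exp(u)/(1+exp(u)) can overflow to NaN.
upper <- plogis(pr$fit + 2 * pr$se.fit)
lower <- plogis(pr$fit - 2 * pr$se.fit)
```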
[R] Computing P-Value
Hi, Is there an R function or package that computes p-value? -GV
Re: [R] how to bind lists recursively
[EMAIL PROTECTED] wrote:

 This is somewhat subtle. Rolf's solution (here corrected to...)

  a <- list()
  for(i in 0:10000) a[[i+1]] <- i

 is the best of the loop solutions (or at least the best I know of).

But Bill does know a better way -- it just slipped his mind.

 system.time({a <- list(); for(i in 0:10000) a[[i+1]] <- i})
   user  system elapsed
   1.14    0.00    1.16
 system.time({a <- vector("list", 10001); for(i in 0:10000) a[[i+1]] <- i})
   user  system elapsed
   0.11    0.00    0.12

The lesson is to create the object at its final length rather than growing it.

Patrick Burns
[EMAIL PROTECTED]
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of S Poetry and A Guide for the Unwilling S User)

 The apparently similar

  a <- list(0)
  for(i in 1:10000) a <- c(a, list(i))

 will take a lot longer, although the result is the same. For example:

  system.time({
    a <- list()
    for(i in 0:10000) a[[i+1]] <- i
  })
    user  system elapsed
    0.59    0.00    0.59
  system.time({
    a <- list(0)
    for(i in 1:10000) a <- c(a, list(i))
  })
    user  system elapsed
    6.87    0.00    6.89

 That's a factor of about 11 times as long. The best of the lot is

  a <- as.list(0:10000)

 of course, but this has problems with generalisation (which everyone suspects is going to be needed here...).

 Bill Venables.

 -----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Rolf Turner Sent: Wednesday, 28 May 2008 1:02 PM To: Daniel Yang Cc: r-help@r-project.org Subject: Re: [R] how to bind lists recursively

 On 28/05/2008, at 2:43 PM, Daniel Yang wrote:

  Dear all, I want to create a list that contains 0, 1, 2, 3, ..., 10000 as its elements. I used the following code, which apparently doesn't work very well.

   a <- 0
   for(i in 1:10000) { a <- list(a, i) }

  The result is not what I wanted. So how can I build the list up so that the last element is the newly added one while the previous elements all remain the same?

 a <- list()
 for(i in 1:10000) a[[i]] <- i

 (The word ``bind'' is inappropriate.)
cheers, Rolf Turner
[R] request: which integer in each column is in majority
Respected R helpers/users, I am a new R user. I have a problem: I want to know which integer in each column of the following matrix is in the majority, e.g. in the first column 1 is in the majority; similarly in the third column 4 is in the majority. What is a suitable way to get the desired integer for each column? I am looking for some kind reply. Thanks. Example:

 x = matrix(c(1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,4,1,2,3,3), ncol=4)
 x
      [,1] [,2] [,3] [,4]
 [1,]    1    2    3    4
 [2,]    1    2    4    1
 [3,]    1    3    4    2
 [4,]    2    3    4    3
 [5,]    2    3    4    3

best regards
Muhammad Azam
Ph.D. Student
Department of Medical Statistics, Informatics and Health Economics
University of Innsbruck, Austria
Re: [R] request: which integer in each column is in majority
Muhammad Azam: I am one of the new R user. I have a problem regarding to know which of the integer in each column of the following matrix is in majority. I want to know that integer e.g. in the first column 1 is in majority. x=matrix(c(1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,4,1,2,3,3),ncol=4) x [,1] [,2] [,3] [,4] [1,] 1 2 3 4 [2,] 1 2 4 1 [3,] 1 3 4 2 [4,] 2 3 4 3 [5,] 2 3 4 3 As long as the matrix only contains integers, the following should work: apply(x, 2, function(z) which.max(tabulate(z)) ) Output: 1 3 4 3 -- Karl Ove Hufthammer
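A short note contrasting the two answers in this thread: the tabulate() version is fast but assumes the columns hold small positive integers (which.max(tabulate(z)) returns the value itself only because the values index their own bins), while the table() version from the earlier reply works for arbitrary values:

```r
## Two per-column "majority" solutions from this thread, side by side.
x <- matrix(c(1,1,1,2,2, 2,2,3,3,3, 3,4,4,4,4, 4,1,2,3,3), ncol = 4)

## tabulate(): positive integers only, bin index == value
apply(x, 2, function(z) which.max(tabulate(z)))

## table(): works for any values (factors, negatives, non-integers)
apply(x, 2, function(z) as.numeric(names(which.max(table(z)))))
```

Both give 1 3 4 3 for this matrix; on ties, both silently return the first mode encountered.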
Re: [R] Pros and Cons of R
I feel the discussion about ease of installation on Linux (/*NIX type systems) isn't really relevant to the Pros and Cons of R. The problems encountered by people are often a consequence of their lack of knowledge/understanding of the operating system, and not a deficiency of R itself. Just my tuppence, but I consider myself to be competent in using Linux so this view may be somewhat biased, Neil
[R] Sample size for 2-sample proportion tests
Hallo! I found a question exactly like mine, but I did not find an answer. Therefore I post it again - hopefully there will be an answer this time! Thanks in advance! Karl

From: Berta Ibanez, Date: Tue, 27 Feb 2007 18:58:48 +0100

 Hi R-users, I want to calculate the sample size needed to carry out a 2-sample proportion test. I have the hypothesized treatment probability of success (0.80), the hypothesized control probability of success (0.05), and also the proportion of the sample devoted to the treated group (5%) (fraction=rho=0.05, n2/n1=19). Using the Hmisc library, it seems that I can use bsamsize (option 1) or samplesize.bin (option 2, alpha=0.05, or option 3, alpha=0.05/2; I am not sure after reading the help page), and I can use Stata (option 4).

  library(Hmisc)
  # OPTION 1:
  bsamsize(p1=0.8, p2=0.05, fraction=0.05, alpha=.05, power=.9)
  # n1 = 2.09, n2 = 39.7, TOTAL = 42
  # OPTION 2:
  samplesize.bin(alpha=0.05, beta=0.9, pit=0.8, pic=0.05, rho=0.05)
  # n = 58, TOTAL = 58
  # OPTION 3:
  samplesize.bin(alpha=0.025, beta=0.9, pit=0.8, pic=0.05, rho=0.05)
  # n = 72, TOTAL = 72
  # OPTION 4 (Stata):
  sampsi 0.8 0.05, p(0.90) a(0.05) r(19)
  # n1 = 4, n2 = 76, TOTAL = 80

 Can the method used produce these differences (42 vs 72 vs 80)? Can somebody give me hints about the possible reasons (asymptotic vs exact distribution, continuity correction, my own error)? Which method would be recommended? Thanks a lot in advance, Berta
[R] extracting information from lmer objects
Hi, I wish to extract a subset of the information given by summary(lmer.object) as a dataframe. In particular, I wish to extract just a table listing the Estimate, Std Error, and t-values rounded to 3 decimal places. I have learned how to extract the coefficients with round(fixef(lmer.object), 3) and the standard errors with round(sqrt(diag(vcov(a.lmer))), 3), but I do not know how to extract the t-values; the extractor methods do not seem to help in this regard. I inspected the structure of the summary with str(summary(lmer.object)), but unfortunately I am new to R and didn't find this very enlightening. Here is an example of the dataframe I would like to produce:

 Factor     Estimate  Std. Err       t
 FixedFac1     0.091     0.140   0.651
 FixedFac2     0.054     0.012   4.461
 FixedFac3    -0.078     0.021  -3.664

Cheers, Barry.
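Depending on the lme4 version installed, the whole fixed-effects table can usually be pulled in one step rather than assembled from fixef() and vcov(); a hedged sketch assuming the fitted model is named lmer.object as in the post:

```r
## Hedged sketch: in recent lme4 versions, coef(summary(...)) returns the
## fixed-effects matrix with columns Estimate, Std. Error and t value.
tab <- as.data.frame(round(coef(summary(lmer.object)), 3))
names(tab) <- c("Estimate", "Std. Err", "t")   # match the desired layout
tab
```

In older lme4 the same matrix sat in the summary object's @coefs slot, so summary(lmer.object)@coefs is the fallback if coef(summary(...)) is not available.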
[R] Evidence Theory in R
Hello, well, I searched the list archive, CRAN and the references, but found nothing. Thus: does anybody around here know anything about Dempster-Shafer Theory, Evidence Theory or Hints in R? Has anybody stumbled upon a package that I overlooked, or implemented something in this area? I really would like not to implement a hint model a second time. My apologies if I missed something obvious, but I would be glad if someone could give me a hint. Regards, Lars

--
Lars Fischer, tel: +49 (0)6151 16-2889
Technische Universität Darmstadt
Fachbereich Informatik / FG Sicherheit in der Informationstechnik
PGP FPR: A197 CBE1 91FC 0CE3 A71D 77F2 1094 CB6E CEE3 7111
[R] How to remove NAs and lme function
I am working on a project to find a model for the concentration of dissolved oxygen in the River Clyde. I've fitted a linear mixed model as lme(DOW ~ Temperature + Salinity + Year + factor(Station)*factor(Depth), random = ~1|id), where id is an identifier of the day over 20 years, defined as Day*1 + Month*100 + (1900 - Year). Anyway, there are some NAs for the concentration of dissolved oxygen in the water, so I know you add in na.action = na.omit, and that omits the NAs so there are 9008 observations in the model; but it doesn't do it for the whole data set, where there are 10965 observations including those with NAs. I would like to plot the residuals from the model against Salinity, Temperature and Year, but when I try, it seems to want to take the observations of these variables from the full data set and the residuals from the model, which of course doesn't work. I have tried using data1 <- data[data$DOW != NA,] on the whole data set but it doesn't work. How can I remove the NAs from a data set? -- View this message in context: http://www.nabble.com/How-to-remove-NAs-and-lme-function-tp17510564p17510564.html Sent from the R help mailing list archive at Nabble.com.
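The comparison data$DOW != NA cannot work: any comparison with NA yields NA rather than TRUE/FALSE, so the subscript selects nothing useful. is.na() is the tool here. A minimal sketch, using the data and DOW names from the post:

```r
## keep only rows where the response is observed
data1 <- data[!is.na(data$DOW), ]

## or drop rows with an NA in any of the model variables
keep  <- complete.cases(data[, c("DOW", "Temperature", "Salinity", "Year")])
data2 <- data[keep, ]
```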
Re: [R] package functions documentation
Ardia David [EMAIL PROTECTED] writes: Great, thanks a lot! It works properly now. By the way, how can I get rid of the warning message: * checking line endings in C/C++/Fortran sources/headers ... WARNING Found the following sources/headers with CR or CRLF line endings: Should I open and save the C files in another format? I am using Notepad on a Windows XP machine. Change formats, yes. With Notepad, I don't think you can. There are hints at http://en.wikipedia.org/wiki/Newline#Conversion_utilities Martin Martin Morgan wrote: Ardia David [EMAIL PROTECTED] writes: Dear all, I am currently creating a package and writing the documentation files for some functions (the main functions). When using R CMD check myPackage, I get a warning message indicating that some of my functions are not documented. How can I get rid of this problem? Of course, I don't want to document every function in the package... Thanks for your help. Adding a NAMESPACE file to your package allows you to separate functions that are visible to the end user (and hence requiring documentation) from those used purely internally to the package (and perhaps documented less formally). See the Writing R Extensions manual for details. This helps to provide the user with access to the relevant documentation, without too much clutter. It also helps you to structure your code to have a 'public' interface that should be well thought out and relatively unchanging, and more flexibly defined and used functions that are 'private' to your package. Martin -- David Ardia H-building, room 11-26 Econometric Institute Erasmus University PO Box 1738 NL 3000 DR Rotterdam The Netherlands Phone: +31 (0)10 408 2269
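A minimal sketch of what such a NAMESPACE file might look like (the function names are hypothetical):

```
# NAMESPACE (sketch)
# export only the public, documented interface;
# everything not exported stays internal to the package
export(fitModel, plotResults)
```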
-- Martin Morgan Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M2 B169 Phone: (206) 667-2793
Re: [R] How to remove NAs and lme function
Jen, try na.action = na.exclude Andrew On Wed, May 28, 2008 9:26 pm, Jen_mp3 wrote: I am working on a project to find a model for the concentration of dissolved oxygen in the River Clyde. I've fitted a linear mixed model as lme(DOW ~ Temperature + Salinity + Year + factor(Station)*factor(Depth), random = ~1|id), where id is an identifier of the day over 20 years, defined as Day*1 + Month*100 + (1900 - Year). Anyway, there are some NAs for the concentration of dissolved oxygen in the water, so I know you add in na.action = na.omit, and that omits the NAs so there are 9008 observations in the model; but it doesn't do it for the whole data set, where there are 10965 observations including those with NAs. I would like to plot the residuals from the model against Salinity, Temperature and Year, but when I try, it seems to want to take the observations of these variables from the full data set and the residuals from the model, which of course doesn't work. I have tried using data1 <- data[data$DOW != NA,] on the whole data set but it doesn't work. How can I remove the NAs from a data set? -- View this message in context: http://www.nabble.com/How-to-remove-NAs-and-lme-function-tp17510564p17510564.html Sent from the R help mailing list archive at Nabble.com. Andrew Robinson Senior Lecturer in Statistics Tel: +61-3-8344-6410 Department of Mathematics and Statistics Fax: +61-3-8344 4599 University of Melbourne, VIC 3010 Australia Email: [EMAIL PROTECTED] Website: http://www.ms.unimelb.edu.au
Re: [R] how to bind lists recursively
Dear Brian and Bill, Here's an interesting contrasting example (taken from this month's Help Desk column in R News, which Bill has already seen), first verifying the relative timings for Brian's example:

system.time({
  a <- vector("list", 10001)
  for(i in 0:10000) a[[i+1]] <- i
})
   user  system elapsed
   0.03    0.00    0.03
system.time({
  a <- list()
  for(i in 0:10000) a[[i+1]] <- i
})
   user  system elapsed
   0.47    0.02    0.49
system.time({
  matrices <- vector(mode="list", length=1000)
  for (i in 1:1000)
    matrices[[i]] <- matrix(rnorm(10000), 100, 100)
})
   user  system elapsed
   2.59    0.00    2.61
system.time({
  matrices <- list()
  for (i in 1:1000)
    matrices[[i]] <- matrix(rnorm(10000), 100, 100)
})
   user  system elapsed
   2.61    0.00    2.60

Brian will, I'm sure, have the proper explanation, but I suppose that the NULL elements in the initialized list in the first example provide sufficient space for the integers that eventually get stored there, while that's not the case for the matrices in the second example. I believe that the second example is more typical of what one would actually put in a list. Regards, John -- John Fox, Professor Department of Sociology McMaster University Hamilton, Ontario, Canada web: socserv.mcmaster.ca/jfox -----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Prof Brian Ripley Sent: May-28-08 2:04 AM To: [EMAIL PROTECTED] Cc: r-help@r-project.org Subject: Re: [R] how to bind lists recursively But pre-allocation still helps: a <- vector("list", 10001); for(i in 0:10000) a[[i+1]] <- i gives user system elapsed 0.042 0.001 0.057 on my system, against user system elapsed 0.743 0.000 1.907 for Bill's 'best' solution. On Wed, 28 May 2008, [EMAIL PROTECTED] wrote: This is somewhat subtle. Rolf's solution (here corrected to...) a <- list(); for(i in 0:10000) a[[i+1]] <- i is the best of the loop solutions (or at least the best I know of). The apparently similar a <- list(0); for(i in 1:10000) a <- c(a, list(i)) will take a lot longer, although the result is the same.
For example:

system.time({
  a <- list()
  for(i in 0:10000) a[[i+1]] <- i
})
   user  system elapsed
   0.59    0.00    0.59
system.time({
  a <- list(0)
  for(i in 1:10000) a <- c(a, list(i))
})
   user  system elapsed
   6.87    0.00    6.89

That's a factor of about 11 times as long. The best of the lot is a <- as.list(0:10000), of course, but this has problems with generalisation (which everyone suspects is going to be needed here...). Bill Venables. -----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Rolf Turner Sent: Wednesday, 28 May 2008 1:02 PM To: Daniel Yang Cc: r-help@r-project.org Subject: Re: [R] how to bind lists recursively On 28/05/2008, at 2:43 PM, Daniel Yang wrote: Dear all, I want to create a list that contains 0, 1, 2, 3, ..., 10000 as its elements. I used the following code, which apparently doesn't work very well. a <- 0; for(i in 1:10000) { a <- list(a, i) } The result is not what I wanted. So how can I bind the lists recursively so that the last element is the newly added one while the previous elements all remain the same? a <- list(); for(i in 1:10000) a[[i]] <- i (The word ``bind'' is inappropriate.) cheers, Rolf Turner -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
Re: [R] Rotated text on a regression line
Christoph, I see two problems: (1) use plot(x,y,pch=16,xlim=c(0,10),asp=1), as the default has the x/y scales different. (2) It looks to me that the expression srt=180/pi*atan(slope) should be srt=180*atan(slope)/pi. Regards, Tom Dr. Christoph Scherber wrote: Dear all, I stumbled over a problem recently when trying to use srt with text() on a windows device. What I intended to do was to plot a simple regression line, and to rotate a piece of text such that the text has the same angle as the regression line. However, the text is always plotted at a slightly wrong angle:

x <- 1:10                            # create arbitrary x and y values
y <- x*2 - rnorm(1:10)
plot(x, y, pch=16, xlim=c(0,10))     # create the graph
abline(lm(y~x))
# calculate the y coordinate of the text:
yval <- predict(lm(y~x), list(x=rep(2, length(x))))[1]
# calculate the slope:
slope <- as.numeric(lm(y~x)[[1]][2])
text(2, yval, "Regression", srt=180/pi*atan(slope), adj=0)

What am I doing wrong here? Many thanks in advance for any help! Best wishes Christoph (using R 2.6.1 on Windows XP) -- Thomas E Adams National Weather Service Ohio River Forecast Center 1901 South State Route 134 Wilmington, OH 45177 EMAIL: [EMAIL PROTECTED] VOICE: 937-383-0528 FAX: 937-383-0033
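Putting the pieces together, a sketch of a corrected version of Christoph's example. The visible fix comes from asp = 1, which makes one x-unit and one y-unit the same physical length, so that the data-space angle from atan(slope) matches the angle drawn on the device:

```r
x <- 1:10
y <- x * 2 - rnorm(10)
plot(x, y, pch = 16, xlim = c(0, 10), asp = 1)  # asp = 1: equal unit lengths
fit <- lm(y ~ x)
abline(fit)
slope <- coef(fit)[2]
yval  <- predict(fit, data.frame(x = 2))
## convert the slope angle from radians to degrees for srt
text(2, yval, "Regression", srt = 180 / pi * atan(slope), adj = 0)
```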
Re: [R] How to make R running faster
Dear Chunhao Tu, There is, coincidentally, a discussion of loops and related issues in the Help Desk column in the current issue of R News (see the newsletter link on CRAN). Regards, John -- John Fox, Professor Department of Sociology McMaster University Hamilton, Ontario, Canada web: socserv.mcmaster.ca/jfox -----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED] Sent: May-28-08 4:27 AM To: r-help@r-project.org Subject: [R] How to make R running faster Hi everyone, I run R loops on Windows XP and Vista. Both machines are Intel Core 2 Duo 2.2 GHz with 2 GB RAM, and XP is significantly faster than Vista. Does anyone know how to speed up R loops in Vista? Thank you in advance. Chunhao Tu
Re: [R] need help for building R in Ubuntu 8.04
On Wed, May 28, 2008 at 02:29:10PM +0200, Martin Maechler wrote: EH == Erin Hodgess [EMAIL PROTECTED] on Sun, 25 May 2008 13:27:04 -0500 writes: EH Try: ./configure --with-x=no well.. no! really don't. Seconded. At best this qualified for the 'then do not do it' school of advice to the 'it hurts when I do this'. But it truly missed the underlying issue. See below. If you want to enjoy a Linux system and building from the source, and then maybe learn how that is happening, learning about shell scripts and 'make' and ... then rather do get the extra ubuntu packages needed. Or if you 'just' want to run it, install Ubuntu and learn to take advantage of the work of others. The advice (below) to get the 'xorg-dev' is definitely good advice. I have it on the list of packages I'd always want to install in addition to the basic ubuntu/debian list. But you most probably will find that you need a few more tools / libraries / headers for your ubuntu system such that you can build R with all the bells and whistles possible. There's the Debian (and hence Ubuntu) package 'r-base-dev' which contains 'r-base' (i.e. a *binary* version of R; the one Dirk Eddelbuettel mentioned), but also most of the compilers/libraries/... that you'd want to build R from the sources. Just to be a bit more precise: i) 'apt-get install r-base' will get you r-base-core and all the recommended packages --- use this if you want to _run_ R ii) 'apt-get install r-base-dev' will get all the common header files, as well as r-base-core use this if you _also want to build / install R packages_ incl from CRAN iii) 'apt-get build-dep r-base' will get you _build dependencies_ for R and is probably what Martin wanted here. Last time I did get 'r-base-dev' on a virgin ubuntu system, I vaguely remember that it did not contain *really* all the tools I'd wanted, but almost all. 
Bug reports are always welcome and a more constructive form of moving things forward than an off-hand comment here :-) Note that I tend not to get the ones filed against Ubuntu so file against Debian please. e.g., you may also want the two packages tcl8.4-dev tk8.4-dev Just curious: what did you need them for ? In case you wanted to build R, see iii) above as a possibly more focussed way to get there. Dirk -- Three out of two people have difficulties with fractions. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] rbinom not using probability of success right
I am trying to simulate a series of ones and zeros (1 or 0) and I am using rbinom, but I am realizing that the number of successes expected is not accurate. Any advice out there? This is the example:

N <- 500
status <- rbinom(N, 1, prob = 0.15)
count <- sum(status)

15 percent of 500 should be 75, but what I obtain from the count variable is 77, which gives a probability of success of 0.154. Not very good. Is there another way beyond using sample and rep together? A Smile costs Nothing But Rewards Everything Happiness is not perfected until it is shared -Jane Porter
Re: [R] can I do this with R?
A Google search for logistic regression with stepwise forward in R returns the following post: https://stat.ethz.ch/pipermail/r-help/2003-December/043645.html Haris Skiadas Department of Mathematics and Computer Science Hanover College On May 28, 2008, at 7:01 AM, Maria wrote: Hello, I am just about to install R and was wondering about a few things. I have only worked in Matlab, because I wanted to do a logistic regression. However, Matlab does not do logistic regression with the stepwise forward method. Therefore I thought about testing R. So my question is: can I do logistic regression with stepwise forward selection in R? Thanks /M
Re: [R] function to compute consensus DNA sequence by plurality?
Kim, Is this what you want?

tmp <- readLines(textConnection("TGCATACACCGACAACATCCTCGACGACTACACCTACTACG
CGCCTACACCAACGATGTCCTGGACGACTTCTGCTACTACG
CGCCTACACCAACGATGTCCTGGACGACTTCTGCTACTACG
CGCCTACACCAACGATGTCCTGGACGACTTCTGCTACTACG
CGCCTACACCAACGATGTCCTGGACGACTTCTGCTACTACG
AGCATACACCGACAACATCCTCGATGACTACTGCTACTACG
CGCCTACACCAACGATGTCCTGGACGACTTCTGCTACTACG
CGCCTACACCAACGATGTCCTGGACGACTTCTGCTACTACG
CGCCTACACCAACGATGTCCTGGACGACTTCTGCTACTACG
AGCATACACCGACAACATCCTCGATGACTACTGCTACTACG
CGCCTACACCAACGATGTCCTGGACGACTTCTGCTACTACG
CGCCTACACCAACGATGTCCTGGACGACTTCTGCTACTACG"))

library(seqinr)
mat <- matrix(sapply(tmp, s2c), nrow = length(tmp), byrow = TRUE)
bma <- function(nucl){
  nucl <- tolower(nucl)
  iupac <- sapply(amb(), amb)
  iupac$u <- NULL  # no RNA
  names(iupac)[unlist(lapply(iupac, setequal, nucl))]
}
res <- apply(mat, 2, bma)
toupper(c2s(res))
[1] "HGCMTACACCRACRAYRTCCTSGAYGACTWCWSCTACTACG"

Best, Jean -- Jean R. Lobry ([EMAIL PROTECTED]) Laboratoire BBE-CNRS-UMR-5558, Univ. C. Bernard - LYON I, 43 Bd 11/11/1918, F-69622 VILLEURBANNE CEDEX, FRANCE tel: +33 472 43 27 56 fax: +33 472 43 13 88 http://pbil.univ-lyon1.fr/members/lobry/
Re: [R] how to bind lists recursively
On Wed, 28 May 2008, John Fox wrote: Dear Brian and Bill, Here's an interesting contrasting example (taken from this month's Help Desk column in R News, which Bill has already seen), first verifying the relative timings for Brian's example: system.time({ a <- vector("list", 10001); for(i in 0:10000) a[[i+1]] <- i }) user system elapsed 0.03 0.00 0.03 system.time({ a <- list(); for(i in 0:10000) a[[i+1]] <- i }) user system elapsed 0.47 0.02 0.49 system.time({ matrices <- vector(mode="list", length=1000); for (i in 1:1000) matrices[[i]] <- matrix(rnorm(10000), 100, 100) }) user system elapsed 2.59 0.00 2.61 system.time({ matrices <- list(); for (i in 1:1000) matrices[[i]] <- matrix(rnorm(10000), 100, 100) }) user system elapsed 2.61 0.00 2.60 Brian will, I'm sure, have the proper explanation, but I suppose that the NULL elements in the initialized list in the first example provide sufficient space for the integers that eventually get stored there, while that's not the case for the matrices in the second example. I believe that the second example is more typical of what one would actually put in a list. Your explanation is incorrect, as a list is just a header plus a set of SEXP pointers, and is never used to store data directly. The main point is that with only 1000 elements, the difference in re-allocating the list on your system is only going to be about (0.49 - 0.04)/10 ~ 0.04s, too small to time accurately (especially under Windows). So finding 0.02 is not really contrasting. The point that pre-allocation may only help when list allocation is taking an appreciable part of the time is a sound one. Note that because of GC tuning such things are very variable. Running either version repeatedly on my Linux box gives timings from about 3.8 down to 2.5 seconds -- and after settling down to 2.5-2.7s, the 13th run took 3.2s.
Regards, John -- John Fox, Professor Department of Sociology McMaster University Hamilton, Ontario, Canada web: socserv.mcmaster.ca/jfox -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Prof Brian Ripley Sent: May-28-08 2:04 AM To: [EMAIL PROTECTED] Cc: r-help@r-project.org Subject: Re: [R] how to bind lists recursively But pre-allocation still helps a - vector(list, 10001) for(i in 0:1) a[[i+1]] - i gives user system elapsed 0.042 0.001 0.057 on my system, against user system elapsed 0.743 0.000 1.907 for Bill's 'best' solution. On Wed, 28 May 2008, [EMAIL PROTECTED] wrote: This is somewhat subtle. Rolf's solution (here corrected to...) a - list() for(i in 0:1) a[[i+1]] - i is the best of the loop solutions (or at least the best I know of). The apparently similar a - list(0) for(i in 1:1) a - c(a, list(i)) will take a lot longer, although the result is the same. For example: system.time({ a - list() for(i in 0:1) a[[i+1]] - i }) user system elapsed 0.590.000.59 system.time({ a - list(0) for(i in 1:1) a - c(a, list(i)) }) user system elapsed 6.870.006.89 That's a factor of about 11 times as long. The best of the lot is a - as.list(0:1) of course, but this has problems with generalisation, (which everyone suspects is going to be needed here...). Bill Venables. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Rolf Turner Sent: Wednesday, 28 May 2008 1:02 PM To: Daniel Yang Cc: r-help@r-project.org Subject: Re: [R] how to bind lists recursively On 28/05/2008, at 2:43 PM, Daniel Yang wrote: Dear all, I want to create a list that contains 0,1,2,3, ..., 1 as its elements. I used the following code, which apparently doesn't work very well. a - 0 for(i in 1:1) { a - list(a, i) } The result is not what I wanted. So how to create the bind lists recursively so that the last element would be the newly added one while the previous elements all remain the same? 
a - list() for(i in 1:1) a[[i]] - i (The word ``bind'' is inappropriate.) cheers, Rolf Turner -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861
Re: [R] rbinom not using probability of success right
You asked for each of 500 to be included with probability 0.15, not for 15% of 500. If you want the latter, use sample, e.g. sample(c(rep(1,75), rep(0,425))) And to see if your 77 is reasonable for binomial sampling: binom.test(77, 500, 0.15) Exact binomial test data: 77 and 500 number of successes = 77, number of trials = 500, p-value = 0.8022 alternative hypothesis: true probability of success is not equal to 0.15 95 percent confidence interval: 0.1234860 0.1886725 sample estimates: probability of success 0.154 so it certainly is. On Wed, 28 May 2008, Philip Twumasi-Ankrah wrote: I am trying to simulate a series of ones and zeros (1 or 0) and I am using rbinom but realizing that the number of successes expected is not accurate. Any advice out there. This is the example: N-500 status-rbinom(N, 1, prob = 0.15) count-sum(status) 15 percent of 500 should be 75 but what I obtain from the count variable is 77 that gives the probability of success to be 0.154. Not very good. Is there another way beyond using sample and rep together? [...] -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] rbinom not using probability of success right
On 28-May-08 12:53:26, Philip Twumasi-Ankrah wrote: I am trying to simulate a series of ones and zeros (1 or 0) and I am using rbinom but realizing that the number of successes expected is not accurate. Any advice out there. This is the example: N <- 500; status <- rbinom(N, 1, prob = 0.15); count <- sum(status) 15 percent of 500 should be 75 but what I obtain from the count variable is 77 that gives the probability of success to be 0.154. Not very good. The difference (77 - 75 = 2) is well within the likely sampling variation when 500 values are sampled independently with P(1)=0.15: the standard deviation of the resulting number of 1s is sqrt(500*0.15*0.85) = 7.98, so the difference of 2 is only 1/4 of a standard deviation, hence very likely to be equalled or exceeded. Your chance of getting exactly 75 by this method is quite small: dbinom(75,500,0.15) [1] 0.04990852 and your chance of being 2 or more off your target is 1 - sum(dbinom((74:76),500,0.15)) [1] 0.8510483 Is there another way beyond using sample and rep together? It looks as though you are seeking to obtain exactly 75 1s, randomly situated, the rest being 0s, so in effect you do need to do something on the lines of sample and rep. Hence, something like: status <- rep(0,500); status[sample((1:500),75,replace=FALSE)] <- 1 Hoping this helps, Ted. E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 094 0861 Date: 28-May-08 Time: 14:19:24 -- XFMail --
Re: [R] rbinom not using probability of success right
Do it again. What did you get this time? Then do it another time. Do you see what is happening? Charles Annis, P.E. [EMAIL PROTECTED] phone: 561-352-9699 eFax: 614-455-3265 http://www.StatisticalEngineering.com -----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Philip Twumasi-Ankrah Sent: Wednesday, May 28, 2008 8:53 AM To: r-help@r-project.org Subject: [R] rbinom not using probability of success right I am trying to simulate a series of ones and zeros (1 or 0) and I am using rbinom, but realizing that the number of successes expected is not accurate. Any advice out there? This is the example: N <- 500; status <- rbinom(N, 1, prob = 0.15); count <- sum(status) 15 percent of 500 should be 75, but what I obtain from the count variable is 77, which gives a probability of success of 0.154. Not very good. Is there another way beyond using sample and rep together? A Smile costs Nothing But Rewards Everything Happiness is not perfected until it is shared -Jane Porter
Re: [R] How to make R running faster
I remember reading that colSums and colMeans were better when you need column sums and means. On Wed, May 28, 2008 at 8:26 AM, Esmail Bonakdarian [EMAIL PROTECTED] wrote: Neil Shephard wrote: Loops are not massively efficient within R. Look into using the apply() family of functions (eapply()/lapply()/mapply()/rapply()/tapply()). Didn't someone post not too long ago that apply is internally implemented as a for-loop? Or am I not remembering this correctly? The rationale for apply was that it is more concise, but not necessarily more efficient. Esmail -- Erin Hodgess Associate Professor Department of Computer and Mathematical Sciences University of Houston - Downtown mailto: [EMAIL PROTECTED]
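For the specific cases of column (or row) sums and means, the dedicated base functions colSums/colMeans/rowSums/rowMeans do the whole computation in one internal pass, avoiding the per-column R-level function calls that apply() makes. A small sketch:

```r
m  <- matrix(rnorm(1e6), nrow = 1000)
r1 <- apply(m, 2, mean)  # calls mean() once per column at the R level
r2 <- colMeans(m)        # single internal pass over the matrix
all.equal(r1, r2)        # same result, typically much faster for large m
```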
Re: [R] rbinom : Does randomness preclude precision?
Ted's reply is a bit comforting, and as indicated in my post I am resorting to using sample; but as an academic issue, does randomness preclude precision? Randomness should be in the sequence of zeros and ones and how they are simulated at each iteration of the process, but not in the eventual nature of the distribution. I mean, if I simulated a Normal(0, 1) and got a Normal(1.5, 2), these would be very different distributions. It is the same with simulating a Binomial(1, p=0.15) and getting Binomial(1, 0.154). [EMAIL PROTECTED] wrote: On 28-May-08 12:53:26, Philip Twumasi-Ankrah wrote: I am trying to simulate a series of ones and zeros (1 or 0) and I am using rbinom but realizing that the number of successes expected is not accurate. Any advice out there. This is the example: N <- 500; status <- rbinom(N, 1, prob = 0.15); count <- sum(status) 15 percent of 500 should be 75 but what I obtain from the count variable is 77 that gives the probability of success to be 0.154. Not very good. The difference (77 - 75 = 2) is well within the likely sampling variation when 500 values are sampled independently with P(1)=0.15: the standard deviation of the resulting number of 1s is sqrt(500*0.15*0.85) = 7.98, so the difference of 2 is only 1/4 of a standard deviation, hence very likely to be equalled or exceeded. Your chance of getting exactly 75 by this method is quite small: dbinom(75,500,0.15) [1] 0.04990852 and your chance of being 2 or more off your target is 1 - sum(dbinom((74:76),500,0.15)) [1] 0.8510483 Is there another way beyond using sample and rep together? It looks as though you are seeking to obtain exactly 75 1s, randomly situated, the rest being 0s, so in effect you do need to do something on the lines of sample and rep. Hence, something like: status <- rep(0,500); status[sample((1:500),75,replace=FALSE)] <- 1 Hoping this helps, Ted.
E-Mail: (Ted Harding) Fax-to-email: +44 (0)870 094 0861 Date: 28-May-08 Time: 14:19:24 -- XFMail -- A Smile costs Nothing But Rewards Everything Happiness is not perfected until it is shared -Jane Porter [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to make R running faster
Neil Shephard wrote: Loops are not massively efficient within R. Look into using the apply() family of functions (eapply()/lapply()/mapply/rapply()/tapply()). Didn't someone post not too long ago that apply is internally represented as a for-loop? Or am I not remembering this correctly? The reason for apply was that it was more concise but not necessarily more efficient. Esmail __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Computing P-Value
Dear Ben, Given a set of words ('foo', 'bar', 'bar', 'bar', quux . foo) this can be in 10.000 items. I would like to compute the significance of the word occurrence with P-Value. Is there a simple way to do it? - GV On Wed, May 28, 2008 at 11:46 PM, Ben Bolker [EMAIL PROTECTED] wrote: Edward Wijaya gundalav at gmail.com writes: Hi, Is there an R function or package that computes p-value? -GV Many, but this is far too vague a question for us to answer usefully. Perhaps if you tell us specifically what you want to do we can help. (Please make sure to read the posting guide, while you're at it ...) Ben Bolker __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Gundala Viswanath __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] (no subject)
Philip Twumasi-Ankrah nana_kwadwo_derkyi at yahoo.com writes: Ted's reply is a bit comforting and, as indicated in my post, I am resorting to using sample; but as an academic issue, does randomness preclude precision? Randomness should be in the sequence of zeros and ones and how they are simulated at each iteration of the process, but not in the eventual nature of the distribution. I mean, if I simulated a Normal(0, 1) and got a Normal(1.5, 2), these would be very different distributions. It is the same with simulating a Binomial(1, p=0.15) and getting a Binomial(1, 0.154). It's impossible to have a sampler that is (1) Binomial(1, p) on each draw and (2) independent for each draw and (3) has exactly Np successes in N draws. For example, suppose you had drawn 99 Binomial(1, p=0.15) samples and had got(ten) 14 successes so far ... your last draw would be constrained to be 1 if you wanted property #3 to hold. So I guess the answer to your question is yes (except in the limit of infinitely large samples). Ben Bolker
Re: [R] rbinom : Does randomness preclude precision?
What do you mean by ... *eventual* nature of the distribution? If you simulated 100 samples, would you expect to see 1.5 successes? Or 1? Or 2? How many, in your thinking, is eventual? Charles Annis, P.E. [EMAIL PROTECTED] phone: 561-352-9699 eFax: 614-455-3265 http://www.StatisticalEngineering.com -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Philip Twumasi-Ankrah Sent: Wednesday, May 28, 2008 9:52 AM To: [EMAIL PROTECTED] Cc: r-help@r-project.org Subject: Re: [R] rbinom : Does randomness preclude precision? Ted's reply is a bit comforting and, as indicated in my post, I am resorting to using sample; but as an academic issue, does randomness preclude precision? Randomness should be in the sequence of zeros and ones and how they are simulated at each iteration of the process, but not in the eventual nature of the distribution. I mean, if I simulated a Normal(0, 1) and got a Normal(1.5, 2), these would be very different distributions. It is the same with simulating a Binomial(1, p=0.15) and getting a Binomial(1, 0.154). [EMAIL PROTECTED] wrote: On 28-May-08 12:53:26, Philip Twumasi-Ankrah wrote: I am trying to simulate a series of ones and zeros (1 or 0) and I am using rbinom, but I am realizing that the number of successes expected is not accurate. Any advice out there? This is the example:

N <- 500
status <- rbinom(N, 1, prob = 0.15)
count <- sum(status)

15 percent of 500 should be 75, but what I obtain from the count variable is 77, which gives a probability of success of 0.154. Not very good. The difference (77 - 75 = 2) is well within the likely sampling variation when 500 values are sampled independently with P(1) = 0.15: the standard deviation of the resulting number of 1s is sqrt(500*0.15*0.85) = 7.98, so the difference of 2 is only 1/4 of a standard deviation, hence very likely to be equalled or exceeded. Your chance of getting exactly 75 by this method is quite small:

dbinom(75, 500, 0.15)
[1] 0.04990852

and your chance of being 2 or more off your target is

1 - sum(dbinom(74:76, 500, 0.15))
[1] 0.8510483

Is there another way beyond using sample and rep together? It looks as though you are seeking to obtain exactly 75 1s, randomly situated, the rest being 0s, so in effect you do need to do something along the lines of sample and rep. Hence, something like

status <- rep(0, 500)
status[sample(1:500, 75, replace = FALSE)] <- 1

Hoping this helps, Ted. E-Mail: (Ted Harding) Fax-to-email: +44 (0)870 094 0861 Date: 28-May-08 Time: 14:19:24 -- XFMail -- A Smile costs Nothing But Rewards Everything. Happiness is not perfected until it is shared. -Jane Porter
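To see Ted's point numerically, one can draw the rbinom() sample many times and compare the spread of the counts with the theoretical standard deviation, and contrast that with the sample()-based construction, which hits 75 every time (a sketch; the seed is arbitrary):

```r
set.seed(1)  # arbitrary seed, for reproducibility
counts <- replicate(1000, sum(rbinom(500, 1, prob = 0.15)))
mean(counts)   # close to 75
sd(counts)     # close to sqrt(500 * 0.15 * 0.85), about 7.98

# The sample()-based construction fixes the count exactly:
status <- rep(0, 500)
status[sample(1:500, 75, replace = FALSE)] <- 1
sum(status)    # exactly 75, every time
```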
[R] Grouped weighted.mean
Dear all -- I want to compute weighted.mean() for grouped rows. A data frame extract is just below. For each Key, I want the mean of IAC weighted by Wt.

DP0[1:20,]
                                     Key   IAC        Wt
2   C3-PD030020050.PD030020050.3.12.3.0 0.765 0.859
3   C3-PD030020050.PD030020050.3.12.3.0 0.764 0.8449651
4   C3-PD030020050.PD030020050.3.12.3.0 0.760 0.8024975
5   C3-PD030020050.PD030020050.3.12.3.0 0.753 0.7326575
6   C3-PD030020050.PD030020050.3.12.3.0 0.743 0.6381150
7   C3-PD030020050.PD030020050.3.12.3.0 0.730 0.5187296
8   C3-PD030020050.PD030020050.3.12.3.0 0.714 0.378
65  C3-PD0.PD0.3.12.3.0                 0.770 0.859
66  C3-PD0.PD0.3.12.3.0                 0.770 0.8449651
67  C3-PD0.PD0.3.12.3.0                 0.771 0.8024975
68  C3-PD0.PD0.3.12.3.0                 0.772 0.7326575
69  C3-PD0.PD0.3.12.3.0                 0.774 0.6381150
70  C3-PD0.PD0.3.12.3.0                 0.777 0.5187296
71  C3-PD0.PD0.3.12.3.0                 0.780 0.378
128 C3-PD2.PD2.3.12.3.0                 0.685 0.859
129 C3-PD2.PD2.3.12.3.0                 0.685 0.8449651
130 C3-PD2.PD2.3.12.3.0                 0.684 0.8024975
131 C3-PD2.PD2.3.12.3.0                 0.682 0.7326575
132 C3-PD2.PD2.3.12.3.0                 0.679 0.6381150
133 C3-PD2.PD2.3.12.3.0                 0.675 0.5187296

mean -- works OK:

MN = by( DP0$IAC, DP0$Key, mean)

weighted.mean -- no go due to length of Wt:

WMN = by( DP0$IAC, DP0$Key, weighted.mean, DP0$Wt)
Error in FUN(data[x, ], ...) : 'x' and 'w' must have the same length

So I tried tapply() -- same result:

WMN = tapply( DP0, DP0$Key, function(x) weighted.mean( x$IAC, x$Wt))
Error in tapply(DP0, DP0$Key, function(x) weighted.mean(x$IAC, x$Wt)) : arguments must have same length

How does one pass grouped arguments to processing functions such as weighted.mean()? TIA, Chip Barnaby - Chip Barnaby [EMAIL PROTECTED] Vice President of Research, Wrightsoft Corp., 781-862-8719 x118 voice, 781-861-2058 fax, 131 Hartwell Ave, Lexington, MA 02421, www.wrightsoft.com
[R] individual analysis
Dear R-community, I am looking to statistically analyse the results of a new medical diagnostic tool. The aim is to know whether a given subject differs from the mean or distribution of a population of controls on one continuous variable (a neuro-imaging result which can be quantified). Does anyone know which test is appropriate for such an individual analysis? Which package is needed to perform the analysis? Thanks a lot for your help, Julien
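One commonly cited approach for comparing a single case against a small control sample is the modified t-test of Crawford and Howell (1998), which treats the case as a sample of one. It needs no extra package; here is a minimal sketch in base R (the data values are invented):

```r
# Modified t-test for one case vs. a control sample
# (Crawford & Howell 1998): t = (x - mean) / (sd * sqrt((n+1)/n)),
# with n - 1 degrees of freedom.  Example data are invented.
case_vs_controls <- function(case, controls) {
  n <- length(controls)
  t_stat <- (case - mean(controls)) /
            (sd(controls) * sqrt((n + 1) / n))
  p <- 2 * pt(-abs(t_stat), df = n - 1)  # two-sided p-value
  c(t = t_stat, df = n - 1, p = p)
}

controls <- c(98, 102, 101, 99, 100, 103, 97, 100)  # invented control values
case_vs_controls(110, controls)
```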
Re: [R] How to remove NAs and lme function
Thanks, that worked! Andrew Robinson-6 wrote: Jen, try na.action = na.exclude Andrew On Wed, May 28, 2008 9:26 pm, Jen_mp3 wrote: I am working on a project to find a model for the concentration of dissolved oxygen in the River Clyde. I've fitted a linear mixed model as lme(DOW ~ Temperature + Salinity + Year + factor(Station)*factor(Depth), random = ~1|id), where id is an identifier of the day over 20 years, defined as Day*1 + Month*100 + (1900 - Year). Anyway, there are some NAs for the concentration of dissolved oxygen in the water, so I know you add in na.action = na.omit, which omits the NAs, leaving 9008 observations in the model; but it doesn't do it for the whole data set, where there are 10965 observations including those with NAs. I would like to plot the residuals from the model against Salinity, Temperature and Year, but when I try, it seems to want to take the observations of these variables from the full data set and the residuals from the model, which of course doesn't work. I have tried using data1 <- data[data$DOW != NA,] on the whole data set but it doesn't work. How can I remove the NAs from a data set? Andrew Robinson Senior Lecturer in Statistics Tel: +61-3-8344-6410 Department of Mathematics and Statistics Fax: +61-3-8344-4599 University of Melbourne, VIC 3010 Australia Email: [EMAIL PROTECTED] Website: http://www.ms.unimelb.edu.au
[R] Fixing the coefficient of a regressor in formula
Dear R users, I want to estimate a Cox PH model with time-dependent covariates, so I am using a counting-process format with the following formula: Surv(data$start, data$stop, data$event.time) ~ cluster(data$id) + G1 + G2 + G3 + G4 + G5 + G6. The Gs represent B-spline basis functions, so they sum to 1, and I can't estimate the model as-is without the last coefficient coming out NA, which makes sense given the perfect collinearity. Without getting into lengthy details about my code, let me just say that to avoid the collinearity problem I do not want to omit G1 from the regression. Instead, I want to fix the regression coefficient of one of the regressors, G1, to 1. I have read the R manual section on formulae but I have not found how to fix a regression coefficient. Conceptually it seems to me that it should be simple, and I am sure that someone has explained it somewhere, but I did not find the proper keywords! So, does someone know how to fix the coefficient of a regressor in the formula for a Cox model, so that the coefficient is not estimated but still taken into account? Thank you in advance, MP
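The usual device for fixing a coefficient at a known value is offset(): a term inside offset() enters the linear predictor with coefficient 1 and is not estimated, and the same mechanism is accepted in coxph() formulas. Here is a minimal sketch with lm() and made-up variable names, just to show the effect:

```r
# Fix the coefficient of x2 at 1 by moving it into offset(): x2 then
# enters the linear predictor as-is and no coefficient is estimated
# for it.  (Toy data; for the Cox model the same idea would read
# coxph(Surv(start, stop, event) ~ offset(G1) + G2 + ...).)
set.seed(42)
x1 <- rnorm(100)
x2 <- rnorm(100)
y  <- 2 * x1 + 1 * x2 + rnorm(100)

fit <- lm(y ~ x1 + offset(x2))
coef(fit)   # intercept and x1 only; no coefficient for x2
```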
Re: [R] rbinom : Does randomness preclude precision?
I think I see the rub: You would like the distribution of a sample to be identical to the distribution from which it was sampled. But if sampling is random, that can happen only in the long run, not on every sample. That is why samples from a normal density are *not* themselves normal - they're t. When the sample size is large enough, the differences between a random sample's density and its parent density become vanishingly small. Thus the differences you observe in repeated random samples from the binomial: repeated sampling produces slightly different numbers of successes. How could it be otherwise? Charles Annis, P.E. [EMAIL PROTECTED] phone: 561-352-9699 eFax: 614-455-3265 http://www.StatisticalEngineering.com From: Philip Twumasi-Ankrah [mailto:[EMAIL PROTECTED] Sent: Wednesday, May 28, 2008 10:36 AM To: [EMAIL PROTECTED] Subject: RE: [R] rbinom : Does randomness preclude precision? Charles, when you simulate data from a distribution, what you are in effect doing is generating a sequence of values that corresponds to that distribution. So you can generate 1000 values from a normal distribution and expect that when you check the distribution of your sample (with a qqnorm or Q-Q plot), it should be a close fit to the theoretical distribution with the assigned parameter values. It would be difficult to explain why simulated data may differ from the distribution they were generated from; I don't think you can blame it on randomness. I hope you understand what I am trying to determine. Charles Annis, P.E. [EMAIL PROTECTED] wrote: What do you mean by ... *eventual* nature of the distribution? If you simulated 100 samples, would you expect to see 1.5 successes? Or 1? Or 2? How many, in your thinking, is eventual? Charles Annis, P.E.
[EMAIL PROTECTED] phone: 561-352-9699 eFax: 614-455-3265 http://www.StatisticalEngineering.com [remainder of the quoted exchange, identical to the messages above, snipped]
Re: [R] can I do this with R?
Hi Maria, But why do you want to use forwards or backwards methods? These are all 'backward' methods of modeling. Try using AIC or BIC; BIC is much better than AIC. And you do not have to believe me or anyone else on this. Just make a small data set with a few variables with a known relationship amongst them. With this simulated data set, use all your modeling methods - backwards, forwards, AIC, BIC, etc. - and see which one gives you an answer closest to the truth. The beauty of using a simulated dataset is that you 'know' the truth, as you are its 'creator'! smita --- Charilaos Skiadas [EMAIL PROTECTED] wrote: A Google search for logistic regression with stepwise forward in r returns the following post: https://stat.ethz.ch/pipermail/r-help/2003-December/043645.html Haris Skiadas Department of Mathematics and Computer Science Hanover College On May 28, 2008, at 7:01 AM, Maria wrote: Hello, I am just about to install R and was wondering about a few things. I have only worked in Matlab, because I wanted to do a logistic regression. However, Matlab does not do logistic regression with a stepwise-forward method, so I thought about trying R. My question is: can I do logistic regression with stepwise forward selection in R? Thanks /M
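For the record, forward selection by AIC is available in base R via step() with direction = "forward"; a small sketch with simulated logistic-regression data (all variable names invented), which also illustrates smita's point about knowing the truth in a simulated dataset:

```r
# Simulate a logistic model where x1 and x2 matter and x3 is noise,
# then run forward selection from the null model by AIC.
set.seed(1)
n  <- 200
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)   # x3 is pure noise
p  <- plogis(1.5 * x1 - 1 * x2)
y  <- rbinom(n, 1, p)
dat <- data.frame(y, x1, x2, x3)

null_fit <- glm(y ~ 1, family = binomial, data = dat)
fwd <- step(null_fit, scope = ~ x1 + x2 + x3,
            direction = "forward", trace = 0)
formula(fwd)
```

For BIC-based selection, pass k = log(n) to step() instead of the default k = 2.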
[R] odds ratio's and function
Hallo, I tried writing a function to extract all the odds ratios from an ftable (p.adjust still needs to be built in). So I tried the following:

ORCalcul <- function(m) {
  or <- matrix(1:(length(m[,1])-1)*(length(m[1,])-1)*5, length(m[,1])-1, length(m[1,])-1)
  for(i in 1:length(m[,1])-1) {
    for(j in 1:length(m[1,])-1) {
      or[i,j,1] <- attr(m, "row.vars")[[1]][i]
      or[i,j,2] <- attr(m, "col.vars")[[1]][j]
      or[i,j,3] <- attr(m, "row.vars")[[1]][i+1]
      or[i,j,4] <- attr(m, "col.vars")[[1]][j+1]
      or[i,j,5] <- (m[i,j]*m[i+1,j+1])/(m[i+1,j]*m[i,j+1])
    }
  }
  ORCalcul <- c(or)
}

## which doesn't work, because of the way R works with matrices;
## I am used to other syntax, using the indices of a matrix.
## So, first question: I don't think there is a way in R to work with
## matrix m = new matrix(10,5,45)  # m being a three-dimensional matrix, e.g.?
# So I tried, and tried; finally:

ORCalcul <- function(m) {
  t
  for(i in 1:length(m[,1])-1) {
    for(j in 1:length(m[1,])-1) {
      row1 <- attr(m, "row.vars")[[1]][i]
      col1 <- attr(m, "col.vars")[[1]][j]
      row2 <- attr(m, "row.vars")[[1]][i+1]
      col2 <- attr(m, "col.vars")[[1]][j+1]
      or <- (m[i,j]*m[i+1,j+1])/(m[i+1,j]*m[i,j+1])
      ORCalculij <- c(row1, col1, row2, col2, or)
      append(t, ORCalculij)
    }
  }
  ORCalcul <- t
  ORCalcul
}

# which unfortunately only gives the following as output, so it doesn't work either:
     [,1]
[1,]   NA

# The only variant I wrote that gave some output (but, logically, only the last OR) is:

ORCalcul <- function(m) {
  for(i in 1:length(m[,1])-1) {
    for(j in 1:length(m[1,])-1) {
      row1 <- attr(m, "row.vars")[[1]][i]
      col1 <- attr(m, "col.vars")[[1]][j]
      row2 <- attr(m, "row.vars")[[1]][i+1]
      col2 <- attr(m, "col.vars")[[1]][j+1]
      or <- (m[i,j]*m[i+1,j+1])/(m[i+1,j]*m[i,j+1])
      ORCalcul <- c(row1, col1, row2, col2, or)
      # putting an append here just gives me the code back...
    }
  }
  ORCalcul
}

# What should I do? Suggestions?
(For the algorithmic logic I need two more nested loops, but that is not the problem.) mvg, Wim
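For what it's worth, one way to get all the adjacent-cell (local) odds ratios out of a two-way table without append()-ing inside loops is to compute them with vectorised indexing and assemble a data frame in one go. A sketch for a plain matrix/table with dimnames (extending it to read ftable's row.vars/col.vars attributes is mechanical):

```r
# All (i, j) vs (i+1, j+1) local odds ratios of a two-way table,
# returned as one data frame instead of growing a result in a loop.
local_odds_ratios <- function(m) {
  nr <- nrow(m); nc <- ncol(m)
  i <- rep(seq_len(nr - 1), times = nc - 1)
  j <- rep(seq_len(nc - 1), each  = nr - 1)
  or <- (m[cbind(i, j)] * m[cbind(i + 1, j + 1)]) /
        (m[cbind(i + 1, j)] * m[cbind(i, j + 1)])
  data.frame(row1 = rownames(m)[i],     col1 = colnames(m)[j],
             row2 = rownames(m)[i + 1], col2 = colnames(m)[j + 1],
             or = or)
}

m <- matrix(c(10, 20, 30, 40), 2, 2,
            dimnames = list(c("a", "b"), c("x", "y")))
local_odds_ratios(m)   # single OR: (10 * 40) / (20 * 30)
```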
Re: [R] Grouped weighted.mean
Hello - Chip Barnaby wrote: Dear all -- I want to compute weighted.mean() for grouped rows. [data extract snipped; see the original post above] weighted.mean -- no go due to length of Wt: WMN = by( DP0$IAC, DP0$Key, weighted.mean, DP0$Wt) Error in FUN(data[x, ], ...) : 'x' and 'w' must have the same length

Just a little confusion with how 'by' works, I think. Try:

test <- data.frame(key = rep(c("A", "B"), each = 100),
                   iac = rnorm(200, 10),
                   wt  = runif(200))
test
by(test, test$key, function(x) weighted.mean(x$iac, x$wt))

So I tried tapply() -- same result: WMN = tapply( DP0, DP0$Key, function(x) weighted.mean( x$IAC, x$Wt)) Error in tapply(DP0, DP0$Key, function(x) weighted.mean(x$IAC, x$Wt)) : arguments must have same length. How does one pass grouped arguments to processing functions such as weighted.mean()? TIA, Chip Barnaby - Chip Barnaby [EMAIL PROTECTED] Vice President of Research, Wrightsoft Corp., 781-862-8719 x118 voice, 781-861-2058 fax, 131 Hartwell Ave, Lexington, MA 02421, www.wrightsoft.com
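If a plain named vector is more convenient than the object that by() returns, the same per-group computation can be phrased with split() and sapply() (toy data as in the reply above):

```r
# Weighted mean of iac by key, as a named numeric vector:
# split() partitions the data frame's rows by key, and sapply()
# applies weighted.mean() to each piece and simplifies the result.
test <- data.frame(key = rep(c("A", "B"), each = 100),
                   iac = rnorm(200, 10),
                   wt  = runif(200))

wmn <- sapply(split(test, test$key),
              function(x) weighted.mean(x$iac, x$wt))
wmn
```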
[R] Searchreplace string?
Hi there, I would like to know if it is possible to modify a text file with an R function. In fact, I would like to know if a search-and-replace function exists. My problem is to create config files from a base file, in which I have to modify the values of parameters. My base file:

#...
Param1 = V1_1
#...
Param2 = V2_1
Param3 = V3_1
#...

What I would like for each created file i:

#...
Param1 = V1_i
#...
Param2 = V2_i
Param3 = V3_i
#...

For the moment my solution is to read each line of the config file, modify the line if needed, and recopy it into the new file. But my file is about 500 lines and I would like to create a lot of different versions of the config file from my base file, so with this method it takes a long time, and I'm sure I can improve it. (For example, for 85 different files I have to wait 10 min...) My dream would be a function like this one:

Function <- function(Base_file, Param_array, Value_array) {
  file.copy(Base_file, Config_file)
  ...
  for (i in length(Param_array)) {
    SearchReplace(Config_file, "Param_array[i] = something", "Param_array[i] = Value_array[i]")
  }
  ...
}

I hope I have been clear enough. If not, don't hesitate to ask me for more details! Thank you for your help. Have a good day!
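A vectorised readLines()/sub()/writeLines() round-trip avoids the line-by-line copy entirely. Here is a minimal sketch of the dreamed-of function (names as in the question; it assumes simple "ParamX = value" lines, which is a regex assumption, not a guarantee):

```r
# Read a base file once, rewrite each "ParamX = value" line, write out.
make_config <- function(base_file, config_file, params, values) {
  lines <- readLines(base_file)
  for (i in seq_along(params)) {
    # Replace whatever follows "ParamX =" with the new value.
    pattern <- paste0("^(\\s*", params[i], "\\s*=).*$")
    lines <- sub(pattern, paste0("\\1 ", values[i]), lines)
  }
  writeLines(lines, config_file)
}

# Example round-trip with temporary files:
base <- tempfile(); out <- tempfile()
writeLines(c("# comment", "Param1 = V1_1", "Param2 = V2_1"), base)
make_config(base, out, c("Param1", "Param2"), c("V1_7", "V2_7"))
readLines(out)
```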
Re: [R] Computing P-Value
Gundala Viswanath wrote: | Dear Ben, | Given a set of words | ('foo', 'bar', 'bar', 'bar', 'quux', ..., 'foo') - this can run to 10,000 items. | I would like to compute the significance of the word occurrences as p-values. | Is there a simple way to do it? | - GV

Closer, but still not enough information. What is your null hypothesis? Equidistribution? If so, ...

dat <- sample(c("foo", "bar", "quux", "pridznyskie"), replace = TRUE, size = 10000)
tab <- table(dat)
chisq.test(tab)

from ?chisq.test: If 'x' is a matrix with one row or column, or if 'x' is a vector and 'y' is not given, then a _goodness-of-fit test_ is performed ('x' is treated as a one-dimensional contingency table). The entries of 'x' must be non-negative integers. In this case, the hypothesis tested is whether the population probabilities equal those in 'p', or are all equal if 'p' is not given.

Note that this won't test the significance of *individual* deviations from equiprobability, just the overall pattern. If you wanted to test individual words you could use binom.test -- but if you tested more than one word, or tested words on the basis of those that appeared to have extreme frequencies, you'd start running into multiple comparisons / post hoc testing issues. Do you know something about the methods that people usually use in this area? Ben Bolker
Re: [R] rbinom : Does randomness preclude precision?
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Philip Twumasi-Ankrah Sent: Wednesday, May 28, 2008 6:52 AM To: [EMAIL PROTECTED] Cc: r-help@r-project.org Subject: Re: [R] rbinom : Does randomness preclude precision? Ted's reply is a bit comforting and, as indicated in my post, I am resorting to using sample; but as an academic issue, does randomness preclude precision? I would say yes, at least in the manner you seem to be thinking of it. Think about it for a minute. Say you use the following code:

status <- sample(rep(c(0, 1), c(425, 75)))

You will get your 75 ones randomly distributed throughout the vector status. What would you expect the value to be for

sum(status[1:100])

I suggest that you will see the same kind of variation in the sum as you would see for

sum(rbinom(100, 1, p = 0.15))

So, you can randomly arrange a certain percentage of ones in a sequence, but you should not expect to see that exact percentage in any subset of the sequence. One other example: if you flip a fair coin 100 times, do you expect to get exactly 50 heads? Hope this is helpful, Dan. Daniel Nordlund, Bothell, WA, USA. Randomness should be in the sequence of zeros and ones and how they are simulated at each iteration of the process, but not in the eventual nature of the distribution. I mean, if I simulated a Normal(0, 1) and got a Normal(1.5, 2), these would be very different distributions. It is the same with simulating a Binomial(1, p=0.15) and getting a Binomial(1, 0.154). [EMAIL PROTECTED] wrote: On 28-May-08 12:53:26, Philip Twumasi-Ankrah wrote: I am trying to simulate a series of ones and zeros (1 or 0) and I am using rbinom, but I am realizing that the number of successes expected is not accurate. Any advice out there? This is the example:

N <- 500
status <- rbinom(N, 1, prob = 0.15)
count <- sum(status)

15 percent of 500 should be 75, but what I obtain from the count variable is 77, which gives a probability of success of 0.154. Not very good.
[remainder of Ted Harding's reply, identical to his message quoted earlier in the thread, snipped]
[R] Help on Calculating day differences
Hello R Freaks, I calculate the difference in days between two events with the following little R expression:

T1a <- strptime(T1, "%m/%d/%y %H:%M:%S")
T2a <- strptime(T2, "%m/%d/%y %H:%M:%S")
T1b <- as.Date(T1a)
T2b <- as.Date(T2a)
days <- T2b - T1b
time <- T2a - T1a

In the project I would like to count only working days. Is there a possibility to count working days, excluding weekends? Is it maybe also possible to skip the national holiday dates in addition? Thanks a lot for your help, Thorsten. Mit freundlichen Grüßen / Best Regards / С наилучшими пожеланиями / üdvözlettel, Dr. Th. Mühge
[R] superposing barplots having different scales
Hello. I know how to make a bar plot in which a numeric y variable is plotted against some grouping variable X (say, groups A, B, C) when this grouping variable is subdivided into two subgroups; the bars would be: (group A, subgroup 1) beside (group A, subgroup 2), then (group B, subgroup 1) beside (group B, subgroup 2), and so on. This is done using the beside=TRUE argument of the barplot() function. However, I want to make a slightly different type of barplot in which the numerical values of the two subgroups are measured on very different scales. I want to use a different scale to define the numerical y values of each subgroup. This graph would have one scale on the left-hand y-axis and a second scale on the right-hand y-axis. I cannot simply superimpose two bar plots, because I have to make sure that the subgroup bars are beside each other. Bill Shipley, North American Editor, Annals of Botany, Département de biologie, Université de Sherbrooke, Sherbrooke (Québec) J1K 2R1 Canada, (819) 821-8000, poste 62079, (819) 821-8049 FAX, http://pages.usherbrooke.ca/jshipley/recherche/
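One workable trick, since barplot() itself has no secondary-axis support, is to rescale the second subgroup onto the range of the first, draw the interleaved bars once with beside=TRUE, and then label the right-hand axis in the original units. A sketch with invented data (in practice you would also widen the right margin with par(mar=...) so the axis label fits):

```r
# Two subgroups on very different scales: rescale subgroup 2 onto
# subgroup 1's range, plot paired bars, then draw a right-hand axis
# in subgroup 2's original units.
y1 <- c(A = 12, B = 18, C = 9)          # subgroup 1, left axis
y2 <- c(A = 0.31, B = 0.12, C = 0.44)   # subgroup 2, right axis
scale <- max(y1) / max(y2)

h <- rbind(y1, y2 * scale)              # 2 x 3 matrix -> paired bars
barplot(h, beside = TRUE, col = c("grey30", "grey70"),
        ylab = "subgroup 1 scale")
axis(4, at = pretty(c(0, max(y2))) * scale,
     labels = pretty(c(0, max(y2))))
mtext("subgroup 2 scale", side = 4, line = 2)
```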
Re: [R] How to make R running faster
At 10:25 AM 5/28/2008, Esmail Bonakdarian wrote: Erin Hodgess wrote: I remember reading the colSum and colMean were better, when you need sums and means Well .. I'm waiting for the experts to jump in and give us the straight story on this :-) All of the algorithms are represented internally by sequential program logic using C or Fortran, for example. So the issue isn't the algorithm itself. Instead, it's where the algorithm is implemented. However, R is an interpreter, not a compiler. This means that it reads each line of R code one character at a time to develop an understanding of what is desired done, and to check for errors in syntax and data classes. Interpreters are very slow compared to compiled code, where the lines have been pre-interpreted and already converted to machine code with error checking resolved. For example a simple for loop iteration might take only 0.1 microsecond in a compiled program, but 20-100 microseconds in an interpreted program. This overhead of parsing each line can be bounded by function calls inside each line. If the compiled function executes on a large number of cases in one call, then the 50 microsecond overhead per call is diluted out. R is a parallel processing language. If you use vectors and arrays and the built-in (i.e., compiled) function calls, you get maximum use of the compiled programs and minimum use of the interpreted program. This is why functions such as colMeans() or apply() are faster than writing direct loops in R. You can speed things up by 200-1000x for large arrays. Interpreted languages are very convenient to use, as they do instant error checking and are very interactive. No overhead of learning and using compilers and linkers. But they are very slow on complex calculations. This is why the array processing is stuffed into compiled functions. The best of both worlds then. Interpreted languages are Java, R, MatLab, Gauss and others. Compiled languages are C and Fortran. 
Some languages, like variants of BASIC, can be interpreted, line-compiled or compiled, depending upon the implementation. Some compiled languages (such as Fortran) allow parallel processing via multiprocessing on multiple CPUs, which speeds things up even more. Compiled languages also typically optimize code for the target machine, which can speed things up by a factor of 2 or so. So the general rule for R is: if you are annoyed at processing time, alter your program to maximize calculations within compiled functions (i.e., vectorize your program to process an entire array at one time) and minimize the number of lines of R. Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: [EMAIL PROTECTED] Least Cost Formulations, Ltd. URL: http://lcfltd.com/ 824 Timberlake Drive Tel: 757-467-0954 Virginia Beach, VA 23464-3239 Fax: 757-467-2947 Vere scire est per causas scire
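The vectorization advice above can be illustrated with a quick timing comparison (a sketch; absolute timings are machine-dependent, the ratio is the point):

```r
## Compare an interpreted double loop with the compiled colMeans().
m <- matrix(rnorm(1e6), nrow = 1000)

colmeans_loop <- function(m) {
  out <- numeric(ncol(m))
  for (j in seq_len(ncol(m))) {
    s <- 0
    for (i in seq_len(nrow(m))) s <- s + m[i, j]
    out[j] <- s / nrow(m)
  }
  out
}

t.loop     <- system.time(r1 <- colmeans_loop(m))
t.compiled <- system.time(r2 <- colMeans(m))
all.equal(r1, r2)   # same result, very different run time
```

On a typical machine the compiled colMeans() call is orders of magnitude faster than the explicit loop, while all.equal() confirms both compute the same column means.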
Re: [R] Help on Calculating day differences
See ?is.holiday in chron. You need to supply a .Holidays vector and then do this:

library(chron)
d1 <- Sys.Date()
d2 <- Sys.Date() + 100
sq <- seq(d1, d2, by = "day")
sum(!is.holiday(sq) & !is.weekend(sq))  # endpoints included

The fCalendar package also has functionality in this area. On Wed, May 28, 2008 at 9:30 AM, Thorsten Muehge [EMAIL PROTECTED] wrote: Hello R Freaks, I calculate the difference in days between two events with the following little R expression:

T1a <- strptime(T1, "%m/%d/%y %H:%M:%S")
T2a <- strptime(T2, "%m/%d/%y %H:%M:%S")
T1b <- as.Date(T1a)
T2b <- as.Date(T2a)
days <- T2b - T1b
time <- T2a - T1a

In the project I would like to count only working days. Is there a possibility to count only working days, without weekends? Is it maybe also possible to skip, in addition, the national holiday dates? Thanks a lot for your help Thorsten Mit freundlichen Grüßen / Best Regards / С наилучшими пожеланиями / üdvözlettel Dr. Th. Mühge,
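For concreteness, here is a minimal sketch of the suggestion above, assuming chron's is.holiday() consults a user-supplied .Holidays vector; the holiday dates below are illustrative, not a real calendar:

```r
## Count weekdays that are not holidays (sketch; .Holidays is made up).
library(chron)
.Holidays <- dates(c("01/01/08", "07/04/08", "12/25/08"))

d1 <- as.Date("2008-06-01")
d2 <- as.Date("2008-06-30")
sq <- seq(d1, d2, by = "day")
sum(!is.weekend(sq) & !is.holiday(sq))  # working days in the range
```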
Re: [R] Evidence Theory in R
Lars Fischer lars at sec.informatik.tu-darmstadt.de writes: Hello, well, I searched the list archive, CRAN and the references, but found nothing. Thus: does anybody around here know anything about Dempster-Shafer Theory, Evidence Theory or Hints in R? Has anybody stumbled across a package that I overlooked, or implemented something in this area? I would really like not to implement a hint model a second time. Dempster-Shafer Theory was (almost) hyped 15 or more years ago, when it was one of several reasoning systems. It has been heavily discussed and criticized, and was considered by many to be statistically unsound. This might be the reason why it has not been realized in an R package. As a scheme for reasoning under uncertainty, Dempster-Shafer Theory has been replaced by Bayesian networks and their learning capabilities in many modern knowledge-based systems. Fortunately, there are several packages in R that implement Bayesian statistics and networks, including learning facilities. You will find references in the 'Bayesian' and 'Machine Learning' task views. Regards, Hans Werner My apologies if I missed something obvious, but I would be glad if someone could give me a hint. Regards, Lars
Re: [R] lm() output with quantitative predictors not the same as SAS
alicia.senauer at yale.edu writes: I am trying to use R lm() with quantitative and qualitative predictors, but am getting different results than those that I get in SAS. In the R ANOVA table documentation I see that Type-II tests correspond to the tests produced by SAS for analysis-of-variance models, where all of the predictors are factors, but not more generally (i.e., when there are quantitative predictors). Is this the problem? Is there a way around this so that the output matches the results that I am getting in SAS? Is there a better/more appropriate way to handle quantitative predictors? First of all, it looks like you're using Anova(), in the car package -- this is an important piece of information. Second, you're presumably (you didn't tell us exactly what commands you ran -- this is also important) using the default type=II in Anova(), vs. SAS type III sums of squares (as it says in your SAS output), so I wouldn't expect these to agree. Third, it's highly questionable whether type-III SS are appropriate here, in the presence of an interaction (Treat*Plant) -- you have violated marginality restrictions -- see http://www.stats.ox.ac.uk/pub/MASS3/Exegeses.pdf for example. While I understand the desire to match the output from SAS, a better set of questions is: why do these two disagree? What are they doing differently? Which one comes closer to answering the questions I am really interested in? Ben Bolker
Re: [R] superposing barplots having different scales
Bill Shipley bill.shipley at usherbrooke.ca writes: Hello. I know how to make a bar plot in which a numeric y variable is plotted against some grouping variable X (say, groups A, B, C) when this grouping variable is subdivided into each of two subgroups; so the bars would be: (group A subgroup 1) beside (group A subgroup 2), then (group B subgroup 1) beside (group B subgroup 2), and so on. This is done using the beside=TRUE argument in the barplot() function. However, I want to make a slightly different type of barplot in which the numerical values of the subgroups (subgroups 1 and 2) are measured in very different scales. I want to use different scales to define the numerical y values for each subgroup. This graph would have one scale on the left-hand y-axis and a second scale on the right-hand y-axis. I cannot simply superimpose two bar plots because I have to make sure that the subgroup bars are beside each other. Bill Shipley Bill, I think that all the arguments for why NOT to create scatterplots with two different y-axes apply here (see recent R-help threads and the wiki) -- consider a scatterplot, or some other way of presenting the data, instead? Nevertheless -- the easiest way to do this (I think) is to scale subgroup 2, then plot the right-hand axis with labels appropriate for the unscaled data, as follows:

dat <- matrix(c(1, 2, 3, 1000, 2000, 3000), ncol = 2,
              dimnames = list(c("A", "B", "C"), c("meas1", "meas2")))
## scale groups to have the same max. value
scale <- max(dat[, "meas2"] / dat[, "meas1"])
## apply scale to second subgroup
dat2 <- dat
dat2[, "meas2"] <- dat2[, "meas2"] / scale
barplot(t(dat2), beside = TRUE)
ticks <- pretty(dat[, "meas2"])  ## on original data scale
axis(side = 4, at = ticks / scale, labels = ticks)

I haven't bothered with any prettiness like setting aside margins for the right-hand axis, etc etc
Re: [R] plot rownames
I did have the problem of not having two continuous variables and this approach circumvents this, allowing me in fact to plot the rownames. Prof Brian Ripley wrote: On Tue, 27 May 2008, T.D.Rudolph wrote: In the following example:

x <- rnorm(1:100)
y <- seq(from = -2.5, to = 3.5, by = 0.5)
z <- as.matrix(table(cut(x, c(-Inf, y, +Inf))))
## I wish to transform the values in z
j <- log(z)
## Yet retain the row names
row.names(j) <- row.names(z)

Hmm. The rownames were retained, and row.names() is for data frames, rownames() for matrices. Now, how can I go about creating a scatterplot with row.names(j) on the x-axis and j(m:nrow(j)) on the y-axis? Keep in mind the transformation I am conducting is more complicated and therefore cannot be plotted directly using log(z), which places me in this unique position. You will need to explain what exactly you want plotted. A scatterplot is of two continuous variables and you only have one (and 'j(m:nrow(j))' is invalid R). But here is one possibility to get you started:

m <- nrow(j)
plot(j, xaxt = "n", xlab = "")
axis(1, 1:m, rownames(j), las = 2)

Tyler -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
[R] manipulating multiply imputed data sets
Hi folks, I have five imputed data sets and would like to apply the same recoding routines to each. I could do this sort of thing pretty easily in Stata using MIM, but I've decided to go cold turkey on other stats packages as an incentive for learning more about R. Most of the recoding is for nominal variables, like race, religion, urbanicity, and the like. So, for example, to recode race for my first dataset, miset1, I would do the following:

miset1$white <- recode(miset1$RACE, '1=1; else=0; ')
miset1$black <- recode(miset1$RACE, '2=1; else=0; ')
miset1$asian <- recode(miset1$RACE, '3=1; else=0; ')
miset1$hispanic <- recode(miset1$RACE, '4=1; else=0; ')
miset1$raceother <- recode(miset1$RACE, '5=1; else=0; ')

I've tried a number of variations, e.g., on the following, using recode (from the car package) with imputationList (from the mitools package), though without success:

files.allmisets <- list.files(getwd(), pattern = "miset*.csv$", full = TRUE)
allmis <- imputationList(lapply(files.allmisets, read.csv))
allmis <- update(allmis, white <- recode(RACE, '1=1; else=0; '))

I've also tried some basic loops. I guess I'm also a bit confused as to when R references the original object and when it creates a new one. I suppose I could do this in Python and then use PyR, but I'd really like to learn a bit more about R syntax. Any help on this specific problem or general advice on manipulating data in multiply imputed datasets in R would be much appreciated. -- Donald Braman http://www.law.gwu.edu/Faculty/profile.aspx?id=10123 http://research.yale.edu/culturalcognition http://ssrn.com/author=286206
[R] Suitable package for carrying out sigma and beta convergence in panel data
Dear all, I wish to carry out sigma- and beta-convergence analysis in respect of panel data [wherein the current value of one of the variables needs to be regressed upon suitably transformed lagged values of another variable(s)]. I am quite new to the R language and am not very much aware of the availability of suitable package(s)/code in the language. Can anyone help me by letting me know of an appropriate package/procedural steps to undertake the analysis? Kindly let me know as well the format of the input data file for such a package. Regards, Amarjit Singh Sethi
[R] R reference Books
Hi, I am still fairly new to R but picking it up quickly. I have some problems manipulating data in tables. I was wondering if anyone knew of any good resources, such as an R manual. Some of the intro pdfs I viewed do not show how...much appreciated.
Re: [R] manipulating multiply imputed data sets
Dear Donald, I can't guarantee that there aren't other problems, but your call to update() is in error; you need

allmis <- update(allmis, white = recode(RACE, '1=1; else=0; '))

not

allmis <- update(allmis, white <- recode(RACE, '1=1; else=0; '))

[The last ; in the recode specification is unnecessary, but should do no harm; as well, for a simple recode like this you might prefer ifelse() to recode().] Here's an example:

- snip -

library(mitools)
library(car)
data(smi)
smi <- update(smi, sex = recode(sex, "0='M'; 1='F'"))
with(smi, table(sex, drkfre))

[[1]]
     drkfre
sex   Non drinker  not in last wk  <3 days last wk  >=3 days last wk
  F   207          194             134              35
  M   282          201             105              12

[[2]]
     drkfre
sex   Non drinker  not in last wk  <3 days last wk  >=3 days last wk
  F   200          200             132              38
  M   282          195             109              14

. . .
attr(,"call")
with.imputationList(smi, table(sex, drkfre))

--- snip ---

I hope this helps, John -- John Fox, Professor Department of Sociology McMaster University Hamilton, Ontario, Canada web: socserv.mcmaster.ca/jfox -----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Donald Braman Sent: May-28-08 2:21 PM To: r-help@r-project.org Subject: [R] manipulating multiply imputed data sets Hi folks, I have five imputed data sets and would like to apply the same recoding routines to each. I could do this sort of thing pretty easily in Stata using MIM, but I've decided to go cold turkey on other stats packages as an incentive for learning more about R. Most of the recoding is for nominal variables, like race, religion, urbanicity, and the like. So, for example, to recode race for my first dataset, miset1, I would do the following:

miset1$white <- recode(miset1$RACE, '1=1; else=0; ')
miset1$black <- recode(miset1$RACE, '2=1; else=0; ')
miset1$asian <- recode(miset1$RACE, '3=1; else=0; ')
miset1$hispanic <- recode(miset1$RACE, '4=1; else=0; ')
miset1$raceother <- recode(miset1$RACE, '5=1; else=0; ')

I've tried a number of variations, e.g., on the following, using recode (from the car package) with imputationList (from the mitools package), though without success:

files.allmisets <- list.files(getwd(), pattern = "miset*.csv$", full = TRUE)
allmis <- imputationList(lapply(files.allmisets, read.csv))
allmis <- update(allmis, white <- recode(RACE, '1=1; else=0; '))

I've also tried some basic loops. I guess I'm also a bit confused as to when R references the original object and when it creates a new one. I suppose I could do this in Python and then use PyR, but I'd really like to learn a bit more about R syntax. Any help on this specific problem or general advice on manipulating data in multiply imputed datasets in R would be much appreciated. -- Donald Braman http://www.law.gwu.edu/Faculty/profile.aspx?id=10123 http://research.yale.edu/culturalcognition http://ssrn.com/author=286206
[R] Gantt chart like graphics
Dear R Community, I have a dataframe like this:

dat        product1  product2  ...  productn
01.1.2008  1         1              1
02.1.2008  1         1              2
.
15.2.2008  2         2              NA
.
04.4.2008  2         2              1
05.4.2008  NA        2              NA

(date in ascending order; 1:n products with status 1, 2 or NA) and want to produce a graphic like this, where 1 or 2 are colored rectangles and NAs are white:

product1  11..22_
product2  11..222
.
.
productn  12.._1_
          time_axis (=dat)

- which looks like a Gantt chart with multiple time slots. I have tried it with image(), which looks good, but I don't know how to produce a y-axis with labels product1, etc. and an x-axis with date labels. Further, I want to draw a scatterplot below (using layout()) with the same x-axis. Maybe someone has a better idea which is easier to label? Regards Klemens
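One possible answer is to suppress image()'s default axes and add labelled ones with axis(). A sketch with made-up data (the product names, dates and colors here are illustrative):

```r
## Status matrix: rows = days, columns = products; NA cells stay white.
status <- matrix(sample(c(1, 2, NA), 300, replace = TRUE),
                 nrow = 100, ncol = 3,
                 dimnames = list(NULL, paste("product", 1:3, sep = "")))
dat <- seq(as.Date("2008-01-01"), by = "day", length.out = 100)

image(x = 1:nrow(status), y = 1:ncol(status), z = status,
      col = c("skyblue", "tomato"),   # status 1 and 2
      axes = FALSE, xlab = "", ylab = "")
## y-axis: product names; x-axis: a few pretty date labels
axis(2, at = 1:ncol(status), labels = colnames(status), las = 1)
ix <- pretty(seq_along(dat))
ix <- ix[ix >= 1 & ix <= length(dat)]
axis(1, at = ix, labels = format(dat[ix], "%d.%m.%Y"))
box()
```

Because the x positions are plain day indices, a scatterplot drawn below via layout() can reuse the same at/labels pair for a shared x-axis.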
[R] multistate survival analysis w/ time varying covariates
Hi, I've seen in the NestCohort package that one can do a hazard model with a binary outcome using covariates. I am interested in multistate hazard models with time-varying covariates, but can't seem to find this already implemented in an R package. Is this included somewhere but called something else? I feel like I've looked all over. Thanks! Michaela Gulemetova-Swan
Re: [R] R reference Books
There is a very good book called Data Manipulation with R by Phil Spector that just became available. It is brief and concise. I would recommend that book for learning about manipulating data in tables. For anyone interested in data exploration and graphics I would also recommend Deepayan Sarkar's book Lattice: Multivariate Data Visualization with R. Amazon.com is selling them as a pair for $88.80 (US) (which I guess is about 3.72 euros these days). On Wed, May 28, 2008 at 3:25 PM, Neil Gupta [EMAIL PROTECTED] wrote: Hi, I am still fairly new to R but picking it up quickly. I have some problems manipulating data in tables. I was wondering if anyone knew of any good resources, such as an R manual. Some of the intro pdfs I viewed do not show how...much appreciated.
[R] library(Matrix) and image() colors?
Hi, I'm trying to produce a plot of an image of a Matrix, but I don't get other colors than the default grey scale: image(Matrix(topo.matrix.2),col.regions=topo.colors(100),colorkey=FALSE) this still is plotted in grey. Is there any mistake in my syntax? Thanks and regards, Javier
[R] Can plot() be used for multiple plots?
Greetings helpRs -- I would like to use plot() to plot two cumulative distribution curves so that a user of the plot can compare the distributions of the two variables. The following code draws two distributions separately, but I cannot find the instruction necessary to add a second cumulative distribution to the first one. Any suggestion would be very welcome.

x1 <- sort(rnorm(1000, 50, 10))
x2 <- sort(rnorm(1000, 40, 8))
plot(x1, 1:length(x1)/length(x1), type = "l")
plot(x2, 1:length(x2)/length(x2), type = "l")
grid(col = "black")

Ben Fairbank
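One way to do this (a sketch): draw the first curve with plot(), then add the second with lines(), which draws into the existing coordinate system; widening xlim keeps both curves fully visible.

```r
## Overlay two empirical cumulative distribution curves.
x1 <- sort(rnorm(1000, 50, 10))
x2 <- sort(rnorm(1000, 40, 8))
plot(x1, 1:length(x1)/length(x1), type = "l",
     xlim = range(x1, x2), xlab = "x", ylab = "F(x)")
lines(x2, 1:length(x2)/length(x2), lty = 2)  # second curve, dashed
grid(col = "black")
```

An alternative is stats::ecdf, e.g. plot(ecdf(x1)) followed by lines(ecdf(x2)), which handles the sorting and step-function details for you.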
Re: [R] library(Matrix) and image() colors?
On 5/28/08, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: Hi, I'm trying to produce a plot of an image of a Matrix, but I don't get other colors than the default grey scale: image(Matrix(topo.matrix.2),col.regions=topo.colors(100),colorkey=FALSE) this still is plotted in grey. Is there any mistake in my syntax? Not sure what the problem is, but although the first call does not work, the second one does:

image(Matrix(volcano), col.regions = topo.colors(100), colorkey = TRUE)
image(as(Matrix(volcano), "sparseMatrix"), col.regions = topo.colors(100), colorkey = TRUE)

-Deepayan
Re: [R] can I do this with R?
step or stepAIC functions do the job. You can opt to use BIC by changing the multiplication of the penalty. I think AIC and BIC are not only limited to comparing two pre-defined models; they can be used as model search criteria. You could enumerate the information criteria for all possible models if the size of the full model is relatively small, but this does not generally scale to practical high-dimensional applications. Hence, it is often only possible to find a 'best' model at a local optimum, e.g. as measured by AIC/BIC. On the other hand, I wouldn't speak of the over-penalization of BIC. Instead, I think AIC usually under-penalizes larger models, in terms of the positive probability of incorporating irrelevant variables in linear models. X Frank E Harrell Jr wrote: Smita Pakhale wrote: Hi Maria, But why do you want to use forwards or backwards methods? These all are 'backward' methods of modeling. Try using AIC or BIC. BIC is much better than AIC. And, you do not have to believe me or any one else on this. How does that help? BIC gives too much penalization in certain contexts; both AIC and BIC were designed to compare two pre-specified models. They were not designed to fix problems of stepwise variable selection. Frank Just make a small data set with a few variables with known relationships amongst them. With this simulated data set, use all your modeling methods: backwards, forwards, AIC, BIC etc. and then see which one gives you an answer closest to the truth. The beauty of using a simulated dataset is that you 'know' the truth, as you are the 'creator' of it! smita --- Charilaos Skiadas [EMAIL PROTECTED] wrote: A google search for logistic regression with stepwise forward in r returns the following post: https://stat.ethz.ch/pipermail/r-help/2003-December/043645.html Haris Skiadas Department of Mathematics and Computer Science Hanover College On May 28, 2008, at 7:01 AM, Maria wrote: Hello, I am just about to install R and was wondering about a few things. I have only worked in Matlab because I wanted to do a logistic regression. However Matlab does not do logistic regression with a stepwise forward method. Therefore I thought about testing R. So my question is: can I do logistic regression with stepwise forward in R? Thanks /M
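For Maria's original question, the step/stepAIC route mentioned above looks roughly like this (a sketch with simulated data; the variable names are made up):

```r
## Forward stepwise logistic regression with stepAIC() from MASS.
library(MASS)
set.seed(1)
d <- data.frame(y  = rbinom(200, 1, 0.5),
                x1 = rnorm(200), x2 = rnorm(200), x3 = rnorm(200))

null <- glm(y ~ 1, family = binomial, data = d)           # start empty
full <- glm(y ~ x1 + x2 + x3, family = binomial, data = d)

## direction = "forward": add terms from the null model upward
fwd <- stepAIC(null, scope = list(lower = ~ 1, upper = formula(full)),
               direction = "forward")

## BIC instead of AIC: change the penalty multiplier to k = log(n)
fwd.bic <- stepAIC(null, scope = list(lower = ~ 1, upper = formula(full)),
                   direction = "forward", k = log(nrow(d)))
```

The k argument is how "changing the multiplication of the penalty" is done in practice: k = 2 gives AIC (the default), k = log(n) gives BIC.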
[R] calling C function from R
Hi, I am reading the source code of rpart. I have problems understanding the following code and would appreciate any help. In rpart.s, there is a line:

rpfit <- .C(C_s_to_rp,
            n = as.integer(nobs),
            nvarx = as.integer(nvar),
            ncat = as.integer(cats * !isord),
            method = as.integer(method.int),
            as.double(unlist(controls)),
            parms = as.double(unlist(init$parms)),
            as.integer(xval),
            as.integer(xgroups),
            as.double(t(init$y)),
            as.double(X),
            as.integer(!is.finite(X)), # R lets Infs through
            error = character(1),
            wt = as.double(wt),
            as.integer(init$numy),
            as.double(cost),
            NAOK = TRUE)

Is this line calling the rpart.c function? If so, what was sent to rpfit? I am confused because rpart.c only returns an integer, not an object. The following is the rpart.c function:

int rpart(int n, int nvarx, Sint *ncat, int method, int maxpri,
          double *parms, double *ymat, FLOAT *xmat, Sint *missmat,
          struct cptable *cptable, struct node **tree, char **error,
          int *which, int xvals, Sint *x_grp, double *wt, double *opt,
          int ny, double *cost)

-- Thanks!
Re: [R] can I do this with R?
Xiaohui Chen wrote: step or stepAIC functions do the job. You can opt to use BIC by changing the multiplication of the penalty. I think AIC and BIC are not only limited to comparing two pre-defined models; they can be used as model search criteria. You could enumerate the information criteria for all possible models if the size of the full model is relatively small, but this does not generally scale to practical high-dimensional applications. Hence, it is often only possible to find a 'best' model at a local optimum, e.g. as measured by AIC/BIC. Sure you can use them that way, and they may perform better than other measures, but the resulting model will be highly biased (regression coefficients biased away from zero). AIC and BIC were not designed to be used in this fashion originally. Optimizing AIC or BIC will not produce well-calibrated models as does penalizing a large model. On the other hand, I wouldn't speak of the over-penalization of BIC. Instead, I think AIC usually under-penalizes larger models, in terms of the positive probability of incorporating irrelevant variables in linear models. If you put some constraints on the process (e.g., if using AIC to find the optimum penalty in penalized maximum likelihood estimation), AIC works very well and BIC results in far too much shrinkage (underfitting). If using a dangerous process such as stepwise variable selection, the more conservative BIC may be better in some sense, worse in others. The main problem with stepwise variable selection is the use of significance levels for entry below 1.0 and especially below 0.1. Frank X Frank E Harrell Jr wrote: Smita Pakhale wrote: Hi Maria, But why do you want to use forwards or backwards methods? These all are 'backward' methods of modeling. Try using AIC or BIC. BIC is much better than AIC. And, you do not have to believe me or any one else on this. How does that help? BIC gives too much penalization in certain contexts; both AIC and BIC were designed to compare two pre-specified models. They were not designed to fix problems of stepwise variable selection. Frank Just make a small data set with a few variables with known relationships amongst them. With this simulated data set, use all your modeling methods: backwards, forwards, AIC, BIC etc. and then see which one gives you an answer closest to the truth. The beauty of using a simulated dataset is that you 'know' the truth, as you are the 'creator' of it! smita --- Charilaos Skiadas [EMAIL PROTECTED] wrote: A google search for logistic regression with stepwise forward in r returns the following post: https://stat.ethz.ch/pipermail/r-help/2003-December/043645.html Haris Skiadas Department of Mathematics and Computer Science Hanover College On May 28, 2008, at 7:01 AM, Maria wrote: Hello, I am just about to install R and was wondering about a few things. I have only worked in Matlab because I wanted to do a logistic regression. However Matlab does not do logistic regression with a stepwise forward method. Therefore I thought about testing R. So my question is: can I do logistic regression with stepwise forward in R? Thanks /M -- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University
Re: [R] can I do this with R?
Frank E Harrell Jr 写道: Xiaohui Chen wrote: step or stepAIC functions do the job. You can opt to use BIC by changing the mulplication of penalty. I think AIC and BIC are not only limited to compare two pre-defined models, they can be used as model search criteria. You could enumerate the information criteria for all possible models if the size of full model is relatively small. But this is not generally scaled to practical high-dimensional applications. Hence, it is often only possible to find a 'best' model of a local optimum, e.g. measured by AIC/BIC. Sure you can use them that way, and they may perform better than other measures, but the resulting model will be highly biased (regression coefficients biased away from zero). AIC and BIC were not designed to be used in this fashion originally. Optimizing AIC or BIC will not produce well-calibrated models as does penalizing a large model. Sure, I agree with this point. AIC is used to correct the bias from the estimations which minimize the KL distance of true model, provided the assumed model family contains the true model. BIC is designed for approximating the model marginal likelihood. Those are all post-selection estimating methods. For simutaneous variable selection and estimation, there are better penalizations like L1 penalty, which is much better than AIC/BIC in terms of consistency. On the other way around, I wouldn't like to say the over-penalization of BIC. Instead, I think AIC is usually underpenalizing larger models in terms of the positive probability of incoperating irrevalent variables in linear models. If you put some constraints on the process (e.g., if using AIC to find the optimum penalty in penalized maximum likelihood estimation), AIC works very well and BIC results if far too much shrinkage (underfitting). If using a dangerous process such as stepwise variable selection, the more conservative BIC may be better in some sense, worse in others. 
The main problem with stepwise variable selection is the use of significance levels for entry below 1.0, and especially below 0.1. What's the point of using AIC on penalized MLE? I think generally you can view the penalty as a prior regularization and use an optimization algorithm to find the MAP estimate. Frank X

Frank E Harrell Jr wrote: Smita Pakhale wrote: Hi Maria, But why do you want to use forward or backward methods? These are all 'backward' methods of modeling. Try using AIC or BIC. BIC is much better than AIC. And you do not have to believe me or anyone else on this.

How does that help? BIC gives too much penalization in certain contexts; both AIC and BIC were designed to compare two pre-specified models. They were not designed to fix the problems of stepwise variable selection. Frank

Just make a small data set with a few variables with a known relationship amongst them. With this simulated data set, use all your modeling methods: backwards, forwards, AIC, BIC, etc., and then see which one gives you an answer closest to the truth. The beauty of using a simulated dataset is that you 'know' the truth, as you are its 'creator'! smita

--- Charilaos Skiadas [EMAIL PROTECTED] wrote: A Google search for logistic regression with stepwise forward in R returns the following post: https://stat.ethz.ch/pipermail/r-help/2003-December/043645.html Haris Skiadas Department of Mathematics and Computer Science Hanover College

On May 28, 2008, at 7:01 AM, Maria wrote: Hello, I am just about to install R and was wondering about a few things. I have only worked in Matlab because I wanted to do a logistic regression. However, Matlab does not do logistic regression with the stepwise forward method. Therefore I thought about testing R. So my question is: can I do logistic regression with stepwise forward selection in R?
Thanks /M __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
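For readers wanting to try the penalty-multiplier idea mentioned above: base R's step() exposes it through the k argument (k = 2 is the AIC default; k = log(n) gives a BIC-like penalty). A minimal sketch on a built-in dataset (not from the thread):

```r
# step() selects by AIC when k = 2 (default); k = log(n) gives a BIC-like penalty.
fit  <- lm(mpg ~ ., data = mtcars)                   # full model
aicf <- step(fit, trace = 0)                         # backward selection by AIC
bicf <- step(fit, trace = 0, k = log(nrow(mtcars)))  # heavier, BIC-like penalty
```

The heavier penalty tends to retain fewer terms; as the discussion above notes, neither variant escapes the bias problems of stepwise selection.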
Re: [R] How to make R running faster
[EMAIL PROTECTED] wrote: Hi everyone, I run R loops on Windows XP and Vista. Both are Intel Core 2 Duo 2.2 GHz with 2 GB RAM, and XP is significantly faster than Vista. Does anyone know how to speed up R loops in Vista? Thank you in advance. Chunhao Tu __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

This is a matter of the operating system ... Mr Gates or Mr Ballmer may be able to help. No matter what you do in R, the code will probably be faster in XP ...

-- Dr Richard Rowe Zoology Tropical Ecology School of Marine Tropical Biology James Cook University Townsville 4811 AUSTRALIA ph +61 7 47 81 4851 fax +61 7 47 25 1570 JCU has CRICOS Provider Code 00117J __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
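Whatever the operating system, the usual first remedy for slow R loops is to stop growing objects inside the loop; a general-advice sketch (not from the thread):

```r
# General loop-speed advice: preallocate (or vectorize) instead of growing.
n <- 1e4
# slow pattern: res_grow is copied and extended on every iteration
res_grow <- numeric(0)
for (i in 1:n) res_grow <- c(res_grow, i^2)
# faster: allocate the result once, then fill it in place
res_pre <- numeric(n)
for (i in 1:n) res_pre[i] <- i^2
# fastest here: no loop at all
res_vec <- (1:n)^2
```

All three produce the same result; the difference is purely in running time and memory churn.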
Re: [R] can I do this with R?
On Wed, May 28, 2008 at 03:47:49PM -0700, Xiaohui Chen wrote: Frank E Harrell Jr wrote: Xiaohui Chen wrote: step or stepAIC functions do the job. You can opt to use BIC by changing the multiplier of the penalty. I think AIC and BIC are not limited to comparing two pre-defined models; they can be used as model search criteria. You could enumerate the information criterion for all possible models if the size of the full model is relatively small, but this does not generally scale to practical high-dimensional applications. Hence it is often only possible to find a 'best' model at a local optimum, e.g. as measured by AIC/BIC.

Sure you can use them that way, and they may perform better than other measures, but the resulting model will be highly biased (regression coefficients biased away from zero). AIC and BIC were not originally designed to be used in this fashion. Optimizing AIC or BIC will not produce well-calibrated models the way penalizing a large model does.

Sure, I agree with this point. AIC is used to correct the bias of estimates that minimize the KL distance to the true model, provided the assumed model family contains the true model. BIC is designed to approximate the model's marginal likelihood. Those are all post-selection estimation methods. For simultaneous variable selection and estimation there are better penalties, such as the L1 penalty, which is much better than AIC/BIC in terms of consistency.

Xiaohui, Tibshirani (1996) suggests that the quality of the L1 penalty depends on the structure of the dataset. As I recall, subset selection was preferred for finding a small number of large effects, lasso (L1) for finding a small to moderate number of moderate-sized effects, and ridge (L2) for many small effects. Can you provide any references to more up-to-date simulations that you would recommend?
Cheers, Andrew -- Andrew Robinson Department of Mathematics and Statistics Tel: +61-3-8344-6410 University of Melbourne, VIC 3010 Australia Fax: +61-3-8344-4599 http://www.ms.unimelb.edu.au/~andrewpr http://blogs.mbs.edu/fishing-in-the-bay/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Gantt chart like graphics
Try gantt.chart in the plotrix package.

On Wed, May 28, 2008 at 1:54 PM, kljosc [EMAIL PROTECTED] wrote: Dear R Community, I have a dataframe like this:

dat        product1  product2  ...  productn
01.1.2008  1         1              1
02.1.2008  1         1              2
.
15.2.2008  2         2              NA
.
04.4.2008  2         2              1
05.4.2008  NA        2              NA

(date in ascending order; 1:n products with status 1, 2 or NA) and want to produce a graphic like this, where 1 or 2 are colored rectangles and NAs are white:

product1 11..22_
product2 11..222
.
productn 12.._1_
time_axis (=dat)

- which looks like a Gantt chart with multiple time slots. I have tried it with image(), which looks good, but I don't know how to produce a y-axis with labels product1, etc. and an x-axis with date labels. Further, I want to draw a scatterplot below (using layout()) with the same x-axis. Maybe someone has a better idea which is easier to label? Regards Klemens -- View this message in context: http://www.nabble.com/Gantt-chart-like-graphics-tp17518407p17518407.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
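A minimal base-graphics sketch of the image() approach the poster describes, with axis() supplying the product and date labels; the data here are made up:

```r
# Made-up status matrix: rows = dates, columns = products, values 1/2/NA.
set.seed(42)
dates  <- seq(as.Date("2008-01-01"), by = "day", length.out = 10)
status <- matrix(sample(c(1, 2, NA), 30, replace = TRUE), nrow = 10,
                 dimnames = list(format(dates, "%d.%m.%Y"),
                                 paste0("product", 1:3)))
# image() puts rows along x, so dates run along the x-axis, products up the
# y-axis; NA cells are simply left unpainted (white), as the poster wants.
image(x = 1:nrow(status), y = 1:ncol(status), z = status,
      col = c("skyblue", "tomato"), axes = FALSE, xlab = "", ylab = "")
axis(1, at = 1:nrow(status), labels = rownames(status), las = 2)
axis(2, at = 1:ncol(status), labels = colnames(status), las = 1)
box()
```

The same axis(1, ...) call, reused in a second panel under layout(), keeps the scatterplot's x-axis aligned with the chart's.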
Re: [R] Writing intermediate analysis to disk
Hi Stephen, Have you looked at 'save' and 'load'? As far as I understand, to really release the memory you need to close R, so you may try to write a script (shell script on Unix, batch file on Windows) which invokes Rcmd to load the data, do one iteration, and save the result, so that R quits between subsequent calls to Rcmd. I have never used this, but I believe it shouldn't be too difficult.

--- stephen sefick [EMAIL PROTECTED] wrote: Is there a way to write an analysis to disk and then reconstruct the whole thing back into an object? wavCWT() # wmtsa package. I am running out of memory on my computer and I was wondering if there was a way to iterate through this process (as it is an iterative process anyway - it just stores the whole thing in memory). Or is there a way to set the scale that I want to look at, so that wavCWT can use something other than the default? In the documentation it says that the timestep or anything larger can be used. My time step is 1/15, but I cannot use anything larger, like 96 (which is one day of fifteen-minute readings). thanks Stephen -- Let's not spend our time and resources thinking about things that are so little or so large that all they really do for us is puff us up and make us feel like gods. We are mammals, and have not exhausted the annoying little problems of being mammals. -K. Mullis [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
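A small sketch of the save()/load() round trip suggested above; the object name x and the tempfile path are illustrative only:

```r
# save() writes an R object to disk; load() restores it under its old name.
x <- cumsum(rnorm(1000))           # stand-in for a large intermediate result
f <- tempfile(fileext = ".RData")
save(x, file = f)                  # write the object to disk
saved <- x                         # keep a copy here only to verify below
rm(x); gc()                        # drop it from the workspace, reclaim memory
load(f)                            # 'x' is back, identical to what was saved
```

In the batch-script scheme described above, the save() would end one Rcmd invocation and the load() would begin the next, so each R process starts with a clean heap.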
Re: [R] confidence interval for the logit - predict.glm
Dear Brian and list members,

Thanks very much for your response Brian! I applied the adjusted calculation that you advised me to use [1/(1+exp(-upperlogit))] and as a result I don't get any more NA values in my upper confidence interval values. Yet, some outcomes are very awkward: for very small values such as 1.98E-10 the lower value very reasonably turns out to be 0, yet the upper value is simply set to 1, which kind of makes the confidence interval redundant and doesn't give any proper idea of how good a predicted value this 1.98E-10 and other similar ones are.

In the following I try to retrace exactly what I am doing. I suppose the principle of reproducible code will be difficult to accomplish, since I can't add my input files here, but I try my best and hope it kind of fits the criteria. My input txt files are:

testfishes.txt (column names are the fish species, row names are the planning units where fish surveys were conducted, so the matrix is binary presence/absence data, 0s and 1s)
predy.txt (column names are the predictor variables, numerical, one is categorical; row names are again the planning units where fish surveys were conducted)
predallx.txt (column names are the predictor variables, row names now are ALL the planning units, also those where no fish survey was conducted)

# I do a glm analysis for my fish models, loading the files first:
predallx <- read.table("predallx.txt", header=TRUE, row.names=1, sep="\t")
predallx$exposure <- factor(predallx$exposure)
predallx$exposure <- ordered(predallx$exposure, levels=c("Sheltered","Medium","Medexposed","Exposed"))
attach(predallx)
predy <- read.table("predy.txt", header=TRUE, row.names=1, sep="\t")
predy$exposure <- factor(predy$exposure)
attach(predy)
testfishes <- read.table("testfishes.txt", header=TRUE, row.names=1, sep="\t")
attach(testfishes)
# Creating a table for my glm output; models are compared on the basis of AIC values:
cat('fish',' ','model',' ','AIC',' ','df.residual',' ','deviance','\n', file="AICfish.txt", append=FALSE)
# Creating a matrix for the predicted values (fishoccur) and a matrix each for the upper and lower ci:
fishoccur <- matrix(0, nrow(predallx), ncol(testfishes))
colnames(fishoccur) <- colnames(testfishes)
rownames(fishoccur) <- rownames(predallx)
upperresponse <- matrix(0, nrow(predallx), ncol(testfishes))
colnames(upperresponse) <- colnames(testfishes)
rownames(upperresponse) <- rownames(predallx)
lowerresponse <- matrix(0, nrow(predallx), ncol(testfishes))
colnames(lowerresponse) <- colnames(testfishes)
rownames(lowerresponse) <- rownames(predallx)
# Now I run the analysis in a loop + stepwise approach and enter the results in the tables created above; i stands for every single fish that in turn should run through the loop:
for (i in 1:5) {
predy$eachfish <- testfishes[,i]
modelfish <- glm(eachfish ~ depth500m + exposure + nearest.est + island + mangroves + seagrass, family=binomial, data=predy)
stepmodfi <- step(modelfish, trace=0)
cat(i, make.names(c(stepmodfi$call), unique=TRUE), AIC(stepmodfi), df.residual(stepmodfi), deviance(stepmodfi), '\n', file="AICfish.txt", append=TRUE)
# Eventually I want to get predicted values for all those planning units where no fish surveys were conducted.
# First I get my predicted values on the response scale; this yields a predicted value for every single fish and every single planning unit, and these results are fed into the previously constructed matrix fishoccur:
predictionresponse <- predict(stepmodfi, predallx, type="response", se.fit=FALSE)
fishoccur[,i] <- predictionresponse
# Now I want to get confidence intervals for these predicted values. I figured out that I would need to get these on the logit scale first and only thereafter transform them to the response scale. So, first I again get predicted values, but on the logit scale. This time, though, I get just ONE result for every planning unit, when actually I also expected a result for every fish and every planning unit. So it seems there is just one value for each planning unit, but I thought it would be necessary to get one value for each fish and planning unit combination(?)
predictionlogit <- predict(stepmodfi, predallx, type="link", se.fit=TRUE)
# Then, nevertheless, I calculated the upper and lower ci values, putting them into the previously created matrices upperresponse and lowerresponse; the i stands for the number of the fish used in this run, so that I get one value for every combination of fish and planning unit. Yet, I wonder whether this can be done if I only get one predicted value on the logit scale:
upperresponse[,i] <- exp(upperlogit)/(1+exp(upperlogit))
lowerresponse[,i] <- exp(lowerlogit)/(1+exp(lowerlogit))
# Or I use your adjusted formula:
upperresponse[,i] <- 1/(1+exp(-upperlogit))
lowerresponse[,i] <- 1/(1+exp(-lowerlogit))
# Close the loop:
}
# Finally extracting the tables:
write.table(fishoccur, file="fishoccur.txt", row.names=TRUE, sep=",")
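The post uses upperlogit and lowerlogit without showing how they were computed. A common construction (an assumption here, not shown in the thread) is fit ± 1.96 × SE on the link scale, back-transformed with plogis(), which is exactly the 1/(1+exp(-x)) formula discussed above:

```r
# Sketch on simulated data: a normal-approximation CI on the link (logit)
# scale, then back-transformed to probabilities. The 1.96 multiplier and the
# variable names upperlogit/lowerlogit mirror the post but are assumptions.
set.seed(1)
d <- data.frame(x = rnorm(50))
d$y <- rbinom(50, 1, plogis(0.5 + d$x))
fit <- glm(y ~ x, family = binomial, data = d)

pr <- predict(fit, newdata = d, type = "link", se.fit = TRUE)
upperlogit <- pr$fit + 1.96 * pr$se.fit
lowerlogit <- pr$fit - 1.96 * pr$se.fit

# plogis(z) is 1/(1+exp(-z)); the interval limits now live in [0, 1]
upperresponse <- plogis(upperlogit)
lowerresponse <- plogis(lowerlogit)
```

Building the interval on the link scale and transforming afterwards keeps both limits inside [0, 1] by construction, which is why Brian's advice avoids the NA problem.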
[R] Separator argument in read.table
Hi, Suppose I have the following tabular data:

1729_at | TRADD | TNFRSF1A-associated via death domain | protein-coding
1773_at | FNTB | farnesyltransferase, CAAX box, beta | protein-coding
177_at | PLD1 | phospholipase D1, phosphatidylcholine-specific | protein-coding

What is the right separator to use for the read.table function? I tried this:

dat <- read.table("geo2geneinfo_bymodel.txt", sep = "|")
print(dat)

It doesn't seem to work: it flattens the table above into just two columns, meaning the result only contains $V1 and $V2. sep = " | " also won't work. Please advise. -- Gundala Viswanath __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
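For what it's worth, one approach that usually works for data like this (a sketch, not a reply from the thread): sep must be the single character "|", and strip.white = TRUE removes the spaces padding each field:

```r
# "|" is the separator; the surrounding spaces are padding, so strip them.
txt <- c(
  "1729_at | TRADD | TNFRSF1A-associated via death domain | protein-coding",
  "177_at | PLD1 | phospholipase D1, phosphatidylcholine-specific | protein-coding"
)
dat <- read.table(textConnection(txt), sep = "|", strip.white = TRUE,
                  stringsAsFactors = FALSE)
```

Note that the commas inside the description field are harmless here, precisely because "|" rather than "," is the separator.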
Re: [R] can I do this with R?
Using any 'significance level' is, I think, the main problem with the stepwise variable selection method. Even in 'normal' circumstances the interpretation of the p-value is topsy-turvy; you can only imagine what happens to that interpretation in this process of variable selection... you no longer know what the significance level means, if it means anything at all. smita

--- Frank E Harrell Jr [EMAIL PROTECTED] wrote: Xiaohui Chen wrote: step or stepAIC functions do the job. You can opt to use BIC by changing the multiplier of the penalty. I think AIC and BIC are not limited to comparing two pre-defined models; they can be used as model search criteria. You could enumerate the information criterion for all possible models if the size of the full model is relatively small, but this does not generally scale to practical high-dimensional applications. Hence it is often only possible to find a 'best' model at a local optimum, e.g. as measured by AIC/BIC.

Sure you can use them that way, and they may perform better than other measures, but the resulting model will be highly biased (regression coefficients biased away from zero). AIC and BIC were not originally designed to be used in this fashion. Optimizing AIC or BIC will not produce well-calibrated models the way penalizing a large model does.

On the other hand, I wouldn't say BIC over-penalizes; rather, I think AIC usually under-penalizes larger models, in the sense of a positive probability of incorporating irrelevant variables in linear models. If you put some constraints on the process (e.g., if using AIC to find the optimum penalty in penalized maximum likelihood estimation), AIC works very well and BIC results in far too much shrinkage (underfitting). If using a dangerous process such as stepwise variable selection, the more conservative BIC may be better in some senses, worse in others.
The main problem with stepwise variable selection is the use of significance levels for entry below 1.0, and especially below 0.1. Frank X

Frank E Harrell Jr wrote: Smita Pakhale wrote: Hi Maria, But why do you want to use forward or backward methods? These are all 'backward' methods of modeling. Try using AIC or BIC. BIC is much better than AIC. And you do not have to believe me or anyone else on this.

How does that help? BIC gives too much penalization in certain contexts; both AIC and BIC were designed to compare two pre-specified models. They were not designed to fix the problems of stepwise variable selection. Frank

Just make a small data set with a few variables with a known relationship amongst them. With this simulated data set, use all your modeling methods: backwards, forwards, AIC, BIC, etc., and then see which one gives you an answer closest to the truth. The beauty of using a simulated dataset is that you 'know' the truth, as you are its 'creator'! smita

--- Charilaos Skiadas [EMAIL PROTECTED] wrote: A Google search for logistic regression with stepwise forward in R returns the following post: https://stat.ethz.ch/pipermail/r-help/2003-December/043645.html Haris Skiadas Department of Mathematics and Computer Science Hanover College

On May 28, 2008, at 7:01 AM, Maria wrote: Hello, I am just about to install R and was wondering about a few things. I have only worked in Matlab because I wanted to do a logistic regression. However, Matlab does not do logistic regression with the stepwise forward method. Therefore I thought about testing R. So my question is: can I do logistic regression with stepwise forward selection in R? Thanks /M __ -- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Levels error after printing
Hi all, After running this code:

__BEGIN__
dat <- read.table("gene_prob.txt", sep = "\t")
n <- length(dat$V1)
print(n)
print(dat$V1)
__END__

With this input in gene_prob.txt:

__INPUT__
HFE 0.00107517988586552
NF1 0.000744355305599206
PML 0.000661649160532628
TCF3 0.000661649160532628
NF2 0.000578943015466049
GNAS 0.000578943015466049
GGA2 0.000578943015466049
.

I get this printout:

..
[8541] LOC552889 GPR15 SLC2A11 GRIP2 SGEF
[8546] PIK3IP1 RPS27 AQP7
8548 Levels: 3.8-1 A2M A4GALT A4GNT AAAS AAK1 AAMP AANAT AARSD1 AASS ... hCG_1730474

What's the meaning of the last line? Is it an error? How can I fix it? -- Gundala Viswanath __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
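That last line is not an error: read.table converted the character column into a factor, and printing a factor always ends with a "Levels:" line listing the distinct values. A minimal illustration with made-up data (stringsAsFactors is set explicitly here because its default changed in R 4.0):

```r
# A factor prints its values followed by a "Levels:" line; that is normal.
dat <- read.table(textConnection("HFE\t0.001\nNF1\t0.0007"),
                  sep = "\t", stringsAsFactors = TRUE)
is.factor(dat$V1)            # the gene column was read as a factor
genes <- as.character(dat$V1)  # convert if plain character strings are wanted
```

So "8548 Levels: ..." simply says the column holds 8548 distinct gene names; use as.character() (or read with stringsAsFactors = FALSE) if factor behaviour is unwanted.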
[R] Extracting variables from random effects formulas
I would like to be able to extract the names of the variables in a formula that specifies random effects. For example, given: random = ~ 1+year | st_name I'd like to be able to get year and st_name out of random. Neither terms(random) nor random[2] seems to work. Is there a way to get variable names out? Thanks in advance! Rebecca __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
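One base-R possibility (not mentioned in the post, so treat it as a suggestion): all.vars() walks the formula's parse tree and returns the variable names, including those on either side of the | grouping operator:

```r
# all.vars() extracts every variable name from a formula, grouping term included.
random <- ~ 1 + year | st_name
all.vars(random)
```

This returns the names "year" and "st_name"; constants such as the 1 are ignored.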