date:20101221

Re: [R] Odp: For-loop

2010-12-21 Thread Anne-Christine Mupepele

Hi Petr,

Thank you, I got it. In fact, I was looking for the function aggregate(),
which I did not know about.

aggregate(x = t(data), by = list(cov$Month, cov$Area), FUN = sum)

that is doing exactly what I need. 
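
For readers following along, here is a minimal sketch of that call on the toy data from the original question. The result name `fused`, the names given to the grouping list, and the use of rep_len() (to avoid the recycling warning from 1:99) are assumptions added for illustration, not part of the thread.

```r
# Toy data from the question: 5 rows, 36 sample sites (columns)
data <- data.frame(matrix(rep_len(1:99, 5 * 36), nrow = 5, ncol = 36))
colnames(data) <- paste("plot", 1:36)

# Covariates: one row per sample site
cov <- data.frame(
  River = rep(1:3, 12),
  Month = c(rep("Jan", 12), rep("Feb", 12), rep("Mar", 12)),
  Area  = rep(c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4), 3),
  row.names = colnames(data)
)

# Sites become rows after t(); sum the three rivers per Month x Area cell
fused <- aggregate(x = t(data),
                   by = list(Month = cov$Month, Area = cov$Area),
                   FUN = sum)
fused
```

The result has one row per Month/Area combination (12 rows); t(fused[, -(1:2)]) would give the 12-column layout asked for.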

Anne
-Original Message-
From: Petr PIKAL 
Sent: 20.12.2010 14:23:50
To: Anne-Christine Mupepele [
Subject: Odp: [R] For-loop

Hi

r-help-boun...@r-project.org wrote on 20.12.2010 11:48:51:

 Hi,
 I have the following problem:
 
 I have a data.frame with 36 sample sites (columns) for which I have covariates
 in 3 categories: Area, Month and River. Each Area consists of 3 rivers, which
 were sampled over 3 months. Now I want to fuse Rivers 1-3 for one area in one
 month, to get a data.frame with 12 columns.
 I am trying to do this with a for loop (which may be a complicated solution,
 but I don't see an easier way). It is not working, apparently because a[,ij]
 or a[,c(i,j)] does not work as a way to index the matrix columns by a double
 condition.
 How can I make it work, or what would be an easier solution?
 
 Thank you for your help,
 Anne
 
 data=data.frame(matrix(1:99,nrow=5,ncol=36))
 colnames(data)=c(paste("plot",1:36))
 
 cov=data.frame(rep(1:3,12),c(rep("Jan",12),rep("Feb",12),rep("Mar",12)),
                rep(c(1,1,1,2,2,2,3,3,3,4,4,4),3))
 dimnames(cov)=list(colnames(data),c("River","Month","Area"))
 
 ###loop###
 a=matrix(nrow=dim(data)[1],
          ncol=length(levels(factor(cov$Month)))*length(levels(factor(cov$Area))))
 
  for(i in 1:length(levels(factor(cov$Month))))
  {
  for(j in 1:length(levels(factor(cov$Area))))
  {
  # a[,ij] is the indexing that does not work (see the question above)
  a[,ij]=as.numeric(rowSums(data[,factor(cov$Month)==levels(factor(cov$Month))[i] &
                                  factor(cov$Area)==levels(factor(cov$Area))[j]]))
  }
  }

I am not exactly sure what you want to do. What operation is "fuse"? If it 
is a sum, then with your data you can do

area <- rep(1:12, each=3)
data.t <- t(data)
> aggregate(data.t, list(area), sum)
   Group.1  V1  V2  V3  V4  V5
1        1  18  21  24  27  30
2        2  63  66  69  72  75
3        3 108 111 114 117 120
4        4 153 156 159 162 165
5        5 198 201 204 207 210
6        6 243 246 249 252 255
7        7 189 192 195 198 102
8        8  36  39  42  45  48
9        9  81  84  87  90  93
10      10 126 129 132 135 138
11      11 171 174 177 180 183
12      12 216 219 222 225 228
> t(aggregate(data.t, list(area), sum))
        [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
Group.1    1    2    3    4    5    6    7    8    9    10    11    12
V1        18   63  108  153  198  243  189   36   81   126   171   216
V2        21   66  111  156  201  246  192   39   84   129   174   219
V3        24   69  114  159  204  249  195   42   87   132   177   222
V4        27   72  117  162  207  252  198   45   90   135   180   225
V5        30   75  120  165  210  255  102   48   93   138   183   228

but then there is Month value, which is not apparent from your example. 
Maybe

t(aggregate(data.t, list(area, data.t$Month), sum))

Could do the trick but you probably need to show us maybe str and/or head 
of your real data.
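
As for the loop itself: `ij` in the quoted question is not a defined index, which is why a[,ij] fails. A sketch of one possible repair, assuming a running column counter (`k`, `sel` and the column names are hypothetical helpers, not from the thread):

```r
# Toy data as in the question (rep_len avoids the recycling warning)
data <- data.frame(matrix(rep_len(1:99, 5 * 36), nrow = 5, ncol = 36))
cov <- data.frame(
  River = rep(1:3, 12),
  Month = rep(c("Jan", "Feb", "Mar"), each = 12),
  Area  = rep(rep(1:4, each = 3), 3)
)

months <- unique(cov$Month)
areas  <- unique(cov$Area)

# One output column per Month x Area combination
a <- matrix(NA_real_, nrow = nrow(data), ncol = length(months) * length(areas))
colnames(a) <- character(ncol(a))

k <- 0
for (m in months) {
  for (ar in areas) {
    k <- k + 1                                    # running column index
    sel <- cov$Month == m & cov$Area == ar        # the "double condition"
    a[, k] <- rowSums(data[, sel, drop = FALSE])  # fuse the three rivers
    colnames(a)[k] <- paste(m, ar, sep = ".")
  }
}
a
```

The aggregate() call is shorter, but the loop makes the column selection explicit.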

Regards
Petr

 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[R] R CMD build/install: wrong Rtools include path is passed to g++

2010-12-21 Thread Andy Zhu

Hi:

I am trying to build/install rparallel source package in win32 using Rtools/R 
CMD.  However, R CMD build or install fails.  The R CMD build output shows that 
the path of Rtools/MinGW/include is wrong in g++ -I. How can I pass/configure 
the correct include path to R CMD? Tried this in both R 2.12 and 2.11 with 
compatible Rtools and Miktex/chm helper. Neither succeeded.

Note, the R/Rtools/MinGW setting works fine if the package doesn't have C/C++ 
code.  I was able to install my own R package which doesn't have C/C++ code.


  

Re: [R] monthly median in a daily dataset

2010-12-21 Thread SNV Krishna

Hi Dennis,

I am looking for a similar function, and this post is useful. But something
strange happens when I try it, which I couldn't figure out (details below).
Could you or anyone help me understand why this is so?

> df = data.frame(date = seq(as.Date("2010-1-1"), by = "days", length = 250))
> df$value = cumsum(rnorm(1:250))

When I use the statement as given in the ?aggregate help file, the following
error is displayed:
> aggregate(df$value, by = months(df$date), FUN = median)
Error in aggregate.data.frame(as.data.frame(x), ...) : 
  'by' must be a list

But it works when I use the formula interface, as was suggested:
> aggregate(value~months(date), data = df, FUN = median)
  months(date)      value
1        April 15.5721440
2       August -0.1261205
3     February -1.0230631
4      January -0.9277885
5         July -2.1890907
6         June  1.3045260
7        March 11.4126371
8          May  2.1625091

My second question: is it possible to get the median by month and year?
Say I have daily data for the last five years; the above function will
give me the median of January across all five years, while I want Jan-2010,
Jan-2009 and so on. I hope my question is clear.

Any assistance will be greatly appreciated and many thanks for the same.
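
A hedged sketch touching on both points: the error message suggests wrapping the grouping vector in list(), and one way to separate years is to group on a year-month string built with format(). The column name `ym` and the two-year toy series are assumptions for illustration.

```r
set.seed(1)
# Two full years of daily toy data (2009-01-01 to 2010-12-31)
df <- data.frame(date = seq(as.Date("2009-01-01"), by = "day", length.out = 730))
df$value <- cumsum(rnorm(730))

# 1) Non-formula interface: 'by' has to be a list, even for one grouping vector
by_month <- aggregate(df$value, by = list(month = months(df$date)), FUN = median)

# 2) Median per month AND year: group on a "YYYY-MM" key
df$ym <- format(df$date, "%Y-%m")
by_year_month <- aggregate(value ~ ym, data = df, FUN = median)
head(by_year_month)
```

Here by_month pools the same calendar month across years, while by_year_month keeps 2009-01 and 2010-01 apart.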

Regards, 

Krishna


Date: Sun, 19 Dec 2010 15:42:15 -0800
From: Dennis Murphy djmu...@gmail.com
To: HUXTERE emilyhux...@gmail.com
Cc: r-help@r-project.org
Subject: Re: [R] monthly median in a daily dataset
Message-ID:
aanlktimxtjhbse1mq4o121fekxtf8d1psyeegzkkz...@mail.gmail.com
Content-Type: text/plain

Hi:

There is a months() function associated with Date objects, so you should be
able to do something like

aggregate(value ~ months(date), data = data$flow$daily, FUN = median)

Here's a toy example because your data are not in a ready form:

df <- data.frame(date = seq(as.Date('2010-01-01'), by = 'days', length = 250),
                 val =  rnorm(250))
> aggregate(val ~ months(date), data = df, FUN = median)
  months(date)         val
1        April -0.18864817
2       August -0.16203705
3     February  0.03671700
4      January  0.04500988
5         July -0.12753151
6         June  0.09864811
7        March  0.23652105
8          May  0.25879994
9    September  0.53570764

HTH,
Dennis

On Sun, Dec 19, 2010 at 2:31 PM, HUXTERE emilyhux...@gmail.com wrote:


 Hello,

 I have a multi-year dataset (see below) with a date, a data value and a flag
 for the data value. I want to find the monthly median for each month in
 this dataset and then plot it. If anyone has suggestions, they would be
 greatly appreciated. It should be noted that there are some dates with no
 values, and they should be removed.

 Thanks
 Emily

  print ( str(data$flow$daily) )
 'data.frame':   16071 obs. of  3 variables:
  $ date :Class 'Date'  num [1:16071] -1826 -1825 -1824 -1823 -1822 ...
  $ value: num  NA NA NA NA NA NA NA NA NA NA ...
  $ flag : chr  ...
 NULL

 520    2008-11-01 0.034
 1041   2008-11-02 0.034
 1562   2008-11-03 0.034
 2083   2008-11-04 0.038
 2604   2008-11-05 0.036
 3125   2008-11-06 0.035
 3646   2008-11-07 0.036
 4167   2008-11-08 0.039
 4688   2008-11-09 0.039
 5209   2008-11-10 0.039
 5730   2008-11-11 0.038
 6251   2008-11-12 0.039
 6772   2008-11-13 0.039
 7293   2008-11-14 0.038
 7814   2008-11-15 0.037
 8335   2008-11-16 0.037
 8855   2008-11-17 0.037
 9375   2008-11-18 0.037
 9895   2008-11-19 0.034B
 10415  2008-11-20 0.034B
 10935  2008-11-21 0.033B
 11455  2008-11-22 0.034B
 11975  2008-11-23 0.034B
 12495  2008-11-24 0.034B
 13016  2008-11-25 0.034B
 13537  2008-11-26 0.033B
 14058  2008-11-27 0.033B
 14579  2008-11-28 0.033B
 15068  2008-11-29 0.034B
 15546  2008-11-30 0.035B
 521    2008-12-01 0.035B
 1042   2008-12-02 0.034B
 1563   2008-12-03 0.033B
 2084   2008-12-04 0.031B
 2605   2008-12-05 0.031B
 3126   2008-12-06 0.031B
 3647   2008-12-07 0.032B
 4168   2008-12-08 0.032B
 4689   2008-12-09 0.032B
 5210   2008-12-10 0.033B
 5731   2008-12-11 0.033B
 6252   2008-12-12 0.032B
 6773   2008-12-13 0.031B
 7294   2008-12-14 0.030B
 7815   2008-12-15 0.030B
 8336   2008-12-16 0.029B
 8856   2008-12-17 0.028B
 9376   2008-12-18 0.028B
 9896   2008-12-19 0.028B
 10416  2008-12-20 0.027B
 10936  2008-12-21 0.027B
 11456  2008-12-22 0.028B
 11976  2008-12-23 0.028B
 12496  2008-12-24 0.029B
 13017  2008-12-25 0.029B
 13538  2008-12-26 0.029B
 14059  2008-12-27 0.030B
 14580  2008-12-28 0.030B
 15069  2008-12-29 0.030B
 15547  2008-12-30 0.031B
 15851  2008-12-31 0.031B

[R] predict function for kmeans

2010-12-21 Thread Raji


Hi,

I am using the kmeans algorithm to cluster my training dataset. After the
model is generated, I need to apply it to my production dataset and see which
clusters the new observations fall into. But I am unable to find a predict
function for kmeans to do this. Could you please let me know if there is a
predict function in R to perform this?

In SPSS, once the kmeans model is generated, it can be applied to a new
dataset to find the clusters. I am trying to do something similar in R.

Thanks in advance.
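
A minimal sketch of one common workaround, assuming that "applying the model" means assigning each new observation to the nearest fitted cluster centre (squared Euclidean distance). The helper `predict_kmeans` is hypothetical and not part of base R's kmeans().

```r
set.seed(42)
# Toy training data: two well-separated groups in two dimensions
train <- rbind(matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),
               matrix(rnorm(100, mean = 3, sd = 0.3), ncol = 2))
km <- kmeans(train, centers = 2, nstart = 5)

# Hypothetical helper: index of the nearest centre for each row of newdata
predict_kmeans <- function(fit, newdata) {
  newdata <- as.matrix(newdata)
  d <- sapply(seq_len(nrow(fit$centers)), function(k) {
    # squared distance of every new row to centre k
    rowSums(sweep(newdata, 2, fit$centers[k, ])^2)
  })
  d <- matrix(d, nrow = nrow(newdata))   # keep matrix shape for a single row
  max.col(-d, ties.method = "first")     # column with the smallest distance
}

production <- rbind(c(0.1, -0.2), c(2.9, 3.1))
predict_kmeans(km, production)
```

New data should be on the same scale as the training data; if the training set was standardised, the same centring and scaling has to be applied first.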

Re: [R] R CMD build/install: wrong Rtools include path is passed to g++

2010-12-21 Thread Andy Zhu

Never mind.  Found the solution: the package hard-coded the Rtools path in 
Makevars.win.  So I was able to compile (but now have another problem).  I am 
not sure whether there is an environment variable for the Rtools location, 
maybe RTOOLS_HOME ...

Thanks.



- Forwarded Message 
From: Andy Zhu andyzh...@yahoo.com
Cc: r-help@r-project.org
Sent: Mon, December 20, 2010 11:33:31 PM
Subject: [R] R CMD build/install: wrong Rtools include path is passed to g++


Hi:

I am trying to build/install rparallel source package in win32 using Rtools/R 
CMD.  However, R CMD build or install fails.  The R CMD build output shows that 
the path of Rtools/MinGW/include is wrong in g++ -I. How can I pass/configure 
the correct include path to R CMD? Tried this in both R 2.12 and 2.11 with 
compatible Rtools and Miktex/chm helper. Neither succeeded.

Note, the R/Rtools/MinGW setting works fine if the package doesn't have C/C++ 
code.  I was able to install my own R package which doesn't have C/C++ code.


  

[R] replace values of a table !!!

2010-12-21 Thread Taiseer Aljazzar

Dear all,

I am a relatively new user.
I have an ascii file with 550 rows and 400 columns. The file contains values 
ranging from 1 to 2000 and some values with -.

I want to generate a new file where the - values are replaced with 0 
values, the other values with the 1.0 value.

What should I do?

Thanks
Taiseer


[R] Performing basic Multiple Sequence Alignment in R?

2010-12-21 Thread Tal Galili

Hello everyone,

I am not sure if this should go on the general R mailing list (for example,
if there is a text mining solution that might work here) or the bioconductor
mailing list (since I wasn't able to find a solution to my question on
searching their lists) - so this time I tried both, and in the future I'll
know better (in case it should go to only one of the two).


The task I'm trying to achieve is to align several sequences together.
I don't have a basic pattern to match to.  All I know is that the
true pattern should be of length 30 and that the sequences I'm looking
at have had missing values introduced at random points.
Here is an example of such sequences, where on the left we see the
real location of the missing values, and on the right we see the sequence
that we are able to observe.  My goal is to reconstruct the left column
using only the sequences in the right column (based on the fact
that many of the letters in each position are the same).

 Real_sequence   The_sequence_we_see
1   CGCAATACTAAC-AGCTGACTTACGCACCG CGCAATACTAACAGCTGACTTACGCACCG
2   CGCAATACTAGC-AGGTGACTTCC-CT-CG   CGCAATACTAGCAGGTGACTTCCCTCG
3   CGCAATGATCAC--GGTGGCTCCCGGTGCG  CGCAATGATCACGGTGGCTCCCGGTGCG
4   CGCAATACTAACCA-CTAACT--CGCTGCG   CGCAATACTAACCACTAACTCGCTGCG
5   CGCACGGGTAAGAACGTGA-TTACGCTCAG CGCACGGGTAAGAACGTGATTACGCTCAG
6   CGCTATACTAACAA-GTG-CTTAGGC-CTG   CGCTATACTAACAAGTGCTTAGGCCTG
7   CCCA-C-CTAA-ACGGTGACTTACGCTCCG   CCCACCTAAACGGTGACTTACGCTCCG


Here is an example code to reproduce the above example:

ATCG <- c("A","T","C","G")
set.seed(40)

original.seq <- sample(ATCG, 30, TRUE)

# 200 copies of the true sequence, one per row
seqS <- matrix(original.seq, 200, 30, byrow = TRUE)

# replace a random number (1..number.of.changes) of positions in x
change.letters <- function(x, number.of.changes = 15,
                           letters.to.change.with = ATCG)
{
  number.of.changes <- sample(seq_len(number.of.changes), 1)
  new.letters <- sample(letters.to.change.with, number.of.changes, TRUE)
  where.to.change.the.letters <- sample(seq_along(x), number.of.changes, FALSE)
  x[where.to.change.the.letters] <- new.letters
  return(x)
}

change.letters(original.seq)

# up to 3 positions are replaced by the gap symbol "-"
insert.missing.values <- function(x) change.letters(x, 3, "-")

insert.missing.values(original.seq)

seqS2 <- t(apply(seqS, 1, change.letters))

seqS3 <- t(apply(seqS2, 1, insert.missing.values))

seqS4 <- apply(seqS3, 1, function(x) {paste(x, collapse = "")})
require(stringr)
# library(help=stringr)

# str_replace_all() drops every "-" (str_replace() would drop only the first)
all.seqS <- str_replace_all(seqS4, "-", "")

# how do we align this?

data.frame(Real_sequence = seqS4, The_sequence_we_see = all.seqS)


I understand that if all I had was a string and a pattern I would be able to
use

library(Biostrings)

pairwiseAlignment(...)



But in the case I present we are dealing with many sequences to align to one
another (instead of aligning them to one pattern).

Is there a known method for doing this in R?


Thanks,

Tal



Contact
Details:---
Contact me: tal.gal...@gmail.com |  972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
www.r-statistics.com (English)
--


Re: [R] .Rd file for S4-method warning

2010-12-21 Thread Mark Heckmann

Hi Duncan,

thanks for the quick answer! 
Oops, I really overlooked the correct argument specs...

In the beginning roxygen did not support S4, but by now it does.
The usage line is created automatically by roxygen in this case (only when 
using S4 classes, and I am not sure exactly when it does and when it does not).
Not being a roxygen expert, I feel that roxygen limits the creation of .Rd 
files at some points, but it is still a great tool that I use a lot.

For those interested, here is how it worked for me with roxygen:

#' Show method for testClass
#'
#' @param object a \code{testClass} object
#' @docType methods
#' @aliases show,testClass-method
#' @usage \S4method{show}{testClass}(object)
#'
setMethod("show", "testClass", function(object){
})

Mark
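
A small runnable sketch of the class and method being documented, for readers who want to try it. The body of the show method and the object `obj` are assumptions added for illustration; the thread leaves the method body empty.

```r
library(methods)

# Class from the thread: a single character slot 'a'
setClass("testClass", representation(a = "character"))

#' Show method for testClass
#'
#' @param object a \code{testClass} object
#' @docType methods
#' @aliases show,testClass-method
#' @usage \S4method{show}{testClass}(object)
setMethod("show", "testClass", function(object) {
  # illustrative body only: print the slot content
  cat("<testClass> a =", object@a, "\n")
})

obj <- new("testClass", a = "hello")
printed <- capture.output(show(obj))
printed
```

The @param tag is what ends up as \item{object}{...} in the \arguments section, which is the entry R CMD check was missing.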

On 20.12.2010 at 23:31, Duncan Murdoch wrote:

 On 20/12/2010 5:18 PM, Mark Heckmann wrote:
 Dear R users,
 
 I want to create a proper .Rd file for the show method for an S4 class.
 I am encountering problems in the \usage{} line, I guess. An example:
 
 setClass("testClass",
 representation(a="character"))
 
 setMethod("show", "testClass", function(object){
 })
 
 
 The .Rd file:
 
 \name{show,-method}
 \alias{show,testClass-method}
 \alias{show}
 \title{Show method for testClass...}
 \usage{\S4method{show}{testClass}(object)
 }
 \description{Show method for testClass}
 \arguments{\item{testClass}{object}
 }
 
 CHECK says:
 * checking Rd \usage sections ... WARNING
 Undocumented arguments in documentation object 'show,-method'
  object
 
 What would be a correct \usage line? Writing R extensions says:
 \S4method{generic}{signature_list}(argument_list)
 
 That's okay, the warning is about the fact that you didn't document object in 
 the \arguments section.
 
 You had
 
 \item{testClass}{object}
 
 but you should have had
 
 \item{object}{some description of what object is}
 
 As yours was written, it's documentation for the testclass argument, which 
 doesn't exist.
 
 What am I doing wrong?
 It works though if I simply delete the \usage line.
 Unfortunately I use roxygen and the line is created automatically,
 so I need to create it properly.
 
 Does roxygen also create the argument?  Looks like a bug or limitation (I 
 seem to recall that roxygen doesn't support S4, or didn't in the past...)
 
 Duncan Murdoch
 
 
 Thanks in advance,
 Mark
 
 Mark Heckmann
 Blog: www.markheckmann.de
 R-Blog: http://ryouready.wordpress.com
 
 
 
 
 
 
 


Mark Heckmann
Dipl. Wirt.-Ing. cand. Psych.
Celler Straße 27
28205 Bremen
Blog: www.markheckmann.de
R-Blog: http://ryouready.wordpress.com







[R] NA's in survey analysis

2010-12-21 Thread Donatas G.

Hello,

I am trying to analyze sociological survey data using R. It is often
important in survey to calculate both the actual factor sums and
percentages (easily done with describe() ), but also the numbers and
total percentage of NA's. Often it is important to present NA's in
graphs besides the factors.

Is there any easy way to make R treat NA's as if those were factors
besides other factors?

Now, describe(data$a) gives me percentages only for the factors. So I
have to redo percentages manually.

barplot() also ignores NA's. So, to include NA's into barplot I need
to do a table more or less manually.

The other way to do it is to convert NA's into factors (doable,
although, unlike in SPSS, I cannot make an assumption that 99 is a
good code for a factor NA – it has to be the next number in the
factor list,so, might be different for each column in a data frame).
And besides, I have read somewhere in this list that IT IS THE WRONG
WAY TO DO STUFF IN R :)

Is there the right way to do things that I want, and if not – what are
the possible workarounds, smarter than the ones I listed?

-- 
Donatas Glodenis

[R] Odp: NA's in survey analysis

2010-12-21 Thread Petr PIKAL

Hi

r-help-boun...@r-project.org wrote on 21.12.2010 11:02:07:

 Hello,
 
 I am trying to analyze sociological survey data using R. It is often
 important in survey to calculate both the actual factor sums and
 percentages (easily done with describe() ), but also the numbers and
 total percentage of NA's. Often it is important to present NA's in
 graphs besides the factors.
 
 Is there any easy way to make R treat NA's as if those were factors
 besides other factors?
 
 Now, describe(data$a) gives me percentages only for the factors. So I
 have to redo percentages manually.
 
 barplot() also ignores NA's. So, to include NA's into barplot I need
 to do a table more or less manually.
 
 The other way to do it is to convert NA's into factors (doable,
 although, unlike in SPSS, I cannot make an assumption that 99 is a

It is not necessary to code missing values; you can set NA as one level.

> x <- factor(sample(c(1:3, NA), 20, replace=T), exclude=NULL)
> x
 [1] 1    1    3    3    3    2    3    <NA> 3    1    2    <NA> 3    <NA> 2   
[16] 2    3    1    <NA> 3   
Levels: 1 2 3 <NA>
> y <- rnorm(20)
> boxplot(split(y,x))

Besides you could find it from factor help page as I did.
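
For the counts, percentages and bar plot asked about, a hedged sketch using table() with useNA, which counts missing values as their own category. The toy vector and the names `counts`, `pct` and `lab` are assumptions for illustration.

```r
# Toy survey answers with missing values
x <- factor(c(1, 2, 3, NA, 2, 3, 3, NA, 1, 3))

# Counts including NA as its own category
counts <- table(x, useNA = "ifany")

# Percentages over all answers, NA included
pct <- round(100 * prop.table(counts), 1)

# Readable label for the NA category
lab <- names(counts)
lab[is.na(lab)] <- "NA"

pdf(NULL)   # off-screen device; drop this line and dev.off() interactively
barplot(as.vector(counts), names.arg = lab, ylab = "Frequency")
invisible(dev.off())

counts
pct
```

Unlike recoding NA to a number such as 99, this keeps the missing values as NA in the data and only counts them at the tabulation step.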

Regards
Petr

 good code for a factor NA – it has to be the next number in the
 factor list,so, might be different for each column in a data frame).
 And besides, I have read somewhere in this list that IT IS THE WRONG
 WAY TO DO STUFF IN R :)
 
 Is there the right way to do things that I want, and if not – what are
 the possible workarounds, smarter than the ones I listed?
 
 --
 Donatas Glodenis
 

[R] Odp: replace values of a table !!!

2010-12-21 Thread Petr PIKAL

Hi

r-help-boun...@r-project.org wrote on 21.12.2010 09:59:31:

 Dear all,
 
 I am a relatively new user.
 I have an ascii file with 550 rows and 400 columns. The file contains values
 ranging from 1 to 2000 and some values with -.
 
 I want to generate a new file where the - values are replaced with 0
 values, the other values with the 1.0 value.

Do you want to use R for it? If yes, you can read the file and declare - as 
the missing value code while reading; see the na.strings argument in
?read.table

After that you can change the non-NA values to 1 by

your.data[!is.na(your.data)] <- 1

and the NA values to 0 by

your.data[is.na(your.data)] <- 0

Regards
Petr
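
A sketch of the whole round trip under stated assumptions: the missing-value marker is truncated in the archived message, so "-9999" below is a purely hypothetical stand-in, and the small inline text replaces the real 550 x 400 file.

```r
# Hypothetical input: "-9999" stands in for the truncated marker in the post
txt <- "12 -9999 7\n-9999 2000 1\n"

# Read the table and treat the marker as missing
your.data <- read.table(text = txt, na.strings = "-9999")

# Recode: observed values -> 1, missing values -> 0 (order as in the reply)
out <- your.data
out[!is.na(your.data)] <- 1
out[is.na(your.data)]  <- 0

# Write the recoded table to a new file (a temporary file in this sketch)
outfile <- tempfile(fileext = ".txt")
write.table(out, outfile, row.names = FALSE, col.names = FALSE)
out
```

With a real file, read.table("yourfile.txt", na.strings = ...) takes the place of the text = argument.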

 
 What should I do,
 
 Thanks
 Taiseer
 

[R] labels and barchart

2010-12-21 Thread Robert Ruser

Hello,
I'm wondering how to set a value of mar ( par( mar=c(...)) ) in
order to allow labels to be visible in barplot. Is there any relation
between the number of characters in a label and the second value of
mar? Look at my example.

x <- seq(20, 100, by=15)
ety <- rep( "Effect on treatment group", times=length(x))
barplot(x, names.arg=ety, las=1, horiz=TRUE)

The labels are not fully visible. But by trial and error with the second mar
value I get what I want.

par(mar=c(3,12,2,1), cex=0.8)
barplot(x, names.arg=ety, las=1, horiz=TRUE)

I would like something like that: second.mar = max( nchar(ety) )/2
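
One possible way to derive the margin from the labels rather than by trial and error is to measure the label width with strwidth() in inches and set the margin through par(mai = ...). This is a sketch, not a rule from the thread; the 0.3 inch padding and the names `label_in` and `left_margin_in` are arbitrary assumptions.

```r
x <- seq(20, 100, by = 15)
ety <- rep("Effect on treatment group", times = length(x))

pdf(NULL)        # off-screen device; drop this line and dev.off() interactively
par(cex = 0.8)

# Width of the widest label in inches at the current cex
label_in <- max(strwidth(ety, units = "inches"))

# Left margin = label width plus some padding (padding is an assumption)
left_margin_in <- label_in + 0.3
par(mai = c(0.6, left_margin_in, 0.4, 0.2))

barplot(x, names.arg = ety, las = 1, horiz = TRUE)
invisible(dev.off())
```

Measuring in inches sidesteps the question of how many characters fit into one margin line, which depends on the font and on cex.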

While I am at it, I have two more questions:
1. The space between labels and bars is too big - how can I change it to the
width of 1 character?
2. In the example above the x axis is too short. How can I make R draw a
line a little longer than the maximum bar length? I know that I could set
xlim=c(0,max(x)), but because the tick increment is 20 and the last
value is 95, it doesn't solve the problem. The increment is ok, only the
line should be longer.



Thank you
Robert


Re: [R] loading workspace- getting annoying

2010-12-21 Thread Angel Salamanca


Also, 

rm(list=ls())

will remove every visible object from your workspace (objects whose names
start with a dot are only listed by ls(all.names = TRUE)). The next time you
quit and save the workspace, you will start with an empty workspace.


[R] RE : replace values of a table !!!

2010-12-21 Thread Wolfgang RAFFELSBERGER

I suppose what you want to do is something like:

dat <- matrix(c(2:13,-),nc=4)
dat
dat[dat== -] <- 0   # replace the - by 0
dat


Please think twice about what you are doing to your data by changing 
some values.  Maybe you would rather replace the - values by NA?

HTH,
Wolfgang
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Wolfgang Raffelsberger, PhD
Laboratoire de BioInformatique et Génomique Intégratives
IGBMC,
1 rue Laurent Fries,  67404 Illkirch  Strasbourg,  France
Tel (+33) 388 65 3300 Fax (+33) 388 65 3276
wolfgang.raffelsber...@igbmc.fr


From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] on behalf of 
Taiseer Aljazzar [taljaz...@yahoo.com]
Sent: Tuesday, 21 December 2010 09:59
To: r-help@r-project.org
Subject: [R] replace values of a table !!!

Dear all,

I am a relatively new user.
I have an ascii file with 550 rows and 400 columns. The file contains values 
ranging from 1 to 2000 and some values with -.

I want to generate a new file where the - values are replaced with 0 
values, the other values with the 1.0 value.

What should I do?

Thanks
Taiseer


Re: [R] labels and barchart

2010-12-21 Thread Gerrit Eichner


Hello, Robert,

see hints below.

On Tue, 21 Dec 2010, Robert Ruser wrote:


Hello,
I'm wondering how to set a value of mar ( par( mar=c(...)) ) in
order to allow labels to be visible in barplot. Is there any relation
between the number of characters in a label and the second value of
mar? Look at my example.

x <- seq(20, 100, by=15)
ety <- rep( "Effect on treatment group", times=length(x))
barplot(x, names.arg=ety, las=1, horiz=TRUE)

Labels are not visible. But trial and error method with the second mar
argument I get what I want.

par(mar=c(3,12,2,1), cex=0.8)
barplot(x, names.arg=ety, las=1, horiz=TRUE)

I would like something like that: second.mar = max( nchar(ety) )/2


Can't help with that really, but ...


While I am at it, I have two more questions:
1. The space between labels and bars is too big - how can I change it to the
width of 1 character?
2. In the example above the x axis is too short. How can I make R draw a
line a little longer than the maximum bar length? I know that I could set
xlim=c(0,max(x)), but because the tick increment is 20 and the last
value is 95, it doesn't solve the problem. The increment is ok, only the
line should be longer.


You could take a look at par()'s argument mgp, but it affects both axes at 
the same time. I have the impression that you want more control over the 
style of each axis separately; axis() might then be useful, like



par( mar = c( 3, 13, 2, 1), cex = 0.8)

barplot( x, names.arg = NULL, horiz = TRUE, axes = FALSE)

axis( side = 1, at = c( seq( 0, 80, by = 20), 95))

axis( side = 2, at = 1:length(ety), line = -1, las = 1, tick = FALSE,
  labels = ety)



Hth,

Gerrit





Thank you
Robert


Re: [R] labels and barchart

2010-12-21 Thread Dieter Menne



Robert Ruser wrote:
 
 x <- seq(20, 100, by=15)
 ety <- rep( "Effect on treatment group", times=length(x))
 barplot(x, names.arg=ety, las=1, horiz=TRUE)
 
 Labels are not visible. But trial and error method with the second mar
 argument I get what I want.
 

Standard graphics has fallen a bit out of favor because of these quirks. Try
lattice:

library(lattice)
x <- seq(20, 100, by=15)
ety <- paste("Effect on treatment group",1:length(x))
barchart(ety~x)

Note that the ety labels must be different to make this work. With your
original data, you only get one bar (and I needed some time to find out what
was wrong).

Dieter


Re: [R] Where is the bioDist package?

2010-12-21 Thread Dieter Menne



venik wrote:
 
 I am trying in vain to find the bioDist package.
 More generally, where can I find a list of packages and their location?  I
 thought CRAN would have it, but I had no luck with bioDist.
 

Google bioDist, second hit (maybe another one, depending on your language
settings).

D

-- 
View this message in context: 
http://r.789695.n4.nabble.com/Where-is-the-bioDist-package-tp3143266p3145231.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] labels and barchart

2010-12-21 Thread Robert Ruser

2010/12/21 Gerrit Eichner gerrit.eich...@math.uni-giessen.de:
 par( mar = c( 3, 13, 2, 1), cex = 0.8)

 barplot( x, names.arg = NULL, horiz = TRUE, axes = FALSE)

 axis( side = 1, at = c( seq( 0, 80, by = 20), 95))

 axis( side = 2, at = 1:length(ety), line = -1, las = 1, tick = FALSE,
      labels = ety)

Thank you very much. I would change a little because the levels of the
labels are not good.

par( mar = c( 3, 13, 2, 1), cex = 0.8)
my.chart <- barplot( x, names.arg = NULL, horiz = TRUE, axes = FALSE)
axis( side = 1, at = c( seq( 0, 80, by = 20), 95))
 axis( side = 2, at = my.chart, line = -1, las = 1, tick = FALSE, labels = ety)
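
A small sketch of why at = my.chart lines the labels up while at = 1:length(ety) does not: barplot() returns the bar midpoints, and with the default bar width and spacing these are not the integers 1, 2, 3, ... The argument plot = FALSE is used here only to get the midpoints without drawing; mids is a made-up name.

```r
x <- seq(20, 100, by = 15)

# plot = FALSE returns the bar midpoints without drawing anything
mids <- barplot(x, horiz = TRUE, plot = FALSE)

mids          # positions where the bars are actually centred
seq_along(x)  # positions used by at = 1:length(ety)
```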


Re: [R] labels and barchart

2010-12-21 Thread Robert Ruser

2010/12/21 Dieter Menne dieter.me...@menne-biomed.de:
 Standard graphics has fallen a bit out of favor because of these quirks. Try
 lattice:

 library(lattice)
 x <- seq(20, 100, by=15)
 ety <- paste("Effect on treatment group", 1:length(x))
 barchart(ety~x)

 Note that the ety labels must be different to make this work. With your
 original data, you only get one bar (and I needed some time to find out what
 was wrong).

Thank you. I know that lattice in some circumstances is better but I
find traditional graphics more controllable.


[R] Coding a new variable based on criteria in a dataset

2010-12-21 Thread RaoulD


Hi,

I'm a bit stuck and need some help with R code to code a variable F_R based
on a combination of conditions. 

The first condition would code F_R as "F" and would be based on the
min(Date) and min(Time) for each combination of UniqueID & Reason. The
second condition would code the variable as "R", as it would cover the rest of
the data that don't meet the first condition.

For example: for UID 1 & Reason 1 the first record would be coded "F"
and the 4th record would be coded "R".

   UniqueID Reason   Date       Time
1  UID 1    Reason 1 19/12/2010 15:00
2  UID 1    Reason 2 19/12/2010 16:00
3  UID 1    Reason 3 19/12/2010 16:30
4  UID 1    Reason 1 20/12/2010 08:00
5  UID 1    Reason 2 20/12/2010 10:01
6  UID 1    Reason 3 20/12/2010 11:30
7  UID 1    Reason 1 21/12/2010 12:45
8  UID 1    Reason 2 21/12/2010 18:44
9  UID 1    Reason 3 21/12/2010 19:29
10 UID 2    Reason 1 19/12/2010 17:00
11 UID 2    Reason 2 19/12/2010 18:00
12 UID 2    Reason 3 19/12/2010 18:10
13 UID 2    Reason 1 20/12/2010 13:00
14 UID 2    Reason 2 20/12/2010 13:30
15 UID 2    Reason 3 20/12/2010 16:15

Is a loop the most efficient way to do this or is there some pre-existing
function that can help me with this? The sample dataset is what is given
below.

Thanks in advance,
Raoul
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Coding-a-new-variable-based-on-criteria-in-a-dataset-tp3145176p3145176.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] R CMD build/install: wrong Rtools include path is passed to g++

2010-12-21 Thread Duncan Murdoch


Andy Zhu wrote:

Hi:

I am trying to build/install rparallel source package in win32 using Rtools/R 
CMD.  However, R CMD build or install fails.  The R CMD build output shows that 
the path of Rtools/MinGW/include is wrong in g++ -I. How can I pass/configure 
the correct include path to R CMD? Tried this in both R 2.12 and 2.11 with 
compatible Rtools and Miktex/chm helper. Neither succeeded.


Note, the R/Rtools/MinGW setting works fine if the package doesn't have C/C++ 
code.  I was able to install my own R package which doesn't have C/C++ code.


I think your analysis is wrong. The path to Rtools/MinGW/include is not 
explicitly set by R.  You set the PATH to the compiler, and that include 
directory is automatically set.


Duncan Murdoch


Re: [R] NA's in survey analysis

2010-12-21 Thread Donatas G.

2010/12/21 Petr PIKAL petr.pi...@precheza.cz:
 Hi

 r-help-boun...@r-project.org napsal dne 21.12.2010 11:02:07:

 Hello,

 I am trying to analyze sociological survey data using R. It is often
 important in survey to calculate both the actual factor sums and
 percentages (easily done with describe() ), but also the numbers and
 total percentage of NA's. Often it is important to present NA's in
 graphs besides the factors.

 Is there any easy way to make R treat NA's as if those were factors
 besides other factors?

 Now, describe(data$a) gives me percentages only for the factors. So I
 have to redo percentages manually.

 barplot() also ignores NA's. So, to include NA's into barplot I need
 to do a table more or less manually.

 The other way to do it is to convert NA's into factors (doable,
 although, unlike in SPSS, I cannot make an assumption that 99 is a

 not necessary to code missing values, you can set NA as one level.

 x <- factor(sample(c(1:3, NA), 20, replace=TRUE), exclude=NULL)
 x
  [1] 1    1    3    3    3    2    3    <NA> 3    1    2    <NA> 3    <NA> 2
 [16] 2    3    1    <NA> 3
 Levels: 1 2 3 <NA>
 y <- rnorm(20)
 boxplot(split(y,x))

 Besides you could find it from factor help page as I did.

 Regards
 Petr

Thank you Petr, this info (re exclude=NULL) might have saved me tons
of time last week :)

I still have not found an equivalent parameter in describe(), but
anyway, I have been helped a lot!
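
For the counts and percentages themselves, a base R sketch that may serve where describe() has no such option: table() can report NA as its own category, and addNA() turns NA into a regular factor level. The answers in a are invented for illustration.

```r
# hypothetical survey answers with missing values
a <- factor(c("yes", "no", NA, "yes", NA, "no", "yes"))

# counts and percentages with NA kept as a category
counts <- table(a, useNA = "ifany")
pct <- round(100 * prop.table(counts), 1)
counts
pct

# NA as an explicit level, so plots that split by factor level include it
a2 <- addNA(a)
levels(a2)
barplot(table(a2), names.arg = c(levels(a), "NA"))
```

Here addNA(a) gives the same kind of factor as factor(a, exclude = NULL) in the reply above.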

-- 
Donatas Glodenis


[R] How to suppress plotting for xyplot(zoo(x))?

2010-12-21 Thread Marius Hofert

Hi,

I found the thread 
http://r.789695.n4.nabble.com/Matrix-as-input-to-xyplot-lattice-proper-extended-formula-syntax-td896948.html
 
I used Gabor's approach and then tried to assign the plot to a variable (see 
below). But a Quartz device is opened... why? I don't want to have anything 
plot/printed, I just would like to store the plot object. Is there something 
like plot = FALSE?

Cheers,

Marius

library(lattice)
library(zoo)

df <- data.frame(y = matrix(rnorm(24), nrow = 6), x = 1:6)
xyplot(zoo(df[1:4], df$x), type = "p")

# problem: a Quartz device is opened (on Mac OS X 10.6)
plot.object <- xyplot(zoo(df[1:4], df$x), type = "p")

Re: [R] logistic regression or not?

2010-12-21 Thread Ben Bolker

array chip arrayprofile at yahoo.com writes:

[snip]

 I can think of analyzing this data using glm() with the attached dataset:
 
 test <- read.table('test.txt',sep='\t')
 fit <- glm(cbind(positive,total-positive)~treatment,test,family=binomial)
 summary(fit)
 anova(fit, test='Chisq')
 
 First, is this still called logistic regression or something else? I thought 
 with logistic regression, the response variable is a binary factor?

  Sometimes I've seen it called binomial regression, or just 
a binomial generalized linear model

 Second, then summary(fit) and anova(fit, test='Chisq') gave me different p 
 values, why is that? which one should I use?

  summary(fit) gives you p-values from a Wald test.
  anova() gives you tests based on the Likelihood Ratio Test.
  In general the LRT is more accurate.

 Third, is there an equivalent model where I can use variable percentage 
 instead of positive & total?

  glm(percentage~treatment,weights=total,data=test,family=binomial)

 is equivalent to the model you fitted above, provided percentage holds
 the proportion positive/total (a value between 0 and 1).
 
 Finally, what is the best way to analyze this kind of dataset 
 where it's almost the same as ANOVA except that the response variable
  is a proportion (or success and failure)?

  Don't quite know what you mean here.  How is the situation almost
the same as ANOVA different from the situation you described above?
Do you mean when there are multiple factors? or ???
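
A sketch of the points above on made-up data (the original test.txt is not available, so the numbers and the column name proportion are assumptions). It fits the two-column form and the proportion-plus-weights form, and prints both the Wald and the likelihood ratio tests.

```r
# hypothetical data: positives out of total per replicate and treatment
test <- data.frame(
  treatment = factor(c("A", "A", "B", "B", "C", "C")),
  positive  = c(12, 15, 20, 22, 30, 27),
  total     = c(40, 40, 40, 40, 40, 40)
)

# two-column response: successes and failures
fit1 <- glm(cbind(positive, total - positive) ~ treatment,
            data = test, family = binomial)

# same model written with a proportion (0 to 1) and weights
test$proportion <- test$positive / test$total
fit2 <- glm(proportion ~ treatment, weights = total,
            data = test, family = binomial)

all.equal(coef(fit1), coef(fit2))

summary(fit1)$coefficients     # Wald tests, one per coefficient
anova(fit1, test = "Chisq")    # likelihood ratio test for the treatment term
```

The Wald table tests each contrast against the baseline level, while anova() tests the treatment term as a whole, which is one reason the p-values differ.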


Re: [R] Performing basic Multiple Sequence Alignment in R?

2010-12-21 Thread David Winsemius

Tal; I'm trimming the BioC posting. In the R lists it is considered  
spamming to cross post. (Please re-read the Posting Guide.)



On Dec 21, 2010, at 4:21 AM, Tal Galili wrote:


Hello everyone,

I am not sure if this should go on the general R mailing list (for  
example,
if there is a text mining solution that might work here) or the  
bioconductor

mailing list (since I wasn't able to find a solution to my question on
searching their lists) - so this time I tried both, and in the  
future I'll

know better (in case it should go to only one of the two).


The task I'm trying to achieve is to align several sequences together.
I don't have a basic pattern to match to.  All that I know is that the
True pattern should be of length 30 and that the sequences I'm  
looking

at, have had missing values introduced to them at random points.
Here is an example of such sequences, where on the left we see what  
is the
real location of the missing values, and on the right we see the  
sequence
that we will be able to observe.  My goal is to reconstruct the left  
column
using only the sequences I've got on the right column (based on the  
fact

that many of the letters in each position are the same)

Real_sequence   The_sequence_we_see
1   CGCAATACTAAC-AGCTGACTTACGCACCG CGCAATACTAACAGCTGACTTACGCACCG
2   CGCAATACTAGC-AGGTGACTTCC-CT-CG   CGCAATACTAGCAGGTGACTTCCCTCG
3   CGCAATGATCAC--GGTGGCTCCCGGTGCG  CGCAATGATCACGGTGGCTCCCGGTGCG
4   CGCAATACTAACCA-CTAACT--CGCTGCG   CGCAATACTAACCACTAACTCGCTGCG
5   CGCACGGGTAAGAACGTGA-TTACGCTCAG CGCACGGGTAAGAACGTGATTACGCTCAG
6   CGCTATACTAACAA-GTG-CTTAGGC-CTG   CGCTATACTAACAAGTGCTTAGGCCTG
7   CCCA-C-CTAA-ACGGTGACTTACGCTCCG   CCCACCTAAACGGTGACTTACGCTCCG



The agrep function allows one to specify which sort of differences to  
consider in calculating a Levenshtein edit distance. Insertions are  
one possible distance component. You could take a look at its code (in  
C in hte sources) and perhaps rejigger it to spit out the location of  
the deletions.


 agrep(seqdat$The_sequence_we_see[1], seqdat$Real_sequence,  
max.distance=list(deletions=0, substitutions=0, insertions=0))

integer(0)
 agrep(seqdat$The_sequence_we_see[1], seqdat$Real_sequence,  
max.distance=list(deletions=0, substitutions=0, insertions=1))

[1] 1

--
David.
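
As an alternative to reworking the agrep sources, here is a sketch using adist() from the utils package, which is available in more recent R versions than the one current at the time of this thread. It only covers the pairwise case (an observed sequence against a known gapped one), not multiple alignment. The two strings are row 1 of the example table; observed and real are made-up names.

```r
observed <- "CGCAATACTAACAGCTGACTTACGCACCG"
real     <- "CGCAATACTAAC-AGCTGACTTACGCACCG"

# generalized Levenshtein distance, with a breakdown of the edit operations
d <- adist(observed, real, counts = TRUE)

d[1, 1]                      # total edit distance
attr(d, "counts")[1, 1, ]    # numbers of insertions, deletions, substitutions
attr(d, "trafos")[1, 1]      # one letter per aligned position (M, I, D, S)
```

The position of the non-M letter in the trafos string indicates where the gap sits.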


Here is an example code to reproduce the above example:

ATCG <- c("A", "T", "C", "G")
set.seed(40)

original.seq <- sample(ATCG, 30, TRUE)

seqS <- matrix(original.seq, 200, 30, TRUE)

change.letters <- function(x, number.of.changes = 15,
                           letters.to.change.with = ATCG)
{
   number.of.changes <- sample(seq_len(number.of.changes), 1)
   new.letters <- sample(letters.to.change.with, number.of.changes, TRUE)
   where.to.change.the.letters <- sample(seq_along(x), number.of.changes, FALSE)
   x[where.to.change.the.letters] <- new.letters
   return(x)
}

change.letters(original.seq)

insert.missing.values <- function(x) change.letters(x, 3, "-")

insert.missing.values(original.seq)

seqS2 <- t(apply(seqS, 1, change.letters))

seqS3 <- t(apply(seqS2, 1, insert.missing.values))

seqS4 <- apply(seqS3, 1, function(x) {paste(x, collapse = "")})
require(stringr)
# library(help=stringr)

# str_replace_all() so that every gap marker is removed, not only the first
all.seqS <- str_replace_all(seqS4, "-", "")

# how do we align this?

data.frame(Real_sequence = seqS4, The_sequence_we_see = all.seqS)


I understand that if all I had was a string and a pattern I would be  
able to

use

library(Biostrings)

pairwiseAlignment(...)



But in the case I present we are dealing with many sequences to  
align to one

another (instead of aligning them to one pattern).

Is there a known method for doing this in R?


Thanks,

Tal



Contact
Details:---
Contact me: tal.gal...@gmail.com |  972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il  
(Hebrew) |

www.r-statistics.com (English)
--



David Winsemius, MD
West Hartford, CT


Re: [R] Performing basic Multiple Sequence Alignment in R?

2010-12-21 Thread Mike Marchywka

I don't have an answer, trying to solicit more input with additional questions.

 From: tal.gal...@gmail.com
 Date: Tue, 21 Dec 2010 11:21:03 +0200
 To: r-help@r-project.org; bioconduc...@r-project.org
 Subject: [R] Performing basic Multiple Sequence Alignment in R?

 Hello everyone,

 I am not sure if this should go on the general R mailing list (for example,
 if there is a text mining solution that might work here) or the bioconductor
 mailing list (since I wasn't able to find a solution to my question on
 searching their lists) - so this time I tried both, and in the future I'll
 know better (in case it should go to only one of the two).

I take it you don't want an R interface for clustal and I seem
to recall, from doing this a few years ago, that alignment by
exact string matching was a bit of a research area ( I think you
can find papers on citeseer for example). It does seem you are asking
about exact string matches for alignment markers- your left sequences
appear exactly someplace on the right- but your overall interests
are not real clear. I never got my code fully working but I was
happy that I could do different strains of e coli ( or something in 
the 5-10 Mbp genome range ) very quickly ( seconds as I recall ) and
you could also presumably find similar items that had
moved a long way. 

Earlier someone came
here with a task and was pointed to bio packages but I 
thought there may be something in computational linguistics or mining
better suited to needs but no one ever volunteered anything.

 The task I'm trying to achieve is to align several sequences together.
 I don't have a basic pattern to match to. All that I know is that the
 True pattern should be of length 30 and that the sequences I'm looking
 at, have had missing values introduced to them at random points.

Alternatively I guess someone could make an R interface for various
BLAST's, sometimes the help desk at NCBI can get questions like this
to the right person internally.

 Here is an example of such sequences, where on the left we see what is the
 real location of the missing values, and on the right we see the sequence
 that we will be able to observe. My goal is to reconstruct the left column
 using only the sequences I've got on the right column (based on the fact
 that many of the letters in each position are the same)

 Real_sequence The_sequence_we_see
 1 CGCAATACTAAC-AGCTGACTTACGCACCG CGCAATACTAACAGCTGACTTACGCACCG
 2 CGCAATACTAGC-AGGTGACTTCC-CT-CG CGCAATACTAGCAGGTGACTTCCCTCG
 3 CGCAATGATCAC--GGTGGCTCCCGGTGCG CGCAATGATCACGGTGGCTCCCGGTGCG
 4 CGCAATACTAACCA-CTAACT--CGCTGCG CGCAATACTAACCACTAACTCGCTGCG
 5 CGCACGGGTAAGAACGTGA-TTACGCTCAG CGCACGGGTAAGAACGTGATTACGCTCAG
 6 CGCTATACTAACAA-GTG-CTTAGGC-CTG CGCTATACTAACAAGTGCTTAGGCCTG
 7 CCCA-C-CTAA-ACGGTGACTTACGCTCCG CCCACCTAAACGGTGACTTACGCTCCG


[R] how to control ticks

2010-12-21 Thread Yogesh Tiwari

Hi,
I want 12 ticks at axis 1 and want to write Jan-Dec on each.

something like:

axis(1, at=1:12, labels=c('J','F','M','A','M','J','J','A','S','O','N','D'))

I could omit default ticks but now how to control ticks.

plot(file$time, file$ch4*1000, ylim=c(1500,1700), xaxt='n', xlab=NA,
ylab=NA, col='blue', yaxs='i', lwd=2, pch=10, type='b')

axis(1, at=1:12, labels=c('J','F','M','A','M','J','J','A','S','O','N','D'))

BUT above is not working, and there is no error as well.

Pls help,

Regards,
Yogesh


Re: [R] Coding a new variable based on criteria in a dataset

2010-12-21 Thread Ben Bolker

RaoulD raoul.t.dsouza at gmail.com writes:

 
 
 Hi,
 
 I'm a bit stuck and need some help with R code to code a variable F_R based
 on a combination of conditions. 
 
 The first condition would code F_R as F and would be based on the
 min(Date) and Min(Time) for each combination of UniqueID  Reason. The
 second condition would code the variable as R as it would be the rest of
 the data that dont meet the first condition. 
 

  It isn't quite convenient to read the data posted below into R
(if it was originally tab-separated, that formatting got lost) but
ddply from the plyr package is good for this: something like (untested)

  library(plyr)
  d <- ddply(data, .(UniqueID, Reason),
             function(x) {
               ## make sure x is sorted by date/time here
               x$F_R <- c("F", rep("R", nrow(x)-1))
               x
             })

 For example: for UID 1  Reason 1 the first record would be coded F
 and the 4th record would be coded R. 
 
UniqueID   Reason   Date  Time
 1 UID 1   Reason 1 19/12/2010 15:00
 2 UID 1   Reason 2 19/12/2010 16:00
 3 UID 1   Reason 3 19/12/2010 16:30
 4 UID 1   Reason 1 20/12/2010 08:00

[snip]
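
For comparison, a base R sketch of the same idea without plyr: sort by date and time within each UniqueID/Reason group, then mark the first row of each group with duplicated(). The data frame dat holds only the UID 1 rows from the first two days of the posted table, and the column name when is an assumption.

```r
dat <- data.frame(
  UniqueID = rep("UID 1", 6),
  Reason   = rep(c("Reason 1", "Reason 2", "Reason 3"), 2),
  Date     = rep(c("19/12/2010", "20/12/2010"), each = 3),
  Time     = c("15:00", "16:00", "16:30", "08:00", "10:01", "11:30"),
  stringsAsFactors = FALSE
)

# combine Date and Time into one sortable time stamp
dat$when <- as.POSIXct(paste(dat$Date, dat$Time),
                       format = "%d/%m/%Y %H:%M", tz = "UTC")

# earliest record first within each UniqueID/Reason combination
dat <- dat[order(dat$UniqueID, dat$Reason, dat$when), ]

# the first row of each combination is "F", every later one is "R"
first <- !duplicated(dat[c("UniqueID", "Reason")])
dat$F_R <- ifelse(first, "F", "R")
dat
```

No explicit loop is needed; duplicated() works on the pair of grouping columns directly.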


Re: [R] how to control ticks

2010-12-21 Thread Martyn Byng

Hi,

The following seems to work:

plot(1:12,1:12,xaxt='n',xlab=NA)
axis(1,at=1:12,labels=c("J","F","M","A","M","J","J","A","S","O","N","D"))

So I'd guess that your X axis data, file$time, doesn't take the values 1
to 12.

Martyn
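
If file$time turns out to be a Date (a guess, since the data were not posted), the tick positions need to be dates as well. A sketch with invented values; time, ch4 and month_starts are made-up names.

```r
# hypothetical monthly series for 2010
time <- seq(as.Date("2010-01-15"), by = "month", length.out = 12)
ch4  <- seq(1.55, 1.65, length.out = 12)

# tick positions on the same scale as the x data: the first day of each month
month_starts <- seq(as.Date("2010-01-01"), by = "month", length.out = 12)
month_labels <- substr(month.abb, 1, 1)

plot(time, ch4 * 1000, ylim = c(1500, 1700), xaxt = "n",
     xlab = NA, ylab = NA, col = "blue", type = "b")
axis(1, at = month_starts, labels = month_labels)
```

With at = 1:12 the ticks would sit at the numeric values 1 to 12, which on a Date axis are far outside the plotted range, so nothing is drawn and no error is raised.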

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
On Behalf Of Yogesh Tiwari
Sent: 21 December 2010 12:37
To: r-help
Subject: [R] how to control ticks

Hi,
I want 12 ticks at axis 1 and want to write Jan-Dec on each.

something like:

axis(1, at=1:12,
labels=c('J','F','M','A','M','J','J','A','S','O','N','D'))

I could omit default ticks but now how to control ticks.

plot(file$time, file$ch4*1000, ylim=c(1500,1700), xaxt='n', xlab=NA,
ylab=NA, col='blue', yaxs='i', lwd=2, pch=10, type='b')

axis(1, at=1:12,
labels=c('J','F','M','A','M','J','J','A','S','O','N','D'))

BUT above is not working, and there is no error as well.

Pls help,

Regards,
Yogesh


Re: [R] How to suppress plotting for xyplot(zoo(x))?

2010-12-21 Thread Gabor Grothendieck

On Tue, Dec 21, 2010 at 7:53 AM, Marius Hofert m_hof...@web.de wrote:
 Hi,

 I found the thread 
 http://r.789695.n4.nabble.com/Matrix-as-input-to-xyplot-lattice-proper-extended-formula-syntax-td896948.html
 I used Gabor's approach and then tried to assign the plot to a variable (see 
 below). But a Quartz device is opened... why? I don't want to have anything 
 plot/printed, I just would like to store the plot object. Is there something 
 like plot = FALSE?

 Cheers,

 Marius

 library(lattice)
 library(zoo)

 df <- data.frame(y = matrix(rnorm(24), nrow = 6), x = 1:6)
 xyplot(zoo(df[1:4], df$x), type = "p")

 # problem: a Quartz device is opened (on Mac OS X 10.6)
 plot.object <- xyplot(zoo(df[1:4], df$x), type = "p")

This also opens up a window on Windows.   It occurs within lattice
when lattice issues a trellis.par.get .  A workaround would be to open
a device directed to null.  On Windows this would work.  I assume if
you use /dev/null it would work on your machine.

png("NUL")
plot.object <- ...
dev.off()
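
A platform-independent variant of the same workaround might be pdf(NULL), which opens a pdf device that writes no file. The sketch below uses a plain lattice formula rather than the zoo method, so it shows only the pattern, not the exact case from the question; the data frame is invented.

```r
library(lattice)

df <- data.frame(x = 1:6, y = rnorm(6))

pdf(NULL)                                   # device without an output file
plot.object <- xyplot(y ~ x, data = df, type = "p")
dev.off()

# the stored object can be drawn later with print(plot.object)
class(plot.object)
```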





-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


Re: [R] logistic regression or not?

2010-12-21 Thread S Ellison

A possible caveat here.

Traditionally, logistic regression was performed on the
logit-transformed proportions, with the standard errors based on the
residuals for the resulting linear fit. This accommodates overdispersion
naturally, but without telling you that you have any.

glm with a binomial family does not allow for overdispersion unless
you use the quasibinomial family. If you have overdispersion, standard
errors from glm will be unrealistically small. Make sure your model fits
in glm before you believe the standard errors, or use the quasibinomial
family.

Steve Ellison
LGC
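
A sketch of that check on invented data (the counts below are assumptions, chosen to be rather variable within treatments). The quasibinomial fit gives the same coefficients but scales the standard errors by the square root of the estimated dispersion.

```r
# hypothetical data with noticeable scatter between replicates
test <- data.frame(
  treatment = factor(c("A", "A", "B", "B", "C", "C")),
  positive  = c(8, 20, 15, 30, 22, 36),
  total     = rep(40, 6)
)

fit_bin   <- glm(cbind(positive, total - positive) ~ treatment,
                 data = test, family = binomial)
fit_quasi <- update(fit_bin, family = quasibinomial)

# rough indicator: residual deviance per residual degree of freedom
deviance(fit_bin) / df.residual(fit_bin)

# dispersion estimated by the quasibinomial fit (fixed at 1 for binomial)
disp <- summary(fit_quasi)$dispersion
disp

se_bin   <- summary(fit_bin)$coefficients[, "Std. Error"]
se_quasi <- summary(fit_quasi)$coefficients[, "Std. Error"]
cbind(se_bin, se_quasi)
```

Values of the deviance ratio or of disp well above 1 suggest that the plain binomial standard errors are too small.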



 Ben Bolker bbol...@gmail.com 21/12/2010 13:08:34 
array chip arrayprofile at yahoo.com writes:

[snip]

 I can think of analyzing this data using glm() with the attached
dataset:
 
 test <- read.table('test.txt',sep='\t')
 fit <- glm(cbind(positive,total-positive)~treatment,test,family=binomial)
 summary(fit)
 anova(fit, test='Chisq')
 
 First, is this still called logistic regression or something else? I
thought 
 with logistic regression, the response variable is a binary factor?

  Sometimes I've seen it called binomial regression, or just 
a binomial generalized linear model

 Second, then summary(fit) and anova(fit, test='Chisq') gave me
different p 
 values, why is that? which one should I use?

  summary(fit) gives you p-values from a Wald test.
  anova() gives you tests based on the Likelihood Ratio Test.
  In general the LRT is more accurate.

 Third, is there an equivalent model where I can use variable percentage
 instead of positive & total?

  glm(percentage~treatment,weights=total,data=test,family=binomial)

 is equivalent to the model you fitted above.
 
 Finally, what is the best way to analyze this kind of dataset 
 where it's almost the same as ANOVA except that the response
variable
  is a proportion (or success and failure)?

  Don't quite know what you mean here.  How is the situation almost
the same as ANOVA different from the situation you described above?
Do you mean when there are multiple factors? or ???


[R] combination value

2010-12-21 Thread amir

Hi everyone,

I want to calculate the combination function in R, the value not all the
possible choices.
I mean cmbn(5,2)=10.

Is there any function unless using factorial?

Regards,
Amir



Re: [R] combination value

2010-12-21 Thread Jorge I Velez

choose(5, 2)

HTH,
Jorge
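
A short sketch of why choose() is preferable to the factorial formula: it returns the value directly and keeps working where factorial() overflows. lchoose() gives the logarithm for very large cases. The names via_choose and via_factorial are made up.

```r
choose(5, 2)

# the factorial formula overflows for larger arguments
via_factorial <- suppressWarnings(factorial(200) / (factorial(100)^2))
via_choose    <- choose(200, 100)

via_factorial   # not a usable number
via_choose      # finite
lchoose(200, 100)   # natural log of the same quantity
```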


On Tue, Dec 21, 2010 at 9:23 AM, amir  wrote:

 Hi everyone,

 I want to calculate the combination function in R, the value not all the
 possible choices.
 I mean cmbn(5,2)=10.

 Is there any function unless using factorial?

 Regards,
 Amir



[R] two-part growth analysis

2010-12-21 Thread Sebastián Daza


Hi everyone!

Does anyone know if there is a package to do two-part growth analysis 
with R?


Regards,
Sebastian


--
Sebastián Daza
sebastian.d...@gmail.com


Re: [R] combination value

2010-12-21 Thread David Winsemius



On Dec 21, 2010, at 9:23 AM, amir wrote:


Hi everyone,

I want to calculate the combination function in R, the value not all  
the

possible choices.
I mean cmbn(5,2)=10.

Is there any function unless using factorial?


?choose

--

David Winsemius, MD
West Hartford, CT


Re: [R] ideas, modeling highly discrete time-series data

2010-12-21 Thread Kjetil Halvorsen

You could try the timeseries list at
https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=TIMESERIES

kjetil

On Mon, Dec 20, 2010 at 6:26 PM, Mike Williamson this.is@gmail.com wrote:
Hello all,

First of all, thanks to those of you who helped me a week or so ago
managing a time series with varying gaps between the data series in 'R'.
(My final preferred solution was to use its function then
forecast(Arima( ) ). )

My next question is a general statistical question where I'd like some
advice, for those willing / able to proffer any wisdom:

- I need to predict using this same time series, where the *data* are
  highly discrete. E.g., I will have values like 1e5, 2.2e5, and 3.6e5, but I
  will never have 1.3e5 or 1.8e5, etc.
- I could simply leave these values as discrete, similar to a binomial
  distribution, but then I am not sure how to use time series tricks like
  arima above. For time-series analyses that I know of, an assumption of an
  approximately normal distribution is expected. No simple normalization
  (e.g., log(values) ) works, since the non-normality arises from the highly
  discrete distribution more than any drastic asymmetry in the population
  spread.
- I could leave the values as they are and work with a model where the
  assumption is violated... I am not sure how sensitive a model such as arima
  is to the population distribution.
- Or I could... (here's where I am hoping for some collective genius).

Thanks in advance for any help! If this isn't the best forum, since I
know this is not specifically an 'R' question, please let me know of a
better forum to post such a question.

Thanks!
Mike

Telescopes and bathyscaphes and sonar probes of Scottish lakes,
Tacoma Narrows bridge collapse explained with abstract phase-space maps,
Some x-ray slides, a music score, Minard's Napoleanic war:
The most exciting frontier is charting what's already here.
-- xkcd

--
Help protect Wikipedia. Donate now:
http://wikimediafoundation.org/wiki/Support_Wikipedia/en


Re: [R] how to control ticks

2010-12-21 Thread Edwin Groot

On Tue, 21 Dec 2010 18:06:52 +0530
 Yogesh Tiwari yogesh@googlemail.com wrote:
 Hi,
 I want 12 ticks at axis 1 and want to write Jan-Dec on each.
 
 something like:
 
 axis(1, at=1:12,
 labels=c('J','F','M','A','M','J','J','A','S','O','N','D'))
 
 I could omit default ticks but now how to control ticks.
 

Dear Yogesh,
I spray my clothing with No-Bite, and that controls ticks quite well.

:-)
Edwin

 plot(file$time, file$ch4*1000, ylim=c(1500,1700), xaxt='n', xlab=NA,
 ylab=NA, col='blue', yaxs='i', lwd=2, pch=10, type='b')
 
 axis(1, at=1:12,
 labels=c('J','F','M','A','M','J','J','A','S','O','N','D'))
 
 BUT above is not working, and there is no error as well.
 
 Pls help,
 
 Regards,
 Yogesh
 

Dr. Edwin Groot, postdoctoral associate
AG Laux
Institut fuer Biologie III
Schaenzlestr. 1
79104 Freiburg, Deutschland
+49 761-2032945


Re: [R] how to control ticks

2010-12-21 Thread jim holtman

What is the structure of file$time? Is it Date/POSIXct?  'at=1:12'
only works if those are the dimensions of file$time.  So give us an
idea of what the data is (PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html and provide commented,
minimal, self-contained, reproducible code).



On Tue, Dec 21, 2010 at 7:36 AM, Yogesh Tiwari
yogesh@googlemail.com wrote:
 Hi,
 I want 12 ticks at axis 1 and want to write Jan-Dec on each.

 something like:

 axis(1, at=1:12, labels=c('J','F','M','A','M','J','J','A','S','O','N','D'))

 I could omit default ticks but now how to control ticks.

 plot(file$time, file$ch4*1000, ylim=c(1500,1700), xaxt='n', xlab=NA,
 ylab=NA, col='blue', yaxs='i', lwd=2, pch=10, type='b')

 axis(1, at=1:12, labels=c('J','F','M','A','M','J','J','A','S','O','N','D'))

 BUT above is not working, and there is no error as well.

 Pls help,

 Regards,
 Yogesh





-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?


Re: [R] how to control ticks

2010-12-21 Thread peter dalgaard


On Dec 21, 2010, at 17:01 , Edwin Groot wrote:

 On Tue, 21 Dec 2010 18:06:52 +0530
 Yogesh Tiwari yogesh@googlemail.com wrote:
 Hi,
 I want 12 ticks at axis 1 and want to write Jan-Dec on each.
 
 something like:
 
 axis(1, at=1:12,
 labels=c('J','F','M','A','M','J','J','A','S','O','N','D'))
 
 I could omit default ticks but now how to control ticks.
 
 
 Dear Yogesh,
 I spray my clothing with No-Bite, and that controls ticks quite well.

Yeah, but then how do you get the suckers to sit still while you write on them?

;-)

 
 :-)
 Edwin
 
-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com


Re: [R] logistic regression or not?

2010-12-21 Thread peter dalgaard


On Dec 21, 2010, at 14:22 , S Ellison wrote:

 A possible caveat here.
 
 Traditionally, logistic regression was performed on the
 logit-transformed proportions, with the standard errors based on the
 residuals for the resulting linear fit. This accommodates overdispersion
 naturally, but without telling you that you have any.
 
 glm with a binomial family does not allow for overdispersion unless
 you use the quasibinomial family. If you have overdispersion, standard
 errors from glm will be unrealistically small. Make sure your model fits
 in glm before you believe the standard errors, or use the quasibinomial
 family.

...and before you believe in overdispersion, make sure you have a credible 
explanation for it. All too often, what you really have is a model that doesn't 
fit your data properly.
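
A rough sketch of how one might check for overdispersion before reaching
for quasibinomial. The data are simulated here (the poster's test.txt is
not available), and the Pearson statistic divided by the residual degrees
of freedom is only one common rule of thumb, not a formal test.

```r
set.seed(1)
# Simulated stand-in for the poster's data: 3 treatments, 5 replicates each.
test <- data.frame(treatment = gl(3, 5, labels = c("A", "B", "C")),
                   total = 50)
test$positive <- rbinom(nrow(test), test$total,
                        prob = c(0.2, 0.4, 0.6)[test$treatment])

fit <- glm(cbind(positive, total - positive) ~ treatment,
           data = test, family = binomial)

# Pearson chi-square over residual df; values well above 1 hint at
# overdispersion OR at a model that does not fit.
dispersion <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
print(dispersion)

# Same coefficients, but standard errors scaled by the estimated dispersion.
fit_q <- update(fit, family = quasibinomial)
print(summary(fit_q)$dispersion)
```

The point estimates do not change between the two fits; only the standard
errors do, which is why a poorly fitting model can masquerade as
overdispersion.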

 
 Steve Ellison
 LGC
 
 
 
 Ben Bolker bbol...@gmail.com 21/12/2010 13:08:34 
 array chip arrayprofile at yahoo.com writes:
 
 [snip]
 
 I can think of analyzing this data using glm() with the attached
 dataset:
 
 test <- read.table('test.txt',sep='\t')
 
 fit <- glm(cbind(positive,total-positive)~treatment,test,family=binomial)
 summary(fit)
 anova(fit, test='Chisq')
 
 First, is this still called logistic regression or something else? I
 thought 
 with logistic regression, the response variable is a binary factor?
 
  Sometimes I've seen it called binomial regression, or just 
 a binomial generalized linear model
 
 Second, then summary(fit) and anova(fit, test='Chisq') gave me
 different p 
 values, why is that? which one should I use?
 
  summary(fit) gives you p-values from a Wald test.
  anova() gives you tests based on the Likelihood Ratio Test.
  In general the LRT is more accurate.
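
A small sketch of where the two p-values come from, using made-up counts
with a two-level treatment (the column names follow the poster's, the
numbers are invented). wald_p and lrt_p are illustrative names.

```r
# Invented counts: 4 replicates per treatment, 40 trials each.
test <- data.frame(
  treatment = factor(rep(c("control", "treated"), each = 4)),
  positive  = c(12, 15, 9, 14, 22, 25, 19, 27),
  total     = rep(40, 8)
)

fit <- glm(cbind(positive, total - positive) ~ treatment,
           data = test, family = binomial)

# Wald test: coefficient estimate divided by its standard error.
wald_p <- coef(summary(fit))["treatmenttreated", "Pr(>|z|)"]

# Likelihood ratio test: drop in deviance when treatment is added.
lrt_p <- anova(fit, test = "Chisq")[["Pr(>Chi)"]][2]

print(c(wald = wald_p, lrt = lrt_p))
```

The two numbers are usually close but not identical, because they are
different approximations to the same underlying question.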
 
 Third, is there an equivalent model where I can use variable
 percentage 
 instead of positive & total?
 
  glm(percentage~treatment,weights=total,data=test,family=binomial)
 
 is equivalent to the model you fitted above.
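
A quick check of that equivalence on invented data, assuming 'percentage'
is stored as a proportion between 0 and 1 (not 0 to 100). fit_counts and
fit_prop are illustrative names.

```r
# Invented counts: 4 replicates per treatment, 40 trials each.
test <- data.frame(
  treatment = factor(rep(c("control", "treated"), each = 4)),
  positive  = c(12, 15, 9, 14, 22, 25, 19, 27),
  total     = rep(40, 8)
)
test$percentage <- test$positive / test$total   # proportion in [0, 1]

# Two-column (successes, failures) response.
fit_counts <- glm(cbind(positive, total - positive) ~ treatment,
                  data = test, family = binomial)

# Proportion response with the number of trials as prior weights.
fit_prop <- glm(percentage ~ treatment, weights = total,
                data = test, family = binomial)

print(all.equal(coef(fit_counts), coef(fit_prop)))
print(all.equal(deviance(fit_counts), deviance(fit_prop)))
```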
 
 Finally, what is the best way to analyze this kind of dataset 
 where it's almost the same as ANOVA except that the response
 variable
 is a proportion (or success and failure)?
 
  Don't quite know what you mean here.  How is the situation almost
 the same as ANOVA different from the situation you described above?
 Do you mean when there are multiple factors? or ???
 

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com


Re: [R] how to control ticks

2010-12-21 Thread Patrick Burns


I'm not sure, but perhaps you want to
copy the logic of:

http://www.portfolioprobe.com/R/blog/pp.timeplot.R


--
Patrick Burns
pbu...@pburns.seanet.com
twitter: @portfolioprobe
http://www.portfolioprobe.com/blog
http://www.burns-stat.com
(home of 'Some hints for the R beginner'
and 'The R Inferno')


Re: [R] logistic regression or not?

2010-12-21 Thread S Ellison


...and before you believe in overdispersion, make sure you have a
credible explanation for it. All too often, what you really have 
is a model that doesn't fit your data properly.

Well put.

A possible fortune?

S Ellison




Re: [R] logistic regression or not?

2010-12-21 Thread array chip

Thank you Ben, Steve and Peter. 

Ben, my last question was to see if there are other ways of analyzing this type 
of data where the response variable is a proportion, in addition to binomial 
regression. 


BTW, I also found the following is also an equivalent model directly using 
percentage:

glm(log(percentage/(1-percentage))~treatment,data=test)

Thanks

John

 





Re: [R] logistic regression or not?

2010-12-21 Thread Ben Bolker

On 10-12-21 12:20 PM, array chip wrote:
 Thank you Ben, Steve and Peter.
  
 Ben, my last question was to see if there are other ways of analyzing
 this type of data where the response variable is a proportion, in
 addition to binomial regression.
  
 BTW, I also found the following is also an equivalent model directly
 using percentage:
  
 glm(log(percentage/(1-percentage))~treatment,data=test)
  
 Thanks
  
 John
 

  Yes, but this is a different model.

  The model you have here uses Gaussian errors (it is in fact an
identical model, although not necessarily quite an identical algorithm
(?), to just using lm()).  It will fail if you have any percentages that
are 0 or 1.  See Steve's comment about how things were done in the old
days.
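
A short sketch of both claims, on invented proportions: glm() with its
default Gaussian family gives the same coefficients as lm() on the
logit-transformed response, and the transform itself breaks down at 0.

```r
# Invented proportions, none of them exactly 0 or 1.
test <- data.frame(
  treatment  = factor(rep(c("control", "treated"), each = 4)),
  percentage = c(0.30, 0.38, 0.22, 0.35, 0.55, 0.62, 0.48, 0.68)
)

# glm() without a family argument defaults to gaussian, i.e. least squares.
fit_glm <- glm(log(percentage / (1 - percentage)) ~ treatment, data = test)
fit_lm  <- lm(log(percentage / (1 - percentage)) ~ treatment, data = test)
same <- all.equal(coef(fit_glm), coef(fit_lm))
print(same)

# The logit of a proportion of 0 is -Inf, which a least-squares fit
# cannot use as a response value.
logit_of_zero <- log(0 / (1 - 0))
print(logit_of_zero)
```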

  Beta regression (see e.g. the betareg package) is another way of
handling analysis of proportions.

 
 

Re: [R] logistic regression or not?

2010-12-21 Thread array chip

Ben, thanks again.

John






Re: [R] R.matlab memory use

2010-12-21 Thread Henrik Bengtsson

Hi,

I am using Octave; what does that save option do, more specifically,
is compression taking place when saving that file?

If compression is done, then the Rcompression package is utilized by
R.matlab (otherwise not).  BTW, you don't have to load Rcompression
explicitly; R.matlab will do it for you if needed.  So, if you start a
fresh R session and load R.matlab and then try to load your file,
is Rcompression loaded?  If so, what version of Rcompression do you have
installed, i.e. what does

sessionInfo()

report afterward?   Duncan TL did Rcompression updates addressing
memory usage about a year ago (I think) and it might be that you are
using an older version of it.  You should also update R.matlab et al,
because you're using old versions (though I don't think that is the
cause here).

If Rcompression is the cause here, then it also makes sense that you
don't experience the memory hog when reading a text file (which is
never compressed).  You could also see if there is an option in Octave
that saves to binary format but without compression.  I know Matlab
has such options.
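
A small base-R sketch of the check described above, to be run right after
the readMat() attempt. It only asks whether a package is installed and
whether its namespace is currently loaded; it does not call R.matlab or
Rcompression themselves, so it runs even where they are missing. The
variable names are illustrative.

```r
pkg <- "Rcompression"

# system.file() returns "" when the package is not installed.
installed <- nzchar(system.file(package = pkg))

# loadedNamespaces() lists everything loaded so far in this session.
loaded <- pkg %in% loadedNamespaces()

if (installed) {
  cat(pkg, "version:", as.character(packageVersion(pkg)), "\n")
} else {
  cat(pkg, "is not installed\n")
}
cat("loaded in this session:", loaded, "\n")
```

sessionInfo() reports the same information for all attached and loaded
packages at once.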

/Henrik
(author of R.matlab)




On Mon, Dec 20, 2010 at 7:11 AM, Stefano Ghirlanda
dr.ghirla...@gmail.com wrote:
 Hi Ben,
 Thanks for your reply. My data structure is about 2 x 2000 so one
 order of magnitude the one you tried. I have no problem saving and
 reading smaller data structures (even large ones, just not this large)
 between octave and R using octave's save -7 (which saves MATLAB v5
 files) and R.matlab's readMat. And I can save in text format in octave
 and read in R using read.octave (from package foreign) so it's not a
 big deal. I was just surprised that R.matlab needed more memory than I
 have (I have 3GB on this machine).

 Thanks,
 Stefano

 On Sun, Dec 19, 2010 at 10:54 PM, Ben Bolker bbol...@gmail.com wrote:
 Stefano Ghirlanda dr.ghirlanda at gmail.com writes:

 I am trying to load into R a MATLAB format file (actually, as saved by
 octave). The file is about 300kB but R complains with a memory
 allocation error:

  library(Rcompression)
  library(R.matlab)
 Loading required package: R.oo
 Loading required package: R.methodsS3
 R.methodsS3 v1.2.0 (2010-03-13) successfully loaded. See ?R.methodsS3 for
 help.
 R.oo v1.7.2 (2010-04-13) successfully loaded. See ?R.oo for help.
 R.matlab v1.3.1 (2010-04-20) successfully loaded. See ?R.matlab for help.
  f <- readMat("freq.mat")
 Error: cannot allocate vector of size 296.5 Mb

 On the other hand, if I save the same data in ascii format (from
 octave: save -text), resulting in a 75MB file, then I can load it
 without problems with the read.octave() function from package foreign.
 Is this a known issue or am I doing something wrong? My R version is:

  This is not a package I'm particularly familiar with, but:

  what commands did you use to save the file in octave?  Based on
 'help save' I think that 'save' by default would get you an octave
 format file ... you might have to do some careful reading in
 ?readMat (in R) and 'help save' (in octave) to figure out the
 correspondence between octave/MATLAB and R/MATLAB.
   If possible, try saving a small file and see if it works; if
 you still don't know what's going on, post that file somewhere for
 people to try.

  I was able to

 save -6 save.mat in octave and
 readMat("save.mat") in R successfully,
 saving a vector of integers from 1 to 1 million (which
 took about 7.7 Mb)




 --
 Stefano Ghirlanda
 www.intercult.su.se/~stefano - drghirlanda.wordpress.com


Re: [R] Performing basic Multiple Sequence Alignment in R?

2010-12-21 Thread Tal Galili

Hello David, Mike and Thomas

Dear David,
First, my apologies for the double posting - I'll try not to forget that
policy.
Regarding agrep, I think it will be easier for me to work with the functions
in {Biostrings} (for example stringDist, or pairwiseAlignment), than to open
up the C code.
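
For readers without Bioconductor at hand, a minimal base-R sketch of the
same idea of pairwise string distances. It uses utils::adist (generalized
Levenshtein distance), which is a swapped-in stand-in and not the
Biostrings stringDist or pairwiseAlignment functions named above; the
sequences are invented.

```r
# Three short invented DNA-like strings.
seqs <- c("GATTACA", "GATTTACA", "GACTACA")

# Pairwise edit distances: insertions, deletions and substitutions cost 1.
d <- adist(seqs)
print(d)

# A distance matrix like this can feed hierarchical clustering, which is
# one way to pick the order in which pairs are aligned.
hc <- hclust(as.dist(d))
print(hc$merge)
```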


Dear Mike and Thomas,

From what I gathered here (Thanks to Joris Meys):
http://stackoverflow.com/questions/4497747/how-to-perform-basic-multiple-sequence-alignments-in-r/4498434#4498434
There is an R interface to the MUSCLE algorithm in the bio3d package
(function seqaln()).
But not one for clustal.

I will probably end up using pairwiseAlignment on pairs of alignments with
some sort of stopping rules (I'll have to play with it to see how it works).

Thank you all for your answers.
It is always helpful to hear from others whether something was already
implemented in R or not.

Best,
Tal


Contact Details:
Contact me: tal.gal...@gmail.com |  972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
www.r-statistics.com (English)




On Tue, Dec 21, 2010 at 2:44 PM, Mike Marchywka marchy...@hotmail.com wrote:

 e came
 here with a task and was pointed to bio packages but I
 thought there m



Re: [R] R.matlab memory use

2010-12-21 Thread Henrik Bengtsson

On Tue, Dec 21, 2010 at 10:15 AM, Henrik Bengtsson h...@biostat.ucsf.edu 
wrote:
 Hi,

 I am using Octave; what does that save options do, more specifically,
 is compression taking place when saving that file?

That should be: I am [not] using Octave... /H



[R] how to see what's wrong with a self written function?

2010-12-21 Thread casperyc


Hi all,

I am writing a simple function to implement the regula falsi (secant) method.

###
regulafalsi=function(f,x0,x1){
  x=c()
  x[1]=x1
  i=1
  while ( f(x[i])!=0 ) {
    i=i+1
    if (i==2) {
      x[2]=x[1]-f(x[1])*(x[1]-x0)/(f(x[1])-f(x0))
    } else {
      x[i]=x[i-1]-f(x[i-1])*(x[i-1]-x[i-2])/(f(x[i-1])-f(x[i-2]))
    }
  }
  x[i]
}
###

These work fine,
regulafalsi(function(x) x^(1/2)+3*log(x)-5,1,10)
regulafalsi(function(x) x^(1/2)+3*log(x)-5,10,1)

For all x > 0, the function is strictly increasing.

Then

regulafalsi(function(x) x^(1/2)+3*log(x)-5,1,100)

Error in while (f(x[i]) != 0) { : missing value where TRUE/FALSE needed
In addition: Warning message:
In log(x) : NaNs produced

I don't know what happened there. Is there a way to find the value of
f(x[i])
for which R can't determine TRUE/FALSE?
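
One way to see what is going on, sketched under the assumption that the
iteration is a plain secant update as in the function above: keep every
iterate and stop as soon as f() is not finite. secant_trace is a
hypothetical helper, not the poster's function, and it also swaps the
exact test f(x) != 0 for a tolerance.

```r
# Secant iteration that records every iterate and returns early when f()
# is not finite, instead of failing inside a while() condition.
secant_trace <- function(f, x0, x1, tol = 1e-10, max_iter = 100) {
  x <- c(x0, x1)
  for (i in seq_len(max_iter)) {
    n <- length(x)
    f_prev <- f(x[n - 1])
    f_curr <- f(x[n])
    if (!is.finite(f_curr)) {
      return(list(root = NA_real_, iterates = x, converged = FALSE))
    }
    if (abs(f_curr) < tol) {
      return(list(root = x[n], iterates = x, converged = TRUE))
    }
    x <- c(x, x[n] - f_curr * (x[n] - x[n - 1]) / (f_curr - f_prev))
  }
  list(root = NA_real_, iterates = x, converged = FALSE)
}

f <- function(x) x^(1/2) + 3 * log(x) - 5
ok  <- secant_trace(f, 1, 10)
bad <- suppressWarnings(secant_trace(f, 1, 100))
print(ok$root)
print(bad$iterates)  # the last iterate is negative, so log() returns NaN
```

With the starting points 1 and 100 the secant step overshoots to a
negative x, where log(x) is NaN, and NaN != 0 is NA rather than TRUE or
FALSE. For stepping through the original function interactively,
debug(regulafalsi) or a browser() call inside the loop are the usual
base R tools.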

Thanks!

casper
-- 
View this message in context: 
http://r.789695.n4.nabble.com/how-to-see-what-s-wrong-with-a-self-written-function-tp3159528p3159528.html
Sent from the R help mailing list archive at Nabble.com.


[R] variable lengths differ (found for '(weights)') error in Zelig library

2010-12-21 Thread Sotiris Adamakis

Dear R users,

I am trying to estimate the average treatment effect on the
treated (ATT) using first the MatchIt software to weight the data set and,
after this, the Zelig software as shown in Ho et al. (2007). See here for an
explanation of how to apply this technique in R:

http://imai.princeton.edu/research/files/matchit.pdf

I encounter a slight problem when I apply the weights that are produced in
the stage of preprocessing the data. The idea of this is to use the MatchIt
software to preprocess the data and then use the Zelig software to generate
the distribution of ATT. I believe that the main reason for preprocessing
the data is to create weights (depending on the matching technique you use)
so that balance would be achieved for the matching variables between the
treatment and the control group. Then you use these weights in the
regressions that follow in the Zelig library. Copied from the matchit
article, whose link I provide above, the authors say:

"If one chooses options that allow matching with replacement, or any
solution that has different numbers of controls (or treateds) within each
subclass or strata (such as full matching), then the parametric analysis
following matching must accommodate these procedures, such as by using fixed
effects or weights, as appropriate. (Similar procedures can also be used to
estimate various other quantities of interest such as the average treatment
effect by computing it for all observations, but then one must be aware that
the quantity of interest may change during the matching procedure as some
control units may be dropped.)"

The following code is for the lalonde data set, where I get an error
message in the end:

 library(Zelig)
 library(MatchIt)
 data(lalonde)
 m.out1 = matchit(treat ~ age + educ + black + hispan + nodegree + married
+ re74 + re75, method = "subclass", subclass=6, data = lalonde)
 z.out1 = zelig(re78 ~ age + educ + black + hispan + nodegree + married +
re74 + re75, data = match.data(m.out1, "control"), model = "ls",
weights=weights)
 x.out1 = setx(z.out1, data = match.data(m.out1, "treat"), cond = TRUE)
 s.out1 = sim(z.out1, x = x.out1)
Error in model.frame.default(formula = re78 ~ age + educ + black + hispan +  :
  variable lengths differ (found for '(weights)')

I was wondering if somebody could tell me how to get around this
problem?

Also, I have seen people adding the propensity scores in the regression
analysis applied in the Zelig package, i.e.

 z.out1 = zelig(re78 ~ age + educ + black + hispan + nodegree + married +
re74 + re75 + *distance*, data = match.data(m.out1, "control"), model =
"ls", weights=weights)

Does anyone have a clue as to why one would do this?

Kind regards,
Sotiris


[R] Density plot with lattice?

2010-12-21 Thread Marie-Hélène Hachey


Hi,
Is it possible to remove the points at the base of a density plot? I would like
to keep only the curves of the plot, not the points.
Thank you.
Marie-Helene Hachey
M.Sc. student
Universite Laval, Quebec
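
A minimal sketch of one way to do this with lattice, on simulated data
(the poster's data are not shown): densityplot() passes plot.points on to
its panel function, and setting it to FALSE drops the points along the
baseline while keeping the curves.

```r
library(lattice)

set.seed(42)
# Simulated stand-in data: two groups with different means.
d <- data.frame(value = c(rnorm(100), rnorm(100, mean = 2)),
                group = gl(2, 100, labels = c("a", "b")))

# plot.points = FALSE suppresses the points drawn at the base of the plot.
p <- densityplot(~ value, data = d, groups = group,
                 plot.points = FALSE, auto.key = TRUE)
print(p)
```

plot.points = "rug" is an alternative if a less intrusive marker of the
raw data is still wanted.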
  

[R] lm() on a matrix of zoo series

2010-12-21 Thread steven mosher

I have a matrix of zoo series. Each series is in a column.
 x <- as.yearmon(2000 + seq(0, 23)/12)
# 24 months of data, let's make 20 sets of random data
 testData <- matrix(rnorm(480),ncol=20)
# make a zoo object and columns will hold the 20 series
TestZoo  <- zoo(testData,order.by=x)
# now run lm for just one series.
 m <- lm(TestZoo[,1]~time(TestZoo))$coeff[2]
 m
time(TestZoo)
    0.3443124
 m2 <- lm(TestZoo[,2]~time(TestZoo))$coeff[2]
 m2
time(TestZoo)
   -0.1192866

I've been struggling trying to use apply (or something equally suitable) to
get a vector of m for this entire matrix
without resorting to a loop.


Re: [R] lm() on a matrix of zoo series

2010-12-21 Thread Gabor Grothendieck

On Tue, Dec 21, 2010 at 3:02 PM, steven mosher <mosherste...@gmail.com> wrote:
> I have a matrix of zoo series. each series is in a column.
> x <- as.yearmon(2000 + seq(0, 23)/12)
> # 24 months of data, lets make 20 sets of random data
> testData <- matrix(rnorm(480),ncol=20)
> # make a zoo object and columns will hold the 20 series
> TestZoo <- zoo(testData,order.by=x)
> # now run lm for just one series.
> m <- lm(TestZoo[,1]~time(TestZoo))$coeff[2]
> m
> time(TestZoo)
>     0.3443124
> m2 <- lm(TestZoo[,2]~time(TestZoo))$coeff[2]
> m2
> time(TestZoo)
>    -0.1192866
>
> I've been struggling trying to use apply ( or something equally suitable) to
> get a vector of m for this entire matrix
> without resorting to a loop.


Try this:

   lm(TestZoo ~ time(TestZoo))
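
A minimal sketch of why this works, using a plain numeric matrix and a
numeric time vector as stand-ins for the zoo object (neither is the
poster's data): when the response of lm() is a matrix, one regression is
fitted per column, and the second row of coef() holds the slopes.

```r
set.seed(42)
tt <- 2000 + seq(0, 23) / 12                 # 24 monthly time points
testData <- matrix(rnorm(480), ncol = 20)    # 20 series, one per column

# A matrix response fits all 20 regressions in one call.
fit <- lm(testData ~ tt)
slopes <- coef(fit)[2, ]                     # row 1: intercepts, row 2: slopes

# The same slopes, one column at a time, via apply() with a wrapper function.
slopes_apply <- apply(testData, 2, function(y) coef(lm(y ~ tt))[2])
```

If apply() is preferred, the lm() call has to be wrapped in a function as in
the last line, since apply() passes each column as the first argument of
whatever function it is given, and the first argument of lm() is the formula.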


-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


Re: [R] lm() on a matrix of zoo series

2010-12-21 Thread steven mosher

Thanks,

  I was trying apply(TestZoo,2,lm,TestZoo~time(TestZoo))

 which was throwing a formula error.


Re: [R] how to see what's wrong with a self written function?

2010-12-21 Thread Duncan Murdoch


On 21/12/2010 2:39 PM, casperyc wrote:
>
> Hi all,
>
> I am writing a simple function to implement the regula falsi (secant) method.
>
> ###
> regulafalsi=function(f,x0,x1){
>     x=c()
>     x[1]=x1
>     i=1
>     while ( f(x[i])!=0 ) {
>         i=i+1
>         if (i==2) {
>             x[2]=x[1]-f(x[1])*(x[1]-x0)/(f(x[1])-f(x0))
>         } else {
>             x[i]=x[i-1]-f(x[i-1])*(x[i-1]-x[i-2])/(f(x[i-1])-f(x[i-2]))
>         }
>     }
>     x[i]
> }
> ###
>
> These work fine,
> regulafalsi(function(x) x^(1/2)+3*log(x)-5,1,10)
> regulafalsi(function(x) x^(1/2)+3*log(x)-5,10,1)
>
> For all x > 0, the function is strictly increasing.
>
> Then
>
> regulafalsi(function(x) x^(1/2)+3*log(x)-5,1,100)
>
> Error in while (f(x[i]) != 0) { : missing value where TRUE/FALSE needed
> In addition: Warning message:
> In log(x) : NaNs produced
>
> I don't know what happened there. Is there a way to find the value of
> f(x[i]) for which R can't determine TRUE/FALSE?


The easiest is to just use regular old-fashioned debugging methods, i.e. 
insert print() or cat() statements into your function.  You could also 
try debug(regulafalsi) and single step through it to see where things go 
wrong.  (An obvious guess is that one of the values being passed to f is 
negative, but you'll have to figure out why that happened and what to do 
about it.)
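
As a sketch of what such a check might look like (the function name secant
and its tol and maxit arguments are hypothetical, not part of the original
post), the iteration can stop with a clear message as soon as f stops being
finite, and can test against a tolerance instead of exact zero:

```r
# Hypothetical guarded variant of the poster's iteration (secant update).
secant <- function(f, x0, x1, tol = 1e-8, maxit = 100) {
  for (i in seq_len(maxit)) {
    f0 <- f(x0)
    f1 <- f(x1)
    # Stop early, with the offending values, instead of failing inside while().
    if (!is.finite(f0) || !is.finite(f1)) {
      stop("f is not finite at x0 = ", x0, " or x1 = ", x1)
    }
    if (abs(f1) < tol) return(x1)
    x2 <- x1 - f1 * (x1 - x0) / (f1 - f0)   # secant step
    x0 <- x1
    x1 <- x2
  }
  warning("no convergence after ", maxit, " iterations")
  x1
}

f <- function(x) x^(1/2) + 3 * log(x) - 5
root <- secant(f, 1, 10)
```

Calling secant(f, 1, 100) with this version stops with an error that names
the negative iterate. Note that the update above is the secant rule; a true
regula falsi keeps a bracketing interval, which would also keep the iterates
between the two starting points.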


Duncan Murdoch


[R] Keeping Leading Zeros, Treating numbers as text

2010-12-21 Thread James Splinter

Hello,

I have a data set with some numerical values and some non-numerical data. My
issue is that I need to preserve my ID numbers (numerics) with their leading
zeros, but when I import the data into R (it's in .csv format) using the
read.csv() command, it turns all the ID numbers (example: 00210) into
numbers, removing the leading zeros, so I end up with 210. I tried using the
as.is= argument on the column that I wanted to treat as text, but it had no
effect.

Any help would be very much appreciated,

Thanks,

James


Re: [R] Keeping Leading Zeros, Treating numbers as text

2010-12-21 Thread Bert Gunter

Try reading the csv file with, say, Notepad. I think you may find that
the problem is that Excel assumes the column is numeric and strips off
the zeros before saving the file. So you need to tell it that the ID
columns are character before saving.

Then you need to read the Help page for read.csv more carefully,
noting, in particular, the colClasses argument.

-- Bert

-- 
Bert Gunter
Genentech Nonclinical Biostatistics
467-7374
http://devo.gene.com/groups/devo/depts/ncb/home.shtml


Re: [R] Keeping Leading Zeros, Treating numbers as text

2010-12-21 Thread David L Lorenz

James,
  How about 

sprintf('%05d', 210)

  It works for fixed length id numbers.
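
A small sketch of the same idea applied to a whole column, assuming the IDs
were already read as numbers and are all meant to be five characters wide
(the ids vector is made up for illustration):

```r
ids <- c(210L, 42L, 12345L)        # IDs after the leading zeros were lost
padded <- sprintf("%05d", ids)     # left-pad with zeros to width 5
padded
# [1] "00210" "00042" "12345"
```

This only restores the zeros if every ID has the same intended width; IDs of
mixed length cannot be recovered this way.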
Dave



Re: [R] Keeping Leading Zeros, Treating numbers as text

2010-12-21 Thread David Winsemius



On Dec 21, 2010, at 4:11 PM, Bert Gunter wrote:


> Try reading the csv file with, say, Notepad. I think you may find that
> the problem is that Excel assumes the column is numeric and strips off
> the zeros before saving the file. So you need to tell it that the ID
> columns are character before saving.


If Excel turns out to be the culprit, there is an equivalent operation  
to the colClasses specification which you can do to prevent leading  
zeros from being dropped. Select the entire column by clicking on the  
column letter at the top margin of the sheet and then choose Format/ 
Cells/... and pick Text. The same sort of preparation can also save  
you grief with Date types in Excel or OO.org.





David Winsemius, MD
West Hartford, CT


[R] Write.table eol argument

2010-12-21 Thread Jim Moon

Hello All,

R 2.11.1
Windows XP, 32-bit

Help says that default is eol='\n'.  To me, that represents Linefeed (LF)

From Help:
eol the character(s) to print at the end of each line (row). For example, 
eol="\r\n" will produce Windows' line endings on a Unix-alike OS, and eol="\r" 
will produce files as expected by Mac OS Excel 2004.

I would like write.table to end each line with LF only, with no carriage
return (CR).

Default eol='\n'     generates  CRLF
Explicit eol='\n'    generates  CRLF
eol='\r'             generates  CR
eol='\r\n'           generates  (predictably) CRCRLF

Thank you for your time.

Jim



Re: [R] Keeping Leading Zeros, Treating numbers as text

2010-12-21 Thread Jeff Newmiller

Use the colClasses argument with a vector of character strings naming the types 
you want each column to have, and specify "character" for your id column.
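
A minimal sketch of that advice, with a made-up three-column file supplied
inline through the text argument of read.csv() (the column names and values
are assumptions for illustration only):

```r
csv_text <- "id,score,group\n00210,1.5,a\n00042,2.0,b\n"

# One class per column, in file order; the id column is kept as text.
dat <- read.csv(text = csv_text,
                colClasses = c("character", "numeric", "character"))
dat$id
# [1] "00210" "00042"
```

This only helps if the zeros are still present in the file itself, which is
worth checking in a text editor first.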


---
Jeff NewmillerThe .   .  Go Live...
DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
---
Sent from my phone. Please excuse my brevity.


Re: [R] Write.table eol argument

2010-12-21 Thread William Dunlap

At least on Windows, you need to open the
file in binary mode (as opposed to text
mode) to prevent the usual OS-dependent way
of encoding end-of-line. E.g.,

   z <- data.frame(x=1:3, y=state.name[1:3])
   f <- file("tmp.csv", open="wb")
   write.table(z, file=f, quote=FALSE, sep=";", eol="\n")
   close(f) # do not forget to close it!
   system("e:\\cygwin\\bin\\od -c --width=8 tmp.csv")
  000   x   ;   y  \n   1   ;   1   ;
  010   A   l   a   b   a   m   a  \n
  020   2   ;   2   ;   A   l   a   s
  030   k   a  \n   3   ;   3   ;   A
  040   r   i   z   o   n   a  \n
  047
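
Where od is not available, the same check can be sketched in R itself by
reading the file back as raw bytes. This is only an illustration; the
temporary file path is an assumption and not part of the original example.

```r
z <- data.frame(x = 1:3, y = state.name[1:3])
path <- tempfile(fileext = ".csv")

con <- file(path, open = "wb")             # binary mode: no end-of-line translation
write.table(z, file = con, quote = FALSE, sep = ";", eol = "\n")
close(con)

bytes <- readBin(path, what = "raw", n = file.info(path)$size)
has_cr <- any(bytes == as.raw(0x0d))       # carriage return present?
n_lf   <- sum(bytes == as.raw(0x0a))       # number of line feeds
```

With the connection opened in binary mode, has_cr should be FALSE and n_lf
should equal the number of lines written (header plus three rows).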

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com  


Re: [R] Performing basic Multiple Sequence Alignment in R?

2010-12-21 Thread Mike Marchywka

> From: tal.gal...@gmail.com
> Date: Tue, 21 Dec 2010 20:17:18 +0200
> Subject: Re: [R] Performing basic Multiple Sequence Alignment in R?
> To: r-help@r-project.org
>
> Dear Mike and Thomas,
>
> From what I gathered here (Thanks to Joris Meys):
> http://stackoverflow.com/questions/4497747/how-to-perform-basic-multiple-sequence-alignments-in-r/4498434#4498434
> There is an R interface to the MUSCLE algorithm in the bio3d package
> (function seqaln()).
> But not one for clustal.
>
> I will probably end up using pairwiseAlignment on pairs of alignments
> with some sort of stopping rules (I'll have to play with it to see how
> it works).

http://scholar.google.com/scholar?hl=en&q=%22exact+string+matching%22+alignment

http://citeseerx.ist.psu.edu/search?q=exact+string+matching+alignment+dna&submit=Search&sort=rel

Certainly, if you are flexible and can use whatever may be close in R, that
is fine, but I seem to recall that exact string matching was a fast and
interesting way to go, and maybe some of the authors above, in the interest
of promoting their work, would help implement an R version if there is demand.

I seem to recall I did something like building indexes of the strings to be
aligned first, finding substrings that were unique to a given string but
appeared only once in each of the sequences to be aligned (this was the most
restrictive criterion, but you can imagine how to make it more accommodating).
Now that you got me started, up-front tokenizing or compiling of input
sequences (usually no more than indexing them in some way) made many later
operations like alignment go faster. This may have ended up being similar to
BLAST, but now I can't really recall. Anyway, my point here is that somewhere
in R there may be packages that generate intermediate forms useful across
disciplines: mining data from text, linguistics, or macromolecule analysis.
In fact, the indexing process helps find things that have migrated a long way
from their original place, and there are probably other non-alignment-related
things you could get out of the approach.

> Thank you all for your answers.
> It is always helpful to hear from others whether something was already
> implemented in R or not.
>
> Best,
> Tal

 Contact
 Details:---
 Contact me: tal.gal...@gmail.com |
 972-52-7275845
 Read me: www.talgalili.com (Hebrew) |
 www.biostatistics.co.il (Hebrew) |
 www.r-statistics.com (English)
 --

 On Tue, Dec 21, 2010 at 2:44 PM, Mike Marchywka
  wrote:
 e came
 here with a task and was pointed to bio packages but I
 thought there m


[R] Matching 2 SQL tables

2010-12-21 Thread mathijsdevaan


Hi,

I have a postgresql and a mysql database and I would like to combine the
info from two different tables in R. Both databases contain a table with
three columns: project_name, release_id and release_date. So each project
output could be released multiple times (I am interested in the first
release_date). However, some of the data is missing. Basically, what I want
to do is to try and fill the missing data in 1 table with the data from the
other table. The difficulty here is that table1$project_name IS NOT
table2$project_name. Example: green-tree and green tree, new(Jacket) and
newJacket. Could you please help me?
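
One possible approach, sketched here with two small made-up data frames in
place of the real database tables, is to derive a normalized key from
project_name (lower case, punctuation and spaces removed), merge on that key,
and fill the gaps from the other table. The helper normalize_name and all
values below are hypothetical.

```r
# Hypothetical key: lower case, keep only letters and digits.
normalize_name <- function(x) tolower(gsub("[^[:alnum:]]", "", x))

t1 <- data.frame(project_name = c("green-tree", "new(Jacket)"),
                 release_id   = c(1, 1),
                 release_date = as.Date(c("2009-03-01", NA)),
                 stringsAsFactors = FALSE)
t2 <- data.frame(project_name = c("green tree", "newJacket"),
                 release_id   = c(1, 1),
                 release_date = as.Date(c(NA, "2010-06-15")),
                 stringsAsFactors = FALSE)

t1$key <- normalize_name(t1$project_name)
t2$key <- normalize_name(t2$project_name)

# Left join of t1 with t2 on the normalized key and the release id.
m <- merge(t1, t2[, c("key", "release_id", "release_date")],
           by = c("key", "release_id"), all.x = TRUE,
           suffixes = c("", ".other"))

# Fill dates missing in t1 with the date found in t2.
missing <- is.na(m$release_date)
m$release_date[missing] <- m$release_date.other[missing]
```

Names that differ by more than case and punctuation would need approximate
matching instead (agrep() in base R is one starting point), and the matches
should be checked by hand.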

Thanks!

Mathijs 
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Matching-2-SQL-tables-tp3159678p3159678.html
Sent from the R help mailing list archive at Nabble.com.


[R] randomForest: tuneRF error

2010-12-21 Thread Dennis

Just curious if anyone else has got this error before, and if so,
would know what I could do (if anything) to get past it:

> mtry <- tuneRF(training, trainingdata$class, ntreeTry = 500, stepFactor = 2,
  improve = 0.05, trace = TRUE, plot = TRUE, doBest = FALSE)
mtry = 13  OOB error = 0.62%
Searching left ...
mtry = 7OOB error = 1.38%
-1.22 0.05
Searching right ...
mtry = 26   OOB error = 0.24%
0.611 0.05
mtry = 52   OOB error = 0.07%
0.7142857 0.05
mtry = 104  OOB error = 0%
1 0.05
mtry = 173  OOB error = 0%
NaN 0.05
Error in if (Improve > improve) { : missing value where TRUE/FALSE
needed


I've used tuneRF successfully before, but in this instance, no matter
what I change in the parameters, I still get the error above (last
line). The data has no NAs in it. I'm using R 2.12.0 (64bit-M$ Windows
7).
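
Reading the trace, the first number on each line after an mtry step looks
like the relative improvement in OOB error over the previous step. That is an
assumption about how the value is computed, not something stated in the
post. Under that assumption, two consecutive OOB errors of exactly 0 give
0/0, and the comparison that follows has no TRUE/FALSE answer:

```r
# Assumed form of the improvement measure (hypothetical helper).
relative_improvement <- function(err_old, err_new) 1 - err_new / err_old

relative_improvement(0.0007, 0)   # 1: error dropped from 0.07% to 0%
imp <- relative_improvement(0, 0) # NaN: error was already 0

# if() cannot branch on NaN, which reproduces the kind of error shown above.
msg <- tryCatch(if (imp > 0.05) "continue" else "stop",
                error = function(e) conditionMessage(e))

# A guard such as isTRUE() treats the undefined case as "no improvement".
keep_going <- isTRUE(imp > 0.05)
```

If that is what is happening, the search has already reached an OOB error of
0%, so there is nothing left for it to improve.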

Thanks in advance!


[R] Link prediction in social network with R

2010-12-21 Thread EU JIN LOK


Dear R users
 
I'm a novice user of R and have absolutely no prior knowledge of social network 
analysis, so apologies if my question is trivial. I've spent a lot of time 
trying to solve this on my own but I really can't, so I hope someone here can 
help me out. Cheers!
 
The dataset:
I'm trying to predict the existence of links (True or False) in a test set 
using a training set. Both data sets are in an edge list format, where user 
IDs represent nodes in both columns, with the 1st column directing to the 2nd 
column (see figure 1 below). Using the AUC to evaluate the performance, I am 
looking for the best algorithm to predict the existence of links in the test 
data (50% are true and the rest are false).
 
Figure 1:
> training
Vertices: 1133143 
Edges: 999 
Directed: TRUE 
Edges:

[0]   105 ->  850956
[1]   105 -> 1073420
[2]   105 -> 1102667
[3]   165 ->  888346
[4]   165 ->  579649
[5]   165 ->  136665
etc..
 
I'm having problems obtaining the probability scores for the links / edges, as 
most of the scores are for the nodes. Examples of this are the graph.knn and 
page.rank functions in igraph. 
 
So my questions are:
1) What do I need to do to obtain the scores for the links instead of the nodes 
(I presume there must be a data preparation step that I am missing)?
2) Which R package would be the best for running the various techniques: 
Jaccard index, Adamic-Adar, common neighbours, PropFlow, etc.?
3) How do I implement a supervised learning method such as random forest (I am 
guessing I need to obtain a feature list, but again, how can I get the scores 
for the edges)? 
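 
As a rough sketch of how per-link scores can be built, the measures named in
question 2 are all functions of a pair of nodes, so they can be computed for
each candidate pair and stored as feature columns. The code below uses base R
only, a made-up six-edge list, and ignores edge direction; all three are
simplifying assumptions, and the function names are hypothetical.

```r
edges <- data.frame(from = c(1, 1, 2, 2, 3, 4),
                    to   = c(2, 3, 3, 4, 4, 5))

# Neighbours of a node, ignoring edge direction.
neighbours <- function(v) {
  unique(c(edges$to[edges$from == v], edges$from[edges$to == v]))
}

common_neighbours <- function(u, v) {
  length(intersect(neighbours(u), neighbours(v)))
}

jaccard <- function(u, v) {
  both <- union(neighbours(u), neighbours(v))
  if (length(both) == 0) return(0)
  common_neighbours(u, v) / length(both)
}

# Adamic-Adar: shared neighbours weighted by 1 / log(degree).
adamic_adar <- function(u, v) {
  shared  <- intersect(neighbours(u), neighbours(v))
  degrees <- vapply(shared, function(w) length(neighbours(w)), numeric(1))
  sum(1 / log(degrees))
}

# One row per candidate link; the score columns are the features.
pairs <- data.frame(u = c(1, 1, 2), v = c(4, 5, 5))
pairs$cn      <- mapply(common_neighbours, pairs$u, pairs$v)
pairs$jaccard <- mapply(jaccard, pairs$u, pairs$v)
pairs$aa      <- mapply(adamic_adar, pairs$u, pairs$v)
```

A data frame like pairs, with a TRUE/FALSE column for whether the link
exists, is the kind of input a classifier such as a random forest expects.
For a graph with over a million vertices these simple lookups would be far
too slow, so a graph package would be needed in practice.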
 
I hope I've explained my questions well, but do let me know if more
clarification is needed. 
 
Thanks in advance
Eu Jin

Re: [R] Density plot with lattice?

2010-12-21 Thread Dennis Murphy

Hi:

Try this:

library(lattice)
densityplot( ~ height | voice.part, data = singer, layout = c(2, 4),
             xlab = "Height (inches)", bw = 5)
densityplot( ~ height | voice.part, data = singer, layout = c(2, 4),
             xlab = "Height (inches)", bw = 5, plot.points = FALSE)

The plot.points argument is actually associated with panel.densityplot(); in
this case, you can pass it from within densityplot().

HTH,
Dennis


[R] please Help me on a repeated measures anova

2010-12-21 Thread soileil


I am currently working on a draft of an aquatic bioassessment. The conditions
tested are the following:

ER      river water
T+0.5   dechlorinated water control + 0.5 mg/L of malate
T+1     dechlorinated water control + 1 g/L of malate
TED     dechlorinated water control
SED+ER  sediment + river water
SED+ED  sediment + dechlorinated water

The measurements are of AChE in muscle (fish fillet). The production of
acetylcholine is followed with a spectrophotometer every 15 seconds for two
minutes. The results are presented in the following table:


traitement   t15   t30   t45   t60   t75   t90  t105  t120
ER         0.100 0.110 0.123 0.135 0.147 0.159 0.171 0.182
ER         0.112 0.134 0.153 0.174 0.192 0.208 0.226 0.251
T+0.5      0.078 0.082 0.088 0.094 0.101 0.108 0.113 0.120
t+0.5      0.053 0.100 0.109 0.120 0.127 0.136 0.145 0.154
TED        0.107 0.126 0.141 0.161 0.172 0.184 0.200 0.213
TED        0.117 0.135 0.153 0.169 0.183 0.201 0.218 0.229
TED        0.124 0.145 0.163 0.187 0.208 0.227 0.244 0.259
T+1        0.109 0.119 0.134 0.148 0.163 0.174 0.187 0.202
T+1        0.118 0.134 0.153 0.170 0.184 0.197 0.214 0.228
SED+ER     0.158 0.175 0.194 0.208 0.226 0.240 0.259 0.268
SED+ED     0.119 0.140 0.157 0.174 0.192 0.208 0.225 0.240
SED+ED     0.101 0.113 0.180 0.140 0.154 0.166 0.179 0.190
SED+ED     0.129 0.135 0.140 0.146 0.153 0.159 0.165 0.172


The statistical test under consideration is a repeated measures ANOVA, but I
do not know how to do it in R. I looked through the forums and downloaded the
R package 'nlme', with which I should be able to use the function 'lme'. But
the problem is that I cannot work out how to write the call to this function.
Could you help me?
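
Most repeated measures functions in R expect one row per measurement rather
than one row per sample, so a first step is usually to reshape the table. The
sketch below does this with base R's reshape(); the column names fish, time
and ache are hypothetical, treating each row as one fish is an assumption,
and the lower-case "t+0.5" label is read as "T+0.5".

```r
wide <- read.table(header = TRUE, text = "
traitement t15 t30 t45 t60 t75 t90 t105 t120
ER     0.100 0.110 0.123 0.135 0.147 0.159 0.171 0.182
ER     0.112 0.134 0.153 0.174 0.192 0.208 0.226 0.251
T+0.5  0.078 0.082 0.088 0.094 0.101 0.108 0.113 0.120
T+0.5  0.053 0.100 0.109 0.120 0.127 0.136 0.145 0.154
TED    0.107 0.126 0.141 0.161 0.172 0.184 0.200 0.213
TED    0.117 0.135 0.153 0.169 0.183 0.201 0.218 0.229
TED    0.124 0.145 0.163 0.187 0.208 0.227 0.244 0.259
T+1    0.109 0.119 0.134 0.148 0.163 0.174 0.187 0.202
T+1    0.118 0.134 0.153 0.170 0.184 0.197 0.214 0.228
SED+ER 0.158 0.175 0.194 0.208 0.226 0.240 0.259 0.268
SED+ED 0.119 0.140 0.157 0.174 0.192 0.208 0.225 0.240
SED+ED 0.101 0.113 0.180 0.140 0.154 0.166 0.179 0.190
SED+ED 0.129 0.135 0.140 0.146 0.153 0.159 0.165 0.172
")

wide$fish <- seq_len(nrow(wide))             # one id per row (assumed: one fish)
time_cols <- paste0("t", seq(15, 120, by = 15))

# Long format: one row per fish and time point.
long <- reshape(wide, direction = "long",
                varying = time_cols, v.names = "ache",
                timevar = "time", times = seq(15, 120, by = 15),
                idvar = "fish")
```

With the data in this shape, a call along the lines of
lme(ache ~ traitement * time, random = ~ 1 | fish, data = long) from nlme
would be one candidate model; whether it suits this design is a separate
question.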
-- 
View this message in context: 
http://r.789695.n4.nabble.com/please-Help-me-on-a-repeated-measures-anova-tp3159868p3159868.html
Sent from the R help mailing list archive at Nabble.com.


[R] installation of R/parallel package in win32/64

2010-12-21 Thread Andy Zhu

This is to summarize my workaround for installing R/parallel on win32/64 boxes.
Recently I had problems installing rparallel:
1. The package's Makevars.win hard-codes an include path for Rtools.
2. The package is written in C++; Rtools and R are not set up to run g++ by 
default.

My workaround:
1. Install R and Rtools as usual.  Note that Rtools 2.12 has both 32-bit and 
64-bit mingw.  However, it doesn't include the packages for g++ and libstdc++.  
You need to install these 2 packages into Rtools from SourceForge first.
2. In the rparallel source tree, open Makevars.win and change the PKG_CPPFLAGS 
and PKG_LIBS variables to point to your correct rtools and rtools/mingw 
directories.
3. In R_installation/etc, open i386 or win64 (I forget the name for 64-bit) and 
open the Makeconf file; look for DLLFLAGS += ... and append -static-libstdc++.  
This flag will cause g++ to statically link in libstdc++.

Then you can run the usual R CMD INSTALL rparallel.

Good luck.


  

Re: [R] how to see what's wrong with a self written function?

2010-12-21 Thread jim holtman

Here is what I get when I have:

options(error=utils::recover)

I always run with the option so that on an error, I get dumped in the
browser to see what is happening.  It appears that 'i == 3' when the
error occurs and you can also see the values of 'x':



> regulafalsi=function(f,x0,x1){
+     x=c()
+     x[1]=x1
+     i=1
+     while ( f(x[i])!=0 ) {
+         i=i+1
+         if (i==2) {
+             x[2]=x[1]-f(x[1])*(x[1]-x0)/(f(x[1])-f(x0))
+         } else {
+             x[i]=x[i-1]-f(x[i-1])*(x[i-1]-x[i-2])/(f(x[i-1])-f(x[i-2]))
+         }
+     }
+     x[i]
+ }
> regulafalsi(function(x) x^(1/2)+3*log(x)-5,10,1)
[1] 2.978429
> regulafalsi(function(x) x^(1/2)+3*log(x)-5,1,100)
Error in while (f(x[i]) != 0) { : missing value where TRUE/FALSE needed
In addition: Warning message:
In log(x) : NaNs produced

Enter a frame number, or 0 to exit

1: regulafalsi(function(x) x^(1/2) + 3 * log(x) - 5, 1, 100)

Selection: 1
Called from: top level
Browse[1]> i
[1] 3
Browse[1]> x
[1] 100.00000  18.35661 -42.22301
Browse[1]>


On Tue, Dec 21, 2010 at 2:39 PM, casperyc caspe...@hotmail.co.uk wrote:

 Hi all,

 I am writing a simple function to implement regularfalsi (secant) method.

 ###
 regulafalsi=function(f,x0,x1){
        x=c()
        x[1]=x1
        i=1
        while ( f(x[i])!=0 ) {
                i=i+1
                if (i==2) {
                        x[2]=x[1]-f(x[1])*(x[1]-x0)/(f(x[1])-f(x0))
                } else {
                        
 x[i]=x[i-1]-f(x[i-1])*(x[i-1]-x[i-2])/(f(x[i-1])-f(x[i-2]))
                }
        }
        x[i]
 }
 ###

 These work fine,
 regulafalsi(function(x) x^(1/2)+3*log(x)-5,1,10)
 regulafalsi(function(x) x^(1/2)+3*log(x)-5,10,1)

 For all x0, the function is strictly increasing.

 Then

 regulafalsi(function(x) x^(1/2)+3*log(x)-5,1,100)

 Error in while (f(x[i]) != 0) { : missing value where TRUE/FALSE needed
 In addition: Warning message:
 In log(x) : NaNs produced

 I dont know what happened there, is there a way to find the value for
 f(x[i])
 that R can't determine TRUE/FALSE?

 Thanks!

 casper
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/how-to-see-what-s-wrong-with-a-self-written-function-tp3159528p3159528.html
 Sent from the R help mailing list archive at Nabble.com.





-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?


[R] matrix indexing in 'for' loop?

2010-12-21 Thread govindas



Hi,

I am having trouble with matrices. I have 2 matrices as given below, and I am 
interested in using these matrices inside for loops used to calculate 
correlations. I am creating a list with the names of the matrices assuming this 
list could be indexed inside the 'for' loop to retrieve the matrix values. But, 
as expected the code throws out an error. Can someone suggest a better way to 
call these matrices inside the loops?

ts.m.dmi <- matrix(c(1:20), 4, 5) 
ts.m.soi <- matrix(c(21:40), 4, 5) 
ts.m.pe <- matrix(c(21:40), 4, 5) 

factors <- c("ts.m.dmi", "ts.m.soi")
for (j in 0:1){
  y <- factors[j+1]

  for (i in 1:5){

    cor.pe.y <- cor(ts.m.pe[,2], y[,i])
    ct.tst <- cor.test(ts.m.pe[,2], y[,i])
  }
}
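
A sketch of one way to make a loop of this shape run, under the reading that
factors holds the names of the matrices, so that y is a character string and
y[,i] cannot work: store the matrices themselves in a list, index it with
[[ ]], and save each result. Random data is used here, as an assumption, so
that the correlations are not all exactly 1; cors and pvals are hypothetical
names.

```r
set.seed(1)
ts.m.dmi <- matrix(rnorm(20), 4, 5)
ts.m.soi <- matrix(rnorm(20), 4, 5)
ts.m.pe  <- matrix(rnorm(20), 4, 5)

# A list keeps each matrix intact; c() would flatten them into one vector.
factors <- list(dmi = ts.m.dmi, soi = ts.m.soi)

# One row per column of y, one column per matrix in the list.
cors <- matrix(NA_real_, nrow = 5, ncol = length(factors),
               dimnames = list(NULL, names(factors)))
pvals <- cors

for (j in seq_along(factors)) {
  y <- factors[[j]]                      # [[ ]] returns the matrix itself
  for (i in seq_len(ncol(y))) {
    ct <- cor.test(ts.m.pe[, 2], y[, i])
    cors[i, j]  <- ct$estimate
    pvals[i, j] <- ct$p.value
  }
}
```

The results end up in two 5 x 2 matrices instead of being overwritten on
every pass through the loop.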

Thanks for your time. 

-- 
Regards,
Maha
Graduate Student

Re: [R] matrix indexing in 'for' loop?

2010-12-21 Thread Phil Spector


To make your loop work, you need to learn about the get function.
I'm not going to give you the details because there are better
approaches available.
First, let's make some data that will give values which can 
be verified.  (All the correlations of the data you created
are exactly equal to 1.)  And to make the code readable, I'll
omit the ts.m prefix.


set.seed(14)
dmi = matrix(rnorm(20),4,5)
soi = matrix(rnorm(20),4,5)
pe = matrix(rnorm(20),4,5)
allmats = list(dmi,soi)


Since cor.test won't automatically do the tests for all columns
of a matrix, I'll write a little helper function:


gettests = function(x)apply(x,2,function(col)cor.test(pe[,2],col))
tests = lapply(allmats,gettests)


Now tests is a list of length 2, with a list of the output from
cor.test for the five columns of each matrix with pe[,2].
(Notice that in your program you made no provision to store 
the results anywhere.)


Suppose you want the correlations:


sapply(tests,function(x)sapply(x,function(test)test$estimate))

   [,1]   [,2]
cor  0.12723615  0.1342751
cor  0.07067819  0.6228158
cor -0.28761533  0.6218661
cor  0.83731828 -0.9602551
cor -0.36050836  0.1170035

The probabilities for the tests can be found similarly:


sapply(tests,function(x)sapply(x,function(test)test$p.value))

          [,1]       [,2]
[1,] 0.8727638 0.86572490
[2,] 0.9293218 0.37718416
[3,] 0.7123847 0.37813388
[4,] 0.1626817 0.03974489
[5,] 0.6394916 0.88299648

(Take a look at the Values section in the help file for cor.test
to get the names of other quantities of interest.)
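
As a hedged illustration of that last remark, the same nested sapply pattern can pull out other components of each cor.test result. This is only a sketch: it rebuilds the data and helper from above so it runs on its own, and the names tstats and dfs are made up for the example.

```r
# Rebuild the example data and helper so this block is self-contained
set.seed(14)
dmi = matrix(rnorm(20), 4, 5)
soi = matrix(rnorm(20), 4, 5)
pe  = matrix(rnorm(20), 4, 5)
allmats = list(dmi, soi)

gettests = function(x) apply(x, 2, function(col) cor.test(pe[, 2], col))
tests = lapply(allmats, gettests)

# t statistics: one row per matrix column, one column per matrix
tstats = sapply(tests, function(x) sapply(x, function(test) test$statistic))

# degrees of freedom (n - 2 for the Pearson test, so 2 with 4 rows)
dfs = sapply(tests, function(x) sapply(x, function(test) test$parameter))

tstats
dfs
```

Both results have the same 5 x 2 layout as the correlation and p-value tables above.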

The main advantage to this approach is that if you add more matrices
to the allmats list, the other steps automatically take it into account.
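
A small sketch of that point, under a couple of assumptions: the list is given names (not done in the code above) so the result columns are labelled, and a third matrix called nino is invented purely for illustration. The helper getcors is also hypothetical.

```r
set.seed(14)
mats = list(dmi = matrix(rnorm(20), 4, 5),
            soi = matrix(rnorm(20), 4, 5))
pe = matrix(rnorm(20), 4, 5)

gettests = function(x) apply(x, 2, function(col) cor.test(pe[, 2], col))

# Hypothetical wrapper: run the tests and collect the correlations
getcors = function(mats)
    sapply(lapply(mats, gettests),
           function(x) sapply(x, function(test) test$estimate))

cors2 = getcors(mats)                  # columns: dmi, soi

# Add a third (made-up) matrix; nothing else needs to change
mats$nino = matrix(rnorm(20), 4, 5)
cors3 = getcors(mats)                  # columns: dmi, soi, nino

colnames(cors3)
```

Naming the list elements is a design choice rather than a requirement; it just makes the columns of the result self-describing.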

Hope this helps.
- Phil Spector
 Statistical Computing Facility
 Department of Statistics
 UC Berkeley
 spec...@stat.berkeley.edu





On Tue, 21 Dec 2010, govin...@msu.edu wrote:




Hi,

I am having trouble with matrices. I have 2 matrices as given below, and I am interested 
in using these matrices inside for loops used to calculate correlations. I am 
creating a list with the names of the matrices assuming this list could be indexed inside 
the 'for' loop to retrieve the matrix values. But, as expected the code throws out an 
error. Can someone suggest a better way to call these matrices inside the loops?

ts.m.dmi <- matrix(c(1:20), 4, 5)
ts.m.soi <- matrix(c(21:40), 4, 5)
ts.m.pe <- matrix(c(21:40), 4, 5)

factors <- c("ts.m.dmi", "ts.m.soi")
for (j in 0:1){
y <- factors[j+1]

for (i in 1:5){

cor.pe.y <- cor(ts.m.pe[,2], y[,i])
ct.tst <- cor.test(ts.m.pe[,2], y[,i])
}
}

Thanks for your time.

--
Regards,
Maha
Graduate Student

Re: [R] please Help me on a repeated measures anova

2010-12-21 Thread Dennis Murphy

Hi:

I did the following (note a fix to the assumed typo, t+0.5 -> T+0.5)
using the melt() function in package reshape2, the lattice graphics package
and package lme4. I named your input data df.
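
For anyone wanting to reproduce this, here is one way the input might be set up. It is a sketch, not necessarily how df was actually created: it assumes the table from the original post is pasted in as text (typo in row 4 included), and it also shows a base-R reshape() call as a package-free alternative to melt(). The names txt, wide and long are made up for the example.

```r
# Assumption: the table from the original post, pasted verbatim
# (row 4 still has the 't+0.5' typo that is fixed further down).
txt <- "traitement t15 t30 t45 t60 t75 t90 t105 t120
ER 0.100 0.110 0.123 0.135 0.147 0.159 0.171 0.182
ER 0.112 0.134 0.153 0.174 0.192 0.208 0.226 0.251
T+0.5 0.078 0.082 0.088 0.094 0.101 0.108 0.113 0.120
t+0.5 0.053 0.100 0.109 0.120 0.127 0.136 0.145 0.154
TED 0.107 0.126 0.141 0.161 0.172 0.184 0.200 0.213
TED 0.117 0.135 0.153 0.169 0.183 0.201 0.218 0.229
TED 0.124 0.145 0.163 0.187 0.208 0.227 0.244 0.259
T+1 0.109 0.119 0.134 0.148 0.163 0.174 0.187 0.202
T+1 0.118 0.134 0.153 0.170 0.184 0.197 0.214 0.228
SED+ER 0.158 0.175 0.194 0.208 0.226 0.240 0.259 0.268
SED+ED 0.119 0.140 0.157 0.174 0.192 0.208 0.225 0.240
SED+ED 0.101 0.113 0.180 0.140 0.154 0.166 0.179 0.190
SED+ED 0.129 0.135 0.140 0.146 0.153 0.159 0.165 0.172"

df <- read.table(text = txt, header = TRUE, stringsAsFactors = TRUE)
str(df)

# Base-R alternative to melt(): wide to long with stats::reshape()
wide <- df
wide$subject <- seq_len(nrow(wide))
long <- reshape(wide, direction = "long",
                varying = paste0("t", seq(15, 120, by = 15)),
                v.names = "value", timevar = "time",
                times = seq(15, 120, by = 15), idvar = "subject")
head(long)
```

The reshape() route gives a numeric time column directly, so the step that strips the 't' prefix would not be needed with it.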

library(reshape) # or reshape2 if you have it
# Fix the typo:
df[4, 1] <- 'T+0.5'
# Redefine the factor to produce the correct number of levels:
df$traitement <- factor(df$traitement)
# Create a subject variable to distinguish profiles in time
df$subject <- as.numeric(row.names(df))
# reshape the data from wide format to long
dm <- melt(df, id = c('traitement', 'subject'))
# sort the reshaped data frame
dm <- dm[order(dm$traitement, dm$subject, dm$variable), ]
head(dm)
# Create a numeric time variable by stripping off the 't's
dm$time <- as.numeric(sub('^t','', dm$variable))

# Plot the individual profiles over time by treatment type
library(lattice)
xyplot(value ~ time | traitement, data = dm, groups = subject,
       type = c('p', 'l'))

# The individual profiles are almost uniformly linearly increasing
# with a couple of obvious nonconforming points visible in the plots.
# There are mean differences among treatments,
# but also unbalanced replication in subjects. Treatment (SED + ER) has only
# one subject.

# One way to fit a model (REML = FALSE requests maximum likelihood):
library(lme4)
m1 <- lmer(value ~ traitement + (1 + time | subject), data = dm, REML = FALSE)
summary(m1)

This fits a mixed effects model with random subjects and time as a repeated
measures variable, using maximum likelihood to fit the model. This
particular specification treats time as numeric rather than factor because
the linear component is so strong, but it is possible to replace it with the
factor version instead (variable in data frame dm). The output of this model
fit shows a very small residual effect, a strong correlation between time
and subject (the sign seems wrong, though) and about the same amount of
variation between subjects as within subjects. This is a model you should
seriously consider, as it takes proper account of the randomness of subjects
and the nesting of time as a linear effect within subject. I would encourage
you to follow this direction, but there is much to learn if you are to use
the lme4 package.

I suspect, however, you're looking for something more along the lines of
Anova() in the car package, which uses the 'traditional' ANOVA approach to
repeated measures models. If you go in this direction, be sure you
understand the underlying assumptions of the model.

For multiple comparisons, which I presume you'll want to investigate,
there's the TukeyHSD() function that you could use with Anova(), or for more
general methods, the multcomp package, which has a function glht() that can
be used with a mixed effects model per above or with an Anova() object. The
multcomp package has several useful vignettes and a recent book that
describes its essential features.

Refs:
Bretz, Hothorn and Westfall (2010). Multiple Comparisons Using R. Chapman &
Hall.
Fox and Weisberg (2011). An R Companion to Applied Regression, 2nd ed. Sage
Publications.  (Just out!)

HTH,
Dennis


On Tue, Dec 21, 2010 at 3:10 PM, soileil soil...@msn.com wrote:


 I am currently working on a draft of an aquatic bioassessment. The conditions
 tested are the following:
   ER      river water
   T+0.5   dechlorinated water control + 0.5mg / L of malate
   T+1     dechlorinated water control + 1g / L of malate
   TED     dechlorinated water control
   SED+ER  sediment + river water
   SED+ED  sediment + dechlorinated water
 It is the result of AChE in muscle (fillet of fish). The production of
 acetylcholine is followed with a spectrophotometer every 15 seconds for
 two minutes. The results are presented in the following table:


 traitement t15 t30 t45 t60 t75 t90 t105 t120
 ER 0.100 0.110 0.123 0.135 0.147 0.159 0.171 0.182
 ER 0.112 0.134 0.153 0.174 0.192 0.208 0.226 0.251
 T+0.5 0.078 0.082 0.088 0.094 0.101 0.108 0.113 0.120
 t+0.5 0.053 0.100 0.109 0.120 0.127 0.136 0.145 0.154
 TED 0.107 0.126 0.141 0.161 0.172 0.184 0.200 0.213
 TED 0.117 0.135 0.153 0.169 0.183 0.201 0.218 0.229
 TED 0.124 0.145 0.163 0.187 0.208 0.227 0.244 0.259
 T+1 0.109 0.119 0.134 0.148 0.163 0.174 0.187 0.202
 T+1 0.118 0.134 0.153 0.170 0.184 0.197 0.214 0.228
 SED+ER 0.158 0.175 0.194 0.208 0.226 0.240 0.259 0.268
 SED+ED 0.119 0.140 0.157 0.174 0.192 0.208 0.225 0.240
 SED+ED 0.101 0.113 0.180 0.140 0.154 0.166 0.179 0.190
 SED+ED 0.129 0.135 0.140 0.146 0.153 0.159 0.165 0.172


 The appropriate statistical test is thought to be a repeated measures anova,
 but I do not know how to do it in R. I searched the forums and downloaded the
 R package 'nlme', with which I should be able to use the function 'lm'. But
 the problem is that I cannot work out how to code this function. Could you
 help me?
 --
 View this message in context:
 http://r.789695.n4.nabble.com/please-Help-me-on-a-repeated-measures-anova-tp3159868p3159868.html
 Sent from the R help mailing list archive at Nabble.com.


Re: [R] how to control ticks

2010-12-21 Thread csrabak


On 21/12/2010 14:35, peter dalgaard wrote:


On Dec 21, 2010, at 17:01 , Edwin Groot wrote:


On Tue, 21 Dec 2010 18:06:52 +0530
Yogesh Tiwari yogesh@googlemail.com wrote:

Hi,
I want 12 ticks at axis 1 and want to write Jan-Dec on each.

something like:

axis(1, at=1:12,
labels=c('J','F','M','A','M','J','J','A','S','O','N','D'))

I could omit default ticks but now how to control ticks.



Dear Yogesh,
I spray my clothing with No-Bite, and that controls ticks quite well.


Yeah, but then how do you get the suckers to sit still while you write on them?

;-)



You start telling them a story so they keep quiet paying attention?

LOL


[R] Estimate between-axes vs within-axes heterogeneity of multivariate matrices

2010-12-21 Thread Nikos Alexandris

Hi!

My question(s) in the end might be silly but I am no expert on this, so here 
it goes:

Noy-Meir (1973), Pielou (1984) and a few others have pointed out that
non-centered PCA is useful in some cases. They clearly explain that this is
the case when multi-dimensional data display distinct clusters (which have
zero, or near-zero, projections in some subset of the axes) and the task is
(exactly) to separate these clusters among the principal components.

I have done my complete work using prcomp() and tested combinations of 
center=FALSE/TRUE and scale=FALSE/TRUE. I would like to now check this 
between-axes vs within-axes heterogeneity of my data and cross-check 
results with the various tested PCA-versions.
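
For concreteness, the four prcomp() variants could be run side by side roughly as follows. This is only a sketch: the built-in USArrests data stand in for my own data, and the names combos, fits and var_share are made up for the example.

```r
# Stand-in data; replace with the real multivariate matrix
X <- USArrests

# All four combinations of centering and scaling
combos <- expand.grid(center = c(TRUE, FALSE), scale. = c(TRUE, FALSE))

fits <- lapply(seq_len(nrow(combos)), function(k)
    prcomp(X, center = combos$center[k], scale. = combos$scale.[k]))
names(fits) <- paste0("center=", combos$center, ", scale=", combos$scale.)

# Share of total variance carried by each component, one column per variant
var_share <- sapply(fits, function(p) p$sdev^2 / sum(p$sdev^2))
round(var_share, 3)
```

This only lines up the variance shares of the variants; it is not by itself a measure of between-axes vs within-axes heterogeneity.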

Is there any (official or custom) function available in R that could answer 
this question? Some relative/comparative (preferably simple and intuitive) 
measure(s)? Something that would perhaps give a graphical indication 
without time-consuming clustering, sampling or any other processing?

Even though the above-mentioned authors describe some measure for the asymmetry 
of the resulting components (uncentered - unipolar, centered - bipolar), I 
find the concept a bit hard to understand.

Isn't there a quick way (function) to just say (with numbers or plots, of 
course) "well, it seems that the data are heterogeneous looking at 
between-axes" or, the other way around, "it looks like the variables differ 
within more than between"?

Apologies for repeating the same question (trying to understand the problem 
myself). Thank you, Nikos

Re: [R] zoo.read intraday data

2010-12-21 Thread szimine


Hi Gabor et al.

the 
 f3 <- function(...) as.POSIXct(paste(...), format = "%Y%m%d %H:%M:%S")

helped me to read intraday data from file
##
<TICKER>,<NAME>,<PER>,<DATE>,<TIME>,<OPEN>,<HIGH>,<LOW>,<CLOSE>,<VOL>,<OPENINT>
ICE.BRN,ice.brn_m5,5,20100802,10:40:00,79.21000,79.26000,79.16000,79.2,238,0
ICE.BRN,ice.brn_m5,5,20100802,10:45:00,79.19000,79.26000,79.19000,79.21000,413,0
##

##intraday data 5m  file
fnameId= "./finam_brn_m5.csv"
pDateTimeColumns <- list(4,5)
b <- read.zoo(fnameId, index=pDateTimeColumns, sep=",", header=TRUE,
              FUN=f3)
xb <- as.xts(b)


> head(b,2) ##
                    X.TICKER. X.NAME.    X.PER. X.OPEN. X.HIGH. X.LOW. X.CLOSE. X.VOL. X.OPENINT.
2010-08-02 10:40:00 ICE.BRN   ice.brn_m5 5      79.21   79.26   79.16  79.20    238    0
2010-08-02 10:45:00 ICE.BRN   ice.brn_m5 5      79.19   79.26   79.19  79.21    413    0

The problem is that after the conversion to xts, the numeric values got
converted to chars:

> head(xb,2)
                    X.TICKER. X.NAME.    X.PER. X.OPEN. X.HIGH. X.LOW. X.CLOSE. X.VOL. X.OPENINT.
2010-08-02 10:40:00 ICE.BRN   ice.brn_m5 5      79.21   79.26   79.16  79.20    238    0
2010-08-02 10:45:00 ICE.BRN   ice.brn_m5 5      79.19   79.26   79.19  79.21    413    0


and quantmod charting does  not work.

Q.  how to prevent converting to char with xts ? 

I suspect the problem is that index is constructed from two columns  date 
and time. 
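
A possible explanation, offered as a sketch rather than a confirmed diagnosis: zoo and xts keep their data in a single matrix, so the character columns (ticker and name) would force every column to character, regardless of how the index is built. Dropping those columns at read time should keep the rest numeric. The block below uses base R only, with the two sample rows inlined and simplified header names; raw, num and idx are made-up names.

```r
# A matrix holds one type only: one character column makes everything character
mixed <- as.matrix(data.frame(ticker = "ICE.BRN", open = 79.21))
is.character(mixed)

# Two sample rows, inlined so the example is self-contained
csv <- "TICKER,NAME,PER,DATE,TIME,OPEN,HIGH,LOW,CLOSE,VOL,OPENINT
ICE.BRN,ice.brn_m5,5,20100802,10:40:00,79.21000,79.26000,79.16000,79.2,238,0
ICE.BRN,ice.brn_m5,5,20100802,10:45:00,79.19000,79.26000,79.19000,79.21000,413,0"

# "NULL" skips a column: drop TICKER, NAME and PER, keep DATE/TIME as text
raw <- read.csv(text = csv,
                colClasses = c("NULL", "NULL", "NULL",
                               "character", "character",
                               rep("numeric", 6)))

# Date-time index built from the two columns (tz = "UTC" is an assumption)
idx <- as.POSIXct(paste(raw$DATE, raw$TIME),
                  format = "%Y%m%d %H:%M:%S", tz = "UTC")

# Purely numeric data matrix
num <- as.matrix(raw[, c("OPEN", "HIGH", "LOW", "CLOSE", "VOL", "OPENINT")])
is.numeric(num)

# If xts is installed, the numeric matrix can be wrapped directly
if (requireNamespace("xts", quietly = TRUE)) {
    xb <- xts::xts(num, order.by = idx)
    print(head(xb, 2))
}
```

With read.zoo() itself, passing a similar colClasses argument through to read.table should have the same effect, though that is untested here.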


> sessionInfo()
R version 2.12.1 (2010-12-16)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
locale:
[1] C
attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base 
other attached packages:
[1] quantmod_0.3-15 TTR_0.20-2  Defaults_1.1-1  xts_0.7-6.11  zoo_1.7-0


Slava

-- 
View this message in context: 
http://r.789695.n4.nabble.com/zoo-read-intraday-data-tp3010256p3160102.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] how to control ticks

2010-12-21 Thread Yogesh Tiwari

Hi Jim,

Yes, you are right, file$time is a decimal date. In the attached plot I want
to replace the decimal date with a proper time axis so I can show month
ticks. Decimal dates are sometimes misleading to interpret. The data run
from Jan to Dec 2009.
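
With a decimal-date x axis running from 2009 to 2010, ticks requested at 1:12 fall far outside the plotted range, which would explain why nothing is drawn and no error appears. A minimal sketch of placing the month ticks in decimal-date units follows. It assumes the decimal date is defined as year + (day of year - 1)/365, and it uses synthetic values in place of file$time and file$ch4; the names month_starts and tick_at are made up.

```r
# Synthetic stand-in for file$time and file$ch4 (made-up values)
time <- 2009 + (0:364) / 365
ch4  <- 1.6 + 0.05 * sin(2 * pi * (time - 2009))

# First day of each month in 2009, converted to decimal dates
# (assumes decimal date = year + (day of year - 1) / 365)
month_starts <- seq(as.Date("2009-01-01"), by = "month", length.out = 12)
tick_at <- 2009 + (as.numeric(format(month_starts, "%j")) - 1) / 365

plot(time, ch4 * 1000, ylim = c(1500, 1700), xaxt = "n", xlab = NA,
     ylab = NA, col = "blue", yaxs = "i", lwd = 2, pch = 10, type = "b")

# Ticks now lie inside the x range, so they are drawn
axis(1, at = tick_at,
     labels = c('J','F','M','A','M','J','J','A','S','O','N','D'))
```

If the decimal dates were computed with a different convention, tick_at would need to follow the same convention.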

Thanks,

Yogesh

On Tue, Dec 21, 2010 at 9:57 PM, jim holtman jholt...@gmail.com wrote:

 What is the structure of file$time? Is it Date/POSIXct?  'at=1:12'
 only works if those are the dimensions of file$time.  So give us an
 idea of what the data is (PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html and provide commented,
 minimal, self-contained, reproducible code).



 On Tue, Dec 21, 2010 at 7:36 AM, Yogesh Tiwari
 yogesh@googlemail.com wrote:
  Hi,
  I want 12 ticks at axis 1 and want to write Jan-Dec on each.
 
  something like:
 
  axis(1, at=1:12,
 labels=c('J','F','M','A','M','J','J','A','S','O','N','D'))
 
  I could omit default ticks but now how to control ticks.
 
  plot(file$time, file$ch4*1000, ylim=c(1500,1700), xaxt='n', xlab= NA,
  ylab=NA,col="blue",yaxs="i",lwd=2, pch=10, type="b")#
 
  axis(1, at=1:12,
 labels=c('J','F','M','A','M','J','J','A','S','O','N','D'))
 
  BUT above is not working, and there is no error as well.
 
  Pls help,
 
  Regards,
  Yogesh
 



 --
 Jim Holtman
 Data Munger Guru

 What is the problem that you are trying to solve?




-- 
Yogesh K. Tiwari (Dr.rer.nat),
Scientist,
Centre for Climate Change Research,
Indian Institute of Tropical Meteorology,
Homi Bhabha Road,
Pashan,
Pune-411008
INDIA

Phone: 0091-99 2273 9513 (Cell)
 : 0091-20-25904452 (O)
Fax: 0091-20-258 93 825
