Re: [R] Odp: For-loop
Hi Petr,

thank you, I got it. In fact I was looking for the function aggregate(), which I didn't know.

    aggregate(x = t(data), by = list(cov$Month, cov$Area), FUN = sum)

does exactly what I need.

Anne

-----Original Message-----
From: Petr PIKAL
Sent: 20.12.2010 14:23:50
To: Anne-Christine Mupepele
Subject: Odp: [R] For-loop

Hi

r-help-boun...@r-project.org wrote on 20.12.2010 11:48:51:

> Hi,
> I have the following problem: I have a data.frame with 36 sample sites (columns) for which I have covariates in 3 categories: Area, Month and River. Each Area consists of 3 rivers, which were sampled over 3 months. Now I want to fuse Rivers 1-3 for one area in one month, to get a data.frame with 12 columns.
> I am trying to do this with a for loop (which may be a complicated solution, but I don't see an easier way). It is not working, apparently because a[,ij] or a[,c(i,j)] does not work as a way to address the matrix columns with a double condition.
> How can I make it work, or what would be an easier solution?
>
> Thank you for your help,
> Anne
>
> data=data.frame(matrix(1:99,nrow=5,ncol=36))
> colnames(data)=c(paste("plot",1:36))
> cov=data.frame(rep(1:3,12),c(rep("Jan",12),rep("Feb",12),rep("Mar",12)),rep(c(1,1,1,2,2,2,3,3,3,4,4,4),3))
> dimnames(cov)=list(colnames(data),c("River","Month","Area"))
>
> ###loop###
> a=matrix(nrow=dim(data)[1],ncol=length(levels(factor(cov$Month)))*length(levels(factor(cov$Area))))
> for(i in 1:length(levels(factor(cov$Month))))
> {
> for(j in 1:length(levels(factor(cov$Area))))
> {
> a[,ij]=as.numeric(rowSums(data[,factor(cov$Month)==levels(factor(cov$Month))[i] & factor(cov$Area)==levels(factor(cov$Area))[j]]))
> }
> }

I am not exactly sure what you want to do. What operation is "fuse"?

If it is a sum, then with your data you can do

    area <- rep(1:12, each=3)
    data.t <- t(data)
    aggregate(data.t, list(area), sum)
       Group.1  V1  V2  V3  V4  V5
    1        1  18  21  24  27  30
    2        2  63  66  69  72  75
    3        3 108 111 114 117 120
    4        4 153 156 159 162 165
    5        5 198 201 204 207 210
    6        6 243 246 249 252 255
    7        7 189 192 195 198 102
    8        8  36  39  42  45  48
    9        9  81  84  87  90  93
    10      10 126 129 132 135 138
    11      11 171 174 177 180 183
    12      12 216 219 222 225 228

    t(aggregate(data.t, list(area), sum))
            [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
    Group.1    1    2    3    4    5    6    7    8    9    10    11    12
    V1        18   63  108  153  198  243  189   36   81   126   171   216
    V2        21   66  111  156  201  246  192   39   84   129   174   219
    V3        24   69  114  159  204  249  195   42   87   132   177   222
    V4        27   72  117  162  207  252  198   45   90   135   180   225
    V5        30   75  120  165  210  255  102   48   93   138   183   228

but then there is the Month value, which is not apparent from your example. Maybe

    t(aggregate(data.t, list(area, data.t$Month), sum))

could do the trick, but you probably need to show us str() and/or head() of your real data.

Regards
Petr

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
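For readers who want to see what the loop in the question was aiming at, here is a minimal sketch, assuming "fuse" means a row-wise sum over the three rivers of each Month/Area combination. The names grp and fused are illustrative and do not come from the thread; the example data are rebuilt as in the question.

```r
# Rebuild the example data from the question. 1:99 is recycled to fill the
# 5 x 36 matrix, which triggers a harmless warning, silenced here.
data <- data.frame(suppressWarnings(matrix(1:99, nrow = 5, ncol = 36)))
colnames(data) <- paste("plot", 1:36)
cov <- data.frame(River = rep(1:3, 12),
                  Month = rep(c("Jan", "Feb", "Mar"), each = 12),
                  Area  = rep(rep(1:4, each = 3), 3),
                  row.names = colnames(data))

# One group label per column: the Month/Area combination (12 groups).
grp <- interaction(cov$Month, cov$Area, drop = TRUE)

# For every group, sum its three member columns row by row.
fused <- sapply(split(seq_len(ncol(data)), grp),
                function(idx) rowSums(data[, idx, drop = FALSE]))

dim(fused)  # 5 rows, one column per Month/Area combination
```

Splitting the column indices by a single combined factor avoids the a[, ij] indexing problem, because each result column is named by its group rather than addressed by a pair of loop counters.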
[R] R CMD build/install: wrong Rtools include path is passed to g++
Hi:

I am trying to build/install the rparallel source package on win32 using Rtools and R CMD. However, R CMD build or install fails. The R CMD build output shows that the Rtools/MinGW/include path passed to g++ via -I is wrong. How can I pass or configure the correct include path for R CMD?

I tried this in both R 2.12 and 2.11 with the compatible Rtools and the MiKTeX/chm helper. Neither succeeded.

Note that the R/Rtools/MinGW setup works fine if the package doesn't have C/C++ code: I was able to install my own R package, which doesn't have C/C++ code.
Re: [R] monthly median in a daily dataset
Hi Dennis,

I am looking for a similar function and this post is useful. But a strange thing happens when I try it, which I couldn't figure out (details below). Could you or anyone help me understand why this is so?

    df = data.frame(date = seq(as.Date("2010-1-1"), by = "days", length = 250))
    df$value = cumsum(rnorm(1:250))

When I use the statement as given in the ?aggregate help file, the following error is displayed:

    aggregate(df$value, by = months(df$date), FUN = median)
    Error in aggregate.data.frame(as.data.frame(x), ...) :
      'by' must be a list

But it works when I use the form that was suggested:

    aggregate(value ~ months(date), data = df, FUN = median)
      months(date)      value
    1        April 15.5721440
    2       August -0.1261205
    3     February -1.0230631
    4      January -0.9277885
    5         July -2.1890907
    6         June  1.3045260
    7        March 11.4126371
    8          May  2.1625091

My second question: is it possible to get the median per month and year? Say I have daily data for the last five years; the above call gives me the median of January over all five years, while I want Jan-2010, Jan-2009 and so on.

I hope my question is clear. Any assistance will be greatly appreciated.

Regards,
Krishna

Date: Sun, 19 Dec 2010 15:42:15 -0800
From: Dennis Murphy djmu...@gmail.com
To: HUXTERE emilyhux...@gmail.com
Cc: r-help@r-project.org
Subject: Re: [R] monthly median in a daily dataset
Message-ID: aanlktimxtjhbse1mq4o121fekxtf8d1psyeegzkkz...@mail.gmail.com
Content-Type: text/plain

Hi:

There is a months() function associated with Date objects, so you should be able to do something like

    aggregate(value ~ months(date), data = data$flow$daily, FUN = median)

Here's a toy example because your data are not in a ready form:

    df <- data.frame(date = seq(as.Date('2010-01-01'), by = 'days', length = 250), val = rnorm(250))
    aggregate(val ~ months(date), data = df, FUN = median)
      months(date)         val
    1        April -0.18864817
    2       August -0.16203705
    3     February  0.03671700
    4      January  0.04500988
    5         July -0.12753151
    6         June  0.09864811
    7        March  0.23652105
    8          May  0.25879994
    9    September  0.53570764

HTH,
Dennis

On Sun, Dec 19, 2010 at 2:31 PM, HUXTERE emilyhux...@gmail.com wrote:

Hello,

I have a multi-year dataset (see below) with a date, a data value and a flag for the data value. I want to find the monthly median for each month in this dataset and then plot it. If anyone has suggestions they would be greatly appreciated. It should be noted that there are some dates with no values, and they should be removed.

Thanks
Emily

    print ( str(data$flow$daily) )
    'data.frame':   16071 obs. of  3 variables:
     $ date :Class 'Date'  num [1:16071] -1826 -1825 -1824 -1823 -1822 ...
     $ value: num  NA NA NA NA NA NA NA NA NA NA ...
     $ flag : chr ...
    NULL

    520   2008-11-01 0.034
    1041  2008-11-02 0.034
    1562  2008-11-03 0.034
    2083  2008-11-04 0.038
    2604  2008-11-05 0.036
    3125  2008-11-06 0.035
    3646  2008-11-07 0.036
    4167  2008-11-08 0.039
    4688  2008-11-09 0.039
    5209  2008-11-10 0.039
    5730  2008-11-11 0.038
    6251  2008-11-12 0.039
    6772  2008-11-13 0.039
    7293  2008-11-14 0.038
    7814  2008-11-15 0.037
    8335  2008-11-16 0.037
    8855  2008-11-17 0.037
    9375  2008-11-18 0.037
    9895  2008-11-19 0.034 B
    10415 2008-11-20 0.034 B
    10935 2008-11-21 0.033 B
    11455 2008-11-22 0.034 B
    11975 2008-11-23 0.034 B
    12495 2008-11-24 0.034 B
    13016 2008-11-25 0.034 B
    13537 2008-11-26 0.033 B
    14058 2008-11-27 0.033 B
    14579 2008-11-28 0.033 B
    15068 2008-11-29 0.034 B
    15546 2008-11-30 0.035 B
    521   2008-12-01 0.035 B
    1042  2008-12-02 0.034 B
    1563  2008-12-03 0.033 B
    2084  2008-12-04 0.031 B
    2605  2008-12-05 0.031 B
    3126  2008-12-06 0.031 B
    3647  2008-12-07 0.032 B
    4168  2008-12-08 0.032 B
    4689  2008-12-09 0.032 B
    5210  2008-12-10 0.033 B
    5731  2008-12-11 0.033 B
    6252  2008-12-12 0.032 B
    6773  2008-12-13 0.031 B
    7294  2008-12-14 0.030 B
    7815  2008-12-15 0.030 B
    8336  2008-12-16 0.029 B
    8856  2008-12-17 0.028 B
    9376  2008-12-18 0.028 B
    9896  2008-12-19 0.028 B
    10416 2008-12-20 0.027 B
    10936 2008-12-21 0.027 B
    11456 2008-12-22 0.028 B
    11976 2008-12-23 0.028 B
    12496 2008-12-24 0.029 B
    13017 2008-12-25 0.029 B
    13538 2008-12-26 0.029 B
    14059 2008-12-27 0.030 B
    14580 2008-12-28 0.030 B
    15069 2008-12-29 0.030 B
    15547 2008-12-30 0.031 B
    15851 2008-12-31 0.031 B

--
View this message in context: http://r.789695.n4.nabble.com/monthly-median-in-a-daily-dataset-tp3094917p3094917.html
Sent from the R help mailing list archive at Nabble.com.
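Both of Krishna's questions can be illustrated with a short sketch. The error arises because the by argument of aggregate() must be a list, even for a single grouping vector. For medians per month and year, one option (an assumption on my part, not something proposed in the thread) is to group on a year-month key built with format(). The data below are simulated, and the column name ym is made up for the example.

```r
set.seed(1)
# Simulated daily series covering two full years (2009 and 2010).
df <- data.frame(date = seq(as.Date("2009-01-01"), by = "day", length.out = 730))
df$value <- cumsum(rnorm(nrow(df)))

# 'by' has to be a list; wrapping the grouping vector in list() avoids the error.
by_month <- aggregate(df$value, by = list(month = months(df$date)), FUN = median)

# A year-month key such as "2009-01" keeps January 2009 and January 2010 apart.
df$ym <- format(df$date, "%Y-%m")
by_ym <- aggregate(value ~ ym, data = df, FUN = median)

head(by_ym)
```

With the formula interface, rows whose value is NA are dropped by default, which matches the request in the original post to ignore dates without values. The "%Y-%m" keys also sort chronologically, unlike month names.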
[R] predict function for kmeans
Hi,

I am using the kmeans algorithm to cluster my training dataset. After the model is generated, I need to apply it to my production dataset and see which clusters the new observations fall into. But I am unable to find a predict function for kmeans to do this. Could you please let me know if there is a predict function in R to perform this?

In SPSS, once the kmeans model is generated, it can be applied to a new dataset to find the clusters. I am trying to do something similar in R.

Thanks in advance.

--
View this message in context: http://r.789695.n4.nabble.com/predict-function-for-kmeans-tp3121557p3121557.html
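The stats package has no predict() method for kmeans objects, but assigning new observations to the nearest fitted centre is straightforward. The following is a minimal sketch under that assumption; predict_kmeans is a hypothetical helper written for this example, not a function from R or from the thread, and the training data are simulated.

```r
set.seed(42)
# Simulated training data: two well-separated groups in two dimensions.
train <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
               matrix(rnorm(100, mean = 5), ncol = 2))
fit <- kmeans(train, centers = 2)

# Hypothetical helper: assign each row of newdata to the closest cluster centre.
predict_kmeans <- function(fit, newdata) {
  newdata <- as.matrix(newdata)
  # Squared Euclidean distance from every new row to every centre.
  d2 <- sapply(seq_len(nrow(fit$centers)), function(k)
    rowSums(sweep(newdata, 2, fit$centers[k, ])^2))
  d2 <- matrix(d2, nrow = nrow(newdata))  # keep matrix shape for a single row
  max.col(-d2, ties.method = "first")     # index of the smallest distance
}

new_points <- rbind(c(0, 0), c(5, 5))
predict_kmeans(fit, new_points)
```

New data should be preprocessed exactly like the training data (for example, scaled with the same centring and scaling values) before the distances are computed.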
Re: [R] R CMD build/install: wrong Rtools include path is passed to g++
Never mind. Found the solution: the package hard-coded the Rtools path in Makevars.win. So I was able to compile (though I now have another problem).

I am not sure whether there is an environment variable for the Rtools location, maybe RTOOLS_HOME ...

Thanks.

----- Forwarded Message -----
From: Andy Zhu andyzh...@yahoo.com
Cc: r-help@r-project.org
Sent: Mon, December 20, 2010 11:33:31 PM
Subject: [R] R CMD build/install: wrong Rtools include path is passed to g++

Hi:

I am trying to build/install the rparallel source package on win32 using Rtools and R CMD. However, R CMD build or install fails. The R CMD build output shows that the Rtools/MinGW/include path passed to g++ via -I is wrong. How can I pass or configure the correct include path for R CMD?

I tried this in both R 2.12 and 2.11 with the compatible Rtools and the MiKTeX/chm helper. Neither succeeded.

Note that the R/Rtools/MinGW setup works fine if the package doesn't have C/C++ code: I was able to install my own R package, which doesn't have C/C++ code.
[R] replace values of a table !!!
Dear all,

I am a relatively new user. I have an ASCII file with 550 rows and 400 columns. The file contains values ranging from 1 to 2000 and some values given as "-". I want to generate a new file in which the "-" values are replaced with 0 and all other values with 1.0.

What should I do?

Thanks
Taiseer
[R] Performing basic Multiple Sequence Alignment in R?
Hello everyone,

I am not sure if this should go on the general R mailing list (for example, if there is a text mining solution that might work here) or the Bioconductor mailing list (since I wasn't able to find a solution to my question by searching their lists), so this time I tried both, and in the future I'll know better (in case it should go to only one of the two).

The task I'm trying to achieve is to align several sequences together. I don't have a basic pattern to match to. All that I know is that the true pattern should be of length 30 and that the sequences I'm looking at have had missing values introduced into them at random points.

Here is an example of such sequences, where on the left we see the real location of the missing values, and on the right we see the sequence that we are able to observe. My goal is to reconstruct the left column using only the sequences in the right column (based on the fact that many of the letters in each position are the same).

      Real_sequence                  The_sequence_we_see
    1 CGCAATACTAAC-AGCTGACTTACGCACCG CGCAATACTAACAGCTGACTTACGCACCG
    2 CGCAATACTAGC-AGGTGACTTCC-CT-CG CGCAATACTAGCAGGTGACTTCCCTCG
    3 CGCAATGATCAC--GGTGGCTCCCGGTGCG CGCAATGATCACGGTGGCTCCCGGTGCG
    4 CGCAATACTAACCA-CTAACT--CGCTGCG CGCAATACTAACCACTAACTCGCTGCG
    5 CGCACGGGTAAGAACGTGA-TTACGCTCAG CGCACGGGTAAGAACGTGATTACGCTCAG
    6 CGCTATACTAACAA-GTG-CTTAGGC-CTG CGCTATACTAACAAGTGCTTAGGCCTG
    7 CCCA-C-CTAA-ACGGTGACTTACGCTCCG CCCACCTAAACGGTGACTTACGCTCCG

Here is example code to reproduce the above example:

    ATCG <- c("A","T","C","G")
    set.seed(40)
    original.seq <- sample(ATCG, 30, T)
    seqS <- matrix(original.seq,200,30, T)
    change.letters <- function(x, number.of.changes = 15, letters.to.change.with = ATCG)
    {
        number.of.changes <- sample(seq_len(number.of.changes), 1)
        new.letters <- sample(letters.to.change.with , number.of.changes, T)
        where.to.change.the.letters <- sample(seq_along(x) , number.of.changes, F)
        x[where.to.change.the.letters] <- new.letters
        return(x)
    }
    change.letters(original.seq)
    insert.missing.values <- function(x) change.letters(x, 3, "-")
    insert.missing.values(original.seq)

    seqS2 <- t(apply(seqS, 1, change.letters))
    seqS3 <- t(apply(seqS2, 1, insert.missing.values))

    seqS4 <- apply(seqS3,1, function(x) {paste(x, collapse = "")})
    require(stringr)
    # library(help=stringr)
    # str_replace_all() removes every "-"; str_replace() would remove only the first one
    all.seqS <- str_replace_all(seqS4,"-" , "")

    # how do we align this?
    data.frame(Real_sequence = seqS4, The_sequence_we_see = all.seqS)

I understand that if all I had was a string and a pattern I would be able to use

    library(Biostrings)
    pairwiseAlignment(...)

But in the case I present we are dealing with many sequences to align to one another (instead of aligning them to one pattern).

Is there a known method for doing this in R?

Thanks,
Tal

----------------Contact Details:----------------
Contact me: tal.gal...@gmail.com | 972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English)
Re: [R] .Rd file for S4-method warning
Hi Duncan,

thanks for the quick answer! Oops, I really overlooked the correct argument specs...

In the beginning roxygen did not support S4, but by now it does. The usage line is created automatically by roxygen in this case (only when using S4 classes, and I am not sure when and when not). Not being a roxygen expert, I feel that roxygen limits the creation of .Rd files at some points, but it is still a great thing that I use a lot.

For those interested, here is how it worked for me with roxygen:

    #' Show method for testClass
    #'
    #' @param object a \code{testClass} object
    #' @docType methods
    #' @aliases show, testClass-method
    #' @usage \S4method{show}{testClass}(object)
    #'
    setMethod("show", "testClass", function(object){ })

Mark

On 20.12.2010 at 23:31, Duncan Murdoch wrote:

> On 20/12/2010 5:18 PM, Mark Heckmann wrote:
>> Dear R users,
>>
>> I want to create a proper .Rd file for the show method for an S4 class. I am encountering problems in the \usage{} line, I guess.
>>
>> An example:
>>
>> setClass("testClass", representation(a="character"))
>> setMethod("show", "testClass", function(object){ })
>>
>> The .Rd file:
>>
>> \name{show,-method}
>> \alias{show,testClass-method}
>> \alias{show}
>> \title{Show method for testClass...}
>> \usage{\S4method{show}{testClass}(object)
>> }
>> \description{Show method for testClass}
>> \arguments{\item{testClass}{object}
>> }
>>
>> CHECK says:
>>
>> * checking Rd \usage sections ... WARNING
>> Undocumented arguments in documentation object 'show,-method'
>> object
>>
>> What would be a correct \usage line? Writing R Extensions says:
>> \S4method{generic}{signature_list}(argument_list)
>
> That's okay; the warning is about the fact that you didn't document "object" in the \arguments section. You had
>
> \item{testClass}{object}
>
> but you should have had
>
> \item{object}{some description of what object is}
>
> As yours was written, it's documentation for the testClass argument, which doesn't exist.
>
>> What am I doing wrong? It works, though, if I simply delete the \usage line. Unfortunately I use roxygen and the line is created automatically, so I need to create it properly.
>
> Does roxygen also create the argument? Looks like a bug or limitation (I seem to recall that roxygen doesn't support S4, or didn't in the past...)
>
> Duncan Murdoch
>
>> Thanks in advance,
>> Mark
>>
>> Mark Heckmann
>> Blog: www.markheckmann.de
>> R-Blog: http://ryouready.wordpress.com

Mark Heckmann
Dipl. Wirt.-Ing. cand. Psych.
Celler Straße 27
28205 Bremen
Blog: www.markheckmann.de
R-Blog: http://ryouready.wordpress.com
[R] NA's in survey analysis
Hello,

I am trying to analyze sociological survey data using R. In survey work it is often important to calculate not only the counts and percentages of the factor levels (easily done with describe()), but also the number and percentage of NAs. Often it is also important to present NAs in graphs alongside the factor levels.

Is there any easy way to make R treat NAs as if they were a factor level like any other?

At the moment, describe(data$a) gives me percentages only for the factor levels, so I have to redo the percentages manually. barplot() also ignores NAs, so to include NAs in a barplot I need to build the table more or less manually.

The other way to do it is to convert NAs into a factor level (doable, although, unlike in SPSS, I cannot assume that 99 is a good code for the NA level; it has to be the next number in the factor list, so it might be different for each column in a data frame). And besides, I have read somewhere on this list that IT IS THE WRONG WAY TO DO STUFF IN R :)

Is there a right way to do what I want, and if not, what are the possible workarounds, smarter than the ones I listed?

--
Donatas Glodenis
[R] Odp: NA's in survey analysis
Hi

r-help-boun...@r-project.org wrote on 21.12.2010 11:02:07:

> Hello,
>
> I am trying to analyze sociological survey data using R. It is often important in survey to calculate both the actual factor sums and percentages (easily done with describe() ), but also the numbers and total percentage of NA's. Often it is important to present NA's in graphs besides the factors.
>
> Is there any easy way to make R treat NA's as if those were factors besides other factors?
>
> Now, describe(data$a) gives me percentages only for the factors. So I have to redo percentages manually. barplot() also ignores NA's. So, to include NA's into barplot I need to do a table more or less manually.
>
> The other way to do it is to convert NA's into factors (doable, although, unlike in SPSS, I cannot make an assumption that 99 is a

It is not necessary to code missing values; you can set NA as one level.

    x <- factor(sample(c(1:3, NA),20,replace=T), exclude=NULL)
    x
     [1] 1    1    3    3    3    2    3    <NA> 3    1    2    <NA> 3    <NA> 2
    [16] 2    3    1    <NA> 3
    Levels: 1 2 3 <NA>
    y <- rnorm(20)
    boxplot(split(y,x))

Besides, you could find it on the factor help page, as I did.

Regards
Petr

> good code for a factor NA – it has to be the next number in the factor list, so, might be different for each column in a data frame). And besides, I have read somewhere in this list that IT IS THE WRONG WAY TO DO STUFF IN R :)
>
> Is there the right way to do things that I want, and if not – what are the possible workarounds, smarter than the ones I listed?
>
> --
> Donatas Glodenis
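To complement the factor(..., exclude = NULL) approach above, here is a minimal sketch of how NA counts and percentages can be tabulated and plotted with base R. The answers vector and the label "No answer" are made up for illustration.

```r
# Hypothetical survey answers with some missing values.
answers <- c("yes", "no", NA, "yes", "maybe", NA, "no", "yes")

# Counts and percentages that include the missing values.
counts <- table(answers, useNA = "ifany")
pct <- round(100 * prop.table(counts), 1)
pct

# Alternatively, turn NA into an explicit, labelled factor level,
# so that barplot() and friends show it like any other category.
f <- addNA(factor(answers))
levels(f)[is.na(levels(f))] <- "No answer"
barplot(table(f), ylab = "Respondents")
```

Because the NA level is added per variable, no numeric code such as 99 has to be reserved, which addresses the concern about different codes in different columns.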
[R] Odp: replace values of a table !!!
Hi

r-help-boun...@r-project.org wrote on 21.12.2010 09:59:31:

> Dear all,
>
> I am a relatively new user. I have an ascii file with 550 rows and 400 columns. The file contain values ranging from 1 to 2000 and some values with -. I want to generate a new file where the - values are replaced with 0 values, the other values with the 1.0 value.

Do you want to use R for it? If yes, you can read the file and set "-" as the missing value (see ?read.table).

After that you can change the non-NA values to 1 by

    your.data[!is.na(your.data)] <- 1

and the NA values to 0 by

    your.data[is.na(your.data)] <- 0

Regards
Petr

> What should I do,
>
> Thanks
> Taiseer
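Put together, the read / recode / write steps might look like the sketch below. It assumes a whitespace-separated file without a header line; the tiny temporary file stands in for the real 550 x 400 table, and the names infile, outfile and recoded are illustrative only.

```r
# Hypothetical input standing in for the real ASCII table.
infile <- tempfile(fileext = ".txt")
writeLines(c("12 - 2000", "- 7 431"), infile)

# Treat "-" as missing while reading.
your.data <- read.table(infile, na.strings = "-")

# Missing -> 0, everything else -> 1 (equivalent to the two assignments above).
recoded <- ifelse(is.na(your.data), 0, 1)

# Write the recoded table without row or column names.
outfile <- tempfile(fileext = ".txt")
write.table(recoded, outfile, row.names = FALSE, col.names = FALSE)
readLines(outfile)
```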
[R] labels and barchart
Hello,

I'm wondering how to set the value of mar ( par( mar=c(...)) ) so that the labels are visible in a barplot. Is there any relation between the number of characters in a label and the second value of mar? Look at my example.

    x <- seq(20, 100, by=15)
    ety <- rep( "Effect on treatment group", times=length(x))
    barplot(x, names.arg=ety, las=1, horiz=TRUE)

The labels are not visible. But by trial and error with the second mar value I get what I want.

    par(mar=c(3,12,2,1), cex=0.8)
    barplot(x, names.arg=ety, las=1, horiz=TRUE)

I would like something like this:

    second.mar = max( nchar(ety) )/2

Taking the opportunity, I have two more questions:

1. The space between the labels and the bars is too big. How can I change it to the width of one character?
2. In the example above the x axis is too short. How can I make R draw the axis line a little longer than the maximum bar length? I know that I could set xlim=c(0,max(x)), but because the tick increment is 20 and the last value is 95, it doesn't solve the problem. The increment is OK; only the line should be longer.

Thank you
Robert
Re: [R] loading workspace- getting annoying
Also, rm(list=ls()) will remove everything from your workspace except objects whose names begin with a dot (use rm(list=ls(all.names=TRUE)) to remove those as well). The next time you quit and save the workspace, you will start with an empty workspace.

--
View this message in context: http://r.789695.n4.nabble.com/loading-workspace-getting-annoying-tp3004781p3138203.html
[R] RE : replace values of a table !!!
I suppose what you want to do is something like:

    dat <- matrix(c(2:13,"-"),nc=4)
    dat
    dat[dat== "-"] <- 0   # replace the - by 0
    dat

Please be careful and think twice about what you are doing to your data by changing some values. Maybe you would rather replace the "-" values by NA?

HTH,
Wolfgang

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Wolfgang Raffelsberger, PhD
Laboratoire de BioInformatique et Génomique Intégratives
IGBMC, 1 rue Laurent Fries, 67404 Illkirch Strasbourg, France
Tel (+33) 388 65 3300
Fax (+33) 388 65 3276
wolfgang.raffelsber...@igbmc.fr

From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] on behalf of Taiseer Aljazzar [taljaz...@yahoo.com]
Sent: Tuesday, 21 December 2010 09:59
To: r-help@r-project.org
Subject: [R] replace values of a table !!!

Dear all,

I am a relatively new user. I have an ascii file with 550 rows and 400 columns. The file contain values ranging from 1 to 2000 and some values with -. I want to generate a new file where the - values are replaced with 0 values, the other values with the 1.0 value.

What should I do,

Thanks
Taiseer
Re: [R] labels and barchart
Hello, Robert,

see hints below.

On Tue, 21 Dec 2010, Robert Ruser wrote:

> Hello,
> I'm wondering how to set a value of mar ( par( mar=c(...)) ) in order to allow labels to be visible in barplot. Is there any relation between the number of characters in a label and the second value of mar? Look at my example.
>
> x <- seq(20, 100, by=15)
> ety <- rep( "Effect on treatment group", times=length(x))
> barplot(x, names.arg=ety, las=1, horiz=TRUE)
>
> Labels are not visible. But trial and error method with the second mar argument I get what I want.
>
> par(mar=c(3,12,2,1), cex=0.8)
> barplot(x, names.arg=ety, las=1, horiz=TRUE)
>
> I would like something like that:
> second.mar = max( nchar(ety) )/2

Can't help with that really, but ...

> Taking the opportunity I have 2 another question:
> 1. Space between labels and bars is too big - how to change it to the value of 1 character?
> 2. In the example above the x axis is too short. How to make R draw a line little longer then maximum bar length. I know that I could set xlim=c(0,max(x)) but because of main increase equals 20 and the last value 95 it doesn't solve the problem. The increase is ok. but only line should be longer.

You could take a look at par()'s argument mgp, but it affects both axes at the same time. I have the impression that you want more control over the style of each axis separately; axis() might then be useful, like

    par( mar = c( 3, 13, 2, 1), cex = 0.8)
    barplot( x, names.arg = NULL, horiz = TRUE, axes = FALSE)
    axis( side = 1, at = c( seq( 0, 80, by = 20), 95))
    axis( side = 2, at = 1:length(ety), line = -1, las = 1, tick = FALSE, labels = ety)

Hth,
Gerrit

> Thank you
> Robert
Re: [R] labels and barchart
Robert Ruser wrote:

> x <- seq(20, 100, by=15)
> ety <- rep( "Effect on treatment group", times=length(x))
> barplot(x, names.arg=ety, las=1, horiz=TRUE)
>
> Labels are not visible. But trial and error method with the second mar argument I get what I want.

Standard graphics has fallen a bit out of favor because of these quirks. Try lattice:

    library(lattice)
    x <- seq(20, 100, by=15)
    ety <- paste("Effect on treatment group",1:length(x))
    barchart(ety~x)

Note that the ety labels must be different to make this work. With your original data, you only get one bar (and I needed some time to find out what was wrong).

Dieter

--
View this message in context: http://r.789695.n4.nabble.com/labels-and-barchart-tp3141185p3145166.html
Re: [R] Where is the bioDist package?
venik wrote:

> I am trying in vain to find the bioDist package. More generally, where can I find a list of packages and their location? I thought CRAN will have it, but I had no luck with bioDist.

Google "bioDist", second hit (maybe another one, depending on your language settings).

D

--
View this message in context: http://r.789695.n4.nabble.com/Where-is-the-bioDist-package-tp3143266p3145231.html
Re: [R] labels and barchart
2010/12/21 Gerrit Eichner gerrit.eich...@math.uni-giessen.de:

> par( mar = c( 3, 13, 2, 1), cex = 0.8)
> barplot( x, names.arg = NULL, horiz = TRUE, axes = FALSE)
> axis( side = 1, at = c( seq( 0, 80, by = 20), 95))
> axis( side = 2, at = 1:length(ety), line = -1, las = 1, tick = FALSE, labels = ety)

Thank you very much. I would change it a little, because the vertical positions of the labels do not match the bars. barplot() returns the bar midpoints, which can be used for the label positions:

    par( mar = c( 3, 13, 2, 1), cex = 0.8)
    my.chart <- barplot( x, names.arg = NULL, horiz = TRUE, axes = FALSE)
    axis( side = 1, at = c( seq( 0, 80, by = 20), 95))
    axis( side = 2, at = my.chart, line = -1, las = 1, tick = FALSE, labels = ety)
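The original question of how to derive the left margin from the labels, rather than by trial and error, was left open in the thread. One possible approach (an assumption, not something the posters suggested) is to measure the widest label with strwidth() in inches and set the second element of par("mai") accordingly. The 0.3 inch of padding is an arbitrary choice.

```r
x <- seq(20, 100, by = 15)
ety <- rep("Effect on treatment group", times = length(x))

old <- par(no.readonly = TRUE)  # remember the current settings
par(cex = 0.8)

# Width of the widest label in inches at the current cex.
label.in <- max(strwidth(ety, units = "inches"))

# Second element of 'mai' is the left margin, in inches.
mai <- par("mai")
mai[2] <- label.in + 0.3        # label width plus some arbitrary padding
par(mai = mai)

# barplot() returns the bar midpoints, used to position the labels.
mids <- barplot(x, horiz = TRUE, axes = FALSE)
axis(side = 1, at = c(seq(0, 80, by = 20), 95))
axis(side = 2, at = mids, labels = ety, las = 1, tick = FALSE, line = -1)

par(old)                        # restore the previous settings
```

Measuring in inches sidesteps the guesswork of converting character counts to margin lines, since the width of a label depends on the font and on cex, not only on nchar().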
Re: [R] labels and barchart
2010/12/21 Dieter Menne dieter.me...@menne-biomed.de:

> Standard graphics has fallen a bit out of favor because of these quirks. Try lattice:
>
> library(lattice)
> x <- seq(20, 100, by=15)
> ety <- paste("Effect on treatment group",1:length(x))
> barchart(ety~x)
>
> Note that the ety labels must be different to make this work. With your original data, you only get one bar (and I needed some time to find out what was wrong).

Thank you. I know that lattice is better in some circumstances, but I find traditional graphics more controllable.
[R] Coding a new variable based on criteria in a dataset
Hi, I'm a bit stuck and need some help with R code to create a variable F_R based on a combination of conditions. The first condition would code F_R as F and would be based on the min(Date) and min(Time) for each combination of UniqueID and Reason. The second condition would code the variable as R, as it covers the rest of the data that don't meet the first condition. For example: for UID 1, Reason 1, the first record would be coded F and the 4th record would be coded R.

        UniqueID  Reason    Date        Time
    1   UID 1     Reason 1  19/12/2010  15:00
    2   UID 1     Reason 2  19/12/2010  16:00
    3   UID 1     Reason 3  19/12/2010  16:30
    4   UID 1     Reason 1  20/12/2010  08:00
    5   UID 1     Reason 2  20/12/2010  10:01
    6   UID 1     Reason 3  20/12/2010  11:30
    7   UID 1     Reason 1  21/12/2010  12:45
    8   UID 1     Reason 2  21/12/2010  18:44
    9   UID 1     Reason 3  21/12/2010  19:29
    10  UID 2     Reason 1  19/12/2010  17:00
    11  UID 2     Reason 2  19/12/2010  18:00
    12  UID 2     Reason 3  19/12/2010  18:10
    13  UID 2     Reason 1  20/12/2010  13:00
    14  UID 2     Reason 2  20/12/2010  13:30
    15  UID 2     Reason 3  20/12/2010  16:15

Is a loop the most efficient way to do this, or is there some pre-existing function that can help me with this? The sample dataset is the one given above.

Thanks in advance,
Raoul
Re: [R] R CMD build/install: wrong Rtools include path is passed to g++
Andy Zhu wrote:
> Hi: I am trying to build/install the rparallel source package on win32 using Rtools/R CMD. However, R CMD build or install fails. The R CMD build output shows that the path of Rtools/MinGW/include is wrong in g++ -I. How can I pass/configure the correct include path to R CMD? I tried this in both R 2.12 and 2.11 with compatible Rtools and MiKTeX/chm helper. Neither succeeded. Note that the R/Rtools/MinGW setting works fine if the package doesn't have C/C++ code. I was able to install my own R package, which doesn't have C/C++ code.

I think your analysis is wrong. The path to Rtools/MinGW/include is not explicitly set by R. You set the PATH to the compiler, and that include directory is automatically set.

Duncan Murdoch
Re: [R] NA's in survey analysis
2010/12/21 Petr PIKAL petr.pi...@precheza.cz:
> Hi
>
> r-help-boun...@r-project.org wrote on 21.12.2010 11:02:07:
>> Hello, I am trying to analyze sociological survey data using R. In a survey it is often important to calculate not only the actual factor sums and percentages (easily done with describe()), but also the numbers and total percentage of NA's. Often it is important to present NA's in graphs alongside the factors. Is there any easy way to make R treat NA's as if they were factor levels like the others? Now, describe(data$a) gives me percentages only for the factors, so I have to redo the percentages manually. barplot() also ignores NA's, so to include NA's in a barplot I need to build a table more or less manually. The other way to do it is to convert NA's into factors (doable, although, unlike in SPSS, I cannot make an assumption that 99 is a [...]
>
> It is not necessary to code missing values; you can set NA as one level.
>
> x <- factor(sample(c(1:3, NA), 20, replace=T), exclude=NULL)
> x
>  [1] 1    1    3    3    3    2    3    <NA> 3    1    2    <NA> 3    <NA> 2
> [16] 2    3    1    <NA> 3
> Levels: 1 2 3 <NA>
> y <- rnorm(20)
> boxplot(split(y,x))
>
> Besides, you could find it on the factor help page, as I did.
>
> Regards
> Petr

Thank you Petr, this info (re exclude=NULL) might have saved me tons of time last week :) I still have not found an equivalent parameter in describe(), but anyway, I have been helped a lot!

--
Donatas Glodenis
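For the counting and plotting part of the question, base R can also keep NA as its own category without recoding. The following is only a sketch with a made-up survey variable a (the poster's data are not shown); it uses table(useNA = ) and addNA(), both in base R, and draws to a null PDF device:

```r
# Hypothetical survey answers, including two missing values.
a <- factor(c(rep("yes", 10), rep("no", 5), rep("maybe", 3), NA, NA))

# Counts and percentages with NA shown as its own cell.
counts <- table(a, useNA = "ifany")
pct    <- round(100 * prop.table(counts), 1)
print(counts)
print(pct)

# Alternative: make NA an explicit factor level (similar in effect to exclude = NULL).
a2 <- addNA(a)
print(levels(a2))

# Bar plot that includes the NA bar; the NA name is replaced by a printable label.
bar_labels <- ifelse(is.na(names(counts)), "NA", names(counts))
pdf(NULL)                                  # null device: nothing is written to disk
barplot(counts, names.arg = bar_labels)
invisible(dev.off())
```

Whether describe() has a comparable option is not settled in this thread, so the sketch deliberately stays with base functions.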
[R] How to suppress plotting for xyplot(zoo(x))?
Hi,

I found the thread http://r.789695.n4.nabble.com/Matrix-as-input-to-xyplot-lattice-proper-extended-formula-syntax-td896948.html

I used Gabor's approach and then tried to assign the plot to a variable (see below). But a Quartz device is opened... why? I don't want anything to be plotted or printed; I just would like to store the plot object. Is there something like plot = FALSE?

Cheers,
Marius

    library(lattice)
    library(zoo)
    df <- data.frame(y = matrix(rnorm(24), nrow = 6), x = 1:6)
    xyplot(zoo(df[1:4], df$x), type = "p")
    plot.object <- xyplot(zoo(df[1:4], df$x), type = "p") # problem: a Quartz device is opened (on Mac OS X 10.6)
Re: [R] logistic regression or not?
array chip arrayprofile at yahoo.com writes:

[snip]

> I can think of analyzing this data using glm() with the attached dataset:
>
> test <- read.table('test.txt', sep='\t')
> fit <- glm(cbind(positive,total-positive)~treatment,test,family=binomial)
> summary(fit)
> anova(fit, test='Chisq')
>
> First, is this still called logistic regression or something else? I thought with logistic regression, the response variable is a binary factor?

Sometimes I've seen it called "binomial regression", or just "a binomial generalized linear model".

> Second, summary(fit) and anova(fit, test='Chisq') gave me different p values. Why is that? Which one should I use?

summary(fit) gives you p-values from a Wald test. anova() gives you tests based on the Likelihood Ratio Test. In general the LRT is more accurate.

> Third, is there an equivalent model where I can use the variable percentage instead of positive and total?

    glm(percentage~treatment,weights=total,data=test,family=binomial)

is equivalent to the model you fitted above, provided percentage is expressed as a proportion between 0 and 1.

> Finally, what is the best way to analyze this kind of dataset where it's almost the same as ANOVA except that the response variable is a proportion (or success and failure)?

Don't quite know what you mean here. How is the situation "almost the same as ANOVA" different from the situation you described above? Do you mean when there are multiple factors? or ???
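The equivalence of the two model specifications, and the difference between the Wald and likelihood ratio tests, can be checked on a small example. This is a sketch only: the data frame below is invented to stand in for test.txt, which is not shown in the thread, and the column proportion is an assumed name for positive/total.

```r
# Hypothetical data standing in for test.txt.
test <- data.frame(
  treatment = factor(rep(c("A", "B", "C"), each = 4)),
  positive  = c(12, 15, 9, 14, 20, 24, 18, 22, 30, 28, 33, 27),
  total     = rep(40, 12)
)
test$proportion <- test$positive / test$total   # must lie in [0, 1]

# Form 1: two-column response (successes, failures).
fit1 <- glm(cbind(positive, total - positive) ~ treatment,
            data = test, family = binomial)

# Form 2: proportion as response, number of trials as weights.
fit2 <- glm(proportion ~ treatment, weights = total,
            data = test, family = binomial)

# Both forms should give the same coefficient estimates.
print(all.equal(coef(fit1), coef(fit2)))

# Wald z tests for the individual coefficients ...
print(coef(summary(fit1)))
# ... versus a likelihood ratio (deviance) test for the treatment term as a whole.
print(anova(fit1, test = "Chisq"))
```

The p-values differ partly because the two tests use different approximations and partly because summary() tests each contrast against the baseline level, while anova() tests the whole factor at once.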
Re: [R] Performing basic Multiple Sequence Alignment in R?
Tal; I'm trimming the BioC posting. On the R lists, cross-posting is considered spamming. (Please re-read the Posting Guide.)

On Dec 21, 2010, at 4:21 AM, Tal Galili wrote:
> Hello everyone, I am not sure if this should go on the general R mailing list (for example, if there is a text mining solution that might work here) or the bioconductor mailing list (since I wasn't able to find a solution to my question on searching their lists) - so this time I tried both, and in the future I'll know better (in case it should go to only one of the two).
>
> The task I'm trying to achieve is to align several sequences together. I don't have a basic pattern to match to. All that I know is that the "true" pattern should be of length 30 and that the sequences I'm looking at have had missing values introduced into them at random points. Here is an example of such sequences, where on the left we see the real location of the missing values, and on the right we see the sequence that we will be able to observe. My goal is to reconstruct the left column using only the sequences I've got in the right column (based on the fact that many of the letters in each position are the same).
>
>   Real_sequence                   The_sequence_we_see
> 1 CGCAATACTAAC-AGCTGACTTACGCACCG  CGCAATACTAACAGCTGACTTACGCACCG
> 2 CGCAATACTAGC-AGGTGACTTCC-CT-CG  CGCAATACTAGCAGGTGACTTCCCTCG
> 3 CGCAATGATCAC--GGTGGCTCCCGGTGCG  CGCAATGATCACGGTGGCTCCCGGTGCG
> 4 CGCAATACTAACCA-CTAACT--CGCTGCG  CGCAATACTAACCACTAACTCGCTGCG
> 5 CGCACGGGTAAGAACGTGA-TTACGCTCAG  CGCACGGGTAAGAACGTGATTACGCTCAG
> 6 CGCTATACTAACAA-GTG-CTTAGGC-CTG  CGCTATACTAACAAGTGCTTAGGCCTG
> 7 CCCA-C-CTAA-ACGGTGACTTACGCTCCG  CCCACCTAAACGGTGACTTACGCTCCG

The agrep function allows one to specify which sorts of differences to consider in calculating a Levenshtein edit distance. Insertions are one possible distance component. You could take a look at its code (in C in the sources) and perhaps rejigger it to spit out the location of the deletions.

    agrep(seqdat$The_sequence_we_see[1], seqdat$Real_sequence,
          max.distance=list(deletions=0, substitutions=0, insertions=0))
    integer(0)
    agrep(seqdat$The_sequence_we_see[1], seqdat$Real_sequence,
          max.distance=list(deletions=0, substitutions=0, insertions=1))
    [1] 1

--
David.

> Here is an example code to reproduce the above example:
>
> ATCG <- c("A","T","C","G")
> set.seed(40)
> original.seq <- sample(ATCG, 30, T)
> seqS <- matrix(original.seq, 200, 30, T)
> change.letters <- function(x, number.of.changes = 15, letters.to.change.with = ATCG)
> {
>   number.of.changes <- sample(seq_len(number.of.changes), 1)
>   new.letters <- sample(letters.to.change.with, number.of.changes, T)
>   where.to.change.the.letters <- sample(seq_along(x), number.of.changes, F)
>   x[where.to.change.the.letters] <- new.letters
>   return(x)
> }
> change.letters(original.seq)
> insert.missing.values <- function(x) change.letters(x, 3, "-")
> insert.missing.values(original.seq)
> seqS2 <- t(apply(seqS, 1, change.letters))
> seqS3 <- t(apply(seqS2, 1, insert.missing.values))
> seqS4 <- apply(seqS3, 1, function(x) {paste(x, collapse = "")})
> require(stringr) # library(help=stringr)
> all.seqS <- str_replace_all(seqS4, "-", "") # remove every "-"; how do we align this?
> data.frame(Real_sequence = seqS4, The_sequence_we_see = all.seqS)
>
> I understand that if all I had was a string and a pattern I would be able to use
>
> library(Biostrings)
> pairwiseAlignment(...)
>
> But in the case I present we are dealing with many sequences to align to one another (instead of aligning them to one pattern). Is there a known method for doing this in R?
>
> Thanks,
> Tal

David Winsemius, MD
West Hartford, CT
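As a rough illustration of "spitting out the location of the deletions", base R's adist() (in the utils package, added to R after this thread was written) can report the edit operations between two strings. This sketch is pairwise only and uses the known real sequence from row 1 of the table, which would not be available in practice, so it does not solve the multiple-alignment problem itself:

```r
# Row 1 of the table: the "real" sequence and what is observed once "-" is dropped.
real <- "CGCAATACTAAC-AGCTGACTTACGCACCG"
seen <- sub("-", "", real, fixed = TRUE)

# Generalized Levenshtein distance, keeping the edit transcript.
d     <- adist(seen, real, counts = TRUE)
trafo <- attr(d, "trafos")[1, 1]   # string of M (match), I, D, S operations

# Position(s) in the real sequence where a character had to be inserted.
gap_pos <- which(strsplit(trafo, "")[[1]] == "I")

print(d[1, 1])    # edit distance between the two strings
print(trafo)      # edit transcript
print(gap_pos)    # location of the missing character
```

For aligning many sequences to each other without a reference, a dedicated multiple sequence alignment tool is still needed; this only shows how one gap can be located once a reference is assumed.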
Re: [R] Performing basic Multiple Sequence Alignment in R?
I don't have an answer, but I am trying to solicit more input with some additional questions.

From: tal.gal...@gmail.com
Date: Tue, 21 Dec 2010 11:21:03 +0200
To: r-help@r-project.org; bioconduc...@r-project.org
Subject: [R] Performing basic Multiple Sequence Alignment in R?

> Hello everyone, I am not sure if this should go on the general R mailing list (for example, if there is a text mining solution that might work here) or the bioconductor mailing list (since I wasn't able to find a solution to my question on searching their lists) - so this time I tried both, and in the future I'll know better (in case it should go to only one of the two).

I take it you don't want an R interface for Clustal, and I seem to recall, from doing this a few years ago, that alignment by exact string matching was a bit of a research area (I think you can find papers on CiteSeer, for example). It does seem you are asking about exact string matches for alignment markers (your left sequences appear exactly someplace on the right), but your overall interests are not really clear. I never got my code fully working, but I was happy that I could do different strains of E. coli (or something in the 5-10 Mbp genome range) very quickly (seconds, as I recall), and you could presumably also find similar items that had moved a long way. Earlier someone came here with a task and was pointed to bio packages; I thought there might be something in computational linguistics or mining better suited to the need, but no one ever volunteered anything.

> The task I'm trying to achieve is to align several sequences together. I don't have a basic pattern to match to. All that I know is that the "true" pattern should be of length 30 and that the sequences I'm looking at have had missing values introduced into them at random points.

Alternatively, I guess someone could make an R interface for the various BLASTs; sometimes the help desk at NCBI can get questions like this to the right person internally.

> [snip]
[R] how to control ticks
Hi,

I want 12 ticks on axis 1 and want to write Jan-Dec on each, something like:

    axis(1, at=1:12, labels=c('J','F','M','A','M','J','J','A','S','O','N','D'))

I could omit the default ticks, but now how do I control the ticks?

    plot(file$time, file$ch4*1000, ylim=c(1500,1700), xaxt='n', xlab=NA, ylab=NA, col="blue", yaxs="i", lwd=2, pch=10, type="b")
    axis(1, at=1:12, labels=c('J','F','M','A','M','J','J','A','S','O','N','D'))

BUT the above is not working, and there is no error either.

Pls help,
Regards,
Yogesh
Re: [R] Coding a new variable based on criteria in a dataset
RaoulD raoul.t.dsouza at gmail.com writes:

> Hi, I'm a bit stuck and need some help with R code to code a variable F_R based on a combination of conditions. The first condition would code F_R as F and would be based on the min(Date) and min(Time) for each combination of UniqueID and Reason. The second condition would code the variable as R as it would be the rest of the data that don't meet the first condition.

It isn't quite convenient to read the data posted below into R (if it was originally tab-separated, that formatting got lost), but ddply from the plyr package is good for this: something like (untested)

    d <- ddply(data, c("UniqueID", "Reason"),
               function(x) {
                 ## make sure x is sorted by date/time here
                 x$F_R <- c("F", rep("R", nrow(x)-1))
                 x
               })

> For example: for UID 1 Reason 1 the first record would be coded F and the 4th record would be coded R.
>
>     UniqueID  Reason    Date        Time
> 1   UID 1     Reason 1  19/12/2010  15:00
> 2   UID 1     Reason 2  19/12/2010  16:00
> 3   UID 1     Reason 3  19/12/2010  16:30
> 4   UID 1     Reason 1  20/12/2010  08:00

[snip]
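A base-R alternative that needs no sorting step is to compare each record's timestamp with the earliest timestamp in its UniqueID/Reason group. The sketch below retypes the posted table by hand; the column name when and the UTC time zone are assumptions made for the illustration.

```r
# The 15 records from the original post, retyped.
dat <- data.frame(
  UniqueID = rep(c("UID 1", "UID 2"), times = c(9, 6)),
  Reason   = rep(paste("Reason", 1:3), 5),
  Date     = c(rep(c("19/12/2010", "20/12/2010", "21/12/2010"), each = 3),
               rep(c("19/12/2010", "20/12/2010"), each = 3)),
  Time     = c("15:00", "16:00", "16:30", "08:00", "10:01", "11:30",
               "12:45", "18:44", "19:29", "17:00", "18:00", "18:10",
               "13:00", "13:30", "16:15"),
  stringsAsFactors = FALSE
)

# Combine date and time into one timestamp (time zone chosen arbitrarily).
dat$when <- as.POSIXct(paste(dat$Date, dat$Time),
                       format = "%d/%m/%Y %H:%M", tz = "UTC")

# Earliest timestamp within each UniqueID x Reason group, repeated per row.
first <- ave(as.numeric(dat$when), dat$UniqueID, dat$Reason, FUN = min)

# "F" for the first record of each group, "R" for the rest.
dat$F_R <- ifelse(as.numeric(dat$when) == first, "F", "R")
print(dat[, c("UniqueID", "Reason", "Date", "Time", "F_R")])
```

If two records in a group share the same earliest timestamp, both would be marked F; whether that is acceptable depends on the data.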
Re: [R] how to control ticks
Hi,

The following seems to work:

    plot(1:12, 1:12, xaxt='n', xlab=NA)
    axis(1, at=1:12, labels=c("J","F","M","A","M","J","J","A","S","O","N","D"))

So I'd guess that your X axis data, file$time, doesn't take the values 1 to 12.

Martyn

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Yogesh Tiwari
Sent: 21 December 2010 12:37
To: r-help
Subject: [R] how to control ticks

> [snip]
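The general point is that at= has to be given in the same units as the x data. A sketch under the assumption that file$time holds dates (the thread never says what it holds); the data frame dat and its values are invented for the illustration:

```r
# Hypothetical monthly series; the real file$time and file$ch4 are not shown.
dat <- data.frame(
  time = seq(as.Date("2010-01-01"), by = "month", length.out = 12),
  ch4  = 1.6 + 0.05 * sin(1:12)
)

# Tick positions in the units of the x axis (days since 1970-01-01 for Dates).
tick_at  <- as.numeric(dat$time)
tick_lab <- substr(month.abb, 1, 1)        # "J" "F" "M" ... "D"

pdf(NULL)                                  # null device: nothing is written to disk
plot(dat$time, dat$ch4 * 1000, ylim = c(1500, 1700), xaxt = "n",
     xlab = NA, ylab = NA, col = "blue", yaxs = "i", lwd = 2, pch = 10, type = "b")
axis(1, at = tick_at, labels = tick_lab)
invisible(dev.off())

print(range(tick_at))   # far from 1:12, which is why at = 1:12 draws nothing visible
```

With at = 1:12 the ticks would be placed at x = 1, ..., 12, which lies outside the plotted range for date-valued data, so no labels appear and no error is raised.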
Re: [R] How to suppress plotting for xyplot(zoo(x))?
On Tue, Dec 21, 2010 at 7:53 AM, Marius Hofert m_hof...@web.de wrote:
> Hi,
>
> I found the thread http://r.789695.n4.nabble.com/Matrix-as-input-to-xyplot-lattice-proper-extended-formula-syntax-td896948.html
>
> I used Gabor's approach and then tried to assign the plot to a variable (see below). But a Quartz device is opened... why? I don't want anything to be plotted or printed; I just would like to store the plot object. Is there something like plot = FALSE?
>
> Cheers,
> Marius
>
> library(lattice)
> library(zoo)
> df <- data.frame(y = matrix(rnorm(24), nrow = 6), x = 1:6)
> xyplot(zoo(df[1:4], df$x), type = "p")
> plot.object <- xyplot(zoo(df[1:4], df$x), type = "p") # problem: a Quartz device is opened (on Mac OS X 10.6)

This also opens up a window on Windows. It occurs within lattice when lattice issues a trellis.par.get. A workaround would be to open a device directed to null. On Windows this would work; I assume that if you use "/dev/null" it would work on your machine.

    png("NUL")
    plot.object <- ...
    dev.off()

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
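A platform-independent variant of the same workaround is to use pdf(NULL), which opens a PDF device that writes no file. The sketch below uses a plain data frame instead of the zoo object from the question, so that it only depends on lattice; the data are invented.

```r
library(lattice)

# Hypothetical data; the question used a zoo object instead.
df <- data.frame(x = 1:6, y = rnorm(6))

pdf(NULL)                                  # null device: absorbs any device request
plot.object <- xyplot(y ~ x, data = df, type = "p")
invisible(dev.off())

# The stored object can be drawn later on a real device with print(plot.object).
print(class(plot.object))
```

Whether current lattice versions still open a device when the object is created is not something this sketch establishes; it only shows that the object can be built while a null device is active.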
Re: [R] logistic regression or not?
A possible caveat here. Traditionally, logistic regression was performed on the logit-transformed proportions, with the standard errors based on the residuals of the resulting linear fit. This accommodates overdispersion naturally, but without telling you that you have any. glm with a binomial family does not allow for overdispersion unless you use the quasibinomial family. If you have overdispersion, standard errors from glm will be unrealistically small. Make sure your model fits in glm before you believe the standard errors, or use the quasibinomial family.

Steve Ellison
LGC

Ben Bolker bbol...@gmail.com 21/12/2010 13:08:34 wrote:
> [snip]
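One way to see the effect described above is to fit the same model with the binomial and quasibinomial families and compare standard errors. This is a sketch on invented, deliberately overdispersed data (the thread's test.txt is not available); the dispersion is estimated as the Pearson chi-square divided by the residual degrees of freedom.

```r
# Hypothetical data with much more replicate-to-replicate variation
# than a binomial model would predict.
test <- data.frame(
  treatment = factor(rep(c("A", "B", "C"), each = 4)),
  positive  = c(5, 20, 9, 26, 12, 30, 18, 35, 20, 38, 25, 39),
  total     = rep(40, 12)
)

fit_b <- glm(cbind(positive, total - positive) ~ treatment,
             data = test, family = binomial)
fit_q <- glm(cbind(positive, total - positive) ~ treatment,
             data = test, family = quasibinomial)

# Estimated dispersion: values well above 1 suggest overdispersion
# (or a model that does not fit).
phi <- sum(residuals(fit_b, type = "pearson")^2) / df.residual(fit_b)
print(phi)

# Same coefficients, but quasibinomial standard errors are inflated by sqrt(phi).
se_b <- coef(summary(fit_b))[, "Std. Error"]
se_q <- coef(summary(fit_q))[, "Std. Error"]
print(cbind(binomial = se_b, quasibinomial = se_q, ratio = se_q / se_b))
```

A large estimated dispersion does not by itself say why the binomial model fails, so it is worth checking the model structure before settling on the quasibinomial fit.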
[R] combination value
Hi everyone,

I want to calculate the combination function in R: the value, not all the possible choices. I mean cmbn(5,2)=10. Is there any function for this other than using factorial?

Regards,
Amir
Re: [R] combination value
choose(5, 2)

HTH,
Jorge

On Tue, Dec 21, 2010 at 9:23 AM, amir wrote:
> Hi everyone,
>
> I want to calculate the combination function in R: the value, not all the possible choices. I mean cmbn(5,2)=10. Is there any function for this other than using factorial?
>
> Regards,
> Amir
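A short sketch of why choose() is preferable to the factorial formula: both agree for small arguments, but factorial() overflows for large n while choose() does not. combn() is shown only to contrast "the value" with "all the possible choices".

```r
# Number of ways to choose 2 items out of 5.
n_ways <- choose(5, 2)
print(n_ways)                                        # 10

# The same value from the factorial formula.
print(factorial(5) / (factorial(2) * factorial(3)))

# combn() enumerates the choices themselves; its column count matches choose().
print(ncol(combn(5, 2)))

# For large n the factorial overflows to Inf, while choose() still works.
big <- choose(200, 3)
print(big)
print(suppressWarnings(factorial(200)))
```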
[R] two-part growth analysis
Hi everyone!

Does anyone know if there is a package to do two-part growth analysis with R?

Regards,
Sebastian

--
Sebastián Daza
sebastian.d...@gmail.com
Re: [R] combination value
On Dec 21, 2010, at 9:23 AM, amir wrote:
> Hi everyone,
>
> I want to calculate the combination function in R: the value, not all the possible choices. I mean cmbn(5,2)=10. Is there any function for this other than using factorial?

?choose

--
David Winsemius, MD
West Hartford, CT
Re: [R] ideas, modeling highly discrete time-series data
You could try the timeseries list at https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=TIMESERIES

kjetil

On Mon, Dec 20, 2010 at 6:26 PM, Mike Williamson this.is@gmail.com wrote:
> Hello all,
>
> First of all, thanks to those of you who helped me a week or so ago with managing a time series with varying gaps between the data points in 'R'. (My final preferred solution was to use the its function and then forecast(Arima()).)
>
> My next question is a general statistical question where I'd like some advice, for those willing / able to proffer any wisdom:
>
> - I need to predict using this same time series, where the *data* are highly discrete. E.g., I will have values like 1e5, 2.2e5, and 3.6e5, but I will never have 1.3e5 or 1.8e5, etc.
> - I could simply leave these values as discrete, similar to a binomial distribution, but then I am not sure how to use time series tricks like arima above. For the time-series analyses that I know of, an approximately normal distribution is assumed. No simple normalization (e.g., log(values)) works, since the non-normality arises from the highly discrete distribution more than from any drastic asymmetry in the population spread.
> - I could leave the values as they are and work with a model where the assumption is violated... I am not sure how sensitive a model such as arima is to the population distribution.
> - Or I could... (here's where I am hoping for some collective genius).
>
> Thanks in advance for any help! If this isn't the best forum, since I know this is not specifically an 'R' question, please let me know of a better forum to post such a question.
>
> Thanks!
> Mike
>
> Telescopes and bathyscaphes and sonar probes of Scottish lakes, Tacoma Narrows bridge collapse explained with abstract phase-space maps, Some x-ray slides, a music score, Minard's Napoleanic war: The most exciting frontier is charting what's already here. -- xkcd
Re: [R] how to control ticks
On Tue, 21 Dec 2010 18:06:52 +0530 Yogesh Tiwari yogesh@googlemail.com wrote:
> Hi,
>
> I want 12 ticks on axis 1 and want to write Jan-Dec on each, something like:
>
> axis(1, at=1:12, labels=c('J','F','M','A','M','J','J','A','S','O','N','D'))
>
> I could omit the default ticks, but now how do I control the ticks?

Dear Yogesh,
I spray my clothing with No-Bite, and that controls ticks quite well. :-)

Edwin

> plot(file$time, file$ch4*1000, ylim=c(1500,1700), xaxt='n', xlab=NA, ylab=NA, col="blue", yaxs="i", lwd=2, pch=10, type="b")
> axis(1, at=1:12, labels=c('J','F','M','A','M','J','J','A','S','O','N','D'))
>
> BUT the above is not working, and there is no error either.
>
> Pls help,
> Regards,
> Yogesh

Dr. Edwin Groot, postdoctoral associate
AG Laux
Institut fuer Biologie III
Schaenzlestr. 1
79104 Freiburg, Deutschland
+49 761-2032945
Re: [R] how to control ticks
What is the structure of file$time? Is it Date/POSIXct? 'at=1:12' only works if those are values that file$time actually takes. So give us an idea of what the data are (PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code).

On Tue, Dec 21, 2010 at 7:36 AM, Yogesh Tiwari yogesh@googlemail.com wrote:
> [snip]

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Re: [R] how to control ticks
On Dec 21, 2010, at 17:01, Edwin Groot wrote:
> On Tue, 21 Dec 2010 18:06:52 +0530 Yogesh Tiwari yogesh@googlemail.com wrote:
>> Hi,
>>
>> I want 12 ticks on axis 1 and want to write Jan-Dec on each, something like:
>>
>> axis(1, at=1:12, labels=c('J','F','M','A','M','J','J','A','S','O','N','D'))
>>
>> I could omit the default ticks, but now how do I control the ticks?
>
> Dear Yogesh,
> I spray my clothing with No-Bite, and that controls ticks quite well.

Yeah, but then how do you get the suckers to sit still while you write on them? ;-)

> :-)
> Edwin
>
> [snip]

--
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk Priv: pda...@gmail.com
Re: [R] logistic regression or not?
On Dec 21, 2010, at 14:22, S Ellison wrote:

> A possible caveat here. Traditionally, logistic regression was performed on the logit-transformed proportions, with the standard errors based on the residuals for the resulting linear fit. This accommodates overdispersion naturally, but without telling you that you have any. glm with a binomial family does not allow for overdispersion unless you use the quasibinomial family. If you have overdispersion, standard errors from glm will be unrealistically small. Make sure your model fits in glm before you believe the standard errors, or use the quasibinomial family.

...and before you believe in overdispersion, make sure you have a credible explanation for it. All too often, what you really have is a model that doesn't fit your data properly.

> Steve Ellison
> LGC
>
> Ben Bolker bbol...@gmail.com 21/12/2010 13:08:34
>> array chip arrayprofile at yahoo.com writes:
>> [snip]
>>> I can think of analyzing this data using glm() with the attached dataset:
>>>
>>>   test <- read.table('test.txt', sep='\t')
>>>   fit <- glm(cbind(positive, total-positive) ~ treatment, test, family=binomial)
>>>   summary(fit)
>>>   anova(fit, test='Chisq')
>>>
>>> First, is this still called logistic regression or something else? I thought with logistic regression, the response variable is a binary factor?
>>
>> Sometimes I've seen it called "binomial regression", or just "a binomial generalized linear model".
>>
>>> Second, then summary(fit) and anova(fit, test='Chisq') gave me different p values, why is that? Which one should I use?
>>
>> summary(fit) gives you p-values from a Wald test. anova() gives you tests based on the Likelihood Ratio Test. In general the LRT is more accurate.
>>
>>> Third, is there an equivalent model where I can use variable percentage instead of positive and total?
>>
>>   glm(percentage ~ treatment, weights=total, data=test, family=binomial)
>>
>> is equivalent to the model you fitted above.
>>
>>> Finally, what is the best way to analyze this kind of dataset where it's almost the same as ANOVA except that the response variable is a proportion (or success and failure)?
>>
>> Don't quite know what you mean here. How is the situation "almost the same as ANOVA" different from the situation you described above? Do you mean when there are multiple factors? or ???

--
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk Priv: pda...@gmail.com
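The two points above (Wald versus likelihood-ratio p-values, and what the quasibinomial family changes) can be sketched on simulated data. The data frame below is invented for illustration and only borrows the column names from the thread; it is not the poster's test.txt.

```r
# Hypothetical data: 3 treatments, 4 replicates each, 50 trials per replicate.
set.seed(42)
test <- data.frame(
  treatment = factor(rep(c("A", "B", "C"), each = 4)),
  total     = rep(50L, 12)
)
test$positive <- rbinom(12, size = test$total,
                        prob = rep(c(0.2, 0.3, 0.5), each = 4))

# Binomial GLM on (successes, failures), as in the thread.
fit <- glm(cbind(positive, total - positive) ~ treatment,
           data = test, family = binomial)

wald <- summary(fit)$coefficients   # Wald z-tests, one per coefficient
lrt  <- anova(fit, test = "Chisq")  # likelihood-ratio test for the treatment term

# Same mean model, but the dispersion is estimated instead of fixed at 1.
fit_q <- glm(cbind(positive, total - positive) ~ treatment,
             data = test, family = quasibinomial)
phi <- summary(fit_q)$dispersion

# Quasibinomial standard errors are the binomial ones scaled by sqrt(phi).
se_ratio <- summary(fit_q)$coefficients[, "Std. Error"] / wald[, "Std. Error"]
```

The coefficients of the two fits are the same; only the standard errors differ, so a dispersion estimate well above 1 inflates them accordingly. As noted above, a large estimate may also simply signal a poorly fitting model.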
Re: [R] how to control ticks
I'm not sure, but perhaps you want to copy the logic of:

http://www.portfolioprobe.com/R/blog/pp.timeplot.R

On 21/12/2010 12:36, Yogesh Tiwari wrote:

> Hi,
> I want 12 ticks at axis 1 and want to write Jan-Dec on each.
> [...]
> BUT above is not working, and there is no error as well. Pls help,
> Regards, Yogesh

--
Patrick Burns
pbu...@pburns.seanet.com
twitter: @portfolioprobe
http://www.portfolioprobe.com/blog
http://www.burns-stat.com
(home of 'Some hints for the R beginner' and 'The R Inferno')
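One common reason for axis() drawing nothing without an error is that the `at` positions lie outside the range of the plotted x values. The sketch below assumes, purely for illustration, that the time variable is stored as decimal years; the vectors `time` and `ch4` are made up and stand in for the poster's file$time and file$ch4*1000.

```r
# Hypothetical monthly data with x values in decimal years (an assumption).
set.seed(1)
time <- 2010 + (0:11) / 12
ch4  <- 1600 + rnorm(12, sd = 20)
month_labels <- c("J", "F", "M", "A", "M", "J", "J", "A", "S", "O", "N", "D")

plot(time, ch4, ylim = c(1500, 1700), xaxt = "n", xlab = NA, ylab = NA,
     col = "blue", yaxs = "i", lwd = 2, pch = 10, type = "b")

# at = 1:12 would fall far outside the x range, so no ticks would appear.
x_range <- par("usr")[1:2]
at_in_range <- (1:12 >= x_range[1]) & (1:12 <= x_range[2])

# Placing the ticks at the actual x coordinates of the 12 months works.
axis(1, at = time, labels = month_labels)
```

If the real time variable does run from 1 to 12, this is not the cause and the problem lies elsewhere.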
Re: [R] logistic regression or not?
> ...and before you believe in overdispersion, make sure you have a credible explanation for it. All too often, what you really have is a model that doesn't fit your data properly.

Well put. A possible fortune?

S Ellison
Re: [R] logistic regression or not?
Thank you Ben, Steve and Peter.

Ben, my last question was to see if there are other ways of analyzing this type of data where the response variable is a proportion, in addition to binomial regression.

BTW, I also found that the following is an equivalent model directly using percentage:

  glm(log(percentage/(1-percentage)) ~ treatment, data=test)

Thanks
John

From: Ben Bolker bbol...@gmail.com
To: r-h...@stat.math.ethz.ch
Sent: Tue, December 21, 2010 5:08:34 AM
Subject: Re: [R] logistic regression or not?

> [snip]
Re: [R] logistic regression or not?
On 10-12-21 12:20 PM, array chip wrote:

> Thank you Ben, Steve and Peter.
>
> Ben, my last question was to see if there are other ways of analyzing this type of data where the response variable is a proportion, in addition to binomial regression.
>
> BTW, I also found that the following is an equivalent model directly using percentage:
>
>   glm(log(percentage/(1-percentage)) ~ treatment, data=test)
>
> Thanks
> John

Yes, but this is a different model. The model you have here uses Gaussian errors (it is in fact an identical model, although not necessarily quite an identical algorithm (?), to just using lm()). It will fail if you have any percentages that are 0 or 1. See Steve's comment about how things were done in the old days.

Beta regression (see e.g. the betareg package) is another way of handling analysis of proportions.

> From: Ben Bolker bbol...@gmail.com
> To: r-h...@stat.math.ethz.ch
> Sent: Tue, December 21, 2010 5:08:34 AM
> Subject: Re: [R] logistic regression or not?
>
> [snip]
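To make the distinction concrete, here is a minimal sketch on simulated data (the data frame is invented and only reuses the thread's column names; `percentage` is taken to be a proportion between 0 and 1). It compares the successes/failures form, the weighted-proportion form, and the Gaussian fit to logit-transformed proportions.

```r
# Hypothetical data with the same column names as in the thread.
set.seed(7)
test <- data.frame(
  treatment = factor(rep(c("A", "B", "C"), each = 4)),
  total     = rep(50L, 12)
)
test$positive   <- rbinom(12, size = test$total,
                          prob = rep(c(0.2, 0.3, 0.5), each = 4))
test$percentage <- test$positive / test$total   # proportion in (0, 1)

# (1) Binomial GLM on successes and failures.
fit_cb <- glm(cbind(positive, total - positive) ~ treatment,
              data = test, family = binomial)

# (2) Binomial GLM on proportions with the number of trials as weights.
fit_w <- glm(percentage ~ treatment, weights = total,
             data = test, family = binomial)

# (3) Gaussian model for the logit-transformed proportions, via glm() and lm().
fit_gauss <- glm(log(percentage / (1 - percentage)) ~ treatment, data = test)
fit_lm    <- lm(log(percentage / (1 - percentage)) ~ treatment, data = test)

same_binomial <- all.equal(coef(fit_cb), coef(fit_w), tolerance = 1e-6)
same_gaussian <- all.equal(coef(fit_gauss), coef(fit_lm), tolerance = 1e-6)
```

Forms (1) and (2) give the same coefficients, and (3) matches lm(), but the coefficients and standard errors of (3) generally differ from those of (1) and (2), and the transform is undefined when a proportion is exactly 0 or 1.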
Re: [R] logistic regression or not?
Ben, thanks again.

John

From: Ben Bolker bbol...@gmail.com
Cc: r-h...@stat.math.ethz.ch; S Ellison s.elli...@lgc.co.uk; peter dalgaard pda...@gmail.com
Sent: Tue, December 21, 2010 9:26:29 AM
Subject: Re: [R] logistic regression or not?

> [snip]
Re: [R] R.matlab memory use
Hi,

I am using Octave; what do those save options do? More specifically, is compression taking place when saving that file? If compression is done, then the Rcompression package is utilized by R.matlab (otherwise not). BTW, you don't have to load Rcompression explicitly; R.matlab will do it for you if needed.

So, if you start a fresh R session, load R.matlab and then try to load your package, is Rcompression loaded? If so, what version of Rcompression do you have installed, i.e. what does sessionInfo() report afterward? Duncan TL did Rcompression updates addressing memory usage about a year ago (I think) and it might be that you are using an older version of it. You should also update R.matlab et al., because you're using old versions (though I don't think that is the cause here).

If Rcompression is the cause here, then it also makes sense that you don't experience the memory hog when reading a text file (which is never compressed). You could also see if there is an option in Octave that saves to binary format but without compression. I know Matlab has such options.

/Henrik (author of R.matlab)

On Mon, Dec 20, 2010 at 7:11 AM, Stefano Ghirlanda dr.ghirla...@gmail.com wrote:

> Hi Ben,
> Thanks for your reply. My data structure is about 2 x 2000 so one order of magnitude the one you tried. I have no problem saving and reading smaller data structures (even large ones, just not this large) between octave and R using octave's save -7 (which saves MATLAB v5 files) and R.matlab's readMat. And I can save in text format in octave and read in R using read.octave (from package foreign) so it's not a big deal. I was just surprised that R.matlab needed more memory than I have (I have 3GB on this machine).
>
> Thanks,
> Stefano
>
> On Sun, Dec 19, 2010 at 10:54 PM, Ben Bolker bbol...@gmail.com wrote:
>> Stefano Ghirlanda dr.ghirlanda at gmail.com writes:
>>> I am trying to load into R a MATLAB format file (actually, as saved by octave). The file is about 300kB but R complains with a memory allocation error:
>>>
>>>   library(Rcompression)
>>>   library(R.matlab)
>>>   Loading required package: R.oo
>>>   Loading required package: R.methodsS3
>>>   R.methodsS3 v1.2.0 (2010-03-13) successfully loaded. See ?R.methodsS3 for help.
>>>   R.oo v1.7.2 (2010-04-13) successfully loaded. See ?R.oo for help.
>>>   R.matlab v1.3.1 (2010-04-20) successfully loaded. See ?R.matlab for help.
>>>   f <- readMat("freq.mat")
>>>   Error: cannot allocate vector of size 296.5 Mb
>>>
>>> On the other hand, if I save the same data in ascii format (from octave: save -text), resulting in a 75MB file, then I can load it without problems with the read.octave() function from package foreign. Is this a known issue or am I doing something wrong? My R version is:
>>
>> This is not a package I'm particularly familiar with, but: what commands did you use to save the file in octave? Based on 'help save' I think that 'save' by default would get you an octave format file ... you might have to do some careful reading in ?readMat (in R) and 'help save' (in octave) to figure out the correspondence between octave/MATLAB and R/MATLAB.
>>
>> If possible, try saving a small file and see if it works; if you still don't know what's going on, post that file somewhere for people to try. I was able to 'save -6 save.mat' in octave and readMat("save.mat") in R successfully, saving a vector of integers from 1 to 1 million (which took about 7.7 Mb).
>
> --
> Stefano Ghirlanda
> www.intercult.su.se/~stefano - drghirlanda.wordpress.com
Re: [R] Performing basic Multiple Sequence Alignment in R?
Hello David, Mike and Thomas,

Dear David,
First, my apologies for the double posting - I'll try not to forget that policy. Regarding agrep, I think it will be easier for me to work with the functions in {Biostrings} (for example stringDist, or pairwiseAlignment) than to open up the C code.

Dear Mike and Thomas,
From what I gathered here (thanks to Joris Meys):
http://stackoverflow.com/questions/4497747/how-to-perform-basic-multiple-sequence-alignments-in-r/4498434#4498434
there is an R interface to the MUSCLE algorithm in the bio3d package (function seqaln()), but not one for clustal. I will probably end up using pairwiseAlignment on pairs of alignments with some sort of stopping rules (I'll have to play with it to see how it works).

Thank you all for your answers. It is always helpful to hear from others whether something was already implemented in R or not.

Best,
Tal

Contact me: tal.gal...@gmail.com | 972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English)

On Tue, Dec 21, 2010 at 2:44 PM, Mike Marchywka marchy...@hotmail.com wrote:

> e came here with a task and was pointed to bio packages but I thought there m
Re: [R] R.matlab memory use
On Tue, Dec 21, 2010 at 10:15 AM, Henrik Bengtsson h...@biostat.ucsf.edu wrote:

> Hi,
> I am using Octave; what does that save options do, more specifically, is compression taking place when saving that file?

That should be: I am [not] using Octave...

/H

> [snip]
[R] how to see what's wrong with a self written function?
Hi all,

I am writing a simple function to implement the regula falsi (secant) method.

###
regulafalsi = function(f, x0, x1) {
  x = c()
  x[1] = x1
  i = 1
  while (f(x[i]) != 0) {
    i = i + 1
    if (i == 2) {
      x[2] = x[1] - f(x[1]) * (x[1] - x0) / (f(x[1]) - f(x0))
    } else {
      x[i] = x[i-1] - f(x[i-1]) * (x[i-1] - x[i-2]) / (f(x[i-1]) - f(x[i-2]))
    }
  }
  x[i]
}
###

These work fine:

  regulafalsi(function(x) x^(1/2) + 3*log(x) - 5, 1, 10)
  regulafalsi(function(x) x^(1/2) + 3*log(x) - 5, 10, 1)

For all x>0, the function is strictly increasing. Then

  regulafalsi(function(x) x^(1/2) + 3*log(x) - 5, 1, 100)
  Error in while (f(x[i]) != 0) { : missing value where TRUE/FALSE needed
  In addition: Warning message:
  In log(x) : NaNs produced

I don't know what happened there. Is there a way to find the value of f(x[i]) for which R can't determine TRUE/FALSE?

Thanks!
casper
[R] variable lengths differ (found for '(weights)') error in Zelig library
Dear R users,

I am trying to estimate the average treatment effect on the treated (ATT), first using the MatchIt software to weight the data set and then the Zelig software, as shown in Ho et al. (2007). See here for an explanation of how to apply this technique in R:
http://imai.princeton.edu/research/files/matchit.pdf

I encounter a slight problem when I apply the weights that are produced in the stage of preprocessing the data. The idea is to use the MatchIt software to preprocess the data and then use the Zelig software to generate the distribution of ATT. I believe that the main reason for preprocessing the data is to create weights (depending on the matching technique you use) so that balance is achieved for the matching variables between the treatment and the control group. Then you use these weights in the regressions that follow in the Zelig library.

Copied from the matchit article, whose link I provide above, the authors say:

"If one chooses options that allow matching with replacement, or any solution that has different numbers of controls (or treateds) within each subclass or strata (such as full matching), then the parametric analysis following matching must accomodate these procedures, such as by using fixed effects or weights, as appropriate. (Similar procedures can also be used to estimate various other quantities of interest such as the average treatment effect by computing it for all observations, but then one must be aware that the quantity of interest may change during the matching procedure as some control units may be dropped.)"

The following code is for the lalonde data set, where I get an error message in the end:

  library(Zelig)
  library(MatchIt)
  data(lalonde)
  m.out1 = matchit(treat ~ age + educ + black + hispan + nodegree + married + re74 + re75, method = "subclass", subclass = 6, data = lalonde)
  z.out1 = zelig(re78 ~ age + educ + black + hispan + nodegree + married + re74 + re75, data = match.data(m.out1, "control"), model = "ls", weights = weights)
  x.out1 = setx(z.out1, data = match.data(m.out1, "treat"), cond = TRUE)
  s.out1 = sim(z.out1, x = x.out1)
  Error in model.frame.default(formula = re78 ~ age + educ + black + hispan + : variable lengths differ (found for '(weights)')

I was wondering if somebody could tell me how to get around this problem?

Also, I have seen people adding the propensity scores (distance) to the regression analysis applied in the Zelig package, i.e.

  z.out1 = zelig(re78 ~ age + educ + black + hispan + nodegree + married + re74 + re75 + distance, data = match.data(m.out1, "control"), model = "ls", weights = weights)

Does anyone have a clue why this is done?

Kind regards,
Sotiris
[R] Density plot with lattice?
Hi,

Is it possible to remove the points at the base of a density plot? I would like to keep only the curves of the plot, not the points.

Thank you.

Marie-Helene Hachey
M.Sc. student
Universite Laval, Quebec
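The thread shows no reply to this question. As a hedged pointer only: lattice's densityplot() has a plot.points argument, and setting it to FALSE should suppress the points drawn along the base. A minimal sketch with made-up data:

```r
library(lattice)

# Hypothetical sample; any numeric vector would do.
set.seed(1)
x <- rnorm(200)

# plot.points = FALSE keeps the density curve and drops the points at the base.
p <- densityplot(~ x, plot.points = FALSE)
print(p)
```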
[R] lm() on a matrix of zoo series
I have a matrix of zoo series; each series is in a column.

  x <- as.yearmon(2000 + seq(0, 23)/12)
  # 24 months of data, let's make 20 sets of random data
  testData <- matrix(rnorm(480), ncol=20)
  # make a zoo object; the columns will hold the 20 series
  TestZoo <- zoo(testData, order.by=x)
  # now run lm for just one series
  m <- lm(TestZoo[,1] ~ time(TestZoo))$coeff[2]
  m
  time(TestZoo)
      0.3443124
  m2 <- lm(TestZoo[,2] ~ time(TestZoo))$coeff[2]
  m2
  time(TestZoo)
     -0.1192866

I've been struggling trying to use apply (or something equally suitable) to get a vector of m for this entire matrix without resorting to a loop.
Re: [R] lm() on a matrix of zoo series
On Tue, Dec 21, 2010 at 3:02 PM, steven mosher mosherste...@gmail.com wrote:

> I have a matrix of zoo series; each series is in a column.
> [...]
> I've been struggling trying to use apply (or something equally suitable) to get a vector of m for this entire matrix without resorting to a loop.

Try this:

  lm(TestZoo ~ time(TestZoo))

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
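With a matrix on the left-hand side, lm() fits one regression per column in a single call, and the slopes sit in the second row of the coefficient matrix. The sketch below uses a plain numeric matrix `Y` and a time vector `tt` (both hypothetical stand-ins for TestZoo and time(TestZoo)) so that it runs without the zoo package.

```r
# Hypothetical stand-ins: 24 monthly time points and 20 random series.
set.seed(123)
tt <- 2000 + seq(0, 23) / 12
Y  <- matrix(rnorm(480), ncol = 20)

# One call fits all 20 regressions (an "mlm" object).
fit <- lm(Y ~ tt)

# coef(fit) is a 2 x 20 matrix: row 1 intercepts, row 2 slopes.
slopes <- coef(fit)[2, ]

# Cross-check against the single-series fit for column 1.
slope1 <- coef(lm(Y[, 1] ~ tt))[2]
```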
Re: [R] lm() on a matrix of zoo series
Thanks, I was trying

  apply(TestZoo, 2, lm, TestZoo ~ time(TestZoo))

which was throwing a formula error.

On Tue, Dec 21, 2010 at 12:21 PM, Gabor Grothendieck ggrothendi...@gmail.com wrote:

> Try this:
>
>   lm(TestZoo ~ time(TestZoo))
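The formula error is most likely because apply() hands each column to lm() as its first argument, which lm() expects to be the formula. If an apply-style solution is still wanted, wrapping the fit in an anonymous function is one way to do it. The sketch again uses a plain matrix `Y` and time vector `tt` as hypothetical stand-ins for the zoo objects.

```r
# Hypothetical stand-ins for TestZoo and time(TestZoo).
set.seed(123)
tt <- 2000 + seq(0, 23) / 12
Y  <- matrix(rnorm(480), ncol = 20)

# Each column y becomes the response of its own regression on tt;
# only the slope is kept.
slopes_apply <- apply(Y, 2, function(y) coef(lm(y ~ tt))[["tt"]])

# Same slopes as the single matrix-response call.
slopes_mlm <- unname(coef(lm(Y ~ tt))[2, ])
```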
Re: [R] how to see what's wrong with a self written function?
On 21/12/2010 2:39 PM, casperyc wrote:

> Hi all,
> I am writing a simple function to implement the regula falsi (secant) method.
> [...]
>   regulafalsi(function(x) x^(1/2) + 3*log(x) - 5, 1, 100)
>   Error in while (f(x[i]) != 0) { : missing value where TRUE/FALSE needed
>   In addition: Warning message:
>   In log(x) : NaNs produced
>
> I don't know what happened there. Is there a way to find the value of f(x[i]) for which R can't determine TRUE/FALSE?

The easiest is to just use regular old-fashioned debugging methods, i.e. insert print() or cat() statements into your function. You could also try debug(regulafalsi) and single-step through it to see where things go wrong. (An obvious guess is that one of the values being passed to f is negative, but you'll have to figure out why that happened and what to do about it.)

Duncan Murdoch
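As one way of following this advice, here is a sketch of a secant iteration that records every iterate and stops as soon as f returns a non-finite value. The function name `secant_trace`, the tolerance and the iteration cap are all assumptions of this sketch, not part of the original code.

```r
# Secant iteration that keeps its iterates for inspection (illustrative only).
secant_trace <- function(f, x0, x1, tol = 1e-10, max_iter = 100) {
  x <- c(x0, x1)
  for (i in seq_len(max_iter)) {
    n <- length(x)
    f_prev <- suppressWarnings(f(x[n - 1]))
    f_curr <- suppressWarnings(f(x[n]))
    # Stop and report when f cannot be evaluated (e.g. log of a negative number).
    if (!is.finite(f_prev) || !is.finite(f_curr)) {
      return(list(root = NA_real_, converged = FALSE, iterates = x))
    }
    # Compare against a tolerance instead of testing f(x) != 0 exactly.
    if (abs(f_curr) < tol) {
      return(list(root = x[n], converged = TRUE, iterates = x))
    }
    x[n + 1] <- x[n] - f_curr * (x[n] - x[n - 1]) / (f_curr - f_prev)
  }
  list(root = NA_real_, converged = FALSE, iterates = x)
}

f <- function(x) x^(1/2) + 3 * log(x) - 5

ok  <- secant_trace(f, 1, 10)    # stays in x > 0 and converges
bad <- secant_trace(f, 1, 100)   # an iterate goes negative, so f returns NaN
```

Printing bad$iterates shows where the step overshoots into negative x, which is consistent with the guess above; a bracketing method such as uniroot() avoids that kind of overshoot.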
[R] Keeping Leading Zeros, Treating numbers as text
Hello,

I have a data set with some numerical values and some non-numerical data. My issue is that I need to preserve my ID numbers (numerics) with their leading zeros, but when I import the data into R (it's in .csv format) using read.csv(), it turns all the ID numbers (example: 00210) into numbers, removing the leading zeros, so I end up with 210. I tried using the as.is= argument on the column that I wanted to treat as text, but it had no effect.

Any help would be very much appreciated,

Thanks,
James
Re: [R] Keeping Leading Zeros, Treating numbers as text
Try reading the csv file with, say, Notepad. I think you may find that the problem is that Excel assumes the column is numeric and strips off the zeros before saving the file. So you need to tell it that the ID columns are character before saving.

Then you need to read the Help page for read.csv more carefully, noting, in particular, the colClasses argument.

-- Bert

On Tue, Dec 21, 2010 at 12:43 PM, James Splinter james.r.splin...@gmail.com wrote:

> [snip]

--
Bert Gunter
Genentech Nonclinical Biostatistics
467-7374
http://devo.gene.com/groups/devo/depts/ncb/home.shtml
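A minimal sketch of the colClasses suggestion, assuming the zeros are still present in the file itself. The column names `id` and `value` and the inline CSV text are invented for illustration.

```r
# Hypothetical CSV content with zero-padded IDs.
csv_text <- "id,value\n00210,1.5\n00042,2.0"

# Default behaviour: the id column is parsed as a number and loses its zeros.
default_read <- read.csv(text = csv_text)

# Declaring the column as character keeps the IDs exactly as written.
kept <- read.csv(text = csv_text, colClasses = c(id = "character"))
```

With a real file, the same colClasses argument is passed alongside the file name.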
Re: [R] Keeping Leading Zeros, Treating numbers as text
James,

How about

  sprintf('%05d', 210)

It works for fixed length id numbers.

Dave
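A small sketch of the padding approach suggested above, for the case where the IDs have already been read as numbers. The width of 5 and the sample IDs are assumptions for illustration; this only restores the zeros correctly when every ID has the same fixed length.

```r
# Hypothetical IDs that lost their leading zeros on import
ids <- c(210L, 7L, 12345L)

# Pad each ID with zeros on the left to a fixed width of 5 characters
padded <- sprintf("%05d", ids)
print(padded)
# [1] "00210" "00007" "12345"
```

The result is a character vector, so it should be kept as text from this point on.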
Re: [R] Keeping Leading Zeros, Treating numbers as text
On Dec 21, 2010, at 4:11 PM, Bert Gunter wrote:
> Try reading the csv file with, say, Notepad. I think you may find that the problem is that Excel assumes the column is numeric and strips off the zeros before saving the file. So you need to tell it that the ID columns are character before saving.

If Excel turns out to be the culprit, there is an operation equivalent to the colClasses specification which you can do to prevent leading zeros from being dropped. Select the entire column by clicking on the column letter at the top margin of the sheet, then choose Format/Cells/... and pick Text. The same sort of preparation can also save you grief with Date types in Excel or OO.org.

> Then you need to read the Help page for read.csv more carefully, noting, in particular, the colClasses argument.
> -- Bert

David Winsemius, MD
West Hartford, CT
[R] Write.table eol argument
Hello All,

R 2.11.1, Windows XP, 32-bit

Help says that the default is eol='\n'. To me, that represents linefeed (LF). From Help:

  eol: the character(s) to print at the end of each line (row). For example, eol="\r\n" will produce Windows' line endings on a Unix-alike OS, and eol="\r" will produce files as expected by Mac OS Excel 2004.

I would like write.table to end each line with LF only, with no carriage return (CR). What I see instead:

  Default eol='\n' generates CRLF
  Explicit eol='\n' generates CRLF
  eol='\r' generates CR
  eol='\r\n' generates (predictably) CRCRLF

Thank you for your time.

Jim
Re: [R] Keeping Leading Zeros, Treating numbers as text
Use the colClasses argument with a vector of character strings naming the types you want each column to have, and specify character for your id column.

-- Jeff Newmiller, Research Engineer (Solar/Batteries/Software/Embedded Controllers)
Sent from my phone. Please excuse my brevity.
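The colClasses advice can be sketched as follows. The file contents and the column names id and value are made up for illustration; the only real piece is the colClasses argument of read.csv(), used here in its named form so that only the ID column is forced to character.

```r
# Write a tiny example file with leading zeros in the id column
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,value", "00210,1.5", "00007,2.0"), tmp)

# Default import: id is guessed to be numeric and the zeros are lost
d_default <- read.csv(tmp)

# Force the id column to be read as text; other columns are still guessed
d_text <- read.csv(tmp, colClasses = c(id = "character"))

print(d_default$id)
print(d_text$id)
```

This only helps if the zeros are still present in the file itself, which is why checking the raw csv in a text editor first is worthwhile.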
Re: [R] Write.table eol argument
At least on Windows, you need to open the file in binary mode (as opposed to text mode) to prevent the usual OS-dependent way of encoding end-of-line. E.g.,

  z <- data.frame(x=1:3, y=state.name[1:3])
  f <- file("tmp.csv", open="wb")
  write.table(z, file=f, quote=FALSE, sep=";", eol="\n")
  close(f) # do not forget to close it!
  system("e:\\cygwin\\bin\\od -c --width=8 tmp.csv")
  0000000   x   ;   y  \n   1   ;   1   ;
  0000010   A   l   a   b   a   m   a  \n
  0000020   2   ;   2   ;   A   l   a   s
  0000030   k   a  \n   3   ;   3   ;   A
  0000040   r   i   z   o   n   a  \n
  0000047

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
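For readers without od at hand, the same check can be sketched in R itself by reading the file back as raw bytes. The temporary file name and the helper variable names are my own; the write.table() call is the one from the reply above.

```r
z <- data.frame(x = 1:3, y = state.name[1:3])

# Open the connection in binary mode so "\n" is written as a single LF byte
tmp <- tempfile(fileext = ".csv")
con <- file(tmp, open = "wb")
write.table(z, file = con, quote = FALSE, sep = ";", eol = "\n")
close(con)

# Read the file back byte by byte and look for carriage returns (0x0d)
bytes <- readBin(tmp, what = "raw", n = file.info(tmp)$size)
has_cr <- any(bytes == as.raw(13))
n_lf <- sum(bytes == as.raw(10))
print(has_cr)  # FALSE: no CR bytes were written
print(n_lf)    # 4: header line plus three data rows
```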
Re: [R] Performing basic Multiple Sequence Alignment in R?
On Tue, 21 Dec 2010 20:17:18 +0200, Tal Galili (tal.gal...@gmail.com) wrote:
> Dear Mike and Thomas,
> From what I gathered here (thanks to Joris Meys):
> http://stackoverflow.com/questions/4497747/how-to-perform-basic-multiple-sequence-alignments-in-r/4498434#4498434
> there is an R interface to the MUSCLE algorithm in the bio3d package (function seqaln()), but not one for clustal. I will probably end up using pairwiseAlignment on pairs of alignments with some sort of stopping rules (I'll have to play with it to see how it works).

http://scholar.google.com/scholar?hl=en&q=%22exact+string+matching%22+alignment
http://citeseerx.ist.psu.edu/search?q=exact+string+matching+alignment+dna&submit=Search&sort=rel

Certainly, if you are flexible and can use whatever may be close in R, that is fine. But I seem to recall that exact string matching was a fast and interesting way to go, and maybe some of the authors above, in the interest of promoting their work, would help implement an R version if there is demand.

I seem to recall I did something like building indexes of the strings to be aligned first, then finding substrings that were unique to a given string but appeared only once in each of the sequences to be aligned (this was the most restrictive criterion, but you can imagine how to make it more accommodating). Now that you got me started: up-front tokenizing or compiling of input sequences (usually no more than indexing them in some way) made many later operations like alignment go faster. This may have ended up being similar to BLAST, but now I can't really recall.

Anyway, my point here is that somewhere in R there may be packages that generate intermediate forms useful across disciplines: mining data from text, linguistics, or macromolecule analysis. In fact, the indexing process helps find things that have migrated a long way from their original place, and there are probably other non-alignment related things you could get out of the approach.

> Thank you all for your answers. It is always helpful to hear from others whether something was already implemented in R or not.
> Best, Tal
>
> On Tue, Dec 21, 2010 at 2:44 PM, Mike Marchywka wrote:
> > e came here with a task and was pointed to bio packages but I thought there m
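The indexing idea described above (substrings that occur exactly once in every sequence, usable as alignment anchors) can be sketched in base R. This is only an illustration under my own assumptions: fixed-length substrings (k-mers), toy sequences, and hypothetical helper names kmers(), unique_kmers() and anchors(); it is not the method of any particular package.

```r
# All substrings of length k of a single string
kmers <- function(s, k) {
  n <- nchar(s)
  if (n < k) return(character(0))
  substring(s, 1:(n - k + 1), k:n)
}

# k-mers that occur exactly once within one sequence
unique_kmers <- function(s, k) {
  tab <- table(kmers(s, k))
  names(tab)[as.vector(tab) == 1]
}

# k-mers that occur exactly once in every sequence: candidate anchors
anchors <- function(seqs, k) {
  Reduce(intersect, lapply(seqs, unique_kmers, k = k))
}

seqs <- c("ACGTTGCA", "TTACGTTG", "GGACGTAA")
found <- anchors(seqs, 4)
print(found)
# [1] "ACGT"

# Position of the anchor in each sequence, e.g. to seed an alignment
pos <- unlist(lapply(seqs, function(s) regexpr(found, s, fixed = TRUE)[1]))
print(pos)
# [1] 1 3 3
```

Relaxing the "exactly once in every sequence" rule (for example, once in most sequences) is the kind of loosening the message alludes to.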
[R] Matching 2 SQL tables
Hi, I have a postgresql and a mysql database and I would like to combine the info from two different tables in R. Both databases contain a table with three columns: project_name, release_id and release_date. So each project output could be released multiple times (I am interested in the first release_date). However, some of the data is missing. Basically, what I want to do is to try and fill the missing data in 1 table with the data from the other table. The difficulty here is that table1$project_name IS NOT table2$project_name. Example: green-tree and green tree, new(Jacket) and newJacket. Could you please help me? Thanks! Mathijs -- View this message in context: http://r.789695.n4.nabble.com/Matching-2-SQL-tables-tp3159678p3159678.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
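One possible approach, sketched under assumptions: once both tables are in R as data frames (the database import is not shown), normalise the project names to a common key, merge on that key, and fill the gaps in one table from the other. The data frames t1 and t2, the helper norm_name() and the normalisation rule (lower-case, drop everything that is not a letter or digit) are hypothetical; real names may need a looser, fuzzy match.

```r
# Toy stand-ins for the two tables (earliest release per project)
t1 <- data.frame(project_name = c("green-tree", "new(Jacket)"),
                 release_date = as.Date(c("2010-01-05", NA)),
                 stringsAsFactors = FALSE)
t2 <- data.frame(project_name = c("green tree", "newJacket"),
                 release_date = as.Date(c(NA, "2010-03-01")),
                 stringsAsFactors = FALSE)

# Reduce each name to lower-case letters and digits only
norm_name <- function(x) tolower(gsub("[^[:alnum:]]", "", x))
t1$key <- norm_name(t1$project_name)
t2$key <- norm_name(t2$project_name)

# Full outer join on the normalised key
m <- merge(t1, t2, by = "key", all = TRUE, suffixes = c(".1", ".2"))

# Take the date from table 1 and fall back to table 2 where it is missing
m$release_date <- m$release_date.1
miss <- is.na(m$release_date)
m$release_date[miss] <- m$release_date.2[miss]

print(m[, c("key", "release_date")])
```

If a project has several releases, the earliest date per key could be taken first, for instance with aggregate(release_date ~ key, data = t1, FUN = min).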
[R] randomForest: tuneRF error
Just curious if anyone else has got this error before, and if so, would know what I could do (if anything) to get past it:

  mtry <- tuneRF(training, trainingdata$class, ntreeTry = 500, stepFactor = 2, improve = 0.05, trace = TRUE, plot = TRUE, doBest = FALSE)
  mtry = 13  OOB error = 0.62%
  Searching left ...
  mtry = 7   OOB error = 1.38%
  -1.22 0.05
  Searching right ...
  mtry = 26  OOB error = 0.24%
  0.611 0.05
  mtry = 52  OOB error = 0.07%
  0.7142857 0.05
  mtry = 104 OOB error = 0%
  1 0.05
  mtry = 173 OOB error = 0%
  NaN 0.05
  Error in if (Improve > improve) { : missing value where TRUE/FALSE needed

I've used tuneRF successfully before, but in this instance, no matter what I change in the parameters, I still get the error above (last line). The data has no NAs in it. I'm using R 2.12.0 (64-bit, MS Windows 7).

Thanks in advance!
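The NaN in the last trace line suggests where the error comes from. The sketch below assumes that the relative improvement is computed as 1 - (new error)/(old error); that formula is my reading of the trace, not something confirmed in this thread. With two consecutive OOB errors of 0% the ratio is 0/0, and an if() on the resulting NaN fails exactly as shown.

```r
# Two consecutive OOB errors of exactly zero, as in the trace above
error_old <- 0
error_new <- 0

# Assumed form of the relative improvement: 0/0 gives NaN
rel_improve <- 1 - error_new / error_old
print(rel_improve)  # NaN

# if() cannot decide on NaN/NA and signals an error
res <- tryCatch(
  if (rel_improve > 0.05) "keep searching" else "stop",
  error = function(e) "error: condition is NA"
)
print(res)

# A guarded comparison treats NaN as "no improvement"
guarded <- isTRUE(rel_improve > 0.05)
print(guarded)  # FALSE
```

If that reading is right, the data are not at fault: the search simply reached a perfect OOB error twice in a row. Capping the search, for example with a larger improve threshold or a smaller ntreeTry, may avoid the second zero, though that is a guess.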
[R] Link prediction in social network with R
Dear R users,

I'm a novice user of R and have absolutely no prior knowledge of social network analysis, so apologies if my question is trivial. I've spent a lot of time trying to solve this on my own but I really can't, so I hope someone here can help me out. Cheers!

The dataset: I'm trying to predict the existence of links (TRUE or FALSE) in a test set using a training set. Both data sets are in edge-list format, where user IDs represent nodes in both columns, with the 1st column pointing to the 2nd column (see Figure 1 below). Using the AUC to evaluate the performance, I am looking for the best algorithm to predict the existence of links in the test data (50% are true and the rest are false).

Figure 1: training

  Vertices: 1133143
  Edges: 999
  Directed: TRUE
  Edges:
  [0] 105 -> 850956
  [1] 105 -> 1073420
  [2] 105 -> 1102667
  [3] 165 -> 888346
  [4] 165 -> 579649
  [5] 165 -> 136665
  etc.

I'm having problems obtaining probability scores for the links/edges, as most of the available scores are for the nodes. Examples are graph.knn and page.rank in igraph. So my questions are:

1) What do I need to do to obtain scores for the links instead of the nodes (I presume I am missing a data preparation step)?
2) Which R package would be best for running the various techniques: Jaccard index, Adamic-Adar, common neighbours, PropFlow, etc.?
3) How do I implement a supervised learning method such as random forest (I am guessing I need to obtain a feature list, but again, how can I get the scores for the edges)?

I hope I've explained my questions well, but do let me know if more clarification is needed.

Thanks in advance,
Eu Jin
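On question 1, link scores are usually computed per pair of nodes from the two nodes' neighbourhoods. A minimal base-R sketch, under stated assumptions: the toy edge list is made up, direction is ignored when building neighbour sets (a simplification of the directed data described above), and pair_scores() is a hypothetical helper, not an igraph function.

```r
# Toy edge list in the same two-column form as the training data
edges <- data.frame(from = c(1, 1, 2, 2, 3),
                    to   = c(2, 3, 3, 4, 4))

# Neighbour set of every node, ignoring edge direction
nodes <- sort(unique(c(edges$from, edges$to)))
nbrs <- lapply(nodes, function(v) {
  unique(c(edges$to[edges$from == v], edges$from[edges$to == v]))
})
names(nbrs) <- nodes

# Scores for one candidate link (u, v)
pair_scores <- function(u, v) {
  a <- nbrs[[as.character(u)]]
  b <- nbrs[[as.character(v)]]
  common <- length(intersect(a, b))
  c(common = common, jaccard = common / length(union(a, b)))
}

s14 <- pair_scores(1, 4)  # nodes 1 and 4 share both of their neighbours
s12 <- pair_scores(1, 2)
print(s14)
print(s12)
```

Applied to every pair in the test set, such scores form one row of features per candidate link, which is the table a supervised method like random forest would need (question 3). For a graph with over a million vertices the neighbour lookup would have to be done more efficiently than this loop.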
Re: [R] Density plot with lattice?
Hi:

Try this:

  densityplot( ~ height | voice.part, data = singer, layout = c(2, 4),
              xlab = "Height (inches)", bw = 5)
  densityplot( ~ height | voice.part, data = singer, layout = c(2, 4),
              xlab = "Height (inches)", bw = 5, plot.points = FALSE)

The first call draws the points at the base of each curve; the second suppresses them. The plot.points argument is actually associated with panel.densityplot(); in this case, you can pass it from within densityplot().

HTH,
Dennis

On Tue, Dec 21, 2010 at 10:58 AM, Marie-Hélène Hachey wrote:
> Hi, Is it possible to remove the points at the base of a density plot? I would like to keep only the curves of the plot, not the points. Thank you.
> Marie-Helene Hachey, M.Sc. student, Universite Laval, Quebec
[R] please Help me on a repeated measures anova
I am currently working on a draft of an aquatic bioassessment. The conditions tested are the following:

  ER      river water
  T+0.5   dechlorinated water control + 0.5mg / L of malate
  T+1     dechlorinated water control + 1g / L of malate
  TED     dechlorinated water control
  SED+ER  sediment + river water
  SED+ED  sediment + dechlorinated water

The measurement is AChE in muscle (fillet of fish). The production of acetylcholine is followed with a spectrophotometer every 15 seconds for two minutes. The results are presented in the following table:

  traitement  t15    t30    t45    t60    t75    t90    t105   t120
  ER          0.100  0.110  0.123  0.135  0.147  0.159  0.171  0.182
  ER          0.112  0.134  0.153  0.174  0.192  0.208  0.226  0.251
  T+0.5       0.078  0.082  0.088  0.094  0.101  0.108  0.113  0.120
  t+0.5       0.053  0.100  0.109  0.120  0.127  0.136  0.145  0.154
  TED         0.107  0.126  0.141  0.161  0.172  0.184  0.200  0.213
  TED         0.117  0.135  0.153  0.169  0.183  0.201  0.218  0.229
  TED         0.124  0.145  0.163  0.187  0.208  0.227  0.244  0.259
  T+1         0.109  0.119  0.134  0.148  0.163  0.174  0.187  0.202
  T+1         0.118  0.134  0.153  0.170  0.184  0.197  0.214  0.228
  SED+ER      0.158  0.175  0.194  0.208  0.226  0.240  0.259  0.268
  SED+ED      0.119  0.140  0.157  0.174  0.192  0.208  0.225  0.240
  SED+ED      0.101  0.113  0.180  0.140  0.154  0.166  0.179  0.190
  SED+ED      0.129  0.135  0.140  0.146  0.153  0.159  0.165  0.172

The appropriate statistical test is considered to be a repeated measures anova, but I do not know how to do it in R. I searched the forums and downloaded the R package 'nlme', with which I should be able to use the function 'lm'. But the problem is that I do not know how to write the call. Could you help me?

-- View this message in context: http://r.789695.n4.nabble.com/please-Help-me-on-a-repeated-measures-anova-tp3159868p3159868.html
[R] installation of R/parallel package in win32/64
This is to summarize my workaround for installing R/parallel on win32/64 boxes. Recently I had problems installing rparallel:

1. The package's Makevars.win hard-codes an include path for Rtools.
2. The package is written in C++; Rtools and R are not set up to run g++ by default.

My workaround:

1. Install R and Rtools as usual. Note that Rtools 2.12 has both 32-bit and 64-bit mingw. However, it doesn't include the packages for g++ and libstdc++. You need to install these 2 packages into Rtools from sourceforge first.
2. In the rparallel source tree, open Makevars.win and change the PKG_CPPFLAGS and PKG_LIBS variables to point to your correct rtools and rtools/mingw directories.
3. In R_installation/etc, open the i386 directory (or its 64-bit counterpart; I forget the name) and open the Makeconf file; look for DLLFLAGS += ... and append -static-libstdc++. This flag causes g++ to link libstdc++ statically.

Then you can run the usual R CMD INSTALL rparallel. Good luck.
Re: [R] how to see what's wrong with a self written function?
Here is what I get when I have:

  options(error=utils::recover)

I always run with this option so that on an error I get dumped in the browser to see what is happening. It appears that 'i == 3' when the error occurs, and you can also see the values of 'x': the third iterate is negative, so log(x) returns NaN and the while() condition becomes NA.

  > regulafalsi=function(f,x0,x1){
  +   x=c()
  +   x[1]=x1
  +   i=1
  +   while ( f(x[i])!=0 ) {
  +     i=i+1
  +     if (i==2) {
  +       x[2]=x[1]-f(x[1])*(x[1]-x0)/(f(x[1])-f(x0))
  +     } else {
  +       x[i]=x[i-1]-f(x[i-1])*(x[i-1]-x[i-2])/(f(x[i-1])-f(x[i-2]))
  +     }
  +   }
  +   x[i]
  + }
  > regulafalsi(function(x) x^(1/2)+3*log(x)-5,10,1)
  [1] 2.978429
  > regulafalsi(function(x) x^(1/2)+3*log(x)-5,1,100)
  Error in while (f(x[i]) != 0) { : missing value where TRUE/FALSE needed
  In addition: Warning message:
  In log(x) : NaNs produced

  Enter a frame number, or 0 to exit

  1: regulafalsi(function(x) x^(1/2) + 3 * log(x) - 5, 1, 100)

  Selection: 1
  Called from: top level
  Browse[1]> i
  [1] 3
  Browse[1]> x
  [1] 100.0 18.35661 -42.22301
  Browse[1]>

On Tue, Dec 21, 2010 at 2:39 PM, casperyc wrote:
> Hi all, I am writing a simple function to implement the regula falsi (secant) method.
> [function definition snipped; it is the one in the transcript above]
> These work fine:
>   regulafalsi(function(x) x^(1/2)+3*log(x)-5,1,10)
>   regulafalsi(function(x) x^(1/2)+3*log(x)-5,10,1)
> For all x0, the function is strictly increasing. Then
>   regulafalsi(function(x) x^(1/2)+3*log(x)-5,1,100)
>   Error in while (f(x[i]) != 0) { : missing value where TRUE/FALSE needed
>   In addition: Warning message: In log(x) : NaNs produced
> I don't know what happened there. Is there a way to find the value of f(x[i]) for which R can't determine TRUE/FALSE?
> Thanks! casper

-- Jim Holtman
Data Munger Guru
What is the problem that you are trying to solve?
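Building on the diagnosis above, here is a sketch of a more defensive secant iteration. The function name secant(), the tolerance and the iteration cap are my own choices, not from the thread. It stops with an informative error as soon as f is not finite at an iterate, rather than failing inside while(), and it tests |f(x)| against a tolerance instead of exact equality with zero.

```r
secant <- function(f, x0, x1, tol = 1e-8, maxit = 100) {
  for (i in seq_len(maxit)) {
    f0 <- f(x0)
    f1 <- f(x1)
    # Catch NaN/NA/Inf before they reach a logical test
    if (!is.finite(f0) || !is.finite(f1)) {
      stop("f is not finite at iterate ", i, " (x = ", format(x1), ")")
    }
    if (abs(f1) < tol) return(x1)
    # Secant step through (x0, f0) and (x1, f1)
    x2 <- x1 - f1 * (x1 - x0) / (f1 - f0)
    x0 <- x1
    x1 <- x2
  }
  stop("no convergence after ", maxit, " iterations")
}

f <- function(x) sqrt(x) + 3 * log(x) - 5

root <- secant(f, 10, 1)
print(root)

# The start values (1, 100) step outside the domain of log();
# the failure is now reported explicitly
bad <- tryCatch(suppressWarnings(secant(f, 1, 100)),
                error = function(e) conditionMessage(e))
print(bad)
```

Note that the plain secant method does not keep the root bracketed; a true regula falsi would retain two points with opposite signs of f, which would also keep the iterates inside [1, 100] here. For production use, uniroot() is the usual choice.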
[R] matrix indexing in 'for' loop?
Hi,

I am having trouble with matrices. I have 2 matrices as given below, and I am interested in using these matrices inside for loops used to calculate correlations. I am creating a list with the names of the matrices, assuming this list could be indexed inside the 'for' loop to retrieve the matrix values. But, as expected, the code throws an error. Can someone suggest a better way to call these matrices inside the loops?

  ts.m.dmi <- matrix(c(1:20), 4, 5)
  ts.m.soi <- matrix(c(21:40), 4, 5)
  ts.m.pe <- matrix(c(21:40), 4, 5)
  factors <- c(ts.m.dmi, ts.m.soi)
  for (j in 0:1){
    y <- factors[j+1]
    for (i in 1:5){
      cor.pe.y <- cor(ts.m.pe[,2], y[,i])
      ct.tst <- cor.test(ts.m.pe[,2], y[,i])
    }
  }

Thanks for your time.

-- Regards, Maha
Graduate Student
Re: [R] matrix indexing in 'for' loop?
To make your loop work, you need to learn about the get function. I'm not going to give you the details because there are better approaches available.

First, let's make some data that will give values which can be verified. (All the correlations of the data you created are exactly equal to 1.) And to make the code readable, I'll omit the ts.m prefix.

  set.seed(14)
  dmi = matrix(rnorm(20),4,5)
  soi = matrix(rnorm(20),4,5)
  pe = matrix(rnorm(20),4,5)
  allmats = list(dmi,soi)

Since cor.test won't automatically do the tests for all columns of a matrix, I'll write a little helper function:

  gettests = function(x)apply(x,2,function(col)cor.test(pe[,2],col))
  tests = lapply(allmats,gettests)

Now tests is a list of length 2, each element holding a list of the output from cor.test for the five columns of that matrix against pe[,2]. (Notice that in your program you made no provision to store the results anywhere.)

Suppose you want the correlations:

  sapply(tests,function(x)sapply(x,function(test)test$estimate))
             [,1]       [,2]
  cor  0.12723615  0.1342751
  cor  0.07067819  0.6228158
  cor -0.28761533  0.6218661
  cor  0.83731828 -0.9602551
  cor -0.36050836  0.1170035

The probabilities for the tests can be found similarly:

  sapply(tests,function(x)sapply(x,function(test)test$p.value))
            [,1]       [,2]
  [1,] 0.8727638 0.86572490
  [2,] 0.9293218 0.37718416
  [3,] 0.7123847 0.37813388
  [4,] 0.1626817 0.03974489
  [5,] 0.6394916 0.88299648

(Take a look at the Value section in the help file for cor.test to get the names of other quantities of interest.)

The main advantage of this approach is that if you add more matrices to the allmats list, the other steps automatically take them into account.

Hope this helps.

- Phil Spector
Statistical Computing Facility
Department of Statistics
UC Berkeley
spec...@stat.berkeley.edu
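For completeness, the original double loop can also be made to work once the matrices are stored in a list and extracted with [[ ]]. This is a sketch using the same simulated data as above; the names mats and res are mine. The key points are that c() flattens matrices into one long vector while list() keeps them intact, and that results must be stored in a preallocated object.

```r
set.seed(14)
dmi <- matrix(rnorm(20), 4, 5)
soi <- matrix(rnorm(20), 4, 5)
pe  <- matrix(rnorm(20), 4, 5)

# list() keeps each matrix whole; c() would flatten them into a vector
mats <- list(dmi = dmi, soi = soi)

# One row per column of the matrices, one column per matrix
res <- matrix(NA_real_, nrow = 5, ncol = length(mats),
              dimnames = list(NULL, names(mats)))

for (j in seq_along(mats)) {
  y <- mats[[j]]                      # [[ ]] returns the matrix itself
  for (i in seq_len(ncol(y))) {
    res[i, j] <- cor(pe[, 2], y[, i])
  }
}
print(res)
```

The inner loop is not strictly needed, since cor(pe[, 2], dmi) already returns the correlations with all five columns at once.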
Re: [R] please Help me on a repeated measures anova
Hi:

I did the following (note the fix of the assumed typo t+0.5 -> T+0.5) using the melt() function in package reshape (or reshape2), the lattice graphics package and package lme4. I named your input data df.

  library(reshape)   # or reshape2 if you have it

  # Fix the typo:
  df[4, 1] <- 'T+0.5'
  # Redefine the factor to produce the correct number of levels:
  df$traitement <- factor(df$traitement)
  # Create a subject variable to distinguish profiles in time
  df$subject <- as.numeric(row.names(df))
  # reshape the data from wide format to long
  dm <- melt(df, id = c('traitement', 'subject'))
  # sort the reshaped data frame
  dm <- dm[order(dm$traitement, dm$subject, dm$variable), ]
  head(dm)
  # Create a numeric time variable by stripping off the 't's
  dm$time <- as.numeric(sub('^t','', dm$variable))

  # Plot the individual profiles over time by treatment type
  library(lattice)
  xyplot(value ~ time | traitement, data = dm, groups = subject, type = c('p', 'l'))

  # The individual profiles are almost uniformly linearly increasing,
  # with a couple of obvious nonconforming points visible in the plots.
  # There are mean differences among treatments,
  # but also unbalanced replication in subjects.
  # Treatment (SED + ER) has only one subject.

  # One way to fit a model:
  library(lme4)
  m1 <- lmer(value ~ traitement + (1 + time | subject), data = dm, REML = FALSE)
  summary(m1)

This fits a mixed effects model with random subjects and time as a repeated measures variable, using maximum likelihood to fit the model. This particular specification treats time as numeric rather than factor because the linear component is so strong, but it is possible to replace it with the factor version instead (variable in data frame dm). The output of this model fit shows a very small residual effect, a strong correlation between time and subject (the sign seems wrong, though) and about the same amount of variation between subjects as within subjects. This is a model you should seriously consider, as it takes proper account of the randomness of subjects and the nesting of time as a linear effect within subject.

I would encourage you to follow this direction, but there is much to learn if you are to use the lme4 package. I suspect, however, you're looking for something more along the lines of Anova() in the car package, which uses the 'traditional' ANOVA approach to repeated measures models. If you go in this direction, be sure you understand the underlying assumptions of the model.

For multiple comparisons, which I presume you'll want to investigate, there's the TukeyHSD() function that you could use with Anova(), or for more general methods, the multcomp package, which has a function glht() that can be used with a mixed effects model per above or with an Anova() object. The multcomp package has several useful vignettes and a recent book that describes its essential features.

Refs:
Bretz, Hothorn and Westfall (2010). Multiple Comparisons in R. Chapman & Hall.
Fox and Weisberg (2011). An R Companion to Applied Regression, 2nd ed. Sage Publications. (Just out!)

HTH,
Dennis
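The wide-to-long step can also be sketched with base R's reshape(), for readers without the reshape package. Only two rows of the posted table are used, to keep the example short; the object names wide, long and time_cols are mine. The result has one row per subject and time point, which is the layout that lmer() or aov() expect.

```r
# Two of the posted profiles (rows 1 and 5 of the table)
wide <- data.frame(
  traitement = c("ER", "TED"),
  t15 = c(0.100, 0.107), t30 = c(0.110, 0.126),
  t45 = c(0.123, 0.141), t60 = c(0.135, 0.161),
  t75 = c(0.147, 0.172), t90 = c(0.159, 0.184),
  t105 = c(0.171, 0.200), t120 = c(0.182, 0.213)
)
wide$subject <- seq_len(nrow(wide))

# Columns t15 ... t120 hold the repeated measurements
time_cols <- grep("^t[0-9]+$", names(wide), value = TRUE)

long <- reshape(wide, direction = "long",
                varying = list(time_cols), v.names = "value",
                timevar = "time",
                times = as.numeric(sub("^t", "", time_cols)),
                idvar = "subject")
long <- long[order(long$subject, long$time), ]
rownames(long) <- NULL
print(head(long, 3))
```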
Re: [R] how to control ticks
On 21/12/2010 14:35, peter dalgaard wrote:
> On Dec 21, 2010, at 17:01, Edwin Groot wrote:
> > On Tue, 21 Dec 2010 18:06:52 +0530, Yogesh Tiwari wrote:
> > > Hi, I want 12 ticks at axis 1 and want to write Jan-Dec on each. Something like:
> > >   axis(1, at=1:12, labels=c('J','F','M','A','M','J','J','A','S','O','N','D'))
> > > I could omit default ticks but now how to control ticks.
> > Dear Yogesh, I spray my clothing with No-Bite, and that controls ticks quite well.
> Yeah, but then how do you get the suckers to sit still while you write on them? ;-)

You start telling them a story so they keep quiet paying attention? LOL
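Jokes aside, the original question has a short answer: suppress the default axis with xaxt = "n" and then draw it by hand with axis(). A minimal sketch follows; the y values are placeholders, not data from the thread.

```r
labs <- c('J','F','M','A','M','J','J','A','S','O','N','D')
y <- c(3, 5, 8, 12, 16, 19, 21, 20, 17, 12, 7, 4)  # placeholder monthly values

# When run as a script, draw to a null device instead of writing Rplots.pdf
if (!interactive()) pdf(NULL)

# xaxt = "n" omits the default x axis and its ticks
plot(1:12, y, type = "b", xaxt = "n", xlab = "Month", ylab = "Value")

# Draw 12 ticks at positions 1..12, one label per month
axis(1, at = 1:12, labels = labs)

if (!interactive()) dev.off()
```

The built-in constant month.abb gives "Jan" to "Dec" if full abbreviations are preferred over single letters.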
[R] Estimate between-axes vs within-axes heterogeneity of multivariate matrices
Hi!

My question(s) in the end might be silly, but I am no expert on this, so here it goes:

Noy-Meir (1973), Pielou (1984) and a few others have pointed out that non-centered PCA is useful in some cases. They clearly explain that this is the case when multi-dimensional data display distinct clusters (which have zero, or near-zero, projections on some subset of the axes) and the task is precisely to separate these clusters among the principal components.

I have done my complete work using prcomp() and tested combinations of center=FALSE/TRUE and scale=FALSE/TRUE. I would now like to check this between-axes vs within-axes heterogeneity of my data and cross-check the results with the various tested PCA versions.

Is there any (official or custom) function available in R that could answer this question? Some relative/comparative (preferably simple and intuitive) measure(s)? Something that would perhaps give a graphical indication without time-consuming clustering, sampling or other processing?

Even though the above-mentioned authors mention some measure for the asymmetry of the resulting components (uncentered: unipolar, centered: bipolar), I find the concept a bit hard to understand. Isn't there a quick way (function) to just say (with numbers or plots, of course) "it seems that the data are heterogeneous between axes" or, the other way around, "it looks like the variables differ within, more than between"?

Apologies for repeating the same question (I am trying to understand the problem myself).

Thank you,
Nikos
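I do not know of a ready-made function for this, but the unipolar/bipolar idea mentioned above can at least be illustrated with prcomp() itself. The sketch below rests on my own assumptions: simulated data with two clusters that occupy disjoint sets of axes, and a hypothetical helper polarity() that merely checks whether the scores on a component all share one sign.

```r
set.seed(1)
n <- 20
# Cluster A lives on axes 1-3, cluster B on axes 4-6 (zeros elsewhere)
cluster_a <- cbind(matrix(rexp(n * 3), n, 3), matrix(0, n, 3))
cluster_b <- cbind(matrix(0, n, 3), matrix(rexp(n * 3), n, 3))
X <- rbind(cluster_a, cluster_b)

pc_centered   <- prcomp(X, center = TRUE,  scale. = FALSE)
pc_uncentered <- prcomp(X, center = FALSE, scale. = FALSE)

# "unipolar": all scores on the component have the same sign (or are ~0)
polarity <- function(s, eps = 1e-8) {
  if (all(s >= -eps) || all(s <= eps)) "unipolar" else "bipolar"
}

pol <- c(centered   = polarity(pc_centered$x[, 1]),
         uncentered = polarity(pc_uncentered$x[, 1]))
print(pol)
#   centered uncentered
#  "bipolar" "unipolar"

# Scores of the first uncentered component, split by cluster
grp <- rep(c("A", "B"), each = n)
print(tapply(abs(pc_uncentered$x[, 1]), grp, max))
```

In this constructed case the first uncentered component picks out one cluster and gives (numerically) zero scores to the other, which is the between-axes situation the cited authors describe. Real data will be far less clean, so the sign pattern is a rough diagnostic rather than a test.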
Re: [R] zoo.read intraday data
Hi Gabor et al.,

the function

f3 <- function(...) as.POSIXct(paste(...), format = "%Y%m%d %H:%M:%S")

helped me to read intraday data from a file:

##
<TICKER>,<NAME>,<PER>,<DATE>,<TIME>,<OPEN>,<HIGH>,<LOW>,<CLOSE>,<VOL>,<OPENINT>
ICE.BRN,ice.brn_m5,5,20100802,10:40:00,79.21000,79.26000,79.16000,79.2,238,0
ICE.BRN,ice.brn_m5,5,20100802,10:45:00,79.19000,79.26000,79.19000,79.21000,413,0
##

##intraday data 5m file
fnameId = "./finam_brn_m5.csv"
pDateTimeColumns <- list(4,5)
b <- read.zoo(fnameId, index=pDateTimeColumns, sep=",", header=TRUE, FUN=f3)
xb <- as.xts(b)

head(b,2)
##
                    X.TICKER. X.NAME.    X.PER. X.OPEN. X.HIGH. X.LOW. X.CLOSE. X.VOL. X.OPENINT.
2010-08-02 10:40:00 ICE.BRN   ice.brn_m5 5      79.21   79.26   79.16  79.20    238    0
2010-08-02 10:45:00 ICE.BRN   ice.brn_m5 5      79.19   79.26   79.19  79.21    413    0

The problem is that after the conversion to xts the numeric values are converted to character:

head(xb,2)
                    X.TICKER. X.NAME.    X.PER. X.OPEN. X.HIGH. X.LOW. X.CLOSE. X.VOL. X.OPENINT.
2010-08-02 10:40:00 ICE.BRN   ice.brn_m5 5      79.21   79.26   79.16  79.20    238    0
2010-08-02 10:45:00 ICE.BRN   ice.brn_m5 5      79.19   79.26   79.19  79.21    413    0

and quantmod charting does not work.

Q: How can I prevent the conversion to character with xts? I suspect the problem is that the index is constructed from two columns, date and time.

sessionInfo()
R version 2.12.1 (2010-12-16)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] quantmod_0.3-15 TTR_0.20-2 Defaults_1.1-1 xts_0.7-6.11 zoo_1.7-0

Slava

--
View this message in context: http://r.789695.n4.nabble.com/zoo-read-intraday-data-tp3010256p3160102.html
Sent from the R help mailing list archive at Nabble.com.
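A likely explanation, offered as a sketch rather than a confirmed diagnosis: the data part of a zoo or xts object is a matrix, and a matrix holds a single type, so keeping the text columns (ticker, name) next to the prices forces everything to character. The base-R example below illustrates this using the two rows quoted above. The header is simplified to plain column names, and the variable names (csv, raw, idx, mixed, prices) are made up for the illustration.

```r
# The two sample rows from the post, with a simplified header (assumption).
csv <- "TICKER,NAME,PER,DATE,TIME,OPEN,HIGH,LOW,CLOSE,VOL,OPENINT
ICE.BRN,ice.brn_m5,5,20100802,10:40:00,79.21000,79.26000,79.16000,79.2,238,0
ICE.BRN,ice.brn_m5,5,20100802,10:45:00,79.19000,79.26000,79.19000,79.21000,413,0"

raw <- read.csv(text = csv, stringsAsFactors = FALSE)

# Build the time index from the DATE and TIME columns, as f3 does.
idx <- as.POSIXct(paste(raw$DATE, raw$TIME),
                  format = "%Y%m%d %H:%M:%S", tz = "UTC")

# Mixing a text column with numbers gives a character matrix ...
mixed <- as.matrix(raw[, c("TICKER", "OPEN", "CLOSE")])
print(is.character(mixed))

# ... while keeping only the numeric columns gives a numeric matrix.
num_cols <- c("OPEN", "HIGH", "LOW", "CLOSE", "VOL", "OPENINT")
prices <- as.matrix(raw[, num_cols])
rownames(prices) <- format(idx)
print(is.numeric(prices))
print(prices)
```

With xts installed, a numeric matrix like this could then be turned into a time series with xts::xts(prices, order.by = idx). Whether that alone is enough for the quantmod charts in question has not been checked here.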
Re: [R] how to control ticks
Hi Jim,

Yes, you are right, file$time is a decimal date. In the attached plot I want to replace the decimal date with a proper time axis so I can show month ticks. A decimal date is sometimes misleading to interpret. The data run from Jan to Dec 2009.

Thanks,
Yogesh

On Tue, Dec 21, 2010 at 9:57 PM, jim holtman jholt...@gmail.com wrote:
> What is the structure of file$time? Is it Date/POSIXct? 'at=1:12' only works if those are the dimensions of file$time. So give us an idea of what the data is (PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code).
>
> On Tue, Dec 21, 2010 at 7:36 AM, Yogesh Tiwari yogesh@googlemail.com wrote:
>> Hi, I want 12 ticks at axis 1 and want to write Jan-Dec on each. Something like:
>> axis(1, at=1:12, labels=c('J','F','M','A','M','J','J','A','S','O','N','D'))
>> I could omit default ticks but now how to control ticks.
>>
>> plot(file$time, file$ch4*1000, ylim=c(1500,1700), xaxt='n', xlab=NA, ylab=NA, col="blue", yaxs="i", lwd=2, pch=10, type="b")#
>> axis(1, at=1:12, labels=c('J','F','M','A','M','J','J','A','S','O','N','D'))
>>
>> BUT the above is not working, and there is no error either. Pls help,
>> Regards, Yogesh
>
> --
> Jim Holtman
> Data Munger Guru
> What is the problem that you are trying to solve?

--
Yogesh K. Tiwari (Dr.rer.nat), Scientist, Centre for Climate Change Research, Indian Institute of Tropical Meteorology, Homi Bhabha Road, Pashan, Pune-411008 INDIA
Phone: 0091-99 2273 9513 (Cell) : 0091-20-25904452 (O)
Fax: 0091-20-258 93 825
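Following Jim's point that at=1:12 only matches an x axis running from 1 to 12, one way to get month ticks on a decimal-date axis is to compute the decimal position of the first day of each month and pass those values to axis(). The sketch below makes several assumptions: the decimal date is taken to be year + (day of year - 1) / days in year, the helper name month_starts is hypothetical, and the time and ch4 vectors are made-up stand-ins for file$time and file$ch4.

```r
# Decimal-year position of the first day of each month.
# Hypothetical helper; assumes decimal date = year + (day of year - 1) / days in year.
month_starts <- function(year) {
  first <- as.Date(sprintf("%d-%02d-01", year, 1:12))
  days_in_year <- as.numeric(as.Date(sprintf("%d-01-01", year + 1)) -
                             as.Date(sprintf("%d-01-01", year)))
  doy <- as.numeric(format(first, "%j"))
  year + (doy - 1) / days_in_year
}

# Made-up stand-ins for file$time (decimal year) and file$ch4.
set.seed(42)
time <- seq(2009, 2010, length.out = 100)[-100]
ch4 <- 1.6 + 0.03 * sin(2 * pi * (time - 2009)) +
  rnorm(length(time), sd = 0.005)

# Same plot call as in the post, then ticks at the month starts.
plot(time, ch4 * 1000, ylim = c(1500, 1700), xaxt = "n", xlab = "", ylab = "",
     col = "blue", yaxs = "i", lwd = 2, pch = 10, type = "b")
axis(1, at = month_starts(2009),
     labels = c("J", "F", "M", "A", "M", "J", "J", "A", "S", "O", "N", "D"))
```

If file$time was built with a different decimal-date convention (for example, mid-month values), the tick positions would need to be computed in the same way as the data.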