[R] grouping by consecutive integers
Hello R-helpers!

I have a question concerning extracting sequence information from a vector. I have a vector (representing the bins of a time series where the frequency of occurrences is greater than some threshold) where I would like to extract the min, median and max of each group of consecutive numbers.

For example:

    tmp <- c(24,25,29,35,36,37,38,39,40,41,42,43,44,45,46,47,68,69,70,71)

I would like to have the max, min and median of the following groups:

    24,25
    29
    35,36,37,38,39,40,41,42,43,44,45,46,47
    68,69,70,71

I would like to be able to perform this for many time series, so an automated process would be nice. I am hoping to use this as a peak detection protocol.

Any advice would be greatly appreciated,
Kevin

--
Kevin J Emerson
Center for Ecology and Evolutionary Biology
1210 University of Oregon
Eugene, OR 97403 USA
[EMAIL PROTECTED]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] grouping by consecutive integers
Let me clarify one thing that I don't think I made clear in my posting. I am looking for the max, min and median of the indices, not of the time series frequency counts. I want to find the max, min and median time of peaks in a time series, so my question is mostly how to extract the max, min and median of each run of sequential numbers in a vector. I have reworded my original posting below.

Hello R-helpers!

I have a question concerning extracting sequence information from a vector. I have a vector (representing the bins of a time series where the frequency of occurrences is greater than some threshold) where I would like to extract the min, median and max of each group of consecutive numbers in the index vector.

For example:

    tmp <- c(24,25,29,35,36,37,38,39,40,41,42,43,44,45,46,47,68,69,70,71)

I would like to have the max, min and median of the following groups:

    24,25                                     max = 25, min = 24, median = 24.5
    29                                        max = min = median = 29
    35,36,37,38,39,40,41,42,43,44,45,46,47    max = 47, min = 35, etc.
    68,69,70,71

I would like to be able to perform this for many time series, so an automated process would be nice. I am hoping to use this as a peak detection protocol.

Any advice would be greatly appreciated,
Kevin
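One possible way to do this, offered as a sketch rather than a definitive answer: label each run of consecutive integers with `cumsum()` over the breaks found by `diff()`, then summarise each run. It assumes the vector is sorted and integer-valued; `run_id` and `run_summary` are names made up for the illustration.

```r
tmp <- c(24, 25, 29, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,
         68, 69, 70, 71)

# A new run starts wherever the step from the previous value is not exactly 1.
# The leading TRUE opens the first run.
run_id <- cumsum(c(TRUE, diff(tmp) != 1))

# Summarise each run: one row per group of consecutive integers.
run_summary <- do.call(rbind, lapply(split(tmp, run_id), function(g) {
  data.frame(min = min(g), median = median(g), max = max(g), n = length(g))
}))

run_summary
```

For the example vector this gives four rows, one per group, with maxima 25, 29, 47 and 71. Wrapping the two steps in a function would let the same summary be applied to many time series with `lapply()`.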
[R] Getting rid of for loops
Hello R-users!

I have a style question. I know that for loops are somewhat frowned upon in R, and I was trying to figure out a nice way to do something without using loops, but figured that I could get it done quickly using them. I am now looking to see what kind of tricks I can use to make this code a bit more aesthetically appealing to other R users (and learn something about R along the way...).

Here's the problem. I have a data.frame with 4 columns of dependent variables and then ~35 columns of predictor variables (factors) [for those interested, it is a QTL problem, where the predictors are genotypes at DNA markers and the dependent variable is a biological trait]. I want to go through all pairwise combinations of predictor variables and perform an ANOVA with two predictors and their interaction on a given dependent variable. I then want to store the p-value of the interaction term, along with the predictor variable information. So I want to end up with a data.frame with the two variable names and the interaction p-value in each row, for all pairwise combinations of predictors.

I used the following code:

    # qtl is the original data.frame, and my dependent var in this case is
    # qtl$CPP.
    marker1 <- NULL
    marker2 <- NULL
    p.interaction <- NULL
    for (i in 5:40) {            # cols 5 - 41 are the predictor factors
      for (j in (i + 1):41) {
        marker1 <- rbind(marker1, names(qtl)[i])
        marker2 <- rbind(marker2, names(qtl)[j])
        # first element of the summary is the ANOVA table
        tmp2 <- summary(aov(qtl$CPP ~ qtl[, i] * qtl[, j]))[[1]]
        # row 3 of the table is the interaction term
        p.interaction <- rbind(p.interaction, tmp2[["Pr(>F)"]][3])
      }
    }

I have two questions:

(1) Is there a nicer way to do this without having to invoke for loops?

(2) My other dependent variables are categorical in nature. I need basically the same information: I am looking for information regarding the interaction of predictors on a categorical variable. Any ideas on what tests to use? (I am new to analysis of all-categorical data.)

Thanks in advance!
Kevin
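For question (1), one loop-free formulation might look like the sketch below: `combn()` enumerates the predictor pairs up front and `apply()` fits one model per pair, so nothing is grown with `rbind()`. The data frame `qtl_demo`, its marker names `m1` to `m4` and the helper `interaction_p` are invented here so the example runs on its own; with the real data the predictor names would come from `names(qtl)[5:41]`.

```r
# Simulated stand-in for the real data (values and names are made up).
set.seed(1)
n <- 120
qtl_demo <- data.frame(
  CPP = rnorm(n),
  m1 = factor(sample(c("A", "B"), n, replace = TRUE)),
  m2 = factor(sample(c("A", "B"), n, replace = TRUE)),
  m3 = factor(sample(c("A", "B"), n, replace = TRUE)),
  m4 = factor(sample(c("A", "B"), n, replace = TRUE))
)
predictors <- names(qtl_demo)[-1]

# All unordered pairs of predictors: a 2-row character matrix, one pair per column.
pairs <- combn(predictors, 2)

# Hypothetical helper: fit response ~ a * b and return the interaction p-value
# (third row of the ANOVA table, after the two main effects).
interaction_p <- function(pair, data, response) {
  f <- as.formula(paste(response, "~", pair[1], "*", pair[2]))
  summary(aov(f, data = data))[[1]][["Pr(>F)"]][3]
}

result <- data.frame(
  marker1 = pairs[1, ],
  marker2 = pairs[2, ],
  p.interaction = apply(pairs, 2, interaction_p,
                        data = qtl_demo, response = "CPP")
)

result
```

`apply()` still loops internally, so the gain is mostly readability and avoiding repeated `rbind()` calls rather than raw speed. Building the formula from column names also keeps the marker names in the fitted model instead of `qtl[, i]`.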
[R] logistic regression asymptote problem
R-helpers,

I have a question about logistic regressions. Consider a case where you have binary data that reach an asymptote that is not 1; maybe it's 0.5. Can I still use a logistic regression to fit a curve to these data? If so, how can I do this in R? As far as I can figure out, using a logit link function assumes that the asymptote is at y = 1.

An example. Consider the following data:

    tmp <- structure(list(
        x   = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14),
        yes = c(0, 0, 0, 2, 1, 14, 24, 15, 23, 18, 22, 20, 14, 17),
        no  = c(94, 101, 95, 80, 81, 63, 51, 56, 30, 38, 31, 18, 21, 20)),
      .Names = c("x", "yes", "no"),
      row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10",
                    "11", "12", "13", "14"),
      class = "data.frame")

where x is the independent variable, and yes and no are counts of events. Plotting the data, you can see that they seem to reach an asymptote at around y = 0.5. Using glm to fit a logistic regression, it is easily seen that it does not fit well:

    tmp.glm <- glm(cbind(yes, no) ~ x, data = tmp,
                   family = binomial(link = "logit"))
    plot(tmp.glm$fitted, type = "l", ylim = c(0, 1))
    par(new = TRUE)
    plot(tmp$yes / (tmp$yes + tmp$no), ylim = c(0, 1))

Any suggestions would be greatly appreciated.

Cheers,
Kevin
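One workaround that is sometimes used, sketched here under assumptions and not a built-in `glm` option: keep the binomial likelihood but let the curve be p(x) = A * plogis(b0 + b1 * x), with the upper asymptote A estimated alongside the slope and intercept by direct maximisation with `optim()`. The function name `negloglik`, the reparameterisation A = plogis(a) and the starting values are choices made for this illustration.

```r
tmp <- data.frame(
  x   = 1:14,
  yes = c(0, 0, 0, 2, 1, 14, 24, 15, 23, 18, 22, 20, 14, 17),
  no  = c(94, 101, 95, 80, 81, 63, 51, 56, 30, 38, 31, 18, 21, 20)
)

# Negative log-likelihood of the binomial counts under
#   p(x) = A * plogis(b0 + b1 * x),  par = c(a, b0, b1),  A = plogis(a).
# Writing A as plogis(a) keeps the asymptote inside (0, 1) without constraints.
negloglik <- function(par, data) {
  A <- plogis(par[1])
  p <- A * plogis(par[2] + par[3] * data$x)
  -sum(dbinom(data$yes, size = data$yes + data$no, prob = p, log = TRUE))
}

# Starting values are rough guesses: asymptote 0.5, midpoint near x = 7.
start <- c(0, -7, 1)
fit <- optim(start, negloglik, data = tmp)

A_hat    <- plogis(fit$par[1])                       # estimated asymptote
fitted_p <- A_hat * plogis(fit$par[2] + fit$par[3] * tmp$x)

plot(tmp$x, tmp$yes / (tmp$yes + tmp$no), ylim = c(0, 1),
     xlab = "x", ylab = "proportion yes")
lines(tmp$x, fitted_p)
```

Checking `fit$convergence` and trying a few different starting values is advisable, since a three-parameter curve like this can be sensitive to where the optimiser starts.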
[R] frequency tables
R-masters,

I have a problem that I have been working on for a while and it seems that there may be a simple solution that I have yet to figure out, so I thought that I would venture to post to the help list.

Let's say there was a data.frame with three vectors, two that are factors identifying the data, and one that holds the frequency of occurrence (the events are binary, yes or no). I would like to perform logistic regression on this data, and it seems that I need a vector of 0s and 1s for input into lrm. How might I convert between a frequency table and a vector of binary data while still maintaining all identifier information?

I have thought about using the rep command over and over again and basically building the data.frame by hand but that seems long and tedious. Is there a quick and dirty way of doing this?

Thanks in advance!
Kevin
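A single vectorised `rep()` call on the row indices may be enough, so nothing has to be built by hand. The sketch below assumes the table holds one row per combination of the two factors with separate `yes` and `no` count columns; the data frame `freq`, its column names and its counts are all hypothetical.

```r
# Hypothetical frequency table: two identifying factors plus yes/no counts.
freq <- data.frame(
  site      = factor(c("a", "a", "b", "b")),
  treatment = factor(c("ctl", "trt", "ctl", "trt")),
  yes       = c(2, 0, 3, 1),
  no        = c(1, 2, 0, 4)
)

# Repeat each row index once per "yes" event, then once per "no" event.
idx_yes <- rep(seq_len(nrow(freq)), times = freq$yes)
idx_no  <- rep(seq_len(nrow(freq)), times = freq$no)
idx     <- c(idx_yes, idx_no)

# One row per individual event, with the identifiers carried along.
long <- data.frame(
  site      = freq$site[idx],
  treatment = freq$treatment[idx],
  y         = rep(c(1, 0), times = c(length(idx_yes), length(idx_no)))
)

long
```

If the expanded form is only needed for fitting, `glm()` can also take the counts directly as `cbind(yes, no)` on the left-hand side of the formula, which avoids the expansion altogether.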