[R] Subsetting a data frame by a factor, using the level that occurs the most times
I think that title makes sense... I hope it does...

I have a data frame, one of the columns of which is a factor. I want the rows of data that correspond to the level of that factor which occurs the most times.

I can get a list by doing:

    by(data, data$pattern, subset)

and go through each element of the list counting the rows to find the maximum, BUT I can't help thinking there's a more elegant way of doing this.

The second part is figuring out the rows which have the maximum number of consecutive patterns which are the same... Now that I would love some help with... :-)

Thanks
Mick

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
RE: [R] Subsetting a data frame by a factor, using the level that occurs the most times
> From: Douglas Bates
>
> michael watson (IAH-C) wrote:
> > I think that title makes sense... I hope it does... I have a data frame, one of the columns of which is a factor. I want the rows of data that correspond to the level in that factor which occurs the most times.
>
> So first you want to determine the mode (in the sense of the most frequently occurring value) of the factor. One way to do this is
>
>     names(which.max(table(fac)))
>
> Use this comparison for the subset as
>
>     subset(data, pattern == names(which.max(table(pattern))))

Just be careful that if there are ties (i.e., more than one level having the max), which.max() will randomly pick one of them. That may or may not be what's desired. If that is a possibility, Mick will need to think about what he wants in such cases.

Andy
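The suggestion above can be tried on a small example. This is only a minimal sketch: the column name `pattern` follows the thread, but the data frame contents and the helper names `counts`, `top_level` and `most_common` are made up for illustration.

```r
# Hypothetical example data; the column name follows the thread, the values are invented.
data <- data.frame(
  pattern = factor(c("a", "b", "b", "c", "b", "a")),
  value   = c(1.2, 3.4, 5.6, 7.8, 9.0, 2.1)
)

# Count how often each level of the factor occurs.
counts <- table(data$pattern)

# Name of the most frequent level.
top_level <- names(which.max(counts))

# Keep only the rows whose pattern equals that level.
most_common <- subset(data, pattern == top_level)
print(most_common)
```

Storing the level name in a variable first keeps the `subset()` call short; the one-line form quoted above should give the same rows.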
Re: [R] Subsetting a data frame by a factor, using the level that occurs the most times
Liaw, Andy wrote:
> Just be careful that if there are ties (i.e., more than one level having the max) which.max() will randomly pick one of them. That may or may not be what's desired. If that is a possibility, Mick will need to think what he wants in such cases.

According to the documentation it picks the first one. Also, that's what Martin Maechler told me, and he wrote the code, so I trust him on that. I figure that if you have to trust someone to be meticulous and precise then a German-speaking Swiss is a good choice.
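The tie behaviour is easy to check. The sketch below uses an invented factor `fac` in which two levels are tied, and also shows one possible way to keep every tied level if that is what is wanted; the variable names are illustrative only.

```r
# Hypothetical factor in which "a" and "b" are tied for the highest count.
fac <- factor(c("b", "a", "c", "a", "b"))
counts <- table(fac)

# which.max() returns the position of the first maximum, in level order.
first_mode <- names(which.max(counts))

# To keep every tied level instead, compare each count with the maximum.
all_modes <- names(counts)[counts == max(counts)]

print(first_mode)
print(all_modes)
```

With `all_modes` in hand, something like `subset(data, pattern %in% all_modes)` would return the rows for all tied levels rather than only the first.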
RE: [R] Subsetting a data frame by a factor, using the level that occurs the most times
> From: Douglas Bates
>
> According to the documentation it picks the first one. Also, that's what Martin Maechler told me, and he wrote the code, so I trust him on that. I figure that if you have to trust someone to be meticulous and precise then a German-speaking Swiss is a good choice.

My apologies! I got it mixed up with max.col(), which does break ties at random.

Andy
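For comparison, here is a small sketch of the difference between the two functions. The matrix `m` is invented, and the `ties.method` argument is the one documented for max.col() in current versions of R; it may not have been available in the version discussed in this thread.

```r
# Hypothetical matrix: in row 1 the maximum (5) appears in columns 1 and 3.
m <- matrix(c(5, 1, 5,
              2, 9, 4), nrow = 2, byrow = TRUE)

# Default ties.method = "random": row 1 may give column 1 or column 3.
random_pick <- max.col(m)

# Deterministic alternative: always take the first maximum in each row.
first_pick <- max.col(m, ties.method = "first")

# which.max() on a single row always returns the first maximum.
row1_pick <- which.max(m[1, ])

print(first_pick)
print(row1_pick)
```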