[R] selecting rows with more than x occurrences in a given column (data type is names)

2007-03-13 Thread Mike Jasper
Despite a long search of the archives, I couldn't find how to do this.
Thanks in advance for what is likely a simple issue.

I have a data set where the first column is name (e.g., 'Joe Smith',
'Jane Doe', etc.). The following columns are data associated with that
person. I have many people with multiple rows. What I want is to get a
new data frame out with only the people who have more than x
occurrences in the first column.

Here's what I've done, that's not working:

Let's call my old data.frame all.data

table(all.data$names)>10

I get a list of names and TRUE/FALSE values. I then want to make a
list of the TRUEs and pass that to some subset type command like

dup.names=table(all.data$names)>10

new.data=(all.data[all.data$names==dup.names,])

That's not working because the dimensions are wrong (I think). But
even when I tried to do part of it manually (to troubleshoot) like
this

dup.names=c('Joe Smith','Jane Doe','etc')

I got warnings and it didn't work correctly. There must be a simple
way to do this that I'm just not seeing. Thanks.
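For reference, `table(all.data$names)>10` is not a list of names: it is a logical vector (strictly, a one-dimensional table) whose element names are the distinct values of the column. A minimal sketch of pulling out the names flagged TRUE, assuming a small made-up data frame `toy` (hypothetical) and a threshold of 2 in place of 10:

```r
# Hypothetical toy data standing in for all.data
toy <- data.frame(names = c("Joe Smith", "Jane Doe", "Joe Smith",
                            "Ann Lee", "Joe Smith", "Jane Doe"),
                  value = 1:6,
                  stringsAsFactors = FALSE)

counts <- table(toy$names)  # occurrences of each distinct name
is.dup <- counts > 2        # one TRUE/FALSE per name, named by the name
print(is.dup)

# Keep only the names whose flag is TRUE
dup.names <- names(is.dup)[is.dup]
print(dup.names)            # "Joe Smith"
```

`dup.names` is then an ordinary character vector, which is the kind of object a row-subsetting step needs.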

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] selecting rows with more than x occurrences in a given column (data type is names)

2007-03-13 Thread Mike Jasper
Thanks to all of you who got me the answer. The key I was missing was
%in%. Had never seen it before.

best.
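To see why `%in%` rather than `==` does the job: `==` compares element by element and recycles the shorter vector, which is likely where the warnings in the original question came from, while `%in%` asks, for each element, whether it occurs anywhere in the second vector. A small illustration with made-up vectors (`rows` and `wanted` are hypothetical):

```r
rows   <- c("Joe Smith", "Jane Doe", "Ann Lee", "Joe Smith", "Jane Doe")
wanted <- c("Joe Smith", "Jane Doe")

# Element-wise comparison with recycling: rows[1] vs wanted[1], rows[2] vs
# wanted[2], rows[3] vs wanted[1], ... R would warn that 5 is not a multiple
# of 2; the warning is suppressed here to keep the output clean.
by.eq <- suppressWarnings(rows == wanted)
print(by.eq)   # TRUE TRUE FALSE FALSE FALSE

# Set membership: is each row's name anywhere in 'wanted'?
by.in <- rows %in% wanted
print(by.in)   # TRUE TRUE FALSE TRUE TRUE
```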

On 3/13/07, Dimitris Rizopoulos [EMAIL PROTECTED] wrote:
 try this:

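 set.seed(123)
 all.data <- data.frame(name = sample(c("Joe", "Elen", "Jane", "Mike"),
                                      8, TRUE),
                        x = rnorm(8), y = runif(8))
 ## count the rows for each name
 tab.nams <- table(all.data$name)
 ## keep the names that occur at least twice
 nams <- names(tab.nams[tab.nams >= 2])
 ## select the rows whose name is one of those
 all.data[all.data$name %in% nams, ]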
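The reply keeps names that occur at least twice, while the question asks for more than x occurrences, so a reusable version may be handy. A minimal sketch under these assumptions: the function names `keep.frequent` and `keep.frequent.ave` are hypothetical (not part of base R), and the comparison is a strict `>`. The first function wraps the same `table()` plus `%in%` approach; the second swaps in a different technique, using `ave()` to attach each row's group size.

```r
# Hypothetical helper: keep the rows of 'df' whose value in column 'col'
# occurs more than 'x' times (strictly greater, as the question asks).
keep.frequent <- function(df, col, x) {
  counts <- table(df[[col]])          # occurrences of each distinct value
  keep <- names(counts)[counts > x]   # values seen more than x times
  df[df[[col]] %in% keep, , drop = FALSE]
}

# Alternative: compute each row's group size with ave() and filter on it,
# skipping the intermediate vector of names.
keep.frequent.ave <- function(df, col, x) {
  n <- ave(seq_len(nrow(df)), df[[col]], FUN = length)
  df[n > x, , drop = FALSE]
}

# Same toy data as in the reply above
set.seed(123)
all.data <- data.frame(name = sample(c("Joe", "Elen", "Jane", "Mike"), 8, TRUE),
                       x = rnorm(8), y = runif(8))

res1 <- keep.frequent(all.data, "name", 1)      # names occurring 2+ times
res2 <- keep.frequent.ave(all.data, "name", 1)
print(res1)
```

With x = 1 both versions return the same rows as the reply's filter, since for integer counts `> 1` and `>= 2` agree. `drop = FALSE` keeps the result a data frame even when `df` has a single column.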


 I hope it helps.

 Best,
 Dimitris

 
 Dimitris Rizopoulos
 Ph.D. Student
 Biostatistical Centre
 School of Public Health
 Catholic University of Leuven

 Address: Kapucijnenvoer 35, Leuven, Belgium
 Tel: +32/(0)16/336899
 Fax: +32/(0)16/337015
 Web: http://med.kuleuven.be/biostat/
  http://www.student.kuleuven.be/~m0390867/dimitris.htm


 - Original Message -
 From: Mike Jasper [EMAIL PROTECTED]
 To: r-help@stat.math.ethz.ch
 Sent: Tuesday, March 13, 2007 3:38 PM
 Subject: [R] selecting rows with more than x occurrences in a given
 column (data type is names)





