I have a table with a structure like the following:

lang | basic id | doc id       | topics |
se   | 447157   | MD_2002_0014 | 12     |
It is loaded with:

    topics <- read.table("path to file", header=TRUE, sep="|", fileEncoding="utf-8")

In that table the meaningful data (in this context) are the topics column and the text before the first underscore in doc id, which is the document type (for example MD above). A topics entry can hold more than one value; multiple values are comma separated. If a document has no topic I currently write a 0, although I could leave the column empty instead.

So what I want is the best way to extract the meaningful data, that is, the comma-separated values of each topics entry and the document type, so that I can start producing reports such as how many documents of type X have no topics, the median number of topics per document type, etc.

Do I have to loop through the table and build up a new table with the information I want, or is there a smarter way to do it? If so, what is that smarter way?

Thanks,
Bryan Rasmussen

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
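For illustration, here is a minimal base-R sketch of one vectorised way to do this without an explicit loop. It is only a sketch under assumptions: the small data frame is made-up sample data standing in for the real file, the column names doc.id and topics assume that read.table(header=TRUE) has turned "doc id" into doc.id, and the helper names doc.type, n.topics, split.topics and long are hypothetical.

```r
# Made-up sample standing in for the real file (assumption); the column
# names mimic what read.table(header=TRUE) would produce for "doc id".
topics <- data.frame(
  lang   = c("se", "se", "se", "se"),
  doc.id = c("MD_2002_0014", "MD_2003_0001", "PR_2001_0007", "PR_2004_0002"),
  topics = c("12", "3,7,9", "0", "5,8"),
  stringsAsFactors = FALSE
)

# Document type: everything before the first underscore in doc.id.
topics$doc.type <- sub("_.*$", "", trimws(topics$doc.id))

# Split the comma-separated topics into a list with one vector per row.
# as.character() guards against the column having been read as numeric.
split.topics <- strsplit(trimws(as.character(topics$topics)), ",", fixed = TRUE)

# Treat "0", empty strings and NA as "no topics".
split.topics <- lapply(split.topics, function(x) {
  x <- trimws(x)
  x[!is.na(x) & x != "" & x != "0"]
})

# Number of topics per document.
topics$n.topics <- lengths(split.topics)

# Reports per document type.
no.topics     <- tapply(topics$n.topics == 0, topics$doc.type, sum)
median.topics <- tapply(topics$n.topics, topics$doc.type, median)

# Optional long format: one row per (document type, topic) pair.
long <- data.frame(
  doc.type = rep(topics$doc.type, topics$n.topics),
  topic    = unlist(split.topics),
  stringsAsFactors = FALSE
)
```

With this sample data, no.topics is 0 for MD and 1 for PR, and median.topics is 2 for MD and 1 for PR. If the real file has spaces around the "|" separators, passing strip.white=TRUE to read.table may make the trimws() calls unnecessary.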