On Tue, 2007-09-25 at 16:39 -0400, lucy b wrote:
> Dear List,
>
> I have an ascii text file with data I'd like to extract. Example:
>
> Year Built: 1873 Gross Building Area: 578 sq ft
> Total Rooms: 6 Living Area: 578 sq ft
>
> There is a lot of data I'd like to ignore in each record, so I'm
> hoping there is a way to use strings as delimiters to get the data I
> want (e.g. tell R to take data between "Built:" and "Gross" -
> incidentally, not always numeric). I think an ugly way would be to
> start at the end of each record and use a substitution expression to
> chip away at it, but I'm afraid it will take forever to run. Is there
> a way to use strings as delimiters in an expression?
>
> Thanks in advance for ideas.
>
> LB

I don't know that any of the default base functions enable the use of a
regex as a delimiter.

If your text file is consistent in the use of the colon ':' as a
separator, you might be able to use that. Each of the above lines then
would be broken into 3 fields using:

DF <- read.table("YourFile.txt", sep = ":")

> DF
           V1                        V2         V3
1  Year Built  1873 Gross Building Area  578 sq ft
2 Total Rooms             6 Living Area  578 sq ft

You could then parse them further using appropriate functions if needed,
such as gsub():

> as.data.frame(lapply(DF[, -1], function(x) gsub("[^0-9]", "", x)))
    V2  V3
1 1873 578
2    6 578

This now gives you the numeric data in two columns. You would now need
to know that data in the rows are perhaps in some predictable or
alternating order for further processing.

See ?gsub and ?regex for more information.

Hope that provides some help. You also might want to look at ?readLines
and ?strsplit as other ways to read in the data and then post-process it
once in an R object.

Marc Schwartz
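
For what it's worth, the readLines() route might look something like the following. This is only a minimal sketch under a few assumptions: the two sample lines from the question stand in for the result of readLines("YourFile.txt"), the helper name between() is made up for illustration (it is not a base R function), and the delimiter strings are assumed to occur at most once per line and in the given order. It uses sub() with perl = TRUE so that \Q...\E can quote the delimiters literally.

```r
# Sample records copied from the question; in practice these would come
# from something like: txt <- readLines("YourFile.txt")
txt <- c("Year Built: 1873 Gross Building Area: 578 sq ft",
         "Total Rooms: 6 Living Area: 578 sq ft")

# Hypothetical helper (not part of base R): return the text found between
# two literal strings, or NA for lines that do not contain both of them
# in that order.
between <- function(x, left, right) {
  # \\Q...\\E makes the delimiters match literally; the capture group is
  # lazy so it stops at the first occurrence of 'right'.
  pattern <- paste0("^.*?\\Q", left, "\\E\\s*(.*?)\\s*\\Q", right, "\\E.*$")
  hit <- grepl(pattern, x, perl = TRUE)
  out <- rep(NA_character_, length(x))
  out[hit] <- sub(pattern, "\\1", x[hit], perl = TRUE)
  out
}

year_built  <- between(txt, "Built:", "Gross")
total_rooms <- between(txt, "Rooms:", "Living")
```

The results stay as character vectors, since the fields are not always numeric; as.numeric() could be applied afterwards to the ones that are.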