>>>>> "BaRow" == Barry Rowlingson <[EMAIL PROTECTED]>
>>>>>     on Thu, 12 May 2005 11:05:43 +0100 writes:

    BaRow> Uwe Ligges wrote:
    >> Please read about regular expressions (!!!) and try to
    >> understand that ".txt" also finds "Not_a_txt_file.xls"
    >> ....

    BaRow> The confusion here is between regular expressions
    BaRow> and wildcard expansion known as 'globbing'.  The two
    BaRow> things are very different, and use characters such as
    BaRow> '*' '.' and '?' in different ways.

Exactly.
I devised a "glob" to "regexp" function many years ago in order
to help newbies make the transition.

That function, nowadays called 'glob2rx', has been part of our (CRAN)
package "sfsmisc" and is hence available to all via

    install.packages("sfsmisc")
    library("sfsmisc")

But it's quite simple (though not trivial to read for the
inexperienced, because of the many escapes ("\") needed), and it
may be helpful to see its code on R-help, below.

This topic has also led me to add 2 (obvious in hindsight)
optional logical arguments to the function, so that it now looks like

glob2rx <- function(pattern, trim.head = FALSE, trim.tail = TRUE)
{
    ## Purpose: Change "ls" aka "wildcard" aka "globbing" _pattern_ to
    ##          Regular Expression (as in grep, perl, emacs, ...)
    ## -------------------------------------------------------------------------
    ## Author: Martin Maechler ETH Zurich, ~ 1991
    ##         New version using [g]sub() : 2004

    ## anchor the pattern with '^' and '$', then escape every literal '.'
    p <- gsub('\\.','\\\\.', paste('^', pattern, '$', sep=''))
    ## glob '*' becomes regexp '.*', glob '?' becomes regexp '.'
    p <- gsub('\\?', '.', gsub('\\*', '.*', p))
    ## these are trimming '.*$' and '^.*' - in most cases only for esthetics
    if(trim.tail) p <- sub("\\.\\*\\$$", '', p)
    if(trim.head) p <- sub("\\^\\.\\*", '', p)
    p
}

So those confused newbies (and DOS long timers!) could use

    list.files(myloc, glob2rx("*.zip"), full=TRUE)
    ## (yes, make a habit of using 'TRUE', not 'T' ..)
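
To see the difference in practice, here is a minimal sketch. The file names and the strings in 'x' are made up for illustration only, and it assumes a current R in which glob2rx() is available from the 'utils' package (the sfsmisc version should give the same pattern for "*.txt"). It contrasts using ".txt" directly as a regular expression with using the translated glob "*.txt", and then shows what '*' means in each notation.

```r
## Hypothetical file names, invented for this illustration
files <- c("notes.txt", "Not_a_txt_file.xls", "archive.zip", "data.txt.bak")

## ".txt" used as a regular expression: '.' matches any single character
## and the pattern is not anchored, so it can match anywhere in the name
rx_hits <- grepl(".txt", files)

## The glob "*.txt" translated into an anchored regular expression
## (assumption: utils::glob2rx, as shipped with current R)
pat <- utils::glob2rx("*.txt")
glob_hits <- grepl(pat, files)

print(files[rx_hits])    # names matched by the raw regexp ".txt"
print(files[glob_hits])  # names matched by the translated glob

## '*' in the two notations, on made-up strings
x <- c("ac", "abc", "abbbc", "abXYZc")
as_regex <- grepl("^ab*c$", x)   # regexp: 'b*' is "zero or more 'b'"
as_glob  <- grepl("^ab.*c$", x)  # glob "ab*c" written by hand as a regexp
print(data.frame(x, as_regex, as_glob))
```

With these inputs the raw ".txt" also picks up "Not_a_txt_file.xls" and "data.txt.bak", whereas the translated glob only keeps names that really end in ".txt".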

The current example code, BTW, has

    stopifnot(glob2rx("abc.*") == "^abc\\.",
              glob2rx("a?b.*") == "^a.b\\.",
              glob2rx("a?b.*", trim.tail=FALSE) == "^a.b\\..*$",
              glob2rx("*.doc") == "^.*\\.doc$",
              glob2rx("*.doc", trim.head=TRUE) == "\\.doc$",
              glob2rx("*.t*")  == "^.*\\.t",
              glob2rx("*.t??") == "^.*\\.t..$"
              )

Martin Maechler, ETH Zurich

    BaRow> There's added confusion when people come from a DOS
    BaRow> background, where commands did their own thing when
    BaRow> given '*' as parameter.  The DOS command:

    BaRow>   RENAME *.FOO *.BAR

    BaRow> did what seems obvious, renaming all the .FOO files
    BaRow> to .BAR, but on a unix machine doing this with 'mv'
    BaRow> can be destructive!

    BaRow> In short (and slightly simplified), a '*' when
    BaRow> expanded as a wildcard in a glob matches any string,
    BaRow> whereas a '*' in a regular expression (regexp),
    BaRow> matches the previous character 0 or more times.  This
    BaRow> is why "*.zip" is flagged as invalid now - there's no
    BaRow> character before the "*".

    BaRow> That should be enough clues to send you on your
    BaRow> way.

    BaRow> Baz

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html