| Confirmed for R-devel (current) on Ubuntu 17.10. But ... isn't the regexp
| you use wrong, ie isn't R-devel giving the correct answer?

No, I don't think R-devel is correct (or at least consistent with the
documentation). My interpretation of gsub("(\\w)", "\\U\\1", entry,
perl = TRUE) is "Take every word character and replace it with itself,
converted to uppercase."

Perhaps my example was too minimal. Consider the following:

R> gsub("(\\w)", "\\U\\1", entry, perl = TRUE)
[1] "A"

R> gsub("(\\w)", "\\1", entry, perl = TRUE)
[1] "author: Amélie" # OK, but very different to 'A', despite only not specifying uppercase

R> gsub("(\\w)", "\\U\\1", "author: Amelie", perl = TRUE)
[1] "AUTHOR: AMELIE" # OK, but very different to 'A'

R> gsub("^(\\w+?): (\\w)", "\\U\\1\\E: \\2", entry, perl = TRUE)
"AUTHOR" # Where did everything after the first group go?

I should note the following example too:

R> gsub("(\\w)", "\\U\\1", entry, perl = TRUE, useBytes = TRUE)
[1] "AUTHOR: AMéLIE" # latin1 encoding

A call to `readLines` (possibly `scan()` and `read.table` and friends) is
essential.

On 18 February 2018 at 02:15, Dirk Eddelbuettel <e...@debian.org> wrote:
>
> On 17 February 2018 at 21:10, Hugh Parsonage wrote:
> | I was told to re-raise this issue with R-dev:
> |
> | In the documentation of R-dev and R-3.4.3, under ?gsub
> |
> | > replacement
> | > ... For perl = TRUE only, it can also contain "\U" or "\L" to convert
> | > the rest of the replacement to upper or lower case and "\E" to end case
> | > conversion.
> |
> | However, the following code runs differently:
> |
> | tempf <- tempfile()
> | writeLines(enc2utf8("author: Amélie"), con = tempf, useBytes = TRUE)
> | entry <- readLines(tempf, encoding = "UTF-8")
> | gsub("(\\w)", "\\U\\1", entry, perl = TRUE)
> |
> |
> | "AUTHOR: AMÉLIE" # R-3.4.3
> |
> | "A" # R-dev
>
> Confirmed for R-devel (current) on Ubuntu 17.10. But ... isn't the regexp
> you use wrong, ie isn't R-devel giving the correct answer?
>
> R> tempf <- tempfile()
> R> writeLines(enc2utf8("author: Amélie"), con = tempf, useBytes = TRUE)
> R> entry <- readLines(tempf, encoding = "UTF-8")
> R> gsub("(\\w)", "\\U\\1", entry, perl = TRUE)
> [1] "A"
> R> gsub("(\\w+)", "\\U\\1", entry, perl = TRUE)
> [1] "AUTHOR"
> R> gsub("(.*)", "\\U\\1", entry, perl = TRUE)
> [1] "AUTHOR: AMÉLIE"
> R>
>
> Dirk
>
> --
> http://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel