On 2011-08-15 at 19:06, Duncan Murdoch wrote:

> On 11-08-15 2:42 PM, Denis Chabot wrote:
>> Hi,
>>
>> I usually do not give second thought to accented vowels and R handles
>> everything fine thanks to UTF8 being used in my R scripts. But today I have
>> a problem. Accented vowels do not behave properly when they were imported
>> into R using list.files.
>>
>> Maybe this is because OS X (I'm using 10.6.8) still uses MacRoman for file
>> names, though visually the names seem to have been read correctly into R.
>>
>> An example is better than words:
>>
>> sessionInfo()
>> R version 2.13.1 (2011-07-08)
>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>>
>> locale:
>> [1] fr_CA.UTF-8/fr_CA.UTF-8/C/C/fr_CA.UTF-8/fr_CA.UTF-8
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>>
>>
>> This does not cause problem:
>> a = c("1_MO2 crevettes po2crit.Rda", "1_MO2 soles Sète sda.Rda", "1_MO2 turbots po2crit.Rda"); a
>> [1] "1_MO2 crevettes po2crit.Rda" "1_MO2 soles Sète sda.Rda" "1_MO2 turbots po2crit.Rda"
>>
>> a2 = gsub(" Sète", "S", a); a2
>> [1] "1_MO2 crevettes po2crit.Rda" "1_MO2 solesS sda.Rda" "1_MO2 turbots po2crit.Rda"
>>
>>
>> but if instead of creating the vector within the R script, I read it as a
>> series of file names, the substitution does not work. I am sorry that I
>> cannot make this a reproducible example as it requires the 3 files to exist
>> on your computer, but you could create 3 dummy files having the same names
>> in the directory of your choice.
>>
>> don = file.path("données/")
>> b = list.files(path = don, pattern = "1_MO2"); b
>> [1] "1_MO2 crevettes po2crit.Rda" "1_MO2 soles Sète sda.Rda" "1_MO2 turbots po2crit.Rda"
>>
>> b2 = gsub(" Sète", "S", b); b2
>> [1] "1_MO2 crevettes po2crit.Rda" "1_MO2 soles Sète sda.Rda" "1_MO2 turbots po2crit.Rda"
>>
>> I am puzzled and also "stuck". For now I'll modify the file name, but I need
>> to be able to handle such names at some point.
>>
>> Any advice?
>
>
> Possibly your system really is using MacRoman or some other local encoding;
> in that case, iconv(x, "", "UTF-8") should convert from the local encoding to
> UTF-8.
>
> I think declaring everything to be UTF8 may be sufficient. When I use
> list.files(), I see the encoding listed as "unknown", but
>
> x <- list.files()
> Encoding(x) <- "UTF-8"
>
> works. However, the iconv() method should be safer.
>
> Duncan Murdoch
Hi Duncan,

iconv() confirmed what I suspected: there was no problem with the encoding
of the result of list.files. If there had been one, the "è" would not have
displayed as "è". As a result, I got nonsense when I treated this "è" as
MacRoman and converted it to UTF-8:

iconv(b, from="MacRoman", to="UTF-8")
[1] "1_MO2 crevettes po2crit.Rda" "1_MO2 soles SeÃÄte sda.Rda" "1_MO2 turbots po2crit.Rda"

It is not clear, however, that R considered b to be UTF-8:

Encoding(b)
[1] "unknown" "unknown" "unknown"

so I followed your suggestion:

Encoding(b) <- "UTF-8"
Encoding(b)
[1] "unknown" "UTF-8" "unknown"

but gsub still did not work:

b2 = gsub(" Sète", "S", b); b2
[1] "1_MO2 crevettes po2crit.Rda" "1_MO2 soles Sète sda.Rda" "1_MO2 turbots po2crit.Rda"

I do not know why gsub worked with vector "a" but not with vector "b" from
my original message. Strange and frustrating.

Denis

_______________________________________________
R-SIG-Mac mailing list
R-SIG-Mac@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-mac
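
One possible lead, offered only as a sketch: two strings that print
identically can still differ byte for byte, and gsub compares what is
stored, not what is displayed. The "eÃÄ" in the iconv output above would be
consistent with the file name holding a plain "e" followed by a combining
grave accent (U+0300), whereas a typed "è" is normally the single character
U+00E8. That is an assumption the thread does not confirm. The names `typed`
and `on_disk` below are hypothetical stand-ins for the pattern and for b[2].

```r
# Assumption (not confirmed in the thread): the typed pattern uses the
# precomposed "è" (U+00E8), while the name returned by list.files() uses
# "e" followed by a combining grave accent (U+0300).
typed   <- "S\u00e8te"    # precomposed form, as typed in a script
on_disk <- "Se\u0300te"   # decomposed form, hypothetical stand-in for b[2]

# Inspect the raw bytes behind each string; they are not the same
charToRaw(typed)
charToRaw(on_disk)

# The two strings look alike when printed but are not equal
same_string <- identical(typed, on_disk)

# A pattern in one form does not match text in the other form
cross_match <- grepl(typed, on_disk, fixed = TRUE)

# A pattern in the same form as the text does match
self_match <- grepl(on_disk, on_disk, fixed = TRUE)

print(c(same_string = same_string,
        cross_match = cross_match,
        self_match  = self_match))
```

Running charToRaw() on b[2] and on the literal " Sète" from the script
would show whether this is what happens with the real file names.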