Thank you! My apologies again for not including the console output in my earlier message. In the meantime I sent another e-mail with the output, so it should now be a bit clearer what I am seeing. If I missed something, please let me know.

Yes, I am using latin1 and cp1252 interchangeably here, mostly because Encoding() reports the encoding as "latin1". You presumed correctly that my current/default locale's encoding is CP1252. (I also mentioned before that my locale is LC_COLLATE=German_Germany.1252.)

> As you are changing encodings, you do not want to preserve encoding!
>

I am not interested in preserving encodings. What I am worried about is that the encoding is no longer marked, i.e. that Encoding() returns "unknown". In cp1252 encoding on Windows (note that I am using the cp1252 escape "\x80" and not the Unicode escape "\u20AC"):

    > x_utf8 <- enc2utf8(c("€", "\x80"))
    > Encoding(x_utf8)
    [1] "UTF-8" "UTF-8"
    > x_nat <- enc2native(x_utf8)
    > Encoding(x_nat)
    [1] "unknown" "unknown"

See also Kirill's message to this list: "ASCII strings are marked as ASCII internally, but this information doesn't seem to be available, e.g., Encoding() returns "unknown" for such strings"
http://r.789695.n4.nabble.com/source-parse-and-foreign-UTF-8-characters-tp4733523.html

>> Again, this is not the case with iconv()
>>
>> x_iutf8 <- iconv(x, to = "UTF-8")
>> Encoding(x_iutf8)
>> x_inat <- iconv(x_iutf8, from = "UTF-8")
>> Encoding(x_inat)
>>
>
> iconv is converting from/to the current locale's encoding, presumably
> CP1252, not from the marked encoding (as the help page states explicitly.)
>

I am aware that iconv does not use the marked encoding, and that you either have to set the source encoding explicitly or it uses the current locale's default encoding. As I said, I am worried that the encoding markers get lost with the enc2* functions, or rather that they are not set correctly. I am using the iconv example only to show that iconv is able to set the encoding markers correctly, so it seems generally possible.

    > x_iutf8 <- iconv(c("€", "\x80"), to = "UTF-8")
    > Encoding(x_iutf8)
    [1] "UTF-8" "UTF-8"
    > x_iutf8
    [1] "€" "€"
    > x_inat <- iconv(x_iutf8, from = "UTF-8")
    > Encoding(x_inat)
    [1] "latin1" "latin1"
    > x_inat
    [1] "\u0080" "\u0080"
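
For anyone who wants to reproduce this on their own machine, here is a minimal sketch that prints the declared encoding next to the raw bytes of each string, so that a lost mark can be told apart from changed bytes. It is not part of the session above: describe_enc is a hypothetical helper name, and what enc2native() returns depends on the locale of the R session, so no output is shown.

```r
# Hypothetical helper (not from the original session): report the declared
# encoding of each string next to its raw bytes in hex.
describe_enc <- function(x) {
  data.frame(
    marked = Encoding(x),
    bytes  = vapply(
      x,
      function(s) paste(as.character(charToRaw(s)), collapse = " "),
      character(1),
      USE.NAMES = FALSE
    ),
    stringsAsFactors = FALSE
  )
}

x_utf8 <- enc2utf8("\u20ac")   # euro sign, written as a Unicode escape
x_nat  <- enc2native(x_utf8)   # result depends on the session's locale

print(describe_enc(c(x_utf8, x_nat)))
print(l10n_info())             # whether the locale is UTF-8 or Latin-1
```

If the bytes of the second string differ from the first while its mark is "unknown", the conversion happened but the mark was not set, which is the situation described above.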